
Batch and server outages

  • Tom.Westbrook
    Message 1 of 9, Sep 1, 2004

      I'm looking for the best way to hold batch for individual agent boxes (UNIX & Win mostly) when the agent is taken down for maintenance. Currently we have operations manually hold jobs, but that's getting to be a pain, since some jobs get missed: we organize the schedule by application rather than by server or agent node, and jobs are often scattered among various Agents.

      Anyway, I wonder if others could share how they handle outages on a single Agent when you have a number of other Agents mixed in with your jobs on one CTM Server. We need to wait for running jobs to complete, but also prevent any new jobs from starting while we wait for the runners to finish.

      I've thought about a Control resource, but I'm not sure that would be manageable with over a thousand Agent boxes in the schedule and, most of the time, very few jobs per Agent. I've tried "ctm_agstat -update <node> DISABLE" on the CTM Server, but that abends running jobs as "disappeared", even though the job is still active on the Agent, and it doesn't update job statuses in real time when they do eventually complete. The ideal solution would be some sort of soft-DISABLE setting for ctm_agstat that blocked new job submissions but allowed executing jobs to finish.

      Yes, I could script it to hold all jobs on a given node, but I'd prefer not to be vulnerable to DB schema changes or to utility name or command-line option changes if I can help it.

      Our CTM Servers are v6.1.02, Agents are a mix of v224 & v6.1 on UNIX, v224 for NT & v6.0 on W2k. EM is 6.1.02.

      Thanks in advance for your input.

      Tom

    • Larry Sabados (IT)
      Message 2 of 9, Sep 1, 2004
        We use Quantitative Resources. Each Agent has its own pool of quantitative resources, and its jobs are defined to need that resource to submit. When we schedule maintenance on one or more of these Agent machines, Ops simply sets the "Max" value of the resource to 0 (zero). This allows all jobs already holding the resource to complete, while no new jobs needing that Agent's Quant. Resource can execute.
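
        For illustration, that on/off flip could be scripted from the Control-M/Server account's shell roughly as below. The resource name AGENT01-QR is made up, the 0/9999 values are simply the "off"/"on" Max values Larry mentions later in the thread, and the ECAQRTAB spelling follows the form quoted further down; check the utility name, case, and path for your platform and version.

            # Maintenance starts: block new submissions; running jobs drain on their own
            ECAQRTAB UPDATE AGENT01-QR 0

            # Maintenance over: reopen the agent's resource pool
            ECAQRTAB UPDATE AGENT01-QR 9999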
         
        -Larry
        -----Original Message-----
        From: Tom.Westbrook [mailto:tom.westbrook@...]
        Sent: Wednesday, September 01, 2004 10:52 AM
        To: Control-X@yahoogroups.com
        Subject: [Control-X] Batch and server outages


      • Stein Arne Storslett
        Message 3 of 9, Sep 1, 2004
          How many jobs are sharing this quant. res. at the most?

          And are there any performance issues, as you see it, with this
          approach?

          Stein Arne


          --
          Stein Arne Storslett
        • Larry Sabados (IT)
          Message 4 of 9, Sep 1, 2004
            Well, we really don't have a set number of jobs we are restricting at one time. With us, it's either "on" (Max=9999) or "off" (Max=0). We might see performance issues if we had a slew of jobs executing at one time, but our throughput is manageable at its current load of jobs. We hope to get a better handle on limiting the quantity of this server-specific quantitative resource, but until then the floodgates are open. Sorry.
             
            -Larry
            -----Original Message-----
            From: Stein Arne Storslett [mailto:sastorsl@...]
            Sent: Wednesday, September 01, 2004 11:33 AM
            To: Control-X@yahoogroups.com
            Subject: Re: [Control-X] Batch and server outages



          • Stein Arne Storslett
            Message 5 of 9, Sep 1, 2004
              Okay, I was thinking more along the lines of 5000 jobs a day with a
              small number of resources, or 5000 jobs a day _with_ resources
              added.

              Stein Arne


              --
              Stein Arne Storslett
            • Tom.Westbrook
              Message 6 of 9, Sep 1, 2004

                I guess the problem I see with this approach is that for our situation we’ll have to add a distinct Quantitative Resource (QR) for every Agent box. On our biggest CM Server that would mean about 1200 distinct QRs for only approx 3000 jobs, since there are only 1-4 jobs per agent. Not sure how the CM Server will react to managing that many QRs.

                 

                We do use the single QR approach on all the jobs on each CM Server to manage outages on the CM Server box itself. It works fine, but I’d prefer a method that didn’t rely on job changes just to manage external events that don’t really have anything to do with the application the job is for (I’m thinking application as distinct from its hardware). This is especially true since applications can be moved to newer boxes from time-to-time and you’d have to rename/redefine the QRs each time that happened (assuming the box name was part of the QR name somehow).

                 

                I guess I’ll have to go with a script for now to do the holds. But that’s not ideal, since executing jobs will need to be managed separately and manually (doing this with cyclic jobs can be nearly impossible if they trigger right away).

                 

                Maybe I’ll make a wish item with BMC to see if I can talk them into changing ctm_agstat or making a new utility to help manage Agents with a bit more granularity.

                Tom

                -----Original Message-----
                From: Larry Sabados (IT) [mailto:lsabados@...]
                Sent: Wednesday, September 01, 2004 10:38 AM
                To: Control-X@yahoogroups.com
                Subject: RE: [Control-X] Batch and server outages

                 




              • Larry Sabados (IT)
                Message 7 of 9, Sep 1, 2004
                  We experience no problems, but then we only have 259 Quant. Resources for our CONTROL-M/Server to maintain. I do agree that this option should have been a no-brainer for the BMC development folks. I suppose they assumed that their customers would never take an Agent down for maintenance!!!!
                  -----Original Message-----
                  From: Tom.Westbrook [mailto:tom.westbrook@...]
                  Sent: Wednesday, September 01, 2004 1:41 PM
                  To: Control-X@yahoogroups.com
                  Subject: RE: [Control-X] Batch and server outages



                • mark_ceulemans@yahoo.com
                  Message 8 of 9, Sep 2, 2004
                    Hi Tom,

                    This is about the only feature in Tivoli's scheduler I like better than its counterpart in Control-M.
                    You set a limit on the number of concurrent jobs on each agent, which has nothing to do with resources.
                    If you set this limit to 0, all jobs are held until someone raises the limit again.
                    Thus, there are no resources to maintain.
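
                    For what it's worth, in Tivoli/Maestro that flip is done through conman's "limit cpu" (lc) command; the workstation name below is invented and the exact syntax should be verified against your TWS release:

                        conman "lc AGENT01;0"     # job limit 0: nothing new launches on AGENT01
                        conman "lc AGENT01;10"    # restore a working job limit of 10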

                    If development is reading this, please take a note of this :-)

                    Yours,




                    -- 
                    Kind regards,
                    Mark Ceulemans
                    
                  • Larry Sabados (IT)
                    Message 9 of 9, Sep 2, 2004
                      Pat-
                       
                      Yes, your method sounds solid, but as you mentioned, because there is no mass update of PreReqs or Conditions in CONTROL-M/Desktop (I'm at the same version as you), updating all our jobs would be a bit time consuming... but still well worth considering. I will forward this on to my Production Control staff.
                       
                      -Larry
                      -----Original Message-----
                      From: pat_hicok@... [mailto:pat_hicok@...]
                      Sent: Thursday, September 02, 2004 9:35 AM
                      To: Control-X@yahoogroups.com
                      Subject: Re: [Control-X] Batch and server outages






                      Larry,

                      The concept of using either Prerequisite conditions or Quantitative
                      resources accomplishes the same feat. However, the Prereq has its drawbacks,
                      in that you are not able to 'drain' processing before proceeding.

                      Our challenge was much like yours: we needed a way to automate the
                      reboot procedures for a set of Unix servers. Certainly this could be done
                      utilizing 'IN' conditions, and simply removing them. The problem was, what
                      if something was running? I suppose I could have a Control resource that
                      the reboot job needs exclusively, shared by all the other processes, and then
                      take away the 'IN' conditions -- you get the picture.

                      I chose to use Quant resources for the simple reason that it was the first
                      choice I came up with.  Here is how we accomplish automating the reboot.

                      All jobs for a particular agent have two quant resources coded, requiring
                      "1" of each. We called them <agent name>-IPL and <agent name>-UP. The
                      quantity available to all jobs is a static number; we use 20. This
                      accomplishes two needs: one is the reboot, the other is to ensure we
                      don't overload the agent with work.

                      JOB-A runs ECAQRTAB UPDATE <agent name>-UP 0 on the Control-M/Server.
                      This zeroes the UP resource to start the draining process. No other jobs
                      will start, and those running will be allowed to continue. I pass autoedit
                      parms from the job definition into the script for what the agent name is.
                      This way I only have to maintain one script.
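
                      For illustration only, that single parameterized script might look roughly
                      like the sketch below. The script name, the argument handling, and the
                      assumption that the AutoEdit parameter arrives as a positional shell
                      argument are editorial guesses; only the ECAQRTAB UPDATE form comes from
                      the post itself, and the utility's exact name/path may differ per platform.

                          #!/bin/sh
                          # agent_qr.sh <agent-node> <quantity>
                          #   quantity 0  = start draining (no new submissions on that agent)
                          #   quantity 20 = reopen the agent for work
                          NODE="$1"
                          QTY="$2"
                          if [ -z "$NODE" ] || [ -z "$QTY" ]; then
                              echo "usage: agent_qr.sh <agent-node> <quantity>" >&2
                              exit 1
                          fi
                          # Update the per-agent quantitative resource on the Control-M/Server
                          ECAQRTAB UPDATE "${NODE}-UP" "$QTY"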

                      JOB-B requires ALL of the <agent name>-IPL resources in order to submit.
                      Once all running jobs complete, it is allowed to run.
                      It runs a UNIX script that has this coded in it:

                      echo '/usr/sbin/shutdown -Fr' | /usr/bin/at now +1 minute

                      This basically says "one minute from now, issue the shutdown command".
                      This allows the agent to report back that the job completed (if you issue
                      the reboot immediately, the job goes UNKNOWN, then fails when the system
                      comes back with "sysout not found"). It gives the ability to trigger the
                      next process.

                      JOB-C is triggered by JOB-B, runs on the Control-M/Server and does the
                      following:

                      Sleeps for xx minutes. This is based on the average time it takes the
                      system to reboot.
                      ECAQRTAB UPDATE <agent name>-UP 20  (effectively says processing can again
                      begin)

                      Now, if the agent isn't up, the jobs simply turn BLUE and will wait for the
                      Agent to become available.

                      That was Phase-I of what we did.

                      These are the additions I made in Phase-II, and I would suggest doing them.

                      Added a "between" (From/Until) parameter to JOB-B.  The reboot must fall
                      between these two times, otherwise it will not run.
                      Added a LateSub message to JOB-B if the UNTIL time is exceeded. This
                      notifies Operations Staff that the reboot did not occur. Currently I have
                      them manually Force OK the job, thus allowing JOB-C to run. I suppose I
                      could code IF/THEN statements to automate this, but that is for Phase-3, if
                      needed.

                      This is basically the process in a nutshell (And for those at the Chicago
                      Forum04, you heard me speak of it).

                      Now, keep in mind that I started this at the initial stages of scheduling
                      processes. If I had to go back and code these statements on 2,000 jobs
                      currently in production, I might rethink my approach. This is simply due to
                      the fact that Desktop does not allow "mass changes" to conditions and
                      resources (at least not at the level we are on, 6.1.02.05). I suppose it
                      could be done by exporting to XML, making adjustments, and importing back,
                      or perhaps using some slick SQL process. Nothing exists out of the box; it
                      would require developing some code to accomplish it. I suppose you could
                      start adding these to new processes, and then address the old jobs as time
                      permits, but we all know that 'time permits' rarely ever occurs.

                      If you require further explanation on this, you can contact me
                      offline... or even online here.

                      Regards,
                      Pat Hicok
                      Spartan Stores



                                                                                                
                                   "Larry Sabados                                               
                                   \(IT\)"                                                      
                                   <lsabados@borders                                          To
                                   groupinc.com>             <Control-X@yahoogroups.com>        
                                                                                              cc
                                   09/02/2004 08:30                                             
                                   AM                                                    Subject
                                                             RE: [SPAM Mail - High Probability] 
                                                             Re: [Control-X] Batch and serv er  
                                   Please respond to         outages                            
                                   Control-X@yahoogr                                            
                                       oups.com                                                 
                                                                                                
                                                                                                
                                                                                                
                                                                                                





                      IN conditions... Good suggestion! I guess I never thought of that (using
                      the Agent-specific IN conditions). Although, I'm not crazy about having
                      Ops monkey around with Prereq Conditions. If I could get them to use the
                      filters more effectively, then I'd be more comfortable with them going
                      into the conditions and deleting the correct one.

                      -Larry
                      -----Original Message-----
                      From: Stein Arne Storslett [mailto:sastorsl@...]
                      Sent: Thursday, September 02, 2004 8:02 AM
                      To: 'Control-X@yahoogroups.com'
                      Subject: Re: [SPAM Mail - High Probability] Re: [Control-X] Batch and server outages


                      This is exactly what I'm concerned about.
                      The in-condition approach sounds like a good plan.

                      Has anybody gotten a "best practice" answer from BMC
                      regarding this issue?

                      Stein Arne

                      * Robert A Neal <Robert.a.neal.chuh@...> [040902 13:52]:
                      >
                      > We don't use resources at all, as we have found them to create huge
                      > overhead in the EM. We have 2 specific IN conditions (one for the
                      > server and one for the application) that every job that runs on that
                      > server needs in order to be submitted. When we have an outage we
                      > simply delete the condition from the CRF and everything stops. At one
                      > time we had the IN conditions defined as resources, but our ECS
                      > performance suffered a considerable slow-down with this approach as
                      > the ECS struggled to keep everything up to date. At the time we were
                      > on ECS 6.0. We are in the process of upgrading to 6.1.03, so I don't
                      > know if this will still be an issue. We were running approx 1700 jobs
                      > a day across 27 servers.
                      >
                      > Thanks,
                      > Robert A Neal
                      > Enterprise Scheduling Support
                      > Technical Analyst
                      > Corporate South P-2
                      > Email: Robert.A.Neal.CHUH@...
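
                      A hedged sketch of how deleting and restoring such a per-server IN
                      condition could be scripted on the Control-M/Server: the ctmcontb
                      prerequisite-condition utility, its flags, the SERVER01-UP condition name,
                      and the <odate> placeholder are all assumptions to check against your
                      version; none of them come from the quoted post.

                          # Take the server "down" for scheduling: delete its per-server IN condition
                          ctmcontb -DELETE SERVER01-UP <odate>

                          # Maintenance finished: add the condition back so jobs can submit again
                          ctmcontb -ADD SERVER01-UP <odate>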
                      >
                      > -----Original Message-----
                      > From: sharon.young@... [mailto:sharon.young@...]
                      > Sent: Wednesday, September 01, 2004 5:48 PM
                      > To: Control-X@yahoogroups.com
                      > Subject: RE: [SPAM Mail - High Probability] Re: [Control-X] Batch and server outages
                      >
                      > We also use resources for each server as Larry describes below. It's
                      > the easiest way to control the running of jobs on each server for
                      > scheduled down times. You can even automate this process by scheduling
                      > batch jobs (using the ecsqrtab utility) for the specified down times to
                      > automatically alter the resource values for the time the device is
                      > down. We sometimes include a CONFIRM as well, just in case downtime
                      > goes overtime (in which case the Operators reply once the server team
                      > have finished). Just a thought.
                      > Shaz.....
                      >
                      > Please respond to Control-X@yahoogroups.com
                      > To: Control-X@yahoogroups.com
                      > Subject: RE: [SPAM Mail - High Probability] Re: [Control-X] Batch and server outages
                      >
                      > A shared Control resource would probably be a better option for all
                      > jobs using a particular agent. The agent is simply there, or it's
                      > not. To turn the scheduling off, simply add the Control resource
                      > manually and no other jobs will be scheduled to it.

