
Re: [ARRL-LOTW] Some LotW Statistics ...

  • Philip Leonard
    Message 1 of 14 , Jan 4, 2013
      That would be some fun "toys" to play with.

      Philip

      On 1/3/2013 9:13 PM, Robert Chudek - K0RC wrote:
      I was under the 'assumption' that the original SAN storage was going to be replaced with SSD technology. Later it was 'leaked' that, in addition to this change, the server hosting the LoTW application was also going to be replaced. The HP ProLiant Gen8 name was inserted as the replacement platform.

      If you research this product, you will find it to be 'bleeding edge' technology, innovative, and extremely powerful. HP claims this new product line will shake up the industry, sending other enterprise hardware suppliers back to the drawing boards. If you are interested in learning more about this new technology, here's the portal to enter: http://h17007.www1.hp.com/us/en/whatsnew/proliantgen8/index.aspx

      In one of the subsets of the online videos and documentation, they state: 

      "By converging server and storage together and optimizing for solid state performance, ProLiant Gen8 servers get rid of the bottlenecks and accelerate your most demanding apps.
      • Solid state optimized: 100x faster storage performance.
      • Smart data protection: Confidently protect your data and help ensure uptime.
      • Smart data services: Enable real-time application tuning and acceleration."

      There isn't a reference to what the 100-times-faster figure is measured against (NAS, SAN, DAS, or floppy disks), but if the current hardware supporting LoTW is indeed 10 years old, their performance figure might be close to the mark (or at least we can hope).

      Another reflector participant backed away from discussions after revealing his professional connection with this technology. Whether that is as an HP employee or at a systems-integrator level, I would encourage his return to the discussion. Unless there is an NDA (non-disclosure agreement) in place between the ARRL and the vendor, I don't see a conflict of interest. The purchasing decision has (obviously) been made, the order placed, and according to Harold, WJ1B, the hardware is 'in house' being set up and tested prior to deployment.

      73 de Bob - KØRC in MN



    • Joe Subich, W4TV
      Message 2 of 14 , Jan 4, 2013
        > So the smoking gun is the server was already at capacity which is what
        > the ARRL stated.

        The smoking gun is *why* the server was at capacity and that was due to
        the redundant processing. The worst part of it is that at least some
        people at ARRL knew about the redundant processing but nothing was done
        to reduce it. Then the problem was made worse by the poorly worded
        suggestion that users re-upload logs that had not been processed.

        If steps had been taken a year ago to reduce the redundant processing,
        capacity improvements could have been made on a more orderly basis and
        not in a crisis environment.
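
        To put rough numbers on the effect, here is a minimal sketch in Python; the capacity and upload figures are the approximate per-day values quoted elsewhere in this thread, and the 70% duplicate share is the estimate discussed there, so treat the output as illustrative rather than measured:

        # Minimal sketch: a fixed-capacity queue fed faster than it can drain.
        # The figures approximate the ~750k processed/day, ~820k uploaded/day
        # and ~70% redundant share quoted elsewhere in this thread.
        CAPACITY_PER_DAY = 750_000
        UPLOADS_PER_DAY = 820_000
        DUPLICATE_SHARE = 0.70

        backlog = 0
        for day in range(1, 11):
            backlog += UPLOADS_PER_DAY - CAPACITY_PER_DAY   # net growth once saturated
            print(f"day {day:2d}: backlog {backlog:,} QSOs, "
                  f"~{backlog / CAPACITY_PER_DAY:.1f} days behind")

        # With the duplicates removed at the source, the input drops to about
        # 820,000 * (1 - 0.70) = 246,000 records/day -- far below capacity,
        # and the backlog never forms in the first place.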

        > In addition, because gas stations didn't have power from the electric
        > company because the electric company didn't have the capacity to
        > bring them online quickly,

        Silly state of New Jersey ... Florida has rules that require gas
        stations to have generators just for hurricane emergencies. Some
        other states in heavy snow/ice areas do as well. I get tired of
        New Jersey and New York complaining about the hurricane issues -
        we just deal with them here in Florida and certainly don't get
        the taxpayer handouts being demanded by NY/NJ.

        73,

        ... Joe, W4TV


        On 1/4/2013 9:48 AM, k2dsl wrote:
        > Here is a perfectly plausible scenario/timeline and real life correlation to what we are experiencing which is not the users fault.
        >
        > - The server was right at capacity in October. Shame on the ARRL for letting that happen.
        > - All the contests logs come in and there is a backlog due to server throughput as has occurred in the past. Should we have told the contest sponsors to cancel their contests because it would cause undue stress on the already overloaded LoTW platform?
        > - Because of an LoTW bug that existed for a long time, some operators' uploaded files are (again) lost.
        > - Delays and lost files cause operators to re-upload their unprocessed/lost logs, since the logs aren't processed in a timely fashion and operators don't stare at the LoTW pages.
        > - ARRL fixes their lost upload files bug and suggests folks reupload their logs if not processed by a certain date.
        > - Regular usage of LoTW still continues.
        > - LoTW users continue to re-upload their still unprocessed logs because they just don't see them processed in the amount of time they have come to expect over the years, and still aren't staring at the LoTW home page.
        >
        > So the smoking gun is the server was already at capacity, which is what the ARRL stated.
        >
        > I can relate this well to Hurricane Sandy. I lost power for exactly a week and the power company didn't have the capacity to quickly get me back on line and service their other customers (ARRL & LoTW Server). I have a generator, but now many people do, so many folks were looking for gas beyond normal usage (contest logs). In addition, because gas stations didn't have power from the electric company, because the electric company didn't have the capacity to bring them online quickly, there were long lines for gas (LoTW input queue backed up and not getting shorter). Correlating this to your statements, it's the fault of the people that have generators that the gas lines were long, not the capacity of the power company to bring service back to an acceptable level. The cause of the long delays is not the fault of those with generators or the gas stations that didn't have power, but the capacity of the electric company to service its customers.
        >
        > I feel the above mirrors the current situation we are experiencing with LoTW. Sure you can obsess and blame the gas stations (logging programs) and say they should have their own power so they could all be open without electricity and protect the customers from larger outages. Sure you can obsess and blame the businesses and homeowners with gasoline powered generators that they are the cause and should switch to propane or natural gas to reduce demand since it isn't fair to those that don't have generators that want to put gas in their cars (the tQSL users). But the situation is solely because the power company (ARRL) reached their capacity to service/bring back online the needs of its customers (ham operators). That's the smoking gun. Everything else is just downstream fallout from that. And when the power came back on and capacity by the utility to service its customers was restored, no one cared who had a generator and who didn't. I could run my generator that requires gas 24x7
        and no one would now care. It wouldn't delay my neighbor from filling his car up or running his generator 24x7.
        >
        > You can't blame the users of the utility. That a LoTW user isn't necessarily a paying customer isn't a valid excuse (I don't pay for anything Google and there is plenty of capacity for me). That I use more capacity and leave my lights on 24x7 isn't an excuse - the capacity is expected to be there. The utility needs to provide the capacity its users' demand requires, or it is their problem and the reason the system doesn't work. If they can't afford to meet that basic need, they will be out of business or customers will go elsewhere if there's a choice.
        >
        > David - K2DSL
        >
        > --- In ARRL-LOTW@yahoogroups.com, "Joe Subich, W4TV" wrote:
        >>
        >>
        >> > What's the smoking gun and cause of the problem?
        >>
        >> The "smoking gun" is that total uploads with 70% duplicates exceeded
        >> the fixed maximum processing capacity of the server. As I said, "This
        >> is a classic resource limited system that has reached its saturation
        >> point early due to continued reprocessing the same data." If one
        >> backs out the growth rate to early October, it is clear that the
        >> system was running just below its maximum capability. With the
        >> "rush" from CQWW SSB it went into saturation and that was made much
        >> worse by folks like you telling everyone to upload their entire log
        >> without regard for duplicates.
        >>
        >> Fortunately the new server/storage systems will relieve the pressure
        >> for a time but depending on the server capacity and the ability of
        >> the AA6YQ/K1MU project to remove duplicates on the front end, the
        >> problem *will return* unless a substantial portion of the redundant
        >> processing is removed from the system. As K0RC showed, those who
        >> upload an entire log in order to add incremental contacts generate
        >> redundant input at an exponential rate. Unless something is done
        >> to prevent the wasting of system resources, a very few users have the
        >> potential to sink massive amounts of server resources for unproductive
        >> processing as the exponential growth in previously uploaded QSO
        >> records swamps the ability to add more cores and faster storage.
        >>
        >> 73,
        >>
        >> ... Joe, W4TV
        >>
        >>
        >> On 1/3/2013 10:34 PM, k2dsl wrote:
        >>> What's the smoking gun and cause of the problem? That your current analytics put the capacity of the system at 750k records per day? What was the processing capacity in Oct or Sept when there wasn't a multi-day backlog? Is the processing capacity going down over the course of weeks/months?
        >>>
        >>> What will the processing capacity be with the new disks (and server if that is also replaced)?
        >>>
        >>> Not sure what you've identified as the cause beyond what the ARRL has told us which is disk I/O. If the new disk eliminates the issue I guess the smoking gun was the disks.
        >>>
        >>> David - K2DSL
      • David Levine
        Message 3 of 14 , Jan 4, 2013


          > So the smoking gun is the server was already at capacity which is what
          > the ARRL stated.

          The smoking gun is *why* the server was at capacity and that was due to
          the redundant processing. The worst part of it is that at least some
          people at ARRL knew about the redundant processing but nothing was done
          to reduce it. Then the problem was made worse by the poorly worded
          suggestion that users re-upload logs that had not been processed.

          !!! The server was at capacity because it didn't grow to support the increased volume. It's not the user's fault as everything the user did was perfectly valid. The system's are responsible for protecting themselves and for monitoring/alerting and the system owners are responsible for capacity. 
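
          The kind of monitoring described here can be almost trivial. A minimal sketch follows; the capacity figure, the 80% warning threshold, and the sample daily totals are all assumptions for illustration, not LoTW data:

          # Minimal sketch of a capacity alert.  The capacity figure, the 80%
          # threshold, and the sample daily totals are assumptions only.
          PROCESSING_CAPACITY = 750_000   # QSO records/day the hardware can handle
          WARN_AT = 0.80                  # warn when sustained load passes 80% of capacity

          def check_capacity(daily_upload_counts):
              """daily_upload_counts: recent per-day upload totals, oldest first."""
              recent = daily_upload_counts[-7:]
              utilization = (sum(recent) / len(recent)) / PROCESSING_CAPACITY
              if utilization >= 1.0:
                  return f"CRITICAL: {utilization:.0%} of capacity -- queue is growing"
              if utilization >= WARN_AT:
                  return f"WARNING: {utilization:.0%} of capacity -- plan the upgrade now"
              return f"OK: {utilization:.0%} of capacity"

          print(check_capacity([610_000, 640_000, 700_000, 720_000, 760_000, 800_000, 820_000]))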
           

          > In addition, because gas stations didn't have power from the electric
          > company because the electric company didn't have the capacity to
          > bring them online quickly,

          Silly state of New Jersey ... Florida has rules that require gas
          stations to have generators just for hurricane emergencies. Some
          other states in heavy snow/ice areas do as well. I get tired of
          New Jersey and New York complaining about the hurricane issues -
          we just deal with them here in Florida and certainly don't get
          the taxpayer handouts being demanded by NY/NJ.

          !!! Short memory, Joe? In 2005, when Hurricane Wilma devastated Florida, the gas stations in Florida were without power just like in NJ. AFTER that hurricane and an earlier one, Florida enacted a law in 2006 that came down to the following:

          "The Florida Legislature eventually passed a law ( http://www.cga.ct.gov/2011/rpt/2011-R-0389.htm ) directing that certain gas stations along evacuation routes be equipped to switch to generator-based power, but not that they actually own generators. The law also requires owners of eight or more fuel pumps in any one county to have access to at least one generator."

          If a station is not along a hurricane evacuation route, it has no need to support (not necessarily have on hand) external power. Not that it matters to my analogy, but I'm just providing the facts versus the world as seen through W4TV's eyes.


          73,

          ... Joe, W4TV



        • Joe Subich, W4TV
          Message 4 of 14 , Jan 4, 2013
            Your interpretation - at least in Florida the gas stations don't close
            when power is out due to hurricanes. New Jersey should have learned
            from Florida's experience just like ARRL should have learned from the
            2011 delays and started to trim the redundant processing a year ago.

            Instead, what should have been a manageable issue became a crisis.
            The ignorance and indolence of a small number of users resulted in a
            disaster for 50,000+ users. Of course, that's the political response
            - make everyone pay for the indolence of the few.

            73,

            ... Joe, W4TV


            On 1/4/2013 11:00 AM, David Levine wrote:
            >>
            >>> So the smoking gun is the server was already at capacity which is what
            >>> the ARRL stated.
            >>
            >> The smoking gun is *why* the server was at capacity and that was due to
            >> the redundant processing. The worst part of it is that at least some
            >> people at ARRL knew about the redundant processing but nothing was done
            >> to reduce it. Then the problem was made worse by the poorly worded
            >> suggestion that users re-upload logs that had not been processed.
            >>
            >
            > !!! The server was at capacity because it didn't grow to support the
            > increased volume. It's not the user's fault as everything the user did was
            > perfectly valid. The system's are responsible for protecting themselves and
            > for monitoring/alerting and the system owners are responsible for capacity.
            >
            >
            >>
            >>> In addition, because gas stations didn't have power from the electric
            >>> company because the electric company didn't have the capacity to
            >>> bring them online quickly,
            >>
            >> Silly state of New Jersey ... Florida has rules that require gas
            >> stations to have generators just for hurricane emergencies. Some
            >> other states in heavy snow/ice areas do as well. I get tired of
            >> New Jersey and New York complaining about the hurricane issues -
            >> we just deal with them here in Florida and certainly don't get
            >> the taxpayer handouts being demanded by NY/NJ.
            >>
            >> !!! Short memory Joe? In 2005 when Hurricane Wilma devastated Florida, the
            > gas stations in Florida were without power just like NJ. AFTER that
            > hurricane and another prior, in 2006 Florida enacted a law and came up with
            > the following:
            >
            > "The Florida Legislature eventually passed a
            > law<http://www.cga.ct.gov/2011/rpt/2011-R-0389.htm> (
            > http://www.cga.ct.gov/2011/rpt/2011-R-0389.htm ) directing that certain gas
            > stations along evacuation routes be equipped to switch to generator-based
            > power, but not that they actually own generators. The law also requires
            > owners of eight or more fuel pumps in any one county to have access to at
            > least one generator."
            >
            > Not along a hurricane evacuation route, then no need to support (not
            > necessarily have on hand) external power. Not that it matters to my analogy
            > but just providing the facts vs the eyes of the world thru W4TV.
            >
            >
            > 73,
            >>
            >> ... Joe, W4TV
          • David Levine
            Message 5 of 14 , Jan 4, 2013
              On Fri, Jan 4, 2013 at 11:13 AM, Joe Subich, W4TV <lists@...> wrote:
               


              Your interpretation - at least in Florida the gas stations don't close
              when power is out due to hurricanes. New Jersey should have learned
              from Florida's experience just like ARRL should have learned from the
              2011 delays and started to trim the redundant processing a year ago.


              !!! A historic event which hadn't ever occurred before. It would be like telling you that for a "100 year snowstorm" that hits FL, they should have been prepared with thousands of plows, tons of salt, etc. An unrealistic expectation. Maybe they should all have helipads installed on every service center roof in case the highway infrastructure is somehow unavailable to receive fuel shipments? I could go on (like you) but I won't.

              Suffice it to say that the LoTW problem has, at this point, been identified by the team responsible as the platform infrastructure. That doesn't mean other areas can't be optimized, but that wouldn't solve the underlying issue. You can post all your "current state" metrics, but since it was a problem BEFORE today and you don't have any of those metrics, today's numbers, though actual, don't identify or solve the original problem which the ARRL is addressing.

              David - K2DSL


              Instead, what should have been a manageable issue became a crisis.
              The ignorance and indolence of a small number of users resulted in a
              disaster for 50,000+ users. Of course, that's the political response
              - make everyone pay for the indolence of the few.

              73,

              ... Joe, W4TV





            • Peter Laws
              Message 6 of 14 , Jan 4, 2013
                On Thu, Jan 3, 2013 at 9:13 PM, Robert Chudek - K0RC <k0rc@...> wrote:
                >
                >
                >
                > I was under the 'assumption' that the original SAN storage was going to be replaced with SSD technology. Later it was 'leaked' that in addition to this change, the server hosting the LoTW application was also going to be replaced as well. The HP ProLiant Gen8 name was inserted as the replacement

                Standard warning: don't believe everything a vendor says.

                That said, HP makes a solid product. The Proliant servers are of
                Compaq heritage and they are done right. When I worked at a major
                financial services company, our group did a bake-off of all the
                players in the Intel server space (IBM, HP, Sun, Dell, I forget who
                else at the time) and HP was the clear winner. After that, the HPs
                (running RHAS 2.x at that time) came in as fast as we could take the
                Solaris/UltraSPARC stuff out (which didn't make me happy being a Sun
                guy!). While I'm skeptical of that 100X claim, HP Proliants are solid
                boxes.

                If it's true that "p1k" is a general purpose server at HQ that happens
                to have LOTW on it (nmapping it makes me think that's true!), then
                the act of putting LOTW on its own system alone will pay huge
                dividends regardless of any marketing hooey.

                --
                Peter Laws | N5UWY | plaws plaws net | Travel by Train!
              • KD8NNU
                Message 7 of 14 , Jan 4, 2013
                  I totally disagree with Joe on the smoking gun.
                   
                  The smoking gun is no preprocessing of the uploaded logs.   This is the problem so STOP BLAMING THE USERS!!!!!!!!
                   
                  This ongoing discussion on how all the users are just STUPID is totally ludicrous.
                   
                  Don
                  KD8NNU
                  -.- -.. ---.. -. -. ..-
                   
                  Sent: Friday, January 04, 2013 12:36 AM
                  Subject: Re: [ARRL-LOTW] Re: Some LotW Statistics ...
                   
                   


                  > What's the smoking gun and cause of the problem?

                  The "smoking gun" is that total uploads with 70% duplicates exceeded
                  the fixed maximum processing capacity of the server. As I said, "This
                  is a classic resource limited system that has reached its saturation
                  point early due to continued reprocessing the same data." If one
                  backs out the growth rate to early October, it is clear that the
                  system was running just below its maximum capability. With the
                  "rush" from CQWW SSB it went into saturation and that was made much
                  worse by folks like you telling everyone to upload their entire log
                  without regard for duplicates.

                  Fortunately the new server/storage systems will relieve the pressure
                  for a time but depending on the server capacity and the ability of
                  the AA6YQ/K1MU project to remove duplicates on the front end, the
                  problem *will return* unless a substantial portion of the redundant
                  processing is removed from the system. As K0RC showed, those who
                  upload an entire log in order to add incremental contacts generate
                  redundant input at an exponential rate. Unless something is done
                  to prevent the wasting of system resources, a very few users have the
                  potential to sink massive amounts of server resources for unproductive
                  processing as the exponential growth in previously uploaded QSO
                  records swamps the ability to add more cores and faster storage.

                  73,

                  ... Joe, W4TV

                  On 1/3/2013 10:34 PM, k2dsl wrote:
                  > What's the smoking gun and cause of the problem? That your current
                  > analytics put the capacity of the system at 750k records per day?
                  > What was the processing capacity in Oct or Sept when there wasn't a
                  > multi-day backlog? Is the processing capacity going down over the
                  > course of weeks/months?
                  >
                  > What will the processing capacity be with the new disks (and server
                  > if that is also replaced)?
                  >
                  > Not sure what you've identified as the cause beyond what the ARRL has
                  > told us which is disk I/O. If the new disk eliminates the issue I
                  > guess the smoking gun was the disks.
                  >
                  > David - K2DSL
                  >
                  > --- In ARRL-LOTW@yahoogroups.com, "Joe Subich, W4TV" wrote:
                  >>
                  >>
                  >> Based on historical mileposts, data collected from the LotW homepage
                  >> and the LotW queue status report, here are some interesting statistics:
                  >>
                  >> Average logs (queue) per day: 2,532 (processed or dupes)
                  >> Average user files (home page) per day: 2,309
                  >> Difference (identical logs): 221 (8.4%)
                  >>
                  >> Average QSOs (queue) per day: 820,422
                  >> QSOs processed per day: 751,506
                  >> New QSO records per day: 177,912
                  >> Duplicate or updated QSOs: 573,594 (Processed - New)
                  >> Percentage updated or duplicate: 76.3%
                  >>
                  >> Average daily increase in backlog: 42,216 QSOs (5.1%)
                  >> Average daily number of QSOs uploaded: 862,638 (removed + increase)
                  >>
                  >> Compounded annual growth (QSO records): 23.65%
                  >> Time to double: ~3.3 years
                  >>
                  >> Looking at the historical trends, it appears that duplicates/updates
                  >> have remained a fairly consistent part of the input stream - perhaps
                  >> with a modest increase after some mistakenly interpreted the call to
                  >> upload missing QSO data to mean upload all old logs. The long term
                  >> growth rate and number of QSO records in the system have tracked
                  >> quite closely back to the 200 million QSO level in January 2009. Even
                  >> the short term processing delays in 2011 can be attributed to
                  >> seasonal volume peaks when the system was already operating at more
                  >> than 80% of saturation.
                  >>
                  >> The onset of the backlog correlates *very closely* to the time at which
                  >> long-term growth pushed the average number of QSOs uploaded per day
                  >> above the roughly 3/4 million QSO per day processing capability of
                  >> the system. That point would have occurred suddenly - even without the
                  >> unnecessary and ill-advised full-log uploads - due to the natural
                  >> up-tick in volume associated with CQWW SSB, Sweepstakes, CQWW CW and
                  >> end of the year DXpedition uploads. The full log uploads only made
                  >> what would have been a bad situation into a disaster. The full-log
                  >> uploads (representing what appears to be an extra 15% in volume) have
                  >> turned what would have been an extended three to four day backlog into
                  >> one with peak delays of over 12 days (so far). With the queue again
                  >> growing at 7% or more per day, processing delays will soon approach
                  >> the 12 day mark absent changes.
                  >>
                  >> Make of it what you will ... I consider this to be the "smoking gun"
                  >> - removing all doubt as to the cause of the current problems. This
                  >> is a classic resource limited system that has reached its saturation
                  >> point early due to continued reprocessing the same data. "Draconian"
                  >> or not, removing a large portion of the redundant processing would
                  >> go a long way to restoring stable operation - and will prevent early
                  >> exhaustion of the additional capacity due to be brought on line soon.
                  >>
                  >> 73,
                  >>
                  >> ... Joe, W4TV
                  >>
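
                  As a quick check, the figures quoted above are internally consistent. A minimal sketch follows, using only the numbers from that post; nothing here is independently measured:

                  import math

                  # Sanity check of the figures in the quoted statistics.
                  removed_per_day   = 820_422   # QSOs leaving the queue per day (processed or dupes)
                  processed_per_day = 751_506   # QSOs actually processed per day
                  new_per_day       = 177_912   # genuinely new QSO records per day
                  uploaded_per_day  = 862_638   # QSOs arriving per day (removed + backlog growth)
                  annual_growth     = 0.2365    # compounded annual growth of QSO records

                  dupes = processed_per_day - new_per_day
                  print(f"duplicate or updated QSOs/day: {dupes:,}")                          # 573,594
                  print(f"redundant share of processing: {dupes / processed_per_day:.1%}")    # 76.3%
                  print(f"daily backlog growth: {uploaded_per_day - removed_per_day:,} QSOs") # 42,216
                  print(f"time to double: {math.log(2) / math.log(1 + annual_growth):.1f} years")  # ~3.3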

                • Peter Laws
                  Message 8 of 14 , Jan 4, 2013
                    On Fri, Jan 4, 2013 at 5:45 PM, KD8NNU <goldtr8@...> wrote:
                    >
                    >
                    >
                    > I totally disagree with Joe on the smoking gun.

                    Without actual data on system performance over time, there is no
                    smoking gun no matter how forcefully stated. You're right, though,
                    that this isn't a user issue.


                    --
                    Peter Laws | N5UWY | plaws plaws net | Travel by Train!
                  • Joe Subich, W4TV
                    Message 9 of 14 , Jan 4, 2013
                      But we have long term data ... we know that LotW reached 200 million
                      QSO Records in January 2009. It's not hard to determine the growth
                      rate and extremely easy to see how much of the input is identical
                      files and reprocessed material.
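
                      A minimal sketch of how that reprocessed material piles up; the log sizes below are hypothetical, and the "whole log after every new contact" habit is the worst case discussed earlier in the thread:

                      # Minimal sketch: records submitted while adding new contacts, comparing
                      # incremental uploads with re-uploading the whole log each time.
                      # The 10,000-QSO log and 100 new contacts are hypothetical numbers.
                      def records_submitted(existing_qsos, new_qsos, full_log_each_time):
                          if not full_log_each_time:
                              return new_qsos                  # each new QSO is sent exactly once
                          total, log_size = 0, existing_qsos
                          for _ in range(new_qsos):
                              log_size += 1                    # one more contact in the log
                              total += log_size                # ... and the whole log goes up again
                          return total

                      print(records_submitted(10_000, 100, full_log_each_time=False))  # 100
                      print(records_submitted(10_000, 100, full_log_each_time=True))   # 1,005,050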

                      Yes, LotW has a design issue that allows reprocessing to consume so
                      much of its bandwidth. That needs to be resolved - new capacity or
                      not. Part of the issue is getting software authors to make it hard
                      to generate the redundant files and I intend to start engaging the
                      more prominent developers on that matter.
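
                      What that might look like on the client side, as a minimal sketch only; this is not a description of how TQSL or any particular logging program actually works:

                      # Minimal sketch of client-side duplicate suppression in a logging program.
                      # Illustrative only -- the field names and keying scheme are assumptions.
                      def qso_key(qso):
                          """A stable identity for a QSO: call worked, band, mode, start time."""
                          return (qso["call"], qso["band"], qso["mode"], qso["time_on"])

                      def select_for_upload(log, already_uploaded):
                          """Return only QSOs never uploaded before, and remember them."""
                          fresh = [q for q in log if qso_key(q) not in already_uploaded]
                          already_uploaded.update(qso_key(q) for q in fresh)
                          return fresh

                      uploaded = set()
                      log = [{"call": "K0RC", "band": "20M", "mode": "SSB", "time_on": "2012-10-27 1801"}]
                      print(len(select_for_upload(log, uploaded)))   # 1 -- first run, the QSO is sent
                      print(len(select_for_upload(log, uploaded)))   # 0 -- running it again sends nothing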

                      73,

                      ... Joe, W4TV


                      On 1/4/2013 6:48 PM, Peter Laws wrote:
                      > On Fri, Jan 4, 2013 at 5:45 PM, KD8NNU <goldtr8@...> wrote:
                      >>
                      >>
                      >>
                      >> I totally disagree with Joe on the smoking gun.
                      >
                      > Without actual data on system performance over time, there is no
                      > smoking gun no matter how forcefully stated. You're right, though,
                      > that this isn't a user issue.
                      >
                      >
                      > --
                      > Peter Laws | N5UWY | plaws plaws net | Travel by Train!
                      >