
Re: [ARRL-LOTW] Duplicates

  • KA9JAC
    Message 1 of 147, Feb 20, 2013
      Joe,
      Give it a rest
      people are tired of it....

      Joe Subich, W4TV wrote:
       


      > The numbers are all over the place and what one would call
      > statistically meaningless. Averaging statistically meaningless data
      > over a long period of time still results in statistically meaningless
      > data. It's only when there is a large log that you even have a
      > backlog and anything being caught to report on.

      That is where you are wrong and show a lack of understanding of
      statistics. As long as the sample is sufficiently large - and a sample
      representing more than 10% of any population is statistically *very*
      large - the variation/randomness in the sample is of no real concern
      unless you know that the "population" (the entire collection being
      sampled) has some characteristic that makes it impossible to get a
      representative sample.

      When you are dealing with something as simple as number of QSOs in
      the average log, there is nothing that prevents a representative
      sample, particularly when doing regular, periodic samples.

      >
      > Looking just at the last 5 hourly reports when I was replying to
      > this, I see the following:
      >
      > QSOs / Logs = Avg QSOs based on a single snapshot
      > =====================
      > 39 / 5 = 8
      > 38247 / 6 = 6375
      > 20499 / 3 = 6833
      > 1975 / 4 = 494
      > 18422 / 134 = 137

      The only thing this shows is that with only five samples you don't have
      a large enough sample to be significant. If you take hourly samples
      over a longer period - a week perhaps - that variation is substantially
      less and the sample becomes a much more accurate reflection of the
      population as a whole. This is the essence of statistical sampling.
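The two ways of turning hourly Queue Status reports into an Average QSOs/Log figure that come up in this thread (averaging each report's QSOs/logs ratio, or dividing total QSOs by total logs) can be compared on the five snapshots quoted above. This is a minimal Python sketch, not W4TV's actual spreadsheet; the function names are hypothetical, and reports with zero logs are simply skipped in the per-report average.

```python
# Hourly LotW Queue Status snapshots quoted above, as (QSOs, logs) pairs.
SNAPSHOTS = [(39, 5), (38247, 6), (20499, 3), (1975, 4), (18422, 134)]


def mean_of_ratios(snapshots):
    """Average of the per-report QSOs/log figures; every report counts equally."""
    ratios = [qsos / logs for qsos, logs in snapshots if logs > 0]  # skip empty reports
    if not ratios:
        raise ValueError("no report contains any logs")
    return sum(ratios) / len(ratios)


def ratio_of_sums(snapshots):
    """Total QSOs divided by total logs; every log counts equally."""
    total_qsos = sum(qsos for qsos, _ in snapshots)
    total_logs = sum(logs for _, logs in snapshots)
    if total_logs == 0:
        raise ValueError("no report contains any logs")
    return total_qsos / total_logs


if __name__ == "__main__":
    print(f"mean of per-report ratios: {mean_of_ratios(SNAPSHOTS):.1f}")  # 2769.3
    print(f"ratio of sums:             {ratio_of_sums(SNAPSHOTS):.1f}")   # 520.9
```

On these five reports the per-report average is dominated by the two reports holding a handful of very large logs, while the ratio of sums is pulled toward the one report with 134 logs. Five reports are, as both posters agree, too few to draw conclusions from.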

      > Let's look at one of the previous queue snapshots where there are
      > 38247 Qs in the queue for 6 logs. The log which is probably
      > representing 38200 of the 38247 was uploaded only 39 secs before the
      > hour snapshot was taken. There could have been 500 logs processed in
      > the previous 59 mins 21 secs with an avg of 10 QSOs per log but your
      > logic is going to calculate the average log being 6375 for that hour?

      That is the essence of sampling ... if indeed there was one log with
      38,200 of the 38,247 QSOs uploaded one minute before the status report
      it doesn't matter. There are dozens of reports with 1 log/2 QSOs, or
      3 logs/4 QSOs, even 0 logs/0 QSOs .... those could be cases where one
      log with 50,000 QSOs was uploaded one minute *after* the report and
      was processed completely before the next report. The key to sampling
      is that for every "extreme" value on one end of the scale that is
      missed, similar "extreme" values on the other end of the scale are also
      missed. The point is that with a sufficiently large sample - typically
      1 to 2% - the accuracy of the estimates of the characteristics of the
      population as a whole based on the characteristics of the sample is
      very high (or high enough to draw valid conclusions).
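To put numbers on David's hypothetical hour, a few lines of Python compare what that one snapshot would report with the average over every log handled in the hour. The figures are his illustrative ones, not LotW data, and this arithmetic says nothing by itself about whether such gaps cancel out across 165 snapshots a week, which is W4TV's reply.

```python
# David's hypothetical hour, using his illustrative figures (not measured data):
# 500 small logs of 10 QSOs each were processed and left the queue before the
# snapshot, which then caught 6 logs holding 38,247 QSOs between them.
EARLIER_LOGS = 500
EARLIER_QSOS_PER_LOG = 10
SNAPSHOT_LOGS = 6
SNAPSHOT_QSOS = 38247

# What a single hourly snapshot would report as QSOs/log.
snapshot_avg = SNAPSHOT_QSOS / SNAPSHOT_LOGS

# What the average would be over every log handled in that hour.
hour_qsos = EARLIER_LOGS * EARLIER_QSOS_PER_LOG + SNAPSHOT_QSOS
hour_logs = EARLIER_LOGS + SNAPSHOT_LOGS
hour_avg = hour_qsos / hour_logs

print(f"snapshot average:   {snapshot_avg:.1f} QSOs/log")  # 6374.5
print(f"whole-hour average: {hour_avg:.1f} QSOs/log")      # 85.5
```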

      In this case, totaling the number of logs in 165 hourly "snapshots"
      during one week results in a sample that contains more than 10% of
      the logs processed in that week. 10% is a *huge* sample, generally
      large enough to be within 1% of the composition of the entire
      "population" being sampled.
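Whether a sample holding 10% of the logs pins the average to within 1% depends on how spread out log sizes are, not only on the sampling fraction. A rough check is the standard error of the mean with the finite population correction, sketched below in Python. It assumes simple random sampling of logs, which the hourly-snapshot method does not strictly provide (larger logs stay in the queue longer and are more likely to be caught, which is David's objection), and it measures random error only, not that kind of bias. The function name is hypothetical.

```python
import math
import statistics


def relative_standard_error(sample, population_size):
    """Relative standard error of the sample mean, ASSUMING simple random
    sampling without replacement from a population of `population_size` items.

    The finite population correction sqrt((N - n) / (N - 1)) shrinks the error
    as the sample covers more of the population; it reaches 0 for a census.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sampled values")
    if population_size < n:
        raise ValueError("population cannot be smaller than the sample")
    mean = statistics.fmean(sample)
    if mean <= 0:
        raise ValueError("mean must be positive for a relative error")
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return (sd / math.sqrt(n)) * fpc / mean
```

Because the result is roughly the ratio of the standard deviation of log size to its mean, divided by the square root of the sample size, a sample of about 2,000 logs gives roughly 1% only when that ratio is below about 0.5; a wider spread of log sizes gives a correspondingly larger error.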

      > In my original post to this thread where you claimed % increases, the
      > only valid ones based on fact are % increase of new QSOs per day/week
      > and % increase of uploaded logs per day/week with the rest being
      > meaningless.

      That ignores sampling - the science of which has been settled for a
      long time. Yes, increases in the number of new QSOs per day/week and
      the number of logs processed per day/week are easy to "prove" because
      LotW provides "full population" data - a count of each. However, that
      does not invalidate estimates made based on statistical sampling when
      the sample size is sufficient to support the inferences drawn.
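These "full population" figures come from differencing LotW's running totals: a week's New QSOs (or Logs Processed) is the total at the end of that week minus the total at the end of the previous week, as described in the quoted messages. A minimal Python sketch; the function names and the readings are hypothetical.

```python
def weekly_increments(cumulative):
    """Per-week counts from end-of-week readings of a running total.

    `cumulative[i]` is the counter (e.g. total QSOs or total logs shown on the
    LotW status page) at the end of week i; the result has one fewer entry.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def percent_change(previous, current):
    """Week-over-week change in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return 100.0 * (current - previous) / previous


# Made-up end-of-week readings, for illustration only.
readings = [1000, 1300, 1750]
weekly = weekly_increments(readings)
print(weekly)                                # [300, 450]
print(percent_change(weekly[0], weekly[1]))  # 50.0
```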

      > I expect you to call me names, so fire away.

      I don't have to - you've shown your stripes in the first paragraph.

      73,

      ... Joe, W4TV

      On 2/20/2013 10:23 PM, David Levine wrote:
      > For anyone that isn't aware, W4TV, who calls others names when he doesn't
      > agree with them, is the owner of MicroHam. In case you want to avoid
      > dealing with an individual that might treat you this same way if you have
      > an issue with a purchase, you might want to avoid dealing with those
      > products.
      >
      > Now, you are saying your statistics are based on what is reported on
      > http://www.arrl.org/logbook-queue-status each hour to calculate average #
      > of QSOs per upload? That is completely flawed, and after the below, I'm done
      > trying to rationalize that 1 + 1 = 74 because it somehow helps your cause.
      >
      > Looking just at the last 5 hourly reports when I was replying to this I see
      > the following:
      >
      > QSOs / Logs = Avg QSOs based on a single snapshot
      > =====================
      > 39 / 5 = 8
      > 38247 / 6 = 6375
      > 20499 / 3 = 6833
      > 1975 / 4 = 494
      > 18422 / 134 = 137
      >
      > The numbers are all over the place and what one would call statistically
      > meaningless. Averaging statistically meaningless data over a long period of
      > time still results in statistically meaningless data. It's only when there
      > is a large log that you even have a backlog and anything being caught to
      > report on. Furthermore, the very low actual "time in the queue" suggests
      > that the actual average log size is probably MUCH smaller than even these
      > large #s would suggest.
      >
      > Let's look at one of the previous queue snapshots where there are 38247 Qs
      > in the queue for 6 logs. The log which is probably representing 38200 of
      > the 38247 was uploaded only 39 secs before the hour snapshot was taken.
      > There could have been 500 logs processed in the previous 59 mins 21 secs
      > with an avg of 10 QSOs per log but your logic is going to calculate the
      > average log being 6375 for that hour? In fact, the other 5 logs uploaded
      > after the large log were probably another group with about 10 Q's per
      > upload. Your logic, if I'm understanding what you are calculating, is
      > unacceptable and would invalidate any results you tried to derive from this info.
      >
      > In my original post to this thread where you claimed % increases, the only
      > valid ones based on fact are % increase of new QSOs per day/week and %
      > increase of uploaded logs per day/week with the rest being meaningless.
      >
      > I expect you to call me names, so fire away.
      >
      > K2DSL - David
      >
      >
      >
      > On Wed, Feb 20, 2013 at 8:15 PM, Joe Subich, W4TV lists@...> wrote:
      >
      >>
      >>> What are the column headings for the 5 data elements in the table?
      >>> How are you determining each?
      >>
      >> The column headings are:
      >> Week Ending
      >> New QSOs
      >> Logs Processed
      >> Average QSOs/Log
      >> % Reprocessed
      >>
      >>
      >>> +++ The only numbers I can see that are obvious to scrape from the
      >>> pages each hour are a snapshot of # QSOs in the system, with the
      >>> difference from the previous hour being the # of new QSOs over the
      >>> past hour. The same LoTW page shows the # of files uploaded, which,
      >>> like the above, can tell how many files were uploaded in the past
      >>> hour. Where do any other numbers you use in calculations come from?
      >>> The status page doesn't show any data that can provide info on the
      >>> past hour unless the queue is backed up over 1 hour.
      >>
      >> You already know how the "New QSOs" and "Logs Processed" are determined
      >> - you give the procedure yourself. They are the differences between the
      >> LotW Status numbers at the end of one week and at the end of the
      >> previous week.
      >>
      >> Average QSOs/Log is simply the average, over all LotW Queue Status
      >> reports (165 per week), of the number of QSOs in each report divided
      >> by the number of logs in the same report. Alternatively, the
      >> calculation can be made by dividing the sum of QSOs across all Status
      >> reports by the total number of logs in those reports. The latter
      >> calculation is better at preventing a small number of large logs from
      >> skewing the average.
      >>
      >> 165 samples a week generally result in a sample size of more than 2,000
      >> logs, which is far more than needed to provide high confidence when the
      >> total number of logs processed per week varies between 20,000 and
      >> 35,000, as is currently the case.
      >>
      >> "% Reprocessed" is simply [(number of logs * QSOs/log) - New QSOs]
      >> divided by (number of logs * QSOs/Log) or (total QSOs processed in
      >> the week - number of new QSOs) / total QSOs processed.
      >>
      >>
      >>> Should be a simple response.
      >>
      >> Yes, it is a simple process - one that is used routinely in
      >> manufacturing quality control.
      >>
      >> 73,
      >>
      >> ... Joe, W4TV
      >>
      >>
      >
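The "% Reprocessed" formula from Joe's quoted message can be written as a short Python function. It multiplies a full-population count (logs processed) by the sampled Average QSOs/Log, so any error in that average carries straight through into the result. This is a minimal sketch of the formula as stated; the function name and the example numbers are hypothetical, not LotW figures.

```python
def percent_reprocessed(logs, avg_qsos_per_log, new_qsos):
    """% Reprocessed as defined in the thread.

    logs             -- logs processed in the week (difference of LotW totals)
    avg_qsos_per_log -- sampled Average QSOs/Log from the hourly reports
    new_qsos         -- new QSOs in the week (difference of LotW totals)
    """
    total_processed = logs * avg_qsos_per_log  # estimated QSOs processed in the week
    if total_processed <= 0:
        raise ValueError("estimated total QSOs processed must be positive")
    return 100.0 * (total_processed - new_qsos) / total_processed


# Round, made-up numbers: 100 logs x 50 QSOs/log = 5,000 QSOs processed, 4,000 new.
print(percent_reprocessed(100, 50, 4000))  # 20.0
```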


    • David Cole
      Message 147 of 147, Aug 24, 2014
        I run ACLog... When you hit the "ALL SINCE" button, change the date to
        be something about a week prior to the LoTW failure...

        I "believe" ACLog got very confused as a result of the failure mode of
        LoTW. That corrected a very similar problem for me.
        --
        Thanks and 73's,
        For equipment, and software setups and reviews see:
        www.nk7z.net
        for MixW support see;
        http://groups.yahoo.com/neo/groups/mixw/info
        for Dopplergram information see:
        http://groups.yahoo.com/neo/groups/dopplergram/info
        for MM-SSTV see:
        http://groups.yahoo.com/neo/groups/MM-SSTV/info


        On Sun, 2014-08-24 at 09:05 -0700, reillyjf@... [ARRL-LOTW]
        wrote:
        >
        >
        > Thanks for the suggestion. I did a complete download, and beat the
        > number of duplicates down from 275 to 30. Not exactly sure why the
        > N3FJP ACL is missing this information.
        > - 73, John, N0TA
        >
        >
        >