
Re: [ARRL-LOTW] Duplicates

Message 1 of 147, Feb 20, 2013
Joe,
Give it a rest
people are tired of it....

Joe Subich, W4TV wrote:

> The numbers are all over the place and what one would call
> statistically meaningless. Averaging statistically meaningless data
> over a long period of time still results in statistically meaningless
> data. It's only when there is a large log that you even have a
> backlog and anything being caught to report on.

That is where you are wrong and show a lack of understanding of
statistics. As long as the sample is sufficiently large - and a sample
representing more than 10% of any population is statistically *very*
large - the variation/randomness in the sample is of no real concern
unless you know that the "population" (the entire collection being
sampled) has some characteristic that makes it impossible to get a
representative sample.

When you are dealing with something as simple as number of QSOs in
the average log, there is nothing that prevents a representative
sample, particularly when doing regular, periodic samples.

>
> Looking just at the last 5 hourly reports when I was replying to
> this, I see the following:
>
> QSOs / Logs = Avg QSOs based on a single snapshot
> =====================
> 39 / 5 = 8
> 38247 / 6 = 6375
> 20499 / 3 = 6833
> 1975 / 4 = 494
> 18422 / 134 = 137

The only thing this shows is that five snapshots do not constitute a
large enough sample to be significant. If you take hourly samples
over a longer period - a week perhaps - that variation is substantially
less and the sample becomes a much more accurate reflection of the
population as a whole. This is the essence of statistical sampling.
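A short simulation illustrates the point. The log-size distribution below is invented (mostly small logs plus a few huge ones, an assumption chosen to mimic the wild per-snapshot swings in the table above); it is not LotW data, but it shows why pooling 165 hourly reports stabilizes the estimate:

```python
import random
import statistics

random.seed(42)

def random_log_size():
    """Hypothetical log-size model: mostly small logs, a few huge ones."""
    if random.random() < 0.95:
        return random.randint(1, 50)
    return random.randint(5_000, 50_000)

def snapshot():
    """One hourly report: (QSOs in queue, logs in queue)."""
    n = random.randint(1, 20)
    return sum(random_log_size() for _ in range(n)), n

def pooled_average(n_snapshots):
    """Average QSOs/log pooled over n_snapshots hourly reports."""
    total_q = total_n = 0
    for _ in range(n_snapshots):
        q, n = snapshot()
        total_q += q
        total_n += n
    return total_q / total_n

# Single snapshots jump around wildly; 165 pooled snapshots (a week of
# hourly reports) give a far more stable estimate of the true average.
single_estimates = [pooled_average(1) for _ in range(50)]
weekly_estimates = [pooled_average(165) for _ in range(50)]

print("spread of single-snapshot estimates:",
      round(statistics.pstdev(single_estimates)))
print("spread of weekly pooled estimates:  ",
      round(statistics.pstdev(weekly_estimates)))
```

With a heavy-tailed distribution like this, the spread of the weekly pooled estimates comes out roughly an order of magnitude smaller than the spread of single-snapshot estimates, in line with the usual square-root-of-sample-size behavior.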

> Let's look at one of the previous queue snapshots where there are
> 38247 Qs in the queue for 6 logs. The log probably representing 38200
> of the 38247 was uploaded only 39 secs before the hour snapshot was
> taken. There could have been 500 logs processed in the previous
> 59 mins 21 secs with an avg of 10 QSOs per log but your logic is
> going to calculate the average log being 6375 for that hour?

That is the essence of sampling ... if indeed there was one log with
38,200 of the 38,247 QSOs uploaded one minute before the status report,
it doesn't matter. There are dozens of reports with 1 log/2 QSOs, or
3 logs/4 QSOs, even 0 logs/0 QSOs .... those could be cases where one
log with 50,000 QSOs was uploaded one minute *after* the report and
was processed completely before the next report. The key to sampling
is that for every "extreme" value on one end of the scale that is
missed, similar "extreme" values on the other end of the scale are also
missed. The point is that with a sufficiently large sample - typically
1 to 2 % - the accuracy of the estimates of the characteristics of the
population as a whole based on the characteristics of the sample are
very high (or high enough to draw valid conclusions).

In this case, totaling the number of logs in 165 hourly "snapshots"
during one week results in a sample that contains more than 10% of
the logs processed in that week. 10% is a *huge* sample, generally
large enough to be within less than 1% of the composition of the
entire "population" being sampled.

> In my original post to this thread where you claimed % increases, the
> only valid ones based on fact are % increase of new QSOs per day/week
> and % increase of uploaded logs per day/week with the rest being
> meaningless.

That ignores sampling - the science of which has been settled for a
long time. Yes, increases in the number of new QSOs per day/week and
the number of logs processed per day/week are easy to "prove" because
LotW provides "full population" data - a count of each. However, that
does not invalidate estimates made based on statistical sampling when
the sample size is sufficient to support the inferences drawn.

> I expect you to call me names, so fire away.

I don't have to - you've shown your stripes in the first paragraph.

73,

... Joe, W4TV

On 2/20/2013 10:23 PM, David Levine wrote:
> For anyone that isn't aware, W4TV, who calls others names when he
> doesn't agree with them, is the owner of MicroHam. In case you want
> to avoid dealing with an individual that might treat you this same
> way if you have an issue with a purchase, you might want to avoid his
> products.
>
> Now, you are saying your statistics are based on what is reported on
> http://www.arrl.org/logbook-queue-status each hour to calculate the
> average # of QSOs per upload? That is completely flawed and after the
> below, I'm done trying to rationalize that 1 + 1 = 74 because it
> somehow helps your cause.
>
> Looking just at the last 5 hourly reports when I was replying to this I see
> the following:
>
> QSOs / Logs = Avg QSOs based on a single snapshot
> =====================
> 39 / 5 = 8
> 38247 / 6 = 6375
> 20499 / 3 = 6833
> 1975 / 4 = 494
> 18422 / 134 = 137
>
> The numbers are all over the place and what one would call statistically
> meaningless. Averaging statistically meaningless data over a long period of
> time still results in statistically meaningless data. It's only when there
> is a large log that you even have a backlog and anything being caught to
> report on. Furthermore, the actual very low "time in the queue" reflects
> that the average actual size of the logs is probably MUCH smaller than
> these would even represent for the large #s.
>
> Let's look at one of the previous queue snapshots where there are 38247 Qs
> in the queue for 6 logs. The log probably representing 38200 of the 38247
> was uploaded only 39 secs before the hour snapshot was taken.
> There could have been 500 logs processed in the previous 59 mins 21 secs
> with an avg of 10 QSOs per log but your logic is going to calculate the
> average log being 6375 for that hour? In fact, the 5 QSOs uploaded after
> the large log were probably another group with 10 Q's per upload. Your
> logic, if I'm understanding what you are calculating, is unacceptable and
> would invalidate any results you would try to use this info on.
>
> In my original post to this thread where you claimed % increases, the only
> valid ones based on fact are % increase of new QSOs per day/week and %
> increase of uploaded logs per day/week with the rest being meaningless.
>
> I expect you to call me names, so fire away.
>
> K2DSL - David
>
>
>
> On Wed, Feb 20, 2013 at 8:15 PM, Joe Subich, W4TV <lists@...> wrote:
>
>>
>>> What are the column headings for the 5 data elements in the table?
>>> How are you determining each?
>>
>> The column headings are:
>> Week Ending
>> New QSOs
>> Logs Processed
>> Average QSOs/Log
>> % Reprocessed
>>
>>
>>> +++ The only numbers i can see obvious to scrape from the pages each
>>> hour are a snapshot of # QSOs in the system with the diff from the
>>> prev hour being # of new QSOs over the past hour. The same LoTW page
>>> shows # of files uploaded which like the above can tell how many
>>> uploaded files in the past hour. Where do any other numbers you use
>>> in calculations come from? The status page doesnt show any data that
>>> can provide info on the past hour unless the queue is backed up over
>>> 1 hour.
>>
>> You already know how the "New QSOs" and "Logs Processed" are determined
>> - you give the procedure yourself. They are the difference between the
>> LotW Status numbers at the end of one week vs. the end of the previous
>> week.
>>
>> Average QSOs/Log is simply the average of the number of QSOs given in
>> each LotW Queue Status report (165 per week) divided by the number of
>> logs in the same report. Alternately, the calculation can be made by
>> dividing the sum of QSOs reported at each Status report by the total
>> number of logs in the reports. The latter calculation is better at
>> preventing a small number of large logs from skewing the average.
>>
>> 165 samples a week generally results in a sample size of more than 2000
>> logs which is far more than needed to provide high confidence when the
>> total number of logs processed per week varies between 20 and 35,000
>> as is currently the case.
>>
>> "% Reprocessed" is simply [(number of logs * QSOs/log) - New QSOs]
>> divided by (number of logs * QSOs/Log) or (total QSOs processed in
>> the week - number of new QSOs) / total QSOs processed.
>>
>>
>>> Should be a simple response.
>>
>> Yes, it is a simple process - one that is used in manufacturing quality
>> control routinely.
>>
>> 73,
>>
>> ... Joe, W4TV
>>
>>
>
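The "Average QSOs/Log" and "% Reprocessed" calculations described in the quoted message can be sketched as follows. The report figures are invented for illustration (the first five pairs echo the snapshot table quoted earlier; the rest, and the weekly new-QSO count, are assumptions, not LotW data):

```python
# Hypothetical week of hourly queue reports: (QSOs in queue, logs in queue).
reports = [(39, 5), (38247, 6), (20499, 3), (1975, 4), (18422, 134),
           (210, 21), (0, 0), (4, 2)]

# Method 1: mean of the per-report ratios (skipping empty-queue reports).
ratios = [q / n for q, n in reports if n > 0]
avg_of_ratios = sum(ratios) / len(ratios)

# Method 2: pooled ratio - total QSOs divided by total logs.
total_q = sum(q for q, _ in reports)
total_n = sum(n for _, n in reports)
pooled_avg = total_q / total_n

# % Reprocessed: share of the week's processed QSOs that were not new.
new_qsos = 60_000                 # assumed weekly increase in the QSO count
total_processed = total_n * pooled_avg   # equals total_q here
pct_reprocessed = 100 * (total_processed - new_qsos) / total_processed

print(f"average of ratios: {avg_of_ratios:.0f} QSOs/log")
print(f"pooled average:    {pooled_avg:.0f} QSOs/log")
print(f"% reprocessed:     {pct_reprocessed:.1f}%")
```

Note how a single huge log drags the average-of-ratios far above the pooled ratio, which is presumably why the pooled calculation is described as less prone to skew.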

Message 147 of 147, Aug 24, 2014
I run ACLog... When you hit the "ALL SINCE" button, change the date to
be something about a week prior to the LoTW failure...

I "believe" ACLog got very confused as a result of the failure mode of
LoTW. That corrected a very similar problem for me.
--
Thanks and 73's,
For equipment and software setups and reviews see:
www.nk7z.net
for MixW support see:
http://groups.yahoo.com/neo/groups/mixw/info
for Dopplergram information see:
http://groups.yahoo.com/neo/groups/dopplergram/info
for MM-SSTV see:
http://groups.yahoo.com/neo/groups/MM-SSTV/info

On Sun, 2014-08-24 at 09:05 -0700, reillyjf@... [ARRL-LOTW]
wrote:
>
>
> Thanks for the suggestion. I did a complete download, and beat the
> number of duplicates down from 275 to 30. Not exactly sure why the
> N3FJP ACL is missing this information.
> - 73, John, N0TA
>
>
>