Re: [ARRL-LOTW] Duplicates

Message 1 of 147 , Feb 20, 2013
Joe,
Give it a rest.
People are tired of it....

Joe Subich, W4TV wrote:

> The numbers are all over the place and what one would call
> statistically meaningless. Averaging statistically meaningless data
> over a long period of time still results in statistically meaningless
> data. It's only when there is a large log that you even have a
> backlog and anything being caught to report on.

There is where you are wrong and show a lack of understanding of
statistics. As long as the sample is sufficiently large - and a sample
representing more than 10% of any population is statistically *very*
large - the variation/randomness in the sample is of no real concern
unless you know that the "population" (the entire collection being
sampled) has some characteristic that makes it impossible to get a
representative sample.

When you are dealing with something as simple as the number of QSOs in
the average log, there is nothing that prevents a representative
sample, particularly when doing regular, periodic samples.

>
> Looking just at the last 5 hourly reports when I was replying to
> this I see the following:
>
> QSOs / Logs = Avg QSOs based on a single snapshot
> =====================
> 39 / 5 = 8
> 38247 / 6 = 6375
> 20499 / 3 = 6833
> 1975 / 4 = 494
> 18422 / 134 = 137

The only thing this shows is that with only five samples you don't have
a large enough sample to be significant. If you take hourly samples
over a longer period - a week perhaps - that variation is substantially
less and the sample becomes a much more accurate reflection of the
population as a whole. This is the essence of statistical sampling.
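
To make the mechanics concrete with the five snapshots quoted above, here is a rough Python sketch that compares the per-snapshot averages with a pooled estimate (total QSOs divided by total logs). The variable names are only illustrative, and five snapshots are still far too few to conclude anything; the sketch only shows how the pooling works.

```python
# (QSOs, logs) pairs from the five hourly status reports quoted above.
snapshots = [(39, 5), (38247, 6), (20499, 3), (1975, 4), (18422, 134)]

# Average QSOs per log computed separately for each snapshot.
per_snapshot = [qsos / logs for qsos, logs in snapshots]

# Pooled estimate: total QSOs over total logs across all snapshots.
total_qsos = sum(qsos for qsos, _ in snapshots)
total_logs = sum(logs for _, logs in snapshots)
pooled = total_qsos / total_logs

print("per-snapshot averages:", [f"{a:.1f}" for a in per_snapshot])
print(f"pooled average: {total_qsos} / {total_logs} = {pooled:.1f}")
```

Extending this to a week of hourly reports just means appending more (QSOs, logs) pairs to the list.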

> Let's look at one of the previous queue snapshots where there are
> 38247 Qs in the queue for 6 logs. The log which is probably
> representing 38200 of the 38247 was uploaded only 39 secs before the
> hour snapshot was taken. There could have been 500 logs processed in
> the previous 59 mins 21 secs with an avg of 10 QSOs per log but your
> logic is going to calculate the average log being 6375 for that hour?

That is the essence of sampling ... if indeed there was one log with
38,200 of the 38,247 QSOs uploaded one minute before the status report
it doesn't matter. There are dozens of reports with 1 log/2 QSOs, or
3 logs/4 QSOs, even 0 logs/0 QSOs .... those could be cases where one
log with 50,000 QSOs was uploaded one minute *after* the report and
was processed completely before the next report. The key to sampling
is that for every "extreme" value on one end of the scale that is
missed, similar "extreme" values on the other end of the scale are also
missed. The point is that with a sufficiently large sample - typically
1 to 2 % - the accuracy of the estimates of the characteristics of the
population as a whole based on the characteristics of the sample are
very high (or high enough to draw valid conclusions).

In this case, totaling the number of logs in 165 hourly "snap shots"
during one week results in a sample that contains more than 10% of
the logs processed in that week. 10% is a *huge* sample, generally
large enough to be within less than 1% of the composition of the
entire "population" being sampled.
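
One way to check a claim like this is to simulate it. The sketch below does not use LotW data: it invents a population of log sizes (mostly small logs with a few very large ones; the lognormal shape and its parameters are assumptions) and assumes a simple random sample, which hourly queue snapshots only approximate. How close the two means come out depends on how skewed the invented population is, so treat the printed error as illustrative only.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: QSO counts for 50,000 logs, mostly small
# with a few very large ones (the lognormal shape is an assumption).
population = [
    max(1, round(random.lognormvariate(3.0, 1.5))) for _ in range(50_000)
]

# Draw a 10% simple random sample without replacement.
sample = random.sample(population, len(population) // 10)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
rel_error = abs(sample_mean - pop_mean) / pop_mean

print(f"population mean: {pop_mean:.1f} QSOs/log")
print(f"10% sample mean: {sample_mean:.1f} QSOs/log")
print(f"relative error:  {rel_error:.2%}")
```

Changing the seed, the sample fraction, or the spread parameter shows how the error moves with each.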

> In my original post to this thread where you claimed % increases, the
> only valid ones based on fact are % increase of new QSOs per day/week
> and % increase of uploaded logs per day/week with the rest being
> meaningless.

That ignores sampling - the science of which has been settled for a
long time. Yes, increases in the number of new QSOs per day/week and
the number of logs processed per day/week are easy to "prove" because
LotW provides "full population" data - a count of each. However, that
does not invalidate estimates made based on statistical sampling when
the sample size is sufficient to support the inferences drawn.

> I expect you to call me names, so fire away.

I don't have to - you've shown your stripe in the first paragraph.

73,

... Joe, W4TV

Message 147 of 147 , Aug 24, 2014
I run ACLog... When you hit the "ALL SINCE" button, change the date to
be something about a week prior to the LoTW failure...

I "believe", ACLog got very confused as a result of the fail mode of
LoTW. That corrected a very similar problem for me.
--
Thanks and 73's,
For equipment, and software setups and reviews see:
www.nk7z.net
for MixW support see:
http://groups.yahoo.com/neo/groups/mixw/info
for Dopplergram information see:
http://groups.yahoo.com/neo/groups/dopplergram/info
for MM-SSTV see:
http://groups.yahoo.com/neo/groups/MM-SSTV/info

On Sun, 2014-08-24 at 09:05 -0700, reillyjf@... [ARRL-LOTW]
wrote:
>
>
> Thanks for the suggestion. I did a complete download, and beat the
> number of duplicates down from 275 to 30. Not exactly sure why the
> N3FJP ACL is missing this information.
> - 73, John, N0TA
