> K2DSL: You need a data point you don't have to solve for either total
> # of QSOs or previously processed QSOs.
>
> Formula: Total # QSOs uploaded - New QSOs Uploaded = Previously
> Uploaded QSOs. T - N = P abbreviating the above.
>
> Data isn't available for T or P so you have 2 unknowns. What you
> have presented is your personal determination of T based on
> assumptions extrapolated from the data snapshots. That's what I noted
> in my original comment above - there are 2 unknowns.

I'm not using P to solve for T .... T is determined *independently* by
sampling the entire population and P is calculated from that. There is
only one unknown - T (total QSOs) since P is T-N and we know N from the
status data at https://p1k.arrl.org/lotwuser/default.

> Your start date is 12/3/12 but in looking at the data, 1/18 is when
> the system seems to have reliably caught up where the queue reached
> 0 and didn't have any significant backlogs re-occur. That's a
> sampling of just 36 days.

The backlog reliably went under one day at 17:59 on 12 January, below
four hours at 04:59 on 13 January and below one hour at 09:59 on
13 January. For various reasons - including large log uploads, large
numbers of uploads, and system outages - delays have ranged from zero
to 3 hours, 13 minutes since that time, but for our purposes LotW can
be considered to be "operating normally" (or to have "caught up") by
09:59 on 1/13. However, whether "normal" operation is defined as
starting 1/13 or 1/18 is immaterial.

> I also removed any hourly data points where there was no data
> reported leaving data where there was at least 1 log in the queue.
> This reduced the number of hourly data points to 738. I didn't look
> to see what Joe previously stated the sample size was but that's
> what I consider the sample size of the data for analysis purposes.

We are *not concerned* with the number of sample points - the purpose
is not to determine the number of logs being processed as we already
know that *exactly* from https://p1k.arrl.org/lotwuser/default. Our
purpose is to determine the average number of QSOs per log so we
can determine the number of QSOs processed: T = number of logs
times the average number of QSOs in each log (Q).

In the period I used - 23:59z 13 Jan 2013 through 23:59z 22 Feb 2013 -
there are 30,318 logs in the sample (excluding samples that are not
valid because of duplication when the backlog is longer than one hour)
from a total population of 194,874 logs ("user files") processed in
that time period. That means the sample is slightly more than 15% of
all logs processed (which is a huge level of "over sampling").
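
As a rough sketch of that arithmetic in Python (the function and variable names are mine; only the two log counts come from the figures above):

```python
# Log counts quoted above for 23:59z 13 Jan 2013 through 23:59z 22 Feb 2013.
SAMPLED_LOGS = 30_318   # logs seen in the valid hourly samples
TOTAL_LOGS = 194_874    # "user files" processed in the same period


def sampling_fraction(sampled: int, total: int) -> float:
    """Fraction of the whole population of logs covered by the sample."""
    return sampled / total


def estimate_total_qsos(total_logs: int, mean_qsos_per_log: float) -> float:
    """T = (number of logs) x (average number of QSOs per log, Q)."""
    return total_logs * mean_qsos_per_log


fraction = sampling_fraction(SAMPLED_LOGS, TOTAL_LOGS)
print(f"sample covers {fraction:.1%} of all logs")  # sample covers 15.6% of all logs
```

Once Q has been estimated from the sample, estimate_total_qsos(TOTAL_LOGS, Q) gives T for the period.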

> The median helps to minimize skewing from the more extreme outliers
> in the data such as hourly snapshots with very small # logs & QSOs
> and with very large # of logs & QSOs. The median of 101 QSOs per
> snapshot is approximately the middle value where 1/2 of the 738
> snapshots are less than 101 QSOs and 1/2 the 738 snapshots are
> greater than 101 QSOs.

Median, mean and standard deviation are for a normal distribution.
This population is far from "normal" - it is a Poisson process. A
Poisson process is one in which events happen discretely and are
independent - the number of customers arriving at a bank in an hour,
the number of trees in an acre of forest, the number of pieces of
litter along one mile of highway, etc. The number of QSOs in a log
upload is also a Poisson process.

In Poisson statistics we deal with only a mean and a variance (which
are identical). The entire goal here is to have enough samples (k)
that there is a high probability that the mean we calculate is within
the error we are willing to accept of the true mean of the whole
population. By over sampling we assure that the calculated value
is "close enough" - thanks to the properties of limits (the error
in a sample goes to zero as the size of the sample reaches the
whole population).

> Averaging comes out to be 328 QSOs per log (10801/33). The median
> comes out to be 25 QSOs per log (101/4). You can see that assuming
> the average number is representative of all logs shows a vastly
> different picture from what the median shows across 738 snapshots.

As I've shown above, the number of "snapshots" means nothing - it is
the number of logs in the sample that is important - the greater the
number of logs, the more accurate will be the estimation of the mean.

In any case, we know absolutely that your "normal" median can not be
anywhere close to the actual median, as for the last five weeks the
average (mean) number of *new* QSOs per log (New QSOs divided by User
Files from https://p1k.arrl.org/lotwuser/default) was 63, 68, 65, 82
and 70. Averaged across the entire period, the number of *new* QSOs
per log is 70. The fact that "new" QSOs alone are nearly three times
your median would argue that the true mean is much closer to what you
give as the mean. The "maximum likelihood" estimate of the Poisson
mean happens to be the simple sample mean, and that is also a
*minimum variance unbiased estimator.* This is more statistical theory
than we need to go into here - but it simply says that no other
unbiased estimate of the mean of a Poisson distribution is more
precise than the simple mean of the samples.
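
The comparison can be checked with a few lines of Python. This is only a sketch: the variable names are mine, and it takes the unweighted average of the five weekly figures, whereas the period-wide figure is New QSOs divided by User Files over the whole period (the weekly log counts needed for that weighting are not listed here).

```python
from statistics import mean

# Weekly mean number of *new* QSOs per log, as quoted above.
weekly_new_qsos_per_log = [63, 68, 65, 82, 70]

# K2DSL's median QSOs per log (101 / 4, quoted as 25).
K2DSL_MEDIAN_PER_LOG = 25

# Unweighted average of the weekly figures, and its ratio to the median.
simple_average = mean(weekly_new_qsos_per_log)
ratio_to_median = simple_average / K2DSL_MEDIAN_PER_LOG

print(f"average new QSOs per log: {simple_average:.1f}")
print(f"ratio to the quoted median: {ratio_to_median:.2f}")
```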

Again, the only issue is whether the number of independent samples
- in this case the *sum of the logs* in the hourly reports - is large
enough that their mean will, with high probability, be within an
acceptable percentage of the mean of the population. With a sample
that exceeds 15% of the population, the answer is an unequivocal
*yes* for the entire six week period.

I have not calculated the sample sizes for the individual weeks but I
have no reason to doubt that they will also be more than sufficient.

It is unfortunate that ARRL have not seen fit to release the "input"
data the way they release the "new QSOs" and "user files" numbers, as
there would then be no question concerning the level of wasted
processing. Even so, your own mean (maximum likelihood) puts the level
of previously processed QSOs at more than 75% for the five week period.
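
That last figure follows directly from P = T - N applied per log. A minimal sketch in Python (the function name is mine; 328 is your average of total QSOs per log and 70 is the average of new QSOs per log from the ARRL status data):

```python
def previously_processed_fraction(total_per_log: float, new_per_log: float) -> float:
    """Share of uploaded QSOs that were already processed: P / T, with P = T - N."""
    if total_per_log <= 0:
        raise ValueError("total QSOs per log must be positive")
    return (total_per_log - new_per_log) / total_per_log


share = previously_processed_fraction(328, 70)
print(f"previously processed: {share:.1%}")  # previously processed: 78.7%
```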

73,

... Joe, W4TV

