Re: [ARRL-LOTW] Re: Duplicates
- David

I think this article pretty much describes Joe. It's not the actual percentage of dupes that Joe is defending but the idea he may be wrong. He may have Atychiphobia. Everyone should read the article and make their own decision about Joe.

Atychiphobia (from the Greek phóbos, meaning "fear" or "morbid fear" and atyches meaning "unfortunate") is the abnormal, unwarranted, and persistent fear of failure or being wrong.

Keith

From: David Levine <davidnj@...>
Sent: Sunday, February 24, 2013 5:49 AM
Subject: Re: [ARRL-LOTW] Re: Duplicates

I'm sorry, but I must respond one more time, and I think even others who would normally ignore this should read through, as you might get a kick out of it.

Joe, you are like a college professor spewing theory without applying an ounce of real-world knowledge to the situation. You know the kind of professor I'm talking about. He's the one that, once you had a job and thought back to what the old coot said, you just shook your head and wondered why you needed to pay so much for so little. My replies below begin with $$$.

On Sun, Feb 24, 2013 at 12:06 AM, Joe Subich, W4TV <lists@...> wrote:

One last attempt to get you to see standard mathematical principles -

> +++ From the above you only have # uploaded logs & Newly Upload
> QSOs. In every equation you have 2 unknown data points. You cannot
> claim avg # QSOs per log as known because you don't have that as a
> fact but as an assumption relying on your calculation being a valid
> average # QSOs per log, which it is not.

No, the average number of QSOs per log is what we are solving for in the statistical process. It is not *exact* because no sampling based methodology will ever be exact. However, once we arrive at the number of QSOs per log (+/- some sampling error) the other values are given by the formulas - they are no longer *unknown*, only subject to the same uncertainty as the average number of QSOs per log.

$$$ I have shown the situation and formula so often my computer now types it for me. Even in your own twisted words, above, you are agreeing with me, though you write it as if you are not.

(Avg # QSOs per log * # uploaded logs) - Newly Uploaded QSOs = Previously Uploaded QSOs

where we don't know Avg # QSOs per log and we don't know Previously Uploaded QSOs. That's 2 variables in the equation we don't know to be facts. Just because you, W4TV, believe you have derived Avg # QSOs per log, it is made up.
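To make the two-unknowns point concrete, here is a minimal Python sketch. The numbers (1,000 uploaded logs, 40,000 newly uploaded QSOs, and the candidate averages) are made up for illustration and are not LoTW data; `previously_uploaded` and `implied_results` are hypothetical helpers that simply evaluate the formula above.

```python
def previously_uploaded(avg_qsos_per_log, uploaded_logs, newly_uploaded_qsos):
    """(Avg # QSOs per log * # uploaded logs) - Newly Uploaded QSOs."""
    return avg_qsos_per_log * uploaded_logs - newly_uploaded_qsos


# Hypothetical "known" quantities (illustrative only, not LoTW data).
UPLOADED_LOGS = 1_000
NEWLY_UPLOADED_QSOS = 40_000


def implied_results(candidate_averages):
    """Map each assumed average to (previously uploaded QSOs, their share of all QSOs)."""
    results = {}
    for avg in candidate_averages:
        total = avg * UPLOADED_LOGS
        prev = previously_uploaded(avg, UPLOADED_LOGS, NEWLY_UPLOADED_QSOS)
        results[avg] = (prev, prev / total)
    return results


if __name__ == "__main__":
    for avg, (prev, share) in implied_results([50, 100, 200]).items():
        print(f"avg={avg:>4}  previously uploaded={prev:>8}  share={share:.0%}")
```

With the same two known inputs, every assumed average yields a different but equally consistent count of previously uploaded QSOs, so the result is only as good as the average that goes in.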
It's not statistically sound at all, which you can see below.

> +++ I didn't remove 80% of the samples. I removed 20%. Why would you
> say I removed 80% of the samples when I wrote I removed 20% multiple
> times?

OK, you removed the top two deciles of samples and removed 95% of the QSOs. That *still* doesn't pass the test of valid statistics. I've looked at the sample data four ways ... first the Maximum Likelihood of all samples (samples with no data or no logs do not impact the Maximum Likelihood), second - the Maximum Likelihood with any sample of more than one hour removed, third - the Maximum Likelihood with any sample where the *next* sample covers more than one hour removed, and fourth - the Maximum Likelihood with any sample covering more than one hour reduced in proportion to its time (e.g. a sample covering 1:30 would be reduced by 33%). *None* of those methods reduce the number of QSOs by anywhere close to 95% - the biggest reduction is the third case but the number of logs is reduced by nearly the same percentage.

$$$ Since you are such a fan of sending someone to Wikipedia, grab your college provided mouse from your lecture hall and view the following:

$$$ http://en.wikipedia.org/wiki/Median_household_income#Median_household_income_and_the_US_economy where it references "The median income is considered by many statisticians to be a better indicator than the average household income as it is not dramatically affected by unusually high or low values." This is exactly the same sound logic I'm applying to the LoTW data. Since 95% of the QSOs are in the top 20% of the data, the whole is severely skewed by the few. Applying an average to ALL records based on the skewed data is not the best approach according to your often cited source.

$$$ http://en.wikipedia.org/wiki/File:BeforetaxfamilyincomemeanUS1989-2004.svg shows data that is rather topical and matches the above logic.
Though it's not the 1% we always hear about, the wealthy top 10% in this case are rather extreme outliers (values greater than 3x the next closest 15%). We have that same inequality in the LoTW data, which you can see by applying real-world knowledge before blindly processing it. 95% of the greatest QSOs/log fall in just the top 20% of status records, while 80% of the status records make up just 5% of the much smaller QSOs/log. Maybe we need an Occupy LoTW movement.

$$$ I will again send Professor Joe to his favorite source to view http://en.wikipedia.org/wiki/Outlier and you will see, among other relevant info, the following... "Outliers can occur by chance in any distribution, but they are often indicative either of measurement error or that the population has a heavy-tailed distribution. [snip] while in the latter case they indicate that the distribution has high kurtosis and that one should be very cautious in using tools or intuitions that assume a normal distribution." And you can learn what kurtosis means at your favorite site - http://en.wikipedia.org/wiki/Kurtosis - where it says, as I have often stated, "In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution". Note the key point from above - "one should be cautious using tools that assume a normal distribution". Let me repeat that... "one should be cautious using tools that assume a normal distribution". In case it still hasn't sunk in, "one should be cautious using tools that assume a normal distribution".
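The mean-versus-median point can be sketched in a few lines of Python. The ten QSOs-per-log values below are invented so that the top 20% of samples hold 95% of the QSOs, mirroring the shape described above; they are not actual LoTW status records, and `summarize` is a hypothetical helper.

```python
from statistics import mean, median

# Invented QSOs-per-log samples (not LoTW data): the top 2 of 10 hold 95% of all QSOs.
SAMPLES = [5, 8, 10, 12, 15, 20, 25, 30, 875, 1500]


def summarize(samples, trim_fraction=0.2):
    """Compare mean and median, and show the effect of dropping the top slice."""
    ordered = sorted(samples)
    n_dropped = round(len(ordered) * trim_fraction)
    cut = len(ordered) - n_dropped
    kept, dropped = ordered[:cut], ordered[cut:]
    return {
        "mean_all": mean(ordered),                     # pulled up by the largest logs
        "median_all": median(ordered),                 # the "typical" log
        "top_share": sum(dropped) / sum(ordered),      # share of QSOs in the top slice
        "mean_trimmed": mean(kept),                    # mean after dropping the top slice
    }


if __name__ == "__main__":
    for name, value in summarize(SAMPLES).items():
        print(f"{name:>12}: {value}")
```

For these invented samples the mean is 250 while the median is 17.5, and the mean of the remaining 80% is 15.625. Whether dropping the top slice is justified for the real data is the question the thread is arguing about; the sketch only shows how strongly a few large logs can move the mean.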
So if it makes you feel better, I used big words like you to say rather simple things and I sent you off to Wikipedia as you do with others. That should make you feel warm and fuzzy inside. But none of your self-proclaimed god-like statistical know-how means a hill of beans if you don't apply simple knowledge & logic to a situation and instead blindly process raw and known inadequate data.
Lopping off the more extreme outliers as I have done is a sound approach, as multiple references above confirm, and even more reasonable based on applying knowledge of the data. Because it doesn't agree with your rant and obvious agenda that "dups rule the world" you claim others must be false, but in reality it is probably much closer to the truth than your blind application of sum() and average() Excel formulas. Knowledge is "key" and it seems you are unfortunately "locked out".
David - K2DSL
- I run ACLog... When you hit the "ALL SINCE" button, change the date to
be something about a week prior to the LoTW failure...
I "believe" ACLog got very confused as a result of the failure mode of
LoTW. That corrected a very similar problem for me.
Thanks and 73's,
On Sun, 2014-08-24 at 09:05 -0700, reillyjf@... [ARRL-LOTW] wrote:
> Thanks for the suggestion. I did a complete download, and beat the
> number of duplicates down from 275 to 30. Not exactly sure why the
> N3FJP ACL is missing this information.
> - 73, John, N0TA