Re: [ARRL-LOTW] Re: Duplicates
I'm sorry, but I must respond one more time, and I think even others who would normally ignore this should read through, as you might get a kick out of it.
Joe, you are like a college professor spewing theory without applying an ounce of real world knowledge to the situation. You know the kind of professor I'm talking about. He's the one that once you had a job and thought back to what the old coot said, you just shook your head and wondered why you needed to pay so much for so little. My replies below begin with $$$.
On Sun, Feb 24, 2013 at 12:06 AM, Joe Subich, W4TV <lists@...> wrote:
One last attempt to get you to see standard mathematical principles -

> +++ From the above you only have # uploaded logs & Newly Uploaded
> QSOs. In every equation you have 2 unknown data points. You cannot
> claim avg # QSOs per log as known because you don't have that as a
> fact but as an assumption relying on your calculation being a valid
> average # QSOs per log, which it is not.

No, the average number of QSOs per log is what we are solving for in
the statistical process. It is not *exact* because no sampling based
methodology will ever be exact. However, once we arrive at the number
of QSOs per log (+/- some sampling error) the other values are given
by the formulas - they are no longer *unknown* only subject to the
same uncertainty as the average number of QSOs per log.

$$$ I have shown the situation and formula so often my computer now types it for me. Even in your own twisted words, above, you are agreeing with me, though you write as if you are not.

(Avg # QSOs per log * # uploaded logs) - Newly Uploaded QSOs = Previously Uploaded QSOs
where we don't know Avg # QSOs per log and we don't know Previously Uploaded QSOs. That's 2 variables in the equation we don't know to be facts. You, W4TV, may believe you have derived Avg # QSOs per log, but it is made up. It's not statistically sound at all, which you can see below.

> +++ I didn't remove 80% of the samples. I removed 20%. Why would you
> say I removed 80% of the samples when I wrote I removed 20% multiple

OK, you removed the top two deciles of samples and removed 95% of the
QSOs. That *still* doesn't pass the test of valid statistics. I've
looked at the sample data four ways ... first the Maximum Likelihood
of all samples (samples with no data or no logs do not impact the
Maximum Likelihood), second - the Maximum Likelihood with any sample
of more than one hour removed, third - the Maximum Likelihood with any
sample where the *next* sample covers more than one hour removed, and
fourth - the Maximum Likelihood with any sample covering more than
one hour reduced in proportion to its time (e.g. a sample covering
1:30 would be reduced by 33%). *None* of those methods reduce the
number of QSOs by anywhere close to 95% - the biggest reduction is
the third case but the number of logs is reduced by nearly the same
percentage.

$$$ Since you are such a fan of sending someone to Wikipedia, grab your college provided mouse from your lecture hall and view the following:

$$$ http://en.wikipedia.org/wiki/Median_household_income#Median_household_income_and_the_US_economy where it references "The median income is considered by many statisticians to be a better indicator than the average household income as it is not dramatically affected by unusually high or low values." This is exactly the same sound logic I'm applying to the LoTW data. Since 95% of the QSOs are in the top 20% of the data, the whole is severely skewed by the few. Applying an average to ALL records based on the skewed data is not the best approach according to your often cited source.

$$$ http://en.wikipedia.org/wiki/File:BeforetaxfamilyincomemeanUS1989-2004.svg shows data that is rather topical and matches the above logic. Though it's not the 1% we always hear about, the wealthy top 10% in this case are rather extreme outliers (values greater than 3x the next closest 15%). We see that same inequality in the LoTW data when we apply real world knowledge before blindly processing it. 95% of the QSOs fall in just the top 20% of status records, those with the greatest QSOs/log, while the other 80% of the status records, with much smaller QSOs/log, make up just 5%. Maybe we need an Occupy LoTW movement.

$$$ I will again send Professor Joe to his favorite source to view http://en.wikipedia.org/wiki/Outlier and you will see among other relevant info the following... "Outliers can occur by chance in any distribution, but they are often indicative either of measurement error or that the population has a heavy-tailed distribution. [snip] while in the latter case they indicate that the distribution has high kurtosis and that one should be very cautious in using tools or intuitions that assume a normal distribution."
And you can learn what kurtosis means at your favorite site - http://en.wikipedia.org/wiki/Kurtosis - where it says, as I have often stated, "In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a probability distribution". Note the key point from above - "one should be cautious using tools that assume a normal distribution". Let me repeat that... "one should be cautious using tools that assume a normal distribution". In case it still hasn't sunk in "one should be cautious using tools that assume a normal distribution".
So if it makes you feel better, I used big words like you to say rather simple things and I sent you off to Wikipedia as you do with others. That should make you feel warm and fuzzy inside. But none of your self-proclaimed god-like statistical know-how means a hill of beans if you don't apply simple knowledge & logic to a situation and instead blindly process raw and known inadequate data.
Lopping off the more extreme outliers as I have done is a sound approach, as the multiple references above confirm, and even more reasonable when you apply knowledge of the data. Because it doesn't agree with your rant and obvious agenda that "dups rule the world" you claim others must be false, but in reality it is probably much closer to the truth than your blind application of sum() and average() Excel formulas.
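For anyone following the mean-versus-median argument, here is a minimal Python sketch of how sensitive the formula above is to the choice of average. The QSOs-per-log values, the count of 100 uploaded logs, and the 1000 newly uploaded QSOs are all invented for illustration (they are not LoTW data), and the helper names are hypothetical. The sample is built so that the top 20% of logs hold 95% of the QSOs, mirroring the split described above.

```python
from statistics import mean, median

# Hypothetical QSOs-per-log values, invented for illustration only.
# They are NOT LoTW data: eight small logs and two very large ones.
qsos_per_log = [5, 8, 10, 12, 15, 20, 25, 30, 900, 1475]


def trimmed_mean(values, top_fraction):
    """Mean after dropping the largest `top_fraction` of the values."""
    ordered = sorted(values)
    keep = len(ordered) - round(len(ordered) * top_fraction)
    return mean(ordered[:keep])


def previously_uploaded(avg_qsos_per_log, uploaded_logs, newly_uploaded_qsos):
    """(Avg # QSOs per log * # uploaded logs) - Newly Uploaded QSOs."""
    return avg_qsos_per_log * uploaded_logs - newly_uploaded_qsos


full_mean = mean(qsos_per_log)             # every log counted
middle = median(qsos_per_log)              # robust to the two big logs
trimmed = trimmed_mean(qsos_per_log, 0.20) # top 20% of logs dropped

# Share of all QSOs held by the two largest logs (the top 20%).
top_share = sum(sorted(qsos_per_log)[-2:]) / sum(qsos_per_log)

# Assumed inputs for the formula: 100 logs, 1000 newly uploaded QSOs.
estimate_full = previously_uploaded(full_mean, 100, 1000)
estimate_trimmed = previously_uploaded(trimmed, 100, 1000)

print(full_mean, middle, trimmed, top_share)
print(estimate_full, estimate_trimmed)
```

With these made-up numbers the full mean is 250 QSOs per log, the median is 17.5, and the mean with the top 20% dropped is 15.625, so the Previously Uploaded QSOs estimate swings from 24000 to 562.5. The sketch only shows how far the answer moves with the choice of average; it does not say which average is the right one for the real data.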
Knowledge is "key" and it seems you are unfortunately "locked out".
David - K2DSL
On Wed, Feb 27, 2013 at 1:02 PM, k4dl@... <k4dl@...> wrote:
> Well I thought I would go try it again if it's so easy and I cannot use it until I get something in the mail. I thought the idea was to not have to use the mail! If they want me on it then make it work when I download it. Don't make me wait for something in the mail.

Using a postcard is the cheapest way for HQ to verify that the person
who made the certificate request is the same person listed on the
license. It would be trivial to create and submit a fake certificate
request. It would be much harder to falsely change someone's license
Peter Laws | N5UWY | plaws plaws net | Travel by Train!