Re: Bad reviewing of statistics in one of our journals
- My impression of the medical sciences is that effect sizes are a poor
alternative to using minimal important differences. Effect sizes were
used much more often in the past, and now people try to think about
MID. But it is difficult, so many people continue to use effect sizes.
Ian Shrier MD, PhD, Dip Sport Med, FACSM
Associate Professor, Dep't of Fam Med, McGill University
Past-President, Canadian Academy of Sport Medicine
Check out: www.casm-acms.org
SKYPE name: ian.shrier
Centre for Clinical Epidemiology and Community Studies
SMBD-Jewish General Hospital
3755 Cote Ste-Catherine Rd
Montreal, Qc H3T 1E2
On 1-Sep-09, at 6:02 PM, Rowlands, David wrote:
> First up, on the review: while we can't say from the information
> provided whether the science was up to scratch, the reviewer's
> comments are totally out of kilter.
> Second, I hope to help here with the issue many scientists strike when
> trying to adapt magnitude-based inference to physiological data.
> After a few years of getting my head around it, I endorse Will's
> comment that standardisation (via effect size) of physiological,
> psychological or other mechanistic or related measures provides a
> statistically valid effect threshold. As with the estimates of the
> smallest effects on performance, the effect size encompasses the
> variability of the measure and the magnitude of the outcome, which
> relate in totality to the relevance and utility of the measure in
> the real world.
> I look to clinical medicine for comparison where you read about
> smallest clinical effects, which appear to come from observations in
> practice = sampling from the population. From my recent but limited
> research experience in clinical exercise science and sports med, the
> smallest effect size comes out at about a similar magnitude to the
> subjectively estimated smallest clinical effect. This is probably
> not surprising because for any effect to matter, it will likely have
> to score consistently above the within-subject CV for the measure
> which makes up a sizeable component of the sample variance along
> with individual variability. It is likely that similar patterns for
> physiological outcomes will emerge once researchers put their minds
> to it, drop significance testing, and adopt magnitude-based
> evaluation approaches that provide inference closer to the
> biological response (=more likely to get the science right).
> Consideration of statistical power and assessment via probabilities
> also need to be placed near the top of the priority list.
> I think we have to trust the life-time work of people like Jacob
> Cohen on this, whose views and opinions were developed through
> rigorous investigation, scenario modeling, and peer review, and
> should therefore be respected and seriously considered as best
> practice in modern scientific analysis and inference.
> Three of the greatest constraints to progress are: 1) statistical
> skills (=stats teaching) and user-friendly tools (=market demand);
> 2) the inattention (and ignorance?) and lack of discipline (not
> insisting their own journal guidelines are met) on the matter by
> journal editors; 3) magnitude-based probabilistic inference requires
> a greater investment in time than hypothesis testing, and in this
> era of governance by accountancy the short-term solution rules.
- I've already had one message from a colleague who has got the wrong idea as
a result of the message Dave Rowlands just posted, so I have to clarify it.
Dave wrote: "...From my recent but limited research experience in clinical
exercise science and sports med, the smallest effect size comes out at about
a similar magnitude to the subjectively estimated smallest clinical effect.
This is probably not surprising because for any effect to matter, it will
likely have to score consistently above the within-subject CV for the
measure which makes up a sizeable component of the sample variance along
with individual variability..."
Dave's first sentence is simply a statement that 0.20 of the
between-subject standard deviation (which is the default smallest important
difference or change in a mean, and the standardization approach I was
talking about) is similar to what people would regard as a reasonable
smallest important difference or change. Fine.
Dave's next sentence brings in error of measurement, which is the
within-subject variability or variability a subject shows from test to test,
and I think he's got it wrong. Here's my spin on it. If that variability
represents real variability in the subject from test to test, such as you
might get in a fitness test or a measurement of a concentration of something
in the blood, then you don't have to worry about it. Basically, the SD you
get when you measure subjects is the SD to use for standardization. But if
part of that within-subject variability from measurement to measurement
arises from random error injected by the measuring instrument ("technical
error" is the term some of us use for such error), then it should not be
considered as part of the between-subject SD for purposes of estimating 0.20
of the SD to get the smallest effect. In principle you should subtract
off the extra noise from the between-subject SD you get with the instrument
before you take 0.20 of it. You subtract off the noise by subtracting the
square of it from the square of the observed between-subject SD, then taking
the square root.
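That correction can be sketched in a few lines of Python; the function
names and the example numbers here are mine, not from any published code,
and the logic simply follows the variance subtraction described above.

```python
from math import sqrt

def true_between_subject_sd(observed_sd, technical_error):
    """Remove instrument noise from the observed between-subject SD.

    Variances add, so the noise-free SD is recovered by subtracting
    the squared technical error from the squared observed SD and
    taking the square root.
    """
    return sqrt(observed_sd**2 - technical_error**2)

def smallest_important_effect(observed_sd, technical_error=0.0, factor=0.20):
    """0.20 of the noise-corrected between-subject SD (the default
    threshold for the smallest important standardized effect)."""
    return factor * true_between_subject_sd(observed_sd, technical_error)

# Made-up numbers: observed SD = 10 units, technical error = 3 units.
print(smallest_important_effect(10.0, 3.0))  # 0.20 * sqrt(100 - 9), about 1.91
```

Note that if the within-subject variability is all real biological variation
(technical error of zero), the correction leaves the observed SD untouched,
which matches Will's point that you then standardize with the SD as measured.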
The noise injected by an instrument is often negligible, in my experience:
the variability a subject shows from measurement to measurement arises from
variation in the subject, not random noise in the instrument. But even if
the noise is comparable to the within-subject variability, "negligible"
needs to be assessed not in relation to the within-subject variability but
in relation to the smallest important effect, which is 0.20 of the
between-subject SD, if it's a mechanism or physiological variable or a test
measure for a team-sport athlete or a test measure that has an unknown
relationship with competitive performance.
Let's be clear about the smallest effect for individual competitive athletic
performance: it's 0.3 of the variability top athletes show from competition
to competition, provided the athletes compete independently. I am
responsible for stating that it is 0.5 rather than 0.3 in many previous
papers, but in the first paper of all (Hopkins, Hawley and Burke, MSSE,
1999) I did state 0.3, and I stupidly fudged it after that so that sample
sizes were not impractically large. Recently I have worked out the
thresholds for moderate, large, very large and extremely large effects, so I
had to go back to 0.3 for the threshold for small(est). The full set of
thresholds is 0.3, 0.9, 1.6, 2.5 and 4.0. These thresholds correspond to a
top athlete winning an extra medal in 1, 3, 5, 7 and 9 competitions in every
10 competitions. These thresholds, and the thresholds for standardized
differences or changes in a mean (0.2, 0.6, 1.2, 2.0, 4.0) are stated in the
2009 MSSE paper on progressive stats.
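As a quick illustration of how those competitive-performance thresholds
would be applied, here is a hypothetical Python sketch (the function and
variable names are mine); a change is expressed as a multiple of the
athlete's competition-to-competition variability and classified against
the 0.3, 0.9, 1.6, 2.5 and 4.0 thresholds stated above.

```python
# Thresholds (in units of competition-to-competition variability),
# checked from largest down so the first match wins.
THRESHOLDS = [(4.0, "extremely large"), (2.5, "very large"),
              (1.6, "large"), (0.9, "moderate"), (0.3, "small")]

def magnitude(change, competition_variability):
    """Classify a change in competitive performance.

    `change` and `competition_variability` must be in the same units;
    anything below 0.3 of the variability counts as trivial.
    """
    ratio = abs(change) / competition_variability
    for threshold, label in THRESHOLDS:
        if ratio >= threshold:
            return label
    return "trivial"

print(magnitude(1.0, 1.0))  # 1.0 x variability -> "moderate"
print(magnitude(0.2, 1.0))  # below 0.3 -> "trivial"
```

The same structure works for standardized differences in a mean by swapping
in the 0.2, 0.6, 1.2, 2.0 and 4.0 thresholds and dividing by the
between-subject SD instead.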
The challenge for sport scientists who want to assess their athletes with
performance tests is to convert the smallest effect in competitive
performance into a smallest effect in test performance. I submitted a
proposal for a tutorial lecture on this and related topics for the 2008 ACSM
meeting. The proposal was rejected, and it was rejected when I submitted it
again for this year's meeting last May. Hence (in part) my decision not to
attend this year's meeting! I have submitted the same proposal again this
year. Third time lucky? Let's see.
Will G Hopkins, PhD FACSM
Contact info: http://sportsci.org/will
Be creative: break rules.