Re: Bad reviewing of statistics in one of our journals

  • Ian Shrier
    Message 1 of 5, Sep 1, 2009
      My impression of the medical sciences is that effect sizes are a poor
      alternative to using minimal important differences. Effect sizes were
      used much more often in the past, and now people try to think in terms
      of the MID. But it is difficult, so many people continue to use effect
      sizes...


      Ian Shrier MD, PhD, Dip Sport Med, FACSM
      Associate Professor, Dep't of Fam Med, McGill University
      Past-President, Canadian Academy of Sport Medicine
      Check out: www.casm-acms.org
      SKYPE name: ian.shrier

      Centre for Clinical Epidemiology and Community Studies
      SMBD-Jewish General Hospital
      3755 Cote Ste-Catherine Rd
      Montreal, Qc H3T 1E2
      Tel: 514-340-7563
      Fax: 514-340-7564





      On 1-Sep-09, at 6:02 PM, Rowlands, David wrote:

      > First up, on the review: while we can't say from the information
      > provided whether the science was up to scratch, the reviewer's
      > comments are totally out of kilter.
      >
      > Second, I hope to help here on the issue many scientists strike when
      > trying to adapt magnitude-based inference to physiological data.
      > After a few years of getting my head around it, I endorse Will's
      > comment that standardisation (via effect size) of physiological,
      > psychological, or other mechanistic measures provides a
      > statistically valid effect threshold. As with the estimates of the
      > smallest effects on performance, the effect size encompasses the
      > variability of the measure and the magnitude of the outcome, which
      > together determine the relevance and utility of the measure in the
      > real world.
      >
      > I look to clinical medicine for comparison, where you read about
      > smallest clinical effects, which appear to come from observations in
      > practice (= sampling from the population). From my recent but
      > limited research experience in clinical exercise science and sports
      > med, the smallest effect size comes out at about the same magnitude
      > as the subjectively estimated smallest clinical effect. This is
      > probably not surprising, because for any effect to matter it will
      > likely have to score consistently above the within-subject CV for
      > the measure, which makes up a sizeable component of the sample
      > variance along with individual variability. Similar patterns will
      > likely emerge for physiological outcomes once researchers put their
      > minds to it, drop significance testing, and adopt magnitude-based
      > evaluation approaches that provide inference closer to the
      > biological response (= more likely to get the science right).
      > Consideration of statistical power and assessment via probabilities
      > also need to be placed near the top of the priority list.
      >
      > I think we have to trust the lifetime work of people like Jacob
      > Cohen on this, whose views and opinions were developed through
      > rigorous investigation, scenario modeling, and peer review, and
      > should therefore be respected and seriously considered as best
      > practice in modern scientific analysis and inference.
      >
      > Three of the greatest constraints to progress are: 1) statistical
      > analytical skills (= stats teaching) and user-friendly tools
      > (= market demand); 2) the inattention (and ignorance?) and lack of
      > discipline (not insisting their own journal guidelines are met) on
      > the matter by journal editors; 3) magnitude-based probabilistic
      > inference requires a greater investment in time than hypothesis
      > testing, and in this era of governance by accountancy the
      > short-term solution rules.
      >
      > David
    • Will Hopkins
      Message 2 of 5, Sep 1, 2009
        I've already had one message from a colleague who has got the wrong idea as
        a result of the message Dave Rowlands just posted, so I have to clarify it.

        Dave wrote: "...From my recent but limited research experience in
        clinical exercise science and sports med, the smallest effect size
        comes out at about the same magnitude as the subjectively estimated
        smallest clinical effect. This is probably not surprising, because
        for any effect to matter it will likely have to score consistently
        above the within-subject CV for the measure, which makes up a
        sizeable component of the sample variance along with individual
        variability..."

        Dave's first sentence is simply a statement that 0.20 of the
        between-subject standard deviation (which is the default smallest
        important difference or change in a mean, and the standardization
        approach I was talking about) is similar to what people would regard
        as a reasonable smallest important difference or change. Fine.

        Dave's next sentence brings in error of measurement, which is the
        within-subject variability or variability a subject shows from test to test,
        and I think he's got it wrong. Here's my spin on it. If that variability
        represents real variability in the subject from test to test, such as you
        might get in a fitness test or a measurement of a concentration of something
        in the blood, then you don't have to worry about it. Basically, the SD you
        get when you measure subjects is the SD to use for standardization. But if
        part of that within-subject variability from measurement to measurement
        arises from random error injected by the measuring instrument ("technical
        error" is the term some of us use for such error), then it should not be
        considered as part of the between-subject SD for purposes of estimating 0.20
        of the SD to get the smallest effect. In principle you should subtract
        off the extra noise from the between-subject SD you get with the instrument
        before you take 0.20 of it. You subtract off the noise by subtracting the
        square of it from the square of the observed between-subject SD, then taking
        the square root.
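
        In code, that correction is a one-liner. A minimal sketch in Python
        (the function name and example numbers are mine, purely for
        illustration):

          from math import sqrt

          def smallest_standardized_effect(observed_sd, technical_error):
              # Remove the instrument's technical error from the observed
              # between-subject SD in quadrature, then take 0.20 of the
              # result as the default smallest important effect.
              true_sd = sqrt(observed_sd**2 - technical_error**2)
              return 0.20 * true_sd

          # Example: observed between-subject SD of 10 units, technical
          # error (noise from the instrument) of 4 units.
          print(smallest_standardized_effect(10.0, 4.0))  # 0.20 * 9.17 = 1.83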

        The noise injected by an instrument is often negligible, in my
        experience: the variability a subject shows from measurement to
        measurement usually arises from variation in the subject, not from
        random noise in the instrument. But even if the noise is comparable
        to the within-subject variability, "negligible" needs to be assessed
        not in relation to the within-subject variability but in relation to
        the smallest important effect, which is 0.20 of the between-subject
        SD if it's a mechanism, a physiological variable, a test measure for
        a team-sport athlete, or a test measure that has an unknown
        relationship with competitive performance.

        Let's be clear about the smallest effect for individual competitive athletic
        performance: it's 0.3 of the variability top athletes show from competition
        to competition, provided the athletes compete independently. I am
        responsible for stating that it is 0.5 rather than 0.3 in many previous
        papers, but in the first paper of all (Hopkins, Hawley and Burke, MSSE,
        1999) I did state 0.3, and I stupidly fudged it after that so that sample
        sizes were not impractically large. Recently I have worked out the
        thresholds for moderate, large, very large and extremely large effects, so I
        had to go back to 0.3 for the threshold for small(est). The full set of
        thresholds is 0.3, 0.9, 1.6, 2.5 and 4.0. These thresholds correspond to a
        top athlete winning an extra medal in 1, 3, 5, 7 and 9 competitions in every
        10 competitions. These thresholds, and the thresholds for standardized
        differences or changes in a mean (0.2, 0.6, 1.2, 2.0, 4.0) are stated in the
        2009 MSSE paper on progressive stats.
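
        As a toy illustration of how those two threshold sets get applied (a
        sketch only; the function and magnitude labels are my own
        arrangement of the thresholds above):

          def magnitude(effect, thresholds=(0.2, 0.6, 1.2, 2.0, 4.0)):
              # Classify the absolute value of a standardized effect.
              # Defaults are the thresholds for differences or changes in a
              # mean; pass (0.3, 0.9, 1.6, 2.5, 4.0) for individual
              # competitive athletic performance.
              labels = ["trivial", "small", "moderate", "large", "very large"]
              x = abs(effect)
              for cutoff, label in zip(thresholds, labels):
                  if x < cutoff:
                      return label
              return "extremely large"

          print(magnitude(0.45))                            # small
          print(magnitude(1.0, (0.3, 0.9, 1.6, 2.5, 4.0)))  # moderate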

        The challenge for sport scientists who want to assess their athletes with
        performance tests is to convert the smallest effect in competitive
        performance into a smallest effect in test performance. I submitted a
        proposal for a tutorial lecture on this and related topics for the 2008 ACSM
        meeting. The proposal was rejected, and it was rejected when I submitted it
        again for this year's meeting last May. Hence (in part) my decision not to
        attend this year's meeting! I have submitted the same proposal again this
        year. Third time lucky? Let's see.

        Will
        Will G Hopkins, PhD FACSM
        Contact info: http://sportsci.org/will
        Sportscience: http://sportsci.org
        Statistics: http://newstats.org
        Be creative: break rules.