Re: Bad reviewing of statistics in one of our journals

  • Rowlands, David
    Message 1 of 5, Sep 1, 2009
      First up, on the review, while we can't say from the information provided whether the science was not up to scratch, the reviewer's comments are totally out of kilter.

      Second, I hope to help here on the issue many scientists strike when trying to adapt magnitude-based inference to physiological data. After a few years of getting my head around it, I endorse Will's comment that standardisation (via effect size) of physiological, psychological, or other mechanistic measures provides a statistically valid effect threshold. As with the estimates of the smallest effects on performance, the effect size encompasses the variability of the measure and the magnitude of the outcome, which together determine the relevance and utility of the measure in the real world.
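
      For anyone who wants to see the standardisation concretely, here is a minimal sketch in Python (the function, data, and numbers are illustrative assumptions of mine, not taken from any paper discussed here): the difference in means is divided by the between-subject SD.

          import statistics

          def standardized_effect(control, treatment):
              """Difference in means divided by the between-subject SD
              (here the control-group SD; pooling the groups' SDs is
              another common option)."""
              mean_diff = statistics.mean(treatment) - statistics.mean(control)
              sd_between = statistics.stdev(control)
              return mean_diff / sd_between

          # Hypothetical data for some physiological measure.
          control = [48.2, 51.0, 49.5, 52.3, 50.1, 47.8]
          treatment = [50.0, 52.5, 51.2, 53.8, 51.9, 49.6]
          print(standardized_effect(control, treatment))  # ~0.99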

      I look to clinical medicine for comparison, where you read about smallest clinical effects, which appear to come from observations in practice (= sampling from the population). From my recent but limited research experience in clinical exercise science and sports med, the smallest effect size comes out at about a similar magnitude to the subjectively estimated smallest clinical effect. This is probably not surprising, because for any effect to matter, it will likely have to score consistently above the within-subject CV for the measure, which makes up a sizeable component of the sample variance along with individual variability. Similar patterns will likely emerge for physiological outcomes once researchers put their minds to it, drop significance testing, and adopt magnitude-based evaluation approaches that provide inference closer to the biological response (= more likely to get the science right). Consideration of statistical power and assessment via probabilities also need to be placed near the top of the priority list.

      I think we have to trust the lifetime work by people like Jacob Cohen on this, whose views were developed through rigorous investigation, scenario modeling, and peer review, and should therefore be respected and seriously considered as best practice in modern scientific analysis and inference.

      Three of the greatest constraints to progress are: 1) statistical analysis skills (= stats teaching) and user-friendly tools (= market demand); 2) the inattention (and ignorance?) and lack of discipline (not insisting their own journal guidelines are met) on the matter by journal editors; 3) magnitude-based probabilistic inference requires a greater investment in time than hypothesis testing, and in this era of governance by accountancy the short-term solution rules.

      David
    • Ian Shrier
      Message 2 of 5, Sep 1, 2009
        My impression of the medical sciences is that effect sizes are a poor alternative to using minimal important differences. Effect sizes were used much more often in the past, and now people try to think about the MID. But it is difficult, so many people continue to use effect sizes...


        Ian Shrier MD, PhD, Dip Sport Med, FACSM
        Associate Professor, Dep't of Fam Med, McGill University
        Past-President, Canadian Academy of Sport Medicine
        Check out: www.casm-acms.org
        SKYPE name: ian.shrier

        Centre for Clinical Epidemiology and Community Studies
        SMBD-Jewish General Hospital
        3755 Cote Ste-Catherine Rd
        Montreal, Qc H3T 1E2
        Tel: 514-340-7563
        Fax: 514-340-7564
      • Will Hopkins
        Message 3 of 5, Sep 1, 2009
          I've already had one message from a colleague who has got the wrong idea as
          a result of the message Dave Rowlands just posted, so I have to clarify it.

          Dave wrote: "...From my recent but limited research experience in clinical
          exercise science and sports med, the smallest effect size comes out at about
          a similar magnitude to the subjectively estimated smallest clinical effect.
          This is probably not surprising because for any effect to matter, it will
          likely have to score consistently above the within-subject CV for the
          measure which makes up a sizeable component of the sample variance along
          with individual variability..."

          Dave's first sentence is simply a statement that 0.20 of the
          between-subject standard deviation (which is the default smallest
          important difference or change in a mean, and the standardization
          approach I was talking about) is similar to what people would regard
          as a reasonable smallest important difference or change. Fine.

          Dave's next sentence brings in error of measurement, which is the
          within-subject variability or variability a subject shows from test to test,
          and I think he's got it wrong. Here's my spin on it. If that variability
          represents real variability in the subject from test to test, such as you
          might get in a fitness test or a measurement of a concentration of something
          in the blood, then you don't have to worry about it. Basically, the SD you
          get when you measure subjects is the SD to use for standardization. But if
          part of that within-subject variability from measurement to measurement
          arises from random error injected by the measuring instrument ("technical
          error" is the term some of us use for such error), then it should not be
          considered as part of the between-subject SD for purposes of estimating 0.20
          of the SD to get the smallest effect. In principle you should subtract
          off the extra noise from the between-subject SD you get with the instrument
          before you take 0.20 of it. You subtract off the noise by subtracting the
          square of it from the square of the observed between-subject SD, then taking
          the square root.
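
          As a sketch of that arithmetic in Python (the function name and the
          numbers are illustrative assumptions of mine):

              import math

              def smallest_effect(sd_observed, sd_technical, factor=0.20):
                  """Smallest important effect as a fraction of the 'true'
                  between-subject SD after removing technical (instrument)
                  noise: sd_true = sqrt(sd_observed**2 - sd_technical**2)."""
                  sd_true = math.sqrt(sd_observed**2 - sd_technical**2)
                  return factor * sd_true

              # Hypothetical example: an observed SD of 5.0 units with a
              # technical error of 1.5 units gives sd_true of about 4.77,
              # so the smallest effect is about 0.95 units rather than 1.0.
              print(smallest_effect(5.0, 1.5))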

          The noise injected by an instrument is often negligible, in my experience:
          the variability a subject shows from measurement to measurement arises from
          variation in the subject, not random noise in the instrument. But even if
          the noise is comparable to the within-subject variability, "negligible"
          needs to be assessed not in relation to the within-subject variability but
          in relation to the smallest important effect, which is 0.20 of the
          between-subject SD, if it's a mechanism or physiological variable or a test
          measure for a team-sport athlete or a test measure that has an unknown
          relationship with competitive performance.

          Let's be clear about the smallest effect for individual competitive athletic
          performance: it's 0.3 of the variability top athletes show from competition
          to competition, provided the athletes compete independently. I am
          responsible for stating that it is 0.5 rather than 0.3 in many previous
          papers, but in the first paper of all (Hopkins, Hawley and Burke, MSSE,
          1999) I did state 0.3, and I stupidly fudged it after that so that sample
          sizes were not impractically large. Recently I have worked out the
          thresholds for moderate, large, very large and extremely large effects, so I
          had to go back to 0.3 for the threshold for small(est). The full set of
          thresholds is 0.3, 0.9, 1.6, 2.5 and 4.0. These thresholds correspond to a
          top athlete winning an extra medal in 1, 3, 5, 7 and 9 competitions in every
          10 competitions. These thresholds, and the thresholds for standardized
          differences or changes in a mean (0.2, 0.6, 1.2, 2.0, 4.0) are stated in the
          2009 MSSE paper on progressive stats.
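
          To see how those two sets of thresholds could be applied, here is a
          small Python sketch (the classification function is my own
          illustration; only the thresholds and magnitude labels come from the
          thread):

              import bisect

              # Thresholds in units of competition-to-competition variability
              # (performance) and of the between-subject SD (standardized).
              PERFORMANCE = [0.3, 0.9, 1.6, 2.5, 4.0]
              STANDARDIZED = [0.2, 0.6, 1.2, 2.0, 4.0]
              LABELS = ["trivial", "small", "moderate", "large",
                        "very large", "extremely large"]

              def magnitude(effect, thresholds):
                  """Classify the absolute value of an effect against the
                  ordered thresholds."""
                  return LABELS[bisect.bisect_right(thresholds, abs(effect))]

              print(magnitude(0.25, STANDARDIZED))  # small
              print(magnitude(1.0, PERFORMANCE))    # moderate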

          The challenge for sport scientists who want to assess their athletes with
          performance tests is to convert the smallest effect in competitive
          performance into a smallest effect in test performance. I submitted a
          proposal for a tutorial lecture on this and related topics for the 2008 ACSM
          meeting. The proposal was rejected, and it was rejected when I submitted it
          again for this year's meeting last May. Hence (in part) my decision not to
          attend this year's meeting! I have submitted the same proposal again this
          year. Third time lucky? Let's see.

          Will
          Will G Hopkins, PhD FACSM
          Contact info: http://sportsci.org/will
          Sportscience: http://sportsci.org
          Statistics: http://newstats.org
          Be creative: break rules.