Re: Interesting idea - Musical Tuning and Human Biology - today's NewScientist -
I exposed the flaw in the method used three times, with different
wording and/or more detail, asking you to pinpoint exactly where you
either disagree or do not understand (because seemingly I am not that
good at being clear).
Seemingly, you are unwilling to enter a fair discussion, and just
repeatedly state that I misinform the group (how?), that I state false
facts (which ones, please?), and that my writing is unclear (in spite of
the fact that some other group members like Paul tip me that I am not
After that carload of amiabilities, you state that "I" should apologize!!!!
Ok, I take it as humour...
> My question was, did they put it into the
> paper by choosing a special trick to get it in.
> Fig 3A is an empirical one.
> It does reflect the nature of human speech,
> not a methods decision by the authors!!!
In this figure 3A, there is indeed a method decision, even though
probably not a malicious one. It is "currently admitted" (sorry, no
bibliography) that there is no significant physical coupling between
the vocal fold and the vocal tract, so there is no significant
correlation between f0 (the pitch) and F1, F2 ... (formant frequencies).
So f0 and F1 can be treated as continuous independent random
variables (or slightly correlated ones; that is not
important for the following).
The value of Fm does not correspond to a physical property of the vocal
tract, unlike F1 (effective vocal tract). F1 can be computed from the
signal by various interpolation methods. F1 is continuously
distributed and is in no way forced to be an integer multiple of f0.
On the other hand, Fm is, by definition, a multiple of f0; thus
it is a round-off of the real physical value (F1). Fm has no physical
meaning (numerology excluded). By this simple process of rounding
Fm off to an integer multiple of f0, a continuous spectrum is artificially
transformed into a discrete spectrum with a very limited number of
The harmonic number of Fm is not a "natural" measure; it is the result of
a computation that creates an artificial set of discrete random
variables that have, conveniently, a very limited range, and that are
afterward "shaken and baked" to produce a very poor continuous
"normalized" spectrum that is just a discrete spectrum in disguise.
So no, figure 3A reflects nothing of the nature of human speech. Taking
round (F1 / speaker height in centimeters)
would have given very similar results.
It's another way to explain the error in the method.
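For anybody who wants to check the rounding argument numerically, here is a
minimal sketch in Python. It is not the procedure of the paper: it simply
treats Fm as the harmonic of f0 closest to F1 (my description above), and the
f0 and F1 ranges are assumed round numbers chosen for illustration only.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable


def harmonic_number(f0, f1):
    """Number of the harmonic of f0 closest to F1 (at least 1)."""
    return max(1, round(f1 / f0))


counts = Counter()
for _ in range(10_000):
    f0 = random.uniform(80.0, 250.0)   # assumed pitch range in Hz
    f1 = random.uniform(300.0, 800.0)  # assumed first-formant range in Hz
    counts[harmonic_number(f0, f1)] += 1

# Two continuous, independent variables collapse onto a handful of integers.
for n in sorted(counts):
    print(n, counts[n])
```

Whatever the exact ranges, the continuous pair (f0, F1) ends up on a small set
of integers, which is the point made above.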
Again, as soon as you pinpoint where I am wrong/unclear, I am
absolutely willing to admit my error (it would not be the first time I
make a fool of myself on this list) or explain until crystal-like
clarity is reached (I am not that good at pedagogy, but I may try
again). Be my guest, Martin.
> Martin
> What then was this error? I can see no error in the FFT analysis.
No error in the FFT presented; as I explained, things go bad just
With even more good will, I recap below in three sentences, as you asked
(even though it is just a summary, not a clarification):
1- The theoretical "normalized" spectrum is a spectrum of discrete
values at some integer ratios. No surprise that the actual normalized
spectrum has peaks.
2- The physical relationship between the f0 range and the F1 range
limits the number of significant peaks in the interval [1,2].
3- This leads to the trivial result (peaks that are physically unlikely
do not appear; those that are more likely, given the distribution of the
harmonic number of Fm, such as N/2, N/3, N/4, N/5, are prominent).
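The three points above can be sketched by simple enumeration. The code below
lists, for a harmonic number n of Fm, the ratios k/n of the harmonics falling
in the interval [1,2]. The range of n (2 to 5) is an assumption taken from the
N/2 ... N/5 families mentioned in point 3, not a value measured on data.

```python
from collections import Counter
from fractions import Fraction


def ratios_in_octave(n):
    """Ratios k/n of harmonic k to harmonic n (playing the role of Fm), within [1, 2]."""
    return [Fraction(k, n) for k in range(n, 2 * n + 1)]


seen = Counter()
for n in range(2, 6):  # assumed plausible harmonic numbers of Fm
    for ratio in ratios_in_octave(n):
        seen[ratio] += 1

# Only a few small-integer ratios can ever show up between 1 and 2.
for ratio in sorted(seen):
    print(f"{ratio}: reached from {seen[ratio]} value(s) of n")
```

Weighting each n by how often it actually occurs would then give the rough
relative heights of the peaks.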
All this is just due to acoustics, elementary algebra and fairly
simple stats concepts; it is totally neuroscience-free, otherwise I
would not have permitted myself to challenge you, Martin :).
All the detail is in my previous posts
- Hello Martin,
A final post (as far as I am concerned) on this discussion, which is
becoming very technical, more and more distant from the list's concerns and
> So, on the window average, f5/f4 seems to be inharmonic
> (1.18 instead of 1.25).
> To get rid of this annoying effect, you have to use shorter window,
> but for short windows, the uncertainty principle kicks in. So
> compromise must be made on window size to get best spectral estimate.
> This computational inharmonicity do exist (and in fact occur all the
> time for .1 sec window)
> I am glad that we agree on the description of the phenomenon of
> disharmonization in pitch shifts. It's also fine that you saw in
> data what I expected by only thinking through the physics.
> But again:
> - This inharmonicity is not physical
> Here we still disagree. You are right that the ratio 5/4 (of your example)
> does not disappear. But what happens, if your time windows are so short that
> the "errors" through averaging disappear? You'll see this: in some windows
> there is no power at the 4th partial and in other windows there is no power
> at the 5th partials.
Not at all, what appears is:
- as window size gets smaller, harmonic peaks get broader, until they
(and no peak extraction algorithm is of any help).
- As they get broader, the assessment of harmonicity is less and less
> So you have a 5/4 ratio all the time, but one that is
> not real. What you have in reality is a ratio between real peaks (those that
> have power) which deviates from 5/4.
Even if that occurs (a harmonic is so small that a given peak vanishes
in noise), that does not mean that it ceases to exist, any more than
vanishing stars cease to exist in daylight. In other words, that some
peaks cease to be measurable is more a signal-to-noise problem than
anything related to a modification of the underlying physical process.
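The window-size compromise discussed above can be put in rough numbers. The
sketch below uses the usual rule of thumb that a window of T seconds gives a
frequency resolution of about 1/T Hz (the exact width depends on the window
shape), and an assumed f0 of 120 Hz. Both are illustrative assumptions, not
values from the paper.

```python
def resolution_hz(window_s):
    """Rule-of-thumb frequency resolution of a window of window_s seconds."""
    return 1.0 / window_s


def harmonics_resolved(f0, window_s):
    """Adjacent harmonics are f0 apart; they blur together once 1/T exceeds f0."""
    return f0 > resolution_hz(window_s)


f0 = 120.0  # assumed pitch in Hz
for window_s in (0.1, 0.02, 0.005):
    print(window_s, resolution_hz(window_s), harmonics_resolved(f0, window_s))
```

With the 0.1 s window the harmonics are well separated; with a 5 ms window the
4th and 5th partials can no longer be told apart, whatever the peak picker.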
> François
> - It shall spread the peaks but not shift them on the average as they
> are as likely to contribute at right or at left of each peak.
> again we seem to agree
We never agreed on so much before, great!
> The peaks are not shifted, of course, but the majority of ratio
> is FLAT, that is BETWEEN the low-order-ratio peaks. This is what the
> figures of the study show.
> The paper rightfully focus on the peaks, not on the background.
> The interpretation of the authors focuses on the peaks. But they
> the complete spectra for anybody to see.
> Secondly, I see no flat floor but gentle slope on each side of each
> peak that eventually merge with the neighbouring peaks.
> François, below the slopes - in fact: below the dips - there is a HIGH
> plateau of noise !!!
> For example in Fig.2C the noise floor is 20 times (!) as high as the
> difference between the peak at 5/4 and the valley between 5/4 and 6/5.
20 times? What do you mean? 13 dB? Where do you see a "floor" (at which
> François:
> I do not understand why you focus on what occur BETWEEN the peaks;
> this is very secondary.
There is no floor, but a steep slope of around -20 dB per octave
between 1 and 2.
To get a clearer picture, we should remove or "normalise" this slope.
But to do so, we ought to know where it comes from.
Where does it come from? In normal speech, there is around -6 dB from
the glottal source (may vary much) and -6 dB from the lip radiation
characteristics, so roughly -12 dB per octave. The missing -8 dB comes from
the fact that most data gathered for ratios between 1 and 2 are taken on
the right-hand side of a formant. But it is certainly not a straight,
simple -8 dB per octave, because near 1, the more complex (N+1)/N ratios
such as 7:6 (compared to the less complex 3:2) contribute more due to
their average proximity to the formant top. I see no way to "predict" this
value of -20 dB because it mixes up contributions of formant bandwidth,
made even more complex by the normalisation process, and of glottis and
lip radiation. Instead of isolating the variables contributing to the
background, the normalisation makes them impossible to disentangle.
Not being able to model this high-frequency decay properly hampers
any attempt to interpret the background. But that is not the topic of
the paper anyway.
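For the record, the dB arithmetic used above can be checked in a few lines.
The sketch only reuses the figures quoted in this post (a factor of 20, -6 dB
and -6 dB per octave, -20 dB per octave read off the figure); whether "20
times" is meant as a power or an amplitude ratio is not stated, so both
conversions are shown.

```python
import math


def power_ratio_to_db(ratio):
    """Decibels for a ratio of powers."""
    return 10.0 * math.log10(ratio)


def amplitude_ratio_to_db(ratio):
    """Decibels for a ratio of amplitudes."""
    return 20.0 * math.log10(ratio)


print(round(power_ratio_to_db(20), 1))      # "20 times" read as a power ratio
print(round(amplitude_ratio_to_db(20), 1))  # "20 times" read as an amplitude ratio

# Slope budget in dB per octave, using the figures quoted in the post.
glottal_source = -6.0
lip_radiation = -6.0
observed_slope = -20.0
unexplained = observed_slope - (glottal_source + lip_radiation)
print(unexplained)  # the part attributed above to sitting on the formant skirt
```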
> Martin:
> Well, in the example above there is much more between the peaks than at the
> peaks. This is important to note, because it shows the big difference
> between "clean" theoretically derived textbook spectra and real speech
> spectra. The value of the study is not to have replicated simple general
> wisdom on the general harmonicity of speech. It's value is to have shown
> what the harmonicity looks like in REAL data.
And this has nothing to do with harmonicity/inharmonicity anyway, and
inharmonicity has nothing to do with the topic of the paper.
> So Martin, when you write:
>> François, none of your suggestions would be of help in finding
>> the questions of the research project. It seems to me you misunderstood
>> what the authors tried to investigate.
> you are absolutely correct all the way:
> - none of my suggestion would produce valuable results
> - I do not understand what they try to investigate
> - I see, but hardly, what are the questions of the research project
> You could have read that what you did not understand or see in the first
> section of the paper, which is called "introduction".
I was only half kidding. The paper goes from a rather broad hypothesis
to a no less broad conclusion through the very small bottleneck of a
debatable (to say the least) process.
> If the idea is worth discussing, and you even like it, then the same
> also apply to the details of the results. That is, which ratios stick out of
> the noise and which don't.
I explained clearly and quantitatively enough which ratios stick out,
and maintain what I said:
> I just say that their protocol is wrong from the beginning, and
> produces trivial, predictable and otherwise uninteresting results,
> that's all.
> Had the authors asked you, before starting this study, you had grossly
> mispredicted the amount of noise between the peaks. And you also would not
> have been able to predict the limit beyond which simple ratios disappear in
> the noise.
Well, in experimental science, who cares about "predicting" the noise?
Understanding the source of the noise in order to reduce it in the
data gathering process is the only useful issue.
> Martin:
> You might have predicted sex differences, but not their exact
I have been able to "predict" :-) peak locations and rough relative
amplitudes from the harmonic number distribution alone, explain the sex
difference, and describe roughly the different contributions to background
noise and spectral decay.
I think that it is not bad for a dilettante.
I must go back to the work I am paid for