[snip]

Hi Chaosheng:

A few points about log transforms. See below.

> The reason why I care about this issue is that there are at least two

> problems related to data transformation (in order to follow the normal

> distribution):

>

> (1) The measurement scale is reduced. The original ratio/interval scale may

> be reduced to the lower level of ordinal, even close to nominal, which

> results in loss of raw information.

There shouldn't be any loss of information since the log transformation

is a one-to-one mapping. The sample variance is much smaller in the log

scale, but the log transform is often used precisely for that purpose.
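
The one-to-one argument can be checked with a minimal Python sketch (standard library only). The values below are made up for illustration and are not from any data set discussed in this thread.

```python
import math
import statistics

# Hypothetical positive measurements (illustrative values only).
values = [0.8, 1.5, 3.2, 7.9, 20.0, 55.0, 140.0]

logs = [math.log(v) for v in values]
recovered = [math.exp(u) for u in logs]

# The log is strictly increasing, so the ordering of the data is unchanged
# and distinct values stay distinct: the scale remains ratio/interval.
order_preserved = all(a < b for a, b in zip(logs, logs[1:]))

# Back-transforming recovers the original values up to floating-point rounding.
round_trip_ok = all(
    math.isclose(v, r, rel_tol=1e-12) for v, r in zip(values, recovered)
)

# The sample variance shrinks on the log scale, but nothing is lost.
var_raw = statistics.variance(values)
var_log = statistics.variance(logs)

print(order_preserved, round_trip_ok, var_raw, var_log)
```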

> (2) Artificial relationship is introduced. We know that the lognormal

> distribution is widely accepted. In correlation analysis, if the

> log-transformed data are used, the correlation becomes the "log-log"

> relationship, not the original linear relationship. In bivariate regression

> analysis, the original function is:

> y = a x + b

> However, for the log-transformed data, the function becomes:

> log(y) = a log(x) + b

> or y = exp (a log(x) + b)

> In many cases, it is not clear if the relationship should be linear or

> "log-linear". However, the artificially introduced "log-linear" relationship

> needs to be proved.

Rather, when you apply the log transform you assume a priori the

existence of what you call the 'artificial relation', and this

assumption refers to the algebraic form of the error term rather than to

the relation between y and x. Say E(y)=f(x) is the model for y versus x,

where E is the expectation operator. If you assume an additive error

structure y_i=f(x_i)+e_i, where i indexes observations, and you consider

the e_i's as iid normal random variates, then there is no reason to

apply the log transform. On the other hand, if you assume a multiplicative

error structure such as y_i=f(x_i)*e_i, and you assume that the e_i are

iid lognormal random variates, then the log transform yields

ln(y_i)=ln(f(x_i))+ln(e_i) and now the ln(e_i) are iid normal random

variates (often with a much smaller variance than the e_i). The theory of the

lognormal is well developed so that there isn't actually much need to

transform the data to make it normal (e.g. Crow and Shimizu, 1988,

Lognormal distributions, Dekker Inc, NY).
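
To make the two error structures concrete, here is a small simulation sketch in Python (standard library only). The mean function f, the value of sigma and the sample size are assumptions chosen for illustration. Under the multiplicative structure, the spread of y_i - f(x_i) grows with f(x_i), while the spread of ln(y_i) - ln(f(x_i)) stays constant.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def f(x):
    # Hypothetical mean function, chosen only for illustration.
    return 2.0 * x ** 1.5

SIGMA = 0.3            # assumed standard deviation of ln(e_i)
MU = -SIGMA ** 2 / 2   # makes E(e_i) = 1, so that E(y_i) = f(x_i)

def simulate(x, n=2000):
    # Multiplicative error structure: y_i = f(x_i) * e_i, e_i iid lognormal.
    return [f(x) * rng.lognormvariate(MU, SIGMA) for _ in range(n)]

results = {}
for x in (1.0, 100.0):
    ys = simulate(x)
    raw_resid = [y - f(x) for y in ys]                      # additive view
    log_resid = [math.log(y) - math.log(f(x)) for y in ys]  # equals ln(e_i)
    results[x] = (statistics.stdev(raw_resid), statistics.stdev(log_resid))

for x, (sd_raw, sd_log) in results.items():
    print(f"x={x:6.1f}  sd(y - f(x)) = {sd_raw:10.3f}  "
          f"sd(ln y - ln f(x)) = {sd_log:.3f}")
```

On the original scale the residual spread depends on x; after the log transform it is close to SIGMA at both values of x, which is the constant-variance normal error that least squares assumes.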

> The most difficult situation is that if scientifically the relationship

> between x and y is linear, should the data transformation still be carried

> out (just to satisfy the statistical requirement)?

When the relation between x and y is linear such as in E(y)=a*x+b, the

errors are assumed to be additive (people usually do not believe that

y_i=(a*x_i+b)*e_i but rather y_i=(a*x_i+b)+e_i), so applying a log

transform to such a case does not satisfy statistical requirements. On the

contrary, it goes against statistical advice.
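
A rough illustration of this point, as a Python sketch under assumed values (a = 2, b = 50, normal errors with standard deviation 5, none of which come from real data): when the process is linear with additive errors, a fit on the original scale recovers the parameters, whereas a log-log fit imposes a power-law shape that the data do not have.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

def ols(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical linear process with additive normal errors: y = 2 x + 50 + e.
A_TRUE, B_TRUE, NOISE_SD = 2.0, 50.0, 5.0
xs = [float(x) for x in range(1, 101)]
ys = [A_TRUE * x + B_TRUE + rng.gauss(0.0, NOISE_SD) for x in xs]

# Fit on the original scale, matching the assumed additive structure.
a_hat, b_hat = ols(xs, ys)
sse_linear = sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys))

# Fit after logging both variables, then back-transform the predictions.
k_hat, log_c_hat = ols([math.log(x) for x in xs], [math.log(y) for y in ys])
sse_loglog = sum(
    (y - math.exp(log_c_hat) * x ** k_hat) ** 2 for x, y in zip(xs, ys)
)

print(f"linear fit:  a = {a_hat:.3f}, b = {b_hat:.3f}, SSE = {sse_linear:.1f}")
print(f"log-log fit: SSE on the original scale = {sse_loglog:.1f}")
```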

The multiplicative error structure arises in models of the form

E(y)=a*x^b or E(y)=a*exp(b*x), or in general, in all multiplicative

processes.
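
For the power-law case, the log transform turns y_i = a*x_i^b*e_i into ln(y_i) = ln(a) + b*ln(x_i) + ln(e_i), which is a straight line with additive normal errors. The Python sketch below simulates such data and recovers a and b by least squares on the log scale. The parameter values are assumptions for illustration only.

```python
import math
import random

rng = random.Random(2)  # fixed seed so the sketch is reproducible

def ols(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical power-law model with multiplicative lognormal errors.
# With ln(e_i) ~ N(0, SIGMA^2) the errors have median 1, so a * x^b is the
# median of y; the mean is larger by the factor exp(SIGMA^2 / 2).
A_TRUE, B_TRUE, SIGMA = 3.0, 1.7, 0.2
xs = [rng.uniform(1.0, 100.0) for _ in range(500)]
ys = [A_TRUE * x ** B_TRUE * rng.lognormvariate(0.0, SIGMA) for x in xs]

# ln(y) = ln(a) + b * ln(x) + ln(e): linear in ln(x), additive normal error.
b_hat, log_a_hat = ols([math.log(x) for x in xs], [math.log(y) for y in ys])
a_hat = math.exp(log_a_hat)

print(f"a_hat = {a_hat:.3f} (assumed {A_TRUE}), "
      f"b_hat = {b_hat:.3f} (assumed {B_TRUE})")
```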

Just as there is a central limit theorem for additive processes leading to

the normal, there is a central limit theorem for multiplicative

processes leading to the lognormal.
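
The multiplicative version follows from the additive one because the log of a product is a sum of logs. A quick simulation sketch in Python: the factor distribution (uniform on [0.5, 1.5]) and the number of factors are arbitrary assumptions. The products come out strongly right-skewed, while their logs are close to symmetric, as expected for an approximately normal variable.

```python
import math
import random

rng = random.Random(3)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Moment-based sample skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def product_of_factors(n_factors=50):
    # Hypothetical multiplicative process: product of iid positive factors.
    p = 1.0
    for _ in range(n_factors):
        p *= rng.uniform(0.5, 1.5)
    return p

products = [product_of_factors() for _ in range(5000)]
log_products = [math.log(p) for p in products]  # sums of iid log-factors

skew_raw = skewness(products)
skew_log = skewness(log_products)

print(f"skewness of products:      {skew_raw:.2f}")
print(f"skewness of log(products): {skew_log:.2f}")
```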

In the case of geostatistics, as pointed out by other people, the

kriging equations do not require distributional assumptions (though the

fitting of the model variogram to the moment-based Matheron variogram

does). If the frequency distribution of the regionalised variable looks

lognormal, it suggests that there is an underlying mechanism which is

multiplicative, but still I don't see why the variable should be

transformed for geostatistical analysis, except perhaps for fitting the

variogram.
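
For reference, the moment-based (Matheron) estimator is half the average squared difference between pairs of values separated by a given lag. Below is a minimal Python sketch for a regularly spaced 1-D transect. The function name and the sample values are hypothetical, and a real analysis would use 2-D or 3-D coordinates with lag tolerances.

```python
import math

def matheron_semivariogram(z, max_lag):
    """Matheron estimator on a regular 1-D grid.

    gamma(h) = sum_i (z[i + h] - z[i])**2 / (2 * N(h)), h = 1..max_lag,
    where N(h) is the number of pairs separated by h grid steps.
    """
    gammas = {}
    for h in range(1, max_lag + 1):
        sq_diffs = [(z[i + h] - z[i]) ** 2 for i in range(len(z) - h)]
        gammas[h] = sum(sq_diffs) / (2 * len(sq_diffs))
    return gammas

# Hypothetical positive values along a transect (illustrative only).
z = [1.0, 2.0, 4.0, 8.0, 16.0]

gamma_raw = matheron_semivariogram(z, max_lag=2)
gamma_log = matheron_semivariogram([math.log(v) for v in z], max_lag=2)

print("raw:", gamma_raw)
print("log:", gamma_log)
```

The estimator itself uses only squared differences, that is, second moments. Distributional assumptions enter only when a model variogram is fitted to these experimental values.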

Cheers

Ruben

--

* Support to the list is provided at http://www.ai-geostats.org