## RE: [ai-geostats] Median computation in real time...

Message 1 of 2, Jun 16, 2005
On 16-Jun-05 Simone Sammartino wrote:
> Dear list
> a strange question
> I've to estimate median value of a certain vector Z. Such
> vector increases with time; it's like a box that is filled
> any moment.
> I can compute basic statistics (mean, median, min, max,
> #value, std dev) but I can't preserve them. I mean that in
> any moment I can use data to compute statistics but immediately
> after I've to delete them....now, my question is...how to
> update basic statistics in real time?...only knowing the basic
> statistics of the previous step?...
> For example:
> minimum value = min(actual_step) if
> min(actual_step) < min(prev_step)
> maximum value = max(actual_step) if
> max(actual_step) > max(prev_step)
> mean value = {[mean(prev_step)*#value(prev_step)] +
> sum(actual_step)}/[#value(prev_step)+#value(actual_step)]
> but I can't imagine a similar algorithm for median and
> standard deviation computation...
> Any idea?
> Thank you
> Simone

The standard deviation is easy: you need to construct the
sum of the data and the sum of squares of the data at the
previous step and at the current step.

Therefore

Previous step:

sum(data) = mean(data) * number(data)

sum(data^2) = (number(data)-1) * SD(data)^2
+ number(data) * mean(data)^2

Current step:

Add new X to sum(data), (new X)^2 to sum(data^2),
1 to number(data), and re-compute the SD from

SD(data)^2 = [sum(data^2) - sum(data)^2 / number(data)]
/ (number(data)-1)
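
The update described above can be sketched in Python. This is only an illustration under stated assumptions: the class name `RunningStats` and its method names are made up for this example, the SD is taken to be the sample SD (divisor number(data)-1), and the minimum and maximum are carried along in the obvious way.

```python
import math


class RunningStats:
    """Hypothetical helper: keeps only a few totals between steps,
    never the data values themselves."""

    def __init__(self):
        self.n = 0                  # number(data)
        self.total = 0.0            # sum(data)
        self.total_sq = 0.0         # sum(data^2)
        self.minimum = math.inf
        self.maximum = -math.inf

    @classmethod
    def from_summary(cls, n, mean, sd, minimum, maximum):
        """Rebuild the two sums from the previous step's summary statistics."""
        stats = cls()
        stats.n = n
        stats.total = mean * n
        stats.total_sq = (n - 1) * sd ** 2 + n * mean ** 2
        stats.minimum = minimum
        stats.maximum = maximum
        return stats

    def add(self, x):
        """Fold one new value into the totals; x can then be discarded."""
        self.n += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def mean(self):
        return self.total / self.n

    @property
    def sd(self):
        """Sample SD recomputed from the two sums (needs at least 2 values)."""
        if self.n < 2:
            raise ValueError("SD needs at least two values")
        variance = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        # Rounding can push a tiny variance slightly below zero.
        return math.sqrt(max(variance, 0.0))


# Previous step summarised the values 2, 4, 4, 4; the new values arrive later.
stats = RunningStats.from_summary(n=4, mean=3.5, sd=1.0, minimum=2, maximum=4)
for x in (5, 5, 7, 9):
    stats.add(x)
print(stats.n, stats.mean, stats.sd, stats.minimum, stats.maximum)
```

One caveat on this design: when the mean is large compared with the SD, subtracting two nearly equal sums loses precision in floating point, so in practice it may be worth storing the sums themselves between steps rather than rebuilding them from a rounded mean and SD.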

However, the median cannot be updated exactly from the information
you will have in hand. The reason is that, given the previous
median and the new X, you need to know how many of the previously
obtained data values lie between the previous median and the
new X (and the value of at least one of them). You do not have
this information.
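
A small Python illustration of this point, using made-up data: the two "boxes" below have identical number, mean, SD, minimum, maximum and median, yet after the same new X arrives their medians differ. The function name `summary` and the variable names are chosen only for this example.

```python
import statistics


def summary(data):
    """The summary statistics kept between steps: n, mean, SD, min, max, median."""
    return (
        len(data),
        statistics.mean(data),
        statistics.stdev(data),
        min(data),
        max(data),
        statistics.median(data),
    )


# Two different data sets with exactly the same summary statistics.
box_a = [5, 10, 15, 15, 15, 20, 25]
box_b = [5, 11, 12, 15, 18, 19, 25]
same_summary = summary(box_a) == summary(box_b)

# The same new value is added to both boxes.
new_x = 30
median_a = statistics.median(box_a + [new_x])
median_b = statistics.median(box_b + [new_x])

print(same_summary, median_a, median_b)
```

Here the updated median is 15.0 for the first box and 16.5 for the second, so no rule that sees only the previous summary statistics and the new X can return the right answer in both cases.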

As a more general comment, I am wondering why you are in this
situation. There is something apparently contradictory in your
description.

You say you can, at any time, obtain the summary statistics,
but you must then immediately delete them. But you are also
asking how you can update them from the summary statistics
of the previous step and the new data. But if you have deleted
them from the previous step, how can you use them later?

Furthermore, if you have the possibility at each step to perform
the kind of updating computation we are discussing, how is it
that you do not have the possibility simply to store the data,
or the sequence of summary statistics?

I am puzzled ... !

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding@...>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Jun-05 Time: 17:30:57
------------------------------ XFMail ------------------------------