To: Synoptic

In Response To: Bob Schacht

On: The Persimmon Paradox

From: Bruce

BOB: Bruce, you're missing the point about Bayesian statistics. The philosophical issue here is: Is "truth" absolute, or is it relative?

BRUCE: I don't know what "absolute truth" means; it sounds to me uncomfortably theological. Truth to me is absolute in the sense that it is not relative to my ideas of it: it is "out there" and doesn't depend on my opinion of it, or on my thinking about it at all. If I kick a stone that I didn't suspect was there, it will still hurt. At least so Samuel Johnson has suggested, and I am content to adopt his conclusion as convincing.

If I want to find out a particular truth (say, the vote in the 1946 Rumanian election, or the 1998 Guam election), I may have to put in some effort, and the election returns may have been selectively destroyed, or the election itself rigged, but these are merely complications. One particular outcome, or one of a definite set of outcomes, is what, to borrow a term from Ranke, really happened.

I suspect that even Bayesian procedures aim to come closer to "what is really so," and not merely to clarify, for internal purposes, "what I think is so." You can start out by thinking that all games of chance are even, but additional information (the insertion of additional information being the Bayesian trademark) will give you a better idea, not of the inside of your head, but of the odds of blackjack versus the odds of roulette. Certainly that was the view held by Mosteller in his conspicuously Bayesian analysis of the authorship of the Federalist Papers. He did not assume that the Federalist Papers were a figment of his imagination. He presented his Bayesian-derived conclusions as a better approximation to the facts of Federalist Papers authorship, at least in my edition of his book.

Either Buddha predeceased Mahavira, or the reverse. Not both, not neither. The answer may be indeterminable, but it is not indeterminate. At the other end of the telescope, one of them happened.

BOB: You take the side of absolute truth. This is all well and good, but every day people make truth judgments without having all the facts. You do, too. David's just-so story was meant to illustrate this.

BRUCE: Again, I am shy of the term "absolute truth." The view I hold is that the situation is what it is (or was), regardless of how much we may know about it, or care about it, or conceive of it. In the absence of time or means to determine what the situation is, we often have to make best judgments and proceed on that basis. Sometimes we succeed, sometimes we don't. People differ in their ability to act on imperfect knowledge. (For that matter, be it noted, they also differ in their ability to act on perfect knowledge.) The ones who are better at acting on imperfect knowledge tend to make better generals or (according to taste) text critics. That there exists an art of judging from imperfectly known facts raises, in principle, no questions about whether there *are* facts, and it involves no methodological issues about how to become better *acquainted* with the facts (the situation that is or was). I see no brief for Bayesian as opposed to Bernoullian methods (if that is the discussion we are having) in any of this.

BOB: The question that you have not come to grips with is this: How do you know when you have all the facts?

BRUCE: You never have all the facts; what you want is enough facts to base a reasonable conclusion on. How much is that? Statistically, at least in some situations, there are ways of determining that level; Neyman, among others, gave years of his life to clarifying that type of question. If you argue from silence about the absence of a particular artifact type in a particular period, and additional excavations continue to be made, and they turn up further examples of the known artifact types but no new types, then your conclusion is proportionately strengthened. More precisely, if you want to determine a proportion in a population, how large must your sample be? That is a fairly well researched and adequately answered question. If the population distribution is Poisson, for example, there is the Rule of Three, which holds that to determine with reasonable assurance (and the term "reasonable" is reasonably well defined in statistics) a proportion of 1/n in the population, you need to take a sample of 3n. Thus, if you want to establish a proportion of 1 in 1,000,000, you need to examine 3,000,000 cases. That is not exact (it is more precisely x times the y of z), but it is close enough for practical purposes, and it is also easy to memorize. And so on.
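The arithmetic behind the Rule of Three can be checked in a few lines. What follows is a minimal Python sketch, assuming independent cases, each showing the feature with probability p = 1/n, and taking "reasonable assurance" to mean 95% (the 95% level is an assumption here; the post does not name one). It computes the chance that a sample of 3n cases turns up at least one instance, and the smallest sample size that reaches 95%.

```python
import math


def detection_probability(p: float, sample_size: int) -> float:
    """Chance of seeing at least one case in `sample_size` independent draws,
    when each draw shows the feature with probability p."""
    return 1.0 - (1.0 - p) ** sample_size


def exact_sample_size(p: float, confidence: float = 0.95) -> int:
    """Smallest sample size m with 1 - (1 - p)**m >= confidence.

    Rearranged: m >= log(1 - confidence) / log(1 - p).
    log1p(-p) computes log(1 - p) accurately when p is tiny.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


# 95% is an assumed reading of "reasonable assurance".
p = 1 / 1_000_000
print(round(detection_probability(p, 3_000_000), 4))  # 0.9502
print(exact_sample_size(p))  # 2995731
```

Under these assumptions a sample of 3,000,000 gives roughly a 95% chance of catching at least one 1-in-1,000,000 case, and the exact threshold comes out slightly under 3n. The Poisson approximation gives the same picture, since 1 - e^(-3) is about 0.950.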

It is a piece of luck that, for samples of that size, Poisson closely approximates normal, so that the utility of the Rule of Three is wider than I may have implied in stating it. Engineers use a rough version of the Rule of Three all the time. Most of their bridges hold up pretty well.

BOB: That is why I've come to the conclusion that ALL of our conclusions must of necessity be viewed as tentative.

BRUCE: It is my impression that every responsible investigator, in physics as well as history, whether a Bayesian aficionado or not, reaches that same conclusion. The successful theory is the one which best accommodates the most data. Discovery of new data, or a better argument from the old data, will always displace the previous theory with a more adequate one. That is how science operates.

Or almost always (people's emotions, including the emotions of scientifically trained people, continually clog up this particular equation).

It seems to me that discussions like this one often blur two separate questions: (a) is there an empirical situation "out there," independent of my wishes or knowledge, and (b) if so, can I have adequate knowledge of that situation? For my money, the answer to (a) is Yes, and the answer to (b) is, It Depends. Depends on what? On what sort of situation it is, and how much knowledge is adequate for your purposes. The general who knows that the opposing force contains only 4,537 soldiers, whereas his contains 14,537, knows enough to make his next move. He doesn't need to know the names of the 4,537 soldiers.

Or, for that matter, of the 14,537, even though those names will in principle be available to him if needed. But his first thought on being offered that list, I imagine, would be to decline it, as hampering rather than assisting his thought processes. No?

E Bruce Brooks

Warring States Project

At 08:55 PM 4/12/2006, E Bruce Brooks wrote:
>To: Synoptic
>In Response To: Bob Schacht
>On: The Persimmon Paradox
>From: Bruce
>
>BOB: Bruce, you're missing the point about Bayesian statistics. The
>philosophical issue here is: Is "truth" absolute, or is it relative?
>
>BRUCE: I don't know what "absolute truth" means, it sounds to me
>uncomfortably theological. . . .

Well, you flat out declared, after quoting the problematic passage from David's essay,

>My response would be: No, it was wrong period.

[snip]

>BOB: You take the side of absolute truth. This is all well and good, but
>every day people make truth judgments without having all the facts. You do,
>too. David's just-so story was meant to illustrate this.
>
>BRUCE: Again, I am shy of the term "absolute truth." . . .

But you are not shy about responding "No, it was wrong period." You can only declare it wrong on the basis of information not previously known. At time A, it seemed right. But then at time B, you now declare it "wrong period." So now what if I discover another piece of evidence that shows you that it is probably right?

You modestly profess being shy about "absolute truth," but are not hesitant to proclaim it.

So it again seems to me that you still do not understand the Bayesian process.

Bob

To: Synoptic

In Response To: Bob Schacht

On: Truth

From: Bruce

BOB: But you are not shy about responding "No, it was wrong period." / You can only declare it wrong on the basis of information not previously known. At time A, it seemed right. But then at time B, you now declare it "wrong period." So now what if I discover another piece of evidence that shows you that it is probably right? / You modestly profess being shy about "absolute truth," but are not hesitant to proclaim it.

BRUCE: I am shy about words like "absolute" and "truth" because they are all too liable to be written in capital letters, and to get out of hand in an ordinary secular argument. What I feel less shy about, in the present example, is that the procedure offered in Dave's example is operationally wrong, and so is every other statistical procedure applied to equally inadequate data. I can find no merit in an answer being "right except not corresponding to the truth." It merely means that the arithmetic was done correctly. It doesn't mean that it was correct to do the arithmetic in the first place.

Questions of who said what to whom are rarely enlightening, but this one happens to be both recent and on record. In the interest of context, and to get a little away from Truth questions, let me recapitulate Dave's statement, and also my reply. Here goes:

DAVE (QUOTED BY ME): "On the other hand, a quite legitimate criticism of a Bayesian result is that important information, that is known, was not considered that would effect the outcome. For example, if we know it is daylight, and that snicky wugs only come out at night, then we have omitted important information that will change our answer. Our first answer was not wrong; it was correct for its information set. And, with our new insight into the nocturnal behavior of snicky wugs, we now have a new correct probability for our new information set."

ME (IN RESPONSE): My response would be: No, it was wrong period. I don't see this as exclusively an objection to Bayesian, it is a caution about statistics in general. In my view (not wholly unshared by elementary textbook writers), if relevant information is omitted, or if irrelevant information is put into the statistical grinder, or if we attempt to use the wrong tool to open the right pecan, nothing good will ensue. Let me illustrate this . . . [the Persimmon Paradox followed, and was promptly solved by Jeffery Hodges]

MY FURTHER COMMENT: I think it will be clear that my objection was (and I herewith confirm that it remains) to using statistical methods, Bayesian or other, when we don't have enough data, or the right data, to use them on. This is a methodological objection. It is also categorical, in that it applies to all use of statistics on insufficient data. Subject to counterexamples, I don't concede that any statistical process, beginning with insufficient data, can produce sufficient data, or can reliably reach the same solution as it would have reached had sufficient (or appropriate) data been available. If sufficient (or appropriate) data later become available, the thing to do, in my opinion, is not to throw them into the previous insufficient procedure, but to run a procedure on them de novo. Don't spill milk on spilt milk.

It is open to any Bayesian here present to give an example of how operating on an insufficient data set can produce a sufficient conclusion. This would be a counter to my Persimmon example (to which I proceeded in the abridged quote above), which tends to suggest that any operation on an insufficient data set is invalid, and that any answer it may reach is in principle perilous and in practice inactionable. My example is capable of demonstration, at any desired length.

Failing such a counterexample, I think my point stands. To me, it is useless to say "Well, my answer would have been right if there had been enough data to reach the right answer." The thing we need to know is when we *have* enough data to reach the right answer, and the thing we need to do, in case we do not have enough data, is not to calculate, but to refrain from calculating.

The infamous Literary Digest poll, which mispredicted the outcome of the 1936 US Presidential election by a landslide, was not "right on its premises"; that phrase is meaningless. So is the proposition that the people who ran that poll were nice people. It may well be true, but it's not relevant. The poll was wrong on its assumptions; it was faulty as experiment design, it was flawed from the outset, it was erroneous without extenuation. Its wrongness is frequently expounded in elementary textbooks. It may have "seemed right" to those who engineered it, but there is no content to that rightness. It was and remains a mistake, in saecula saeculorum.

Statistics textbooks frequently give practice problems of an unreal sort, whose effect is to accustom the statistics student to applying techniques to unreal situations. I think the effect is bad. In the problem sets following my lesson on the Poisson Distribution, I sometimes give the "textbook" answers, just to practice using the tables, but I also attempt to show what is wrong with the problems as there stated. I think this is a more helpful approach. Apparently there are those in the math and engineering worlds who think so too; at any rate, that page has been linked to by several statistics classes in the academic sector, and queries about valid application have been received from several engineers in the commercial sector.

EMPHATIC CONCLUSION

Be that as it may, I do not wish all this pother to obscure my initial comment, which was that Dave Gentile's argument about the Markan "salt" passages, insofar as it is based on the lexical and ritual facts, seems to me well presented and worth considering. I am glad he put it online, and hope he will get useful feedback from having done so. I suspect that in the end, like any other proposal of its type, it will stand or fall on how well it addresses the lexical and ritual facts, as well as on what facts others may cite which it does not address, or what possibilities others may propose which it did not envision. I can't see how the attached Bayesian argument enhances Dave's conclusions; to my mind, it threatens to disfigure them. Not that I object to statistics, au contraire, but rather that I don't (so far) find in this sort of data material on which statistics can fruitfully operate.

Bruce

E Bruce Brooks

Warring States Project

University of Massachusetts at Amherst

http://www.umass.edu/wsp

Thank you very much for the response, and the favorable words.

I'd like to respond here to both this post and some issues raised on the subsequent thread.

I think Bayesian analysis is agnostic about the existence of absolute truth. But if absolute truth does exist, Bayesian analysis needs infinite information (all possible information) to arrive at it. Only if you can eliminate the possibility that additional information exists can 100% certainty be achieved. So all knowledge is indeed tentative, as far as Bayesian analysis is concerned.

Addressing the issue of doing statistics with inadequate or inappropriate inputs:

An analogy may be helpful here. What axioms are to deductive logic, the information set is to Bayesian analysis. A deductive argument may be completely valid, but it is only as good an approximation of "truth" as its axioms. In Bayesian analysis, if the math is done correctly, the answer is only as good as the information fed into it. So when I say "Our first answer was not wrong; it was correct for its information set," I mean that in much the same way that a deductive argument can be valid but counter to our best estimate of reality.
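The idea of an answer being "correct for its information set" and then revised can be made concrete with a toy calculation. This is a minimal Python sketch that applies Bayes' theorem twice to the snicky-wug example. Every number in it is hypothetical (the original gives none), the "rustle" clue is invented for illustration, and the second update assumes the daylight observation is independent of the rustle once we know whether a wug is present.

```python
from fractions import Fraction


def update(prior: Fraction, p_e_given_h: Fraction, p_e_given_not_h: Fraction) -> Fraction:
    """Bayes' theorem for a yes/no hypothesis H after observing evidence E."""
    joint_h = prior * p_e_given_h
    joint_not_h = (1 - prior) * p_e_given_not_h
    return joint_h / (joint_h + joint_not_h)


# H: "the rustle in the grass is a snicky wug" (all probabilities hypothetical).
prior = Fraction(1, 10)

# Information set 1: we hear a rustle.
after_rustle = update(prior, Fraction(9, 10), Fraction(1, 10))
print(after_rustle)  # 1/2

# Information set 2: we also know it is daylight, and wugs only come out at night,
# so P(daylight | H) = 0; P(daylight | not H) = 1/2 is another made-up figure.
after_daylight = update(after_rustle, Fraction(0), Fraction(1, 2))
print(after_daylight)  # 0
```

Both posteriors are arithmetically correct for their information sets. Whether the first one should be called "wrong, period" is exactly what is in dispute in this thread, and the calculation does not settle that.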

So I would maintain that when the math is done correctly, a Bayesian answer is correct for its information set. Whether or not a Bayesian answer is useful for practical implementation in a given situation can depend on other factors. For example, if we have used all the information available to us, but we strongly suspect others have important additional information, it may be unwise to proceed based on our answer. If we write a program that is good at making money based on market patterns, we might want to use it, but we don't want to use it to bet against insider trading.

So in critical analysis of a deductive argument one wants to examine the assumptions or axioms. In critical analysis of a Bayesian argument we want to look for important information that was omitted. For my salt argument, I believe I've made appropriate use of the information at my disposal, but of course, I may be uninformed on some key point.

Is Bayesian statistics appropriate here? I think you are right when you say "I suspect that in the end, like any other proposal of its type, it will stand or fall on how well it addresses the lexical and ritual facts, as well as on what facts others may cite which it does not address, or what possibilities others may propose which it did not envision." All the Bayesian analysis does is add a bit of rigor to the thought process and give a quantitative answer associated with the result, an answer which is only as valid as its information set.

I think having such a number could be useful. In New Testament studies, at least as it seems to me, things established at only, say, 70% probability are cited as near fact, while things that seem highly probable, say at the 99.9% level, are routinely questioned. At least that is my subjective observation. Having some sort of quantitative estimate of the certainty of a conclusion, even if such quantities are estimates subject to revision, would seem to be useful, at least to me.
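The gap between a 70% claim and a 99.9% claim is wider than the percentages suggest, and converting probability to odds makes that visible. A tiny Python sketch, using only the two levels mentioned above and exact fractions:

```python
from fractions import Fraction


def odds(p: Fraction) -> Fraction:
    """Odds in favor of a claim held with probability p, i.e. p : (1 - p)."""
    return p / (1 - p)


print(odds(Fraction(7, 10)))      # 7/3, roughly 2.3 to 1
print(odds(Fraction(999, 1000)))  # 999, i.e. 999 to 1
```

On this reckoning a 70% claim is only about 2.3 to 1 in its favor, while a 99.9% claim is 999 to 1.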

Thank you again for the response.

Dave Gentile

Sr. Systems Engineer/Statistician

EMC Captiva

EMC Corporation

601 Oakmont Lane,

Westmont, IL 60559

P: 630-321-2985

F: 630-654-1607

E: Gentile_Dave@...

-----Original Message-----

From: E Bruce Brooks [mailto:ebbrooks@...]

Sent: Wednesday, April 12, 2006 10:44 PM

To: Synoptic@yahoogroups.com; Gentile, David

Subject: Re: [Synoptic-L] Bayesian statistics and salt

To: Synoptic

Cc: Dave Gentile

On: Bayesian Salt

From: Bruce

All that thought and work should not pass without comment, and so I venture to make a comment, if only to satisfy my Chinese notions of propriety.

I have looked at Dave's page (http://www.davegentile.com/synoptics/Mark.html), and find the first part, the argument from tradition and from established meanings of words and usages of ritual, to be interesting and perhaps in the end convincing. On the last point I am personally holding out for the moment, but as a commentary, the argument seems to me to have merit. I am glad Dave did it, and I will certainly file it with my notes on this particular problem area of Mark, and continue to ponder it.

I can also locate the point on the page where I part company with its learned author, and perhaps not surprisingly it is the methodological part. I find myself losing it at about the following paragraph:

"On the other hand, a quite legitimate criticism of a Bayesian result is that important information, that is known, was not considered that would effect the outcome. For example, if we know it is daylight, and that snicky wugs only come out at night, then we have omitted important information that will change our answer. Our first answer was not wrong; it was correct for its information set. And, with our new insight into the nocturnal behavior of snicky wugs, we now have a new correct probability for our new information set."

My response would be: No, it was wrong period. I don't see this as exclusively an objection to Bayesian, it is a caution about statistics in general. In my view (not wholly unshared by elementary textbook writers), if relevant information is omitted, or if irrelevant information is put into the statistical grinder, or if we attempt to use the wrong tool to open the right pecan, nothing good will ensue. Let me illustrate this by what I will call (borrowing a term from another list where I raised this same question) the Persimmon Paradox:

Take or make a sheet of graph paper, and plot the following five points: (1,2), (2,4), (3,6), (4,8), (5,8). Question: which of them is aberrant?
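A small Python sketch shows why the question has no answer from the points alone: which point counts as aberrant depends on the model brought to the data. The two hypotheses below (a straight line y = 2x, and the same line capped at 8) are illustrative choices, not part of the original post, and are not offered as the puzzle's intended solution.

```python
points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 8)]


def linear(x: int) -> int:
    """Hypothesis A (illustrative): y = 2x everywhere."""
    return 2 * x


def capped(x: int) -> int:
    """Hypothesis B (illustrative): y = 2x, but never above 8."""
    return min(2 * x, 8)


def misfits(model) -> list:
    """Points that the given model fails to reproduce exactly."""
    return [(x, y) for x, y in points if model(x) != y]


print(misfits(linear))  # [(5, 8)]
print(misfits(capped))  # []
```

Under hypothesis A the point (5,8) is aberrant. Under hypothesis B nothing is, and five points give no basis for choosing between them.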

Bruce

E Bruce Brooks

Warring States Project

University of Massachusetts at Amherst