
3130 Re: Statistical patterns

  • Robert B. Waltz
    Sep 1, 1997
On Mon, 1 Sep 1997, Timothy John Finney <finney@...> wrote:

      >Here are the figures which I alluded to before:
      >
      >NOS = number of states = number of readings in a variation unit.
      >FRQ = frequency = how often a given number of states occurs in the sampled
      >variation units.
      >FIT = fitted value using the equation F(n) = C x exp[-(a + bn)^2/2], C =
      >399, a = 1.50, b = 0.23.
      >
      >Hebrews + Romans (for variation units listed in the UBS 4th edn apparatus)
      >
      >NOS 1 2 3 4 5 6 7+
      >
      >FRQ ? 58 36 21 11 5 3
      >FIT 89 58 36 21 12 6 6
      >
      >As you can see, the fit is quite good for 2 to 6 states.

      Almost too good to be true. :-) But I'll return to this point below.

      >Jimmy Adair is right to point out that the curves that are generated will
      >depend to a very large extent on the sampling technique. If there is an
      >underlying law which obeys this equation, then using a different edition,
      >or simply widening the scope from Hebrews and Romans to the whole Pauline
      >corpus in the UBS edition, will change the constants of the equation but
      >not its shape. In other words, changing the sample size or sampling
      >technique will generate new members of one family of equations.
      >
      >Bob Waltz is right to say that the definition of a variation unit will
      >also affect the results. This is a sticky problem. (Perhaps someone will
      >one day come up with an indisputable way of defining the density of
      >variation at consecutive places in the text). Nevertheless, no matter how
      >the UBS Committee arrived at the given arrangements of variation units and
      >their readings, it still seems strange to me that they should appear to
      >fit such an equation. Hence my request for a statistician to enlighten us
      >concerning possible causes.

      As an experiment, I took a bunch of data which I had on hand --
      the readings of all uncials and papyri, plus the minuscules 330 1739,
      in Colossians 1. This proved to be a bit more complicated than
      it sounds, because of nonsense readings and scribal errors. I did
      my best to treat these realistically, and came up with the following
      numbers (out of 71 variants):

      NOS 1 2 3 4 5 6 7+
      FRQ - 49 17 3 2 0 0

Let's rewrite the above formula as my calculator understands it:

FIT = 399 * exp(-(0.23n + 1.50)^2 / 2)

      This gives a total of 58+36+21+12+6+6 = 139 readings.
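[Editor's check, not in the original post: the FIT row and this total can be reproduced by evaluating Finney's formula directly. The one assumption here is that the 7+ column lumps together the tail of the curve for n >= 7, which is what makes it come out as 6 rather than the point value of 3.]

```python
import math

def fit(n, C=399.0, a=1.50, b=0.23):
    # Finney's fitted frequency: F(n) = C * exp(-(a + b*n)^2 / 2)
    return C * math.exp(-(a + b * n) ** 2 / 2)

row = [round(fit(n)) for n in range(1, 7)]       # NOS 1 through 6
tail = round(sum(fit(n) for n in range(7, 30)))  # NOS 7+ (assumed tail sum)
print(row, tail)            # [89, 58, 36, 21, 12, 6] 6
print(sum(row[1:]) + tail)  # 139 readings for NOS 2 through 7+
```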

      Normalizing to percents gives us

      NOS 1 2 3 4 5 6 7+
      FIT - 42 26 15 9 4 4

      Over 71 readings, this gives us
      NOS 2 3 4 5+
      Expected 30 18 11 12
      Actual 49 17 3 2
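[Editor's check, not in the original post: the arithmetic in the two tables above takes only a couple of lines, with the 5+ column of the 71-reading table understood as lumping together the percentages for NOS 5, 6, and 7+.]

```python
fits = [58, 36, 21, 12, 6, 6]                 # Finney's FIT for NOS 2..7+
total = sum(fits)                             # 139
pcts = [round(100 * f / total) for f in fits]
print(pcts)                                   # [42, 26, 15, 9, 4, 4]

expected = [round(71 * p / 100) for p in pcts[:3]]  # NOS 2, 3, 4
expected.append(round(71 * sum(pcts[3:]) / 100))    # NOS 5+ lumped together
print(expected)                               # [30, 18, 11, 12]
```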

      So the fit doesn't work -- although I agree that the data does look
      exponential. (I'm too lazy to fit my own data. :-)
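[Editor's note: one way to put a number on "the fit doesn't work" -- an addition, not part of the original exchange -- is a plain chi-squared comparison of the expected and actual counts. Since Finney's constants were fitted to a different sample, none of the parameters are estimated from these data, leaving 3 degrees of freedom for the 4 categories.]

```python
expected = [30, 18, 11, 12]   # Finney's curve scaled to 71 variants (NOS 2, 3, 4, 5+)
actual   = [49, 17,  3,  2]   # the Colossians 1 counts above

chi2 = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
print(round(chi2, 1))   # 26.2
# The 5% critical value for 3 degrees of freedom is about 7.8, so a
# chi-squared near 26 rejects this curve (with these constants) for
# this sample by a wide margin.
```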

      The problem is, we are dealing with *three* variables:

      1. The definition of a variant
      2. The method of selecting variants
      3. The number and nature of the manuscripts in the sample set.

Given that (1) has only a vague definition, (2) has as yet no definition
at all, and (3) is something that still needs to be explored, perhaps we
shouldn't expect much at this point.

      Also keep in mind that we are dealing with very few data points here --
      in Tim's set, only six (data for 2, 3, 4, 5, 6, and 7+-order variants);
      in mine, an even smaller 4-point data set (or, arguably, 5; we could
      throw in the results for 6+).

      I suspect Tim is right, and there is an exponential fall-off. But
      with only six data points, and a monotonically decreasing function,
      we could get a good fit for an exponential even if the actual function
      were of some other form.
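[Editor's illustration, not from the original post: fitting an exponential to the Colossians 1 counts is less work than it sounds. A pure exponential F(n) = A * exp(-k*n) is linear in the log of the counts, so an ordinary least-squares line through ln(FRQ) recovers A and k -- though, as the paragraph above warns, a decent-looking fit from four monotonically decreasing points proves very little.]

```python
import math

ns   = [2, 3, 4, 5]
freq = [49, 17, 3, 2]                 # the Colossians 1 counts above
logs = [math.log(f) for f in freq]

# Least-squares line ln F = c - k*n.
n_bar = sum(ns) / len(ns)
l_bar = sum(logs) / len(logs)
k = -sum((n - n_bar) * (l - l_bar) for n, l in zip(ns, logs)) \
    / sum((n - n_bar) ** 2 for n in ns)
A = math.exp(l_bar + k * n_bar)

pred = [A * math.exp(-k * n) for n in ns]
print(round(A), round(k, 2))          # roughly 444 1.13
print([round(p, 1) for p in pred])    # roughly [46.0, 14.8, 4.8, 1.5]
```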

      Now note: I think this is a very important subject to pursue. The
      mean number of significant variant readings at each point of
      variation has an immense impact on the statistics we can use to
      compare manuscripts. I just think we need a greater degree of
      rigour here (sorry, Tim. :-)

      >On the significance of the predicted 89 units with 1 state, I take this to
      >mean that if 228 sections of the UBS text (89 + 58 + 36 + 21 + 12 + 6 + 6)
      >with a certain standard size were examined, on average 89 (39%) would
      >display no variation, 58 (25%) would have two possible readings, 36 (16%)
      >would have three, and so on. Romans and Hebrews together have about 12,036
      >words, resulting in this standard size being about 53 words.
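[Editor's check of the arithmetic in the quoted paragraph, using only the figures given there:]

```python
predicted = [89, 58, 36, 21, 12, 6, 6]
sections = sum(predicted)             # 228 sections of the UBS text
words = 12036                         # Romans + Hebrews, per the post
print(round(words / sections, 1))     # 52.8 words per section
print(round(100 * 89 / sections))     # 39 (% predicted to show no variation)
```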

      I think this last is a statement that needs to be clarified. Your
      actual claim is that 39% of your 53 word samples would show *no variant
      of interest to the UBS committee*. (The fact is, of course, that there
      are variants in just about every word of the NT). But this, in turn,
      gives us some problems. There are instances in the UBS text of as
      many as 3 variants in a single verse. (UBS3 had four variants in
      Hebrews 13:21; one of them was dropped in UBS4). Taking Hebrews 13:21
      as an example, the verse is 30 words long. The variants show
      3, 2, and 2 readings. Would this be considered a single point
      of variation with 12 readings (3x2x2) or something else?

      -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

      Robert B. Waltz
      waltzmn@...

      Want more loudmouthed opinions about textual criticism?
      Try my web page: http://www.skypoint.com/~waltzmn
      (A site inspired by the Encyclopedia of NT Textual Criticism)