Re: [Synoptic-L] Some questions to David

  • David Gentile
    Message 1 of 5 , Dec 4, 2001
      Hello Emmanuel,

      Comments in-line.

      >
      > Some questions to David.
      >
      > 1] Why a subtraction and not a division ?
      >
      > > So now we can compute the frequencies by category, relative
      > > to what we would expect.
      > > For "the" in category "222" we get .02 - .025 = -.005.
      > > "the" occurs less frequently in "222" that it does in all categories.
      > > For "bottom" in category "222" we get .005 - .004 = .001
      >
      > Why do you use the subtraction, and not the division, for that operation ?
      > Your justification, with the word "relative", induce logically a division.
      > You said elsewhere that you need to balance the high frequency of "the"
      > but you can do it only by a division, for instance :
      >
      > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
      > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
      >
      > This operation would give a better relative representation. Do you think
      > not ?
      > For that time being, I do not understand the purpose of your subtraction.
      >

      We've already done one division to get to frequencies. I'm not sure what
      impact your idea would have. However, the best way (as I did with the SAS
      tool) is to compute the individual frequencies, then treat them as
      variables, and treat the overall frequency as a partial variable, and do
      what is called a "partial correlation". It takes the standard deviations
      into account, and conceptually it does about the same thing I did with
      the subtraction.
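
For readers who want to see the arithmetic, here is a minimal sketch of a first-order partial correlation in plain Python. It is not the SAS procedure mentioned above, only the textbook formula; the function names and the five-word frequency vectors are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical per-word frequencies (one entry per word), NOT real data.
overall = [0.025, 0.004, 0.010, 0.015, 0.002]  # all categories together
cat_a = [0.020, 0.005, 0.012, 0.013, 0.003]    # e.g. one category
cat_b = [0.030, 0.003, 0.011, 0.016, 0.001]    # e.g. another category

raw = pearson(cat_a, cat_b)                    # dominated by shared Greek usage
controlled = partial_corr(cat_a, cat_b, overall)
print(raw, controlled)
```

The raw correlation is high simply because both vectors follow the overall frequencies; the partial correlation asks what is left once that shared component is controlled for.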

      >
      > 2] Imagine that hypothetical simple schema :
      >
      > A => Mt
      > A + Mt => Lk
      > A + Mt + Lk => Mk
      >
      > In 222, you may find :
      > - words of A that have been kept by Mt, by Lk and Mk.
      > - words of Mt that have been kept by Mk and Lk
      >
      > In 122 you may find :
      > - words of A that have been changed by Mt, but kept by Lk, and Mk.
      > - words of Lk, kept by Mk.
      >
      > Only for that simple schema, your basic vectors (122, 222, etc.)
      > may be constituted by composite redactional features. Particularly,
      > it may produce a combined correlation effect with different material
      > of gospels that separately would have produced an anti-correlative
      > effect.

      If I understand you correctly, I agree. I've discovered many of the results
      are very difficult to untangle. (See the note I just posted.)

      >
      > With other words, your data are not a complete overview of styles of
      > gospels, since it does not take into account location of occurrences.
      > (exactly as I can not describe the demography of USA just by giving an
      > histogram of population according states). Your data are a snap shot
      > of gospel styles, where a single appearing style profile may in fact
      > hide three of four very different redaction flows.

      True, they are all ground up, only the frequency of the words is being
      studied. Although, I'm comparing that to what I know about the structure.

      >
      >
      > 3] What is the method you choose for clustering ?
      > I guess K-mean, but may you confirm it ?

      Average-linkage cluster analysis. It takes each member of each cluster and
      compares it to each member of the other cluster to determine distance.
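
As a rough sketch of that distance rule: the average-linkage distance between two clusters is the mean of all pairwise distances between their members. The Euclidean metric and the function names below are assumptions for illustration; the thread does not say which underlying distance was used.

```python
from itertools import product
from math import sqrt

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over every (member of A, member of B) pair."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

# Toy points, not real category vectors.
print(average_linkage([(0, 0), (0, 2)], [(0, 1)]))
```

In an agglomerative run, the two clusters with the smallest such distance are merged at each step.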

      >
      > Whatever the case, since it is possible to produce a correlative
      > effect with a combination of anti-correlated pairs of vectors,
      > your clustering is more negative than positive : you may not
      > warrant that a cluster is a single style element, even if the
      > Particularly, your cluster A can not be considered as a natural
      > groups of close elements.

      I'm not sure I got the first part. I agree cluster A is a bit surprising.
      I think I now understand the 202-222 result as both being the result of
      being kept by both Matthew and Luke. The 200-222 connection is not at a high
      confidence level, and from a scatter plot, it's hardly there at all. I think
      the 202-200 connection may be the one real significant result of all this,
      however.

      >
      > 4] On the opposite, the anti-correlative phenomenon looks as
      > an unexpected effect. May you assess how significant it is ?
      > And is it possible to imagine a single writer producing a
      > pattern of anti-correlative elements ?

      See my previous note about this.
      An example of what seems to happen:
      22x and 12x have a negative correlation.
      If Matthew dislikes a word, he will end up lowering the 22x frequency and
      raising the 12x frequency, tending to make them anti-correlate.

      Thanks for the input,

      Dave Gentile
      Riverside, Illinois
      M.S. Physics
      PhD Management Science candidate




      Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l
      List Owner: Synoptic-L-Owner@...
    • David Gentile
      Message 2 of 5 , Dec 4, 2001
        Hello again Emmanuel,

        I thought about your first suggestion here, a little more.

        > > 1] Why a subtraction and not a division ?
        > >
        > > > So now we can compute the frequencies by category, relative
        > > > to what we would expect.
        > > > For "the" in category "222" we get .02 - .025 = -.005.
        > > > "the" occurs less frequently in "222" that it does in all categories.
        > > > For "bottom" in category "222" we get .005 - .004 = .001
        > >
        > > Why do you use the subtraction, and not the division, for that
        > > operation ?
        > > Your justification, with the word "relative", induce logically a
        > > division.
        > > You said elsewhere that you need to balance the high frequency of "the"
        > > but you can do it only by a division, for instance :
        > >
        > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
        > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
        > >
        > > This operation would give a better relative representation. Do you think
        > > not ?
        > > For that time being, I do not understand the purpose of your
        > > subtraction.
        > >


        The current method gives more weight to common words. Your system would give
        each word an equal weight. I think that might lead to more noise, since low
        frequency words might not be very well distributed.
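
The weighting difference can be seen with the two words from the quoted example. This is only a sketch using the illustrative numbers from the thread, not real counts; the variable names are made up.

```python
# (frequency in category "222", frequency over all categories)
freqs = {"the": (0.02, 0.025), "bottom": (0.005, 0.004)}

# Subtraction: deviation stays on the scale of the word's own frequency.
diff = {w: cat - allc for w, (cat, allc) in freqs.items()}

# Division: deviation rescaled by overall frequency, so every word
# ends up on a comparable scale regardless of how common it is.
rel = {w: (cat - allc) / allc for w, (cat, allc) in freqs.items()}

for w in freqs:
    print(w, round(diff[w], 4), round(rel[w], 4))
```

With subtraction, "the" has a deviation five times larger in magnitude than "bottom", so it dominates any correlation. With division the two are of similar size, which also means a rare word whose count moves by a few occurrences can swing as much as a common one.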

        Thanks again,

        Dave Gentile
        Riverside, Illinois
        M.S. Physics
        PhD Management Science candidate




      • Emmanuel Fritsch
        Message 3 of 5 , Dec 5, 2001
          > > > Why do you use the subtraction, and not the division [...] ?
          > > >
          > > > Your justification, with the word "relative", induce logically a
          > > > division. You said elsewhere that you need to balance the high
          > > > frequency of "the" but you can do it only by a division, for
          > > > instance :
          > > >
          > > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
          > > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
          > > >
          > > > This operation would give a better relative representation.
          > > > Do you think not ?
          > > > For that time being, I do not understand the purpose of your
          > > > subtraction.
          >
          > The current method gives more weight to common words. Your system would give
          > each word an equal weight. I think that might lead to more noise, since low
          > frequency words might not be very well distributed.

          OK. But in that case, what is the purpose of this subtraction ?
          I still do not understand it.

          a+
          manu

        • David Gentile
          Message 4 of 5 , Dec 5, 2001
            Hello Emmanuel,


            >
            > OK. But in that case, what is the purpose of this subtraction ?
            > I still do not understand it.
            >
            > a+
            > manu

            Without the subtraction we'd be asking:
            "Do these documents have similar word frequencies?"
            They do, because they are both samples of the Greek language.

            With the subtraction, we are asking:
            "Do these documents depart from the average Greek language frequency, in a
            similar way?"
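
A small sketch of the two questions, with invented numbers: two frequency vectors that share the same baseline but depart from it in opposite directions. The baseline, the deviations, and the names are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical baseline frequencies for five words, plus small deviations.
overall = [0.025, 0.004, 0.010, 0.015, 0.002]
dev = [0.002, -0.001, 0.001, -0.002, 0.0005]

doc_a = [o + d for o, d in zip(overall, dev)]  # departs one way
doc_b = [o - d for o, d in zip(overall, dev)]  # departs the opposite way

# Question 1: do the raw frequencies look alike?
raw = pearson(doc_a, doc_b)

# Question 2: do the departures from the baseline look alike?
centered = pearson([a - o for a, o in zip(doc_a, overall)],
                   [b - o for b, o in zip(doc_b, overall)])
print(raw, centered)
```

Here the raw correlation is strongly positive because both vectors are mostly baseline, while the correlation after subtraction is strongly negative, which is the information the subtraction is meant to expose.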

            Does that help any?

            Dave Gentile
            Riverside, Illinois
            M.S. Physics
            PhD Management Science

