Re: [Synoptic-L] Some questions to David

  • Emmanuel Fritsch
    Message 1 of 5 , Dec 4, 2001
      Some questions to David.

      1] Why a subtraction and not a division ?

      > So now we can compute the frequencies by category, relative
      > to what we would expect.
      > For "the" in category "222" we get .02 - .025 = -.005.
      > "the" occurs less frequently in "222" than it does in all categories.
      > For "bottom" in category "222" we get .005 - .004 = .001

      Why do you use subtraction, and not division, for that operation ?
      Your justification, with the word "relative", logically implies a division.
      You said elsewhere that you need to balance the high frequency of "the",
      but you can do that only with a division, for instance :

      For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
      For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25

      This operation would give a better relative representation. Don't you think so ?
      For the time being, I do not understand the purpose of your subtraction.
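
To make the two candidate measures concrete, here is a minimal Python sketch that computes both the plain difference and the relative difference for the figures quoted above. The function name `deviations` is hypothetical; only the four frequencies come from the thread.

```python
def deviations(category_freq, overall_freq):
    """Return (absolute difference, relative difference) for one word."""
    diff = category_freq - overall_freq
    return diff, diff / overall_freq

# (frequency in category "222", frequency over all categories), as quoted above.
examples = {"the": (0.02, 0.025), "bottom": (0.005, 0.004)}

results = {
    word: deviations(cat, overall) for word, (cat, overall) in examples.items()
}

for word, (diff, rel) in results.items():
    print(f"{word}: difference={diff:+.3f}, relative={rel:+.2f}")
```

The difference keeps the scale of the raw frequency, while the relative form expresses each word's departure as a fraction of its own expected frequency.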


      2] Imagine this simple hypothetical schema :

      A => Mt
      A + Mt => Lk
      A + Mt + Lk => Mk

      In 222, you may find :
      - words of A that have been kept by Mt, by Lk and Mk.
      - words of Mt that have been kept by Mk and Lk

      In 122 you may find :
      - words of A that have been changed by Mt, but kept by Lk, and Mk.
      - words of Lk, kept by Mk.

      Even for that simple schema, your basic vectors (122, 222, etc.)
      may be made up of composite redactional features. In particular,
      this may produce a combined correlation effect from different gospel
      material that separately would have produced an anti-correlative
      effect.

      In other words, your data are not a complete overview of the styles of
      the gospels, since they do not take into account the location of
      occurrences (exactly as I cannot describe the demography of the USA just
      by giving a histogram of population by state). Your data are a snapshot
      of gospel styles, where a single apparent style profile may in fact
      hide three or four very different redaction flows.


      3] Which method did you choose for clustering ?
      I guess K-means, but can you confirm it ?

      Whatever the case, since it is possible to produce a correlative
      effect with a combination of anti-correlated pairs of vectors,
      your clustering is more negative than positive : you cannot
      guarantee that a cluster is a single style element, even if the
      In particular, your cluster A cannot be considered a natural
      group of close elements.

      4] On the other hand, the anti-correlative phenomenon looks like
      an unexpected effect. Can you assess how significant it is ?
      And is it possible to imagine a single writer producing a
      pattern of anti-correlative elements ?

      a+
      manu

      Synoptic-L Homepage: http://www.bham.ac.uk/theology/synoptic-l
      List Owner: Synoptic-L-Owner@...
    • David Gentile
      Message 2 of 5 , Dec 4, 2001
        Hello Emmanuel,

        Comments in-line.

        >
        > Some questions to David.
        >
        > 1] Why a subtraction and not a division ?
        >
        > > So now we can compute the frequencies by category, relative
        > > to what we would expect.
        > > For "the" in category "222" we get .02 - .025 = -.005.
        > > "the" occurs less frequently in "222" that it does in all categories.
        > > For "bottom" in category "222" we get .005 - .004 = .001
        >
        > Why do you use the subtraction, and not the division, for that operation ?
        > Your justification, with the word "relative", induce logically a division.
        > You said elsewhere that you need to balance the high frequency of "the"
        > but you can do it only by a division, for instance :
        >
        > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
        > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
        >
        > This operation would give a better relative representation. Do you think not ?
        > For that time being, I do not understand the purpose of your subtraction.
        >

        We've already done one division to get to frequencies. I'm not sure what
        impact your idea would have. However, the best way (as I did with the SAS
        tool) is to compute the individual frequencies, treat them as variables,
        treat the overall frequency as a partial variable, and do what is called
        a "partial correlation". It takes the standard deviations into account,
        and conceptually does about the same thing I did with the subtraction.
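
A minimal sketch of the partial-correlation idea, assuming the textbook first-order formula r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). It is not the SAS procedure itself, and the word frequencies below are made up purely for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical frequencies of five words: overall, and in two categories.
overall = [0.025, 0.004, 0.010, 0.002, 0.015]
cat_a = [0.020, 0.005, 0.011, 0.001, 0.016]
cat_b = [0.022, 0.005, 0.012, 0.001, 0.014]

raw = pearson(cat_a, cat_b)
controlled = partial_corr(cat_a, cat_b, overall)
print(f"raw correlation: {raw:.3f}")
print(f"partial correlation, controlling for overall frequency: {controlled:.3f}")
```

The raw correlation is dominated by the shared overall frequencies; the partial correlation measures what is left once that shared component is removed.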

        >
        > 2] Imagine that hypothetical simple schema :
        >
        > A => Mt
        > A + Mt => Lk
        > A + Mt + Lk => Mk
        >
        > In 222, you may find :
        > - words of A that have been kept by Mt, by Lk and Mk.
        > - words of Mt that have been kept by Mk and Lk
        >
        > In 122 you may find :
        > - words of A that have been changed by Mt, but kept by Lk, and Mk.
        > - words of Lk, kept by Mk.
        >
        > Only for that simple schema, your basic vectors (122, 222, etc.)
        > may be constituted by composite redactional features. Particularly,
        > it may produce a combined correlation effect with different material
        > of gospels that separately would have produced an anti-correlative
        > effect.

        If I understand you correctly, I agree. I've discovered many of the results
        are very difficult to untangle. (See the note I just posted.)

        >
        > With other words, your data are not a complete overview of styles of
        > gospels, since it does not take into account location of occurrences.
        > (exactly as I can not describe the demography of USA just by giving an
        > histogram of population according states). Your data are a snap shot
        > of gospel styles, where a single appearing style profile may in fact
        > hide three of four very different redaction flows.

        True, they are all ground up; only the frequency of the words is being
        studied. I am, however, comparing that to what I know about the structure.

        >
        >
        > 3] What is the method you choose for clustering ?
        > I guess K-mean, but may you confirm it ?

        Average-linkage cluster analysis. It takes each member of one cluster,
        compares it to each member of the other cluster, and averages those
        pairwise distances to determine the distance between the two clusters.
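
The average-linkage distance between two clusters can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between points; the thread does not say which distance measure was actually used, and the function names are hypothetical.

```python
import math
from itertools import product

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_linkage(cluster_1, cluster_2, dist=euclidean):
    """Mean of the distances over every cross-cluster pair of members."""
    pairs = list(product(cluster_1, cluster_2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

# Two small made-up clusters of one-dimensional points.
left = [(0.0,), (2.0,)]
right = [(5.0,), (7.0,)]
print(average_linkage(left, right))
```

In agglomerative clustering, the two clusters with the smallest such distance are merged at each step.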

        >
        > Whatever the case, since it is possible to produce a correlative
        > effect with a combination of anti-correlated pairs of vectors,
        > your clustering is more negative than positive : you may not
        > warrant that a cluster is a single style element, even if the
        > Particularly, your cluster A can not be considered as a natural
        > groups of close elements.

        I'm not sure I got the first part. I agree cluster A is a bit surprising.
        I think I now understand the 202-222 result: both categories are the
        result of material being kept by both Matthew and Luke. The 200-222
        connection is not at a high confidence level, and from a scatter plot,
        it's hardly there at all. I think the 202-200 connection may be the one
        real significant result of all this, however.

        >
        > 4] On the opposite, the anti-correlative phenomenon looks as
        > an unexpected effect. May you assess how significant it is ?
        > And is it possible to imagine a single writer producing a
        > pattern of anti-correlative elements ?

        See my previous note about this.
        An example of what seems to happen:
        22x and 12x have a negative correlation.
        If Matthew dislikes a word, he will end up lowering the 22x frequency and
        raising the 12x, tending to make them anti-correlate.
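
The mechanism described above can be sketched with a toy model. Everything here is an assumption made for illustration: each word has a hypothetical count in the source, Matthew keeps a word-specific share of those occurrences (counted as "22x") and changes the rest (counted as "12x"), and all other categories are lumped into one fixed set of counts.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def frequencies(counts):
    """Convert raw counts to relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical data for five words.
source_counts = [400, 100, 50, 200, 250]
keep_rate = [0.9, 0.7, 0.5, 0.3, 0.1]     # low rate = a word Matthew dislikes
other_counts = [300, 80, 60, 150, 200]    # all remaining categories together

kept = [n * k for n, k in zip(source_counts, keep_rate)]     # toy "22x" counts
changed = [n - k for n, k in zip(source_counts, kept)]       # toy "12x" counts
overall = frequencies(
    [a + b + c for a, b, c in zip(kept, changed, other_counts)]
)

# Deviation of each category's frequencies from the overall frequencies.
dev_22x = [f - o for f, o in zip(frequencies(kept), overall)]
dev_12x = [f - o for f, o in zip(frequencies(changed), overall)]

r = pearson(dev_22x, dev_12x)
print(f"correlation of 22x and 12x deviations: {r:.3f}")
```

In this toy model the correlation comes out strongly negative, because every occurrence Matthew removes from one category lands in the other. Real data would be much noisier.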

        Thanks for the input,

        Dave Gentile
        Riverside, Illinois
        M.S. Physics
        PhD Management Science candidate




      • David Gentile
        Message 3 of 5 , Dec 4, 2001
          Hello again Emmanuel,

          I thought about your first suggestion here, a little more.

          > > 1] Why a subtraction and not a division ?
          > >
          > > > So now we can compute the frequencies by category, relative
          > > > to what we would expect.
          > > > For "the" in category "222" we get .02 - .025 = -.005.
          > > > "the" occurs less frequently in "222" that it does in all categories.
          > > > For "bottom" in category "222" we get .005 - .004 = .001
          > >
          > > Why do you use the subtraction, and not the division, for that operation ?
          > > Your justification, with the word "relative", induce logically a division.
          > > You said elsewhere that you need to balance the high frequency of "the"
          > > but you can do it only by a division, for instance :
          > >
          > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
          > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
          > >
          > > This operation would give a better relative representation. Do you think not ?
          > > For that time being, I do not understand the purpose of your subtraction.
          > >


          The current method gives more weight to common words. Your system would give
          each word an equal weight. I think that might lead to more noise, since low
          frequency words might not be very well distributed.
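
One way to see the noise concern is a minimal sketch under a simple binomial sampling assumption, which is not something the thread specifies. The category size of 2000 words is hypothetical; the two frequencies are the overall frequencies quoted earlier for "the" and "bottom".

```python
import math

def standard_error(p, n):
    """Standard error of an observed frequency, binomial model."""
    return math.sqrt(p * (1 - p) / n)

def relative_standard_error(p, n):
    """Standard error expressed as a fraction of the frequency itself."""
    return standard_error(p, n) / p

category_size = 2000  # hypothetical number of words in one category

for word, p in {"the": 0.025, "bottom": 0.004}.items():
    se = standard_error(p, category_size)
    rse = relative_standard_error(p, category_size)
    print(f"{word}: absolute s.e.={se:.5f}, relative s.e.={rse:.2f}")
```

Under this model the common word fluctuates more in absolute terms, so it carries more weight after a subtraction, while the rare word fluctuates far more relative to its own frequency, so a division would amplify its sampling noise.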

          Thanks again,

          Dave Gentile
          Riverside, Illinois
          M.S. Physics
          PhD Management Science candidate




        • Emmanuel Fritsch
          Message 4 of 5 , Dec 5, 2001
            > > > Why do you use the subtraction, and not the division [...] ?
            > > >
            > > > Your justification, with the word "relative", induce logically a
            > > > division. You said elsewhere that you need to balance the high
            > > > frequency of "the" but you can do it only by a division, for
            > > > instance :
            > > >
            > > > For "the" in category "222" we get (.02 - .025) / 0.025 = -.2
            > > > For "bottom" in category "222" we get (.005 - .004) / 0.004 = .25
            > > >
            > > > This operation would give a better relative representation.
            > > > Do you think not ?
            > > > For that time being, I do not understand the purpose of your
            > > > subtraction.
            >
            > The current method gives more weight to common words. Your system would give
            > each word an equal weight. I think that might lead to more noise, since low
            > frequency words might not be very well distributed.

            OK. But in that case, what is the purpose of this subtraction ?
            I still do not understand it.

            a+
            manu

          • David Gentile
            Message 5 of 5 , Dec 5, 2001
              Hello Emmanuel,


              >
              > OK. But in that case, what is the purpose of this subtraction ?
              > I still do not understand it.
              >
              > a+
              > manu

              Without the subtraction we'd be asking:
              "Do these documents have similar word frequencies?"
              They do, because they are both samples of the Greek language.

              With the subtraction, we are asking:
              "Do these documents depart from the average Greek language frequency, in a
              similar way?"

              Does that help any?
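
The difference between the two questions can be sketched numerically. This is a made-up example, not data from the study: two hypothetical documents share the same baseline word frequencies but depart from it in opposite directions.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical average frequencies of five words in the language.
baseline = [0.025, 0.004, 0.010, 0.002, 0.015]
# Hypothetical stylistic shift, applied with opposite signs to the two documents.
shift = [0.001, -0.001, 0.001, -0.001, 0.0]

doc_a = [b + s for b, s in zip(baseline, shift)]
doc_b = [b - s for b, s in zip(baseline, shift)]

# Question 1: do the documents have similar word frequencies?
raw_r = pearson(doc_a, doc_b)

# Question 2: do they depart from the baseline in a similar way?
dev_a = [f - b for f, b in zip(doc_a, baseline)]
dev_b = [f - b for f, b in zip(doc_b, baseline)]
centered_r = pearson(dev_a, dev_b)

print(f"raw frequencies: r={raw_r:.3f}")
print(f"after subtracting the baseline: r={centered_r:.3f}")
```

The raw frequencies correlate strongly because both documents mostly reflect the language itself; only after the subtraction does the opposite direction of their departures show up.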

              Dave Gentile
              Riverside, Illinois
              M.S. Physics
              PhD Management Science

