
Idea for deciding vocabulary

  • cmore123579
    Message 1 of 7 , Sep 1, 2008
      Hi, I'm new here, but thought I'd run an idea I had past everybody
      and see what the reaction is.

      My idea is for a semi-automated process for choosing new Folkspraak
      (or any other constructed language) words:

      First step: for a particular word, say walk/laufen/loop, we make a
      list of the versions of the word in all the widely spoken Germanic
      languages.

      Second step: we then run these words through an algorithm which
      finds a new word that minimizes the sum of the Levenshtein distances
      between this new word and all the existing words.

      Third step: Take these computer generated words and review them for
      sensibility. If it comes up with a silly word, just discard it and
      try to come up with a new word yourself.
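
A minimal sketch of the second step, under stated assumptions. The post does not say how the search would be done, so the greedy local search below is only one possible stand-in: it starts from the source word with the lowest total distance and keeps applying single-character edits while the total drops. The function names `total_distance`, `neighbours` and `propose_word` are hypothetical, and the result is a local optimum, not a guaranteed global minimum.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def total_distance(candidate, words):
    """Sum of the distances from one candidate to every source word."""
    return sum(levenshtein(candidate, w) for w in words)


def neighbours(word, alphabet):
    """All strings exactly one edit away from `word`."""
    out = set()
    for i in range(len(word) + 1):
        for ch in alphabet:
            out.add(word[:i] + ch + word[i:])        # insertion
    for i in range(len(word)):
        out.add(word[:i] + word[i + 1:])             # deletion
        for ch in alphabet:
            out.add(word[:i] + ch + word[i + 1:])    # substitution
    out.discard(word)
    return out


def propose_word(words):
    """Greedy local search (an assumption, not the poster's stated method)."""
    alphabet = sorted(set("".join(words)))
    best = min(words, key=lambda w: (total_distance(w, words), w))
    best_score = total_distance(best, words)
    while True:
        scored = [(total_distance(c, words), c) for c in neighbours(best, alphabet)]
        score, cand = min(scored, default=(best_score, best))
        if score >= best_score:          # no single edit improves the total
            return best, best_score
        best, best_score = cand, score


# The example triple from the post.
word, score = propose_word(["walk", "laufen", "loop"])
print(word, score)
```

The third step stays manual: whatever string comes out still has to be reviewed by a person.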

      This method could be run on thousands of words and could potentially
      speed up the process of establishing a new vocabulary.

      Please let me know what you think. If there is interest I will
      write a programme to do the "finding" of new words and people can
      try it out.

      P.S. When will it become compulsory for posts on this group to be in
      Folkspraak? :-)
    • Nightvid F. Cole
      Message 2 of 7 , Sep 1, 2008
        No, we should minimize the sum of the SQUARES of the Levenshtein distances (for statistical reasons). And not to be negative or anything, but I am pretty sure that non-cognates will pose quite a problem unless we find a specific way of dealing with that issue. Ideally, we could try to count cognates when possible but put in a 'penalty' for semantic drift (in your example, the correct English word is 'leap').
        Nightvid
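
To illustrate how the two objectives can disagree, here is a small sketch. The input list and the candidate forms are made up for the illustration (one form is repeated, standing in for two languages that share a spelling), and `sum_of_distances` and `sum_of_squares` are hypothetical helper names. The plain sum tends to stick with the majority form, while squaring penalises one large distance more heavily and so pulls towards a compromise form.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Made-up input: "lopen" appears twice, "laufen" once.
words = ["lopen", "lopen", "laufen"]
# Hypothetical candidate forms to choose between.
candidates = ["lopen", "lofen", "laupen", "laufen"]


def sum_of_distances(candidate):
    return sum(levenshtein(candidate, w) for w in words)


def sum_of_squares(candidate):
    return sum(levenshtein(candidate, w) ** 2 for w in words)


best_by_sum = min(candidates, key=sum_of_distances)
best_by_squares = min(candidates, key=sum_of_squares)
print(best_by_sum, best_by_squares)   # prints "lopen lofen"
```

Here "lopen" scores 0 + 0 + 3 = 3 on the plain sum but 9 on squares, while "lofen" scores 1 + 1 + 2 = 4 on the plain sum and only 6 on squares.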

      • David Parke
        Message 3 of 7 , Sep 1, 2008
          The problem is who decides and how to decide what words to compare with what.

          Why choose laufen/lopen/walk? Why not laufen/lopen/stroll? Or gehen/lopen/saunter?

          Also DE laufen and NL lopen don't always have the same meaning -- there are contexts or
          idioms where they mean different things. DE laufen means to run more often than not. NL
          lopen means to walk more often than DE laufen does. Also English has leap (the exact
          cognate of laufen/lopen). It also has lope (a borrowing from Norse that has a meaning
          more aligned with DE laufen/NL lopen).

          It is naive to think that for every word, there is a one to one mapping of words and
          meanings in other languages. Most words in English can be translated to more than one
          word in German. Most German words have more than one single translation into English.
          Even with a very obvious word like DE Haus, it doesn't always translate to EN house.
          Sometimes "home" would be a better translation. As would "building".
          It is for this reason that I am highly skeptical of the long-term viability of "cross-words" (as
          used in Ingmar's Middelsprake, these are mixes of words of unrelated origin).

          You'd need an objective criterion for which words to compare with what.
          I use the criterion of etymological cognates. You not only need to reconcile the FORM, but
          also the MEANING of the outcome.
          For example if you compare EN leap, EN lope, NL lopen, DE laufen, DA løbe, NO løpe, SW
          löpa, you might come up with a word (by taking the minimized sum of the Levenshtein
          distances -- whatever that means) with a form of "loepe" or something. But what exactly
          does this word mean -- given that the source words have a variety of meanings, only some
          of them shared?
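
For readers unsure what "the minimized sum of the Levenshtein distances" amounts to in practice, here is a rough worked sketch using the cognate set above. It scores each source word, plus the guessed form "loepe", by its total edit distance to all seven cognates; the lowest total is the form closest to the whole set. The variable names are hypothetical, and distances are counted per Unicode character, so "ø" against "o" counts as one substitution. This only addresses FORM; it says nothing about the MEANING problem raised here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Cognate set quoted in the message: EN, EN, NL, DE, DA, NO, SW.
cognates = ["leap", "lope", "lopen", "laufen", "løbe", "løpe", "löpa"]
# Score the source words themselves plus the guessed outcome "loepe".
candidates = cognates + ["loepe"]

totals = {c: sum(levenshtein(c, w) for w in cognates) for c in candidates}

# Print the candidates from closest to furthest.
for form, total in sorted(totals.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{form:8} {total}")
```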

        • cmore123579
          Message 4 of 7 , Sep 2, 2008
            I suspected the responses wouldn't be too positive. Just to clarify
            the point about minimising the sum of the Levenshtein distances:
            there's no statistical reason for summing the squares of the
            distances because the distances themselves are always greater than
            or equal to 0. Minimising the squared error is only necessary where
            values may be less than 0.

            I have thought about the problem of deciding which words to compare
            with which words. I only speak English, Dutch and Afrikaans to a
            level where I can appreciate the subtleties of meaning (I am a
            statistician, not a linguist), but even so I appreciate your point
            about there not being one-to-one translations for words.

            To solve the problem I thought about allowing any number of words
            per language in the search for a new word, e.g. in your example we
            would have walk, leap and lope representing English and whichever 3,
            4 or 5 German words you like. The algorithm then considers all
            combinations of words from different languages to find the one that
            results in the least possible Levenshtein distance among all the
            possible combinations.
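
A minimal sketch of the combination search, under stated assumptions. The word lists are illustrative (the German and Dutch entries are just the ones mentioned in this thread), and the scoring is a cheap stand-in: instead of searching for a new word per combination, it uses the sum of pairwise distances within each combination, so the tightest cluster wins. `options` and `spread` are hypothetical names.

```python
from itertools import combinations, product


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Any number of candidate words per language (illustrative lists).
options = {
    "EN": ["walk", "leap", "lope"],
    "NL": ["lopen"],
    "DE": ["laufen", "gehen"],
}


def spread(combo):
    """Sum of pairwise distances inside one combination (stand-in objective)."""
    return sum(levenshtein(a, b) for a, b in combinations(combo, 2))


# One word per language; min() keeps the first combination on a tie.
best_combo = min(product(*options.values()), key=spread)
print(best_combo, spread(best_combo))
```

The number of combinations is the product of the list lengths, so long lists across many languages grow quickly; that is a practical limit to keep in mind.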

            Anyway, I am going to try it for the languages I know and if anybody
            feels like contributing some words from languages they speak I'd be
            very happy for the help. I don't suppose the experiment will change
            the world, but it will at the very least be interesting to see what
            comes out at the end of it.




          • David Parke
            Message 5 of 7 , Sep 2, 2008
              If that's how you want to proceed, it'd make an interesting experiment,
              and I'd like to see the results -- I'm just not sure if they would be
              usable -- I foresee a few hundred practical words and thousands upon
              thousands of unrecognisable mush-words.

              I would suggest using Google Translate as your source of translations.
              For one thing, that could be your arbitrary criterion and therefore
              consistent. For another thing, it allows you to input a single English
              word and get a single word translation into just about all the big
              Germanic languages.
              Example, if you put in English "walk", you'd get
              EN walk
              NL lopen
              DE gehen
              DA gå
              NO gå
              SV gå

              I'd like to hope or assume that some effort has been put into Google's
              translation engine to create the most likely translation in cases of a
              complete lack of context.
              BTW, just by way of illustration of how dubious I think such an
              enterprise could be, if you try to translate DE gehen back to EN, you
              don't get "walk", you get "go".

              I'd also suggest maybe using an "uninvolved" language as the standard or
              base dictionary. The FS word has to be translated into at least one
              other language for it to have meaning and precedent of how it's meant to
              be used. It might make the analysis less murky if initially that
              language was not one of the source languages but a fixed point outside
              of them. Because English is used as one of the sources, maybe the master
              FS dictionary should be in another language. Let's call this the
              "Master" language. If you tried to create the language from the point of
              view of another Germanic language, say German, you might end up with
              different results. E.g. translating DE "gehen" we get
              EN go
              NL gaan
              DE gehen
              DA start -- WTF?? I have my suspicions that Google Translate is doing
              some translations via a 3rd language (English of course)
              NO dra
              SV gå


              Let's pick an outsider language -- Spanish for the sake of argument.
              So maybe we'd be looking for the FS word for Spanish "andar". We'd get:
              EN walk
              NL lopen
              DE gehen
              DA gå
              NO gå
              SV gå


              Anyway one thing that I fear would happen would be that if you take a
              bunch of synonyms or even similar-meaning words in your "Master"
              language, you'd for each synonym and near-synonym get a somewhat
              different mix of words in the Germanic languages. Not a totally
              different mix of words, just somewhat different, but with some words the
              same. So for each of those input words from the Master language, you
              arrive at a subtly different FS word. So you might end up with some FS
              words such as *go, *ga, *glo, *lo etc., all meaning essentially the
              same thing.
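
One way such near-duplicates could be caught during review is to flag generated forms that sit within a small edit distance of each other. The sketch below is only an illustration: `near_duplicates` and its `max_distance` threshold are hypothetical, and the input is just the four example forms from this message.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def near_duplicates(forms, max_distance=1):
    """Pairs of forms that differ by at most `max_distance` edits."""
    return [
        (a, b)
        for a, b in combinations(forms, 2)
        if levenshtein(a, b) <= max_distance
    ]


# The example forms from the message, without the asterisks.
forms = ["go", "ga", "glo", "lo"]
print(near_duplicates(forms))
# prints [('go', 'ga'), ('go', 'glo'), ('go', 'lo'), ('glo', 'lo')]
```

Flagged pairs would still need a human decision about which form to keep, since short words are often one edit apart without being synonyms.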

            • cmore123579
              Message 6 of 7 , Sep 2, 2008
                I like your idea of using an "outside" language like Spanish in the
                Google translations (subject of course to the obvious flaws in
                Google's translation as you describe below).

                I will try it on a couple of words and see what I come up with.

                Thanks for the advice!

                --- In folkspraak@yahoogroups.com, David Parke <parked@...> wrote:
                >
                > If that's how you want to proceed, it'd make an interesting experiment,
                > and I'd like to see the results -- I'm just not sure if they would be
                > usable -- I foresee a few hundred practical words and thousands upon
                > thousands of unrecognisable mush-words.
                >
                > I would suggest using Google Translate as your source of translations.
                > For one thing, that could be your arbitrary criterion and therefore
                > consistent. For another thing, it allows you to input a single English
                > word and get a single-word translation into just about all the big
                > Germanic languages.
                > Example, if you put in English "walk", you'd get
                > EN walk
                > NL lopen
                > DE gehen
                > DA gå
                > NO gå
                > SV gå
                >
                > I'd like to hope or assume that some effort has been put into Google's
                > translation engine to create the most likely translation in cases of a
                > complete lack of context.
                > BTW, just by way of illustration of how dubious I think such an
                > enterprise could be, if you try to translate DE gehen back to EN, you
                > don't get "walk", you get "go".
                >
                > I'd also suggest maybe using an "uninvolved" language as the standard or
                > base dictionary. The FS word has to be translated into at least one
                > other language for it to have meaning and a precedent for how it's meant to
                > be used. It might make the analysis less murky if initially that
                > language was not one of the source languages but a fixed point outside
                > of them. Because English is used as one of the sources, maybe the master
                > FS dictionary should be in another language. Let's call this the
                > "Master" language. If you tried to create the language from the point of
                > view of another Germanic language, say German, you might end up with
                > different results. E.g. translating DE "gehen" we get
                > EN go
                > NL gaan
                > DE gehen
                > DA start -- WTF?? I have my suspicions that Google Translate is doing
                > some translations via a 3rd language (English of course)
                > NO dra
                > SV gå
                >
                >
                > Let's pick an outsider language -- Spanish for the sake of argument.
                > So maybe we'd be looking for the FS word for Spanish "andar". We'd get:
                > EN walk
                > NL lopen
                > DE gehen
                > DA gå
                > NO gå
                > SV gå
                >
                >
                > Anyway, one thing that I fear would happen is that if you take a
                > bunch of synonyms or even similar-meaning words in your "Master"
                > language, you'd get, for each synonym and near-synonym, a somewhat
                > different mix of words in the Germanic languages. Not a totally
                > different mix of words, just somewhat different, with some words the
                > same. So for each of those input words from the Master language, you
                > arrive at a subtly different FS word. So you might end up with some FS
                > words such as *go, *ga, *glo, *lo etc., all meaning essentially the
                > same thing.
              • David Parke
                Message 7 of 7 , Sep 3, 2008
                  I think you could encounter some difficulty. Take for example the German word Zeit and
                  its close cognate in Dutch, tijd.
                  There isn't much in common between "Zeit" and "tijd". In fact, if you don't know about the
                  common differences and similarities between DE and NL, you would be forgiven for
                  equating DE Zeit with NL zijde.
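
To put rough numbers on the Zeit/tijd point, here is a minimal Python sketch. It assumes plain character-level Levenshtein distance with unit costs on lower-cased spellings; the function name levenshtein is only illustrative and is not taken from any programme discussed in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Compare DE "Zeit" with its true cognate and with the look-alike.
for other in ("tijd", "zijde"):
    print("zeit", other, levenshtein("zeit", other))
```

Under these assumptions both distances come out as 4, so raw spelling distance alone cannot tell the real cognate (tijd) from the unrelated look-alike (zijde). A distance that knew about regular sound correspondences would be needed for that, and that is beyond this sketch.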

                  I had another idea of what you could do with a statistical analysis. You could use it to
                  identify possible or likely cognates in lists of words with the same meaning in several
                  languages.
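
One way the cognate-spotting idea might be sketched, purely as an assumption-laden illustration: score every pair of words in a same-meaning list by Levenshtein distance divided by the longer word's length, and flag pairs under a cut-off as candidates for a human to check. The names normalized_distance and likely_cognate_pairs and the 0.5 threshold are hypothetical choices, not something agreed in this thread. The word list is the "walk" example quoted earlier.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to 0..1 by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def likely_cognate_pairs(words: dict, threshold: float = 0.5) -> list:
    """Return (lang1, lang2, score) for pairs at or below the (arbitrary) threshold."""
    pairs = []
    for (l1, w1), (l2, w2) in combinations(words.items(), 2):
        score = normalized_distance(w1.lower(), w2.lower())
        if score <= threshold:
            pairs.append((l1, l2, score))
    return sorted(pairs, key=lambda p: p[2])  # closest pairs first


walk = {"EN": "walk", "NL": "lopen", "DE": "gehen",
        "DA": "gå", "NO": "gå", "SV": "gå"}
for l1, l2, score in likely_cognate_pairs(walk):
    print(l1, l2, round(score, 2))
```

With this list only the three Scandinavian forms pair up. The output is a shortlist of candidates rather than a verdict: as the Zeit/tijd case shows, similar spelling is only a hint, and the threshold would need tuning against known cognate sets.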


                  --- In folkspraak@yahoogroups.com, "cmore123579" <charlesmore@...> wrote:
                  >
                  > I like your idea of using an "outside" language like Spanish in the
                  > google translations (subject of course to the obvious flaws in
                  > google's translation as you describe below).
                  >
                  > I will try it on a couple of words and see what I come up with.
                  >
                  > Thanks for the advice!