
Re: Code-switching corpora

  • leonikotze
    Message 1 of 4, Apr 4, 2012
      Dear Gustav,

      I suppose literature would be your best bet. I have a play by Athol Fugard entitled 'Boesman and Lena'. It is FULL of code-switching between English and Afrikaans. I think the play might be ideal for you, as the book in which the play appears has a glossary that explains the meaning of the Afrikaans terms. I will scan the play for you if you wish and send it to your private email. Kindly let me know whether you would like me to do this.

      Kindest Regards
      Leoni Kotze, South Africa

      --- In code-switching@yahoogroups.com, Gustav Eje Henter <ghe@...> wrote:
      >
      > Dear code-switching experts,
      >
      > My name is Gustav Henter, and I am a Ph.D. student in signal processing and
      > machine learning at KTH - The Royal Institute of Technology in Stockholm,
      > Sweden. Currently, I am looking for a large body of roman-alphabet text (>2
      > million characters, say) with significant code-switching. I intend to use the
      > data to build a character-level Markov chain model for some experiments with a
      > computer algorithm I am researching.
      >
      > Do you know where such a corpus or corpora can be obtained in digital form?
      >
      > I have already experimented with Vivian de Klerk's Xhosa-English corpus (thanks,
      > Vivian!), but it had too low a rate of code-switching (about 1 switch for every
      > 500 words) to show any significant difference in my experiments.
      >
      > Best regards,
      > Gustav Henter
      >
      > ======================================================================
      > Gustav Eje Henter, Ph.D. student
      > Sound and Image Processing Lab, EES
      > KTH - Royal Institute of Technology
      > Osquldas väg 10, SE-100 44 Stockholm, SWEDEN
      > E-mail: gustav.henter@...  Web: http://www.ee.kth.se/sip/
      > Phone: (+46) 8 790 7420  Office: A:327, floor 3
      > ======================================================================
      >
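The code-switching rate mentioned in the quoted request (about 1 switch for every 500 words, i.e. 0.002 switches per word) can be estimated from any corpus in which each word carries a language label. A minimal sketch, assuming such per-word labels already exist; the function name and the tag strings ("en", "af", ...) are hypothetical and not part of any particular corpus's tooling:

```python
from typing import Sequence


def switches_per_word(tags: Sequence[str]) -> float:
    """Estimate the code-switching rate from per-word language tags.

    `tags` is a hypothetical word-level annotation such as ["xh", "xh", "en"].
    A switch is counted whenever two adjacent words carry different tags,
    and the count is divided by the total number of words.
    """
    if not tags:
        return 0.0
    switches = sum(1 for prev, cur in zip(tags, tags[1:]) if prev != cur)
    return switches / len(tags)


# 499 English words followed by one Afrikaans word: one switch in 500 words.
rate = switches_per_word(["en"] * 499 + ["af"])
```

Dividing by the number of words rather than the number of word boundaries (one fewer) is a choice made here for simplicity; for corpora of realistic size the difference is negligible.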
    • Gustav Eje Henter
      Message 2 of 4, Apr 5, 2012
        Hello Anna,

        This was a very interesting link indeed. I am currently experimenting with the
        BilingBank Blum Snow corpus and have obtained an interesting Markov chain model
        whose output regularly mixes English and Hebrew.

        Thanks to everyone who has shared their advice and/or data so far!

        Best regards,
        Gustav


        PS. Just for fun, here is a (lightly edited) random sample from a
        character-level 5-gram Markov chain based on the Blum Snow data:

        eha'avir et ze anashim your fork. al ha# matay maspik. bananot ve# pit'om here's
        a twenty-five about he's probably where write lots of them tembelit. lo ta'avir
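A character-level 5-gram Markov chain like the one behind the sample above predicts each character from the four characters preceding it (a Markov chain of order 4). A minimal sketch of training and sampling such a model, assuming plain maximum-likelihood counts with no smoothing or back-off; the function names are hypothetical and this is not the code that produced the sample:

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict


def train_char_ngram(text: str, n: int = 5) -> dict[str, Counter]:
    """Count, for every (n-1)-character context, which characters follow it."""
    order = n - 1
    model: defaultdict[str, Counter] = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return dict(model)


def sample(model: dict[str, Counter], n: int, seed: str, length: int,
           rng: random.Random) -> str:
    """Extend `seed` by up to `length` characters drawn from the model.

    `seed` should contain at least n-1 characters. Generation stops early if
    the current context never occurred in the training text (no back-off).
    """
    order = n - 1
    out = list(seed)
    for _ in range(length):
        # The last `order` characters form the conditioning context.
        context = "".join(out[len(out) - order:])
        counts = model.get(context)
        if not counts:
            break
        chars = list(counts)
        weights = [counts[c] for c in chars]
        # Draw the next character in proportion to its observed frequency.
        out.append(rng.choices(chars, weights=weights, k=1)[0])
    return "".join(out)
```

For example, `sample(train_char_ngram(corpus, 5), 5, corpus[:4], 200, random.Random(1))` generates up to 200 further characters, where `corpus` is a hypothetical string holding the transcribed dialogue. Because the counts are unsmoothed, every 5-character window ending at a generated character also occurs somewhere in the training text.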

        On 2012-04-04 18:14, Anna S wrote:
        > Dear Gustav,
        >
        > I recommend having a look at www.talkbank.org, especially http://talkbank.org/data/BilingBank/
        >
        > On that website you will find monolingual and bilingual spoken data, sometimes even with the original audio files available.
        >
        > I used the Eppler corpus for a study on the grammatical side of code-switching; it contains German-English code-switching. On the website you will also find other language pairs, some even with switching among more than two languages.
        >
        > Good luck!
        >
        > Best regards from Bremen, Germany,
        >
        > Anna (M.A. linguistics)
        >
        > To: code-switching@yahoogroups.com
        > From: ghe@...
        > Date: Tue, 3 Apr 2012 18:34:48 +0200
        > Subject: [code-switching] Code-switching corpora