
Fwd: [Corpora-List] New features at BNC/VIEW: view.byu.edu

  • Olga Lashevskaia
    Oct 15, 2005
      This is a forwarded message from Corpora List

      ===8<==============Original message text===============
      BNC/VIEW is a new architecture and interface for the 100-million-word
      British National Corpus. It is freely available on the web at
      http://view.byu.edu

      A number of new features have recently been added to the corpus. They
      include the following:

      ** 1) CHARTS
      You can now see (in graphical form) the frequency of a word, phrase, or
      grammatical construction in each of the six major registers (e.g.
      spoken, academic) and in each of their sub-registers (sermons, poetry,
      medical, etc.).

      ** 2) IMPROVED SEARCHES FOR COLLOCATES
      The search window is now much wider when you search for collocates /
      surrounding words: up to ten words on the left and on the right.
      Examples include the most common nouns near [kitchen], a comparison of
      the nouns near [uncover] with those near [reveal], and a comparison of
      the nouns near [chair] in the fiction and academic registers.
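As a rough illustration of what a ten-word collocate window means, the sketch below counts every word within `span` tokens on either side of each occurrence of a node word. It is a hypothetical Python example, not BNC/VIEW's actual implementation: the function name `window_collocates` is made up, and it ignores part-of-speech filtering (e.g. restricting collocates to nouns).

```python
from collections import Counter


def window_collocates(tokens, node, span=10):
    """Count the words within `span` tokens to the left and right of each
    occurrence of `node` (hypothetical helper, not part of BNC/VIEW)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Clip the window at the start and end of the token list.
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


tokens = "the old kitchen table stood in the kitchen".split()
print(window_collocates(tokens, "kitchen", span=2))
```

With `span=2` the two occurrences of "kitchen" are far enough apart that their windows do not overlap; with the default `span=10` each occurrence falls inside the other's window and is counted as a collocate too.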

      ** 3) CUSTOMIZED, USER-DEFINED LISTS
      You can create an unlimited number of customized lists containing words
      that are related in any way you might imagine. These lists are stored
      via the web interface, so you can re-use them at any point in the
      future.

      ** 4) SORTING BY RELEVANCE (MODIFIED Z-SCORE)
      You can now sort by relevance, which gives a much better picture of
      which words are most strongly associated with each other. This type of
      query, which is similar to a z-score calculation, takes into account
      the overall frequency of each collocate and filters out high-frequency
      "noise" words.

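The announcement does not spell out the exact "modified" z-score BNC/VIEW uses, so the following is only a sketch of a common textbook collocation z-score. It assumes the expected count is estimated from the collocate's frequency in the rest of the corpus and the total window size (20 positions for ten words on each side); the function name and all sample figures are hypothetical.

```python
import math


def collocation_z(observed, node_freq, coll_freq, corpus_size, span=20):
    """Textbook-style collocation z-score (an assumption; not necessarily
    the 'modified' score BNC/VIEW uses).

    observed    -- times the collocate occurs in the node's windows
    node_freq   -- corpus frequency of the node word
    coll_freq   -- corpus frequency of the collocate
    corpus_size -- total tokens in the corpus
    span        -- window positions per node hit (10 left + 10 right = 20)
    """
    # Probability of meeting the collocate at a random non-node position.
    p = coll_freq / (corpus_size - node_freq)
    # Expected co-occurrences if the two words were unrelated.
    expected = p * node_freq * span
    return (observed - expected) / math.sqrt(expected * (1 - p))


N = 100_000_000  # corpus size in tokens
# Invented figures for a node word that occurs 1,000 times:
z_the = collocation_z(observed=1200, node_freq=1000, coll_freq=6_000_000, corpus_size=N)
z_sink = collocation_z(observed=30, node_freq=1000, coll_freq=5_000, corpus_size=N)
```

In these made-up figures "the" co-occurs 40 times more often than "sink", yet its z-score is close to 0 while "sink" scores about 29. That is the kind of "noise" filtering the relevance sort is meant to provide.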

      Features that were already available include the following:


      ** 5) BASIC QUERIES BY SUBSTRING, WORD, PHRASE, AND PART OF SPEECH
      For example, you can find the frequency of a given word, set of words,
      phrase, substring (e.g. *heart*), part of speech (e.g. [av*] [aj*]: very
      clear), or combination of these (e.g. [vv*] it/them [avp]: took them
      away, give it up).
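To make the query notation concrete, here is a toy matcher over a part-of-speech-tagged sentence. It assumes BNC-style CLAWS C5 tags (e.g. VVD for a past-tense lexical verb, AVP for an adverb particle), treats a bracketed element as a tag pattern with `*` wildcards, and treats a bare element as one or more slash-separated word patterns. The function names are hypothetical, and this is not how BNC/VIEW actually evaluates queries.

```python
from fnmatch import fnmatchcase


def element_matches(element, word, tag):
    """Match one query element against one (word, tag) token."""
    if element.startswith("[") and element.endswith("]"):
        # [vv*] -> shell-style wildcard match on the POS tag.
        return fnmatchcase(tag.lower(), element[1:-1].lower())
    # it/them or *heart* -> wildcard match on any of the word alternatives.
    return any(fnmatchcase(word.lower(), alt.lower()) for alt in element.split("/"))


def find_matches(tagged, query):
    """Return every contiguous token sequence that matches the query."""
    pattern = query.split()
    hits = []
    for i in range(len(tagged) - len(pattern) + 1):
        window = tagged[i:i + len(pattern)]
        if all(element_matches(el, w, t) for el, (w, t) in zip(pattern, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


tagged = [("She", "PNP"), ("took", "VVD"), ("them", "PNP"), ("away", "AVP"),
          ("and", "CJC"), ("gave", "VVD"), ("it", "PNP"), ("up", "AVP")]
print(find_matches(tagged, "[vv*] it/them [avp]"))  # ['took them away', 'gave it up']
```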

      ** 6) REGISTER-BASED QUERIES
      You can find the frequency of words and phrases in any combination of
      registers that you define on the fly, e.g. spoken, academic, poetry, or
      medical. In addition, you can compare registers: for example, verbs
      that are more common in legal or medical texts, phrases like [I * that]
      that are more common in conversation than in non-fiction texts, nouns
      near "break" (v) that are found primarily in academic writing, etc.

      ** 7) FREQUENCY IN ALL 70 REGISTERS
      You can click on a word or phrase in any of the result sets to see its
      frequency in all 70 registers. The results are initially sorted by
      normalized frequency, and you can re-sort them by register name, number
      of tokens, etc.
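Normalized frequency is usually expressed per million words, which lets registers of very different sizes be compared directly. The sketch below assumes that convention (the announcement does not say which base BNC/VIEW uses) and uses invented register sizes and hit counts.

```python
def per_million(hits, register_size):
    """Occurrences per million words of running text."""
    return hits / register_size * 1_000_000


# Invented figures: (hits for the search term, words in the register).
registers = {
    "spoken": (500, 10_000_000),
    "academic": (900, 15_000_000),
    "fiction": (300, 16_000_000),
}

# Initial sort: highest normalized frequency first.
by_freq = sorted(
    ((name, per_million(hits, size)) for name, (hits, size) in registers.items()),
    key=lambda row: row[1],
    reverse=True,
)
# Re-sort by register name, as the interface allows.
by_name = sorted(by_freq, key=lambda row: row[0])
```

With these figures academic comes first at 60 per million, then spoken at 50 and fiction at 18.75.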

      ** 8) COMPARING COLLOCATES WITH RELATED WORDS
      For example, nouns that occur after [utter] but not after [sheer] or
      [total], adjectives within ten words of [man] that do not occur near
      [woman], etc. All of this is done via one simple query from the web
      interface. This may be quite useful for language learners, allowing
      them to compare the uses of competing synonyms.
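A comparison like "nouns after [utter] but not after [sheer] or [total]" boils down to a set difference over collocate counts. The sketch below assumes the collocate frequencies have already been collected (for instance with a window count); the function name and all counts are hypothetical.

```python
from collections import Counter


def exclusive_collocates(target, competitors):
    """Return (word, count) pairs found with the target word but never with
    any competitor, most frequent first (ties broken alphabetically)."""
    excluded = set()
    for counts in competitors:
        excluded.update(w for w, n in counts.items() if n > 0)
    return sorted(
        ((w, n) for w, n in target.items() if w not in excluded),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Invented counts of nouns immediately following each adjective.
utter = Counter({"nonsense": 12, "disbelief": 7, "silence": 5})
sheer = Counter({"silence": 3, "size": 9})
total = Counter({"silence": 4, "disaster": 6})

print(exclusive_collocates(utter, [sheer, total]))  # [('nonsense', 12), ('disbelief', 7)]
```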

      ** 9) INTEGRATION WITH WORDNET (SEMANTICALLY-BASED QUERIES)
      For example:
      [=bad] [nn*]: any synonym of [bad] followed by any noun, e.g. wicked
      witch, foul play, terrible storm, etc.
      my/your [@body]: [my] or [your] followed by a part of the body, e.g. my
      leg, your shoulder, etc.
      [<eat] the [<food]: a more specific word for [eat] followed by a more
      specific word for [food], e.g. devour the hamburger, munch the cookies.
      Again, all of this is done via one simple query from the search form.

      ** 10) COMPLETE RESULTS AND FAST QUERIES
      Unlike some other interfaces for the BNC, this one allows you to find
      all of the matching strings, not just those that occur three times or
      more. In addition, queries of the 100-million-word corpus are quite
      fast: most searches finish in one to two seconds or less.

      Again, the corpus is freely available at http://view.byu.edu. Please
      feel free to email me with any questions or comments.

      Mark Davies
      Brigham Young University


      =================================================

      Mark Davies
      Assoc. Prof., Linguistics
      Brigham Young University
      (phone) 801-422-9168 / (fax) 801-422-0906

      http://davies-linguistics.byu.edu

      ** Corpus design and use // Linguistic databases **
      ** Historical linguistics // Language variation **
      ** English, Spanish, and Portuguese **

      =================================================