
FW: Web-Based Language Documentation and Description

  • Popplestone, Ann
    Message 1 of 1 , Jul 27, 2000

      -----Original Message-----
      From: Steven Bird [mailto:sb@...]
      Sent: Thursday, July 27, 2000 1:27 PM
      To: ANTHRO-L@...
      Subject: CFP: Web-Based Language Documentation and Description


                              CALL FOR PARTICIPATION


                 Web-Based Language Documentation and Description

                      Philadelphia USA, 12-15 December 2000

                      http://www.ldc.upenn.edu/exploration/


                   Institute for Research in Cognitive Science
                            University of Pennsylvania

       Organizers: Steven Bird (U Penn) and Gary Simons (SIL International)


      [The full version of this abridged CFP is available from the above page.]

      This workshop will lay the foundation of an open, web-based
      infrastructure for collecting, storing and disseminating the primary
      materials which document and describe human languages, including
      wordlists, lexicons, annotated signals, interlinear texts, paradigms,
      field notes, and linguistic descriptions, as well as the metadata
      which indexes and classifies these materials.  The infrastructure will
      support the modeling, creation, archiving and access of these
      materials, using centralized repositories of metadata, data, best
      practice guidelines, and open software tools.

      BACKGROUND

      Recent years have witnessed dramatic advances in mass storage and
      web delivery technologies, making it possible to house virtually
      unlimited quantities of speech data online, and to disseminate this
      data over the web.  The development of XML and Unicode greatly
      facilitates the interchange and reuse of structured multimodal and
      multilingual data and the development of interoperating software
      tools.  These developments are having a pervasive influence on the way
      primary linguistic data are gathered, stored, analyzed and
      disseminated, as demonstrated by the initiatives surveyed on the
      linguistic exploration page (http://www.ldc.upenn.edu/exploration/),
      and the papers presented at the Linguistic Exploration Workshop at
      the Chicago LSA Meeting (http://www.ldc.upenn.edu/exploration/LSA/).

      CHALLENGES

      With these new technological opportunities come concomitant needs
      and challenges for modeling, creating, archiving and accessing data:

      I   Data Models.  A diverse range of data types are required in language
          documentation and linguistic fieldwork, including word lists,
          lexicons, annotated signals, writing system documentation,
          interlinear texts, paradigms, field notes, and linguistic
          descriptions.  We need flexible and general models for these data
          types (including links between them), and good ways to represent
          information which is either partial, uncertain, evolving, or
          disputed.  We need to develop a consensus in the community
          regarding best practice for modeling these kinds of data, to
          ensure maximal reusability of data and software.

      II  Data Archives.  Whether just the private collection of a single
          researcher or a large and centralized repository, language data
          needs to be stored and reused.  To support this, we need durable
          and open storage and interchange formats that embody the best
          practice consensus.  We need to convert (parochial) 8-bit
          character codings to Unicode, using a general tool for character
          conversion along with a host of conversion tables for specific
          character sets.  We also need to convert markup into the best
          practice formats we have defined.  We need a mechanism to support
          durable citation of data, so that document authors do not need to
          duplicate all the data they reference just to be sure that the
          links will not break.  More generally, we need a metadata standard
          for indexing the resources, regardless of format and availability,
          and a wide-coverage index conforming to the standard, so that
          someone interested in a particular language or region can find all
          the electronic resources that are pertinent to it, without having
          to determine how each of several different archives has named and
          classified its holdings.

      III Data Creation.  Now that mass storage is so inexpensive,
          researchers are creating large amounts of digital data covering
          the types listed above.  Both the number and scale of these
          collection efforts are growing rapidly.  We need software tools
          supporting data creation, conforming with best practice, and
          covering primary collection of textual data (wordlists, texts) and
          recordings (audio, video, physiological), along with transcription
          and annotation of the primary materials conforming to a broad
          range of descriptive and analytical practices.

      IV  Data Access.  Once data has been created and archived, there exist
          a variety of access modes.  A region of data is identified by
          browsing, by launching a query, or by following a reference.  The
          selection is displayed according to appropriate conventions and
          styles, or converted into some other form (e.g. for statistical
          analysis and visualization).  The selection may be corrected,
          imported into a document, analyzed, and annotated, leading to the
          creation of secondary data and/or the elicitation of new primary
          data.  We need to develop suitable delivery mechanisms including
          stylesheets, conversion tools, indexing methods, and query
          languages, which encompass the needs for security and privacy.  We
          need standard application programming interfaces and a library of
          reusable components, to support the development of software for
          new modes of access.
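
      As an illustration of the table-driven character conversion called
      for under II, the following is a minimal sketch only.  The table
      name, the byte values, and the convert() helper are all hypothetical
      and do not describe any existing tool or real legacy encoding.

```python
from typing import Mapping

# Hypothetical conversion table for an imaginary legacy 8-bit font encoding.
# The byte values chosen here are illustrative assumptions only.
LEGACY_TO_UNICODE = {
    0x80: "\u014B",  # LATIN SMALL LETTER ENG
    0x81: "\u0254",  # LATIN SMALL LETTER OPEN O
    0x82: "\u025B",  # LATIN SMALL LETTER OPEN E
}


def convert(data: bytes, table: Mapping[int, str]) -> str:
    """Convert 8-bit legacy text to a Unicode string using a lookup table.

    Bytes below 0x80 are assumed to be plain ASCII and pass through
    unchanged; bytes from 0x80 upward must appear in the table.
    """
    out = []
    for byte in data:
        if byte < 0x80:
            out.append(chr(byte))
        elif byte in table:
            out.append(table[byte])
        else:
            # Fail loudly rather than silently corrupting field data.
            raise ValueError(f"no mapping for byte 0x{byte:02X}")
    return "".join(out)
```

      Keeping the mapping in a separate table means that one general
      routine could serve many character sets, each supplying its own table.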


      Many of the activities listed above are already underway; the lure of
      the technology is great despite the lack of infrastructure.  However,
      it is beyond the capacity of any single individual or institution to
      develop this infrastructure of standards and tools on their own.  There
      is a pressing need for close cooperation between these initiatives, so
      that scarce human, software and data resources are used optimally.

      WORKSHOP OBJECTIVES

      This workshop will lay the foundation of an open, web-based
      infrastructure for collecting, storing and disseminating the primary
      materials which document and describe human languages.  The
      infrastructure will support the modeling, creation, archiving and
      access of these materials, using centralized repositories of
      metadata, data, best practice guidelines, and open software tools.

      To meet this goal, we have identified three main objectives which can
      be substantially achieved at the present time:

      Objective 1: to develop a comprehensive framework which identifies all
          the infrastructural needs, designates appropriate roles for
          existing results as pieces of an overall solution, and sets out a
          coordinated response to the remaining challenges.

      Objective 2: to found centralized repositories (and nominate existing
          ones) for housing components of the infrastructure, so that data,
          tools, formats and standards can be collected, indexed, and made
          available to the community.

      Objective 3: to begin construction of the repositories, by identifying
          the contribution of past and present activities by the
          participants and by other individuals and institutions, and
          by gathering the results and their documentation.

      CALL FOR PARTICIPATION

      The workshop will include paper presentations and working sessions to
      develop the infrastructure.  Interested members of the community are
      invited to participate in the workshop.  There is a limit on available
      places, and participants will be identified on the basis of submitted
      abstracts.  Funding is available for authors of accepted papers.

      Abstracts.  One-page abstracts are invited which describe substantive
      contributions to the repositories, or which discuss concrete problems
      for web-based language documentation and description, and describe
      possible solutions.

      Papers.  Authors of accepted abstracts will be asked to prepare a
      2,000-3,000 word paper plus associated materials.

      Address submissions to: Steven.Bird@..., Gary_Simons@...

      Timetable.

      Friday 1 September     Abstract deadline
      Friday 29 September    Acceptance notification
      Friday 24 November     Paper deadline
      12-15 December         Workshop

      IMPORTANT: FOR FURTHER INFORMATION

      Intending authors should consult the EXTENDED CFP, available from the
      linguistic exploration page (http://www.ldc.upenn.edu/exploration/).
      To be sure of receiving future announcements, please subscribe to the
      LINGUISTIC-EXPLORATION mailing list, referenced from that page.

      --
      Steven Bird                    Gary Simons
      University of Pennsylvania     SIL International
      Steven.Bird@...      Gary_Simons@...
      http://www.ldc.upenn.edu/sb    http://www.sil.org/SIL/roster/simons.htm

