
Message 41: Technical Challenges -- 7 Points

  • Marc
    Nov 22, 2002
      Hi gang,
      I've been using the following categories/points for exploring the technical
      challenges that we face in trying to create a Christian Digital Library:

      1. Storage (and Acquisition).
      2. User interface.
      3. Classification and indexing.
      4. Information retrieval.
      5. Content delivery.
      6. Presentation.
      7. Administrative.

      Clearly, we need a two-pronged effort to prepare both large quantities of
      'holdings' (the actual e-materials, meta/card data, and formats/delivery)
      and the actual system design (from UIs to Data Methods to Admin).

      Luckily we do not have to totally reinvent the (e-)wheel -- at this point we
      are already starting to have LOTS of electronic materials, and some of the
      first prototype system components/parts are being created and
      explored/tried. Progress is good; many challenges/decisions remain, but we
      are well underway!!

      Thus keep up the good work ... get out there and find/make e-stuff ;-)


      = = = = = = = = = = = = = = = =
      Technical Challenges
      Our digital library development faces challenges in several areas,
      including the main points summarized here.

      1. Storage.

      A digital library's storage system must be capable of storing a large
      amount of data in a variety of formats and accessing this data as quickly
      as possible. Text-only documents (stored in formats such as ASCII, LaTeX,
      HTML, SGML, and PostScript) are by far the easiest to store. Digital audio
      and video are more difficult to store because they require significantly
      more storage space and their delivery is time-dependent.

      A typical digital library uses a variety of database-management
      systems. Current DBMSs range from relational and extended relational
      systems to object-oriented database systems. Relational DBMSs are most
      often used for the storage of metadata and indexes with attributes that
      contain pointers to files in a file system. Most of the commercial RDBMSs
      also support the storage of Binary Large Objects (BLOBs); in an Oracle
      RDBMS, BLOBs can be as large as 2 Gbytes. Object-oriented database systems
      are slowly gaining acceptance and overcoming earlier performance and
      implementation problems. An OODBS can make it easier to model, store, and
      work with real-world objects such as images or maps.
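
      As a minimal sketch of the relational pattern described above, the
      example keeps descriptive metadata in table columns while the content
      itself stays in the file system, referenced by a path attribute. Python's
      built-in sqlite3 module stands in for a commercial RDBMS; the holdings
      table, its columns, the sample records, and the file paths are all
      hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a production RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE holdings (
        id        INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author    TEXT,
        format    TEXT,          -- e.g. 'HTML', 'PostScript', 'MPEG'
        file_path TEXT NOT NULL  -- pointer to the content in the file system
    )
    """
)

# Hypothetical sample records; the paths are placeholders, not real files.
conn.executemany(
    "INSERT INTO holdings (title, author, format, file_path) VALUES (?, ?, ?, ?)",
    [
        ("Confessions", "Augustine", "HTML", "/library/text/confessions.html"),
        ("Institutes", "Calvin", "PostScript", "/library/text/institutes.ps"),
        ("Hymn recording", None, "AU", "/library/audio/hymn01.au"),
    ],
)

def find_paths(conn, fmt):
    """Search the metadata, then return (title, file path) pairs for matches."""
    rows = conn.execute(
        "SELECT title, file_path FROM holdings WHERE format = ? ORDER BY title",
        (fmt,),
    )
    return rows.fetchall()

print(find_paths(conn, "HTML"))
# prints [('Confessions', '/library/text/confessions.html')]
```

      The query touches only the small metadata rows; the large file is opened
      separately, and only once the user has chosen it.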

      Compression techniques save storage. For text-only documents, the
      Unix compress or freeware gzip utilities reduce file size by anywhere
      from 10 to 60 percent. Several compression standards exist for digital
      images (JPEG), audio (μ-law), and video (MPEG).
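
      How much gzip saves depends heavily on how redundant the text is. A
      minimal sketch using Python's standard gzip module measures the saving
      for a sample string; the sample is made up and deliberately repetitive,
      so real documents will usually compress less than this.

```python
import gzip

def compression_savings(text: str) -> float:
    """Return the fraction of storage saved by gzip-compressing `text`."""
    raw = text.encode("utf-8")
    packed = gzip.compress(raw)
    return 1 - len(packed) / len(raw)

# Illustrative sample: repeated prose compresses far better than typical text.
sample = "In the beginning was the Word, and the Word was with God. " * 50
original_size = len(sample.encode("utf-8"))
saving = compression_savings(sample)
print(f"{original_size} bytes, saved {saving:.0%}")

# gzip is lossless: decompression must reproduce the original text exactly.
assert gzip.decompress(gzip.compress(sample.encode("utf-8"))).decode("utf-8") == sample
```

      Lossless compression is what text requires; the image and video
      standards mentioned above are typically used in lossy modes, trading
      some fidelity for much larger savings.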

      Digital library collections that are too large to store entirely on a
      disk use hierarchical storage mechanisms. In an HSM, the most frequently
      used data is kept on fast disks while less frequently used data is kept in
      near-line storage, such as an automated (for example, robotic) tape library. Using
      data-usage statistics, the HSM can automatically migrate data from tape
      (near-line) storage to disk (on-line) and back, as required.
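
      The migration decision can be sketched as a simple rule over access
      counts. This is a hypothetical illustration, not how any particular HSM
      product works: items accessed at least promote_at times in the current
      period move to disk, disk items accessed fewer than demote_at times move
      back to tape, and a disk capacity limit keeps only the busiest items
      on-line. The thresholds, sizes, and file names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    size_mb: int
    accesses: int       # accesses counted in the current statistics period
    tier: str = "tape"  # "disk" (on-line) or "tape" (near-line)

def migrate(items, disk_capacity_mb, promote_at=5, demote_at=1):
    """Hypothetical HSM pass: demote cold disk items, promote hot tape items."""
    # Demote cold items first so their disk space can be reused.
    for item in items:
        if item.tier == "disk" and item.accesses < demote_at:
            item.tier = "tape"
    used = sum(i.size_mb for i in items if i.tier == "disk")
    # Promote the most frequently used tape items while disk space remains.
    candidates = sorted(
        (i for i in items if i.tier == "tape" and i.accesses >= promote_at),
        key=lambda i: i.accesses,
        reverse=True,
    )
    for item in candidates:
        if used + item.size_mb <= disk_capacity_mb:
            item.tier = "disk"
            used += item.size_mb
    return items

items = [
    Item("sermons.mpg", 700, accesses=40),
    Item("hymnal.au", 50, accesses=12),
    Item("old_ledger.tif", 200, accesses=0, tier="disk"),
    Item("atlas.tif", 400, accesses=6),
]
for item in migrate(items, disk_capacity_mb=1000):
    print(item.name, item.tier)
```

      Here atlas.tif is used often enough to qualify for disk, but it stays on
      tape because the two busier items already fill most of the 1,000-Mbyte
      disk budget.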

      2. User interface.

      The user interface, perhaps the most important digital library component,
      must incorporate a wide variety of techniques to afford rich interaction
      between users and the information they seek. For computer workstations,
      graphical user interfaces such as X-Windows, Microsoft Windows, and
      Macintosh System 7 interfaces are the status quo.

      A user interface for digital libraries must display large volumes of
      data effectively. Typically the user is presented with one or more
      overlapping windows that can be resized and rearranged. In digital
      libraries, a large amount of data spread through a number of resources
      necessitates intuitive interfaces for users to query and retrieve
      information. The ability to smoothly change the user's perspective from
      high-level summarization information down to a specific paragraph of a
      document or scene from a film remains a challenge to user interface
      designers.

      3. Classification and indexing.

      Classification and indexing schemes are used to collect related content
      into groups that are intuitive to a user. Classifying and indexing objects
      is filled with pitfalls, however, because individual perceptions vary.
      Another complicating factor in indexing and classifying is the tremendous
      amount of potential content that remains to be indexed. It is clear that
      manual methods for classification are insufficient for all but the most
      trivial digital libraries.

      Automated classification systems differ significantly in their
      approaches, depending on the type of content under consideration.
      Classifying short stories is quite different from classifying maps, both in
      terms of the mechanics involved and the appropriate classes. These
      distinctions make current automated classification efforts highly
      specialized.

      Automated document classification methods can be grouped into two
      general approaches, but neither can yet capture the meaning of words in the
      documents. Image classification approaches are conceptually different from
      those used for text classification. Although many domain-specific systems
      allow "content-based" querying, most are relegated to a very narrow range
      of images and may require the services of human classifiers. Video
      classification and indexing requires systems that can parse video into
      manageable portions, typically called camera shots. As with image
      classification, the type of classification and indexing performed on video
      is driven by the types of queries posed by users. The classification of
      audio, musical notation, and maps presents additional research challenges.
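
      Parsing video into camera shots is often approached by comparing
      consecutive frames and declaring a cut where they differ sharply. The
      sketch below uses one common heuristic, the histogram-difference method:
      each frame is reduced to a coarse intensity histogram, and a new shot
      starts where the normalized L1 distance between neighbouring histograms
      exceeds a threshold. The 8-pixel grayscale frames are synthetic and the
      0.5 threshold is an arbitrary assumption; real systems decode actual
      video and tune this value.

```python
def histogram(frame, bins=4, max_level=256):
    """Count the pixels of a grayscale frame (0-255 values) into coarse bins."""
    counts = [0] * bins
    width = max_level // bins
    for pixel in frame:
        counts[min(pixel // width, bins - 1)] += 1
    return counts

def hist_distance(h1, h2):
    """Normalized L1 distance (0 = identical, 1 = disjoint).

    Assumes both histograms come from frames with the same nonzero pixel count.
    """
    total = sum(h1)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2 * total)

def detect_shots(frames, threshold=0.5):
    """Return the index of the first frame of each detected shot."""
    starts = [0]
    hists = [histogram(f) for f in frames]
    for i in range(1, len(hists)):
        if hist_distance(hists[i - 1], hists[i]) > threshold:
            starts.append(i)
    return starts

# Synthetic frames: a dark scene, then a cut to a bright scene.
dark = [10, 20, 30, 40, 15, 25, 35, 45]
dark2 = [12, 22, 28, 42, 18, 24, 33, 44]
bright = [200, 210, 220, 230, 205, 215, 225, 235]
print(detect_shots([dark, dark2, bright, bright]))  # prints [0, 2]
```

      Histograms ignore where pixels are, so small camera motion does not
      trigger false cuts; the cost is that gradual transitions such as fades
      can slip under the threshold.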

      4. Information retrieval.

      The concepts underlying information retrieval were conceived long before
      computers and information systems were employed to store library materials.
      In the digital library domain, there are a variety of information-retrieval
      techniques, including metadata searching, full-text document searching, and
      content searching for other data types.
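
      Full-text searching is commonly built on an inverted index that maps
      each word to the documents containing it. A minimal sketch with naive
      tokenization (lowercase, letters only) and Boolean AND queries; the
      sample documents and their IDs are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Boolean AND query: IDs of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

docs = {
    "d1": "Grace and peace to you from God our Father.",
    "d2": "The peace of God, which passes all understanding.",
    "d3": "Faith, hope, and love abide.",
}
index = build_index(docs)
print(sorted(search_all(index, "peace God")))  # prints ['d1', 'd2']
```

      A query is answered by intersecting a few small sets rather than
      scanning every document, which is what makes full-text search practical
      over large collections.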

      The success of information retrieval can be measured in terms of the
      percentage of relevant and extraneous information retrieved. In practice,
      however, retrieval effectiveness is difficult to quantify precisely,
      because only an individual user can determine what is truly useful.
      Techniques to improve retrieval effectiveness include preprocessing
      documents to extract additional metadata before storing them in a
      document database.
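
      The share of relevant items among those retrieved is what the
      information-retrieval literature calls precision; it is usually paired
      with recall, the share of all relevant items that were actually
      retrieved. A minimal sketch with made-up relevance judgments (as noted
      above, deciding what is truly relevant is ultimately up to the user):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 items returned, 3 of them relevant, 5 relevant items exist.
p, r = precision_recall(["a", "b", "c", "x"], ["a", "b", "c", "d", "e"])
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.75 recall=0.60
```

      In this example 25 percent of what was retrieved is extraneous
      (1 - precision), while 40 percent of the relevant material was missed
      (1 - recall).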

      Several researchers are focusing on automating the creation and
      maintenance of user profiles and applying these profiles to information
      retrieval. Software agents are an extension of filtering techniques,
      although filtering tends to imply passive mechanisms whereas the use of
      agents implies a more proactive approach. Many people have put forth
      definitions of software agents, ranging from an adaptable information
      filter to an autonomous program that works in conjunction with or on behalf
      of a human user. Software agents also embody the notion of improving over
      time as they record additional user actions and reactions.
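
      One simple way to maintain a user profile automatically is to keep a
      weight per term and nudge the weights whenever the user marks an item as
      useful or not. The sketch below is a hypothetical illustration of that
      idea, not a specific published agent design: items are scored by summing
      profile weights for their terms, and only items at or above a threshold
      are passed on. The class name, threshold, and learning rate are
      assumptions.

```python
from collections import defaultdict

class ProfileFilter:
    """Hypothetical adaptive filter: term weights learned from user feedback."""

    def __init__(self, threshold=1.0, rate=0.5):
        self.weights = defaultdict(float)  # term -> learned interest weight
        self.threshold = threshold         # minimum score for an item to pass
        self.rate = rate                   # how far one judgment moves a weight

    def score(self, terms):
        return sum(self.weights[t] for t in set(terms))

    def feedback(self, terms, liked):
        """Raise weights of terms in liked items, lower them for disliked ones."""
        delta = self.rate if liked else -self.rate
        for t in set(terms):
            self.weights[t] += delta

    def select(self, items):
        """Return titles of (title, terms) items whose score meets the threshold."""
        return [title for title, terms in items if self.score(terms) >= self.threshold]

f = ProfileFilter()
f.feedback(["puritan", "sermon"], liked=True)
f.feedback(["puritan", "biography"], liked=True)
f.feedback(["biography", "novel"], liked=False)
incoming = [
    ("Puritan sermons", ["puritan", "sermon"]),
    ("Modern novel", ["novel", "fiction"]),
]
print(f.select(incoming))  # prints ['Puritan sermons']
```

      Each judgment the user records shifts the profile a little, which is the
      "improving over time" behaviour described above in its simplest form.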

      5. Content delivery.

      Once an item of interest has been located in a digital library, it may be
      delivered in several ways. If the content is small, such as 100 pages of
      text or a 50-Kbyte image file, it may be delivered through the same channel
      used for information retrieval and querying. Content such as movies and
      software, however, demands much higher bandwidth. In these cases, delivery
      is over dedicated leased lines (for example, cable TV or videoconferencing
      systems) or satellite-based systems such as the Hughes Network Systems
      project (http://www.hns.com).

      Increased demands for networking bandwidth come from two main fronts.
      First, the number of digital library users will undoubtedly increase. If
      the Internet is any indication, exponential growth in the number of users
      will be the rule. Second, as the delivery of multimedia data becomes the
      norm, the demands for high bandwidth increase. However, high bandwidth, in
      and of itself, is not enough to support digital libraries. The intelligent
      use of bandwidth and the ability to guarantee bandwidth for a given time
      period are also required.
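
      Guaranteeing bandwidth for a given time period implies some form of
      admission control: a new reservation is accepted only if, at every
      moment in its window, the reservations already accepted plus the new one
      still fit within the link's capacity. A minimal sketch of that check,
      assuming a hypothetical 1,500 kbps link and reservations given as
      (start, end, kbps) with times in minutes and half-open windows.

```python
def peak_load(reservations, start, end):
    """Highest total rate (kbps) reserved at any instant in [start, end)."""
    # Load only rises where a reservation begins, so it suffices to check
    # `start` itself plus every reservation start that falls inside the window.
    points = {start} | {s for s, e, _ in reservations if start <= s < end}
    peak = 0
    for t in points:
        load = sum(rate for s, e, rate in reservations if s <= t < e)
        peak = max(peak, load)
    return peak

def admit(reservations, request, capacity_kbps):
    """Accept `request` (start, end, kbps) only if capacity is never exceeded."""
    start, end, rate = request
    if peak_load(reservations, start, end) + rate <= capacity_kbps:
        reservations.append(request)
        return True
    return False

link = []         # accepted reservations on the hypothetical link
CAPACITY = 1500
print(admit(link, (0, 60, 1000), CAPACITY))    # prints True
print(admit(link, (30, 90, 400), CAPACITY))    # prints True  (1,400 kbps peak)
print(admit(link, (45, 50, 200), CAPACITY))    # prints False (would reach 1,600)
print(admit(link, (60, 120, 1000), CAPACITY))  # prints True  (first one has ended)
```

      Refusing the third request up front is the point: a video stream that is
      admitted can then be delivered without stalling, instead of degrading
      every session once the link is oversubscribed.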

      Today's open networking standards such as TCP/IP and the ever-growing
      Internet make it clear that successful digital libraries must be built on
      an open, interoperable networking infrastructure. Current digital libraries
      may be run exclusively on a single computer, on several computers connected
      on a LAN, or on a large number of computers spread out over a wide area
      network. Delivery systems that require high bandwidth, such as video and
      image libraries, are predominantly installed on LANs that run at 10 to
      100 Mbps. In contrast, the Internet's major backbones run at
      1.5 Mbps to 150 Mbps, while links to individual organizations fall in the
      56 Kbps to 1.5 Mbps range. Individual users typically connect to the
      Internet through service providers, local universities, or other
      organizations at 2.4 Kbps to 28.8 Kbps.
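
      The spread in these link speeds matters more than it might first appear.
      A quick back-of-the-envelope calculation, using the 50-Kbyte image
      mentioned earlier and a hypothetical 100-Mbyte video, gives the ideal
      transfer time at three of the rates quoted above. It assumes decimal
      units (1 Kbyte = 1,000 bytes) and ignores protocol overhead and
      congestion, so real times would be longer.

```python
# Link rates quoted in the text, in bits per second.
RATES_BPS = {
    "28.8 Kbps modem": 28_800,
    "1.5 Mbps organizational link": 1_500_000,
    "10 Mbps LAN": 10_000_000,
}

# Object sizes in bytes; decimal units are an assumption.
SIZES_BYTES = {
    "50-Kbyte image": 50_000,
    "100-Mbyte video (hypothetical)": 100_000_000,
}

def transfer_seconds(size_bytes, rate_bps):
    """Ideal transfer time: 8 bits per byte, no overhead or congestion."""
    return size_bytes * 8 / rate_bps

for obj, size in SIZES_BYTES.items():
    for link, rate in RATES_BPS.items():
        print(f"{obj} over {link}: {transfer_seconds(size, rate):,.1f} s")
```

      The image takes about 14 seconds over a 28.8 Kbps modem, which is
      tolerable, but the video would take about 27,800 seconds (more than 7.5
      hours) over the same modem, versus 80 seconds on a 10 Mbps LAN.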

      6. Presentation.

      Users of a traditional library usually want to read a book or watch a
      videotape; other uses are rare. With digital technology, it now becomes
      possible to listen to a book being read, watch a video of a musical
      performance alongside the original score, or hold a mechanical hand as it
      forms American Sign Language. Other possible uses are highly
      personal—individuals may dream up many distinct variations. A digital
      library's presentation systems must be flexible and highly customizable.
      They must also be aware of the output hardware's capabilities and
      limitations, automatically adjusting to deliver the best possible
      presentation quality at all times.
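
      Adjusting to the output hardware can be framed as choosing, from the
      renditions a library holds for an item, the best one the device can
      actually handle. A hypothetical sketch: renditions and devices are
      described by a few capability fields invented for this illustration
      (media type, width in pixels, required bandwidth, quality rank), and the
      highest-quality rendition that fits is chosen, or None if nothing fits.

```python
from dataclasses import dataclass

@dataclass
class Rendition:
    label: str
    media: str     # "video", "image", "text", ...
    width: int     # pixels (0 for text)
    kbps: int      # bandwidth needed for real-time delivery (0 if none)
    quality: int   # higher is better

@dataclass
class Device:
    media: set     # media types the device can present
    max_width: int
    link_kbps: int

def best_rendition(renditions, device):
    """Pick the highest-quality rendition the device can present, else None."""
    usable = [
        r for r in renditions
        if r.media in device.media
        and r.width <= device.max_width
        and r.kbps <= device.link_kbps
    ]
    return max(usable, key=lambda r: r.quality, default=None)

concert = [
    Rendition("full video", "video", 640, 1500, 3),
    Rendition("small video", "video", 320, 300, 2),
    Rendition("score as text", "text", 0, 0, 1),
]
pc_on_modem = Device(media={"video", "text"}, max_width=1024, link_kbps=28)
workstation_on_lan = Device(media={"video", "text"}, max_width=1280, link_kbps=10_000)
print(best_rendition(concert, pc_on_modem).label)         # prints score as text
print(best_rendition(concert, workstation_on_lan).label)  # prints full video
```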

      7. Administrative.

      Traditional libraries store a final copy of a book or other document.
      Digital libraries store several versions of a document in a way that makes
      multiple revisions by multiple authors possible. In addition, the content
      for a digital library may have multiple owners in terms of the sources of
      the content and annotations made to the content of the library. An
      administrative system ensures that materials intended for public viewing
      can indeed be viewed by anyone while private collections and personal
      annotations may only be viewed by a select group or single individual.
      Data-versioning techniques, meanwhile, track the history of the revisions.
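
      Data versioning can be as simple as never overwriting a document: each
      save appends a new revision recording who made it, so earlier versions
      stay retrievable. A minimal in-memory sketch, assuming a hypothetical
      VersionedDocument class; production systems would persist revisions and
      often store differences between versions rather than full copies.

```python
from dataclasses import dataclass, field

@dataclass
class Revision:
    number: int
    author: str
    text: str

@dataclass
class VersionedDocument:
    """Hypothetical append-only document: revisions are never overwritten."""
    title: str
    revisions: list = field(default_factory=list)

    def save(self, author, text):
        """Append a new revision and return its 1-based number."""
        rev = Revision(len(self.revisions) + 1, author, text)
        self.revisions.append(rev)
        return rev.number

    def get(self, number=None):
        """Return the text of revision `number` (1-based), or the latest one."""
        if not self.revisions:
            raise LookupError("document has no revisions")
        if number is None:
            return self.revisions[-1].text
        if not 1 <= number <= len(self.revisions):
            raise IndexError(f"no revision {number}")
        return self.revisions[number - 1].text

    def history(self):
        return [(r.number, r.author) for r in self.revisions]

doc = VersionedDocument("Study notes on Romans")
doc.save("marc", "Draft outline.")
doc.save("anna", "Draft outline, with chapter 1 notes.")
print(doc.history())  # prints [(1, 'marc'), (2, 'anna')]
print(doc.get(1))     # prints Draft outline.
print(doc.get())      # prints Draft outline, with chapter 1 notes.
```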

      There may also be times when a small group of individuals wants access
      to a portion of digital library content such as when authors are preparing
      initial drafts of a document. In these cases, security mechanisms must be
      put into place to ensure that only authorized users gain access. Current
      digital libraries employ the basic security measures offered by the
      supporting operating systems. For example, any digital library running on
      Unix can restrict access using username and password authentication and
      protect files using group membership and file-access rights. This basic
      security will not meet the demands of large-scale digital libraries.
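
      The split between public materials, group drafts, and private
      annotations can be expressed as a per-item read rule, loosely modelled
      on the Unix owner/group/other idea mentioned above. This is a
      hypothetical sketch only: the visibility levels, the membership table,
      and the user names are invented for illustration, and anything not
      explicitly allowed is denied.

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    owner: str
    visibility: str  # "public", "group", or "private"
    group: str = ""  # group allowed to read when visibility == "group"

# Hypothetical group membership table (user -> set of groups).
MEMBERSHIP = {
    "marc": {"editors"},
    "anna": {"editors"},
    "guest": set(),
}

def can_read(user, item):
    """Public: anyone. Group: owner or group members. Private: owner only."""
    if item.visibility == "public":
        return True
    if user == item.owner:
        return True
    if item.visibility == "group":
        return item.group in MEMBERSHIP.get(user, set())
    return False  # private (or unrecognized) and not the owner: deny

draft = Item("chapter-draft", owner="marc", visibility="group", group="editors")
note = Item("personal-note", owner="anna", visibility="private")
catalog = Item("catalog", owner="marc", visibility="public")
print(can_read("anna", draft), can_read("guest", draft))  # prints True False
print(can_read("marc", note), can_read("anna", note))     # prints False True
print(can_read("guest", catalog))                         # prints True
```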

      Finally, digital libraries must protect the identity of their users,
      who may wish to browse content that may be embarrassing.

      Extracted from: http://cimic.rutgers.edu/ieee_dltf.html