Technical Challenges -- 7 Points
- Nov 22, 2002
Hi gang,
I've been using the following categories/points for exploring the Technical
challenges that we face in trying to create a Christian Digital Library:
1. Storage (and Acquisition).
2. User interface.
3. Classification and indexing.
4. Information retrieval.
5. Content delivery.
6. Presentation.
7. Administration.
Clearly, we need a two-pronged effort to prepare both large quantities of
'holdings' (the actual e-materials, meta/card data, and formats/delivery)
and the actual system design (from UIs to Data Methods to Admin).
Luckily we do not have to totally reinvent the (e-)wheel -- at this point we
are already starting to have LOTs of electronic materials, and some of the
first prototype system components/parts are being created and
explored/tried. Progress is good, many challenges/decisions remain, but we
are well underway!!
Thus keep up the good work ... get out there and find/make e-stuff ;-)
= = = = = = = = = = = = = = = =
Our Digital library development faces challenges in several areas,
including the main points summarized here.
1. Storage (and Acquisition).
A digital library's storage system must be capable of storing a large
amount of data in a variety of formats and accessing this data as quickly
as possible. Text-only documents (stored in formats such as ASCII, LaTeX,
HTML, SGML, and PostScript) are by far the easiest to store. Digital audio
and video are more difficult to store because they require significantly
more storage space and their delivery is time-dependent.
A typical digital library uses a variety of database-management
systems. Current DBMSs range from relational and extended relational
systems to object-oriented database systems. Relational DBMSs are most
often used for the storage of metadata and indexes with attributes that
contain pointers to files in a file system. Most of the commercial RDBMSs
also support the storage of Binary Large Objects (BLOBs); in an Oracle
RDBMS, BLOBs can be as large as 2 Gbytes. Object-oriented database systems
are slowly gaining acceptance and overcoming earlier performance and
implementation problems. An OODBS can make it easier to model, store, and
work with real-world objects such as images or maps.
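As a rough illustration of the "metadata in the RDBMS, content in the file system" arrangement described above, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, sample titles, and file paths are all made up for the example; a real library would use its own schema and DBMS.

```python
import sqlite3

# Hypothetical sample holdings: (title, author, format, path to the file on disk).
SAMPLE_ROWS = [
    ("Confessions", "Augustine", "HTML", "/library/text/augustine/confessions.html"),
    ("City of God", "Augustine", "SGML", "/library/text/augustine/city_of_god.sgml"),
    ("Pilgrim's Progress", "John Bunyan", "ASCII", "/library/text/bunyan/pilgrims_progress.txt"),
]


def build_catalog(rows):
    """Create an in-memory catalog whose rows point at files instead of holding them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE holdings ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " author TEXT,"
        " format TEXT,"
        " file_path TEXT NOT NULL)"  # pointer into the file system, not the content
    )
    # Index the attribute we expect to search on most often.
    conn.execute("CREATE INDEX idx_holdings_author ON holdings(author)")
    conn.executemany(
        "INSERT INTO holdings (title, author, format, file_path) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def paths_by_author(conn, author):
    """Return the file paths of all holdings by one author, ordered by title."""
    cur = conn.execute(
        "SELECT file_path FROM holdings WHERE author = ? ORDER BY title", (author,)
    )
    return [row[0] for row in cur]
```

The database stays small and fast to query because it holds only descriptive fields and a path; the documents themselves can live on whatever storage suits them.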
Compression techniques save storage. For text-only documents, the
Unix compress or freeware gzip utilities provide anywhere from 10- to
60-percent compression. Several compression standards exist for digital
images (JPEG), audio (uLaw), and video (MPEG).
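To see what text compression buys for a particular collection, one can simply measure it. The sketch below uses Python's standard gzip module; the helper name is an assumption for illustration, and the savings will vary with the text (short or already-compressed input can even grow).

```python
import gzip


def space_saved_percent(data: bytes) -> float:
    """Percentage of storage saved by gzip-compressing the given bytes."""
    if not data:
        raise ValueError("cannot measure compression of empty input")
    compressed = gzip.compress(data)
    return 100.0 * (1 - len(compressed) / len(data))


def round_trips(data: bytes) -> bool:
    """Check that compression is lossless: decompressing restores the original."""
    return gzip.decompress(gzip.compress(data)) == data
```

Running `space_saved_percent` over a sample of real holdings gives a more reliable planning figure than any general range.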
Digital library collections that are too large to store entirely on a
disk use hierarchical storage mechanisms. In an HSM, the most frequently
used data is kept on fast disks while less frequently used data is kept in
near-line storage such as an automated (for example, robotic) tape library. Using
data-usage statistics, the HSM can automatically migrate data from tape
(near-line) storage to disk (on-line) and back, as required.
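The migration decision in an HSM can be sketched as a simple placement policy. The version below is a deliberately naive greedy rule (most-used items go to disk first, until the disk is full), not how any particular HSM product works; the function name and the dictionary inputs are assumptions for the example.

```python
def plan_tiers(access_counts, sizes, disk_capacity):
    """Split items into (on_disk, on_tape) using usage statistics.

    access_counts: item name -> number of recent accesses
    sizes:         item name -> size, in the same unit as disk_capacity
    """
    on_disk, on_tape = [], []
    used = 0
    # Visit the most frequently used items first; break ties by name.
    for name in sorted(access_counts, key=lambda n: (-access_counts[n], n)):
        if used + sizes[name] <= disk_capacity:
            on_disk.append(name)
            used += sizes[name]
        else:
            on_tape.append(name)  # stays in, or migrates to, near-line storage
    return on_disk, on_tape
```

Re-running the plan as the access counts change yields the list of items to move in each direction.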
2. User interface.
The user interface, perhaps the most important digital library component,
must incorporate a wide variety of techniques to afford rich interaction
between users and the information they seek. For computer workstations,
graphical user interfaces such as X-Windows, Microsoft Windows, and
Macintosh System 7 interfaces are the status quo.
A user interface for digital libraries must display large volumes of
data effectively. Typically the user is presented with one or more
overlapping windows that can be resized and rearranged. In digital
libraries, a large amount of data spread through a number of resources
necessitates intuitive interfaces for users to query and retrieve
information. The ability to smoothly change the user's perspective from
high-level summarization information down to a specific paragraph of a
document or scene from a film remains a challenge to user interface design.
3. Classification and indexing.
Classification and indexing schemes are used to collect related content
into groups that are intuitive to a user. Classifying and indexing objects
is filled with pitfalls, however, because individual perceptions vary.
Another complicating factor in indexing and classifying is the tremendous
amount of potential content that remains to be indexed. It is clear that
manual methods for classification are insufficient for all but the most
trivial digital library.
Automated classification systems differ significantly in their
approaches, depending on the type of content under consideration.
Classifying short stories is quite different from classifying maps, both in
terms of the mechanics involved and the appropriate classes. These
distinctions make current automated classification efforts highly
Automated document classification methods can be grouped into two
general approaches, but neither can yet capture the meaning of words in the
documents. Image classification approaches are conceptually different from
those used for text classification. Although many domain-specific systems
allow "content-based" querying, most are relegated to a very narrow range
of images and may require the services of human classifiers. Video
classification and indexing requires systems that can parse video into
manageable portions, typically called camera shots. As with image
classification, the type of classification and indexing performed on video
is driven by the types of queries posed by users. The classification of
audio, musical notation, and maps presents additional research challenges.
4. Information retrieval.
The concepts underlying information retrieval were conceived long before
computers and information systems were employed to store library materials.
In the digital library domain, there are a variety of information-retrieval
techniques, including metadata searching, full-text document searching, and
content searching for other data types.
The success of information retrieval can be measured in terms of the
percentage of relevant and extraneous information retrieved. It is
difficult to pinpoint quantitatively the effectiveness of information
retrieval; only an individual user can determine what is truly useful.
Techniques to improve retrieval effectiveness include preprocessing
documents to extract additional metadata before storing them in a document
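The relevant-versus-extraneous measure mentioned above is usually expressed as two numbers, precision and recall. A minimal sketch follows; it assumes someone has already judged which documents are relevant, which is exactly the subjective part the text warns about.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision: fraction of retrieved items that are relevant
    recall:    fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a search returns four documents, two of which are among the three truly relevant ones, precision is 2/4 and recall is 2/3.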
Several researchers are focusing on automating the creation and
maintenance of user profiles and applying these profiles to information
retrieval. Software agents are an extension of filtering techniques,
although filtering tends to imply passive mechanisms whereas the use of
agents implies a more proactive approach. Many people have put forth
definitions of software agents, ranging from an adaptable information
filter to an autonomous program that works in conjunction with or on behalf
of a human user. Software agents also embody the notion of improving over
time as they record additional user actions and reactions.
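The idea of a profile that improves as it records user reactions can be shown with a toy filter. This is only a sketch under simple assumptions: the class name is made up, documents are plain strings, and each word is weighted up or down by one per reaction. Real agents use far richer models.

```python
from collections import Counter


class ProfileFilter:
    """Toy user profile: word weights learned from liked/disliked documents."""

    def __init__(self):
        self.weights = Counter()

    def record(self, text, liked):
        """Record one user reaction and adjust the weight of each distinct word."""
        delta = 1 if liked else -1
        for term in set(text.lower().split()):
            self.weights[term] += delta

    def score(self, text):
        """Sum the learned weights of the distinct words in a document."""
        return sum(self.weights[term] for term in set(text.lower().split()))

    def rank(self, documents):
        """Order documents from most to least likely to interest this user."""
        return sorted(documents, key=self.score, reverse=True)
```

Each call to `record` changes later rankings, which is the "improving over time" behavior in miniature.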
5. Content delivery.
Once an item of interest has been located in a digital library, it may be
delivered in several ways. If the content is small, such as 100 pages of
text or a 50-Kbyte image file, it may be delivered through the same channel
used for information retrieval and querying. Content such as movies and
software, however, demands much higher bandwidth. In these cases, delivery
is over dedicated leased lines (for example, cable TV or videoconferencing
systems) or satellite-based systems such as the Hughes Network Systems
Increased demands for networking bandwidth come from two main fronts.
First, the number of digital library users will undoubtedly increase. If
the Internet is any indication, exponential growth in the number of users
will be the rule. Second, as the delivery of multimedia data becomes the
norm, the demands for high bandwidth increase. However, high bandwidth, in
and of itself, is not enough to support digital libraries. The intelligent
use of bandwidth and the ability to guarantee bandwidth for a given time
period are also required.
Today's open networking standards such as TCP/IP and the ever-growing
Internet make it clear that successful digital libraries must be built on
an open, interoperable networking infrastructure. Current digital libraries
may be run exclusively on a single computer, on several computers connected
on a LAN, or on a large number of computers spread out over a wide area
network. Delivery systems that require high bandwidth such as video and
image libraries are predominantly installed using LANs that run at 10 to
100 Mbits per second. In contrast, the Internet's major backbones run at
1.5 Mbps to 150 Mbps, while links to individual organizations fall in the
56 Kbps to 1.5 Mbps range. Individual users typically connect to the
Internet through service providers, local universities, or other
organizations at 2.4 Kbps to 28.8 Kbps.
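The link speeds above translate directly into waiting time. The helper below is a back-of-the-envelope calculation only: it ignores protocol overhead and congestion, and it assumes 1 Kbyte = 1,000 bytes and 1 Kbps = 1,000 bits per second.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal time to move size_bytes over a link, ignoring overhead."""
    return size_bytes * 8 / link_bits_per_second


# Link speeds quoted in the text, in bits per second.
MODEM_28_8K = 28_800
ORG_LINK_1_5M = 1_500_000
LAN_10M = 10_000_000
```

By this estimate the 50-Kbyte image mentioned earlier needs roughly 14 seconds over a 28.8 Kbps modem (400,000 bits / 28,800 bits per second), but only a small fraction of a second over a 10 Mbps LAN.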
6. Presentation.
Users of a traditional library usually want to read a book or watch a
videotape; other uses are rare. With digital technology, it now becomes
possible to listen to a book being read, watch a video of a musical
performance alongside the original score, or hold a mechanical hand as it
forms American Sign Language. Other possible uses are highly
personal; individuals may dream up many distinct variations. A digital
library's presentation systems must be flexible and highly customizable.
They must also be aware of the output hardware's capabilities and
limitations, automatically adjusting to deliver the best possible
presentation quality at all times.
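One way to picture "adjusting to the output hardware" is a selection step that picks among several stored renditions of the same item. The sketch below is an assumption-laden simplification: the dictionary keys ("media", "kbps", "max_kbps") are invented for the example, and bit rate stands in for quality.

```python
def pick_rendition(renditions, device):
    """Return the richest rendition the device can present, or None.

    renditions: list of dicts with "name", "media", and "kbps" keys
    device:     dict with a "media" set and a "max_kbps" limit
    """
    usable = [
        r for r in renditions
        if r["media"] in device["media"] and r["kbps"] <= device["max_kbps"]
    ]
    # Bit rate is used as a crude proxy for presentation quality.
    return max(usable, key=lambda r: r["kbps"], default=None)
```

A text-only terminal and a multimedia workstation would then receive different renditions of the same holding without the user having to choose.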
7. Administration.
Traditional libraries store a final copy of a book or other documents.
Digital libraries store several versions of a document in a way that makes
multiple revisions by multiple authors possible. In addition, the content
for a digital library may have multiple owners in terms of the sources of
the content and annotations made to the content of the library. An
administrative system ensures that materials intended for public viewing
can indeed be viewed by anyone while private collections and personal
annotations may only be viewed by a select group or single individual. And
data-versioning techniques track the history of such revisions.
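The combination of revision history and viewing rules can be sketched as a small data structure. The class and method names here are hypothetical, and the model is intentionally minimal: every revision is kept, and an item is either public or visible to a named set of readers.

```python
class VersionedDocument:
    """Toy model of a library item with revisions and viewing rights."""

    def __init__(self, owner, public=False):
        self.owner = owner
        self.public = public
        self.readers = {owner}  # users allowed to view a private item
        self.revisions = []     # list of (author, text), oldest first

    def revise(self, author, text):
        """Store a new revision and return its 1-based revision number."""
        self.revisions.append((author, text))
        return len(self.revisions)

    def can_view(self, user):
        """Public items are open to anyone; private ones only to listed readers."""
        return self.public or user in self.readers

    def history(self):
        """Return (revision number, author) pairs, oldest first."""
        return [(n, author) for n, (author, _) in enumerate(self.revisions, 1)]
```

Because nothing is overwritten, the history answers "who changed what, and in which order" for documents with several authors.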
There may also be times when a small group of individuals want access
to a portion of digital library content such as when authors are preparing
initial drafts of a document. In these cases, security mechanisms must be
put into place to ensure that only authorized users gain access. Current
digital libraries employ the basic security measures offered by the
supporting operating systems. For example, any digital library running on
Unix can restrict access using username and password authentication and
protect files using group membership and file-access rights. This basic
security will not meet the demands of large-scale digital libraries.
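The Unix file-access rights mentioned above can be set and inspected from Python's standard os and stat modules. This sketch assumes a POSIX system; the function name is made up, and the chosen mode (owner read/write, group read, no access for others) is just one plausible policy for a draft shared within a group.

```python
import os
import stat


def restrict_to_owner_and_group(path):
    """Set mode 0640 on a file and return its permission string."""
    # Owner may read and write, group may read, everyone else gets nothing.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)
    return stat.filemode(os.stat(path).st_mode)
```

This protects a file only at the operating-system level, which is why the text notes that such measures alone will not scale to large digital libraries.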
Finally, digital libraries must protect the identity of their users,
who may wish to browse content that may be embarrassing.
Extracted from: http://cimic.rutgers.edu/ieee_dltf.html