CPSC 461: Copyright (C) 2003 Katrin Becker Last Modified November 25, 2003 12:06 PM

DIGITAL LIBRARIES

Why should we care?

Quoted from: * Witten, Ian H., and David Bainbridge, How to Build a Digital Library, 2003, Morgan Kaufmann Publishers, ISBN 1-55860-790-0. Website: www.mkp.com/DL [note: this book is a good read - highly recommended. K.B. Note also: the bulk of the material on this page is condensed or summarized from the above text.]

"Kataayi is a grassroots cooperative organization based in the village of Kakunyu in rural Uganda. In recent years its enterprising members have built ferrocement rainwater catchment tanks, utilized renewable energy technologies such as solar, wind, and biogas, and established a local industry making clay roofing tiles - among many other projects. But amid such human resourcefulness, information resources remain scarce. The nearest public phone, fax, library, newspapers, and periodicals are found in the district town, Masaka, 20 km distant over rough roads. Masaka boasts no email or internet access. The difficulty of getting there discourages local inhabitants from taking advantage of the information and communication technologies that we take for granted in developed countries.

The Kataayi community believe that an information and communication center will have a major impact in their area. They laid the groundwork by acquiring a computer and solar power generation equipment. They established an e-mail connection via cellular phone and set up a computer training program. They constructed a brick building to house the centre. And they gathered several books. But they need more information resources - lots more. They are looking for books covering topics such as practical technology, fair-trade marketing, agriculture, environmental conservation, spirituality, and social justice issues.

Then they discovered digital libraries. The Human Development Library is a compendium of some 1,200 authoritative books and periodicals on just such topics, produced by many disparate organizations - UN agencies and other international organizations. In print these books would weigh 340 kg, cost $20,000 and occupy a small library bookstack. Instead, the collection takes the form of a digital library and is distributed on a single CD-ROM throughout the developing world at essentially no cost. Related digital library collections cover such topics as disaster relief, agriculture, the environment, medicine and health, food and nutrition; more are coming. These digital libraries will increase Kataayi's information resources immeasurably, at a minuscule fraction of the cost of paper books."


A Digital Library is:
- an organized collection of information
- a focused collection of digital objects, including text, video, and audio, along with methods for access and retrieval, and for selection, organization, and maintenance of the collection.
"Digital Libraries are about new ways of dealing with knowledge: preserving, collecting, organizing, propagating, and accessing it - not about deconstructing existing institutions and putting them in an electronic box." ***

Digital Objects include, but are not limited to:
- text, graphics, video, audio,
- collected data
- 3D objects
- simulations
- dynamic visualizations
- virtual-reality worlds

User: Roles & Needs:
- access and retrieval

Librarian: Roles and Needs:
- selection, organization, and maintenance


Libraries:

Then:
- about storage and preservation
- accessible to the minority who possessed sufficient status & knew how to read
- volumes were chained in public reading places
- access to books through catalogues & librarians
Acquisitions:
- Mark Antony stole the rival library of Pergamum and gave it, in its entirety (200,000 volumes), to Cleopatra [1st century BC]

Recent:
- public libraries began in the 19th century
- 'free' access to books became popular in the 20th century (note: the 21st century seems to be moving towards "NOT free" again)
- far more user-centered
- increased emphasis on information exchange
- demand for more information is curiosity-driven
Acquisitions:
- 1801: decree: a copy of every book printed in the British Isles is to be donated to the Trinity College Library (also now: the British National Library; Oxford & Cambridge Libraries; National Libraries of Scotland & Wales)
- 1537: same deal for the French National Library (for French publications)
- same deal (US publications) for the Library of Congress
Trinity College: began work on its catalog in 1835; by 1851, they had finished 'A' and 'B' (and their collection was moderate: 250,000 volumes)

Now:
- user fees in physical libraries
- demand for more information is driven by economics
Library Catalog == metadata
Ability of a virtual library to replace physical library depends on the assumption that books, etc. are adequately represented by the information they contain alone.
What makes a book valuable?
Technological advances in developing countries sometimes leapfrog those in advanced ones. Why?
How can a digital library help in a developing country?
- disseminating humanitarian information
- disaster relief
- preserving indigenous culture
- locally produced information
Collection: numerous documents (usually thousands or millions)
Document: any information-bearing message in electronically recorded form.
- in libraries, documents are usually text, although they may also include images, audio, or video
- as a rule (although not exclusively so), digital libraries typically contain a less varied collection of document "types" than do object repositories.
Some typical / desirable requirements for Digital Library Software (see also information on Greenstone Software; this is a list of the Greenstone features ***)
[for each, consider what it is, why it is 'good', what might be the cost, why it might be hard or what problems may be encountered]
- accessible via web browser
- runs on multiple, popular platforms
- permits full-text and fielded search
- offers flexible browsing techniques
- creates access structures automatically
- makes use of available metadata
- capabilities can be extended by plug-ins
- can handle documents in any language
- can display user interface in multiple languages
- can handle collections of text, pictures, audio, video
- allows hierarchical browsing
- designed for multi-gigabyte collections
- uses compression techniques
- permits authentication of users
- offers user logging
- provides an administrative function
- updates and adds new collections dynamically
- publishes collections on CD-ROM
- supports distributed collections
- open-source (Greenstone digital library, http://www.nzdl.org)
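To make "full-text and fielded search" from the list above concrete, here is a minimal Python sketch of an inverted index. It is not how Greenstone implements search; the sample documents, the field names (title, creator, text), and the helpers build_index and search are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample collection: each document is a dict of metadata fields plus text.
DOCS = {
    1: {"title": "Rainwater Catchment Tanks", "creator": "UN agency",
        "text": "Building ferrocement tanks for rainwater."},
    2: {"title": "Clay Roofing Tiles", "creator": "Village cooperative",
        "text": "Making roofing tiles from local clay."},
    3: {"title": "Solar Power Basics", "creator": "UN agency",
        "text": "Solar panels and rainwater pumps."},
}

def tokenize(value):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", value.lower())

def build_index(docs):
    """Map (field, word) -> set of document ids; the pseudo-field '*' covers every field."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            for word in tokenize(value):
                index[(field, word)].add(doc_id)
                index[("*", word)].add(doc_id)
    return index

def search(index, word, field="*"):
    """Return sorted ids of documents containing `word`, optionally within one field."""
    return sorted(index.get((field, word.lower()), set()))

INDEX = build_index(DOCS)
print(search(INDEX, "rainwater"))           # search across all fields
print(search(INDEX, "rainwater", "title"))  # fielded search: titles only
```

The index is built once, automatically, from whatever fields the documents carry, which is also the idea behind "creates access structures automatically" and "makes use of available metadata".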
Ethics of Digital Libraries:
- implications of collecting information and making it widely available are far-reaching
Copyright
- access to information in digital libraries is typically far less controlled than print
- possession is not ownership
- can re-sell a copy of something you buy, but cannot re-distribute
- who owns a work?
- original creator?
- the one who pays them?
Time-span of a copyright:
- in US, older works: 95 years after date of first publication
- 1998 Copyright Term Extension Act:
- from the "moment of their fixation in tangible medium of expression" until 70 years after the author's death.
- works for hire are protected for 95 years after publication or 120 years after creation, whichever comes first
- rules vary from country to country
- most allow copying for fair use (like research & teaching)
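The term rules listed above reduce to simple arithmetic. The sketch below encodes only the figures given in these notes (life plus 70 years; for works for hire, 95 years after publication or 120 years after creation, whichever comes first). The function names and example years are hypothetical, and real statutes carry details that this ignores.

```python
def author_term_end(death_year):
    """Last protected year for an individual author's work: life plus 70 years."""
    return death_year + 70

def work_for_hire_term_end(publication_year, creation_year):
    """Last protected year for a work for hire: the earlier of
    publication + 95 and creation + 120."""
    return min(publication_year + 95, creation_year + 120)

# Hypothetical examples
print(author_term_end(1990))               # 1990 + 70
print(work_for_hire_term_end(2000, 1999))  # publication + 95 comes first
print(work_for_hire_term_end(2000, 1960))  # creation + 120 comes first
```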
Copyright on the Web:
- way fuzzier
- browsers make local copies of everything they display; does this contravene copyright? Should they be illegal? Controlled?
- search engines keep partial copies of things
- computers make many internal copies of files when they are used
- can you save them for personal use?
- link to them?
Do we need to change our view of what it means to "copy"?
Digital Library Projects
- involve digitizing documents
- are they public domain?
- is it a faithful reproduction (what does that mean)?
- if material was donated, does that automatically imply you can do with it as you please?
- does this constitute fair use?

Collecting from / Protection on the Web:
- search engines use software "robots"
- search engines assume permission unless explicitly excluded (robot exclusion protocol)
- what about those trying to archive the entire web (copy of record - for historical reasons)?
- what if, 20 years from now, you want to erase something dumb you said while in school?
- what if a political candidate wants to erase something they said before that contradicts their current campaign?
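The robot exclusion protocol mentioned above works through a robots.txt file that a well-behaved robot consults before fetching pages. A minimal sketch using Python's standard urllib.robotparser module follows; the robots.txt content, the example.org URLs, and the robot name ExampleArchiveBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real robot would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="ExampleArchiveBot"):
    """Return True if the parsed robots.txt permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

print(allowed("http://example.org/index.html"))          # not excluded
print(allowed("http://example.org/private/diary.html"))  # explicitly excluded
```

Note that the default is permission: anything not explicitly excluded may be collected, and compliance is voluntary on the robot's part.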
Current search technology has given us access to information that, while public before, was essentially inaccessible. No longer is it possible to hide something "in plain view".
What about illegal and harmful material? Who decides what is harmful? Can we decide for Madagascar? Can Luxembourg decide for us? How do we keep something we find OK but someone else finds harmful inside our own borders?
What about on-line gambling? What if the site resides entirely in a physical location where gambling is legal?
How do we incorporate cultural sensitivity? What about labels (some can have profound cultural connotations)?
What Goes Into Building a Digital Library?
1. Preliminaries:
- ideology
purpose
distinction between a work and a document
how to handle duplicates?
when do new editions replace old?
- converting an existing library - why?
3 main advantages of digital libraries over conventional ones:
1. they are easier to access remotely
2. they are easier to search and browse
3. serve as foundation for value-added services
What drives library collections:
1. priority of utility
2. local imperative
3. preference for novelty
4. implication of intertextuality
5. scarcity of resources
6. commitment to the transition
- building a new collection
why?
copyrights?
metadata?
- making use of virtual libraries
- does not itself hold content
- designing the bibliography
- what goes into it?
- how to decide?
- figuring out modes of access
- searching
- matching (list of 48 different ways to spell Muammar Qaddafi; 15 different titles for "Hamlet")
- digitizing documents
scanning,
physical handling,
project planning
management
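The "matching" item above (one person, dozens of spellings) is the kind of problem a normalizing key can partly address. The sketch below is one crude approach, not a method taken from the text: the function name_key, its substitution rules, and the handful of sample romanizations are assumptions chosen for illustration, and the rules would need tuning for real data.

```python
import re
from collections import defaultdict

def name_key(name):
    """Reduce a romanized name to a crude consonant skeleton for matching."""
    s = re.sub(r"[^a-z]", "", name.lower())  # keep letters only
    s = re.sub(r"kh|gh|q|g", "k", s)         # treat k-like sounds alike
    s = re.sub(r"th|dh|t", "d", s)           # treat d-like sounds alike
    s = re.sub(r"[aeiouy]", "", s)           # drop vowels (and y)
    return re.sub(r"(.)\1+", r"\1", s)       # collapse repeated letters

def group_variants(names):
    """Group names that share the same key, preserving input order."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return dict(groups)

# A few sample spellings plus one unrelated name.
VARIANTS = ["Qaddafi", "Gadhafi", "Kadafi", "Khadafy", "Gathafi", "Qadhdhafi", "Hamlet"]
print(group_variants(VARIANTS))
```

A key this aggressive will also merge names that are genuinely different, so in practice it is a way to propose candidate matches for a person or an authority file to confirm.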

2. User Interfaces:
- organization : hierarchy; OO; flat
- page images
- audio
- video
- foreign languages
- metadata
- searching
- browsing

3. The Documents:
- representing (data formats)
- unicode
- PDF & PostScript
- word-processor
- fonts
- indexing
- audio formats
- video formats
- compression
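Two of the items above, Unicode and compression, can be illustrated with Python's standard library. The sketch shows why text is usually normalized before indexing (two visually identical strings can differ code point by code point) and that lossless compression round-trips exactly. The sample strings and the choice of NFC and zlib are assumptions for illustration, not the formats any particular digital library uses.

```python
import unicodedata
import zlib

def normalize(text):
    """Return text in Unicode Normalization Form C so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)

composed = "caf\u00e9"     # the accented letter as a single code point
decomposed = "cafe\u0301"  # plain 'e' followed by a combining acute accent

print(composed == decomposed)                        # raw strings differ
print(normalize(composed) == normalize(decomposed))  # equal after normalization

# Lossless compression of the UTF-8 bytes of some repetitive sample text.
raw = ("Digital libraries preserve, collect, and organize knowledge. " * 50).encode("utf-8")
packed = zlib.compress(raw)
print(len(raw), len(packed))
restored = zlib.decompress(packed)
print(restored == raw)
```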

4. Markup and Metadata
- formats
- presentation of marked-up documents
- bibliographical data
- extracting metadata (you mean, we're not doing this by hand?)
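As a small illustration of extracting metadata automatically, the sketch below pulls the title and any named meta tags out of an HTML page with Python's standard html.parser module. The class MetadataExtractor, the helper extract_metadata, and the sample page are hypothetical; documents without such markup need other techniques.

```python
from html.parser import HTMLParser

class MetadataExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()

# Hypothetical page used only for this example.
PAGE = """<html><head>
<title>Clay Roofing Tiles</title>
<meta name="author" content="Village Cooperative">
<meta name="keywords" content="clay, roofing, tiles">
</head><body><p>Body text.</p></body></html>"""

def extract_metadata(html):
    """Parse an HTML string and return the metadata found in it as a dict."""
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata

print(extract_metadata(PAGE))
```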
