[Last Update: Tuesday, May 13, 2003]
Cpsc 461 - Copyright (C) 2003 Katrin Becker
Computer Science 461
Course Information and Outline
Spring / Fall 2003
Course Requirements: Cpsc 333 is not a prerequisite for this course, but 331 and 355 are. Students must have an un-weighted average of C- or better in the two exams in order to pass the course. Students must attempt and submit ALL assignments in order to pass the course!!!!
Calculation of Final Mark

General Information

What this course is about.
461 Topics List
Other topics to cover as time permits.
Marks will be calculated as follows: (see course schedule for dates)
40% - Assignments
20% - In-Class Midterm (1 hour - open book)
40% - Final Exam (2 hours - open book)

 A Note on Course Requirements:

'A' means: "Excellent: Superior performance showing comprehensive understanding of subject matter."
'B' means: "Good: clearly above average performance with knowledge of subject matter generally complete."
'C' means: "Satisfactory: basic understanding of the subject matter."
'D' means: "Minimal Pass: marginal performance; generally insufficient preparation for subsequent courses in the same subject."
'F' means: "Fail: unsatisfactory performance or failure to meet course requirements."

In order to pass this course, students must:

1. Submit ALL assignments (assignments may not be skipped: all reasonable efforts will be considered adequate for this requirement).
2. Achieve a minimum, unweighted average of C- (1.7/4.0) on the 2 exams. This grade is calculated as follows:
((adjusted score on midterm in %) + (adjusted score on final in %))*2
to give a score out of 4.0
RECOMMENDED [not required] : Course Text: File Structures - An Object-Oriented Approach With C++, M.J.Folk, B.Zoellick & Riccardi (Addison Wesley)
RECOMMENDED [not required] : Mac Hall Copy Centre Course Pack: 461 Web Notes [These are hard copy versions of the notes available on the web]

Top 10 List [my current favorite sources of information for this course; not necessarily in order]:
  1. Salomon, David Data Compression, The Complete Reference, 2E, 2000 Springer ISBN 0-387-95045-1
  2. Fortner, Brand, The Data Handbook, 2nd Ed., 1995, Spinger-Verlag ISBN 0-387-94505-9
  3. Barker, Richard, and Paul Massiglia, Storage Area Network Essentials, 2002 John Wiley & Sons, Inc. ISBN 0-471-03445-2
  4. DDJ, Doctor Dobb's CD-ROM Library: Dr. Dobb's Essential Books on File Formats (Release 1 on CD-ROM), 2000 Dr.Dobb's Journal
  5. Chapman, Nigel & Jenny, Digital Multimedia, 2000 John Wiley & Sons ISBN 0-471-98386-1
  6. Bentley, Jon, Programming Pearls 2nd Ed, 2000 Addison-Wesley ISBN 0-201-65788-0
  7. DeGrace, Peter, and Leslie Hulet Stahl, Wicked Problems, Righteous Solutions, 1990 Yourdon Press Computing Series ISBN 0-13-590126-X
  8. Nielsen, Jakob Designing Web Usability, 1999 New Riders ISBN 1-56205-810-X
  9. Weinberg, Gerald M., The Psychology of Computer Programming, Silver Anniversary Edition, 1998, Dorset House ISBN: 0-932633-42-0
  10. Khurshudov, Dr. Andrei , The Essential Guide to Computer Data Storage 2001 Prencitce-Hall ISBN 0-13-092739-2
Assignments: Assignments are due at midnight (23:59) on the due date and will be penalized one full letter grade for each 24 hours late. NOTE: System failures (weekend or any other time) will not be grounds for extension of assignment deadlines. Note deadlines carefully and pace your work accordingly.
Asst 1: HARDWARE ASSIGNMENT (5%) [as there is no third assignment in the spring, this assignment is worth 10% IN SPRING ONLY]
Asst 2: THREE-PART ASSIGNMENT: (split as 5%,10%15% = 30%)
(see main page for due dates)
Anything discussed in class, labs, or assigned as reading is fair game for exams. Topics found in this site that were NOT discussed in class or labs will not be included on either exam.
Mid-Term Exam: will be held during class time on OCTOBER 30 IN CLASS. This is a one-hour exam. Students will not be given extra time if they arrive late. The spring midterm is scheduled for JUNE 12, during the second half of the class.
Final Exam: will be scheduled by the Registrar's Office. It will be a two hour final worth 40% of the final grade.


Labs in this course are intended as a supplement to lectures. The Lab Instructor will be available during the designated lab times to answer course-related questions. Occasionally, there will be testable material to be covered in labs that will not be covered in lectures. All students are responsible for learning this material. As it is the Lab Instructor who marks your assignments, questions about what is acceptable and explanations of marks given are to be directed at him/her in a timely manner.

General Course Information


- textbook
- 461 website (course notes are password protected)
- selected research papers (will be handed out in class or made available on line)
- see the resources page for other books (many are now available in the library)
- see the external links page for links to Web-Sites
- see the on-line local page for a list of the contents of the course directory (/home/profs/becker/Courses/461)
- any other electronic/ hardcopy/ real-life sources as appropriate


The topics covered in this course include:

- Fundamental File Operations
- Secondary Storage Devices
- File Anatomy: bytes; fields; records; etc.
- Indexing
- Data Compression
- File Formats (graphics, scientific)
- Signatures
- UNIX File Systems
-Cosequential Processing

, plus some of (depending on time):

- Hierarchical Files
- Distributed Files
- Other File Systems (NT, BeOS, etc.)
- File Security
- Digital Libraries
- Storage Area Networks
- Data Mining
- Web Site Design and Organization
- File Design for Games

What this course is about:

Information Storage and Retrieval, largely but not exclusively with respect to files.

Dealing with data in files:
- storing
- manipulating
- organizing
- representing data
Dealing with the files themselves:
- file systems
- working with sets of files
Objectives: (what I want you to take away from this course):
- data structures are merely building blocks used to organize information
- algorithms merely outline approaches to the solution of a problem
- the mix & match approach is to be encouraged


The main focus of this course will be qualitative rather than quantitative. Well-developed communication skills are very important to your success in this course. You need to be able to assess numerous approaches to a problem and justify your answers. There is rarely one correct answer to any question in this course. The answer to almost every question in this course begins with "It depends..."
Understanding the algorithms and concepts is key. Many, if not all of the algorithms discussed in this course are well-understood and well-tested. There will rarely be a need (after this course) to "re-invent" any of them. In many cases the code exists as public domain to be re-used as appropriate. If you remember and understand the algorithms as well as when and why they are useful, you will always be able to evaluate 'pre-written' code for its suitability rather than having to spend time re-writing it yourself (pick the right algorithms for the given circumstances and "apply as needed").


This is a senior level course and assumes a certain level of sophistication and maturity in both your learning and your programming skills. To succeed in this course you must be self-motivated and capable of doing some 'research' on your own.

461 Topics

The following is a list of the topics (linked to related notes) that may be covered in this course. Details and major references for each section are linked to the notes. NOTE: Specific topics for this term and the order in which they are covered are subject to change. See the site's main page for the schedule as it unfolds.

Back to TopIntroduction
Why is knowledge of file architecture and processing important? [text CH 1 ]
Back to TopBasic File Ops
What are the basic file operations and structure? [text CH 2 ]
UNIX I/O - in C (may be done in labs)
Binary I/O (2001: NEW )
The Role of Cache Memory (2001: NEW )
Gaining Control of Files Structures and I/O

Learning objectives:
  1. Describe the basic file operations independently of any specific programming language.
  2. Explain the distinction between the physical and logical representation of files and how to implement this oneself.
  3. Explain basic information storage and retrieval concepts.
Back to TopHardware Bits: Mass Storage [text CH 3 & Web References ] (Updated Annualy)
Disks : HD, Floppy, SuperDisk, Zip & Jazz, RAIDs
Magnetic Tape : Open Reel, Exabyte, DLT, QicTape, DAT, Pereos
Optical Devices : CD-ROM, CD-R, MO, WORM, PD, DVD
Memory Devices
Mass Storage for Hand-Held Devices [Wireless]
New Technologies

Learning objectives: Summarize the evolution of mass storage devices from early visions up through modern offerings, distinguishing their respective capabilities and future potential.
Back to TopFile Anatomy [text CH 4&5 ]
Fields & Records
Record Access
Other kinds of Files

Learning objectives: Explain the concepts of records, record types, and files, as well as the different techniques for placing file records on disk.
Back to TopOrganizing Efficiently [text CH 6 ]
Recovering Wasted Space
Finding Things
'Creative Sorting'

Learning objectives: Justify the need for self-determined internal control of files.
Back to TopIndexing [text CH 7, Knuth ]
Indexed Sequential
Multiple Keys
Secondary Keys and Multiple Indicies
Inverted Lists
Transposed Lists
Free Space

Learning objectives:
  1. Give examples of the application of primary, secondary, and clustering indexes.
  2. Distinguish between a nondense index and a dense index.
Back to TopCosequential Processing and Sorting Large Files [text CH 8, Knuth ]
Working with two lists
K-Way Merge
Selection Trees
Merging to sort large files on disk
including polyphase & cascade merge
Sorting on tape (of historical significance)
Memory Sorts (2001: NEW )

Learning objectives:
  1. Describe the key issues involved in the design of efficient multi-file manipulation strategies.
Back to TopSignatures [NOT IN TEXT see Web Notes & References, Tharp ]
Binary Attributes
Use of Signatures
Partial Match Retrieval
Bloom Filters

Learning objectives:
  1. Describe the role of signatures in searching.
  2. Implement several signature strategies.
Back to TopData Compression [text Ch. 6 & Web Notes & References, Solomon ] (Added in 98)
Run Length Encoding and other Intuitive Methods
Statistical Methods
Arithmetic Coding
Dictionary Methods
Image Compression

Learning objectives:
  1. Explain and implement the common compression algorithms.
  2. Explain the relative advantages and disadvantages of each.
Back to TopFile Formats [NOT IN TEXT ] (Added in 99)
Graphics File Formats [ NOT IN TEXT see Web References, Murray ] (Added in 98)
Raster: eg.TIF, GIF, JPG, PCD, BMP
Vector: eg. PIX, PS, DG
Scientific File Formats [ NOT IN TEXT see Web References, Fortner ] (Added in 99)
Audio File Formats [ NOT IN TEXT see Web References, Fortner ] (Added in 2000)


Devices, device drivers, control signals and protocols, DSPs
Applications, media editors, authoring systems, and authoring
Streams/structures, capture/represent/transform, spaces/domains, compression/coding
Content-based analysis, indexing, and retrieval of audio, images, and video
Presentation, rendering, synchronization, multi-modal integration/interfaces
Real-time delivery, quality of service, audio/video conferencing, video-on-demand

Learning objectives:

  1. Describe the media and supporting devices commonly associated with multimedia information and systems.
  2. Explain basic multimedia presentation concepts.
  3. Demonstrate the use of content-based information analysis in a multimedia information system.
  4. Critique multimedia presentations in terms of their appropriate use of audio, video, graphics, color, and other information presentation concepts.
  5. Implement a multimedia application using a commercial authoring system.

 Back to TopFile Systems (Added in 98)
What UNIX Does [NOT IN TEXT, D.Grant ]
Windows95 and NT File Systems [NOT IN TEXT, Mitchell, Nagar ]
BeOS [NOT IN TEXT, Giampaolo ]


Files: data, metadata, operations, organization, buffering, sequential, nonsequential
Directories: contents and structure
File systems: partitioning, mount/unmount, virtual file systems
Standard implementation techniques
Memory-mapped files
Special-purpose file systems
Naming, searching, access, backups

Learning objectives:

  1. Summarize the full range of considerations that support file systems.
  2. Compare and contrast different approaches to file organization, recognizing the strengths and weaknesses of each.
  3. Summarize how hardware developments have lead to changes in our priorities for the design and the management of file systems.

Back to TopData Forensics [NOT IN TEXT, various ] (New in 2002 )
Introduction Only

Learning objectives:
  1. Describe the term "forenzics" w.r.t. CS.
  2. Discuss various techniques used for data recovery.
Back to TopDigital Libraries [NOT IN TEXT, various ] (New in 2003 )
Digitization, storage, and interchange
Digital objects, composites, and packages
Metadata, cataloging, author submission
Naming, repositories, archives
Spaces (conceptual, geographical, 2/3D, VR)
Architectures (agents, buses, wrappers/mediators), interoperability
Services (searching, linking, browsing, and so forth)
Intellectual property rights management, privacy, protection (watermarking)
Archiving and preservation, integrity

Learning objectives:
  • Explain the underlying technical concepts in building a digital library.
  • Describe the basic service requirements for searching, linking, and browsing.
  • Critique scenarios involving appropriate and inappropriate use of a digital library, and determine the social, legal, and economic consequences for each scenario.
  • Describe some of the technical solutions to the problems related to archiving and preserving information in a digital library.
  • Design and implement a small digital library.
 Back to TopDistributed and Multi Files [NOT IN TEXT, Wiederhold ] (Added in 99)
Techniques used for data fragmentation, replication, and allocation

Learning objectives:

Back to TopInformation Storage and Retrieval (Added in 2003)
Possible Topics:

History and motivation for information systems
Information storage and retrieval (IS&R)
Information management applications
Information capture and representation
Analysis and indexing
Search, retrieval, linking, navigation
Information privacy, integrity, security, and preservation
Scalability, efficiency, and effectiveness
Characters, strings, coding, text
Documents, electronic publishing, markup, and markup languages
Tries, inverted files, PAT trees, signature files, indexing
Morphological analysis, stemming, phrases, stop lists
Term frequency distributions, uncertainty, fuzziness, weighting
Vector space, probabilistic, logical, and advanced models
Information needs, relevance, evaluation, effectiveness
Thesauri, ontologies, classification and categorization, metadata
Bibliographic information, bibliometrics, citations
Routing and (community) filtering
Search and search strategy, information seeking behavior, user modeling, feedback
Information summarization and visualization
Integration of citation, keyword, classification scheme, and other terms
Protocols and systems (including Z39.50, OPACs, WWW engines, research systems)

Learning objectives:
Compare and contrast information with data and knowledge.
Summarize the evolution of information systems from early visions up through modern offerings, distinguishing their respective capabilities and future potential.
Critique/defend a small- to medium-size information application with regard to its satisfying real user information needs.
Describe several technical solutions to the problems related to information privacy, integrity, security, and preservation.
Explain measures of efficiency (throughput, response time) and effectiveness (recall, precision).
Describe approaches to ensure that information systems can scale from the individual to the global.
Explain basic information storage and retrieval concepts.
Describe what issues are specific to efficient information retrieval.
Give applications of alternative search strategies and explain why the particular search strategy is appropriate for the application.
Perform Internet-based research.
Design and implement a small to medium size information storage and retrieval system.
Back to TopReliability [NOT IN TEXT see Web Notes & References, Wiederhold ] (Added in 99)
Error Correction

 Back to TopHierarchical File Structures [NOT IN TEXT see On-Line Notes, Wiederhold, Tharp ]

Ring Files (2001: Expanded )
Hierarchical Rings
Record Structure
Searching Rings
Grid Files (2001: Expanded )
Web Site Design:
Web Sites
Organizing Information for Web Sites
Navigation Systems
Labeling Systems
Searching Systems
Clustered Files
Remote Distributed Files
Federated Systems
Back to TopFile Structures for Games [NOT IN TEXT, various ] (Added in 98)
 Introduction Only
Back to TopStorage Area Networks [NOT IN TEXT, various ] (Added in 99)
Introduction Only
Back to TopData Warehousing and Data Mining [NOT IN TEXT, various ] (2001: Expanded )
Introduction Only

The usefulness of data mining
Associative and sequential patterns
Data clustering
Market basket analysis
Data cleaning
Data visualization

Learning objectives:
  1. Compare and contrast different conceptions of data mining as evidenced in both research and application.
  2. Explain the role of finding associations in commercial market basket data.
  3. Characterize the kinds of patterns that can be discovered by association rule mining.
  4. Describe how to extend a relational system to find patterns using association rules.
  5. Evaluate methodological issues underlying the effective application of data mining.
  6. Identify and characterize sources of noise, redundancy, and outliers in presented data.
  7. Identify mechanisms (on-line aggregation, anytime behavior, interactive visualization) to close the loop in the data mining process.
  8. Describe why the various close-the-loop processes improve the effectiveness of data mining.

Back to Top Cpsc 461 - Copyright © 2002 Katrin Becker Last Modified May 13, 2003 02:49 PM