CPSC 461: Copyright (C) 2003 Katrin Becker Last Modified June 21, 2003 03:15 PM

Nyberg, Barclay, Cvetanovic, Gray, Lomet

"AlphaSort: A Cache-Sensitive Parallel External Sort"

- Summary

Keywords:

cache-sensitive, memory-intensive, clustered data structures, cache locality, file striping, QuickSort, replacement-selection, MinuteSort, PennySort

The Sort Benchmark:

The performance metric is the elapsed time of the following seven steps:

  1. Launch the Sort Program
  2. Open the Input File and Create the Output File
  3. Read the Input File
  4. Sort the records in key-ascending order
  5. Write the output file
  6. Close the Files
  7. Terminate the Program
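The seven benchmark steps can be sketched in Python as below. This is a minimal illustration, not a benchmark harness. It assumes fixed-length 100-byte records whose first 10 bytes are the key (the Datamation-style format; the notes do not specify one). It uses Python's built-in sort rather than AlphaSort's algorithm. It times only steps 2 to 6, because launching and terminating the program (steps 1 and 7) happen outside it and would have to be timed externally.

```python
import os
import tempfile
import time

# Assumptions (not stated in the notes): fixed-length 100-byte records whose
# first 10 bytes are the sort key, as in the Datamation-style benchmark.
RECORD_SIZE = 100
KEY_SIZE = 10


def split_records(data: bytes) -> list[bytes]:
    """Cut a byte string into fixed-length records."""
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def timed_sort(input_path: str, output_path: str) -> float:
    """Time benchmark steps 2-6. Steps 1 and 7 (launching and terminating
    the program) happen outside this function and must be timed externally."""
    start = time.perf_counter()
    with open(input_path, "rb") as fin, open(output_path, "wb") as fout:  # step 2
        records = split_records(fin.read())                                # step 3
        records.sort(key=lambda r: r[:KEY_SIZE])                           # step 4
        fout.write(b"".join(records))                                      # step 5
    # step 6: both files are closed when the with-block exits
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "in.dat"), os.path.join(d, "out.dat")
        with open(src, "wb") as f:
            f.write(os.urandom(RECORD_SIZE * 10_000))
        print(f"elapsed: {timed_sort(src, dst):.4f} s")
```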

Bottlenecks during a sort:

Typical Memory Hierarchy:

Optimizing the use of processor cache on a sort:

Replacement Selection:

QuickSort:

The Sorts:
Record Sort
This is the sort with which we are most familiar.
The records themselves are compared against each other and moved around until they are all “in place”.
many accesses, many moves, no space overhead
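A minimal sketch of a record sort follows. It assumes fixed-length records stored back to back in one buffer with the key at the front; the 8-byte record and 2-byte key sizes are hypothetical. Insertion sort is used only because it is short. It makes the cost visible: every compare reads a key out of a record, and every shift copies a whole record.

```python
# Hypothetical layout: 8-byte records, the first 2 bytes of each are the key.
RECORD_SIZE = 8
KEY_SIZE = 2


def record_sort(buf: bytearray) -> int:
    """Insertion-sort fixed-length records in place; return how many record copies were made."""
    n = len(buf) // RECORD_SIZE
    moves = 0
    for i in range(1, n):
        rec = bytes(buf[i * RECORD_SIZE:(i + 1) * RECORD_SIZE])
        key = rec[:KEY_SIZE]
        j = i - 1
        # Every compare reads a key directly out of a record in the buffer.
        while j >= 0 and buf[j * RECORD_SIZE:j * RECORD_SIZE + KEY_SIZE] > key:
            # Shift the whole record one slot to the right.
            buf[(j + 1) * RECORD_SIZE:(j + 2) * RECORD_SIZE] = buf[j * RECORD_SIZE:(j + 1) * RECORD_SIZE]
            moves += 1
            j -= 1
        if j + 1 != i:
            # Drop the saved record into the gap the shifts opened up.
            buf[(j + 1) * RECORD_SIZE:(j + 2) * RECORD_SIZE] = rec
            moves += 1
    return moves
```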



Pointer Sort
Place pointers to the records in the correct order without moving the records themselves.
When done, retrieve and move the records.
Still requires a record access for each compare.
many accesses, one move, little space overhead
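A sketch of a pointer sort, with list indices standing in for pointers (an assumption for illustration). The pointer array is what gets sorted, but each comparison still follows both pointers into the records. The records themselves are moved only once, at the end. The hypothetical `accesses` counter tallies the record reads made by comparisons.

```python
from functools import cmp_to_key


def pointer_sort(records: list[bytes], key_size: int) -> tuple[list[bytes], int]:
    """Sort an array of pointers (here: list indices) instead of the records."""
    accesses = 0

    def compare(i: int, j: int) -> int:
        nonlocal accesses
        accesses += 2  # each compare follows both pointers into the records
        a, b = records[i][:key_size], records[j][:key_size]
        return (a > b) - (a < b)

    ptrs = sorted(range(len(records)), key=cmp_to_key(compare))
    # Only now are the records moved, once each, into output order.
    return [records[p] for p in ptrs], accesses
```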


Key/Pointer Sort
Extract the keys and place key-pointer pairs in the correct order without moving the records themselves.
When done, retrieve and move the records.
Compares read only the key-pointer pairs, so no record access is needed during the sort itself.
one access, one move, moderate space overhead
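The key/pointer variant can be sketched as follows, again with list indices standing in for pointers (an assumption for illustration). Each record is touched once to copy out its key; after that, sorting reads only the compact (key, pointer) pairs.

```python
def key_pointer_sort(records: list[bytes], key_size: int) -> list[bytes]:
    # One pass touches each record once to copy its key into a (key, pointer) pair.
    pairs = [(rec[:key_size], i) for i, rec in enumerate(records)]
    # Compares read only the pairs; the pointer breaks ties between equal keys.
    pairs.sort()
    # One move per record, in key order.
    return [records[i] for _, i in pairs]
```

Because ties fall back to the pointer (the original position), records with equal keys keep their input order.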



Key-Prefix Pointer Sort
Extract a fixed-size prefix of each key and place prefix-pointer pairs in the correct order without moving the records themselves.
When done, retrieve and move the records.
Records are accessed only when two prefixes are equal, to compare the full keys.
few accesses, one move, intermediate space overhead
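A sketch of the key-prefix variant follows. It assumes the prefix is the first `prefix_size` bytes of the key; the 4-byte default is hypothetical. Because a prefix is a leading slice of the key, differing prefixes decide a comparison on their own. Only equal prefixes force a look at the full keys in the records, and the `tie_accesses` counter records those reads.

```python
from functools import cmp_to_key

PREFIX_SIZE = 4  # hypothetical prefix length in bytes


def prefix_pointer_sort(records: list[bytes], key_size: int,
                        prefix_size: int = PREFIX_SIZE) -> tuple[list[bytes], int]:
    if prefix_size > key_size:
        raise ValueError("the prefix must not be longer than the key")
    tie_accesses = 0
    # (prefix, pointer) pairs; list indices stand in for pointers.
    pairs = [(rec[:prefix_size], i) for i, rec in enumerate(records)]

    def compare(a: tuple[bytes, int], b: tuple[bytes, int]) -> int:
        nonlocal tie_accesses
        (pa, ia), (pb, ib) = a, b
        if pa != pb:
            return -1 if pa < pb else 1  # decided by the prefixes alone
        # Equal prefixes: fall back to the full keys stored in the records.
        tie_accesses += 2
        ka, kb = records[ia][:key_size], records[ib][:key_size]
        return (ka > kb) - (ka < kb)

    pairs.sort(key=cmp_to_key(compare))
    return [records[i] for _, i in pairs], tie_accesses
```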



Shared-Memory Multiprocessor Optimizations:
- poses a different situation from a multiprocessor system in which each processor has its own memory (that arrangement behaves more like a network)
- break up the sorting work into independent chores that can be handled by "the workers"
Chores that can be done independently:
- generating the arrays of key-prefix/pointer pairs
- doing the QuickSort (each worker sorts one run)
ROOT merges the sorted runs of key-prefix/pointer pairs to produce a single sorted string of pointers
- gathering records into output buffers
ROOT writes the output buffers
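The division of labour above can be sketched with a thread pool. Everything here is an assumption for illustration: the record layout, the 4-byte prefix, and the function names. Python threads show the structure of the chores but, under CPython's global interpreter lock, will not give a real parallel speedup. Python's built-in sort also stands in for QuickSort.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from functools import cmp_to_key

# Hypothetical layout: fixed-length records with a 10-byte key at the front,
# and a 4-byte key prefix kept in each prefix/pointer pair.
KEY_SIZE = 10
PREFIX_SIZE = 4


def parallel_sort(records: list[bytes], workers: int = 4) -> bytes:
    def compare(a, b):
        # Decide by prefix; touch the records only when the prefixes tie.
        if a[0] != b[0]:
            return -1 if a[0] < b[0] else 1
        ka, kb = records[a[1]][:KEY_SIZE], records[b[1]][:KEY_SIZE]
        return (ka > kb) - (ka < kb)

    order = cmp_to_key(compare)

    def make_run(lo: int, hi: int):
        # Worker chore: build prefix/pointer pairs for one slice, then sort that run.
        run = [(records[i][:PREFIX_SIZE], i) for i in range(lo, hi)]
        run.sort(key=order)
        return run

    def gather(ptrs):
        # Worker chore: copy records into one output buffer, in pointer order.
        return b"".join(records[i] for i in ptrs)

    n = len(records)
    step = max(1, -(-n // workers))  # ceiling division: slice size per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(lambda lo: make_run(lo, min(lo + step, n)), range(0, n, step)))
        # ROOT: merge the sorted runs into one sorted string of pointers.
        ptrs = [i for _, i in heapq.merge(*runs, key=order)]
        chunks = [ptrs[j:j + step] for j in range(0, n, step)]
        buffers = list(pool.map(gather, chunks))
    # ROOT: "write" the output buffers in order (here: concatenate them).
    return b"".join(buffers)
```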

File Striping:
- breaking a file up across multiple devices
- bandwidth grows nearly linearly with the number of disks until a controller saturates (one controller can serve more than one disk)
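A minimal sketch of striping a file round-robin across several files, each standing in for a separate disk. The stripe size and function names are hypothetical. This version issues its I/O sequentially, whereas the bandwidth gain described above comes from driving the devices concurrently.

```python
import os
import tempfile

STRIPE_SIZE = 64 * 1024  # hypothetical stripe unit in bytes


def write_striped(data: bytes, paths: list[str], stripe: int = STRIPE_SIZE) -> None:
    """Deal consecutive stripes of data round-robin across the given files."""
    files = [open(p, "wb") for p in paths]
    try:
        for k, off in enumerate(range(0, len(data), stripe)):
            files[k % len(files)].write(data[off:off + stripe])
    finally:
        for f in files:
            f.close()


def read_striped(paths: list[str], stripe: int = STRIPE_SIZE) -> bytes:
    """Reassemble the logical file by taking one stripe from each file in turn."""
    files = [open(p, "rb") for p in paths]
    try:
        out = []
        while True:
            progressed = False
            for f in files:
                chunk = f.read(stripe)
                if chunk:
                    out.append(chunk)
                    progressed = True
            if not progressed:
                return b"".join(out)
    finally:
        for f in files:
            f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        disks = [os.path.join(d, f"disk{i}.dat") for i in range(4)]
        payload = os.urandom(1_000_000)
        write_striped(payload, disks)
        assert read_striped(disks) == payload
```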


General Comments on Cache Misses:
- a program whose code or data does not fit in the cache will suffer instruction or data cache misses, respectively
- cache misses in a tree can be reduced by clustering its nodes so that nodes accessed together share a cache line
- cache misses can also be reduced by using a "line-list": like a linked list, but each node ("line") is the size of a cache line
- watch out for potential fragmentation problems (sound familiar?)
- on the other hand, line-lists can improve memory utilization by reducing the number of pointers required
- line-lists are effective when the data being accessed is larger than the cache
- items can also be clustered B-tree style (in this case the structure is far more static and therefore easier to maintain)
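A line-list can be sketched as a linked list whose nodes each hold a small fixed number of items (often called an unrolled linked list). The capacity of 8 is a hypothetical stand-in for "items per cache line". Python objects are not laid out contiguously, so this shows the structure and the pointer savings, not the cache behaviour. The partly filled last line is the kind of fragmentation the notes warn about.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional

# Hypothetical capacity, e.g. a 64-byte cache line holding eight 8-byte items
# in a language with a flat memory layout.
LINE_CAPACITY = 8


@dataclass
class Line:
    items: list = field(default_factory=list)
    next_line: Optional["Line"] = None


class LineList:
    """A linked list whose nodes ("lines") each hold up to LINE_CAPACITY items,
    so one next-pointer is shared by many items."""

    def __init__(self) -> None:
        self.head: Optional[Line] = None
        self.tail: Optional[Line] = None
        self.lines = 0

    def append(self, item: Any) -> None:
        # Start a new line only when the current tail line is full.
        if self.tail is None or len(self.tail.items) == LINE_CAPACITY:
            line = Line()
            if self.tail is None:
                self.head = line
            else:
                self.tail.next_line = line
            self.tail = line
            self.lines += 1
        self.tail.items.append(item)

    def __iter__(self) -> Iterator[Any]:
        line = self.head
        while line is not None:
            yield from line.items
            line = line.next_line
```

For example, appending 20 items creates 3 lines (8 + 8 + 4), so 3 next-pointers serve 20 items instead of 20.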
