CS 240 Data Structures and Data Management
Revised April 28, 2010
General Description
An introduction to widely used and effective methods of data
organization, focusing on data structures, their algorithms, and the
performance of these algorithms.
Specific topics include priority queues, sorting, dictionaries, and data
structures for text processing.
Students will learn to analyze, implement and use various
data structures and data-management techniques in a variety
of applications.
- Perform rigorous complexity analyses of simple
algorithms and data structures. This includes writing correct
mathematical proofs on
inductively-defined structures, and solving simple recurrence relations.
- Compare different data-structuring techniques from
the point of view of time, memory requirements, etc. Determine a good
data structure to solve a specific algorithmic problem.
- Implement data structures correctly and efficiently in one or more
high-level programming languages, including C++.
Logistics
Intended for 2B students in Computer Science. Normally available Fall,
Winter and Spring.
Related courses (see calendar for official details)
- Predecessors: CS 245 (logic) or SE 212; CS 246 or 247 (programming); any of STAT 206,
230, 240 (probability).
-
Successors: Most third-year CS major courses.
-
Conflicts: Other courses that seriously consider efficiency and
correctness of fundamental data structures and their algorithms.
Hardware/Software used: UNIX, C++
Typical Reference(s)
R. Sedgewick, Algorithms in C, 3rd ed,
Parts 1–4. Addison Wesley.
Required preparation
To begin the course, students will need the ability to do the following.
Analytical:
- Define and explain order notation; perform
complexity analyses of simple algorithms.
-
Define "abstract data type" or ADT; explain the utility of this
concept.
-
Perform basic computations of discrete probability and expectation.
-
Use mathematical induction on recursively defined structures,
including finding simple errors in inductive arguments.
-
Prove simple properties of program fragments correct, through the use
of pre-conditions and post-conditions for loops and recursive calls.
Computational and algorithmic:
- Design, implement, test, and debug C++ programs to
solve problems requiring many hundreds of lines of code.
-
Define the ADTs for stacks and queues; write efficient implementations
in C/C++.
-
Describe tree structures, including binary search trees
and multi-way trees; use these structures in C/C++ programs.
-
Describe basic sorting algorithms (including Quicksort)
and implement them in C/C++.
-
Preferably, explain the notion of a hash table (but need not be able to
describe the algorithms or their efficiency).
Learning objectives
By the end of the course, students should have the ability to
- Give rigorous asymptotic analyses of simple algorithms and express the
result using order notation; compare algorithms based on their asymptotic
complexity; and prove formal results involving order
notation.
-
Apply the priority-queue ADT to solve various application problems,
implement a priority queue using heaps, and analyze the
complexity of common implementations of heap operations.
-
Perform best-, worst- and average-case analyses of sorting
algorithms, including Quicksort, and explain the ramifications
of these analyses in practice; explain the basic
principles of randomized algorithms and their potential
advantages, specifically in the case of Quicksort;
explain the difference between comparison-based sorting and
non-comparison-based sorting algorithms, and when and why
the latter may run faster; and apply sorting-based techniques to
algorithmic problems such as elimination of duplicates.
-
Implement bounded-height search trees that accommodate efficient
(i.e., O(log n)) implementations of search,
insert and delete; evaluate which search tree
techniques are best suited to various application scenarios (e.g.,
B-trees are useful for large-scale data structures stored in external
memory).
-
Explain the advantages and disadvantages of various
hashing techniques; determine the best hashing
techniques to use in a particular application scenario; and
recognize when hashing techniques are preferable to other
dictionary implementations.
-
Design data structures for real-world data (where keys
are often inherently multidimensional) in such a way that
common operations (including range search) can be
implemented efficiently.
-
Design special data structures that can efficiently store
and process words and strings, and select and implement a suitable
technique for data compression in a specific application
scenario.
In addition to the above language-independent skills, students
should have the ability to implement (code, debug, and test) any of
the above structures and algorithms in C++, using appropriate design
methodologies and language features. They should have preparation to
transfer these abilities to other languages (once learned).
Typical syllabus
Introduction and review (3 hours)
The basic computer model: the random-access machine. The
runtime of an algorithm: worst-case, best-case, and average-case.
Asymptotic analysis, order notation, growth rates, and complexity.
Stacks, queues, and priority queues (3 hours)
Review of stacks and queues.
The priority queue ADT and simple implementations.
Heaps and Heapsort.
Using heaps to solve the selection problem.
Sorting and analysis of randomized algorithms (5 hours)
Quicksort (non-randomized): worst-case, best-case and
average-case complexity.
Randomized quicksort and its analysis; application to selection and
its analysis.
Lower bound on comparison-based sorting.
Non-comparison-based sorting algorithms (e.g., Counting Sort and Radix
Sort).
Search trees (5 hours)
The Dictionary ADT and simple implementations.
Binary search trees (insert and delete operations and analysis)
Balanced search trees (insert and delete operations and analysis;
instructors will normally choose two or more of AVL trees,
2-3 trees, red-black trees, etc.).
2-3-4 trees and B-trees (search, insert and delete operations and
analysis).
Hashing (5 hours)
Key-indexed search, simple hash functions.
Collision resolution: chaining and open addressing.
Complexity of search, insertion and deletion.
Extendible hashing.
Self-organizing search.
Range search and multidimensional dictionaries (5 hours)
Range search in a binary search tree.
Data structures for orthogonal range search: quad trees, Kd-trees,
range trees.
Algorithms and data structures for text processing (8 hours)
Dictionaries for text strings: radix trees, tries, compressed tries,
suffix tries.
String matching algorithms: brute force, finite automata, the Knuth-Morris-Pratt algorithm.
Text compression: Huffman codes, Lempel-Ziv.


Last modified: Thursday, 08-Mar-2012 15:42:29 EST