December 11, 2013

CS 234: Data Types and Structures


General description

This course introduces widely used and effective methods of data organization, focusing on data structures, their algorithms, and the performance of these algorithms.

Students learn how the choice of a data structure affects the performance and usability of applications, e.g., in web browsing and searching, computer databases, data analysis, text processing, etc. Specific topics include lists, stacks, queues, sets, maps, priority queues, trees, and graphs, together with general algorithmic techniques, such as sorting, searching, and other transformations on data. Students who successfully complete the course can use these tools to design and develop efficient programs for a wide variety of applications.

Logistics

Audience

  • Students wishing to understand how choices made in program design can significantly affect performance and how to make these choices
  • Students doing a Computer Science minor
  • Highly recommended for students in Information Management and similar fields and for those in the Mathematics/Teaching plan
  • Computer Science majors and SE and CFM students should take CS 240 instead

Normally available

  • Fall, Spring

Related courses

  • Predecessors: One of CS 116, 136, 138, 145 taken fall 2010 or earlier, 146; Not open to Computer Science students. The initial topics in CS 234 will present some material new to those coming out of CS 116 (but covered in CS 136) and some material new to those coming out of CS 136 (but covered in CS 116). Students who have taken introductory computer courses outside the Math Faculty should consult a CS advisor to determine if they are prepared for this course.
  • Successors: CS 338
  • Co-requisites: None
  • Conflicts: CS 240

For official details, see the UW calendar.

Software/hardware used

  • Python (Note: Students should check which version of Python will be used in class.)

Typical reference(s)

  • Rance D. Neśaise, Data Structures and Algorithms Using Python, Wiley, 2010

Required preparation

At the start of the course, students should be able to

  • Use and follow mathematical reasoning and notation; in particular, work comfortably with simple predicate logic, functions, and elementary probability.
  • Describe basic quadratic sorting algorithms, such as Selection Sort or Insertion Sort.
  • Given a simple, well-specified computational task, write a program in a high-level imperative language (e.g., Python, C, or Java) and generate appropriate test cases. Use as appropriate simple types, built-in compound types and control idioms, such as selection, iteration, and recursion. These programs should not require more than a single routine, possibly together with a small number of additional helper routines.
  • Define and write programs that work with recursively defined structures, such as binary trees.

Note: The course uses Python for its programming examples and assignments. The necessary syntax and idiom will be presented at the start of the course as a review for those who know Python and an explanation for those who don't. Experience in other common imperative languages, such as C or Java, will suffice.

Learning objectives

At the end of the course, students should be able to

  • Describe several common computer applications where the choice of data structure can make the difference between efficient usability and complete non-functionality. For each application, describe one data structure appropriate to the application and one structure inappropriate to the application (although appropriate in some other context).
  • Explain the concept of "asymptotic run time" and the effects of logarithmic, linear, quasi-linear, quadratic, cubic, and exponential run times. Given a simple algorithm, determine and informally justify its asymptotic run-time, as expressed in order notation.
  • Explain the benefits of abstraction and representation independence.
  • Explain the concept of Abstract Data Type (ADT). Define several different ADTs (e.g., Stack, Queue, Priority Queue, and Map), and give a sample application for each. Explain the complexity of some common implementations of these ADTs. Given a well-specified computational problem, determine which (if any) would assist the solution.
  • Explain how external memory differs from internal memory, and describe how this difference affects the performance of a program. Give several examples of real-world tasks requiring external memory, and describe a structure and algorithm appropriate for each.
  • Explain and justify the best- and worst-case behaviours of simple quadratic sorting algorithms, as well as Mergesort, Heapsort, Quicksort, and at least one non-comparison-based algorithm. Select the appropriate algorithm for a given set of circumstances. Apply sorting-based techniques to algorithmic problems, such as elimination of duplicates.
  • Describe data structures and algorithms for efficient traversals of graphs (networks) and computations of minimum spanning trees.
  • Explain how specialized data structures can lead to efficient solutions to problems. Illustrate this principle by describing algorithms and data structures for a particular real-world problem (e.g., text compression, processing geometric data, etc.).
  • Write, test, and debug programs using the above structures and algorithms in the Python language or another imperative language known to or subsequently learned by the student. (At the instructor's discretion, students may be allowed to code partially in a language other than in Python. Regardless of such permission, students may be tested on the Python language in assignments and/or exams.)

Typical syllabus

Basic concepts (12 hours)

Note: Each student should find some parts of this material as new and some as review depending on their background.

  • Python language: basic data types, functions, selection, iteration, recursion, strings, association lists, console I/O, and file I/O
  • Runtime of an algorithm: worst-case, best-case, and average-case
  • Asymptotic analysis and order notation
  • Definition and examples of abstract data types
  • Arrays and linked lists

Sorting algorithms (3 hours)

  • Review of Selection Sort and Insertion Sort
  • Mergesort
  • Quicksort
  • Non-comparison-based sorting algorithms (e.g., Bucket Sort, Radix Sort)
  • The worst-case, best-case, and average-case complexity of these algorithms
  • Selecting an appropriate algorithm for a specific application

Stacks, queues, and priority queues (4 hours)

  • Stacks and queues
  • Priority Queue ADT and simple implementations
  • Heaps and Heapsort
  • Application to problems, such as selection

Map (or Dictionary) ADT (9 hours)

  • Map ADT and simple implementations
  • Linear search and binary search
  • Self-organizing lists
  • Balanced search trees (e.g., 2-3-4 trees)
  • Searching in external memory
  • Hashing: key-indexed search, simple hash functions, and collision-resolution strategies
  • Example applications

Graphs (5 hours)

  • Adjacency list and matrix representations
  • Depth-first and breadth-first traversals of graphs
  • Minimum spanning trees as a case study of algorithms and data structures

Minimum spanning trees as a case study of algorithms and data structures (3 hours)

  • A case study in selecting, designing, and implementing data types for a real-world problem (e.g., text compression/retrieval, geometric data, etc.)