CS 848: Advanced Topics in Databases: Distributed Information Systems

Project Information

The term project is the most important part of this course. Projects are to be conducted in teams of two to three students. There are three types of projects that are acceptable in this course:

  1. Research project: Pick a research problem related to a topic in the course and work on it. You are free to pursue a research problem from some other area (e.g., your own research area) as long as you give it a distributed (e.g., web or cloud) data management twist. Here is a sample project paper from a previous offering of the course.
  2. Implementation project: Design, develop, and implement a piece of software or a tool on one of the topics related to the course. For example, you could implement a tool that advises on how data can be distributed or replicated in a cloud environment to optimize performance, or you could implement algorithms that solve a common problem and study their performance. For this kind of project, I expect you to have a working demo at the end of the term.
  3. Survey project: Write a survey on a topic in the course. Survey projects are expected to use common and consistent terminology throughout the survey paper, even though the papers you survey will not share a single terminology. Your survey must provide value beyond what a reader could get by simply reading the papers themselves; paraphrasing papers will not suffice. For example, a survey that contributes a taxonomy or classification of the algorithms and techniques from multiple papers, and illustrates how these algorithms and techniques work using a running example, would be an acceptable contribution. You are reminded not to plagiarize your sources.

Successful projects could lead to publishable results. Obviously, publishing is not a requirement, but this is the mindset that you should have.

Project Ideas

You will need to form your team and come up with a project idea in the first six weeks of the term. I strongly encourage you to come up with your own project ideas. You could choose an open question from the papers on the reading list, or pick a problem that interests you from outside of data management and give it a distributed computing or data management twist. Occasionally, I will present a possible direction for projects in class. If you are stuck, you can talk to me about possible project directions.

Timeline

  • 28 February: Two-page project proposal due
  • 28 March and 4 April: Project presentations in class
  • 18 April: Project due (1:00 PM)

Project Proposal

The project proposal should be two pages long in 10-point font and should be submitted both as a hard copy and as a PDF document e-mailed to me. It should include the title of the project, the names of the team members, a description of the problem you will work on, why it is important, and how you will attempt to solve it. Try to clearly identify your objectives, methodology, milestones, and metrics for success.

Project Report

The project report should be written in research paper style. It should motivate your problem, discuss related work, and present your solution in detail including its benefits and limitations. The report should be 12 pages maximum (22 pages for survey projects). It should be formatted in the two-column ACM proceedings format, using one of the ACM SIG Proceedings Templates. The ACM templates include headings for "Categories and Subject Descriptors," "General Terms," and "Keywords." You do not need to use any of these headings.

Deliverables

  • Project report in hard copy and PDF.

  • Source code and scripts required to replicate your solution.

  • Demo if applicable.

Evaluation

You will be evaluated on the depth and novelty of your work, on the quality of your written report, and on your research methodology (problem definition, choosing the correct level of abstraction, quality of implementation, evaluation methodology, experimentation, etc.).

Remember that the aim of the project is to produce a piece of research. You will then give a presentation and write a paper summarizing your work. In writing a good research paper (and in doing a good piece of research), the following are usually considered important:

    Statement of the problem. Provide a clear statement of the problem you are solving.

    Motivation. Provide motivation for the problem. Why is the problem interesting? important? challenging?

    Background material. Include enough background material so that the paper will be understandable to your intended audience.

    Review of previous work. Describe the work that other researchers have done on this problem and show why these previous approaches are inadequate.

    Statement of technical solutions. Provide a clear statement of the techniques used to solve the problem. Show how the problem was approached, by what methods and techniques. State any assumptions made for the techniques to work.

    Evaluation. Show how successful your solutions were, and why they were successful. Provide convincing evidence in support of all claims. Carefully evaluate the strengths and limitations of your contribution. Some dimensions for evaluation include empirical results and theoretical analyses. If the evaluation is empirical, are the test instances representative? Whether the evaluation is empirical or theoretical, are the simplifying assumptions justified?

    Clarity. Is the paper well written and well organized? Does the writing enable a substantive evaluation of the work? Is good use made of examples to illustrate the problem and solutions?

    Additional advice.