Models of Computation for Massive Data
Instructor: Jeff Phillips | Office hours: Mondays, Wednesdays 12:05 pm - 1:05 pm @ MEB 3147 or MEB 3442
TA: Bigyan Mukherjee | Office hours: Mondays, Wednesdays 3:00 - 4:00 pm @ MEB 3115
Fall 2011 | Mondays, Wednesdays 10:45 am - 12:05 pm @ MEB 3147
Catalog number: CS 7960 01
Description:
This course will explore advanced models of computation pertinent to processing massive data sets.
As data sets grow to terabyte and petabyte scales, traditional models and paradigms of sequential computation become obsolete.
New efficiency trade-offs arise as memory usage, I/O calls, or inter-node communication become the dominant bottlenecks.
These paradigms are formalized as I/O-Efficient, Parallel, Streaming, GPU-based, Map-Reduce, and other distributed algorithmic models of computation.
This course will study the history and specifics of these models.
Students in the class will learn the settings in which each paradigm is appropriate, the advantages and disadvantages of each model, and how to analyze algorithms within these settings.
They will be evaluated on both analysis problem sets and basic programming assignments within these models.
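To give a concrete feel for one of these settings, here is a minimal sketch (illustrative only, not course material; the function name and parameters are hypothetical) of reservoir sampling, a classic streaming-model algorithm that maintains a uniform random sample of k items from a stream using memory independent of the stream's length:

    import random

    def reservoir_sample(stream, k):
        """Keep a uniform random sample of k items from a stream using O(k) memory."""
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                # Fill the reservoir with the first k items.
                reservoir.append(item)
            else:
                # Replace a random slot with probability k / (i + 1);
                # this keeps every item equally likely to be in the sample.
                j = random.randint(0, i)
                if j < k:
                    reservoir[j] = item
        return reservoir

    # Example: sample 5 items from a stream far too large to store in memory.
    print(reservoir_sample(range(10**6), 5))

This one-pass, low-memory style of computation is exactly the kind of constraint the streaming model formalizes.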
Schedule: (subject to change)
Grading: Grades will be based on the homework assignments (50%) and a project (50%).
Project: A major component of this class will be a project where you will investigate in-depth a focused topic in a particular model or the relation between several models. Details.
FAQ:
Q: Will there be a book?
A: No. Most information is available online and I will post links on the (under-construction) webpage.
As far as I know, this class has not been taught before; different aspects have, but not as a whole. For instance, there is a new book on MapReduce that will definitely help guide that section of the course.
Q: How hands-on will the class be?
A: My current aim for the lecture material is roughly 50% analysis and 50% systems background. But the work for the course can be tailored toward each student's preference. This is all subject to change, but the plan is that half the work will be a series of short assignments, themselves half analysis and half small implementation projects, with about one for each model we cover. I also plan a project due at the end of the term, constituting about half the course's workload; this could be more hands-on, or purely analysis-based. So each student's workload could range from 25% analysis / 75% hands-on to 75% analysis / 25% hands-on.
So the plan is that everyone will get hands-on experience working with some or most of the models, and those who choose to go that direction will get their hands much more dirty :).