Data Mining Seminar : Matrix Sketching
Instructors : Jeff Phillips and Mina Ghashami
Spring 2015 | Fridays 1:45 pm - 3:00 pm
Location : MEB 3147 (the LCR)
Catalog number: CS 7931 or CS 6961

A very common way to represent a very large data set is as a matrix. For instance, if there are n data points and each data point has d attributes, then the data can be thought of as an n x d matrix A with n rows and d columns. While matrix approximation and decomposition have been studied in numerical linear algebra for many decades, these methods often require more space and time than is feasible in very large scale settings, and often provide more precision than is required. The last decade has witnessed an explosion of work in matrix sketching, where an input matrix A is efficiently approximated with a more compact matrix B (or a product of a few matrices) so that B preserves most of the properties of A up to some guaranteed approximation ratio. This class will attempt to survey the large and growing literature on this topic, focusing on simple algorithms, intuition for error bounds, and practical performance.
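As a concrete taste of this style of algorithm, here is a minimal sketch (not a reference implementation) of the Frequent Directions method covered later in the schedule (see the GLPW reference): it streams over the rows of A, and whenever its small buffer fills, it shrinks all directions by the squared ell-th singular value via an SVD. The function name and the doubled-buffer variant are one common presentation; details vary across the papers.

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream the rows of an n x d matrix A into an ell x d sketch B
    (assumes ell <= d), so that B^T B approximates A^T A."""
    n, d = A.shape
    B = np.zeros((2 * ell, d))  # working buffer of 2*ell rows
    nz = 0                      # index of the next zero row in B
    for row in A:
        if nz == 2 * ell:
            # Buffer is full: shrink every direction by the squared
            # ell-th singular value, zeroing out the smallest rows.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
            B = np.sqrt(s2)[:, None] * Vt
            nz = ell - 1        # rows ell-1 .. 2*ell-1 are now zero
        B[nz] = row
        nz += 1
    # Final shrink so the returned sketch has exactly ell rows.
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s2 = np.maximum(s ** 2 - s[ell - 1] ** 2, 0.0)
    return (np.sqrt(s2)[:, None] * Vt)[:ell]
```

The guarantee (proved in the references) is that the spectral-norm covariance error ||A^T A - B^T B|| is at most a constant times ||A||_F^2 / ell, using only O(ell * d) space, no matter how many rows stream by.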

This 1-credit seminar will meet once a week. Instructors will give most lectures. Students will be expected to carry out a small project exploring one or more of the topics we discuss in some depth, pushing the boundaries of research. They will give a short presentation of their results at the end of the semester.
Schedule: (subject to change)
Date Topic References Speaker
Fri 1.16 Overview Jeff Phillips
Fri 1.23 Column Sampling Woodruff 2.4 | DGP 2.1, 3.1, 5.1 | Mahoney Jeff Phillips
Fri 1.30 Random Projection and Hashing Woodruff 2.1 | DGP 2.2, 5.2 Mina Ghashami
Fri 2.06 Iterative (Frequent Directions) GLPW | DGP 2.3, 3.2, 5.3 Jeff Phillips
Fri 2.13 CUR Decompositions Woodruff 4.1, 4.2 | Mahoney Mina Ghashami
Fri 2.20 (No Class - Grad Visit Day)
Fri 2.27 Matrix Concentration Bounds Tropp (Ch 5+6) Mina Ghashami
Fri 3.06 Lower Bounds Woodruff 6 Jeff Phillips
Fri 3.13 Sparsification @ 3:15 in WEB 1705 Mina Ghashami
Fri 3.20 (Spring Break - No Class)
Fri 3.27 Regression and L1 (and Lp) Bounds @ 3:15 in WEB 1705 Woodruff 2.5, 3, YMM Jeff Phillips
Fri 4.03 Distributed Models Woodruff 4.4 | GPL Mina Ghashami
Fri 4.10 Tensor Decompositions Mina Ghashami
Fri 4.17 Project Presentations
Fri 4.24 Project Presentations

Useful references:
  • Woodruff : David P. Woodruff. Sketching as a Tool for Numerical Linear Algebra. Foundations and Trends in Theoretical Computer Science, Vol. 10 (2014), pages 1-157.
  • Tropp : Joel A. Tropp. An Introduction to Matrix Concentration Inequalities. arXiv:1501.01571. To appear in Foundations and Trends in Machine Learning.
  • GLPW : Mina Ghashami, Edo Liberty, Jeff M. Phillips, and David Woodruff. Frequent Directions: Simple and Deterministic Matrix Sketching. arXiv:1501.01711.
  • DGP : Amey Desai, Mina Ghashami, and Jeff M. Phillips. Improved Practical Matrix Sketching with Guarantees. arXiv:1501.06561.
  • Mahoney : Michael W. Mahoney. Randomized Algorithms for Matrices and Data. Foundations and Trends in Machine Learning, Vol. 3 (2011), pages 123-224.
  • YMM : Jiyan Yang, Xiangrui Meng, and Michael W. Mahoney. Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments. arXiv:1502.03032.