\documentclass[11pt]{article}
\usepackage{classDM14}
\usepackage{hyperref}
\title{Asmt 5: Regression}
\author{Turn in through Canvas by 5pm: \\
Wednesday, April 09 \\
20 points}
\date{}
\begin{document}
\maketitle
%\end{titlepage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{Overview}
In this assignment you will explore regression techniques on high-dimensional data.
You will use a few data sets for this assignment:
\begin{itemize} \denselist
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/A.dat}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/A.dat}}
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/X.dat}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/X.dat}}
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/Y.dat}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/Y.dat}}
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/M.dat}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/M.dat}}
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/W.dat}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/W.dat}}
\end{itemize}
and a file stub:
\begin{itemize} \denselist
\item \href{http://www.cs.utah.edu/~jeffp/teaching/cs5140/A5/FD.m}{\texttt{http://www.cs.utah.edu/\~{}jeffp/teaching/cs5140/A5/FD.m}}
\end{itemize}
These data sets are in matrix format and can be loaded into MATLAB or OCTAVE. By calling
\\
\texttt{load filename} (for instance \texttt{load X.dat})
\\
loads the data in the file into memory; in the example above, this creates the matrix \texttt{X}. You can then display this matrix by typing
\texttt{X}.
\vspace{.1in}
\emph{As usual, it is highly recommended that you use LaTeX for this assignment. If you do not, you may lose points if your assignment is difficult to read or hard to follow. Find a sample form in this directory:
\url{http://www.cs.utah.edu/~jeffp/teaching/latex/}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Singular Value Decomposition (3 points)}
First we will compute the SVD of the matrix $A$ we have loaded:

\noindent
\texttt{[U,S,V] = svd(A)}

Then take the top $k$ components of $A$ for values of $k = 1$ through $k=10$ using

\noindent
\texttt{Uk = U(:,1:k)}
\\ \noindent
\texttt{Sk = S(1:k,1:k)}
\\ \noindent
\texttt{Vk = V(:,1:k)}
\\ \noindent
\texttt{Ak = Uk*Sk*Vk'}

Compute and report the $L_2$ norm of the difference between $A$ and \texttt{Ak} for each value of $k$ using

\noindent
\texttt{norm(A-Ak,2)}

Find the smallest value of $k$ so that the $L_2$ norm of \texttt{A-Ak} is less than 10\% that of \texttt{A}; $k$ might or might not be larger than $10$.
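If you prefer NumPy to MATLAB/Octave, the same computation can be sketched as follows. The matrix \texttt{A} here is a random stand-in for the provided \texttt{A.dat} (whose true dimensions are an assumption):

```python
import numpy as np

# Random stand-in for the provided A.dat; any real matrix works.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 15))

# Full (thin) SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

errors = []
for k in range(1, 11):
    # Rank-k reconstruction from the top k components.
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    errors.append(np.linalg.norm(A - Ak, 2))
```

By the Eckart--Young theorem, \texttt{norm(A-Ak,2)} equals the $(k+1)$-st singular value, so the reported errors are just the tail of \texttt{s} and decrease as $k$ grows; the smallest $k$ with error below 10\% of $\|A\|_2$ is the first index where $\sigma_{k+1} < 0.1\,\sigma_1$.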
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Frequent Directions (10 points)}
Use the stub file \texttt{FD.m} to create a function for the Frequent Directions algorithm (\textbf{Algorithm 16.2.1}). We will consider running this code on matrix \texttt{A}.
\paragraph{A (4 points):}
We can measure the error $\max_{\|x\|=1} | \|A x\|^2 - \|B x\|^2 |$ as
\texttt{norm(A'*A - B'*B, 2)}.
How large does \texttt{l} need to be for the above error to be at most $\|A\|_F^2 /10$?
How does this compare to the theoretical bound (e.g., for $k=0$)?
Note: you can calculate $\|A\|_F^2$ as \texttt{norm(A, 'fro')\^{}2}.
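As a rough NumPy sketch of what \texttt{FD.m} might contain, here is the common "shrink by the middle squared singular value" variant of Frequent Directions, together with the error check from this part. The matrix \texttt{A} is again a random stand-in for \texttt{A.dat}, and the exact variant in Algorithm 16.2.1 of the notes may differ in detail:

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an ell x d sketch B of A, one row at a time.
    This variant guarantees norm(A'A - B'B, 2) <= 2*||A||_F^2 / ell.
    Assumes ell <= d (the number of columns of A)."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for row in A:
        empty = np.flatnonzero(~B.any(axis=1))
        if empty.size == 0:
            # Sketch is full: shrink the singular values, which
            # zeroes out roughly half of the rows of B.
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt
            empty = np.flatnonzero(~B.any(axis=1))
        B[empty[0]] = row
    return B

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 12))          # stand-in for A.dat
target = np.linalg.norm(A, 'fro') ** 2 / 10

# Scan l until the covariance error drops below ||A||_F^2 / 10.
for ell in range(2, 13, 2):
    B = frequent_directions(A, ell)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    if err <= target:
        break
```

In practice the observed error is usually far below the worst-case bound $2\|A\|_F^2/\ell$, which is the comparison this part asks you to make.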
\paragraph{B (6 points):}
Frequent Directions should also satisfy another bound, based on the Frobenius norm. We can compute $A \Pi_{B_k}$ using
\texttt{Bk = B(1:k,:)} and
then calculating \texttt{A * Bk' * pinv(Bk * Bk') * Bk}.
How large does \texttt{l} need to be to achieve
\[
\|A - A \Pi_{B_k}\|_F^2 \leq 1.1 \cdot \|A - A_k\|_F^2
\]
for each value of $k \in \{1,2,3,4,5,6,7\}$? Answer both by running your algorithm and by reporting the theoretical bound provided in the notes.
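The projection $A \Pi_{B_k}$ can be sketched in NumPy as below. Rather than rerunning Frequent Directions, \texttt{B} here is stood in by the top-$\ell$ rows of $SV^\top$ from the SVD of a random \texttt{A} (for this stand-in the bound holds with equality; an FD sketch has the same shape and only satisfies the bound up to the $1.1$ factor):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10))   # stand-in for the provided A.dat
U, s, Vt = np.linalg.svd(A, full_matrices=False)

ell = 7
B = s[:ell, None] * Vt[:ell, :]     # stand-in for an FD sketch B

proj_errs, best_errs = [], []
for k in range(1, ell + 1):
    Bk = B[:k, :]
    # Project A onto the row space of Bk, mirroring
    # A * Bk' * pinv(Bk * Bk') * Bk from the assignment.
    proj = A @ Bk.T @ np.linalg.pinv(Bk @ Bk.T) @ Bk
    proj_errs.append(np.linalg.norm(A - proj, 'fro') ** 2)
    # Best rank-k error ||A - A_k||_F^2: sum of squared tail singular values.
    best_errs.append(np.sum(s[k:] ** 2))
```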
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Linear Regression (7 points)}
We will find coefficients \texttt{A} (not the same as the loaded matrix) to estimate \texttt{X*A = Y}. We will compare two approaches: \emph{least squares} and \emph{ridge regression}.
\begin{itemize} \denselist
\item[\textsf{Least Squares:} ] Set \texttt{A = inv(X'*X)*X'*Y}
\item[\textsf{Ridge Regression:} ] Set \texttt{As = inv(X'*X + s*eye(6))*X'*Y}
\end{itemize}
\paragraph{A (3 points): }
Solve for the coefficients $A$ (using Least Squares) and $A_s$ (using Ridge Regression) for each $s \in \{0.1, 0.3, 0.5, 1.0, 2.0\}$.
For each set of coefficients, report the error in the estimate $\hat{Y}$ of $Y$ as
\texttt{norm(Y - X*A,2)}.
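Both solvers can be sketched in NumPy as follows; \texttt{X} and \texttt{Y} are random stand-ins for the provided \texttt{X.dat} and \texttt{Y.dat} (assumed here to have $10$ rows and $6$ columns, matching the \texttt{eye(6)} above):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 6))    # stand-in for the provided X.dat
Y = rng.standard_normal((10, 1))    # stand-in for the provided Y.dat

# Least squares: A = inv(X'X) X'Y
A_ls = np.linalg.inv(X.T @ X) @ X.T @ Y

# Ridge regression: As = inv(X'X + s I) X'Y
def ridge(X, Y, s):
    d = X.shape[1]
    return np.linalg.inv(X.T @ X + s * np.eye(d)) @ X.T @ Y

# Training error ||Y - X A||_2 for each set of coefficients.
errs = {'ls': np.linalg.norm(Y - X @ A_ls, 2)}
for s in [0.1, 0.3, 0.5, 1.0, 2.0]:
    errs[s] = np.linalg.norm(Y - X @ ridge(X, Y, s), 2)
```

Note that ridge with $s = 0$ recovers plain least squares, and least squares always has the smallest \emph{training} error; part B examines whether that advantage survives cross-validation.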
\paragraph{B (4 points): }
Create three row subsets of \texttt{X} and \texttt{Y}:
\begin{itemize} \denselist
\item \texttt{X1 = X(1:8,:)} and \texttt{Y1 = Y(1:8)}
\item \texttt{X2 = X(3:10,:)} and \texttt{Y2 = Y(3:10)}
\item \texttt{X3 = [X(1:4,:); X(7:10,:)]} and \texttt{Y3 = [Y(1:4); Y(7:10)]}
\end{itemize}
Repeat the above procedure on these subsets and \emph{cross-validate} the solution on the remainder of \texttt{X} and \texttt{Y}. Specifically, learn the coefficients \texttt{A} using, say, \texttt{X1} and \texttt{Y1}, and then measure \texttt{norm(Y(9:10) - X(9:10,:)*A,2)}.
Which approach works best (averaging the results from the three subsets): Least Squares, or Ridge Regression (and if the latter, for which value of $s$)?
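The cross-validation loop can be sketched as follows, again with random stand-ins for \texttt{X.dat} and \texttt{Y.dat}. The three splits mirror \texttt{X1}/\texttt{X2}/\texttt{X3} above (note the shift from MATLAB's 1-indexing to Python's 0-indexing):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((10, 6))                    # stand-in for X.dat
Y = X @ rng.standard_normal((6, 1)) \
    + 0.1 * rng.standard_normal((10, 1))            # stand-in for Y.dat

def ridge(Xt, Yt, s):
    d = Xt.shape[1]
    return np.linalg.inv(Xt.T @ Xt + s * np.eye(d)) @ Xt.T @ Yt

# (train rows, held-out rows): X1 tests on rows 9-10, X2 on rows 1-2,
# X3 on rows 5-6 (all 1-indexed as in the assignment).
splits = [
    (slice(0, 8), [8, 9]),
    (slice(2, 10), [0, 1]),
    ([0, 1, 2, 3, 6, 7, 8, 9], [4, 5]),
]

def cv_error(s):
    errs = []
    for train, test in splits:
        A = ridge(X[train], Y[train], s)
        errs.append(np.linalg.norm(Y[test] - X[test] @ A, 2))
    return np.mean(errs)

# s = 0.0 is plain least squares.
results = {s: cv_error(s) for s in [0.0, 0.1, 0.3, 0.5, 1.0, 2.0]}
best_s = min(results, key=results.get)
```

Which $s$ wins depends on the actual data; on the held-out rows, least squares ($s=0$) is no longer guaranteed to be best.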
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{BONUS (3 points)}
Consider a linear equation \texttt{W = M*S} where \texttt{M} is a measurement matrix filled with random values $\{-1, 0, +1\}$ (although now that they are there, they are no longer random), and \texttt{W} is the output of the sparse signal \texttt{S} when measured by \texttt{M}.
Use Orthogonal Matching Pursuit (as described in the notes) to recover the non-zero entries from \texttt{S}. Record the order in which you find each entry and the residual vector after each step.
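A standard formulation of Orthogonal Matching Pursuit (which may differ in small details from the notes) can be sketched as below; \texttt{M}, \texttt{S}, and \texttt{W} are synthetic stand-ins for the provided \texttt{M.dat} and \texttt{W.dat}, with the sparsity chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, nnz = 30, 12, 3
M = rng.choice([-1.0, 0.0, 1.0], size=(n, d))   # stand-in for M.dat
S_true = np.zeros(d)
S_true[:nnz] = [2.0, -1.5, 1.0]                 # sparse stand-in signal
W = M @ S_true                                  # stand-in for W.dat

# OMP: greedily pick the column most correlated with the residual,
# then re-fit a least-squares solution on all columns picked so far.
order, residuals = [], []
r = W.copy()
for _ in range(nnz):
    scores = np.abs(M.T @ r)
    scores[order] = -np.inf          # never pick a column twice
    j = int(np.argmax(scores))
    order.append(j)                  # record the selection order
    Ms = M[:, order]
    coef, *_ = np.linalg.lstsq(Ms, W, rcond=None)
    r = W - Ms @ coef                # residual after this step
    residuals.append(np.linalg.norm(r, 2))
```

The lists \texttt{order} and \texttt{residuals} are exactly what the bonus asks you to report: the order in which entries are found, and the residual after each step. Because each step projects \texttt{W} onto a growing column span, the residual norms never increase.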
\end{document}