Math 389L


Advanced Big Data Analysis

Math 389L, Spring 2019
Claremont Graduate University
Professor: Weiqing Gu
Teaching Assistant: Conner DiPaolo

Meeting Time

T 07:00-09:50PM. SHAN 3460/3485

Office Hours

T 06:15-07:00PM. SHAN 3460/3485 (with Conner)

Course Description

This graduate-level course is designed to give students a snapshot of recent techniques used to analyze extremely large datasets, both statistically and algorithmically. To accomplish this goal, the course will start with a quick, applied introduction to the necessary optimization background. From there we will introduce students to topics such as spectral graph clustering, fast kernel methods, and compressed sensing, among others. We will highlight applications of these methods to diverse areas such as genomics and recommender systems, but the bedrock of the course will be theory. To that end, students are expected to have a solid foundation in probability and analysis, as well as comfort with algorithmic thinking.

Textbooks

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Vershynin, R. (2018). High-dimensional probability: An introduction with applications in data science. Cambridge University Press.

Woodruff, D. P. (2014). Sketching as a tool for numerical linear algebra. Foundations and Trends in Theoretical Computer Science.

Grading

Homework

Problem sets will be due (virtually) every week, on Tuesday in class. Problems will be discussed in class, and will often be designed to investigate or re-prove results from research in this area. Problems might require coding. For this we recommend Python (e.g., using numpy and scipy), MATLAB, or R.

Midterm Project

Details will be given in class, but the midterm project will resemble the final project in nature. If the final project is to be a continuation of the midterm project (which is expected), significant additional progress must be made.

Final Project

The final project is intended to give students, in groups of 2-3, the opportunity to deep-dive into a specific area of interest in linear algebra or matrix analysis. The project could be theoretical or applied, but in both cases it should originate from a single question. For example, such questions might be:

This motivating question will be turned in as a single sentence, typed and stapled to the back of the midterm project. Students are encouraged to discuss their ideas with teaching staff before committing to a question. Staff can also help students less fluent in the field find topics that intersect with their interests. Questions should be quite focused.

About a month after proposing their initial question, students will submit a literature review of work that attempts to answer their question. This will be at most four pages in LaTeX with one-inch margins, not including references. The review should include important definitions and discuss the body of work surrounding the question. At the top of the paper, as an abstract, students should include a refined version of their motivating question.

By the end of the course, students are expected to continue investigating their question. In particular, students should be able to find a concrete open problem in the area or a blind spot in the body of research. (Hint: look at the end of recent papers.) Open problems can be empirical (e.g. investigating the geometry of neural network loss surfaces through the spectral information in the Hessians), applied theory (e.g. creating algorithms for robust low-rank approximation), or theoretical (e.g. lower bounds on robust low-rank approximation).

Before the end of the course, building on this prior work, students will write a paper of at least 12 pages that details the background and progress of the body of research on their open problem, promising directions, and demonstrations of results (computations or proofs). Students who are able to solve the open problem, or even make concrete progress towards it, will get an A on the final paper. Otherwise, experimental evidence towards the open problem is expected. The group will also give a presentation of at most 15 minutes detailing this adventure.

Deadlines

Disabilities

Students who need disability-related accommodations are encouraged to discuss this with the instructor as soon as possible.