Spring 2017

Tues/Thurs 12:10 pm - 1:30 pm

Instructor: Cho-Jui Hsieh
Office location: Mathematical Sciences Building (MSB) 4232
Email: chohsieh@ucdavis.edu
Office hours: Wednesday 2pm-3pm

TAs: Huan Zhang (ecezhang@ucdavis.edu), Clark Fitzgerald (clarkfitzg@gmail.com)
TA office hours: Tuesday 2pm-4pm (MSB 1117)

Announcements

Final project proposal guideline

Basic Linear Algebra (Notes)

Overview

Course description

This course explores aspects of scaling statistical computing for large data and simulations. It covers (1) how to write good programs for analyzing data, (2) data-intensive computing for statistical models, and (3) how to parallelize code to handle big data. The goal is to learn practical techniques for efficiently handling real-world data mining tasks and competitions.

Syllabus

A high-level summary of the syllabus is as follows:


I. Statistical Programming (in Python)

- Basic Python programming
- NumPy and SciPy
- Big-O: analyzing the speed of your program
- Basic algorithms and data structures

II. Advanced statistical computing

- Linear algebra and applications
- Optimization and applications
- Clustering, classification, regression, EM
- Scikit-learn

III. Parallel computing

- Multicore programming
- Distributed (MapReduce)
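
As a rough illustration of the MapReduce idea listed under parallel computing, here is a minimal single-process sketch of a word count. It is not taken from the course materials: the function names `map_phase`, `reduce_phase`, and `word_count` are hypothetical, and the chunks are processed sequentially rather than on separate workers.

```python
from collections import Counter
from functools import reduce


def map_phase(chunk):
    """Map step: count the words in one chunk of text independently."""
    return Counter(chunk.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(chunks):
    """Count words across all chunks by mapping, then reducing."""
    # Each chunk is independent, so the map step could be handed to
    # separate workers; this sketch simply runs it sequentially.
    partial_counts = map(map_phase, chunks)
    return reduce(reduce_phase, partial_counts, Counter())


if __name__ == "__main__":
    chunks = ["big data needs big tools", "data tools scale"]
    print(word_count(chunks))
```

Because the map step touches only its own chunk and the reduce step is associative, the same structure could in principle be distributed across cores or machines without changing the result.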

Grading Policy

Grades will be determined as follows:

- Homework (60%)
- Final project (30%)
- Class participation, including attendance (10%)

Schedule

- Course Intro
  - Slides: [ lecture_0 ]
- Basic Python Programming
- Time complexity analysis, basic algorithms and data structures
- Numerical Linear Algebra for Statistics
  - Slides: [ lecture_5 ] [ lecture_6 ] [ lecture_7 ] [ lecture_note_linear_algebra ]
  - Reading Material: [ PageRank ] [ Hubs & Authorities ] [ word2vec ]
  - Homework: [ homework_2 ]
- Numerical Optimization for Statistics
  - Slides: [ lecture_notes_optimization_1 ] [ lecture_notes_optimization_2 ] [ lecture_8 ] [ lecture_notes_optimization_3 ]
  - Reading Material:
  - Homework: [ homework_3 ]
- Computing for Statistical Models
- Multicore and Distributed Computing
  - Slides: [ lecture_11 ] [ lecture_12 ]
  - Reading Material: [ introduction to python multiprocessing ] [ tutorial with more detail ]
  - Homework: [ homework_4 ]