STA 141C Big Data & High Performance Statistical Computing (Spring 2017)

Tues/Thurs 12:10 pm - 1:30 pm

Instructor: Cho-Jui Hsieh
Office location: Mathematical Sciences Building (MSB) 4232
Office hours: Wednesday 2pm-3pm
TAs: Huan Zhang, Clark Fitzgerald
TA office hours: Tuesday 2pm-4pm (MSB 1117)

Final project proposal guideline
Basic Linear Algebra (Notes)


Course Description

This course explores how to scale statistical computing to large datasets and simulations. It covers (1) how to write good programs for analyzing data, (2) data-intensive computing for statistical models, and (3) how to parallelize code to handle big data. The goal is to learn practical techniques for efficiently handling real-world data mining tasks and competitions.


A high-level summary of the syllabus is as follows:
I. Statistical Programming (in Python)
II. Advanced Statistical Computing
III. Parallel Computing
  • Multicore programming
  • Distributed (MapReduce)

Grading Policy

Grades will be determined as follows: