When we teach statistics, we often act as if simple models were right. For example, we might find the line that best fits the data, then act as if that line were a useful description of reality. With the right blend of intuition and luck, this can work --- we can get by with a model that is wrong but not too wrong --- but it's often helpful to have a more nuanced perspective. This class is an introduction to the theory and practice of doing statistics without simplistic modeling assumptions. In short, we will talk about how to answer questions about the world, like whether a treatment helps patients or a policy has the intended effect, by fitting flexible models to data.
Along the way, we'll get in some practice using and generalizing some mathematical tools you've probably seen in calculus, linear algebra, and probability classes. If you're considering graduate school in a discipline that involves quantitative work, that should be useful preparation.
In the beginning of the semester, we'll talk about a few nonparametric regression methods and the criteria we use to evaluate such methods in general. Then we'll shift our focus to theoretical tools that can help us understand how they behave. For most of the semester, we'll focus on regression with one covariate. This is atypical in modern data analysis, but it makes things easier to understand, since it lets us easily visualize our data and the curves we're fitting to it. But we'll use concepts relevant to higher-dimensional problems, and the understanding we develop this way will pay off late in the semester, when we'll be prepared for a sophisticated discussion of the essential challenge of working with big data: the curse of dimensionality.
We will cover a set of supervised machine learning tasks, including regression, classification, and model selection, using ℓ1-regularized models, shape-constrained models, kernel methods, and more. We will also talk briefly about the modern way to use these methods to answer questions like the ones above. If you have heard of augmented inverse propensity weighting (AIPW) or double machine learning (DML), that's what I'm talking about. But the class is not meant to be a broad survey of methods. Instead, our focus will be on mathematical concepts that help us understand what we can and can't trust these methods (and others) to do. In practice, this means discussing what we should try to estimate, how we should estimate it, and what to expect when we do. Causal inference applications will be emphasized. Drawing exercises, computer visualizations, and computer simulations will help us get a feel for the material.
Class will meet on Tuesdays and Thursdays from 4:00 to 5:15 in PAIS 225.
I'll hold office hours on Tuesdays, excepting university holidays, from 2:00 to 3:45 in PAIS 583.
You'll need a working understanding of the basic concepts of probability, linear algebra, and calculus. For probability, you'll have to get comfortable talking about events and their probabilities; random variables and their expected values and variances; independence of events and random variables; and the multivariate normal distribution. For linear algebra, you'll have to get comfortable talking about bases, inner products, orthogonality, eigenvalues, and eigenvectors. And for calculus, we'll work with partial derivatives and calculate an easy integral here and there.
You should be well-prepared if you've taken Math 221, QTM 220 or Math 361/362, and their prerequisites. I'll review most of this as it comes up, so if there are a few gaps or some unfamiliar terminology, it won't be a big deal. If you'd like to do some reading to prepare in advance, take a look at Chapters 1-4 of Larry Wasserman's All of Statistics, a paper copy of which is available from the library. It covers more than you will need on probability. Chapter 6 and Section 8.1 of Nicholson's Linear Algebra with Applications cover enough linear algebra. If you've forgotten the details, or never learned them, that's fine: you will not need to calculate a difficult integral, diagonalize a matrix, or know what the Poisson distribution is.
Aside from lecture slides and the solutions to in-class and homework exercises, you won't need to do any reading to follow what's going on in class. If you like books, Vershynin's High Dimensional Probability is a great read and covers many of the theoretical concepts we'll be talking about. It is, however, intended for graduate students. This is a good book to look at if you've enjoyed the class, feel confident about the material we've covered, and are looking for more breadth or depth. To help you navigate the book, I'll include references to relevant sections in the slides.
| Date | Type | Topic | Materials |
| Week 0 | | | |
| Tue, Jan 14 | Lecture | Introduction | |
| Thu, Jan 16 | Lab | Implementing Monotone Regression (1/2) | [solution] |
| | Warm Up | Fitting Lines in CVXR | [solution] |
| | Homework | Vector Spaces | [solution] |
| Week 1 | | | |
| Tue, Jan 21 | Lab | Implementing Monotone Regression (2/2) | [solution] |
| | Follow Up | Image Denoising | [solution] |
| Thu, Jan 23 | Lecture | Bounded Variation Regression | |
| | Homework | Inner Product Spaces | [solution] |
| Week 2 | | | |
| Tue, Jan 28 | Lab | Implementing Bounded Variation Regression | [solution] |
| Thu, Jan 30 | Lab | Rates of Convergence | [solution] |
| | Homework | Option 1. Smooth Regression | [solution] |
| | Homework | Option 2. Convex Regression | [solution] |
| Week 3 | | | |
| Tue, Feb 04 | Lecture | Treatment Effects and the R-Learner | |
| Thu, Feb 06 | Lab | The Parametric R-Learner | [solution] |
| Week 4 | | | |
| Tue, Feb 11 | Lab | The Nonparametric R-Learner | [solution] |
| Thu, Feb 13 | No Class | Cancelled | |
| Week 5 | | | |
| Tue, Feb 18 | Review | Smooth and Shape-Constrained Regression | |
| Thu, Feb 20 | Lecture | Sobolev Regression | |
| | Homework | Sobolev Models and Finite-Dimensional Approximation | Due Thu, Feb 27 at 11:59pm |
| Week 6 | | | |
| Tue, Feb 25 | Lab | Implementing Sobolev Regression | |
| Thu, Feb 27 | Lecture | Multivariate Sobolev Regression and Image Denoising | |
| Week 7 | | | |
| Tue, Mar 04 | Lecture | Least Squares in Finite Models, i.e. Model Selection (1/2) | |
| Thu, Mar 06 | Lecture | Least Squares in Finite Models, i.e. Model Selection (2/2) | |
| Week 8 | | | |
| Tue, Mar 11 | No Class | Spring Break | |
| Thu, Mar 13 | No Class | Spring Break | |
| Week 9 | | | |
| Tue, Mar 18 | Lecture | Least Squares in Infinite Models, i.e. Regression, with Gaussian Noise | |
| Thu, Mar 20 | Lecture | Least Squares with Misspecification | |
| Week 10 | | | |
| Tue, Mar 25 | Lab | Drawing Gaussian Width | |
| Thu, Mar 27 | Lab | Computing Gaussian Width | |
| Week 11 | | | |
| Tue, Apr 01 | Review | | |
| Thu, Apr 03 | Lecture | Bounding Gaussian Width using Covering Numbers | |
| Week 12 | | | |
| Tue, Apr 08 | Lecture | Bounding Gaussian Width using Chaining | |
| Thu, Apr 10 | Lecture | The Curse of Dimensionality | |
| Week 13 | | | |
| Tue, Apr 15 | Lecture | Least Squares and non-Gaussian Noise | |
| Thu, Apr 17 | Lecture | Least Squares, Sampling, and Population MSE | |
| Week 14 | | | |
| Tue, Apr 22 | Review | | |
Problem sets will be assigned as homework roughly every other week. They'll be posted on the schedule above; submission will be via Canvas. Collaboration is encouraged. I prefer that each student write and turn in solutions in their own words, and I think it is often best that this writing is done separately, with collaboration limited to discussing problems, sketching solutions on a whiteboard, etc. This will help both of us understand where you are in terms of your proficiency with the material. Problem sets are not tests, and I'm happy to work on them with you during my office hours if you want. That said, it's not necessary to come to office hours if there's a problem you can't get. I encourage you to try all the problems, but it's fine to omit a problem or two from what you turn in.
Formatting. Please submit your work as a single PDF or HTML file with answers to each problem in order and clearly labeled. And please try to keep your submissions concise. In particular, include code only if it's asked for explicitly. I'm not asking for beautiful formatting. It's fine, for example, to write answers by hand, take photos, and stick them in a PDF as long as everything is legible, labeled, and in the right order.
Presentations. We'll review some of these homework problems on the days marked Review on the schedule above. I'll ask that you present the solution to a problem (or group of related problems) at least once during the semester, and that you meet with me in advance to discuss what you plan to present. We'll discuss which problems to review and who will present them in class in advance of each review day.
We will not have any exams, quizzes, or projects.
I would like to see you at the majority of our class meetings. That said, schedule conflicts and illness happen. Please do not come to class sick. I will record the lectures and post recordings soon after class. There is no need to explain your absences, but please try to inform me of them in advance of class meetings.
I'm here to teach, not to judge. I'll give written feedback on each problem set and each presentation you give, but I won't be grading each one individually. Instead, we'll meet periodically to discuss your work, your learning goals, and the progress you're making toward them. At the end of the semester, we'll meet to discuss your progress and choose a grade together. If you'd like to check in about your grade in our meetings throughout the semester, that's fine with me; if you'd like to wait until the end, that's fine too.
You are not in competition with your classmates for a limited number of A's.
Incompletes. Incomplete grades are now handled by Emory’s Office of Undergraduate Education, with permission of instructors. The College’s general policy on Incompletes can be found here; further questions can be directed to your OUE Academic Advisor. There must be an agreement between the instructor and the student prior to the end of the course for approval of an Incomplete, in addition to the approval from OUE.
I prefer to speak with students in real time rather than via email. That helps us get to know each other better and tends to lead to more efficient communication. The office hours listed above are set aside entirely for you; you don't need to make an appointment and can come and go as you please. I'll be there. If you'd like to meet outside of these hours, please email me to set up an appointment. I'll do my best to respond to emails within 48 hours.
As the instructor of this course, I endeavor to provide an inclusive learning environment. I want every student to succeed. The Department of Accessibility Services (DAS) works with students who have disabilities to provide reasonable accommodations. It is your responsibility to request accommodations. In order to receive consideration for reasonable accommodations, you must register with the DAS here. Accommodations cannot be applied retroactively, so you need to contact DAS as early as possible, and contact me as early as possible in the semester, to discuss the plan for implementation of your accommodations. For additional information about accessibility and accommodations, please contact the Department of Accessibility Services at (404) 727-9877 or accessibility@emory.edu.
Tutors in the Emory Writing Center and the ESL Program are available to support Emory College students as they work on any type of writing assignment, at any stage of the composing process. Tutors can assist with a range of projects, from traditional papers and presentations to websites and other multimedia projects. Writing Center and ESL tutors take a similar approach as they work with students on concerns including idea development, structure, use of sources, grammar, and word choice. They do not proofread for students. Instead, they discuss strategies and resources students can use as they write, revise, and edit their own work. Students who are non-native speakers of English are welcome to visit either Writing Center tutors or ESL tutors. All other students in the college should see Writing Center tutors. Learn more, view hours, and make appointments by visiting the websites of the ESL Program and the Writing Center. Please review the Writing Center’s tutoring policies before your visit.
The Honor Code is in effect throughout the semester. By taking this course, you affirm that it is a violation of the code to plagiarize, to give false information to a faculty member, or to undertake any other form of academic misconduct. You also affirm that if you witness others violating the code, you have a duty to report them to the Honor Council.