Part 2: Linear regression

6. Background on regression modeling

6.1 Regression models

6.2 Fitting a simple regression to fake data

6.3 Interpret coefficients as comparisons, not effects

6.4 Historical origins of regression

6.5 The paradox of regression to the mean

6.6 Bibliographic note

6.7 Exercises

7. Linear regression with a single predictor

7.1 Example: predicting presidential vote share from the economy

7.2 Checking the model-fitting procedure using fake-data simulation

7.3 Formulating comparisons as regression models

7.4 Bibliographic note

7.5 Exercises

8. Fitting regression models

8.1 Least squares, maximum likelihood, and Bayesian inference

8.2 Influence of individual points in a fitted regression

8.3 Least squares slope as a weighted average of slopes of pairs

8.4 Comparing two fitting functions: lm and stan_glm

8.5 Bibliographic note

8.6 Exercises

9. Prediction and Bayesian inference

9.1 Propagating uncertainty in inference using posterior simulations

9.2 Prediction and uncertainty: predict, posterior_linpred, and posterior_predict

9.3 Prior information and Bayesian synthesis

9.4 Example of Bayesian inference: beauty and sex ratio

9.5 Uniform, weakly informative, and informative priors in regression

9.6 Bibliographic note

9.7 Exercises

10. Linear regression with multiple predictors

10.1 Adding predictors to a model

10.2 Interpreting regression coefficients

10.3 Interactions

10.4 Indicator variables

10.5 Formulating paired or blocked designs as a regression problem

10.6 Example: uncertainty in predicting congressional elections

10.7 Mathematical notation and statistical inference

10.8 Weighted regression

10.9 Fitting the same model to many datasets

10.10 Bibliographic note

10.11 Exercises

11. Assumptions, diagnostics, and model evaluation

11.1 Assumptions of regression analysis

11.2 Plotting the data and fitted model

11.3 Residual plots

11.4 Comparing data to replications from a fitted model

11.5 Example: predictive simulation to check the fit of a time-series model

11.6 Residual standard deviation σ and explained variance R²

11.7 External validation: checking fitted model on new data

11.8 Cross validation

11.9 Bibliographic note

11.10 Exercises

12. Transformations and regression

12.1 Linear transformations

12.2 Centering and standardizing for models with interactions

12.3 Correlation and 'regression to the mean'

12.4 Logarithmic transformations

12.5 Other transformations

12.6 Building and comparing regression models for prediction

12.7 Models for regression coefficients

12.8 Bibliographic note

12.9 Exercises

Part 3: Generalized linear models

13. Logistic regression

13.1 Logistic regression with a single predictor

13.2 Interpreting logistic regression coefficients and the divide-by-4 rule

13.3 Predictions and comparisons

13.4 Latent-data formulation

13.5 Maximum likelihood and Bayesian inference for logistic regression

13.6 Cross validation and log score for logistic regression

13.7 Building a logistic regression model: wells in Bangladesh

13.8 Bibliographic note

13.9 Exercises

14. Working with logistic regression

14.1 Graphing logistic regression and binary data

14.2 Logistic regression with interactions

14.3 Predictive simulation

14.4 Average predictive comparisons on the probability scale

14.5 Residuals for discrete-data regression

14.6 Identification and separation

14.7 Bibliographic note

14.8 Exercises

15. Other generalized linear models

15.1 Definition and notation

15.2 Poisson and negative binomial regression

15.3 Logistic-binomial model

15.4 Probit regression: normally distributed latent data

15.5 Ordered and unordered categorical regression

15.6 Robust regression using the t model

15.7 Constructive choice models

15.8 Going beyond generalized linear models

15.9 Bibliographic note

15.10 Exercises

Part 4: Before and after fitting a regression

16. Design and sample size decisions

16.1 The problem with statistical power

16.2 General principles of design, as illustrated by estimates of proportions

16.3 Sample size and design calculations for continuous outcomes

16.4 Interactions are harder to estimate than main effects

16.5 Design calculations after the data have been collected

16.6 Design analysis using fake-data simulation

16.7 Bibliographic note

16.8 Exercises

17. Poststratification and missing-data imputation

17.1 Poststratification: using regression to generalize to a new population

17.2 Fake-data simulation for regression and poststratification

17.3 Models for missingness

17.4 Simple approaches for handling missing data

17.5 Understanding multiple imputation

17.6 Nonignorable missing-data models

17.7 Bibliographic note

17.8 Exercises

Part 5: Causal inference

18. Causal inference and randomized experiments

18.1 Basics of causal inference

18.2 Average causal effects

18.3 Randomized experiments

18.4 Sampling distributions, randomization distributions, and bias in estimation

18.5 Using additional information in experimental design

18.6 Properties, assumptions, and limitations of randomized experiments

18.7 Bibliographic note

18.8 Exercises

19. Causal inference using regression on the treatment variable

19.1 Pre-treatment covariates, treatments, and potential outcomes

19.2 Example: the effect of showing children an educational television show

19.3 Including pre-treatment predictors

19.4 Varying treatment effects, interactions, and poststratification

19.5 Challenges of interpreting regression coefficients as treatment effects

19.6 Do not adjust for post-treatment variables

19.7 Intermediate outcomes and causal paths

19.8 Bibliographic note

19.9 Exercises

20. Observational studies with all confounders assumed to be measured

20.1 The challenge of causal inference

20.2 Using regression to estimate a causal effect from observational data

20.3 Assumption of ignorable treatment assignment in an observational study

20.4 Imbalance and lack of complete overlap

20.5 Example: evaluating a child care program

20.6 Subclassification and average treatment effects

20.7 Propensity score matching for the child care example

20.8 Restructuring to create balanced treatment and control groups

20.9 Additional considerations with observational studies

20.10 Bibliographic note

20.11 Exercises

21. Additional topics in causal inference

21.1 Estimating causal effects indirectly using instrumental variables

21.2 Instrumental variables in a regression framework

21.3 Regression discontinuity: known assignment mechanism but no overlap

21.4 Identification using variation within or between groups

21.5 Causes of effects and effects of causes

21.6 Bibliographic note

21.7 Exercises

Part 6: What comes next?

22. Advanced regression and multilevel models

22.1 Expressing the models so far in a common framework

22.2 Incomplete data

22.3 Correlated errors and multivariate models

22.4 Regularization for models with many predictors

22.5 Multilevel or hierarchical models

22.6 Nonlinear models, a demonstration using Stan

22.7 Nonparametric regression and machine learning

22.8 Computational efficiency

22.9 Bibliographic note

22.10 Exercises