Part 2: Linear regression
¶
6. Background on regression modeling
¶
6.1 Regression models
¶
6.2 Fitting a simple regression to fake data
¶
6.3 Interpret coefficients as comparisons, not effects
¶
6.4 Historical origins of regression
¶
6.5 The paradox of regression to the mean
¶
6.6 Bibliographic note
¶
6.7 Exercises
¶
7. Linear regression with a single predictor
¶
7.1 Example: predicting presidential vote share from the economy
¶
7.2 Checking the model-fitting procedure using fake-data simulation
¶
7.3 Formulating comparisons as regression models
¶
7.4 Bibliographic note
¶
7.5 Exercises
¶
8. Fitting regression models
¶
8.1 Least squares, maximum likelihood, and Bayesian inference
¶
8.2 Influence of individual points in a fitted regression
¶
8.3 Least squares slope as a weighted average of slopes of pairs
¶
8.4 Comparing two fitting functions: lm and stan_glm
¶
8.5 Bibliographic note
¶
8.6 Exercises
¶
9. Prediction and Bayesian inference
¶
9.1 Propagating uncertainty in inference using posterior simulations
¶
9.2 Prediction and uncertainty: predict, posterior_linpred, and posterior_predict
¶
9.3 Prior information and Bayesian synthesis
¶
9.4 Example of Bayesian inference: beauty and sex ratio
¶
9.5 Uniform, weakly informative, and informative priors in regression
¶
9.6 Bibliographic note
¶
9.7 Exercises
¶
10. Linear regression with multiple predictors
¶
10.1 Adding predictors to a model
¶
10.2 Interpreting regression coefficients
¶
10.3 Interactions
¶
10.4 Indicator variables
¶
10.5 Formulating paired or blocked designs as a regression problem
¶
10.6 Example: uncertainty in predicting congressional elections
¶
10.7 Mathematical notation and statistical inference
¶
10.8 Weighted regression
¶
10.9 Fitting the same model to many datasets
¶
10.10 Bibliographic note
¶
10.11 Exercises
¶
11. Assumptions, diagnostics, and model evaluation
¶
11.1 Assumptions of regression analysis
¶
11.2 Plotting the data and fitted model
¶
11.3 Residual plots
¶
11.4 Comparing data to replications from a fitted model
¶
11.5 Example: predictive simulation to check the fit of a time-series model
¶
11.6 Residual standard deviation σ and explained variance R²
¶
11.7 External validation: checking fitted model on new data
¶
11.8 Cross validation
¶
11.9 Bibliographic note
¶
11.10 Exercises
¶
12. Transformations and regression
¶
12.1 Linear transformations
¶
12.2 Centering and standardizing for models with interactions
¶
12.3 Correlation and 'regression to the mean'
¶
12.4 Logarithmic transformations
¶
12.5 Other transformations
¶
12.6 Building and comparing regression models for prediction
¶
12.7 Models for regression coefficients
¶
12.8 Bibliographic note
¶
12.9 Exercises
¶
Part 3: Generalized linear models
¶
13. Logistic regression
¶
13.1 Logistic regression with a single predictor
¶
13.2 Interpreting logistic regression coefficients and the divide-by-4 rule
¶
13.3 Predictions and comparisons
¶
13.4 Latent-data formulation
¶
13.5 Maximum likelihood and Bayesian inference for logistic regression
¶
13.6 Cross validation and log score for logistic regression
¶
13.7 Building a logistic regression model: wells in Bangladesh
¶
13.8 Bibliographic note
¶
13.9 Exercises
¶
14. Working with logistic regression
¶
14.1 Graphing logistic regression and binary data
¶
14.2 Logistic regression with interactions
¶
14.3 Predictive simulation
¶
14.4 Average predictive comparisons on the probability scale
¶
14.5 Residuals for discrete-data regression
¶
14.6 Identification and separation
¶
14.7 Bibliographic note
¶
14.8 Exercises
¶
15. Other generalized linear models
¶
15.1 Definition and notation
¶
15.2 Poisson and negative binomial regression
¶
15.3 Logistic-binomial model
¶
15.4 Probit regression: normally distributed latent data
¶
15.5 Ordered and unordered categorical regression
¶
15.6 Robust regression using the t model
¶
15.7 Constructive choice models
¶
15.8 Going beyond generalized linear models
¶
15.9 Bibliographic note
¶
15.10 Exercises
¶
Part 4: Before and after fitting a regression
¶
16. Design and sample size decisions
¶
16.1 The problem with statistical power
¶
16.2 General principles of design, as illustrated by estimates of proportions
¶
16.3 Sample size and design calculations for continuous outcomes
¶
16.4 Interactions are harder to estimate than main effects
¶
16.5 Design calculations after the data have been collected
¶
16.6 Design analysis using fake-data simulation
¶
16.7 Bibliographic note
¶
16.8 Exercises
¶
17. Poststratification and missing-data imputation
¶
17.1 Poststratification: using regression to generalize to a new population
¶
17.2 Fake-data simulation for regression and poststratification
¶
17.3 Models for missingness
¶
17.4 Simple approaches for handling missing data
¶
17.5 Understanding multiple imputation
¶
17.6 Nonignorable missing-data models
¶
17.7 Bibliographic note
¶
17.8 Exercises
¶
Part 5: Causal inference
¶
18. Causal inference and randomized experiments
¶
18.1 Basics of causal inference
¶
18.2 Average causal effects
¶
18.3 Randomized experiments
¶
18.4 Sampling distributions, randomization distributions, and bias in estimation
¶
18.5 Using additional information in experimental design
¶
18.6 Properties, assumptions, and limitations of randomized experiments
¶
18.7 Bibliographic note
¶
18.8 Exercises
¶
19. Causal inference using regression on the treatment variable
¶
19.1 Pre-treatment covariates, treatments, and potential outcomes
¶
19.2 Example: the effect of showing children an educational television show
¶
19.3 Including pre-treatment predictors
¶
19.4 Varying treatment effects, interactions, and poststratification
¶
19.5 Challenges of interpreting regression coefficients as treatment effects
¶
19.6 Do not adjust for post-treatment variables
¶
19.7 Intermediate outcomes and causal paths
¶
19.8 Bibliographic note
¶
19.9 Exercises
¶
20. Observational studies with all confounders assumed to be measured
¶
20.1 The challenge of causal inference
¶
20.2 Using regression to estimate a causal effect from observational data
¶
20.3 Assumption of ignorable treatment assignment in an observational study
¶
20.4 Imbalance and lack of complete overlap
¶
20.5 Example: evaluating a child care program
¶
20.6 Subclassification and average treatment effects
¶
20.7 Propensity score matching for the child care example
¶
20.8 Restructuring to create balanced treatment and control groups
¶
20.9 Additional considerations with observational studies
¶
20.10 Bibliographic note
¶
20.11 Exercises
¶
21. Additional topics in causal inference
¶
21.1 Estimating causal effects indirectly using instrumental variables
¶
21.2 Instrumental variables in a regression framework
¶
21.3 Regression discontinuity: known assignment mechanism but no overlap
¶
21.4 Identification using variation within or between groups
¶
21.5 Causes of effects and effects of causes
¶
21.6 Bibliographic note
¶
21.7 Exercises
¶
Part 6: What comes next?
¶
22. Advanced regression and multilevel models
¶
22.1 Expressing the models so far in a common framework
¶
22.2 Incomplete data
¶
22.3 Correlated errors and multivariate models
¶
22.4 Regularization for models with many predictors
¶
22.5 Multilevel or hierarchical models
¶
22.6 Nonlinear models, a demonstration using Stan
¶
22.7 Nonparametric regression and machine learning
¶
22.8 Computational efficiency
¶
22.9 Bibliographic note
¶
22.10 Exercises
¶