The journey from descriptive statistics to dimension reduction, clustering, classification, and regression reflects the actual workflow of a data scientist: understand your data, prepare it carefully, find structure within it, and build models that are both accurate and interpretable.
The field continues to evolve rapidly, but the methods in this textbook are its stable, well-validated foundations. With them, you are equipped to engage with the research literature, implement new methods, and evaluate claims critically.
References
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26(2), 211–252.
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47(5), 1287–1294.
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19(1), 15–18.
Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression, I. Biometrika, 37(3/4), 409–428.
Fisher, R. A. (1925). Statistical methods for research workers. Oliver and Boyd.
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). SAGE.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.
Harrison, D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer. https://hastie.su.domains/ElemStatLearn/
Henderson, H. V., & Velleman, P. F. (1981). Building multiple regression models interactively. Biometrics, 37(2), 391–411.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning with applications in R (2nd ed.). Springer. https://www.statlearning.com
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). Chapman and Hall.
New York State Department of Conservation. (1973). Daily air quality measurements in New York, May–September 1973.
Pedersen, T. L. (2022). patchwork: The composer of plots (R package version 1.1.2). https://CRAN.R-project.org/package=patchwork
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Ramsey, J. B. (1969). Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society: Series B, 31(2), 350–371.
Robinson, D., Hayes, A., & Couch, S. (2023). broom: Convert statistical objects into tidy tibbles (R package version 1.0.5). https://CRAN.R-project.org/package=broom
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267–288.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). Springer.
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis (2nd ed.). Springer.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B, 73(1), 3–36.
Wood, S. N. (2023). mgcv: Mixed GAM computation vehicle (R package version 1.9-0). https://CRAN.R-project.org/package=mgcv
Xie, Y. (2015). Dynamic documents with R and knitr (2nd ed.). CRC Press.
Zeileis, A., & Hothorn, T. (2002). Diagnostic checking in regression relationships. R News, 2(3), 7–10.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2), 301–320.
End of Chapter 10 and the Textbook.