Data Science Curriculum Pathway

Authors

Felicity Jackson, M.S.
WanWan Li, Ph.D.
E. F. Valderrama-Araya, Ph.D.
Dr.-Ing. Jan H. R. Woerner

Course Skills (by Course)

CSC 201: Intro to Data Science
Level: Survey, Introductory
  1. Identify data types, collection methods, licensing, and trusted data sources.
  2. Acquire, clean, and preprocess data across multiple platforms.
  3. Conduct exploratory data analysis to identify patterns and trends.
  4. Create clear and informative visualizations.

CSC 303: Data Science Foundations
Level: Foundations
  1. Master advanced R programming for efficient, maintainable code.
  2. Perform advanced data manipulation using dplyr, data.table, and the broader tidyverse ecosystem.
  3. Build static visualizations using base R and ggplot2 to understand the grammar of graphics.
  4. Apply regression, ANOVA, logistic regression, time series analysis, and diagnostics; implement supervised (classification/regression) and unsupervised (clustering/dimensionality reduction) learning.

CSC 464: Machine Learning at Scale
Level: Master Machine Learning
  1. Explain ML concepts, applications, and challenges: why ML is used, and how humans learn compared with how machines learn.
  2. Design, select, train, and deploy a machine learning model, from data preparation to model maintenance, on data at scale (Big Data stack setup, storage and acquisition, batch/real-time/interactive analytics).
  3. Master supervised learning methods such as linear models, kNN, support vector machines, decision trees, ensemble learning, random forests, and gradient-boosted trees.
  4. Apply advanced unsupervised learning: dimensionality reduction and feature extraction, k-means, anomaly and novelty detection, and an introduction to artificial neural networks.

CSC 463: Artificial Intelligence
Level: Advanced DL, LLM, RL, RAG
  1. Explain the hierarchical relationship among AI, ML, and DL (DL is a subset of ML, which is a subset of AI), with a primary focus on DL, while also introducing LLMs and RAG.
  2. Define the components and structure of a neural network, and develop deep learning models using neural networks.
  3. Build CNN models.
  4. Build Keras neural networks.

CSC 477: Visualizations
Level: Master Visualization
  1. Explain the role and principles of data visualization in data science.
  2. Conduct EDA to guide visualization choices.
  3. Create static and interactive visualizations and dashboards using ggplot2, Matplotlib, Seaborn, Plotly, Tableau, Quarto, and Jupyter.
  4. Apply design principles for clarity and accessibility, and create visualizations that incorporate statistical measures such as p-values and other key metrics to communicate insights effectively.

Course Goals

CSC 201: Introductory survey of data science concepts; build literacy in data acquisition, cleaning, and exploratory analysis.
CSC 303: Strengthen core data science skills with advanced R, and build the math concepts that prepare students for higher-level ML/AI.
CSC 464: Master machine learning for classification and regression; focus on evaluation, the bias–variance tradeoff, feature engineering, hyperparameter tuning, and deployment; practice ensembles, kernels, PCA, neural nets, and text/image tasks at scale using Big Data techniques.
CSC 463: Apply advanced AI (deep learning, RL, LLMs, RAG) and evaluate ethical implications.
CSC 477: Master visualization theory and tools; communicate through static, interactive, and dashboard formats.

Prerequisites, Tools, and Math Requirements

CSC 201
  Prerequisite(s): MAT 232 or MAT 325
  Languages & Tools: R (base R), Excel
  Books: Wickham & Grolemund
  Math Level: Intro Prob & Stat

CSC 303
  Prerequisite(s): CSC 201
  Languages & Tools: R (tidyverse)
  Books: DS4A; Hastie et al.; Wickham
  Math Level: Prob & Stat, Calc, Linear Alg

CSC 464
  Prerequisite(s): CSC 303
  Languages & Tools: Python
  Books: Hands-On Machine Learning (HOML), 3rd ed. (Aurélien Géron)
  Math Level: Prob & Stat, Calc, Linear Alg

CSC 463
  Prerequisite(s): CSC 303
  Languages & Tools: R, Python (Keras)
  Books: DS4A; Deep Learning with R, 2nd ed. (Chollet, Kalinowski & Allaire); AIMA (Russell & Norvig)
  Math Level: Prob & Stat, Calc, Linear Alg

CSC 477
  Prerequisite(s): CSC 201
  Languages & Tools: R, Python, Tableau, Shiny, Quarto, Jupyter
  Books: Wickham (R); McKinney (Python); Ryan (Tableau)
  Math Level: Intro Prob & Stat

Whole Picture

Figure: Venn diagram

Figure: AI/ML/DL diagram

Course Recurrence

Every Year: 201 and 464² (F25); 303 (S26); 201 and 464 (F26); 303 (S27); 201 and 464 (F27); 303 (S28); 201 and 464 (F28); 303 (S29)
Odd/Even: 463¹, 477, 463, 477, 463 (in sequence across F25 to S29)

References

  1. Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science (2nd ed.). https://r4ds.hadley.nz/
  2. Grolemund, G. (2014). Hands-On Programming with R. https://rstudio-education.github.io/hopr/
  3. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An Introduction to Statistical Learning (2nd ed.). https://www.statlearning.com/
  4. Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). https://aima.cs.berkeley.edu/
  5. Chollet, F., Kalinowski, T., & Allaire, J. J. (2022). Deep Learning with R (2nd ed.). Manning Publications. https://www.manning.com/books/deep-learning-with-r-second-edition
  6. Valderrama, E. F. (2024). Probability and Statistics with R. https://efvalder.github.io/RProbStatBook/

Footnotes

  1. This breaks the odd/even rule.

  2. In this semester, the course is 461.