2025 ANNUAL AMMNET SENEGAL MEETING/TRAINING
Basics: Introduction to R and Data Visualization
Facilitator: Dennis K. Muriithi
Center for Data Analytics & Modeling, Chuka University - Kenya

INTRODUCTION TO R AND DATA VISUALIZATION TRAINING

Course Overview

This training is designed for beginners who want to learn the fundamentals of R programming and apply them to data visualization. Participants will gain hands-on experience with a malaria dataset, learning how to analyze and visualize data effectively.

Training Aims

  1. Understand the Basics of R Programming (Data Import and Exploratory Data Analysis).
  2. Develop proficiency in creating data visualizations in R using ggplot2.
  3. Enhance Data Storytelling Through Visualization.
  4. Explore Animated and Interactive Visualizations.
  5. Equip participants with skills to apply R for real-world malaria data analysis.

Learning Objectives

By the end of the training, participants should be able to:

  • ✔️ Import data and perform exploratory data analysis.
  • ✔️ Create data visualizations (bar charts, scatterplots, histograms, etc.) with ggplot2, patchwork, and related packages.
  • ✔️ Interpret and customize visualizations for clarity and impact.

Training Structure

A blend of interactive and hands-on methods:

  1. Presentations: Concise theory and examples.
  2. Practical Work: Guided coding exercises.
  3. Group Work: Collaborative problem-solving.
  4. Discussions: Q&A and best practices.

Topics Covered

  1. Section 1: Introduction to R Programming
  2. Section 2: Exploratory Data Analysis
  3. Section 3: Data Visualization with ggplot2
  4. Section 4: Advanced Topics and Best Practices

Section 1: Introduction to R Programming and RStudio

Section 2: Exploratory Data Analysis (EDA)

EDA is a critical step before building models, as it helps in understanding the structure and quality of the data.
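
A first EDA pass can be sketched in base R as follows. This is only an illustrative sketch: the `malaria` data frame, its column names (`district`, `month`, `cases`) and all its values are invented for the example and are not the training dataset.

```r
# Hypothetical toy data; names and values are invented for illustration only.
malaria <- data.frame(
  district = c("A", "A", "B", "B", "C", "C"),
  month    = c(1, 2, 1, 2, 1, 2),
  cases    = c(12, 18, 7, NA, 25, 30)
)

str(malaria)      # column types and a preview of the values
summary(malaria)  # per-column summaries, including the number of NAs

# Count missing values in each column.
missing_per_column <- colSums(is.na(malaria))
print(missing_per_column)

# Mean cases per district; the formula interface drops rows with NA by default.
mean_cases <- aggregate(cases ~ district, data = malaria, FUN = mean)
print(mean_cases)
```

With a real dataset, the same calls would typically follow a data import step such as `read.csv()`.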

Section 3: Data Visualization

Data visualization helps in understanding patterns, trends, and relationships in data.

Types of Data Visualization

1. Univariate Data Visualizations (Single Variable)

  • ✔️ Histogram: Used for understanding the distribution of a single variable.
  • ✔️ Box Plot: Used for detecting outliers and understanding the spread of data.
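
The two univariate plots above might look like this in ggplot2. This is a minimal sketch, assuming the ggplot2 package is installed; the `malaria` data frame and its `cases` column are simulated for illustration and do not come from the training dataset.

```r
library(ggplot2)

# Hypothetical toy data: simulated counts, invented for illustration only.
set.seed(1)
malaria <- data.frame(cases = rpois(100, lambda = 15))

# Histogram: distribution of a single numeric variable.
p_hist <- ggplot(malaria, aes(x = cases)) +
  geom_histogram(binwidth = 2, fill = "steelblue", colour = "white") +
  labs(title = "Distribution of cases", x = "Cases", y = "Count")

# Box plot: spread of the same variable, with outliers drawn as points.
p_box <- ggplot(malaria, aes(y = cases)) +
  geom_boxplot() +
  labs(title = "Spread of cases", y = "Cases")

print(p_hist)
print(p_box)
```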

2. Bivariate Data Visualizations (Two Variables)

  • ✔️ Scatter Plot: Used for understanding relationships between two numerical variables.
  • ✔️ Line Plot: Used for showing trends over time or continuous data.
  • ✔️ Bar Chart: Used for comparing categorical data.
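
As a rough illustration of the three bivariate plots, the sketch below uses ggplot2 on a small made-up data frame. The column names (`district`, `month`, `rainfall_mm`, `cases`) and all values are assumptions chosen for the example only.

```r
library(ggplot2)

# Hypothetical toy data; names and values are invented for illustration only.
malaria <- data.frame(
  district    = rep(c("A", "B", "C"), each = 4),
  month       = rep(1:4, times = 3),
  rainfall_mm = c(50, 80, 120, 90, 40, 60, 100, 70, 30, 55, 95, 65),
  cases       = c(10, 16, 25, 19, 8, 12, 21, 14, 5, 11, 18, 12)
)

# Scatter plot: relationship between two numeric variables.
p_scatter <- ggplot(malaria, aes(x = rainfall_mm, y = cases)) +
  geom_point() +
  labs(x = "Rainfall (mm)", y = "Cases")

# Line plot: trend over time, one line per district.
p_line <- ggplot(malaria, aes(x = month, y = cases, colour = district)) +
  geom_line() +
  labs(x = "Month", y = "Cases")

# Bar chart: compare categories using pre-computed totals with geom_col().
totals <- aggregate(cases ~ district, data = malaria, FUN = sum)
p_bar <- ggplot(totals, aes(x = district, y = cases)) +
  geom_col() +
  labs(x = "District", y = "Total cases")

print(p_scatter)
print(p_line)
print(p_bar)
```

`geom_col()` is used here because the totals are computed beforehand; `geom_bar()` would instead count rows per category.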

3. Multivariate Data Visualizations (More than Two Variables)

  • ✔️ Heatmap: Used for visualizing correlations between multiple numerical variables.
  • ✔️ Pair Plot: Used for visualizing pairwise relationships in the dataset.
  • ✔️ Violin Plot: Used for understanding the distribution of a variable across categories.
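
One possible way to produce these multivariate plots is sketched below. It uses R's built-in `iris` dataset purely as a stand-in (it is not the training dataset), and the column names `var_x`, `var_y` and `correlation` are chosen for the example.

```r
library(ggplot2)

# Built-in iris data used only as a stand-in for a real dataset.
num_vars <- iris[, 1:4]

# Heatmap: reshape the correlation matrix to long format, then draw tiles.
corr <- cor(num_vars)
corr_long <- as.data.frame(as.table(corr))
names(corr_long) <- c("var_x", "var_y", "correlation")

p_heat <- ggplot(corr_long, aes(x = var_x, y = var_y, fill = correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(x = NULL, y = NULL)

# Pair plot: base R scatterplot matrix of all numeric columns.
pairs(num_vars)

# Violin plot: distribution of one numeric variable across categories.
p_violin <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_violin()

print(p_heat)
print(p_violin)
```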

4. Specialized Data Visualizations

  • ✔️ Pie Chart: Used for representing proportions.
  • ✔️ Bubble Chart: Used for adding a third variable to a scatter plot (comparing three numerical variables).
  • ✔️ Word Cloud: Used for highlighting keywords, trends, or themes in textual data (e.g., key terms in articles, reviews, or social media posts).

5. Time Series Visualizations

  • ✔️ Time Series Line Plot: Used to analyze trends, patterns, or changes in data over a continuous period (e.g., days, months, years).
  • ✔️ Autocorrelation Plot: Used for finding patterns in time series data.
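
Both time series plots can be sketched with a monthly series. The example below uses R's built-in `ldeaths` series only as a stand-in for a real malaria time series; the `ts_df` data frame and its column names are assumptions made for the example.

```r
library(ggplot2)

# Built-in monthly series used only as a stand-in for real surveillance data.
ts_df <- data.frame(
  time   = as.numeric(time(ldeaths)),
  deaths = as.numeric(ldeaths)
)

# Time series line plot: values over a continuous period.
p_ts <- ggplot(ts_df, aes(x = time, y = deaths)) +
  geom_line() +
  labs(x = "Year", y = "Monthly deaths")
print(p_ts)

# Autocorrelation plot: correlation of the series with its own lagged values.
# Regular peaks at lags that are multiples of one year suggest seasonality.
acf_res <- acf(ldeaths, main = "Autocorrelation of monthly deaths")
```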

Section 4: Advanced Topics and Best Practices

Advanced Topics

  • ✅ Interactive Visualizations: Use tools like Plotly and Dash to enable user interaction.
  • ✅ High-Dimensional Data Visualization: Use PCA, t-SNE, and UMAP for dimensionality reduction.
  • ✅ Time Series Visualization: Apply line plots, heatmaps, and rolling averages for trends.
  • ✅ Network Graph Visualization: Use NetworkX and Gephi for social and supply chain analysis.
  • ✅ Big Data Visualization: Handle large datasets with Dask, Vaex, and Apache Superset.
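
As one small illustration of the high-dimensional item above, the sketch below runs PCA with base R's `prcomp()` and plots the first two components with ggplot2. It covers PCA only (not t-SNE or UMAP) and uses the built-in `iris` data as a stand-in, so the variables are not from the training dataset.

```r
library(ggplot2)

# PCA on the four numeric iris columns, centred and scaled to unit variance.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first two components, with species kept for colouring.
scores <- data.frame(pca$x[, 1:2], Species = iris$Species)

p_pca <- ggplot(scores, aes(x = PC1, y = PC2, colour = Species)) +
  geom_point() +
  labs(title = "First two principal components")
print(p_pca)
```

Scaling is used here because the columns are on different ranges; whether to scale is a judgment call that depends on the data.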

Best Practices

  • ✅ Choose the Right Chart Type (e.g., bar charts for categories, line charts for trends).
  • ✅ Follow Design Principles (simplicity, consistency, accessibility).
  • ✅ Use Storytelling to highlight key insights and structure visuals logically.
  • ✅ Avoid Common Pitfalls (misleading scales, cluttered visuals, unnecessary 3D charts).