2025 ANNUAL AMMNET SENEGAL MEETING/TRAINING
Basics: Introduction to R and Data Visualization
Facilitator: Dennis K. Muriithi
Center for Data Analytics & Modeling, Chuka University - Kenya
INTRODUCTION TO R AND DATA VISUALIZATION TRAINING
Course Overview
This training is designed for beginners who want to learn the
fundamentals of R programming and apply it to data visualization.
Participants will gain hands-on experience working with a malaria
dataset, learning how to analyze and visualize data effectively.
Training Aims
- Understand the Basics of R Programming (Data Import and Exploratory Data Analysis).
- Develop proficiency in creating data visualizations in R using ggplot2.
- Enhance Data Storytelling Through Visualization.
- Gain an Introduction to Animated and Interactive Visualizations.
- Equip participants with skills to apply R to real-world malaria data analysis.
Learning Objectives
By the end of the training, participants should be able to:
- ✔️ Import data and carry out exploratory data analysis.
- ✔️ Create data visualizations (bar charts, scatterplots, histograms, etc.) with ggplot2, patchwork, and related packages.
- ✔️ Interpret and customize visualizations for clarity and impact.
Training Structure
A blend of interactive and hands-on methods:
- Presentations: Concise theory and examples.
- Practical Work: Guided coding exercises.
- Group Work: Collaborative problem-solving.
- Discussions: Q&A and best practices.
Topics Covered
- Section 1: Introduction to R Programming
- Section 2: Exploratory Data Analysis
- Section 3: Data Visualization with ggplot2
- Section 4: Advanced Topics and Best Practices
Section 1: Introduction to R Programming and RStudio
- ✔️ R is a programming language for statistical computing and graphics.
- ✔️ RStudio is an integrated development environment (IDE) for R that provides tools for coding, data visualization, and project management.
Installing R and RStudio
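- ✔️ Download and install the latest version of R (4.4.3 at the time of writing) from the CRAN (Comprehensive R Archive Network) website.
- ✔️ Then download and install RStudio Desktop from the Posit (formerly RStudio) website.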
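Once both are installed, a quick check in the RStudio Console confirms which version of R is running and that the packages used in this training are available. The sketch below assumes internet access to CRAN for any package that is missing; the variable names `pkgs` and `to_install` are illustrative.

```r
# Show which version of R is running
print(R.version.string)

# Packages used in this training: ggplot2 for plots, patchwork for combining plots
pkgs <- c("ggplot2", "patchwork")

# Keep only the packages that cannot be loaded yet, then install them from CRAN
to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Load the packages for the current session
library(ggplot2)
library(patchwork)
```

Installation is needed only once per machine; `library()` must be called again in every new R session.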
RStudio Interface
- Scripts (Source pane): Write and save R code here.
- Console: Execute commands interactively.
- Environment: View variables, data frames, and functions in memory.
- Plots: Display graphical outputs.
Section 2: Exploratory Data Analysis (EDA)
EDA is a critical step before building models, as it helps in:
- ✅ Understanding the data structure and identifying inconsistencies.
- ✅ Detecting missing values, outliers, and unusual patterns.
- ✅ Selecting appropriate features for predictive modeling.
- ✅ Improving data preprocessing and transformation steps.
- ✅ Summarizing key characteristics of a dataset.
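The EDA steps above map onto a handful of base R functions. As a minimal sketch, the block below uses a small simulated, hypothetical malaria data frame (the columns `district`, `month`, `rainfall_mm`, and `cases` are assumptions, not the training dataset) with two missing values and one extreme value injected on purpose.

```r
# Simulated, hypothetical malaria data used only for illustration
set.seed(42)
malaria <- data.frame(
  district = rep(c("A", "B", "C"), each = 12),
  month = rep(1:12, times = 3),
  rainfall_mm = round(runif(36, min = 0, max = 300)),
  cases = rpois(36, lambda = 50)
)
malaria$cases[c(5, 20)] <- NA   # inject two missing values
malaria$cases[30] <- 400        # inject one extreme value

str(malaria)       # structure: column names, types, and first values
summary(malaria)   # min, quartiles, mean, max, and NA count per numeric column

# Count missing values in each column
colSums(is.na(malaria))

# Values lying more than 1.5 x IQR beyond the quartiles (the box-plot rule)
outliers <- boxplot.stats(malaria$cases)$out
outliers
```

`boxplot.stats()` ignores missing values, so the flagged outliers and the NA counts are reported separately.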
Section 3: Data Visualization
Data visualization helps in understanding patterns, trends, and
relationships in data.
Types of Data Visualization
1. Univariate Data Visualizations (Single Variable)
- ✔️ Histogram: Used for understanding the distribution of a single variable.
- ✔️ Box Plot: Used for detecting outliers and understanding the spread of data.
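Both univariate plots take a single numeric column in ggplot2. This is a sketch on simulated, hypothetical monthly case counts; the bin width of 5 is an arbitrary choice that should be tuned to the real data.

```r
library(ggplot2)

# Simulated, hypothetical monthly case counts for three districts
set.seed(42)
malaria <- data.frame(
  district = rep(c("A", "B", "C"), each = 12),
  cases = rpois(36, lambda = 50)
)

# Histogram: distribution of a single numeric variable
p_hist <- ggplot(malaria, aes(x = cases)) +
  geom_histogram(binwidth = 5, fill = "steelblue", colour = "white") +
  labs(title = "Distribution of monthly malaria cases",
       x = "Cases per month", y = "Number of months")

# Box plot: median, interquartile range, and points flagged as outliers
p_box <- ggplot(malaria, aes(y = cases)) +
  geom_boxplot(fill = "orange") +
  labs(title = "Spread of monthly malaria cases", y = "Cases per month")

print(p_hist)
print(p_box)
```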
2. Bivariate Data Visualizations (Two Variables)
- ✔️ Scatter Plot: Used for understanding relationships between two numerical variables.
- ✔️ Line Plot: Used for showing trends over time or along another ordered, continuous variable.
- ✔️ Bar Chart: Used for comparing values across categories.
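The three bivariate plots differ mainly in the geom used. In this sketch the data are simulated and the link between rainfall and cases is a hypothetical assumption built into the simulation, not a finding.

```r
library(ggplot2)

set.seed(42)
malaria <- data.frame(
  district = rep(c("A", "B", "C"), each = 12),
  month = rep(1:12, times = 3),
  rainfall_mm = round(runif(36, min = 0, max = 300))
)
# Hypothetical assumption: cases rise with rainfall, plus random noise
malaria$cases <- round(20 + 0.2 * malaria$rainfall_mm + rnorm(36, sd = 5))

# Scatter plot: relationship between two numeric variables, with a linear fit
p_scatter <- ggplot(malaria, aes(x = rainfall_mm, y = cases)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Rainfall (mm)", y = "Cases")

# Line plot: trend over time, one line per district
p_line <- ggplot(malaria, aes(x = month, y = cases, colour = district)) +
  geom_line() +
  scale_x_continuous(breaks = 1:12) +
  labs(x = "Month", y = "Cases")

# Bar chart: compare totals across categories
totals <- aggregate(cases ~ district, data = malaria, FUN = sum)
p_bar <- ggplot(totals, aes(x = district, y = cases)) +
  geom_col() +
  labs(x = "District", y = "Total cases")

print(p_scatter)
print(p_line)
print(p_bar)
```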
3. Multivariate Data Visualizations (More than Two Variables)
- ✔️ Heatmap: Used for visualizing correlations between multiple numerical variables.
- ✔️ Pair Plot: Used for visualizing pairwise relationships in the dataset.
- ✔️ Violin Plot: Used for understanding the distribution of a variable across categories.
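A correlation heatmap needs the correlation matrix reshaped into one row per variable pair before ggplot2 can draw it with `geom_tile()`. The sketch below uses simulated, hypothetical variables (`rainfall_mm`, `temperature_c`, `cases`), draws the pair plot with base R's `pairs()` to avoid extra packages, and overlays a narrow box plot on the violins.

```r
library(ggplot2)

set.seed(42)
n <- 60
malaria <- data.frame(
  district = rep(c("A", "B", "C"), each = n / 3),
  rainfall_mm = runif(n, 0, 300),
  temperature_c = rnorm(n, mean = 27, sd = 2)
)
# Hypothetical relationship used only to generate illustrative data
malaria$cases <- 20 + 0.2 * malaria$rainfall_mm + 2 * malaria$temperature_c + rnorm(n, sd = 5)

num_vars <- malaria[, c("rainfall_mm", "temperature_c", "cases")]

# Heatmap: correlation matrix reshaped into (var1, var2, r) rows
cor_mat <- cor(num_vars)
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("var1", "var2", "r")
p_heat <- ggplot(cor_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", limits = c(-1, 1))

# Pair plot: scatter plot for every pair of numeric variables (base R graphics)
pairs(num_vars)

# Violin plot: distribution of cases within each district
p_violin <- ggplot(malaria, aes(x = district, y = cases)) +
  geom_violin(fill = "lightgreen") +
  geom_boxplot(width = 0.1)

print(p_heat)
print(p_violin)
```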
4. Specialized Data Visualizations
- ✔️ Pie Chart: Used for representing proportions of a whole.
- ✔️ Bubble Chart: Used for adding a third numerical variable to a scatter plot, shown as point size.
- ✔️ Word Cloud: Used to highlight keywords, trends, or themes in textual data (e.g., key terms in articles, reviews, or social media posts).
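ggplot2 has no dedicated pie geom; a common approach is a single stacked bar wrapped into polar coordinates. A bubble chart is an ordinary scatter plot with a third variable mapped to `size`. The age-group shares and district values below are hypothetical numbers for illustration only. Word clouds need an extra text-mining package and are not sketched here.

```r
library(ggplot2)

# Hypothetical share of cases by age group (illustrative numbers only)
age_share <- data.frame(
  age_group = c("Under 5", "5-14", "15+"),
  cases = c(120, 80, 50)
)

# Pie chart: one stacked bar turned into a circle with coord_polar()
p_pie <- ggplot(age_share, aes(x = "", y = cases, fill = age_group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

# Bubble chart: a third numeric variable (population) mapped to point area
set.seed(42)
districts <- data.frame(
  rainfall_mm = runif(10, 0, 300),
  cases = rpois(10, 50),
  population = runif(10, 1e4, 1e5)
)
p_bubble <- ggplot(districts, aes(x = rainfall_mm, y = cases, size = population)) +
  geom_point(alpha = 0.6) +
  scale_size_area(max_size = 12)

print(p_pie)
print(p_bubble)
```

`scale_size_area()` makes bubble area, rather than radius, proportional to the value, which avoids visually exaggerating large values.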
5. Time Series Visualizations
- ✔️ Time Series Line Plot: Used to analyze trends, patterns, or changes in data over a continuous period (e.g., days, months, years).
- ✔️ Autocorrelation Plot: Used for detecting repeating patterns, such as seasonality, by showing how strongly a series correlates with lagged copies of itself.
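A minimal sketch of both time series plots, assuming simulated monthly counts with a built-in yearly cycle and a hypothetical start date of January 2021. Because the `ts` object has `frequency = 12`, `acf()` labels lags in years, so a lag of 1.0 corresponds to 12 months.

```r
library(ggplot2)

# Simulated monthly case counts over three years with a yearly seasonal cycle
set.seed(42)
months <- 1:36
cases <- round(50 + 20 * sin(2 * pi * months / 12) + rnorm(36, sd = 5))
cases_ts <- ts(cases, start = c(2021, 1), frequency = 12)

# Time series line plot
ts_df <- data.frame(time = as.numeric(time(cases_ts)), cases = as.numeric(cases_ts))
p_ts <- ggplot(ts_df, aes(x = time, y = cases)) +
  geom_line() +
  labs(x = "Year", y = "Monthly cases")
print(p_ts)

# Autocorrelation plot (base R): a peak at lag 1.0 (12 months) points to seasonality
acf_res <- acf(cases_ts, lag.max = 24)
```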
Section 4: Advanced Topics and Best Practices
Advanced Topics
- ✅ Interactive Visualizations: Use tools such as Plotly (available in R through the plotly package) and Dash to enable user interaction.
- ✅ High-Dimensional Data Visualization: Use PCA, t-SNE, and UMAP for dimensionality reduction.
- ✅ Time Series Visualization: Apply line plots, heatmaps, and rolling averages to reveal trends.
- ✅ Network Graph Visualization: Use NetworkX (Python) and Gephi for social and supply chain analysis.
- ✅ Big Data Visualization: Handle large datasets with Dask and Vaex (Python libraries) and Apache Superset.
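Of the dimensionality-reduction methods listed, PCA is available in base R through `prcomp()`. This sketch uses R's built-in `iris` data (four numeric measurements per flower) rather than malaria data, standardizes the variables, and plots each observation on the first two principal components.

```r
library(ggplot2)

# Built-in iris data: four numeric measurements per flower
num <- iris[, 1:4]

# Principal component analysis on standardized variables
pca <- prcomp(num, scale. = TRUE)
summary(pca)   # proportion of variance explained by each component

# Plot observations on the first two principal components
scores <- data.frame(pca$x[, 1:2], species = iris$Species)
p_pca <- ggplot(scores, aes(x = PC1, y = PC2, colour = species)) +
  geom_point()
print(p_pca)
```

Setting `scale. = TRUE` prevents variables measured on larger scales from dominating the components.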
Best Practices
- ✅ Choose the Right Chart Type (e.g., bar charts for categories, line charts for trends).
- ✅ Follow Design Principles (simplicity, consistency, accessibility).
- ✅ Use Storytelling to highlight key insights and structure visuals logically.
- ✅ Avoid Common Pitfalls (misleading scales, cluttered visuals, unnecessary 3D charts).
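Several of these practices can be applied in a single ggplot2 chart. The sketch below uses hypothetical district totals and shows one way to do it: sorted horizontal bars, an axis anchored at zero, a single fill colour, and a title that states the key insight.

```r
library(ggplot2)

# Hypothetical totals by district (illustrative numbers only)
totals <- data.frame(district = c("A", "B", "C", "D"),
                     cases = c(340, 410, 290, 520))

p <- ggplot(totals, aes(x = cases, y = reorder(district, cases))) +
  geom_col(fill = "steelblue") +                     # one colour, no decorative fills
  scale_x_continuous(limits = c(0, NA),              # bars start at zero
                     expand = expansion(mult = c(0, 0.05))) +
  labs(title = "District D reported the most cases", # title states the insight
       subtitle = "Hypothetical data for illustration",
       x = "Total cases", y = NULL) +
  theme_minimal()
print(p)
```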