This notebook introduces the foundations of Data Analytics with
R.
It focuses on how data analytics supports decision-making, how
to work with real datasets, and how to understand data
structure before modeling.
By the end of this notebook, you should be comfortable loading datasets, inspecting their structure, summarizing key variables, and producing basic but meaningful plots.
Data analytics helps convert raw data into evidence for decisions.
Examples:
- Which district has the highest school dropout rate?
- Are exam scores improving over time?
- Which variables best explain loan default?
In practice, analytics follows a simple pipeline:
R is especially strong in steps 3–5.
At this stage, we focus on descriptive and comparative questions.
This course assumes you already know basic R syntax. We briefly revisit only what is essential for analytics.
scores <- c(65, 15, -87, 502, 35.086, 72, 80, 55, 90)
mean(scores)
## [1] 91.89844
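The vector above contains -87 and 502. If scores are meant to lie between 0 and 100, these look like data-entry errors, and they pull the mean up to about 91.9. The sketch below compares the mean with the median and with the mean of the plausible values only. It assumes a valid range of 0 to 100, which is an assumption and is not stated in the data.

```r
scores <- c(65, 15, -87, 502, 35.086, 72, 80, 55, 90)

# The median is the middle of the sorted values, so extreme values barely move it
score_mean   <- mean(scores)
score_median <- median(scores)

# Assumption: valid scores lie between 0 and 100; anything else is treated as an error
valid_scores <- scores[scores >= 0 & scores <= 100]

cat("Mean (all values):  ", score_mean, "\n")
cat("Median (all values):", score_median, "\n")
cat("Mean (0-100 only):  ", mean(valid_scores), "\n")
```

The median (65) and the mean of the seven plausible values (about 58.9) both sit well below the overall mean. A large gap between mean and median is a quick signal to look for outliers or errors.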
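mine <- function(n) {
  start <- 0
  # Add up every element of n (the same result as sum(n))
  for (i in n) {
    start <- start + i
  }
  # Print a message that depends on the total
  if (start > 10) {
    cat("Not worth it")
  } else if (start < -5) {
    cat("exactly have what we need")
  } else {
    cat("Totally acceptable")
  }
}
mine(5:40)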
## Not worth it
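The `for` loop in `mine()` only accumulates a total, and R's built-in `sum()` computes exactly that. Below is a vectorised sketch of the same logic. The name `classify_total()` is hypothetical. It returns the message instead of printing it, so the result can be stored or checked.

```r
# Hypothetical helper: same thresholds as mine(), but using sum() instead of a loop
classify_total <- function(n) {
  total <- sum(n)
  if (total > 10) {
    "Not worth it"
  } else if (total < -5) {
    "exactly have what we need"
  } else {
    "Totally acceptable"
  }
}

classify_total(5:40)     # total is 810
classify_total(c(2, 3))  # total is 5
classify_total(-10)      # total is -10
```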
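score <- 40
hey <- function(toolit) {
  # score:toolit is evaluated once, before any local score exists,
  # so it uses the global score (40) and runs from 40 to toolit
  for (i in score:toolit) {
    # This assignment creates a local score; the global score stays 40
    score <- toolit * i
  }
  # For a whole-number toolit the last i equals toolit, so score ends as toolit * toolit
  if (score >= 50) {
    cat("Ohh my God!")
  } else {
    cat("Lemme tell you something")
  }
}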
hey(7)
## Lemme tell you something
hey(8)
## Ohh my God!
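`hey()` behaves this way because of R's scoping rules. A function can read a global variable, but assigning to that name inside the function creates a separate local copy. The small sketch below isolates that behaviour. The function name `bump()` is hypothetical.

```r
score <- 40

bump <- function() {
  # Reads the global score (40), then assigns the result to a new local variable named score
  score <- score + 10
  score
}

bump()  # returns the local value
score   # the global value has not changed
```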
students <- data.frame(
name = c("Ninsiima", "Katusiime", "Ndyowe", "Manzi", "Atwine"),
gender = c("F", "M", "F", "M", "F"),
score = c(98, 72, 80, 15, 61)
)
students
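Once data sit in a data frame, most analytics questions start by selecting columns and filtering rows. The short sketch below applies plain base R indexing to the same `students` data.

```r
students <- data.frame(
  name   = c("Ninsiima", "Katusiime", "Ndyowe", "Manzi", "Atwine"),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(98, 72, 80, 15, 61)
)

# One column as a vector
students$score

# Rows that meet a condition (scores of 70 or more)
high_scorers <- students[students$score >= 70, ]
high_scorers

# Names ordered from highest to lowest score
ranked_names <- students$name[order(students$score, decreasing = TRUE)]
ranked_names
```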
Most real analytics begins with external data files, often CSV.
# Example: replace "students.csv" with the path to your own file
df1 <- read.csv("students.csv")  # or set the working directory (setwd()) to the folder that contains the CSV
df1
# For illustration, we reuse the students data frame
df <- students
df
Key function:
- read.csv() reads tabular data from a CSV file into R as a data frame.
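Your machine may not have a `students.csv` file. The sketch below gets around that by writing the `students` data frame to a temporary file with `write.csv()` and reading it back with `read.csv()`. It also shows why `row.names = FALSE` matters. By default, `write.csv()` saves the row names as an unnamed first column, and that column comes back as an extra column called `X`.

```r
students <- data.frame(
  name   = c("Ninsiima", "Katusiime", "Ndyowe", "Manzi", "Atwine"),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(98, 72, 80, 15, 61)
)

# tempfile() gives throwaway paths, so nothing is left in your working directory
path_with_rownames <- tempfile(fileext = ".csv")
path_clean         <- tempfile(fileext = ".csv")

write.csv(students, path_with_rownames)             # default: row names are written
write.csv(students, path_clean, row.names = FALSE)  # row names are dropped

names(read.csv(path_with_rownames))  # includes an extra "X" column
names(read.csv(path_clean))          # matches the original columns

df_from_file <- read.csv(path_clean)
str(df_from_file)
```

When read back, whole-number columns such as `score` are stored as integers (`int`) rather than `num`. For summaries and plots this makes no practical difference.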
Before analysis, always inspect the structure.
str(df)
## 'data.frame': 5 obs. of 3 variables:
## $ name : chr "Ninsiima" "Katusiime" "Ndyowe" "Manzi" ...
## $ gender: chr "F" "M" "F" "M" ...
## $ score : num 98 72 80 15 61
This tells you:
- the number of rows and columns
- the variable names
- the data type of each variable
names(df)
## [1] "name" "gender" "score"
dim(df)
## [1] 5 3
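`str()` prints all of this information at once. Sometimes you need the individual pieces as values, for example inside a check or a loop, and base R has a separate helper for each. The sketch below uses `nrow()`, `ncol()`, and `sapply()` with `class()`.

```r
df <- data.frame(
  name   = c("Ninsiima", "Katusiime", "Ndyowe", "Manzi", "Atwine"),
  gender = c("F", "M", "F", "M", "F"),
  score  = c(98, 72, 80, 15, 61)
)

nrow(df)           # number of rows (observations)
ncol(df)           # number of columns (variables)
sapply(df, class)  # data type of each column
```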
Summaries help you understand distributions quickly.
summary(df)
## name gender score
## Length:5 Length:5 Min. :15.0
## Class :character Class :character 1st Qu.:61.0
## Mode :character Mode :character Median :72.0
## Mean :65.2
## 3rd Qu.:80.0
## Max. :98.0
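For a numeric variable, summary() gives the minimum, first quartile (1st Qu.), median, mean, third quartile (3rd Qu.), and maximum. For a character variable, it only reports the length, class, and mode.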
summary(df$score)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.0 61.0 72.0 65.2 80.0 98.0
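You can reproduce each number in `summary(df$score)` with a dedicated function, which helps when you need only one statistic. With its default settings, `quantile()` returns the minimum, the quartiles, the median, and the maximum. `mean()` returns the mean.

```r
score <- c(98, 72, 80, 15, 61)

quantile(score)  # 0%, 25%, 50%, 75%, 100%
mean(score)
range(score)     # minimum and maximum
IQR(score)       # 3rd Qu. minus 1st Qu.
```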
Visualization helps reveal patterns that tables cannot.
hist(df$score,
main = "Distribution of Student Scores",
xlab = "Score")
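A histogram splits the range of the data into bins and counts the values in each one. If you call `hist()` with `plot = FALSE`, it returns those numbers instead of drawing the plot, which is a quick way to check what the histogram shows. The default break points are chosen automatically, so the sketch also sets explicit bins every 20 points for comparison.

```r
score <- c(98, 72, 80, 15, 61)

h <- hist(score, plot = FALSE)
h$breaks  # bin edges chosen automatically
h$counts  # number of scores in each bin

# Explicit bins: 0-20, 20-40, 40-60, 60-80, 80-100
h20 <- hist(score, breaks = seq(0, 100, by = 20), plot = FALSE)
h20$counts
```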

boxplot(score ~ gender,
data = df,
main = "Scores by Gender",
xlab = "Gender",
ylab = "Score")
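The thick line in each box marks the group median. `tapply()` computes a statistic for each group directly, so you can use it to check what the boxplot shows. In the sketch below it applies `median()` and `mean()` to `score` within each `gender`. With only two or three students per group, treat any difference cautiously.

```r
df <- data.frame(
  gender = c("F", "M", "F", "M", "F"),
  score  = c(98, 72, 80, 15, 61)
)

# Apply a function to score separately for each level of gender
group_medians <- tapply(df$score, df$gender, median)
group_means   <- tapply(df$score, df$gender, mean)
group_medians
group_means
table(df$gender)  # group sizes
```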

plot(df$score,
main = "Index vs Score",
ylab = "Score",
xlab = "Student Index")

When looking at summaries and plots, always ask:
Analytics is not just code — it is thinking with data.
study <- data.frame(
  students  = c("marvel", "yuppie", "inno", "derrick", "suzan"),
  mtc_score = c(96, 78, 43, 77, 69),
  sci_score = c(48, 90, 78, 36, 89),
  sst_score = c(67, 89, 9, 86, 84),
  eng_score = c(78, 85, 59, 20, 14)
)
study
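write.csv(study, "study.csv", row.names = FALSE)  # row.names = FALSE avoids an extra "X" index column when the file is read back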
read.csv("study.csv")
str(study)
## 'data.frame': 5 obs. of 5 variables:
## $ students : chr "marvel" "yuppie" "inno" "derrick" ...
## $ mtc_score: num 96 78 43 77 69
## $ sci_score: num 48 90 78 36 89
## $ sst_score: num 67 89 9 86 84
## $ eng_score: num 78 85 59 20 14
summary(study)
## students mtc_score sci_score sst_score eng_score
## Length:5 Min. :43.0 Min. :36.0 Min. : 9 Min. :14.0
## Class :character 1st Qu.:69.0 1st Qu.:48.0 1st Qu.:67 1st Qu.:20.0
## Mode :character Median :77.0 Median :78.0 Median :84 Median :59.0
## Mean :72.6 Mean :68.2 Mean :67 Mean :51.2
## 3rd Qu.:78.0 3rd Qu.:89.0 3rd Qu.:86 3rd Qu.:78.0
## Max. :96.0 Max. :90.0 Max. :89 Max. :85.0
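`summary()` reports each subject's mean column by column. `colMeans()` returns the same means as a single vector, and `rowMeans()` gives each student's average across the four subjects. Both functions need numeric input, so the sketch drops the `students` column first.

```r
study <- data.frame(
  students  = c("marvel", "yuppie", "inno", "derrick", "suzan"),
  mtc_score = c(96, 78, 43, 77, 69),
  sci_score = c(48, 90, 78, 36, 89),
  sst_score = c(67, 89, 9, 86, 84),
  eng_score = c(78, 85, 59, 20, 14)
)

scores_only <- study[, -1]  # drop the character column

subject_means <- colMeans(scores_only)
student_means <- rowMeans(scores_only)
names(student_means) <- study$students

subject_means
sort(student_means, decreasing = TRUE)
```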
hist(study$mtc_score,
main = "A histogram of MTC scores",
xlab = "MTC scores",
ylab = "frequency",
col = "yellow")
4. Produce a boxplot comparing one numeric variable across groups.
# Base R
# Note: each student has only one sci_score, so each "box" collapses to a single line
boxplot(sci_score ~ students, data = study,
main = "A boxplot of sci scores of students",
xlab = "Students",
ylab = "SCI scores")
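`study` has one row per student, so it has no grouping variable with several observations per group. The same data can still give a meaningful comparison if you compare subjects instead. In the sketch below, `stack()` reshapes the four score columns into a `values` column plus an `ind` column that names the subject. `boxplot()` then draws one box per subject, each built from five scores.

```r
study <- data.frame(
  students  = c("marvel", "yuppie", "inno", "derrick", "suzan"),
  mtc_score = c(96, 78, 43, 77, 69),
  sci_score = c(48, 90, 78, 36, 89),
  sst_score = c(67, 89, 9, 86, 84),
  eng_score = c(78, 85, 59, 20, 14)
)

# Long format: one row per (student, subject) pair
long_scores <- stack(study[, c("mtc_score", "sci_score", "sst_score", "eng_score")])
head(long_scores)

boxplot(values ~ ind, data = long_scores,
        main = "Scores by subject",
        xlab = "Subject",
        ylab = "Score")
```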
# ggplot
# install.packages("ggplot2")  # run this once if ggplot2 is not yet installed
library(ggplot2)
ggplot(data = study, aes(x = students, y = mtc_score)) +
geom_boxplot() +
labs(
title = "A boxplot of mtc scores of students",
x = "Students",
y = "MTC scores",
caption = "Source: own dataset"
) +
theme_classic() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
# hjust = 0.5 centers the title
# theme() adjusts individual elements (titles, axis text, ...); complete themes such as theme_minimal(), theme_bw() and theme_classic() change the overall look
# geom_boxplot(fill = "red") fills the boxes with a color
# A boxplot shows how the distribution of a variable differs across groups, highlighting the median, spread, and any outliers.
# If one group has a higher median and a wider box, it has higher typical values and greater variability than the others.
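The points a boxplot marks as outliers follow a fixed rule. By default, a value is flagged when it lies more than 1.5 times the box length (the distance between the lower and upper hinges) beyond either hinge. `boxplot.stats()` returns the numbers behind a single box, so you can see which values are flagged. In `sst_score`, inno's 9 lies far below the other scores.

```r
sst_score <- c(67, 89, 9, 86, 84)  # values from the study data frame

sst_box <- boxplot.stats(sst_score)
sst_box$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
sst_box$out    # values drawn as individual outlier points
```

Here the hinges are 67 and 86, so any value below 67 - 1.5 * 19 = 38.5 is flagged. The lower whisker therefore stops at 67 instead of reaching 9.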
read.csv(), str(), and summary() are core analytics tools.

End of Weeks 1–2 notebook.