How to use this notebook in class

  1. Read the text.
  2. Run code chunks one-by-one (click the green ▶ button).
  3. Answer the quick tasks where you see Task boxes.
  4. At the end, click Knit to produce a clean HTML report.

1 0. What you should have before we start

1.1 Required software

You need:

  • R (the language)
  • RStudio (the editor where we work)

If you are already in the computer lab and can open RStudio, you’re good.

1.2 First-time check

Run the next chunk. If it prints your R version, you are ready.

R.version.string
## [1] "R version 4.5.2 (2025-10-31 ucrt)"

2 1. RStudio tour (the 4 main panes)

When you open RStudio, you usually see these panes:

  1. Source (top-left): where you write scripts and notebooks
  2. Console (bottom-left): where R runs commands
  3. Environment/History (top-right): your objects and past commands
  4. Files/Plots/Packages/Help (bottom-right): where outputs and help show

2.1 What to avoid

  • Do not write long code in the Console.
  • Write code in the Source pane (scripts or R Markdown) and run from there.

3 2. Working directory and files (very important)

3.1 What is a working directory?

It is the folder where R looks for files by default (CSV, images, saved outputs).

Check your current working directory:

getwd()
## [1] "C:/Users/uSer/OneDrive/Documents/SEMESTER 1.2"

3.3 Setting the directory (only when needed)

If you must set it manually:

setwd("C:/Users/YourName/Documents/SDS1201")

Note: In class, we prefer Projects over setwd().


4 3. Packages (install once, load every time)

4.1 Install packages (once per computer)

You only install a package once (like installing an app).

install.packages("tidyverse")
# readr is part of the tidyverse, so installing it separately is optional:
install.packages("readr")

4.2 Load packages (every session)

Each time you open RStudio, you load packages using library().

# If tidyverse is installed, this should load without errors:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

If you see an error like “there is no package called …”
→ you must install it first.
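
If you are not sure whether a package is already installed, you can check before installing. The sketch below is only an illustration: missing_packages() is a hypothetical helper (it is not part of the course materials) built on base R's requireNamespace(). It is run on base packages here so that nothing gets installed.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Base packages ship with R, so nothing should be reported as missing here.
to_install <- missing_packages(c("stats", "utils"))
to_install

# Install only what is missing (does nothing when the vector is empty).
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

For this course you would pass c("tidyverse") instead of the base package names.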


5 4. R Markdown (.Rmd): what it is and how to run it

An R Markdown file is a document that combines:

  • text explanations
  • code chunks
  • output (tables, plots)

5.1 Running code chunks

A code chunk looks like this:


```{r}
2 + 2
```

When you run the chunk, its output appears directly below it:

```
## [1] 4
```

Try it:

2 + 2
## [1] 4

5.2 Knitting (exporting)

Click Knit (top of RStudio) to generate an HTML report.

  • If it fails, read the error message carefully.
  • The most common causes are missing packages and wrong file paths.

6 5. Core R objects you must know (fast, practical)

6.1 5.1 Variables

A variable stores a value.

x <- 10
y <- 3
x + y
## [1] 13

6.2 5.2 Vectors (most common)

A vector is a 1D collection of values of the same type.

scores <- c(65, 70, 80, 55)
scores
## [1] 65 70 80 55
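
"Of the same type" has a practical consequence: if you mix types inside c(), R silently converts everything to one common type. A small sketch (the object names mixed and flags are made up for this example):

```r
# Mixing numbers and text: every element is converted to character.
mixed <- c(65, "seventy", 80)
class(mixed)
## [1] "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0.
flags <- c(10, TRUE, FALSE)
class(flags)
## [1] "numeric"
flags
## [1] 10  1  0
```

If mean() or max() suddenly fails on a vector of marks, check class() first; one stray text value is often the cause.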

Common operations:

length(scores)
## [1] 4
mean(scores)
## [1] 67.5
max(scores)
## [1] 80

Indexing:

scores[1]      # first item
## [1] 65
scores[2:4]    # items 2 to 4
## [1] 70 80 55
scores[scores >= 70]  # conditional selection
## [1] 70 80

6.3 5.3 Data frames (most common in analytics)

A data frame is a table (rows and columns).

students <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  age  = c(20, 21, 19),
  score = c(78, 62, 85)
)
students

Access columns:

students$score
## [1] 78 62 85
mean(students$score)
## [1] 75
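
Because a data frame has rows and columns, you can also select by position or by condition using [row, column]. A minimal sketch that rebuilds the same students table so it runs on its own (the object name high is made up for this example):

```r
students <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  age  = c(20, 21, 19),
  score = c(78, 62, 85)
)

nrow(students)   # number of rows
## [1] 3
ncol(students)   # number of columns
## [1] 3
students[2, ]    # second row, all columns

# Keep only the rows where score is at least 70.
high <- students[students$score >= 70, ]
high$name
## [1] "Amina" "Carol"
```

Leaving the part after the comma empty means "all columns".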

7 6. Reading data (CSV) and checking it
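
The checks in this section work on an object called df. The sketch below shows one way such an object could be created, assuming df is the students table from the data frames section read back from a CSV file. The temporary file path is an assumption made so that the example runs anywhere; in class you would use a file in your project folder.

```r
library(readr)

students <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  age  = c(20, 21, 19),
  score = c(78, 62, 85)
)

# Write the table to a temporary CSV file (hypothetical location).
csv_path <- tempfile(fileext = ".csv")
write_csv(students, csv_path)

# Read it back; show_col_types = FALSE hides the column-type message.
df <- read_csv(csv_path, show_col_types = FALSE)
df
```

read_csv() returns a tibble, which is why glimpse() reports each column with its type.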

7.2 6.2 Basic checks you should always do

glimpse(df)
## Rows: 3
## Columns: 3
## $ name  <chr> "Amina", "Brian", "Carol"
## $ age   <dbl> 20, 21, 19
## $ score <dbl> 78, 62, 85
summary(df)
##      name                age           score     
##  Length:3           Min.   :19.0   Min.   :62.0  
##  Class :character   1st Qu.:19.5   1st Qu.:70.0  
##  Mode  :character   Median :20.0   Median :78.0  
##                     Mean   :20.0   Mean   :75.0  
##                     3rd Qu.:20.5   3rd Qu.:81.5  
##                     Max.   :21.0   Max.   :85.0

Missing values:

colSums(is.na(df))
##  name   age score 
##     0     0     0
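
The table above has no missing values, so every count is 0. To see what the check does when values are missing, here is a small sketch on a made-up vector (marks is hypothetical and not part of the class data):

```r
marks <- c(78, NA, 85, NA, 62)

is.na(marks)                # TRUE where a value is missing
## [1] FALSE  TRUE FALSE  TRUE FALSE
sum(is.na(marks))           # number of missing values
## [1] 2
mean(marks)                 # NA, because missing values propagate
## [1] NA
mean(marks, na.rm = TRUE)   # mean of the observed values only
## [1] 75
```

colSums(is.na(df)) does the same counting, once per column.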

8 7. The basic analytics workflow (the one you should master)

We will use a simple structure in this course:

  1. Load data
  2. Clean data
  3. Explore data
  4. Summarize results
  5. Visualize key patterns
  6. Report cleanly (Knit)

Let’s do a small example.


9 8. Example: Cleaning + summarizing + plotting (step-by-step)

9.1 8.1 Create a slightly messy dataset

set.seed(1)

demo <- tibble(
  student_id = 1:12,
  gender = sample(c("F", "M"), 12, replace = TRUE),
  math = sample(c(NA, 40:95), 12, replace = TRUE),
  english = sample(c(NA, 40:95), 12, replace = TRUE)
)

demo

9.2 8.2 Check missingness

colSums(is.na(demo))
## student_id     gender       math    english 
##          0          0          0          0

9.3 8.3 Simple cleaning rule (example)

Here we fill missing marks using the subject mean (basic imputation).

demo_clean <- demo %>%
  mutate(
    math = ifelse(is.na(math), mean(math, na.rm = TRUE), math),
    english = ifelse(is.na(english), mean(english, na.rm = TRUE), english),
    average = (math + english) / 2
  )

demo_clean
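
The missingness check above reported no missing values in demo, so this rule has nothing to fill in here. A tiny made-up vector with a known gap shows what the rule does (math_marks, subject_mean and filled are hypothetical names used only for this sketch):

```r
math_marks <- c(50, NA, 70, 90)

# Mean of the observed values: (50 + 70 + 90) / 3 = 70
subject_mean <- mean(math_marks, na.rm = TRUE)

# Replace each NA with the subject mean; keep the other values as they are.
filled <- ifelse(is.na(math_marks), subject_mean, math_marks)
filled
## [1] 50 70 70 90
```

Mean imputation reduces the spread of the data, so treat it as a teaching simplification rather than a general recommendation.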

9.4 8.4 Summaries

Mean by gender:

demo_clean %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    mean_math = mean(math),
    mean_english = mean(english),
    mean_average = mean(average),
    .groups = "drop"
  )

9.5 8.5 Visualization (ggplot2)

ggplot(demo_clean, aes(x = math, y = english)) +
  geom_point() +
  labs(
    title = "Math vs English (Demo Data)",
    x = "Math score",
    y = "English score"
  )

A distribution plot:

ggplot(demo_clean, aes(x = average)) +
  geom_histogram(bins = 8) +
  labs(
    title = "Distribution of Average Score",
    x = "Average",
    y = "Count"
  )


10 9. Saving your work

10.1 9.1 Save objects (RDS)

If you want to save an R object to re-use later:

saveRDS(demo_clean, "demo_clean.rds")

To load it:

loaded <- readRDS("demo_clean.rds")
head(loaded)
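
An RDS file stores the object exactly as R holds it, including column types. The sketch below checks this on a small made-up table; scores, rds_path and restored are hypothetical names, and the file is written to a temporary location so that the example does not touch your project folder.

```r
scores <- data.frame(
  student_id = 1:3,
  gender = factor(c("F", "M", "F")),
  math = c(71L, 59L, 80L)
)

rds_path <- tempfile(fileext = ".rds")  # hypothetical temporary location
saveRDS(scores, rds_path)
restored <- readRDS(rds_path)

identical(scores, restored)   # same values and same column types
## [1] TRUE
class(restored$gender)        # the factor column is still a factor
## [1] "factor"
```

A CSV file is plain text, so a factor column comes back as text when you read it again. Use RDS for your own later work in R and CSV for sharing.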

10.2 9.2 Export a clean CSV

readr::write_csv(demo_clean, "demo_clean.csv")

11 10. Getting help (quickly)

11.1 10.1 Use ? help

?mean
?read_csv

11.2 10.2 Search examples

help.search("histogram")

11.3 10.3 Inspect objects

str(demo_clean)
## tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
##  $ student_id: int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##  $ gender    : chr [1:12] "F" "M" "F" "F" ...
##  $ math      : int [1:12] 71 59 59 80 92 84 48 45 47 53 ...
##  $ english   : int [1:12] 79 63 84 75 75 72 80 63 82 53 ...
##  $ average   : num [1:12] 75 61 71.5 77.5 83.5 78 64 54 64.5 53 ...
names(demo_clean)
## [1] "student_id" "gender"     "math"       "english"    "average"

12 11. Common mistakes (and how to fix them)

12.1 Mistake 1: Package not found

Error: there is no package called 'tidyverse'
Fix: install once, then load.

install.packages("tidyverse")
library(tidyverse)

12.2 Mistake 2: File not found

Error: cannot open the connection
Fix: check working directory + file name.

getwd()
## [1] "C:/Users/uSer/OneDrive/Documents/SEMESTER 1.2"
list.files()
##  [1] "00_Introduction.html"                                                 
##  [2] "01_Foundations_Data_Analytics_R.Rmd"                                  
##  [3] "02_Data_Cleaning_Preparation_R.Rmd"                                   
##  [4] "03_Exploratory_Data_Analysis_R (2).Rmd"                               
##  [5] "04_Introduction_2_Analytical_Models_R.Rmd"                            
##  [6] "05_Model_Validation_Responsible_Analysis_R.Rmd"                       
##  [7] "06_Applied_Project_R.Rmd"                                             
##  [8] "2025_A_KSD_1466_F.html"                                               
##  [9] "2025_A_KSD_1466_F.Rmd"                                                
## [10] "2025_A_KSD_1466_F_files"                                              
## [11] "airquality.csv"                                                       
## [12] "clean_student_grades.csv"                                             
## [13] "demo_clean.csv"                                                       
## [14] "demo_clean.rds"                                                       
## [15] "dice_game_results.csv"                                                
## [16] "dice_rolls.csv"                                                       
## [17] "dice_sum_counts.csv"                                                  
## [18] "DOC-20250406-WA0005.pdf"                                              
## [19] "Macro economics - introductory.pdf"                                   
## [20] "messy_data.csv"                                                       
## [21] "Naboth_Harris.html"                                                   
## [22] "processed_mtcars.csv"                                                 
## [23] "progam.csv"                                                           
## [24] "progam.rds"                                                           
## [25] "Riemann Integration.pdf"                                              
## [26] "rsconnect"                                                            
## [27] "Share with CamScanner.zip"                                            
## [28] "STA1204_APT_COURSE CONTENT APPLIED PROBABILITY.pdf"                   
## [29] "STA1205_COURSE CONTENT MATHEMATICAL STATISTICS.pdf"                   
## [30] "student.csv"                                                          
## [31] "students.csv"                                                         
## [32] "students.rds"                                                         
## [33] "students_clean.csv"                                                   
## [34] "The-Academic-Policy-and-Examination-Regulations-Kabale-University.pdf"
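
You can also let R check for the file before you try to read it. A minimal sketch, assuming a made-up file name (students_typo.csv is hypothetical; replace it with the file you want):

```r
# Hypothetical file name; replace it with the file you want to read.
path <- "students_typo.csv"

if (file.exists(path)) {
  df <- read.csv(path)
} else {
  message("File not found: ", path)
  message("Working directory: ", getwd())
  # List the CSV files that R can actually see in this folder.
  print(list.files(pattern = "\\.csv$"))
}
```

Comparing the name you typed with the names printed by list.files() usually reveals the typo or the wrong folder.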

12.3 Mistake 3: Using commas instead of dots

In R, the decimal separator is a dot: write 3.14, not 3,14.
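
The comma does not give an error inside c(); it quietly produces two numbers, which makes this mistake easy to miss. A short sketch (x and y are made-up names):

```r
x <- c(3.14)    # one number
y <- c(3, 14)   # two numbers: the comma separates arguments

length(x)
## [1] 1
length(y)
## [1] 2

# Text that uses a decimal comma can be converted after swapping the comma for a dot.
as.numeric(sub(",", ".", "3,14", fixed = TRUE))
## [1] 3.14
```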


13 12. Student tasks (submit in this same .Rmd)

13.1 Task A: Your profile dataset

Create a small data frame named profile with:

  • your name
  • your program
  • your home district
  • your favorite number

Then print it and show its structure.

# TODO: write your code here
profile <- data.frame(
  name = c("AHABWE NABOTH KAKURU"),
  program = c("STATISTICS AND DATA SCIENCE"),
  home_district = c("KABALE"),
  fav_number = 0040910635672174.74
)
profile
# The task also asks for the structure of the data frame:
str(profile)

13.2 Task B: Quick summary

Create a vector of 10 numbers (any numbers), then compute:

  • mean
  • median
  • standard deviation
  • min and max

# TODO: write your code here
v1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
mean(v1)
## [1] 5.5
median(v1)
## [1] 5.5
sd(v1)
## [1] 3.02765
min(v1)
## [1] 1
max(v1)
## [1] 10

13.3 Task C: Mini-plot

Create a data frame with two columns x and y (10 rows), then make a scatter plot.

# TODO: write your code here
dading <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  y = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)
dading
plot(dading$x, dading$y,
  main = "Variation of Y against X",
  xlab = "X values",
  ylab = "Y values",
  pch = 21,
  col = "red")

13.4 Task D: Import practice

  1. Create a small CSV file with 5 rows and 3 columns.
  2. Read it using read_csv().
  3. Print glimpse() and summary().
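
# TODO: write your code here
progam <- data.frame(
  name = c("Naboth", "Nebart", "Lynn", "Ian", "Collins"),
  scores = c(90, 80, 70, 60, 50),
  grades = c("A", "B", "C", "D", "E")
)
progam
# Step 1: save the data frame as a CSV file.
write.csv(progam, "progam.csv", row.names = FALSE)
# Step 2: read the file back with read_csv() and keep the result,
# so that the checks below run on the imported data.
progam <- read_csv("progam.csv", show_col_types = FALSE)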
glimpse(progam)
## Rows: 5
## Columns: 3
## $ name   <chr> "Naboth", "Nebart", "Lynn", "Ian", "Collins"
## $ scores <dbl> 90, 80, 70, 60, 50
## $ grades <chr> "A", "B", "C", "D", "E"
summary(progam)
##      name               scores      grades         
##  Length:5           Min.   :50   Length:5          
##  Class :character   1st Qu.:60   Class :character  
##  Mode  :character   Median :70   Mode  :character  
##                     Mean   :70                     
##                     3rd Qu.:80                     
##                     Max.   :90

14 13. Checklist before you submit


14.1 End

If you can run this notebook and complete Tasks A–D, you are ready for SDS 1201.