1 0. What you should have before we start
- 1.1 Required software
- 1.2 First-time check
2 1. RStudio tour (the 4 main panes)
- 2.1 What to avoid
3 2. Working directory and files (very important)
4 3. Packages (install once, load every time)
- 4.1 Install packages (once per computer)
- 4.2 Load packages (every session)
5 4. R Markdown (.Rmd): what it is and how to run it
- 5.1 Running code chunks
- 5.2 Knitting (exporting)
6 5. Core R objects you must know (fast, practical)
7 6. Reading data (CSV) and checking it
- 7.1 6.1 Recommended: readr::read_csv()
- 7.2 6.2 Basic checks you should always do
8 7. The basic analytics workflow (the one you should master)
9 8. Example: Cleaning + summarizing + plotting (step-by-step)
10 9. Saving your work
- 10.1 9.1 Save objects (RDS)
- 10.2 9.2 Export a clean CSV
11 10. Getting help (quickly)
12 11. Common mistakes (and how to fix them)
13 12. Student tasks (submit in this same .Rmd)
14 13. Checklist before you submit
- 14.1 End

How to use this notebook in class

Read the text.

Run code chunks one-by-one (click the green ▶ button).

Answer the quick tasks where you see Task boxes.

At the end, click Knit to produce a clean HTML report.

1 0. What you should have before we start

1.1 Required software

You need:

R (the language)
RStudio (the editor where we work)

If you are already in the computer lab and can open RStudio, you’re good.

1.2 First-time check

Run the next chunk. If it prints your R version, you are ready.

R.version.string

## [1] "R version 4.5.2 (2025-10-31 ucrt)"

2 1. RStudio tour (the 4 main panes)

When you open RStudio, you usually see these panes:

Source (top-left): where you write scripts and notebooks
Console (bottom-left): where R runs commands
Environment/History (top-right): your objects and past commands
Files/Plots/Packages/Help (bottom-right): where outputs and help show

2.1 What to avoid

Do not write long code in the Console.
Write code in the Source pane (scripts or R Markdown) and run from there.

3 2. Working directory and files (very important)

3.1 What is a working directory?

It is the folder where R looks for files by default (CSV, images, saved outputs).

Check your current working directory:

getwd()

## [1] "C:/Users/uSer/OneDrive/Documents/SEMESTER 1.2"

3.2 Best practice (recommended)

Use an RStudio Project so your work is organized.

3.2.1 Create a project (once)

File → New Project → New Directory → New Project
Choose a folder name like SDS1201
Click Create Project

After that:

- Put your data files inside that project folder
- Your working directory becomes stable
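Inside a project, files can be referred to by paths relative to the project folder. A minimal sketch, assuming a hypothetical subfolder named data and a hypothetical file named students.csv (neither is part of this notebook):

```r
# Build a path relative to the working directory (the project folder).
# "data" and "students.csv" are hypothetical names used only for illustration.
data_path <- file.path("data", "students.csv")
data_path
```

file.path() joins the pieces with a forward slash, which R accepts on Windows, macOS and Linux, so the same code works on any lab computer.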

3.3 Setting the directory (only when needed)

If you must set it manually:

setwd("C:/Users/YourName/Documents/SDS1201")

Note: In class, we prefer Projects over setwd().

4 3. Packages (install once, load every time)

4.1 Install packages (once per computer)

You only install a package once (like installing an app).

install.packages("tidyverse")
install.packages("readr")   # optional: readr is already included in tidyverse

4.2 Load packages (every session)

Each time you open RStudio, you load packages using library().

# If tidyverse is installed, this should load without errors:
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

If you see an error like “there is no package called …”
→ you must install it first.
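One way to avoid re-installing packages every time is to check first which ones are missing. This is only a sketch: missing_packages() is a hypothetical helper written for this example, not a function from R or the tidyverse, and it is demonstrated on two packages that ship with R.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" and "utils" ship with R, so nothing should be reported as missing.
to_install <- missing_packages(c("stats", "utils"))
to_install

# Install only what is missing (this does nothing when to_install is empty).
if (length(to_install) > 0) install.packages(to_install)
```

For this course you would pass "tidyverse" instead, and then call library(tidyverse) as usual.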

5 4. R Markdown (.Rmd): what it is and how to run it

An R Markdown file is a document that combines:

- text explanations
- code chunks
- output (tables, plots)

5.1 Running code chunks

A code chunk looks like this:


```{r}
2 + 2
```

```
## [1] 4
```

Try it:

2 + 2

## [1] 4

5.2 Knitting (exporting)

Click Knit (top of RStudio) to generate an HTML report.

If knitting fails, read the error message carefully.
The most common causes are missing packages and wrong file paths.

6 5. Core R objects you must know (fast, practical)

6.1 5.1 Variables

A variable stores a value.

x <- 10
y <- 3
x + y

## [1] 13

6.2 5.2 Vectors (most common)

A vector is a 1D collection of values of the same type.

scores <- c(65, 70, 80, 55)
scores

## [1] 65 70 80 55

Common operations:

length(scores)

## [1] 4

mean(scores)

## [1] 67.5

max(scores)

## [1] 80

Indexing:

scores[1]      # first item

## [1] 65

scores[2:4]    # items 2 to 4

## [1] 70 80 55

scores[scores >= 70]  # conditional selection

## [1] 70 80
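The "same type" rule for vectors has a practical consequence: if you mix numbers and text, R converts everything to text. A short sketch with made-up values:

```r
# Mixing numbers and text: every value is converted to character.
mixed <- c(65, "seventy", 80)
mixed
class(mixed)

# Numbers stored as text can be converted back with as.numeric().
as.numeric(c("65", "70", "80"))
```

If mean() or max() complains about a vector, checking class() is a quick way to see whether the values were stored as text by accident.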

6.3 5.3 Data frames (most common in analytics)

A data frame is a table (rows and columns).

students <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  age  = c(20, 21, 19),
  score = c(78, 62, 85)
)
students

Access columns:

students$score

## [1] 78 62 85

mean(students$score)

## [1] 75
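Besides single columns, you will often need the size of a table or a subset of its rows. A minimal sketch using the same students table (re-created here so the chunk runs on its own; the name passed is just an illustration):

```r
students <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  age = c(20, 21, 19),
  score = c(78, 62, 85)
)

nrow(students)   # number of rows
ncol(students)   # number of columns

# Keep only the rows where score is at least 70 (all columns are kept).
passed <- students[students$score >= 70, ]
passed
```

The comma inside the square brackets matters: what comes before it selects rows, what comes after it selects columns.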

7 6. Reading data (CSV) and checking it

7.1 6.1 Recommended: `readr::read_csv()`

It is fast, returns a tibble, and guesses each column's type for you.

# We'll create a small CSV as a temporary file for practice
tmp_file <- tempfile(fileext = ".csv")
writeLines(
  c("name,age,score",
    "Amina,20,78",
    "Brian,21,62",
    "Carol,19,85"),
  tmp_file
)

df <- readr::read_csv(tmp_file, show_col_types = FALSE)
df

7.2 6.2 Basic checks you should always do

glimpse(df)

## Rows: 3
## Columns: 3
## $ name  <chr> "Amina", "Brian", "Carol"
## $ age   <dbl> 20, 21, 19
## $ score <dbl> 78, 62, 85

summary(df)

##      name                age           score     
##  Length:3           Min.   :19.0   Min.   :62.0  
##  Class :character   1st Qu.:19.5   1st Qu.:70.0  
##  Mode  :character   Median :20.0   Median :78.0  
##                     Mean   :20.0   Mean   :75.0  
##                     3rd Qu.:20.5   3rd Qu.:81.5  
##                     Max.   :21.0   Max.   :85.0

Missing values:

colSums(is.na(df))

##  name   age score 
##     0     0     0
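The practice file has no missing values, so the check returns zeros. To see what the same check looks like when values are missing, here is a sketch on a small made-up table (marks is invented for this example):

```r
# A small made-up table with two missing marks (NA).
marks <- data.frame(
  name = c("Amina", "Brian", "Carol"),
  math = c(78, NA, 85),
  english = c(NA, 62, 90)
)

# Count missing values per column.
colSums(is.na(marks))

# mean() returns NA if any value is missing, unless na.rm = TRUE.
mean(marks$math)
mean(marks$math, na.rm = TRUE)
```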

8 7. The basic analytics workflow (the one you should master)

We will use a simple structure in this course:

Load data
Clean data
Explore data
Summarize results
Visualize key patterns
Report cleanly (Knit)

Let’s do a small example.

9 8. Example: Cleaning + summarizing + plotting (step-by-step)

9.1 8.1 Create a slightly messy dataset

set.seed(1)

demo <- tibble(
  student_id = 1:12,
  gender = sample(c("F", "M"), 12, replace = TRUE),
  math = sample(c(NA, 40:95), 12, replace = TRUE),
  english = sample(c(NA, 40:95), 12, replace = TRUE)
)

demo

9.2 8.2 Check missingness

colSums(is.na(demo))

## student_id     gender       math    english 
##          0          0          0          0

9.3 8.3 Simple cleaning rule (example)

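Here we fill missing marks using the subject mean (basic imputation). With set.seed(1), the check above shows no missing values, so this step leaves the marks unchanged in this run; it matters whenever NAs are present.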

demo_clean <- demo %>%
  mutate(
    math = ifelse(is.na(math), mean(math, na.rm = TRUE), math),
    english = ifelse(is.na(english), mean(english, na.rm = TRUE), english),
    average = (math + english) / 2
  )

demo_clean
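The effect of mean imputation is easier to see on a tiny vector that really contains a missing value. A minimal sketch with made-up marks (math_marks and filled are names chosen for this example):

```r
# Three marks, one of them missing.
math_marks <- c(50, NA, 70)

# Replace the missing mark with the mean of the observed marks: (50 + 70) / 2 = 60.
filled <- ifelse(is.na(math_marks), mean(math_marks, na.rm = TRUE), math_marks)
filled
```

This is the same ifelse() pattern used inside mutate(), applied to one vector instead of a column.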

9.4 8.4 Summaries

Mean by gender:

demo_clean %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    mean_math = mean(math),
    mean_english = mean(english),
    mean_average = mean(average),
    .groups = "drop"
  )

9.5 8.5 Visualization (ggplot2)

ggplot(demo_clean, aes(x = math, y = english)) +
  geom_point() +
  labs(
    title = "Math vs English (Demo Data)",
    x = "Math score",
    y = "English score"
  )

A distribution plot:

ggplot(demo_clean, aes(x = average)) +
  geom_histogram(bins = 8) +
  labs(
    title = "Distribution of Average Score",
    x = "Average",
    y = "Count"
  )

10 9. Saving your work

10.1 9.1 Save objects (RDS)

If you want to save an R object to re-use later:

saveRDS(demo_clean, "demo_clean.rds")

To load it:

loaded <- readRDS("demo_clean.rds")
head(loaded)
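An RDS file stores the object exactly as it is in R, including column types. A small round-trip sketch, assuming a made-up table and a temporary file so nothing in your project folder is overwritten:

```r
# Save a small made-up table to a temporary .rds file and read it back.
tmp_rds <- tempfile(fileext = ".rds")
scores_df <- data.frame(id = 1:3, score = c(78, 62, 85))

saveRDS(scores_df, tmp_rds)
restored <- readRDS(tmp_rds)

# Check that the restored object matches the original.
identical(scores_df, restored)
```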

10.2 9.2 Export a clean CSV

readr::write_csv(demo_clean, "demo_clean.csv")

11 10. Getting help (quickly)

11.1 10.1 Use `?` help

?mean
?read_csv

11.2 10.2 Search examples

help.search("histogram")

11.3 10.3 Inspect objects

str(demo_clean)

## tibble [12 × 5] (S3: tbl_df/tbl/data.frame)
##  $ student_id: int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
##  $ gender    : chr [1:12] "F" "M" "F" "F" ...
##  $ math      : int [1:12] 71 59 59 80 92 84 48 45 47 53 ...
##  $ english   : int [1:12] 79 63 84 75 75 72 80 63 82 53 ...
##  $ average   : num [1:12] 75 61 71.5 77.5 83.5 78 64 54 64.5 53 ...

names(demo_clean)

## [1] "student_id" "gender"     "math"       "english"    "average"

12 11. Common mistakes (and how to fix them)

12.1 Mistake 1: Package not found

Error: there is no package called 'tidyverse'
Fix: install once, then load.

install.packages("tidyverse")
library(tidyverse)

12.2 Mistake 2: File not found

Error: cannot open the connection
Fix: check working directory + file name.

getwd()

## [1] "C:/Users/uSer/OneDrive/Documents/SEMESTER 1.2"

list.files()

##  [1] "00_Introduction.html"                                                 
##  [2] "01_Foundations_Data_Analytics_R.Rmd"                                  
##  [3] "02_Data_Cleaning_Preparation_R.Rmd"                                   
##  [4] "03_Exploratory_Data_Analysis_R (2).Rmd"                               
##  [5] "04_Introduction_2_Analytical_Models_R.Rmd"                            
##  [6] "05_Model_Validation_Responsible_Analysis_R.Rmd"                       
##  [7] "06_Applied_Project_R.Rmd"                                             
##  [8] "2025_A_KSD_1466_F.html"                                               
##  [9] "2025_A_KSD_1466_F.Rmd"                                                
## [10] "2025_A_KSD_1466_F_files"                                              
## [11] "airquality.csv"                                                       
## [12] "clean_student_grades.csv"                                             
## [13] "demo_clean.csv"                                                       
## [14] "demo_clean.rds"                                                       
## [15] "dice_game_results.csv"                                                
## [16] "dice_rolls.csv"                                                       
## [17] "dice_sum_counts.csv"                                                  
## [18] "DOC-20250406-WA0005.pdf"                                              
## [19] "Macro economics - introductory.pdf"                                   
## [20] "messy_data.csv"                                                       
## [21] "Naboth_Harris.html"                                                   
## [22] "processed_mtcars.csv"                                                 
## [23] "progam.csv"                                                           
## [24] "progam.rds"                                                           
## [25] "Riemann Integration.pdf"                                              
## [26] "rsconnect"                                                            
## [27] "Share with CamScanner.zip"                                            
## [28] "STA1204_APT_COURSE CONTENT APPLIED PROBABILITY.pdf"                   
## [29] "STA1205_COURSE CONTENT MATHEMATICAL STATISTICS.pdf"                   
## [30] "student.csv"                                                          
## [31] "students.csv"                                                         
## [32] "students.rds"                                                         
## [33] "students_clean.csv"                                                   
## [34] "The-Academic-Policy-and-Examination-Regulations-Kabale-University.pdf"
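You can also test for the file before reading it, so that the problem is reported clearly. This is only a sketch: read_if_exists() is a hypothetical helper written for this example, and it uses base read.csv() so that it runs without extra packages (the same check works before readr::read_csv()).

```r
# Hypothetical helper: read a CSV only if the file exists, otherwise explain why not.
read_if_exists <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path, " (working directory: ", getwd(), ")")
    return(NULL)
  }
  read.csv(path)
}

# "no_such_file.csv" is a made-up name: this prints a message and returns NULL.
result <- read_if_exists("no_such_file.csv")
```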

12.3 Mistake 3: Using commas instead of dots

In R, the decimal separator is a dot: write 3.14, not 3,14.
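To see why a decimal comma causes trouble, here is a short sketch (the value 3,14 is just an illustration):

```r
# With a comma, R sees two separate numbers, not one decimal.
c(3,14)
length(c(3,14))

# Text that uses a decimal comma can be fixed before converting to a number.
as.numeric(sub(",", ".", "3,14", fixed = TRUE))
```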

13 12. Student tasks (submit in this same .Rmd)

13.1 Task A: Your profile dataset

Create a small data frame named profile with:

- your name
- your program
- your home district
- your favorite number

Then print it and show its structure.

# TODO: write your code here
profile <- data.frame(
  name = c("AHABWE NABOTH KAKURU"),
  program = c("STATISTICS AND DATA SCIENCE"),
  home_district = c("KABALE"),
  fav_number = 0040910635672174.74
)
profile
str(profile)   # show the structure, as the task asks

13.2 Task B: Quick summary

Create a vector of 10 numbers (any numbers), then compute:

- mean
- median
- standard deviation
- min and max

# TODO: write your code here
v1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
mean(v1)

## [1] 5.5

median(v1)

## [1] 5.5

sd(v1)

## [1] 3.02765

min(v1)

## [1] 1

max(v1)

## [1] 10

13.3 Task C: Mini-plot

Create a data frame with two columns x and y (10 rows), then make a scatter plot.

# TODO: write your code here
dading <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  y = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)
dading

plot(dading$x, dading$y,
  main = "Y values against X values",
  xlab = "X values",
  ylab = "Y values",
  pch = 21,
  col = "red")

13.4 Task D: Import practice

Create a small CSV file with 5 rows and 3 columns.
Read it using read_csv().
Print glimpse() and summary().

# TODO: write your code here
progam <- data.frame(
  name = c("Naboth", "Nebart", "Lynn", "Ian", "Collins"),
  scores = c(90, 80, 70, 60, 50),
  grades = c("A", "B", "C", "D", "E")
)
progam

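# Write the table to a CSV file, then read it back with read_csv() as the task asks
write.csv(progam, "progam.csv", row.names = FALSE)
progam <- readr::read_csv("progam.csv", show_col_types = FALSE)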

glimpse(progam)

## Rows: 5
## Columns: 3
## $ name   <chr> "Naboth", "Nebart", "Lynn", "Ian", "Collins"
## $ scores <dbl> 90, 80, 70, 60, 50
## $ grades <chr> "A", "B", "C", "D", "E"

summary(progam)

##      name               scores      grades         
##  Length:5           Min.   :50   Length:5          
##  Class :character   1st Qu.:60   Class :character  
##  Mode  :character   Median :70   Mode  :character  
##                     Mean   :70                     
##                     3rd Qu.:80                     
##                     Max.   :90

14 13. Checklist before you submit

Your .Rmd runs without errors
You answered Tasks A–D inside the file
You clicked Knit and it produced HTML
You submit the .Rmd (and HTML if required)

14.1 End

If you can run this notebook and complete Tasks A–D, you are ready for SDS 1201.

SDS 1201 — Data Analytics (Intro Notebook)

Notebook 0: Getting Started with RStudio + R Markdown

Instructor: RK

26 February 2026