This document contains the solutions to the exercises from the practical session of Day 1 of the Introduction to R workshop. The session focuses on the fundamentals of working with data in R, including data import and export, understanding and handling different data types, and basic data manipulation techniques. The exercises provide hands-on practice with these core concepts and help build a foundation for efficient data analysis and programming in R.
1. Create a new folder and call it DAY 1
- Add the data files Data.xlsx, Data.txt, Data.csv, origin.txt to this
folder
- Set your working directory to the same folder and check that it was set
correctly.
# Note that the path to your folder will be different
setwd("~/ADS Courses/Intro to R/DAY 1")
getwd()
#> [1] "C:/Users/aagten/Documents/ADS Courses/Intro to R/DAY 1"
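Beyond getwd(), it can help to confirm that the data files are actually visible from the working directory. The following is a minimal sketch, not part of the original solution: it builds a temporary stand-in for the DAY 1 folder with empty placeholder files (both are assumptions made so the example runs anywhere) and then checks for the four file names listed in the exercise.

```r
# Hypothetical stand-in for the DAY 1 folder, created in a temporary location
day1 <- file.path(tempdir(), "DAY 1")
dir.create(day1, showWarnings = FALSE)

# Empty placeholder files carrying the names used in this exercise
expected <- c("Data.xlsx", "Data.txt", "Data.csv", "origin.txt")
file.create(file.path(day1, expected))

# setwd() invisibly returns the previous working directory, so it can be restored later
old_wd <- setwd(day1)

# Which of the expected files can be found in the working directory?
found <- file.exists(expected)
names(found) <- expected
found

# Restore the original working directory
setwd(old_wd)
```

In your own session you would skip the placeholder setup and run only the file.exists() check after setwd().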
2. Open the Data.csv file
The data represent the results of a statistics course. Twelve
people completed one of three workshops, took an exam and then filled
out the evaluation form. The last three variables contain their answers
to three of the evaluation questions (scale 1 to 5).
- Check whether the dataset looks ok.
- Try to extract the variable ‘workshop’
# Read the dataset
dataset <- read.csv("Data.csv", header = TRUE)
# Look at the dataset
View(dataset)
# Get the structure of the dataset
str(dataset)
#> 'data.frame': 12 obs. of 7 variables:
#> $ ...1 : chr "Peeters" "Bruyneel" "Pasmans" "Van den Broeck" ...
#> $ workshop : int 1 2 3 1 2 3 1 2 3 1 ...
#> $ gender : chr "female" "female" "female" "female" ...
#> $ exam : num 9 16.5 14 12 NA 10 14.5 15 9 8 ...
#> $ question1: int 1 2 2 2 3 2 4 5 NA 5 ...
#> $ question2: int 1 1 4 2 1 3 5 4 3 3 ...
#> $ question3: int 5 4 3 4 NA 3 2 5 2 4 ...
# Extract the variable 'workshop'
dataset$workshop
#> [1] 1 2 3 1 2 3 1 2 3 1 2 3
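The `$` operator is one of several ways to extract a variable. As a small sketch on a made-up data frame (the object `toy` and its values are illustrative, not the course data), the following compares `$`, `[[ ]]` and `[ ]`:

```r
# Small made-up data frame standing in for the course data
toy <- data.frame(
  id = c("A", "B", "C"),
  workshop = c(1L, 2L, 3L)
)

by_dollar <- toy$workshop        # plain vector
by_double <- toy[["workshop"]]   # plain vector, column name given as a string
by_single <- toy["workshop"]     # still a data frame, with one column

identical(by_dollar, by_double)  # both forms return the same vector
is.data.frame(by_single)         # single brackets keep the data frame structure
```

The `[[ ]]` form is convenient when the column name is stored in a variable, whereas `$` needs the name typed literally.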
3. Data types
- Check the class of the variable ‘exam’, which is part of the ‘Data’
object
- Check the class of the variable ‘gender’, which is part of the ‘Data’
object
- In order to do further statistical analysis, we need to convert
‘gender’ to a factor. Try to do this and check whether it is ok.
- Check the variable ‘workshop’. Try to convert it to a factor with
three levels: 1=R, 2=SAS, 3=SPSS.
- Rename the unnamed first column to “id”
# Check the class of exam - numeric variable
class(dataset$exam)
#> [1] "numeric"
# Check the class of gender - character variable
class(dataset$gender)
#> [1] "character"
# Set gender to factor variable
dataset$gender <- as.factor(dataset$gender)
# Check whether gender was correctly specified as being a factor variable
is.factor(dataset$gender)
#> [1] TRUE
# Convert workshop to a factor with three levels: 1=R, 2=SAS, 3=SPSS.
dataset$workshop <- as.factor(dataset$workshop)
levels(dataset$workshop) <- c("R","SAS","SPSS")
dataset$workshop
#> [1] R SAS SPSS R SAS SPSS R SAS SPSS R SAS SPSS
#> Levels: R SAS SPSS
# Change the name of the unidentified column into “id”
colnames(dataset)
#> [1] "...1" "workshop" "gender" "exam" "question1" "question2"
#> [7] "question3"
colnames(dataset)[1] <- "id"
colnames(dataset)
#> [1] "id" "workshop" "gender" "exam" "question1" "question2"
#> [7] "question3"
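As an alternative to converting with as.factor() and then overwriting levels(), the codes and their labels can be stated together in a single factor() call. This is a sketch on made-up codes, not the course data; it deliberately leaves out code 2 to show that the mapping does not depend on which codes happen to occur.

```r
# Made-up workshop codes in which code 2 (SAS) does not occur
workshop_codes <- c(1, 3, 1, 3)

# State the code-to-label mapping explicitly: 1 = R, 2 = SAS, 3 = SPSS
workshop_factor <- factor(
  workshop_codes,
  levels = c(1, 2, 3),
  labels = c("R", "SAS", "SPSS")
)

workshop_factor
levels(workshop_factor)  # all three levels are kept, even the unused one
table(workshop_factor)   # SAS appears with a count of 0
```

Because each label is tied to a specific code, a value is never relabelled just because of the order in which levels were detected.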
4. Making a new variable
- Make a new variable called ‘total’ that computes the total result of
the 3 questions
- Try to change the name of the variable ‘gender’ into ‘sex’ using the
function names()
# New variable that totals the results of the three questions
# (a missing answer in any of the three questions makes the total NA)
dataset$total <- dataset$question1 + dataset$question2 + dataset$question3
head(dataset)
#> id workshop gender exam question1 question2 question3 total
#> 1 Peeters R female 9.0 1 1 5 7
#> 2 Bruyneel SAS female 16.5 2 1 4 7
#> 3 Pasmans SPSS female 14.0 2 4 3 9
#> 4 Van den Broeck R female 12.0 2 2 4 8
#> 5 Verbrugghe SAS female NA 3 1 NA NA
#> 6 Steppe SPSS female 10.0 2 3 3 8
# Rename variable gender
names(dataset)
#> [1] "id" "workshop" "gender" "exam" "question1" "question2"
#> [7] "question3" "total"
names(dataset)[3] <- "sex"
names(dataset)
#> [1] "id" "workshop" "sex" "exam" "question1" "question2"
#> [7] "question3" "total"
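Adding the columns with `+` returns NA as soon as one answer is missing, as row 5 of the output above shows. If a total over the available answers is wanted instead, rowSums() with na.rm = TRUE is one option. The sketch below uses a made-up data frame, and the column names `total_available` and `n_answered` are illustrative.

```r
# Made-up answers with some missing values
toy <- data.frame(
  question1 = c(1, 3, NA),
  question2 = c(1, 1, 3),
  question3 = c(5, NA, 2)
)
question_cols <- c("question1", "question2", "question3")

# Same behaviour as adding the columns with +: NA if any answer is missing
toy$total <- rowSums(toy[, question_cols])

# Sum over the answers that are present
toy$total_available <- rowSums(toy[, question_cols], na.rm = TRUE)

# Number of questions actually answered, useful when interpreting partial totals
toy$n_answered <- rowSums(!is.na(toy[, question_cols]))

toy
```

Whether a partial total is meaningful depends on the analysis; keeping `n_answered` next to it makes incomplete rows easy to spot.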
5. Exporting a modified table
Export the ‘Data’ object to a .csv file and open the file to check the result.
write.csv(dataset, "MyModifiedData.csv")
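By default, write.csv() also writes the row names as an extra first column. The sketch below, which uses a made-up data frame and temporary files rather than the course data, compares the default with row.names = FALSE by reading both files back in.

```r
# Made-up data frame and temporary file paths
toy <- data.frame(id = c("A", "B"), exam = c(9, 16.5))
path_default <- tempfile(fileext = ".csv")
path_clean <- tempfile(fileext = ".csv")

# Default: row names are written as an unnamed first column
write.csv(toy, path_default)

# Row names left out of the file
write.csv(toy, path_clean, row.names = FALSE)

# Reading both files back shows the difference in columns
names(read.csv(path_default))  # "X" "id" "exam"
names(read.csv(path_clean))    # "id" "exam"
```

If the exported file will be re-imported later, leaving out the row names avoids an extra column that has to be renamed or dropped.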
6. Sorting
Try to order ‘Data’ according to ‘exam’, in decreasing order.
dataset_ordered <- dataset[order(dataset$exam, decreasing = TRUE), ]
dataset_ordered
#> id workshop sex exam question1 question2 question3 total
#> 2 Bruyneel SAS female 16.5 2 1 4 7
#> 11 Maenen SAS male 16.0 4 NA 5 NA
#> 8 Bosch SAS male 15.0 5 4 5 14
#> 12 Saenen SPSS male 15.0 4 4 3 11
#> 7 Jansen R male 14.5 4 5 2 11
#> 3 Pasmans SPSS female 14.0 2 4 3 9
#> 4 Van den Broeck R female 12.0 2 2 4 8
#> 6 Steppe SPSS female 10.0 2 3 3 8
#> 1 Peeters R female 9.0 1 1 5 7
#> 9 Verlinden SPSS male 9.0 NA 3 2 NA
#> 10 De Pooter R male 8.0 5 3 4 12
#> 5 Verbrugghe SAS female NA 3 1 NA NA
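In the output above, the row with a missing exam result ends up last, and tied results (such as 15.0 and 9.0) have no explicit tie-breaker. The following sketch on made-up data shows two options that order() offers: a second sort key, and the na.last argument. The choice of `id` as tie-breaker is an assumption for illustration.

```r
# Made-up results with a tie and a missing value
toy <- data.frame(
  id = c("Peeters", "Verlinden", "Bosch", "Verbrugghe"),
  exam = c(9, 9, 15, NA)
)

# Decreasing exam result (negating a numeric key reverses its order),
# with ties broken alphabetically by id
by_exam_then_id <- toy[order(-toy$exam, toy$id), ]

# Put rows with a missing exam result first instead of last
na_first <- toy[order(toy$exam, decreasing = TRUE, na.last = FALSE), ]

# Drop rows with a missing exam result from the ordered data
na_dropped <- toy[order(toy$exam, decreasing = TRUE, na.last = NA), ]

by_exam_then_id
na_first
na_dropped
```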
7. Conditional selection
- Try to select only the respondents who followed the SPSS
course
- Suppose you’re only interested in names and exam results of the people
in the SPSS course. Select the appropriate columns and respondents.
# Only respondents that have followed the SPSS course
dataset_spss1 <- dataset[which(dataset$workshop=="SPSS"),]
dataset_spss1
#> id workshop sex exam question1 question2 question3 total
#> 3 Pasmans SPSS female 14 2 4 3 9
#> 6 Steppe SPSS female 10 2 3 3 8
#> 9 Verlinden SPSS male 9 NA 3 2 NA
#> 12 Saenen SPSS male 15 4 4 3 11
dataset_spss2 <- subset(dataset, workshop=="SPSS")
dataset_spss2
#> id workshop sex exam question1 question2 question3 total
#> 3 Pasmans SPSS female 14 2 4 3 9
#> 6 Steppe SPSS female 10 2 3 3 8
#> 9 Verlinden SPSS male 9 NA 3 2 NA
#> 12 Saenen SPSS male 15 4 4 3 11
# Keep only name and exam results
dataset_spss1_results <- dataset_spss1[,c(1,4)]
dataset_spss1_results
#> id exam
#> 3 Pasmans 14
#> 6 Steppe 10
#> 9 Verlinden 9
#> 12 Saenen 15
dataset_spss2_results <- subset(dataset_spss2, select=c(id,exam))
dataset_spss2_results
#> id exam
#> 3 Pasmans 14
#> 6 Steppe 10
#> 9 Verlinden 9
#> 12 Saenen 15
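Rows and columns can also be selected in a single step, and naming the columns avoids relying on their positions. The sketch below uses made-up data with a missing workshop value (the course data shown above has none) to illustrate why wrapping the condition in which() can matter.

```r
# Made-up data in which one respondent has no workshop recorded
toy <- data.frame(
  id = c("Pasmans", "Jansen", "Steppe", "Bosch"),
  workshop = c("SPSS", "R", NA, "SAS"),
  exam = c(14, 14.5, 10, 15)
)

# Rows and columns in one step, with columns selected by name
spss_results <- toy[which(toy$workshop == "SPSS"), c("id", "exam")]

# Without which(), the NA in the condition produces an extra row filled with NA
with_na_row <- toy[toy$workshop == "SPSS", c("id", "exam")]

nrow(spss_results)  # 1
nrow(with_na_row)   # 2
```

subset() treats a missing condition as FALSE, so it behaves like the which() version here.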
8. Splitting, stacking and merging files
- Try to create a separate data frame for males and females.
- Try to combine these two separate data frames again using the rbind()
function.
- Read in the file ‘origin.txt’ (it’s a tab-delimited format). Merge the
origin dataset with ‘Data’ by the ‘id’ variable
# Separate dataset for males and females
data_males <- subset(dataset, sex=="male")
data_females <- subset(dataset, sex=="female")
# Combine data using rbind(..)
merged_data1 <- rbind(data_males,data_females)
merged_data1
#> id workshop sex exam question1 question2 question3 total
#> 7 Jansen R male 14.5 4 5 2 11
#> 8 Bosch SAS male 15.0 5 4 5 14
#> 9 Verlinden SPSS male 9.0 NA 3 2 NA
#> 10 De Pooter R male 8.0 5 3 4 12
#> 11 Maenen SAS male 16.0 4 NA 5 NA
#> 12 Saenen SPSS male 15.0 4 4 3 11
#> 1 Peeters R female 9.0 1 1 5 7
#> 2 Bruyneel SAS female 16.5 2 1 4 7
#> 3 Pasmans SPSS female 14.0 2 4 3 9
#> 4 Van den Broeck R female 12.0 2 2 4 8
#> 5 Verbrugghe SAS female NA 3 1 NA NA
#> 6 Steppe SPSS female 10.0 2 3 3 8
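When a grouping variable has more than two values, writing one subset() call per group becomes tedious. As a sketch on made-up data, split() creates one data frame per group in a single call, and do.call(rbind, ...) stacks them again.

```r
# Made-up data standing in for the course data
toy <- data.frame(
  id = c("Peeters", "Jansen", "Bruyneel", "Bosch"),
  sex = c("female", "male", "female", "male"),
  exam = c(9, 14.5, 16.5, 15)
)

# One data frame per value of sex, returned as a named list
by_sex <- split(toy, toy$sex)
names(by_sex)  # "female" "male"
by_sex$male

# Stack all list elements back into one data frame
restacked <- do.call(rbind, by_sex)

# rbind() on a named list prefixes the row names with the group name; reset them
rownames(restacked) <- NULL
restacked
```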
# Read origin data (tab-delimited)
origin <- read.delim("origin.txt", header = TRUE)
# Merge origin and dataset by id
dataset_merged <- merge(dataset, origin, by = "id")
dataset_merged
#> id workshop sex exam question1 question2 question3 total
#> 1 Bosch SAS male 15.0 5 4 5 14
#> 2 Bruyneel SAS female 16.5 2 1 4 7
#> 3 De Pooter R male 8.0 5 3 4 12
#> 4 Jansen R male 14.5 4 5 2 11
#> 5 Maenen SAS male 16.0 4 NA 5 NA
#> 6 Pasmans SPSS female 14.0 2 4 3 9
#> 7 Peeters R female 9.0 1 1 5 7
#> 8 Steppe SPSS female 10.0 2 3 3 8
#> 9 Van den Broeck R female 12.0 2 2 4 8
#> 10 Verbrugghe SAS female NA 3 1 NA NA
#> 11 Verlinden SPSS male 9.0 NA 3 2 NA
#> origin
#> 1 C
#> 2 B
#> 3 B
#> 4 A
#> 5 C
#> 6 A
#> 7 B
#> 8 B
#> 9 A
#> 10 B
#> 11 A
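The merged output above has 11 rows although the dataset has 12: Saenen is absent, which suggests that this id has no match in origin.txt (an inference from the printed output, not something the source states). By default merge() keeps only ids found in both data frames. The sketch below, on made-up data, shows how all.x = TRUE keeps every row of the first data frame.

```r
# Made-up results and origin tables; "Saenen" has no origin record
results <- data.frame(
  id = c("Bosch", "Jansen", "Saenen"),
  exam = c(15, 14.5, 15)
)
origin_toy <- data.frame(
  id = c("Bosch", "Jansen"),
  origin = c("C", "A")
)

# Default merge: only ids present in both data frames are kept
inner <- merge(results, origin_toy, by = "id")

# all.x = TRUE: keep every row of results, with NA where no origin is found
left <- merge(results, origin_toy, by = "id", all.x = TRUE)

# Which ids would be dropped by the default merge?
unmatched <- setdiff(results$id, origin_toy$id)

inner
left
unmatched
```

Checking the unmatched ids before merging makes it clear whether rows are lost on purpose or by accident.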