This document contains the solutions to the exercises from the practical session of Day 1 of the Introduction to R workshop. The session focuses on the fundamentals of working with data in R, including data import and export, understanding and handling different data types, and basic data manipulation techniques. The exercises provide hands-on practice with these core concepts and help build a foundation for efficient data analysis and programming in R.

1. Create a new folder and call it DAY 1
- Add the data files Data.xlsx, Data.txt, Data.csv and origin.txt to this folder
- Set your working directory to this folder and check that it is set correctly.

# Note that the path to your folder will be different

setwd("~/ADS Courses/Intro to R/DAY 1")

getwd()
#> [1] "C:/Users/aagten/Documents/ADS Courses/Intro to R/DAY 1"
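You can also check the working directory from code. One option is to compare `getwd()` with the intended folder and test whether the four data files are present. The sketch below uses a hypothetical "DAY 1" folder inside R's temporary directory (`tempdir()`) only so that it runs on any machine. With your own folder, all files should be reported as present.

```r
# Hypothetical "DAY 1" folder inside the session's temporary directory,
# used only so that this example runs anywhere
day1 <- file.path(tempdir(), "DAY 1")
dir.create(day1, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory
old_wd <- setwd(day1)

# TRUE if the working directory now points to the new folder
wd_ok <- identical(normalizePath(getwd()), normalizePath(day1))
print(wd_ok)

# Which of the workshop files are in the folder? (all FALSE here, because
# the temporary folder is empty; in your own DAY 1 folder all should be TRUE)
files_needed <- c("Data.xlsx", "Data.txt", "Data.csv", "origin.txt")
present <- setNames(file.exists(files_needed), files_needed)
print(present)

# Restore the original working directory
setwd(old_wd)
```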

2. Open the file Data.csv
The data contain the results of a statistics course. Twelve people each completed one of three workshops, took an exam and then filled out an evaluation form. The last three variables contain their answers to three of the questions (on a scale from 1 to 5).
- Check whether the dataset was read in correctly.
- Try to extract the variable ‘workshop’

# Read the dataset
dataset <- read.csv("Data.csv", header = TRUE)

# Look at the dataset
View(dataset)

# Get the structure of the dataset
str(dataset)
#> 'data.frame':    12 obs. of  7 variables:
#>  $ ...1     : chr  "Peeters" "Bruyneel" "Pasmans" "Van den Broeck" ...
#>  $ workshop : int  1 2 3 1 2 3 1 2 3 1 ...
#>  $ gender   : chr  "female" "female" "female" "female" ...
#>  $ exam     : num  9 16.5 14 12 NA 10 14.5 15 9 8 ...
#>  $ question1: int  1 2 2 2 3 2 4 5 NA 5 ...
#>  $ question2: int  1 1 4 2 1 3 5 4 3 3 ...
#>  $ question3: int  5 4 3 4 NA 3 2 5 2 4 ...
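`View()` and `str()` let you inspect the data by eye. A quicker check that the data were read in correctly is to look at the dimensions and count the missing values per column. The sketch below does not read Data.csv. Instead, it reads a small CSV snippet through the `text` argument of `read.csv()`. The three rows reuse values printed above and are for illustration only.

```r
# A few rows of the course data as CSV text (illustrative subset)
csv_text <- "id,workshop,gender,exam,question1
Peeters,1,female,9,1
Verbrugghe,2,female,NA,3
Verlinden,3,male,9,NA"

toy <- read.csv(text = csv_text, header = TRUE)

# Number of rows and columns
dim(toy)
#> [1] 3 5

# Number of missing values in each column
colSums(is.na(toy))
```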

# Extract the variable 'workshop'
dataset$workshop
#>  [1] 1 2 3 1 2 3 1 2 3 1 2 3
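The `$` operator is only one way to extract a column. The sketch below compares it with `[[ ]]` and `[ ]` on a small illustrative data frame built from the first rows above. The first three forms return the column as a vector. Single brackets with only a column name return a one-column data frame instead.

```r
# Small illustrative data frame (first three rows of the course data)
toy <- data.frame(
  id = c("Peeters", "Bruyneel", "Pasmans"),
  workshop = c(1L, 2L, 3L)
)

w1 <- toy$workshop        # integer vector
w2 <- toy[["workshop"]]   # same vector; the column name is a string
w3 <- toy[, "workshop"]   # same vector; a single column is simplified
w4 <- toy["workshop"]     # a one-column data frame, not a vector

identical(w1, w2)
#> [1] TRUE
class(w4)
#> [1] "data.frame"
```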

3. Data types
- Check the class of the variable ‘exam’, which is part of the ‘Data’ object
- Check the class of the variable ‘gender’, which is part of the ‘Data’ object
- To do further statistical analysis, we need to convert ‘gender’ to a factor. Do this and check that the conversion worked.
- Check the variable ‘workshop’. Try to convert it to a factor with three levels: 1=R, 2=SAS, 3=SPSS.
- Rename the unidentified first column (“NA”, shown as “...1” in the output above) to “id”

# Check the class of exam - numeric variable
class(dataset$exam)
#> [1] "numeric"

# Check the class of gender - character variable
class(dataset$gender)
#> [1] "character"

# Convert gender to a factor
dataset$gender <- as.factor(dataset$gender)

# Check that gender is now a factor
is.factor(dataset$gender)
#> [1] TRUE
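`is.factor()` confirms the type, but it is also worth checking the levels. By default, `as.factor()` and `factor()` sort the levels alphabetically, so "female" becomes the first (reference) level. Here is a sketch with a short hypothetical gender vector:

```r
gender <- as.factor(c("female", "male", "female", "male"))

# Levels are sorted alphabetically: "female" first
levels(gender)

# Counts per level; a typo such as "Male" would show up as an extra level
table(gender)

# Make "male" the reference (first) level if an analysis requires it
gender_m <- relevel(gender, ref = "male")
levels(gender_m)
```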

# Convert workshop to a factor with three levels: 1=R, 2=SAS, 3=SPSS. 
dataset$workshop <- as.factor(dataset$workshop)
levels(dataset$workshop) <- c("R","SAS","SPSS")
dataset$workshop
#>  [1] R    SAS  SPSS R    SAS  SPSS R    SAS  SPSS R    SAS  SPSS
#> Levels: R SAS SPSS
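Relabelling with `levels()` works here because `as.factor()` sorts the codes as 1, 2, 3, so the labels line up by position. A more explicit alternative is `factor()` with `levels` and `labels`. It maps each code directly to its label and still labels correctly if one code is absent from the data. The sketch below applies it to a short illustrative vector of codes.

```r
# Illustrative workshop codes (1 = R, 2 = SAS, 3 = SPSS)
workshop_codes <- c(1, 2, 3, 1, 2, 3)

# Each code is matched to the label in the same position
workshop <- factor(workshop_codes,
                   levels = c(1, 2, 3),
                   labels = c("R", "SAS", "SPSS"))
workshop
#> [1] R    SAS  SPSS R    SAS  SPSS
#> Levels: R SAS SPSS

# Codes 1 and 3 only: factor() still labels them correctly
# and keeps SAS as an (empty) level
partial <- factor(c(1, 3), levels = c(1, 2, 3), labels = c("R", "SAS", "SPSS"))
partial
```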

# Rename the unidentified first column to "id"
colnames(dataset)
#> [1] "...1"      "workshop"  "gender"    "exam"      "question1" "question2"
#> [7] "question3"
colnames(dataset)[1] <- "id"
colnames(dataset)
#> [1] "id"        "workshop"  "gender"    "exam"      "question1" "question2"
#> [7] "question3"

4. Making a new variable
- Make a new variable called ‘total’ that contains the sum of the answers to the 3 questions
- Try to change the name of the variable ‘gender’ to ‘sex’ using the function names()

# New variable with the total score on the three questions
# (if any of the three answers is NA, the total is NA as well)
dataset$total <- dataset$question1 + dataset$question2 + dataset$question3
head(dataset)
#>               id workshop gender exam question1 question2 question3 total
#> 1        Peeters        R female  9.0         1         1         5     7
#> 2       Bruyneel      SAS female 16.5         2         1         4     7
#> 3        Pasmans     SPSS female 14.0         2         4         3     9
#> 4 Van den Broeck        R female 12.0         2         2         4     8
#> 5     Verbrugghe      SAS female   NA         3         1        NA    NA
#> 6         Steppe     SPSS female 10.0         2         3         3     8
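`rowSums()` computes the same total and makes explicit how missing answers are handled. With the default `na.rm = FALSE`, any NA gives an NA total, as for respondent 5 above. With `na.rm = TRUE`, only the available answers are summed. The sketch below uses the question scores of the first six respondents, copied from the `head()` output above.

```r
# Question scores of the first six respondents, as printed by head() above
toy <- data.frame(
  question1 = c(1, 2, 2, 2, 3, 2),
  question2 = c(1, 1, 4, 2, 1, 3),
  question3 = c(5, 4, 3, 4, NA, 3)
)
q_cols <- c("question1", "question2", "question3")

# Same as adding the columns with +: an NA answer gives an NA total
toy$total <- rowSums(toy[, q_cols])

# na.rm = TRUE sums the available answers, so respondent 5 gets 3 + 1 = 4;
# whether such a partial total is meaningful depends on the analysis
toy$total_available <- rowSums(toy[, q_cols], na.rm = TRUE)
toy
```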

# Rename variable gender
names(dataset)
#> [1] "id"        "workshop"  "gender"    "exam"      "question1" "question2"
#> [7] "question3" "total"
names(dataset)[3] <- "sex"
names(dataset)
#> [1] "id"        "workshop"  "sex"       "exam"      "question1" "question2"
#> [7] "question3" "total"
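Renaming by position (`[3]`) breaks silently if the column order changes. A sketch that selects the column by its current name instead, on an illustrative data frame:

```r
# Illustrative data frame with the original column names
toy <- data.frame(
  id = c("Peeters", "Jansen"),
  gender = c("female", "male"),
  exam = c(9, 14.5)
)

# Select the element of names() by its current value rather than by
# position, so the code still works if the column order changes
names(toy)[names(toy) == "gender"] <- "sex"
names(toy)
#> [1] "id"   "sex"  "exam"
```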

5. Exporting a modified table
Export the ‘Data’ object to a .csv file and open the file to check the result.

write.csv(dataset, "MyModifiedData.csv")
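By default, `write.csv()` also writes the row names, which end up as an extra first column with an empty header. Reading the file back with `read.csv()` then gives a column called "X" (readr and readxl use a name such as "...1" instead). This is one common way an unnamed first column ends up in a data file. The sketch below writes to a temporary file from `tempfile()` and shows that `row.names = FALSE` avoids the extra column.

```r
toy <- data.frame(id = c("Peeters", "Bruyneel"), exam = c(9, 16.5))
out_file <- tempfile(fileext = ".csv")   # temporary file for the example

# Default: the row names are written as an extra first column with an empty header
write.csv(toy, out_file)
names_default <- names(read.csv(out_file))
names_default
#> [1] "X"    "id"   "exam"

# row.names = FALSE writes only the data columns
write.csv(toy, out_file, row.names = FALSE)
back <- read.csv(out_file)
names(back)
#> [1] "id"   "exam"
```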

6. Sorting
Try to order ‘Data’ according to ‘exam’, in decreasing order.

dataset_ordered <- dataset[order(dataset$exam, decreasing = TRUE), ]
dataset_ordered
#>                id workshop    sex exam question1 question2 question3 total
#> 2        Bruyneel      SAS female 16.5         2         1         4     7
#> 11         Maenen      SAS   male 16.0         4        NA         5    NA
#> 8           Bosch      SAS   male 15.0         5         4         5    14
#> 12         Saenen     SPSS   male 15.0         4         4         3    11
#> 7          Jansen        R   male 14.5         4         5         2    11
#> 3         Pasmans     SPSS female 14.0         2         4         3     9
#> 4  Van den Broeck        R female 12.0         2         2         4     8
#> 6          Steppe     SPSS female 10.0         2         3         3     8
#> 1         Peeters        R female  9.0         1         1         5     7
#> 9       Verlinden     SPSS   male  9.0        NA         3         2    NA
#> 10      De Pooter        R   male  8.0         5         3         4    12
#> 5      Verbrugghe      SAS female   NA         3         1        NA    NA
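Bosch and Saenen both scored 15. In the output above, tied rows appear in their original row order. To break ties explicitly, pass several sort keys to `order()`. The sketch below uses a few illustrative rows from the output above. It sorts by exam in decreasing order by negating the numeric key, then alphabetically by id. As above, the missing exam score is placed last (`na.last = TRUE` is the default).

```r
# Illustrative rows taken from the sorted output above
toy <- data.frame(
  id = c("Saenen", "Bosch", "Bruyneel", "Verbrugghe", "Peeters", "Verlinden"),
  exam = c(15, 15, 16.5, NA, 9, 9)
)

# First key: -exam (negating gives decreasing order for a numeric column)
# Second key: id, used only to break ties in exam
sorted <- toy[order(-toy$exam, toy$id), ]
sorted$id
```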

7. Conditional selection
- Try to select only the respondents who followed the SPSS course
- Suppose you’re only interested in the names and exam results of the people in the SPSS course. Select the appropriate columns and respondents.

# Only respondents that have followed the SPSS course
dataset_spss1 <- dataset[which(dataset$workshop=="SPSS"),]
dataset_spss1
#>           id workshop    sex exam question1 question2 question3 total
#> 3    Pasmans     SPSS female   14         2         4         3     9
#> 6     Steppe     SPSS female   10         2         3         3     8
#> 9  Verlinden     SPSS   male    9        NA         3         2    NA
#> 12    Saenen     SPSS   male   15         4         4         3    11

dataset_spss2 <- subset(dataset, workshop=="SPSS") 
dataset_spss2
#>           id workshop    sex exam question1 question2 question3 total
#> 3    Pasmans     SPSS female   14         2         4         3     9
#> 6     Steppe     SPSS female   10         2         3         3     8
#> 9  Verlinden     SPSS   male    9        NA         3         2    NA
#> 12    Saenen     SPSS   male   15         4         4         3    11
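Both approaches return the same rows here because ‘workshop’ has no missing values. They differ when the condition evaluates to NA. Plain logical indexing then returns a row filled with NA, whereas `which()` and `subset()` keep only the rows where the condition is TRUE. Here is a sketch with a hypothetical missing workshop value:

```r
# Illustrative data; the NA workshop value is hypothetical
toy <- data.frame(
  id = c("Pasmans", "Steppe", "Jansen"),
  workshop = factor(c("SPSS", NA, "R"), levels = c("R", "SAS", "SPSS"))
)

# Logical indexing: the NA in the condition produces an all-NA row
with_na <- toy[toy$workshop == "SPSS", ]
nrow(with_na)
#> [1] 2

# which() returns only the positions where the condition is TRUE
only_spss <- toy[which(toy$workshop == "SPSS"), ]
nrow(only_spss)
#> [1] 1

# subset() also drops rows where the condition is NA
nrow(subset(toy, workshop == "SPSS"))
#> [1] 1
```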

# Keep only the name (column 1, id) and the exam result (column 4)
dataset_spss1_results <- dataset_spss1[,c(1,4)]
dataset_spss1_results
#>           id exam
#> 3    Pasmans   14
#> 6     Steppe   10
#> 9  Verlinden    9
#> 12    Saenen   15

dataset_spss2_results <- subset(dataset_spss2, select=c(id,exam)) 
dataset_spss2_results
#>           id exam
#> 3    Pasmans   14
#> 6     Steppe   10
#> 9  Verlinden    9
#> 12    Saenen   15
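Rows and columns can also be selected in a single step, with the columns given by name instead of by position (`c(1, 4)`). This keeps the code correct if the column order changes. Here is a sketch on a few illustrative rows:

```r
# Illustrative rows from the course data
toy <- data.frame(
  id = c("Peeters", "Pasmans", "Steppe"),
  workshop = c("R", "SPSS", "SPSS"),
  sex = c("female", "female", "female"),
  exam = c(9, 14, 10)
)

# Rows and columns in a single [ ] call, with columns referred to by name
spss_results <- toy[toy$workshop == "SPSS", c("id", "exam")]

# The subset() equivalent
spss_results2 <- subset(toy, workshop == "SPSS", select = c(id, exam))
spss_results2
```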

8. Splitting, stacking and merging files
- Try to create separate data frames for males and for females.
- Try to combine these two data frames again using the rbind() function.
- Read in the file ‘origin.txt’ (it’s in tab-delimited format). Merge the origin dataset with ‘Data’ by the ‘id’ variable.

# Separate dataset for males and females
data_males <- subset(dataset, sex=="male")
data_females <- subset(dataset, sex=="female")

# Stack the two data frames again using rbind()
merged_data1 <- rbind(data_males,data_females)
merged_data1
#>                id workshop    sex exam question1 question2 question3 total
#> 7          Jansen        R   male 14.5         4         5         2    11
#> 8           Bosch      SAS   male 15.0         5         4         5    14
#> 9       Verlinden     SPSS   male  9.0        NA         3         2    NA
#> 10      De Pooter        R   male  8.0         5         3         4    12
#> 11         Maenen      SAS   male 16.0         4        NA         5    NA
#> 12         Saenen     SPSS   male 15.0         4         4         3    11
#> 1         Peeters        R female  9.0         1         1         5     7
#> 2        Bruyneel      SAS female 16.5         2         1         4     7
#> 3         Pasmans     SPSS female 14.0         2         4         3     9
#> 4  Van den Broeck        R female 12.0         2         2         4     8
#> 5      Verbrugghe      SAS female   NA         3         1        NA    NA
#> 6          Steppe     SPSS female 10.0         2         3         3     8
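To split a data frame on all values of a variable at once, use `split()`. It returns a named list with one data frame per group, and `do.call(rbind, ...)` stacks the pieces again. As with `rbind()` above, the stacked rows are grouped rather than in their original order. The original row names are kept, though, and can be used to restore that order. Here is a sketch on a few illustrative rows:

```r
# Illustrative rows from the course data
toy <- data.frame(
  id = c("Peeters", "Jansen", "Bruyneel", "Bosch"),
  sex = c("female", "male", "female", "male"),
  exam = c(9, 14.5, 16.5, 15)
)

# One data frame per value of sex, in a list named "female" and "male"
by_sex <- split(toy, toy$sex)
names(by_sex)

# Stack the pieces; unname() keeps rbind() from prefixing the row names
# with the list names (e.g. "female.1")
stacked <- do.call(rbind, unname(by_sex))
stacked$id

# The original row names give the original order back
restored <- stacked[order(as.integer(rownames(stacked))), ]
restored$id
```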

# Read origin data
origin <- read.delim("origin.txt", header = TRUE)

# Merge origin and dataset by id (the column has the same name in both)
dataset_merged <- merge(dataset, origin, by = "id")
dataset_merged 
#>                id workshop    sex exam question1 question2 question3 total
#> 1           Bosch      SAS   male 15.0         5         4         5    14
#> 2        Bruyneel      SAS female 16.5         2         1         4     7
#> 3       De Pooter        R   male  8.0         5         3         4    12
#> 4          Jansen        R   male 14.5         4         5         2    11
#> 5          Maenen      SAS   male 16.0         4        NA         5    NA
#> 6         Pasmans     SPSS female 14.0         2         4         3     9
#> 7         Peeters        R female  9.0         1         1         5     7
#> 8          Steppe     SPSS female 10.0         2         3         3     8
#> 9  Van den Broeck        R female 12.0         2         2         4     8
#> 10     Verbrugghe      SAS female   NA         3         1        NA    NA
#> 11      Verlinden     SPSS   male  9.0        NA         3         2    NA
#>    origin
#> 1       C
#> 2       B
#> 3       B
#> 4       A
#> 5       C
#> 6       A
#> 7       B
#> 8       B
#> 9       A
#> 10      B
#> 11      A
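The merged data frame has 11 rows, while ‘dataset’ has 12. Saenen is missing, presumably because origin.txt contains no matching id. By default, `merge()` performs an inner join and silently drops rows without a match. `setdiff()` shows which ids are affected, and `all.x = TRUE` keeps every row of the first data frame, with NA for the missing origin. The sketch below uses small illustrative data frames. The origin values for Bosch and Jansen come from the output above, and the absence of Saenen mirrors what that output suggests.

```r
scores <- data.frame(id = c("Bosch", "Jansen", "Saenen"), exam = c(15, 14.5, 15))
origin <- data.frame(id = c("Bosch", "Jansen"), origin = c("C", "A"))

# ids in scores without a match in origin
setdiff(scores$id, origin$id)
#> [1] "Saenen"

# Default inner join: Saenen is dropped
inner <- merge(scores, origin, by = "id")
nrow(inner)
#> [1] 2

# Left join: all rows of scores are kept, origin is NA for Saenen
left <- merge(scores, origin, by = "id", all.x = TRUE)
left
```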