In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Thanks to packages such as readxl (for Excel files) and haven (for SPSS files), R can import Excel and SPSS data directly into the R session. Let's install and load these packages first; then we can use their functions to read the SPSS/Excel files.
# Install once if needed: install.packages(c("knitr", "readxl", "haven"))
library(knitr)
library(readxl)
library(haven)
# Using read_excel() function to open excel file
asgusam5 <- read_excel("asgusam5_excel.xlsx")
View(asgusam5)
# Use the base R function str() to get an overview of the type of every variable
# We can also view the first 6 rows of the data with the head() function
str(asgusam5)
## Classes 'tbl_df', 'tbl' and 'data.frame': 12569 obs. of 50 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : num 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : num 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 4 4 4 4 3 3 1 4 4 1 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 4 4 4 4 4 3 2 4 4 1 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
head(asgusam5)
## # A tibble: 6 x 50
## id gender month year language book home_computer home_desk home_book
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 5 1 4 1 2 1
## 2 2 2 9 4 1 3 1 1 1
## 3 3 2 10 4 1 4 1 1 1
## 4 4 1 8 4 1 3 1 1 1
## 5 5 2 8 4 1 5 1 1 1
## 6 6 2 11 4 1 3 1 1 1
## # … with 41 more variables: home_room <dbl>, home_internet <dbl>,
## # computer_home <dbl>, computer_school <dbl>, computer_some <dbl>,
## # parentsupport1 <dbl>, parentsupport2 <dbl>, parentsupport3 <dbl>,
## # parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>, school3 <dbl>,
## # studentbullied1 <dbl>, studentbullied2 <dbl>, studentbullied3 <dbl>,
## # studentbullied4 <dbl>, studentbullied5 <dbl>, studentbullied6 <dbl>,
## # learning1 <dbl>, learning2 <dbl>, learning3 <dbl>, learning4 <dbl>,
## # learning5 <dbl>, learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## # engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## # engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>,
## # confidence3 <dbl>, confidence4 <dbl>, confidence5 <dbl>,
## # confidence6 <dbl>, score1 <dbl>, score2 <dbl>, score3 <dbl>,
## # score4 <dbl>, score5 <dbl>
# For the following operations, we make a copy of the original dataset so that we can always get back to the original data.
test_data <- asgusam5
As you can see, every variable was imported as numeric. The numeric type suits interval and ratio scales. However, some variables, such as gender, month, language, and book, are categorical, so we convert them to the factor type with the factor() function. Here we do this for gender and language.
test_data$gender <- factor(test_data$gender,levels=c(1,2),labels=c("male","female"))
# check the data by using str() function to see the new data type for gender
str(test_data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 12569 obs. of 50 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : num 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 4 4 4 4 3 3 1 4 4 1 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 4 4 4 4 4 3 2 4 4 1 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
test_data$language <- factor(test_data$language,levels=c(1,2,3),labels=c("Always Speak","Sometimes Speak","Never Speak"))
# check the data by using str() function to see the new data type for language
str(test_data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 12569 obs. of 50 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 4 4 4 4 3 3 1 4 4 1 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 4 4 4 4 4 3 2 4 4 1 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
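A small self-contained illustration of how factor() maps numeric codes to labels, using a made-up vector (`codes` is hypothetical, not taken from asgusam5). It also shows a detail worth checking on the real data: any value not listed in `levels` becomes NA without a warning.

```r
# Hypothetical codes: 1 = male, 2 = female, 9 = a code not listed in `levels`
codes <- c(1, 2, 2, 9, 1)

gender_f <- factor(codes, levels = c(1, 2), labels = c("male", "female"))
print(gender_f)

# The value 9 is not a listed level, so it silently became NA
print(table(gender_f, useNA = "ifany"))

# as.integer() returns the level positions (1, 2), not the original codes
print(as.integer(gender_f))
```

Running `table(..., useNA = "ifany")` on the original numeric column before converting it is a quick way to spot such unlisted codes.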
# Exercise: can you do the same for the variable month?
# This semester we will use the "dplyr" package for most of the data manipulation.
# Its individual functions will be introduced as we use them.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Calculate the Science total score as the mean of score1-score5 with rowMeans()
test_data$ScienceTotal <- rowMeans(subset(test_data,select=c(score1,score2,score3,score4,score5)),na.rm=TRUE)
2. Create a new variable, ‘ParentSupport’, using the mean of the 4 variables (parentsupport1-parentsupport4).
test_data$ParentSupport <- rowMeans(subset(test_data,select=c(parentsupport1,parentsupport2,parentsupport3,parentsupport4)), na.rm=TRUE)
3. Create a new variable, ‘StudentsBullied’, using the sum of the 6 variables (studentbullied1-studentbullied6).
test_data$StudentsBullied <-rowSums(test_data[,c("studentbullied1","studentbullied2","studentbullied3","studentbullied4","studentbullied5","studentbullied6")], na.rm=TRUE)
4. Perform a descriptive analysis (mean, median, mode, and SD) on ScienceTotal, ParentSupport, and StudentsBullied.
# Pull out the new variables we just created and store them in a new data frame
new_data <- subset(test_data,select=c("ScienceTotal","ParentSupport","StudentsBullied"))
# We can get mean and median for these variables by using summary() function
summary(new_data)
## ScienceTotal ParentSupport StudentsBullied
## Min. :276.7 Min. :1.000 Min. : 0.0
## 1st Qu.:493.8 1st Qu.:1.000 1st Qu.:17.0
## Median :547.5 Median :1.500 Median :21.0
## Mean :542.2 Mean :1.808 Mean :20.2
## 3rd Qu.:594.5 3rd Qu.:2.000 3rd Qu.:23.0
## Max. :774.0 Max. :9.000 Max. :54.0
## NA's :23
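The summary above shows 23 NA's for ScienceTotal but none for StudentsBullied, whose minimum is 0. Six items scored 1 or higher cannot sum to 0, so those zeros most likely come from students who skipped every bullying item: with `na.rm = TRUE`, rowMeans() returns NaN for an all-missing row while rowSums() returns 0. A minimal sketch on a made-up data frame (`items` is hypothetical), including one possible fix:

```r
# Hypothetical 3-student, 2-item data frame; student 3 skipped both items
items <- data.frame(q1 = c(1, 4, NA),
                    q2 = c(3, NA, NA))

# With na.rm = TRUE, each row uses only its non-missing items
means <- rowMeans(items, na.rm = TRUE)  # 2, 4, NaN
sums  <- rowSums(items,  na.rm = TRUE)  # 4, 4, 0
print(means)
print(sums)

# Set the sum back to NA for rows where every item is missing
all_missing <- rowSums(!is.na(items)) == 0
sums[all_missing] <- NA
print(sums)  # 4, 4, NA
```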
# Then we can use the SD() function from the "psych" package to obtain the standard deviations.
library(psych)
SD(new_data,na.rm=TRUE)
## ScienceTotal ParentSupport StudentsBullied
## 74.435145 1.230632 6.208656
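If you prefer not to depend on psych for this step, base R's sd() applied column by column gives per-column standard deviations as well. A sketch on a made-up stand-in for new_data (`toy` and its values are hypothetical):

```r
# Hypothetical stand-in for new_data
toy <- data.frame(ScienceTotal    = c(500, 550, NA, 600),
                  ParentSupport   = c(1, 2, 1.5, 2.5),
                  StudentsBullied = c(20, 18, 22, 24))

# Apply sd() to every column, dropping missing values column by column
sds <- sapply(toy, sd, na.rm = TRUE)
print(sds)

# Means and medians in the same compact style
print(sapply(toy, mean, na.rm = TRUE))
print(sapply(toy, median, na.rm = TRUE))
```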
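# To calculate the mode, we need to write our own function named "Mode" (base R's mode() returns an object's type, such as "numeric", not the statistical mode)
Mode <- function(x) {
  uni <- unique(x)                         # distinct values, in order of first appearance
  uni[which.max(tabulate(match(x, uni)))]  # most frequent value (the first one if counts tie)
}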
# Calculate the required mode
Mode(new_data$ScienceTotal)
## [1] 474.5791
Mode(new_data$ParentSupport)
## [1] 1
Mode(new_data$StudentsBullied)
## [1] 24
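Two caveats about Mode() are worth knowing. First, which.max() returns the first of several tied counts, so when every value occurs only once the function simply returns the first value. The 474.5791 reported for ScienceTotal is most likely such a case: it matches the first student's average score ((492 + 487 + 463 + 455 + 476) / 5 ≈ 474.6 with the rounded scores shown by str()). Second, unique() keeps NA, so missing values are counted as a category. The sketch below compares Mode() with a hypothetical variant, `Modes()`, that drops NA and returns every tied value:

```r
# The Mode() function defined above, repeated so this block runs on its own
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}

# Hypothetical variant: drop NA first and return all values tied for the top count
Modes <- function(x) {
  x <- x[!is.na(x)]
  uni <- unique(x)
  counts <- tabulate(match(x, uni))
  uni[counts == max(counts)]
}

scores <- c(4.2, 7.1, 5.5)   # every value occurs once
Mode(scores)                 # 4.2: just the first value
Modes(scores)                # 4.2 7.1 5.5: all values are tied

answers <- c(2, 3, 3, NA, NA, NA, 5, 5)
Mode(answers)                # NA: the missing values form the largest "category"
Modes(answers)               # 3 5
```

For a continuous variable such as ScienceTotal, the mode is rarely informative; it is more meaningful for items with a few discrete categories.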
# To reverse-code variables in R, we can use the recode() function from the car package
item_need_reverse <- test_data$engagement2
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
item_reversed <- car::recode(item_need_reverse, "1=4; 2=3; 3=2; 4=1") # Reverse-code the engagement2 item (car::recode, since car masks dplyr's recode)
# Use the table() function to double check, and also review the frequencies for each categories
table(item_need_reverse, useNA = "ifany")
## item_need_reverse
## 1 2 3 4 9 <NA>
## 1798 2765 2105 5561 317 23
table(item_reversed,useNA = "ifany")
## item_reversed
## 1 2 3 4 9 <NA>
## 5561 2105 2765 1798 317 23
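The two tables confirm that recode() leaves values not named in the recode string (here 9 and NA) unchanged. A common shortcut for a 1-4 scale is arithmetic reversal, new = 5 - old (in general, min + max - old). The sketch below, on a made-up vector, shows why that shortcut needs a guard when codes outside the scale are present:

```r
# Hypothetical item on a 1-4 scale, with 9 as a non-response code and one NA
item <- c(1, 2, 3, 4, 9, NA)

# Naive arithmetic reversal: the 9 turns into -4, which is not a valid answer
naive <- 5 - item
print(naive)     # 4 3 2 1 -4 NA

# Reverse only the valid responses 1-4 and leave every other value untouched
reversed <- ifelse(item %in% 1:4, 5 - item, item)
print(reversed)  # 4 3 2 1 9 NA
```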
1. Recode learning2 and learning4 into the same variables (overwriting them).
test_data$learning2 <- car::recode(test_data$learning2, "1=4; 2=3; 3=2; 4=1") # Reverse-code the item learning2
test_data$learning4 <- car::recode(test_data$learning4, "1=4; 2=3; 3=2; 4=1") # Reverse-code the item learning4
2. Recode ‘confidence2, confidence3, confidence6’ only for students born in 1999 (‘year’ variable), and save the recoded variables as ‘confidence2_re, confidence3_re, confidence6_re’.
# Subset the data: year is stored as a numeric code, and code 2 corresponds to 1999
new_test_data <- test_data %>%
  filter(year == 2)
# Reverse-code confidence2, confidence3 and confidence6 into new variables
new_test_data$confidence2_re <- car::recode(new_test_data$confidence2, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence3_re <- car::recode(new_test_data$confidence3, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence6_re <- car::recode(new_test_data$confidence6, "1=4; 2=3; 3=2; 4=1")
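The code above keeps only the year == 2 students in new_test_data. If you would rather add the recoded variables to the full dataset, leaving them empty for everyone else, one possible sketch uses dplyr's case_when() on a made-up stand-in (`toy` is hypothetical; the 1-4 scale and the handling of other codes are assumptions):

```r
library(dplyr)

# Hypothetical stand-in for test_data (year code 2 = 1999)
toy <- data.frame(year        = c(2, 3, 2, 4, 2),
                  confidence2 = c(1, 1, 4, 2, 9))

toy <- toy %>%
  mutate(confidence2_re = case_when(
    year != 2            ~ NA_real_,         # not born in 1999: leave empty
    confidence2 %in% 1:4 ~ 5 - confidence2,  # valid answer: reverse it
    TRUE                 ~ confidence2       # other codes (e.g. 9) kept as-is
  ))
print(toy)
```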
3. Perform a frequency analysis for ‘learning2, learning4, confidence2_re, confidence3_re, confidence6_re’.
table(test_data$learning2,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 6765 2157 1804 1528 292 23
table(test_data$learning4,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 7304 1960 1684 1183 415 23
table(new_test_data$confidence2_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 101 39 31 31 7 1
table(new_test_data$confidence3_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 94 32 45 27 11 1
table(new_test_data$confidence6_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 109 36 26 29 9 1
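Every frequency table above contains a category 9 outside the 1-4 response scale. In TIMSS data files, 9 typically marks an omitted or invalid response; assuming that holds here, those 9s are currently treated as real answers when computing scale scores (which would explain the maximum of 9 for ParentSupport and of 54 = 6 x 9 for StudentsBullied in the summary earlier). A minimal sketch of turning the code into NA, on a made-up vector:

```r
library(dplyr)

# Hypothetical responses on a 1-4 scale; 9 is ASSUMED to be a missing-data code
learning <- c(1, 4, 9, 2, NA, 9, 3)

# Base R: replace every 9 with NA (which() skips the existing NA safely)
learning_clean <- learning
learning_clean[which(learning_clean == 9)] <- NA

# dplyr equivalent for a single value
learning_clean2 <- na_if(learning, 9)

table(learning_clean, useNA = "ifany")
mean(learning, na.rm = TRUE)        # 9s counted as answers: 28 / 6
mean(learning_clean, na.rm = TRUE)  # valid answers only: (1 + 4 + 2 + 3) / 4 = 2.5
```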
1. Select ‘gender = girl’ and ‘year = 2000’ and create a new dataset named GIRL_2000 (stored below as girl_2000). What is the mean of studentbullied1?
# Use filter() from the dplyr package to filter the data
# year code 3 corresponds to 2000, and the comma between conditions means AND
girl_2000 <- test_data %>%
  filter(year == 3, gender == "female")
# describe() from the psych package reports the mean (and more) of studentbullied1 in girl_2000
describe(girl_2000$studentbullied1)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2284 3.08 1.48 3 3.07 1.48 1 9 8 1 3.85
## se
## X1 0.03
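The same selection can be written in base R with subset(), where the two conditions are joined explicitly with `&`. A sketch on a made-up stand-in (`toy` is hypothetical):

```r
# Hypothetical stand-in for test_data (year code 3 = 2000)
toy <- data.frame(gender          = factor(c("female", "male", "female", "female")),
                  year            = c(3, 3, 2, 3),
                  studentbullied1 = c(2, 4, 1, 3))

# Base R equivalent of filter(year == 3, gender == "female")
girl_2000_toy <- subset(toy, year == 3 & gender == "female")

mean(girl_2000_toy$studentbullied1, na.rm = TRUE)  # (2 + 3) / 2 = 2.5
```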
2. Select students with id 3001 to id 4000 and filter out all other cases. What is the variance of parentsupport1?
# Using function filter() in dplyr package to filter the data
id3000_4000student <- test_data %>%
filter(id>3000,id<=4000)
# looking for variance of parentsupport1 in id3000_4000student dataset
var(id3000_4000student$parentsupport1,na.rm=TRUE)
## [1] 2.530103
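dplyr's between() expresses an inclusive range directly, which can make the intended bounds easier to read than `id > 3000, id <= 4000`. A sketch on a made-up stand-in (`toy` is hypothetical):

```r
library(dplyr)

toy <- data.frame(id             = c(2999, 3000, 3001, 3500, 4000, 4001),
                  parentsupport1 = c(1, 2, 3, 4, 2, 1))

# between(x, left, right) keeps left <= x <= right, so 3001 and 4000 are included
ids_3001_4000 <- toy %>% filter(between(id, 3001, 4000))
print(ids_3001_4000$id)  # 3001 3500 4000

var(ids_3001_4000$parentsupport1, na.rm = TRUE)
```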
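3. Select 10% of all students at random and delete the unselected cases. What is the frequency of learning1?
# sample_n() from dplyr draws a fixed number of rows: 1257 is about 10% of 12569
# Without set.seed(), a different sample (and different frequencies) is drawn on every run
sample_data <- sample_n(test_data, 1257)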
# Check the frequency of learning1
table(sample_data$learning1,useNA = "ifany")
##
## 1 2 3 4 9
## 785 253 96 103 20
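In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). Calling set.seed() first makes the draw repeatable, so your frequencies can be reproduced. A sketch on a made-up 200-row stand-in (`toy` and the seed value are arbitrary assumptions):

```r
library(dplyr)

toy <- data.frame(id = 1:200, learning1 = rep(1:4, times = 50))

set.seed(2024)                     # any fixed number makes the draw repeatable
n_keep <- round(0.10 * nrow(toy))  # 20 rows here; 1257 for the 12569 students
sample_toy <- slice_sample(toy, n = n_keep)

table(sample_toy$learning1, useNA = "ifany")
```

slice_sample() also accepts `prop = 0.1`; here the row count is computed explicitly so it matches the 1257 used above.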
# Example: sort the cases by year, then by month within each year
new_ordered_data <- test_data[order(test_data$year,test_data$month),]
head(new_ordered_data,50)
## # A tibble: 50 x 53
## id gender month year language book home_computer home_desk
## <dbl> <fct> <dbl> <dbl> <fct> <dbl> <dbl> <dbl>
## 1 6529 female 4 1 Sometim… 2 1 1
## 2 2397 female 10 1 Always … 2 1 1
## 3 3116 male 10 1 Sometim… 3 1 1
## 4 867 female 11 1 Never S… 2 1 1
## 5 910 female 11 1 Always … 2 1 1
## 6 2071 female 11 1 Always … 3 1 1
## 7 9152 male 11 1 Sometim… 1 2 2
## 8 907 male 12 1 Sometim… 2 1 1
## 9 1345 female 12 1 Always … 2 2 2
## 10 2222 female 12 1 Always … 1 1 1
## # … with 40 more rows, and 45 more variables: home_book <dbl>,
## # home_room <dbl>, home_internet <dbl>, computer_home <dbl>,
## # computer_school <dbl>, computer_some <dbl>, parentsupport1 <dbl>,
## # parentsupport2 <dbl>, parentsupport3 <dbl>, parentsupport4 <dbl>,
## # school1 <dbl>, school2 <dbl>, school3 <dbl>, studentbullied1 <dbl>,
## # studentbullied2 <dbl>, studentbullied3 <dbl>, studentbullied4 <dbl>,
## # studentbullied5 <dbl>, studentbullied6 <dbl>, learning1 <dbl>,
## # learning2 <dbl>, learning3 <dbl>, learning4 <dbl>, learning5 <dbl>,
## # learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## # engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## # engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>,
## # confidence3 <dbl>, confidence4 <dbl>, confidence5 <dbl>,
## # confidence6 <dbl>, score1 <dbl>, score2 <dbl>, score3 <dbl>,
## # score4 <dbl>, score5 <dbl>, ScienceTotal <dbl>, ParentSupport <dbl>,
## # StudentsBullied <dbl>
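The dplyr verb arrange() produces the same ordering as `test_data[order(test_data$year, test_data$month), ]`, and desc() reverses a sort key. A sketch on a made-up stand-in (`toy` is hypothetical):

```r
library(dplyr)

toy <- data.frame(id    = c(10, 11, 12, 13),
                  year  = c(2, 1, 2, 1),
                  month = c(5, 11, 1, 4))

# Sort by year, then by month within each year
sorted_toy <- toy %>% arrange(year, month)
print(sorted_toy$id)  # 13 11 12 10

# Descending year, ascending month within each year
toy %>% arrange(desc(year), month)
```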
# Import the cases to be added first
Data_add_cases <- read_sav("Data_add cases.sav")
# In R, each observation is a row, and each variable is a column.
# The previous operations changed test_data: it now has 53 variables, and gender and language are factors, so it no longer matches the structure of the new cases.
# That is why we made a copy of the original data at the start. Here we make a fresh copy of the original data for this operation.
test_data2 <- asgusam5
# We can use the rbind() function to add the cases
added_test_data <- rbind(test_data2,Data_add_cases)
str(added_test_data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 13069 obs. of 50 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : num 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : num 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 4 4 4 4 3 3 1 4 4 1 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 4 4 4 4 4 3 2 4 4 1 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
Now we have 13069 observations instead of 12569 (500 cases were added), right?
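rbind() on data frames matches columns by name, but it stops with an error when the inputs have different numbers of columns, which is why the modified 53-column test_data could not be used here. dplyr's bind_rows() also matches by name and fills columns missing from one input with NA. A sketch on made-up data frames (`a`, `b`, `c_extra` are hypothetical):

```r
library(dplyr)

a <- data.frame(id = 1:2, score = c(500, 520))
b <- data.frame(score = 610, id = 3L)   # same columns, different order

# Both functions line the columns up by name
combined <- bind_rows(a, b)
print(combined)
rbind(a, b)

# An extra column: rbind() would fail, bind_rows() fills the gap with NA
c_extra <- data.frame(id = 4L, score = 480, school = 7)
all_rows <- bind_rows(a, c_extra)
print(all_rows)  # school is NA for ids 1 and 2
```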
# Import the new variables first (haven is already loaded)
Data_add_variables <- read_sav("Data_add variables.sav")
# Use cbind() to add the variables; it pairs rows by position, not by id
variable_added_data <- cbind(test_data, Data_add_variables)
# Check the new dataset
str(variable_added_data)
## 'data.frame': 12569 obs. of 60 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 1 1 1 1 2 2 4 1 1 4 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 1 1 1 1 1 2 3 1 1 4 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
## $ ScienceTotal : num 475 536 619 569 648 ...
## $ ParentSupport : num 1.25 1 2 1 2.25 1.5 2 1 2.25 3.25 ...
## $ StudentsBullied: num 20 8 18 17 8 17 24 11 21 24 ...
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "format.spss")= chr "F12.0"
## ..- attr(*, "display_width")= int 12
## $ IDCNTRY : num 840 840 840 840 840 840 840 840 840 840 ...
## ..- attr(*, "label")= chr "*COUNTRY ID*"
## ..- attr(*, "format.spss")= chr "F5.0"
## $ IDBOOK : num 2 3 4 5 6 7 9 10 11 12 ...
## ..- attr(*, "label")= chr "*ACHIEVEMENT TEST BOOKLET*"
## ..- attr(*, "format.spss")= chr "F2.0"
## $ IDSCHOOL : num 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "*SCHOOL ID*"
## ..- attr(*, "format.spss")= chr "F4.0"
## $ IDCLASS : num 102 102 102 102 102 102 102 102 102 102 ...
## ..- attr(*, "label")= chr "*CLASS ID*"
## ..- attr(*, "format.spss")= chr "F6.0"
## $ IDSTUD : num 10201 10202 10203 10204 10205 ...
## ..- attr(*, "label")= chr "*STUDENT ID*"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ IDGRADE : 'haven_labelled' num 4 4 4 4 4 4 4 4 4 4 ...
## ..- attr(*, "label")= chr "*GRADE ID*"
## ..- attr(*, "format.spss")= chr "F2.0"
## ..- attr(*, "labels")= Named num 3 4 5 6 99
## .. ..- attr(*, "names")= chr "GRADE 3" "GRADE 4" "GRADE 5" "GRADE 6" ...
Now we have 60 variables instead of 53, right? Good job! Note that id now appears twice, because cbind() keeps the id column from both datasets.
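When both datasets share a key such as id, a join pairs rows by the key's value rather than by position and keeps a single id column. A sketch with dplyr's left_join() on made-up data frames (`main` and `extra` are hypothetical); base R's `merge(x, y, by = "id", all.x = TRUE)` is a similar alternative:

```r
library(dplyr)

main  <- data.frame(id = c(1, 2, 3), ScienceTotal = c(10, 20, 30))
# Extra variables stored in a different row order, with id 2 absent
extra <- data.frame(id = c(3, 1), IDSCHOOL = c(2, 1))

# Rows are matched on id; ids missing from `extra` get NA
joined <- left_join(main, extra, by = "id")
print(joined)
```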
Note: cbind() combines rows by position, so sort the key variable (e.g., the ‘id’ variable in both datasets) before merging the datasets.
1. Add all cases from the ‘Data_add cases(exercise).sav’ file into the ‘asgusam5.sav’ data.
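# Import ‘Data_add cases(exercise).sav’ into R and save it as Data_add_cases_exercise_
Data_add_cases_exercise_ <- read_sav("Data_add cases(exercise).sav")
# Append the cases with rbind(), using the unmodified copy test_data2 (test_data now has 53 variables and factors)
# This assumes the exercise file has the same 50 variables as asgusam5
Data_added_cases_exercise <- rbind(test_data2, Data_add_cases_exercise_)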
2. Add all variables from the ‘Data_add variables(exercise).sav’ file into the ‘asgusam5.sav’ data.
# Import ‘Data_add variables(exercise).sav’ into R and save it as add_variable_ex
add_variable_ex <- read_sav("Data_add variables(exercise).sav")
# Combine the datasets side by side with cbind() (rows are paired by position)
variable_added_ex <- cbind(test_data, add_variable_ex)
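Following the note about sorting the key variable, a sketch of sorting two data frames by id and confirming that the keys line up before calling cbind(). The data frames `left` and `right` are hypothetical stand-ins:

```r
# Hypothetical pair of data frames to be combined side by side
left  <- data.frame(id = c(2, 1, 3), x = c("b", "a", "c"))
right <- data.frame(id = c(3, 2, 1), y = c(30, 20, 10))

# Sort both by the key variable
left  <- left[order(left$id), ]
right <- right[order(right$id), ]

# cbind() pairs rows by position, so stop if the keys do not match exactly
stopifnot(identical(left$id, right$id))

# Add only the new column, so id is not duplicated
combined <- cbind(left, right[, "y", drop = FALSE])
print(combined)
```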