Section 1: Read the data from an Excel/SPSS file

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Thanks to packages such as readxl and haven, R can import Excel and SPSS files. Let’s load these packages first (install them beforehand if they are not on your machine yet). Then we can use the functions they provide to read the SPSS/Excel file.
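If you are not sure whether the packages are installed, a check along the following lines may help. This is a minimal sketch in base R; the helper name `missing_packages` is made up for illustration, and the `install.packages()` call is left commented out so that nothing is installed by accident.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this section for reading Excel and SPSS files
to_install <- missing_packages(c("readxl", "haven"))
to_install

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Installing is only needed once per machine, while `library()` has to be called in every new R session.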

library(knitr)
library(readxl)
library(haven)
# Use the read_excel() function to open the Excel file
asgusam5 <- read_excel("asgusam5_excel.xlsx")
View(asgusam5)
# Use the base R function str() to get an overview of the type of every variable
# We can also view the first 6 rows of the data with the head() function
str(asgusam5)

## Classes 'tbl_df', 'tbl' and 'data.frame':    12569 obs. of  50 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : num  1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  4 4 4 4 3 3 1 4 4 1 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  4 4 4 4 4 3 2 4 4 1 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...

head(asgusam5)

## # A tibble: 6 x 50
##      id gender month  year language  book home_computer home_desk home_book
##   <dbl>  <dbl> <dbl> <dbl>    <dbl> <dbl>         <dbl>     <dbl>     <dbl>
## 1     1      1     1     5        1     4             1         2         1
## 2     2      2     9     4        1     3             1         1         1
## 3     3      2    10     4        1     4             1         1         1
## 4     4      1     8     4        1     3             1         1         1
## 5     5      2     8     4        1     5             1         1         1
## 6     6      2    11     4        1     3             1         1         1
## # … with 41 more variables: home_room <dbl>, home_internet <dbl>,
## #   computer_home <dbl>, computer_school <dbl>, computer_some <dbl>,
## #   parentsupport1 <dbl>, parentsupport2 <dbl>, parentsupport3 <dbl>,
## #   parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>, school3 <dbl>,
## #   studentbullied1 <dbl>, studentbullied2 <dbl>, studentbullied3 <dbl>,
## #   studentbullied4 <dbl>, studentbullied5 <dbl>, studentbullied6 <dbl>,
## #   learning1 <dbl>, learning2 <dbl>, learning3 <dbl>, learning4 <dbl>,
## #   learning5 <dbl>, learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## #   engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## #   engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>,
## #   confidence3 <dbl>, confidence4 <dbl>, confidence5 <dbl>,
## #   confidence6 <dbl>, score1 <dbl>, score2 <dbl>, score3 <dbl>,
## #   score4 <dbl>, score5 <dbl>

# For later operations, we make a copy of the original dataset so that we can go back to the original data whenever we want.
test_data <- asgusam5

Section 2: Transform the data

As you can see, the default data type for all the variables is numeric. The numeric data type is appropriate for interval and ratio scales. However, some of the variables, such as gender, month, language, and book, are supposed to be categorical. We need to convert them to the categorical data type by using the factor() function.
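Before applying factor() to the real data, it may help to see how it behaves on a tiny vector. The sketch below uses made-up codes, not the course data; it illustrates that any code not listed in `levels` is turned into `NA`, which is worth checking for after every conversion.

```r
# Toy codes: 1 = male, 2 = female, 9 = a code that is not listed as a level
codes <- c(1, 2, 2, 1, 9)
gender_toy <- factor(codes, levels = c(1, 2), labels = c("male", "female"))

gender_toy                          # the 9 is shown as <NA>
table(gender_toy, useNA = "ifany")  # counts per category, including NA
```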

test_data$gender <- factor(test_data$gender,levels=c(1,2),labels=c("male","female"))
# Check the data with the str() function to see the new data type for gender
str(test_data)

## Classes 'tbl_df', 'tbl' and 'data.frame':    12569 obs. of  50 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  4 4 4 4 3 3 1 4 4 1 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  4 4 4 4 4 3 2 4 4 1 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...

test_data$language <- factor(test_data$language,levels=c(1,2,3),labels=c("Always Speak","Sometimes Speak","Never Speak"))
# Check the data with the str() function to see the new data type for language
str(test_data)

## Classes 'tbl_df', 'tbl' and 'data.frame':    12569 obs. of  50 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  4 4 4 4 3 3 1 4 4 1 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  4 4 4 4 4 3 2 4 4 1 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...

# Can you make a new factor for the variable month?
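One possible way to approach the month exercise is sketched below on a toy vector. It assumes that the codes 1 to 12 stand for the calendar months January to December, which the data file does not state explicitly; `month.name` is a constant built into R.

```r
# First six values of `month` as printed by str() above
month_codes <- c(1, 9, 10, 8, 8, 11)

# ASSUMPTION: codes 1-12 are the calendar months in order
month_toy <- factor(month_codes, levels = 1:12, labels = month.name)

month_toy
levels(month_toy)[1:3]
```

On the real data, the same call would be applied to `test_data$month`.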

Section 3: Compute Variable

Class Activity

1. Create a new variable, ‘ScienceTotal’, using the average of score1 to score5.

# This semester, we will use the "dplyr" package for most of the data manipulation.
# The individual functions in this package will be described case by case
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Calculating the Science total score by using rowMeans() function
test_data$ScienceTotal <- rowMeans(subset(test_data,select=c(score1,score2,score3,score4,score5)),na.rm=TRUE)

2. Create a new variable, ‘ParentSupport’, using the mean of 4 variables (parentsupport1 to parentsupport4).

test_data$ParentSupport <- rowMeans(subset(test_data,select=c(parentsupport1,parentsupport2,parentsupport3,parentsupport4)), na.rm=TRUE)

3. Create a new variable, ‘StudentsBullied’, using the sum of 6 variables (studentbullied1 to studentbullied6).

test_data$StudentsBullied <-rowSums(test_data[,c("studentbullied1","studentbullied2","studentbullied3","studentbullied4","studentbullied5","studentbullied6")], na.rm=TRUE)

4. Perform descriptive analysis (mean, median, mode, and S.D.) on ScienceTotal, ParentSupport, and StudentsBullied.

# Pull out the new variables we just made and store them in a new data frame
new_data <- subset(test_data,select=c("ScienceTotal","ParentSupport","StudentsBullied"))
# We can get mean and median for these variables by using summary() function
summary(new_data)

##   ScienceTotal   ParentSupport   StudentsBullied
##  Min.   :276.7   Min.   :1.000   Min.   : 0.0   
##  1st Qu.:493.8   1st Qu.:1.000   1st Qu.:17.0   
##  Median :547.5   Median :1.500   Median :21.0   
##  Mean   :542.2   Mean   :1.808   Mean   :20.2   
##  3rd Qu.:594.5   3rd Qu.:2.000   3rd Qu.:23.0   
##  Max.   :774.0   Max.   :9.000   Max.   :54.0   
##                  NA's   :23
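The summary above shows 23 NA's for ParentSupport but a minimum of 0 for StudentsBullied. A likely explanation, offered here as an assumption rather than something verified on the data, is the different behavior of rowMeans() and rowSums() when every item in a row is missing. The toy data frame below is made up for illustration.

```r
toy <- data.frame(item1 = c(1, 2, NA),
                  item2 = c(3, NA, NA))

toy_mean <- rowMeans(toy, na.rm = TRUE)  # third row: NaN (nothing to average)
toy_sum  <- rowSums(toy, na.rm = TRUE)   # third row: 0 (sum of nothing)

# Optional: mark the sum as missing when every item in the row is missing
all_missing <- rowSums(!is.na(toy)) == 0
toy_sum_na  <- ifelse(all_missing, NA, toy_sum)

toy_mean
toy_sum
toy_sum_na
```

Whether a sum of 0 or an NA is the better value for a student who answered none of the items is a design choice; the NA version keeps such students out of the descriptive statistics.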

# Then, we can use the SD() function in the "psych" package to obtain the SD.
library(psych)
SD(new_data,na.rm=TRUE)

##    ScienceTotal   ParentSupport StudentsBullied 
##       74.435145        1.230632        6.208656

# To calculate the mode, we need to build our own function named "Mode"
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}
# Calculate the required mode
Mode(new_data$ScienceTotal)

## [1] 474.5791

Mode(new_data$ParentSupport)

## [1] 1

Mode(new_data$StudentsBullied)

## [1] 24
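To see how the Mode() function behaves, it can be tried on small vectors where the answer is easy to check by hand. The function is repeated here so that the sketch runs on its own; the input vectors are made up.

```r
Mode <- function(x) {
  uni <- unique(x)                         # distinct values, in order of appearance
  uni[which.max(tabulate(match(x, uni)))]  # value with the highest count
}

Mode(c(1, 1, 2, 4, 4, 4))  # 4 is the most frequent value
Mode(c(2, 3, 3, 2, 5))     # tie between 2 and 3: the first one seen (2) is returned
```

Because ties are resolved in favor of the value that appears first, and because an almost continuous variable such as ScienceTotal rarely repeats exact values, the mode reported for it should be read with some caution.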

Section 4: Recoding Variables

# To reverse-code variables in R, we use the recode() function from the car package
item_need_reverse <- test_data$engagement2
library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:dplyr':
## 
##     recode

item_reversed <- recode(item_need_reverse, "1=4; 2=3; 3=2; 4=1") # Reverse-code the engagement2 item
# Use the table() function to double-check the recoding and to review the frequencies for each category
table(item_need_reverse, useNA = "ifany")

## item_need_reverse
##    1    2    3    4    9 <NA> 
## 1798 2765 2105 5561  317   23

table(item_reversed,useNA = "ifany")

## item_reversed
##    1    2    3    4    9 <NA> 
## 5561 2105 2765 1798  317   23
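The tables above show that the value 9 is left untouched by the recoding. The sketch below, on a made-up vector, shows an alternative in base R: on a 1 to 4 scale, reversing an item is the same as computing 5 minus the value. It also assumes that 9 is a code for a missing answer, which the source does not state explicitly.

```r
item <- c(1, 2, 3, 4, 9, NA)

# ASSUMPTION: 9 is a "missing" code and should not be treated as a real answer
item_clean <- ifelse(item == 9, NA, item)

# On a 1-4 scale: reversed value = (min + max) - value = 5 - value
item_rev <- 5 - item_clean
item_rev   # 4 3 2 1 NA NA
```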

Class Activities

1. Recode ‘learning2’ and ‘learning4’ into the same variables.

test_data$learning2 <- recode(test_data$learning2, "1=4; 2=3; 3=2; 4=1") # Reverse-code the item learning2
test_data$learning4 <- recode(test_data$learning4, "1=4; 2=3; 3=2; 4=1") # Reverse-code the item learning4

2. Recode the ‘confidence2, confidence3, confidence6’ variables only for students who were born in 1999 (‘year’ variable), and save the recoded variables as ‘confidence2_re, confidence3_re, confidence6_re’.

# Subset the data (year == 2 is taken here to be the code for students born in 1999)
new_test_data <- test_data %>%
  filter(year==2)
# Recode ‘confidence2, confidence3, confidence6’
new_test_data$confidence2_re <- recode(new_test_data$confidence2, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence3_re <- recode(new_test_data$confidence3, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence6_re <- recode(new_test_data$confidence6, "1=4; 2=3; 3=2; 4=1")

3. Perform frequency analysis for ‘learning2, learning4, confidence2_re, confidence3_re, confidence6_re’.

table(test_data$learning2,useNA = "ifany")

## 
##    1    2    3    4    9 <NA> 
## 6765 2157 1804 1528  292   23

table(test_data$learning4,useNA = "ifany")

## 
##    1    2    3    4    9 <NA> 
## 7304 1960 1684 1183  415   23

table(new_test_data$confidence2_re,useNA = "ifany")

## 
##    1    2    3    4    9 <NA> 
##  101   39   31   31    7    1

table(new_test_data$confidence3_re,useNA = "ifany")

## 
##    1    2    3    4    9 <NA> 
##   94   32   45   27   11    1

table(new_test_data$confidence6_re,useNA = "ifany")

## 
##    1    2    3    4    9 <NA> 
##  109   36   26   29    9    1

Section 5. Select Cases

Class Activities

1. Select ‘gender = girl’ and ‘year = 2000’, and create a new dataset named girl_2000. What is the mean of studentbullied1?

# Use the filter() function from the dplyr package to filter the data
girl_2000 <- test_data %>%
  filter(year==3,gender=="female")
# Look up the mean of studentbullied1 in the girl_2000 dataset (describe() comes from the psych package)
describe(girl_2000$studentbullied1)

##    vars    n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 2284 3.08 1.48      3    3.07 1.48   1   9     8    1     3.85
##      se
## X1 0.03

2. Select students with id=3001 to id=4000 and filter out the unselected cases. What is the variance of parentsupport1?

# Use the filter() function from the dplyr package to filter the data
id3000_4000student <- test_data %>%
  filter(id>3000,id<=4000)
# Compute the variance of parentsupport1 in the id3000_4000student dataset
var(id3000_4000student$parentsupport1,na.rm=TRUE)

## [1] 2.530103

3. Select 10% of all students at random and delete the unselected cases. What is the frequency of learning1?

# We need the sample_n() function from the dplyr package (1257 is about 10% of the 12569 students)
sample_data <- sample_n(test_data,1257)
# Check the frequency of learning1
table(sample_data$learning1,useNA = "ifany")

## 
##   1   2   3   4   9 
## 785 253  96 103  20
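Because the sample is random, the frequencies above will differ from run to run. The sketch below shows one way to make such a draw repeatable in base R and to compute the 10% from the number of rows instead of typing it in. The toy data frame and the seed value are made up for illustration.

```r
set.seed(2019)  # arbitrary seed, chosen only so that the draw can be repeated

toy <- data.frame(id = 1:200, learning1 = rep(1:4, times = 50))

n_sample    <- round(0.10 * nrow(toy))      # 10% of the rows
sample_rows <- sample(nrow(toy), n_sample)  # row numbers, drawn without replacement
toy_sample  <- toy[sample_rows, ]

nrow(toy_sample)
table(toy_sample$learning1, useNA = "ifany")
```

Running the block again with the same seed gives the same sample; changing or removing the seed gives a different one.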

Section 6. Sorting Cases & Merging files

# Example: sort the cases by year and month
new_ordered_data <- test_data[order(test_data$year,test_data$month),]
head(new_ordered_data,50)

## # A tibble: 50 x 53
##       id gender month  year language  book home_computer home_desk
##    <dbl> <fct>  <dbl> <dbl> <fct>    <dbl>         <dbl>     <dbl>
##  1  6529 female     4     1 Sometim…     2             1         1
##  2  2397 female    10     1 Always …     2             1         1
##  3  3116 male      10     1 Sometim…     3             1         1
##  4   867 female    11     1 Never S…     2             1         1
##  5   910 female    11     1 Always …     2             1         1
##  6  2071 female    11     1 Always …     3             1         1
##  7  9152 male      11     1 Sometim…     1             2         2
##  8   907 male      12     1 Sometim…     2             1         1
##  9  1345 female    12     1 Always …     2             2         2
## 10  2222 female    12     1 Always …     1             1         1
## # … with 40 more rows, and 45 more variables: home_book <dbl>,
## #   home_room <dbl>, home_internet <dbl>, computer_home <dbl>,
## #   computer_school <dbl>, computer_some <dbl>, parentsupport1 <dbl>,
## #   parentsupport2 <dbl>, parentsupport3 <dbl>, parentsupport4 <dbl>,
## #   school1 <dbl>, school2 <dbl>, school3 <dbl>, studentbullied1 <dbl>,
## #   studentbullied2 <dbl>, studentbullied3 <dbl>, studentbullied4 <dbl>,
## #   studentbullied5 <dbl>, studentbullied6 <dbl>, learning1 <dbl>,
## #   learning2 <dbl>, learning3 <dbl>, learning4 <dbl>, learning5 <dbl>,
## #   learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## #   engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## #   engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>,
## #   confidence3 <dbl>, confidence4 <dbl>, confidence5 <dbl>,
## #   confidence6 <dbl>, score1 <dbl>, score2 <dbl>, score3 <dbl>,
## #   score4 <dbl>, score5 <dbl>, ScienceTotal <dbl>, ParentSupport <dbl>,
## #   StudentsBullied <dbl>
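The way order() handles two sorting keys is easier to follow on a handful of rows. The sketch below uses a made-up data frame: rows are sorted by year first, and month only decides the order within the same year.

```r
toy <- data.frame(id    = 1:5,
                  year  = c(2, 1, 2, 1, 1),
                  month = c(3, 12, 1, 4, 12))

# Sort by year first, then by month within each year
toy_sorted <- toy[order(toy$year, toy$month), ]
toy_sorted$id   # 4 2 5 3 1

# Ascending year, but descending month within each year
toy_desc <- toy[order(toy$year, -toy$month), ]
toy_desc$id     # 2 5 4 1 3
```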

Merging file: Add Cases

# Import the cases first
Data_add_cases <- read_sav("Data_add cases.sav")
# In R, each observation is a row, and each variable is a column.
# The previous operations have already changed the number of variables in our test data, which is why we made a copy of the original data. Now we make a fresh copy of the original data for this operation.
test_data2 <- asgusam5
# We can use the rbind() function to add the cases
added_test_data <- rbind(test_data2,Data_add_cases)
str(added_test_data)

## Classes 'tbl_df', 'tbl' and 'data.frame':    13069 obs. of  50 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : num  1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  4 4 4 4 3 3 1 4 4 1 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  4 4 4 4 4 3 2 4 4 1 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...

Now we have 13069 observations instead of 12569 observations, right?
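For rbind() to work on data frames, both datasets need the same variables. The sketch below, on two made-up data frames, shows that columns are matched by name rather than by position, and includes a quick check that can be run before stacking.

```r
old_cases <- data.frame(id = 1:3, score = c(492, 517, 656))
new_cases <- data.frame(score = c(550, 642), id = 4:5)  # same columns, other order

# A quick check before stacking: do both datasets have the same variables?
setequal(names(old_cases), names(new_cases))

# rbind() on data frames matches columns by name
combined <- rbind(old_cases, new_cases)
combined
```

If the check returns FALSE, rbind() stops with an error, which is the reason the original 50-variable copy of the data is used above instead of test_data.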

Merging file: Add variable

# Import the variables first
library(haven)
Data_add_variables <- read_sav("Data_add variables.sav")
# We can use the cbind() function to add the variables
variable_added_data <- cbind(test_data,Data_add_variables)
# Check the new dataset
str(variable_added_data)

## 'data.frame':    12569 obs. of  60 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  1 1 1 1 2 2 4 1 1 4 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  1 1 1 1 1 2 3 1 1 4 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...
##  $ ScienceTotal   : num  475 536 619 569 648 ...
##  $ ParentSupport  : num  1.25 1 2 1 2.25 1.5 2 1 2.25 3.25 ...
##  $ StudentsBullied: num  20 8 18 17 8 17 24 11 21 24 ...
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "format.spss")= chr "F12.0"
##   ..- attr(*, "display_width")= int 12
##  $ IDCNTRY        : num  840 840 840 840 840 840 840 840 840 840 ...
##   ..- attr(*, "label")= chr "*COUNTRY ID*"
##   ..- attr(*, "format.spss")= chr "F5.0"
##  $ IDBOOK         : num  2 3 4 5 6 7 9 10 11 12 ...
##   ..- attr(*, "label")= chr "*ACHIEVEMENT TEST BOOKLET*"
##   ..- attr(*, "format.spss")= chr "F2.0"
##  $ IDSCHOOL       : num  1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "*SCHOOL ID*"
##   ..- attr(*, "format.spss")= chr "F4.0"
##  $ IDCLASS        : num  102 102 102 102 102 102 102 102 102 102 ...
##   ..- attr(*, "label")= chr "*CLASS ID*"
##   ..- attr(*, "format.spss")= chr "F6.0"
##  $ IDSTUD         : num  10201 10202 10203 10204 10205 ...
##   ..- attr(*, "label")= chr "*STUDENT ID*"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ IDGRADE        : 'haven_labelled' num  4 4 4 4 4 4 4 4 4 4 ...
##   ..- attr(*, "label")= chr "*GRADE ID*"
##   ..- attr(*, "format.spss")= chr "F2.0"
##   ..- attr(*, "labels")= Named num  3 4 5 6 99
##   .. ..- attr(*, "names")= chr  "GRADE 3" "GRADE 4" "GRADE 5" "GRADE 6" ...

Now we have 60 variables instead of 53, right? Good job!
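Note that cbind() pastes the two datasets side by side by row position, which is why the output above contains the id column twice and why both files must be in the same row order. The sketch below contrasts this with base R's merge(), which matches rows on a key variable. The data frames and the IDSCHOOL values are made up for illustration.

```r
main  <- data.frame(id = c(1, 2, 3), score = c(492, 517, 656))
extra <- data.frame(id = c(3, 1, 2), IDSCHOOL = c(30, 10, 20))  # rows in another order

# cbind() combines by position: rows are misaligned here and id appears twice
by_position <- cbind(main, extra)
by_position

# merge() matches rows on the key, whatever the row order
by_key <- merge(main, extra, by = "id")
by_key
```

When the row order of the two files cannot be guaranteed, matching on the key is the safer choice; with cbind(), sorting both datasets by the key first is essential.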

Class Activity: Merging files

Note: Sort the key variable (e.g. the ‘id’ variable in both datasets) before merging the datasets.

1. Add all cases from the ‘Data_add cases(exercise).sav’ file into the ‘asgusam5.sav’ file.

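## Import ‘Data_add cases(exercise).sav’ into R, and save it as Data_add_cases_exercise_
Data_add_cases_exercise_ <- read_sav("Data_add cases(exercise).sav")
## Merge the datasets with the rbind() function
## Use the original data (asgusam5): test_data now has extra variables, and rbind() needs the same variables in both datasets
Data_added_cases_exercise <- rbind(asgusam5,Data_add_cases_exercise_)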

2. Add all variables from the ‘Data_add variables(exercise).sav’ file into the ‘asgusam5.sav’ file.

## Import ‘Data_add variables(exercise).sav’ into R, and save it as add_variable_ex
add_variable_ex <- read_sav("Data_add variables(exercise).sav")
## Merge the datasets with the cbind() function
variable_added_ex <- cbind(test_data,add_variable_ex)

Class1 - Play with R

Cheng Hua

6/25/2019
