0330 HW 4-5
HW4
Reverse the order of input to the series of dplyr::*_join examples using data from the Nobel laureates in literature and explain the resulting output.
Load the data, check the data structure, and display the first 6 rows of each data set
n.countries<-read.table("C:/Users/USER/Desktop/R_data management/0330/nobel_countries.txt", header=TRUE)
n.winner<-read.table("C:/Users/USER/Desktop/R_data management/0330/nobel_winners.txt", header=TRUE)
str(n.countries)
## 'data.frame': 8 obs. of 2 variables:
## $ Country: Factor w/ 7 levels "Canada","China",..: 3 6 6 7 1 2 4 5
## $ Year : int 2014 1950 2017 2016 2013 2012 2015 2011
head(n.countries)
## Country Year
## 1 France 2014
## 2 UK 1950
## 3 UK 2017
## 4 US 2016
## 5 Canada 2013
## 6 China 2012
str(n.winner)
## 'data.frame': 7 obs. of 3 variables:
## $ Name : Factor w/ 7 levels "Alice Munro",..: 6 2 4 3 1 5 7
## $ Gender: Factor w/ 2 levels "Female","Male": 2 2 2 2 1 2 1
## $ Year : int 2014 1950 2017 2016 2013 2012 1938
head(n.winner)
## Name Gender Year
## 1 Patrick Modiano Male 2014
## 2 Bertrand Russell Male 1950
## 3 Kazuo Ishiguro Male 2017
## 4 Bob Dylan Male 2016
## 5 Alice Munro Female 2013
## 6 Mo Yan Male 2012
Mutating joins
inner_join: Returns all rows from n.countries that have a matching Year in n.winner, and all columns from both n.countries and n.winner. Russia (2015) and Sweden (2011) are dropped because no winner is listed for those years.
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
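left_join: Returns all rows from n.countries and all columns from both n.countries and n.winner. Rows of n.countries with no matching Year in n.winner (Russia 2015, Sweden 2011) get NA for Name and Gender. If a row has multiple matches, all combinations are returned.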
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
## 7 Russia 2015 <NA> <NA>
## 8 Sweden 2011 <NA> <NA>
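right_join: Returns all rows from n.winner and all columns from both n.countries and n.winner. Rows of n.winner with no matching Year in n.countries (Pearl Buck, 1938) get NA for Country. If a row has multiple matches, all combinations are returned.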
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
## 7 <NA> 1938 Pearl Buck Female
full_join: Returns all rows and all columns from both n.countries and n.winner. Rows without a match on either side are kept and filled with NA.
## Country Year Name Gender
## 1 France 2014 Patrick Modiano Male
## 2 UK 1950 Bertrand Russell Male
## 3 UK 2017 Kazuo Ishiguro Male
## 4 US 2016 Bob Dylan Male
## 5 Canada 2013 Alice Munro Female
## 6 China 2012 Mo Yan Male
## 7 Russia 2015 <NA> <NA>
## 8 Sweden 2011 <NA> <NA>
## 9 <NA> 1938 Pearl Buck Female
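The four mutating joins above can be reproduced without the local text files. The following is a minimal sketch: it rebuilds the two tables by hand from the printed output (as character columns rather than factors) and assumes the key is passed explicitly as by = "Year". The original calls are not shown in these notes, so the exact arguments are an assumption.

```r
library(dplyr)

# Rebuilt by hand from the printed output above (the original files are local)
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L),
  stringsAsFactors = FALSE
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L),
  stringsAsFactors = FALSE
)

# n.countries is the first (x) input in every call
joined <- list(
  inner = inner_join(n.countries, n.winner, by = "Year"),
  left  = left_join(n.countries, n.winner, by = "Year"),
  right = right_join(n.countries, n.winner, by = "Year"),
  full  = full_join(n.countries, n.winner, by = "Year")
)

# Number of rows returned by each join
sapply(joined, nrow)
```

Because n.countries comes first, its columns (Country, Year) lead every result. Swapping the inputs changes which table keeps its unmatched rows in left_join and right_join.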
Filtering joins
semi_join: Returns all rows from n.countries that have a matching Year in n.winner, keeping only the columns of n.countries
## Country Year
## 1 France 2014
## 2 UK 1950
## 3 UK 2017
## 4 US 2016
## 5 Canada 2013
## 6 China 2012
anti_join: Returns all rows from n.countries that have no matching Year in n.winner, keeping only the columns of n.countries
## Country Year
## 1 Russia 2015
## 2 Sweden 2011
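The two filtering joins can be sketched the same way. This example again rebuilds the tables by hand from the printed output (only the Name and Year columns of n.winner are needed here) and assumes by = "Year". The last call, with the inputs swapped, is an added contrast and is not part of the original output.

```r
library(dplyr)

# Rebuilt by hand from the printed output above
n.countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 2015L, 2011L),
  stringsAsFactors = FALSE
)
n.winner <- data.frame(
  Name = c("Patrick Modiano", "Bertrand Russell", "Kazuo Ishiguro", "Bob Dylan",
           "Alice Munro", "Mo Yan", "Pearl Buck"),
  Year = c(2014L, 1950L, 2017L, 2016L, 2013L, 2012L, 1938L),
  stringsAsFactors = FALSE
)

# Rows of n.countries with a matching Year; no columns from n.winner are added
semi <- semi_join(n.countries, n.winner, by = "Year")

# Rows of n.countries without a matching Year
anti <- anti_join(n.countries, n.winner, by = "Year")

# Inputs swapped: winners whose Year does not appear in n.countries
anti_swapped <- anti_join(n.winner, n.countries, by = "Year")

semi
anti
anti_swapped
```

Unlike the mutating joins, these calls only filter the first input, so the result never gains columns from the second table.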
HW5
Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of them are of negative signs?
load the data and display first 6 rows of dta
library(datasets)
dta<- read.table("http://www.amstat.org/publications/jse/datasets/sat.dat.txt", row.names=1)
head(dta)
## V2 V3 V4 V5 V6 V7 V8
## Alabama 4.405 17.2 31.144 8 491 538 1029
## Alaska 8.963 17.6 47.951 47 445 489 934
## Arizona 4.778 19.3 32.175 27 448 496 944
## Arkansas 4.459 17.1 28.934 6 482 523 1005
## California 4.992 24.0 41.078 45 417 485 902
## Colorado 5.443 18.4 34.571 29 462 518 980
rename columns
library(datasets)
colnames(dta)<-c("expenditure", "PT_ratio", "salary", "PercentageSAT", "vs", "ms", "tos")
VARIABLE DESCRIPTIONS:
Columns
1 - 16 Name of state (in quotation marks)
18 - 22 Current expenditure per pupil in average daily attendance
in public elementary and secondary schools, 1994-95 (in thousands of dollars)
24 - 27 Average pupil/teacher ratio in public elementary and secondary schools, Fall 1994
29 - 34 Estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)
36 - 37 Percentage of all eligible students taking the SAT, 1994-95
39 - 41 Average verbal SAT score, 1994-95
43 - 45 Average math SAT score, 1994-95
47 - 50 Average total score on the SAT, 1994-95
available on “http://jse.amstat.org/datasets/sat.txt”
create new variable division from state.division{datasets}
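The code for this step is not shown. One possible way to do it, assuming the row names of dta are state names, is to look each row name up in state.name and take the corresponding entry of state.division. The sketch below uses only the six states printed by head(dta), with their salary and tos values, so that it runs without downloading the file.

```r
library(datasets)

# First six states from the head(dta) output above (salary and tos only)
dta <- data.frame(
  salary = c(31.144, 47.951, 32.175, 28.934, 41.078, 34.571),
  tos = c(1029, 934, 944, 1005, 902, 980),
  row.names = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado")
)

# Match row names against state.name so the result does not depend on row order
dta$division <- state.division[match(rownames(dta), state.name)]

dta
```

If the rows are already in the same alphabetical order as state.name, a direct assignment of state.division would give the same result; matching by name is simply the more defensive choice.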
extract the slope estimate from regressing total SAT score on salary, separately for each division
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.3
m<-split(dta, dta$division)
slopeSAT<-m %>% sapply(., function(x) coef(lm(x$tos~x$salary))[2]) %>% as.data.frame()
give column names, create a new column for the divisions, then select the negative slope estimates
## slope division
## 1 -0.2067041 New England.x$salary
## 2 -2.6215161 East South Central.x$salary
## 3 -28.1929942 West South Central.x$salary
## 4 -1.3174294 West North Central.x$salary
## 5 -13.7999482 Mountain.x$salary
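The code that names the columns and keeps the negative slopes is not shown. The following is a minimal sketch of one way to do it, run on a toy data frame with made-up numbers (not the SAT data) and hypothetical group labels div_A, div_B and div_C. It fits lm(tos ~ salary, data = x), so the coefficient names end in ".salary" rather than ".x$salary" as in the output above.

```r
library(dplyr)

# Toy data with made-up numbers (NOT the SAT data); group labels are hypothetical
toy <- data.frame(
  division = rep(c("div_A", "div_B", "div_C"), each = 3),
  salary = rep(c(1, 2, 3), times = 3),
  tos = c(10, 8, 6,  5, 6, 7,  9, 8.5, 8)
)

# One slope per group: split the data, fit a line in each piece, keep coefficient 2
slopes <- split(toy, toy$division) %>%
  sapply(function(x) coef(lm(tos ~ salary, data = x))[2]) %>%
  as.data.frame()

# Name the slope column and move the group label out of the row names
colnames(slopes) <- "slope"
slopes$division <- sub("\\.salary$", "", rownames(slopes))
rownames(slopes) <- NULL

# Keep the groups whose slope estimate is negative and count them
negative <- filter(slopes, slope < 0)
negative
nrow(negative)
```

Stripping the suffix from the row names gives clean division labels, and nrow() on the filtered table answers the "how many are negative" question directly.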