Summarize the backpain{HSAUR3} data set in the following format:
driver suburban case control total
no     no       ?    ?       ?
no     yes      ?    ?       ?
yes    no       ?    ?       ?
yes    yes      ?    ?       ?
backpain %>% count(driver, suburban, status) %>%  # cell counts by driver, suburban and status
  spread(status, n) %>%                            # one count column each for case and control
  mutate(total = case + control) %>%               # row total across the two status levels
  as.data.frame() %>% head(4)
Merge the two data sets state.x77{datasets} and USArrests{datasets}, and compute all pairwise correlations for the numerical variables. Is there anything interesting to report?
##            Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska            365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65  51945
## California      21198   5114        1.1    71.71   10.3    62.6    20 156361
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166 103766
##                 Murder  Population      Income Illiteracy    Life Exp
## Murder      1.00000000  0.32869864 -0.09833907  0.7997904 -0.74059658
## Population  0.32869864  1.00000000 -0.02186730  0.1245989  0.04728461
## Income     -0.09833907 -0.02186730  1.00000000 -0.4579606  0.22182349
## Illiteracy  0.79979040  0.12459889 -0.45796056  1.0000000 -0.74652671
## Life Exp   -0.74059658  0.04728461  0.22182349 -0.7465267  1.00000000
## HS Grad    -0.59204539 -0.39784300  0.66120542 -0.6765483  0.65625800
## Frost      -0.53457903 -0.36245690  0.35878256 -0.6710699  0.40934054
## Area        0.36543567 -0.09454207  0.56785382  0.1844932 -0.21059347
## Assault     0.81449426  0.41126839  0.28340572  0.5064039 -0.60478595
## UrbanPop   -0.11103769  0.56955715  0.09148709 -0.3512532  0.24842664
## Rape        0.72359299  0.55697741  0.02312758  0.4970228 -0.35938747
##               HS Grad      Frost        Area    Assault    UrbanPop
## Murder     -0.5920454 -0.5345790  0.36543567  0.8144943 -0.11103769
## Population -0.3978430 -0.3624569 -0.09454207  0.4112684  0.56955715
## Income      0.6612054  0.3587826  0.56785382  0.2834057  0.09148709
## Illiteracy -0.6765483 -0.6710699  0.18449317  0.5064039 -0.35125322
## Life Exp    0.6562580  0.4093405 -0.21059347 -0.6047860  0.24842664
## HS Grad     1.0000000  0.6023133  0.35935037 -0.3651881 -0.12557750
## Frost       0.6023133  1.0000000  0.11733688 -0.2795162  0.19173187
## Area        0.3593504  0.1173369  1.00000000  0.5419884 -0.07080678
## Assault    -0.3651881 -0.2795162  0.54198844  1.0000000  0.13822857
## UrbanPop   -0.1255775  0.1917319 -0.07080678  0.1382286  1.00000000
## Rape       -0.4139066 -0.4625038  0.50318069  0.6872273  0.23961214
##                   Rape
## Murder      0.72359299
## Population  0.55697741
## Income      0.02312758
## Illiteracy  0.49702278
## Life Exp   -0.35938747
## HS Grad    -0.41390656
## Frost      -0.46250383
## Area        0.50318069
## Assault     0.68722731
## UrbanPop    0.23961214
## Rape        1.00000000
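The merge step behind the output above is not shown. A minimal sketch follows, assuming the two data sets are matched on state name. Both carry a Murder column, but the values are separate measurements, so the shared column name is not a safe key. The `State` column and the `.x77` / `.arrests` suffixes are names chosen here for illustration, and the resulting numbers need not match the output printed above.

```r
# state.x77 is a matrix and USArrests a data frame; both use state names as row names.
x77 <- as.data.frame(state.x77)
x77$State <- rownames(x77)
arrests <- USArrests
arrests$State <- rownames(arrests)

# Match rows on state name; the two Murder columns receive distinguishing suffixes.
merged <- merge(x77, arrests, by = "State", suffixes = c(".x77", ".arrests"))

# Pairwise correlations among all numeric columns.
num <- merged[sapply(merged, is.numeric)]
cors <- cor(num)
round(cors, 2)
```

Keeping both Murder columns makes it possible to check how closely the two measurements agree before deciding which one to report.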
Supply comments to each code chunk in the following survey rmarkdown file and preview it as an R notebook or knit to html.
The data set Vocab{car} gives observations on gender, education and vocabulary, from respondents to U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender.
Vocab %>% xyplot(vocabulary ~ education | factor(year), groups = sex, data = ., type = c("p","g","r"), auto.key = list(columns = 2))
The MASS package has two data sets, Animals and mammals. Merge the two data sets and remove duplicated observations using duplicated().
## [1] 0
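The code that produced this output is not shown. One possible sketch, assuming a row counts as a duplicate when the same species name appears in both data sets; the `name` column and the object names are introduced here for illustration only.

```r
library(MASS)

# Copy species names from the row names into an ordinary column,
# so they can be compared after the two data sets are stacked.
a <- data.frame(name = rownames(Animals), body = Animals$body, brain = Animals$brain)
m <- data.frame(name = rownames(mammals), body = mammals$body, brain = mammals$brain)

both <- rbind(a, m)

# duplicated() flags the second and later occurrences of a name; keep the first.
dedup <- both[!duplicated(both$name), ]

# Number of duplicated names remaining after removal.
sum(duplicated(dedup$name))
```

Matching on name alone ignores spelling differences between the two data sets, so a species listed under two different names would still appear twice.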
Select at random one school per county in the data set Caschool{Ecdat} and draw a scatter diagram of average math score mathscr against average reading score readscr for the sampled data set. Make sure your results are reproducible (e.g., the same random sample will be drawn each time).
set.seed(2019)  # any fixed integer works; it makes the same schools come up on every run
Caschool %>% group_by(county) %>% sample_n(1) %>%
  xyplot(mathscr ~ readscr, type=c("p","g","r"), data = .)
Find 133 class-level 95% confidence intervals for the language test score means of the nlschools{MASS} data set by using the tidy approach. The tail end of the data object should look as follows:
classID language_mean language_lb language_ub
131 11.273 ... ...
132 10.550 ... ...
133 10.643 ... ...
nlschools %>%
  mutate(classID = as.integer(class)) %>%   # number the classes 1, 2, ... in factor-level order
  group_by(classID) %>%
  summarize(language_mean = mean(lang),
            # normal-approximation interval: mean +/- 1.96 * standard error of the mean
            language_lb = language_mean - 1.96*sd(lang)/sqrt(n()),
            language_ub = language_mean + 1.96*sd(lang)/sqrt(n())) %>%
  tail(.,3)
Use the Prestige{car} data set for this problem.
Find the median prestige score for each of the three types of occupation, respectively.
Use the median score in each type of occupation to define two levels of prestige: High and low, for each occupation, respectively. Summarize the relationship between income and education for each category generated from crossing the factor prestige with the type of occupation.
Prestige %>%
  group_by(type) %>%
  mutate(ptmed = median(prestige),                              # median prestige within each type
         ptlev = if_else(prestige > ptmed, "High", "Low")) %>%  # ties at the median count as Low
  xyplot(income ~ education | type, groups = ptlev, data = ., type = c("g","p","r"))
## Warning: Factor `type` contains implicit NA, consider using
## `forcats::fct_explicit_na`
Reverse the order of input to the series of dplyr::*_join examples using data from the Nobel laureates in literature and explain the resulting output.
## Joining, by = "Year"
Sort by Year in descending order so that the largest value comes first.
Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of the slopes are negative?
fL <- "http://www.amstat.org/publications/jse/datasets/sat.dat.txt"
dta <- read.table(fL, row.names=1)
names(dta) <- c("Spending", "PTR", "Salary", "PE", "Verbal", "Math", "SAT")
dta$Region <- state.division
head(dta)
Five divisions have negative slopes: West North Central, Mountain, New England, East South Central, West South Central
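The fitting step that leads to the list of divisions above is not shown. A minimal sketch follows, assuming a data frame shaped like `dta` with columns `SAT`, `Salary`, and `Region`. `division_slopes` is a hypothetical helper, and the toy data frame is made up purely to exercise it.

```r
# Hypothetical helper: fit SAT ~ Salary separately within each Region
# and return the estimated Salary slope for each one.
division_slopes <- function(d) {
  sapply(split(d, d$Region, drop = TRUE),
         function(g) coef(lm(SAT ~ Salary, data = g))[["Salary"]])
}

# Made-up data (not the SAT file): the slope is -1 in region A and +1 in region B.
toy <- data.frame(Region = rep(c("A", "B"), each = 3),
                  Salary = c(1, 2, 3, 1, 2, 3),
                  SAT    = c(3, 2, 1, 1, 2, 3))

slopes <- division_slopes(toy)
names(slopes)[slopes < 0]   # regions with a negative slope
sum(slopes < 0)             # how many there are
```

With the real data, the same helper would be called as `division_slopes(dta)`.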
The HELP (Health Evaluation and Linkage to Primary Care) study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomized to receive a multidisciplinary assessment and a brief motivational intervention or usual care, with the goal of linking them to primary medical care. Eligible subjects were adults, who spoke Spanish or English, reported alcohol, heroin or cocaine as their first or second drug of choice, resided in proximity to the primary care clinic to which they would be referred or were homeless. Subjects were interviewed at baseline during their detoxification stay and follow-up interviews were undertaken every 6 months for 2 years. A variety of continuous, count, discrete, and survival time predictors and outcomes were collected at each of these five occasions.