In this problem set, you will investigate objects and data patterns.
See R Markdown: The Definitive Guide, section 3.1 (LINK HERE), for help answering these questions.
Load the tidyverse package [code already provided]
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.4
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load the data frame object df_school_all; it is similar to the data frame df_school used in lecture but has more variables.
Use names() to see all the variables contained within df_school_all [code already provided].
Use table() to tabulate the total number of visits each school received [code already provided].
rm(list = ls()) # remove all objects before loading new dataset
getwd()
#> [1] "C:/Users/Debbie Feehs/Desktop/HED PhD/HED 696C- Data Management and Manipulation in R/rclass/modules/module2"
load(url("https://github.com/ksalazar3/HED696C_RClass/raw/master/data/recruiting/recruit_school_allvars.RData"))
#glimpse(df_school_all)
names(df_school_all)
#> [1] "state_code" "school_type" "ncessch"
#> [4] "name" "address" "city"
#> [7] "zip_code" "pct_white" "pct_black"
#> [10] "pct_hispanic" "pct_asian" "pct_amerindian"
#> [13] "pct_other" "num_fr_lunch" "total_students"
#> [16] "num_took_math" "num_prof_math" "num_took_rla"
#> [19] "num_prof_rla" "avgmedian_inc_2564" "latitude"
#> [22] "longitude" "visits_by_196097" "visits_by_186380"
#> [25] "visits_by_215293" "visits_by_201885" "visits_by_181464"
#> [28] "visits_by_139959" "visits_by_218663" "visits_by_100751"
#> [31] "visits_by_199193" "visits_by_110635" "visits_by_110653"
#> [34] "visits_by_126614" "visits_by_155317" "visits_by_106397"
#> [37] "visits_by_149222" "visits_by_166629" "total_visits"
#> [40] "inst_196097" "inst_186380" "inst_215293"
#> [43] "inst_201885" "inst_181464" "inst_139959"
#> [46] "inst_218663" "inst_100751" "inst_199193"
#> [49] "inst_110635" "inst_110653" "inst_126614"
#> [52] "inst_155317" "inst_106397" "inst_149222"
#> [55] "inst_166629"
table(df_school_all$total_visits)
#>
#> 0 1 2 3 4 5 6 7 8 9 10 11 12
#> 15405 2718 1324 671 395 263 152 107 89 57 31 25 19
#> 13 14 15 16 17 18 19 20 21 22 23 26
#> 16 8 7 3 1 2 1 1 1 1 3 1
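In the table() output above, each upper row lists a number of visits and the row beneath it lists how many schools received exactly that many. For example, 15,405 schools received no visits. The counts add up to 21,301, the same as the nrow() value reported later in this problem set, because every school is counted once. A minimal sketch on a hypothetical eight-school vector (values made up for illustration) shows the same reading:

```r
# Hypothetical total_visits values for eight made-up schools (illustrative only)
toy_total_visits <- c(0L, 0L, 1L, 0L, 3L, 1L, 0L, 2L)

# table() counts how many times each distinct value occurs
visit_table <- table(toy_total_visits)
print(visit_table)
# Top row: number of visits; bottom row: number of schools with that many visits

# Every school is counted exactly once, so the counts sum to the number of schools
sum(visit_table) == length(toy_total_visits)
```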
Answer the following questions about df_school_all by running the appropriate R commands within the code chunk and writing any substantive response required next to the question. The first question is answered for you as an example.
What type of object is df_school_all?
What is the length of df_school_all? What does this specific value of length refer to?
How many rows are in df_school_all? What does each row represent?
#type of df_school_all
typeof(df_school_all)
#> [1] "list"
#length of df_school_all
length(df_school_all)
#> [1] 55
#num of rows in df_school_all
nrow(df_school_all)
#> [1] 21301
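The three results above fit together: a data frame is stored as a list, length() counts the list elements (the columns), and nrow() counts the observations. Here the 55 returned by length() matches the 55 variable names printed by names(). A small sketch with a hypothetical four-school, three-column data frame (made-up values) shows the same pattern:

```r
# Hypothetical toy data frame: four made-up schools, three variables
toy_schools <- data.frame(
  state_code   = c("AK", "AL", "AL", "CA"),
  school_type  = c("public", "public", "private", "public"),
  total_visits = c(0L, 2L, 1L, 5L)
)

typeof(toy_schools)  # "list": a data frame is stored as a list of columns
length(toy_schools)  # 3: the number of list elements, i.e. columns (variables)
nrow(toy_schools)    # 4: the number of rows (observations)
```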
Use the str() function to describe the contents of df_school_all and then answer the following questions in text above the code chunk.
What does each element of df_school_all represent?
Are the elements of df_school_all lists or vectors?
Are the elements of df_school_all named or un-named? What do these element names refer to? (Hint: use names().)
str(df_school_all)
#> tibble [21,301 × 55] (S3: tbl_df/tbl/data.frame)
#> $ state_code : chr [1:21301] "AK" "AK" "AK" "AK" ...
#> $ school_type : chr [1:21301] "public" "public" "public" "public" ...
#> $ ncessch : chr [1:21301] "020000100208" "020000100211" "020000100212" "020000100213" ...
#> $ name : chr [1:21301] "Bethel Regional High School" "Ayagina'ar Elitnaurvik" "Kwigillingok School" "Nelson Island Area School" ...
#> $ address : chr [1:21301] "1006 Ron Edwards Memorial Dr" "106 Village Road" "108 Village Road" "118 Village Road" ...
#> $ city : chr [1:21301] "Bethel" "Kongiganak" "Kwigillingok" "Toksook Bay" ...
#> $ zip_code : chr [1:21301] "99559" "99559" "99622" "99637" ...
#> $ pct_white : num [1:21301] 11.78 0 0 0 2.52 ...
#> $ pct_black : num [1:21301] 0.599 0 0 0 0 ...
#> $ pct_hispanic : num [1:21301] 1.6 0 0 0 0 ...
#> $ pct_asian : num [1:21301] 0.998 0 0 0 0 ...
#> $ pct_amerindian : num [1:21301] 84.6 99.5 100 100 97.5 ...
#> $ pct_other : num [1:21301] 0.399 0.549 0 0 0 ...
#> $ num_fr_lunch : num [1:21301] 362 182 116 187 238 180 418 185 179 186 ...
#> $ total_students : num [1:21301] 501 182 120 201 238 231 428 262 179 186 ...
#> $ num_took_math : num [1:21301] 146 17 14 30 28 25 62 21 23 19 ...
#> $ num_prof_math : num [1:21301] 24.8 1.7 3.5 3 2.8 ...
#> $ num_took_rla : num [1:21301] 147 17 14 30 28 24 62 22 23 19 ...
#> $ num_prof_rla : num [1:21301] 25 1.7 3.5 3 2.8 ...
#> $ avgmedian_inc_2564: num [1:21301] 76160 76160 NA 57657 37553 ...
#> $ latitude : num [1:21301] 60.8 60 59.9 60.5 62.7 ...
#> $ longitude : num [1:21301] -162 -163 -163 -165 -165 ...
#> $ visits_by_196097 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_186380 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_215293 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_201885 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_181464 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_139959 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_218663 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_100751 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_199193 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_110635 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_110653 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_126614 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_155317 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_106397 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_149222 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ visits_by_166629 : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ total_visits : int [1:21301] 0 0 0 0 0 0 0 0 0 0 ...
#> $ inst_196097 : chr [1:21301] "NY" "NY" "NY" "NY" ...
#> $ inst_186380 : chr [1:21301] "NJ" "NJ" "NJ" "NJ" ...
#> $ inst_215293 : chr [1:21301] "PA" "PA" "PA" "PA" ...
#> $ inst_201885 : chr [1:21301] "OH" "OH" "OH" "OH" ...
#> $ inst_181464 : chr [1:21301] "NE" "NE" "NE" "NE" ...
#> $ inst_139959 : chr [1:21301] "GA" "GA" "GA" "GA" ...
#> $ inst_218663 : chr [1:21301] "SC" "SC" "SC" "SC" ...
#> $ inst_100751 : chr [1:21301] "AL" "AL" "AL" "AL" ...
#> $ inst_199193 : chr [1:21301] "NC" "NC" "NC" "NC" ...
#> $ inst_110635 : chr [1:21301] "CA" "CA" "CA" "CA" ...
#> $ inst_110653 : chr [1:21301] "CA" "CA" "CA" "CA" ...
#> $ inst_126614 : chr [1:21301] "CO" "CO" "CO" "CO" ...
#> $ inst_155317 : chr [1:21301] "KS" "KS" "KS" "KS" ...
#> $ inst_106397 : chr [1:21301] "AR" "AR" "AR" "AR" ...
#> $ inst_149222 : chr [1:21301] "IL" "IL" "IL" "IL" ...
#> $ inst_166629 : chr [1:21301] "MA" "MA" "MA" "MA" ...
typeof(df_school_all)
#> [1] "list"
names(df_school_all)
#> [1] "state_code" "school_type" "ncessch"
#> [4] "name" "address" "city"
#> [7] "zip_code" "pct_white" "pct_black"
#> [10] "pct_hispanic" "pct_asian" "pct_amerindian"
#> [13] "pct_other" "num_fr_lunch" "total_students"
#> [16] "num_took_math" "num_prof_math" "num_took_rla"
#> [19] "num_prof_rla" "avgmedian_inc_2564" "latitude"
#> [22] "longitude" "visits_by_196097" "visits_by_186380"
#> [25] "visits_by_215293" "visits_by_201885" "visits_by_181464"
#> [28] "visits_by_139959" "visits_by_218663" "visits_by_100751"
#> [31] "visits_by_199193" "visits_by_110635" "visits_by_110653"
#> [34] "visits_by_126614" "visits_by_155317" "visits_by_106397"
#> [37] "visits_by_149222" "visits_by_166629" "total_visits"
#> [40] "inst_196097" "inst_186380" "inst_215293"
#> [43] "inst_201885" "inst_181464" "inst_139959"
#> [46] "inst_218663" "inst_100751" "inst_199193"
#> [49] "inst_110635" "inst_110653" "inst_126614"
#> [52] "inst_155317" "inst_106397" "inst_149222"
#> [55] "inst_166629"
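The str() output lists each element with a type such as chr, num, or int and a length of 21,301, which is how atomic vectors are displayed. A minimal sketch on a hypothetical toy data frame (made-up values) checks the same properties directly: each element is an atomic vector holding one variable, and the element names are the column names.

```r
# Hypothetical toy data frame: four made-up schools, three variables
toy_schools <- data.frame(
  state_code   = c("AK", "AL", "AL", "CA"),
  school_type  = c("public", "public", "private", "public"),
  total_visits = c(0L, 2L, 1L, 5L)
)

# Each element is an atomic vector that holds one variable
sapply(toy_schools, is.atomic)   # all TRUE
sapply(toy_schools, class)       # "character" "character" "integer"

# The element names are the variable (column) names
names(toy_schools)

# [[ ]] pulls out a single element by name: a plain vector, not a list
is.list(toy_schools[["school_type"]])  # FALSE
```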
Next, investigate the variable school_type within the object df_school_all. Run the appropriate R commands in the chunk below and write substantive responses below each question.
What is the data type of school_type?
What is the length of school_type? What does this specific value of length refer to?
class(df_school_all$school_type)
#> [1] "character"
length(df_school_all$school_type)
#> [1] 21301
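The length of school_type (21,301) equals nrow(df_school_all), because every column of a data frame holds one value per row. A short sketch on a hypothetical toy data frame (made-up values) makes that explicit with lengths(), which returns the length of every column at once:

```r
# Hypothetical toy data frame: four made-up schools, three variables
toy_schools <- data.frame(
  state_code   = c("AK", "AL", "AL", "CA"),
  school_type  = c("public", "public", "private", "public"),
  total_visits = c(0L, 2L, 1L, 5L)
)

class(toy_schools$school_type)   # "character"
length(toy_schools$school_type)  # 4: one value per school (row)

# Every column in a data frame has the same length, equal to nrow()
lengths(toy_schools)             # 4 4 4, named by column
all(lengths(toy_schools) == nrow(toy_schools))  # TRUE
```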
The data frame df_school_all has one observation for each high school (public and private).
The variables visits_by_... identify how many off-campus recruiting visits the high school received from a particular public university. For example, UC Berkeley has the ID 110635, so the variable visits_by_110635 identifies how many visits the high school received from UC Berkeley. The variable total_visits identifies the number of visits the high school received from all 16 public research universities in this data collection sample.
For the questions below, imagine that you have been asked by a major news outlet to identify which high schools receive the largest total number of off-campus recruiting visits from public universities.
Use the head() function and explicitly tell R to print 10 observations.
Rename the variable avgmedian_inc_2564 to give it a shorter name: rename it to med_inc and assign the result back to the existing object df_school_all.
df_school_all <- rename(df_school_all, med_inc = avgmedian_inc_2564)
Visits from the University of Alabama are recorded in visits_by_100751. Compare the number of in-state public high schools to the number that received at least one visit from the University of Alabama.
Use the filter() and count() functions. The variables needed to filter by are state_code, school_type, and visits_by_100751.
You can wrap the count() function around the filter() function, or you can do this in two steps by creating a new data frame first.
df_school_alabama <- filter(df_school_all, school_type == "public", state_code == "AL", visits_by_100751 >= 1)
nrow(df_school_alabama)
#> [1] 108
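The two routes mentioned above give the same number in different forms: count() returns a one-row table with a column n (as in the tibble printed at the end of this problem set), while nrow() returns a bare integer. A minimal sketch on a hypothetical five-school data frame (made-up values), assuming dplyr is attached:

```r
library(dplyr)

# Hypothetical toy data: five made-up schools (illustrative values only)
toy_schools <- data.frame(
  state_code       = c("AL", "AL", "AL", "GA", "AL"),
  school_type      = c("public", "public", "private", "public", "public"),
  visits_by_100751 = c(2L, 0L, 1L, 3L, 1L)
)

# One step: wrap count() around filter(); the result has a single column n
count(filter(toy_schools, school_type == "public", state_code == "AL",
             visits_by_100751 >= 1))

# Two steps: save the filtered rows first, then count them with nrow()
toy_al <- filter(toy_schools, school_type == "public", state_code == "AL",
                 visits_by_100751 >= 1)
nrow(toy_al)  # 2
```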
Filter these schools further by pct_hispanic and pct_black.
df_School_alabama_.5his_.5blk <- filter(df_school_alabama, pct_hispanic >= .5, pct_black >= .5)
nrow(df_School_alabama_.5his_.5blk)
#> [1] 97
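The str() output shows the pct_ variables on a 0-100 scale (for example, pct_amerindian values of 84.6, 99.5, and 100), so a threshold of .5 keeps schools with at least half of one percent, not 50 percent. Which threshold the question intends is not stated here; the sketch below, on hypothetical made-up values with dplyr attached, only shows how the two readings differ. Conditions separated by commas inside filter() are combined with AND.

```r
library(dplyr)

# Hypothetical toy data: three made-up schools; percentages on a 0-100 scale
toy_al <- data.frame(
  name         = c("School A", "School B", "School C"),
  pct_hispanic = c(0.3, 4.2, 12.0),
  pct_black    = c(25.0, 0.8, 60.0)
)

# Comma-separated conditions are combined with AND.
# On a 0-100 scale, 0.5 means half of one percent, so this keeps B and C
filter(toy_al, pct_hispanic >= .5, pct_black >= .5)

# A 50 percent threshold has to be written as 50, not .5; this keeps only C
filter(toy_al, pct_black >= 50)
```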
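How many out-of-state public high schools received at least one visit from the University of Alabama? Use & or %in%.
count(filter(df_school_all, visits_by_100751 >= 1 & school_type == "public" & state_code != "AL"))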
#> # A tibble: 1 × 1
#> n
#> <int>
#> 1 1644
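The same out-of-state filter can be written either with & or with %in%. The sketch below uses a hypothetical five-school data frame (made-up values) and assumes dplyr is attached; both versions keep the same rows. %in% tests membership in a set, so it becomes more convenient when matching several values, e.g. state_code %in% c("GA", "FL").

```r
library(dplyr)

# Hypothetical toy data: five made-up schools (illustrative values only)
toy_schools <- data.frame(
  state_code       = c("AL", "GA", "TX", "AL", "FL"),
  school_type      = c("public", "public", "public", "public", "private"),
  visits_by_100751 = c(1L, 2L, 0L, 3L, 1L)
)

# Version 1: & joins the conditions; != "AL" keeps only out-of-state schools
out_amp <- filter(toy_schools,
                  visits_by_100751 >= 1 & school_type == "public" & state_code != "AL")

# Version 2: %in% tests membership in a set; ! negates it to exclude Alabama
out_in <- filter(toy_schools,
                 visits_by_100751 >= 1,
                 school_type %in% "public",
                 !(state_code %in% "AL"))

count(out_amp)               # n = 1 (the GA school)
identical(out_amp, out_in)   # TRUE
```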
Once finished, knit to HTML and upload both the .Rmd and HTML files.
Remember to use the naming convention “lastname_module2_ps”.