In questions 1-5, correct the code chunks, looking for common errors. The code chunks build upon one another, so make sure you’ve corrected and run each chunk before moving on to the next one. Hint: If you can’t find an error, look at Lesson 1 for an example of correctly written code. The most common errors in programming are typos and other simple mistakes, and they can be surprisingly hard to spot!
In questions 6-10, answer the questions by using your new skills to subset and explore data frames.
# calculate five plus seven
calculation1 <- 5 + 7
# calcalution1 ## UNCORRECTED (misspelled object name)
calculation1
# calculate five plus seven
five <- 5
seven <- "7"
# calculation2 <- five + seven
calculation2 <- five + as.numeric(seven)
calculation2
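The fix in question 2 works because `seven` is stored as text rather than as a number. A minimal sketch of how `class()` can be used to check how R stores an object before doing arithmetic with it; the object names mirror the ones above, and `total` is a name introduced here for illustration.

```r
five <- 5
seven <- "7"

# class() reports how R stores each object
class(five)   # "numeric"
class(seven)  # "character"

# convert the text to a number before adding
total <- five + as.numeric(seven)
total  # 12
```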
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
### Import the dataset of desegregation orders from 1957 to 2014 from ProPublica - find it in the data folder
# (source - https://www.propublica.org/datastore/dataset/school-desegregation-orders-data)
# all_deseg_pp <- read_csv(data/invol_data_propublica.csv)
all_deseg_pp <- read_csv("data/invol_data_propublica.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## District.Name = col_character(),
## City = col_character(),
## State = col_character(),
## Year.Lifted = col_character(),
## Year.Placed = col_character()
## )
## View the column and first rows of the data frame
glimpse(all_deseg_pp)
# create a data frame of open desegregation orders from the dataset imported in question 3
# open_deseg_pp <- subset(all_deseg_pp, Year_Lifted == "STILL OPEN") ## UNCORRECTED
##### CORRECTED
# use the names() to see how to spell each column
names(all_deseg_pp)
# correct the name of the Year.Lifted column
open_deseg_pp <- subset(all_deseg_pp, Year.Lifted == "STILL OPEN")
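Besides reading the output of `names()` by eye, you can test a spelling directly with `%in%`. The sketch below uses a small made-up data frame (`toy_deseg`, not the real ProPublica data) so that it runs on its own.

```r
# toy data frame standing in for all_deseg_pp (values are made up for illustration)
toy_deseg <- data.frame(
  District.Name = c("District A", "District B", "District C"),
  Year.Lifted = c("STILL OPEN", "1999", "STILL OPEN"),
  stringsAsFactors = FALSE
)

# check which spelling of the column actually exists
"Year_Lifted" %in% names(toy_deseg)  # FALSE
"Year.Lifted" %in% names(toy_deseg)  # TRUE

# subset with the correctly spelled column
toy_open <- subset(toy_deseg, Year.Lifted == "STILL OPEN")
nrow(toy_open)  # 2
```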
# calculate how long the desegregation orders have been in effect, for all open desegregation orders
### Note: we are using 2014 in the equation below because this dataset was created in 2014. Some of these orders may have closed since then.
# UNCORRECTED
# open_deseg_pp$duration <- 2014 - open_deseg_pp$Year.Placed ## Note: expect a red warning about NAs for the correct answer - it is a warning, not an error
##### CORRECTED
open_deseg_pp$duration <- 2014 - as.numeric(open_deseg_pp$Year.Placed)
## Warning: NAs introduced by coercion
# the Year.Placed column was text; to perform a mathematical operation you need to convert it to numeric
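The coercion warning means that at least one text value could not be read as a number. One way to see which values those are, sketched with a made-up vector (`year_placed` stands in for the real `Year.Placed` column, and "unknown" is an invented placeholder value):

```r
# toy vector standing in for Year.Placed (values are made up)
year_placed <- c("1970", "1965", "unknown")

# suppressWarnings() hides the expected "NAs introduced by coercion" warning
year_num <- suppressWarnings(as.numeric(year_placed))

# is.na() flags the entries that could not be converted
year_placed[is.na(year_num)]  # "unknown"

# arithmetic on an NA stays NA
duration <- 2014 - year_num
duration  # 44 49 NA
```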
# write your code here to answer question 6
pp_open_count <- nrow(open_deseg_pp)
pp_open_count # 330
# write the code here to answer question 7 - you can be very simple!
View(open_deseg_pp)
## Sorted descending by the duration column to find that District of Columbia had a desegregation order from 1954 that was still open in 2014
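Sorting in the viewer works, but the same step can be written in the script so that it is repeatable. A minimal sketch using base R's `order()` on made-up data (`toy_open` stands in for `open_deseg_pp`):

```r
# toy data standing in for open_deseg_pp (values are made up)
toy_open <- data.frame(
  District.Name = c("District A", "District B", "District C"),
  duration = c(44, 60, NA),
  stringsAsFactors = FALSE
)

# order() with decreasing = TRUE puts the longest duration first; NAs go last
toy_sorted <- toy_open[order(toy_open$duration, decreasing = TRUE), ]
toy_sorted$District.Name[1]  # "District B"
```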
# write the code here to answer question 8. There is another dataset in the data folder that will help you answer this question, a dataset of school districts under a desegregation order according to the Civil Rights Data Collection from 2017-18
# import the Civil Rights Data Collection dataset
crdc <- read_csv("data/lea_deseg_CRDC_2017_18.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## LEA_ZIP = col_double(),
## LEA_ENR = col_double(),
## LEA_ENR_NONLEAFAC = col_double(),
## LEA_SCHOOLS = col_double(),
## LEA_PSENR_A2 = col_double(),
## LEA_PSENR_A3 = col_double(),
## LEA_PSENR_A4 = col_double(),
## LEA_PSENR_A5 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
# look for DC in it
View(crdc)
## District of Columbia is not in the CRDC dataset - in the 2017-18 school year, the order was not standing
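Instead of scanning the viewer, you could filter for the state and count the matching rows. The sketch below runs on made-up data; `LEA_STATE` is the column used elsewhere in this script, while `LEA_NAME` and `toy_crdc` are hypothetical names used only for illustration.

```r
# toy data standing in for crdc; LEA_NAME is a hypothetical column name
toy_crdc <- data.frame(
  LEA_STATE = c("AL", "GA", "AL"),
  LEA_NAME = c("District A", "District B", "District C"),
  stringsAsFactors = FALSE
)

# filter by state abbreviation and count the matches
dc_rows <- subset(toy_crdc, LEA_STATE == "DC")
nrow(dc_rows)  # 0 means no DC districts in this toy data
```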
# write the code here to answer question 9
# Starting with question 9, it is good practice to write out the steps to answer the question, and then fill in the code
# create a data frame of open desegregation orders in Alabama in the ProPublica dataset
# View the al_pp data frame to ensure it is as expected
# count the number of rows
# create a data frame of open desegregation orders in Alabama in the CRDC dataset
# View the al_crdc data frame to ensure it is as expected
# count the number of rows
# create a data frame of open desegregation orders in Alabama in the ProPublica dataset
al_pp <- subset(open_deseg_pp, State == "AL")
# View the table to ensure it is as expected
View(al_pp) # you can also just click on the data frame in your Environment pane instead of writing out the function in your script
# count the number of rows
pp_al_count <- nrow(al_pp)
pp_al_count # 47 open orders in the ProPublica dataset
# create a data frame of open desegregation orders in Alabama in the CRDC dataset
al_crdc <- subset(crdc, LEA_STATE == "AL")
# View the al_crdc data frame to ensure it is as expected
View(al_crdc)
# count the number of rows
crdc_al_count <- nrow(al_crdc)
crdc_al_count # 18 open orders in the CRDC dataset
# These are very different numbers and would require digging to find out the true number
# write the code here to answer question 10
### Analysis steps ###
# create data frame of Texas deseg orders that are not open
# View the tx_closed_pp data frame to ensure it is as expected
# count the number of rows
# create data frame of Texas deseg orders that are not open
tx_closed_pp <- subset(all_deseg_pp, State == "TX" & Year.Lifted != "STILL OPEN")
# View the tx_closed_pp data frame to ensure it is as expected
View(tx_closed_pp)
# count the number of rows
tx_closed_count <- nrow(tx_closed_pp)
tx_closed_count # 36
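Question 10 combines two conditions in one `subset()` call. A small sketch of how `&` and `!=` behave, using made-up rows (`toy_deseg` is not the real dataset):

```r
# toy data standing in for all_deseg_pp (values are made up)
toy_deseg <- data.frame(
  State = c("TX", "TX", "AL", "TX"),
  Year.Lifted = c("1985", "STILL OPEN", "1990", "2002"),
  stringsAsFactors = FALSE
)

# & keeps rows where both conditions are TRUE; != means "not equal to"
toy_tx_closed <- subset(toy_deseg, State == "TX" & Year.Lifted != "STILL OPEN")
nrow(toy_tx_closed)  # 2
```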
Things to notice:
- Your data frames aren’t saved to your computer until you write them to a file
- Your R script is your recipe for your data analysis so that you can do it again
- There are many ways to do each task
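To illustrate the first point, here is a minimal sketch of writing a data frame to a CSV file with base R's `write.csv()` and reading it back. The data are made up, and `tempfile()` is used only so the example does not touch your project folder; in your own script you would choose a file name.

```r
# toy data frame (values are made up)
toy_open <- data.frame(
  District.Name = c("District A", "District B"),
  duration = c(44, 60),
  stringsAsFactors = FALSE
)

# write to a temporary file; row.names = FALSE avoids an extra index column
out_path <- tempfile(fileext = ".csv")
write.csv(toy_open, out_path, row.names = FALSE)

# read it back to confirm the file holds the same rows
toy_back <- read.csv(out_path, stringsAsFactors = FALSE)
nrow(toy_back)  # 2
```

With the tidyverse loaded, `write_csv()` from readr is another way to do the same thing.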