Week 1 Homework Solutions

In questions 1-5, correct the code chunks, looking for common errors. The code chunks build upon one another, so make sure you’ve corrected and run each chunk before moving on to the next one. Hint: If you can’t find an error, look at Lesson 1 to find an example of correctly written code. The most common errors in programming are typos and other simple mistakes. They can be surprisingly hard to spot!

In questions 6-10, answer the questions by using your new skills to subset and explore data frames.

# calculate five plus seven
calculation1 <- 5 + 7

# calcalution1  ## UNCORRECTED: the object name was misspelled
calculation1

# calculate five plus seven

five <- 5
seven <- "7"
# calculation2 <- five + seven
calculation2 <- five + as.numeric(seven)

calculation2
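
The fix above works because seven was stored as text rather than as a number. As a minimal sketch using only base R, class() shows how R stores each object, which is a quick check to run before doing arithmetic. The name seven_num is introduced here only for illustration.

```r
# Check how R stores each object before doing arithmetic
five <- 5
seven <- "7"

class(five)   # "numeric"
class(seven)  # "character"

# Convert the text to a number, then add
seven_num <- as.numeric(seven)
calculation2 <- five + seven_num
calculation2  # 12
```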

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

### Import dataset of desegregation orders from 1957 to 2014 from ProPublica - find it in the data folder
# (source - https://www.propublica.org/datastore/dataset/school-desegregation-orders-data)
# all_deseg_pp <- read_csv(data/invol_data_propublica.csv) ## UNCORRECTED: the file path needs quotation marks
all_deseg_pp <- read_csv("data/invol_data_propublica.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   District.Name = col_character(),
##   City = col_character(),
##   State = col_character(),
##   Year.Lifted = col_character(),
##   Year.Placed = col_character()
## )

## View the columns and first rows of the data frame
glimpse(all_deseg_pp)

# create a data frame of open desegregation orders from the dataset imported in question 3
# open_deseg_pp <- subset(all_deseg_pp, Year_Lifted == "STILL OPEN") ## UNCORRECTED

##### CORRECTED
# use names() to see how each column name is spelled
names(all_deseg_pp)
# correct the name of the Year.Lifted column
open_deseg_pp <- subset(all_deseg_pp, Year.Lifted == "STILL OPEN")

# calculate how long the desegregation orders have been in effect, for all open desegregation orders
### Note: we are using 2014 in the equation below because this dataset was created in 2014.  Some of these orders may have closed since then.

# UNCORRECTED
# open_deseg_pp$duration <- 2014 - open_deseg_pp$Year.Placed  ## Note: expect red text about NAs for the correct answer - it is a warning, not an error

##### CORRECTED
open_deseg_pp$duration <- 2014 - as.numeric(open_deseg_pp$Year.Placed)

## Warning: NAs introduced by coercion

# the Year.Placed column was text; to perform a mathematical operation you need to convert it to numeric
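
The warning above appears when some text values cannot be read as numbers. The sketch below shows the behavior on a small made-up vector; the values in year_placed are hypothetical and are not taken from the ProPublica file.

```r
# Hypothetical example values, not taken from the ProPublica file
year_placed <- c("1970", "1965", "Unknown")

# Text that cannot be read as a number becomes NA, with a warning
year_num <- as.numeric(year_placed)

# Count how many values could not be converted
sum(is.na(year_num))

# Subtraction keeps the NA for those rows
duration <- 2014 - year_num
duration
```

Counting the NAs this way is a quick check on how many rows will have no duration.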

How many of the desegregation orders were still open in 2014?

# write your code here to answer question 6
pp_open_count <- nrow(open_deseg_pp)
pp_open_count # 330

What desegregation order that was still open in 2014 had been open longest?

# write the code here to answer question 7 - you can be very simple! 
View(open_deseg_pp)
## Sorted descending by the duration column to find that District of Columbia had a desegregation order from 1954 that was still open in 2014
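
Sorting in the viewer works well, but the same answer can be reached in code so that it is recorded in your script. This is a sketch on a made-up data frame: toy_open and its district names are hypothetical stand-ins for open_deseg_pp.

```r
# Hypothetical toy data standing in for open_deseg_pp
toy_open <- data.frame(
  District.Name = c("District A", "District B", "District C"),
  duration = c(44, 60, NA)
)

# order() with decreasing = TRUE puts the longest duration first; NAs go last
toy_sorted <- toy_open[order(toy_open$duration, decreasing = TRUE), ]
head(toy_sorted, 1)

# which.max() returns the row position of the largest value, ignoring NAs
toy_open[which.max(toy_open$duration), ]
```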

Is that desegregation order still open?

# write the code here to answer question 8.  There is another dataset in the data folder that will help you answer this question, a dataset of school districts under a desegregation order according to the Civil Rights Data Collection from 2017-18

# import the Civil Rights Data Collection dataset 
crdc <- read_csv("data/lea_deseg_CRDC_2017_18.csv")

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   LEA_ZIP = col_double(),
##   LEA_ENR = col_double(),
##   LEA_ENR_NONLEAFAC = col_double(),
##   LEA_SCHOOLS = col_double(),
##   LEA_PSENR_A2 = col_double(),
##   LEA_PSENR_A3 = col_double(),
##   LEA_PSENR_A4 = col_double(),
##   LEA_PSENR_A5 = col_double()
## )
## ℹ Use `spec()` for the full column specifications.

# look for DC in it
View(crdc)
## District of Columbia is not in the CRDC dataset, so the order was no longer in effect in the 2017-18 school year
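
Instead of scrolling through the viewer, you can also ask R whether any row matches. The sketch below uses a made-up data frame. toy_crdc and the LEA_NAME column are hypothetical, and it assumes that DC would appear as "DC" in the LEA_STATE column.

```r
# Hypothetical toy data standing in for crdc
toy_crdc <- data.frame(
  LEA_STATE = c("AL", "AL", "GA"),
  LEA_NAME = c("District A", "District B", "District C")
)

# Is any row from DC?
any(toy_crdc$LEA_STATE == "DC")

# How many rows are from DC?
nrow(subset(toy_crdc, LEA_STATE == "DC"))
```

If the state code could be written differently in the real file, it is worth also searching by district name before drawing a conclusion.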

How many open desegregation orders are there in Alabama, according to the ProPublica dataset and the CRDC dataset?

# write the code here to answer question 9

# Starting with question 9, it is good practice to write out the steps to answer the question, and then fill in the code

# create a data frame of open desegregation orders in Alabama in the ProPublica dataset
# View the al_pp data frame to ensure it is as expected
# count the number of rows
# create a data frame of open desegregation orders in Alabama in the CRDC dataset
# View the al_crdc data frame to ensure it is as expected
# count the number of rows

# create a data frame of open desegregation orders in Alabama
al_pp <- subset(open_deseg_pp, State == "AL")

# View the table to ensure it is as expected
View(al_pp) # you can also just click on the data frame in your Environment pane instead of writing out the function in your script

# count the number of rows
pp_al_count <- nrow(al_pp)
pp_al_count # 47 open orders in the ProPublica dataset

# create a data frame of open desegregation orders in Alabama in the CRDC dataset
al_crdc <- subset(crdc, LEA_STATE == "AL")

# View the al_crdc data frame to ensure it is as expected
View(al_crdc)

# count the number of rows
crdc_al_count <- nrow(al_crdc)
crdc_al_count # 18 open orders in the CRDC dataset

# These are very different numbers and would require digging to find out the true number

How many desegregation orders were closed in Texas between 1957 and 2014?

# write the code here to answer question 10

### Analysis steps ###
# create data frame of Texas deseg orders that are not open
# View the tx_closed_pp data frame to ensure it is as expected
# count the number of rows

# create data frame of Texas deseg orders that are not open
tx_closed_pp <- subset(all_deseg_pp, State == "TX" & Year.Lifted != "STILL OPEN")

# View the tx_closed_pp data frame to ensure it is as expected
View(tx_closed_pp)

# count the number of rows
tx_closed_count <- nrow(tx_closed_pp)
tx_closed_count # 36
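
The same count can be reached in more than one way. The sketch below compares three base R approaches on a made-up data frame. toy_deseg is hypothetical; it only borrows the State and Year.Lifted column names from all_deseg_pp.

```r
# Hypothetical toy data with the same column names as all_deseg_pp
toy_deseg <- data.frame(
  State = c("TX", "TX", "AL", "TX"),
  Year.Lifted = c("1985", "STILL OPEN", "STILL OPEN", "2001")
)

# Way 1: subset(), as used in this homework
n1 <- nrow(subset(toy_deseg, State == "TX" & Year.Lifted != "STILL OPEN"))

# Way 2: square-bracket indexing with a logical condition
is_tx_closed <- toy_deseg$State == "TX" & toy_deseg$Year.Lifted != "STILL OPEN"
n2 <- nrow(toy_deseg[is_tx_closed, ])

# Way 3: sum the logical condition directly (TRUE counts as 1)
n3 <- sum(is_tx_closed)

c(n1, n2, n3)
```

All three give the same number on this toy data. Checking one method against another is a simple way to catch mistakes.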

Things to notice:

Your data frames aren’t saved to your computer until you write them to your computer

Your R script is your recipe for your data analysis so that you can do it again

There are many ways to do each task
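
On the first point above, here is a minimal sketch of writing a data frame to disk with base R's write.csv(). toy_df is made up, and the temporary file path is used only so that the example leaves nothing behind; in a real project you would pick your own file path.

```r
# Hypothetical toy data frame
toy_df <- data.frame(State = c("AL", "TX"), open_orders = c(2, 1))

# Write to a temporary file so this example leaves nothing behind
out_path <- tempfile(fileext = ".csv")
write.csv(toy_df, out_path, row.names = FALSE)

# Read it back to confirm the file holds the same rows and columns
toy_back <- read.csv(out_path)
identical(dim(toy_back), dim(toy_df))
```

Since this homework loads the tidyverse, readr's write_csv() would be a natural alternative to write.csv().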