GEOG3023 Statistics and Geographic Data

Homework 2: Intro to R and Learning to Google

University of Colorado, Boulder

Kate Little

1/24/2022

Question 1 (25pts): Please find the date with the largest increase in deaths, the date with the largest increase in positive cases, and the date with the most tests

Hint: use which.max() to find the row with the maximum for each separately, then use [] to find the dates, and put the results in the comments

Include both your code and your answer!

library(dplyr) 
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr) 

ColoradoCovidRecent <- read_csv("C:/Users/klittle/Desktop/GEOG3023 CU boulder/Lab/lab2/ColoradoCovidRecent.csv") 
## Rows: 139 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Date
## dbl (7): dayNumber, positive, totalTestResults, death, positiveIncrease, tot...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
covid <- ColoradoCovidRecent

which.max(covid$deathIncrease) 
## [1] 118
covid[118,1]
## # A tibble: 1 x 1
##   Date      
##   <chr>     
## 1 12/27/2021
# Date of largest increase in deaths: 12/27/2021

which.max(covid$positiveIncrease) 
## [1] 130
covid[130,1]
## # A tibble: 1 x 1
##   Date      
##   <chr>     
## 1 01/08/2022
# Date of largest increase in positive cases: 01/08/2022

which.max(covid$totalTestResultsIncrease) 
## [1] 132
covid[132,1]
## # A tibble: 1 x 1
##   Date      
##   <chr>     
## 1 01/10/2022
# Date of largest increase in test results: 01/10/2022
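The three lookups above hard-code the row numbers returned by which.max(). An alternative is to index the Date column with which.max() directly and loop over the three column names. This is a minimal sketch: the data frame `covid_demo` and its values are made up for illustration, and on the real data `covid` would take its place.

```r
# Hypothetical stand-in for ColoradoCovidRecent: made-up values, same column names
covid_demo <- data.frame(
  Date = c("01/01/2022", "01/02/2022", "01/03/2022", "01/04/2022"),
  deathIncrease = c(5, 12, 3, 7),
  positiveIncrease = c(100, 250, 900, 400),
  totalTestResultsIncrease = c(1000, 1200, 800, 1500),
  stringsAsFactors = FALSE
)

# For each column, look up the Date on the row where that column peaks,
# so no row number has to be copied by hand
peak_dates <- sapply(
  c("deathIncrease", "positiveIncrease", "totalTestResultsIncrease"),
  function(col) covid_demo$Date[which.max(covid_demo[[col]])]
)
print(peak_dates)
```

which.max() ignores NA values and returns the first maximum when there are ties. A tie would therefore report only the earliest of the tied dates.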

Question 2 (25pts): In the data frame ‘covid’, please add a new column called ‘positivityRate’ to show the daily positive rate (positiveIncrease/totalTestResultsIncrease)

covid <- mutate(covid, positivityRate = positiveIncrease / totalTestResultsIncrease)
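One caveat applies if totalTestResultsIncrease is ever 0 on some day; this file has not been checked for that here. In that case the ratio becomes Inf (or NaN for 0/0), and ggplot2 drops those rows with a warning. This base-R sketch shows one way to guard the ratio. The vectors `pos` and `tests` are made-up values.

```r
# Made-up daily counts; the third day has no reported test results
pos <- c(120, 300, 50, 410)
tests <- c(1000, 1500, 0, 2050)

# Plain division gives Inf for the zero-test day (and NaN for 0/0)
raw_rate <- pos / tests

# Guarded version: days with zero or missing tests become NA instead
safe_rate <- ifelse(!is.na(tests) & tests > 0, pos / tests, NA_real_)
print(safe_rate)
```

Inside the existing mutate() call, the same guard could be written with dplyr's `if_else(totalTestResultsIncrease > 0, positiveIncrease / totalTestResultsIncrease, NA_real_)`.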

Question 3 (25pts): Based on the variable you added in Question 2, please plot:

#(a) the daily positivity rate over time

#(b) the daily deaths over time (this isn’t a new variable, it was already there)

# Then, in the comments, interpret the trends over time

# A few lines of comments can be used to interpret the trends. Remember, to insert a comment use a “#” at the start of the line

# Hint: Use dayNumber as your X variable in your plot! You can label your x-axis as “days since 9/1/2021”

library(ggplot2)
ggplot(data = covid, mapping = aes(x = dayNumber, y = positivityRate)) +
  geom_point(size = 1) +
  geom_smooth(colour = "red") +
  labs(title = "Covid Daily Positivity Rate Over Time", x = "Days Since 9/1/2021", y = "Positivity Rate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The daily Covid positivity rate appears to increase sharply after about day 100. It was also generally increasing over days 0 to 99, but at a much lower rate.
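"Increasing at a much lower rate" can be put into numbers by fitting a separate straight line to each period and comparing the slopes. This is a sketch on synthetic data: the values in `demo` are generated, not taken from the Colorado file. The day-100 cutoff comes from the interpretation above. On the real data, `covid` would replace `demo`.

```r
set.seed(1)
day <- 0:138
# Synthetic positivity rate: slow rise until day 100, faster rise afterwards
rate <- ifelse(day < 100, 0.05 + 0.0005 * day, 0.10 + 0.01 * (day - 100)) +
  rnorm(length(day), sd = 0.005)
demo <- data.frame(dayNumber = day, positivityRate = rate)

# Fit one linear trend per period; the dayNumber coefficient is the slope
early <- lm(positivityRate ~ dayNumber, data = subset(demo, dayNumber < 100))
late  <- lm(positivityRate ~ dayNumber, data = subset(demo, dayNumber >= 100))
slopes <- c(early = coef(early)[["dayNumber"]], late = coef(late)[["dayNumber"]])
print(slopes)
```

A straight-line fit per period is a deliberately simple summary. The loess curve in the plot remains the better picture of the overall shape.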

ggplot(covid, aes(dayNumber, death))+ 
  geom_point(size = 1)+ 
  geom_smooth(size = 0.5, colour = "red")+ 
  labs(title = "Total Covid Deaths", x = "Days Since 9/1/2021", y = "Total Deaths")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

As expected, the total number of Covid deaths has increased steadily since September 1, 2021. Starting around day 90, there appears to be a weekly pattern: few new deaths are reported for several days, followed by a jump the next week.
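The plot above uses the cumulative `death` column. The prompt's "daily deaths" could instead be read as new deaths per day, which is what `deathIncrease` holds. The two are related: daily new deaths are the day-to-day differences of the cumulative series. This sketch uses made-up counts to show that relationship. It assumes the rows are sorted by date.

```r
# Made-up cumulative death counts, one value per day
death <- c(100, 100, 100, 112, 130, 130, 131)

# Daily new deaths are the differences between consecutive days;
# the first day has no previous value, so it is NA
daily_deaths <- c(NA, diff(death))
print(daily_deaths)
```

The weekly reporting pattern described above would show up here as runs of zeros followed by a spike. Passing `deathIncrease` as the y variable in the same ggplot() call would plot that daily series directly.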

Question 4 (25pts): Make a histogram of the daily positivity rate

Hint: You’ll probably have to search Google to figure this out!

Try Googling: “ggplot how to make histogram”

Is the distribution left-skewed, right-skewed, or fairly symmetric? Answer in the comments

ggplot(covid, aes(positivityRate))+ 
  geom_histogram(binwidth = .025, 
                 color = "blue")+ 
  labs(title = "Covid Daily Positivity Rate Histogram", x = "Positivity Rate", y = "Count") 

which.max(covid$positivityRate)
## [1] 130
covid[130,9] 
## # A tibble: 1 x 1
##   positivityRate
##            <dbl>
## 1           2.41
covid[130, 1]
## # A tibble: 1 x 1
##   Date      
##   <chr>     
## 1 01/08/2022

The distribution of the positivity rate appears to be right-skewed, with an outlier above 2 on January 8, 2022. This date had the most new cases documented, jumping from 8,754 to 30,802 in one day. A rate above 1 means more new positive cases than new test results were reported that day, so this value cannot be a true positivity rate. It could be a data entry error. It could also reflect the spike in cases over the holidays, with those results reported on January 8.
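Skewness can also be checked numerically instead of only by eye. This sketch computes a moment-based sample skewness, where a positive value indicates a long right tail. It also flags values above 1. The helper `skewness()` is written here for illustration, and the `rates` vector is made up.

```r
# Made-up positivity rates with one extreme value, loosely mimicking the histogram
rates <- c(0.05, 0.06, 0.06, 0.07, 0.08, 0.09, 0.10, 0.12, 0.15, 2.00)

# Moment-based sample skewness: positive values indicate a right (long upper) tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
print(skewness(rates))

# Mean well above the median is another sign of right skew
print(c(mean = mean(rates), median = median(rates)))

# A true positivity rate cannot exceed 1, so values above 1 flag
# days where more positives than tests were reported
print(which(rates > 1))
```

On the real data, `skewness(covid$positivityRate)` and `which(covid$positivityRate > 1)` would be the corresponding calls.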

End! Once you’re done, make sure to save this script (File -> Save)

Then you can copy and paste your plots into a Word file, or save them as image files by clicking the “Export” button in the Plots pane

On Canvas, you’ll submit this file and your plots:

1. Daily positivity rate over time

2. Daily deaths over time

3. Your histogram of the daily positivity rate