GEOG3023 Statistics and Geographic Data
Homework 2: Intro to R and Learning to Google
University of Colorado, Boulder
Kate Little
1/24/2022
Question 1 (25pts): Find the date with the largest increase in deaths, the date with the largest increase in positive cases, and the date with the largest increase in tests
Hint: use which.max() to find the row with the maximum for each separately, then use [] to find the dates, and put the results in the comments
Include both your code and your answer!
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readr)
ColoradoCovidRecent <- read_csv("C:/Users/klittle/Desktop/GEOG3023 CU boulder/Lab/lab2/ColoradoCovidRecent.csv")
## Rows: 139 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Date
## dbl (7): dayNumber, positive, totalTestResults, death, positiveIncrease, tot...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
covid <- ColoradoCovidRecent
which.max(covid$deathIncrease)
## [1] 118
covid[118,1]
## # A tibble: 1 x 1
## Date
## <chr>
## 1 12/27/2021
#Date of highest deaths increase: 12/27/2021
which.max(covid$positiveIncrease)
## [1] 130
covid[130,1]
## # A tibble: 1 x 1
## Date
## <chr>
## 1 01/08/2022
#Date of highest positive cases increase: 01/08/2022
which.max(covid$totalTestResultsIncrease)
## [1] 132
covid[132,1]
## # A tibble: 1 x 1
## Date
## <chr>
## 1 01/10/2022
#Date of highest test results increase: 01/10/2022
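The lookups above take two steps: which.max() returns a row number, and [] then pulls the date from that row. As a minimal sketch, the two steps can be combined into one expression. The data frame `toy` and its values are made up for illustration and are not taken from ColoradoCovidRecent.csv.

```r
# Hypothetical toy data, not the real Colorado values
toy <- data.frame(
  Date = c("01/01/2022", "01/02/2022", "01/03/2022", "01/04/2022"),
  deathIncrease = c(3, 12, 7, 12),
  stringsAsFactors = FALSE
)

# which.max() returns the index of the FIRST maximum only
idx <- which.max(toy$deathIncrease)

# Indexing the Date column directly gives a plain character value
peak_date <- toy$Date[idx]
print(peak_date)

# which() with == returns every row that ties for the maximum
tied_dates <- toy$Date[which(toy$deathIncrease == max(toy$deathIncrease))]
print(tied_dates)
```

Because which.max() reports only the first maximum, the which() form is worth a quick check when two days might share the same peak value.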
Question 2 (25pts): In the data frame ‘covid’, please add a new column called ‘positivityRate’ to show the daily positive rate (positiveIncrease/totalTestResultsIncrease)
covid <- mutate(covid, positivityRate = positiveIncrease / totalTestResultsIncrease)
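A ratio of two daily counts can come out undefined (zero tests reported) or above 1 (more new positives than new tests recorded that day). The sketch below shows one possible way to flag such rows, using base R instead of mutate(). The data frame `toy`, its values, and the column name `rateIsValid` are hypothetical and not part of the assignment data.

```r
# Hypothetical toy data, not the real Colorado values
toy <- data.frame(
  positiveIncrease = c(50, 0, 120, 30),
  totalTestResultsIncrease = c(1000, 0, 100, 600)
)

# Base R equivalent of the mutate() call above
toy$positivityRate <- toy$positiveIncrease / toy$totalTestResultsIncrease

# 0 / 0 gives NaN in R; a rate above 1 is not a valid proportion.
# rateIsValid is an illustrative helper column, not from the source.
toy$rateIsValid <- is.finite(toy$positivityRate) & toy$positivityRate <= 1

print(toy)
```

Flagged rows are kept rather than dropped here, so they can be inspected before deciding how to treat them.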
Question 3 (25pts): Based on the variable you added in Question 2, please plot:
#(a) the daily positivity rate over time
#(b) the daily deaths over time (this isn’t a new variable, it was already there)
# Then, in the comments, interpret the trends over time
# A few lines of comments can be used to interpret the trends. Remember, to insert a comment use a “#” at the start of the line
# Hint: Use dayNumber as your X variable in your plot! You can label your x-axis as “days since 9/1/2021”
library(ggplot2)
ggplot(data = covid, mapping = aes(x = dayNumber, y = positivityRate)) +
  geom_point(size = 1) +
  geom_smooth(colour = "red") +
  labs(title = "Covid Daily Positivity Rate Over Time", x = "Days Since 9/1/2021", y = "Positivity Rate")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The daily Covid positivity rate increases throughout the period, but it rises much faster after about day 100 than it does during days 0-99.
ggplot(covid, aes(dayNumber, death)) +
  geom_point(size = 1) +
  geom_smooth(size = 0.5, colour = "red") +
  labs(title = "Total Covid Deaths", x = "Days Since 9/1/2021", y = "Total Deaths")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
As expected, the total number of Covid deaths has been steadily increasing since September 1st. Starting around day 90, a weekly pattern appears: few new deaths are reported for several days, followed by a jump the following week.
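The plot above uses the cumulative `death` column, so it can only go up. As a rough sketch of how a cumulative total relates to daily counts, diff() takes day-to-day differences. This assumes that a daily column such as `deathIncrease` is the change in `death` from one day to the next, which the source does not state. The vector `toy_death` is made up for illustration.

```r
# Hypothetical cumulative death totals for five consecutive days
toy_death <- c(100, 103, 103, 110, 112)

# diff() returns the change between consecutive values;
# pad with NA because the first day has no previous day to compare with
toy_daily <- c(NA, diff(toy_death))
print(toy_daily)

# A flat stretch in the cumulative total shows up as a zero daily count
flat_days <- which(toy_daily == 0)
print(flat_days)
```

Runs of zeros followed by a larger value in the daily series are what a stair-step pattern in the cumulative plot looks like once differenced.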
Question 4 (25pts): Make a histogram of the daily positivity rate
Hint: You’ll probably have to search Google to figure this out!
Try Googling: “ggplot how to make histogram”
Is the distribution left-skewed, right-skewed, or fairly symmetric? Answer in the comments
ggplot(covid, aes(positivityRate)) +
  geom_histogram(binwidth = .025,
                 color = "blue") +
  labs(title = "Covid Daily Positivity Rate Histogram", x = "Positivity Rate", y = "Count")
which.max(covid$positivityRate)
## [1] 130
covid[130,9]
## # A tibble: 1 x 1
## positivityRate
## <dbl>
## 1 2.41
covid[130, 1]
## # A tibble: 1 x 1
## Date
## <chr>
## 1 01/08/2022
The distribution of the positivity rate appears to be right-skewed, with an outlier above 2 on Jan 8, 2022. This date had the most new cases documented, jumping from 8,754 to 30,802 in one day. A rate above 1 means more new positive cases than new test results were recorded for that day, so this could be a data entry error, but it could also reflect a spike in cases over the holidays whose results were reported on Jan 8.
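One quick numeric check on the skew read from a histogram is to compare the mean with the median: a long right tail tends to pull the mean above the median. The sketch below uses a made-up vector `rates`. Only the value 2.41 comes from the output above; the other values are hypothetical.

```r
# Hypothetical positivity rates; only 2.41 appears in the real output
rates <- c(0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 2.41)

mean_rate <- mean(rates)
median_rate <- median(rates)
print(c(mean = mean_rate, median = median_rate))

# A mean well above the median is consistent with right skew
mean_above_median <- mean_rate > median_rate
print(mean_above_median)

# Repeat without the largest value to see how much one outlier matters
trimmed <- rates[rates < max(rates)]
print(c(mean = mean(trimmed), median = median(trimmed)))
```

This is a rule of thumb rather than a formal test, and with a single extreme value most of the gap between mean and median can come from that one point.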
End! Once you’re done, make sure to save this script (File -> Save)
Then you can copy and paste your plots into a Word file or save them as image files
by clicking the “Export” button in the Plots pane
On Canvas, you’ll submit this file and your 4 plots:
1. Daily positivity rate over time
2. Daily deaths over time
3. Your histogram of the daily positivity rate