My project demonstrates the cumulative increase in COVID-19 cases and deaths from January 2020 through May 2022 across the United States (including the District of Columbia). I will create maps of the U.S. as a whole to illustrate the number of cases and deaths reported each year, as well as line charts of daily case and death counts that show which states had the highest and lowest numbers. I will also merge the COVID-19 data with state population data in R to create a line chart of case rates, which takes population into account. The overall goal of my project is to show the growth in the country's number of cases and deaths in 2020, 2021 and 2022. I retrieved my data for this project from the New York Times COVID-19 data set (https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv).
I predict that, as COVID-19 cases and deaths accumulate, the states with the most cases and deaths will be those with the most populous cities, especially states with large, densely populated urban areas, such as California and New York. Population density is defined as the number of people per unit of area (counties in this case). For case rates, I expect to see patterns among the states with the highest and lowest rates. Specifically, I hypothesize that traditionally “red” states, which tend to lean conservative, will have higher COVID-19 case rates and that traditionally “blue” states, which tend to lean liberal, will have lower COVID-19 case rates. The COVID-19 case rate is defined as “the number of cases of the disease caused by the novel coronavirus per 100,000 people” according to US News (https://www.usnews.com/news/best-states/coronavirus-data/covid-case-rate?active=37&chart_type=map).
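Written as a formula, the US News definition of the case rate (computed as `case_rate` in the R merge code, with a matching `death_rate` that uses deaths in place of cases) is:

```latex
\text{case rate} = \frac{\text{cases}}{\text{population}} \times 100{,}000
```

Because the NYT counts are cumulative, a state's case rate on a given date is its running total of cases per 100,000 residents as of that date.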
The NYT data site has detailed the following limitations in the dataset I used to complete my project: “On several occasions, officials have corrected information hours or days after first reporting it. At times, cases have disappeared from a local government database, or officials have moved a patient first identified in one state or county to another, often with no explanation. In those instances, which have become more common as the number of cases has grown, our team has made every effort to update the data to reflect the most current, accurate information while ensuring that every known case is counted. When the information is available, we count patients where they are being treated, not necessarily where they live. In most instances, the process of recording cases has been straightforward. But because of the patchwork of reporting methods for this data across more than 50 state and territorial governments and hundreds of local health departments, our journalists sometimes had to make difficult interpretations about how to count and record cases. For those reasons, our data will in some cases not exactly match with the information reported by states and counties.”
First, I created three maps, one for each year of the pandemic, to show the cumulative number of COVID-19 cases in each U.S. state and the District of Columbia.
Each map illustrates the cumulative number of COVID-19 cases per state. In 2020, California had the highest number at 2,307,860 cases, followed by Texas at 1,770,527 and Florida at 1,323,307. In 2021, California again had the highest number at 5,515,613 cases, followed by Texas at 4,574,881 and Florida at 4,166,392. Finally, through May 2022, California still had the highest number at 9,262,337 cases, followed by Texas at 6,758,486 and Florida at 5,941,724. These three states have consistently had the most COVID-19 cases over the course of the pandemic, with New York and Illinois consistently close behind.
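Because the NYT `us-states.csv` file stores running totals in its `cases` and `deaths` columns, each year's map value can be read off a state's last reported row in that year. The sketch below shows one way to get those year-end totals and the top three states per year with dplyr. It is an illustration, not the code behind the maps: the small `toy` data frame and its states A to D are hypothetical stand-ins for the real `covid` data.

```r
library(dplyr)

# Hypothetical stand-in for the NYT data: one row per state per date,
# with cumulative `cases` and `deaths` (the real file uses these column names).
toy <- data.frame(
  date   = as.Date(c("2020-12-30", "2020-12-31", "2021-12-31",
                     "2020-12-31", "2021-12-31",
                     "2020-12-31", "2021-12-31",
                     "2020-12-31", "2021-12-31")),
  state  = c("A", "A", "A", "B", "B", "C", "C", "D", "D"),
  cases  = c(90, 100, 250, 80, 300, 50, 120, 10, 20),
  deaths = c(1, 2, 5, 3, 6, 1, 2, 0, 1)
)

# Keep each state's last reported row in each calendar year; because the
# counts are cumulative, that row holds the running total at year's end.
year_end <- toy %>%
  mutate(year = as.integer(format(date, "%Y"))) %>%
  group_by(state, year) %>%
  slice_max(date, n = 1) %>%
  ungroup()

# Within each year, keep the three states with the most cumulative cases,
# sorted from highest to lowest.
top_cases <- year_end %>%
  group_by(year) %>%
  slice_max(cases, n = 3) %>%
  ungroup() %>%
  arrange(year, desc(cases))

print(top_cases)
```

Swapping `cases` for `deaths` in the last step gives the year-end ranking of deaths instead. `slice_max()` keeps ties by default, so a year could list more than three states if two totals were equal.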
Here, I created a chart that illustrates the daily number of COVID-19 cases by state.
This chart shows that the first COVID-19 case in the data set was reported in Washington state on January 21, 2020. It also confirms that New York, Florida, California and Texas consistently had the most COVID-19 cases throughout the pandemic. Since the spring of 2021, around the time of the nationwide vaccine rollout, there has been a consistent gap between these four states and the rest of the country. Finally, the chart shows that over the last few months, Vermont has had the fewest daily COVID-19 cases.
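The NYT file reports cumulative totals, so a chart of new cases reported each day needs one extra step: subtract each state's previous total from its current total. A minimal sketch of that differencing, assuming dplyr and a hypothetical `toy` data frame in the same shape as the NYT data:

```r
library(dplyr)

# Hypothetical cumulative counts for two states over three days.
toy <- data.frame(
  date  = as.Date("2020-03-01") + c(0, 1, 2, 0, 1, 2),
  state = c("A", "A", "A", "B", "B", "B"),
  cases = c(5, 12, 20, 1, 1, 4)
)

# Within each state, new cases on a date = that day's cumulative total minus
# the previous reported total; the first row uses 0 as the prior total.
daily <- toy %>%
  arrange(state, date) %>%
  group_by(state) %>%
  mutate(new_cases = cases - dplyr::lag(cases, default = 0)) %>%
  ungroup()

print(daily)
```

Because officials sometimes revise earlier counts (see the NYT limitations quoted above), a differenced series can occasionally dip below zero on the day a correction lands.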
Next, I created three maps, one for each year of the pandemic, to show the cumulative number of COVID-19 deaths in each U.S. state and the District of Columbia.
Each map illustrates the cumulative number of COVID-19 deaths per state. In 2020, New York had the highest number at 37,557 deaths, followed by Texas at 28,155 and California at 25,965. In 2021, California had the highest number at 76,709 deaths, followed closely by Texas at 76,062 and Florida at 62,504. Finally, through May 2022, California had the highest number at 90,541 deaths, followed by Texas at 88,324 and Florida at 74,010. Although New York fell out of the top three after 2020, it remained close behind as the state with the fourth-most COVID-19 deaths.
Here, I created a chart that illustrates the daily number of COVID-19 deaths by state.
Like the chart of daily COVID-19 cases, this chart of daily deaths shows the same four states (California, Texas, Florida and New York) consistently reporting the most COVID-19 deaths in the country. It also shows that New York had the most deaths in 2020 but was overtaken by California and Texas for the rest of the pandemic. As of May 3, 2022, California had the most cumulative deaths, at 90,541.
Here, I used R to merge the NYT data with 2010 Census population counts in order to create a chart of each state's case rate over the course of the pandemic since March 13, 2020.
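# tidycensus pulls Census population counts; tidyverse handles reading,
# joining, and writing the data.
library(tidycensus)
library(tidyverse)

# 2010 decennial Census total population by state (variable P001001).
# tidycensus needs a Census API key; set one once with census_api_key().
total_population_10 <- get_decennial(
  geography = "state",
  variables = "P001001",
  year = 2010
) %>%
  # Rename NAME to match the NYT "state" column, and value to Population.
  rename(state = NAME, Population = value)

# NYT cumulative COVID-19 cases and deaths by state and date.
covid <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

# Join on state name, then compute cumulative cases and deaths per 100,000
# residents. Rows that appear in only one table get NA rates.
merged_covid <- total_population_10 %>%
  full_join(covid, by = "state") %>%
  mutate(
    case_rate = (cases / Population) * 100000,
    death_rate = (deaths / Population) * 100000
  )

write_csv(merged_covid, "~/Desktop/covid.csv")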
Here, I created a line chart that illustrates each state's COVID-19 case rate over time, including only the four states with the highest case rates and the four with the lowest.
The chart shows that the states with the highest COVID-19 case rates are North Dakota, Alaska, Utah and South Carolina, while the lowest are Hawaii, Maryland, Maine and Oregon. The four highest are all “red” states that lean conservative politically, while the four lowest are “blue” states that lean liberal. Since November 2020, a gap has opened between the states with the highest and lowest case rates, and it has grown larger recently.
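The chart's selection step (four highest and four lowest case rates) can be sketched as follows. This assumes the ranking is based on each state's most recent case rate; the source does not say which date the ranking used, so that choice, the `toy` data frame, and the state names S1 to S9 are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `merged_covid`: nine states observed on two
# dates, with cumulative cases per 100,000 residents in `case_rate`.
toy <- data.frame(
  date      = rep(as.Date(c("2022-05-01", "2022-05-02")), each = 9),
  state     = rep(paste0("S", 1:9), times = 2),
  case_rate = c(1:9 * 100, 1:9 * 1000)
)

# Each state's most recent case rate (assumed ranking date); rows with a
# missing rate, e.g. from an unmatched join, are dropped first.
latest <- toy %>%
  filter(!is.na(case_rate)) %>%
  group_by(state) %>%
  slice_max(date, n = 1) %>%
  ungroup()

highest <- latest %>% slice_max(case_rate, n = 4) %>% pull(state)
lowest  <- latest %>% slice_min(case_rate, n = 4) %>% pull(state)

# Keep only those eight states and tag which group each belongs to.
chart_data <- toy %>%
  filter(state %in% c(highest, lowest)) %>%
  mutate(group = if_else(state %in% highest, "Highest four", "Lowest four"))

# One line per state over time; line type separates the two groups.
p <- ggplot(chart_data, aes(x = date, y = case_rate, colour = state, linetype = group)) +
  geom_line() +
  labs(x = "Date", y = "Cases per 100,000 residents",
       colour = "State", linetype = "Group")
```

With the real data, `toy` would be replaced by `merged_covid`, and `print(p)` or `ggsave()` would render the chart.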
Overall, the results support my hypothesis: states with densely populated metropolitan hubs (California, Texas, New York and Florida) consistently had the most reported COVID-19 cases and deaths throughout the pandemic. The states with the highest and lowest COVID-19 case rates also leaned conservative and liberal, respectively, as I predicted.
In my project, I demonstrated the cumulative increase in COVID-19 cases and deaths from January 2020 through May 2022 with yearly maps and daily line charts. The pandemic's first major hit came on March 13, 2020, when the United States declared a national emergency and much of the country began shutting down to stop the spread of the coronavirus. Despite this, cases continued to rise throughout the spring and slowed into the summer. The country experienced another surge when schools and colleges returned to hybrid or in-person learning, and again in the winter of 2020 to 2021. From early 2021 through the spring, people across the country began receiving their first and second vaccine doses, in groups based on need and risk. Cases fell as vaccines helped protect against the virus, rose again with the Delta variant in the summer of 2021, and surged with the emergence of the Omicron variant in December 2021. Overall, cases have gone down since the Omicron peak passed at the end of January, as the country enters the third year of a global pandemic. Researchers at Yale Medicine outline the variants that affected case and death numbers, along with the fluctuations throughout the pandemic, at this link: https://www.yalemedicine.org/news/covid-19-variants-of-concern-omicron
Among the most densely populated metropolitan areas in the United States, cities from each of the four states with the most COVID-19 cases and deaths rank in the top 20. According to World Atlas (https://www.worldatlas.com/articles/the-most-crowded-city-in-the-united-states.html), Los Angeles, California is the most densely populated metropolitan area, followed by New York, New York. Miami and Tampa, Florida rank 10th and 11th, and Houston, Texas ranks 19th, which supports my hypothesis.
In interpreting the COVID-19 case rate chart, case rates in general help show how far the virus has spread within a state. They are key when comparing states of different sizes, because a state with many cases but a large population can have a case rate similar to that of a state with fewer cases and a smaller population. North Dakota and Alaska had the two highest case rates; both are large in area but have small populations. By contrast, Hawaii and Maryland are both smaller in area but have larger populations than North Dakota and Alaska. The highest and lowest case rates also follow a trend by each state's overall political leaning. The states with the highest case rates (North Dakota, Alaska, Utah and South Carolina) are all historically “red” states with conservative state governments that practiced more relaxed masking policies throughout the pandemic and had fewer vaccinated residents. Meanwhile, the states with the lowest case rates (Hawaii, Maryland, Maine and Oregon) are all historically “blue” states with liberal state governments that practiced stricter masking policies throughout the pandemic and had more vaccinated residents.
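A worked example with two hypothetical states (illustrative numbers, not NYT data) shows how very different case counts can produce the same rate:

```latex
\begin{aligned}
\text{State X:}\quad & \frac{500{,}000 \text{ cases}}{10{,}000{,}000 \text{ residents}} \times 100{,}000 = 5{,}000 \text{ per } 100{,}000\\
\text{State Y:}\quad & \frac{50{,}000 \text{ cases}}{1{,}000{,}000 \text{ residents}} \times 100{,}000 = 5{,}000 \text{ per } 100{,}000
\end{aligned}
```

State X has ten times as many cases as State Y, yet reported cases make up the same share of each state's population.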