Wherever I look, I can’t seem to escape the narrative of a widespread “anti-mask” movement. This informal analysis sets out to shed some light on the following questions:
1) Is the open refusal to wear masks actually widespread, or is this being overblown?
2) If this variation in mask adoption does exist, is it correlated with support for President Trump?
Some of the datasets used in this analysis are citizen-sourced (from Kaggle), so any conclusions made should be taken with a grain of salt. Either way, I hope this provides a good showcase of my ability to clean and combine different datasets, and make inferences using linear models.
The dataset that initially motivated this analysis was compiled and estimated by Josh Katz, Margot Sanger-Katz and Kevin Quealy of the New York Times. Here is the README they included in their Github repo:
This data comes from a large number of interviews conducted online by the global data and survey firm Dynata at the request of The New York Times. The firm asked a question about mask use to obtain 250,000 survey responses between July 2 and July 14, enough data to provide estimates more detailed than the state level. (Several states have imposed new mask requirements since the completion of these interviews.) Specifically, each participant was asked: How often do you wear a mask in public when you expect to be within six feet of another person?
The methodology used by the researchers to obtain their estimates is also included:
To transform raw survey responses into county-level estimates, the survey data was weighted by age and gender, and survey respondents’ locations were approximated from their ZIP codes. Then estimates of mask-wearing were made for each census tract by taking a weighted average of the 200 nearest responses, with closer responses getting more weight in the average. These tract-level estimates were then rolled up to the county level according to each tract’s total population.
By rolling the estimates up to counties, it reduces a lot of the random noise that is seen at the tract level. In addition, the shapes in the map are constructed from census tracts that have been merged together — this helps in displaying a detailed map, but is less useful than county-level in analyzing the data.
setwd("~/R Data Science/Winter Break Project/covid-19-data")
mask <- read.csv("./mask-use/mask-use-by-county.csv")
head(mask)
## COUNTYFP NEVER RARELY SOMETIMES FREQUENTLY ALWAYS
## 1 1001 0.053 0.074 0.134 0.295 0.444
## 2 1003 0.083 0.059 0.098 0.323 0.436
## 3 1005 0.067 0.121 0.120 0.201 0.491
## 4 1007 0.020 0.034 0.096 0.278 0.572
## 5 1009 0.053 0.114 0.180 0.194 0.459
## 6 1011 0.031 0.040 0.144 0.286 0.500
I prefer to see percents represented as whole numbers, so I multiplied all responses by 100. As the summary statistics show, in the majority of US counties, over 50 percent of the population always wears a mask. However, if we look at lower levels of mask adoption, there is plenty of variation to be explored.
The histograms below represent the frequency of counties that fall at each percent-level of the response. So if we look at the “never” histogram, it looks like most counties are clustered around 0-10 percent, meaining that in most US counties, only 0-10 percent of the county population say they never wear a mask. This histogram is skewed right, with one county having 43.2 percent of respondents saying they never wear a mask.
#change range to be within [0,100] instead of [0,1]
mask[,-1] <- sapply(mask[,-1], function(x) x*100)
sapply(mask[,-1],summary)
## NEVER RARELY SOMETIMES FREQUENTLY ALWAYS
## Min. 0.000000 0.000000 0.1000 2.90000 11.50000
## 1st Qu. 3.400000 4.000000 7.9000 16.40000 39.32500
## Median 6.800000 7.300000 11.5000 20.40000 49.70000
## Mean 7.993953 8.291852 12.1318 20.77247 50.80936
## 3rd Qu. 11.300000 11.500000 15.6000 24.70000 61.37500
## Max. 43.200000 38.400000 42.2000 54.90000 88.90000
hist.data.frame(mask[, -1], mtitl = "Share of County-level Responses")
The next dataset to be read in was sourced from the USDA Economic Research Service. I was interested in exploring levels of educational attainment to see if they correlated with mask usage. This dataset represents a 5-year average of educational attainment levels by county from 2014 to 2018.
In order to merge the two datasets by county ID, I had to add leading zeros to the USDA county ID column. Then I removed columns that would I would not use in the analysis.
I also decided to combine the “FREQUENTLY” and “ALWAYS” responses in the NYT survey to create a category I called “wearers”. Based my interpretation of the survey categories, this represents the percentage of a county’s population that can confidently say they wear a mask over 50 percent of the time they are within 6 feet of someone.
#reading in the county-level education data
education <- read_excel("~/R Data Science/Winter Break Project/education.xls",
col_names = FALSE)
## New names:
## * `` -> ...1
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * ...
#formatting
colnames(education) <- education[5,]
education <- education[-c(1:5),]
education <- as.data.frame(education)
str(education)
## 'data.frame': 3283 obs. of 47 variables:
## $ FIPS Code : chr "00000" "01000" "01001" "01003" ...
## $ State : chr "US" "AL" "AL" "AL" ...
## $ Area name : chr "United States" "Alabama" "Autauga County" "Baldwin County" ...
## $ 2003 Rural-urban Continuum Code : chr NA NA "2" "4" ...
## $ 2003 Urban Influence Code : chr NA NA "2" "5" ...
## $ 2013 Rural-urban Continuum Code : chr NA NA "2" "3" ...
## $ 2013 Urban Influence Code : chr NA NA "2" "2" ...
## $ Less than a high school diploma, 1970 : chr "52373312" "1062306" "6611" "18726" ...
## $ High school diploma only, 1970 : chr "34158051" "468269" "3757" "8426" ...
## $ Some college (1-3 years), 1970 : chr "11650730" "136287" "933" "2334" ...
## $ Four years of college or higher, 1970 : chr "11717266" "141936" "767" "2038" ...
## $ Percent of adults with less than a high school diploma, 1970 : chr "47.700000000000003" "58.700000000000003" "54.799999999999997" "59.399999999999999" ...
## $ Percent of adults with a high school diploma only, 1970 : chr "31.100000000000001" "25.899999999999999" "31.100000000000001" "26.699999999999999" ...
## $ Percent of adults completing some college (1-3 years), 1970 : chr "10.6" "7.5" "7.7000000000000002" "7.4000000000000004" ...
## $ Percent of adults completing four years of college or higher, 1970 : chr "10.699999999999999" "7.7999999999999998" "6.4000000000000004" "6.5" ...
## $ Less than a high school diploma, 1980 : chr "44535197" "964840" "7074" "18125" ...
## $ High school diploma only, 1980 : chr "45947035" "704207" "6145" "15380" ...
## $ Some college (1-3 years), 1980 : chr "20794975" "278205" "2104" "6602" ...
## $ Four years of college or higher, 1980 : chr "21558480" "270063" "2117" "5498" ...
## $ Percent of adults with less than a high school diploma, 1980 : chr "33.5" "43.5" "40.600000000000001" "39.700000000000003" ...
## $ Percent of adults with a high school diploma only, 1980 : chr "34.600000000000001" "31.800000000000001" "35.200000000000003" "33.700000000000003" ...
## $ Percent of adults completing some college (1-3 years), 1980 : chr "15.699999999999999" "12.5" "12.1" "14.5" ...
## $ Percent of adults completing four years of college or higher, 1980 : chr "16.199999999999999" "12.199999999999999" "12.1" "12.1" ...
## $ Less than a high school diploma, 1990 : chr "39343718" "843638" "6252" "17309" ...
## $ High school diploma only, 1990 : chr "47642763" "749591" "6671" "20544" ...
## $ Some college or associate's degree, 1990 : chr "39571702" "553512" "4912" "15900" ...
## $ Bachelor's degree or higher, 1990 : chr "32310253" "399228" "3026" "10870" ...
## $ Percent of adults with less than a high school diploma, 1990 : chr "24.800000000000001" "33.100000000000001" "30" "26.800000000000001" ...
## $ Percent of adults with a high school diploma only, 1990 : chr "30" "29.399999999999999" "32" "31.800000000000001" ...
## $ Percent of adults completing some college or associate's degree, 1990 : chr "24.899999999999999" "21.699999999999999" "23.5" "24.600000000000001" ...
## $ Percent of adults with a bachelor's degree or higher, 1990 : chr "20.300000000000001" "15.699999999999999" "14.5" "16.800000000000001" ...
## $ Less than a high school diploma, 2000 : chr "35715625" "714081" "5872" "17258" ...
## $ High school diploma only, 2000 : chr "52168981" "877216" "9332" "28428" ...
## $ Some college or associate's degree, 2000 : chr "49864428" "746495" "7413" "28178" ...
## $ Bachelor's degree or higher, 2000 : chr "44462605" "549608" "4972" "22146" ...
## $ Percent of adults with less than a high school diploma, 2000 : chr "19.600000000000001" "24.699999999999999" "21.300000000000001" "18" ...
## $ Percent of adults with a high school diploma only, 2000 : chr "28.600000000000001" "30.399999999999999" "33.799999999999997" "29.600000000000001" ...
## $ Percent of adults completing some college or associate's degree, 2000 : chr "27.399999999999999" "25.899999999999999" "26.899999999999999" "29.300000000000001" ...
## $ Percent of adults with a bachelor's degree or higher, 2000 : chr "24.399999999999999" "19" "18" "23.100000000000001" ...
## $ Less than a high school diploma, 2014-18 : chr "26948057" "470043" "4204" "14310" ...
## $ High school diploma only, 2014-18 : chr "59265308" "1020172" "12119" "40579" ...
## $ Some college or associate's degree, 2014-18 : chr "63365655" "987148" "10552" "46025" ...
## $ Bachelor's degree or higher, 2014-18 : chr "68867051" "822595" "10291" "46075" ...
## $ Percent of adults with less than a high school diploma, 2014-18 : chr "12.300000000000001" "14.199999999999999" "11.300000000000001" "9.6999999999999993" ...
## $ Percent of adults with a high school diploma only, 2014-18 : chr "27.100000000000001" "30.899999999999999" "32.600000000000001" "27.600000000000001" ...
## $ Percent of adults completing some college or associate's degree, 2014-18: chr "29" "29.899999999999999" "28.399999999999999" "31.300000000000001" ...
## $ Percent of adults with a bachelor's degree or higher, 2014-18 : chr "31.5" "24.899999999999999" "27.699999999999999" "31.300000000000001" ...
#add leading zeros to county ID in mask dataset
mask$COUNTYFP <- as.character(mask$COUNTYFP)
for(i in 1:length(mask$COUNTYFP)){
if(nchar(mask$COUNTYFP[i]) < 5){
mask$COUNTYFP[i] <- paste("0", mask$COUNTYFP[i], sep = "")
}
}
#merging the datasets by county code
merged <- merge(mask, education, by.x = "COUNTYFP", by.y = "FIPS Code", all.x = TRUE)
#removing unnecessary columns
merged <- merged[,c(1:8,45:52)]
head(merged)
## COUNTYFP NEVER RARELY SOMETIMES FREQUENTLY ALWAYS State Area name
## 1 01001 5.3 7.4 13.4 29.5 44.4 AL Autauga County
## 2 01003 8.3 5.9 9.8 32.3 43.6 AL Baldwin County
## 3 01005 6.7 12.1 12.0 20.1 49.1 AL Barbour County
## 4 01007 2.0 3.4 9.6 27.8 57.2 AL Bibb County
## 5 01009 5.3 11.4 18.0 19.4 45.9 AL Blount County
## 6 01011 3.1 4.0 14.4 28.6 50.0 AL Bullock County
## Less than a high school diploma, 2014-18 High school diploma only, 2014-18
## 1 4204 12119
## 2 14310 40579
## 3 4901 6486
## 4 2650 7471
## 5 7861 13489
## 6 1760 2817
## Some college or associate's degree, 2014-18
## 1 10552
## 2 46025
## 3 4566
## 4 3846
## 5 13267
## 6 1582
## Bachelor's degree or higher, 2014-18
## 1 10291
## 2 46075
## 3 2220
## 4 1813
## 5 5010
## 6 945
## Percent of adults with less than a high school diploma, 2014-18
## 1 11.300000000000001
## 2 9.6999999999999993
## 3 27
## 4 16.800000000000001
## 5 19.800000000000001
## 6 24.800000000000001
## Percent of adults with a high school diploma only, 2014-18
## 1 32.600000000000001
## 2 27.600000000000001
## 3 35.700000000000003
## 4 47.299999999999997
## 5 34
## 6 39.700000000000003
## Percent of adults completing some college or associate's degree, 2014-18
## 1 28.399999999999999
## 2 31.300000000000001
## 3 25.100000000000001
## 4 24.399999999999999
## 5 33.5
## 6 22.300000000000001
## Percent of adults with a bachelor's degree or higher, 2014-18
## 1 27.699999999999999
## 2 31.300000000000001
## 3 12.199999999999999
## 4 11.5
## 5 12.6
## 6 13.300000000000001
#sum the share of county pop. that wears mask frequently and always
merged$wearers <- (merged$ALWAYS + merged$FREQUENTLY)
hist(merged$wearers, main = "Mask Adoption by County", xlab = "Share of county that wears masks 'Frequently' or 'Always'")
summary(merged$wearers)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 25.50 62.30 72.20 71.58 81.90 99.20
#change column classes to numeric
merged[,9:ncol(merged)] <- apply(merged[,9:ncol(merged)], 2, as.numeric)
Already, we can see from the histogram above that the notion of a widespread anti-mask movement is somewhat sensationalized. In 75 percent of US counties, at least 60 percent of county residents wear a mask frequently or always when within 6 feet of others. Further, the median percentage of county residents who are “mask wearers” is 72 percent.
First, I ran the correlations between the new variable “mask wearers” and the remaining variables in the dataset.
cor(merged[,13:(ncol(merged)-1)], merged[,"wearers"], use = "complete.obs")
## [,1]
## Percent of adults with less than a high school diploma, 2014-18 0.01303711
## Percent of adults with a high school diploma only, 2014-18 -0.35189409
## Percent of adults completing some college or associate's degree, 2014-18 -0.18278990
## Percent of adults with a bachelor's degree or higher, 2014-18 0.36052924
To further understand the relationships between mask “wearers” and education levels, I created scatterplots overlaid with a simple linear model.
#renaming columns to make them easier to work with
colnames(merged)[13:16] <- c("%less.HS", "%only.HS", "%some.college", "%bachelors.or.higher")
# % Bachelor's Degree EDA
plot1 <- plot(merged$`%bachelors.or.higher`, merged$wearers, xlab = "% of adults with Bachelor's degree or higher", ylab = "% mask wearers", main = "Mask wearers vs. % Bachelor's Degree or above")
plot1
## NULL
#simple linear model
lm1 <- lm(wearers ~ `%bachelors.or.higher`, data = merged)
summary(lm1)
##
## Call:
## lm(formula = wearers ~ `%bachelors.or.higher`, data = merged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.849 -7.797 0.864 9.151 29.810
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 60.77509 0.54455 111.61 <2e-16 ***
## `%bachelors.or.higher` 0.50093 0.02313 21.66 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.23 on 3140 degrees of freedom
## Multiple R-squared: 0.13, Adjusted R-squared: 0.1297
## F-statistic: 469.1 on 1 and 3140 DF, p-value: < 2.2e-16
abline(lm1)
# %only HS degree EDA
plot2 <- plot(merged$`%only.HS`, merged$wearers, xlab = "% of adults only HS degree", ylab = "% mask wearers", main = "Mask wearers vs. % Only HS Degree")
plot2
## NULL
lm2 <- lm(wearers ~ `%only.HS`, data = merged)
summary(lm2)
##
## Call:
## lm(formula = wearers ~ `%only.HS`, data = merged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.247 -7.857 0.908 9.084 32.485
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 93.57306 1.06663 87.73 <2e-16 ***
## `%only.HS` -0.64139 0.03045 -21.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.27 on 3140 degrees of freedom
## Multiple R-squared: 0.1238, Adjusted R-squared: 0.1236
## F-statistic: 443.8 on 1 and 3140 DF, p-value: < 2.2e-16
abline(lm2)
#try MLR model
lm3 <- lm(wearers ~ `%bachelors.or.higher` + `%only.HS`, data = merged)
summary(lm3)
##
## Call:
## lm(formula = wearers ~ `%bachelors.or.higher` + `%only.HS`, data = merged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -43.302 -7.671 1.049 9.043 29.351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.33684 2.30394 33.133 < 2e-16 ***
## `%bachelors.or.higher` 0.30551 0.03630 8.415 < 2e-16 ***
## `%only.HS` -0.33091 0.04762 -6.948 4.48e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.14 on 3139 degrees of freedom
## Multiple R-squared: 0.1432, Adjusted R-squared: 0.1426
## F-statistic: 262.2 on 2 and 3139 DF, p-value: < 2.2e-16
So educational levels yield significant coefficients in the multiple linear regression model, but also a low adjusted R-squared value. A one percent increase in the percentage of a county with a Bachelor’s degree or higher is correlated with a .305 percent increase in mask adoption. Similarly, a one percent increase in the percentage of a county with only a high school degree is correlated with a .3309 percent decrease in mask adoption. However this only explains about 14% of the total variation in mask adoption across counties.
I would be remiss to not explore the relationship between the 2020 presidential election results and the percentage of conscientious mask-wearers in a county. After some searching on Kaggle, I came across a dataset that had what I was looking for. The dataset was compiled by the user Ethan Schacht.
This gives us a rough idea of the 2020 election vote share by county but there are some oddities in the dataset and Schact does not go into detail regarding where he sourced this dataset. For some reason, there are 4867 rows despite there only being about 3142 counties in the US. Additionally, many of the columns have about 1725 NAs.
I will still clean and analyze the dataset, but any information gained should be taken in the context of this user-sourced dataset.
#read in new dataset
us <- read.csv("./county_statistics.csv")
us <- us[,-1]
dim(us)
## [1] 4867 50
head(us)
## county state percentage16_Donald_Trump percentage16_Hillary_Clinton
## 1 Abbeville SC 0.629 0.346
## 2 Acadia LA 0.773 0.206
## 3 Accomack VA 0.545 0.428
## 4 Ada ID 0.479 0.387
## 5 Adair IA 0.653 0.300
## 6 Adair KY 0.806 0.161
## total_votes16 votes16_Donald_Trump votes16_Hillary_Clinton
## 1 10724 6742 3712
## 2 27386 21159 5638
## 3 15755 8582 6737
## 4 195587 93748 75676
## 5 3759 2456 1127
## 6 8231 6637 1323
## percentage20_Donald_Trump percentage20_Joe_Biden total_votes20
## 1 0.661 0.330 12433
## 2 0.795 0.191 28425
## 3 0.542 0.447 16938
## 4 0.504 0.465 259389
## 5 0.697 0.286 4183
## 6 0.830 0.159 8766
## votes20_Donald_Trump votes20_Joe_Biden lat long cases deaths
## 1 8215 4101 34.22333 -82.46171 805 17
## 2 22596 5443 30.29506 -92.41420 3182 102
## 3 9172 7578 37.76707 -75.63235 1227 19
## 4 130699 120539 43.45266 -116.24155 17451 181
## 5 2917 1197 41.33076 -94.47106 222 1
## 6 7275 1391 37.10460 -85.28130 517 22
## TotalPop Men Women Hispanic White Black Native Asian Pacific
## 1 24788 12044 12744 1.3 68.9 27.6 0.1 0.3 0.0
## 2 62607 30433 32174 2.4 77.5 17.6 0.1 0.1 0.0
## 3 32840 16079 16761 8.8 60.3 28.3 0.3 0.7 0.0
## 4 435117 217999 217118 7.9 85.2 1.2 0.4 2.6 0.1
## 5 7192 3552 3640 1.7 96.6 0.3 0.0 0.4 0.0
## 6 19304 9632 9672 1.8 93.4 3.6 0.1 0.1 0.0
## VotingAgeCitizen Income IncomeErr IncomePerCap IncomePerCapErr Poverty
## 1 19452 35254 2259 19234 799 22.7
## 2 45197 40492 2544 21591 1002 21.5
## 3 24408 42260 2253 24266 1564 19.8
## 4 316189 60151 1294 31642 725 11.8
## 5 5572 49477 2633 28861 2055 9.5
## 6 15280 36575 3426 18408 1010 21.5
## ChildPoverty Professional Service Office Construction Production Drive
## 1 32.1 27.2 20.7 20.8 10.6 20.7 78.3
## 2 27.6 27.6 16.9 25.7 15.0 14.8 83.2
## 3 31.8 31.1 17.7 18.8 15.1 17.3 80.0
## 4 13.1 43.0 16.6 25.0 6.9 8.4 80.7
## 5 12.1 28.2 16.9 20.0 17.3 17.6 77.9
## 6 27.1 28.5 15.9 19.7 12.2 23.8 84.5
## Carpool Transit Walk OtherTransp WorkAtHome MeanCommute Employed PrivateWork
## 1 11.1 0.5 1.8 1.8 6.5 25.8 9505 78.8
## 2 10.3 0.2 1.6 2.2 2.5 27.6 24982 80.0
## 3 10.6 0.5 2.6 1.8 4.5 22.0 13837 74.6
## 4 7.7 0.5 1.5 2.8 6.9 20.4 214984 78.3
## 5 12.4 0.3 2.8 0.4 6.2 22.3 3680 73.8
## 6 9.0 0.0 2.6 0.5 3.4 22.2 7988 74.1
## PublicWork SelfEmployed FamilyWork Unemployment
## 1 13.3 7.8 0.1 9.4
## 2 12.1 7.6 0.3 8.9
## 3 18.1 7.1 0.2 5.4
## 4 15.0 6.6 0.1 4.3
## 5 15.3 10.4 0.5 3.0
## 6 15.8 9.9 0.1 6.2
sapply(us, function(x) sum(is.na(x)))
## county state
## 0 0
## percentage16_Donald_Trump percentage16_Hillary_Clinton
## 1756 1756
## total_votes16 votes16_Donald_Trump
## 1756 1756
## votes16_Hillary_Clinton percentage20_Donald_Trump
## 1756 377
## percentage20_Joe_Biden total_votes20
## 377 234
## votes20_Donald_Trump votes20_Joe_Biden
## 234 234
## lat long
## 1615 1615
## cases deaths
## 1615 1615
## TotalPop Men
## 1725 1725
## Women Hispanic
## 1725 1725
## White Black
## 1725 1725
## Native Asian
## 1725 1725
## Pacific VotingAgeCitizen
## 1725 1725
## Income IncomeErr
## 1725 1725
## IncomePerCap IncomePerCapErr
## 1725 1725
## Poverty ChildPoverty
## 1725 1726
## Professional Service
## 1725 1725
## Office Construction
## 1725 1725
## Production Drive
## 1725 1725
## Carpool Transit
## 1725 1725
## Walk OtherTransp
## 1725 1725
## WorkAtHome MeanCommute
## 1725 1725
## Employed PrivateWork
## 1725 1725
## PublicWork SelfEmployed
## 1725 1725
## FamilyWork Unemployment
## 1725 1725
Since the new dataset does not have county ID’s, we will need to join the two dataframes by the county name and state abbreviation columns.
#merge the columns from the US dataset to the aggregate merged set
merged$`Area name` <- strsplit(merged$`Area name`, split = " ")
merged$`Area name` <- sapply(merged$`Area name`, function(x) x[1])
merged <- merge(merged, us, by.x = c('Area name', 'State'), by.y = c('county', 'state'), all.x = TRUE)
#express percentages within range of [0,100]
merged[,23:24] <- merged[,23:24]*100
Again, we will explore the relationships of this new dataset by running a correlation between all variables and the share of mask “wearers”.
cor(merged$wearers, merged[,18:65], use = "complete.obs")
## percentage16_Donald_Trump percentage16_Hillary_Clinton total_votes16
## [1,] -0.4671356 0.4817604 0.3201092
## votes16_Donald_Trump votes16_Hillary_Clinton percentage20_Donald_Trump
## [1,] 0.3280372 0.2938529 -0.4966669
## percentage20_Joe_Biden total_votes20 votes20_Donald_Trump
## [1,] 0.5017273 0.3045235 0.3009313
## votes20_Joe_Biden lat long cases deaths TotalPop
## [1,] 0.2912494 -0.215381 0.1542209 0.2300885 0.2128475 0.3005584
## Men Women Hispanic White Black Native Asian
## [1,] 0.3009001 0.3001504 0.2782887 -0.3371633 0.1819226 -0.09867881 0.3158354
## Pacific VotingAgeCitizen Income IncomeErr IncomePerCap
## [1,] 0.0701423 0.3125965 0.2395558 -0.1792755 0.2122368
## IncomePerCapErr Poverty ChildPoverty Professional Service Office
## [1,] -0.2107613 -0.01300673 0.005401577 0.1811291 0.1679398 0.2338466
## Construction Production Drive Carpool Transit Walk
## [1,] -0.2160054 -0.2766698 -0.02112641 0.001268234 0.2283708 -0.1199684
## OtherTransp WorkAtHome MeanCommute Employed PrivateWork PublicWork
## [1,] 0.1137755 -0.09420392 0.3047717 0.299713 0.1389315 0.034555
## SelfEmployed FamilyWork Unemployment
## [1,] -0.2964324 -0.1515119 0.1868341
#simple linear regression for percent trump voters and mask use
plot(merged$percentage20_Donald_Trump, merged$wearers, main = "Mask wearers vs. 2020 Trump Vote Share", xlab = "2020 Trump vote %", ylab = "% mask wearers")
lm4 <- lm(wearers ~ percentage20_Donald_Trump, data = merged)
summary(lm4)
##
## Call:
## lm(formula = wearers ~ percentage20_Donald_Trump, data = merged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.153 -7.276 0.683 8.068 35.375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 98.94587 0.93336 106.01 <2e-16 ***
## percentage20_Donald_Trump -0.42169 0.01375 -30.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.27 on 2873 degrees of freedom
## (267 observations deleted due to missingness)
## Multiple R-squared: 0.2467, Adjusted R-squared: 0.2464
## F-statistic: 940.8 on 1 and 2873 DF, p-value: < 2.2e-16
abline(lm4)
The model has an adjusted R-squared value of .2464 so 2020 Trump vote share explains more of the variation in mask adoption than levels of education. It is important to note that the residual standard error is 11.27, indicating that there is plenty of unexplored variation in mask adoption that cannot be explained by vote share alone. Still, the coefficient is significant, indicating that a one percent increase in 2020 Trump vote share is correlated with a .42 percent decrease in mask adoption.
Next we will combine the educational predictor and 2020 vote share in a multiple linear regression model.
lm5 <- lm(wearers ~ percentage20_Donald_Trump + `%only.HS`, merged, na.action = na.omit)
summary(lm5)
##
## Call:
## lm(formula = wearers ~ percentage20_Donald_Trump + `%only.HS`,
## data = merged, na.action = na.omit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.308 -6.876 0.753 7.766 35.351
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 103.94537 1.15179 90.247 < 2e-16 ***
## percentage20_Donald_Trump -0.37051 0.01533 -24.170 < 2e-16 ***
## `%only.HS` -0.24248 0.03329 -7.285 4.13e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.17 on 2872 degrees of freedom
## (267 observations deleted due to missingness)
## Multiple R-squared: 0.2603, Adjusted R-squared: 0.2598
## F-statistic: 505.4 on 2 and 2872 DF, p-value: < 2.2e-16
After adjusting for the % of the population in a county that only has a high school degree, a one percent increase in 2020 Trump vote share is correlated with a .37 percent decrease in mask adoption. The coefficients for % with HS degree and 2020 Trump vote share are significant, and the adjusted R-squared value is greater than any simple model run in this analysis.
First, we can conclude that in almost all counties, the vast majority of individuals do wear a mask within 6 feet of others. This seems to negate the idea of a widespread mask movement.
Second, there is a significant negative correlation between support for President Trump and mask adoption after controlling for education levels. However, this relationship is small in magnitude, and there is a high residual standard error. So support for Trump might be a small, significant factor among many other factors that explain mask adoption by county.