CUNY Data 607 - Final Project
Overview
The spread and effects of COVID-19 misinformation have been widely reported as an obstacle to vaccination. Conspiracy theories and disinformation on social media - as well as some mainstream outlets - has been fanning public resistance to scientific research and legitimate public health advisories, especially among vulnerable populations in the United States.
Assuming the (hypothetical) role of a communications strategist advising NGOs or government agencies on vaccine hesitancy outreach, it would be helpful to identify targetable audiences and effective messaging.
In this analysis I will examine and attempt to quantify any fundamental relationships between county-level vaccination rates and metrics of trust/distrust regarding media, public health and government agencies.
Project Update
One major change from the original proposal is the selection of a alternative source of explanatory variables. I had been planning to examine socioeconomic datasets such as the Geography of Social Capital in America published by the U.S. Congress Joint Economic Committee, or the Social Vulnerability Index (SVI) published by the CDC.
However, both these datasets are only available as direct downloads - in order to fulfill the requirements of this project I opted for other sources of information that could be accessed via API.
Carnegie Mellon University sponsors an epidemiological research team (the Delphi Group) which publishes and maintains COVIDcast, the nation’s largest repository of real-time COVID-19 datasets, available via a public API. Among these data are national surveys of sociological indicators such as sentiment/attitudes towards media, public health and governmental officials with regard to COVID-19 at a county level.
Datasets
All data preparation steps are embedded in this workbook, and include the following steps:
- Load, select, filter and aggregate relevant fields and rows from each dataset.
- Format/clean data values and align naming conventions.
- Join datasets at the county level using the County FIPS ID.
CDC COVID Data Tracker
The CDC COVID Data Tracker includes vaccination data reported by state and local Immunization Information Systems (IIS) per U.S. county. These data reflect cumulative national statistics at the county level as of November 2020. This dataset was retrieved manually from the CDC website and saved to the file covid19_vaccinations_equity.csv.
cdc_vax<- read.csv('data/covid19_vaccinations_equity.csv', skip=2)| County | State | SVI | County.of.Residence.Reporting.Completeness | Percent.of.total.population.fully.vaccinated | Percent.of.12..pop.fully.vaccinated | Percent.of.18..pop.fully.vaccinated | Percent.of.65..pop.fully.vaccinated |
|---|---|---|---|---|---|---|---|
| Autauga County | AL | Low-Mod | 92.6 | 36.2 | 42.5 | 44.3 | 64.2 |
| Baldwin County | AL | Low | 92.6 | 45.1 | 52.3 | 54.7 | 77.9 |
| Barbour County | AL | High | 92.6 | 39 | 45 | 46.9 | 67.7 |
| Bibb County | AL | Mod-High | 92.6 | 31.4 | 36.1 | 37.6 | 58.5 |
| Blount County | AL | Low-Mod | 92.6 | 28.3 | 33.3 | 35.2 | 51.3 |
| Bullock County | AL | High | 92.6 | 45.4 | 52.9 | 54 | 75.2 |
Federal Information Processing System (FIPS)
A reference dataset of Federal Information Processing System (FIPS) codes at the county level. For simplicity, this dataset was retrieved manually from a public github repo, verified against the U.S. Census ANSI site and saved to the file state_and_county_fips_master.csv.
county_fips <- read.csv('data/state_and_county_fips_master.csv')| fips | name | state |
|---|---|---|
| 0 | UNITED STATES | NA |
| 1000 | ALABAMA | NA |
| 1001 | Autauga County | AL |
| 1003 | Baldwin County | AL |
| 1005 | Barbour County | AL |
| 1007 | Bibb County | AL |
COVID-19 Trends and Impact Survey
The COVID-19 Trends and Impact Survey from the COVIDcast Epidata API, published by the Delphi Group at Carnegie Mellon University, includes multiple data streams covering the spread and impact of COVID-19 in the United States, and is accessed by a public API endpoint.
Special Note: While the COVIDcast project maintains an R library (covidcast) for easy retrieval, I was unable to install it correctly due to dependency issues on my local system. I was able to install the Python version, however, and in completing this project learned how to call Py code from within R using the reticulate library.
Per the Covidcast instructions I have saved these datasets locally after retrieving them via API, but the python code used for accessing them is included here for reference.
# python setup
import pandas as pd
import covidcast as cc
from datetime import dateFrom the Covidcast API, I selected several datasets in the Belief, experience, and information indicators category. These are Facebook-based surveys measuring respondent’s beliefs about COVID-19, their experiences of health care, their sources of information about COVID-19, and their degree of trust in different information sources. All available results for the calendar year 2021 were retrieved.
Survey Item G3: belief_masking_effective Estimated percentage of respondents who believe that wearing a face mask is either very or moderately effective for preventing the spread of COVID-19.
# Facebook Survey - belief that masking is effective
data_fb_g3 = cc.signal('fb-survey', 'smoothed_wbelief_masking_effective',
date(2021,6,10), date(2021,12,31), 'county')
data_fb_g3.to_csv('data/delphi/data_fb_g3.csv')Survey Item V5abc: hesitancy_reason_distrust_gov Estimated percentage of respondents who say they are hesitant to get vaccinated because they don’t trust the government, among respondents who answered “Yes, probably”, “No, probably not”, or “No, definitely not” when asked if they would get vaccinated if offered.
# Facebook Survey - vaccine hesitancy reason: distrust gov
data_fb_v5abc = cc.signal('fb-survey', 'smoothed_whesitancy_reason_distrust_gov',
date(2021,2,26), date(2021,12,31), 'county')
data_fb_v5abc.to_csv('data/delphi/data_fb_v5abc.csv')Survey Item i3: belief_created_small_group Estimated percentage of people who believe that the statement “COVID-19 was deliberately created by a small group of people who secretly manipulate world events” is definitely or probably true.
# Facebook Survey - belief COVID created by small group
data_fb_i3 = cc.signal('fb-survey', 'smoothed_wbelief_created_small_group',
date(2021,5,20), date(2021,12,31), 'county')
data_fb_i3.to_csv('data/delphi/data_fb_i3.csv')Survey Item i4: belief_govt_exploitation Estimated percentage of people who indicate that the statement “The COVID-19 pandemic is being exploited by the government to control people” is definitely or probably true.
# Facebook Survey - belief in government exploitation
data_fb_i4 = cc.signal('fb-survey', 'smoothed_wbelief_govt_exploitation',
date(2021,5,20), date(2021,12,31), 'county')
data_fb_i4.to_csv('data/delphi/data_fb_i4.csv')Survey Item i6: trust_covid_info_experts Estimated percentage of respondents who trust scientists and other health experts to provide accurate news and information about COVID-19.
# Facebook Survey - Trust experts for covid info
data_fb_i6_xprt = cc.signal('fb-survey', 'smoothed_wtrust_covid_info_experts',
date(2021,5,19), date(2021,12,31), 'county')
data_fb_i6_xprt.to_csv('data/delphi/data_fb_i6_xprt.csv')Survey Item i6: trust_covid_info_journalists Estimated percentage of respondents who trust journalists to provide accurate news and information about COVID-19.
# Facebook Survey - Trust journalists for covid info
data_fb_i6_jrn = cc.signal('fb-survey', 'smoothed_wtrust_covid_info_journalists',
date(2021,5,19), date(2021,12,31), 'county')
data_fb_i6_jrn.to_csv('data/delphi/data_fb_i6_jrn.csv')Survey Item i6: trust_covid_info_docs Estimated percentage of respondents who trust doctors to provide accurate news and information about COVID-19.
# Facebook Survey - Trust doctors for covid info
data_fb_i6_jrn = cc.signal('fb-survey', 'smoothed_wtrust_covid_info_doctors',
date(2021,5,19), date(2021,12,31), 'county')
data_fb_i6_jrn.to_csv('data/delphi/data_fb_i6_doc.csv')Data Transformations
Unfortunately the CDC Covid Data Tracker data did not include the FIPS county codes, posing an additional challenge of matching county names as character strings, with some noted inconsistencies. This required some cleanup of both the vaccination and FIPS datasets, incorporating many alternative spellings and designations for county and country-equivalent regions:
vax <- cdc_vax %>%
select(c(County,State,Percent.of.total.population.fully.vaccinated)) %>%
rename(vax_rate = Percent.of.total.population.fully.vaccinated) %>%
rename(name = County) %>%
rename(state=State) %>%
mutate(state = str_trim(state)) %>%
mutate(name= tolower(name)) %>%
mutate(vax_rate= as.numeric(vax_rate)) %>%
mutate(name = str_replace_all(name,' and borough','')) %>%
mutate(name = str_replace_all(name,' borough','')) %>%
mutate(name = str_replace_all(name,'&bor','')) %>%
mutate(name = str_replace_all(name,' census area','')) %>%
mutate(name = str_replace_all(name,' ca','')) %>%
mutate(name = str_replace_all(name,'rolina',' carolina')) %>%
mutate(name = str_replace_all(name,' city','')) %>%
mutate(name = str_replace_all(name,' cty','')) %>%
mutate(name = str_replace_all(name,' county','')) %>%
mutate(name = str_replace_all(name,' municipality','')) %>%
mutate(name = str_replace_all(name,' muno','')) %>%
mutate(name = str_replace_all(name,' muny','')) %>%
mutate(name = str_replace_all(name,' parish','')) %>%
mutate(name = str_replace_all(name,'la salle','lasalle')) %>%
mutate(name = str_replace_all(name,'do̱a','dona'))And similar cleanup for the county_fips table:
fips <- county_fips %>%
mutate(name= tolower(name)) %>%
mutate(name = str_replace_all(name,' and borough','')) %>%
mutate(name = str_replace_all(name,' borough','')) %>%
mutate(name = str_replace_all(name,'&bor','')) %>%
mutate(name = str_replace_all(name,' census area','')) %>%
mutate(name = str_replace_all(name,' ca','')) %>%
mutate(name = str_replace_all(name,'rolina',' carolina')) %>%
mutate(name = str_replace_all(name,' city','')) %>%
mutate(name = str_replace_all(name,' cty','')) %>%
mutate(name = str_replace_all(name,' county','')) %>%
mutate(name = str_replace_all(name,' municipality','')) %>%
mutate(name = str_replace_all(name,' muno','')) %>%
mutate(name = str_replace_all(name,' muny','')) %>%
mutate(name = str_replace_all(name,' parish','')) %>%
mutate(name = str_replace_all(name,'la salle','lasalle')) %>%
mutate(name = str_replace_all(name,'do̱a','dona'))And then join the two for a dataframe of vaccination rates by county, with county FIPS code:
# join
df_county_vax <- full_join(vax,fips) %>%
filter(!is.na(fips))| name | state | vax_rate | fips |
|---|---|---|---|
| autauga | AL | 36.2 | 1001 |
| baldwin | AL | 45.1 | 1003 |
| barbour | AL | 39.0 | 1005 |
| bibb | AL | 31.4 | 1007 |
| blount | AL | 28.3 | 1009 |
| bullock | AL | 45.4 | 1011 |
Now for the survey data. Since all surveys follow an identical format, it will be easier to union them into one dataframe:
df_files <- data.frame() # initialize
files <- list.files(path='data/delphi')
for (file_name in files){
file_path <- paste('data/delphi/', file_name, sep='')
df_files <- union_all(df_files, read_csv(file_path, show_col_types = FALSE))
}And now some cleanup of the survey data, renaming columns and converting the FIPS code to integer format:
df_surveys <- df_files %>%
select(geo_value, signal, time_value, value, stderr, sample_size) %>%
rename(fips = geo_value) %>%
rename(date = time_value) %>%
mutate(fips = as.integer(fips, length=0))| fips | signal | date | value | stderr | sample_size |
|---|---|---|---|---|---|
| 1000 | smoothed_wbelief_masking_effective | 2021-06-10 | 64.49579 | 1.733865 | 761.6943 |
| 1073 | smoothed_wbelief_masking_effective | 2021-06-10 | 72.52404 | 4.242234 | 110.7250 |
| 1089 | smoothed_wbelief_masking_effective | 2021-06-10 | 76.90033 | 3.687850 | 130.6133 |
| 2000 | smoothed_wbelief_masking_effective | 2021-06-10 | 69.72459 | 3.273444 | 197.0000 |
| 4000 | smoothed_wbelief_masking_effective | 2021-06-10 | 69.37468 | 3.064314 | 226.2638 |
| 4013 | smoothed_wbelief_masking_effective | 2021-06-10 | 80.38044 | 1.386473 | 820.3837 |
There is one abnormality in the survey data we need to account for. As part of the Delphi Group’s methodology, State-level FIPS codes (all ending in 000) were used to bucket daily county-level responses where a certain response threshold was not met. Since it is not practical to dis-aggregate these rows at the county level, we’ll discard them and keep only those rows where, on a particular day, response levels were high enough to be attributed to a specific county.
df_county_surveys <- df_surveys %>%
filter(fips %% 1000 != 0)For this analysis, we’ll calculate the mean response value for each county and each of the seven survey topics (aka ‘signals’), and then pivot_wider so we have a single row of these seven observations per county:
df_county_surveys <- df_county_surveys %>%
group_by(fips,signal) %>%
summarize(mean_value = mean(value, rm.na=TRUE)) %>%
mutate(signal=str_replace(signal,'smoothed_w',''))
df_county_surveys_pvt <- df_county_surveys %>%
pivot_wider(names_from=signal, values_from=mean_value)And for the final step, we’ll join the transformed survey and vaccination datasets on the FIPS code.
df <- full_join(df_county_vax, df_county_surveys_pvt)| name | state | vax_rate | fips | belief_created_small_group | belief_govt_exploitation | belief_masking_effective | hesitancy_reason_distrust_gov | trust_covid_info_doctors | trust_covid_info_experts | trust_covid_info_journalists |
|---|---|---|---|---|---|---|---|---|---|---|
| autauga | AL | 36.2 | 1001 | NA | NA | NA | NA | NA | NA | NA |
| baldwin | AL | 45.1 | 1003 | 27.73857 | 34.03297 | 63.2041 | 26.99626 | 62.26397 | 48.86253 | 6.387081 |
| barbour | AL | 39.0 | 1005 | NA | NA | NA | NA | NA | NA | NA |
| bibb | AL | 31.4 | 1007 | NA | NA | NA | NA | NA | NA | NA |
| blount | AL | 28.3 | 1009 | NA | NA | NA | NA | NA | NA | NA |
| bullock | AL | 45.4 | 1011 | NA | NA | NA | NA | NA | NA | NA |
| butler | AL | 35.6 | 1013 | NA | NA | NA | NA | NA | NA | NA |
| calhoun | AL | 42.0 | 1015 | NA | NA | NA | NA | NA | NA | NA |
| chambers | AL | 28.3 | 1017 | NA | NA | NA | NA | NA | NA | NA |
| cherokee | AL | 29.1 | 1019 | NA | NA | NA | NA | NA | NA | NA |
# some useful references
signals <- unique(df_county_surveys$signal)
state_fips <- fips %>% filter(fips %% 1000 == 0)Exporatory Data Analysis
The dataset consists of 3207 rows representing individual U.S. counties, with 3138 mean vaccination rates, but a much sparser distribution of county-level survey results:
kbl(sapply(df, function(x) sum(!is.na(x)))) %>%
kable_styling()| x | |
|---|---|
| name | 3207 |
| state | 3155 |
| vax_rate | 3138 |
| fips | 3207 |
| belief_created_small_group | 311 |
| belief_govt_exploitation | 310 |
| belief_masking_effective | 323 |
| hesitancy_reason_distrust_gov | 193 |
| trust_covid_info_doctors | 298 |
| trust_covid_info_experts | 297 |
| trust_covid_info_journalists | 295 |
The mean vaccination rate for all counties is 44.8 and the median is 44.5, with a standard deviation of 12.36.
The distribution of vaccination rates by county are nearly symmetrical but with somewhat thicker tails than a normal distribution, and some notable outliers on either side. Some of these outliers are easily identifiable as data errors by cross-checking with contemporary sources (for example, Chattahoochee, GA at the top of our distribution was never 99.9% vaccinated - Georgia has some the lowest vax rates in the country. And Honolulu, HI at the bottom of our distribution did not have an 0.1% vax rate in November 2020.)
ggplot(df, aes(x=vax_rate)) +
geom_histogram(bins=150, alpha=0.5) +
geom_vline(xintercept=median(df$vax_rate, na.rm=TRUE), size=2, color='blue', alpha=0.2) +
geom_vline(xintercept=median(df$vax_rate, na.rm=TRUE)-(IQR(df$vax_rate, na.rm=TRUE) * 1.5), size=2, color='red', alpha=0.2) +
geom_vline(xintercept=median(df$vax_rate, na.rm=TRUE)+(IQR(df$vax_rate, na.rm=TRUE) * 1.5), size=2, color='red', alpha=0.2) +
labs(title="County Vaccination Rates", x = 'vax_rate')However most of these extreme values, while well outside the “1.5 IQR” range (red lines in the graph), seem legitimate. I opted to cherry-pick and remove these two obvious data errors from the most extreme ends of the distribution and leave the rest of the values in place.
df <- df %>%
filter(vax_rate > 0.1) %>%
filter(vax_rate < 99.9)Graphing the response and explanatory variables together reveals strong linear relationships between vaccination rates and our signals - some positive and some negative.
Where trust signals are strong (for example, trust in doctors, journalists, experts), there is a positive relationship with vaccination rates. Where distrust signals are strong (for example, belief in conspiracy theories or distrust in government), there is a negative relationship with vaccination rates.
for (s in signals){
print(
ggplot(df, aes_string(x = s, y='vax_rate')) +
geom_point(alpha=0.2) +
geom_smooth(method='lm') +
labs(x = s, y = 'vax_rate')
)
}Linear Models
Given the strong linear relationships, linear models may be useful to predict vaccination rates based on these variables.
Running simple linear regressions of vaccination rates against each individual signal provides some insight into the corresponding scatterplots above.
For example, here is the summary of a linear regression of vaccination rates to the signal belief_created_small_group. The coefficent is -0.89, illustrating a strongly negative relationship to vaccination rates, and at a high level of statistical significance.
lm_vax_1 <- lm(vax_rate ~ belief_created_small_group, data=df)
summary(lm_vax_1) # .2269##
## Call:
## lm(formula = vax_rate ~ belief_created_small_group, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.149 -4.221 -0.066 4.069 26.225
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.78991 1.72075 42.882 <2e-16 ***
## belief_created_small_group -0.88640 0.09287 -9.545 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.479 on 306 degrees of freedom
## (2828 observations deleted due to missingness)
## Multiple R-squared: 0.2294, Adjusted R-squared: 0.2269
## F-statistic: 91.1 on 1 and 306 DF, p-value: < 2.2e-16
This signal alone, however, only explains about 23% of the variance in the model as indicated by the Adjusted R-squared. We can run a multivariate linear regression instead with all seven of our signals to see if we get a better result:
lm_vax_2 <- lm(vax_rate ~ belief_created_small_group +
belief_govt_exploitation +
belief_masking_effective +
hesitancy_reason_distrust_gov +
trust_covid_info_doctors +
trust_covid_info_experts +
trust_covid_info_journalists, data=df)
summary(lm_vax_2) # .4479##
## Call:
## lm(formula = vax_rate ~ belief_created_small_group + belief_govt_exploitation +
## belief_masking_effective + hesitancy_reason_distrust_gov +
## trust_covid_info_doctors + trust_covid_info_experts + trust_covid_info_journalists,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.862 -2.449 0.329 4.016 13.983
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -71.1366 44.8933 -1.585 0.1148
## belief_created_small_group 1.4633 0.3359 4.356 2.19e-05 ***
## belief_govt_exploitation -0.1517 0.5191 -0.292 0.7704
## belief_masking_effective 0.1736 0.2758 0.629 0.5299
## hesitancy_reason_distrust_gov -0.0970 0.1045 -0.928 0.3547
## trust_covid_info_doctors 0.6579 0.3041 2.163 0.0318 *
## trust_covid_info_experts 0.7133 0.3121 2.285 0.0234 *
## trust_covid_info_journalists 0.4326 0.3732 1.159 0.2479
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.333 on 184 degrees of freedom
## (2944 observations deleted due to missingness)
## Multiple R-squared: 0.4681, Adjusted R-squared: 0.4479
## F-statistic: 23.13 on 7 and 184 DF, p-value: < 2.2e-16
Running the regression based on all seven signals produces a much higher Adjusted R-Squared of 45% with p-value of 2.2e-16, although several of the individual explanatory variables now have low levels of statistical significance. However, I found that backing out individual signals with higher p-values and re-fitting the model did not improve the Adjusted R-Squared.
Conclusions
While our explanatory variables only represent seven out of many available survey datasets from the COVIDcast project, we can reasonably conclude that they hold predictive value in estimating county-level vaccination rates in the United States, inferring a relationship between surveyed “trust metrics” and overall vaccine hesitancy in the population.
Of course the larger behavioral picture is much more complicated than a simple model can capture. In actual practice we would likely include many more survey signals, and work to identify and eliminate any sources of bias (for example, selection bias of social media-based surveys.) Any hypotheses formed from future analysis would need to be tested under experimental conditions - but, the model is already useful in proving that relationships exist between these variables, suggesting that they could be used to target and craft outreach.
References
“COVID Data Tracker | COVID-19 Vaccination Equity” Centers for Disease Control and Prevention, Accessed November 10, 2021. https://covid.cdc.gov/covid-data-tracker.
“County and State FIPS Codes” Github, Accessed November 10, 2021. https://github.com/kjhealy/fips-codes.
“COVID-19 Trends and Impact Survey” CMU Delphi Research Group, Accessed Jan 15, 2021 https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
Appendix
Model Diagnostics
Multivariate regression models generally depend on meeting the following four conditions: normally distributed residuals, near-constant variability of residuals, independence of residuals, and linear relationships of the explanatory variables to the fitted values.
Normally-distributed residuals:
By examining a histogram and Q-Q plot of the residuals, we can see that the distribution is nearly normal, but (as noted) there are thick tails and several distinct outliers present. The condition is largely met, keeping in mind that two of the most extreme outliers, separately identified as data errors, have already been removed.
ggplot(lm_vax_2, aes(x=lm_vax_2$residuals)) +
geom_histogram(bins=100, alpha=0.5) +
labs(title="Residuals Distribution",
x = 'residuals')
ggplot(lm_vax_2, aes(sample=lm_vax_2$residuals)) +
geom_qq(alpha=0.5) +
geom_qq_line(colour='blue') +
labs(title="Residuals Q-Q")Constant variability of residuals:
The Residuals x Fitted Plots show a random distribution of residuals around the fitted line, demonstrating the constant variability of the residuals.
lm_vax_2 %>%
ggplot(aes(y=lm_vax_2$residuals, x=lm_vax_2$fitted.values)) +
geom_point(alpha=0.2) +
geom_hline(yintercept=0, colour='blue') +
labs(title="Residuals x Fitted",
y = 'residuals',
x = 'fitted vax_rate')
lm_vax_2 %>%
ggplot(aes(y=abs(lm_vax_2$residuals), x=lm_vax_2$fitted.values)) +
geom_point(alpha=0.2) +
geom_hline(yintercept=0, colour='blue') +
labs(title="Abs Residuals x Fitted",
y = 'abs(residuals)',
x = 'fitted vax_rate')Independence of observations
Independence of the observations is important to establish when they are collected over time. I will assume the condition is met.
vax %>% ggplot(aes(x=name,y=vax_rate)) +
geom_point() +
theme(legend.position = 'none')df_surveys %>% ggplot(aes(x=date,y=value)) +
geom_point() +
facet_grid(~ signal) +
theme(legend.position = 'none')Linear relationship of predictors to residuals
By plotting the residuals against each of the explanatory variables, we can spot changes in variability and understand any individual linear relationships. The histogram of residuals and party indicate very little difference in variability between the two factors. Similarly, the scatterplots of the numeric variables demonstrate a fairly constant state of variability with fairly strong linear relationships. The condition is met.
for (s in signals){
i <- paste0('lm_vax_2$model$',s)
print(
ggplot(lm_vax_2, aes_string(x=i, y='lm_vax_2$residuals')) +
geom_point(alpha=0.2) +
geom_smooth(method='lm') +
labs(x = s, y = 'residuals')
)
}