1. Load your chosen dataset into R Markdown.
2. Select the dependent variable you are interested in, along with the independent variables that you believe influence it.
3. Create a linear model using the “lm()” command and save it to an object.
4. Call “summary()” on your new model.
5. Interpret the model’s R-squared and p-values. How much of the variation in the dependent variable does the overall model explain? Which variables are significant? Which are insignificant?
6. Choose some significant independent variables and interpret their Estimates (or Beta Coefficients). How does each independent variable individually affect the dependent variable?
7. Does the model you created meet or violate the assumption of linearity? Show your work with “plot(x, which = 1)”.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(pastecs)
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## The following object is masked from 'package:tidyr':
##
## extract
library(readxl)
fema_data <- read_excel("fema_data.xlsx", sheet = "Core Survey", skip = 1)
## Warning: Expecting logical in LS1120 / R1120C331: got 'there is no third
## gender'
## Warning: Expecting logical in MO1246 / R1246C353: got 'normal'
## Warning: Expecting logical in MO1414 / R1414C353: got 'Fluid'
## Warning: Expecting logical in LS1963 / R1963C331: got 'Transgender'
## Warning: Expecting logical in MO2184 / R2184C353: got 'Genderfluid'
## Warning: Expecting logical in LS2311 / R2311C331: got 'Trans'
## Warning: Expecting logical in MO2699 / R2699C353: got 'Transgender'
## Warning: Expecting logical in MO2830 / R2830C353: got 'Nothing in particular'
## Warning: Expecting logical in MO3048 / R3048C353: got 'Good'
## Warning: Expecting logical in MO3304 / R3304C353: got 'Transgender'
## Warning: Expecting logical in LS4642 / R4642C331: got 'Transgender Female'
## Warning: Expecting logical in MO5355 / R5355C353: got 'Femal'
## Warning: Expecting logical in LS5609 / R5609C331: got 'There are two sexes,
## male and female. I'm a male.'
## Warning: Expecting logical in MO6038 / R6038C353: got 'Transgender'
## Warning: Expecting logical in LS6358 / R6358C331: got 'Transgender Male (FTM)'
## Warning: Expecting logical in MO7225 / R7225C353: got 'solo ager'
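#This command tells R which sheet to read and, with skip = 1, to take the variable names from row 2. Row 1 holds the long-form question that was asked in the survey. The warnings come from two free-text columns that readxl guessed to be logical; the text entries it could not convert are read as NA.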
#Level of self-perceived preparedness
table(fema_data$dis_soc)
##
## Don't know
## 387
## I am NOT prepared, and I do not intend to prepare in the next year
## 736
## I am NOT prepared, but I intend to get prepared in the next six months
## 1505
## I am NOT prepared, but I intend to start preparing in the next year
## 1252
## I have been prepared for LESS than a year
## 1357
## I have been prepared for MORE than a year and I continue preparing
## 2367
#Whether the individual perceives a threat from disaster
table(fema_data$dis_iperception)
##
## No Unknown Yes
## 1457 619 5528
#Whether the individual identifies as having a disability
table(fema_data$disability)
##
## Disability No Disability
## 1855 5749
#Converting preparedness to a factor with explicit levels puts the responses in order from least to most prepared.
#"Don't know" is not listed among the levels, so those responses become NA when the factor is created and are dropped in the next step.
fema_data$dis_soc <- factor(fema_data$dis_soc,levels = c(
"I am NOT prepared, and I do not intend to prepare in the next year",
"I am NOT prepared, but I intend to start preparing in the next year",
"I am NOT prepared, but I intend to get prepared in the next six months",
"I have been prepared for LESS than a year",
"I have been prepared for MORE than a year and I continue preparing"))
#I am not interested in data from individuals who answered "Unknown" to whether they perceive a threat from disasters, so I keep only "Yes" and "No". This step also drops rows where preparedness is NA.
fema_compare <- subset(fema_data, dis_iperception %in% c("Yes", "No") & !is.na(dis_soc))
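#Converting the preparedness levels to numeric codes (1 = least prepared, 5 = most prepared) will allow me to run histograms, plots, and linear models. This treats the five ordered categories as equally spaced.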
fema_compare$prep_numeric <- as.numeric(fema_compare$dis_soc)
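#A minimal sketch of the recoding above on made-up data. The `responses` vector and its level names are hypothetical stand-ins, not values from the FEMA survey. It shows that a value left out of `levels` becomes NA and that as.numeric() returns each response's position in the level order.

```r
# Toy responses; "Don't know" is deliberately left out of the levels below
responses <- c("Low", "High", "Don't know", "Medium", "High")

# Levels are listed from least to most prepared
prep <- factor(responses, levels = c("Low", "Medium", "High"))

# Any value not listed in `levels` is turned into NA
print(is.na(prep))

# as.numeric() on a factor returns the position of each value in the level order
prep_codes <- as.numeric(prep)
print(prep_codes)
```

#The codes only reflect order. Using them as the outcome in lm() assumes the steps between adjacent levels are equal in size.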
#Convert disability status to numeric
fema_compare$disability_numeric <- ifelse(fema_compare$disability == "Disability", 1, 0)
#Convert perceived threat to numeric
fema_compare$threat_numeric <- ifelse(fema_compare$dis_iperception == "Yes", 1, 0)
#turn disaster experience into a factor and remove "Don't know" responses
fema_compare$dis_exp <- factor(fema_compare$dis_exp,levels = c("No", "Yes"))
#turn care required or given into a factor and remove "Don't know" responses
fema_compare$care <- factor(fema_compare$care,levels = c("No", "Yes"))
disability_perception_model <- lm(prep_numeric ~ dis_perception + disability_numeric, data = fema_compare)
summary(disability_perception_model)
##
## Call:
## lm(formula = prep_numeric ~ dis_perception + disability_numeric,
## data = fema_compare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8618 -0.9911 0.1382 1.1382 2.0931
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.54473 0.02407 147.296 <2e-16 ***
## dis_perceptionUnlikely -0.55364 0.04224 -13.106 <2e-16 ***
## dis_perceptionVery likely 0.31707 0.03762 8.429 <2e-16 ***
## disability_numeric -0.08423 0.03776 -2.231 0.0257 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.319 on 6725 degrees of freedom
## Multiple R-squared: 0.04938, Adjusted R-squared: 0.04895
## F-statistic: 116.4 on 3 and 6725 DF, p-value: < 2.2e-16
#5) The R-squared value for this model is 0.049, meaning the model explains about 4.9% of the variation in preparedness levels. This is low: although the variables in the model are associated with preparedness, most of the variation is left unexplained and is likely driven by factors not included in the model.
#The overall model is statistically significant, as indicated by the F-test p-value (< 2.2e-16). This means that at least one of the independent variables has a statistically detectable relationship with preparedness.
#Looking at individual variables, both listed levels of disaster perception (“Unlikely” and “Very likely”) are highly statistically significant (p < 2e-16) relative to the omitted baseline level. The disability variable is also statistically significant (p = 0.0257), although its coefficient is much smaller and it is significant at the 0.05 level but not at the 0.01 level. There are no insignificant variables in this model, but the evidence for disability is weaker than for perception.
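#To make the R-squared interpretation concrete, here is a small sketch that recomputes it by hand as 1 - RSS/TSS. It uses R's built-in mtcars data as a stand-in, not the FEMA survey, and the object names are my own.

```r
# Stand-in model on the built-in mtcars data (not the FEMA survey)
fit <- lm(mpg ~ wt + factor(am), data = mtcars)

# Residual sum of squares: variation the model leaves unexplained
rss <- sum(residuals(fit)^2)

# Total sum of squares: variation of the outcome around its mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Share of the total variation that the model accounts for
r2_manual <- 1 - rss / tss
r2_summary <- summary(fit)$r.squared

# The two values should agree up to floating-point error
print(all.equal(r2_manual, r2_summary))
```

#Read this way, an R-squared of 0.049 means the residual variation is still about 95% of the total variation in the outcome.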
#6)
#Does prior disaster experience affect the level of disaster preparedness?
disability_experience_model <- lm(prep_numeric ~ dis_exp + disability_numeric, data = fema_compare)
summary(disability_experience_model)
##
## Call:
## lm(formula = prep_numeric ~ dis_exp + disability_numeric, data = fema_compare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8082 -1.0504 0.1918 1.1918 1.9496
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.15812 0.02614 120.797 < 2e-16 ***
## dis_expYes 0.65009 0.03301 19.696 < 2e-16 ***
## disability_numeric -0.10771 0.03792 -2.841 0.00452 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.314 on 6610 degrees of freedom
## (116 observations deleted due to missingness)
## Multiple R-squared: 0.05551, Adjusted R-squared: 0.05522
## F-statistic: 194.2 on 2 and 6610 DF, p-value: < 2.2e-16
#Does requiring or providing care to someone needing assistance affect the level of disaster preparedness?
disability_care_model <- lm(prep_numeric ~ care + disability_numeric, data = fema_compare)
summary(disability_care_model)
##
## Call:
## lm(formula = prep_numeric ~ care + disability_numeric, data = fema_compare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.768 -1.364 0.342 1.526 1.636
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.47438 0.01979 175.591 < 2e-16 ***
## careYes 0.29380 0.04342 6.766 1.43e-11 ***
## disability_numeric -0.11020 0.04068 -2.709 0.00676 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.349 on 6662 degrees of freedom
## (64 observations deleted due to missingness)
## Multiple R-squared: 0.006883, Adjusted R-squared: 0.006585
## F-statistic: 23.09 on 2 and 6662 DF, p-value: 1.019e-10
#All together now
disability_exp_perc_model <- lm(prep_numeric ~ dis_perception + dis_exp + disability_numeric + care, data = fema_compare)
summary(disability_exp_perc_model)
##
## Call:
## lm(formula = prep_numeric ~ dis_perception + dis_exp + disability_numeric +
## care, data = fema_compare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1143 -0.9830 0.1645 1.1081 2.3389
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.25561 0.03102 104.964 < 2e-16 ***
## dis_perceptionUnlikely -0.42009 0.04305 -9.758 < 2e-16 ***
## dis_perceptionVery likely 0.22233 0.03784 5.875 4.43e-09 ***
## dis_expYes 0.50506 0.03436 14.699 < 2e-16 ***
## disability_numeric -0.17443 0.03949 -4.417 1.01e-05 ***
## careYes 0.13124 0.04281 3.066 0.00218 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.295 on 6551 degrees of freedom
## (172 observations deleted due to missingness)
## Multiple R-squared: 0.08259, Adjusted R-squared: 0.08189
## F-statistic: 118 on 5 and 6551 DF, p-value: < 2.2e-16
#In this model, the intercept (3.26) is the expected preparedness score for the reference group: individuals in the omitted baseline category of dis_perception (the level not listed in the output, whose preparedness falls between the “Unlikely” and “Very likely” groups), with no prior disaster experience, no disability, and no care relationship. Each coefficient below is a difference from that group, holding the other variables constant.
#Perceived disaster risk is clearly associated with preparedness. Individuals who believe a disaster is unlikely have preparedness levels about 0.42 points lower, while those who believe a disaster is very likely have preparedness levels about 0.22 points higher, compared to the baseline perception category. Both differences are statistically significant, indicating that perception of risk plays an important role in preparedness.
#Prior disaster experience also has a substantial positive association. Individuals who have experienced a disaster have preparedness levels about 0.51 points higher than those who have not. This is the largest coefficient in the model in absolute value and is highly statistically significant, suggesting that lived experience is strongly related to preparedness behavior.
#The disability variable has a coefficient of approximately -0.17, meaning individuals with a disability are slightly less prepared on average. This difference is statistically significant, but smaller in magnitude than those for perception and prior experience.
#The care variable shows that individuals who provide or require care have preparedness levels about 0.13 points higher than those who do not. While statistically significant, this difference is relatively modest.
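#A short sketch of how reference groups work in lm(), using a made-up data frame. The `toy` data and the level name "Likely" are hypothetical, chosen only for illustration. By default the first factor level is the baseline, and each coefficient is that group's mean minus the baseline mean. relevel() changes which group serves as the baseline.

```r
# Hypothetical toy data: three perception groups, two respondents each
toy <- data.frame(
  perception = c("Likely", "Unlikely", "Very likely",
                 "Likely", "Unlikely", "Very likely"),
  prep = c(3, 2, 5, 4, 1, 4)
)
toy$perception <- factor(toy$perception)

# The first level is the reference group that lm() folds into the intercept
ref_level <- levels(toy$perception)[1]
print(ref_level)

# With a single factor predictor, the intercept is the reference group mean
# and each coefficient is a difference from that mean
fit <- lm(prep ~ perception, data = toy)
print(tapply(toy$prep, toy$perception, mean))
print(coef(fit))

# relevel() picks a different baseline; the fit is the same, only the
# comparisons change
toy$perception2 <- relevel(toy$perception, ref = "Unlikely")
fit2 <- lm(prep ~ perception2, data = toy)
print(coef(fit2))
```

#Running levels() on a factor predictor before fitting is a quick way to confirm which category the intercept represents.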
#7)
plot(disability_exp_perc_model, which = 1)
#The residuals vs. fitted plot shows the points scattered with no clear pattern, and there is no visible curve or trend in the data. Because every predictor is categorical, the fitted values take only a limited number of distinct values, so the points line up in vertical bands rather than forming a continuous cloud. This suggests that the assumption of linearity is met and that a linear model is appropriate for this data.
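#A small sketch of the same diagnostic on R's built-in mtcars data, used here as a stand-in for the FEMA survey. It draws only the residuals vs. fitted panel and counts how many distinct fitted values a model with purely categorical predictors produces. The pdf(NULL) call is my own addition so the code also runs without a graphics window.

```r
# Stand-in model with only categorical predictors (3 cylinder groups x 2 transmissions)
fit <- lm(mpg ~ factor(cyl) + factor(am), data = mtcars)

# Draw to a null device so the sketch runs without a display
pdf(NULL)
# which = 1 requests only the residuals vs. fitted panel
plot(fit, which = 1)
invisible(dev.off())

# Each combination of categories gets one fitted value, so the plot
# can show at most 3 * 2 = 6 vertical bands
n_fitted <- length(unique(round(fitted(fit), 8)))
print(n_fitted)
```

#When the points sit in a few vertical bands like this, the thing to check is whether the residuals in each band are centered near zero rather than whether they form a smooth cloud.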