[https://fivethirtyeight.com/features/the-rock-isnt-alone-lots-of-people-are-worried-about-the-big-one/] (Fear of the Big One)
We imported the earthquake_data.csv from fivethirtyeight.com into a dataframe in R. The variables of the data are listed here:
quake <- data.frame(read.table("https://raw.githubusercontent.com/fivethirtyeight/data/master/san-andreas/earthquake_data.csv", header = TRUE, sep = ","))
names(quake)
## [1] "In.general..how.worried.are.you.about.earthquakes."
## [2] "How.worried.are.you.about.the.Big.One..a.massive..catastrophic.earthquake."
## [3] "Do.you.think.the..Big.One..will.occur.in.your.lifetime."
## [4] "Have.you.ever.experienced.an.earthquake."
## [5] "Have.you.or.anyone.in.your.household.taken.any.precautions.for.an.earthquake..packed.an.earthquake.survival.kit..prepared.an.evacuation.plan..etc..."
## [6] "How.familiar.are.you.with.the.San.Andreas.Fault.line."
## [7] "How.familiar.are.you.with.the.Yellowstone.Supervolcano."
## [8] "Age"
## [9] "What.is.your.gender."
## [10] "How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year."
## [11] "US.Region"
For this study, we will focus on variables 1, 2, 6, 7, 10, and 11, and first simplified the column names.
quake_df <- quake[c(1,2,6,10,11)]
names(quake_df) <- c("worry_gen","worry_big","aware_andreas","income","region")
knitr::kable(head(quake_df,10))
worry_gen | worry_big | aware_andreas | income | region |
---|---|---|---|---|
Not at all worried | Not so worried | Somewhat familiar | Prefer not to answer | New England |
Somewhat worried | Very worried | Not at all familiar | $75,000 to $99,999 | East North Central |
Not so worried | Somewhat worried | Very familiar | $10,000 to $24,999 | Pacific |
Not so worried | Not so worried | Very familiar | $25,000 to $49,999 | West South Central |
Not so worried | Not so worried | Somewhat familiar | $200,000 and up | Middle Atlantic |
Not at all worried | Not at all worried | Very familiar | $25,000 to $49,999 | East North Central |
Very worried | Not at all worried | |||
Not at all worried | Not at all worried | Not so familiar | Prefer not to answer | South Atlantic |
Not at all worried | Not at all worried | Very familiar | Prefer not to answer | East North Central |
Not at all worried | Not at all worried | Not at all familiar | $10,000 to $24,999 | East North Central |
Here we can see the breakdown of survey respondees by region:
Var1 | Freq |
---|---|
35 | |
East North Central | 140 |
East South Central | 40 |
Middle Atlantic | 137 |
Mountain | 67 |
New England | 63 |
Pacific | 206 |
South Atlantic | 155 |
West North Central | 71 |
West South Central | 99 |
Here we can see the breakdown of survey respondees by income:
Var1 | Freq |
---|---|
12 | |
$0 to $9,999 | 67 |
$10,000 to $24,999 | 89 |
$100,000 to $124,999 | 75 |
$125,000 to $149,999 | 45 |
$150,000 to $174,999 | 33 |
$175,000 to $199,999 | 12 |
$200,000 and up | 50 |
$25,000 to $49,999 | 162 |
$50,000 to $74,999 | 175 |
$75,000 to $99,999 | 136 |
Prefer not to answer | 157 |
To make the dataframe more compact, we replace the values of nominal variables with strictly ordinal ones according to level. The income level classifiers correspond to an upper bound and the survey results range from 1 (least worry & familiarity) to 5 (most extreme worry and familiarity).
lvl_worry_gen <- levels(quake_df$worry_gen)
lvl_worry_big <- levels(quake_df$worry_big)
lvl_andreas <- levels(quake_df$aware_andreas)
lvl_income <- levels(quake_df$income)
quake_df$worry_gen <- mapvalues(quake_df$worry_gen, from = lvl_worry_gen, to = c(5,1,2,3,4))
quake_df$worry_big <- mapvalues(quake_df$worry_big, from = lvl_worry_big, to = c(5,1,2,3,4))
quake_df$aware_andreas <- mapvalues(quake_df$aware_andreas, from = lvl_andreas, to = c(NA, 5,1,2,3,4))
quake_df$income <- mapvalues(quake_df$income, from = lvl_income, to = c(NA, "10K","25K","125K","150K","175K","200K","top","50K","75K","100K", NA))
We further clean the data by removing rows with null values or indefinite responses.
Next, we re-order the levels to reflect their ordinal sequence.
income.groups <- list("10K","25K","50K","75K","100K","125K","150K","175K","200K","top")
rating <- list("1","2","3","4","5")
quake_df$income <- factor(quake_df$income, levels = income.groups)
quake_df$worry_gen <- factor(quake_df$worry_gen, levels = rating)
quake_df$worry_big <- factor(quake_df$worry_big, levels = rating)
quake_df$aware_andreas <- factor(quake_df$aware_andreas, levels = rating)
knitr::kable(head(quake_df,10))
worry_gen | worry_big | aware_andreas | income | region | |
---|---|---|---|---|---|
2 | 3 | 4 | 1 | 100K | East North Central |
3 | 2 | 3 | 4 | 25K | Pacific |
4 | 2 | 2 | 4 | 50K | West South Central |
5 | 2 | 2 | 3 | top | Middle Atlantic |
6 | 1 | 1 | 4 | 50K | East North Central |
10 | 1 | 1 | 1 | 25K | East North Central |
11 | 1 | 2 | 3 | 75K | West North Central |
12 | 1 | 1 | 5 | 50K | East South Central |
14 | 3 | 4 | 3 | 10K | Pacific |
15 | 2 | 2 | 4 | 125K | New England |
After ordering the qualitative factors and income levels, we create tables to be applied to the mosiacplot() function in the graphics package.
worry_gen_observed <- table(quake_df$income, quake_df$worry_gen, dnn=c("income","gen_fear_rating"))
worry_big_observed <- table(quake_df$income, quake_df$worry_big, dnn=c("income","big_fear_rating"))
aware_andreas_observed <- table(quake_df$income, quake_df$aware_andreas, dnn=c("income","andreas_aware_rating"))
mosaicplot(worry_gen_observed, shade = TRUE, las=2,
main = "Fear of Earthquakes")
mosaicplot(aware_andreas_observed, shade = TRUE, las=2,
main = "Awareness of the San Andreas Fault")
The blue and purple boxes of the mosaic plot reveal strongly positive standardized residuals, which indicates an observed value far exceeding the expected value if variables were independent.
Our findings are:
Finally, we convert the tables into tidied dataframes, using the data.frame() function.
tidy1 <- data.frame(worry_gen_observed)
tidy1 <-`names<-`(tidy1, c("income","gen_fear_rating","gen_fear_freq"))
knitr::kable(head(tidy1,10))
income | gen_fear_rating | gen_fear_freq |
---|---|---|
10K | 1 | 22 |
25K | 1 | 25 |
50K | 1 | 50 |
75K | 1 | 65 |
100K | 1 | 50 |
125K | 1 | 34 |
150K | 1 | 19 |
175K | 1 | 16 |
200K | 1 | 7 |
top | 1 | 14 |
tidy2 <- data.frame(worry_big_observed)
tidy2 <-`names<-`(tidy2, c("income","big_fear_rating","big_fear_freq"))
knitr::kable(head(tidy2,10))
income | big_fear_rating | big_fear_freq |
---|---|---|
10K | 1 | 20 |
25K | 1 | 27 |
50K | 1 | 52 |
75K | 1 | 54 |
100K | 1 | 50 |
125K | 1 | 29 |
150K | 1 | 15 |
175K | 1 | 14 |
200K | 1 | 6 |
top | 1 | 17 |
tidy3 <- data.frame(aware_andreas_observed)
tidy3 <-`names<-`(tidy3, c("income","andreas_aware_rating","andreas_aware_freq"))
knitr::kable(tail(tidy3,10))
income | andreas_aware_rating | andreas_aware_freq | |
---|---|---|---|
41 | 10K | 5 | 6 |
42 | 25K | 5 | 8 |
43 | 50K | 5 | 22 |
44 | 75K | 5 | 27 |
45 | 100K | 5 | 19 |
46 | 125K | 5 | 10 |
47 | 150K | 5 | 7 |
48 | 175K | 5 | 6 |
49 | 200K | 5 | 0 |
50 | top | 5 | 14 |