Unearthing the Tidyverse: Exploring the Relationship between Fear of Earthquakes & Income

Zachary Herold

December 1, 2018

This is a study to determine if there is greater disaster awareness and fear aversity as income levels rise. In the process, we leverage tidyverse’s mapvalues() function in plyr.

[https://fivethirtyeight.com/features/the-rock-isnt-alone-lots-of-people-are-worried-about-the-big-one/] (Fear of the Big One)

We imported the earthquake_data.csv from fivethirtyeight.com into a dataframe in R. The variables of the data are listed here:

quake <- data.frame(read.table("https://raw.githubusercontent.com/fivethirtyeight/data/master/san-andreas/earthquake_data.csv", header = TRUE, sep = ","))
names(quake)

##  [1] "In.general..how.worried.are.you.about.earthquakes."                                                                                                  
##  [2] "How.worried.are.you.about.the.Big.One..a.massive..catastrophic.earthquake."                                                                          
##  [3] "Do.you.think.the..Big.One..will.occur.in.your.lifetime."                                                                                             
##  [4] "Have.you.ever.experienced.an.earthquake."                                                                                                            
##  [5] "Have.you.or.anyone.in.your.household.taken.any.precautions.for.an.earthquake..packed.an.earthquake.survival.kit..prepared.an.evacuation.plan..etc..."
##  [6] "How.familiar.are.you.with.the.San.Andreas.Fault.line."                                                                                               
##  [7] "How.familiar.are.you.with.the.Yellowstone.Supervolcano."                                                                                             
##  [8] "Age"                                                                                                                                                 
##  [9] "What.is.your.gender."                                                                                                                                
## [10] "How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year."                                                                     
## [11] "US.Region"

For this study, we will focus on variables 1, 2, 6, 7, 10, and 11, and first simplified the column names.

quake_df <- quake[c(1,2,6,10,11)]
names(quake_df) <- c("worry_gen","worry_big","aware_andreas","income","region")
knitr::kable(head(quake_df,10))

worry_gen	worry_big	aware_andreas	income	region
Not at all worried	Not so worried	Somewhat familiar	Prefer not to answer	New England
Somewhat worried	Very worried	Not at all familiar	$75,000 to $99,999	East North Central
Not so worried	Somewhat worried	Very familiar	$10,000 to $24,999	Pacific
Not so worried	Not so worried	Very familiar	$25,000 to $49,999	West South Central
Not so worried	Not so worried	Somewhat familiar	$200,000 and up	Middle Atlantic
Not at all worried	Not at all worried	Very familiar	$25,000 to $49,999	East North Central
Very worried	Not at all worried
Not at all worried	Not at all worried	Not so familiar	Prefer not to answer	South Atlantic
Not at all worried	Not at all worried	Very familiar	Prefer not to answer	East North Central
Not at all worried	Not at all worried	Not at all familiar	$10,000 to $24,999	East North Central

Here we can see the breakdown of survey respondees by region:

knitr::kable(table(quake_df$region), width = 40)

Var1	Freq
	35
East North Central	140
East South Central	40
Middle Atlantic	137
Mountain	67
New England	63
Pacific	206
South Atlantic	155
West North Central	71
West South Central	99

Here we can see the breakdown of survey respondees by income:

knitr::kable(table(quake_df$income), width = 40)

Var1	Freq
	12
$0 to $9,999	67
$10,000 to $24,999	89
$100,000 to $124,999	75
$125,000 to $149,999	45
$150,000 to $174,999	33
$175,000 to $199,999	12
$200,000 and up	50
$25,000 to $49,999	162
$50,000 to $74,999	175
$75,000 to $99,999	136
Prefer not to answer	157

Using plyr’s mapvalues() function

To make the dataframe more compact, we replace the values of nominal variables with strictly ordinal ones according to level. The income level classifiers correspond to an upper bound and the survey results range from 1 (least worry & familiarity) to 5 (most extreme worry and familiarity).

lvl_worry_gen <- levels(quake_df$worry_gen)
lvl_worry_big <- levels(quake_df$worry_big)
lvl_andreas <- levels(quake_df$aware_andreas)
lvl_income <- levels(quake_df$income)

quake_df$worry_gen <- mapvalues(quake_df$worry_gen, from = lvl_worry_gen, to = c(5,1,2,3,4))
quake_df$worry_big <- mapvalues(quake_df$worry_big, from = lvl_worry_big, to = c(5,1,2,3,4))
quake_df$aware_andreas <- mapvalues(quake_df$aware_andreas, from = lvl_andreas, to = c(NA, 5,1,2,3,4))
quake_df$income <- mapvalues(quake_df$income, from = lvl_income, to = c(NA, "10K","25K","125K","150K","175K","200K","top","50K","75K","100K", NA))

We further clean the data by removing rows with null values or indefinite responses.

quake_df <- quake_df[complete.cases(quake_df),]

Next, we re-order the levels to reflect their ordinal sequence.

income.groups <- list("10K","25K","50K","75K","100K","125K","150K","175K","200K","top")
rating <- list("1","2","3","4","5")

quake_df$income <- factor(quake_df$income, levels = income.groups)
quake_df$worry_gen <- factor(quake_df$worry_gen, levels = rating)
quake_df$worry_big <- factor(quake_df$worry_big, levels = rating)
quake_df$aware_andreas <- factor(quake_df$aware_andreas, levels = rating)

knitr::kable(head(quake_df,10))

	worry_gen	worry_big	aware_andreas	income	region
2	3	4	1	100K	East North Central
3	2	3	4	25K	Pacific
4	2	2	4	50K	West South Central
5	2	2	3	top	Middle Atlantic
6	1	1	4	50K	East North Central
10	1	1	1	25K	East North Central
11	1	2	3	75K	West North Central
12	1	1	5	50K	East South Central
14	3	4	3	10K	Pacific
15	2	2	4	125K	New England

After ordering the qualitative factors and income levels, we create tables to be applied to the mosiacplot() function in the graphics package.

worry_gen_observed <- table(quake_df$income, quake_df$worry_gen, dnn=c("income","gen_fear_rating"))
worry_big_observed <- table(quake_df$income, quake_df$worry_big, dnn=c("income","big_fear_rating"))
aware_andreas_observed <- table(quake_df$income, quake_df$aware_andreas, dnn=c("income","andreas_aware_rating"))

mosaicplot(worry_gen_observed, shade = TRUE, las=2,
           main = "Fear of Earthquakes")

mosaicplot(worry_big_observed, shade = TRUE, las=2,
           main = "Fear of the Big One")

mosaicplot(aware_andreas_observed, shade = TRUE, las=2,
           main = "Awareness of the San Andreas Fault")

The blue and purple boxes of the mosaic plot reveal strongly positive standardized residuals, which indicates an observed value far exceeding the expected value if variables were independent.

Our findings are:

People with less than $25K in income have disproportionately high general worry about , including “the Big One” (perhaps due to housing status);
People with less than $25K in income have disproportionately low awareness of the Andreas Fault (perhaps due to lower education level);
People with greater than $200K in income have disproportionately high awareness of the Andreas Fault (perhaps due to education level and real estate investment considerations);

Finally, we convert the tables into tidied dataframes, using the data.frame() function.

tidy1 <- data.frame(worry_gen_observed) 
tidy1 <-`names<-`(tidy1, c("income","gen_fear_rating","gen_fear_freq"))

knitr::kable(head(tidy1,10))

income	gen_fear_rating	gen_fear_freq
10K	1	22
25K	1	25
50K	1	50
75K	1	65
100K	1	50
125K	1	34
150K	1	19
175K	1	16
200K	1	7
top	1	14

tidy2 <- data.frame(worry_big_observed)
tidy2 <-`names<-`(tidy2, c("income","big_fear_rating","big_fear_freq"))

knitr::kable(head(tidy2,10))

income	big_fear_rating	big_fear_freq
10K	1	20
25K	1	27
50K	1	52
75K	1	54
100K	1	50
125K	1	29
150K	1	15
175K	1	14
200K	1	6
top	1	17

tidy3 <- data.frame(aware_andreas_observed)
tidy3 <-`names<-`(tidy3, c("income","andreas_aware_rating","andreas_aware_freq"))

knitr::kable(tail(tidy3,10))

	income	andreas_aware_rating	andreas_aware_freq
41	10K	5	6
42	25K	5	8
43	50K	5	22
44	75K	5	27
45	100K	5	19
46	125K	5	10
47	150K	5	7
48	175K	5	6
49	200K	5	0
50	top	5	14