Unearthing the Tidyverse: Exploring the Relationship between Fear of Earthquakes & Income

Zachary Herold

December 1, 2018


This is a study to determine if there is greater disaster awareness and fear aversity as income levels rise. In the process, we leverage tidyverse’s mapvalues() function in plyr.

[https://fivethirtyeight.com/features/the-rock-isnt-alone-lots-of-people-are-worried-about-the-big-one/] (Fear of the Big One)

We imported the earthquake_data.csv from fivethirtyeight.com into a dataframe in R. The variables of the data are listed here:

##  [1] "In.general..how.worried.are.you.about.earthquakes."                                                                                                  
##  [2] "How.worried.are.you.about.the.Big.One..a.massive..catastrophic.earthquake."                                                                          
##  [3] "Do.you.think.the..Big.One..will.occur.in.your.lifetime."                                                                                             
##  [4] "Have.you.ever.experienced.an.earthquake."                                                                                                            
##  [5] "Have.you.or.anyone.in.your.household.taken.any.precautions.for.an.earthquake..packed.an.earthquake.survival.kit..prepared.an.evacuation.plan..etc..."
##  [6] "How.familiar.are.you.with.the.San.Andreas.Fault.line."                                                                                               
##  [7] "How.familiar.are.you.with.the.Yellowstone.Supervolcano."                                                                                             
##  [8] "Age"                                                                                                                                                 
##  [9] "What.is.your.gender."                                                                                                                                
## [10] "How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year."                                                                     
## [11] "US.Region"

For this study, we will focus on variables 1, 2, 6, 7, 10, and 11, and first simplified the column names.

worry_gen worry_big aware_andreas income region
Not at all worried Not so worried Somewhat familiar Prefer not to answer New England
Somewhat worried Very worried Not at all familiar $75,000 to $99,999 East North Central
Not so worried Somewhat worried Very familiar $10,000 to $24,999 Pacific
Not so worried Not so worried Very familiar $25,000 to $49,999 West South Central
Not so worried Not so worried Somewhat familiar $200,000 and up Middle Atlantic
Not at all worried Not at all worried Very familiar $25,000 to $49,999 East North Central
Very worried Not at all worried
Not at all worried Not at all worried Not so familiar Prefer not to answer South Atlantic
Not at all worried Not at all worried Very familiar Prefer not to answer East North Central
Not at all worried Not at all worried Not at all familiar $10,000 to $24,999 East North Central

Here we can see the breakdown of survey respondees by region:

Var1 Freq
35
East North Central 140
East South Central 40
Middle Atlantic 137
Mountain 67
New England 63
Pacific 206
South Atlantic 155
West North Central 71
West South Central 99

Here we can see the breakdown of survey respondees by income:

Var1 Freq
12
$0 to $9,999 67
$10,000 to $24,999 89
$100,000 to $124,999 75
$125,000 to $149,999 45
$150,000 to $174,999 33
$175,000 to $199,999 12
$200,000 and up 50
$25,000 to $49,999 162
$50,000 to $74,999 175
$75,000 to $99,999 136
Prefer not to answer 157

Using plyr’s mapvalues() function

To make the dataframe more compact, we replace the values of nominal variables with strictly ordinal ones according to level. The income level classifiers correspond to an upper bound and the survey results range from 1 (least worry & familiarity) to 5 (most extreme worry and familiarity).

We further clean the data by removing rows with null values or indefinite responses.

Next, we re-order the levels to reflect their ordinal sequence.

worry_gen worry_big aware_andreas income region
2 3 4 1 100K East North Central
3 2 3 4 25K Pacific
4 2 2 4 50K West South Central
5 2 2 3 top Middle Atlantic
6 1 1 4 50K East North Central
10 1 1 1 25K East North Central
11 1 2 3 75K West North Central
12 1 1 5 50K East South Central
14 3 4 3 10K Pacific
15 2 2 4 125K New England

After ordering the qualitative factors and income levels, we create tables to be applied to the mosiacplot() function in the graphics package.

The blue and purple boxes of the mosaic plot reveal strongly positive standardized residuals, which indicates an observed value far exceeding the expected value if variables were independent.

Our findings are:

  • People with less than $25K in income have disproportionately high general worry about , including “the Big One” (perhaps due to housing status);
  • People with less than $25K in income have disproportionately low awareness of the Andreas Fault (perhaps due to lower education level);
  • People with greater than $200K in income have disproportionately high awareness of the Andreas Fault (perhaps due to education level and real estate investment considerations);

Finally, we convert the tables into tidied dataframes, using the data.frame() function.

income gen_fear_rating gen_fear_freq
10K 1 22
25K 1 25
50K 1 50
75K 1 65
100K 1 50
125K 1 34
150K 1 19
175K 1 16
200K 1 7
top 1 14
income big_fear_rating big_fear_freq
10K 1 20
25K 1 27
50K 1 52
75K 1 54
100K 1 50
125K 1 29
150K 1 15
175K 1 14
200K 1 6
top 1 17
income andreas_aware_rating andreas_aware_freq
41 10K 5 6
42 25K 5 8
43 50K 5 22
44 75K 5 27
45 100K 5 19
46 125K 5 10
47 150K 5 7
48 175K 5 6
49 200K 5 0
50 top 5 14