a. Find and get a dataset from the datasets available within R. Perform exploratory data analysis (EDA) and prepare a codebook on that dataset using a newer method in R

Codebook
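library(tibble)      # enframe()
library(sjlabelled)  # get_label()
library(codebook)    # new_codebook_rmd(), codebook()
library(ggplot2)     # plots
library(dplyr)       # data manipulation verbs and %>%

x <- enframe(get_label(quakes))

colnames(x) <- c("variable", "details")

x$details[1] <- "Latitude of event"
x$details[2] <- "Longitude"
x$details[3] <- "Depth (km)"
x$details[4] <- "Richter Magnitude"
x$details[5] <- "Number of stations reporting"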

x
## # A tibble: 5 x 2
##   variable details                     
##   <chr>    <chr>                       
## 1 lat      Latitude of event           
## 2 long     Longitude                   
## 3 depth    Depth (km)                  
## 4 mag      Richter Magnitude           
## 5 stations Number of stations reporting
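The descriptions above live in a separate tibble, so they do not travel with the data. A minimal base-R sketch of one alternative is to store each description in the column's "label" attribute. The names quakes_labelled and labels are hypothetical, and whether a given codebook tool picks up this attribute is an assumption to verify.

```r
# Work on a copy so the built-in dataset stays untouched.
quakes_labelled <- datasets::quakes

# Hypothetical label vector mirroring the descriptions in the table above.
labels <- c(
  lat      = "Latitude of event",
  long     = "Longitude",
  depth    = "Depth (km)",
  mag      = "Richter Magnitude",
  stations = "Number of stations reporting"
)

# Store each description in the matching column's "label" attribute.
for (v in names(labels)) {
  attr(quakes_labelled[[v]], "label") <- labels[[v]]
}

# Read the labels back as a named character vector to check them.
vapply(quakes_labelled, function(col) attr(col, "label"), character(1))
```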
new_codebook_rmd()
codebook(quakes)
## No missing values.

Metadata

Description

Dataset name: quakes

The dataset has N=1000 rows and 5 columns. 1000 rows have no missing values on any column.

Metadata for search engines
  • Date published: 2021-12-31
  • Keywords: lat, long, depth, mag, stations

Variables

lat

Distribution

Distribution of values for lat

0 missing values.

Summary statistics

name  data_type  n_missing  complete_rate  min  median  max  mean       sd        hist   label
lat   numeric    0          1              -39  -20     -11  -20.64275  5.028791  ▁▁▅▇▃  NA

long

Distribution

Distribution of values for long

0 missing values.

Summary statistics

name  data_type  n_missing  complete_rate  min  median  max  mean     sd        hist   label
long  numeric    0          1              166  181     188  179.462  6.069497  ▂▁▁▇▃  NA

depth

Distribution

Distribution of values for depth

0 missing values.

Summary statistics

name   data_type  n_missing  complete_rate  min  median  max  mean     sd        hist   label
depth  numeric    0          1              40   247     680  311.371  215.5355  ▇▃▂▃▅  NA

mag

Distribution

Distribution of values for mag

0 missing values.

Summary statistics

name  data_type  n_missing  complete_rate  min  median  max  mean    sd        hist   label
mag   numeric    0          1              4    4.6     6.4  4.6204  0.402773  ▇▇▃▁▁  NA

stations

Distribution

Distribution of values for stations

0 missing values.

Summary statistics

name      data_type  n_missing  complete_rate  min  median  max  mean    sd        hist   label
stations  numeric    0          1              10   27      132  33.418  21.90039  ▇▂▁▁▁  NA

Missingness report

Codebook table

name      data_type  n_missing  complete_rate  min  median  max    mean       sd          hist   label
lat       numeric    0          1              -39  -20.3   -10.7  -20.64275  5.028791    ▁▁▅▇▃  NA
long      numeric    0          1              166  181.4   188.1  179.46202  6.069497    ▂▁▁▇▃  NA
depth     numeric    0          1              40   247.0   680.0  311.37100  215.535498  ▇▃▂▃▅  NA
mag       numeric    0          1              4    4.6     6.4    4.62040    0.402773    ▇▇▃▁▁  NA
stations  numeric    0          1              10   27.0    132.0  33.41800   21.900386   ▇▂▁▁▁  NA
JSON-LD metadata

The following JSON-LD can be found by search engines, if you share this codebook publicly on the web.

{
  "name": "quakes",
  "datePublished": "2021-12-31",
  "description": "The dataset has N=1000 rows and 5 columns.\n1000 rows have no missing values on any column.\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name     |label | n_missing|\n|:--------|:-----|---------:|\n|lat      |NA    |         0|\n|long     |NA    |         0|\n|depth    |NA    |         0|\n|mag      |NA    |         0|\n|stations |NA    |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
  "keywords": ["lat", "long", "depth", "mag", "stations"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "lat",
      "@type": "propertyValue"
    },
    {
      "name": "long",
      "@type": "propertyValue"
    },
    {
      "name": "depth",
      "@type": "propertyValue"
    },
    {
      "name": "mag",
      "@type": "propertyValue"
    },
    {
      "name": "stations",
      "@type": "propertyValue"
    }
  ]
}
Heatmap (2D bin counts) of magnitude versus depth
ggplot(quakes, aes(x = depth, y = mag)) +
  geom_bin2d() +
  labs(title = "Earthquake magnitude by depth", x = "Depth (km)", y = "Magnitude (Richter scale)")

Analysis: There is no apparent relationship between magnitude and depth.
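A visual reading like this can be backed by a correlation coefficient. The sketch below is illustrative only: it computes the Pearson and Spearman correlations between depth and magnitude with base R, and values near zero would be consistent with the reading above. The names r_pearson and r_spearman are hypothetical.

```r
# Linear (Pearson) correlation between depth and magnitude.
r_pearson <- cor(quakes$depth, quakes$mag)

# Rank-based (Spearman) correlation, less sensitive to skew and outliers.
r_spearman <- cor(quakes$depth, quakes$mag, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman)
```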

Bar chart of number of stations versus depth
ggplot(quakes, aes(x = depth, y = stations)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of stations reporting each earthquake, by depth", x = "Depth (km)", y = "Number of stations")

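Analysis: Few stations report earthquakes at depths of 300-400 km, a moderate number report at 500-700 km, and the most report at depths shallower than 300 km. Note that geom_bar(stat = "identity") stacks earthquakes that share a depth, so bar heights are station totals per depth value rather than per-earthquake counts.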
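One way to check this pattern per earthquake rather than from bar heights is to group depth into bands and summarise each band. This is a sketch, assuming dplyr is installed. The 100 km band width and the names by_band, depth_band, n_quakes, mean_stations and total_stations are illustrative choices, not part of the original analysis.

```r
library(dplyr)

by_band <- quakes %>%
  # Cut depth (40 to 680 km) into 100 km bands: (0,100], (100,200], ...
  mutate(depth_band = cut(depth, breaks = seq(0, 700, by = 100))) %>%
  group_by(depth_band) %>%
  summarise(
    n_quakes       = n(),            # earthquakes in the band
    mean_stations  = mean(stations), # average stations per earthquake
    total_stations = sum(stations)   # comparable to the stacked bar heights
  )

by_band
```

Comparing mean_stations with total_stations shows whether a tall bar reflects many earthquakes or many stations per earthquake.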

Earthquakes plotted on map
ggplot(quakes, aes(long,lat))+geom_point(size = .25, show.legend = FALSE) + coord_quickmap() + labs(title="Earthquakes plotted on map", x="Longitude", y="Latitude")

Analysis: Earthquakes occurred most often at latitudes between -25 and -5 and longitudes between 180 and 185.
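The share of events inside that window can be counted directly. A small base-R sketch, using the bounds quoted in the analysis; the name in_region is hypothetical.

```r
# TRUE for earthquakes inside the latitude/longitude window from the analysis.
in_region <- with(quakes, lat >= -25 & lat <= -5 & long >= 180 & long <= 185)

sum(in_region)   # number of earthquakes inside the window
mean(in_region)  # proportion of all earthquakes inside the window
```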

Mean of Depth
mean(quakes$depth)
## [1] 311.371
Median of Depth
median(quakes$depth)
## [1] 247
Interquartile range of Depth
IQR(quakes$depth)
## [1] 444
Mean of Magnitude
mean(quakes$mag)
## [1] 4.6204
Median of Magnitude
median(quakes$mag)
## [1] 4.6
Interquartile range of Magnitude
IQR(quakes$mag)
## [1] 0.6
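The six calls above can also be collected in one table. A minimal base-R sketch, where the name stats is hypothetical:

```r
# Apply the same three statistics to both columns at once.
stats <- sapply(
  quakes[c("depth", "mag")],
  function(v) c(mean = mean(v), median = median(v), IQR = IQR(v))
)

stats  # rows: mean, median, IQR; columns: depth, mag
```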

b. Demonstrate these FIVE (5) functions of dplyr for data manipulation:

i) filter()

Only show rows with mag greater than 5.5

quakes %>% filter(mag>5.5)
##       lat   long depth mag stations
## 1  -20.70 169.92   139 6.1       94
## 2  -13.64 165.96    50 6.0       83
## 3  -22.55 185.90    42 5.7       76
## 4  -23.34 184.50    56 5.7      106
## 5  -15.56 167.62   127 6.4      122
## 6  -26.00 182.12   205 5.6       98
## 7  -32.22 180.20   216 5.7       90
## 8  -22.13 180.38   577 5.7      104
## 9  -24.57 178.40   562 5.6       80
## 10 -15.33 186.75    48 5.7      123
## 11 -17.84 181.30   535 5.7      112
## 12 -22.91 183.95    64 5.9      118
## 13 -34.68 179.82    75 5.6       79
## 14 -19.89 174.46   546 5.7       99
## 15 -18.82 182.21   417 5.6      129
## 16 -37.03 177.52   153 5.6       87
## 17 -11.40 166.07    93 5.6       94
## 18 -15.93 167.91   183 5.6      109
## 19 -21.08 180.85   627 5.9      119
## 20 -21.14 174.21    40 5.7       78
## 21 -12.23 167.02   242 6.0      132
## 22 -17.85 181.44   589 5.6      115
## 23 -20.25 184.75   107 5.6      121
## 24 -21.59 170.56   165 6.0      119

ii) arrange()

Sort i) in ascending order of stations

quakes %>% filter(mag>5.5) %>% arrange(stations)
##       lat   long depth mag stations
## 1  -22.55 185.90    42 5.7       76
## 2  -21.14 174.21    40 5.7       78
## 3  -34.68 179.82    75 5.6       79
## 4  -24.57 178.40   562 5.6       80
## 5  -13.64 165.96    50 6.0       83
## 6  -37.03 177.52   153 5.6       87
## 7  -32.22 180.20   216 5.7       90
## 8  -20.70 169.92   139 6.1       94
## 9  -11.40 166.07    93 5.6       94
## 10 -26.00 182.12   205 5.6       98
## 11 -19.89 174.46   546 5.7       99
## 12 -22.13 180.38   577 5.7      104
## 13 -23.34 184.50    56 5.7      106
## 14 -15.93 167.91   183 5.6      109
## 15 -17.84 181.30   535 5.7      112
## 16 -17.85 181.44   589 5.6      115
## 17 -22.91 183.95    64 5.9      118
## 18 -21.08 180.85   627 5.9      119
## 19 -21.59 170.56   165 6.0      119
## 20 -20.25 184.75   107 5.6      121
## 21 -15.56 167.62   127 6.4      122
## 22 -15.33 186.75    48 5.7      123
## 23 -18.82 182.21   417 5.6      129
## 24 -12.23 167.02   242 6.0      132

Sort i) in descending order of stations

quakes %>% filter(mag>5.5) %>% arrange(desc(stations))
##       lat   long depth mag stations
## 1  -12.23 167.02   242 6.0      132
## 2  -18.82 182.21   417 5.6      129
## 3  -15.33 186.75    48 5.7      123
## 4  -15.56 167.62   127 6.4      122
## 5  -20.25 184.75   107 5.6      121
## 6  -21.08 180.85   627 5.9      119
## 7  -21.59 170.56   165 6.0      119
## 8  -22.91 183.95    64 5.9      118
## 9  -17.85 181.44   589 5.6      115
## 10 -17.84 181.30   535 5.7      112
## 11 -15.93 167.91   183 5.6      109
## 12 -23.34 184.50    56 5.7      106
## 13 -22.13 180.38   577 5.7      104
## 14 -19.89 174.46   546 5.7       99
## 15 -26.00 182.12   205 5.6       98
## 16 -20.70 169.92   139 6.1       94
## 17 -11.40 166.07    93 5.6       94
## 18 -32.22 180.20   216 5.7       90
## 19 -37.03 177.52   153 5.6       87
## 20 -13.64 165.96    50 6.0       83
## 21 -24.57 178.40   562 5.6       80
## 22 -34.68 179.82    75 5.6       79
## 23 -21.14 174.21    40 5.7       78
## 24 -22.55 185.90    42 5.7       76

iii) mutate()

Add a new column, depthinmeter, that converts the depth column from kilometres to metres

quakes %>% filter(mag>5.5) %>% select(depth) %>% mutate(depthinmeter = depth*1000)
##    depth depthinmeter
## 1    139       139000
## 2     50        50000
## 3     42        42000
## 4     56        56000
## 5    127       127000
## 6    205       205000
## 7    216       216000
## 8    577       577000
## 9    562       562000
## 10    48        48000
## 11   535       535000
## 12    64        64000
## 13    75        75000
## 14   546       546000
## 15   417       417000
## 16   153       153000
## 17    93        93000
## 18   183       183000
## 19   627       627000
## 20    40        40000
## 21   242       242000
## 22   589       589000
## 23   107       107000
## 24   165       165000

iv) select()

Select a specific column (here depth) from the rows returned in i)

quakes %>% filter(mag>5.5) %>% select(depth)
##    depth
## 1    139
## 2     50
## 3     42
## 4     56
## 5    127
## 6    205
## 7    216
## 8    577
## 9    562
## 10    48
## 11   535
## 12    64
## 13    75
## 14   546
## 15   417
## 16   153
## 17    93
## 18   183
## 19   627
## 20    40
## 21   242
## 22   589
## 23   107
## 24   165

v) summarise()

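summarise() normally collapses rows into summary values. Passing bare columns, as below, returns them unchanged with one row per input row, so the result matches select(depth, mag). dplyr 1.1.0 and later deprecate this usage in favour of reframe().

quakes %>% filter(mag>5.5) %>% summarise(depth, mag)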
##    depth mag
## 1    139 6.1
## 2     50 6.0
## 3     42 5.7
## 4     56 5.7
## 5    127 6.4
## 6    205 5.6
## 7    216 5.7
## 8    577 5.7
## 9    562 5.6
## 10    48 5.7
## 11   535 5.7
## 12    64 5.9
## 13    75 5.6
## 14   546 5.7
## 15   417 5.6
## 16   153 5.6
## 17    93 5.6
## 18   183 5.6
## 19   627 5.9
## 20    40 5.7
## 21   242 6.0
## 22   589 5.6
## 23   107 5.6
## 24   165 6.0
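For comparison, a sketch of summarise() reducing the same filtered rows to a single row of summary values, assuming dplyr is installed. The names strong_summary, n, mean_depth and max_mag are illustrative.

```r
library(dplyr)

strong_summary <- quakes %>%
  filter(mag > 5.5) %>%
  summarise(
    n          = n(),         # number of earthquakes with mag > 5.5
    mean_depth = mean(depth), # average depth (km) of those earthquakes
    max_mag    = max(mag)     # largest magnitude among them
  )

strong_summary
```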