Intro to Data Science - HW 2 - Diganta Rashed

Attribution statement: (choose only one and delete the rest)

# 1. I did this homework by myself, with help from the book and the professor.
# 2. I did this homework with help from the book and the professor and these Internet sources:
# 3. I did this homework with help from <Name of another student> but did not cut and paste any code.

Reminders of things to practice from last week:

Assignment arrow <-
The combine command c( )
Descriptive statistics mean( ) sum( ) max( )
Arithmetic operators + - * /
Boolean operators > < >= <= == !=

This Week: Explore the quakes dataset (which is included in R). Copy the quakes dataset into a new dataframe (call it myQuakes), so that if you need to start over, you can do so easily (by copying quakes into myQuakes again). Summarize the variables in myQuakes. Also explore the structure of the dataframe.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr)
myQuakes <- quakes
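The task above also asks for a summary of the variables and a look at the dataframe's structure. A minimal sketch of one way to do that, assuming only base R's summary(), str(), and dim():

```r
# Copy the built-in dataset so the original stays untouched
myQuakes <- quakes

# Per-column descriptive statistics (min, quartiles, mean, max)
summary(myQuakes)

# Column names, types, and the first few values of each column
str(myQuakes)

# Dimensions: number of rows, then number of columns
dim(myQuakes)
```

If the exploration goes wrong, rerunning the first line restores myQuakes from quakes.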

Step 1: Explore the earthquake magnitude variable called mag

  A. What is the average magnitude? Use mean() or summary():
mean(myQuakes$mag)
## [1] 4.6204
  B. What is the magnitude of the largest earthquake? Use max() or summary() and save the result in a variable called maxQuake:
maxQuake <- max(myQuakes$mag)
  C. What is the magnitude of the smallest earthquake? Use min() or summary() and save the result in a variable called minQuake:
minQuake <- min(myQuakes$mag)
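The two assignments above store the values without displaying them. As a sketch of how the results could be inspected, and of the summary() route the prompts mention:

```r
myQuakes <- quakes

maxQuake <- max(myQuakes$mag)
minQuake <- min(myQuakes$mag)

# Typing a variable name on its own line prints its value
maxQuake
minQuake

# range() returns the minimum and maximum together
range(myQuakes$mag)

# summary() reports min, quartiles, mean, and max in one call
summary(myQuakes$mag)
```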
  D. Output the third row of the dataframe:
myQuakes[3, ]
##   lat  long depth mag stations
## 3 -26 184.1    42 5.4       43
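The [row, column] indexing used above extends to single cells and to ranges of rows. A short sketch, using only base R:

```r
myQuakes <- quakes

# Third row, all columns (a blank after the comma means "every column")
myQuakes[3, ]

# Third row, magnitude only
myQuakes[3, "mag"]

# First three rows, latitude and longitude only
myQuakes[1:3, c("lat", "long")]
```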

E. Create a new dataframe, with only the rows where the magnitude is greater than 4. How many rows are in that dataframe (use code, do not count by looking at the output)?

majorQuake <- filter(myQuakes, mag > 4)
count(majorQuake)
##     n
## 1 954
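count() returns its answer as a one-row dataframe. When a plain number is more convenient, a base R sketch along these lines should give the same count:

```r
myQuakes <- quakes

# Base R subsetting: keep rows where the logical test is TRUE
majorQuake <- myQuakes[myQuakes$mag > 4, ]

# nrow() returns the row count as a single integer
nrow(majorQuake)

# Summing a logical vector counts its TRUE values, with no new dataframe needed
sum(myQuakes$mag > 4)
```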
  F. Create a sorted dataframe based on magnitude and store it in quakeSorted1. Do the sort two different ways, once with arrange() and then with order().
quakeSorted1 <- arrange(myQuakes, mag)
quakeSorted2 <- myQuakes[order(myQuakes$mag), ]
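One way to confirm that the two sorts agree is to compare their magnitude columns. The sketch below also shows a descending sort with each approach; the names quakeDesc1 and quakeDesc2 are illustrative and not part of the assignment.

```r
library(dplyr)

myQuakes <- quakes

quakeSorted1 <- arrange(myQuakes, mag)
quakeSorted2 <- myQuakes[order(myQuakes$mag), ]

# Both approaches should give the same ascending sequence of magnitudes
identical(quakeSorted1$mag, quakeSorted2$mag)

# Descending order: desc() with arrange(), decreasing = TRUE with order()
quakeDesc1 <- arrange(myQuakes, desc(mag))
quakeDesc2 <- myQuakes[order(myQuakes$mag, decreasing = TRUE), ]

# The largest magnitudes now come first
head(quakeDesc1$mag)
```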
  G. What are the latitude and longitude of the quake reported by the largest number of stations?
select(filter(myQuakes, stations == max(stations)), lat, long)
##      lat   long
## 1 -12.23 167.02
  H. What are the latitude and longitude of the quake reported by the smallest number of stations?
select(filter(myQuakes, stations == min(stations)), lat, long)
##       lat   long
## 1  -21.00 181.66
## 2  -23.55 180.80
## 3  -16.30 186.00
## 4  -20.10 184.40
## 5  -15.03 182.29
## 6  -19.06 169.01
## 7  -17.70 185.00
## 8  -21.04 181.20
## 9  -27.21 182.43
## 10 -18.40 183.40
## 11 -20.30 182.30
## 12 -14.85 184.87
## 13 -17.60 181.50
## 14 -20.61 182.44
## 15 -25.00 180.00
## 16 -17.78 185.33
## 17 -20.70 186.30
## 18 -21.77 181.00
## 19 -21.05 180.90
## 20 -17.70 188.10
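The filter() calls above return every row that ties for the extreme value, which is why the smallest station count yields 20 quakes. Base R's which.max() and which.min() behave differently: they return only the first matching position. A sketch of the contrast, where fewest is an illustrative name:

```r
myQuakes <- quakes

# which.max() / which.min() give the index of the FIRST extreme value only
myQuakes[which.max(myQuakes$stations), c("lat", "long")]
myQuakes[which.min(myQuakes$stations), c("lat", "long")]

# Comparing against min() keeps every tied row, like the filter() call above
fewest <- myQuakes[myQuakes$stations == min(myQuakes$stations), c("lat", "long")]
nrow(fewest)
```

When ties matter, as they do for the smallest station count here, the comparison form is the safer choice.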

Step 3: Using conditional if statements

  I. Test if maxQuake is greater than 7 (output “yes” or “no”)
    Hint: Try modifying the following code in R:
if (maxQuake > 7) "yes" else "no"
## [1] "no"
  J. Following the same logic, test if minQuake is less than 3 (output “yes” or “no”):
if (minQuake < 3) "yes" else "no"
## [1] "no"
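The if ... else form above tests a single value. To apply the same kind of test to every quake at once, base R's vectorized ifelse() is one option. In the sketch below, the threshold of 5 and the name isStrong are arbitrary choices for illustration, not part of the assignment.

```r
myQuakes <- quakes

maxQuake <- max(myQuakes$mag)

# if ... else works on a single TRUE/FALSE value
if (maxQuake > 7) "yes" else "no"

# ifelse() applies the same test to every element of a vector
# (the threshold of 5 is an arbitrary value chosen for illustration)
isStrong <- ifelse(myQuakes$mag > 5, "yes", "no")

# Tally how many quakes fall on each side of the threshold
table(isStrong)
```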