Thy Nguyen Thn553

This homework is due on Sunday, Jan 31 at 11:59 pm. Please submit as an HTML file on Canvas.

For all questions, include the R commands/functions that you used to find your answer. Answers without supporting code will not receive credit.

How to submit this assignment

All homework assignments will be completed using R Markdown. These .Rmd files consist of text/syntax (formatted using Markdown) alongside embedded R code. When you have completed the assignment (by adding R code inside codeblocks and supporting text outside codeblocks), create your document as follows:

  • Click the “Knit” button (above)
  • Fix any errors in your code, if applicable
  • Upload the HTML file to Canvas

Q1 (1 pt)

The dataset quakes contains information about earthquakes occurring near Fiji since 1964. The first few observations are listed below.
head(quakes)
##      lat   long depth mag stations
## 1 -20.42 181.62   562 4.8       41
## 2 -20.62 181.03   650 4.2       15
## 3 -26.00 184.10    42 5.4       43
## 4 -17.97 181.66   626 4.1       19
## 5 -20.42 181.96   649 4.0       11
## 6 -19.68 184.31   195 4.0       12
How many observations are there of each variable (i.e., how many rows are there; show using code)? How many variables are there total (i.e., how many columns are in the dataset)? You can read more about the dataset here. Do not forget to include the code you used to find the answer to each question.
NROW(quakes)
## [1] 1000

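The dataset gives the locations of 1000 seismic events of magnitude (MB) greater than 4.0 that occurred in a cube near Fiji since 1964. I used NROW() to count the rows, so there are 1000 observations of each variable. There are 5 variables (columns) in total: lat, long, depth, mag, and stations.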
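As a quick check of both counts, here is a minimal sketch using dim(), nrow(), and ncol() on the built-in quakes dataset. The object names dims, n_obs, and n_vars are only illustrative.

```r
# quakes ships with base R (datasets package), so nothing needs to be loaded.
dims   <- dim(quakes)    # c(number of rows, number of columns)
n_obs  <- nrow(quakes)   # observations (rows)
n_vars <- ncol(quakes)   # variables (columns)

dims
names(quakes)            # the variable names, one per column
```

dim() answers both parts of the question at once, while nrow() and ncol() give each count separately.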


Q2 (2 pts)

What are the minimum, maximum, mean, and median values for the variables mag and depth? Note that there are many functions that can be used to answer this question. If you chose to work with each variable separately, recall that you can access individual variables in a dataframe using the $ operator (e.g., dataset$variable). Describe your answer in words.
min(quakes$mag)
## [1] 4
max(quakes$mag)
## [1] 6.4
mean(quakes$mag)
## [1] 4.6204
median(quakes$mag)
## [1] 4.6
min(quakes$depth)
## [1] 40
max(quakes$depth)
## [1] 680
mean(quakes$depth)
## [1] 311.371
median(quakes$depth)
## [1] 247

For mag, the minimum is 4, the maximum is 6.4, the mean is 4.6204, and the median is 4.6. For depth (in kilometers), the minimum is 40, the maximum is 680, the mean is 311.371, and the median is 247. The mean depth is well above the median depth, so the depth values are not symmetric around their center.
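The eight separate calls above could also be collapsed into one. The sketch below is one possible way to do it with sapply(); the names vars and stats are illustrative, not part of the assignment.

```r
# Keep only the two variables of interest.
vars <- quakes[, c("mag", "depth")]

# Apply the same four summary functions to each column.
stats <- sapply(vars, function(x) {
  c(min = min(x), max = max(x), mean = mean(x), median = median(x))
})

stats   # rows are the statistics, columns are the variables
```

summary(vars) would give similar information, along with the quartiles.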


Q3

Recall how logical indexing of a dataframe works in R. To refresh your memory, in the example code below I ask R for the median magnitude for quakes whose longitude is greater than 175. The two ways produce equivalent results.
median(quakes$mag[quakes$long>175])
## [1] 4.5
median(quakes[quakes$long>175,]$mag) #this is the more conventional notation
## [1] 4.5

3.1 (1 pt)

Explain what each of the two lines of code is doing in words. Specifically, why do we need to use the comma in the second case but not in the first? Remember that the $ selects a single variable and that [ ] are used for indexing whatever object came before (either a single variable or a dataframe).

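The first line uses $ to select the mag variable, which is a single vector, and then uses [ ] to keep only the magnitudes of quakes whose longitude is greater than 175. A vector has one dimension, so one index is enough and no comma is needed. The second line subsets the whole dataframe first: quakes[quakes$long>175,] keeps the rows where longitude is greater than 175 and, because nothing follows the comma, keeps all columns. The comma is needed because a dataframe has two dimensions (rows before the comma, columns after it). The $mag that follows then selects the magnitude variable from that subset. In both cases median() is applied to the same set of magnitudes.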
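To make the difference concrete, here is a small sketch that stores the intermediate objects. The names keep, v, and d are only for illustration.

```r
keep <- quakes$long > 175   # logical vector, one TRUE/FALSE per row

v <- quakes$mag[keep]       # vector: one index, no comma
d <- quakes[keep, ]         # dataframe: rows before the comma, all columns after

is.vector(v)                # TRUE, v is a plain numeric vector
is.data.frame(d)            # TRUE, d is still a dataframe
identical(v, d$mag)         # both routes give the same magnitudes
```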

3.2 (3 pts)

What is the mean of the variable mag when depth is greater than the median depth? What is the mean of the variable mag when depth is less than the median depth? What does this suggest about the relationship between an earthquake’s depth and its magnitude?
mean(quakes$mag[quakes$depth>median(quakes$depth)])
## [1] 4.5232
mean(quakes$mag[quakes$depth<median(quakes$depth)])
## [1] 4.7176

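The mean magnitude is lower for quakes deeper than the median depth (4.5232) than for quakes shallower than the median depth (4.7176). This suggests a weak inverse relationship between an earthquake’s depth and its magnitude: deeper earthquakes tend to be slightly weaker.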
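The same comparison can be sketched with a grouping variable, which avoids repeating the subsetting code. The names deep, mag_by_depth, and depth_mag_cor are illustrative. The correlation is added only as a rough extra check, on the assumption that a linear summary is reasonable here.

```r
# TRUE for quakes deeper than the median depth, FALSE otherwise.
deep <- quakes$depth > median(quakes$depth)

# Mean magnitude within each group (names are "FALSE" and "TRUE").
mag_by_depth <- tapply(quakes$mag, deep, mean)
mag_by_depth

# A negative correlation would be consistent with the inverse relationship.
depth_mag_cor <- cor(quakes$depth, quakes$mag)
depth_mag_cor
```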

3.3 (2 pts)

What is the standard deviation of the variable lat when depth is greater than the median depth? What is the standard deviation of the variable lat when depth is less than the median depth? What does this suggest about the relationship between an earthquake’s latitude and its depth?
sd(quakes$lat[quakes$depth>median(quakes$depth)])
## [1] 3.577252
sd(quakes$lat[quakes$depth<median(quakes$depth)])
## [1] 6.1501

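Latitude varies less among the deeper quakes (standard deviation 3.577252) than among the shallower quakes (standard deviation 6.1501). This suggests that deep earthquakes are concentrated in a narrower band of latitudes, while shallow earthquakes are spread over a wider range. The standard deviation measures spread only, so it does not show whether latitude increases or decreases with depth.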
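As a rough illustration of the difference in spread, the sketch below computes the standard deviation and the range of lat within each depth group. The names deep, lat_sd, and lat_range are illustrative.

```r
# TRUE for quakes deeper than the median depth, FALSE otherwise.
deep <- quakes$depth > median(quakes$depth)

# Split latitudes into the two groups and summarise each one.
lat_groups <- split(quakes$lat, deep)
lat_sd     <- sapply(lat_groups, sd)      # spread of latitude per group
lat_range  <- sapply(lat_groups, range)   # min and max latitude per group

lat_sd
lat_range
```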


Q4 (2 pts)

The variable depth is measured in kilometers. Create a new variable called depth_m that gives depth in meters rather than kilometers and add it to the dataset quakes. To help get you started, I have given you code that creates the new variable but fills it with NA values. Overwrite the NAs below by writing code on the right-hand side of the assignment operator (<-) that computes the requested transformation. Print out the first few rows of the updated dataset using head().
quakes$depth_m <- quakes$depth*1000
head(quakes$depth_m, n= 10)
##  [1] 562000 650000  42000 626000 649000 195000  82000 194000 211000 622000
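The question asks for the first few rows of the updated dataset, not only the new variable. A minimal sketch of that is below; calling head() on the whole dataframe shows depth_m next to the original columns.

```r
# Convert depth from kilometers to meters (1 km = 1000 m).
quakes$depth_m <- quakes$depth * 1000

# Print the first rows of the whole dataframe, including the new column.
head(quakes)
```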

Q5

Let’s make some plots in base R.

5.1 (2 pts)

Create a boxplot of depth using the boxplot() function. Describe where you see the min, max, and median (which you calculated in question 2) in this plot.
boxplot(quakes$depth, main="Depth Boxplot")

The maximum (680) is at the end of the top whisker and the minimum (40) is at the end of the bottom whisker. The median (247) is the thick line inside the box.
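The numbers that boxplot() draws can be checked with boxplot.stats(). This is a sketch for verification only; the name b is illustrative. The whiskers coincide with the minimum and maximum only when no points fall outside them, which b$out shows.

```r
b <- boxplot.stats(quakes$depth)

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers (outliers), if any
```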

5.2 (2 pts)

Create a histogram of depth using the hist() function. What important information does the histogram provide that the boxplot does not?
hist(quakes$depth, main="Depth Histogram",xlab="Depths")

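The histogram shows the shape of the distribution of depth, which the boxplot does not. From the histogram we can tell whether the distribution is symmetric or skewed and how many peaks it has. The depth distribution is bimodal, with one group of shallow quakes and one group of deep quakes; the boxplot summarizes both groups in a single box and hides this.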
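To look at the two peaks numerically, the bin counts behind the histogram can be extracted without drawing it. This is only a sketch: the choice of breaks = 20 is an assumption, and R treats it as a suggestion, so the actual number of bins may differ.

```r
# Compute the histogram without plotting it.
h <- hist(quakes$depth, breaks = 20, plot = FALSE)

# One row per bin: lower edge, upper edge, and number of quakes in the bin.
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = h$breaks[-1],
                   count = h$counts)
bins
```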

5.3 (2 pts)

Create a scatterplot by plotting variables mag and stations against each other using the plot() function. Note that to generate a scatterplot, the plot() takes two arguments: the x-axis variable and the y-axis variable. Describe the relationship between the two variables.
plot(quakes$mag, quakes$stations, main = "Scatterplot of Mag and Stations",
     xlab = "mag", ylab = "stations")

There is a positive, roughly linear relationship between mag and stations: stronger earthquakes are reported by more stations. The points become more spread out as mag and stations increase.
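The strength and direction of this relationship can be sketched with a correlation and a fitted line. The names r and fit are illustrative, and the straight-line fit is an assumption made only to visualize the trend.

```r
# Pearson correlation: positive values mean the variables increase together.
r <- cor(quakes$mag, quakes$stations)
r

# Fit a simple linear model and draw it over the scatterplot.
fit <- lm(stations ~ mag, data = quakes)
plot(quakes$mag, quakes$stations, xlab = "mag", ylab = "stations")
abline(fit, col = "red")
```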

5.4 (3 pts)