Enter your name and EID here

Please submit as an HTML file on Canvas before the due date.

For all questions, include the R commands/functions that you used to find your answer. Answers without supporting code will not receive credit.

How to submit this assignment

All homework assignments will be completed using R Markdown. These .Rmd files consist of text/syntax (formatted using Markdown) alongside embedded R code. When you have completed the assignment (by adding R code inside codeblocks and supporting text outside codeblocks), create your document as follows:

  • Click the “Knit” button (above)
  • Fix any errors in your code, if applicable
  • Upload the HTML file to Canvas

Q1 (0.5 pts)

The dataset quakes contains information about earthquakes occurring near Fiji since 1964. The first few observations are listed below.

head(quakes)
##      lat   long depth mag stations
## 1 -20.42 181.62   562 4.8       41
## 2 -20.62 181.03   650 4.2       15
## 3 -26.00 184.10    42 5.4       43
## 4 -17.97 181.66   626 4.1       19
## 5 -20.42 181.96   649 4.0       11
## 6 -19.68 184.31   195 4.0       12

How many observations are there of each variable (i.e., how many rows are there; show using code)? How many variables are there total (i.e., how many columns are in the dataset)? You can read more about the dataset here. Do not forget to include the code you used to answer each question.

# Columns
ncol(quakes)
## [1] 5
# Rows
nrow(quakes)
## [1] 1000

There are 1000 observations of each variable and there are 5 variables total.
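As a cross-check, dim() reports both counts in one call. It returns the number of rows followed by the number of columns. Here is a minimal sketch using only base R:

```r
# dim() returns c(number of rows, number of columns) for a data frame
dims <- dim(quakes)
dims
## [1] 1000    5

# The same two numbers, labeled for readability
c(rows = nrow(quakes), columns = ncol(quakes))
```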


Q2 (1 pts)

What are the minimum, maximum, mean, and median values for the variables mag and depth? Note that there are many functions that can be used to answer this question. If you chose to work with each variable separately, recall that you can access individual variables in a dataframe using the $ operator (e.g., dataset$variable). Describe your answer in words.

# mag
x <- quakes$mag
min(x)
## [1] 4
max(x)
## [1] 6.4
mean(x)
## [1] 4.6204
median(x)
## [1] 4.6
# depth
y <- quakes$depth
min(y)
## [1] 40
max(y)
## [1] 680
mean(y)
## [1] 311.371
median(y)
## [1] 247

For ‘mag’, the minimum is 4, the maximum is 6.4, the mean is 4.6204, and the median is 4.6. For ‘depth’, the minimum is 40, the maximum is 680, the mean is 311.371, and the median is 247.
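The same four statistics can be collected for both variables at once. The sketch below uses sapply() with a small helper called describe(). That helper is a hypothetical name introduced here for illustration. summary() is a built-in alternative that also reports the quartiles.

```r
# Hypothetical helper: returns the four statistics asked for in Q2
describe <- function(v) {
  c(min = min(v), max = max(v), mean = mean(v), median = median(v))
}

# sapply() applies describe() to each selected column and binds the results
# into a matrix with one row per statistic and one column per variable
stats <- sapply(quakes[, c("mag", "depth")], describe)
stats

# Built-in alternative (also shows the 1st and 3rd quartiles)
summary(quakes[, c("mag", "depth")])
```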


Q3

Recall how logical indexing of a dataframe works in R. To refresh your memory, in the example code below I ask R for the median magnitude for quakes whose longitude is greater than 175.

median(quakes$mag[quakes$long > 175])
## [1] 4.5

Breaking this down a bit, the above line of code is doing the following (this is just for illustration, the code itself is unnecessarily verbose):

mags <- quakes$mag
longs <- quakes$long
is_long_greater_175 <- longs > 175  ## Makes a logical vector
mags_where_long_is_greater_175 <- mags[is_long_greater_175]  ## Indexing using logical vector

median(mags_where_long_is_greater_175)
## [1] 4.5

3.1 (0.5 pts)

Explain in words what the single line of code is doing. Remember that the $ selects a single variable and that [ ] are used for indexing whatever object came before (either a single variable or a dataframe).

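The single line of code, median(quakes$mag[quakes$long > 175]), works in three steps:

  • It compares each quake’s longitude (quakes$long, in degrees) to 175. This produces a logical vector that is TRUE for quakes with a longitude greater than 175.
  • It uses that vector inside [ ] to keep only the matching values of quakes$mag.
  • median() returns the median of those magnitudes.

In short, it finds the median magnitude of the earthquakes located at longitudes greater than 175.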
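The same logical filter can be written with subset(). subset() evaluates long > 175 inside the data frame, so the quakes$ prefix is not needed. This sketch is only meant to show that the two forms select the same rows:

```r
# Logical indexing, as in the single line of code
m1 <- median(quakes$mag[quakes$long > 175])

# subset() keeps the rows where the condition is TRUE; $mag then extracts magnitudes
m2 <- median(subset(quakes, long > 175)$mag)

# Both approaches select the same magnitudes, so the medians agree
identical(m1, m2)

# sum() of a logical vector counts the TRUE values, i.e. how many quakes were kept
sum(quakes$long > 175)
```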

3.2 (1.5 pts)

What is the mean of the variable mag when depth is greater than the median depth? What is the mean of the variable mag when depth is less than the median depth? What does this suggest about the relationship between an earthquake’s depth and its magnitude?

mean(quakes$mag[quakes$depth > median(quakes$depth)])
## [1] 4.5232
mean(quakes$mag[quakes$depth < median(quakes$depth)])
## [1] 4.7176

The mean magnitude of earthquakes deeper than the median depth (4.5232) is lower than the mean magnitude of earthquakes shallower than the median depth (4.7176). This suggests a negative relationship between depth and magnitude: deeper earthquakes tend to have slightly smaller magnitudes. The difference is modest, so it indicates a tendency rather than a strict rule.
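Pearson's correlation coefficient, computed with cor(), puts a single number on this association. The sketch below recomputes the two group means and then the correlation. Note that cor() measures only linear association, so it complements the group comparison rather than replacing it.

```r
med_depth <- median(quakes$depth)

# Logical vectors marking quakes deeper / shallower than the median depth
deep    <- quakes$depth > med_depth
shallow <- quakes$depth < med_depth

# Mean magnitude in each group
group_means <- c(deep = mean(quakes$mag[deep]), shallow = mean(quakes$mag[shallow]))
group_means

# Pearson correlation between depth and magnitude; a negative value means
# magnitude tends to decrease as depth increases
r <- cor(quakes$depth, quakes$mag)
r
```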

3.3 (1 pts)

The standard deviation of a quantity is a measure of how variable that quantity is. For example, the following plots give histograms of two variables (petal length and petal width from the iris dataset).

hist(iris$Petal.Length)

hist(iris$Petal.Width)

We see that the petal length is more variable than the petal width, which can be measured using the standard deviation (computed using the sd function):

print(sd(iris$Petal.Length))
## [1] 1.765298
print(sd(iris$Petal.Width))
## [1] 0.7622377
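For reference, sd() computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below reproduces it by hand from that formula:

```r
x <- iris$Petal.Length
n <- length(x)

# Sample standard deviation: square root of (sum of squared deviations / (n - 1))
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

manual_sd
sd(x)  # built-in; should agree with manual_sd up to floating-point rounding
```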

What is the standard deviation of the variable lat when depth is greater than the median depth? What is the standard deviation of the variable lat when depth is less than the median depth? What does this suggest about the relationship between an earthquake’s latitude and its depth?

sd(quakes$lat[quakes$depth > median(quakes$depth)])
## [1] 3.577252
sd(quakes$lat[quakes$depth < median(quakes$depth)])
## [1] 6.1501

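The standard deviation of latitude is smaller for quakes deeper than the median depth (3.58) than for quakes shallower than the median depth (6.15). This suggests that deeper earthquakes are concentrated in a narrower band of latitudes, while shallower earthquakes are spread over a wider range of latitudes. A difference in spread does not, by itself, show whether latitude tends to increase or decrease with depth.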
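A smaller standard deviation for the deeper quakes means their latitudes are packed into a narrower band. One quick way to see this directly is to compare the latitude range of each group. The sketch below reuses the median-depth split from the earlier questions:

```r
med_depth <- median(quakes$depth)
deep    <- quakes$depth > med_depth
shallow <- quakes$depth < med_depth

# Spread of latitude within each depth group
lat_sd <- c(deep = sd(quakes$lat[deep]), shallow = sd(quakes$lat[shallow]))
lat_sd

# Smallest and largest latitude in each group
rbind(deep = range(quakes$lat[deep]), shallow = range(quakes$lat[shallow]))
```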


Q4 (1 pts)

The variable depth is measured in kilometers. Create a new variable called depth_m that gives depth in meters rather than kilometers and add it to the dataset quakes. To help get you started, I have given you code that creates the new variable but fills it with NA values. Overwrite the NAs below by writing code on the right-hand side of the assignment operator (<-) that computes the requested transformation. Print out the first few rows of the updated dataset using head().

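quakes$depth_m <- quakes$depth * 1000
head(quakes)
##      lat   long depth mag stations depth_m
## 1 -20.42 181.62   562 4.8       41  562000
## 2 -20.62 181.03   650 4.2       15  650000
## 3 -26.00 184.10    42 5.4       43   42000
## 4 -17.97 181.66   626 4.1       19  626000
## 5 -20.42 181.96   649 4.0       11  649000
## 6 -19.68 184.31   195 4.0       12  195000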
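transform() offers an equivalent way to add the column. It returns a modified copy of the data frame instead of changing quakes in place. The sketch below includes a sanity check that converting back to kilometers recovers the original values. The name quakes_m is chosen here for illustration.

```r
# transform() evaluates depth * 1000 inside quakes and returns a new data frame
quakes_m <- transform(quakes, depth_m = depth * 1000)
head(quakes_m)

# Sanity check: dividing by 1000 should give back the original depths in km
all(quakes_m$depth_m / 1000 == quakes_m$depth)
```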

Q5

Let’s make some plots in base R.

5.1 (1 pts)

Create a boxplot of depth using the boxplot() function. Describe where you see the min, max, and median (which you calculated in question 2) in this plot.

boxplot(quakes$depth)

The boxplot shows the median as the thick horizontal line inside the box, at about 250, which agrees with the value of 247 computed in Q2. The maximum of 680 is the short horizontal line at the end of the upper whisker, and the minimum of 40 is the short horizontal line at the end of the lower whisker.
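boxplot.stats() prints the numbers behind the plot. Its $stats element holds five values: the lower whisker end, the lower hinge, the median, the upper hinge, and the upper whisker end. Its $out element lists any points drawn individually as outliers. When $out is empty, the whisker ends are exactly the minimum and maximum.

```r
bs <- boxplot.stats(quakes$depth)

# Five values drawn by boxplot(): lower whisker, lower hinge, median, upper hinge, upper whisker
bs$stats

# Points more than 1.5 * IQR beyond the box; an empty vector means no outliers are drawn
bs$out
```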

5.2 (1 pts)

Create a histogram of depth using the hist() function. What important information does the histogram provide that the boxplot does not?

hist(quakes$depth)

A histogram shows the shape of the distribution: where most of the data fall, how spread out they are, and how many peaks there are. This histogram shows that depth is bimodal, with one peak at 50-100 km and a second at 550-600 km. The boxplot cannot show this, because it summarizes the data with only five numbers.
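Passing plot = FALSE to hist() returns the bin counts behind the histogram without drawing it. The sketch below prints each bin next to its count and reports the bin with the most quakes. This is one way to back up a reading of where the peaks are.

```r
# Compute the histogram bins and counts without plotting
h <- hist(quakes$depth, plot = FALSE)

# One row per bin: lower edge, upper edge, number of quakes
bins <- data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts)
bins

# Bin containing the most quakes (the tallest bar)
bins[which.max(bins$count), ]
```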

5.3 (1 pts)

Create a scatterplot by plotting variables mag and stations against each other using the plot() function. Note that to generate a scatterplot, plot() takes two arguments: the x-axis variable and the y-axis variable. Describe the relationship between the two variables.

plot(quakes$mag, quakes$stations, main = "Scatterplot of Mag and Stations",
    xlab = "mag", ylab = "stations")

There is a positive relationship between mag and stations: earthquakes with larger magnitudes tend to be reported by more stations.
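cor() quantifies the pattern in the scatterplot by giving the Pearson correlation between the two variables. Values near 1 indicate a strong positive linear relationship. The slope of a simple linear fit from lm() gives the average change in stations per unit of magnitude. Here is a minimal sketch:

```r
# Pearson correlation between magnitude and number of reporting stations
r_mag_stations <- cor(quakes$mag, quakes$stations)
r_mag_stations

# Least-squares line stations = intercept + slope * mag
fit <- lm(stations ~ mag, data = quakes)
coef(fit)
```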

5.4 (1.5 pts)

Create a scatterplot of the quakes’ geographic locations by plotting long on the x-axis and lat on the y-axis. Using this plot, the map/link below (note the two trenches), and some of the techniques you practiced above, are deeper quakes more likely to originate east or west of Fiji?

Link to location on Google maps

plot(quakes$long, quakes$lat, main = "Scatterplot of Longitude and Latitude of Earthquakes",
    xlab = "longitude", ylab = "latitude")

The deeper quakes are concentrated at higher longitudes, which suggests that deeper quakes are more likely to originate east of Fiji.
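One way to support this answer with code is to compare depths on either side of a chosen dividing longitude. The sketch below assumes a dividing line at 178 degrees east as a rough stand-in for Fiji's longitude. That cutoff (fiji_long) is an assumption for illustration, not a value given in the assignment. It is worth checking against the map and trying nearby values.

```r
# Assumed approximate longitude of Fiji (hypothetical cutoff; adjust as needed)
fiji_long <- 178

east <- quakes$long > fiji_long

# Median depth (km) of quakes east vs. west of the cutoff
depth_by_side <- c(east = median(quakes$depth[east]), west = median(quakes$depth[!east]))
depth_by_side

# Share of the above-median-depth quakes that fall east of the cutoff
deep <- quakes$depth > median(quakes$depth)
share_east <- mean(east[deep])
share_east
```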


## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
## 
## Matrix products: default
## BLAS:   /stor/system/opt/R/R-4.0.3/lib/R/lib/libRblas.so
## LAPACK: /stor/system/opt/R/R-4.0.3/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.29   R6_2.5.1        jsonlite_1.8.0  formatR_1.12   
##  [5] magrittr_2.0.3  evaluate_0.15   highr_0.9       stringi_1.7.8  
##  [9] cachem_1.0.6    rlang_1.0.4     cli_3.3.0       rstudioapi_0.13
## [13] jquerylib_0.1.4 bslib_0.4.0     rmarkdown_2.14  tools_4.0.3    
## [17] stringr_1.4.0   xfun_0.31       yaml_2.3.5      fastmap_1.1.0  
## [21] compiler_4.0.3  htmltools_0.5.3 knitr_1.39      sass_0.4.2
## [1] "2022-09-04 13:32:13 CDT"
##                                       sysname 
##                                       "Linux" 
##                                       release 
##                          "4.15.0-191-generic" 
##                                       version 
## "#202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022" 
##                                      nodename 
##                  "educcomp01.ccbb.utexas.edu" 
##                                       machine 
##                                      "x86_64" 
##                                         login 
##                                     "unknown" 
##                                          user 
##                                     "cdb4373" 
##                                effective_user 
##                                     "cdb4373"