Problem Set # 2

Due Date: October 14, 2022
Total Points: 32

1 The following ten observations, taken during the years 1970-1979, are on October snow cover for Eurasia in units of millions of square kilometers. Follow the instructions and answer the questions by typing the appropriate commands.

Year  Snow
1970   6.5
1971  12.0
1972  14.9
1973  10.0
1974  10.7
1975   7.9
1976  21.9
1977  12.5
1978  14.5
1979   9.2

  a. Create a data frame from these data. (2)
Year = c(1970,1971,1972,1973,1974,1975,1976,1977,1978,1979)
Snow = c(6.5,12.0,14.9,10.0,10.7,7.9,21.9,12.5,14.5,9.2)
snowfall.df=data.frame(Year,Snow)
(snowfall.df)
##    Year Snow
## 1  1970  6.5
## 2  1971 12.0
## 3  1972 14.9
## 4  1973 10.0
## 5  1974 10.7
## 6  1975  7.9
## 7  1976 21.9
## 8  1977 12.5
## 9  1978 14.5
## 10 1979  9.2
  b. What are the mean and median snow cover over this decade? (2)
mean(Snow)
## [1] 12.01
median(Snow)
## [1] 11.35
  c. What is the standard deviation of the snow cover over this decade? (2)
sd(Snow)
## [1] 4.390761
  d. How many Octobers had snow cover greater than 10 million km\(^2\)? (2)
sum(Snow > 10)
## [1] 6
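
The same summaries can also be read from the data frame columns rather than from the free-standing vectors. The sketch below rebuilds the data frame so that it runs on its own; the names snow.df and wetYears are illustrative and not part of the assignment.

```r
# Rebuild the data frame from the observations given in the problem
snow.df = data.frame(
  Year = 1970:1979,
  Snow = c(6.5, 12.0, 14.9, 10.0, 10.7, 7.9, 21.9, 12.5, 14.5, 9.2)
)

# Mean, median, and standard deviation taken from the Snow column
snowMean = mean(snow.df$Snow)
snowMedian = median(snow.df$Snow)
snowSD = sd(snow.df$Snow)

# Logical indexing picks out the years with more than 10 million km^2
wetYears = snow.df$Year[snow.df$Snow > 10]
print(wetYears)
```

Indexing Year by the same logical condition used in sum(Snow > 10) shows which years were counted, not only how many.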

2 The data vector rivers contains the lengths (miles) of 141 major rivers in North America.

  a. What proportion of the rivers are shorter than 500 miles long? (2)
(rivers)
##   [1]  735  320  325  392  524  450 1459  135  465  600  330  336  280  315  870
##  [16]  906  202  329  290 1000  600  505 1450  840 1243  890  350  407  286  280
##  [31]  525  720  390  250  327  230  265  850  210  630  260  230  360  730  600
##  [46]  306  390  420  291  710  340  217  281  352  259  250  470  680  570  350
##  [61]  300  560  900  625  332 2348 1171 3710 2315 2533  780  280  410  460  260
##  [76]  255  431  350  760  618  338  981 1306  500  696  605  250  411 1054  735
##  [91]  233  435  490  310  460  383  375 1270  545  445 1885  380  300  380  377
## [106]  425  276  210  800  420  350  360  538 1100 1205  314  237  610  360  540
## [121] 1038  424  310  300  444  301  268  620  215  652  900  525  246  360  529
## [136]  500  720  270  430  671 1770
smallRiver = sum(rivers < 500)
riversCount = length(rivers)
percent = (smallRiver/riversCount)* 100
(smallRiver)
## [1] 82
(riversCount)
## [1] 141
(percent)
## [1] 58.15603
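
The question asks for a proportion, while the value above is a percentage. A minimal sketch of the proportion itself, assuming only the built-in rivers vector; the name propShort is illustrative.

```r
# rivers ships with base R (datasets package)
# A logical vector averages to the fraction of TRUE values,
# so mean() of the comparison gives the proportion directly
propShort = mean(rivers < 500)
print(propShort)
```

This is the same quantity as smallRiver/riversCount, that is 82/141, without the multiplication by 100.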
  b. What proportion of the rivers are shorter than the mean length? (2)
# Store the mean under its own name so the variable does not mask the mean() function
meanLength = mean(rivers)
perc = (sum(rivers < meanLength)/riversCount)*100
(perc)
## [1] 66.66667
  c. What is the 75th percentile river length? (2)
percentile = quantile(rivers, probs = 0.75)
(percentile)
## 75% 
## 680
  d. What is the interquartile range in river length? (2)
pc25 = quantile(rivers, probs = 0.25)
# Named iqr rather than range so the variable does not mask the range() function
iqr = percentile - pc25

(iqr)
## 75% 
## 370
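
As a cross-check on the subtraction above, base R also provides IQR(), which computes the same difference in one call. The sketch below assumes only the built-in rivers vector; riverIQR and manualIQR are illustrative names.

```r
# IQR() returns the 75th percentile minus the 25th percentile,
# using the same default quantile type as quantile()
riverIQR = IQR(rivers)

# The manual version; unname() drops the "75%" label carried over
# from the first quantile() result
manualIQR = unname(quantile(rivers, 0.75) - quantile(rivers, 0.25))

print(riverIQR)
```

The "75%" label in the printed output above is left over from the first operand of the subtraction; the value is the interquartile range, not a percentile.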

3 The dataset hflights from the hflights package contains all 227,496 flights that departed Houston in 2011. Using the functions in the dplyr package:

#install.packages('hflights')
#install.packages("dplyr")
#library(hflights)
#library(dplyr)
#(hflights)

#got lost on this one and could not figure it out

  a. Create a data frame from hflights containing only those flights that departed on September 11th of that year. (4)

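#head(hflights)
# hflights is already a data frame; wrapping it in data.frame(data = hflights)
# would prefix every column name with "data."
# Month and DayofMonth are numeric columns, so compare against numbers
# and pass both conditions to a single filter() call
#sept11 = filter(hflights, Month == 9, DayofMonth == 11)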
  b. How many flights departed on that day? (2)
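
One possible approach to the filtering and counting steps, sketched on a small made-up data frame so that it runs without the hflights package. Only the column names Month, DayofMonth, and TailNum come from hflights; toyFlights and every value in it are hypothetical.

```r
library(dplyr)

# Toy stand-in for hflights with made-up rows; only the column names
# are taken from the real data set
toyFlights = data.frame(
  Month      = c(9, 9, 9, 10, 9),
  DayofMonth = c(11, 11, 12, 11, 11),
  TailNum    = c("N001", "N002", "N001", "N003", "N001")
)

# Keep only the rows where both conditions hold
sept11 = filter(toyFlights, Month == 9, DayofMonth == 11)

# Each row is one departure, so the row count is the number of flights
nSept11 = nrow(sept11)
print(nSept11)
```

With the real data, the same two calls would presumably be applied to hflights after library(hflights), in place of toyFlights.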

  c. Create a data frame with the first column being the tail number and the second being the number of departures from Houston the plane made that year, sorted from most to least flights. (4)
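
A minimal sketch of the per-plane tally, again on made-up rows. It uses dplyr's count(), which is one of several ways to do this (group_by() with summarize() and arrange() would also work); toyFlights, tailCounts, and the column name Departures are illustrative.

```r
library(dplyr)

# Toy stand-in for hflights; TailNum is the real column name,
# the values are made up
toyFlights = data.frame(
  TailNum = c("N001", "N002", "N001", "N003", "N001", "N002")
)

# count() tallies rows per TailNum, sort = TRUE orders the result
# from most to least, and name sets the label of the count column
tailCounts = count(toyFlights, TailNum, sort = TRUE, name = "Departures")
print(tailCounts)
```

Because every row of hflights is a departure from Houston, counting rows per tail number gives the number of departures per plane.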

4 Using the tornado data set (Canvas - Tornadoes.txt) create a data frame with the year in the first column and the total number of tornadoes in Kansas by year in the second column. (6)

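# Assumes Tornadoes.txt has a header row; without header = TRUE the columns would be named V1, V2, ...
nados = read.table("C:/Users/virg7/OneDrive/Desktop/Tornadoes.txt", header = TRUE)
nadosdf = data.frame(nados)
# Assumes one row per tornado and columns named STATE and Year; adjust the names to match the file
# Year is assumed to already hold the calendar year, so no lubridate conversion is applied
#nadosdf %>%
#  filter(STATE == 'KS') %>%
#  group_by(Year) %>%
#  summarize(KSTornadoes = n())
# I could not get this finished, I feel like I am close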
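
The filter, group, and count pattern can be sketched on a small made-up data frame. The column names Year and STATE follow the attempt above and are assumptions about Tornadoes.txt; toyTornadoes, ksByYear, and all values are hypothetical.

```r
library(dplyr)

# Toy stand-in for the tornado file: made-up rows, one row per tornado
toyTornadoes = data.frame(
  Year  = c(1990, 1990, 1990, 1991, 1991, 1992),
  STATE = c("KS", "OK", "KS", "KS", "TX", "OK")
)

# Keep the Kansas rows, then count the rows within each year
ksByYear = toyTornadoes %>%
  filter(STATE == "KS") %>%
  group_by(Year) %>%
  summarize(KSTornadoes = n())
print(ksByYear)
```

Note that a year with no Kansas tornadoes (1992 in the toy data) drops out of the result instead of appearing with a count of zero.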