Problem Set #2

1. The following ten observations, taken during the years 1970-1979, are on October snow cover for Eurasia in units of millions of square kilometers. Follow the instructions and answer the questions by typing the appropriate commands.

Year  Snow
1970   6.5
1971  12.0
1972  14.9
1973  10.0
1974  10.7
1975   7.9
1976  21.9
1977  12.5
1978  14.5
1979   9.2

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

a. Create a data frame from these data.

Snow = c(6.5, 12.0, 14.9, 10.0, 10.7, 7.9, 21.9, 12.5, 14.5, 9.2)
Year = 1970:1979
Snow.df = data.frame(Year, Snow)
head(Snow.df)

##   Year Snow
## 1 1970  6.5
## 2 1971 12.0
## 3 1972 14.9
## 4 1973 10.0
## 5 1974 10.7
## 6 1975  7.9

b. What are the mean and median snow cover over this decade?

Mean = mean(Snow)
Mean

## [1] 12.01

Median = median(Snow)
Median

## [1] 11.35

c. What is the standard deviation of the snow cover over this decade?

SD = sd(Snow)
SD

## [1] 4.390761

d. How many Octobers had snow cover greater than 10 million km^2?

Sum = sum(Snow > 10)
Sum

## [1] 6
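Since dplyr is already loaded, the same four statistics can also be read off the data frame in a single call. The following is a minimal sketch rather than part of the original answers; the names Snow.summary and nOver10 are illustrative only.

```r
library(dplyr)

# Rebuild the data frame from part (a) so the sketch is self-contained
Snow.df = data.frame(Year = 1970:1979,
                     Snow = c(6.5, 12.0, 14.9, 10.0, 10.7, 7.9, 21.9, 12.5, 14.5, 9.2))

# One summarize() call returns all four statistics as a one-row data frame
Snow.summary = Snow.df %>%
  summarize(Mean = mean(Snow),
            Median = median(Snow),
            SD = sd(Snow),
            nOver10 = sum(Snow > 10))   # TRUE counts as 1 when summed
Snow.summary
```

The values in this one-row result should agree with the outputs of parts (b) through (d).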

2. The data vector rivers contains the lengths (miles) of 141 major rivers in North America.

a. What proportion of the rivers are shorter than 500 miles long?

head(rivers)

## [1] 735 320 325 392 524 450

n = length(rivers)
n

## [1] 141

Prop = sum(rivers < 500) / n
Prop

## [1] 0.5815603

b. What proportion of the rivers are shorter than the mean length?

Mean = mean(rivers)
PropS = sum(rivers < Mean) / n   ## n = total number of rivers in the data vector
PropS

## [1] 0.6666667
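Both proportions can also be obtained without dividing by n, because R treats TRUE as 1 and FALSE as 0 when averaging a logical vector. A short sketch of that shortcut follows; the names Prop500 and PropMean are illustrative.

```r
# rivers ships with base R (datasets package), so no extra package is needed

# The mean of a logical vector is the proportion of TRUE values
Prop500 = mean(rivers < 500)
PropMean = mean(rivers < mean(rivers))

Prop500
PropMean
```

These should match the values of Prop and PropS computed with sum() and n.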

c. What is the 75th percentile river length?

percentile_75 = quantile(rivers, probs = c(0.75))
percentile_75

## 75% 
## 680

d. What is the interquartile range in river length?

IQR.rivers = IQR(rivers)   ## avoid reusing the function name IQR for the result
IQR.rivers

## [1] 370
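The interquartile range is the difference between the 75th and 25th percentiles, so it can be cross-checked against quantile(). A minimal sketch, with q and iqr.manual as illustrative names:

```r
# Lower and upper quartiles of river length (default quantile type)
q = quantile(rivers, probs = c(0.25, 0.75))

# IQR = 75th percentile minus 25th percentile; unname() drops the "75%" label
iqr.manual = unname(q[2] - q[1])
iqr.manual
```

With the default settings, this should equal the value that IQR(rivers) returns.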

3. The dataset hflights from the hflights package contains all 227,496 flights that departed Houston in 2011. Use the functions in the dplyr package to complete the following.

library(dplyr)
library(hflights)

a. Create a data frame from hflights containing only those flights that departed on September 11th of that year.

flights.df = data.frame(hflights)
Sept.df = flights.df %>%
  filter(Month == 9 & DayofMonth == 11)
head(Sept.df)

##   Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum
## 1 2011     9         11         7    1546    1651            AA       458
## 2 2011     9         11         7     551     904            AA       466
## 3 2011     9         11         7    1936    2036            AA       657
## 4 2011     9         11         7    1438    1544            AA       742
## 5 2011     9         11         7    1720    2030            AA      1294
## 6 2011     9         11         7    1142    1258            AA      1848
##   TailNum ActualElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance
## 1  N559AA                65      40      -14       -4    IAH  DFW      224
## 2  N3EGAA               133     115      -16       -9    IAH  MIA      964
## 3  N498AA                60      40      -19       -4    IAH  DFW      224
## 4  N470AA                66      43        9       18    IAH  DFW      224
## 5  N3BVAA               130     118      -20       -5    IAH  MIA      964
## 6  N598AA                76      40       -2       -3    IAH  DFW      224
##   TaxiIn TaxiOut Cancelled CancellationCode Diverted
## 1     12      13         0                         0
## 2      5      13         0                         0
## 3      8      12         0                         0
## 4      6      17         0                         0
## 5      5       7         0                         0
## 6     22      14         0                         0

b. How many flights departed on that day?

count(Sept.df)

##     n
## 1 602
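When only a row count is needed, nrow() returns it as a plain number instead of a one-row data frame, and filter() accepts several conditions separated by commas, which it combines with a logical AND. The sketch below uses a small made-up data frame (toy.df is hypothetical, not the hflights data) so that it runs without the hflights package.

```r
library(dplyr)

# Hypothetical stand-in for hflights: four flights, two of them on September 11
toy.df = data.frame(Month = c(9, 9, 9, 10),
                    DayofMonth = c(11, 11, 12, 11))

# Comma-separated conditions in filter() are combined with AND
sept11.toy = toy.df %>%
  filter(Month == 9, DayofMonth == 11)

# nrow() gives the number of matching flights as a single integer
nrow(sept11.toy)
```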

c. Create a data frame with the first column being the tail number and the second being the number of departures from Houston the plane made that year, sorted from most to fewest flights.

Depfli.df = hflights %>%
  group_by(TailNum) %>%
  summarize(Dep = n()) %>%
  arrange(desc(Dep))
head(Depfli.df)

## # A tibble: 6 × 2
##   TailNum   Dep
##   <chr>   <int>
## 1 N14945    971
## 2 N15926    960
## 3 N16927    951
## 4 N12946    948
## 5 N14937    946
## 6 N14942    946
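The group_by(), summarize(), arrange() pipeline can be shortened with dplyr's count(), whose sort and name arguments handle the ordering and the column name. This is a sketch on a made-up data frame (toy.df and its tail numbers are hypothetical), not a rerun on hflights.

```r
library(dplyr)

# Hypothetical tail numbers: N1 flies three times, N2 twice, N3 once
toy.df = data.frame(TailNum = c("N1", "N2", "N1", "N3", "N1", "N2"),
                    stringsAsFactors = FALSE)

# count() groups by TailNum, tallies the rows, and sorts in descending order
Dep.toy = toy.df %>%
  count(TailNum, sort = TRUE, name = "Dep")
Dep.toy
```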

4. Using the tornado data set (Canvas - Tornadoes.txt), create a data frame with the year in the first column and the total number of tornadoes in Kansas by year in the second column.

data = read.table(file = "C:/SpatialStatistics/Tornadoes.txt", header = TRUE)
Tornado.df = data.frame(data)
Kansas.df = Tornado.df %>%
  select(YEAR, STATE, STATENUMBE) %>%
  filter(STATE == "KS")
head(Kansas.df)

##    YEAR STATE STATENUMBE
## 58 1950    KS          1
## 73 1950    KS          2
## 79 1950    KS          3
## 80 1950    KS          4
## 86 1950    KS          5
## 87 1950    KS          6

Answer.df = select(Kansas.df, YEAR, STATENUMBE)
head(Answer.df)

##    YEAR STATENUMBE
## 58 1950          1
## 73 1950          2
## 79 1950          3
## 80 1950          4
## 86 1950          5
## 87 1950          6
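The selection above still has one row per tornado, so it does not yet give a total for each year. One way to get yearly totals is to group by year and count rows. The sketch below assumes that each row of Tornadoes.txt is a single tornado, which is an assumption about the file; it runs on a small made-up data frame (toy.df is hypothetical) that reuses the YEAR and STATE column names, and nT is an illustrative name for the count column.

```r
library(dplyr)

# Hypothetical stand-in for the tornado data: one row per tornado
toy.df = data.frame(YEAR = c(1950, 1950, 1950, 1951, 1951, 1951, 1951),
                    STATE = c("KS", "KS", "OK", "KS", "TX", "KS", "KS"),
                    stringsAsFactors = FALSE)

# Keep Kansas rows, then count the rows (tornadoes) within each year
KansasByYear.df = toy.df %>%
  filter(STATE == "KS") %>%
  group_by(YEAR) %>%
  summarize(nT = n())
KansasByYear.df
```

Applied to the real data, the same pipeline would start from Tornado.df instead of toy.df.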

Problem Set #2

Prabish Khadka Chhetri

2022-10-20
