Assignment #2

1. What is the second argument in the lm() function?

The second argument of lm() is data: the data frame (or list or environment) that holds the variables named in the formula.
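
As a quick check, the argument order can be read from the function itself. This is only a sketch; the mtcars data set and the mpg ~ wt formula are assumptions chosen for illustration and are not part of the assignment.

```r
# Inspect the formal arguments of lm(); the second one should be `data`.
second_arg <- names(formals(lm))[2]
print(second_arg)

# Illustrative fit (mtcars is an assumed example data set).
# Passing the data frame positionally works because it is the second argument.
fit <- lm(mpg ~ wt, mtcars)
print(coef(fit))
```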

2. What is the second argument in the plot.gam() function, found in the mgcv library?

The second argument of plot.gam() is residuals. If TRUE, partial residuals are added to the plots of one-dimensional smooths; the default is FALSE. The first argument, x, is the fitted gam object.
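
The same kind of check works for plot.gam(). The sketch below assumes the mgcv package is installed and uses an illustrative model on mtcars, which is an assumption and not something from the assignment.

```r
library(mgcv)  # provides gam() and plot.gam()

# The second formal argument of plot.gam() should be `residuals`.
second_arg <- names(formals(plot.gam))[2]
print(second_arg)

# Illustrative model: a smooth of wt fitted to the assumed mtcars data.
fit <- gam(mpg ~ s(wt), data = mtcars)

# residuals = TRUE adds partial residuals to the smooth plot.
plot(fit, residuals = TRUE)
```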

3. What is the length of the vector created from a sequence of 3 to 27 by intervals of 0.13?

185

my_vector <- seq(from = 3, to = 27, by = 0.13)
vector_length <- length(my_vector)
print(vector_length)

## [1] 185
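
The length can also be worked out by hand: the sequence has floor((to - from) / by) + 1 elements, because 27 is not reached exactly and the last value is 26.92. A small sketch comparing that arithmetic with seq() (the variable names are illustrative):

```r
from <- 3
to <- 27
by <- 0.13

# Number of whole steps that fit in the range, plus the starting value.
n_expected <- floor((to - from) / by) + 1

# Length reported by seq() for the same inputs.
s <- seq(from = from, to = to, by = by)
n_actual <- length(s)

print(c(expected = n_expected, actual = n_actual))
print(tail(s, 1))  # last element of the sequence
```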

4. What code subsets the first row and third column of the second matrix for the array array(seq(1:20), dim=c(2,5,2)), and what value occupies that position?

The code is array(seq(1:20), dim=c(2,5,2))[1, 3, 2]; the three indices are row, column, and matrix, in that order. The value in the first row and third column of the second matrix is 15.

array(seq(1:20),dim=c(2,5,2))

## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   11   13   15   17   19
## [2,]   12   14   16   18   20
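
A short sketch of the subsetting step, assuming the array is first stored in a variable (the name arr is illustrative):

```r
# 1:20 produces the same values as seq(1:20).
arr <- array(1:20, dim = c(2, 5, 2))

# Index order is [row, column, matrix].
value <- arr[1, 3, 2]
print(value)

# Equivalent two-step form: pull out the second matrix, then index it.
second_matrix <- arr[, , 2]
print(second_matrix[1, 3])
```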

5. Create a list object from 3 or more built in datasets (https://vincentarelbundock.github.io/Rdatasets/datasets.html).

Fatalities <- read.csv("~/Downloads/Fatalities.csv", header=TRUE)

CASchools <- read.csv("~/Downloads/CASchools.csv", header=TRUE)

CollegeDistance <- read.csv("~/Downloads/CollegeDistance.csv", header=TRUE)

my_list <- list(Fatalities = Fatalities, CollegeDistance = CollegeDistance, CASchools = CASchools)
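
An alternative that avoids downloading CSV files is to use data sets that ship with R's datasets package. The sketch below is one possible version; the choice of rivers, CO2, and InsectSprays and the name my_builtin_list are assumptions made for illustration.

```r
# These three objects are available in a default R session (datasets package).
my_builtin_list <- list(
  rivers = rivers,
  CO2 = CO2,
  InsectSprays = InsectSprays
)

# Show the top-level structure of the list without printing every row.
str(my_builtin_list, max.level = 1)
print(names(my_builtin_list))
```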

6. What is the mean length of major North American Rivers?

The mean length is about 591.18 miles, computed from the built-in rivers vector.

"rivers.csv"

## [1] "rivers.csv"

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

mean(rivers)

## [1] 591.1844

7. Write a built-in dataset to a CSV file (show code) and open it in Excel.

mangrove <- data.frame(
  Site = c("Fourchon", "Hopedale", "Grand_Isle"),
  Height = c(100, 120, 110),
  Width = c(50, 55, 60)
)

write.csv(mangrove, "/Users/michaelrabalais/Desktop/R_class/mangrove.csv")
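
Since the question asks for a built-in dataset, here is a sketch that writes one directly. The choice of InsectSprays and the temporary output path are assumptions; any writable folder would work in place of tempdir().

```r
# Hypothetical output location; replace tempdir() with a folder of your choice.
out_file <- file.path(tempdir(), "InsectSprays.csv")

# row.names = FALSE keeps the row numbers out of the first column,
# which makes the file cleaner when opened in Excel.
write.csv(InsectSprays, out_file, row.names = FALSE)

# Read the file back to confirm it was written as expected.
check <- read.csv(out_file)
print(dim(check))
```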

8. Tabulate the type and concentration factors in the CO2 dataset in one table. Is this experiment balanced?

Yes, the experiment is balanced: in the listing below, each of the 7 concentrations appears 6 times for each Type, so every Type-by-conc combination has the same number of records.

"CO2.csv"

## [1] "CO2.csv"

CO2 %>% select(Type, conc)

##           Type conc
## 1       Quebec   95
## 2       Quebec  175
## 3       Quebec  250
## 4       Quebec  350
## 5       Quebec  500
## 6       Quebec  675
## 7       Quebec 1000
## 8       Quebec   95
## 9       Quebec  175
## 10      Quebec  250
## 11      Quebec  350
## 12      Quebec  500
## 13      Quebec  675
## 14      Quebec 1000
## 15      Quebec   95
## 16      Quebec  175
## 17      Quebec  250
## 18      Quebec  350
## 19      Quebec  500
## 20      Quebec  675
## 21      Quebec 1000
## 22      Quebec   95
## 23      Quebec  175
## 24      Quebec  250
## 25      Quebec  350
## 26      Quebec  500
## 27      Quebec  675
## 28      Quebec 1000
## 29      Quebec   95
## 30      Quebec  175
## 31      Quebec  250
## 32      Quebec  350
## 33      Quebec  500
## 34      Quebec  675
## 35      Quebec 1000
## 36      Quebec   95
## 37      Quebec  175
## 38      Quebec  250
## 39      Quebec  350
## 40      Quebec  500
## 41      Quebec  675
## 42      Quebec 1000
## 43 Mississippi   95
## 44 Mississippi  175
## 45 Mississippi  250
## 46 Mississippi  350
## 47 Mississippi  500
## 48 Mississippi  675
## 49 Mississippi 1000
## 50 Mississippi   95
## 51 Mississippi  175
## 52 Mississippi  250
## 53 Mississippi  350
## 54 Mississippi  500
## 55 Mississippi  675
## 56 Mississippi 1000
## 57 Mississippi   95
## 58 Mississippi  175
## 59 Mississippi  250
## 60 Mississippi  350
## 61 Mississippi  500
## 62 Mississippi  675
## 63 Mississippi 1000
## 64 Mississippi   95
## 65 Mississippi  175
## 66 Mississippi  250
## 67 Mississippi  350
## 68 Mississippi  500
## 69 Mississippi  675
## 70 Mississippi 1000
## 71 Mississippi   95
## 72 Mississippi  175
## 73 Mississippi  250
## 74 Mississippi  350
## 75 Mississippi  500
## 76 Mississippi  675
## 77 Mississippi 1000
## 78 Mississippi   95
## 79 Mississippi  175
## 80 Mississippi  250
## 81 Mississippi  350
## 82 Mississippi  500
## 83 Mississippi  675
## 84 Mississippi 1000
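
The row listing above can be condensed into a single cross-tabulation with table(), which makes the balance check easier to read. This is a sketch using base R instead of dplyr; the names counts and is_balanced are illustrative.

```r
# Cross-tabulate the two factors: rows are Type, columns are conc.
counts <- table(CO2$Type, CO2$conc)
print(counts)

# The design is balanced if every cell holds the same count.
is_balanced <- length(unique(as.vector(counts))) == 1
print(is_balanced)
```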

9. Write a single function to subset (filter) both spray A and records of counts > 17 from InsectSprays. How many records are there?

Three records meet both conditions (spray A and count > 17).

"InsectSprays.csv"

## [1] "InsectSprays.csv"

insect <- read.csv("~/Downloads/InsectSprays.csv", header=TRUE)

filter(insect, count > 17 & spray=='A')

##   rownames count spray
## 1        3    20     A
## 2        8    23     A
## 3       10    20     A
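
The same result can be reproduced without the CSV file, since InsectSprays is also a built-in dataset. The sketch below swaps in base R's subset() for dplyr's filter() and counts the rows with nrow(); the name hits is illustrative.

```r
# Keep rows that satisfy both conditions at once.
hits <- subset(InsectSprays, spray == "A" & count > 17)
print(hits)

# Number of matching records.
n_hits <- nrow(hits)
print(n_hits)
```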

10. In 2005, what was the 5th city in TX for home sales?

Collin County ranks fifth by mean monthly sales in 2005 (about 1,267 sales per month), after Houston, Dallas, Austin, and San Antonio.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

data(txhousing)

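txhousing %>%
  filter(year == 2005) %>%                  # keep 2005 only (year is numeric)
  group_by(city) %>%
  summarize(mean_sales = mean(sales)) %>%   # mean monthly sales; NA if any month is missing
  print(n = Inf) %>%                        # print the full unsorted table, then pass it on
  arrange(desc(mean_sales))                 # sort from largest to smallest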

## # A tibble: 46 × 2
##    city                  mean_sales
##    <chr>                      <dbl>
##  1 Abilene                    165. 
##  2 Amarillo                   259. 
##  3 Arlington                  513. 
##  4 Austin                    2242. 
##  5 Bay Area                   533. 
##  6 Beaumont                   172. 
##  7 Brazoria County            107. 
##  8 Brownsville                 76.5
##  9 Bryan-College Station      185. 
## 10 Collin County             1267. 
## 11 Corpus Christi             408. 
## 12 Dallas                    4998. 
## 13 Denton County              715. 
## 14 El Paso                    446  
## 15 Fort Bend                  852. 
## 16 Fort Worth                 870. 
## 17 Galveston                  108. 
## 18 Garland                    224. 
## 19 Harlingen                   NA  
## 20 Houston                   6067. 
## 21 Irving                     124. 
## 22 Kerrville                   NA  
## 23 Killeen-Fort Hood          348. 
## 24 Laredo                      72.4
## 25 Longview-Marshall          208  
## 26 Lubbock                    270. 
## 27 Lufkin                      66.4
## 28 McAllen                    194. 
## 29 Midland                     NA  
## 30 Montgomery County          623. 
## 31 NE Tarrant County          802. 
## 32 Nacogdoches                 38.4
## 33 Odessa                      NA  
## 34 Paris                       42.5
## 35 Port Arthur                 70.8
## 36 San Angelo                 135. 
## 37 San Antonio               2003. 
## 38 San Marcos                  33.2
## 39 Sherman-Denison            126  
## 40 South Padre Island          NA  
## 41 Temple-Belton              137. 
## 42 Texarkana                   90.4
## 43 Tyler                      278. 
## 44 Victoria                    73.5
## 45 Waco                       197. 
## 46 Wichita Falls              167.

## # A tibble: 46 × 2
##    city              mean_sales
##    <chr>                  <dbl>
##  1 Houston                6067.
##  2 Dallas                 4998.
##  3 Austin                 2242.
##  4 San Antonio            2003.
##  5 Collin County          1267.
##  6 Fort Worth              870.
##  7 Fort Bend               852.
##  8 NE Tarrant County       802.
##  9 Denton County           715.
## 10 Montgomery County       623.
## # ℹ 36 more rows
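
To pull out the fifth row directly instead of reading it off the sorted table, slice() can be added at the end of the pipeline. This is a sketch under the same ranking assumption used above (mean monthly sales per city); the name fifth is illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing dataset

fifth <- txhousing %>%
  filter(year == 2005) %>%
  group_by(city) %>%
  summarize(mean_sales = mean(sales)) %>%
  arrange(desc(mean_sales)) %>%   # cities with NA means are sorted to the end
  slice(5)                        # keep only the fifth-ranked row

print(fifth)
```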
