Homework 2

1.

a.

Write a function that calculates the mean of any numeric vector you give it, without using the built-in mean() or sum() functions.

# helper sum function, kept for debugging (prints the running total)
v <- seq(2, 45, 3)
sm <- function(x) {
    t <- 0  # running total, local to the function
    for (i in seq_along(x)) {
        t <- t + x[i]
        print(paste("x[i]:", x[i]))
        print(paste("t:", t))
    }
    t
}
# reference value from the built-in sum(), used only to check the result
sum(v)

## [1] 345

# mean function
mn <- function(x) {
    t <- 0  # running total, local so each call starts from zero
    for (xi in x) {
        t <- t + xi
    }
    t / length(x)  # return the mean rather than printing it
}
(outcome <- c(mn(v), mean(v)))

## [1] 23 23

b.

Write a function that takes as its input a vector with four elements. If the sum of the first two elements is greater than the sum of the second two, the function returns the vector; otherwise it returns 0.

v <- c(4:1)
b <- function(x) {
    # compare the sum of the first two elements with the sum of the last two,
    # using the argument x rather than the global v
    if (x[1] + x[2] > x[3] + x[4]) {
        x
    } else {
        0
    }
}
b(v)

## [1] 4 3 2 1

c.

Write a function that calculates the Fibonacci sequence up to the nth element, where n is any number input into your function (its argument). The Fibonacci sequence is: 1, 1, 2, 3, 5, 8, 13, 21, ..., i.e., each element is the sum of the previous two elements. One way to do this is to start off with the first two elements, c(1,1) and set an internal variable to this sequence. Then write a loop that counts up to n, where for each new element, you first calculate it by adding the last two elements of the growing sequence, and then stick that new number onto the growing sequence using c(). When the loop is finished, the function should return the final vector of Fibonacci numbers.

fibseq <- function(x) {
    fib <- c(1, 1)  # first two elements, kept local to the function
    for (i in seq_len(max(x - 2, 0))) {
        # append the sum of the last two elements to the growing sequence
        fib <- c(fib, fib[length(fib) - 1] + fib[length(fib)])
    }
    fib[seq_len(x)]  # return the first x Fibonacci numbers
}
fibseq(6)

## [1] 1 1 2 3 5 8

d.

Create a 4x4 matrix of the numbers 1 through 16. Use apply to apply your function from (a) to each of the rows in your matrix.

(fibm <- matrix(data = c(1:16), nrow = 4, ncol = 4))

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

(fio <- apply(fibm, 1, mn))

## [1]  7  8  9 10
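
The second argument of apply() selects rows (1) or columns (2). A minimal sketch, using a hypothetical manual_mean helper rather than the function from (a), cross-checks both directions against the built-in rowMeans() and colMeans():

```r
# Hypothetical helper: mean via an explicit loop, no mean() or sum()
manual_mean <- function(x) {
    total <- 0
    for (xi in x) {
        total <- total + xi
    }
    total / length(x)
}

m <- matrix(1:16, nrow = 4, ncol = 4)   # filled column by column
row_means <- apply(m, 1, manual_mean)   # MARGIN = 1: one value per row
col_means <- apply(m, 2, manual_mean)   # MARGIN = 2: one value per column

# Cross-check against the built-in helpers
all.equal(row_means, rowMeans(m))
all.equal(col_means, colMeans(m))
```

Because the helper returns its result instead of printing it, apply() can collect the values into a numeric vector.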

2.

a.

Using the airquality dataset, construct an aggregated dataset that shows the maximum wind and ozone by month.

(wioz_by_mo <- aggregate(cbind(Wind, Ozone) ~ Month, data = airquality, FUN = max))

##   Month Wind Ozone
## 1     5 20.1   115
## 2     6 20.7    71
## 3     7 14.9   135
## 4     8 15.5   168
## 5     9 16.6    96

b.

Create the authors and books datasets following the example and data in the lecture, and then create a new data set by merging these two datasets by author, preserving all rows.

(authors <- data.frame(surname = c("Tukey", "Venables", "Tierney", "Ripley", "McNeil"), 
    nationality = c("US", "Australia", "US", "UK", "Australia"), stringsAsFactors = FALSE))

##    surname nationality
## 1    Tukey          US
## 2 Venables   Australia
## 3  Tierney          US
## 4   Ripley          UK
## 5   McNeil   Australia

(books <- data.frame(name = c("Tukey", "Venables", "Tierney", "Ripley", "Ripley", 
    "McNeil", "R Core"), title = c("Exploratory Data Analysis", "Modern Applied Statistics ...", 
    "LISP-STAT", "Spatial Statistics", "Stochastic Simulation", "Interactive Data Analysis", 
    "An Introduction to R"), stringsAsFactors = FALSE))

##       name                         title
## 1    Tukey     Exploratory Data Analysis
## 2 Venables Modern Applied Statistics ...
## 3  Tierney                     LISP-STAT
## 4   Ripley            Spatial Statistics
## 5   Ripley         Stochastic Simulation
## 6   McNeil     Interactive Data Analysis
## 7   R Core          An Introduction to R

# all = TRUE keeps rows without a match in the other dataset (here "R Core")
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

##    surname nationality                         title
## 1   McNeil   Australia     Interactive Data Analysis
## 2   R Core        <NA>          An Introduction to R
## 3   Ripley          UK            Spatial Statistics
## 4   Ripley          UK         Stochastic Simulation
## 5  Tierney          US                     LISP-STAT
## 6    Tukey          US     Exploratory Data Analysis
## 7 Venables   Australia Modern Applied Statistics ...

c.

Take the following string and replace every instance of “to” or “To” with “2”:

to_2 <- "To be, or not to be -- that is the question: Whether 'tis nobler in the mind to suffer  The slings and arrows of outrageous fortune, Or to take arms against a sea of troubles, And by opposing end them. To die -- to sleep -- No more..."
# inside a character class "|" is a literal character, so [Tt] is enough
gsub("[Tt]o", "2", to_2)

## [1] "2 be, or not 2 be -- that is the question: Whether 'tis nobler in the mind 2 suffer  The slings and arrows of outrageous fortune, Or 2 take arms against a sea of troubles, And by opposing end them. 2 die -- 2 sleep -- No more..."
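
A pattern that matches the bare letters also matches "to" inside longer words. The quotation above happens to contain none, so the result is the same there. If only the standalone word should be replaced, word boundaries (\\b) are one option. A minimal sketch on a hypothetical test sentence (sample_text, plain and whole are illustrative names):

```r
# Hypothetical test sentence, chosen to contain "to" inside longer words
sample_text <- "Go to town tomorrow. To be continued."

# Plain pattern: replaces every "to"/"To", including inside words
plain <- gsub("[Tt]o", "2", sample_text)

# Word-boundary pattern: replaces only the standalone word
whole <- gsub("\\b[Tt]o\\b", "2", sample_text, perl = TRUE)

plain
whole
```

Which behavior is wanted depends on how "every instance" in the exercise is read.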

3.

a.

Create a histogram using the base R graphics using some dataset or variable other than the one in the lessons. Always make sure your graph has well-labeled x and y axes and an explanatory title.

library(ggplot2)  # provides the mpg dataset
hist(mpg$hwy, main = "Frequencies of hwy mpg for 38 popular models of car", xlab = "Highway MPG", 
    ylab = "Frequency")

b.

Create a scatter plot using the base R graphics, again with some variable other than the one in the lessons.

plot(mpg$cyl, y = mpg$hwy, main = "# of Cylinders v Highway MPG", xlab = "# of Cylinders", 
    ylab = "HWY MPG")

c.

Create a histogram using ggplot, using some new data. In this and the later plots, please tinker with the settings using the examples in http://www.cookbook-r.com/Graphs/ to make it prettier.

head(diamonds)

## # A tibble: 6 x 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

library(dplyr)  # provides %>%, group_by(), summarize() and n()
palette <- c("#4545FF", "#B6FF9C", "#84FFC9", "#7EEA90", "#BADB69")
(dia_1 <- diamonds %>% group_by(cut) %>% summarize(mn.price = mean(price), mn.crt = mean(carat), 
    n = n()))

## # A tibble: 5 x 4
##         cut mn.price    mn.crt     n
##       <ord>    <dbl>     <dbl> <int>
## 1      Fair 4358.758 1.0461366  1610
## 2      Good 3928.864 0.8491847  4906
## 3 Very Good 3981.760 0.8063814 12082
## 4   Premium 4584.258 0.8919549 13791
## 5     Ideal 3457.542 0.7028370 21551

# geom_col() draws bars from pre-summarized values; the constant border width is set outside aes()
ggplot(data = dia_1, mapping = aes(x = cut)) + geom_col(mapping = aes(y = mn.price, 
    fill = mn.crt, color = n), size = 1) + scale_fill_gradient("mn.carat", 
    low = "#FDE4FF", high = "#A065A3") + xlab("Cut") + ylab("Mean Price") + ggtitle("Cut v Avg Price, Fill: Avg Carat, Border: No. in Cut")
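
The chart above plots one bar per cut from pre-summarized values, so it is closer to a bar chart; a histogram bins a single continuous variable. A minimal sketch of the latter, assuming ggplot2 is installed and using its diamonds data (the bin width of 500 and the name price_hist are arbitrary choices for illustration):

```r
library(ggplot2)

# Histogram of diamond prices; each bar counts the diamonds in one price bin
price_hist <- ggplot(diamonds, aes(x = price)) +
    geom_histogram(binwidth = 500, fill = "#A065A3", color = "white") +
    xlab("Price (US dollars)") +
    ylab("Number of diamonds") +
    ggtitle("Distribution of diamond prices")

price_hist
```

A narrower binwidth shows more detail at the cost of a noisier shape.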

d.

Create a box plot (with multiple categories) using ggplot, using some new data.

ggplot(data = mpg, mapping = aes(x = class, y = hwy)) + geom_boxplot(color = "#4545FF", 
    fill = "#B6FF9C") + xlab("Class of Car") + ylab("Hwy MPG") + ggtitle("Class of Car v Hwy MPG") + 
    theme(plot.title = element_text(hjust = 0.5))

e.

Create a scatter plot using ggplot, using some new data.

ggplot(data = mpg, mapping = aes(x = cyl, y = cty, color = class)) + geom_point() + 
    xlab("No. of Cylinders") + ylab("City MPG") + ggtitle("Cylinders v City MPG, Class of Vehicle by Color")

Holsenbeck_S_2

Stephen Synchronicity

2017-09-20
