Stat 135 Assignment 6

Statistics 133 - Writing Function in R

Write a function that takes a number year and return a logical value which is either TRUE, if the year was or will be a leap year, or FALSE if not. The rule for determining leap years is that one of the following conditions must hold:

the year is divisible by 4 but not divisible by 100, or
the year is divisible by 400.

Now write an R expression using your function to calculate how many leap years you have lived through during your lifetime.

year <- 2014
if (year %% 4 == 0 & year != 100 | year %% 400) {
  print("TRUE")
} else {
  print("FALSE")
}

## [1] "TRUE"

span <- c(1993:2016)
years <- c()
for(i in 1:length(span)) {
  if(leap_year(span[i]) == TRUE) {
    years <- c(years, span[i]) 
  }
} 
length(years)

## [1] 6

It is possible to approximate the area of any two-dimensional shape using a Monte Carlo technique. To do this, we specify a region of known area that contains the shape, sample uniformly on that region, then multiply the known area by the fraction of the points that fall within the shape we’re trying to measure. In this problem you’ll write code to carry this out for a circle with radius equal to one. We can compare this to what we already know about the area of circles to see how well our approximation method performs.

(a) Write expressions in R to do the following.

Create two vectors x and y of length 100, where each is sampled from the Uniform distribution with lower limit -1 and upper limit 1.

x <- runif(n = 100, min = 0, max = 1)

y <- runif(n = 100, min = 0, max = 1)


Inside_Circle <- x^2 + y^2 <= 1

Monte <- data.frame(x,y, Inside_Circle) 

head(Monte)

##            x          y Inside_Circle
## 1 0.08088443 0.44762531          TRUE
## 2 0.50836022 0.16096869          TRUE
## 3 0.56791958 0.49901785          TRUE
## 4 0.92564738 0.07251712          TRUE
## 5 0.80282484 0.65181917         FALSE
## 6 0.71654116 0.85343823         FALSE

Calculate the proportion of these for which the point (x,y) lies within the unit circle.

proportion <- sum(Inside_Circle)/100
proportion

## [1] 0.86

Make a scatter plot of x and y, coloring the points differently depending on whether they’re inside the unit circle or not.

scatter <- ggplot(Monte, aes(x = x, y = y, col = Inside_Circle )) + 
  geom_point() + 
  xlab("x sequence") + ylab("y sequence") + 
  scale_fill_discrete(name="Inside Circle")
scatter

Multiply the proportion by 4 (the area of the square over which we’re sampling) to approximate the area inside the circle.

circle_area <- proportion * 4
circle_area

## [1] 3.44

(b) Now copy your code from part (a) and use it to create a function that takes arguments n (for sample size) and plotit (to indicate whether or not to make the plot) and returns the approximated area and optionally makes a plot depending on whether plotit is TRUE or FALSE. Give each argument a default value of your choosing.

plot <- function(n, plotit) {
  x <- runif(n, min = 0, max = 1)
  y <- runif(n, min = 0, max = 1)
  Inside_Circle <- x^2 + y^2 <= 1
  Monte <- data.frame(x,y, Inside_Circle)
  proportion <- sum(Inside_Circle)/n
  circle_area <- proportion*4
  if(plotit) {
    scatter <- ggplot(Monte, aes(x = x, y = y, col = Inside_Circle )) + 
  geom_point() + 
  xlab("x sequence") + ylab("y sequence") + 
  scale_fill_discrete(name="Inside Circle")
  print(scatter)
  } 
  return(circle_area)
}
plot(100, TRUE)

## [1] 3.4

plot(500, FALSE)

## [1] 3.096

(c) The replicate function takes an R expression and evaluates it a specified number of times. You can think of it as a very simplified for loop in which exactly the same code gets evaluated each time through the loop. Use this function to approximate the area of the circle 100 times, first for n = 50 and then for n = 500 and storing the results in two different vectors.

fifty <- replicate(100, plot(50, FALSE), simplify = "array")
head(fifty)

## [1] 3.44 2.88 3.20 3.28 2.96 3.20

fivehundred <- replicate(100, plot(500,  FALSE), simplify = "array")
head(fivehundred)

## [1] 3.296 3.240 3.192 3.224 3.200 2.976

(d) Make two histograms (facetted by vector size) showing the results from part (c). Add a vertical line to the plot to indicate the true area. Give your plot appropriate axis names and titles and a nice readable theme.

approx1 <- data.frame(fifty, fivehundred)
head(approx1)

##   fifty fivehundred
## 1  3.44       3.296
## 2  2.88       3.240
## 3  3.20       3.192
## 4  3.28       3.224
## 5  2.96       3.200
## 6  3.20       2.976

approx2 <- approx1 %>%
  gather(key = sample_size, value = area, fifty, fivehundred)
head(approx2)

##   sample_size area
## 1       fifty 3.44
## 2       fifty 2.88
## 3       fifty 3.20
## 4       fifty 3.28
## 5       fifty 2.96
## 6       fifty 3.20

area_hist <- ggplot(approx2, aes(area, fill = sample_size)) + 
  geom_histogram(binwidth = 0.05, position = "identity") + 
  facet_grid(. ~ sample_size, scale = "free_x") + geom_vline(aes(xintercept = as.numeric(pi)), color = "blue") 
area_hist

Functions can be passed as arguments to other functions. Here’s an example that we will explore.

Newton’s method is a popular numerical method ot find a root of an algebraic equation \(f(x) = 0\). if \(f(x)\) has a derivative \(f'(x)\), then the following iteration will converge to a root of the above equation if started close enough to the root:

\[x_1 = initial \: guess\] \[x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}\]

This idea is based on the Taylor approximation:

\[f(x_{n}) \approx f(x_{n-1}) + (x_{n} - x_{n-1})f'(x_{n-1})\]

Suppose \(f(x) = x^3 + 2x^2 - 7\), so that \(f'(x) = 3x^2 + 4x\). The file Newton.R contains some R code to implement Newton’s method for this problem. You may adapt this code as needed to do the following problems.

(a) Write a function that implements Newton’s method for this particular choice of function \(f\). It should have arguments for the initial guess and the tolerance for convergence (set to 0.00001 in Prof. Lucas’s code). Pay special attention to whether either of these should have default values. The function should return a single number and not plot anything.

newt <- function(x, tol) {
  fx <- x^3 + 2*x^2 - 7
  while(abs(fx) > tol) {
    fpx <- 3*x^2 + 4*x
    x <- x - fx/fpx
    fx <- x^3 + 2*x^2 - 7
    
}
return(x) # (approximate) root
}
newt(2, 0.00001)

## [1] 1.428818

(b) Write two R functions for \(f\) and \(f'\). Each should take argument x.

f <- function(x) {
fx <- x^3 + 2*x^2 - 7
return(fx)
} 
f(3)

## [1] 38

g <- function(x) {
fpx <- 3*x^2 + 4*x
return(fpx)
}
g(3)

## [1] 39

(c) Rewrite your function from (a) so that it also takes arguments for the function whose root you want to find and it’s first derivative. You may find it helpful to first use your functions from (b) nad make sure the function still works, then modify the arguments to the function.

newt2 <- function(x, tol, fx, fpx) {
  
  while(abs(fx(x)) > tol) {
    x <- x - fx(x)/fpx(x)
    
  }
return(x) # (approximate) root
}
newt2(2, 0.00001, fx = function(x) {x^3 + 2*x^2 - 7}, fpx = function(x) {3*x^2 + 4*x})

## [1] 1.428818

(d) Choose a different function \(f\) and write R functions for it and the corresponding derivative. Make a plot as Professor Lucas did in Newton.R to verify that the algorithm found a root of your function.

(Note: There are some cases in which Newton’s method will fail to converge, which you will experience as the while loop never ending. If that happens to you, just choose another initial guess and/or function.)

newt2 <- function(x, fx, fpx) {
  
  while(abs(fx(x)) > 0.00001) {
    x <- x - fx(x)/fpx(x)
    df=data.frame(a=x, b=fx(x))
    print(df)
  }
  return(x) 
}

newt2(2, fx = function(x) {x^3 + 2*x^2 - 7}, fpx = function(x) {3*x^2 + 4*x})

##      a        b
## 1 1.55 1.528875
##          a          b
## 1 1.435969 0.08498814
##          a            b
## 1 1.428845 0.0003197702
##          a            b
## 1 1.428818 4.585318e-09

## [1] 1.428818

The value we approximate the function \(f(x) = x^3 + 2x^2 - 7\) converges to with the initial guess of \(x = 2\) leads us to believe that the approximation is \(f(x) = 1.428818\). We can inspect this using ggplot functions in a plot, where the vertical line indicates the xintercept of \(f(x) = 1.428818\).

fx = function(x) {x^3 + 2*x^2 - 7}
fpx = function(x) {3*x^2 + 4*x}

xseq <- seq(0, 2, 0.05)

seq_df <- data.frame(xseq, fx(xseq))
head(seq_df)

##   xseq  fx.xseq.
## 1 0.00 -7.000000
## 2 0.05 -6.994875
## 3 0.10 -6.979000
## 4 0.15 -6.951625
## 5 0.20 -6.912000
## 6 0.25 -6.859375

p <- seq_df %>%
  ggplot(aes(x=xseq, y=fx(xseq)))  +
  geom_point()
p + geom_hline(yintercept = 0) + geom_vline(xintercept = 1.428818, col = "blue")

Stat 135 Assignment 6

Samba Njie Jr

March 1, 2016

Statistics 133 - Writing Function in R