PSYC 4350: HW 03

Due date

Tuesday, February 07, by the start of class time (2:00pm).

Instructions

Use of R Markdown is optional for this assignment. Problems 1–4 were given as in-class exercises on 01/27.

Problem 1

Generate n = 50 random values from an exponential distribution with rate parameter = 10. Generate a plot of the exponential PDF with the data points overlaid (see PDF for desired output).

If you want your data to be identical to mine, add set.seed(42) before you generate the random values. This will initialize R’s random number generator so that it produces the same ‘random’ values on different computers.

Hint: Use ggplot() + stat_function() + geom_point() + geom_segment()

library(ggplot2)
set.seed(42) 

x <- rexp(50, rate = 10)
#x <- rexp(50, rate = 10) #generates 0 - 50 numbers at a rate of 10
#cat(x)

x <- data.frame(x)

print(ggplot(x, aes(x = x, y = 0.09, xend = x, yend = 0)) #takes in data frame, specifies parameters for x and y values (necessary for geom functions)
      + stat_function(fun = dexp,   # plot function of exponential distribution
                      geom = "area", 
                      color = "blue", 
                      fill = "blue", 
                      alpha = 0.5) +
        xlim(c(0,5)) +
        geom_point() +
        geom_segment()
      )

## Warning: The following aesthetics were dropped during statistical transformation: xend
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

#fig <- ggplot(data.frame(x, aes(x))) + stat_function(fun = dnorm, geom = "polygon")
#fig <- ggplot(x) + stat_function() + geom_point() + geom_segment()
#print(fig)

#the graph is incorrect, and Im not sure how to use geom_point() and geom_segment() without breaking the code and/or adding a Y value

Problem 2

The data for problem 1 was generated from an exponential distribution with \(\lambda = 10\). Suppose we only had the data, but didn’t know what distribution they came from or the true parameters used to generate the data.

Part a: What is the maximum likelihood estimate (MLE) for the rate parameter of an exponential distribution, given the data from problem 1?

# Figure out the value of lambda, that makes the value as large as possible

mle <- 1/mean(x) # wikipedia formula

## Warning in mean.default(x): argument is not numeric or logical: returning NA

Part b: Assuming instead the data is generated from a normal distribution, what are the maximum likelihood estimates for \(\mu\) and \(\sigma\)?

# this is preventing my code from rendering to html 

like_func <- function(x) {
  -sum(dnorm(x, mean(x), sd(x), log = TRUE)) #mu = mean of x values, sigma is the standard dev 
}

print(like_func(x))

Problem 3

Write a function that takes two arguments: lambda, and a vector of observations, and returns the log-likelihood of the observations according to an exponential distribution with the given value of lambda.

my_data <- x

expo <- function(lambda, my_data) {
  sum(dexp(my_data, rate = lambda, log = TRUE))
}
# take the SUM  of the value not the product
# we want the log liklihood
 #lambda is some value, dexp is the vector of values


#Results <- integrate(integrand, 0, Inf)

Problem 4

Plot the log-likelihood function for the data you generated in problem (1) That is, generate a plot with \(\lambda\) on the x-axis, and exponential_log_likelihood(\(\lambda\)) on the y-axis.

Make sure the x and y limits of your plot show the peak of the log-likelihood function.

Hint #1: ggplot( ) + stat_function( )
Hint #2: Your function exponential_log_likelihood() function needs to be vectorized—that is, given a vector of lambda values, it should return a vector of log-likelihood values.

exponential_log_likelihood <- functions{
  v <- vector(size(0))  #if I were to do this properly, I would fill this empty vector with lambda values, then use this function to calculate and return the log-likelihood values of the previous values.
  return(v)
}

Problem 5

The p.d.f. for an exponential distribution is defined as:

\[p(x) = \lambda e^{-\lambda x}\]

for \(x \in \left[0, \infty \right)\).

Using numerical integration, what is the probability that \(x \in \left(2, 5\right)\) when \(\lambda = 1\)?

Problem 6

Repeat the previous problem, except this time use the cumulative distribution function (c.d.f.) to determine your answer instead of numerical integration.

Problem 7

Using numerical integration, compute the variance of an exponential distribution with rate parameter \(\lambda = 2\). (You can check your answer against Wikipedia.)

#Variance == 1/lambda^2
#x <- rexp(50)

#Variance <- functions{
#  (x - ( 1 / lambda))^2 
#}

#Variance(x)

Problem 8

The skewness of a probability distribution is defined as:

\[ \gamma = \mathrm{E}\left[\left(\frac{x - \mu}{\sigma} \right)^3\right]\]

where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the distribution. Using this definition, compute the skewness of an exponential distribution with rate parameter \(\lambda = 2\). (As before, you can check your answer against Wikipedia.)