Instruction

  1. This final project is due at noon on Friday, 12/13/2024. Email both the RMD file and the PDF to me by the due date. Name your files Final Project_Your First Name_Your Last Name.RMD and Final Project_Your First Name_Your Last Name.PDF. For example, mine would be Final Project_Hunter_Park.RMD and Final Project_Hunter_Park.PDF. Otherwise, you will lose some points. The total is 100 points, and you will lose 1 point per minute after the due date. For example, if you submit at 12:30 on 12/13, your maximum score is 70/100. In short, be punctual and hand it in on time.

  2. This final project should be your original work. You should not work with others or seek help from the internet, such as Google or ChatGPT. There will be a serious penalty for suspicious solutions (solutions that are too sophisticated to have been done by an undergraduate student, such as those using advanced mathematical theory).

  3. Your code must run without errors. If for any reason it does not run and does not produce the desired results, it will receive a score of zero.

Problem 1. Monte-Carlo Method

Let \(a\) be the month of your birthday and \(b\) be the day of your birthday (if you were born on December 15th, \(a\) is 12 and \(b\) is 15). Suppose that box 1 has \(a\) black balls and \(b\) white balls and box 2 has \(b\) black balls and \(a\) white balls. One ball is picked randomly from box 1 and moved into box 2. Then another ball is selected randomly from box 2 and moved into box 1. Finally, one ball is picked randomly from box 1.

1. What are \(a\) and \(b\) for you?

Your answer here.

2. Find the probability that the final ball chosen from box 1 is black using probability rules.

Your answer here.
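
One way to organize the calculation, sketched here for general \(a\) and \(b\), is to track the expected number of black balls in box 1 just before the final draw. The symbols \(N\), \(I_1\) and \(I_2\) are notation introduced only for this illustration: \(N = a + b\), \(I_1\) indicates that the first ball moved (box 1 to box 2) is black, and \(I_2\) indicates that the second ball moved (box 2 to box 1) is black.

```latex
\begin{align*}
\mathbb{E}[I_1] &= \frac{a}{N}, \qquad
\mathbb{E}[I_2] = \frac{b + \mathbb{E}[I_1]}{N+1} = \frac{b + a/N}{N+1},\\
P(\text{final ball is black}) &= \frac{\mathbb{E}[a - I_1 + I_2]}{N}
  = \frac{1}{N}\left(a - \frac{a}{N} + \frac{b + a/N}{N+1}\right).
\end{align*}
```

For the example birthday above (\(a = 12\), \(b = 15\)), this evaluates to about 0.448.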

3. Use the Monte-Carlo method to simulate the approximate probability and check that your answer is close to the theoretical probability given above.

# Your code here
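
As a rough illustration of the simulation, the sketch below plays the three draws many times and compares the fraction of black final draws with the value from the law of total probability. The values a = 12 and b = 15 (the example birthday above), the seed, the number of repetitions, and the helper name `simulate_once` are all assumptions made for this sketch.

```r
set.seed(1)                      # arbitrary seed so the run is reproducible
a <- 12; b <- 15                 # assumed example values, not your own birthday
n_sims <- 100000                 # assumed number of Monte-Carlo repetitions

# Hypothetical helper: plays the three draws once, returns TRUE if the
# final ball drawn from box 1 is black
simulate_once <- function(a, b) {
  box1 <- c(rep("black", a), rep("white", b))
  box2 <- c(rep("black", b), rep("white", a))

  # Move one random ball from box 1 to box 2
  i <- sample.int(length(box1), 1)
  box2 <- c(box2, box1[i])
  box1 <- box1[-i]

  # Move one random ball from box 2 to box 1
  j <- sample.int(length(box2), 1)
  box1 <- c(box1, box2[j])

  # Final draw from box 1
  box1[sample.int(length(box1), 1)] == "black"
}

estimate <- mean(replicate(n_sims, simulate_once(a, b)))

# Theoretical value: expected number of black balls in box 1 before the
# final draw, divided by the number of balls in box 1
N <- a + b
theory <- (a - a / N + (b + a / N) / (N + 1)) / N

print(c(estimate = estimate, theory = theory))
```

With this many repetitions the two numbers should typically agree to about two decimal places; a larger `n_sims` tightens the match at the cost of run time.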

Problem 2. Roll a die until the sum of the rolls is prime

In this question, we compute the expected number of rolls of a fair 6-sided die until the first time the total sum of all rolls is prime.

1. Write simple code that detects whether a (small) number is prime or not.

# Your code here
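
A minimal sketch of one possible check is trial division, which is adequate for the small sums that die rolls produce. The function name `is_prime` is a choice made for this illustration.

```r
# Trial division: test odd divisors up to sqrt(n)
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)          # 2 and 3 are prime
  if (n %% 2 == 0) return(FALSE)   # even numbers above 2 are not prime
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Quick check on the first few integers
print(Filter(is_prime, 1:30))
```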

2. Generate a random number from 1 to 6 (roll a fair die) and keep adding rolls until the sum becomes prime.

# Your code here

3. Repeat this process sufficiently many times and estimate the expected number of rolls until the sum becomes prime.

# Your code here
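
The roll-and-repeat procedure can be sketched as follows. The block is self-contained, so it carries its own compact primality check; the names `is_prime` and `rolls_until_prime`, the seed, and the number of repetitions are assumptions made for this illustration.

```r
set.seed(1)                        # arbitrary seed for reproducibility

# Compact trial-division check, enough for small sums
is_prime <- function(n) {
  n >= 2 && all(n %% seq_len(floor(sqrt(n)))[-1] != 0)
}

# Roll a fair die until the running total is prime; return the roll count
rolls_until_prime <- function() {
  total <- 0
  rolls <- 0
  repeat {
    total <- total + sample(1:6, 1)   # one roll of a fair die
    rolls <- rolls + 1
    if (is_prime(total)) return(rolls)
  }
}

n_sims <- 20000                    # assumed number of repetitions
counts <- replicate(n_sims, rolls_until_prime())

# Monte-Carlo estimate of the expected number of rolls
expected_rolls <- mean(counts)
print(expected_rolls)
```

Half of all runs stop after a single roll (a 2, 3, or 5), which is a quick sanity check on `counts`.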

Problem 3. Central Limit Theorem

1. Generate 1000 samples of integers uniformly chosen from 1 to 100.

# Your code here

2. Draw a histogram of your data (population).

# Your code here

3. Take 500 random samples of size 20 from the data (population). Draw a plot and a histogram of these samples.

# Your code here

4. Compute the sample mean, sample standard deviation, population mean, and population standard deviation.

# Your code here

5. Visualize those sample means taken from the population. Also, draw a normal curve with matching mean and variance and check that they are very close to each other.

# Your code here
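
The steps of this problem can be sketched end to end as below. Drawing each sample with replacement, the seed, the number of histogram breaks, and all variable names are assumptions made for this illustration; the handout does not fix them.

```r
set.seed(1)                                   # arbitrary seed for reproducibility

# Population: 1000 integers drawn uniformly from 1 to 100
population <- sample(1:100, 1000, replace = TRUE)
pop_mean <- mean(population)
pop_sd   <- sqrt(mean((population - pop_mean)^2))   # divisor n, not n - 1

# 500 samples of size 20 (with replacement: an assumption);
# each column of the 20 x 500 matrix is one sample
n <- 20
samples <- replicate(500, sample(population, n, replace = TRUE))
sample_means <- colMeans(samples)

# Mean and spread of the sample means versus the CLT prediction
print(c(mean_of_means = mean(sample_means), pop_mean = pop_mean))
print(c(sd_of_means = sd(sample_means), clt_sd = pop_sd / sqrt(n)))

# Density-scaled histogram with the matching normal curve on top
hist(sample_means, breaks = 20, freq = FALSE,
     main = "Sample means (n = 20)", xlab = "Sample mean")
curve(dnorm(x, mean = pop_mean, sd = pop_sd / sqrt(n)),
      add = TRUE, lwd = 2)
```

The histogram uses `freq = FALSE` so that its vertical axis is a density and is directly comparable with `dnorm`.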

Problem 4. Overbooking Problem

In this problem, you will find the optimal number of airline tickets to sell in order to maximize revenue. Here is the setup. On a certain airline route (the Airbus A380), there are 853 seats and each ticket is sold at $2000. Each bumped passenger costs the airline $3550 in accommodations. For simplicity, we assume that each person travels individually rather than in groups, and that all tickets offered are sold. The probability that each person who purchases a ticket shows up on time at the airport is \(p=92\%\).

Theoretical derivation

1. What is the total revenue (the money that the airline earns) without any overbooking?

Your answer here.

2. Assume that the airline sells \(n\) tickets with \(n\geq 853\) (they decide to overbook). Let \(X\) be the number of customers who show up at the airport on time. What is the distribution of \(X\)?

Your answer here.

3. Let \(Y\) be the amount of money to pay for all bumped passengers. Find an expression for \(Y\) in terms of \(X\).

Your answer here.

4. Find the expression for \(\mathbb{E}[Y]\).

Your answer here.
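
The quantities in this derivation can be written compactly as follows. This is a sketch under the assumption that passengers show up independently of one another, each with probability \(p\); the summation index \(k\) is notation introduced here.

```latex
\begin{align*}
X &\sim \mathrm{Binomial}(n,\, p), \qquad p = 0.92,\\
Y &= 3550 \cdot \max(X - 853,\, 0),\\
\mathbb{E}[Y] &= 3550 \sum_{k=854}^{n} (k - 853) \binom{n}{k} p^{k} (1-p)^{n-k}.
\end{align*}
```

Only outcomes with more than 853 arrivals contribute to the sum, since no one is bumped otherwise.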

Monte-Carlo simulation

1. Suppose that the airline sells 890 tickets. Write code that estimates the expected revenue for the airline. Does the airline earn or lose money on average by selling 37 extra tickets?

# Your code here
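
A minimal sketch of such a simulation is given below. It assumes independent show-ups (a binomial model) and that tickets are non-refundable, so every ticket sold counts as revenue whether or not the passenger appears; the seed, the number of simulated flights, and the variable names are also assumptions.

```r
set.seed(1)                        # arbitrary seed for reproducibility
seats <- 853; price <- 2000; bump_cost <- 3550; p <- 0.92
n_tickets <- 890
n_sims <- 100000                   # assumed number of simulated flights

# Number of passengers who show up on each simulated flight
show_ups <- rbinom(n_sims, size = n_tickets, prob = p)

# Revenue = ticket sales minus compensation for bumped passengers
bumped  <- pmax(show_ups - seats, 0)
revenue <- n_tickets * price - bump_cost * bumped

expected_revenue <- mean(revenue)
baseline <- seats * price          # revenue with no overbooking
print(c(expected = expected_revenue, baseline = baseline,
        gain = expected_revenue - baseline))
```

A positive `gain` means the extra 37 tickets pay off on average under these assumptions.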

2. Now use the previous code to compute the expected revenue for the airline if they sell \(n\) tickets, where \(n\) ranges over an appropriate interval that contains the optimal number maximizing the revenue. Make a chart that shows the expected revenue for each value of \(n\).

# Your code here
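
One possible way to scan over \(n\) is to wrap the simulation in a function and apply it across a range. The range 853 to 1000, the number of simulated flights per value, the seed, and the function name `expected_revenue` are assumptions made for this sketch; the same binomial and non-refundable-ticket assumptions apply.

```r
set.seed(1)                        # arbitrary seed for reproducibility
seats <- 853; price <- 2000; bump_cost <- 3550; p <- 0.92

# Monte-Carlo estimate of expected revenue when n tickets are sold
expected_revenue <- function(n, n_sims = 20000) {
  show_ups <- rbinom(n_sims, size = n, prob = p)
  mean(n * price - bump_cost * pmax(show_ups - seats, 0))
}

n_values <- 853:1000               # assumed search range
revenues <- sapply(n_values, expected_revenue)

# Revenue chart, with the estimated maximizer marked by a dashed line
plot(n_values, revenues, type = "l",
     xlab = "Tickets sold (n)", ylab = "Estimated expected revenue ($)")
best_n <- n_values[which.max(revenues)]
abline(v = best_n, lty = 2)
print(best_n)
```

Because the curve is fairly flat near its peak, the reported `best_n` can shift by a few tickets between seeds; increasing `n_sims` stabilizes it.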

3. What is the optimal number of tickets that maximizes the revenue? You must provide a full revenue chart to show that your answer maximizes the expected revenue.

Your answer here.

Problem 5. Principal Component Analysis

In this problem, you will work with the iris dataset, which contains four features (length and width of sepals and petals) for 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), and perform PCA.

First, install and load corrr, ggcorrplot, factoextra, FactoMineR and corrplot packages.

# your code here

1. Import and explore the dataset

Import the iris dataset (this can be done by typing attach(iris)). Do a quick data exploration using functions such as summary, head, and tail. How many rows and columns does it have?

# your code here

2. Check for null values

The presence of missing values can bias the result of PCA. Therefore, it is highly recommended to handle those values appropriately. Check if there are any missing values.

# your code here
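
A short sketch of one way to check, using only base R (the variable name `total_missing` is an assumption):

```r
data(iris)

# Number of missing values in each column
print(colSums(is.na(iris)))

# Total number of missing values in the whole data frame
total_missing <- sum(is.na(iris))
print(total_missing)
```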

3. Select only numeric values

PCA can be done only for numeric values. From the data, select only numeric values and perform another quick data exploration. Then, normalize the data using the scale function.

# your code here
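
As an illustration, the numeric columns can be picked out with `is.numeric` and then standardized. The variable names `iris_numeric` and `iris_scaled` are assumptions made for this sketch.

```r
data(iris)

# Keep only the numeric columns (this drops the Species factor)
iris_numeric <- iris[, sapply(iris, is.numeric)]
print(summary(iris_numeric))

# Center each column to mean 0 and scale to standard deviation 1
iris_scaled <- scale(iris_numeric)
print(round(colMeans(iris_scaled), 10))
print(apply(iris_scaled, 2, sd))
```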

4. Perform PCA using the princomp function

Perform the PCA using the princomp function.

# your code here

Then, decide the number of principal components that together explain more than 95% of the variance in the data.

# your code here
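
The two steps above can be sketched with base R alone. The block rebuilds the scaled data so that it runs on its own; the variable names and the use of the cumulative variance share as the selection rule are assumptions made for this illustration.

```r
data(iris)
iris_scaled <- scale(iris[, 1:4])          # the four numeric columns

# PCA on the standardized data
pca <- princomp(iris_scaled)
print(summary(pca))

# Share of variance explained by each component, then cumulative share
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 4))

# Smallest number of components whose cumulative share exceeds 95%
k <- which(cum_var > 0.95)[1]
print(k)
```

The "Cumulative Proportion" row printed by `summary(pca)` reports the same numbers as `cum_var`.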

5. Visualization of the principal components

Visualize the components chosen above using the fviz_eig, fviz_pca_var, and fviz_cos2 functions.

# your code here

Problem 6. Regression with One Variable

A sample of 10 billionaires is selected, and each person’s age and net worth are compared. The data are given here.

Using the data set provided, do the following:

  1. Load in the data and create the least-squares regression line using “lm()”.
x <- c(56,39,42,60,84,37,68,66,73,55)
y <- c(18,14,12,14,11,10,10,7,7,5)
# Your code here
  2. Using your model, predict the net worth for a billionaire who is 40 years old. (Hint: use the predict command with the second argument “data.frame(x=40)”)
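
The two steps can be sketched as follows, taking x to be age and y to be net worth, as the hint's `data.frame(x=40)` suggests. The variable names `fit` and `pred` are assumptions made for this illustration.

```r
x <- c(56, 39, 42, 60, 84, 37, 68, 66, 73, 55)   # age
y <- c(18, 14, 12, 14, 11, 10, 10, 7, 7, 5)      # net worth

# Least-squares line y = b0 + b1 * x
fit <- lm(y ~ x)
print(coef(fit))

# Predicted net worth at age 40
pred <- predict(fit, data.frame(x = 40))
print(pred)
```

The column name in the new data frame must match the predictor name used in the formula (`x` here), otherwise `predict` cannot find the variable.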