
Practice with for loops

Title: “HW 1”
Author: “Caroline Zanardi”

Question 1

x <- seq(from = 0, to = 1, by = 0.1)
onethird_x <- rep(0, length(x))   # preallocate the result vector
for (i in seq_along(x)) {
  onethird_x[i] <- x[i]^(1/3)     # cube root of the i-th element
}
onethird_x

##  [1] 0.0000000 0.4641589 0.5848035 0.6694330 0.7368063 0.7937005 0.8434327
##  [8] 0.8879040 0.9283178 0.9654894 1.0000000

Question 2

x <- seq(from = 0, to = 2, by = 0.05)
onethird_x <- rep(0, length(x))   # preallocate the result vector
for (i in seq_along(x)) {
  onethird_x[i] <- x[i]^(1/3)     # cube root of the i-th element
}
onethird_x

##  [1] 0.0000000 0.3684031 0.4641589 0.5313293 0.5848035 0.6299605 0.6694330
##  [8] 0.7047299 0.7368063 0.7663094 0.7937005 0.8193213 0.8434327 0.8662391
## [15] 0.8879040 0.9085603 0.9283178 0.9472682 0.9654894 0.9830476 1.0000000
## [22] 1.0163964 1.0322801 1.0476896 1.0626586 1.0772173 1.0913929 1.1052094
## [29] 1.1186889 1.1318512 1.1447142 1.1572945 1.1696071 1.1816658 1.1934832
## [36] 1.2050711 1.2164404 1.2276010 1.2385623 1.2493330 1.2599210

Question 3

x <- seq(from = 0, to = 1, by = 0.1)
onethird_x <- x^(1/3)   # vectorized: applies to every element of x at once
onethird_x

##  [1] 0.0000000 0.4641589 0.5848035 0.6694330 0.7368063 0.7937005 0.8434327
##  [8] 0.8879040 0.9283178 0.9654894 1.0000000
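
As a quick check that the loop from Question 1 and the vectorized form from Question 3 agree, the two can be computed side by side and compared. This is only a sketch; the names `loop_result`, `vec_result`, and `same` are made up for the illustration.

```r
x <- seq(from = 0, to = 1, by = 0.1)

# Loop version: fill a preallocated vector one element at a time
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i]^(1/3)
}

# Vectorized version: ^ is applied to every element of x at once
vec_result <- x^(1/3)

# all.equal() compares the two vectors with a small numeric tolerance
same <- isTRUE(all.equal(loop_result, vec_result))
same
```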

Question 4

set.seed(21)
nsim <- 1000                  # number of simulations
marker_position <- 0          # marker starts at 0; set once here, not reset between games
results <- rep(NA, nsim)      # vector to store results

for (i in 1:nsim) {
  # keep pulling until the marker passes 0.5 in either direction
  while (marker_position < 0.5 && marker_position > -0.5) {
    robot_a <- runif(n = 1, min = 0, max = 0.50)   # magnitude of robot A's pull
    marker_position <- marker_position + robot_a   # add robot A's pull to the marker
    robot_b <- runif(n = 1, min = 0, max = 0.50)   # magnitude of robot B's pull
    marker_position <- marker_position - robot_b   # subtract robot B's pull from the marker
  }
  results[i] <- marker_position > 0.5   # TRUE when robot A wins
}
mean(results)   # proportion of games won by robot A

## [1] 1

Question 5

Report the results of 1000 simulated robot tug of war battles. Is the game fair? If not, what adjustments can be made to make it more fair?

The proportion of games won by robot A across the 1000 simulated robot tug of war battles is 1, so for this seed robot A wins every time. The game is not fair because robot A always pulls first, which gives it a larger chance of winning each game. To make the game more fair, each robot should have an equal chance of pulling first in each of the 1000 simulated games.
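
One thing worth checking in the Question 4 code is that `marker_position` is set to 0 only once, before the `for` loop. After the first game ends, the `while` condition is already false, so every later iteration stores the outcome of that first game. A minimal sketch that resets the marker at the start of each game is below. The name `prop_a_wins` is made up for the illustration, and the sketch is an assumption about the intended game rather than the assignment's reference solution.

```r
set.seed(21)
nsim <- 1000                 # number of simulated games
results <- rep(NA, nsim)     # TRUE if robot A wins game i

for (i in seq_len(nsim)) {
  marker_position <- 0       # reset the marker at the start of every game
  while (marker_position < 0.5 && marker_position > -0.5) {
    robot_a <- runif(n = 1, min = 0, max = 0.50)   # robot A's pull
    robot_b <- runif(n = 1, min = 0, max = 0.50)   # robot B's pull
    marker_position <- marker_position + robot_a - robot_b
  }
  results[i] <- marker_position > 0.5              # did robot A win?
}

prop_a_wins <- mean(results)   # proportion of games won by robot A
prop_a_wins
```

Because both pulls are applied before the position is checked, the net move in each round is `robot_a - robot_b`, which is symmetric around zero. With the reset in place one would therefore expect a proportion close to 0.5 rather than exactly 1.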

Question 6

What are some reasons researchers use simulation studies?

Researchers use simulation studies to understand the practical importance or validity of statistical methods and explanations under various conditions and scenarios, to check the logical soundness of the methods used to obtain results and catch potential mistakes during research, and to work in settings where real-life data cannot be obtained.

Question 7

According to the paper, what are the five components (abbreviated ADEMP) involved in planning a simulation study? Summarize each of the five components.

ADEMP stands for aims, data generation, estimand/target, methods, and performance measures. Aims requires the statistician to understand the motivations for the study and the objectives behind it. Data generation describes how the data for the study are simulated. Estimand/target asks the statistician what quantity they are estimating from the simulated data. Methods asks the statistician which method(s) are used to fit the model and estimate the target. Performance measures asks how the performance of each method is evaluated.

Question 8

In class, we started designing a simulation study to investigate the importance of the normality assumption in simple linear regression. For this simulation, describe each of the ADEMP components.

The aim of the simulation study was to test how important the normality assumption is for a linear regression model and its hypothesis tests. The data were generated with noise from different distributions, such as uniform, normal, gamma, exponential, etc. The estimand/target of the study was β1, the slope in a linear regression model. The method was to fit a linear regression model to each simulated data set in R and calculate a 95% confidence interval for the slope. The performance measure was the observed coverage of the confidence interval under each distribution, compared with the intended 95%.
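
The plan above could be sketched roughly as follows. Everything specific here is an assumption made for illustration, not the in-class design: the seed, `nsim`, `n`, the true coefficients, the three noise distributions, and the helper name `coverage_for`.

```r
set.seed(1)       # arbitrary seed, chosen for this sketch
nsim  <- 1000     # number of simulated data sets per distribution
n     <- 100      # observations per data set
beta0 <- 0.5      # true intercept (assumed)
beta1 <- 1        # true slope (assumed); this is the estimand

# Hypothetical helper: fraction of simulated 95% intervals that contain beta1
coverage_for <- function(noise_fun) {
  covered <- logical(nsim)
  for (i in seq_len(nsim)) {
    x   <- runif(n, min = 0, max = 1)
    y   <- beta0 + beta1 * x + noise_fun(n)
    fit <- lm(y ~ x)
    ci  <- confint(fit, "x", level = 0.95)   # 1 x 2 matrix: lower, upper
    covered[i] <- ci[1] <= beta1 && beta1 <= ci[2]
  }
  mean(covered)
}

# Each noise distribution is centered so that its mean is 0
coverage <- c(
  normal      = coverage_for(function(n) rnorm(n, mean = 0, sd = 1)),
  uniform     = coverage_for(function(n) runif(n, min = -1, max = 1)),
  exponential = coverage_for(function(n) rexp(n, rate = 1) - 1)
)
coverage
```

Each entry of `coverage` is the observed coverage for one noise distribution, which can then be compared with the nominal 95%.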

Question 9

Use the ADEMP framework to plan a simulation study to explore the constant variance assumption. That is, you should describe: the aims of your study; how you will generate the data; what quantity from the regression model you will estimate; how you will conduct the simulations (the software you will use, how you will calculate your estimates, etc.); and the performance measure you will use (Hint: use the simulation from class as a guideline!). You do not need to implement any of the simulations to answer this question.

The aim of the simulation study is to test how important the constant variance assumption is for a linear regression model and its hypothesis tests. The data are generated with a normal noise term whose standard deviation varies across settings. The estimand/target of the study is β1, the slope in a linear regression model. The method is to fit a linear regression model to each simulated data set in R and calculate a 95% confidence interval for the slope. The performance measure is the observed coverage of the confidence interval under each standard deviation setting, compared with the intended 95%.
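
A rough sketch of this plan, under stated assumptions, is below. The question does not require an implementation, so this is only an illustration. The seed, `nsim`, `n`, the true coefficients, the three standard deviation settings, and the helper name `coverage_for_sd` are all hypothetical choices.

```r
set.seed(2)       # arbitrary seed, chosen for this sketch
nsim  <- 1000     # number of simulated data sets per setting
n     <- 100      # observations per data set
beta0 <- 0.5      # true intercept (assumed)
beta1 <- 1        # true slope (assumed); this is the estimand

# Hypothetical helper: sd_fun maps x to the noise standard deviation at each x
coverage_for_sd <- function(sd_fun) {
  covered <- logical(nsim)
  for (i in seq_len(nsim)) {
    x     <- runif(n, min = 0, max = 1)
    noise <- rnorm(n, mean = 0, sd = sd_fun(x))   # sd may differ per observation
    y     <- beta0 + beta1 * x + noise
    ci    <- confint(lm(y ~ x), "x", level = 0.95)
    covered[i] <- ci[1] <= beta1 && beta1 <= ci[2]
  }
  mean(covered)
}

coverage_sd <- c(
  constant  = coverage_for_sd(function(x) rep(0.5, length(x))),  # assumption holds
  linear    = coverage_for_sd(function(x) x),                    # sd grows with x
  quadratic = coverage_for_sd(function(x) x^2)                   # sd grows faster
)
coverage_sd
```

The `constant` setting serves as a baseline where the assumption holds, so the other two settings can be read against it as well as against the nominal 95%.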

Question 10

n <- 100
beta0 <- 0.5
beta1 <- 1
x <- runif(n, min=0, max=1)
noise <- rnorm(n, mean=0, sd=x) #noise sd equals x, so the spread grows with x
y <- beta0 + beta1*x + noise
plot(x,y) #plot y vs. x
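
A residual plot can make the non-constant variance in data like these easier to see than the raw scatterplot. The sketch below regenerates the data with an arbitrary seed (the original code sets none) and plots residuals against fitted values. The object name `fit` and the axis labels are choices made for this illustration.

```r
set.seed(3)       # arbitrary seed, chosen for this sketch
n     <- 100
beta0 <- 0.5
beta1 <- 1
x     <- runif(n, min = 0, max = 1)
noise <- rnorm(n, mean = 0, sd = x)   # noise sd equals x
y     <- beta0 + beta1 * x + noise

fit <- lm(y ~ x)                      # simple linear regression of y on x

# Residuals vs. fitted values: a widening spread suggests non-constant variance
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                # dashed reference line at zero
```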