ANLY 505 - Data Simulation in R

Questions

Simulate 30 draws from normal distributions, cycling through three distributions that differ in mean and standard deviation.

# place the code to simulate the data here

set.seed(22)
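# mean and sd are recycled across the 30 draws: draw 1 uses (0, 1), draw 2 uses (3, 8), draw 3 uses (8, 20), then repeat
x <- rnorm(30, mean = c(0, 3, 8), sd = c(1, 8, 20))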

x

##  [1]  -0.512139088  22.881469423  28.156523005   0.292814572   1.328325111
##  [6]  45.161847796  -0.066026405   1.697880386   4.002786402   0.300561734
## [11]  -3.111258260   9.639238076   0.743028275   2.327822449  -7.857890331
## [16]  -0.922153631   9.892499028  48.058843753   0.936551013  -9.925878973
## [21]  -3.501131784  -0.003973089  -2.408900826 -12.992565506  -0.543280568
## [26]   7.449156243  13.056754340  -0.901814675   9.595130850 -23.205595046
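
The call above relies on R recycling the `mean` and `sd` vectors across the 30 draws, so draw 1 comes from the first distribution, draw 2 from the second, draw 3 from the third, and so on. A minimal sketch of how this could be checked, assuming the same seed; the names `means`, `sds` and `dist_id` are introduced here for illustration and are not part of the original answer:

```r
set.seed(22)
means <- c(0, 3, 8)
sds   <- c(1, 8, 20)
x <- rnorm(30, mean = means, sd = sds)

# Label each draw with the distribution it came from (1, 2, 3, 1, 2, 3, ...)
dist_id <- rep(1:3, length.out = length(x))

# Number of draws per distribution
table(dist_id)

# Sample mean and sd per distribution; with only 10 draws each,
# these will only roughly track the requested values
tapply(x, dist_id, mean)
tapply(x, dist_id, sd)
```

Grouping by `dist_id` is one way to confirm that each of the three distributions contributed ten draws.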

Simulate 2 continuous, normally distributed variables (n = 20) and plot the relationship between them.

# place the code to simulate the data here
set.seed(23)

x1 = rnorm(20, 24, 6)
x2 = rnorm(20, 27, 6) 

plot(x1, x2, main = "Scatterplot of x2 vs. x1",
     xlab = "x1", ylab = "x2", pch = 19)
abline(lm(x2 ~ x1), col = "blue")  # add the least-squares line
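
Because x1 and x2 are drawn independently, the fitted line should be close to flat on average; any slope seen in a single sample of 20 reflects sampling noise. A small sketch, under the same seed, that puts numbers on the plotted relationship (the names `r` and `fit` are illustrative):

```r
set.seed(23)
x1 <- rnorm(20, 24, 6)
x2 <- rnorm(20, 27, 6)

# Pearson correlation between the two independently simulated variables
r <- cor(x1, x2)
r

# The same regression that abline() draws on the plot
fit <- lm(x2 ~ x1)
coef(fit)

# In simple regression the slope equals r * sd(x2) / sd(x1)
all.equal(unname(coef(fit)["x1"]), r * sd(x2) / sd(x1))
```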

Simulate 3 variables (x1, x2 and y). x1 and x2 should be drawn from a uniform distribution and y should be drawn from a normal distribution. Fit a multiple linear regression.

# place the code to simulate the data here
set.seed(24)

y = rnorm(30, 0, 1)
x1 = runif(30, min=0, max=1) 
x2 = runif(30, min=0, max=1) 

z <- lm(y ~ x1 + x2)  # multiple linear regression of y on x1 and x2
summary(z)

## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.44221 -0.35273 -0.04635  0.41187  1.79593 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.4573     0.3449  -1.326    0.196
## x1            0.1208     0.4615   0.262    0.795
## x2            0.4671     0.4693   0.995    0.328
## 
## Residual standard error: 0.7323 on 27 degrees of freedom
## Multiple R-squared:  0.03786,    Adjusted R-squared:  -0.03341 
## F-statistic: 0.5312 on 2 and 27 DF,  p-value: 0.5939
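
In the fit above, y is simulated independently of x1 and x2, so the true slopes are zero and the large p-values are what one would expect. As a contrast, here is a sketch in which y is built from x1 and x2. The coefficients `b0`, `b1` and `b2` are hypothetical values chosen for illustration and are not part of the assignment:

```r
set.seed(24)
n  <- 30
x1 <- runif(n, min = 0, max = 1)
x2 <- runif(n, min = 0, max = 1)

# Hypothetical true coefficients (assumed for this sketch)
b0 <- 1
b1 <- 2
b2 <- -3

# y is normal around a linear function of x1 and x2
y <- rnorm(n, mean = b0 + b1 * x1 + b2 * x2, sd = 1)

fit <- lm(y ~ x1 + x2)
coef(fit)     # estimates to compare against b0, b1, b2
confint(fit)  # 95% confidence intervals for each coefficient
```

Comparing the estimates with the known values is a common reason to simulate the response from the predictors rather than independently of them.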

Simulate 3 letters, repeating each letter twice and then repeating the whole sequence 2 times.

# place the code to simulate the data here

rep(letters[1:3],each=2,times=2)

##  [1] "a" "a" "b" "b" "c" "c" "a" "a" "b" "b" "c" "c"
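
When `each` and `times` are given together, `rep()` applies `each` first and then repeats the resulting vector `times` times. A short sketch checking this against an explicit nested call (the names `a` and `b` are illustrative):

```r
# Single call using both arguments
a <- rep(letters[1:3], each = 2, times = 2)

# Equivalent nested form: repeat each letter twice, then repeat the block twice
b <- rep(rep(letters[1:3], each = 2), times = 2)

identical(a, b)
length(a)
```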

Create a data frame (n = 27) with 3 groups, 2 factors and 2 quantitative response variables. Use the replicate function.

# place the code to simulate the data here

set.seed(24)
group = rep(letters[1:3],length.out=27)
factor= rep(letters[4:5], length.out=27)
response = replicate(n=2, expr = rnorm(27, mean = 0, sd = 1))
data.frame(group, factor, response)

##    group factor           X1          X2
## 1      a      d -0.545880758  0.11953107
## 2      b      e  0.536585304 -0.11629639
## 3      c      d  0.419623149 -0.94382724
## 4      a      e -0.583627199 -0.03373792
## 5      b      d  0.847460017 -0.58542756
## 6      c      e  0.266021979  0.61285136
## 7      a      d  0.444585270  1.51712249
## 8      b      e -0.466495124  0.65738044
## 9      c      d -0.848370044 -1.07418134
## 10     a      e  0.002311942 -4.46956441
## 11     b      d -1.316908124  0.36904502
## 12     c      e  0.598269113  0.16922669
## 13     a      d -0.762214370 -1.82219032
## 14     b      e -1.429090303  0.06735770
## 15     c      d  0.332244449  0.01710596
## 16     a      e -0.469060688 -0.34365937
## 17     b      d -0.334986794 -0.66789220
## 18     c      e  1.536252156 -0.25574457
## 19     a      d  0.609994533 -0.46120796
## 20     b      e  0.516335698  1.47164158
## 21     c      d -0.074308561 -0.09196032
## 22     a      e -0.605156946  0.33519430
## 23     b      d -1.709645185 -0.23186459
## 24     c      e -0.268693105  0.52665256
## 25     a      d -0.648591507 -1.07362607
## 26     b      e -0.094110127  0.76968188
## 27     c      d -0.085540951  1.77090538
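
`replicate()` evaluates its expression n times and, when every result has the same length, simplifies the results into a matrix, here with 27 rows and 2 columns; `data.frame()` then splits that matrix into the columns X1 and X2. A sketch that makes this explicit and gives the responses descriptive names (`resp1`, `resp2` and `dat` are names assumed for illustration):

```r
set.seed(24)

# Two independent sets of 27 standard normal draws, returned as a 27 x 2 matrix
response <- replicate(n = 2, expr = rnorm(27, mean = 0, sd = 1))
dim(response)

dat <- data.frame(
  group  = rep(letters[1:3], length.out = 27),
  factor = rep(letters[4:5], length.out = 27),
  resp1  = response[, 1],
  resp2  = response[, 2]
)

str(dat)

# Cross-tabulate to see how the group and factor levels combine
table(dat$group, dat$factor)
```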

Week 1

Karoli Sakwa

2020-01-16
