ANLY 505 - Data Simulation in R

Questions

Simulate 30 draws from normal distributions, where the mean and standard deviation vary among three distributions.

set.seed(2020)
rnorm(30, mean = c(0,10,20), sd = c(1,2,3))

##  [1]  0.3769721 10.6030967 16.7059305 -1.1304059  4.4069314 22.1617205
##  [7]  0.9391210  9.5412445 25.2773940  0.1173668  8.2937544 22.7277775
## [13]  1.1963730  9.2568322 19.6302193  1.8000431 13.4079918 10.8837062
## [19] -2.2889749 10.1166070 26.5230958  1.0981827 10.6364406 19.7805573
## [25]  0.8342687 10.3975013 23.8935242  0.9367183  9.7051336 20.3312960
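rnorm() recycles the mean and sd vectors across the 30 draws, so draws 1, 4, 7, ... come from N(0, 1), draws 2, 5, 8, ... from N(10, 2), and draws 3, 6, 9, ... from N(20, 3). The sketch below is one way to check this by labelling each draw with its source distribution; the names draws and dist_id are introduced here for illustration only.

```r
set.seed(2020)
draws <- rnorm(30, mean = c(0, 10, 20), sd = c(1, 2, 3))

# Label each draw with the distribution it came from (recycling order 1, 2, 3, 1, ...)
dist_id <- rep(1:3, length.out = 30)

# Sample mean and standard deviation within each of the three distributions
tapply(draws, dist_id, mean)
tapply(draws, dist_id, sd)
```

With only 10 draws per distribution, the sample statistics will be close to, but not exactly, the requested means and standard deviations.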

Simulate 2 continuous variables (normal distribution, n = 20) and plot the relationship between them.

# Two independent normal samples of size 20
x <- rnorm(20, mean = 10, sd = 1)
y <- rnorm(20, mean = 20, sd = 2)
# Scatterplot of y against x
plot(y ~ x)
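Since x and y are drawn independently, the scatterplot should show no systematic relationship. As a rough sketch, the correlation can be computed to quantify this, and compared with a variable that is built to depend on x. The names y_indep and y_dep, the slope of 2, and the noise standard deviation of 1 are assumptions made for illustration.

```r
set.seed(2020)
x <- rnorm(20, mean = 10, sd = 1)

# Independent of x, as in the exercise
y_indep <- rnorm(20, mean = 20, sd = 2)

# Hypothetical dependent version: slope 2 and noise sd 1 are illustrative choices
y_dep <- 2 * x + rnorm(20, mean = 0, sd = 1)

cor(x, y_indep)  # expected to be near zero, up to sampling noise
cor(x, y_dep)    # expected to be strongly positive

plot(y_dep ~ x)
```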

Simulate 3 variables (x1, x2 and y). x1 and x2 should be drawn from a uniform distribution and y should be drawn from a normal distribution. Fit a multiple linear regression.

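# Two uniform predictors and a normal response, 100 observations each
x1 <- runif(100, min = 0, max = 10)
x2 <- runif(100, min = 10, max = 100)
# y is drawn independently of x1 and x2
y <- rnorm(100)
# Multiple linear regression of y on both predictors
model <- lm(y ~ x1 + x2)
summary(model)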

## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.55503 -0.65567 -0.02635  0.72341  2.86790 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.490241   0.301358  -1.627   0.1070  
## x1          -0.017637   0.033474  -0.527   0.5995  
## x2           0.010318   0.003941   2.618   0.0103 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.031 on 97 degrees of freedom
## Multiple R-squared:  0.0716, Adjusted R-squared:  0.05245 
## F-statistic:  3.74 on 2 and 97 DF,  p-value: 0.02724
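Because y is generated independently of x1 and x2, the true slopes are zero, and a significant coefficient in a fit like this arises by chance. A common alternative, sketched below, is to build y from known coefficients so the regression can be checked against them. The coefficient values b0, b1, b2 and the noise standard deviation are hypothetical choices, not part of the exercise.

```r
set.seed(2020)
n  <- 100
x1 <- runif(n, min = 0, max = 10)
x2 <- runif(n, min = 10, max = 100)

# Hypothetical true coefficients chosen for illustration
b0 <- 1
b1 <- 0.5
b2 <- -0.02

# Response = linear predictor + normal noise
y <- b0 + b1 * x1 + b2 * x2 + rnorm(n, mean = 0, sd = 1)

fit <- lm(y ~ x1 + x2)
coef(fit)  # estimates should land near b0, b1, b2
```

Comparing coef(fit) with the values used to generate y is a quick sanity check that the simulation and the model agree.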

Simulate 3 letters, repeating each letter twice and the whole sequence 2 times.

# Draw 3 alphabet positions (1 to 26), with replacement
letter_pick <- sample(1:26, 3, replace = TRUE)
letter_pick

## [1] 1 5 3

# each = 2 doubles every letter; times = 2 repeats the whole sequence
rep(LETTERS[letter_pick], each = 2, times = 2)

##  [1] "A" "A" "E" "E" "C" "C" "A" "A" "E" "E" "C" "C"
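Note that sampling with replace = TRUE can return the same letter more than once. If three distinct letters are wanted, one option is to sample from LETTERS directly without replacement, as in this minimal sketch (the names picked and out are illustrative).

```r
set.seed(2020)

# Sample 3 distinct letters; sample() defaults to replace = FALSE
picked <- sample(LETTERS, 3)

# each is applied first (A A B B C C), then the result is repeated times = 2
out <- rep(picked, each = 2, times = 2)
out
```

The result always has 3 x 2 x 2 = 12 elements, and its second half is a copy of the first.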

Create a data frame (n = 27) with 3 groups, 2 factors, and 2 quantitative response variables. Use the replicate function.

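data.frame(
  # rep() recycles the labels until 27 rows are filled
  group = rep(c("Group 1", "Group 2", "Group 3"), length.out = 27),
  factor = as.factor(rep(LETTERS[24:25], length.out = 27)),  # levels "X" and "Y"
  # Two quantitative responses: one normal, one uniform
  a = rnorm(27, mean = 10, sd = 2),
  b = runif(27, min = 0, max = 10)
)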

##      group factor         a         b
## 1  Group 1      X  9.619306 6.2680398
## 2  Group 2      Y 10.772977 2.8300062
## 3  Group 3      X 15.045563 5.8312475
## 4  Group 1      Y 11.932518 8.2034376
## 5  Group 2      X 10.284003 8.5307469
## 6  Group 3      Y  8.692623 4.1025107
## 7  Group 1      X 13.688665 7.0722686
## 8  Group 2      Y  8.659964 8.8143742
## 9  Group 3      X 10.660699 2.7943871
## 10 Group 1      Y 11.372948 9.3593118
## 11 Group 2      X 13.377795 1.6994108
## 12 Group 3      Y 10.646845 9.2477975
## 13 Group 1      X  8.434141 4.6915996
## 14 Group 2      Y 11.725131 9.9002539
## 15 Group 3      X  8.653601 2.7236549
## 16 Group 1      Y 11.034717 9.8671852
## 17 Group 2      X 12.944128 1.9130525
## 18 Group 3      Y  9.899534 8.0183751
## 19 Group 1      X 11.688621 4.7926237
## 20 Group 2      Y  5.057191 5.4103349
## 21 Group 3      X 11.265932 7.9785120
## 22 Group 1      Y  8.008532 4.2869039
## 23 Group 2      X 11.361114 7.1188112
## 24 Group 3      Y 12.380601 5.2609393
## 25 Group 1      X 10.845043 0.7032919
## 26 Group 2      Y 11.668283 5.1255029
## 27 Group 3      X 11.567692 4.8654911
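The question asks for the replicate function, while the data frame above is built with rep() and direct calls to rnorm() and runif(). One possible way to bring replicate() in is sketched below: it generates both response columns in a single call. This is an assumption about what the question intends, and it makes both responses normal, unlike the normal and uniform pair above. The names responses and df are illustrative.

```r
set.seed(2020)

# replicate() evaluates the expression twice and binds the results
# into a 27 x 2 matrix (one column per replicate)
responses <- replicate(2, rnorm(27, mean = 10, sd = 2))
colnames(responses) <- c("a", "b")

df <- data.frame(
  group  = rep(c("Group 1", "Group 2", "Group 3"), length.out = 27),
  factor = factor(rep(c("X", "Y"), length.out = 27)),
  responses  # matrix columns become data frame columns a and b
)

str(df)
```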

Week 1

Yiwen Zhao

1/14/2020