Directions

The objective of this assignment is to introduce you to R and R Markdown and to complete some basic data simulation exercises.

Please include all code needed to perform the tasks. This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

To submit this homework, create the document in RStudio, knit it with the knitr package (the Knit button in RStudio), and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Moodle. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

  1. Simulate data for 30 draws from a normal distribution where the means and standard deviations vary among three distributions.
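# Simulate 30 draws from each of three normal distributions
draws_a <- rnorm(n = 30, mean = 120, sd = 5)
draws_b <- rnorm(n = 30, mean = 155, sd = 8)
draws_c <- rnorm(n = 30, mean = 200, sd = 15)  # avoids reusing the name of base::c()
# Plot each sample against its draw index
par(mfrow = c(2, 2))
plot(draws_a)
plot(draws_b)
plot(draws_c)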
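
Calling `plot()` on a single vector shows the values against their draw index, which makes the three samples hard to compare. The sketch below stacks the samples in one long-format data frame and compares them directly. The names `sims`, `dist`, and `value` and the seed are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

# Stack the three samples in one long-format data frame
sims <- data.frame(
  dist  = rep(c("a", "b", "c"), each = 30),
  value = c(rnorm(30, mean = 120, sd = 5),
            rnorm(30, mean = 155, sd = 8),
            rnorm(30, mean = 200, sd = 15))
)

# Sample mean and standard deviation per distribution
tapply(sims$value, sims$dist, mean)
tapply(sims$value, sims$dist, sd)

# Side-by-side boxplots show the differences in centre and spread
boxplot(value ~ dist, data = sims)
```

With only 30 draws per distribution, the sample means and standard deviations should land near, but not exactly on, the values passed to `rnorm()`.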

  2. Simulate 2 continuous variables (normal distribution, n = 20) and plot the relationship between them.
# Simulate two independent normal variables with 20 observations each
a1 <- rnorm(n = 20, mean = 100, sd = 20)
a2 <- rnorm(n = 20, mean = 25, sd = 5)
# Scatterplot of a1 (y-axis) against a2 (x-axis)
plot(a2, a1)
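
Because the two variables above are drawn independently, the scatterplot should show no systematic relationship. If a visible relationship is wanted, one option is to build the second variable from the first plus normal noise. This is a minimal sketch: the names `x_sim` and `y_sim`, the seed, and the intercept and slope (50 and 2) are arbitrary assumptions chosen for illustration.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

n <- 20
x_sim <- rnorm(n, mean = 25, sd = 5)

# y_sim is a linear function of x_sim plus normal noise;
# the intercept (50) and slope (2) are arbitrary illustrative values
y_sim <- 50 + 2 * x_sim + rnorm(n, mean = 0, sd = 5)

# Sample correlation between the two variables
cor(x_sim, y_sim)

# Scatterplot with a fitted least-squares line
plot(x_sim, y_sim, xlab = "x_sim", ylab = "y_sim")
abline(lm(y_sim ~ x_sim))
```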

  3. Simulate 3 variables (x1, x2 and y). x1 and x2 should be drawn from a uniform distribution and y should be drawn from a normal distribution. Fit a multiple linear regression.
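# Simulate two uniform predictors and a normal response (n = 30 each)
x1 <- runif(n = 30)
x2 <- runif(n = 30)
y <- rnorm(n = 30)  # drawn independently of x1 and x2

# Fit a multiple linear regression of y on both predictors
model <- lm(y ~ x1 + x2)
summary(model)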
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.87863 -0.49648  0.09039  0.68625  1.48071 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   0.0827     0.4796   0.172    0.864
## x1            0.4306     0.6370   0.676    0.505
## x2           -0.4787     0.6028  -0.794    0.434
## 
## Residual standard error: 0.8485 on 27 degrees of freedom
## Multiple R-squared:  0.04452,    Adjusted R-squared:  -0.02626 
## F-statistic: 0.629 on 2 and 27 DF,  p-value: 0.5408
# Show the four default lm diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
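
In the fit above, y is generated without reference to x1 or x2, which is consistent with the small R-squared and the non-significant coefficients in the summary. As a contrast, the sketch below draws the response from a normal distribution whose mean depends on the predictors, so the regression has something to recover. The names `u1`, `u2`, `resp`, and `fit`, the seed, and the coefficients `b0`, `b1`, and `b2` are hypothetical values chosen for illustration.

```r
set.seed(7)  # assumed seed, only so the sketch is reproducible

n_obs <- 30
u1 <- runif(n_obs)
u2 <- runif(n_obs)

# Hypothetical "true" coefficients used to generate the response
b0 <- 1
b1 <- 3
b2 <- -2

# The response is still normal, but its mean is a linear function of u1 and u2
resp <- rnorm(n_obs, mean = b0 + b1 * u1 + b2 * u2, sd = 0.5)

fit <- lm(resp ~ u1 + u2)
coef(fit)                # estimates to compare against b0, b1, b2
summary(fit)$r.squared   # share of variance explained by the predictors
```

Comparing `coef(fit)` with the generating values is a quick check that the simulation and the model agree.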

  4. Simulate 3 letters, repeating each letter twice, 2 times.
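# Pick a random starting position in the alphabet (1 to 24 so that a + 2 <= 26)
a <- sample(1:24, 1)
# Take 3 consecutive letters and repeat the whole sequence twice
letters[rep(seq(from = a, to = a + 2), 2)]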
## [1] "k" "l" "m" "k" "l" "m"
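
The call above repeats the three-letter sequence twice but does not double each letter. Under one reading of the question (each letter twice, and the doubled sequence 2 times), the `each` and `times` arguments of `rep()` can be combined. This is a sketch of that reading only; the names `start`, `three`, and `out` and the seed are assumptions.

```r
set.seed(3)  # assumed seed, only so the sketch is reproducible

# Random starting position; 1:24 keeps start + 2 within the 26 letters
start <- sample(1:24, 1)
three <- letters[start:(start + 2)]

# `each` doubles every letter first, then `times` repeats the doubled sequence
out <- rep(three, each = 2, times = 2)
out
```

The result has 12 elements: 3 letters x 2 copies each x 2 repetitions.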

  5. Create a data frame (n = 27) with 3 groups, 2 factors and two quantitative response variables. Use the replicate function.
# Build a 12-row data frame: 3 groups, one two-level factor, two numeric responses
df <- data.frame(
  group  = rep(c('group1', 'group2', 'group3'), 4),
  factor = rep(c('Male', 'Female'), 6),
  age    = sample(25:54, 12),
  salary = sample(10000:35000, 12)
)
df
##     group factor age salary
## 1  group1   Male  35  24483
## 2  group2 Female  32  29740
## 3  group3   Male  41  15459
## 4  group1 Female  27  34757
## 5  group2   Male  48  14310
## 6  group3 Female  25  21843
## 7  group1   Male  37  30195
## 8  group2 Female  45  14308
## 9  group3   Male  51  21974
## 10 group1 Female  53  11856
## 11 group2   Male  47  22309
## 12 group3 Female  36  31492
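
The data frame above has 12 rows and one factor, and it does not call `replicate()`. A sketch closer to the stated specification (27 rows, 3 groups, 2 factors, 2 responses) might look like the following. The column names `sex`, `treatment`, `resp1`, and `resp2`, the factor levels, the seed, and the response distribution are all hypothetical choices, and using `replicate()` to generate the response columns is only one way to read the instruction.

```r
set.seed(5)  # assumed seed, only so the sketch is reproducible

n_total <- 27

# Grouping variable (9 rows per group) and two factors
sim_df <- data.frame(
  group     = rep(c("group1", "group2", "group3"), each = 9),
  sex       = sample(c("Male", "Female"), n_total, replace = TRUE),
  treatment = rep(c("low", "mid", "high"), times = 9),
  stringsAsFactors = TRUE
)

# replicate() evaluates the expression twice and returns a 27 x 2 matrix,
# one column per quantitative response
responses <- replicate(n = 2, expr = rnorm(n_total, mean = 50, sd = 10))
colnames(responses) <- c("resp1", "resp2")

sim_df <- cbind(sim_df, responses)
str(sim_df)
```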