ANLY 505 - Problem Set #1

Questions

Simulate data for 30 draws from a normal distribution where the means and standard deviations vary among three distributions.

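# set the seed so the draws are reproducible
set.seed(1)
# mean and sd are recycled across the 30 draws, so successive draws cycle
# through N(0, 0.1), N(10, 1) and N(100, 10), giving 10 draws from each
data = rnorm(30, mean = c(0,10,100), sd = c(.1,1,10))
data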

##  [1]  -0.062645381  10.183643324  91.643713876   0.159528080  10.329507772
##  [6]  91.795316159   0.048742905  10.738324705 105.757813517  -0.030538839
## [11]  11.511781168 103.898432364  -0.062124058   7.785300113 111.249309181
## [16]  -0.004493361   9.983809737 109.438362107   0.082122120  10.593901321
## [21] 109.189773716   0.078213630  10.074564983  80.106483041   0.061982575
## [26]   9.943871260  98.442044933  -0.147075238   9.521849945 104.179415602
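
To check how the recycled mean and sd vectors map onto the 30 draws, the sketch below labels each draw with the distribution it should have come from and summarizes each group. The names draws, dist_id, group_means and group_sds are illustrative and not part of the original answer.

```r
set.seed(1)
draws <- rnorm(30, mean = c(0, 10, 100), sd = c(0.1, 1, 10))

# rnorm() recycles mean and sd in order, so draw i uses element ((i - 1) %% 3) + 1.
# Label each draw with the distribution it was generated from.
dist_id <- rep(c("N(0, 0.1)", "N(10, 1)", "N(100, 10)"), length.out = 30)

# Per-distribution sample mean and standard deviation (10 draws each)
group_means <- tapply(draws, dist_id, mean)
group_sds <- tapply(draws, dist_id, sd)

print(table(dist_id))
print(round(group_means, 2))
print(round(group_sds, 2))
```

With only 10 draws per distribution, the sample means and standard deviations should land near the requested values but will not match them exactly.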

Simulate 2 continuous variables from a normal distribution (n = 20) and plot the relationship between them.

set.seed(2)
var1 = rnorm(20, mean = 0, sd = 1)
var1

##  [1] -0.89691455  0.18484918  1.58784533 -1.13037567 -0.08025176
##  [6]  0.13242028  0.70795473 -0.23969802  1.98447394 -0.13878701
## [11]  0.41765075  0.98175278 -0.39269536 -1.03966898  1.78222896
## [16] -2.31106908  0.87860458  0.03580672  1.01282869  0.43226515

set.seed(3)
var2 = rnorm(20, mean = 0, sd = 1)
var2

##  [1] -0.96193342 -0.29252572  0.25878822 -1.15213189  0.19578283
##  [6]  0.03012394  0.08541773  1.11661021 -1.21885742  1.26736872
## [11] -0.74478160 -1.13121857 -0.71635849  0.25265237  0.15204571
## [16] -0.30765643 -0.95301733 -0.64824281  1.22431362  0.19981161

# plot variables
plot(var1, var2)

# As expected, the scatterplot shows no clear relationship: var1 and var2 are independent draws, so the value of one carries no information about the other
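
The visual impression from the scatterplot can be backed up with a correlation test. This is a minimal sketch that regenerates the two variables with the same seeds and runs base R's cor.test(); the object name ct is illustrative.

```r
set.seed(2)
var1 <- rnorm(20, mean = 0, sd = 1)
set.seed(3)
var2 <- rnorm(20, mean = 0, sd = 1)

# Pearson correlation test; the null hypothesis is that the true correlation is 0
ct <- cor.test(var1, var2)

print(ct$estimate)  # sample correlation coefficient
print(ct$p.value)   # p-value for the test of zero correlation
print(ct$conf.int)  # 95% confidence interval for the correlation
```

For independent draws the estimate is expected to sit near zero, with a confidence interval that covers zero, although any single sample of 20 can deviate by chance.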

Simulate 3 variables (x1, x2 and y). x1 and x2 should be drawn from a uniform distribution and y should be drawn from a normal distribution. Fit a multiple linear regression.

# simulate the data
set.seed(4)
x1 = runif(50, min = 0, max = 100)  # uniform dist
set.seed(5)
x2 = runif(50, min = 1, max = 10)  # uniform dist
set.seed(6)
y = rnorm(50, mean = 10, sd = 2)  # normal dist
# Fit linear model to data
model = lm(y ~ x1 + x2)
summary(model)

## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3390 -1.3065 -0.2151  1.2315  4.4841 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.83193    0.83321  11.800 1.18e-15 ***
## x1           0.01138    0.01081   1.053    0.298    
## x2          -0.05831    0.12012  -0.485    0.630    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.118 on 47 degrees of freedom
## Multiple R-squared:  0.02399,    Adjusted R-squared:  -0.01755 
## F-statistic: 0.5776 on 2 and 47 DF,  p-value: 0.5652

# Neither x1 nor x2 explains a meaningful share of the variance in y (p = 0.298 and 0.630; R-squared = 0.024). This makes sense because y was simulated independently of both predictors
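
For contrast, the sketch below builds a response that does depend on the predictors, so the regression has something to recover. The coefficients (intercept 10, slope 0.5 on x1, slope -2 on x2) and the names y_dep, fit_dep and coefs are assumptions chosen purely for illustration; they are not part of the original problem.

```r
set.seed(4)
n <- 50
x1 <- runif(n, min = 0, max = 100)  # uniform predictor
x2 <- runif(n, min = 1, max = 10)   # uniform predictor

# Hypothetical data-generating model: y depends on x1 and x2 plus normal noise
y_dep <- 10 + 0.5 * x1 - 2 * x2 + rnorm(n, mean = 0, sd = 2)

# Fit the same kind of multiple linear regression as above
fit_dep <- lm(y_dep ~ x1 + x2)
coefs <- coef(summary(fit_dep))
print(coefs)
```

Here the estimated slopes should fall close to the assumed values and carry small p-values, the opposite of what happens when y is drawn without reference to x1 and x2.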

Simulate 3 letters repeating each letter twice, 2 times.

rep(letters[c(19,23,9)], each = 2, times = 2)

##  [1] "s" "s" "w" "w" "i" "i" "s" "s" "w" "w" "i" "i"

# using my initials s,w,i
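
The each and times arguments of rep() do different things, and the sketch below separates them. The variable names are illustrative.

```r
initials <- c("s", "w", "i")

# each = 2 repeats every element in place before moving to the next one
each_only <- rep(initials, each = 2)

# times = 2 repeats the whole vector end to end
times_only <- rep(initials, times = 2)

# With both, each is applied first and the result is then repeated times times
both <- rep(initials, each = 2, times = 2)

print(each_only)
print(times_only)
print(both)
```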

Create a dataframe with 3 groups, 2 factors and two quantitative response variables. Use the replicate function (n = 25).

set.seed(7)
data.frame(Group = rep(LETTERS[1:2], length.out = 25),
           Response1 = rnorm(25, mean = 0, sd = 1),
           Response2 = rnorm(25, mean = 50, sd = 5))

##    Group    Response1 Response2
## 1      A  2.287247161  50.92096
## 2      B -1.196771682  53.76140
## 3      A -0.694292510  52.95873
## 4      B -0.412292951  45.08474
## 5      A -0.970673341  48.61968
## 6      B -0.947279945  45.64574
## 7      A  0.748139340  53.59355
## 8      B -0.116955226  50.55326
## 9      A  0.152657626  49.60767
## 10     B  2.189978107  47.89755
## 11     A  0.356986230  47.18937
## 12     B  2.716751783  54.98757
## 13     A  2.281451926  44.47435
## 14     B  0.324020540  49.28856
## 15     A  1.896067067  51.57497
## 16     B  0.467680511  56.09275
## 17     A -0.893800723  46.50341
## 18     B -0.307328300  48.57284
## 19     A -0.004822422  43.44224
## 20     B  0.988164149  48.04494
## 21     A  0.839750360  47.99237
## 22     B  0.705341831  56.75259
## 23     A  1.305964721  52.95595
## 24     B -1.387996217  50.50263
## 25     A  1.272916864  54.65536

# Since n = 25 is odd, rep(LETTERS[1:2], length.out = 25) gives two uneven groups, 13 A's and 12 B's, with 2 responses per row. Note that this produces two groups and one factor, and does not use replicate()
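
The question asks for three groups, two factors and the replicate function, none of which the call above provides. The sketch below is one possible reading of that request, not a definitive answer: it treats Group (three levels) and Treatment (two levels) as the two factors and uses replicate() to generate the 25 pairs of responses. The Treatment column, its level names, and the object names responses and sim are assumptions introduced for illustration.

```r
set.seed(7)
n <- 25

# replicate() evaluates the expression n times; each evaluation returns one
# pair of responses, and the results are bound into a 2 x n matrix
responses <- replicate(n, c(Response1 = rnorm(1, mean = 0, sd = 1),
                            Response2 = rnorm(1, mean = 50, sd = 5)))

sim <- data.frame(
  Group = factor(rep(LETTERS[1:3], length.out = n)),                  # 3 groups
  Treatment = factor(rep(c("control", "treated"), length.out = n)),  # hypothetical second factor
  t(responses)                                                        # n x 2 response columns
)

str(sim)
print(table(sim$Group, sim$Treatment))
```

Because 25 is not divisible by 3 or 2, the groups are unbalanced; choosing n as a multiple of 6 would give a balanced design if that matters for later analysis.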

Shaun Irvin

June 4, 2019
