Directions

The objective of this assignment is to introduce you to R and R Markdown and to complete some basic data simulation exercises.

Please include all code needed to perform the tasks. This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

To submit this homework, create the document in RStudio, knit it with the knitr package (the Knit button in RStudio), and publish it to your RPubs account. Once it is uploaded, submit the link to the document on Moodle. Please make sure that the link is hyperlinked and that I can see both the visualization and the code required to create it.

Questions

  1. Simulate data for 30 draws from a normal distribution where the means and standard deviations vary among three distributions.
# place the code to simulate the data here
library(knitr)
## Warning: package 'knitr' was built under R version 3.5.3
set.seed(1)

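# mean and sd are recycled across the 30 draws, so the draws cycle through
# N(0, 1), N(1, 10), and N(10, 100)
data = rnorm(30, mean = c(0, 1, 10), sd = c(1, 10, 100))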
data
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
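The call above works because `rnorm` recycles the `mean` and `sd` vectors across the 30 draws: draws 1, 4, 7, ... use the first pair of parameters, draws 2, 5, 8, ... the second, and draws 3, 6, 9, ... the third. A minimal sketch that makes this grouping explicit is given here; the names `mu`, `sigma`, `group`, and `sim` are illustrative and not part of the original answer.

```r
set.seed(1)

# Parameters of the three distributions (same values as above)
mu    <- c(0, 1, 10)
sigma <- c(1, 10, 100)

# Label each draw with the distribution it comes from (1, 2, 3, 1, 2, 3, ...)
group <- rep(1:3, times = 10)

# Index the parameter vectors by group instead of relying on recycling
sim <- data.frame(
  group = factor(group),
  value = rnorm(30, mean = mu[group], sd = sigma[group])
)

# Ten draws per distribution
table(sim$group)

# Per-group sample mean and standard deviation
aggregate(value ~ group, data = sim,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))
```

Keeping the group label next to each value makes it easier to check that the sample means and standard deviations differ by distribution.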
  2. Simulate 2 continuous variables (normal distribution) (n=20) and plot the relationship between them.
# place the code to simulate the data here

set.seed(2)

x_cord = rnorm(20,0,1)
set.seed(3)

y_cord = rnorm(20,0,1)

plot(y_cord~x_cord)
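Because `x_cord` and `y_cord` are drawn independently, the scatterplot above is not expected to show a systematic trend. If a visible relationship is wanted, one option is to build the second variable from the first plus noise. The sketch below assumes a slope of 0.8 and a noise standard deviation of 0.5; both values, and the names `x_sim` and `y_sim`, are arbitrary choices for illustration.

```r
set.seed(2)

n <- 20
x_sim <- rnorm(n, mean = 0, sd = 1)

# y_sim depends linearly on x_sim, plus independent normal noise
y_sim <- 0.8 * x_sim + rnorm(n, mean = 0, sd = 0.5)

# Sample correlation between the two variables
cor(x_sim, y_sim)

# Scatterplot with the least-squares line overlaid
plot(y_sim ~ x_sim, xlab = "x_sim", ylab = "y_sim")
abline(lm(y_sim ~ x_sim), col = "red")
```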

  3. Simulate 3 variables (x1, x2 and y). x1 and x2 should be drawn from a uniform distribution and y should be drawn from a normal distribution. Fit a multiple linear regression.
# place the code to simulate the data here

set.seed(4)

x1 = runif(100,1,10)

x1
##   [1] 6.272203 1.080512 3.643657 3.496375 8.322168 3.343850 7.519653
##   [8] 9.154829 9.541362 1.658300 7.792075 3.574006 1.900482 9.586619
##  [15] 4.740464 5.095922 9.739501 6.255892 9.659842 7.855322 7.430577
##  [22] 9.969516 5.556438 5.409489 6.842452 8.477258 5.337991 8.575716
##  [29] 5.623316 5.767998 6.104010 3.150540 8.901963 6.890698 5.341338
##  [36] 9.739268 5.140233 6.598508 4.495762 1.059335 9.444872 3.179540
##  [43] 6.090351 2.628092 9.140494 1.758622 9.110545 9.024503 7.510374
##  [50] 6.083289 4.497490 7.713360 9.058789 8.281691 8.366917 4.790800
##  [57] 2.590268 2.564079 9.032912 7.690007 6.048855 1.641104 8.682263
##  [64] 9.223399 3.028251 6.657041 1.622305 5.633686 8.248756 9.746105
##  [71] 4.086237 6.682271 4.690488 4.111247 8.423856 7.209105 3.885805
##  [78] 4.971967 3.356969 2.198840 9.194308 7.382511 6.148237 9.252714
##  [85] 9.117191 1.527598 1.411309 9.899277 2.852425 9.361504 2.748732
##  [92] 2.141208 5.854319 3.451034 4.182280 1.756944 7.888939 5.000374
##  [99] 1.325237 7.311602
# Adding variable x2

x2 = runif(100,100,200)
x2
##   [1] 125.3427 162.9631 126.6404 153.2446 146.8265 157.4397 166.8445
##   [8] 121.2114 197.5632 180.9383 127.5334 139.9124 129.0300 149.2993
##  [15] 175.6775 151.0697 143.8149 179.0914 191.1315 107.1074 114.2642
##  [22] 139.3073 185.6438 126.4254 109.4638 172.4334 198.0477 120.0600
##  [29] 155.2264 134.5464 140.8390 118.8031 134.5553 118.3656 181.3115
##  [36] 188.1635 170.0615 127.6427 143.2004 101.2816 156.3038 167.0700
##  [43] 131.3602 194.7544 116.8803 144.0414 157.1627 197.4654 176.4771
##  [50] 144.3055 135.5862 103.6457 159.3880 196.1553 125.2730 184.7206
##  [57] 121.2781 127.7891 147.9385 178.0000 190.0940 194.3461 141.5215
##  [64] 133.4110 128.2732 107.2222 107.0682 115.8950 115.0863 175.2092
##  [71] 109.5687 161.1492 120.0944 150.3318 112.9100 100.2923 164.3842
##  [78] 124.1036 142.0035 171.5428 110.0685 160.6837 121.2432 189.4741
##  [85] 156.3198 124.7725 173.0656 126.6332 175.4257 195.7943 148.1236
##  [92] 190.9495 199.0105 123.2417 128.1779 170.3925 183.3597 146.5679
##  [99] 139.0684 109.3888
# Adding y variable

y = rnorm(100,0,1)

y
##   [1]  0.6848019360 -0.1151135095 -0.3564751798 -0.1057716076  0.0448827901
##   [6] -1.7261732320  1.5557870203  0.7764126917 -1.0985075088 -1.7280197536
##  [11]  0.4276382246  0.7445646452  0.8652207970  0.3053288101 -0.1140227912
##  [16]  0.4236522402 -0.7977096868 -0.6041972494  1.7150105938 -0.7159482778
##  [21] -0.1332356122 -0.9997650626  1.8737601171 -0.3373884320  0.9732702887
##  [26]  0.9878279309 -0.9412566085  0.3491855939 -0.5944186817 -2.3822428313
##  [31]  1.0780189737  0.6682451050 -0.9646256667 -1.9752373319 -0.5847739007
##  [36]  0.9692770362  0.5522923259 -0.0821555007 -1.6767137584  1.2126074270
##  [41]  1.0004998710  0.7193289908 -0.8443641520  0.6219853903 -0.7226137804
##  [46] -0.4494786251 -1.1955060501  0.3904723630 -0.5163766426  0.9098689779
##  [51]  0.8769846530 -0.8161958099  1.5392932699  1.3745257156 -0.4832487112
##  [56]  0.5503499503 -0.8573656630 -0.7069613662 -2.0970775334  1.0994367548
##  [61]  0.3420340890  0.4908294804 -0.9319990260 -1.4278919839  0.9757650946
##  [66] -1.5463411878  0.0177034792 -0.7747174012 -0.2293422872 -0.2743821044
##  [71]  1.7960637815 -0.4781128994 -0.5947628530 -2.2579382170  1.6826072118
##  [76]  0.0722906844 -0.4400240932  0.6265733926 -0.7997960594 -1.1279860222
##  [81] -1.0250160534  0.0710717295  0.3817111616 -1.6225883175  1.9005426699
##  [86] -0.7161791664  0.3804596689  0.4408428474  0.2573258583 -0.1794485371
##  [91] -0.6901276793 -0.0004228025  0.5655808964 -1.2087470098 -0.3461711560
##  [96] -0.6501970444 -0.8895916708  1.4770298873 -1.1954751385  1.7504948348
mod_reg = lm(y~x1+x2)
mod_reg
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Coefficients:
## (Intercept)           x1           x2  
##   -0.597013     0.023268     0.002585
# Plot for regression

plot(mod_reg)
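Since `y` is simulated independently of `x1` and `x2`, the fitted slopes are expected to be close to zero, which is consistent with the small coefficients printed above. The following sketch shows one way to inspect the fit numerically and to view the default diagnostic plots in a single window. It re-creates the variables with the same seed and draw order so that it runs on its own; `fit` and `old_par` are illustrative names.

```r
set.seed(4)
x1 <- runif(100, min = 1, max = 10)
x2 <- runif(100, min = 100, max = 200)
y  <- rnorm(100, mean = 0, sd = 1)

fit <- lm(y ~ x1 + x2)

# Coefficient table with standard errors, t values, and p values
summary(fit)

# 95% confidence intervals for the intercept and both slopes
confint(fit)

# Show the four default diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```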

  4. Simulate 3 letters repeating each letter twice, 2 times.
# place the code to simulate the data here

rep(letters[1:3],each=2,times=2)
##  [1] "a" "a" "b" "b" "c" "c" "a" "a" "b" "b" "c" "c"
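In `rep`, `each` repeats every element in place, and `times` then repeats the whole resulting vector. The short sketch below shows the two arguments separately before combining them; `abc` is an illustrative name.

```r
abc <- letters[1:3]

# each = 2: every letter is repeated in place
rep(abc, each = 2)
# "a" "a" "b" "b" "c" "c"

# times = 2: the whole vector is repeated
rep(abc, times = 2)
# "a" "b" "c" "a" "b" "c"

# Both together: each is applied first, then times
rep(abc, each = 2, times = 2)
```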
  5. Create a dataframe (n = 27) with 3 groups, 2 factors and two quantitative response variables. Use the replicate function.
# place the code to simulate the data here
# data.frame() recycles the 3 group labels and 2 factor labels to match the 6 draws
var1 = data.frame(group = letters[1:3], factor = letters[4:5],
                  x = rnorm(6, 0, 1), y = rnorm(6, 1, 2))

print(var1)
##   group factor          x          y
## 1     a      d  1.2147301 -3.8441746
## 2     b      e -1.5478003  2.1126566
## 3     c      d -0.3022460  3.2110678
## 4     a      e  1.0392077  1.3328736
## 5     b      d -0.7678417  0.5490754
## 6     c      e  1.5246726  0.5431753
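# Using the replicate function
# replicate() evaluates `data` 5 times; because `data` is an existing vector
# (not a call that draws new values), the result is a list of 5 identical copies

var_replicate = replicate(5, expr = data, simplify = FALSE)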
print(var_replicate)
## [[1]]
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
## 
## [[2]]
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
## 
## [[3]]
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
## 
## [[4]]
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
## 
## [[5]]
##  [1]   -0.62645381    2.83643324  -73.56286124    1.59528080    4.29507772
##  [6]  -72.04683841    0.48742905    8.38324705   67.57813517   -0.30538839
## [11]   16.11781168   48.98432364   -0.62124058  -21.14699887  122.49309181
## [16]   -0.04493361    0.83809737  104.38362107    0.82122120    6.93901321
## [21]  101.89773716    0.78213630    1.74564983 -188.93516959    0.61982575
## [26]    0.43871260   -5.57955067   -1.47075238   -3.78150055   51.79415602
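The data frame above has 6 rows, and the `replicate` call returns five copies of the `data` vector from the first question, so neither reaches n = 27. One possible way to get there, offered as a sketch rather than the intended solution, is to let `replicate` draw the two response values once per row. "2 factors" is read here as two factor columns. The factor levels, the response means and standard deviations, and the names `sim_df`, `factor1`, `factor2`, and `responses` are assumptions made for illustration.

```r
set.seed(5)

n <- 27

# 3 groups with 9 rows each
group <- rep(c("a", "b", "c"), each = 9)

# Two factors; 27 is odd, so two-level factors cannot be perfectly balanced.
# length.out trims the repeated pattern to exactly n rows.
factor1 <- rep(c("d", "e"), length.out = n)
factor2 <- rep(c("low", "high"), each = 3, length.out = n)

# replicate() evaluates the expression n times; each evaluation draws one
# value per response, giving a 2 x n matrix
responses <- replicate(n, c(rnorm(1, mean = 0, sd = 1),
                            rnorm(1, mean = 1, sd = 2)))

sim_df <- data.frame(
  group   = factor(group),
  factor1 = factor(factor1),
  factor2 = factor(factor2),
  x       = responses[1, ],
  y       = responses[2, ]
)

str(sim_df)
```

Passing a call such as `rnorm(1, ...)` to `replicate` matters here: the expression is re-evaluated on every repetition, so each row gets fresh draws instead of repeated copies.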