Introduction

The simulated data has 6 environmental variables and 25 genetic variables. The genetic variables are 0-1 indicator variables. The purpose of this report is to determine whether the outcome variable is associated with the genetic variables after controlling for the environmental variables. If such an association exists, we will find the best-fitting model. The summary statistics for each of the variables are as follows:

E    mean        median      std        min        max       lower quantile  upper quantile  count of NA
E1   -0.0229609  -0.0439624  0.9775116  -2.843948  2.877780  -0.6748619      0.6982267       20
E2    0.0458942   0.0752685  0.9706750  -2.829943  2.961952  -0.5415260      0.6797238       20
E3   -0.0026796   0.0430101  1.0445554  -3.004356  3.034495  -0.6464097      0.6679667       24
E4    0.0351207   0.0338548  0.9851153  -2.864615  2.945836  -0.6392719      0.7310173       16
E5    0.0353620   0.0675424  0.9972876  -3.124922  2.733572  -0.5943247      0.7181983       19
E6    0.0025000   0.0319514  0.9755737  -2.790353  2.867483  -0.6562792      0.6805751       17

Y    mean        median      std        min        max       lower quantile  upper quantile  count of NA
Y    100.2466    100.1036    6.739156   80.36916   123.8702  96.15011        104.76          156
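
As a rough illustration of how a summary table with these columns could be produced, here is a minimal sketch in R. The data frame `dat` and the helper `summarise_var` are hypothetical stand-ins, not the code used for this report, and the quantiles are assumed to be the 25% and 75% points.

```r
# Hypothetical stand-in data: the report's actual data set is not shown here.
set.seed(1)
dat <- data.frame(E1 = rnorm(100), E2 = rnorm(100))
dat$E1[c(3, 17)] <- NA  # inject two missing values for illustration

# Summary statistics for one numeric vector, ignoring NAs.
# Assumption: "lower/upper quantile" means the 25% and 75% quantiles.
summarise_var <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(mean           = mean(x, na.rm = TRUE),
    median         = median(x, na.rm = TRUE),
    std            = sd(x, na.rm = TRUE),
    min            = min(x, na.rm = TRUE),
    max            = max(x, na.rm = TRUE),
    lower_quantile = q[1],
    upper_quantile = q[2],
    count_of_NA    = sum(is.na(x)))
}

# Apply to every column and transpose so that each variable is one row.
summary_table <- t(sapply(dat, summarise_var))
print(summary_table)
```

Passing `na.rm = TRUE` keeps the missing entries from turning every statistic into `NA`, while the last column still reports how many values were missing.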

Methodology

My methodology for dealing with missing data is to delete observations listwise using na.omit. The number of missing values is below 25 for each E and G variable. Listwise deletion can still bias the parameter estimates, but I will take that into account when analyzing my final model.
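
To make the listwise deletion step concrete, here is a small sketch of what na.omit does to a data frame. The toy data frame `dat` and its values are hypothetical; only the column naming (Y, E1, G1) mirrors this report.

```r
# Hypothetical toy data; not the report's data set.
dat <- data.frame(
  Y  = c(101.2, NA, 98.7, 104.1, 99.5),
  E1 = c(0.3, -1.1, NA, 0.8, 0.1),
  G1 = c(0, 1, 1, 0, 1)
)

# Count missing values per column before deletion.
na_counts <- colSums(is.na(dat))

# Listwise deletion: drop every row that has an NA in any column.
complete <- na.omit(dat)

# Number of observations lost to deletion.
n_dropped <- nrow(dat) - nrow(complete)

print(na_counts)
print(n_dropped)
```

Because a row is removed if any one of its columns is missing, the number of dropped observations can be larger than the NA count of any single variable, which is worth checking before fitting a model.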