Introduction
The simulated data set has 6 environmental variables and 25 genetic variables. The genetic variables are 0-1 indicator variables. The purpose of this report is to determine whether the outcome variable Y is associated with the genetic variables after controlling for the environmental variables. If there is an association, we will find the best-fitting model. The following tables give summary statistics for the variables:
| E | mean | median | std | min | max | lower quantile | upper quantile | count of NA |
|---|---|---|---|---|---|---|---|---|
| E1 | -0.0229609 | -0.0439624 | 0.9775116 | -2.843948 | 2.877780 | -0.6748619 | 0.6982267 | 20 |
| E2 | 0.0458942 | 0.0752685 | 0.9706750 | -2.829943 | 2.961952 | -0.5415260 | 0.6797238 | 20 |
| E3 | -0.0026796 | 0.0430101 | 1.0445554 | -3.004356 | 3.034495 | -0.6464097 | 0.6679667 | 24 |
| E4 | 0.0351207 | 0.0338548 | 0.9851153 | -2.864615 | 2.945836 | -0.6392719 | 0.7310173 | 16 |
| E5 | 0.0353620 | 0.0675424 | 0.9972876 | -3.124922 | 2.733572 | -0.5943247 | 0.7181983 | 19 |
| E6 | 0.0025000 | 0.0319514 | 0.9755737 | -2.790353 | 2.867483 | -0.6562792 | 0.6805751 | 17 |

| Y | mean | median | std | min | max | lower quantile | upper quantile | count of NA |
|---|---|---|---|---|---|---|---|---|
| Y | 100.2466 | 100.1036 | 6.739156 | 80.36916 | 123.8702 | 96.15011 | 104.76 | 156 |
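
As a rough illustration of how summary rows like the ones above could be produced, the sketch below computes the same eight statistics for each column of a data frame. The helper name `summarise_var` and the toy data are assumptions made for this example; they are not taken from the report's own code or data.

```r
# Hypothetical helper: summarise one numeric vector with the same
# statistics reported in the tables above.
summarise_var <- function(x) {
  # names = FALSE keeps quantile labels such as "25%" out of the result
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(mean           = mean(x, na.rm = TRUE),
    median         = median(x, na.rm = TRUE),
    std            = sd(x, na.rm = TRUE),
    min            = min(x, na.rm = TRUE),
    max            = max(x, na.rm = TRUE),
    lower_quantile = q[1],
    upper_quantile = q[2],
    count_na       = sum(is.na(x)))
}

# Toy data standing in for the real data set (assumption, not the report's data)
set.seed(1)
toy <- data.frame(E1 = rnorm(50), E2 = rnorm(50))
toy$E1[c(3, 7)] <- NA

# One row per variable, one column per statistic
summary_table <- t(sapply(toy, summarise_var))
print(round(summary_table, 4))
```

Every statistic is computed with `na.rm = TRUE`, so the missing values are only counted, not propagated into the other columns.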
Methodology
My methodology for dealing with missing data is to delete observations listwise using `na.omit`. The number of missing values is below 25 for each of the E and G variables. Listwise deletion can still produce biased parameter estimates, but I will take that into account during the analysis of my final model.
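
A minimal sketch of listwise deletion with `na.omit` is shown below. The toy data frame, its column names, and the positions of the missing values are assumptions made for illustration only; they do not reproduce the report's data.

```r
# Toy data standing in for the real data set (assumption, not the report's data)
set.seed(42)
toy <- data.frame(
  Y  = rnorm(10, mean = 100, sd = 7),
  E1 = rnorm(10),
  G1 = rbinom(10, size = 1, prob = 0.5)
)
toy$Y[2]        <- NA
toy$E1[c(2, 5)] <- NA

# Missing values per column before deletion
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Listwise deletion: drop every row that has at least one NA
complete <- na.omit(toy)

# Rows lost to deletion; this is counted per row, not per column
rows_dropped <- nrow(toy) - nrow(complete)
print(rows_dropped)
```

Because deletion works row by row, missing values in different columns accumulate: the number of rows dropped can exceed the NA count of any single variable, so it is worth checking the retained sample size before fitting a model.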