This R Markdown document highlights ways you can manipulate and plot data in R using both the console and R Markdown. The data are taken from a study of weight gain in rats fed varying sources of protein.
The first challenge in analyzing data with R is loading the data for use in the console. To do this, import the data set into the R environment (if the data are in an Excel document, first save the file as a csv) by running the code:
FATRATS <- read.csv("/Users/matthewhecking/Documents/Intermediate Stats using R/FATRATS.csv")
This code is specific to where you have the file saved. To find this information, right-click on the document and choose "Get Info"; the file's location is listed under "Where".
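As a small illustration of working with file paths, the sketch below builds a path with file.path(), checks it with file.exists(), and only then reads the file. The file name demo_rats.csv and its two rows are hypothetical stand-ins written to a temporary folder so that the example runs anywhere; they are not the FATRATS data.

```r
# Hypothetical stand-in for FATRATS.csv: a tiny CSV written to a temporary folder
demo_dir  <- tempdir()
demo_path <- file.path(demo_dir, "demo_rats.csv")  # joins folder and file name
write.csv(data.frame(level = c(1, -1), weight = c(73, 90)),
          demo_path, row.names = FALSE)

# Confirm the file is where we think it is before trying to read it
file.exists(demo_path)

# Read the csv back into a data frame
demo <- read.csv(demo_path)
demo

# In an interactive session, read.csv(file.choose()) opens a file picker instead
```

If file.exists() returns FALSE for your own path, the path string is the first thing to recheck.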
Once the data is imported and loaded, we can check it using the command:
head(FATRATS)
## hilo source weight level animveg beefpork interanveg interbfprk
## 1 1 1 73 1 1 1 1 1
## 2 1 1 102 1 1 1 1 1
## 3 1 1 118 1 1 1 1 1
## 4 1 1 104 1 1 1 1 1
## 5 1 1 81 1 1 1 1 1
## 6 1 1 107 1 1 1 1 1
This gives us the first 6 rows of data from the file (the default for head), and shows us that the file has successfully been loaded into R.
After the data set has been loaded, we can work in R and analyze it. To begin, we first attach the data frame, which lets us refer to its columns directly by name, using the command:
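A few other base R functions are useful for the same kind of check. The sketch below uses a small hypothetical data frame (the values are made up, not taken from FATRATS) to show head() with an explicit row count, dim(), and str().

```r
# Hypothetical data frame standing in for FATRATS (values are made up)
demo <- data.frame(
  level  = rep(c(1, -1), each = 5),
  weight = c(95, 102, 88, 110, 99, 80, 76, 91, 84, 79)
)

head(demo, n = 3)  # first 3 rows instead of the default number
dim(demo)          # number of rows, then number of columns
str(demo)          # name and type of every column
```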
attach(FATRATS)
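Attaching is one option; another common one is to leave the data frame unattached and pass it through the data argument or with(). The sketch below shows both on a hypothetical data frame with made-up values, so the names demo and fit_demo are illustrative only.

```r
# Hypothetical data frame (made-up values), deliberately not attached
demo <- data.frame(
  level  = c(1, 1, 1, -1, -1, -1),
  weight = c(100, 95, 105, 80, 85, 75)
)

# Modeling functions such as lm() and glm() accept a data argument
fit_demo <- lm(weight ~ level, data = demo)
coef(fit_demo)

# with() evaluates an expression using the columns of the data frame
with(demo, mean(weight))
```

This keeps it explicit which data frame each column comes from, which can help when several data sets share column names.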
Once the data frame is attached, we can use a general linear model to get our first look at the data, by entering
fit = glm(FATRATS)
summary(fit)
##
## Call:
## glm(formula = FATRATS)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.66e-15 -1.78e-15 -1.33e-15 -1.11e-15 -6.66e-16
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.50e+00 1.43e-15 1.05e+15 <2e-16 ***
## source -1.89e-16 2.55e-16 -7.40e-01 0.46
## weight -1.92e-17 1.50e-17 -1.28e+00 0.21
## level -5.00e-01 2.35e-16 -2.13e+15 <2e-16 ***
## animveg 9.84e-17 1.49e-16 6.60e-01 0.51
## beefpork NA NA NA NA
## interanveg 1.57e-16 1.54e-16 1.02e+00 0.31
## interbfprk 2.27e-16 2.55e-16 8.90e-01 0.38
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.597e-30)
##
## Null deviance: 1.5000e+01 on 59 degrees of freedom
## Residual deviance: 1.3766e-28 on 53 degrees of freedom
## AIC: -3909
##
## Number of Fisher Scoring iterations: 1
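To see which model R builds when it is handed a data frame instead of a formula, we can ask for the formula directly. The sketch below uses a hypothetical four-row data frame with made-up values; only the column order matters.

```r
# Hypothetical data frame; the column order mimics the start of FATRATS
demo <- data.frame(
  hilo   = c(1, 1, 2, 2),
  source = c(1, 2, 1, 2),
  weight = c(73, 98, 90, 76)
)

# Converting a data frame to a formula puts the first column on the left
# and all remaining columns on the right
formula(demo)
```

The first column becomes the response, which is why hilo does not appear among the coefficients in the summary above.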
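This gives the most basic model R can build from the data, but it is not the model we want. Because no formula was given, R treats the first column (hilo) as the response and every other column as a predictor. The level column carries the same information as hilo (the fitted relationship is hilo = 1.5 - 0.5 * level), so the fit is essentially exact: the residuals are on the order of 1e-15, and the extremely small p values for the intercept and level reflect this redundancy rather than anything about weight gain. To improve the model, we specify the formula explicitly, with weight as the response. An improved model may look something like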
fit = glm(weight ~ level)
summary(fit)
##
## Call:
## glm(formula = weight ~ level)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -39.13 -8.73 1.13 9.52 26.40
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 87.87 1.94 45.41 <2e-16 ***
## level 7.27 1.94 3.76 4e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 224.7)
##
## Null deviance: 16199 on 59 degrees of freedom
## Residual deviance: 13031 on 58 degrees of freedom
## AIC: 499.1
##
## Number of Fisher Scoring iterations: 2
To further improve the model, we can include the other coded variables (animveg, beefpork, interanveg, and interbfprk) alongside level, by writing something like:
fit = aov(lm(weight ~ level + animveg + beefpork + interanveg + interbfprk))
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## level 1 3168 3168 14.77 0.00032 ***
## animveg 1 264 264 1.23 0.27221
## beefpork 1 2 2 0.01 0.91444
## interanveg 1 1178 1178 5.49 0.02283 *
## interbfprk 1 0 0 0.00 1.00000
## Residuals 54 11586 215
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
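One way to judge whether an added term is worth keeping is to compare the smaller and larger models directly with anova(). The sketch below does this on simulated data; the data frame, the effect size of 7, and the noise level are all assumptions chosen for illustration and are not the FATRATS values.

```r
# Simulated stand-in data (assumed values, not the rat study)
set.seed(1)
demo <- data.frame(
  level   = rep(c(1, -1), each = 10),
  animveg = rep(c(1, -1, 1, -1), each = 5)
)
demo$weight <- 90 + 7 * demo$level + rnorm(20, sd = 10)

# Two nested models: the second adds animveg to the first
small <- lm(weight ~ level, data = demo)
big   <- lm(weight ~ level + animveg, data = demo)

# F test for whether the extra term reduces the residual sum of squares
cmp <- anova(small, big)
cmp
```

A large p value in the second row would suggest the extra term adds little, in the same spirit as the non-significant rows in the table above.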
After assessing the significance of the effects numerically, it helps to examine the data visually by graphing it. In R, this can be done with the function interaction.plot:
interaction.plot(hilo, source, weight)
The same information can also be displayed in a different format by swapping the first two arguments, for example:
interaction.plot(source, hilo, weight)
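An interaction plot draws the mean response for each combination of the two factors, so it can help to look at those means as numbers too. The sketch below computes them with tapply() and then draws a labeled plot; the data frame is hypothetical with made-up weights, and the axis labels are only suggestions.

```r
# Hypothetical data (made-up weights) with two levels of each factor
demo <- data.frame(
  hilo   = rep(c(1, 2), each = 4),
  source = rep(c(1, 2), times = 4),
  weight = c(100, 80, 104, 84, 90, 70, 94, 66)
)

# Mean weight for every hilo-by-source cell: rows are hilo, columns are source
cell_means <- with(demo, tapply(weight, list(hilo, source), mean))
cell_means

# The same means drawn as an interaction plot, with explicit labels
with(demo, interaction.plot(hilo, source, weight,
                            xlab = "hilo", ylab = "mean weight",
                            trace.label = "source"))
```

Lines that are roughly parallel suggest little interaction between the two factors, while lines that diverge or cross suggest that the effect of one factor depends on the other.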