R Markdown


2. Using ggplot, create a histogram for the LungCap variable.

library(ggplot2)
ggplot(lung, aes(x = LungCap)) + geom_histogram(color = 'blue')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
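The message above is ggplot2 noting that it fell back to 30 bins. A minimal sketch of setting the bin width explicitly follows. The data frame `lung_demo` is a hypothetical stand-in for `lung`, and `binwidth = 0.5` is an arbitrary value chosen only for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the real `lung` data (illustration only).
set.seed(1)
lung_demo <- data.frame(LungCap = rnorm(200, mean = 8, sd = 2.5))

# An explicit binwidth replaces the default of 30 bins;
# 0.5 is an assumed value and should be tuned to the data.
p_hist <- ggplot(lung_demo, aes(x = LungCap)) +
  geom_histogram(binwidth = 0.5, color = "blue")
p_hist
```

With `binwidth` supplied, the `stat_bin()` message should no longer appear.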

3. Create a boxplot for LungCap & Age

ggplot(lung, aes(x = LungCap, y = Age)) + geom_boxplot()
## Warning: Continuous x aesthetic -- did you forget aes(group=...)?
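The warning arises because `geom_boxplot()` expects a discrete x aesthetic (or an explicit `group`), while `LungCap` is continuous. One possible reading of the task, sketched below, is lung capacity broken down by age, with `Age` converted to a factor so that each age gets its own box. `lung_demo` is a hypothetical stand-in for `lung`, and the choice of axes is an assumption about what the question intends.

```r
library(ggplot2)

# Hypothetical stand-in for the real `lung` data (illustration only).
set.seed(2)
lung_demo <- data.frame(Age = sample(3:19, 200, replace = TRUE))
lung_demo$LungCap <- 1 + 0.5 * lung_demo$Age + rnorm(200)

# factor(Age) makes the x axis discrete, so one box is drawn per age.
p_box <- ggplot(lung_demo, aes(x = factor(Age), y = LungCap)) +
  geom_boxplot() +
  labs(x = "Age")
p_box
```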

4. Create a barplot for the Gender variable and print a proportion table as well.

ggplot(lung, aes(x = Gender)) + geom_bar()

prop.table(table(lung$Gender))
## 
##    female      male 
## 0.4937931 0.5062069

5. Create a side by side boxplot for gender & compare the lung capacity of male & female.

ggplot(lung, aes(x = Gender, y = LungCap)) + geom_boxplot()

6. Compare and contrast the Gender variable against the Smoke variable using a bar plot.

ggplot(lung, aes(x = Gender, fill = factor(Smoke))) + geom_bar(position = position_dodge(1))

7. Generate a full linear model with all variables; print R-squared, RMSE, and the residual plots.

linearmodel <- lm(LungCap ~ ., data = lung)
summary(linearmodel)
## 
## Call:
## lm(formula = LungCap ~ ., data = lung)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3388 -0.7200  0.0444  0.7093  3.0172 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -11.32249    0.47097 -24.041  < 2e-16 ***
## Age            0.16053    0.01801   8.915  < 2e-16 ***
## Height         0.26411    0.01006  26.248  < 2e-16 ***
## Smokeyes      -0.60956    0.12598  -4.839 1.60e-06 ***
## Gendermale     0.38701    0.07966   4.858 1.45e-06 ***
## Caesareanyes  -0.21422    0.09074  -2.361   0.0185 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.02 on 719 degrees of freedom
## Multiple R-squared:  0.8542, Adjusted R-squared:  0.8532 
## F-statistic: 842.8 on 5 and 719 DF,  p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(linearmodel)
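
The task also asks for R-squared and RMSE of the full model. A minimal base-R sketch of extracting both from a fitted `lm` object is shown here. `lung_demo` and its coefficients are hypothetical stand-ins, not the real data. Note that this in-sample RMSE divides the residual sum of squares by n, whereas the "Residual standard error" in `summary()` divides by the residual degrees of freedom, so the two values differ slightly.

```r
# Hypothetical stand-in for the real `lung` data (illustration only).
set.seed(3)
n <- 100
lung_demo <- data.frame(
  Age    = sample(3:19, n, replace = TRUE),
  Height = rnorm(n, mean = 65, sd = 7)
)
lung_demo$LungCap <- -11 + 0.16 * lung_demo$Age +
  0.26 * lung_demo$Height + rnorm(n)

# Full model: every remaining column is used as a predictor.
fit <- lm(LungCap ~ ., data = lung_demo)

# R-squared as reported by summary().
r_squared <- summary(fit)$r.squared

# In-sample RMSE: square root of the mean squared residual.
rmse <- sqrt(mean(residuals(fit)^2))

r_squared
rmse
```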

8. Generate a reduced linear model with only significant variables, print the same values as above.

library(caret)
## Loading required package: lattice
library(lattice)
reducedlinearmodel <- lm(LungCap ~ Age + Height + Smoke + Gender, data = lung)
predictions <- predict(reducedlinearmodel, lung[-1])
RMSE(predictions, lung$LungCap)
## [1] 1.019516
s <- summary(reducedlinearmodel)
s
## 
## Call:
## lm(formula = LungCap ~ Age + Height + Smoke + Gender, data = lung)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2915 -0.7360  0.0184  0.7125  3.0599 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -11.33282    0.47245 -23.987  < 2e-16 ***
## Age           0.16012    0.01806   8.864  < 2e-16 ***
## Height        0.26363    0.01009  26.123  < 2e-16 ***
## Smokeyes     -0.61774    0.12633  -4.890 1.24e-06 ***
## Gendermale    0.38528    0.07991   4.822 1.74e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.023 on 720 degrees of freedom
## Multiple R-squared:  0.8531, Adjusted R-squared:  0.8523 
## F-statistic:  1045 on 4 and 720 DF,  p-value: < 2.2e-16

9. For the F-test (ANOVA), write your null and alternative hypotheses.

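#Null <- The reduced model fits as well as the full model, i.e. the coefficient of the dropped variable (Caesarean) is zero.
#Alternative <- The difference between the two models is significant, i.e. the Caesarean coefficient is non-zero and the full model fits better.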
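
These hypotheses can be checked with a partial F-test on two nested models, for which base R provides `anova()`. The sketch below assumes a hypothetical data frame `lung_demo` in which `Caesarean` has no real effect. The model names `full` and `reduced` are illustrative and are not the objects fitted earlier.

```r
# Hypothetical stand-in for the real `lung` data (illustration only).
set.seed(4)
n <- 200
lung_demo <- data.frame(
  Age       = sample(3:19, n, replace = TRUE),
  Height    = rnorm(n, mean = 65, sd = 7),
  Caesarean = factor(sample(c("no", "yes"), n, replace = TRUE))
)
lung_demo$LungCap <- -11 + 0.16 * lung_demo$Age +
  0.26 * lung_demo$Height + rnorm(n)

# Nested models: `reduced` drops Caesarean from `full`.
full    <- lm(LungCap ~ Age + Height + Caesarean, data = lung_demo)
reduced <- lm(LungCap ~ Age + Height, data = lung_demo)

# Partial F-test: does adding Caesarean significantly reduce the RSS?
f_test <- anova(reduced, full)
f_test
```

A small p-value in the `Pr(>F)` column would favour the alternative. When only one coefficient is dropped, the F statistic equals the square of that coefficient's t value in the full model.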