Intro

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

Please refer to the R Markdown cheat sheet or this R Markdown for beginners video tutorial for a quick guide to the syntax.

Packages for tables in R

Several packages support making beautiful tables in R. This document uses knitr, as shown in the next section.

Data exploration

Table

I am using the built-in mtcars dataset in R. Please refer to the R documentation on mtcars for a description of the data fields.

Data on the first 5 cars in the dataset are shown with knitr’s kable function.

This knitr table shows the first 5 rows of the mtcars dataset.
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
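
The R code behind this table is not shown in the output. A minimal sketch of a call that would produce a comparable table is given below, assuming the knitr package is installed. The caption text is taken from the table above; the variable name first_five and the fallback to plain print() are additions for illustration, not something the original document does.

```r
# First five rows of the built-in mtcars dataset
first_five <- head(mtcars, 5)

# Render with knitr::kable() when knitr is available (assumed installed);
# otherwise fall back to the default data frame printing.
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(
    first_five,
    caption = "This knitr table shows the first 5 rows of the mtcars dataset."
  ))
} else {
  print(first_five)
}
```

Inside an R Markdown chunk the print() wrapper is not needed, because the value of kable() is rendered automatically when it is the last expression in the chunk.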

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the table.

This knitr table shows the first 5 rows of the built-in mtcars dataset.
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Data summary

Data summary showing basic statistics. In this case, echo = TRUE was added to the code chunk so that the R code is printed along with its output.

summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

As an example of inline R code in R Markdown, the median miles per gallon is 19.2 and the median horsepower is 123.
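
The two medians quoted above come from inline R code. In an R Markdown source file, an inline expression is written as a backtick, the letter r, a space, the expression, and a closing backtick. The expressions below are an assumption about how the values were computed, since the source file is not shown, but they reproduce the numbers in the sentence above. The variable names are chosen for illustration only.

```r
# Expressions that could sit inside inline R code in the .Rmd source
median_mpg <- median(mtcars$mpg)  # median miles per gallon
median_hp  <- median(mtcars$hp)   # median horsepower

cat("Median mpg:", median_mpg, "\n")  # Median mpg: 19.2
cat("Median hp:", median_hp, "\n")    # Median hp: 123
```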

Data visualisation - basic plot

Basic plot showing horsepower vs. miles per gallon.

plot(mtcars$hp,mtcars$mpg)

Data visualisation - advanced plot (ggplot2)

A nicer-looking plot using ggplot2. To change the theme, check the other options on the ggthemes page.

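library(ggplot2)
library(ggthemes)  # optional here: theme_classic() below comes from ggplot2
# Scatter plot of horsepower vs. miles per gallon, coloured by horsepower
ggplot(data = mtcars) +
  geom_point(mapping = aes(x = hp, y = mpg, color = hp)) +
  labs(title = "Miles per gallon vs. horsepower",
       subtitle = "Data: built-in mtcars dataset",
       x = "Horsepower",
       y = "Miles per gallon") +
  theme_classic()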
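
To illustrate the theme change mentioned above, here is a sketch that stores the plot in a variable and applies a different theme. theme_economist() is one of the themes shipped with ggthemes. Choosing it, the variable name p, and the fallback to ggplot2's theme_minimal() when ggthemes is not installed are illustrative choices rather than part of the original document.

```r
library(ggplot2)

# Build the scatter plot once and keep it in a variable
p <- ggplot(mtcars, aes(x = hp, y = mpg, color = hp)) +
  geom_point() +
  labs(title = "Miles per gallon vs. horsepower",
       x = "Horsepower",
       y = "Miles per gallon")

# Swap in a ggthemes theme when the package is available (assumption:
# ggthemes is installed); otherwise use a theme that ships with ggplot2.
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p <- p + ggthemes::theme_economist()
} else {
  p <- p + theme_minimal()
}

p  # draws the plot when run at top level or in a chunk
```

Keeping the plot in a variable makes it easy to try several themes without repeating the rest of the plot definition.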

Regression

A simple regression plot is shown.

Input variable: Horsepower
Output variable: Miles per gallon

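x <- mtcars$hp   # input variable: horsepower
y <- mtcars$mpg  # output variable: miles per gallon
plot(x, y, main = "Linear regression - miles per gallon vs horsepower",
     xlab = "Horsepower", ylab = "Miles per gallon",
     pch = 19, frame = FALSE)
# x and y are plain vectors, so lm() needs no data argument
abline(lm(y ~ x), col = "blue")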

summary(lm(y~x))
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## x           -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
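
The numbers in the summary can also be pulled out of the fitted model programmatically. The sketch below refits the same model using column names instead of the x and y vectors (an equivalent formulation), extracts the coefficients and R-squared, and predicts mileage for a hypothetical car with 150 horsepower. The value 150 and the variable names are chosen only for illustration.

```r
# Same model as above, written with column names
fit <- lm(mpg ~ hp, data = mtcars)

coefs <- coef(fit)                   # named vector: (Intercept), hp
r_squared <- summary(fit)$r.squared  # multiple R-squared

print(round(coefs, 5))      # intercept about 30.09886, slope about -0.06823
print(round(r_squared, 4))  # about 0.6024

# Predicted miles per gallon for a hypothetical car with 150 horsepower
new_car <- data.frame(hp = 150)
predicted_mpg <- predict(fit, newdata = new_car)
print(predicted_mpg)
```

The negative slope means that, in this sample, each additional unit of horsepower is associated with a drop of roughly 0.07 miles per gallon.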

Regression with a smoother

The impact of horsepower on miles per gallon is shown in the chart below using a smoother.

ggplot(mtcars, aes(hp, mpg)) +
  stat_smooth() + geom_point() +
  ylab("Miles per gallon") +
  xlab("Horsepower") +
  ggtitle("Impact of horsepower on miles per gallon")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
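
The message printed by ggplot2 indicates that the smoother is a loess fit of y ~ x. As a rough sketch of what that curve represents, the same kind of fit can be computed directly with loess() from the stats package. The default settings of loess() are assumed here to correspond to what stat_smooth() uses for a dataset this small, and the grid of 50 horsepower values is an arbitrary choice for illustration.

```r
# Local regression (loess) of miles per gallon on horsepower
smooth_fit <- loess(mpg ~ hp, data = mtcars)

# Evaluate the fitted curve on an evenly spaced grid within the observed range
hp_grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 50))
hp_grid$mpg_smooth <- predict(smooth_fit, newdata = hp_grid)

head(hp_grid)

# Overlay the curve on a base scatter plot
plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Miles per gallon")
lines(hp_grid$hp, hp_grid$mpg_smooth, col = "blue")
```

Unlike the straight line from lm(), the loess curve is fitted locally, so it can bend where the relationship between horsepower and mileage flattens out.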

Thank you!
