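I chose the built-in R data set 'ToothGrowth' because I thought it was a strange study, so I wanted to analyze it. The ToothGrowth data set contains the results of an experiment studying the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three doses of vitamin C (0.5, 1 or 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid (VC), and the response len is the length of the odontoblasts, the cells responsible for tooth growth.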

data("ToothGrowth")
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Summary of the data

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
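The summary treats dose as a continuous variable, which hides the fact that it takes only three distinct values. A quick cross-tabulation of delivery method against dose makes the experimental design visible: every supplement-by-dose combination contains 10 animals. This is a small sketch I added for illustration; the object name design is my own.

```r
# Load the built-in data set (part of R's datasets package)
data("ToothGrowth")

# Cross-tabulate delivery method (supp) against vitamin C dose
design <- table(ToothGrowth$supp, ToothGrowth$dose)
print(design)

# The distinct dose levels used in the experiment
print(sort(unique(ToothGrowth$dose)))
```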

Plotting to Check Linearity

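library(ggplot2)

# Scatter plot of tooth length (Y) against vitamin C dose (X), with a least-squares line
ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_point(size = 2, shape = 23) +
  geom_smooth(method = "lm")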
## `geom_smooth()` using formula = 'y ~ x'
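The scatter plot with a fitted line is a visual check only. Because dose takes just three values, one way to make the linearity check more concrete is a lack-of-fit F test: fit a second model with a separate mean for each dose (dose as a factor) and compare it with the straight-line model using anova(). This is an extra check sketched here, not part of the original analysis, and the object names m_linear, m_factor, dose_means and lof are illustrative. A small p-value would suggest that the dose means depart from a straight line, and the table of group means next to the fitted values shows where.

```r
data("ToothGrowth")

# Straight-line model: dose treated as a continuous predictor
m_linear <- lm(len ~ dose, data = ToothGrowth)

# Model with one mean per dose level (0.5, 1, 2); the linear model is nested in it
m_factor <- lm(len ~ factor(dose), data = ToothGrowth)

# Observed mean length at each dose alongside the straight-line prediction
dose_means <- aggregate(len ~ dose, data = ToothGrowth, FUN = mean)
dose_means$fitted <- predict(m_linear, newdata = data.frame(dose = dose_means$dose))
print(dose_means)

# Lack-of-fit F test: does a separate mean per dose improve on the straight line?
lof <- anova(m_linear, m_factor)
print(lof)
```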

# Simple linear regression of tooth length on dose
m1 <- lm(len ~ dose, data = ToothGrowth)

print(m1)
## 
## Call:
## lm(formula = len ~ dose, data = ToothGrowth)
## 
## Coefficients:
## (Intercept)         dose  
##       7.423        9.764
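Read as an equation, the coefficients printed above describe the fitted line below, with dose in mg/day. As a worked example, the predicted lengths at the three doses used in the experiment follow directly from the intercept and slope. The intercept itself is the predicted length at dose 0, which lies outside the observed dose range, so it is an extrapolation rather than an observed baseline.

```latex
\widehat{\text{len}} = 7.423 + 9.764 \times \text{dose}

\begin{aligned}
\widehat{\text{len}}(0.5) &= 7.423 + 9.764 \times 0.5 = 12.305 \\
\widehat{\text{len}}(1)   &= 7.423 + 9.764 \times 1   = 17.187 \\
\widehat{\text{len}}(2)   &= 7.423 + 9.764 \times 2   = 26.951
\end{aligned}
```

In words: each additional 1 mg/day of vitamin C is associated with an increase of about 9.76 in predicted odontoblast length.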
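# Residual diagnostics for the fitted model
res <- resid(m1)
hist(res, main = "Histogram of residuals", xlab = "Residual")

# Normal Q-Q plot: points close to the reference line suggest roughly normal residuals
qqnorm(res)
qqline(res)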
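The histogram and Q-Q plot are visual checks of normality. As a hedged complement, a Shapiro-Wilk test gives a formal test of the null hypothesis that the residuals are normal, and a residuals-versus-fitted plot checks for curvature or unequal spread, which the normality plots do not show. These are additional diagnostics sketched here, not part of the original analysis; the names res and sw are illustrative, and with 60 observations the test result should be read alongside the plots rather than on its own.

```r
data("ToothGrowth")

# Refit the simple linear regression and extract its residuals
m1 <- lm(len ~ dose, data = ToothGrowth)
res <- resid(m1)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(res)
print(sw)

# Residuals against fitted values: look for curvature or a funnel shape
plot(fitted(m1), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```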

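Conclusion: Even though this was a very simple data set with no data cleaning to do, I believe a linear regression is a reasonable fit, since the guinea pigs' odontoblast (tooth) length increases as the dose increases, and the fitted slope is positive (about 9.76 per mg/day). The residuals also appear roughly normally distributed in the normal Q-Q plot. One caveat is that dose takes only three values (0.5, 1 and 2 mg/day), so the linear trend is judged from just three dose groups.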

Sources: http://www.sthda.com/english/wiki/r-built-in-data-sets