Tutorial for R Markdown

Hey Chloe, check this out, this is simple but amazing. I am working on R Markdown lol

Generating my data

  • To run the code but keep it from being shown in the result, we use: echo = FALSE
  • The opposite, to show the code but keep it from being run, we use: eval = FALSE
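
In the .Rmd source these options go in the chunk header, for example {r, echo=FALSE}. They can also be set once for all chunks. Here is a minimal sketch, assuming the knitr package is installed (the values are only an illustration, not what this document uses):

```r
# Assumption: knitr is installed (R Markdown uses it to run the chunks).
if (requireNamespace("knitr", quietly = TRUE)) {
  # Defaults for every chunk that comes after this call:
  # echo = TRUE shows the code, eval = TRUE runs it
  knitr::opts_chunk$set(echo = TRUE, eval = TRUE)

  # Read the defaults back to check them
  str(knitr::opts_chunk$get(c("echo", "eval")))
}
```

An option written in a single chunk header overrides these defaults for that chunk only.
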
setwd("C:/Users/Abdelghani/Desktop/R with chloe")
data<-read.csv("data.csv", header = TRUE)
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 3.5.2

Some plotting and trying a linear model

I want to know the relation between Sexe and Carapace Length (CL)

attach(data)
plot(Sexe, CL)

plot(Sexe~CL)
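
A small sketch of the formula direction, on made-up toy data (the toy data frame and its numbers are invented for illustration and are not my real measurements). With the numeric variable on the left and the factor on the right, base R draws one box per level, and the data argument makes attach() unnecessary:

```r
# Made-up toy data, only to illustrate the syntax (NOT the real measurements)
set.seed(42)
toy <- data.frame(
  Sexe = factor(rep(c("F", "M"), each = 10)),
  CL   = c(rnorm(10, mean = 120, sd = 10), rnorm(10, mean = 100, sd = 10))
)

# Numeric response on the left, factor on the right: one box per level
plot(CL ~ Sexe, data = toy)

# The same data argument works for lm(), so attach() is not needed
fit <- lm(CL ~ Sexe, data = toy)
coef(fit)
```
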

lm1 <- lm(CL ~ Sexe + Poids..g.+ TL + PAL)
summary(lm1)
## 
## Call:
## lm(formula = CL ~ Sexe + Poids..g. + TL + PAL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.644  -8.166   4.410   6.611  17.219 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  65.69820   17.41588   3.772 0.001845 ** 
## SexeM       -14.92591    7.98541  -1.869 0.081254 .  
## SexeM (j)   -13.45112   11.19497  -1.202 0.248172    
## Poids..g.     0.07160    0.01675   4.274 0.000666 ***
## TL            0.31169    0.37481   0.832 0.418678    
## PAL           0.81192    0.52960   1.533 0.146076    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.56 on 15 degrees of freedom
## Multiple R-squared:  0.918,  Adjusted R-squared:  0.8906 
## F-statistic: 33.57 on 5 and 15 DF,  p-value: 1.246e-07
step(lm1)
## Start:  AIC=111.2
## CL ~ Sexe + Poids..g. + TL + PAL
## 
##             Df Sum of Sq    RSS    AIC
## - TL         1    109.03 2473.9 110.15
## <none>                   2364.9 111.20
## - Sexe       2    576.10 2941.0 111.78
## - PAL        1    370.55 2735.4 112.26
## - Poids..g.  1   2879.82 5244.7 125.93
## 
## Step:  AIC=110.15
## CL ~ Sexe + Poids..g. + PAL
## 
##             Df Sum of Sq    RSS    AIC
## <none>                   2473.9 110.15
## - Sexe       2     585.2 3059.1 110.61
## - PAL        1     966.1 3440.0 115.07
## - Poids..g.  1    4513.2 6987.1 129.95
## 
## Call:
## lm(formula = CL ~ Sexe + Poids..g. + PAL)
## 
## Coefficients:
## (Intercept)        SexeM    SexeM (j)    Poids..g.          PAL  
##    77.41810    -15.23794    -12.25386      0.07837      1.06755
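
The AIC values in the step() tables above can be reproduced by hand. This is a minimal sketch using only numbers copied from the output. I am assuming n = 21 animals (15 residual degrees of freedom plus 6 coefficients in lm1), and the variable names are my own:

```r
# Numbers copied from the step(lm1) output above
n <- 21  # observations: 15 residual df + 6 coefficients in lm1

# Residual sum of squares and number of estimated coefficients per model
rss <- c(full = 2364.9, without_TL = 2473.9)
k   <- c(full = 6,      without_TL = 5)

# AIC as step() reports it for lm objects (up to an additive constant)
aic <- n * log(rss / n) + 2 * k
round(aic, 2)
```

The two values should match the AIC column (111.20 for the full model, 110.15 without TL). A lower AIC is better, which is why step() removes TL.
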

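The AIC-based selection only drops TL. It suggests that CL can generally be explained and predicted by two parameters:

  • Sexe.
  • weight (Poids..g.).

and PAL to a lesser degree (although in the last step() table, removing PAL raises the AIC more than removing Sexe does).

So, to keep things simple, we can exclude the other parameters and see which of these two parameters affects CL more.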

For that I am going to use another linear model including these two variables:

lm2 <- lm(CL ~ Sexe + Poids..g.)
summary(lm2)
## 
## Call:
## lm(formula = CL ~ Sexe + Poids..g.)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.589  -7.077   2.494   6.033  19.371 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 95.53476    8.10086  11.793 1.31e-09 ***
## SexeM       -6.81747    8.17379  -0.834    0.416    
## SexeM (j)   -7.52932   12.39027  -0.608    0.551    
## Poids..g.    0.10236    0.01244   8.227 2.49e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.23 on 17 degrees of freedom
## Multiple R-squared:  0.8807, Adjusted R-squared:  0.8596 
## F-statistic: 41.82 on 3 and 17 DF,  p-value: 4.611e-08
step(lm2)
## Start:  AIC=115.07
## CL ~ Sexe + Poids..g.
## 
##             Df Sum of Sq     RSS    AIC
## - Sexe       2     155.6  3595.6 112.00
## <none>                    3440.0 115.07
## - Poids..g.  1   13695.9 17135.9 146.79
## 
## Step:  AIC=112
## CL ~ Poids..g.
## 
##             Df Sum of Sq     RSS    AIC
## <none>                    3595.6 112.00
## - Poids..g.  1     25229 28824.7 153.71
## 
## Call:
## lm(formula = CL ~ Poids..g.)
## 
## Coefficients:
## (Intercept)    Poids..g.  
##     89.9108       0.1091

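It seems that the weight of the animal has more effect on the carapace length than Sexe: step() drops Sexe and keeps only Poids..g., and the Sexe coefficients in summary(lm2) are not significant.

Next I try the interaction between the two: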
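
One way to check which term matters more in lm2 is a partial F-test for each term. The sketch below does it by hand from the numbers in the step(lm2) table; it is the same kind of calculation that drop1(lm2, test = "F") performs. The variable names are my own:

```r
# Numbers copied from the step(lm2) table above
rss_full <- 3440.0   # RSS of CL ~ Sexe + Poids..g.
df_resid <- 17       # residual degrees of freedom of lm2

# Sum of squares lost when a term is dropped, and the df of that term
ss_drop <- c(Sexe = 155.6, Poids = 13695.9)
df_drop <- c(Sexe = 2,     Poids = 1)

# Partial F statistic for each term and its p-value
f_stat <- (ss_drop / df_drop) / (rss_full / df_resid)
p_val  <- pf(f_stat, df_drop, df_resid, lower.tail = FALSE)

round(f_stat, 2)
signif(p_val, 3)
```

The F for Poids..g. is the square of its t value in summary(lm2) (8.227), while the F for Sexe is well below 1, so dropping Sexe costs almost nothing.
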

lm3<- lm(CL ~ Sexe * Poids..g.)
summary(lm3)
## 
## Call:
## lm(formula = CL ~ Sexe * Poids..g.)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -27.1238  -2.8192   0.9154   4.5002  18.3557 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           99.66527    6.90931  14.425 3.37e-10 ***
## SexeM                -29.58695   11.16284  -2.650   0.0182 *  
## SexeM (j)           -231.96527  125.72503  -1.845   0.0849 .  
## Poids..g.              0.09500    0.01071   8.872 2.36e-07 ***
## SexeM:Poids..g.        0.10415    0.04062   2.564   0.0216 *  
## SexeM (j):Poids..g.    2.67773    1.51830   1.764   0.0981 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.81 on 15 degrees of freedom
## Multiple R-squared:  0.9274, Adjusted R-squared:  0.9032 
## F-statistic: 38.34 on 5 and 15 DF,  p-value: 5.031e-08
step(lm3)
## Start:  AIC=108.63
## CL ~ Sexe * Poids..g.
## 
##                  Df Sum of Sq    RSS    AIC
## <none>                        2091.9 108.63
## - Sexe:Poids..g.  2    1348.1 3440.0 115.07
## 
## Call:
## lm(formula = CL ~ Sexe * Poids..g.)
## 
## Coefficients:
##         (Intercept)                SexeM            SexeM (j)  
##             99.6653             -29.5870            -231.9653  
##           Poids..g.      SexeM:Poids..g.  SexeM (j):Poids..g.  
##              0.0950               0.1041               2.6777
opar= par(mfrow = c(2,2))
plot(lm3, which= 1:4)
## Warning: not plotting observations with leverage one:
##   4, 17

par(opar)
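
To read the interaction model, it helps to turn the coefficients from summary(lm3) into one line (intercept and slope of CL against weight) per level of Sexe. A minimal sketch with the numbers copied from the output above, where F is the reference level:

```r
# Coefficients copied from summary(lm3); F is the reference level
intercept <- c(
  "F"     = 99.66527,
  "M"     = 99.66527 - 29.58695,
  "M (j)" = 99.66527 - 231.96527
)
slope <- c(
  "F"     = 0.09500,
  "M"     = 0.09500 + 0.10415,
  "M (j)" = 0.09500 + 2.67773
)

# One row per level of Sexe: CL = intercept + slope * Poids..g.
data.frame(intercept, slope)
```

The "leverage one" warning for observations 4 and 17 would be consistent with the M (j) group having only two animals. In that case its line passes exactly through both points, so I would not trust that slope much.
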

Now, let us do some awesome Graphs

For that I am going to use the ggplot2 package. I already loaded it above.

OK wait, I found this great package called ggfortify, forget ggplot2 lol

Ooh, ggfortify is just an autoplot for ggplot2... I feel stupid hhh

Let's try it on lm3

library(ggfortify) 
## Warning: package 'ggfortify' was built under R version 3.5.2
autoplot(lm(CL ~ Sexe* Poids..g., data = data),
         colour = "red", label.size = 3)
## Warning: package 'bindrcpp' was built under R version 3.5.2
## Warning: Removed 21 rows containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_path).

OK, I get it, that has nothing to do with nice graphs of the data. It is just a more colorful way of looking at my model. I'll go back to ggplot2 now and make a nice graph. I am sorry ggplot2 lol.

# The level is spelled "M (j)" (with a space) in the model output above
data2<- data[data$Sexe %in% c("F", "M", "M (j)"), ]
theme_set(theme_bw())  
g <- ggplot(data2, aes(CL, Poids..g.)) + 
  labs(subtitle="",
       title="")

g + geom_jitter(aes(col=Sexe, size = H)) + 
  geom_smooth(aes(col=Sexe), method="lm", se=F)

########################## ##########################

Working on online data... wow

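data(mpg, package="ggplot2")  # load the mpg dataset bundled with ggplot2
mpg <- read.csv("http://goo.gl/uEeRGu")  # alternate source; this overwrites the copy loaded above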

mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]

# Scatterplot
theme_set(theme_bw())  # pre-set the bw theme.
g <- ggplot(mpg_select, aes(displ, cty)) + 
  labs(subtitle="mpg: Displacement vs City Mileage",
       title="Bubble chart")

g + geom_jitter(aes(col=manufacturer, size=hwy)) + 
  geom_smooth(aes(col=manufacturer), method="lm", se=F)
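
The same bubble chart can be sketched without any download, using only the mpg dataset that ships with ggplot2. This assumes ggplot2 is installed; the names wanted and p are my own:

```r
# Assumption: ggplot2 is installed; its bundled mpg dataset is used, no download
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  wanted <- c("audi", "ford", "honda", "hyundai")
  mpg_select <- subset(ggplot2::mpg, manufacturer %in% wanted)

  # Point size follows highway mileage; one fitted line per manufacturer
  p <- ggplot(mpg_select, aes(displ, cty)) +
    geom_jitter(aes(colour = manufacturer, size = hwy)) +
    geom_smooth(aes(colour = manufacturer), method = "lm", se = FALSE) +
    labs(title = "Bubble chart",
         subtitle = "mpg: Displacement vs City Mileage") +
    theme_bw()

  print(p)
}
```
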