# Tutorial for R Markdown

Hey Chloe, check this out: it is simple but amazing. I am working on R Markdown lol

• I'm gonna work on my data, use an awesome package, and generate lovely plots, yeey
• That's all. I just love the idea that I can make A LIST, yeah, that's bold bb

### Generating my data

• To run the code but keep it from being shown in the result, we use: echo = FALSE
• The opposite, to show the code but keep it from being run, we use: eval = FALSE
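
A rough sketch of where these options go: each chunk header in the .Rmd source can carry them, for example {r, echo=FALSE} or {r, eval=FALSE}, and knitr also lets you set defaults for the whole document. The warning = FALSE line is my own addition, on the assumption that you may want to hide messages like "built under R version ..."; drop it if you prefer to see warnings.

```r
# Document-wide defaults, usually placed in a first "setup" chunk.
# A single chunk can still override them in its own header,
# e.g. {r, echo=FALSE} or {r, eval=FALSE}.
knitr::opts_chunk$set(
  echo    = TRUE,   # show the code in the rendered document
  eval    = TRUE,   # run the code
  warning = FALSE   # assumption: hide warnings in the output
)

# Check what is currently set
knitr::opts_chunk$get("echo")
```
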
setwd("C:/Users/Abdelghani/Desktop/R with chloe")
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 3.5.2

### Some plotting and trying a linear model

I want to know the relationship between Sexe and Carapace Length (CL)

attach(data)
plot(Sexe, CL)

plot(Sexe~CL)
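
A side note on attach(): one possible alternative, sketched here, is to pass the data frame through the data argument so the columns are looked up there directly. The toy data frame below is simulated and is NOT the real measurements; only the column names Sexe and CL come from the code above.

```r
set.seed(1)

# Simulated stand-in for the real data (made-up values, for illustration only)
toy <- data.frame(
  Sexe = factor(rep(c("F", "M"), each = 10)),
  CL   = c(rnorm(10, mean = 150, sd = 15), rnorm(10, mean = 130, sd = 15))
)

# Formula interface plus data = ...: no attach() needed.
# With a factor on the right-hand side this draws one box per Sexe level.
boxplot(CL ~ Sexe, data = toy, xlab = "Sexe", ylab = "CL")
```

The same data argument works for lm(), for example lm(CL ~ Sexe, data = toy).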

lm1 <- lm(CL ~ Sexe + Poids..g.+ TL + PAL)
summary(lm1)
##
## Call:
## lm(formula = CL ~ Sexe + Poids..g. + TL + PAL)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -21.644  -8.166   4.410   6.611  17.219
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  65.69820   17.41588   3.772 0.001845 **
## SexeM       -14.92591    7.98541  -1.869 0.081254 .
## SexeM (j)   -13.45112   11.19497  -1.202 0.248172
## Poids..g.     0.07160    0.01675   4.274 0.000666 ***
## TL            0.31169    0.37481   0.832 0.418678
## PAL           0.81192    0.52960   1.533 0.146076
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.56 on 15 degrees of freedom
## Multiple R-squared:  0.918,  Adjusted R-squared:  0.8906
## F-statistic: 33.57 on 5 and 15 DF,  p-value: 1.246e-07
step(lm1)
## Start:  AIC=111.2
## CL ~ Sexe + Poids..g. + TL + PAL
##
##             Df Sum of Sq    RSS    AIC
## - TL         1    109.03 2473.9 110.15
## <none>                   2364.9 111.20
## - Sexe       2    576.10 2941.0 111.78
## - PAL        1    370.55 2735.4 112.26
## - Poids..g.  1   2879.82 5244.7 125.93
##
## Step:  AIC=110.15
## CL ~ Sexe + Poids..g. + PAL
##
##             Df Sum of Sq    RSS    AIC
## <none>                   2473.9 110.15
## - Sexe       2     585.2 3059.1 110.61
## - PAL        1     966.1 3440.0 115.07
## - Poids..g.  1    4513.2 6987.1 129.95
##
## Call:
## lm(formula = CL ~ Sexe + Poids..g. + PAL)
##
## Coefficients:
## (Intercept)        SexeM    SexeM (j)    Poids..g.          PAL
##    77.41810    -15.23794    -12.25386      0.07837      1.06755
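
The AIC values that step() prints above can be checked by hand. For a linear model, step() uses n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. A small sketch with the numbers from the output; n = 21 is inferred from 15 residual degrees of freedom plus 6 coefficients, and step_aic is just a helper name I made up:

```r
# AIC as reported by step() for lm fits: n * log(RSS / n) + 2 * edf
step_aic <- function(rss, n, edf) n * log(rss / n) + 2 * edf

n <- 21  # 15 residual df + 6 coefficients in lm1

aic_full  <- step_aic(rss = 2364.9, n = n, edf = 6)  # CL ~ Sexe + Poids..g. + TL + PAL
aic_no_tl <- step_aic(rss = 2473.9, n = n, edf = 5)  # same model without TL

round(c(full = aic_full, without_TL = aic_no_tl), 2)
# about 111.20 and 110.15, matching the step() table
```

Dropping TL lowers the AIC, which is why step() removes it first.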

The stepwise AIC selection drops only TL, so CL is explained and predicted by Sexe, weight (Poids..g.) and PAL. I will focus on two of them:

• Sexe.
• weight.

So I leave the other parameters out and see which of these two parameters affects CL more.

For that I am going to use another linear model including these two variables:

lm2 <- lm(CL ~ Sexe + Poids..g.)
summary(lm2)
##
## Call:
## lm(formula = CL ~ Sexe + Poids..g.)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -31.589  -7.077   2.494   6.033  19.371
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 95.53476    8.10086  11.793 1.31e-09 ***
## SexeM       -6.81747    8.17379  -0.834    0.416
## SexeM (j)   -7.52932   12.39027  -0.608    0.551
## Poids..g.    0.10236    0.01244   8.227 2.49e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.23 on 17 degrees of freedom
## Multiple R-squared:  0.8807, Adjusted R-squared:  0.8596
## F-statistic: 41.82 on 3 and 17 DF,  p-value: 4.611e-08
step(lm2)
## Start:  AIC=115.07
## CL ~ Sexe + Poids..g.
##
##             Df Sum of Sq     RSS    AIC
## - Sexe       2     155.6  3595.6 112.00
## <none>                    3440.0 115.07
## - Poids..g.  1   13695.9 17135.9 146.79
##
## Step:  AIC=112
## CL ~ Poids..g.
##
##             Df Sum of Sq     RSS    AIC
## <none>                    3595.6 112.00
## - Poids..g.  1     25229 28824.7 153.71
##
## Call:
## lm(formula = CL ~ Poids..g.)
##
## Coefficients:
## (Intercept)    Poids..g.
##     89.9108       0.1091

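It seems that weight has more effect on the carapace length than Sexe: step() drops Sexe (AIC goes from 115.07 down to 112.00) and keeps Poids..g., and neither Sexe coefficient is significant in summary(lm2).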

lm3<- lm(CL ~ Sexe * Poids..g.)
summary(lm3)
##
## Call:
## lm(formula = CL ~ Sexe * Poids..g.)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -27.1238  -2.8192   0.9154   4.5002  18.3557
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)           99.66527    6.90931  14.425 3.37e-10 ***
## SexeM                -29.58695   11.16284  -2.650   0.0182 *
## SexeM (j)           -231.96527  125.72503  -1.845   0.0849 .
## Poids..g.              0.09500    0.01071   8.872 2.36e-07 ***
## SexeM:Poids..g.        0.10415    0.04062   2.564   0.0216 *
## SexeM (j):Poids..g.    2.67773    1.51830   1.764   0.0981 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.81 on 15 degrees of freedom
## Multiple R-squared:  0.9274, Adjusted R-squared:  0.9032
## F-statistic: 38.34 on 5 and 15 DF,  p-value: 5.031e-08
step(lm3)
## Start:  AIC=108.63
## CL ~ Sexe * Poids..g.
##
##                  Df Sum of Sq    RSS    AIC
## <none>                        2091.9 108.63
## - Sexe:Poids..g.  2    1348.1 3440.0 115.07
##
## Call:
## lm(formula = CL ~ Sexe * Poids..g.)
##
## Coefficients:
##         (Intercept)                SexeM            SexeM (j)
##             99.6653             -29.5870            -231.9653
##           Poids..g.      SexeM:Poids..g.  SexeM (j):Poids..g.
##              0.0950               0.1041               2.6777
opar <- par(mfrow = c(2, 2))
plot(lm3, which = 1:4)
## Warning: not plotting observations with leverage one:
##   4, 17

par(opar)
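
To read the lm3 coefficients above: with Sexe * Poids..g., each Sexe level gets its own intercept and its own slope for weight. A small sketch of the arithmetic, using the estimates printed by summary(lm3) and assuming F is the reference level (it is the only level without its own coefficient):

```r
# Estimates copied from summary(lm3) above
b0     <- 99.66527    # (Intercept): intercept of the reference level (assumed F)
b_M    <- -29.58695   # SexeM: intercept shift for M
b_Mj   <- -231.96527  # SexeM (j): intercept shift for M (j)
b_w    <- 0.09500     # Poids..g.: weight slope of the reference level
b_M_w  <- 0.10415     # SexeM:Poids..g.: extra slope for M
b_Mj_w <- 2.67773     # SexeM (j):Poids..g.: extra slope for M (j)

# One fitted line (intercept + slope * weight) per Sexe level
lines_by_sexe <- data.frame(
  Sexe      = c("F", "M", "M (j)"),
  intercept = c(b0, b0 + b_M, b0 + b_Mj),
  slope     = c(b_w, b_w + b_M_w, b_w + b_Mj_w)
)
print(lines_by_sexe)
```

So for M the estimated slope is about 0.199 per gram, against 0.095 for F. The M (j) line comes with very large standard errors in the summary, so I would not read much into it.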

### Now, let us do some awesome Graphs

For that I am going to use the ggplot2 package. I already loaded it above.

OK wait, I found this great package called ggfortify, forget ggplot2 lol

Ooh, ggfortify is just an autoplot() helper built on ggplot2... I feel stupid hhh

Let's try it on lm3

library(ggfortify) 
## Warning: package 'ggfortify' was built under R version 3.5.2
autoplot(lm(CL ~ Sexe* Poids..g., data = data),
colour = "red", label.size = 3)
## Warning: package 'bindrcpp' was built under R version 3.5.2
## Warning: Removed 21 rows containing missing values (geom_path).
## Warning: Removed 2 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_path).

OK, I get it: that is not a graph of my data, it is just my model diagnostics shown in a more colorful way. I'll go back to ggplot2 now and make a nice graph. I am sorry, ggplot2 lol.

# keep the three Sexe levels; the level is spelled "M (j)" (with a space) in the model output above
data2 <- data[data$Sexe %in% c("F", "M", "M (j)"), ]
theme_set(theme_bw())
g <- ggplot(data2, aes(CL, Poids..g.)) +
  labs(subtitle = "", title = "")
g + geom_jitter(aes(col = Sexe, size = H)) +
  geom_smooth(aes(col = Sexe), method = "lm", se = FALSE)

##########################
# working on online data .. wow
data(mpg, package = "ggplot2")
mpg <- read.csv("http://goo.gl/uEeRGu")
mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]

# Scatterplot
theme_set(theme_bw())  # pre-set the bw theme.
g <- ggplot(mpg_select, aes(displ, cty)) +
  labs(subtitle = "mpg: Displacement vs City Mileage",
       title = "Bubble chart")

g + geom_jitter(aes(col = manufacturer, size = hwy)) +
  geom_smooth(aes(col = manufacturer), method = "lm", se = FALSE)