class: center, middle, inverse, title-slide

.title[
# An RMarkdown report
]
.date[
### 2024-03-09
]

---

# Data description

In this project we consider the classic `iris` data set, which ships with base R in the `datasets` package.

`iris` has **150** observations of 5 variables: **`Sepal.Length`**, **`Sepal.Width`**, **`Petal.Length`**, **`Petal.Width`** and **`Species`**. For a description of each column, run `?iris`.

The first 10 rows of `iris` are:

<div data-pagedtable="false">
<script data-pagedtable-source type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Sepal.Length"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Sepal.Width"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Petal.Length"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Petal.Width"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Species"],"name":[5],"type":["fct"],"align":["left"]}],"data":[{"1":"5.1","2":"3.5","3":"1.4","4":"0.2","5":"setosa","_rn_":"1"},{"1":"4.9","2":"3.0","3":"1.4","4":"0.2","5":"setosa","_rn_":"2"},{"1":"4.7","2":"3.2","3":"1.3","4":"0.2","5":"setosa","_rn_":"3"},{"1":"4.6","2":"3.1","3":"1.5","4":"0.2","5":"setosa","_rn_":"4"},{"1":"5.0","2":"3.6","3":"1.4","4":"0.2","5":"setosa","_rn_":"5"},{"1":"5.4","2":"3.9","3":"1.7","4":"0.4","5":"setosa","_rn_":"6"},{"1":"4.6","2":"3.4","3":"1.4","4":"0.3","5":"setosa","_rn_":"7"},{"1":"5.0","2":"3.4","3":"1.5","4":"0.2","5":"setosa","_rn_":"8"},{"1":"4.4","2":"2.9","3":"1.4","4":"0.2","5":"setosa","_rn_":"9"},{"1":"4.9","2":"3.1","3":"1.5","4":"0.1","5":"setosa","_rn_":"10"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
</script>
</div>

---

# Descriptive analysis

First, we compute descriptive statistics with the `summary()` function:

Table: Descriptive statistics

|Statistic | Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|
|:---------|------------:|-----------:|------------:|-----------:|
|Min.      |         4.30|        2.00|         1.00|         0.1|
|1st Qu.   |         5.10|        2.80|         1.60|         0.3|
|Median    |         5.80|        3.00|         4.35|         1.3|
|Mean      |         5.84|        3.06|         3.76|         1.2|
|3rd Qu.   |         6.40|        3.30|         5.10|         1.8|
|Max.      |         7.90|        4.40|         6.90|         2.5|

`Species` is a factor with 50 observations in each of its three levels: `setosa`, `versicolor` and `virginica`.

---

Second, we use the `aggregate()` function to compute the mean of each of the 4 numeric variables by species:

Table: Means by species

|Species    | Sepal Length| Sepal Width| Petal Length| Petal Width|
|:----------|------------:|-----------:|------------:|-----------:|
|setosa     |         5.01|        3.43|         1.46|       0.246|
|versicolor |         5.94|        2.77|         4.26|       1.326|
|virginica  |         6.59|        2.97|         5.55|       2.026|

---

# Linear regression

We use the `cor()` function to compute the Pearson correlation coefficients between all pairs of numeric variables:

Table: Pearson correlation coefficients

|             | Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|
|:------------|------------:|-----------:|------------:|-----------:|
|Sepal.Length |        1.000|      -0.118|        0.872|       0.818|
|Sepal.Width  |       -0.118|       1.000|       -0.428|      -0.366|
|Petal.Length |        0.872|      -0.428|        1.000|       0.963|
|Petal.Width  |        0.818|      -0.366|        0.963|       1.000|

---

Here are 3 scatter plots showing the association between `Petal.Length` and each of the other numeric variables.

<img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-4-1.png" width="50%" /><img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-4-2.png" width="50%" />

---

<img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-5-1.png" width="50%" />

---

Now we would like to explain the variation in sepal length as a function of petal length. To do so, we fit the simple linear regression `lm(Sepal.Length ~ Petal.Length, data = iris)`.
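A minimal sketch of this step in R is shown below. It assumes the fitted model is stored as `reg` (the name used later with `plot(reg)`); the code that formatted the coefficient table for these slides is not shown, so printing `summary(reg)$coefficients` is only an approximation of it.

```r
# Fit a simple linear regression of sepal length on petal length.
# `reg` is assumed to be the name of the model object in this report.
reg <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Estimates, standard errors and t values, rounded to 3 decimals
coefs <- summary(reg)$coefficients
print(round(coefs[, c("Estimate", "Std. Error", "t value")], 3))

# Write out the fitted equation from the rounded coefficients
b <- round(coef(reg), 3)
cat(sprintf("Sepal.Length_hat = %.3f + %.3f * Petal.Length\n",
            b[["(Intercept)"]], b[["Petal.Length"]]))
# prints: Sepal.Length_hat = 4.307 + 0.409 * Petal.Length
```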
Here is the summary of this model:

Table: Summary of lm: Sepal.Length ~ Petal.Length

|             | Estimate| Std. Error| t value| Pr(>&#124;t&#124;)|
|:------------|--------:|----------:|-------:|------------------:|
|(Intercept)  |    4.307|      0.078|    54.9|            < 2e-16|
|Petal.Length |    0.409|      0.019|    21.6|            < 2e-16|

The fitted equation is

`$$\widehat{\text{Sepal.Length}} = 4.307 + 0.409 \times \text{Petal.Length}$$`

Here are the data with the fitted regression line:

---

<img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

---

Finally, we call `plot(reg)`, where `reg` is the fitted model, to draw diagnostic plots of the residuals.

<div class="figure" style="text-align: defaut">
<img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-8-1.png" alt="Residuals plot 1" width="50%" /><img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-8-2.png" alt="Residuals plot 1" width="50%" />
<p class="caption">Residuals plot 1</p>
</div>

---

<div class="figure" style="text-align: defaut">
<img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-9-1.png" alt="Residuals plot 2" width="50%" /><img src="data:image/png;base64,#3_files/figure-html/unnamed-chunk-9-2.png" alt="Residuals plot 2" width="50%" />
<p class="caption">Residuals plot 2</p>
</div>
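The two plotting steps of this section can be sketched in base R as follows. This is an assumption about how the figures were produced, since the original chunks are not shown: the point colours, axis labels and the 2 x 2 layout are illustrative choices. Note that base-graphics margins are set with the `mar` argument of `par()`; there is no `margin` parameter.

```r
# Assumed model object, as in `plot(reg)` above
reg <- lm(Sepal.Length ~ Petal.Length, data = iris)

# Save all settable graphical parameters so they can be restored at the end
old_par <- par(no.readonly = TRUE)

# Margins (bottom, left, top, right), in lines of text, are set with `mar`
par(mar = c(4, 4, 2, 1))

# Data coloured by species (integer codes 1 to 3), with the fitted line on top
plot(Sepal.Length ~ Petal.Length, data = iris,
     pch = 19, col = as.integer(iris$Species),
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
abline(reg, col = "darkorange", lwd = 2)

# The four default lm diagnostic plots on a 2 x 2 grid
par(mfrow = c(2, 2))
plot(reg)

# Restore the original graphical settings
par(old_par)
```

Saving the settings with `par(no.readonly = TRUE)` and restoring them at the end keeps the `mar` and `mfrow` changes from leaking into later chunks.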