Your Document Title

Document Author

2020-06-14

Data description

In this project we consider the classical iris data set, which is available in the R datasets package. It has 150 observations and 5 columns named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.

Help on these data is available from the R help page for iris (?iris).
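
The dimensions and column names stated above can be checked directly in R. This is a minimal sketch; the variable names dims and cols are illustrative and do not come from the original report.

```r
# iris ships with base R in the datasets package, so nothing needs installing.
data(iris)

dims <- dim(iris)     # number of rows, then number of columns
cols <- names(iris)   # column names

print(dims)
print(cols)

# str() shows the column types: four numeric columns and one factor.
str(iris)
```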

Descriptive analysis

First, we compute some descriptive statistics with the summary() function:

Sepal.Length    Sepal.Width     Petal.Length    Petal.Width     Species
Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
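
A table like the one above can be obtained with a single call. The sketch below assumes the built-in iris data frame; the name iris_summary is illustrative.

```r
# summary() on a data frame returns one column of statistics per variable.
iris_summary <- summary(iris)
print(iris_summary)

# The same function also works on a single numeric column.
print(summary(iris$Sepal.Length))
```

Species is a factor, so summary() reports the count per level instead of quantiles. It has only three levels, which is why that column has fewer entries than the numeric ones.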

Second, we compute the mean of each of the four numerical variables for each species, which gives the following table.

Means by species
Species      Sepal Length    Sepal Width   Petal Length    Petal Width
setosa              5.006          3.428          1.462          0.246
versicolor          5.936          2.770          4.260          1.326
virginica           6.588          2.974          5.552          2.026
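
The code that produced this table is not reproduced in the report. One way to obtain the same per-species means, assuming base R only, is aggregate() with a formula. The names species_means and display_means are hypothetical, and the original may well have used a different approach.

```r
# Mean of every numeric column, grouped by Species.
# In the formula, "." stands for all columns other than Species.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Assumption: the table headers were made by replacing dots with spaces.
display_means <- species_means
names(display_means) <- gsub(".", " ", names(display_means), fixed = TRUE)

print(display_means)
```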

Linear regression

We use the cor() function to get the Pearson correlation coefficients between all pairs of numeric variables:

               Sepal.Length    Sepal.Width   Petal.Length    Petal.Width
Sepal.Length      1.0000000     -0.1175698      0.8717538      0.8179411
Sepal.Width      -0.1175698      1.0000000     -0.4284401     -0.3661259
Petal.Length      0.8717538     -0.4284401      1.0000000      0.9628654
Petal.Width       0.8179411     -0.3661259      0.9628654      1.0000000
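
The correlation matrix above can be reproduced as follows. This is a sketch under the assumption that the factor column Species is dropped before calling cor(); the names numeric_vars and cor_matrix are illustrative.

```r
# cor() only accepts numeric input, so keep the four measurement columns.
numeric_vars <- iris[, sapply(iris, is.numeric)]

# method = "pearson" is the default; it is spelled out here for clarity.
cor_matrix <- cor(numeric_vars, method = "pearson")
print(cor_matrix)
```

The matrix is symmetric with ones on the diagonal. The strongest association is between Petal.Length and Petal.Width, and Sepal.Width is negatively correlated with the other three variables.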