##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
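The call that produced this summary is not shown. A minimal sketch, assuming the data are the built-in `iris` data frame from R's `datasets` package rather than a file read from disk:

```r
# Load the built-in iris data frame (assumption: not read from a file)
data(iris)

# Quartiles and mean for each numeric column, level counts for the factor
iris_summary <- summary(iris)
print(iris_summary)
```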
Count of Columns and Rows
## [1] 5
## [1] 150
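The two counts can be reproduced with base R's `ncol()` and `nrow()`. This is a sketch; the variable names `n_cols` and `n_rows` are illustrative only:

```r
data(iris)

# Number of columns (variables), then number of rows (observations)
n_cols <- ncol(iris)
n_rows <- nrow(iris)
print(n_cols)
print(n_rows)
```

`dim(iris)` returns both values at once, rows first.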
Tail of Data Set
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
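The six rows shown match the default behaviour of `tail()`, which returns the last six rows of a data frame. A sketch, with `last_rows` as an illustrative name:

```r
data(iris)

# tail() defaults to n = 6, giving rows 145 to 150 of the 150-row data set
last_rows <- tail(iris)
print(last_rows)
```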
Exploring Dataset Visually - Scatter Plot
Exploring Dataset Visually - Scatterplot Matrices
Scatterplot matrices are useful for spotting rough linear correlations among the continuous variables in a data set. In the matrix, petal length and petal width are highly correlated across all species.
##              Petal.Length Sepal.Length
## Petal.Length    1.0000000    0.8717538
## Sepal.Length    0.8717538    1.0000000
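A scatterplot matrix and the correlation table above can be sketched with base R's `pairs()` and `cor()`. The column selection for the plot and the variable names `cor_ps` and `petal_cor` are assumptions; the original plotting call is not shown:

```r
data(iris)

# Scatterplot matrix of the four continuous columns (Species is excluded)
pairs(iris[, 1:4])

# Pearson correlation between petal length and sepal length
cor_ps <- cor(iris[, c("Petal.Length", "Sepal.Length")])
print(cor_ps)

# Correlation behind the petal length / petal width remark in the text
petal_cor <- cor(iris$Petal.Length, iris$Petal.Width)
print(petal_cor)
```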
The black line passes through the midpoint of the set.
The data are divided by species (setosa = red, versicolor = green, virginica = blue), and three separate linear regressions are run.
##
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width:Species + Species - 1,
##     data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.26067 -0.25861 -0.03305  0.18929  1.44917
##
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)
## Speciessetosa                   2.6390     0.5715   4.618 8.53e-06 ***
## Speciesversicolor               3.5397     0.5580   6.343 2.74e-09 ***
## Speciesvirginica                3.9068     0.5827   6.705 4.25e-10 ***
## Sepal.Width:Speciessetosa       0.6905     0.1657   4.166 5.31e-05 ***
## Sepal.Width:Speciesversicolor   0.8651     0.2002   4.321 2.88e-05 ***
## Sepal.Width:Speciesvirginica    0.9015     0.1948   4.628 8.16e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4397 on 144 degrees of freedom
## Multiple R-squared:  0.9947, Adjusted R-squared:  0.9944
## F-statistic:  4478 on 6 and 144 DF,  p-value: < 2.2e-16
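The model in the `Call:` line fits one intercept and one Sepal.Width slope per species in a single `lm()`. A sketch that refits it and draws the three per-species lines follows. The plotting code and the names `fit`, `species_cols` and `cf` are assumptions; only the formula and the colour scheme come from the text:

```r
data(iris)

# Formula copied from the Call: line; "- 1" drops the common intercept so
# each species gets its own intercept and its own Sepal.Width slope
fit <- lm(Sepal.Length ~ Sepal.Width:Species + Species - 1, data = iris)
print(summary(fit))

# Colours follow the factor level order: setosa, versicolor, virginica
species_cols <- c("red", "green", "blue")
plot(Sepal.Length ~ Sepal.Width, data = iris,
     col = species_cols[as.integer(iris$Species)])

# One regression line per species, built from the fitted coefficients
cf <- coef(fit)
for (i in seq_along(levels(iris$Species))) {
  sp <- levels(iris$Species)[i]
  abline(a = cf[[paste0("Species", sp)]],
         b = cf[[paste0("Sepal.Width:Species", sp)]],
         col = species_cols[i])
}
```

Because the intercept is removed, R computes R-squared against zero rather than against the mean of Sepal.Length, so the 0.9947 figure is not comparable to the R-squared of a model that keeps the intercept.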
According to the p-values, the three variables (Sepal.Width, Petal.Length, Petal.Width) are significantly related to Sepal.Length.
This approach is suitable for normally distributed or categorical variables.