Summary of Iris Data Set

##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
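The table above has the shape of what base R's `summary()` prints for a data frame. A minimal sketch, assuming the built-in `iris` data set was summarised directly (the variable name `iris_summary` is only illustrative):

```r
# iris ships with base R (datasets package), so nothing needs installing
data(iris)

# summary() gives quartiles and the mean for numeric columns
# and level counts for factor columns such as Species
iris_summary <- summary(iris)
print(iris_summary)
```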

Count of Columns and Rows

## [1] 5
## [1] 150
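The two counts above can be reproduced with `ncol()` and `nrow()`. This is a sketch under the assumption that the built-in `iris` data frame is used; `n_cols` and `n_rows` are illustrative names.

```r
data(iris)

n_cols <- ncol(iris)  # number of variables (columns)
n_rows <- nrow(iris)  # number of observations (rows)

print(n_cols)
print(n_rows)
```

`dim(iris)` would return both values at once, rows first.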

Tail of Data Set

##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
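The listing above looks like the default output of `tail()`, which returns the last six rows and keeps the original row names. A minimal sketch, with `last_rows` as an illustrative name:

```r
data(iris)

# tail() defaults to n = 6; pass n = 10 (for example) to see more rows
last_rows <- tail(iris)
print(last_rows)
```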

Exploring Dataset Visually - Scatter Plot

Exploring Dataset Visually - Scatterplot Matrices

Scatterplot matrices are good for spotting rough linear correlations in data that contain continuous variables. After checking the matrices, petal length and petal width are highly correlated across all species.

##              Petal.Length Sepal.Length
## Petal.Length    1.0000000    0.8717538
## Sepal.Length    0.8717538    1.0000000
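A scatterplot matrix and the correlation table above could be produced roughly as follows. This is a sketch rather than the original code: the use of `pairs()` and the colour mapping (red, green, blue in factor-level order) are assumptions, and `species_cols` and `cor_mat` are illustrative names.

```r
data(iris)

# One colour per species, in factor-level order:
# setosa, versicolor, virginica (colour choice is an assumption)
species_cols <- c("red", "green", "blue")[as.integer(iris$Species)]

# Scatterplot matrix of the four numeric columns
pairs(iris[, 1:4], col = species_cols)

# Pearson correlation matrix for the two variables shown above
cor_mat <- cor(iris[, c("Petal.Length", "Sepal.Length")])
print(cor_mat)
```

Calling `cor(iris[, 1:4])` instead would give the full 4 x 4 matrix, which is a convenient numeric companion to the plot.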

Linear Regression

The black regression line passes through the point of means of the data, that is, the point (mean of x, mean of y).
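The statement about the black line can be checked numerically. The text does not say which variables were regressed, so the sketch below assumes `Sepal.Length ~ Petal.Length` (the pair from the correlation table). The names `fit`, `x_bar`, `y_bar`, and `y_at_x_bar` are illustrative.

```r
data(iris)

# Assumed model: the variables are a guess based on the correlation table
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

plot(iris$Petal.Length, iris$Sepal.Length,
     xlab = "Petal.Length", ylab = "Sepal.Length")
abline(fit, col = "black")  # fitted least-squares line

# A least-squares line with an intercept passes through (mean(x), mean(y))
x_bar <- mean(iris$Petal.Length)
y_bar <- mean(iris$Sepal.Length)
y_at_x_bar <- unname(predict(fit, newdata = data.frame(Petal.Length = x_bar)))

print(all.equal(y_at_x_bar, y_bar))
```

This property holds for any ordinary least-squares fit that includes an intercept, whichever pair of variables is chosen.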

Linear Models

Data divided by species (setosa=red, versicolor=green, virginica=blue). Three separate linear regressions are run.

## 
## Call:
## lm(formula = Sepal.Length ~ Sepal.Width:Species + Species - 1, 
##     data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26067 -0.25861 -0.03305  0.18929  1.44917 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## Speciessetosa                   2.6390     0.5715   4.618 8.53e-06 ***
## Speciesversicolor               3.5397     0.5580   6.343 2.74e-09 ***
## Speciesvirginica                3.9068     0.5827   6.705 4.25e-10 ***
## Sepal.Width:Speciessetosa       0.6905     0.1657   4.166 5.31e-05 ***
## Sepal.Width:Speciesversicolor   0.8651     0.2002   4.321 2.88e-05 ***
## Sepal.Width:Speciesvirginica    0.9015     0.1948   4.628 8.16e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4397 on 144 degrees of freedom
## Multiple R-squared:  0.9947, Adjusted R-squared:  0.9944 
## F-statistic:  4478 on 6 and 144 DF,  p-value: < 2.2e-16
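The formula in the `Call:` above fits one intercept and one `Sepal.Width` slope per species in a single model. The sketch below refits it and compares the setosa coefficients with a regression run on setosa alone; `fit_species` and `fit_setosa` are illustrative names.

```r
data(iris)

# Formula copied from the Call: in the output above.
# "- 1" drops the global intercept so each species gets its own.
fit_species <- lm(Sepal.Length ~ Sepal.Width:Species + Species - 1,
                  data = iris)
print(summary(fit_species))

# Cross-check: a separate regression on one species alone
fit_setosa <- lm(Sepal.Length ~ Sepal.Width,
                 data = subset(iris, Species == "setosa"))
print(coef(fit_setosa))
```

The point estimates agree between the two fits. The standard errors differ, because the combined model pools the residual variance across all three species.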

According to the p-values, the three variables (Sepal.Width, Petal.Length, Petal.Width) are significantly related to Sepal.Length.

This approach is suitable for normally distributed data or categorical variables.