Executive Summary

JARI (Just Another R Interface) is software developed by Eric Goh Ming Hui, author of the book Learn R for Applied Statistics, published by Apress. This report was generated from JARI for testing. You can download JARI at EGMHacademy.com.

Our company recently measured iris flowers and obtained the iris dataset. We would like to produce some plots and explore the dataset. The company has three questions: how are the variables correlated to each other, what is the distribution of each variable, and how similar are the variables to each other in mean or median? We will use the correlation matrix for the first question, a normality test for the second question, and the Wilcoxon test for the third question.

1.0 Introduction

Our company recently measured iris flowers and obtained the iris dataset. We would like to produce some plots and explore the dataset.

2.0 Data Source

The iris dataset was derived from our measurements of iris flowers.

3.0 Key Findings

3.1 Three SMART Questions

  • How are the variables correlated to each other?
  • What is the distribution of each variable?
  • How similar are the variables to each other in mean or median?

3.2 How are the variables correlated to each other?

We will produce the correlation matrix and run correlation tests.

3.21 Correlation Matrix

## ==================================================
##              sepal.length sepal.width petal.length
## --------------------------------------------------
## sepal.length      1         -0.118       0.874    
## sepal.width     -0.118         1         -0.426   
## petal.length    0.874       -0.426         1      
## --------------------------------------------------
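Each entry in the matrix above is a pairwise Pearson correlation coefficient. As a minimal sketch of how such a matrix can be computed, the snippet below uses only the Python standard library. The function names `pearson` and `correlation_matrix` and the toy values are assumptions made for illustration; they are not the measured iris data and not the code JARI runs.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy values for illustration only; NOT the measured iris data.
toy = {
    "sepal.length": [5.0, 6.0, 7.0, 6.5],
    "sepal.width": [3.5, 3.0, 3.2, 2.8],
    "petal.length": [1.5, 4.5, 6.0, 5.0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, [round(row[other], 3) for other in toy])
```

The matrix is symmetric with ones on the diagonal, which is why the table above repeats each coefficient on both sides of the diagonal.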

3.22 Correlation: Petal Length and Petal Width

## 
## Pearson
## [1] 0.9627723
## 
## Spearman
## [1] 0.9376545
## 
## Kendall
## [1] 0.8065882
estimate  statistic  p.value    parameter  conf.low  conf.high  method                                alternative
0.9628    43.18      2.082e-85  147        0.9489    0.9729     Pearson’s product-moment correlation  two.sided

The correlation is strong and positive (Pearson estimate 0.9628, p-value 2.082e-85), so we can infer that Petal Length and Petal Width are correlated.
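The statistic and confidence interval in the table follow from the estimate and the degrees of freedom alone. The sketch below assumes the usual formulas for the Pearson test: a t statistic with n - 2 degrees of freedom and a confidence interval from the Fisher z-transform. The function name `pearson_inference` is hypothetical; the inputs 0.9627723 and 147 are the Pearson estimate and the "parameter" value reported above.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_inference(r, df, level=0.95):
    """Return (t statistic, CI lower, CI upper) for a Pearson correlation r.

    df is the 'parameter' column of the table, i.e. n - 2.
    """
    t = r * sqrt(df / (1.0 - r * r))        # t statistic on df degrees of freedom
    n = df + 2
    z = atanh(r)                            # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)                  # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return t, tanh(z - crit * se), tanh(z + crit * se)


# Values reported for Petal Length and Petal Width in this section.
t, low, high = pearson_inference(0.9627723, 147)
print(round(t, 2), round(low, 4), round(high, 4))
```

Under these assumptions the result should agree with the statistic, conf.low, and conf.high columns to the printed precision. An interval that excludes zero corresponds to a significant correlation at the 5% level.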

3.23 Correlation: Sepal Length and Sepal Width

## 
## Pearson
## [1] -0.1181293
## 
## Spearman
## [1] -0.1681023
## 
## Kendall
## [1] -0.0786184
estimate  statistic  p.value  parameter  conf.low  conf.high  method                                alternative
-0.1181   -1.442     0.1513   147        -0.2737   0.0435     Pearson’s product-moment correlation  two.sided

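The correlation is weak and negative (Pearson estimate -0.1181) and is not statistically significant (p-value 0.1513; the confidence interval from -0.2737 to 0.0435 includes zero), so we cannot infer a correlation between Sepal Length and Sepal Width.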
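The Spearman and Kendall values reported alongside Pearson are rank-based: Spearman is the Pearson correlation of the ranks, and Kendall compares concordant and discordant pairs. The sketch below illustrates both, assuming average ranks for ties and the tau-b variant of Kendall's coefficient. All function names and the toy values are hypothetical and are not the iris measurements.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from counts of concordant, discordant, and tied pairs."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                    # tied on both: counted in neither term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + tied_x_only) * (untied + tied_y_only))


# Hypothetical toy values for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman(x, y), 3), round(kendall_tau_b(x, y), 3))
```

Because both coefficients depend only on ordering, they are less sensitive to outliers and non-normal data than Pearson, which is one reason to report all three.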

3.24 Scatterplots


From the scatterplots, we can infer that there is correlation between Petal Length and Petal Width, between Sepal Length and Petal Length, and between Petal Width and Sepal Width.

3.3 What is the distribution of each variable?

We will run a normality test on each variable. If the p-value is greater than 0.05, we do not reject the null hypothesis of normality and treat the variable as normally distributed.

3.31 Normality Test: Petal Length

## 
## ---------------
## Variable: petal.length
## ---------------
## 
## Normality Test
statistic  p.value    method
0.8768     8.635e-10  Shapiro-Wilk normality test

Since the p-value is less than 0.05, we can infer that the variable is not normally distributed.

3.32 Normality Test: Petal Width

## 
## ---------------
## Variable: petal.width
## ---------------
## 
## Normality Test
statistic  p.value    method
0.9019     1.853e-08  Shapiro-Wilk normality test

3.33 Normality Test: Sepal Length

## 
## ---------------
## Variable: sepal.length
## ---------------
## 
## Normality Test
statistic  p.value   method
0.9756     0.009234  Shapiro-Wilk normality test

3.34 Normality Test: Sepal Width

## 
## ---------------
## Variable: sepal.width
## ---------------
## 
## Normality Test
statistic  p.value  method
0.985      0.1062   Shapiro-Wilk normality test
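The decision rule for this section can be applied to all four variables at once. The short sketch below copies the Shapiro-Wilk p-values from the tables above and compares each with the 0.05 threshold. The helper name `is_consistent_with_normal` is hypothetical, and the snippet does not recompute the test itself.

```python
ALPHA = 0.05  # significance threshold used in this report

# Shapiro-Wilk p-values copied from the tables in this section.
shapiro_p_values = {
    "petal.length": 8.635e-10,
    "petal.width": 1.853e-08,
    "sepal.length": 0.009234,
    "sepal.width": 0.1062,
}


def is_consistent_with_normal(p_value, alpha=ALPHA):
    """True when the test does not reject the null hypothesis of normality."""
    return p_value > alpha


decisions = {name: is_consistent_with_normal(p) for name, p in shapiro_p_values.items()}
for name, ok in decisions.items():
    verdict = "no evidence against normality" if ok else "not normally distributed"
    print(f"{name}: {verdict}")
```

Only sepal.width has a p-value above 0.05. A p-value above the threshold means the test found no evidence against normality; it does not prove the variable is exactly normal.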

3.4 How similar are the variables to each other in mean or median?

Since not all variables are normally distributed, we will use a non-parametric test. We will use the unpaired Wilcoxon test, assuming the variables are not related. If the p-value is less than 0.05, we can reject the null hypothesis and conclude that the median of variable A differs from the median of variable B.
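The unpaired Wilcoxon (rank sum) test can be sketched as follows, using the normal approximation with continuity correction named in the output tables. The statistic W is defined here as the rank sum of the first sample minus its minimum possible value, which is an assumption about the convention behind the reported statistic. The function names and toy samples are hypothetical and are not the iris measurements.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank sum test; returns (W, p_value) via the normal approximation."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2.0
    # Ties shrink the variance of W; each group of t tied values adds t^3 - t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = sqrt((n_x * n_y / 12.0) * ((n + 1) - tie_term / (n * (n - 1))))
    diff = w - n_x * n_y / 2.0              # distance from the null expectation
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / sigma
    cdf = NormalDist().cdf(z)
    return w, min(1.0, 2.0 * min(cdf, 1.0 - cdf))


# Hypothetical toy samples for illustration only.
w, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
print(w, round(p, 4))
```

W ranges from 0 to the product of the two sample sizes, and values near either extreme indicate that one sample lies almost entirely above the other. The normal approximation is intended for larger samples, so the p-value for a toy sample this small is only illustrative.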

3.41 Unpaired Wilcoxon Test: Petal Length and Petal Width

statistic  p.value    method                                             alternative
19099      4.976e-27  Wilcoxon rank sum test with continuity correction  two.sided

3.42 Unpaired Wilcoxon Test: Sepal Length and Sepal Width

statistic  p.value    method                                             alternative
22199      2.084e-50  Wilcoxon rank sum test with continuity correction  two.sided

4.0 Conclusions

We can infer that there is correlation between Petal Length and Petal Width, between Sepal Length and Petal Length, and between Petal Width and Sepal Width. In the normality tests, only Sepal Width has a p-value above 0.05 (0.1062), so it is the only variable we treat as normally distributed. Both Wilcoxon tests gave p-values below 0.05, so in each case we can reject the null hypothesis: the median of Petal Length differs from the median of Petal Width, and the median of Sepal Length differs from the median of Sepal Width.