The Data

garlicMustard<-read.csv("/Users/jonathan/Downloads/ENVS 286 Garlic Mustard Data.csv")
summary(garlicMustard)
##     transect   plantHeight       flowerNo         leafNo      
##  0m     : 1   Min.   :42.00   Min.   : 0.00   Min.   :  7.00  
##  10m    : 1   1st Qu.:56.25   1st Qu.: 1.00   1st Qu.: 31.00  
##  11m    : 1   Median :59.75   Median : 4.00   Median : 47.00  
##  12m    : 1   Mean   :63.65   Mean   : 7.50   Mean   : 62.64  
##  13m    : 1   3rd Qu.:73.30   3rd Qu.:10.25   3rd Qu.: 75.75  
##  14m    : 1   Max.   :84.00   Max.   :30.00   Max.   :176.00  
##  (Other):16                                                   
##    seedPodNo        stalkNo   
##  Min.   :  4.0   Min.   :1.0  
##  1st Qu.: 45.5   1st Qu.:2.0  
##  Median : 78.0   Median :3.0  
##  Mean   :111.9   Mean   :3.5  
##  3rd Qu.:162.5   3rd Qu.:4.0  
##  Max.   :297.0   Max.   :7.0  
## 
###List of transformations of data points###
garlicMustard$plantHeight
##  [1] 68.0 84.0 67.0 65.0 58.5 42.0 57.8 49.6 79.0 80.0 60.0 56.0 84.0 70.0
## [15] 58.0 57.0 74.4 76.8 59.5 55.2 54.8 43.7
garlicMustard$seedPodNoSqrt<-sqrt(garlicMustard$seedPodNo)
garlicMustard$flowerNoSqrt<-sqrt(garlicMustard$flowerNo)
garlicMustard$stalkNoLog<-log10(garlicMustard$stalkNo+0.0001)
garlicMustard$leafNoLog<-log10(garlicMustard$leafNo+0.0001)
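
The 0.0001 offset in the two log transformations above guards against log10(0), which is -Inf. The summary output reports minimums of 1 for stalkNo and 7 for leafNo, so it appears that no zeros occur in these columns and the offset changes the values only slightly. The following is a minimal sketch with made-up counts (the names counts, offset, and with_zero are illustrative and not part of the original analysis):

```r
# Hypothetical counts for illustration only; not taken from the CSV.
counts <- c(1, 2, 3, 4, 7)
offset <- 0.0001

# With no zeros present, the offset barely changes the transformed values.
max_diff <- max(abs(log10(counts + offset) - log10(counts)))
print(max_diff)

# With a zero present, the plain log is -Inf, while the offset version is
# a finite but very low value that can behave like an outlier.
with_zero <- c(0, counts)
print(log10(with_zero)[1])
print(log10(with_zero + offset)[1])
```

If a count column did contain zeros, the choice of offset would influence the fit, so it may be worth checking how sensitive the results are to it.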

The Most Correlated

Seed pod number and leaf number were more strongly correlated than any other combination tested. The p-value for this model was less than 0.0001 (p = 1.96e-08), which means the relationship is statistically significant. The adjusted R^2 value is also high at 0.7904 (multiple R^2 = 0.8004), which shows a strong correlation between the two. I transformed the seed pod number using the square root function and the leaf number using the log (base 10) function.

###List of transformations used###
garlicMustard$seedPodNoSqrt<-sqrt(garlicMustard$seedPodNo)
garlicMustard$leafNoLog<-log10(garlicMustard$leafNo+0.0001)
qqnorm(garlicMustard$seedPodNoSqrt)
qqline(garlicMustard$seedPodNoSqrt)
Figure 1: Q-Q plot of the square-root-transformed seed pod number of the garlic mustard plants.

qqnorm(garlicMustard$leafNoLog)
qqline(garlicMustard$leafNoLog)
Figure 2: Q-Q plot of the log-transformed leaf number of the garlic mustard plants.
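
The Q-Q plots above are judged by eye. A Shapiro-Wilk test is one possible numeric complement. It was not part of the original analysis and is shown here only as a sketch. Because plantHeight is the only column whose raw values are printed in this report, the example uses those values; the same call could be applied to seedPodNoSqrt or leafNoLog once the CSV is loaded.

```r
# Plant heights copied from the printed output of garlicMustard$plantHeight.
plantHeight <- c(68.0, 84.0, 67.0, 65.0, 58.5, 42.0, 57.8, 49.6, 79.0, 80.0,
                 60.0, 56.0, 84.0, 70.0, 58.0, 57.0, 74.4, 76.8, 59.5, 55.2,
                 54.8, 43.7)

# Shapiro-Wilk test of normality: a small p-value suggests the values
# depart from a normal distribution.
sw <- shapiro.test(plantHeight)
print(sw)

# The same data drawn as a Q-Q plot, for comparison with the test result.
qqnorm(plantHeight)
qqline(plantHeight)
```

With only 22 plants the test has limited power, so it is probably best read alongside the Q-Q plot and not as a replacement for it.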

best.LM<-lm(garlicMustard$seedPodNoSqrt~garlicMustard$leafNoLog)
summary(best.LM)
## 
## Call:
## lm(formula = garlicMustard$seedPodNoSqrt ~ garlicMustard$leafNoLog)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8641 -1.4178  0.1954  1.6161  3.8364 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -10.994      2.347  -4.684 0.000143 ***
## garlicMustard$leafNoLog   12.212      1.364   8.955 1.96e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.005 on 20 degrees of freedom
## Multiple R-squared:  0.8004, Adjusted R-squared:  0.7904 
## F-statistic:  80.2 on 1 and 20 DF,  p-value: 1.956e-08
plot(best.LM)
Figure 3: Diagnostic plots of the residuals from the linear model (best.LM).

plot(garlicMustard$seedPodNoSqrt~garlicMustard$leafNoLog)
abline(best.LM)
Figure 4: Square-root-transformed seed pod number versus log-transformed leaf number of garlic mustard plants, with the best-fit line.
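
The model summary above reports two R^2 values, and the write-up quotes the adjusted one. As a rough check, the adjusted value can be recomputed from the multiple R^2, the sample size, and the number of predictors. The helper adj_r2 below is a hypothetical name introduced for this sketch; n = 22 is inferred from the 20 residual degrees of freedom plus the two estimated coefficients.

```r
# Adjusted R-squared from multiple R-squared (r2), sample size (n),
# and number of predictors (p). Helper name is illustrative only.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 22  # 20 residual degrees of freedom + 2 estimated coefficients
best_adj <- adj_r2(0.8004, n, p = 1)
print(round(best_adj, 4))  # 0.7904

# For a single-predictor model, the Pearson correlation is the square
# root of the multiple R-squared, with the sign of the slope (positive here).
r <- sqrt(0.8004)
print(round(r, 2))  # 0.89
```

The adjustment is small here because the model has only one predictor.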

The Least Correlated

These two factors, flower number and seed pod number, have a p-value of 0.0863, which is above the conventional 0.05 threshold and therefore not significant. The adjusted R^2 value is 0.0969 (multiple R^2 = 0.1399), which shows a very weak correlation between them. Both the flower number and the seed pod number were transformed using the square root function.

###List of transformations used###
garlicMustard$flowerNoSqrt<-sqrt(garlicMustard$flowerNo)
garlicMustard$seedPodNoSqrt<-sqrt(garlicMustard$seedPodNo)
qqnorm(garlicMustard$flowerNoSqrt)
qqline(garlicMustard$flowerNoSqrt)
Figure 5: Q-Q plot of the square-root-transformed flower number of the garlic mustard plants.

qqnorm(garlicMustard$seedPodNoSqrt)
qqline(garlicMustard$seedPodNoSqrt)
Figure 6: Q-Q plot of the square-root-transformed seed pod number of the garlic mustard plants.

worst.LM<-lm(garlicMustard$flowerNoSqrt~garlicMustard$seedPodNoSqrt)
summary(worst.LM)
## 
## Call:
## lm(formula = garlicMustard$flowerNoSqrt ~ garlicMustard$seedPodNoSqrt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2794 -0.6691  0.1808  1.1853  2.2704 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                  0.85032    0.82657   1.029   0.3159  
## garlicMustard$seedPodNoSqrt  0.14095    0.07814   1.804   0.0863 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.568 on 20 degrees of freedom
## Multiple R-squared:  0.1399, Adjusted R-squared:  0.09693 
## F-statistic: 3.254 on 1 and 20 DF,  p-value: 0.08633
plot(garlicMustard$flowerNoSqrt~garlicMustard$seedPodNoSqrt)
abline(worst.LM)
Figure 7: Square-root-transformed flower number versus square-root-transformed seed pod number of garlic mustard plants, with the best-fit line.

plot(worst.LM)
Figure 8: Diagnostic plots of the residuals from the linear model (worst.LM).

Summary

In this experiment we looked at five different factors of garlic mustard plants. By running linear models, I found that seed pod number and leaf number were the most correlated (p-value < 0.001, adjusted R-squared = 0.7904). For this test we rejected the null hypothesis because the p-value was significant. The least correlated factors were seed pod number and flower number (p-value = 0.0863, adjusted R-squared = 0.0969). For this test we could not reject the null hypothesis, so there was no significant relationship between these values. The flower number and seed pod number data were both transformed using the square root function, and the leaf number data were transformed using the log function.

A possible reason for the first finding is that a plant with more leaves can photosynthesize better and so has more energy to produce seeds. I think the correlation between flower number and seed pod number is so small because they are both different parts of reproduction, so the plant does not need a lot of both of them. It is also early in the season, so most of the flowers probably have not bloomed yet.
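
The comparison of variable pairs described in this report could also be automated. The sketch below fits a linear model for every pair of columns and ranks the pairs by adjusted R-squared. It runs on simulated stand-in data, because the CSV is not reproduced here; the data frame demo, its simulated values, and the results table are assumptions made for illustration and are not the measurements or the procedure actually used.

```r
# Simulated stand-in data; NOT the real garlic mustard measurements.
set.seed(1)
n <- 22
leafNoLog <- rnorm(n, mean = 1.7, sd = 0.3)
demo <- data.frame(
  leafNoLog     = leafNoLog,
  seedPodNoSqrt = -11 + 12 * leafNoLog + rnorm(n, sd = 2),
  flowerNoSqrt  = rnorm(n, mean = 2, sd = 1.5)
)

# Every unordered pair of column names.
pairs <- combn(names(demo), 2, simplify = FALSE)

# Fit one simple linear model per pair and collect the key statistics.
results <- do.call(rbind, lapply(pairs, function(v) {
  fit <- summary(lm(reformulate(v[2], response = v[1]), data = demo))
  data.frame(
    response      = v[1],
    predictor     = v[2],
    adj.r.squared = fit$adj.r.squared,
    p.value       = fit$coefficients[2, "Pr(>|t|)"]
  )
}))

# Strongest relationship first.
print(results[order(-results$adj.r.squared), ])
```

Testing many pairs increases the chance of a spurious significant result, so a correction for multiple comparisons (for example p.adjust) may be worth considering when many pairs are screened this way.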