DA 101 Lab 1: Garlic mustard
This is the worksheet for the lab, worth 5% of your total course grade. Complete each part and hand in the knitted file (the .html) on Notebowl, along with your R code file (the .Rmd), before Wed 11:59pm next week.
While it is OK to talk through problems with a friend and to attend office hours to ask the Teaching Assistant for advice, your work should be your own, your answers should be reached based on your own understanding, and this assignment is meant to be completed individually.
summary(GarlicMustardData)
## Pop_Code Region Collection_Date Latitude
## Length:404 Length:404 Length:404 Min. :33.11
## Class :character Class :character Class :character 1st Qu.:40.81
## Mode :character Mode :character Mode :character Median :43.72
## Mean :44.86
## 3rd Qu.:48.40
## Max. :57.02
##
## Longitude Altitude Pop_Size Pct_Canopy_Cover
## Min. :-123.406 Min. : 0.0 Min. : 3 Min. : 0.00
## 1st Qu.: -79.653 1st Qu.: 40.0 1st Qu.: 25 1st Qu.: 50.00
## Median : -71.759 Median : 164.0 Median : 100 Median : 70.00
## Mean : -40.642 Mean : 226.8 Mean : 4601 Mean : 65.09
## 3rd Qu.: 8.908 3rd Qu.: 298.2 3rd Qu.: 450 3rd Qu.: 80.00
## Max. : 42.015 Max. :1711.5 Max. :745000 Max. :100.00
## NA's :11 NA's :13
## RosCount AdultCount RosDens AdultDens
## Min. : 0.0 Min. : 0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 15.5 1st Qu.: 28 1st Qu.: 3.20 1st Qu.: 6.65
## Median : 72.0 Median : 61 Median : 16.58 Median : 14.80
## Mean : 227.2 Mean : 118 Mean : 48.33 Mean : 26.89
## 3rd Qu.: 236.2 3rd Qu.: 142 3rd Qu.: 49.65 3rd Qu.: 30.05
## Max. :6538.0 Max. :1447 Max. :1307.60 Max. :413.43
##
## TotalDens AvgRosWidth AvgAdultHeight AvgNLeaves
## Min. : 0.00 Min. : 0.3643 Min. : 5.293 Min. : 0.000
## 1st Qu.: 21.00 1st Qu.: 4.5068 1st Qu.: 55.600 1st Qu.: 7.814
## Median : 42.14 Median : 6.8226 Median : 71.333 Median : 10.538
## Mean : 75.22 Mean : 7.9126 Mean : 71.845 Mean : 13.931
## 3rd Qu.: 88.65 3rd Qu.: 9.4900 3rd Qu.: 86.865 3rd Qu.: 15.767
## Max. :1357.40 Max. :46.2563 Max. :148.600 Max. :173.500
## NA's :41 NA's :23 NA's :34
## AvgNFruits Herb bio1 bio2
## Min. : 0.00 Min. :0.0000 Min. : 3.540 Min. : 6.213
## 1st Qu.: 11.62 1st Qu.:0.1012 1st Qu.: 8.638 1st Qu.: 8.553
## Median : 20.21 Median :0.2468 Median : 9.511 Median :10.218
## Mean : 31.53 Mean :0.3280 Mean : 9.671 Mean :10.139
## 3rd Qu.: 39.88 3rd Qu.:0.5041 3rd Qu.:10.771 3rd Qu.:11.437
## Max. :421.00 Max. :1.0000 Max. :16.787 Max. :15.799
## NA's :19 NA's :40
## bio3 bio4 bio5 bio6
## Min. :0.2075 Min. :0.01286 Min. :18.46 Min. :-17.732
## 1st Qu.:0.2964 1st Qu.:0.02391 1st Qu.:23.33 1st Qu.: -8.265
## Median :0.3175 Median :0.02933 Median :26.57 Median : -6.327
## Mean :0.3180 Mean :0.02740 Mean :26.24 Mean : -5.981
## 3rd Qu.:0.3315 3rd Qu.:0.03088 3rd Qu.:28.98 3rd Qu.: -3.465
## Max. :0.4253 Max. :0.04070 Max. :34.37 Max. : 4.429
##
## bio7 bio8 bio9 bio10
## Min. :16.95 Min. :-1.519 Min. :-8.9392 Min. :13.74
## 1st Qu.:27.29 1st Qu.:13.465 1st Qu.:-1.2444 1st Qu.:17.16
## Median :34.29 Median :17.038 Median : 0.4213 Median :19.50
## Mean :32.22 Mean :15.631 Mean : 2.6439 Mean :19.37
## 3rd Qu.:36.66 3rd Qu.:19.256 3rd Qu.: 3.9786 3rd Qu.:21.58
## Max. :45.38 Max. :23.993 Max. :23.7206 Max. :25.85
##
## bio11 bio12 bio13 bio14
## Min. :-9.6822 Min. : 409.0 Min. :12.00 Min. : 0.000
## 1st Qu.:-1.8492 1st Qu.: 752.0 1st Qu.:19.23 1st Qu.: 8.292
## Median :-0.2288 Median : 959.0 Median :24.63 Median :11.788
## Mean :-0.4633 Mean : 925.8 Mean :23.71 Mean :12.325
## 3rd Qu.: 1.1212 3rd Qu.:1123.0 3rd Qu.:26.70 3rd Qu.:17.116
## Max. : 9.0717 Max. :1530.4 Max. :55.70 Max. :22.647
##
## bio15 bio16 bio17 bio18
## Min. :0.06064 Min. :141.6 Min. : 0.4583 Min. : 0.5635
## 1st Qu.:0.11296 1st Qu.:230.9 1st Qu.:125.7550 1st Qu.:225.2529
## Median :0.17024 Median :303.5 Median :172.6889 Median :277.3571
## Mean :0.21134 Mean :287.7 Mean :178.2972 Mean :262.7103
## 3rd Qu.:0.27428 3rd Qu.:324.5 3rd Qu.:250.4504 3rd Qu.:308.4265
## Max. :1.04230 Max. :674.0 Max. :304.0691 Max. :389.3464
##
## bio19
## Min. : 46.8
## 1st Qu.:153.1
## Median :194.6
## Mean :198.7
## 3rd Qu.:260.8
## Max. :637.3
##
myvars <- c("Latitude", "Longitude", "Altitude")
GarlicMustardGeo <- GarlicMustardData[myvars]
summary(GarlicMustardGeo)
## Latitude Longitude Altitude
## Min. :33.11 Min. :-123.406 Min. : 0.0
## 1st Qu.:40.81 1st Qu.: -79.653 1st Qu.: 40.0
## Median :43.72 Median : -71.759 Median : 164.0
## Mean :44.86 Mean : -40.642 Mean : 226.8
## 3rd Qu.:48.40 3rd Qu.: 8.908 3rd Qu.: 298.2
## Max. :57.02 Max. : 42.015 Max. :1711.5
GarlicMustard_subset <- select(GarlicMustardData,1,5:8)
summary(GarlicMustard_subset)
## Pop_Code Longitude Altitude Pop_Size
## Length:404 Min. :-123.406 Min. : 0.0 Min. : 3
## Class :character 1st Qu.: -79.653 1st Qu.: 40.0 1st Qu.: 25
## Mode :character Median : -71.759 Median : 164.0 Median : 100
## Mean : -40.642 Mean : 226.8 Mean : 4601
## 3rd Qu.: 8.908 3rd Qu.: 298.2 3rd Qu.: 450
## Max. : 42.015 Max. :1711.5 Max. :745000
## NA's :11
## Pct_Canopy_Cover
## Min. : 0.00
## 1st Qu.: 50.00
## Median : 70.00
## Mean : 65.09
## 3rd Qu.: 80.00
## Max. :100.00
## NA's :13
GM_filtered <- GarlicMustardData %>%
filter(TotalDens>=4 & Altitude >=100)
summary(GM_filtered)
## Pop_Code Region Collection_Date Latitude
## Length:245 Length:245 Length:245 Min. :33.11
## Class :character Class :character Class :character 1st Qu.:40.51
## Mode :character Mode :character Mode :character Median :41.97
## Mean :43.74
## 3rd Qu.:47.40
## Max. :57.02
##
## Longitude Altitude Pop_Size Pct_Canopy_Cover
## Min. :-123.35 Min. : 102.0 Min. : 3 Min. : 0.00
## 1st Qu.: -84.58 1st Qu.: 191.0 1st Qu.: 30 1st Qu.:60.00
## Median : -73.83 Median : 265.7 Median : 100 Median :70.00
## Mean : -42.88 Mean : 347.0 Mean : 7207 Mean :67.17
## 3rd Qu.: 14.43 3rd Qu.: 380.0 3rd Qu.: 750 3rd Qu.:83.50
## Max. : 42.02 Max. :1711.5 Max. :745000 Max. :99.00
## NA's :4 NA's :10
## RosCount AdultCount RosDens AdultDens
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 12.0 1st Qu.: 23.0 1st Qu.: 2.60 1st Qu.: 5.20
## Median : 69.0 Median : 54.0 Median : 14.50 Median : 12.00
## Mean : 240.2 Mean :107.7 Mean : 50.62 Mean : 24.75
## 3rd Qu.: 264.0 3rd Qu.:122.0 3rd Qu.: 55.20 3rd Qu.: 27.60
## Max. :6538.0 Max. :952.0 Max. :1307.60 Max. :190.40
##
## TotalDens AvgRosWidth AvgAdultHeight AvgNLeaves
## Min. : 4.20 Min. : 0.3643 Min. : 9.00 Min. : 0.000
## 1st Qu.: 19.00 1st Qu.: 4.6422 1st Qu.: 56.63 1st Qu.: 7.584
## Median : 40.00 Median : 6.8000 Median : 74.44 Median :10.628
## Mean : 75.36 Mean : 7.9430 Mean : 73.22 Mean :14.524
## 3rd Qu.: 88.60 3rd Qu.: 9.5243 3rd Qu.: 88.18 3rd Qu.:18.654
## Max. :1357.40 Max. :46.2563 Max. :148.60 Max. :71.611
## NA's :26 NA's :16 NA's :19
## AvgNFruits Herb bio1 bio2
## Min. : 0.00 Min. :0.0000 Min. : 3.742 Min. : 6.377
## 1st Qu.: 12.78 1st Qu.:0.1083 1st Qu.: 8.511 1st Qu.: 9.471
## Median : 21.60 Median :0.2558 Median : 9.555 Median :10.611
## Mean : 34.05 Mean :0.3417 Mean : 9.601 Mean :10.613
## 3rd Qu.: 48.16 3rd Qu.:0.5359 3rd Qu.:10.611 3rd Qu.:11.881
## Max. :269.71 Max. :1.0000 Max. :16.787 Max. :15.799
## NA's :14 NA's :22
## bio3 bio4 bio5 bio6
## Min. :0.2075 Min. :0.01314 Min. :18.46 Min. :-17.210
## 1st Qu.:0.2960 1st Qu.:0.02523 1st Qu.:24.50 1st Qu.: -8.874
## Median :0.3159 Median :0.03004 Median :27.07 Median : -6.826
## Mean :0.3155 Mean :0.02876 Mean :26.87 Mean : -6.967
## 3rd Qu.:0.3268 3rd Qu.:0.03143 3rd Qu.:29.16 3rd Qu.: -5.242
## Max. :0.4143 Max. :0.04070 Max. :34.37 Max. : 4.429
##
## bio7 bio8 bio9 bio10
## Min. :17.16 Min. : 1.989 Min. :-8.9392 Min. :13.74
## 1st Qu.:30.16 1st Qu.:15.566 1st Qu.:-1.9976 1st Qu.:17.59
## Median :35.83 Median :17.651 Median :-0.4653 Median :19.74
## Mean :33.84 Mean :16.845 Mean : 0.7556 Mean :19.72
## 3rd Qu.:37.33 3rd Qu.:19.376 3rd Qu.: 1.1841 3rd Qu.:21.58
## Max. :45.38 Max. :23.993 Max. :23.7206 Max. :25.85
##
## bio11 bio12 bio13 bio14
## Min. :-9.6251 Min. : 409.0 Min. :12.00 Min. : 0.00
## 1st Qu.:-2.5187 1st Qu.: 752.0 1st Qu.:21.16 1st Qu.: 6.67
## Median :-0.7929 Median : 959.0 Median :24.87 Median :11.06
## Mean :-1.0962 Mean : 912.7 Mean :24.06 Mean :11.61
## 3rd Qu.: 0.1578 3rd Qu.:1101.0 3rd Qu.:26.68 3rd Qu.:15.69
## Max. : 9.0717 Max. :1394.0 Max. :42.21 Max. :22.65
##
## bio15 bio16 bio17 bio18
## Min. :0.06064 Min. :141.6 Min. : 0.4583 Min. : 0.5635
## 1st Qu.:0.13892 1st Qu.:257.0 1st Qu.:104.0951 1st Qu.:234.9915
## Median :0.19990 Median :304.6 Median :170.9110 Median :289.1817
## Mean :0.24540 Mean :291.7 Mean :168.5065 Mean :272.6435
## 3rd Qu.:0.33071 3rd Qu.:331.4 3rd Qu.:226.3498 3rd Qu.:313.0674
## Max. :1.04230 Max. :485.7 Max. :304.0691 Max. :389.3464
##
## bio19
## Min. : 46.8
## 1st Qu.:117.6
## Median :177.6
## Mean :180.7
## 3rd Qu.:231.3
## Max. :484.0
##
## The table above summarizes the populations whose Total Density is greater than or equal to 4 and whose Altitude is greater than or equal to 100.
## Plotting information using ggplot:
ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point()
ggplot(GarlicMustardData, aes(x=Region, y=AvgNFruits)) +
geom_boxplot() +
scale_y_log10() +
labs(title = "Average Number of Fruits on Garlic Mustard")
# European Garlic Mustard populations seem to produce more fruit than North American ones, as seen in the graph.
ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point() + scale_y_log10()
ggplot(GarlicMustardData, aes(x = bio12, y = AvgAdultHeight)) +
geom_point(size = 3, color = "red") +
scale_y_log10() + labs(x = "Annual Precipitation (bioclim)", y = "Average Adult Height (cm)", title = "Garlic Mustard in Europe and North America") +
# sides takes letters from "trbl"; "l" (lowercase L) puts the log ticks on the left axis
annotation_logticks(sides = "l") + theme_bw()
# Precipitation does not seem to have much influence on the plant's height.
ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight, color = Region)) + geom_point()
##Facet specification
ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point() + facet_wrap(~Region)
ggplot(GarlicMustardData, aes(x=Region)) + geom_bar()
ggplot(GarlicMustardData, aes(x=AvgNFruits)) + geom_histogram()
# Garlic Mustard plants do not seem to produce many fruits: the histogram shows most populations averaging around 20 fruits.
ggplot(GarlicMustardData, aes(x=AvgNFruits, fill= Region)) +
geom_histogram(bins=10) + scale_x_log10() +labs(x="Average Number of Fruits", y="Number of Studies", title = "Average Number of Fruits on Garlic Mustard") + theme_bw(base_size = 16)
NorthAmericaData <- GarlicMustardData %>% filter(Region == "NorthAm")
hist(NorthAmericaData$AvgAdultHeight)
ggplot(NorthAmericaData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height", title = "Average Adult Height of Garlic Mustard North America")
EuropeData <- GarlicMustardData %>% filter(Region == "Europe")
ggplot(EuropeData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height", title = "Average Adult Height of Garlic Mustard Europe")
mean(NorthAmericaData$AdultCount, na.rm = TRUE)
## [1] 126.2445
mean(EuropeData$AdultCount, na.rm = TRUE)
## [1] 107.3029
ggplot(NorthAmericaData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height", title = "Average Adult Height of Garlic Mustard North America")
ggplot(EuropeData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height", title = "Average Adult Height of Garlic Mustard Europe")
What do your histograms tell you about Garlic Mustard and the EICA hypothesis? How confident are you in your response?
The EICA hypothesis states that because Garlic Mustard arrived in North America ‘without its native insect’ herbivores, it can redirect the energy it would normally spend defending against them into “making bigger plants”. The histograms above do not show this: the European populations include taller plants than the North American ones, which contradicts the prediction that plants grow bigger where these insects (butterfly/moth larvae) are absent. Abundance tells a different story. The bars of a height histogram count populations in each height bin rather than individual plants, so the better comparison is the mean AdultCount calculated above, which is higher in North America (126.2) than in Europe (107.3). That is consistent with Garlic Mustard reaching higher numbers where its predators are absent.
Please write your command to calculate the mean height of plants in North America
mean(NorthAmericaData$AvgAdultHeight, na.rm = TRUE)
## [1] 67.11545
What happened? Why do you think this happened? (2 points)
At first the result was NA, because the AvgAdultHeight variable contains missing values. Adding “na.rm = TRUE” to the call tells R to drop the NA values before calculating the mean.
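The effect of na.rm can be seen on a small example. This is a minimal sketch using a made-up vector (the name `heights` and its values are illustrative, not taken from the lab data):

```r
# Toy vector standing in for a column with one missing value (not the lab data).
heights <- c(60, NA, 80, 70)

# A single NA makes the default mean unknown, so R returns NA.
mean_with_na <- mean(heights)

# na.rm = TRUE drops the NA first, leaving the mean of 60, 80 and 70.
mean_without_na <- mean(heights, na.rm = TRUE)

# Counting the missing values shows how many observations were dropped.
n_missing <- sum(is.na(heights))

print(mean_with_na)      # NA
print(mean_without_na)   # 70
print(n_missing)         # 1
```

Checking `sum(is.na(...))` alongside the mean is a useful habit, since it shows how much data the summary is actually based on.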
What are the mean Adult Heights for each region? (2 points)
mean(EuropeData$AvgAdultHeight, na.rm = TRUE)
## [1] 77.53073
Europe: 77.53073
North America: 67.11545
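Both regional means can also be obtained in a single call instead of filtering each region separately. The sketch below uses base R's tapply() on a made-up data frame; `toy` and its values are hypothetical and only borrow the column names Region and AvgAdultHeight from the lab data:

```r
# Hypothetical stand-in for the lab data: same column names, invented values.
toy <- data.frame(
  Region = c("Europe", "Europe", "NorthAm", "NorthAm", "NorthAm"),
  AvgAdultHeight = c(80, 70, 60, NA, 64)
)

# tapply() splits the heights by Region and applies mean() to each group;
# na.rm = TRUE is passed through to mean() so missing heights are ignored.
region_means <- tapply(toy$AvgAdultHeight, toy$Region, mean, na.rm = TRUE)

print(region_means)
```

On the toy data this gives 75 for Europe and 62 for NorthAm. Applied to GarlicMustardData, the same call should reproduce the two means reported above, assuming Region holds only those two values.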
So, do the histograms and means support EICA’s predictions about plant height? Why or why not?
No. The mean adult height is about 77.53 in Europe but only about 67.12 in North America. Even though Garlic Mustard’s insect predators are absent in North America, the plants there are not growing bigger, which is the opposite of what the EICA hypothesis predicts.
Are there other variables of the plant in this dataset besides height that might measure plant size or plant health? Why or why not? If you think there are measures that represent these things, please list them below. If not, what would you measure instead?
One candidate is the Average Number of Leaves (AvgNLeaves). A plant with many leaves is probably healthy, whereas one with few leaves may be showing signs of poor nutrition. Leaf number is also likely tied to plant size, because a healthier plant tends to grow normally and reach or surpass the average height, while a plant in poor health will probably not reach its full potential. Likewise, the Average Number of Fruits (AvgNFruits) can indicate health, since unhealthy plants usually produce little or no fruit while healthy ones do.
Please write your command to plot the relationship between annual precipitation (hint: look at the bioclim variables) and average adult plant height and paste the plot below.
ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point()
Does there look like there is a relationship? If so, does it make sense to you? Why or why not?
There seems to be no relationship between the two variables: the points are spread all over the graph, showing neither a positive nor a negative trend. This is surprising, since it suggests that annual precipitation (bio12) has little effect on the Average Adult Height of Garlic Mustard plants. Plants taller than 100 cm occur at both very low and very high levels of precipitation, which suggests that precipitation does not determine how tall a plant can grow.
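A visual impression of "no relationship" can be backed up with a correlation coefficient. The following is a minimal sketch on invented numbers; the vectors `precip` and `height` are hypothetical stand-ins for bio12 and AvgAdultHeight, not values from the dataset:

```r
# Invented values: five populations, one of them missing its height.
precip <- c(400, 600, 800, 1000, 1200)
height <- c(70, NA, 72, 68, 71)

# Number of populations with both measurements present.
n_complete <- sum(complete.cases(precip, height))

# Pearson correlation on the complete pairs only. Values near 0 mean
# little linear association; values near -1 or 1 mean a strong one.
r <- cor(precip, height, use = "complete.obs")

print(n_complete)
print(round(r, 2))
```

Pearson's r only captures linear trends, so a value near zero supports the reading of the scatterplot but does not rule out a curved relationship.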
Paste your finished boxplot into the worksheet and the code you used to create it. Please include a few sentences describing what your dependent and independent variables are, and if you think the plot suggests that there are differences or not. Why?
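# Name the aesthetics explicitly; na.rm is an argument of the geom, not an aesthetic.
BoxPlotGrowthBio1 <- ggplot(GarlicMustardData, aes(x = AvgAdultHeight, y = bio1))
# With a continuous x and no group, ggplot2 draws a single box summarizing bio1.
BoxPlotGrowthBio1 + geom_boxplot(na.rm = TRUE)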
The variables that I chose to analyze were the Average Adult Height and the Annual Mean Temperature (bio1). The independent variable is the temperature, because it is not affected by the other variables in the study. The height of the plant is the dependent variable, because plants need particular conditions to grow and develop, so height depends on the temperature where the plants are growing. As seen on the boxplot, most of the Garlic Mustard populations grow where the temperature ranges from about 8 to around 10.5.
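A boxplot compares groups, so a continuous independent variable such as temperature is usually cut into bins first, with the dependent variable on the y axis. The sketch below shows one way to do that in base R on invented data; the data frame `toy`, the break points (0, 8, 10.5, 20) and the labels are all hypothetical choices for illustration:

```r
# Invented data borrowing the column names bio1 and AvgAdultHeight.
toy <- data.frame(
  bio1 = c(5, 7, 9, 10, 12, 14),
  AvgAdultHeight = c(60, 64, 70, 74, 80, 76)
)

# cut() turns the continuous temperature into three labelled bins:
# (0, 8], (8, 10.5] and (10.5, 20].
toy$temp_bin <- cut(toy$bio1,
                    breaks = c(0, 8, 10.5, 20),
                    labels = c("cool", "mid", "warm"))

# One box of height per temperature bin. plot = FALSE returns the
# box statistics without drawing; remove it to see the plot itself.
box_stats <- boxplot(AvgAdultHeight ~ temp_bin, data = toy, plot = FALSE)

print(box_stats$names)  # bin labels, in order
print(box_stats$n)      # number of populations in each bin
```

If the boxes for the bins overlap heavily, that suggests little difference in height across temperatures; clearly separated boxes would suggest the opposite.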
Be sure to upload your work (both .rmd and .html) to Notebowl by the due date.