DA 101 Lab 1: Garlic mustard

This is the worksheet for the lab, worth 5% of your total course grade, with the parts to hand in as completed. Please hand it in (the .html) on Notebowl, along with your R code file (the .Rmd), before Wed 11:59pm next week.

While it is OK to talk through problems with a friend and to attend office hours to ask the Teaching Assistant for advice, your work should be your own, your answers should be reached based on your own understanding, and this assignment is meant to be completed individually.

summary(GarlicMustardData)
##    Pop_Code            Region          Collection_Date       Latitude    
##  Length:404         Length:404         Length:404         Min.   :33.11  
##  Class :character   Class :character   Class :character   1st Qu.:40.81  
##  Mode  :character   Mode  :character   Mode  :character   Median :43.72  
##                                                           Mean   :44.86  
##                                                           3rd Qu.:48.40  
##                                                           Max.   :57.02  
##                                                                          
##    Longitude           Altitude         Pop_Size      Pct_Canopy_Cover
##  Min.   :-123.406   Min.   :   0.0   Min.   :     3   Min.   :  0.00  
##  1st Qu.: -79.653   1st Qu.:  40.0   1st Qu.:    25   1st Qu.: 50.00  
##  Median : -71.759   Median : 164.0   Median :   100   Median : 70.00  
##  Mean   : -40.642   Mean   : 226.8   Mean   :  4601   Mean   : 65.09  
##  3rd Qu.:   8.908   3rd Qu.: 298.2   3rd Qu.:   450   3rd Qu.: 80.00  
##  Max.   :  42.015   Max.   :1711.5   Max.   :745000   Max.   :100.00  
##                                      NA's   :11       NA's   :13      
##     RosCount        AdultCount      RosDens          AdultDens     
##  Min.   :   0.0   Min.   :   0   Min.   :   0.00   Min.   :  0.00  
##  1st Qu.:  15.5   1st Qu.:  28   1st Qu.:   3.20   1st Qu.:  6.65  
##  Median :  72.0   Median :  61   Median :  16.58   Median : 14.80  
##  Mean   : 227.2   Mean   : 118   Mean   :  48.33   Mean   : 26.89  
##  3rd Qu.: 236.2   3rd Qu.: 142   3rd Qu.:  49.65   3rd Qu.: 30.05  
##  Max.   :6538.0   Max.   :1447   Max.   :1307.60   Max.   :413.43  
##                                                                    
##    TotalDens        AvgRosWidth      AvgAdultHeight      AvgNLeaves     
##  Min.   :   0.00   Min.   : 0.3643   Min.   :  5.293   Min.   :  0.000  
##  1st Qu.:  21.00   1st Qu.: 4.5068   1st Qu.: 55.600   1st Qu.:  7.814  
##  Median :  42.14   Median : 6.8226   Median : 71.333   Median : 10.538  
##  Mean   :  75.22   Mean   : 7.9126   Mean   : 71.845   Mean   : 13.931  
##  3rd Qu.:  88.65   3rd Qu.: 9.4900   3rd Qu.: 86.865   3rd Qu.: 15.767  
##  Max.   :1357.40   Max.   :46.2563   Max.   :148.600   Max.   :173.500  
##                    NA's   :41        NA's   :23        NA's   :34       
##    AvgNFruits          Herb             bio1             bio2       
##  Min.   :  0.00   Min.   :0.0000   Min.   : 3.540   Min.   : 6.213  
##  1st Qu.: 11.62   1st Qu.:0.1012   1st Qu.: 8.638   1st Qu.: 8.553  
##  Median : 20.21   Median :0.2468   Median : 9.511   Median :10.218  
##  Mean   : 31.53   Mean   :0.3280   Mean   : 9.671   Mean   :10.139  
##  3rd Qu.: 39.88   3rd Qu.:0.5041   3rd Qu.:10.771   3rd Qu.:11.437  
##  Max.   :421.00   Max.   :1.0000   Max.   :16.787   Max.   :15.799  
##  NA's   :19       NA's   :40                                        
##       bio3             bio4              bio5            bio6        
##  Min.   :0.2075   Min.   :0.01286   Min.   :18.46   Min.   :-17.732  
##  1st Qu.:0.2964   1st Qu.:0.02391   1st Qu.:23.33   1st Qu.: -8.265  
##  Median :0.3175   Median :0.02933   Median :26.57   Median : -6.327  
##  Mean   :0.3180   Mean   :0.02740   Mean   :26.24   Mean   : -5.981  
##  3rd Qu.:0.3315   3rd Qu.:0.03088   3rd Qu.:28.98   3rd Qu.: -3.465  
##  Max.   :0.4253   Max.   :0.04070   Max.   :34.37   Max.   :  4.429  
##                                                                      
##       bio7            bio8             bio9             bio10      
##  Min.   :16.95   Min.   :-1.519   Min.   :-8.9392   Min.   :13.74  
##  1st Qu.:27.29   1st Qu.:13.465   1st Qu.:-1.2444   1st Qu.:17.16  
##  Median :34.29   Median :17.038   Median : 0.4213   Median :19.50  
##  Mean   :32.22   Mean   :15.631   Mean   : 2.6439   Mean   :19.37  
##  3rd Qu.:36.66   3rd Qu.:19.256   3rd Qu.: 3.9786   3rd Qu.:21.58  
##  Max.   :45.38   Max.   :23.993   Max.   :23.7206   Max.   :25.85  
##                                                                    
##      bio11             bio12            bio13           bio14       
##  Min.   :-9.6822   Min.   : 409.0   Min.   :12.00   Min.   : 0.000  
##  1st Qu.:-1.8492   1st Qu.: 752.0   1st Qu.:19.23   1st Qu.: 8.292  
##  Median :-0.2288   Median : 959.0   Median :24.63   Median :11.788  
##  Mean   :-0.4633   Mean   : 925.8   Mean   :23.71   Mean   :12.325  
##  3rd Qu.: 1.1212   3rd Qu.:1123.0   3rd Qu.:26.70   3rd Qu.:17.116  
##  Max.   : 9.0717   Max.   :1530.4   Max.   :55.70   Max.   :22.647  
##                                                                     
##      bio15             bio16           bio17              bio18         
##  Min.   :0.06064   Min.   :141.6   Min.   :  0.4583   Min.   :  0.5635  
##  1st Qu.:0.11296   1st Qu.:230.9   1st Qu.:125.7550   1st Qu.:225.2529  
##  Median :0.17024   Median :303.5   Median :172.6889   Median :277.3571  
##  Mean   :0.21134   Mean   :287.7   Mean   :178.2972   Mean   :262.7103  
##  3rd Qu.:0.27428   3rd Qu.:324.5   3rd Qu.:250.4504   3rd Qu.:308.4265  
##  Max.   :1.04230   Max.   :674.0   Max.   :304.0691   Max.   :389.3464  
##                                                                         
##      bio19      
##  Min.   : 46.8  
##  1st Qu.:153.1  
##  Median :194.6  
##  Mean   :198.7  
##  3rd Qu.:260.8  
##  Max.   :637.3  
## 
myvars <- c("Latitude", "Longitude", "Altitude")

GarlicMustardGeo <- GarlicMustardData[myvars]

summary(GarlicMustardGeo)
##     Latitude       Longitude           Altitude     
##  Min.   :33.11   Min.   :-123.406   Min.   :   0.0  
##  1st Qu.:40.81   1st Qu.: -79.653   1st Qu.:  40.0  
##  Median :43.72   Median : -71.759   Median : 164.0  
##  Mean   :44.86   Mean   : -40.642   Mean   : 226.8  
##  3rd Qu.:48.40   3rd Qu.:   8.908   3rd Qu.: 298.2  
##  Max.   :57.02   Max.   :  42.015   Max.   :1711.5
GarlicMustard_subset <- select(GarlicMustardData,1,5:8)
  
summary(GarlicMustard_subset)
##    Pop_Code           Longitude           Altitude         Pop_Size     
##  Length:404         Min.   :-123.406   Min.   :   0.0   Min.   :     3  
##  Class :character   1st Qu.: -79.653   1st Qu.:  40.0   1st Qu.:    25  
##  Mode  :character   Median : -71.759   Median : 164.0   Median :   100  
##                     Mean   : -40.642   Mean   : 226.8   Mean   :  4601  
##                     3rd Qu.:   8.908   3rd Qu.: 298.2   3rd Qu.:   450  
##                     Max.   :  42.015   Max.   :1711.5   Max.   :745000  
##                                                         NA's   :11      
##  Pct_Canopy_Cover
##  Min.   :  0.00  
##  1st Qu.: 50.00  
##  Median : 70.00  
##  Mean   : 65.09  
##  3rd Qu.: 80.00  
##  Max.   :100.00  
##  NA's   :13
GM_filtered <- GarlicMustardData %>%
               filter(TotalDens>=4 & Altitude >=100)

summary(GM_filtered)
##    Pop_Code            Region          Collection_Date       Latitude    
##  Length:245         Length:245         Length:245         Min.   :33.11  
##  Class :character   Class :character   Class :character   1st Qu.:40.51  
##  Mode  :character   Mode  :character   Mode  :character   Median :41.97  
##                                                           Mean   :43.74  
##                                                           3rd Qu.:47.40  
##                                                           Max.   :57.02  
##                                                                          
##    Longitude          Altitude         Pop_Size      Pct_Canopy_Cover
##  Min.   :-123.35   Min.   : 102.0   Min.   :     3   Min.   : 0.00   
##  1st Qu.: -84.58   1st Qu.: 191.0   1st Qu.:    30   1st Qu.:60.00   
##  Median : -73.83   Median : 265.7   Median :   100   Median :70.00   
##  Mean   : -42.88   Mean   : 347.0   Mean   :  7207   Mean   :67.17   
##  3rd Qu.:  14.43   3rd Qu.: 380.0   3rd Qu.:   750   3rd Qu.:83.50   
##  Max.   :  42.02   Max.   :1711.5   Max.   :745000   Max.   :99.00   
##                                     NA's   :4        NA's   :10      
##     RosCount        AdultCount       RosDens          AdultDens     
##  Min.   :   0.0   Min.   :  0.0   Min.   :   0.00   Min.   :  0.00  
##  1st Qu.:  12.0   1st Qu.: 23.0   1st Qu.:   2.60   1st Qu.:  5.20  
##  Median :  69.0   Median : 54.0   Median :  14.50   Median : 12.00  
##  Mean   : 240.2   Mean   :107.7   Mean   :  50.62   Mean   : 24.75  
##  3rd Qu.: 264.0   3rd Qu.:122.0   3rd Qu.:  55.20   3rd Qu.: 27.60  
##  Max.   :6538.0   Max.   :952.0   Max.   :1307.60   Max.   :190.40  
##                                                                     
##    TotalDens        AvgRosWidth      AvgAdultHeight     AvgNLeaves    
##  Min.   :   4.20   Min.   : 0.3643   Min.   :  9.00   Min.   : 0.000  
##  1st Qu.:  19.00   1st Qu.: 4.6422   1st Qu.: 56.63   1st Qu.: 7.584  
##  Median :  40.00   Median : 6.8000   Median : 74.44   Median :10.628  
##  Mean   :  75.36   Mean   : 7.9430   Mean   : 73.22   Mean   :14.524  
##  3rd Qu.:  88.60   3rd Qu.: 9.5243   3rd Qu.: 88.18   3rd Qu.:18.654  
##  Max.   :1357.40   Max.   :46.2563   Max.   :148.60   Max.   :71.611  
##                    NA's   :26        NA's   :16       NA's   :19      
##    AvgNFruits          Herb             bio1             bio2       
##  Min.   :  0.00   Min.   :0.0000   Min.   : 3.742   Min.   : 6.377  
##  1st Qu.: 12.78   1st Qu.:0.1083   1st Qu.: 8.511   1st Qu.: 9.471  
##  Median : 21.60   Median :0.2558   Median : 9.555   Median :10.611  
##  Mean   : 34.05   Mean   :0.3417   Mean   : 9.601   Mean   :10.613  
##  3rd Qu.: 48.16   3rd Qu.:0.5359   3rd Qu.:10.611   3rd Qu.:11.881  
##  Max.   :269.71   Max.   :1.0000   Max.   :16.787   Max.   :15.799  
##  NA's   :14       NA's   :22                                        
##       bio3             bio4              bio5            bio6        
##  Min.   :0.2075   Min.   :0.01314   Min.   :18.46   Min.   :-17.210  
##  1st Qu.:0.2960   1st Qu.:0.02523   1st Qu.:24.50   1st Qu.: -8.874  
##  Median :0.3159   Median :0.03004   Median :27.07   Median : -6.826  
##  Mean   :0.3155   Mean   :0.02876   Mean   :26.87   Mean   : -6.967  
##  3rd Qu.:0.3268   3rd Qu.:0.03143   3rd Qu.:29.16   3rd Qu.: -5.242  
##  Max.   :0.4143   Max.   :0.04070   Max.   :34.37   Max.   :  4.429  
##                                                                      
##       bio7            bio8             bio9             bio10      
##  Min.   :17.16   Min.   : 1.989   Min.   :-8.9392   Min.   :13.74  
##  1st Qu.:30.16   1st Qu.:15.566   1st Qu.:-1.9976   1st Qu.:17.59  
##  Median :35.83   Median :17.651   Median :-0.4653   Median :19.74  
##  Mean   :33.84   Mean   :16.845   Mean   : 0.7556   Mean   :19.72  
##  3rd Qu.:37.33   3rd Qu.:19.376   3rd Qu.: 1.1841   3rd Qu.:21.58  
##  Max.   :45.38   Max.   :23.993   Max.   :23.7206   Max.   :25.85  
##                                                                    
##      bio11             bio12            bio13           bio14      
##  Min.   :-9.6251   Min.   : 409.0   Min.   :12.00   Min.   : 0.00  
##  1st Qu.:-2.5187   1st Qu.: 752.0   1st Qu.:21.16   1st Qu.: 6.67  
##  Median :-0.7929   Median : 959.0   Median :24.87   Median :11.06  
##  Mean   :-1.0962   Mean   : 912.7   Mean   :24.06   Mean   :11.61  
##  3rd Qu.: 0.1578   3rd Qu.:1101.0   3rd Qu.:26.68   3rd Qu.:15.69  
##  Max.   : 9.0717   Max.   :1394.0   Max.   :42.21   Max.   :22.65  
##                                                                    
##      bio15             bio16           bio17              bio18         
##  Min.   :0.06064   Min.   :141.6   Min.   :  0.4583   Min.   :  0.5635  
##  1st Qu.:0.13892   1st Qu.:257.0   1st Qu.:104.0951   1st Qu.:234.9915  
##  Median :0.19990   Median :304.6   Median :170.9110   Median :289.1817  
##  Mean   :0.24540   Mean   :291.7   Mean   :168.5065   Mean   :272.6435  
##  3rd Qu.:0.33071   3rd Qu.:331.4   3rd Qu.:226.3498   3rd Qu.:313.0674  
##  Max.   :1.04230   Max.   :485.7   Max.   :304.0691   Max.   :389.3464  
##                                                                         
##      bio19      
##  Min.   : 46.8  
##  1st Qu.:117.6  
##  Median :177.6  
##  Mean   :180.7  
##  3rd Qu.:231.3  
##  Max.   :484.0  
## 
# The filtered table above keeps only the populations with a total density (TotalDens) of at least 4 and an altitude of at least 100.

## Plotting information using ggplot

ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point()

This chart compares annual precipitation (bio12) with the average height of adult Garlic Mustard plants.

ggplot(GarlicMustardData, aes(x=Region, y=AvgNFruits)) +
                          geom_boxplot() + 
                          scale_y_log10() +
                          labs(title = "Average Number of Fruits on Garlic Mustard")

# European Garlic Mustard populations seem to produce more fruits per plant than North American ones, as seen in the boxplot.
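To put a number on the difference the boxplot suggests, the median of AvgNFruits can be compared for each region. The sketch below uses base R on a small made-up data frame (`toy`), not the survey data. The helper name `median_by_region` is hypothetical.

```r
# Hypothetical helper: median of one numeric column within each Region,
# ignoring missing values (AvgNFruits has NAs in the real data).
median_by_region <- function(df, col) {
  tapply(df[[col]], df$Region, median, na.rm = TRUE)
}

# Made-up example rows, NOT the real survey data
toy <- data.frame(
  Region     = c("Europe", "Europe", "Europe", "NorthAm", "NorthAm", "NorthAm"),
  AvgNFruits = c(30, 45, NA, 12, 20, 18)
)

res <- median_by_region(toy, "AvgNFruits")
res
# With the lab data: median_by_region(GarlicMustardData, "AvgNFruits")
```

The median fits here because the fruit counts are strongly right-skewed (the maximum is 421 while the median is about 20), and a few very large values would pull the mean upward.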

ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point() + scale_y_log10()

ggplot(GarlicMustardData, aes(x = bio12, y = AvgAdultHeight)) + 
      geom_point(size = 3, color = "red") + 
      scale_y_log10() +
      labs(x = "Annual Precipitation (bioclim)", y = "Average Adult Height (cm)", title = "Garlic Mustard in Europe and North America") + 
      annotation_logticks(sides = "l") +   # "l" = left axis, where the log10 scale is
      theme_bw()

# Precipitation does not seem to have much influence on the plants' height.

ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight, color = Region)) + geom_point()

## Facet specification

ggplot(GarlicMustardData, aes(x=bio12, y=AvgAdultHeight)) + geom_point() + facet_wrap(~Region)

ggplot(GarlicMustardData, aes(x=Region)) + geom_bar()

ggplot(GarlicMustardData, aes(x=AvgNFruits)) + geom_histogram()

# Garlic Mustard plants do not seem to produce many fruits: the histogram shows that most populations average about 20 fruits per plant.
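The "about 20 fruits" reading can be checked numerically. A second check is how many zero values a log10 axis would silently drop, because log10(0) is -Inf. The sketch below uses a made-up vector (`toy_fruits`), and the helper name `describe_skewed` is hypothetical.

```r
# Hypothetical helper: median, quartiles, and number of zeros
# (zeros cannot be shown on a log10 axis) for a skewed count variable.
describe_skewed <- function(x) {
  x <- x[!is.na(x)]                 # drop missing values first
  c(median = median(x),
    q1     = unname(quantile(x, 0.25)),
    q3     = unname(quantile(x, 0.75)),
    n_zero = sum(x == 0))
}

toy_fruits <- c(0, 5, 12, 20, 21, 25, 40, 400, NA)   # made-up values
out <- describe_skewed(toy_fruits)
out
# With the lab data: describe_skewed(GarlicMustardData$AvgNFruits)
```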

ggplot(GarlicMustardData, aes(x=AvgNFruits, fill= Region)) + 
      geom_histogram(bins=10) + scale_x_log10() +
      labs(x="Average Number of Fruits", y="Number of Populations", title = "Average Number of Fruits on Garlic Mustard") +
      theme_bw(base_size = 16)

NorthAmericaData <- GarlicMustardData %>% filter(Region == "NorthAm")


hist(NorthAmericaData$AvgAdultHeight)

ggplot(NorthAmericaData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height (cm)", title = "Average Adult Height of Garlic Mustard in North America")

EuropeData <- GarlicMustardData %>% filter(Region == "Europe")

ggplot(EuropeData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height (cm)", title = "Average Adult Height of Garlic Mustard in Europe")

mean(NorthAmericaData$AdultCount, na.rm = TRUE)
## [1] 126.2445
mean(EuropeData$AdultCount, na.rm = TRUE)
## [1] 107.3029

These lines calculate the mean count of adult Garlic Mustard plants per population in each region, so that we can compare more precisely whether there are more plants in Europe or in North America.
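Instead of filtering each region separately, both means can be computed in one step. Reporting how many non-missing values each mean is based on also helps. The sketch below is a base-R version on a made-up data frame. The helper name `summarise_by_region` is hypothetical, and dplyr's `group_by()` plus `summarise()` would be an equivalent route.

```r
# Hypothetical helper: mean of a column and the number of non-missing
# values in each Region, returned as a small data frame.
summarise_by_region <- function(df, col) {
  x     <- df[[col]]
  means <- tapply(x, df$Region, mean, na.rm = TRUE)
  ns    <- tapply(!is.na(x), df$Region, sum)   # values actually averaged
  data.frame(Region = names(means), mean = as.numeric(means), n = as.numeric(ns))
}

# Made-up rows, NOT the real survey data
toy <- data.frame(
  Region     = c("Europe", "Europe", "NorthAm", "NorthAm", "NorthAm"),
  AdultCount = c(100, 120, 130, NA, 110)
)

res <- summarise_by_region(toy, "AdultCount")
res
# With the lab data: summarise_by_region(GarlicMustardData, "AdultCount")
```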

  1. Paste both of your ggplot histograms (Europe & North America) below. Add a title to your graph, or include a caption so it is obvious which one is which.
ggplot(NorthAmericaData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height (cm)", title = "Average Adult Height of Garlic Mustard in North America")

ggplot(EuropeData, aes(AvgAdultHeight)) + geom_histogram() + labs(y="Count", x="Average Adult Height (cm)", title = "Average Adult Height of Garlic Mustard in Europe")

  1. What do your histograms tell you about Garlic Mustard and the EICA hypothesis? How confident are you in your response?

     The EICA (Evolution of Increased Competitive Ability) hypothesis states that because Garlic Mustard arrived in North America ‘without its native insect’, it no longer needs to spend energy fighting off these herbivores and can instead put that energy into “making bigger plants”. However, the histograms above show bigger Garlic Mustard plants in Europe. This contradicts the prediction that plants grow bigger in an environment without these insects (butterfly/moth larvae). The histograms also differ in another way. The North American histogram has one bin with a count above 20 and several between 15 and 20, while the European histogram stays at or below 15. Because the y-axis counts populations in each height bin, this difference mostly reflects how many populations were sampled in each region rather than how many plants grow there. The mean adult counts calculated above (about 126 per population in North America versus about 107 in Europe) are a more direct comparison. They suggest that North American populations may contain somewhat more adult plants, which would fit the idea that Garlic Mustard can reach higher numbers without its native herbivores.

  2. Please write your command to calculate the mean height of plants in North America

mean(NorthAmericaData$AvgAdultHeight, na.rm = TRUE)
## [1] 67.11545
  1. What happened? Why do you think this happened? (2 points)

     At first, the result was NA because the AvgAdultHeight variable has missing values, and mean() returns NA whenever any of its inputs is NA. Adding na.rm = TRUE tells R to remove the NA values before calculating, so the mean is computed from the observed heights only.
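The behaviour described above can be reproduced on a tiny made-up vector. This is an illustration only and uses none of the lab data.

```r
heights <- c(60, 75, NA, 90)   # made-up values with one missing entry

mean(heights)                  # NA: one unknown value makes the mean unknown
mean(heights, na.rm = TRUE)    # 75: mean of the three observed values
sum(is.na(heights))            # 1: how many values na.rm = TRUE dropped
```

Checking `sum(is.na(...))` is useful because `na.rm = TRUE` drops values silently. With the lab data, `sum(is.na(NorthAmericaData$AvgAdultHeight))` would show how many populations were left out of the mean.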

  2. What are the mean Adult Heights for each region? (2 points)

mean(EuropeData$AvgAdultHeight, na.rm = TRUE)
## [1] 77.53073
  1. So, do the histograms and means support EICA’s predictions about plant height? Why or why not?

     They do not support EICA’s predictions about the height of Garlic Mustard plants. The mean adult height is about 77.53 in Europe but only about 67.12 in North America. So even though the insects that feed on Garlic Mustard in its native range are absent in North America, the plants there are not growing taller, as the EICA hypothesis would predict.
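The two means alone do not show how confident we can be that the difference is more than sampling noise. A Welch two-sample t-test is one common way to check this; it is an addition here and not part of the lab's instructions. The sketch below runs the test on made-up numbers. With the lab data, the two vectors would be `EuropeData$AvgAdultHeight` and `NorthAmericaData$AvgAdultHeight`.

```r
# Made-up heights for illustration only, NOT the survey data
na_heights <- c(60, 65, 70, 72, NA)
eu_heights <- c(75, 78, 80, 82)

# t.test() drops NA values; var.equal = FALSE (the default) gives Welch's test,
# which does not assume the two regions have equal variances.
res <- t.test(eu_heights, na_heights)
res$estimate   # the two group means
res$p.value    # a small value suggests the difference is unlikely to be chance alone
```

A t-test also assumes roughly normal sampling distributions of the means. With a few hundred populations this is usually reasonable, but the histograms are worth checking for strong skew first.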

  2. Are there other variables of the plant in this dataset besides height that might measure plant size or plant health? Why or why not? If you think there are measures that represent these things, please list them below. If not, what would you measure instead?

     The Average Number of Leaves (AvgNLeaves) could measure plant health. A plant with plenty of leaves is probably healthy, whereas one that lacks leaves may be short of nutrients. This variable may also relate to plant size, because a healthy plant tends to grow normally and reach or surpass the average plant height, while an unhealthy plant will probably not reach its full potential. Likewise, the number of fruits (AvgNFruits) can indicate health, since unhealthy plants often produce few or no fruits while healthy ones do. Both variables can therefore help assess the health and wellness of the plants.
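If leaves and fruits really track plant size or health, they should be positively correlated with height. The sketch below checks this on a made-up data frame. `use = "pairwise.complete.obs"` is a real option of `cor()` that drops missing values pair by pair.

```r
# Candidate size/health measures from the dataset
size_vars <- c("AvgAdultHeight", "AvgNLeaves", "AvgNFruits")

# Made-up rows, NOT the real survey data
toy <- data.frame(
  AvgAdultHeight = c(50, 60, 70, 80, NA),
  AvgNLeaves     = c(8, 9, 12, 14, 10),
  AvgNFruits     = c(10, 15, 18, 30, 20)
)

# Correlation matrix; each pair uses the rows where both values are present
m <- cor(toy[size_vars], use = "pairwise.complete.obs")
round(m, 2)
# With the lab data: cor(GarlicMustardData[size_vars], use = "pairwise.complete.obs")
```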

  3. Please write your command to plot the relationship between annual precipitation (hint: look at the bioclim variables) and average adult plant height and paste the plot below.

ggplot(GarlicMustardData, aes(x = bio12, y = AvgAdultHeight)) + geom_point() +
  labs(x = "Annual Precipitation (bio12)", y = "Average Adult Height")

  1. Does it look like there is a relationship? If so, does it make sense to you? Why or why not?

     There seems to be no clear relationship between the two variables, because the points are spread across the whole graph without a positive or negative trend. This surprised me, since the plot suggests that annual precipitation (bio12) has little effect on the average adult height of Garlic Mustard plants. For example, some populations have average heights above 100 at both very low and very high precipitation, which suggests that precipitation does not play a major role in how tall the plants grow.
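"No clear trend" can be put into a number with the Pearson correlation coefficient r, or the R² of a simple linear fit. Values near 0 mean precipitation explains little of the variation in height. The sketch below uses a made-up data frame, and the real call is shown in the final comment.

```r
# Made-up rows, NOT the real survey data
toy <- data.frame(
  bio12          = c(450, 700, 900, 1100, 1300, 800),
  AvgAdultHeight = c(70, 55, 90, 60, 75, NA)
)

# Pearson correlation, using only rows where both values are present
r <- cor(toy$bio12, toy$AvgAdultHeight, use = "complete.obs")

# Simple linear fit; lm() drops the NA row by default (na.action = na.omit)
fit <- lm(AvgAdultHeight ~ bio12, data = toy)

r
summary(fit)$r.squared   # with one predictor this equals r^2
# With the lab data:
# cor(GarlicMustardData$bio12, GarlicMustardData$AvgAdultHeight, use = "complete.obs")
```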

  2. Paste your finished boxplot into the worksheet and the code you used to create it. Please include a few sentences describing what your dependent and independent variables are, and if you think the plot suggests that there are differences or not. Why?

# na.rm = TRUE is a layer argument, so it goes in geom_boxplot(), not in aes()
BoxPlotGrowthBio1 <- ggplot(GarlicMustardData, aes(x = AvgAdultHeight, y = bio1))

BoxPlotGrowthBio1 + geom_boxplot(na.rm = TRUE)

The variables that I chose to analyze were the Average Adult Height and the Annual Mean Temperature (bio1). The independent variable in this case is the temperature, because it is not affected by the other variables in the study. The height of the plant, in contrast, depends on the temperature where the plants grow, since plants need certain conditions in order to grow and develop. As the boxplot shows, the box, which covers the middle half of the Garlic Mustard populations, spans annual mean temperatures from about 8 to around 10.5.
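A boxplot compares a numeric variable across groups. To show height against temperature, rather than only the spread of temperature, one option is to group bio1 into bins first. The sketch below uses base R's `cut()` and `boxplot()` on a made-up data frame. The 2-degree bin width and the column name `temp_bin` are arbitrary illustrative choices.

```r
# Made-up rows, NOT the real survey data
toy <- data.frame(
  bio1           = c(4.1, 5.5, 7.9, 8.6, 9.2, 9.9, 10.4, 12.3, 14.8),
  AvgAdultHeight = c(55, 60, 70, 72, 68, 80, 75, 65, 58)
)

# Group annual mean temperature into 2-degree bins: (4,6], (6,8], ..., (14,16]
toy$temp_bin <- cut(toy$bio1, breaks = seq(4, 16, by = 2))
table(toy$temp_bin)   # number of populations per bin

# One box of heights per temperature bin (plot = FALSE returns the statistics only)
stats <- boxplot(AvgAdultHeight ~ temp_bin, data = toy, plot = FALSE)
stats$n
```

With ggplot2, `cut_width()` inside `aes()` does a similar binning. For example, `ggplot(GarlicMustardData, aes(x = cut_width(bio1, 2), y = AvgAdultHeight)) + geom_boxplot(na.rm = TRUE)` would put temperature on the x-axis as the independent variable and height on the y-axis as the dependent one. This call has not been run on the lab data here.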

Be sure to upload your work (both .rmd and .html) to Notebowl by the due date.