1 Question 1

A researcher was interested to see whether the bird density at a particular site can be explained by the total amount of foliage at that site. Data was collected at a random sample of 17 different California oak woodland sites in spring, during the bird breeding season.

The data is stored in the file Birds.csv and contains the variables:

Variable Description
Foliage An approximate measure of the total amount of foliage at a site. The units are called f.p. units (foliage profile units). The higher the f.p., the greater the amount of foliage.
Density A measure of the bird population density. It is simply the number of pairs of birds per hectare.

1.1 Question of interest/goal of the study

We wish to investigate the relationship between bird density and the total amount of foliage at California oak woodland sites.

1.2 Read in and inspect the data:

# Read the data, then plot the response (Density) against the explanatory variable (Foliage)
Birds.df=read.csv("Birds.csv", header=TRUE)
plot(Density~Foliage, main="Bird density versus Amount of Foliage",data=Birds.df)
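
Before plotting, it can help to confirm that the file was read as expected. The following is a minimal sketch, not part of the original analysis: `inspect_data` is a hypothetical helper, and the expected column names and row count come from the variable table and the stated sample of 17 sites.

```r
# Hypothetical helper: read a CSV and confirm it has the expected shape.
inspect_data <- function(path, expected_cols, expected_rows) {
  df <- read.csv(path, header = TRUE)
  str(df)               # column types and the first few values
  print(summary(df))    # ranges, to spot impossible values
  stopifnot(all(expected_cols %in% names(df)),
            nrow(df) == expected_rows)
  invisible(df)
}

# Usage (assumes Birds.csv is in the working directory):
# Birds.df <- inspect_data("Birds.csv", c("Foliage", "Density"), 17)
```

The function stops with an error if a column is missing or the row count differs, so a mistyped file name or a malformed file is caught before any model is fitted.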

1.3 Comment on the plots

The plot shows a positive, roughly linear relationship: as Foliage increases, Density also tends to increase. The scatter about the trend appears to be fairly constant and there do not appear to be any unusual data points.

1.4 Fit an appropriate linear model, including model checks and relevant output.

library(s20x)  # provides modelcheck()
Birds.lm=lm(Density~Foliage,data=Birds.df)
modelcheck(Birds.lm)

summary(Birds.lm)
## 
## Call:
## lm(formula = Density ~ Foliage, data = Birds.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1379 -1.3054  0.2054  1.1470  2.9159 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.6178     1.5568   1.039    0.315    
## Foliage       0.2560     0.0469   5.458  6.6e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.664 on 15 degrees of freedom
## Multiple R-squared:  0.6651, Adjusted R-squared:  0.6428 
## F-statistic: 29.79 on 1 and 15 DF,  p-value: 6.601e-05
confint(Birds.lm)
##                  2.5 %    97.5 %
## (Intercept) -1.7003172 4.9359358
## Foliage      0.1560233 0.3559404
confint(Birds.lm)*5
##                  2.5 %    97.5 %
## (Intercept) -8.5015861 24.679679
## Foliage      0.7801166  1.779702
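
The slope's t value, p-value and confidence interval in the output above can be reproduced by hand from the estimate and its standard error. The sketch below uses only numbers copied from the `summary()` output, so small rounding differences from `confint()` are expected. The variable names are illustrative.

```r
# Values copied from the summary(Birds.lm) output above.
est <- 0.2560   # slope estimate for Foliage
se  <- 0.0469   # its standard error
df  <- 15       # residual degrees of freedom (17 sites - 2 coefficients)

t_value <- est / se                       # about 5.46
p_value <- 2 * pt(-abs(t_value), df)      # two-sided p-value
t_mult  <- qt(0.975, df)                  # about 2.131
ci      <- est + c(-1, 1) * t_mult * se   # 95% CI for the slope
ci_5    <- 5 * ci                         # CI for a 5 f.p. unit increase

round(c(t_value = t_value, p_value = p_value), 5)
round(rbind(per_unit = ci, per_5_units = ci_5), 2)
```

Multiplying the interval by 5 is valid because the slope is the change in mean density per one f.p. unit, so a 5 unit increase simply scales both endpoints.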

1.5 Create a scatter plot with the fitted line from your model superimposed over it.

plot(Density~Foliage, main="Bird density versus Amount of Foliage",data=Birds.df)

abline(Birds.lm, col = "blue")
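
The superimposed line is the fitted equation evaluated at each foliage value. As a rough illustration, the sketch below evaluates it using the rounded coefficients from the `summary()` output. `fitted_density` is a hypothetical helper, and the foliage values 20 and 25 are illustrative inputs that are not taken from Birds.csv.

```r
# Coefficients copied from the summary(Birds.lm) output.
b0 <- 1.6178   # intercept
b1 <- 0.2560   # slope for Foliage

# Hypothetical helper: height of the fitted line at a given foliage value.
fitted_density <- function(foliage) b0 + b1 * foliage

# Illustrative foliage values only; they are not taken from Birds.csv.
fitted_density(c(20, 25))

# The line rises by 5 * b1 for every 5 f.p. unit increase.
fitted_density(25) - fitted_density(20)
```

With the model object available, `predict(Birds.lm, newdata = data.frame(Foliage = c(20, 25)))` gives the same heights from the unrounded coefficients. Values outside the observed foliage range should be treated as extrapolation.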

1.6 Method and Assumption Checks

Since the scatter plot shows a linear relationship, we have fitted a simple linear regression model to our data. We have a random sample of sites, so we assume the observations are independent of each other. The residuals show patternless scatter with fairly constant variability, so there is no evidence against the linearity or equal-variance assumptions. The normality checks don’t show any major problems and the Cook’s distance plot doesn’t reveal any unduly influential points. Overall, all the model assumptions appear to be satisfied.
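
The three checks described above can also be produced one at a time with base R. The following is a sketch under stated assumptions: `check_assumptions` is a hypothetical helper and not part of s20x, and the cutoff of 0.4 for Cook's distance is assumed here as a rule of thumb rather than taken from the analysis.

```r
# Hypothetical helper (not part of s20x): base-R versions of the three checks.
check_assumptions <- function(fit, cook_cutoff = 0.4) {
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))
  plot(fit, which = 1)   # residuals vs fitted: trend and constant scatter
  plot(fit, which = 2)   # normal Q-Q plot of the residuals
  plot(fit, which = 4)   # Cook's distance for each observation
  cooks <- cooks.distance(fit)
  # Return the observations whose Cook's distance exceeds the cutoff
  cooks[cooks > cook_cutoff]
}

# Usage with the model fitted above (requires Birds.lm in the workspace):
# check_assumptions(Birds.lm)
```

An empty result means no observation exceeds the chosen cutoff, which would be consistent with the conclusion that there are no unduly influential points.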

1.6.1 Complete the equation below:

Our model is:

\(Density_i=\beta_0 + \beta_1 \times Foliage_i+\epsilon_i\) where \(\epsilon_i \sim iid ~ N(0,\sigma^2)\)

1.6.2 Complete the statement

Our model explains 67% of the variation in the response variable.
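
The proportion of variation explained is the Multiple R-squared reported by `summary()` (0.6651), not the adjusted value (0.6428). As a sanity check, both can be recovered from the F-statistic and its degrees of freedom. The sketch below uses only numbers copied from the output, with illustrative variable names.

```r
# Values copied from the summary(Birds.lm) output.
f_stat <- 29.79   # F-statistic
df1    <- 1       # numerator df (one explanatory variable)
df2    <- 15      # residual df
n      <- 17      # number of sites

r2     <- (f_stat * df1) / (f_stat * df1 + df2)   # Multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / df2            # Adjusted R-squared

round(c(r2 = r2, adj_r2 = adj_r2), 4)
```

The adjusted value applies a penalty for the number of coefficients, which is why it is slightly smaller. It is mainly useful for comparing models with different numbers of explanatory variables.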

1.7 Executive Summary

We are interested in whether the total amount of foliage at California oak woodland sites can be used to explain bird density.

There is strong evidence that the total amount of foliage at California oak woodland sites can be used to explain bird density (p-value < 0.001). For a 5 unit increase in foliage profile units, we estimate with 95% confidence that the mean density increases by between 0.78 and 1.78 pairs of birds per hectare.