Formative Assessment

Author

Erica Bass

Loading the necessary packages

library(tidyverse) # loads the tidyverse, which includes ggplot2 and dplyr, needed for manipulating data and producing graphs

Importing the data

Note

Remember to set the session’s working directory before attempting to import data.

deer.data <- read.table("C:\\Users\\erica\\Documents\\Deer Data Set.txt") # imports text file as a dataframe

as_tibble(deer.data) # prints the data frame as a tibble to preview the first rows

# A tibble: 33 × 3
   V1       V2    V3   
   <chr>    <chr> <chr>
 1 woodland roe   sika 
 2 120      832   1082 
 3 153      1010  1212 
 4 171      1032  1548 
 5 295      1001  1301 
 6 307      947   1136 
 7 325      1006  1509 
 8 336      928   1206 
 9 422      1015  1218 
10 498      840   1260 
# ℹ 23 more rows
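
The first row of the tibble above holds the column names (woodland, roe, sika), which is why every column was read as character. As a minimal sketch of an alternative import, assuming the first line of the file is that header row, `header = TRUE` tells `read.table()` to use it for the column names. The inline text below is copied from the first rows printed above and only stands in for the real file; `deer.text` and `deer.sample` are illustrative names.

```r
# Stand-in for the real file: the header line plus the first three data rows shown above
deer.text <- "woodland roe sika
120 832 1082
153 1010 1212
171 1032 1548"

# header = TRUE uses the first line as column names,
# so the remaining rows can be parsed as numbers rather than text
deer.sample <- read.table(text = deer.text, header = TRUE)

str(deer.sample) # shows the column names and types
```

With the real file, the same argument would be added to the existing `read.table()` call, and the later renaming and row-removal steps would need adjusting to match.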

Adjusting the data table

deer.data <- deer.data %>% # overwrites deer.data with the altered version
  rename(Woodland = V1, Roe = V2, Sika = V3) %>% # renames column headers
  mutate(Row = row_number()) %>% # adds new variable of row numbers (counted before row 1 is removed)
  filter(row_number() %in% c(2:33)) # keeps rows 2 to 33, removing row 1 (the original header row)

Exploring the data

deer.data %>%
  str() # tells you which types of variables you have

'data.frame':   32 obs. of  4 variables:
 $ Woodland: chr  "120" "153" "171" "295" ...
 $ Roe     : chr  "832" "1010" "1032" "1001" ...
 $ Sika    : chr  "1082" "1212" "1548" "1301" ...
 $ Row     : int  2 3 4 5 6 7 8 9 10 11 ...

The output above shows three character (string) variables and one integer variable.

Exploring the data will be easier if the variables for the numbers of Roe and Sika deer are converted from character to numeric.

deer.data <- deer.data %>% # overwrites deer.data with the altered version
  mutate(across(c(Roe, Sika), as.numeric)) # converts the Roe and Sika columns from character to numeric

deer.data %>%
  str()

'data.frame':   32 obs. of  4 variables:
 $ Woodland: chr  "120" "153" "171" "295" ...
 $ Roe     : num  832 1010 1032 1001 947 ...
 $ Sika    : num  1082 1212 1548 1301 1136 ...
 $ Row     : int  2 3 4 5 6 7 8 9 10 11 ...
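
`as.numeric()` turns any value it cannot parse into `NA` and raises only a warning, so it can be worth counting missing values after a conversion like the one above. The sketch below uses made-up strings purely for illustration (they are not from the deer data set); `counts.chr` and `counts.num` are hypothetical names.

```r
# Hypothetical values for illustration only: the last one is not a valid number
counts.chr <- c("832", "1010", "n/a")

# suppressWarnings() hides the "NAs introduced by coercion" warning for this demo
counts.num <- suppressWarnings(as.numeric(counts.chr))

sum(is.na(counts.num)) # counts the values that failed to convert
```

Running the same `sum(is.na(...))` check on the converted Roe and Sika columns would confirm whether any counts were lost.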

Now a statistical summary of the numbers of Roe and Sika deer can be produced.

deer.data %>%
  summary()

   Woodland              Roe              Sika           Row       
 Length:32          Min.   : 701.0   Min.   : 841   Min.   : 2.00  
 Class :character   1st Qu.: 840.0   1st Qu.:1076   1st Qu.: 9.75  
 Mode  :character   Median : 916.0   Median :1210   Median :17.50  
                    Mean   : 905.5   Mean   :1203   Mean   :17.50  
                    3rd Qu.:1002.2   3rd Qu.:1303   3rd Qu.:25.25  
                    Max.   :1062.0   Max.   :1593   Max.   :33.00

Analysing the data

As both of the variables of interest are quantitative, I would use a linear regression to analyse the data. This would estimate how the number of Sika deer changes with the number of Roe deer at a site, and the R-squared value would indicate how strong that relationship is.

A scatter plot with a fitted regression line and its confidence band can be produced to visualise this.
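
A minimal sketch of how such a regression might be fitted with base R's `lm()`. It uses only the first nine sites printed in the tibble earlier, so the numbers it produces are illustrative and not results for the full data set; `deer.sample` and `deer.model` are names introduced here.

```r
# First nine sites, copied from the tibble printed earlier in this document
deer.sample <- data.frame(
  Roe  = c(832, 1010, 1032, 1001, 947, 1006, 928, 1015, 840),
  Sika = c(1082, 1212, 1548, 1301, 1136, 1509, 1206, 1218, 1260)
)

# Fit Sika counts as a linear function of Roe counts
deer.model <- lm(Sika ~ Roe, data = deer.sample)

coef(deer.model)              # intercept and slope of the fitted line
summary(deer.model)$r.squared # proportion of variance in Sika explained by Roe
```

On the full data set, `deer.data` would take the place of `deer.sample`.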

ggplot(deer.data, aes(x = Roe,
                      y = Sika)) + # determines position for each variable
  geom_point() + # produces scatter plot
  geom_smooth(method = "lm", # adds regression line
              se = TRUE) + # adds confidence band (95% by default) around the regression line
  labs(x = "Number of Roe Deer",
       y = "Number of Sika Deer",
       caption = "Figure 1. A comparison between the number of Sika and Roe deer across 32 woodland sites.")

Asking questions

Example statistical hypotheses:

There is a higher abundance of Sika deer in woodland habitats with large Roe deer populations.
Woodland habitats contain larger numbers of Sika deer than Roe deer.
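
As a hedged sketch, these two hypotheses could be tested with a one-sided correlation test and a one-sided paired t-test. The choice of tests is an assumption made here, not something stated in the analysis above, and both assume roughly normally distributed counts. Only the first nine sites printed earlier are used, so the output is illustrative; `cor.result` and `t.result` are names introduced here.

```r
# First nine sites, copied from the tibble printed earlier in this document
deer.sample <- data.frame(
  Roe  = c(832, 1010, 1032, 1001, 947, 1006, 928, 1015, 840),
  Sika = c(1082, 1212, 1548, 1301, 1136, 1509, 1206, 1218, 1260)
)

# Hypothesis 1: Sika counts rise with Roe counts (positive Pearson correlation)
cor.result <- cor.test(deer.sample$Roe, deer.sample$Sika,
                       alternative = "greater")

# Hypothesis 2: sites hold more Sika than Roe
# (paired, because both counts come from the same woodland)
t.result <- t.test(deer.sample$Sika, deer.sample$Roe,
                   paired = TRUE, alternative = "greater")

cor.result$estimate # sample correlation coefficient
t.result$estimate   # mean of the per-site differences (Sika minus Roe)
t.result$p.value    # one-sided p-value for the paired comparison
```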

Example scientific hypothesis:

Greater food availability in larger woodlands increases the total abundance of Roe and Sika deer.

Further information

The following additional information would help to explore the data further:

Size of each woodland
Age / sex distributions within species populations
Sampling at different time points, e.g. seasons