I am playing with the knitcitations package written by Boettiger (2014).

Potential Problem for Chapter 2

Do the questions seem appropriate? Should we ask anything else?

Consider the data frame PAMTEMP from the PASWR2 package (Arnholt 2014), which contains temperature and precipitation data for Pamplona, Spain, from January 1, 1990 to December 31, 2010.

  • Create side-by-side violin plots of the variable tmean for each month.
    Make sure the levels of month are in calendar order. Hint: look at the examples for PAMTEMP. Characterize the pattern in the side-by-side violin plots.
library(PASWR2)
levels(PAMTEMP$month)
 [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"
PAMTEMP$month <- factor(PAMTEMP$month, levels = month.abb[1:12])
levels(PAMTEMP$month)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
ggplot(data = PAMTEMP) + 
  geom_violin(aes(x = month, y = tmean, fill = month)) + 
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "", y = "Temperature (Celsius)")

The center of each violin plot generally rises from January to July and falls from August to December. Temperatures follow a yearly cycle of warming and then cooling.
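The reordering step above relies on how factor() assigns levels. A minimal sketch in base R, using a hypothetical toy vector rather than PAMTEMP, of the difference between the default alphabetical levels and levels supplied in calendar order:

```r
# Hypothetical toy vector; not taken from PAMTEMP.
m <- c("Mar", "Jan", "Feb", "Jan", "Dec")

# By default, factor() sorts the levels alphabetically.
f_default <- factor(m)
levels(f_default)

# Supplying levels = month.abb puts the levels in calendar order.
# All twelve months become levels, including those absent from m.
f_calendar <- factor(m, levels = month.abb)
levels(f_calendar)

# The values themselves are unchanged; only the level order differs.
same_values <- identical(as.character(f_default), as.character(f_calendar))
same_values
```

Because ggplot2 places a discrete axis in level order, the calendar-ordered factor is what puts the violins in January to December order.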

  • Create side-by-side violin plots of the variable tmean for each year.
    Characterize the pattern of side-by-side violin plots.
ggplot(data = PAMTEMP) + 
  geom_violin(aes(x = as.factor(year), y = tmean, fill = as.factor(year))) + 
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "", y = "Temperature (Celsius)") + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

The side-by-side violin plots of tmean show no apparent pattern across years. The distribution of daily mean temperature in Pamplona, Spain appears similar from year to year over the period 1990 to 2010.

  • Find the date for the minimum value of tmean.
PAMTEMP[which.min(PAMTEMP$tmean), ]
     tmax tmin precip day month year tmean
4285    2  -10    0.5  25   Dec 2001    -4

The minimum value of tmean is -4 \(^{\circ}\) C, which occurred on Dec 25, 2001.

  • Find the date for the maximum value of tmean.
PAMTEMP[which.max(PAMTEMP$tmean), ]
     tmax tmin precip day month year tmean
4873   39   23      0   5   Aug 2003    31

The maximum value of tmean is 31 \(^{\circ}\) C, which occurred on Aug 5, 2003.
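Note that which.min() and which.max() return only the first position at which the extreme occurs, so a tie on another date would go unreported. A small sketch with a hypothetical toy data frame (not PAMTEMP) of how one might list every row that attains the minimum:

```r
# Hypothetical toy data with a tied minimum on days 2 and 4.
toy <- data.frame(day = 1:5, tmean = c(3, -4, 10, -4, 10))

# which.min() reports only the first occurrence of the minimum.
first_min <- toy[which.min(toy$tmean), ]
first_min

# A logical comparison against min() keeps every tied row.
all_min <- toy[toy$tmean == min(toy$tmean), ]
all_min
```

The same comparison with max() would check the maximum for ties. If the column could contain missing values, min() and max() would need na.rm = TRUE.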

  • How many days have reported a tmax value greater than 38 \(^{\circ}\) C?
sum(PAMTEMP$tmax > 38)
[1] 15

15 days reported a tmax value greater than 38 \(^{\circ}\) C.
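The count works because sum() treats TRUE as 1 and FALSE as 0. The fact that the result is 15 rather than NA also indicates that tmax has no missing values. A brief sketch with a hypothetical vector showing what would happen if it did:

```r
# Hypothetical vector containing one missing value.
tmax_toy <- c(35.2, 38.5, NA, 39.0, 38.0)

# The comparison yields a logical vector; the NA propagates.
hot <- tmax_toy > 38
hot

# Summing a logical vector counts the TRUE values,
# but a single NA makes the whole sum NA.
n_with_na <- sum(hot)

# na.rm = TRUE drops the missing comparison before counting.
n_hot <- sum(hot, na.rm = TRUE)
n_hot
```

Note that 38.0 is not counted, since the comparison is strictly greater than 38.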

  • Find the date for the maximum value of precip.
PAMTEMP[which.max(PAMTEMP$precip), ]
     tmax tmin precip day month year tmean
1455  8.6    4   69.2  25   Dec 1993   6.3

The maximum value of precip is 69.2 mm, which occurred on Dec 25, 1993.

  • Create a barplot showing the total precipitation by month for the period January 1, 1990 to December 31, 2010. Based on your barplot, which month has had the least precipitation? Which month has had the most precipitation? Hint: use the plyr package (Wickham 2011) to create an appropriate data frame.
library(plyr)
SEL <- ddply(PAMTEMP, .(year, month), summarize, TP = sum(precip))
head(SEL)
  year month       TP
1 1990   Jan  31.1008
2 1990   Feb  26.8005
3 1990   Mar   9.3001
4 1990   Apr 121.1001
5 1990   May 120.5002
6 1990   Jun  77.0006
ggplot(data = SEL, aes(x = month, y = TP, fill = month)) + 
  geom_bar(stat = "identity") +
  labs(y = "Total Precipitation (1990-2010) in mm", x = "") +
  theme_bw() +
  guides(fill = "none")

August has the lowest total precipitation of all the months for the period 1990-2010. November has the highest total precipitation for the same period.
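The answer above is read off the barplot. It could also be confirmed numerically by totaling precipitation by month alone. A minimal sketch, using base R's aggregate() in place of plyr and a hypothetical toy data frame with the same column names as PAMTEMP:

```r
# Hypothetical toy data; the column names mimic PAMTEMP but the values are invented.
toy <- data.frame(
  year   = c(1990, 1990, 1990, 1991, 1991, 1991),
  month  = factor(c("Jan", "Jan", "Feb", "Jan", "Feb", "Feb"),
                  levels = c("Jan", "Feb")),
  precip = c(1.0, 2.5, 0.0, 4.0, 3.5, 1.5)
)

# Total precipitation for each month, pooled over all years.
by_month <- aggregate(precip ~ month, data = toy, FUN = sum)
by_month

# Months with the smallest and largest totals.
driest  <- as.character(by_month$month[which.min(by_month$precip)])
wettest <- as.character(by_month$month[which.max(by_month$precip)])
c(driest = driest, wettest = wettest)
```

Applied to PAMTEMP, the same two lookups would return the months to compare against the barplot.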

  • Create a barplot showing the total precipitation by year for the period January 1, 1990 to December 31, 2010. Based on your barplot, which year has had the least precipitation? Which year has had the most precipitation? Hint: use the plyr package to create an appropriate data frame.
SELY <- ddply(PAMTEMP, .(year), summarize, TP = sum(precip))
head(SELY)
  year       TP
1 1990 692.5048
2 1991 704.0052
3 1992 902.8038
4 1993 752.1041
5 1994 638.8045
6 1995 582.3028
ggplot(data = SELY, aes(x = year, y = TP, fill = as.factor(year))) + 
  geom_bar(stat = "identity") +
  labs(x = "", y = "Total Precipitation (mm)") +
  theme_bw() +
  guides(fill = "none")

SELY[which.max(SELY$TP), ]
  year       TP
8 1997 929.2025
SELY[which.min(SELY$TP), ]
  year       TP
9 1998 566.2011

The greatest yearly total precipitation in the period (929.2025 mm) occurred in 1997. The least yearly total precipitation (566.2011 mm) occurred in 1998.

  • Create a graph showing the maximum temperature versus year and the minimum temperature versus year. Does the graph suggest temperatures are becoming more extreme over time?
SEL <- ddply(PAMTEMP, .(year), summarize, Tmax = max(tmax), Tmin = min(tmin))
head(SEL)
  year Tmax Tmin
1 1990 37.0 -5.6
2 1991 38.2 -5.2
3 1992 36.4 -4.4
4 1993 37.6 -4.5
5 1994 37.2 -6.8
6 1995 40.0 -5.0
ggplot(data = SEL, aes(x = year, y = Tmax)) + 
  geom_line(color = "red") +
  geom_line(aes(x = year, y = Tmin), color = "blue") +
  theme_bw() +
  labs(y = "Temperature (Celsius)") + 
  geom_smooth(method = "lm", color = "red") + 
  geom_smooth(aes(x = year, y = Tmin), method = "lm")

Based on the graph, there is too much variability from year to year to make any statement about temperatures becoming more extreme over time.
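One way to put a number on that variability is to look at the confidence interval for the slope of each fitted line. The sketch below is illustrative only: it uses just the six rows printed by head(SEL) above, not the full 21 years, so it says nothing about the full series. The names SEL6, fit_max, and fit_min are introduced here for the example.

```r
# First six rows of the yearly summary, copied from the head(SEL) output above.
SEL6 <- data.frame(
  year = 1990:1995,
  Tmax = c(37.0, 38.2, 36.4, 37.6, 37.2, 40.0),
  Tmin = c(-5.6, -5.2, -4.4, -4.5, -6.8, -5.0)
)

# Simple linear regressions of the yearly extremes on year,
# matching the geom_smooth(method = "lm") lines in the plot.
fit_max <- lm(Tmax ~ year, data = SEL6)
fit_min <- lm(Tmin ~ year, data = SEL6)

# 95% confidence intervals for the slopes (degrees Celsius per year).
ci_max <- confint(fit_max, "year", level = 0.95)
ci_min <- confint(fit_min, "year", level = 0.95)
ci_max
ci_min
```

An interval that contains zero is consistent with no linear trend. Running the same two fits on the full SEL data frame would give the intervals that correspond to the plotted lines.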

References

Arnholt, Alan T. 2014. PASWR2: Probability and Statistics with R, Second Edition.

Boettiger, Carl. 2014. knitcitations: Citations for Knitr Markdown Files. http://CRAN.R-project.org/package=knitcitations.

Wickham, Hadley. 2011. “The Split-Apply-Combine Strategy for Data Analysis.” Journal of Statistical Software 40 (1): 1–29. http://www.jstatsoft.org/v40/i01/.