I am playing with the knitcitations package written by Boettiger (2014).

Potential Problem for Chapter 2

Do the questions seem appropriate? Should we ask anything else?

Consider the data frame PAMTEMP from the PASWR2 package (Arnholt 2014), which contains temperature and precipitation for Pamplona, Spain, from January 1, 1990, to December 31, 2010.

  • Create side-by-side violin plots of the variable tmean for each month.
    Make sure the levels of month are in calendar order. Hint: look at the examples for PAMTEMP. Characterize the pattern of side-by-side violin plots.
library(PASWR2)
library(ggplot2)
levels(PAMTEMP$month)
 [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"
PAMTEMP$month <- factor(PAMTEMP$month, levels = month.abb[1:12])
levels(PAMTEMP$month)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
ggplot(data = PAMTEMP) + 
  geom_violin(aes(x = month, y = tmean, fill = month)) + 
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "", y = "Temperature (Celsius)")

The center of each violin plot generally rises from January to July and falls from August to December. There is a cyclical pattern of warming and then cooling as one traverses the year.
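The trend read off the violin plots can also be checked numerically. The following is a minimal sketch on a small invented data frame, `toy`, standing in for PAMTEMP (the values are made up for illustration and are not taken from the PASWR2 data). It shows the alphabetical default level order, the reordering with month.abb, and monthly medians computed with tapply().

```r
# Toy stand-in for PAMTEMP: two invented tmean values per month
toy <- data.frame(
  month = rep(month.abb, each = 2),
  tmean = c(4, 6, 5, 7, 8, 10, 10, 12, 14, 16, 18, 20,
            21, 23, 21, 23, 17, 19, 13, 15, 8, 10, 5, 7),
  stringsAsFactors = TRUE
)

# Factor levels default to alphabetical order, not calendar order
alpha_levels <- levels(toy$month)

# Re-create the factor with levels in calendar order
toy$month <- factor(toy$month, levels = month.abb)

# Median tmean per month, returned in level (calendar) order
med <- tapply(toy$tmean, toy$month, median)
print(med)
```

With the real data, the analogous call would be tapply(PAMTEMP$tmean, PAMTEMP$month, median) after the levels have been reordered; a na.rm = TRUE argument may be needed if tmean contains missing values.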

  • Create side-by-side violin plots of the variable tmean for each year.
    Characterize the pattern of side-by-side violin plots.
ggplot(data = PAMTEMP) + 
  geom_violin(aes(x = as.factor(year), y = tmean, fill = as.factor(year))) + 
  theme_bw() + 
  guides(fill = "none") + 
  labs(x = "", y = "Temperature (Celsius)") + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

There is no apparent pattern in the side-by-side violin plots of tmean. The temperature distribution for Pamplona, Spain, appears similar from year to year over the period 1990 to 2010.

  • Find the date for the minimum value of tmean.
PAMTEMP[which.min(PAMTEMP$tmean), ]
     tmax tmin precip day month year tmean
4285    2  -10    0.5  25   Dec 2001    -4

The minimum value of tmean is -4 \(^{\circ}\) C, which occurred on December 25, 2001.

  • Find the date for the maximum value of tmean.
PAMTEMP[which.max(PAMTEMP$tmean), ]
     tmax tmin precip day month year tmean
4873   39   23      0   5   Aug 2003    31

The maximum value of tmean is 31 \(^{\circ}\) C, which occurred on August 5, 2003.
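Note that which.min() and which.max() return only the first index attaining the extreme, so any tied days would not be shown. The sketch below, on an invented data frame `toy` (not the PASWR2 data), shows one way to list every tied row and to assemble a proper Date from the day, month, and year columns. The column name `date` is introduced here purely for illustration.

```r
# Toy stand-in for PAMTEMP; values are invented for illustration
toy <- data.frame(
  day   = c(1, 2, 3, 4),
  month = c("Jan", "Jan", "Jul", "Jul"),
  year  = c(2000, 2000, 2000, 2000),
  tmean = c(-2, -2, 25, 24)
)

# which.min() reports only the first row attaining the minimum
first_min <- toy[which.min(toy$tmean), ]

# A logical comparison keeps every row tied at the minimum
all_min <- toy[toy$tmean == min(toy$tmean), ]

# Build Date objects; match() against month.abb avoids
# locale-dependent parsing of month names
toy$date <- as.Date(sprintf("%d-%02d-%02d",
                            toy$year, match(toy$month, month.abb), toy$day))
print(toy$date[which.max(toy$tmean)])
```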

  • How many days have reported a tmax value greater than 38 \(^{\circ}\) C?
sum(PAMTEMP$tmax > 38)
[1] 15

Fifteen days reported a tmax value greater than 38 \(^{\circ}\) C.
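The count works because summing a logical vector adds one for each TRUE. If the vector contained missing values, sum() would return NA unless na.rm = TRUE is supplied. Whether PAMTEMP$tmax has any missing values is not stated here, so the vector below is invented to illustrate the behavior.

```r
# Invented daily maxima, including one missing value
tmax <- c(36, 39, NA, 40, 38, 41)

hot <- tmax > 38                  # logical vector; NA stays NA
sum(hot)                          # NA, because one comparison is unknown
n_hot <- sum(hot, na.rm = TRUE)   # counts only the known TRUE values
print(n_hot)
```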

  • Create a barplot showing the total precipitation by month for the period January 1, 1990, to December 31, 2010. Based on your barplot, which month has had the least amount of precipitation? Which month has had the greatest amount of precipitation? Hint: use the plyr package (Wickham 2011) to create an appropriate data frame.
library(plyr)
SEL <- ddply(PAMTEMP, .(year, month), summarize, TP = sum(precip))
head(SEL)
  year month       TP
1 1990   Jan  31.1008
2 1990   Feb  26.8005
3 1990   Mar   9.3001
4 1990   Apr 121.1001
5 1990   May 120.5002
6 1990   Jun  77.0006
ggplot(data = SEL, aes(x = month, y = TP, fill = month)) + 
  geom_bar(stat = "identity") +
  labs(y = "Total Precipitation (1990-2010) in mm", x = "") +
  theme_bw() +
  guides(fill = "none")

For the period 1990-2010, August has the lowest total precipitation of all the months and November has the highest.
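Because SEL holds one row per year and month, each bar in the plot is a stack of yearly totals, and the monthly extremes are read off by eye. To name the driest and wettest months numerically, the totals can be collapsed by month alone. The sketch below uses base R's aggregate() on an invented data frame `toy` (not the PASWR2 data); the names `by_month`, `driest`, and `wettest` are hypothetical.

```r
# Toy stand-in for PAMTEMP: invented precipitation for two years
toy <- data.frame(
  year   = rep(c(1990, 1991), each = 3),
  month  = factor(rep(c("Jan", "Aug", "Nov"), times = 2), levels = month.abb),
  precip = c(30, 5, 60, 20, 10, 80)
)

# Total precipitation per month, summed over all years
by_month <- aggregate(precip ~ month, data = toy, FUN = sum)

# Months with the smallest and largest totals
driest  <- as.character(by_month$month[which.min(by_month$precip)])
wettest <- as.character(by_month$month[which.max(by_month$precip)])
print(c(driest = driest, wettest = wettest))
```

With plyr, the same collapse on the real data would follow the pattern already used for the yearly totals, for example ddply(PAMTEMP, .(month), summarize, TP = sum(precip)).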

  • Create a barplot showing the total precipitation by year for the period January 1, 1990, to December 31, 2010. Based on your barplot, which year had the least amount of precipitation? Which year had the greatest amount of precipitation? Hint: use the plyr package to create an appropriate data frame.
SELY <- ddply(PAMTEMP, .(year), summarize, TP = sum(precip))
head(SELY)
  year       TP
1 1990 692.5048
2 1991 704.0052
3 1992 902.8038
4 1993 752.1041
5 1994 638.8045
6 1995 582.3028
ggplot(data = SELY, aes(x = year, y = TP, fill = as.factor(year))) + 
  geom_bar(stat = "identity") +
  labs(x = "", y = "Total Precipitation (mm)") +
  theme_bw() +
  guides(fill = "none")

SELY[which.max(SELY$TP), ]
  year       TP
8 1997 929.2025
SELY[which.min(SELY$TP), ]
  year       TP
9 1998 566.2011

The greatest yearly total precipitation on record (929.2025 mm) occurred in 1997. The least yearly total precipitation on record (566.2011 mm) occurred in 1998.

References

Arnholt, Alan T. 2014. PASWR2: Probability and Statistics with R, Second Edition.

Boettiger, Carl. 2014. knitcitations: Citations for Knitr Markdown Files. http://CRAN.R-project.org/package=knitcitations.

Wickham, Hadley. 2011. “The Split-Apply-Combine Strategy for Data Analysis.” Journal of Statistical Software 40 (1): 1–29. http://www.jstatsoft.org/v40/i01/.