Analysis of Rainfall and Temperature Data Using R

Introduction

Climograph

A climograph is a graphical representation of a location’s basic climate. Climographs display data for two variables: (a) monthly average temperature and (b) monthly average precipitation. These are useful tools to quickly describe a location’s climate.

Package and Data Installation

Packages

The code requires the following packages:

library(rJava)
library(xlsxjars)
library(xlsx)
library(plyr)
library(dplyr)
library(ggplot2)
library(reshape2)

The packages serve the following purposes:

xlsx reads and writes Excel files.

plyr is a set of tools for a common set of problems: splitting a large data structure into homogeneous pieces, applying a function to each piece, and then combining the results.

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help solve the most common data manipulation challenges.

ggplot2 is an R package dedicated to data visualization. It can greatly improve the quality and aesthetics of graphics and makes creating them more efficient.

reshape2 is an R package written by Hadley Wickham that makes it easy to transform data between wide and long formats.

rJava provides a low-level bridge between R and Java.

xlsxjars collects the external jars required by the xlsx package.

Data

An Excel file containing data for major stations of India was downloaded from the India Meteorological Department and saved in the desired folder. The working directory was set to that folder, and the climate data was loaded into R.

Compiling the Code

After installing all the prerequisites, the primary code for plotting the Climograph is as follows:

Reading the data

The data from the Excel file is read using the following code:

data <- read.xlsx("datafile1.xls", sheetIndex = 1)

Editing the data for a suitable format

The file contains data for many stations across India. Only stations with 100 years of data were required. The full list of stations is shown first, and the stations with 100 years of data are then selected using the following code:

unique(data$Station.Name)
##   [1] "Abu"                   "Agartala (A)"          "Agra"                 
##   [4] "Ahmedabad"             "Aijal/Aizwal"          "Ajmer"                
##   [7] "Akola (A)"             "Allahabad"             "Ambikapur"            
##  [10] "Amini Divi"            "Amritsar (Rajasansi)"  "Anantpur"             
##  [13] "Androth"               "Aurangabad"            "Balasore"             
##  [16] "Bangalore"             "Bareilly"              "Baroda (A)"           
##  [19] "Belgaum Samra"         "Bhagalpur"             "Bhatinda"             
##  [22] "Bhopal (Bairagarh)"    "Bhubaneshwar (A)"      "Bhuj (Rudramata)"     
##  [25] "Bikaner"               "Cannanore"             "Chandigarh"           
##  [28] "Chennai (Minambakkam)" "Cherrapunji"           "Coimbatore (Pilamedu)"
##  [31] "Cooch Behar"           "Darjeeling"            "Dehra Dun"            
##  [34] "Dharmsala"             "Dibrugarh/Mohanbari"   "Gadag"                
##  [37] "Gangtok"               "Gaya"                  "Gopalpur"             
##  [40] "Gorakhpur"             "Gulbarga"              "Guwahati (Bhorjar)"   
##  [43] "Gwalior"               "Hassan"                "Hissar"               
##  [46] "Hyderabad (A)"         "Imphal/Tulihal"        "Indore"               
##  [49] "Jabalpur"              "Jagdalpur"             "Jaipur (Sanganer)"    
##  [52] "Jaisalmer"             "Jammu"                 "Jamshedpur"           
##  [55] "Jharsuguda"            "Jodhpur"               "Joshimath"            
##  [58] "Jullundur"             "Kanpur (A)"            "Kanyakumari"          
##  [61] "Karnal"                "Khajuraho"             "Kodaikanal"           
##  [64] "Kohima"                "Kolkata (Alipur)"      "Kota (A)"             
##  [67] "Kozhikode (A)"         "Lucknow (Amausi)"      "Ludhiana"             
##  [70] "Madurai (A)"           "Mahabaleshwar"         "Malda"                
##  [73] "Manali"                "Mangalore (Bajpe)"     "Masulipatnam"         
##  [76] "Minicoy"               "Mukteswar (Kumaun)"    "Mumbai (Santa Cruz)"  
##  [79] "Mysore"                "Nagpur (Sonegaon)"     "Nainital"             
##  [82] "Nasik"                 "New Delhi (Palam)"     "New Delhi (SFD)"      
##  [85] "Palakkad (Palghat)"    "Panjim"                "Parbhani"             
##  [88] "Pasighat"              "Patna (A)"             "Pondicherry (A)"      
##  [91] "Port Blair"            "Pune"                  "Raipur"               
##  [94] "Rajkot (A)"            "Ranchi (A)"            "Sambalpur"            
##  [97] "Shillong"              "Shimla"                "Silchar"              
## [100] "Solapur"               "Sri Niketan"           "Srinagar"             
## [103] "Surat"                 "Thiruvananthapuram"    "Tirupathy"            
## [106] "Tura"                  "Udaipur (Dabok)"       "Uthagamandalam"       
## [109] "Varanasi (Babatpur)"   "Vijayawada"            "Vishakhapatnam"
data100 <- data[data$No..of.Years == 100,]

Listing the stations having 100 years of data

unique(data100$Station.Name)
##  [1] "Abu"                   "Agra"                  "Ahmedabad"            
##  [4] "Ajmer"                 "Allahabad"             "Bangalore"            
##  [7] "Chennai (Minambakkam)" "Hassan"                "Jabalpur"             
## [10] "Jodhpur"               "Kodaikanal"            "Kolkata (Alipur)"     
## [13] "Masulipatnam"          "Mukteswar (Kumaun)"    "Mysore"               
## [16] "Nagpur (Sonegaon)"     "New Delhi (SFD)"       "Pune"                 
## [19] "Solapur"               "Thiruvananthapuram"
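
The heading above speaks of counting stations, while unique() only lists them. As a minimal sketch on invented data (the toy data frame and its values are assumptions; only the column names Station.Name and No..of.Years come from the code above), the count can be obtained by wrapping unique() in length():

```r
# Toy stand-in for the station table. Column names follow the ones used
# above; the rows are invented purely for illustration.
toy <- data.frame(
  Station.Name = c("Agra", "Agra", "Pune", "Pune", "Gaya", "Gaya"),
  Month        = rep(c("January", "February"), times = 3),
  No..of.Years = c(100, 100, 100, 100, 60, 60),
  stringsAsFactors = FALSE
)

# Keep only rows from stations with a 100-year record
toy100 <- toy[toy$No..of.Years == 100, ]

# Number of distinct stations before and after filtering
n_all <- length(unique(toy$Station.Name))
n_100 <- length(unique(toy100$Station.Name))
print(c(all = n_all, with_100_years = n_100))
```

If No..of.Years contained missing values, the == comparison would yield NA rows in the subset; wrapping the condition in which() is one way to avoid that.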

The required data set has now been read into R. Before plotting the climograph, a few column names were changed to shorter names that are easier to use in the plotting code:

cnames <- colnames(data100)

cnames[1] <- "SN"
cnames[4] <- "years"
cnames[5] <- "Tmax"
cnames[6] <- "Tmin"
cnames[7] <- "Rainfall"

colnames(data100) <- cnames

data100$Month <- factor(as.character(data100$Month), c("January","February","March","April","May","June","July","August","September","October","November","December"))
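
The explicit level order matters because factor() otherwise sorts levels alphabetically, which would put April first on the x axis. The following sketch, on an invented vector of month names, illustrates the difference; it uses the built-in constant month.name as a shorter way of writing the twelve levels.

```r
# Month names as they might appear in the data (deliberately out of order)
m <- c("March", "January", "December", "February")

# Without explicit levels, factor() sorts the levels alphabetically
alphabetical <- factor(m)
print(levels(alphabetical))

# month.name is a built-in constant holding January to December in order
calendar <- factor(m, levels = month.name)

# Sorting now follows the calendar rather than the alphabet
sorted_months <- as.character(sort(calendar))
print(sorted_months)
```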

Plotting the Climograph

The data frame is now ready to be presented in graphical form. This is done with the ggplot function from the ggplot2 package, a powerful tool for representing data graphically. The following code is used:

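# Rainfall is divided by 10 so that the bars fit on the temperature scale
data100 %>% ggplot(aes(x = Month, group = 1)) +
  geom_bar(aes(y = Rainfall / 10, col = "Rainfall"), fill = "green", stat = "identity", alpha = 0.5) +
  geom_line(aes(y = Tmax, col = "Temp.Max"), size = 1) +
  geom_line(aes(y = Tmin, col = "Temp.Min"), size = 1) +
  geom_point(aes(y = Tmax)) +
  geom_point(aes(y = Tmin)) +
  theme_bw() +
  # One panel per station, each with its own y range
  facet_wrap(. ~ SN, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5)) +
  # The secondary axis multiplies by 10 again to label rainfall in mm
  scale_y_continuous(sec.axis = sec_axis(~ . * 10, name = "Rainfall in mm")) +
  labs(x = "Month", y = "Temperature (Degree Celsius)", colour = "Parameter", title = "Climograph - 100 Years Average, 1901 to 2000") +
  # Colours are assigned to the legend labels in alphabetical order
  scale_colour_manual(values = c("darkgreen", "red", "blue"))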

Climograph

The above climograph displays the 100 year average of rainfall and temperature data of the major stations of India for each month.

  • The months of the year were plotted on the x axis. The geom_bar function was used to plot the rainfall data as bars.

  • Individual graphs were drawn for each station using the facet_wrap function.

  • After plotting the rainfall as bars, the maximum and minimum temperatures were plotted as lines using the geom_line function.

  • Rainfall and temperature have different scales, so they cannot share one axis directly. The primary y axis shows temperature; rainfall is divided by 10 before plotting, and a secondary y axis created with sec_axis multiplies by 10 again so that its labels read in mm of rainfall.

  • Different colours were assigned to the ‘Rainfall’ bars and to the maximum and minimum ‘Temperature’ lines.
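
The plotting code divides rainfall by a fixed factor of 10 and lets sec_axis multiply by 10 again. As a hedged sketch of the arithmetic behind that choice, the factor could instead be derived from the data; the values and the name scale_factor below are assumptions made for illustration and are not taken from the station file.

```r
# Invented monthly values for one station
tmax_c  <- c(24, 27, 33, 38, 40, 37)    # degrees Celsius
rain_mm <- c(10, 5, 12, 30, 90, 310)    # millimetres

# Factor that maps the largest rainfall value onto the largest temperature
scale_factor <- max(rain_mm) / max(tmax_c)

# Forward transform: the y values the rainfall bars would be drawn at
rain_scaled <- rain_mm / scale_factor

# Inverse transform: what a secondary axis of the form ~ . * scale_factor
# would print as labels
rain_back <- rain_scaled * scale_factor

print(scale_factor)
print(range(rain_scaled))
```

Whatever factor is chosen, the division in the bar layer and the multiplication in the secondary axis must use the same value, otherwise the right-hand labels no longer match the bars.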

Rainfall according to the seasons

Categorizing months

The months were categorized according to seasons to find out about rainfall data for each season.

data100$Season <- NA
data100$Season[data100$Month %in% c("June","July","August","September")] <- "Monsoon"
data100$Season[data100$Month %in% c("October","November")] <- "Post Monsoon"
data100$Season[data100$Month %in% c("December","January","February")] <- "Winter"
data100$Season[data100$Month %in% c("March","April","May")] <- "Summer"
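
The four assignments above can also be expressed as a single lookup. The sketch below is an alternative formulation, not the method used in this analysis; the name season_of and the sample months are assumptions, while the month-to-season grouping is the one given above.

```r
# Named lookup vector: month name -> season
season_of <- c(
  January = "Winter", February = "Winter", March = "Summer",
  April = "Summer", May = "Summer", June = "Monsoon",
  July = "Monsoon", August = "Monsoon", September = "Monsoon",
  October = "Post Monsoon", November = "Post Monsoon", December = "Winter"
)

# A few sample months, stored as a factor like data100$Month
months <- factor(c("June", "December", "October", "April"), levels = month.name)

# Index by name. as.character() matters because indexing with a factor
# uses its integer codes rather than its labels.
seasons <- unname(season_of[as.character(months)])
print(seasons)
```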

Calculating total rainfall for each season

The total rainfall for each season was calculated using the ddply function, and the same data was stored into a new data frame.

rainfall <- ddply(data100, .variables = c("SN", "Season"), function(x) {
  Rainfall <- sum(x$Rainfall, na.rm = TRUE)
  data.frame(Rainfall)
})
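
As a cross-check that needs no extra packages, the same grouped sum can be sketched with base R's aggregate(). The toy data frame below is invented; only the column names SN, Season and Rainfall follow the code above.

```r
# Invented rainfall records for two stations
toy <- data.frame(
  SN       = c("A", "A", "A", "B", "B", "B"),
  Season   = c("Monsoon", "Monsoon", "Winter", "Monsoon", "Winter", "Winter"),
  Rainfall = c(100, 150, 10, 80, NA, 5)
)

# One row per station/season pair, summing Rainfall within each group.
# na.action = na.pass keeps rows with NA so that na.rm = TRUE inside sum()
# deals with them, mirroring the ddply call above.
totals <- aggregate(Rainfall ~ SN + Season, data = toy,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)
print(totals)
```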

Creating the graph

The seasonal totals are then plotted for each station using ggplot2:

rainfall %>% ggplot(aes(x=Season))+
  geom_bar(aes(y=Rainfall),stat="identity")+
  theme_bw()+
  facet_wrap(.~SN)

Reshaping the dataframe

The season-wise data frame created above is then reshaped for further analysis.

Casting the seasons into columns gives a data set with one row per station and one rainfall column per season. This is done using the dcast function, which converts long-form data into wide form:

rainfall2 <- dcast(rainfall, formula = SN ~ Season, value.var = "Rainfall")

The reshaped data is stored in rainfall2.

The total rainfall for each station is then obtained by adding the four seasonal columns:

rainfall2$Total <- rainfall2$Monsoon+rainfall2$`Post Monsoon`+rainfall2$Summer+rainfall2$Winter
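
Adding columns with + returns NA for a station as soon as one season is missing. Should that matter for a given data set, rowSums() with na.rm = TRUE is one possible alternative. The sketch below uses an invented table shaped like rainfall2; the values and the name season_cols are assumptions.

```r
# Wide table shaped like rainfall2; the numbers are invented
wide <- data.frame(
  SN = c("A", "B"),
  Monsoon = c(800, 600),
  `Post Monsoon` = c(120, NA),
  Summer = c(60, 40),
  Winter = c(20, 10),
  check.names = FALSE   # keep the space in "Post Monsoon"
)

season_cols <- c("Monsoon", "Post Monsoon", "Summer", "Winter")

# rowSums() adds across the season columns. na.rm = TRUE treats a missing
# season as zero instead of turning the whole total into NA.
wide$Total <- rowSums(wide[, season_cols], na.rm = TRUE)
print(wide)
```

Whether a missing season should count as zero is a judgment about the data; keeping the NA may be the safer choice when a gap means the total is unknown.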

Temperature Analysis

The months which had minimum and maximum temperature for each station over a period of 100 years can be obtained.

Defining the function

First, the mean of the maximum and minimum temperature is calculated for each month at each station:

data100$Tmean <- (data100$Tmax+data100$Tmin)/2

The ddply function is then used to apply a user-defined function to the rows of each station. The function returns the minimum and maximum values of Tmean together with the months in which they occur.

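temp <- ddply(data100, .variables = "SN", function(x) {
  # Lowest and highest monthly mean temperature for this station
  Tmin <- min(x$Tmean)
  Tmax <- max(x$Tmean)

  # Months in which those extremes occur
  MinMonth <- x$Month[x$Tmean == Tmin]
  MaxMonth <- x$Month[x$Tmean == Tmax]

  data.frame(Tmin, MinMonth, Tmax, MaxMonth)
})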
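
Because the comparison x$Tmean == Tmin returns every month that ties for the extreme, a station with two equally cold or equally warm months would contribute more than one row to temp. A minimal sketch of an alternative that always returns a single month uses which.min() and which.max(); the monthly values below are invented.

```r
# One station's twelve monthly means (invented values, May and June tie)
x <- data.frame(
  Month = factor(month.name, levels = month.name),
  Tmean = c(14, 17, 23, 29, 33, 33, 30, 29, 28, 25, 19, 15)
)

# which.min() and which.max() return the index of the first extreme value,
# so exactly one month is selected even when two months tie
coldest <- x$Month[which.min(x$Tmean)]
hottest <- x$Month[which.max(x$Tmean)]

print(as.character(coldest))
print(as.character(hottest))
```

Taking the first tied month is a convention, not a finding; the equality-based version above is preferable when all tied months should be reported.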

Use of the function

Month analysis: the result is used to find how often each month is the coldest or the warmest month across the stations.

plot(temp$MinMonth)

plot(temp$MaxMonth)
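
Calling plot() on a factor draws a bar plot of how often each level occurs. The counts behind such a plot can be inspected with table(), as in this sketch on an invented factor of coldest months for five hypothetical stations:

```r
# Hypothetical coldest months for five stations
min_month <- factor(c("January", "January", "December", "January", "December"),
                    levels = month.name)

# table() counts stations per month; months that never occur get a count of 0
counts <- table(min_month)

# Show only the months that occur at least once
print(counts[counts > 0])
```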

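Station analysis: the result is also used to compare stations over the 100-year period. The first expression returns the station with the lowest monthly mean temperature. The second expression uses min(temp$Tmax), so it returns the station whose warmest month is the coolest; replacing it with max(temp$Tmax) would return the station with the highest monthly mean temperature.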

temp$SN[temp$Tmin == min(temp$Tmin)]
## [1] "Mukteswar (Kumaun)"
temp$SN[temp$Tmax == min(temp$Tmax)]
## [1] "Kodaikanal"

Data Output

The formatted data is then written to an Excel file, one sheet per data frame, using the following code:

write.xlsx(data100,"Climatology.xlsx",sheetName = "Data",row.names = FALSE)

write.xlsx(rainfall2,"Climatology.xlsx",sheetName = "Rainfall",row.names = FALSE,append = TRUE)

write.xlsx(temp,"Climatology.xlsx",sheetName = "temperature",row.names = FALSE,append = TRUE)