1 Introduction
2 Data
This data set contains meteorological data from the HI-SEAS weather station, covering four months (September through December 2016) between Mission IV and Mission V.
It includes columns such as “wind direction”, “wind speed”, “humidity” and “temperature”. The response variable to be predicted is the solar radiation (the “Radiation” column). Given the measurements for these four months, the task is to predict the level of solar radiation. Imagine that you have solar energy batteries and want to know whether it will be worthwhile to use them in the future.
2.1 Reading data
library(readr)  # read_csv(), cols(), col_datetime(), col_time()
SolarRadPrediction <- read_csv("Data/SolarRadPrediction.csv",
  col_types = cols(Data = col_datetime(format = "%m/%d/%Y %H:%M:%S %p"),
                   Time = col_time(format = "%H:%M:%S"),
                   TimeSunRise = col_time(format = "%H:%M:%S"),
                   TimeSunSet = col_time(format = "%H:%M:%S")))
# the 8th column is read as "WindDirection(Degrees)"; give it a simpler name
colnames(SolarRadPrediction)[8] <- "WindDirection"
dim(SolarRadPrediction)
## [1] 32686 11
str(SolarRadPrediction)
## tibble [32,686 x 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ UNIXTime : num [1:32686] 1.48e+09 1.48e+09 1.48e+09 1.48e+09 1.48e+09 ...
## $ Data : POSIXct[1:32686], format: "2016-09-29" "2016-09-29" ...
## $ Time : 'hms' num [1:32686] 23:55:26 23:50:23 23:45:26 23:40:21 ...
## ..- attr(*, "units")= chr "secs"
## $ Radiation : num [1:32686] 1.21 1.21 1.23 1.21 1.17 1.21 1.2 1.24 1.23 1.21 ...
## $ Temperature : num [1:32686] 48 48 48 48 48 48 49 49 49 49 ...
## $ Pressure : num [1:32686] 30.5 30.5 30.5 30.5 30.5 ...
## $ Humidity : num [1:32686] 59 58 57 60 62 64 72 71 80 85 ...
## $ WindDirection: num [1:32686] 177 177 159 138 105 ...
## $ Speed : num [1:32686] 5.62 3.37 3.37 3.37 5.62 5.62 6.75 5.62 4.5 4.5 ...
## $ TimeSunRise : 'hms' num [1:32686] 06:13:00 06:13:00 06:13:00 06:13:00 ...
## ..- attr(*, "units")= chr "secs"
## $ TimeSunSet : 'hms' num [1:32686] 18:13:00 18:13:00 18:13:00 18:13:00 ...
## ..- attr(*, "units")= chr "secs"
## - attr(*, "spec")=
## .. cols(
## .. UNIXTime = col_double(),
## .. Data = col_datetime(format = "%m/%d/%Y %H:%M:%S %p"),
## .. Time = col_time(format = "%H:%M:%S"),
## .. Radiation = col_double(),
## .. Temperature = col_double(),
## .. Pressure = col_double(),
## .. Humidity = col_double(),
## .. `WindDirection(Degrees)` = col_double(),
## .. Speed = col_double(),
## .. TimeSunRise = col_time(format = "%H:%M:%S"),
## .. TimeSunSet = col_time(format = "%H:%M:%S")
## .. )
| UNIXTime | Data | Time | Radiation | Temperature | Pressure | Humidity | WindDirection | Speed | TimeSunRise | TimeSunSet |
|---|---|---|---|---|---|---|---|---|---|---|
| 1475229326 | 2016-09-29 | 23:55:26 | 1.21 | 48 | 30.46 | 59 | 177.39 | 5.62 | 06:13:00 | 18:13:00 |
| 1475229023 | 2016-09-29 | 23:50:23 | 1.21 | 48 | 30.46 | 58 | 176.78 | 3.37 | 06:13:00 | 18:13:00 |
| 1475228726 | 2016-09-29 | 23:45:26 | 1.23 | 48 | 30.46 | 57 | 158.75 | 3.37 | 06:13:00 | 18:13:00 |
| 1475228421 | 2016-09-29 | 23:40:21 | 1.21 | 48 | 30.46 | 60 | 137.71 | 3.37 | 06:13:00 | 18:13:00 |
| 1475228124 | 2016-09-29 | 23:35:24 | 1.17 | 48 | 30.46 | 62 | 104.95 | 5.62 | 06:13:00 | 18:13:00 |
| 1475227824 | 2016-09-29 | 23:30:24 | 1.21 | 48 | 30.46 | 64 | 120.20 | 5.62 | 06:13:00 | 18:13:00 |
2.2 About data
The data set contains the following variables:
- UNIXTime: Timestamp in Unix time (seconds since 1970-01-01 00:00:00 UTC).
- Data: Date in the format yyyy-mm-dd.
- Time: Local time in 24-hour hh:mm:ss format.
- Radiation: Solar radiation in watts per square meter (\(W/m^2\), equivalent to \(kg/s^3\)).
- Temperature: Temperature in degrees Fahrenheit (\(^\circ F\)).
- Pressure: Barometric pressure in inches of mercury (\(inHg\)).
- Humidity: Humidity in percent.
- WindDirection: Wind direction in degrees.
- Speed: Wind speed in miles per hour (mph).
- TimeSunRise: Hawaii time of sunrise.
- TimeSunSet: Hawaii time of sunset.
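The temperature, pressure and wind-speed units above are US customary units. If metric values are preferred, they can be converted with the standard factors (°C = (°F - 32) × 5/9, 1 inHg ≈ 33.86389 hPa, 1 mph = 0.44704 m/s). A minimal sketch; the helper `to_metric` and the new column names are hypothetical and not part of the original analysis:

```r
# Hypothetical helper (not part of the original analysis): add metric versions
# of the temperature, pressure and wind-speed columns to a data frame.
to_metric <- function(d) {
  d$TemperatureC <- (d$Temperature - 32) * 5 / 9  # degrees F -> degrees C
  d$PressureHPa  <- d$Pressure * 33.86389         # inches of mercury -> hPa
  d$SpeedMS      <- d$Speed * 0.44704             # mph -> m/s (exact factor)
  d
}

# Usage on the values of the first row shown in Section 2.1
first_row <- data.frame(Temperature = 48, Pressure = 30.46, Speed = 5.62)
metric <- to_metric(first_row)
round(metric, 2)
```

The original columns are kept, so the converted values can be compared side by side with the raw ones.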
I’m assuming that the location is Hawaii (the HI-SEAS habitat is on Mauna Loa, Hawaii). Furthermore, the wind direction is measured clockwise from 0 degrees (North).
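The Hawaii assumption can be checked against the UNIXTime column: converting a Unix timestamp to the "Pacific/Honolulu" time zone (UTC-10, no daylight saving time) should reproduce the Data and Time columns. A base-R sketch for the first row of the table in Section 2.1; this check is not part of the original analysis:

```r
# Convert the Unix timestamp of the first row in Section 2.1 to Hawaii time.
# "Pacific/Honolulu" is the IANA time zone name for Hawaii.
unix_time  <- 1475229326
local_time <- as.POSIXct(unix_time, origin = "1970-01-01", tz = "Pacific/Honolulu")
stamp <- format(local_time, "%Y-%m-%d %H:%M:%S")
stamp  # "2016-09-29 23:55:26", the same as Data and Time in that row
```

The same comparison can be run over every row; matching values support reading Time as Hawaii local time.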
3 EDA
3.2 Dependent variable: Radiation
3.2.1 Radiation
3.2.1.1 Tables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.11 1.23 2.66 207.12 354.24 1601.26
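The median (2.66) is tiny compared with the mean (207.12), which is consistent with at least half of the readings being taken in darkness, when radiation is close to zero. One way to check this is to flag each observation as daytime or night-time with the sunrise and sunset columns. A minimal sketch on made-up values; `is_daytime` is a hypothetical helper, and all times are assumed to be seconds since midnight (with the real data, `as.numeric()` on the `hms` columns Time, TimeSunRise and TimeSunSet gives exactly that):

```r
# Hypothetical helper: TRUE when the observation time lies between
# sunrise and sunset. All three arguments are seconds since midnight.
is_daytime <- function(time, sunrise, sunset) {
  time >= sunrise & time <= sunset
}

# Made-up example: three observations on a day with sunrise 06:13 and sunset 18:13
obs <- data.frame(
  time      = c(2 * 3600, 12 * 3600, 23 * 3600 + 55 * 60),
  radiation = c(1.2, 900, 1.21),
  sunrise   = 6 * 3600 + 13 * 60,
  sunset    = 18 * 3600 + 13 * 60
)
obs$daytime <- is_daytime(obs$time, obs$sunrise, obs$sunset)
# median radiation separately for night (FALSE) and day (TRUE)
meds <- tapply(obs$radiation, obs$daytime, median)
meds
```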
Observation with the highest radiation:
| Date | Time | Radiation | Temperature | Pressure | Humidity | WindDirection | WindSpeed | TimeSunRise | TimeSunSet |
|---|---|---|---|---|---|---|---|---|---|
| 2016-09-04 | 12:15:04 | 1601.26 | 61 | 30.47 | 93 | 3.56 | 9 | 06:08:00 | 18:35:00 |
Observation with the lowest radiation:
| Date | Time | Radiation | Temperature | Pressure | Humidity | WindDirection | WindSpeed | TimeSunRise | TimeSunSet |
|---|---|---|---|---|---|---|---|---|---|
| 2016-12-29 | 02:50:49 | 1.11 | 37 | 30.35 | 54 | 192.35 | 6.75 | 06:56:00 | 17:53:00 |
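The two single-row tables above can be produced by indexing with `which.max()` and `which.min()`. The original code for this step is not shown, so this is only one possible way, demonstrated on a made-up three-row data frame; with the real data the calls would be `df[which.max(df$Radiation), ]` and `df[which.min(df$Radiation), ]`, assuming `df` is the cleaned data frame used in the tables:

```r
# Made-up stand-in for df (hypothetical values)
toy <- data.frame(
  Time      = c("02:50:49", "12:15:04", "15:00:00"),
  Radiation = c(1.11, 1601.26, 850)
)
toy[which.max(toy$Radiation), ]  # row with the highest radiation
toy[which.min(toy$Radiation), ]  # row with the lowest radiation
```

If several rows share the extreme value, `which.max()` and `which.min()` return only the first of them.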
3.3 Handle Date variable
## m
## 9 10 11 12
## 7417 8821 8284 8164
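The table counts observations per month number (9 = September through 12 = December); the four counts add up to 32686, the number of rows. The code behind it is not shown (the boxplot code later uses lubridate's `month()`), so here is a base-R sketch on a few made-up dates, also showing how to label the counts with month names:

```r
# Made-up dates standing in for df$Date
dates <- as.POSIXct(c("2016-09-29", "2016-09-30", "2016-10-01", "2016-12-31"), tz = "UTC")
# month number of each observation (base-R equivalent of lubridate::month())
m <- as.integer(format(dates, "%m"))
table(m)
# the same counts labelled with month names; months without data show 0
table(factor(month.name[m], levels = month.name[9:12]))
```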
3.4 Additional Tables
## Radiation Temperature Pressure Humidity
## Min. : 1.11 Min. :34.0 Min. :30.19 Min. : 8.00
## 1st Qu.: 1.23 1st Qu.:46.0 1st Qu.:30.40 1st Qu.: 56.00
## Median : 2.66 Median :50.0 Median :30.43 Median : 85.00
## Mean : 207.12 Mean :51.1 Mean :30.42 Mean : 75.02
## 3rd Qu.: 354.24 3rd Qu.:55.0 3rd Qu.:30.46 3rd Qu.: 97.00
## Max. :1601.26 Max. :71.0 Max. :30.56 Max. :103.00
## WindSpeed
## Min. : 0.000
## 1st Qu.: 3.370
## Median : 5.620
## Mean : 6.244
## 3rd Qu.: 7.870
## Max. :40.500
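The summary above pools all four months. A per-month table of the median radiation gives a numerical counterpart to the monthly boxplots in Section 3.5; the median is used rather than the mean because of the many near-zero night-time readings. A sketch with `aggregate()` on a small made-up data frame, where `month` stands for a month-number column (for example one created with lubridate's `month()` from Date); this table is not part of the original analysis:

```r
# Made-up example: median radiation per month with aggregate()
toy <- data.frame(
  month     = c(9, 9, 10, 10, 10),
  Radiation = c(1.2, 800, 1.3, 2.0, 600)
)
monthly <- aggregate(Radiation ~ month, data = toy, FUN = median)
monthly
```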
3.5 Additional Figures
# Polar frequency plot of wind speed and wind direction
library(dplyr)    # select() and %>%
library(openair)  # polarFreq()
df1 <- df %>%
  select(ws = WindSpeed, wd = WindDirection, date = Date)
# wind frequencies over the whole period
polarFreq(mydata = df1, cols = "jet")
# wind frequencies split by weekday and season
# polarFreq(mydata = df1, cols = "jet", type = c("weekday", "season"))
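The polar plot treats wind direction as an angle. For modelling, a common alternative is to split wind into east-west (u) and north-south (v) components, which, unlike raw degrees, do not jump when the direction wraps from 359 to 0 degrees. A sketch with a hypothetical helper `wind_components`; it assumes the meteorological convention (WindDirection is the direction the wind blows from, clockwise from North), which the data description does not confirm:

```r
# Hypothetical helper: split wind into eastward (u) and northward (v) components.
# ASSUMES direction_deg is the direction the wind blows FROM, measured
# clockwise from North (meteorological convention); not confirmed by the data.
wind_components <- function(speed, direction_deg) {
  theta <- direction_deg * pi / 180  # degrees -> radians
  data.frame(
    u = -speed * sin(theta),  # positive u: wind blowing towards the east
    v = -speed * cos(theta)   # positive v: wind blowing towards the north
  )
}

# A 10 mph wind from the north (0 degrees) and one from the east (90 degrees)
w <- wind_components(speed = c(10, 10), direction_deg = c(0, 90))
w
```

If the direction turned out to be the direction the wind blows towards, both signs would flip.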
#Correlations
#numData <- df[, -c(1,2,9,10)]
#ggpairs(numData)
library(ggplot2)    # ggplot()
library(lubridate)  # month(), day()
dfd <- df # new data frame, so that df itself is not changed
dfd$month <- month(dfd$Date)
dfd$day <- day(dfd$Date)
ggplot(dfd, aes(factor(month), Radiation)) +
  geom_boxplot(aes(fill = factor(month))) +
  ggtitle("Boxplot of Radiation values for each month") +
  scale_x_discrete(labels = c("September", "October", "November", "December")) +
  scale_fill_discrete(name = "Months",
                      labels = c("September", "October", "November", "December")) +
  xlab("Month")
# df <- SolarRadPrediction
# # any(is.na(df))
#
# getDate<-function(x,pos1,pos2){
# if(pos1==1){
# val<-as.numeric(strsplit(strsplit(as.character(x)," ")[[1]][pos1],'/')[[1]][pos2])
# }
# else if(pos1==3 & pos2==0){
# val<-as.factor(strsplit(strsplit(as.character(x)," ")[[1]][pos1],'/')[[1]])
# }
# return(val)
# }
#
# getTIME<-function(x,pos){
# val<-strsplit(as.character(x),":")[[1]][pos]
# return(as.numeric(val))
# }
# df$Month <- sapply(df$Data, getDate,1,1)
# df$Day <- sapply(df$Data,getDate,1,2)
# df$Year <- sapply(df$Data,getDate,1,3)
# df$TimeAbbr <- sapply(df$Data,getDate,3,0)
# df$hour <- sapply(df$Time,getTIME,1)
# df$minute <- sapply(df$Time,getTIME,2)
# df$sec <- sapply(df$Time,getTIME,3)
# mymonths <- c("January","February",
# "March","April","May","June","July",
# "August","September","October","November","December")
#
# df$MonthAbb <- mymonths[ df$Month ]
# df$ordered_month <- factor(df$MonthAbb, levels = month.name)
#
# df$DateTs<-as.POSIXct(paste0(df$Year,'-',
# df$Month,'-',
# df$Day,' ',
# as.character(df$Time)),
# format="%Y-%m-%d %H:%M:%S")
#
# df$DailyTs <- as.POSIXct(as.character(df$Time), format="%H:%M:%S")
#
# df$DiffTime<-as.numeric(difftime(as.POSIXct(paste0(df$Year,
# '-',df$Month,
# '-',df$Day,' ',
# as.character(df$TimeSunSet)),
# format="%Y-%m-%d %H:%M:%S"),
#
# as.POSIXct(paste0(df$Year,'-',
# df$Month,'-',
# df$Day,' ',
# as.character(df$TimeSunRise)),
# format="%Y-%m-%d %H:%M:%S"),
# units='sec'))
# plot
# ggplot(data=df,aes(x=Radiation,fill=ordered_month)) +
# geom_histogram(bins=100) +
# scale_y_log10() +
# scale_fill_manual(name="",values=rainbow(4)) +
# theme(legend.position='top') +
# facet_wrap(~ordered_month) +
# xlab('Radiation level [W/m^2]') + ylab('Count')
#
# # plot
# ggplot(data=df,aes(x=DiffTime,y=Radiation)) +
# geom_point(aes(color=ordered_month)) +
# scale_color_manual(name="",values=rainbow(4)) +
# theme(legend.position='top') + xlab("Sunset - sunrise [sec]")
#
# # plot
# df %>% select(ordered_month,Day,Radiation) %>%
# group_by(ordered_month,Day) %>%
# summarise(dailyRad = mean(Radiation)) %>%
# ggplot(aes(x=ordered_month,y=dailyRad,color=dailyRad)) +
# scale_color_gradientn(colours=rev(brewer.pal(10,'Spectral'))) +
# geom_boxplot(colour='black',size=.4,alpha=.5) +
# geom_jitter(shape=16,width=.2,size=2) +
# xlab('') + ylab('') + theme(legend.position='top')
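The commented-out helpers above split dates and times with `strsplit()` one value at a time via `sapply()`, and appear to expect the original "m/d/yyyy" text rather than the parsed date-times. The day-length quantity `DiffTime` (sunset minus sunrise, in seconds) can be computed for a whole vector at once. A sketch with a hypothetical helper `time_to_seconds`, using the sunrise/sunset times of two rows shown earlier; with the real `hms` columns, `as.numeric(df$TimeSunSet) - as.numeric(df$TimeSunRise)` should give the same seconds directly:

```r
# Hypothetical helper: convert "hh:mm:ss" strings to seconds since midnight,
# for a whole vector at once (no sapply() needed).
time_to_seconds <- function(x) {
  parts <- matrix(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))),
                  ncol = 3, byrow = TRUE)
  as.vector(parts %*% c(3600, 60, 1))  # hours, minutes, seconds -> seconds
}

sunrise <- c("06:13:00", "06:08:00")
sunset  <- c("18:13:00", "18:35:00")
day_length <- time_to_seconds(sunset) - time_to_seconds(sunrise)
day_length         # seconds between sunrise and sunset
day_length / 3600  # the same in hours
```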