Loading data

Solar Prediction Data

These datasets contain meteorological data recorded at the HI-SEAS weather station over four months (September through December 2016), between Mission IV and Mission V.

For each dataset, the fields are:

A row number (1-n), useful in sorting this export's results
The UNIX time_t date (seconds since Jan 1, 1970), useful in sorting this export's results with other exports' results
The date in yyyy-mm-dd format
The local time of day in hh:mm:ss 24-hour format
The numeric data, if any (may be an empty string)
The text data, if any (may be an empty string)
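As a small illustration of how the UNIX time_t field relates to the local time of day, the sketch below converts the first two `UNIXTime` values from the table in this report into Hawaii local time. The time zone name `Pacific/Honolulu` is an assumption (the source only says "Hawaii time"), and the variable names are illustrative.

```r
# First two UNIXTime values as printed in the "First 5 rows" table
unix_time <- c(1475229326, 1475229023)

# Interpret as seconds since 1970-01-01 UTC and display in Hawaii time.
# "Pacific/Honolulu" is an assumed time zone identifier.
local_time <- as.POSIXct(unix_time, origin = "1970-01-01", tz = "Pacific/Honolulu")

# Split into the date and time-of-day formats described in the field list
local_date  <- format(local_time, "%Y-%m-%d")
local_clock <- format(local_time, "%H:%M:%S")

print(data.frame(unix_time, local_date, local_clock))
```

Under this assumption the converted time of day agrees with the `Time` column reported for the same rows (23:55:26 and 23:50:23).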

The units of each dataset are:

Solar radiation: watts per meter^2
Temperature: degrees Fahrenheit
Humidity: percent
Barometric pressure: inches of mercury (inHg)
Wind direction: degrees
Wind speed: miles per hour
Sunrise/sunset: Hawaii time
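The report does not show how `SolarPrediction` was loaded. A minimal sketch, assuming the data come from a CSV file whose header uses `WindDirection(Degrees)` (an assumption that would explain the `WindDirection.Degrees.` column name seen later), could look like this. The file name in the comment is hypothetical; the inline text holds the first two rows shown in this report.

```r
# With a real file this might be: SolarPrediction <- read.csv("SolarPrediction.csv")
# (hypothetical file name). Here the first two rows are supplied inline instead.
csv_text <- "UNIXTime,Data,Time,Radiation,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,TimeSunRise,TimeSunSet
1475229326,9/29/2016 12:00:00 AM,23:55:26,1.21,48,30.46,59,177.39,5.62,06:13:00,18:13:00
1475229023,9/29/2016 12:00:00 AM,23:50:23,1.21,48,30.46,58,176.78,3.37,06:13:00,18:13:00"

# check.names = TRUE (the default) turns "WindDirection(Degrees)" into a
# syntactically valid R name; stringsAsFactors = FALSE keeps text as character.
SolarPrediction <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(SolarPrediction)
```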

General dataset parameters

First 5 rows

library(knitr)
kable(SolarPrediction[1:5,], caption = "dataset parameters")
dataset parameters

UNIXTime    Data                   Time      Radiation  Temperature  Pressure  Humidity  WindDirection.Degrees.  Speed  TimeSunRise  TimeSunSet
1475229326  9/29/2016 12:00:00 AM  23:55:26  1.21       48           30.46     59        177.39                  5.62   06:13:00     18:13:00
1475229023  9/29/2016 12:00:00 AM  23:50:23  1.21       48           30.46     58        176.78                  3.37   06:13:00     18:13:00
1475228726  9/29/2016 12:00:00 AM  23:45:26  1.23       48           30.46     57        158.75                  3.37   06:13:00     18:13:00
1475228421  9/29/2016 12:00:00 AM  23:40:21  1.21       48           30.46     60        137.71                  3.37   06:13:00     18:13:00
1475228124  9/29/2016 12:00:00 AM  23:35:24  1.17       48           30.46     62        104.95                  5.62   06:13:00     18:13:00

Dataset dimension

SolarPrediction.dim <- dim(SolarPrediction)

Dataset contains 32686 rows and 11 columns

Columns list

str(SolarPrediction)
## 'data.frame':    32686 obs. of  11 variables:
##  $ UNIXTime              : int  1475229326 1475229023 1475228726 1475228421 1475228124 1475227824 1475227519 1475227222 1475226922 1475226622 ...
##  $ Data                  : chr  "9/29/2016 12:00:00 AM" "9/29/2016 12:00:00 AM" "9/29/2016 12:00:00 AM" "9/29/2016 12:00:00 AM" ...
##  $ Time                  : chr  "23:55:26" "23:50:23" "23:45:26" "23:40:21" ...
##  $ Radiation             : num  1.21 1.21 1.23 1.21 1.17 1.21 1.2 1.24 1.23 1.21 ...
##  $ Temperature           : int  48 48 48 48 48 48 49 49 49 49 ...
##  $ Pressure              : num  30.5 30.5 30.5 30.5 30.5 ...
##  $ Humidity              : int  59 58 57 60 62 64 72 71 80 85 ...
##  $ WindDirection.Degrees.: num  177 177 159 138 105 ...
##  $ Speed                 : num  5.62 3.37 3.37 3.37 5.62 5.62 6.75 5.62 4.5 4.5 ...
##  $ TimeSunRise           : chr  "06:13:00" "06:13:00" "06:13:00" "06:13:00" ...
##  $ TimeSunSet            : chr  "18:13:00" "18:13:00" "18:13:00" "18:13:00" ...

Basic statistical characteristics of the dataset

summary(SolarPrediction)
##     UNIXTime             Data               Time             Radiation      
##  Min.   :1.473e+09   Length:32686       Length:32686       Min.   :   1.11  
##  1st Qu.:1.476e+09   Class :character   Class :character   1st Qu.:   1.23  
##  Median :1.478e+09   Mode  :character   Mode  :character   Median :   2.66  
##  Mean   :1.478e+09                                         Mean   : 207.12  
##  3rd Qu.:1.480e+09                                         3rd Qu.: 354.24  
##  Max.   :1.483e+09                                         Max.   :1601.26  
##   Temperature      Pressure        Humidity      WindDirection.Degrees.
##  Min.   :34.0   Min.   :30.19   Min.   :  8.00   Min.   :  0.09        
##  1st Qu.:46.0   1st Qu.:30.40   1st Qu.: 56.00   1st Qu.: 82.23        
##  Median :50.0   Median :30.43   Median : 85.00   Median :147.70        
##  Mean   :51.1   Mean   :30.42   Mean   : 75.02   Mean   :143.49        
##  3rd Qu.:55.0   3rd Qu.:30.46   3rd Qu.: 97.00   3rd Qu.:179.31        
##  Max.   :71.0   Max.   :30.56   Max.   :103.00   Max.   :359.95        
##      Speed        TimeSunRise         TimeSunSet       
##  Min.   : 0.000   Length:32686       Length:32686      
##  1st Qu.: 3.370   Class :character   Class :character  
##  Median : 5.620   Mode  :character   Mode  :character  
##  Mean   : 6.244                                        
##  3rd Qu.: 7.870                                        
##  Max.   :40.500

Data visualization

library(ggplot2)
ggplot(SolarPrediction, aes(x=Temperature)) + geom_histogram(binwidth = 1)

ggplot(SolarPrediction, aes(x=Temperature, y=Radiation)) + geom_point()

ggplot(SolarPrediction, aes(x=Pressure, y=WindDirection.Degrees.)) + geom_point()

ggplot(SolarPrediction, aes(x=Data, y=TimeSunRise)) + geom_point()

pairs(~  Radiation + Temperature + Humidity + WindDirection.Degrees. + Speed, data = SolarPrediction, main = 'Solar Prediction Data')
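In the `Data` versus `TimeSunRise` plot above, both columns are character vectors, so the axes are treated as unordered labels. A hedged sketch of one way to make them plottable on continuous axes is shown below: the date part of `Data` is parsed into a `Date`, and the sunrise time is converted to minutes after midnight. The helper `to_minutes` and the variable names are illustrative, not part of the original analysis.

```r
# Values copied from the first rows of the dataset shown in this report
data_col    <- c("9/29/2016 12:00:00 AM", "9/29/2016 12:00:00 AM")
sunrise_col <- c("06:13:00", "06:13:00")

# Keep only the date part (before the first space) and parse month/day/year
obs_date <- as.Date(sub(" .*", "", data_col), format = "%m/%d/%Y")

# Hypothetical helper: convert "hh:mm:ss" strings to minutes after midnight
to_minutes <- function(hms) {
  parts <- do.call(rbind, strsplit(hms, ":", fixed = TRUE))
  as.numeric(parts[, 1]) * 60 + as.numeric(parts[, 2]) + as.numeric(parts[, 3]) / 60
}
sunrise_min <- to_minutes(sunrise_col)

print(data.frame(obs_date, sunrise_min))
```

With columns like these, `aes(x = obs_date, y = sunrise_min)` would give a date axis and a numeric axis rather than two sets of text labels.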

Feature correlation

## corrplot 0.84 loaded
##                          Radiation Temperature    Pressure     Humidity
## Radiation               1.00000000  0.73495476  0.11901566 -0.226170647
## Temperature             0.73495476  1.00000000  0.31117348 -0.285054954
## Pressure                0.11901566  0.31117348  1.00000000 -0.223973259
## Humidity               -0.22617065 -0.28505495 -0.22397326  1.000000000
## WindDirection.Degrees. -0.23032355 -0.25942119 -0.22900997 -0.001833315
## Speed                   0.07362687 -0.03145814 -0.08363929 -0.211623673
##                        WindDirection.Degrees.       Speed
## Radiation                        -0.230323549  0.07362687
## Temperature                      -0.259421188 -0.03145814
## Pressure                         -0.229009974 -0.08363929
## Humidity                         -0.001833315 -0.21162367
## WindDirection.Degrees.            1.000000000  0.07309242
## Speed                             0.073092422  1.00000000
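The code that produced this matrix is not shown in the report. A minimal sketch of how such a matrix might be computed with base R's `cor()` follows, using the first ten values of four columns as printed in the `str()` output (the other two columns are truncated there, so they are left out). The name `sample_rows` is illustrative, and ten rows will not reproduce the full-dataset coefficients.

```r
# First ten observations of four numeric columns, as printed by str()
sample_rows <- data.frame(
  Radiation   = c(1.21, 1.21, 1.23, 1.21, 1.17, 1.21, 1.20, 1.24, 1.23, 1.21),
  Temperature = c(48, 48, 48, 48, 48, 48, 49, 49, 49, 49),
  Humidity    = c(59, 58, 57, 60, 62, 64, 72, 71, 80, 85),
  Speed       = c(5.62, 3.37, 3.37, 3.37, 5.62, 5.62, 6.75, 5.62, 4.50, 4.50)
)

# Pearson correlation between every pair of columns
cor_matrix <- cor(sample_rows)
print(round(cor_matrix, 3))

# The report loads corrplot; if it is installed, the matrix could be drawn with
# corrplot::corrplot(cor_matrix)
```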

Normalization

## Loading required package: lattice
##    Radiation          Temperature        Pressure         Humidity     
##  Min.   :0.0000000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000750   1st Qu.:0.3243   1st Qu.:0.5676   1st Qu.:0.5053  
##  Median :0.0009687   Median :0.4324   Median :0.6486   Median :0.8105  
##  Mean   :0.1287471   Mean   :0.4623   Mean   :0.6294   Mean   :0.7054  
##  3rd Qu.:0.2206824   3rd Qu.:0.5676   3rd Qu.:0.7297   3rd Qu.:0.9368  
##  Max.   :1.0000000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##  WindDirection.Degrees.     Speed        
##  Min.   :0.0000         Min.   :0.00000  
##  1st Qu.:0.2282         1st Qu.:0.08321  
##  Median :0.4102         Median :0.13877  
##  Mean   :0.3985         Mean   :0.15417  
##  3rd Qu.:0.4980         3rd Qu.:0.19432  
##  Max.   :1.0000         Max.   :1.00000
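The normalization code is not shown, but every column in the summary ranges from 0 to 1, which is consistent with min-max scaling, x' = (x - min) / (max - min). As a check under that assumption, the sketch below applies the formula to the Temperature quartiles from the earlier summary (34, 46, 50, 55, 71). The function name `min_max` is illustrative.

```r
# Min-max scaling: map the smallest value to 0 and the largest to 1
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Temperature min, 1st quartile, median, 3rd quartile, max from summary()
temperature <- c(34, 46, 50, 55, 71)

scaled <- round(min_max(temperature), 4)
print(scaled)
# 0.0000 0.3243 0.4324 0.5676 1.0000
```

These values agree with the normalized Temperature column reported above, for example (46 - 34) / (71 - 34) ≈ 0.3243 for the first quartile.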

Conclusion

The exploratory data analysis yields the main statistical characteristics of the dataset features, an assessment of whether noise needs to be removed from the data, and the identification of highly correlated parameters that can be eliminated.
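One way the identification of highly correlated parameters might be sketched is to scan the correlation matrix for pairs whose absolute coefficient exceeds a cutoff. The matrix below repeats the coefficients reported under "Feature correlation", rounded to four decimals; the cutoff of 0.7 and the names `threshold` and `high_pairs` are assumptions for illustration, not values taken from the original analysis.

```r
vars <- c("Radiation", "Temperature", "Pressure", "Humidity",
          "WindDirection.Degrees.", "Speed")

# Correlation coefficients from the report, rounded to four decimals
cor_matrix <- matrix(c(
   1.0000,  0.7350,  0.1190, -0.2262, -0.2303,  0.0736,
   0.7350,  1.0000,  0.3112, -0.2851, -0.2594, -0.0315,
   0.1190,  0.3112,  1.0000, -0.2240, -0.2290, -0.0836,
  -0.2262, -0.2851, -0.2240,  1.0000, -0.0018, -0.2116,
  -0.2303, -0.2594, -0.2290, -0.0018,  1.0000,  0.0731,
   0.0736, -0.0315, -0.0836, -0.2116,  0.0731,  1.0000
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.7  # assumed cutoff for "highly correlated"

# Look only above the diagonal so each pair is reported once
idx <- which(abs(cor_matrix) > threshold & upper.tri(cor_matrix), arr.ind = TRUE)

high_pairs <- data.frame(
  feature_a = rownames(cor_matrix)[idx[, "row"]],
  feature_b = colnames(cor_matrix)[idx[, "col"]],
  r         = cor_matrix[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)
```

With this cutoff only the Radiation and Temperature pair (r ≈ 0.735) is flagged; a different threshold would change which pairs qualify.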