options(width=100)
knitr::opts_chunk$set(out.width='1000px',dpi=200,message=FALSE,warning=FALSE)
# load packages
library(ggplot2)
library(dplyr)
library(gridExtra)
library(Amelia)
library(corrplot)
library(forecast)
This dataset is a collection of measurements from a NASA solar flare observatory, the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI, originally the High Energy Solar Spectroscopic Imager, or HESSI). It was launched in 2002, and its primary mission is to explore the physics of particle acceleration and energy release in solar flares.
df<-read.csv('hessi.solar.flare.2002to2016.csv',sep=',')
print(colnames(df))
## [1] "flare" "start.date" "start.time" "peak" "end"
## [6] "duration.s" "peak.c.s" "total.counts" "energy.kev" "x.pos.asec"
## [11] "y.pos.asec" "radial" "active.region.ar" "flag.1" "flag.2"
## [16] "flag.3" "flag.4" "flag.5"
print(any(is.na(df)))
## [1] FALSE
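any(is.na(df)) only detects true NA values. By default read.csv turns blank fields into NA only in logical, integer, numeric and complex columns. Blank fields in character columns stay as the empty string "". If any of the text columns (for example the flag columns) contain blanks, they would pass this check unnoticed. The sketch below counts both kinds of gap per column. The helper count_missing and the toy data frame are hypothetical stand-ins. In the notebook you would call count_missing(df).

```r
# Hypothetical helper: count missing values per column, treating empty
# (or whitespace-only) strings in character/factor columns as missing too.
count_missing <- function(d) {
  sapply(d, function(col) {
    n_na <- sum(is.na(col))
    if (is.character(col) || is.factor(col)) {
      n_na <- n_na + sum(!is.na(col) & trimws(as.character(col)) == "")
    }
    n_na
  })
}

# Hypothetical toy data standing in for df
toy <- data.frame(
  flare  = c(1, 2, 3),
  flag.1 = c("A1", "", "DF"),
  stringsAsFactors = FALSE
)
print(count_missing(toy))
```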
#missmap(df,cols=c('black','yellow'))
There are no NA values. To make the plots easier to interpret, we create two new features:
energy.kev: converted to an ordered factor, so that the energy bands are displayed in increasing order:
df$energy.kev.ordered<-ordered(df$energy.kev, levels = c('3-6','6-12','12-25','25-50','50-100','100-300','300-800','800-7000','7000-20000'))
radial: binned into a factor, because its histogram shows that the values fall into 3 regions:
#ggplot(data=df,aes(x=radial)) + geom_histogram(bins=100) +scale_y_log10()
df$radialFactor<-cut(df$radial,breaks=c(-1,5000,10000,15000),labels=c('R1','R2','R3'))
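Both conversions can silently produce NA values. ordered() maps any value that is not in levels to NA, for example a band label with a stray space. cut() uses right-closed intervals by default (right = TRUE), so the breaks above give (-1,5000], (5000,10000] and (10000,15000]. A radial value of exactly 5000 therefore lands in R1, and anything above 15000 becomes NA. The sketch below demonstrates this on hypothetical toy values rather than the real columns.

```r
# Hypothetical toy values standing in for df$energy.kev and df$radial
kev_levels <- c('3-6', '6-12', '12-25', '25-50', '50-100',
                '100-300', '300-800', '800-7000', '7000-20000')
energy <- ordered(c('6-12', '3-6', '6-12 '), levels = kev_levels)
# '6-12 ' (trailing space) matches no level, so it becomes NA
print(energy)

radial <- c(0, 5000, 5001, 12000, 16000)
radialFactor <- cut(radial, breaks = c(-1, 5000, 10000, 15000),
                    labels = c('R1', 'R2', 'R3'))
# Right-closed intervals: 5000 -> R1, 5001 -> R2, 16000 -> NA
print(data.frame(radial, radialFactor))

# Count values that fell outside the level set or the breaks
n_na <- c(energy = sum(is.na(energy)), radial = sum(is.na(radialFactor)))
print(n_na)
```

In the notebook, sum(is.na(df$energy.kev.ordered)) and sum(is.na(df$radialFactor)) should both be zero if the chosen levels and breaks cover the whole dataset.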
ggplot(data=df,aes(x=energy.kev.ordered)) + geom_bar() + scale_y_log10() + xlab('Energy range [keV]')
Most of the flare measurements fall in the 6-12 keV band. High-energy events (> 1 MeV) are much less frequent.
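Because the bar chart uses a log-scaled y axis, it can be easier to read the exact shares from a table. This minimal sketch uses base R table() and prop.table() on hypothetical toy labels standing in for df$energy.kev. With the notebook's data you would use table(df$energy.kev.ordered) directly. Since the variable is a factor, bands with no flares still appear with a count of 0.

```r
# Hypothetical toy values standing in for df$energy.kev
kev_levels <- c('3-6', '6-12', '12-25', '25-50', '50-100',
                '100-300', '300-800', '800-7000', '7000-20000')
energy <- ordered(c('6-12', '6-12', '3-6', '12-25', '6-12', '7000-20000'),
                  levels = kev_levels)

band_counts <- table(energy)           # one entry per level, empty bands included
band_share  <- prop.table(band_counts) # fraction of flares per band
print(round(100 * band_share, 1))      # percentage per band
print(names(which.max(band_counts)))   # most populated band
```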
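ggplot(data=df,aes(x=radial)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Radial distance [arcsec?]')
This plot shows the motivation for creating a factor variable for the radial distance. The units are not stated in the file. Since the position columns are given in arcseconds (x.pos.asec, y.pos.asec), radial is most likely the distance from the Sun's center in arcseconds as well, rather than in meters.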
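One way to check the units of radial is to compare it with sqrt(x.pos.asec^2 + y.pos.asec^2). If the two agree, radial shares the units of the position columns. This is an assumption to verify, not something the file documents. A second sanity check compares the values with the Sun's apparent radius, roughly 960 arcsec. The sketch below uses hypothetical toy rows. In the notebook you would replace toy with df.

```r
# Hypothetical toy rows standing in for df[, c('x.pos.asec', 'y.pos.asec', 'radial')]
toy <- data.frame(x.pos.asec = c(300, -600, 0),
                  y.pos.asec = c(400, 800, -950),
                  radial     = c(500, 1000, 950))

# Distance from Sun center implied by the x/y positions
implied <- sqrt(toy$x.pos.asec^2 + toy$y.pos.asec^2)
max_abs_diff <- max(abs(implied - toy$radial))
print(max_abs_diff)   # close to 0 if radial = sqrt(x^2 + y^2)

# Fraction of rows beyond ~960 arcsec (approximate apparent solar radius)
print(mean(toy$radial > 960))
```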
ggplot(data=df,aes(x=total.counts)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Total counts in [6-12] keV, integrated over time')
ggplot(data=df,aes(x=duration.s)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Duration of the flare [sec.]')
ggplot(data=df,aes(x=duration.s,y=total.counts)) + geom_point(aes(size=radialFactor,color=energy.kev.ordered),alpha=.25) + scale_y_log10()
ggplot(data=df,aes(x=duration.s,y=total.counts)) + geom_point(aes(color=radialFactor),alpha=.25) + scale_y_log10() + facet_wrap(~energy.kev.ordered)
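To put a number on the relationship shown in each facet, one option is Spearman's rank correlation between duration.s and total.counts within each energy band. It is rank-based, so it does not depend on the log scaling of the y axis. The sketch below runs on hypothetical toy rows. With the notebook's data you would split df by energy.kev.ordered instead.

```r
# Hypothetical toy rows standing in for df
toy <- data.frame(
  duration.s   = c(100, 200, 400, 800, 50, 150, 300),
  total.counts = c(1e3, 3e3, 2e3, 2e4, 400, 900, 2500),
  energy.kev   = c('6-12', '6-12', '6-12', '6-12', '12-25', '12-25', '12-25')
)

# Spearman's rho per energy band (rank-based, unaffected by log scaling)
rho_by_band <- sapply(split(toy, toy$energy.kev), function(d)
  cor(d$duration.s, d$total.counts, method = 'spearman'))
print(round(rho_by_band, 2))
```

Bands with very few flares give unstable estimates, or NA when one variable is constant. It may therefore help to report the number of flares per band next to each coefficient.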