options(width=100)
knitr::opts_chunk$set(out.width='1000px',dpi=200,message=FALSE,warning=FALSE)
#load packages and csv file
library(ggplot2)
library(dplyr)
library(gridExtra)
library(Amelia)
library(corrplot)
library(forecast)
This dataset is a collection of measurements from a NASA solar flare observatory: the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI, originally the High Energy Solar Spectroscopic Imager, or HESSI). It was launched in 2002, and its primary mission is to explore the physics of particle acceleration and energy release in solar flares.
df<-read.csv('hessi.solar.flare.2002to2016.csv',sep=',')
print(colnames(df))
## [1] "flare" "start.date" "start.time" "peak" "end"
## [6] "duration.s" "peak.c.s" "total.counts" "energy.kev" "x.pos.asec"
## [11] "y.pos.asec" "radial" "active.region.ar" "flag.1" "flag.2"
## [16] "flag.3" "flag.4" "flag.5"
print(any(is.na(df)))
## [1] FALSE
#missmap(df,cols=c('black','yellow'))
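The single TRUE/FALSE answer from any(is.na(df)) does not say where missing values would be if there were any. A minimal sketch of a per-column count follows; the toy data frame and its values are hypothetical and only borrow column names from the file.

```r
# Hypothetical rows, not taken from the flare file.
toy <- data.frame(duration.s = c(120, NA, 340),
                  energy.kev = c("6-12", "3-6", NA),
                  radial     = c(100, 200, 300))

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

Running colSums(is.na(df)) on the real data frame should return zeros everywhere if the check above is FALSE.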
There are no NA values. We can rearrange existing features or create new ones to make the plots easier to interpret. In particular:
The energy.kev feature becomes an ordered factor, so that the energy bands are plotted in increasing order:
df$energy.kev.ordered<-ordered(df$energy.kev, levels = c('3-6','6-12','12-25','25-50','50-100','100-300','300-800','800-7000','7000-20000'))
The radial feature becomes a factor as well, because its histogram shows that it can be split into 3 regions:
#ggplot(data=df,aes(x=radial)) + geom_histogram(bins=100) +scale_y_log10()
df$radialFactor<-cut(df$radial,breaks=c(-1,5000,10000,15000),labels=c('R1','R2','R3'))
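To make the behaviour of the two calls above concrete, here is a small sketch on toy values (the numbers and the shortened list of bands are illustrative, not taken from the file). It shows that cut() uses right-closed intervals by default, so 5000 falls in R1, and that ordered() sorts by the given level order rather than alphabetically.

```r
# Toy values only; these are not taken from the flare file.
radial <- c(0, 950, 5000, 5001, 12000)

# Default right = TRUE gives the intervals (-1,5000], (5000,10000], (10000,15000].
radial_factor <- cut(radial,
                     breaks = c(-1, 5000, 10000, 15000),
                     labels = c("R1", "R2", "R3"))
print(radial_factor)

# An ordered factor keeps the order of `levels`, not the alphabetical one.
bands <- c("12-25", "3-6", "6-12")
bands_ordered <- ordered(bands, levels = c("3-6", "6-12", "12-25"))
print(sort(bands_ordered))

# Comparisons between elements follow the level order.
print(bands_ordered[1] > bands_ordered[3])
```

Values outside the breaks (for cut) or outside the listed levels (for ordered) become NA, so it is worth rerunning any(is.na(df)) after creating the new columns.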
ggplot(data=df,aes(x=energy.kev.ordered)) + geom_bar(alpha=.25) + scale_y_log10() + xlab('Energy range [keV]')
We see that most of the flare measurements fall in the 6-12 keV range. High-energy events (above about 1 MeV) are much rarer.
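The bar chart is on a log scale, so the share of each band is easier to read from a table. A minimal sketch with table() and prop.table() follows; the toy vector is hypothetical and stands in for df$energy.kev.ordered.

```r
# Hypothetical band labels standing in for df$energy.kev.ordered.
levels_kev <- c("3-6", "6-12", "12-25", "25-50")
toy <- ordered(c("6-12", "6-12", "6-12", "3-6", "12-25", "6-12", "12-25", "25-50"),
               levels = levels_kev)

# Number of events per band, in level order.
band_counts <- table(toy)

# Same counts expressed as a percentage of all events.
band_share <- round(100 * prop.table(band_counts), 1)

print(band_counts)
print(band_share)
```

On the real data, table(df$energy.kev.ordered) gives the counts behind the plot above.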
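ggplot(data=df,aes(x=radial)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Radial distance [arcsec?]')
This plot shows the motivation for creating a factor variable for the radial distance. The column name does not give the units; arcseconds seem more likely than meters, since the related position columns are named x.pos.asec and y.pos.asec.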
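One way to test the guess about the units is to compare radial with the distance computed from x.pos.asec and y.pos.asec, whose names indicate arcseconds. That radial is derived from these two columns is an assumption here, not something stated in the file. The sketch below uses hypothetical rows to show the check.

```r
# Hypothetical rows; in the notebook, replace `toy` with `df`.
toy <- data.frame(x.pos.asec = c(300, -400, 0),
                  y.pos.asec = c(400, 300, -950),
                  radial     = c(500, 500, 950))

# Distance from the origin computed from the two position columns.
r_from_xy <- sqrt(toy$x.pos.asec^2 + toy$y.pos.asec^2)

# Largest absolute disagreement between the computed and the stored distance.
max_diff <- max(abs(r_from_xy - toy$radial))
print(max_diff)
```

If the difference on the real data is at the level of rounding error, radial shares the units of the position columns; a large difference would mean the assumption does not hold.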
ggplot(data=df,aes(x=total.counts)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Counts in [6-12] keV integrated over time')
ggplot(data=df,aes(x=duration.s)) + geom_histogram(bins=200) + scale_y_log10() + xlab('Duration of the flare [sec.]')
ggplot(data=df,aes(x=duration.s,y=total.counts)) + geom_point(aes(size=radialFactor,color=energy.kev.ordered),alpha=.25) + scale_y_log10()
ggplot(data=df,aes(x=duration.s,y=total.counts)) + geom_point(aes(color=radialFactor),alpha=.25) + scale_y_log10() + facet_wrap(~energy.kev.ordered)