Wendy Sarrett
February 20, 2017
The purpose of Explore the Data Set is to provide a basic analysis of the datasets in R's datasets package that are data frames (or time series that convert to data frames) with at least four columns.
With these datasets you can quickly see how the columns are related and, after selecting four columns (x, y, z and color), view a plot_ly graph that visualizes the relationships.
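As a rough illustration of the four-column graph, here is a minimal sketch that assumes a 3D scatter plot and uses fixed mtcars columns. In the app the columns come from the user's selections; the choices of wt, hp, qsec and mpg below are assumptions made only for this example.

```r
library(plotly)

# Assumed column choices for illustration only; the app takes them from
# the four select inputs (x, y, z and color)
p <- plot_ly(
  data = mtcars,
  x = ~wt, y = ~hp, z = ~qsec,   # the three axes
  color = ~mpg,                  # fourth column mapped to marker color
  type = "scatter3d", mode = "markers"
)
```

Printing p in an interactive session renders the graph.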
The trickiest thing about this app is getting the list of datasets:
# Keep only datasets that are data frames (or ts objects converted to one)
# and that have >= 4 columns
getDataSets <- function() {
  ds <- data(package = "datasets")
  res <- ds$results
  dsNames <- res[, 3]
  choise <- c()
  dataLst <- list()
  for (i in seq_along(dsNames)) {
    # Items can look like "beaver1 (beavers)"; keep only the object name
    wd <- strsplit(dsNames[[i]], " ")[[1]]
    dsNames[[i]] <- wd[1]
    xo <- get(wd[1])
    if (is.data.frame(xo)) {
      if (ncol(xo) >= 4) {
        choise <- c(choise, wd[1])
        dataLst[[wd[1]]] <- xo
      }
    } else if (inherits(xo, "ts")) {
      xoDF <- as.data.frame(xo)
      if (ncol(xoDF) >= 4) {
        choise <- c(choise, wd[1])
        dataLst[[wd[1]]] <- xoDF
      }
    }
  }
  ## note: commented out ui element update for this demo
  dataLst
}
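The same filtering can be sketched more compactly with lapply and Filter. This is an alternative under the same criteria, not the app's code; to_df, pkg_env and candidates are hypothetical names introduced for the example.

```r
# Object names as listed by data(); entries such as "beaver1 (beavers)" name
# the containing data file in parentheses, so keep only the first word
items <- data(package = "datasets")$results[, "Item"]
obj_names <- sub(" .*$", "", items)

# Hypothetical helper: return a data frame for data.frame/ts input, else NULL
to_df <- function(x) {
  if (is.data.frame(x)) {
    x
  } else if (inherits(x, "ts")) {
    as.data.frame(x)
  } else {
    NULL
  }
}

# Look each object up in the attached datasets package and convert it
pkg_env <- as.environment("package:datasets")
candidates <- lapply(setNames(obj_names, obj_names),
                     function(nm) to_df(get(nm, envir = pkg_env)))

# Keep only the convertible datasets with at least four columns
dataLst <- Filter(function(d) !is.null(d) && ncol(d) >= 4, candidates)
names(dataLst)
```

Looking names up in the package environment, rather than with a bare get(), avoids picking up a same-named object from the user's workspace.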
## Reactive expression: takes the selected dataset, refreshes the column
## pickers, and fits the linear model whose summary is then displayed.
plotdata <- reactive({
  shinyjs::hideElement("pPlot")
  datasel <- input$visData
  data2 <- dataLst[[datasel]]
  # class(x) can have several elements, so test with is.data.frame()
  if (is.data.frame(data2)) {
    newdata <- data2
  } else {
    newdata <- as.data.frame(data2)
  }
  # Offer the dataset's column names in all four select inputs
  choise <- names(newdata)
  updateSelectInput(session, "colA", choices = choise)
  updateSelectInput(session, "colB", choices = choise)
  updateSelectInput(session, "colC", choices = choise)
  updateSelectInput(session, "colD", choices = choise)
  ## Avoid a factor response: use the first non-factor column as y
  col <- 0
  for (i in seq_len(ncol(newdata))) {
    if (!is.factor(newdata[, i]) && col == 0) {
      col <- i
    }
  }
  if (col > 0) {
    x <- summary(lm(newdata[, col] ~ ., data = newdata))
  } else {
    x <- "no non-factor columns .... lm is not valid"
  }
  x
})
##End of reactive method
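The loop that picks the first non-factor column can also be written without an explicit counter. A minimal sketch, where first_non_factor is a hypothetical helper name and the return value 0 mirrors the "no non-factor columns" case above:

```r
# Hypothetical helper: index of the first non-factor column, or 0 if none
first_non_factor <- function(df) {
  idx <- which(!vapply(df, is.factor, logical(1)))
  if (length(idx) > 0) idx[[1]] else 0L
}

first_non_factor(iris)                                    # numeric column first
first_non_factor(data.frame(a = factor(c("x", "y")),
                            b = c(1, 2)))                 # factor column first
first_non_factor(data.frame(a = factor(c("x", "y"))))     # only factors
```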
The main server calculation for a dataset would look as follows if mtcars were selected. Note that, for the purposes of display, we'll set newdata = mtcars and col = 1:
newdata <- mtcars
col <- 1
x <- summary(lm(newdata[, col] ~ ., data = newdata))
x
Call:
lm(formula = newdata[, col] ~ ., data = newdata)

Residuals:
       Min         1Q     Median         3Q        Max
-2.740e-15 -4.193e-16  3.000e-19  2.972e-16  6.276e-15

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept)  0.000e+00  1.222e-14  0.000e+00    1.000
mpg          1.000e+00  1.410e-16  7.094e+15   <2e-16 ***
cyl          7.826e-17  6.753e-16  1.160e-01    0.909
disp        -2.380e-18  1.169e-17 -2.040e-01    0.841
hp          -1.379e-17  1.438e-17 -9.590e-01    0.349
drat         1.459e-16  1.062e-15  1.370e-01    0.892
wt          -2.701e-16  1.331e-15 -2.030e-01    0.841
qsec         2.251e-16  4.861e-16  4.630e-01    0.648
vs          -1.473e-15  1.360e-15 -1.083e+00    0.292
am           1.026e-15  1.375e-15  7.460e-01    0.465
gear        -4.885e-16  9.691e-16 -5.040e-01    0.620
carb         3.681e-16  5.361e-16  6.870e-01    0.500
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.712e-15 on 20 degrees of freedom
Multiple R-squared:  1, Adjusted R-squared:  1
F-statistic: 3.492e+31 on 11 and 20 DF,  p-value: < 2.2e-16
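In this output mpg appears among the predictors with a coefficient of 1, and R-squared is 1. This happens because the `.` in the formula expands to every column of newdata, including the one used as the response. A minimal sketch of one way to leave the response out of the predictors, assuming the intent is to model the chosen column from the remaining columns (fit_first_column is a hypothetical helper, not part of the app):

```r
# Hypothetical helper: regress column `col` on all *other* columns
fit_first_column <- function(newdata, col) {
  response <- names(newdata)[col]
  # reformulate(".", response = "mpg") builds mpg ~ . ; because the response
  # is named in the formula, `.` no longer includes it
  f <- reformulate(".", response = response)
  lm(f, data = newdata)
}

fit <- fit_first_column(mtcars, 1)
summary(fit)
```

With the response named in the formula, the summary reports coefficients only for the other ten mtcars columns plus the intercept.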
The benefits of this application are the following:
Current Limitations
Only one type of graph
There are enhancements that might be made to improve this:
Allow other calculations, such as predictive functions (i.e., machine learning)
Expand the number of datasets available
Allow a dataset to be loaded from a URL