Stat-805 Summer-2023 - Final Project


---
title: "Stat-805 Summer-2023 - Final Project"
output: 
  flexdashboard::flex_dashboard:
    theme: united
    storyboard: true
    social: menu
    source: embed
    css: "su_styles.css"
   
---

## STAT 805 - Presenting Class Data using R Markdown, R Shiny and HTML Widgets
<body>
<div class="col-2-3">
The purpose of this project is to showcase class data using R Markdown and R packages including flexdashboard, R Shiny and HTML Widgets.

<b>R Markdown</b>  

 - R Markdown is a simple, easy-to-use plain-text format that combines your R code, the results of your data analysis (including plots and tables) and written commentary into a single, nicely formatted, reproducible document, such as a report, publication, thesis chapter or a web page like this one.

Two free online books are great resources for R Markdown: <b><a href="https://bookdown.org/yihui/rmarkdown/" target="_blank">R Markdown: The Definitive Guide</a></b> and the <b><a href="https://bookdown.org/yihui/rmarkdown-cookbook/" target="_blank">R Markdown Cookbook</a></b>.  
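
To make the idea concrete, here is a minimal sketch that is not part of the original project. It writes a tiny R Markdown source to a temporary file and knits it with `knitr::knit()`. Knitting runs the R chunk and writes a Markdown file in which the code and its printed result sit side by side. Producing HTML, as this dashboard does, would also go through `rmarkdown::render()` and Pandoc. The file contents are illustrative assumptions.

```r
library(knitr)

# A tiny R Markdown source: a YAML header, one line of prose and one R chunk
rmd_lines <- c(
  "---",
  "title: \"Minimal example\"",
  "---",
  "",
  "The mean of the numbers 1 to 10 is:",
  "",
  "```{r}",
  "mean(1:10)",
  "```"
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# knit() evaluates the chunk and returns the path of the Markdown output
md_file <- knit(rmd_file, output = tempfile(fileext = ".md"), quiet = TRUE)
md_out <- readLines(md_file)
cat(md_out, sep = "\n")
```

The knitted Markdown contains the chunk code followed by its output line, `## [1] 5.5`.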

<img src="cookbook.png" alt="Cookbook" class="Photoright" />

<img src="definitive.png" alt="Definitive Guide" class="Photoleft" />


<b><a href="https://pkgs.rstudio.com/flexdashboard/index.html" target="_blank">Flexdashboard</a></b>  

 - Use R Markdown to publish a group of related data visualizations as a dashboard.

 - Support for a wide variety of components, including htmlwidgets; base, lattice, and grid graphics; tabular data; gauges and value boxes; and text annotations.

 - Flexible, easy-to-specify row- and column-based layouts. Components are intelligently resized to fill the browser and adapted for display on mobile devices.

 - <b><a href="https://pkgs.rstudio.com/flexdashboard/articles/layouts.html" target="_blank">Storyboard layouts</a></b> for presenting sequences of visualizations and related commentary.

 - Optionally use Shiny to drive visualizations dynamically.
 
 - A variety of <b><a href="https://pkgs.rstudio.com/flexdashboard/articles/using.html" target="_blank">themes</a></b> are available to modify the base appearance of flexdashboard, including default, cosmo, bootstrap, cerulean, journal, flatly, readable, spacelab, <b>united</b> (used by this dashboard), lumen, paper, sandstone, simplex and yeti.
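
The sketch below illustrates the gauge component mentioned above. It builds a `flexdashboard::gauge()` showing the share of students who reported a location: 14 of the 19, since 5 did not share one in the map story. The sector cut-offs passed to `gaugeSectors()` are arbitrary assumptions chosen for the example, not course targets. In a dashboard, the gauge would sit in its own chunk under a `###` heading, with `g` as the chunk's last line so that it is displayed.

```r
library(flexdashboard)

# Share of the class that reported a location: 14 of 19 students
location_share <- round(14 / 19 * 100, 1)

# Sector cut-offs are illustrative assumptions, not targets from the course
g <- gauge(
  location_share, min = 0, max = 100, symbol = "%",
  label = "Location shared",
  sectors = gaugeSectors(success = c(80, 100), warning = c(50, 80), danger = c(0, 50))
)
```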
 
 <b><a href="https://www.w3schools.com/css/css_intro.asp" target="_blank">CSS</a></b> - CSS (Cascading Style Sheets) describes how HTML elements are displayed on screen, on paper, or in other media. A single external stylesheet stored as a .css file can control the layout of many web pages at once, which saves a lot of repeated work. This dashboard loads its custom styles from <code>su_styles.css</code> through the <code>css</code> option in the YAML header. A tutorial on applying custom CSS to R Markdown output is available <b><a href="https://bookdown.org/yihui/rmarkdown-cookbook/html-css.html" target="_blank">here</a></b>.

<b><a href="https://www.rstudio.com/products/shiny/#:~:text=Shiny%20is%20an%20open%20source,opens%20in%20a%20new%20tab" target="_blank">R Shiny</a></b>  - Shiny is an open-source R package that provides a web framework for building web applications in R. It can be used to create interactive, dynamic dashboards similar to those built with Tableau or Power BI. Shiny turns your analyses into interactive web applications without requiring knowledge of HTML, CSS or JavaScript. 
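
Below is a minimal sketch of a standalone Shiny app, which is not part of the original project. `ui` declares a slider input and a plot output. `server` redraws the plot reactively whenever the slider moves. The data are the per-discussion totals from the grouped bar chart later in this dashboard. Embedding Shiny inside a flexdashboard would instead require `runtime: shiny` in the YAML header.

```r
library(shiny)

# Per-discussion totals copied from the grouped bar chart in this dashboard
totals <- data.frame(
  Discussion = paste("Disc", 1:10),
  Posts = c(50, 41, 43, 44, 43, 43, 42, 38, 39, 39),
  Replies = c(31, 22, 25, 25, 24, 24, 23, 19, 20, 19)
)

# UI: a slider input and a plot output
ui <- fluidPage(
  titlePanel("Discussion board totals"),
  sliderInput("n", "Number of discussions to show", min = 1, max = 10, value = 10),
  plotOutput("bars")
)

# Server: redraws the grouped bar chart whenever the slider changes
server <- function(input, output, session) {
  output$bars <- renderPlot({
    shown <- head(totals, input$n)
    barplot(t(as.matrix(shown[, c("Posts", "Replies")])),
            beside = TRUE, names.arg = shown$Discussion,
            legend.text = c("Posts", "Replies"), ylab = "Count")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # launches the app in a browser
```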


</div>

<div class="col-1-3">
<img src="professor1.jpg" alt="Female Professor" class="center" />
<h3>Instructor Information</h3>
<h4 style="text-align:center;">Shana L. Palla, EdD, PStat</h4>
<p style="text-align:center;"><b>Email</b>: spalla@kumc.edu</p>
<b>Course Description:</b> This web-based course addresses issues in professionalism, leadership and ethics that are specific to students training to become statisticians, biostatisticians, and data scientists.

```{r out.width='100%', echo=FALSE}
library(leaflet)
# Street map centred on KU Medical Center (longitude, latitude; zoom 17) with a popup label
leaflet() %>% addTiles() %>%
  setView(-94.60914860650841, 39.05602617791458, zoom = 17) %>%
  addPopups(
    -94.60914860650841, 39.05602617791458,
    'Location of <b>KU Medical Center</b>'
  )
```

</div>

```{r setup, include=FALSE}
library(flexdashboard)
```
 <div class="row"> 
 
### University of Kansas Undergrad Ranking, Tuition and Enrollment Compared to Other Universities Using Plotly

```{r}
library(readr)
library(plotly)

# 2017 university rankings (see the dataset link in the commentary)
rankings <- read_csv("rankings_2017.csv")

# Scatter plot of rank against tuition; marker colour encodes enrollment
plot_ly(data = rankings, x = ~Tuition, y = ~Rank, text = ~Name,
        type = "scatter", mode = "markers",
        marker = list(size = 12, color = ~Enroll, colorscale = "Viridis",
                      reversescale = TRUE,
                      colorbar = list(title = "Enrollment (in thousands)"))) %>%
  layout(title = "College Rank versus Tuition (Year = 2017)",
         xaxis = list(title = "Tuition"),
         yaxis = list(title = "Rank"))
```

***


<h2>Observations</h2>

<img src="corr1.png" alt="Correlation" class="corr"/>

- College rank and tuition are strongly correlated. However, correlation does not imply causation.

- University of Kansas - Rank = 118, Tuition = $25,932

- Median Rank = 111, Median Tuition = $31,608
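
The numbers above can be reproduced in code instead of read off the chart. The helper below is a hypothetical sketch. It assumes that `rankings_2017.csv` has numeric `Rank` and `Tuition` columns, which are the same columns the scatter plot uses. It reports Spearman's rank correlation, a reasonable choice because Rank is ordinal, along with the two medians. A negative coefficient would mean that higher tuition goes with a numerically lower, that is better, rank.

```r
# Hypothetical helper: correlation and medians behind the observations above.
# Assumes numeric Rank and Tuition columns, as used by the scatter plot.
summarise_rank_tuition <- function(df) {
  list(
    spearman = cor(df$Tuition, df$Rank, method = "spearman", use = "complete.obs"),
    median_rank = median(df$Rank, na.rm = TRUE),
    median_tuition = median(df$Tuition, na.rm = TRUE)
  )
}

# Usage with the dashboard's data file:
# summarise_rank_tuition(readr::read_csv("rankings_2017.csv"))
```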

<h2>Related Links</h2>

<b><a href="https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html" target="_blank">An Introduction to corrplot Package</a></b>

<b><a href="https://data.world/education/university-rankings-2017" target="_blank">University Rankings 2017 - Dataset</a></b>

<b><a href="https://plotly.com/ggplot2/" target="_blank">Plotly ggplot2 Library</a></b>

### Gender Distribution of Class Using pie3D

```{r}
library(plotrix)

# Class counts by gender: 16 male, 3 female
counts <- c(Male = 16, Female = 3)
lab <- paste0(names(counts), ": ", round(counts / sum(counts) * 100, 2), "%")
pie3D(counts, main = "Gender Distribution of Students",
      radius = 0.9, explode = 0.3,
      col = c("lightblue", "pink"),
      labels = lab)
```

***
<h2>Observations</h2>

- Total No. of Students = 19
- Male = 16 (84.2%), Female = 3 (15.8%)

- Women remain underrepresented in STEM majors for a variety of reasons, so KUDOS to all the ladies in our class.


<h2>Related R Library</h2>

<b><a href="https://r-charts.com/part-whole/pie3d/" target="_blank">pie3D function with plotrix</a></b>

### Current Location of Students Using the Plotly Graphing Library

```{r}
library(plotly)

# Students per state (two-letter abbreviations); every other state is 0
counts <- c(KS = 7, TX = 2, CT = 1, MN = 1, NC = 1, OR = 1, PA = 1)
students <- unname(counts[state.abb])
students[is.na(students)] <- 0

# Map settings: USA-only Albers projection with white lakes
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  lakecolor = toRGB('white')
)

plot_ly() %>%
  layout(title = 'Current Location of Students', geo = g) %>%
  # Choropleth layer: fill each state by its number of students
  add_trace(type = "choropleth", locationmode = 'USA-states',
            locations = state.abb,
            z = ~students, text = state.name,
            color = ~students, autocolorscale = TRUE) %>%
  # Text layer: print each state's abbreviation and count on the map
  add_trace(type = "scattergeo", locationmode = 'USA-states',
            locations = state.abb, text = paste0(state.abb, "\n", students),
            mode = "text",
            textfont = list(color = rgb(0, 0, 0), size = 12))
```

***

<h2>Observations</h2>

- Total No. of Students = 19

- Midwest = 8 (Kansas 7, Minnesota 1), Texas = 2, West Coast = 1 (Oregon), East Coast = 3 (Connecticut, Pennsylvania, North Carolina)

- 5 students didn't share their current location in the "Introduction" videos.
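
The regional counts can be tallied in base R with the built-in `state.abb` and `state.region` datasets. `state.region` uses U.S. Census regions, in which the Midwest is labelled "North Central". The sketch re-enters the non-zero state counts from the map data above.

```r
# Students per state: the non-zero entries of the map data above
counts <- c(KS = 7, TX = 2, CT = 1, MN = 1, NC = 1, OR = 1, PA = 1)

# state.abb and state.region ship with base R (datasets package);
# state.region uses U.S. Census regions, where the Midwest is "North Central"
region <- state.region[match(names(counts), state.abb)]
region_totals <- tapply(counts, region, sum)
print(region_totals)

# Students who did not share a location (19 in the class)
19 - sum(counts)
```

With these counts, the code gives North Central 8, South 3, Northeast 2 and West 1, and 5 students without a location. Census regions place Texas and North Carolina in the South and Pennsylvania in the Northeast. These totals therefore do not line up one-to-one with the coast-based grouping in the observations.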

<h2>Related Library</h2>

<b><a href="https://plotly.com/r/maps/" target="_blank">Plotly R Graphing Library - Maps</a></b>

### Student Hobbies and Interests using Text Mining and Word Cloud

```{r}
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")

# Relative path (like the CSV files used elsewhere) so the dashboard knits on any machine
hobbies <- "hobbies.txt"
hobby_lines <- readLines(hobbies)

docs <- Corpus(VectorSource(hobby_lines))
docs <- tm_map(docs, content_transformer(tolower))
# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white space
docs <- tm_map(docs, stripWhitespace)
# Count each term across all documents, most frequent first
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
# Draw the word cloud; the fixed seed keeps the layout reproducible
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

```

***

<h2>Observations</h2>

- Animals / Pets, Reading and Travel were the most common hobbies / interests among students.

- Other interesting hobbies / interests include Photography, Swimming, Tuning Pianos, Tracking Data in reading, Bowling, Chemistry, Watching TV, Electronics, Outdoor activities and Spending time with family.
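
The base-R sketch below shows what the `tm` pipeline does to the text. It lowercases the text, splits it on non-letters, drops stopwords and counts terms. The short stopword list is a hypothetical stand-in for `tm`'s `stopwords("english")`. The example sentences are made up for illustration and are not the class data.

```r
# Base-R sketch of the cleaning and counting steps in the word-cloud chunk.
# The stop_words default is a small hypothetical stand-in for stopwords("english").
count_terms <- function(lines,
                        stop_words = c("a", "and", "i", "in", "my", "of", "the", "to")) {
  words <- unlist(strsplit(tolower(lines), "[^a-z]+"))  # split on non-letters
  words <- words[nzchar(words) & !(words %in% stop_words)]
  sort(table(words), decreasing = TRUE)                 # most frequent first
}

# Example with made-up sentences (not the class data)
count_terms(c("I love reading and travel.", "Travel, pets and photography"))
```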

<h2>Related Topic(s)</h2>

<b><a href="http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know" target="_blank">Text mining and Word Cloud Fundamentals in R</a></b>


### Undergrad Degree and Reason to Pursue a Master's Degree Using Interactive Data Tables from HTML Widgets

```{r}
library(DT)
library(readr)
reason <- read_csv("reason.csv")
datatable(reason, options = list(pageLength = 8))
```

***

<h2>Observations</h2>

- The majority of undergrad degrees were in the life sciences.

- Students gave a wide variety of reasons for pursuing this degree, including transitioning careers, getting promoted, and gaining additional skills.
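
`DT::datatable()` exposes more of the DataTables JavaScript library through its arguments. One possible extension, sketched below as a hypothetical helper, adds a filter box above each column (`filter = "top"`). Readers could then, for example, narrow the table to a single undergrad field. The column names of `reason.csv` are not shown here, so the helper accepts any data frame.

```r
library(DT)

# Hypothetical helper: the same table as above, plus a filter box above each
# column and no row-name column
reason_table <- function(df) {
  datatable(df, filter = "top", rownames = FALSE,
            options = list(pageLength = 8, autoWidth = TRUE))
}

# Usage with the dashboard's data file:
# reason_table(readr::read_csv("reason.csv"))
```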

<h2>Related Topic(s)</h2>

<b><a href="https://www.htmlwidgets.org/showcase_leaflet.html" target="_blank">htmlwidgets for R</a></b>


### Current Occupation of Students using Collapsible Trees

```{r}
library(collapsibleTree)
library(readr)
library(dplyr)
Stat_805 <- read_csv("Stat-805.csv")

Stat_805 %>%
  group_by(Gender,CurrentOccupation,Occupation) %>%
  collapsibleTreeSummary(
    hierarchy = c("Gender","CurrentOccupation","Occupation"),
    root = "Stat_805",
    width = 800
  )


```

***

<h2>Observations</h2>

- Current occupations vary widely.
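
To put a number on that range, the occupations can be tallied with `dplyr::count()`. This is a sketch that assumes the `CurrentOccupation` column used by the tree above.

```r
library(dplyr)

# Tally students per current occupation, most common first.
# Assumes a CurrentOccupation column, as in Stat-805.csv used above.
occupation_counts <- function(df) {
  df %>% count(CurrentOccupation, sort = TRUE)
}

# Usage with the dashboard's data file:
# occupation_counts(readr::read_csv("Stat-805.csv"))
```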

<h2>Related Topic(s)</h2>

<b><a href="https://adeelk93.github.io/collapsibleTree/" target="_blank">Creating Collapsible Trees - Geography Example</a></b>

### Discussion Board Activity by Each Student Using Plotly Stacked Bar Charts

```{r}
library(plotly)

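Students <- c("Emily","Jack","Ummer","Michael","Anthony","Alice","Oliver","David","Ethan","Sophia","Ben","Henry","Mason","Rob","James","Dan","Noah","John","Tom")
# Keep this order on the x-axis (plotly would otherwise sort the names alphabetically)
Students <- factor(Students, levels = Students)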
Discussion1 <- c(7,4,5,2,4,4,3,3,3,2,2,2,2,2,1,1,1,1,1)
Discussion2  <- c(2,3,2,2,2,3,3,2,2,2,2,2,2,2,2,2,2,2,2)
Discussion3  <- c(3,3,0,3,4,3,2,2,3,2,2,2,2,2,2,2,2,2,2)
Discussion4  <- c(4,4,4,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
Discussion5  <- c(3,3,3,3,2,2,2,2,2,3,2,2,2,2,2,2,2,2,2)
Discussion6  <- c(3,3,4,2,3,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
Discussion7  <- c(2,2,4,4,2,3,2,2,1,2,2,2,2,2,2,2,2,2,2)
Discussion8  <- c(2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
Discussion9  <- c(3,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
Discussion10  <- c(2,2,2,3,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
data <- data.frame(Students ,Discussion1,Discussion2,Discussion3,Discussion4,Discussion5,Discussion6,Discussion7,Discussion8,Discussion9,Discussion10)

fig <- plot_ly(data, x = ~Students, y = ~Discussion1,  type = 'bar', name = 'Discussion 1')
fig <- fig %>% add_trace(y = ~Discussion2, name = 'Discussion 2')
fig <- fig %>% add_trace(y = ~Discussion3, name = 'Discussion 3')
fig <- fig %>% add_trace(y = ~Discussion4, name = 'Discussion 4')
fig <- fig %>% add_trace(y = ~Discussion5, name = 'Discussion 5')
fig <- fig %>% add_trace(y = ~Discussion6, name = 'Discussion 6')
fig <- fig %>% add_trace(y = ~Discussion7, name = 'Discussion 7')
fig <- fig %>% add_trace(y = ~Discussion8, name = 'Discussion 8')
fig <- fig %>% add_trace(y = ~Discussion9, name = 'Discussion 9')
fig <- fig %>% add_trace(y = ~Discussion10, name = 'Discussion 10')
fig <- fig %>% layout(title = "Discussion Board Activity",yaxis = list(title = 'Count'), barmode = 'stack')
fig
```
***
<h2>Observations</h2>

- The three most active students were Emily (31), Jack (28) and Ummer (28), counting activity across all ten discussions.
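
The ranking in the observation can be computed instead of read off the stacked bars. The hypothetical helper below sums each student's Discussion columns with `rowSums()` and sorts the totals. `order()` keeps tied students in their original row order. Applied to the activity data above, it gives Emily 31, Jack 28 and Ummer 28. Applied to the engagement data in the next story, it gives Jack 27, Michael 24 and Sophia 24.

```r
# Hypothetical helper: rank students by their total across all Discussion columns.
# Assumes a data frame shaped like `data` in the chunk above
# (a Students column plus Discussion1 ... Discussion10).
top_students <- function(df, n = 3) {
  disc_cols <- grep("^Discussion", names(df), value = TRUE)
  totals <- rowSums(df[, disc_cols, drop = FALSE])
  ranked <- data.frame(Student = as.character(df$Students), Total = unname(totals))
  head(ranked[order(-ranked$Total), ], n)  # order() is stable, so ties keep row order
}
```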

<h2>Related Topic(s)</h2>

<b><a href="https://plotly.com/r/bar-charts" target="_blank">Creating Stacked Bar Charts with Plotly in R</a></b>

### Discussion Board Engagement - No. of Replies to Each Student's Posts Using Plotly Stacked Bar Charts

```{r}
library(plotly)

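Students <- c("Jack", "Michael","Sophia","Noah","Ummer","Anthony","Ben","Emily","Oliver","Henry","Alice","Mason","Rob","Ethan","Dan","David","Tom","James","John")
# Keep this order on the x-axis (plotly would otherwise sort the names alphabetically)
Students <- factor(Students, levels = Students)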
Discussion1 <- c(4,2,0,2,3,1,0,3,0,1,4,2,4,1,2,1,0,1,0)
Discussion2  <- c(2,0,6,1,2,2,1,2,2,0,1,0,1,0,0,1,1,0,0)
Discussion3  <- c(4,3,1,2,0,5,1,1,0,2,1,1,1,1,0,1,1,0,0)
Discussion4  <- c(4,3,4,1,2,2,2,1,1,1,1,1,1,1,0,0,0,0,0)
Discussion5  <- c(2,3,6,1,0,1,3,1,1,2,1,1,0,2,0,0,0,0,0)
Discussion6  <- c(3,1,2,1,4,1,1,3,1,2,0,1,1,0,1,1,0,1,0)
Discussion7  <- c(3,6,1,2,5,1,1,0,1,1,0,0,0,1,0,0,0,1,0)
Discussion8  <- c(2,0,1,5,0,1,1,1,2,0,1,0,1,0,2,0,1,1,0)
Discussion9  <- c(2,4,1,1,1,1,2,0,1,1,0,3,0,1,0,0,2,0,0)
Discussion10  <- c(1,2,2,4,3,0,2,0,1,0,0,0,0,0,1,2,1,0,0)
data <- data.frame(Students , Discussion1,Discussion2,Discussion3,Discussion4,Discussion5,Discussion6,Discussion7,Discussion8,Discussion9,Discussion10)

fig <- plot_ly(data, x = ~Students, y = ~Discussion1,  type = 'bar', name = 'Discussion 1')
fig <- fig %>% add_trace(y = ~Discussion2, name = 'Discussion 2')
fig <- fig %>% add_trace(y = ~Discussion3, name = 'Discussion 3')
fig <- fig %>% add_trace(y = ~Discussion4, name = 'Discussion 4')
fig <- fig %>% add_trace(y = ~Discussion5, name = 'Discussion 5')
fig <- fig %>% add_trace(y = ~Discussion6, name = 'Discussion 6')
fig <- fig %>% add_trace(y = ~Discussion7, name = 'Discussion 7')
fig <- fig %>% add_trace(y = ~Discussion8, name = 'Discussion 8')
fig <- fig %>% add_trace(y = ~Discussion9, name = 'Discussion 9')
fig <- fig %>% add_trace(y = ~Discussion10, name = 'Discussion 10')
fig <- fig %>% layout(title = "Discussion Board Engagement",yaxis = list(title = 'Count'), barmode = 'stack')
fig
```

***

<h2>Observations</h2>

- The three students whose posts drew the most replies were Jack (27), Sophia (24) and Michael (24).

- Students who posted early in a discussion seemed more likely to get replies.

<h2>Related Topic(s)</h2>

<b><a href="https://plotly.com/r/bar-charts" target="_blank">Creating Stacked Bar Charts with Plotly in R</a></b>

### Total Number of Discussion Posts and No. of Replies Using Plotly Grouped Bar Charts

```{r}
library(plotly)

Discussion <- paste("Disc", 1:10)
# Factor levels keep "Disc 10" after "Disc 9" (plotly would otherwise sort the labels alphabetically)
Discussion <- factor(Discussion, levels = Discussion)
Total_Discussion <- c(50,41,43,44,43,43,42,38,39,39)
Total_Engagement <- c(31,22,25,25,24,24,23,19,20,19)
data <- data.frame(Discussion,Total_Discussion,Total_Engagement)

fig <- plot_ly(data, x = ~Discussion, y = ~Total_Discussion, type = 'bar', name = 'Total Discussion Posts')
fig <- fig %>% add_trace(y = ~Total_Engagement, name = 'Total No. of Replies')
fig <- fig %>% layout(title = "Total Number of Discussion Posts and No. of Replies (Engagement)", yaxis = list(title = 'Count'), barmode = 'group')

fig
```

***


<h2>Observations</h2>

- Activity and engagement were highest in Discussion 1 (50 posts, 31 replies) and generally declined as the semester progressed.

- Discussions 8, 9 and 10 had the lowest activity (38 to 39 posts) and engagement (19 to 20 replies).
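
The per-discussion totals in this chart are typed in by hand. A hypothetical helper like the one below would derive them from the per-student tables with `colSums()`, which keeps the charts consistent. For Discussion 1, the activity table sums to 50 posts and the engagement table to 31 replies, matching the hard-coded values.

```r
# Hypothetical helper: per-discussion totals from a per-student table.
# Assumes columns Discussion1 ... Discussion10, as in the activity and engagement chunks.
discussion_totals <- function(df) {
  disc_cols <- paste0("Discussion", 1:10)
  labels <- paste("Disc", 1:10)
  data.frame(
    Discussion = factor(labels, levels = labels),  # keeps Disc 10 after Disc 9
    Total = unname(colSums(df[, disc_cols]))
  )
}
```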

<h2>Related Topic(s)</h2>

<b><a href="https://plotly.com/r/bar-charts" target="_blank">Creating Grouped Bar Charts with Plotly in R</a></b>


### One of My Hobbies - Exploring a Film Database with R Shiny

```{r,out.width="100%"}

knitr::include_app("https://gallery.shinyapps.io/051-movie-explorer/", height="100%")

```
***

<h2>Resources</h2>

 - <a href="https://www.shinyapps.io/" target="_blank">Deploying Shiny applications on the Web</a>

 - <a href="https://datasciencegenie.com/how-to-embed-a-shiny-app-on-website/" target="_blank">How to Embed A Shiny App on Website</a>
 
 - <a href="https://us7923.shinyapps.io/disease/" target="_blank">An example R Shiny dashboard I created</a>
 

</body>
