HW1

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(plotly)

## Warning: package 'plotly' was built under R version 4.2.3

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──

## ✔ tibble  3.1.7     ✔ purrr   0.3.4
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ plotly::filter() masks dplyr::filter(), stats::filter()
## ✖ dplyr::lag()     masks stats::lag()

library(knitr)
library(data.table)

## 
## Attaching package: 'data.table'
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

library(shinydashboard)

## Warning: package 'shinydashboard' was built under R version 4.2.3

## 
## Attaching package: 'shinydashboard'
## 
## The following object is masked from 'package:graphics':
## 
##     box

library(shiny)

## Warning: package 'shiny' was built under R version 4.2.3

Instructions

All assignments are in pairs!
This assignment should be submitted before 15/05/2023, 23:59.
Submit two HTML files, HW1 for Q1, and Dash.html for Q2
Questions about the assignment should be asked using the Assignments forums.
All of the instructions of this assignment should be included in the final .html file.

Q1

The first dataset you have been given is admissions.csv. It describes students’ scores and whether they were accepted into the university for a master’s degree.

Build two graphs, using at least three variables in each graph, and explain the graphs.

admissions <- read.csv("admission.csv", header = TRUE)

ggplot(admissions, aes(x = `GRE.Score`, y = CGPA, color = factor(Admission))) +
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x) +
  scale_color_manual(values = c("lightcyan4", "deeppink3")) +
  labs(title = "GRE Score vs CGPA by Admission Status",
       x = "GRE Score",
       y = "CGPA",
       color = "Admission Status")

## Warning: Removed 12 rows containing non-finite values (stat_smooth).

## Warning: Removed 12 rows containing missing values (geom_point).

Graph 1: Scatterplot of GRE score vs. CGPA

This graph shows the relationship between GRE Score and CGPA, with the color of the points indicating the Admission status (1 for admitted, 0 for not admitted). The x-axis represents the GRE Score, while the y-axis shows the CGPA. Because color is mapped to Admission status, geom_smooth fits a separate linear regression line for each group, showing the trend between GRE Score and CGPA among admitted and non-admitted applicants.

We can see that in general, there is a positive correlation between GRE Score and CGPA, which means that as GRE Scores increase, so do CGPA scores. This trend is visible for both admitted and non-admitted applicants.

In terms of the relationship between Admission status and these two variables, we can see that admitted applicants tend to have higher GRE Scores and higher CGPA scores compared to non-admitted applicants. This trend is reflected in the color of the points, with more pink points (representing admitted applicants) appearing in the top-right corner of the graph where both GRE Score and CGPA are high.

The regression lines slope upward, so higher GRE Scores are generally associated with higher CGPA. The scatter of points around the lines shows that the two variables are not perfectly correlated: CGPA still varies among applicants with similar GRE Scores.

Overall, this graph suggests that both GRE Score and CGPA are important factors in the admissions decision, and that admitted applicants tend to have higher scores in both of these areas compared to non-admitted applicants.
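
The trend described above could be checked numerically by computing the correlation between GRE Score and CGPA within each admission group. The sketch below is only an illustration: `toy` is a made-up data frame that borrows the column names of the admissions data, and its values are invented, not taken from the CSV file.

```r
# Hypothetical toy data: column names mirror the admissions data, values are invented.
toy <- data.frame(
  GRE.Score = c(300, 310, 320, 330, 305, 315, 325, 335),
  CGPA      = c(7.8, 8.1, 8.6, 9.2, 7.5, 8.0, 8.4, 9.0),
  Admission = c(0, 0, 1, 1, 0, 0, 1, 1)
)

# Pearson correlation between GRE score and CGPA within each admission group.
# use = "complete.obs" skips rows with missing values, similar to how ggplot2
# drops them (with a warning) when drawing the plot.
cor_by_group <- sapply(split(toy, toy$Admission), function(d) {
  cor(d$GRE.Score, d$CGPA, use = "complete.obs")
})
print(cor_by_group)
```

A value close to 1 in a group would support describing the relationship there as strong. A value nearer 0 would suggest the upward slope is weak.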

ggplot(data = admissions, aes(x = factor(`Admission`), y = `CGPA`, fill = factor(`Admission`))) + 
  geom_boxplot() +
  labs(x = "Admission", y = "CGPA") +
  facet_wrap(~ factor(`University.Rating`), ncol = 3, labeller = as_labeller(c(`1` = "University Rating 1", `2` = "University Rating 2", `3` = "University Rating 3", `4` = "University Rating 4", `5` = "University Rating 5"))) +
  ggtitle("Boxplot of CGPA by Admission status and University Rating") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("lightcyan4","deeppink3"), name = "Admission", labels = c("Not Admitted", "Admitted"))

## Warning: Removed 6 rows containing non-finite values (stat_boxplot).

Graph 2: Boxplot of CGPA by Admission status and University Rating

This graph shows the distribution of CGPA by Admission status for each level of University Rating. The x-axis represents whether the applicant was admitted (1) or not admitted (0), while the y-axis shows the CGPA. The facet_wrap function creates a separate panel for each level of University Rating, with three panels per row.

We can see that in general, applicants who were admitted tend to have higher CGPA scores than those who were not admitted, across all levels of University Rating. The effect of University Rating on CGPA is also apparent, with higher University Ratings associated with higher CGPA scores.

In terms of the specific boxplots for each level of University Rating, we can see that the median CGPA score for admitted applicants is consistently higher than the median score for non-admitted applicants at all levels of University Rating. The range of CGPA scores is also generally wider for admitted applicants than for non-admitted applicants, especially at higher levels of University Rating.

Overall, this graph suggests that CGPA is an important factor in the admissions decision, with higher CGPA scores increasing the likelihood of admission. It also suggests that University Rating may be another important factor, with higher-rated universities having higher average CGPA scores for admitted applicants.
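
The comparison of medians read off the boxplots could also be tabulated directly. The following is a minimal sketch using base R's `aggregate()`. The `toy` data frame and its values are hypothetical and only reuse the column names of the admissions data.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  CGPA              = c(7.5, 8.0, 8.8, 9.1, 8.2, 8.4, 9.3, 9.5),
  Admission         = c(0, 0, 1, 1, 0, 0, 1, 1),
  University.Rating = c(2, 2, 2, 2, 4, 4, 4, 4)
)

# Median CGPA for every Admission x University.Rating combination.
# With the formula interface, rows containing NA are dropped by default.
cgpa_medians <- aggregate(CGPA ~ Admission + University.Rating,
                          data = toy, FUN = median)
print(cgpa_medians)
```

Each row of the result corresponds to one box in the faceted plot, so the claim that admitted applicants have a higher median at every rating can be checked row by row.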

Q2

The second dataset you have been given is titanic.csv. It describes different passengers on the Titanic and whether 1.

Build a dashboard that contains four interactive graphs. Explain the graphs and the relationships across the whole dashboard.

data <- data.table(read.csv("titanic.csv", header = TRUE))

ui <- dashboardPage(
  dashboardHeader(title = "Titanic Dashboard"),
  dashboardSidebar(),
  dashboardBody(
    fluidRow(
      box(plotlyOutput("plot1"), width = 6),
      box(plotlyOutput("plot2"), width = 6),
      box(plotlyOutput("plot3"), width = 6),
      box(plotlyOutput("plot4"), width = 6))))


server <- function(input, output) {
  # Scatter plot: age vs. fare, colored by survival status.
  # Axis titles sit at the top level of layout(), not inside the legend list.
  output$plot1 <- renderPlotly({
    plot_ly(data, x = ~Age, y = ~Fare, type = "scatter", mode = "markers",
            color = ~factor(Survived), colors = c("#008080", "#FFD1DC"),
            marker = list(opacity = 0.5)) %>%
      layout(title = "Titanic Passenger Age vs. Fare",
             xaxis = list(title = "Age"), yaxis = list(title = "Fare"),
             legend = list(title = list(text = "Survival")))
  })

  # Bar chart: share of each survival outcome within each passenger class.
  output$plot2 <- renderPlotly({
    data[, .(count = .N), by = .(Pclass, Survived)][
      , percentage := round(count / sum(count) * 100, 3), by = Pclass] %>%
      plot_ly(x = ~Pclass, y = ~percentage, color = ~factor(Survived),
              colors = c("#800020", "#FFD1DC"), type = "bar") %>%
      layout(title = "Survival Rate by Passenger Class",
             xaxis = list(title = "Passenger Class"),
             yaxis = list(title = "Survival Rate (%)"),
             legend = list(title = list(text = "Survival"), bgcolor = "white",
                           bordercolor = "gray", borderwidth = 1))
  })

  # Pie chart: overall counts of survivors and non-survivors.
  output$plot3 <- renderPlotly({
    data[, .(count = .N), by = Survived][
      , Survival := ifelse(Survived == 1, "Survived", "Not Survived")] %>%
      plot_ly(labels = ~Survival, values = ~count, type = "pie",
              textposition = "inside",
              marker = list(colors = c("#008080", "#FFD1DC"))) %>%
      layout(title = "Survival Rates", showlegend = TRUE)
  })

  # Box plot: fare distribution per passenger class.
  output$plot4 <- renderPlotly({
    plot_ly(data, x = ~Pclass, y = ~Fare, type = "box",
            color = ~factor(Pclass), colors = c("#008080", "#800020", "#FFD1DC"),
            legendgroup = ~factor(Pclass),
            hovertemplate = paste("Pclass: %{x}<br>", "Fare: %{y:$,.2f}<br>")) %>%
      layout(title = "Fare and Pclass",
             xaxis = list(title = "Passenger Class"),
             yaxis = list(title = "Fare (USD)"),
             legend = list(title = list(text = "Class", font = list(size = 14)),
                           tracegroupgap = 10, traceorder = "reversed",
                           font = list(size = 12)))
  })
}

Graph 1

This graph is a scatter plot of passenger age against fare on the Titanic. Each point represents a single passenger, positioned by age (x-axis) and fare (y-axis). The color of each point represents survival status: teal for passengers who did not survive and pink for passengers who survived, as shown in the legend.

There appears to be a cluster of teal markers in the lower-left corner of the graph, which suggests that many of the passengers who did not survive were younger and paid a lower fare. The pink markers are more spread out, which suggests that survivors were of varying ages and paid varying fares. Most passengers paid fares of $100 or less, as indicated by the concentration of markers along the bottom of the graph.

There are a few outliers, such as the passenger who paid the highest fare (a pink marker) and an older passenger who paid a very low fare (a teal marker). Overall, there is no clear trend or strong correlation between age and fare: passengers of all ages paid a wide range of fares.

Graph 2

The graph shows survival outcomes by passenger class on the Titanic. The x-axis represents the passenger class (1st, 2nd, and 3rd), and the y-axis represents the percentage of passengers within each class who fall into each survival outcome. The bars are colored by outcome: burgundy for passengers who did not survive and pink for passengers who survived.

From the graph, we can see that the survival rate is highest for passengers in the first class, followed by the second class, and then the third class. This suggests that passenger class was a significant factor in determining survival rates on the Titanic.

Overall, the graph provides insight into the relationship between passenger class and survival rates on the Titanic and highlights the importance of socio-economic factors in determining survival outcomes during disasters.
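
The `percentage` column behind this chart is a within-class share: counts are divided by the class total, so the two bars of each class add up to 100%. The sketch below reproduces that calculation in base R. It swaps `table()` and `prop.table()` in for the data.table code used in the dashboard, and the `toy` data are invented rather than taken from titanic.csv.

```r
# Hypothetical toy data: column names mirror titanic.csv, values are invented.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3),
  Survived = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 1)
)

# Cross-tabulate class against survival outcome.
counts <- table(Pclass = toy$Pclass, Survived = toy$Survived)

# margin = 1 normalises each row (class), so every class sums to 100%.
pct <- round(prop.table(counts, margin = 1) * 100, 3)
print(pct)
```

Normalising within class rather than over all passengers is what makes the classes comparable even when they contain very different numbers of passengers.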

Graph 3

The pie chart shows the survival rates of passengers aboard the Titanic. The chart is divided into two sections: “Survived” and “Not Survived”, each represented by a slice of the pie. The size of each slice corresponds to the percentage of passengers in that category.

From the chart, we can see that roughly 42% of passengers survived the disaster, while the remaining 58% did not. This highlights the severity of the disaster and the relatively low survival rates.

The chart is also color-coded using a custom color palette of teal and light pink. The legend shows which color corresponds to each section of the chart. The chart is interactive, so you can hover over each slice to see the exact percentage and count of passengers in that category.

Overall, the pie chart provides a quick and visually appealing way to understand the survival rates of passengers aboard the Titanic.

Graph 4

The box plot shows the distribution of the fares paid by passengers in each passenger class (1st, 2nd, and 3rd).

We can see that the fares for 1st class passengers had a much wider range than the fares for 2nd and 3rd class passengers. The median fare for 1st class was also much higher than the median fares for 2nd and 3rd class. There were some outliers in all three passenger classes, but they were more numerous and farther from the median in 1st class.

Overall, this plot shows how the fares paid by passengers varied depending on their passenger class, with 1st class passengers paying much higher fares on average than 2nd and 3rd class passengers.
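
The quantities a box plot encodes (median, spread, extreme values) can also be computed per class, which makes the comparison above concrete. This is a rough sketch in base R. `toy` and its values are hypothetical and only reuse the `Pclass` and `Fare` column names.

```r
# Hypothetical toy data: values are invented for illustration only.
toy <- data.frame(
  Pclass = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Fare   = c(50, 80, 200, 12, 15, 30, 7, 8, 10)
)

# One row per class: median fare, interquartile range, and maximum fare.
fare_summary <- do.call(rbind, lapply(split(toy$Fare, toy$Pclass), function(f) {
  c(median = median(f, na.rm = TRUE),
    iqr    = IQR(f, na.rm = TRUE),
    max    = max(f, na.rm = TRUE))
}))
print(fare_summary)
```

A larger `iqr` and `max` for 1st class than for the other classes would match the wider box and more distant outliers described above.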

shinyApp(ui, server)

Shiny applications not supported in static R Markdown documents

library(htmltools)

## Warning: package 'htmltools' was built under R version 4.2.3

2023-05-02