Several attributes, such as marital status, application mode, application order, and course, are not self-explanatory. After reading the documentation, I understood that each category in these columns is encoded as a numerical value. For example, the Marital status column uses 1 = single, 2 = married, 3 = widower, 4 = divorced, 5 = facto union, 6 = legally separated. These codes can only be interpreted by reading the documentation of the dataset.
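As a minimal sketch of how these codes could be made readable in R, the numeric values can be turned into a labelled factor. The vector of codes below is purely illustrative (it is not taken from the dataset), and the column name `Marital.status` mentioned afterwards is an assumption about how `read.csv()` renames the header.

```r
# Code-to-label mapping copied from the dataset documentation (codes 1 to 6)
marital_labels <- c(
  "single", "married", "widower",
  "divorced", "facto union", "legally separated"
)

# Hypothetical example codes, for illustration only
marital_codes <- c(1, 2, 1, 4, 6)

# factor() pairs each numeric code with its label, in order
marital_status <- factor(marital_codes, levels = 1:6, labels = marital_labels)

print(marital_status)
print(table(marital_status))
```

On the real data the same call could be applied to the column itself, for example `factor(student_dropout$Marital.status, levels = 1:6, labels = marital_labels)`, assuming the column has that name after import.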
The documentation of the dataset does not include a clear explanation of the element GDP. GDP stands for gross domestic product. It is the monetary measure of the market value of all the final goods and services produced by a country in a specific time period. Under the income approach, GDP can be calculated as GDP = COE + GOS + GMI + (T - S),
where COE = compensation of employees, GOS = gross operating surplus, GMI = gross mixed income, and T - S = taxes less subsidies on production and imports.
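The formula can be written as a small R function to make the arithmetic concrete. This is only a sketch: the function name `gdp_income` and the component values passed to it are hypothetical and are not taken from the dataset or from any real national accounts.

```r
# Income-approach GDP: sum of the components defined above
gdp_income <- function(coe, gos, gmi, taxes, subsidies) {
  coe + gos + gmi + (taxes - subsidies)
}

# Hypothetical component values, all in the same currency unit
example_gdp <- gdp_income(coe = 500, gos = 300, gmi = 100,
                          taxes = 120, subsidies = 20)
print(example_gdp)
```

Keeping taxes and subsidies as separate arguments mirrors the (T - S) term, so a change in either one can be tried independently.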
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(ggplot2)
library(dplyr)
# Load the dataset
setwd("/Users/saitejaravulapalli/Documents/IUPUI_SEM 01/Intro to Statistic in R/DATA SET")
student_dropout <- read.csv("student dropout.csv", sep = ";", header = TRUE)
# Create a summary table by GDP value
summary_table <- student_dropout %>%
  group_by(GDP, Target) %>%
  summarize(Count = n(), .groups = 'keep')
print(summary_table)
## # A tibble: 30 × 3
## # Groups: GDP, Target [30]
## GDP Target Count
## <dbl> <chr> <int>
## 1 -4.06 Dropout 139
## 2 -4.06 Enrolled 69
## 3 -4.06 Graduate 189
## 4 -3.12 Dropout 174
## 5 -3.12 Enrolled 109
## 6 -3.12 Graduate 250
## 7 -1.7 Dropout 141
## 8 -1.7 Enrolled 63
## 9 -1.7 Graduate 215
## 10 -0.92 Dropout 139
## # ℹ 20 more rows
# Treat GDP as a discrete axis so the dodged bars get non-overlapping x intervals
ggplot(summary_table, aes(x = factor(GDP), y = Count, fill = Target)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = GDP), position = position_dodge(width = 0.9),
            vjust = -0.5, hjust = 0.5, size = 3) + # Add GDP labels
  labs(
    title = "Students Enrolled, Graduated, and Dropped Out by GDP Value",
    x = "GDP",
    y = "Count",
    fill = "Outcome"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("Enrolled" = "blue", "Graduate" = "green", "Dropout" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
In this graph we can see the effect of GDP on the students: the lower the GDP, the higher the dropout rate. In addition, the countries with GDP values of 1.79 and 2.92, i.e. countries in the developing stage, appear to focus mainly on education. This can only be understood when we plot the graph separately for each GDP value and for each student category (Dropout, Graduate, Enrolled).
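One way to look at each category separately is to convert the counts into shares within each GDP value and draw one panel per outcome. The sketch below rebuilds only the first nine rows of the summary table printed earlier, so that it runs on its own. The names `summary_excerpt`, `outcome_share`, and `share_plot` are introduced here for illustration, and on the full data `summary_table` would be used in place of the excerpt.

```r
library(dplyr)
library(ggplot2)

# First nine rows of the printed summary table (GDP -4.06, -3.12, -1.7)
summary_excerpt <- data.frame(
  GDP    = rep(c(-4.06, -3.12, -1.7), each = 3),
  Target = rep(c("Dropout", "Enrolled", "Graduate"), times = 3),
  Count  = c(139, 69, 189, 174, 109, 250, 141, 63, 215)
)

# Share of each outcome within a GDP value, so that GDP values
# with different numbers of students can be compared
outcome_share <- summary_excerpt %>%
  group_by(GDP) %>%
  mutate(Share = Count / sum(Count)) %>%
  ungroup()

# One panel per outcome; factor(GDP) gives each GDP value its own bar
share_plot <- ggplot(outcome_share, aes(x = factor(GDP), y = Share, fill = Target)) +
  geom_col() +
  facet_wrap(~ Target) +
  labs(x = "GDP", y = "Share of students", fill = "Outcome") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(share_plot)
```

Shares are used instead of raw counts because the number of students differs between GDP values, so a larger dropout count does not by itself mean a higher dropout rate.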