Question 1

Several attributes, such as Marital status, Application mode, Application order, and Course, are not clear from the data alone because they are stored as numeric codes. The documentation explains that each category is assigned a numerical value. For example, the Marital status column uses: 1 – single, 2 – married, 3 – widower, 4 – divorced, 5 – facto union, 6 – legally separated. This mapping can be understood only by reading the documentation of the dataset.
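The mapping above can be applied in R so that the column shows labels instead of codes. This is a minimal sketch using base R's `factor()`; the vector `example_codes` is a hypothetical sample, not data taken from the file.

```r
# Code-to-label mapping for Marital status, as listed in the documentation
marital_codes  <- 1:6
marital_labels <- c("single", "married", "widower",
                    "divorced", "facto union", "legally separated")

# Hypothetical sample of coded values (not real rows from the dataset)
example_codes <- c(1, 2, 1, 4, 6)

# Convert the numeric codes into a labelled factor
decoded <- factor(example_codes, levels = marital_codes, labels = marital_labels)
print(decoded)
```

The same pattern could be used for the other coded columns once their mappings are copied from the documentation.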

Question 2

The documentation of the dataset does not include a clear explanation of the element GDP. GDP stands for Gross Domestic Product. It is the monetary measure of the market value of all the final goods and services produced by a country in a specific time period. Using the income approach, GDP can be calculated as GDP = COE + GOS + GMI + (T - S), where:

COE = Compensation of Employees
GOS = Gross Operating Surplus
GMI = Gross Mixed Income
T - S = Taxes less subsidies on production and imports
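As a worked example of the formula, the sketch below adds up the four components. The function name `gdp_income` and all of the figures are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# GDP = COE + GOS + GMI + (T - S)
gdp_income <- function(coe, gos, gmi, taxes, subsidies) {
  coe + gos + gmi + (taxes - subsidies)
}

# Hypothetical figures (same currency unit), for illustration only
gdp <- gdp_income(coe = 500, gos = 300, gmi = 100, taxes = 150, subsidies = 50)
print(gdp)
```

With these made-up inputs the taxes-less-subsidies term is 100, so the total is 500 + 300 + 100 + 100.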

Question 3

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(ggplot2)
library(dplyr)
# Load the dataset

setwd("/Users/saitejaravulapalli/Documents/IUPUI_SEM 01/Intro to Statistic in R/DATA SET")
student_dropout <- read.csv("student dropout.csv" , sep= ";", header = TRUE)

# Create a summary table by GDP value
summary_table <- student_dropout %>%
  group_by(GDP, Target) %>%
  summarize(Count = n(), .groups = 'keep')
print(summary_table)
## # A tibble: 30 × 3
## # Groups:   GDP, Target [30]
##      GDP Target   Count
##    <dbl> <chr>    <int>
##  1 -4.06 Dropout    139
##  2 -4.06 Enrolled    69
##  3 -4.06 Graduate   189
##  4 -3.12 Dropout    174
##  5 -3.12 Enrolled   109
##  6 -3.12 Graduate   250
##  7 -1.7  Dropout    141
##  8 -1.7  Enrolled    63
##  9 -1.7  Graduate   215
## 10 -0.92 Dropout    139
## # ℹ 20 more rows
# GDP is treated as a discrete factor so that the dodged bars do not overlap
ggplot(summary_table, aes(x = factor(GDP), y = Count, fill = Target)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = Count), position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3) +  # Add count labels above each bar
  labs(
    title = "Students Enrolled, Graduated, and Dropped Out by GDP Value",
    x = "GDP",
    y = "Count",
    fill = "Outcome"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("Enrolled" = "blue", "Graduate" = "green", "Dropout" = "red")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This graph shows how the number of students in each outcome varies with GDP. The dropout rate appears to be higher at the lower GDP values. The countries with GDP values of 1.79 and 2.92, i.e. countries in the developing stage, appear to focus strongly on education. This can be understood only when the counts are plotted separately for each GDP value and each student category (Dropout, Graduate, Enrolled).
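Because the plot shows counts, comparing dropout rates across GDP values is easier with each count expressed as a share of its GDP group. The sketch below does this in base R, using only the first nine rows of the summary table printed above. The column name `Share` is an assumption added for illustration.

```r
# First nine rows of the printed summary table (three lowest GDP values)
counts <- data.frame(
  GDP    = rep(c(-4.06, -3.12, -1.70), each = 3),
  Target = rep(c("Dropout", "Enrolled", "Graduate"), times = 3),
  Count  = c(139, 69, 189, 174, 109, 250, 141, 63, 215)
)

# Total number of students within each GDP value
totals <- ave(counts$Count, counts$GDP, FUN = sum)

# Share of each outcome within its GDP group
counts$Share <- counts$Count / totals

# Dropout share per GDP value
dropout_share <- counts[counts$Target == "Dropout", c("GDP", "Share")]
print(dropout_share)
```

Running the same calculation on the full `summary_table` would give the dropout share for all GDP values, which is a more direct check of the trend than the raw counts.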