---
title: "Data Dive 6"
author: "Pritesh Shah"
date: "2023-10-02"
output: html_document
editor_options:
  markdown:
    wrap: 72
---

Data Dive 6

- Build at least three sets of variable combinations
  - For each set of variables, include at least one column that you created (i.e., calculated based on others)
  - All variables for this data dive should be either continuous (i.e., numeric) or ordered (e.g., [‘small’, ‘medium’, ‘large’] is okay, but [“apples”, “oranges”, “bananas”] is not)
  - For each set, there should be one response variable with the others as explanatory variables
- Plot a visualization for each response-explanatory relationship, and draw some conclusions based on the plot
  - Use what we’ve covered so far in class to scrutinize the plot (e.g., are there any outliers?)
- Calculate the appropriate correlation coefficient for each of these combinations
  - Explain why the value makes sense (or doesn’t) based on the visualization(s)
- Build a confidence interval for each of the response variables. Provide a detailed conclusion of the response variable (i.e., the population) based on your confidence interval.

For each of the above tasks, you must explain to the reader what insight was gathered, its significance, and any further questions you have which might need to be further investigated.

Importing all the libraries

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'kableExtra'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows

We load the data into a data frame named `data`.

#Loading the dataset
data <- read_delim("data.csv", delim = ";")
## Rows: 4424 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ";"
## chr  (1): Target
## dbl (36): Marital status, Application mode, Application order, Course, Dayti...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The column names in the data frame:

##  [1] "Marital status"                                
##  [2] "Application mode"                              
##  [3] "Application order"                             
##  [4] "Course"                                        
##  [5] "Daytime/evening attendance\t"                  
##  [6] "Previous qualification"                        
##  [7] "Previous qualification (grade)"                
##  [8] "Nacionality"                                   
##  [9] "Mother's qualification"                        
## [10] "Father's qualification"                        
## [11] "Mother's occupation"                           
## [12] "Father's occupation"                           
## [13] "Admission grade"                               
## [14] "Displaced"                                     
## [15] "Educational special needs"                     
## [16] "Debtor"                                        
## [17] "Tuition fees up to date"                       
## [18] "Gender"                                        
## [19] "Scholarship holder"                            
## [20] "Age at enrollment"                             
## [21] "International"                                 
## [22] "Curricular units 1st sem (credited)"           
## [23] "Curricular units 1st sem (enrolled)"           
## [24] "Curricular units 1st sem (evaluations)"        
## [25] "Curricular units 1st sem (approved)"           
## [26] "Curricular units 1st sem (grade)"              
## [27] "Curricular units 1st sem (without evaluations)"
## [28] "Curricular units 2nd sem (credited)"           
## [29] "Curricular units 2nd sem (enrolled)"           
## [30] "Curricular units 2nd sem (evaluations)"        
## [31] "Curricular units 2nd sem (approved)"           
## [32] "Curricular units 2nd sem (grade)"              
## [33] "Curricular units 2nd sem (without evaluations)"
## [34] "Unemployment rate"                             
## [35] "Inflation rate"                                
## [36] "GDP"                                           
## [37] "Target"
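Column 5 in the listing above carries a trailing tab (`\t`) in its name, which makes it awkward to reference later. One possible cleanup, shown here on a small hypothetical vector of names rather than on the real data frame, is to strip surrounding whitespace with base R's `trimws()`:

```r
# Hypothetical subset of the column names; the second entry mimics the
# trailing tab seen in the listing above.
cols <- c("Marital status", "Daytime/evening attendance\t", "Target")

# trimws() removes leading and trailing spaces, tabs, and newlines.
clean <- trimws(cols)
print(clean)
```

On the real data the same idea would be `names(data) <- trimws(names(data))`, assuming no two columns differ only by whitespace.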

In Set 1, the response variable is Admission grade, with Age at enrollment and Previous qualification (grade) as explanatory variables.

## There is a negative correlation between Admission grade and Age at enrollment.
## There is a positive correlation between Admission grade and Previous qualification grade.
## We are 95% confident that the true mean Admission grade falls within the interval:
## 126.5513 to 127.405

In the plot of Age at enrollment vs. Admission grade, there are a few outliers aged 60 to 70, but most students enrolled between 18 and 55 years of age. There is a negative correlation between Admission grade and Age at enrollment, and a positive correlation between Admission grade and Previous qualification (grade). We are 95% confident that the true mean Admission grade falls within the interval 126.5513 to 127.405.
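The code behind these numbers is not shown in the rendered output, so the following is only a sketch of how such results could be produced. It assumes Pearson correlation via `cor()` and a t-based interval via `t.test()`; the `toy` data frame and its values are hypothetical and are not taken from `data.csv`.

```r
# Hypothetical toy values, NOT taken from data.csv.
toy <- data.frame(
  admission_grade   = c(120, 135, 128, 142, 110, 125, 131, 118),
  age_at_enrollment = c(19, 18, 20, 18, 45, 22, 19, 33),
  previous_grade    = c(122, 140, 130, 150, 115, 128, 133, 120)
)

# Pearson correlation of the response with each explanatory variable.
r_age  <- cor(toy$admission_grade, toy$age_at_enrollment)
r_prev <- cor(toy$admission_grade, toy$previous_grade)

# 95% t-based confidence interval for the mean of the response.
ci <- t.test(toy$admission_grade, conf.level = 0.95)$conf.int

cat("Correlation with age:", r_age, "\n")
cat("Correlation with previous grade:", r_prev, "\n")
cat("95% CI for mean admission grade:", ci[1], "to", ci[2], "\n")
```

The sign of each coefficient can then be checked against the slope visible in the corresponding scatter plot.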

In Set 2 we create a new column, Average Curricular Unit Grade, which is the average of Curricular units 1st sem (grade) and Curricular units 2nd sem (grade).
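A minimal sketch of how such a column could be derived with `dplyr::mutate()` follows. The column name `avg_curricular_grade` and the toy values are assumptions for illustration; the two source column names match the listing earlier in this document.

```r
library(dplyr)

# Hypothetical toy rows; check.names = FALSE keeps the spaces and
# parentheses in the column names, as in the real data set.
toy <- data.frame(
  `Curricular units 1st sem (grade)` = c(12, 14, 0, 13.5),
  `Curricular units 2nd sem (grade)` = c(13, 12, 0, 11.5),
  check.names = FALSE
)

# Backticks are needed because the column names are not syntactic.
toy <- toy %>%
  mutate(
    avg_curricular_grade =
      (`Curricular units 1st sem (grade)` +
         `Curricular units 2nd sem (grade)`) / 2
  )

print(toy$avg_curricular_grade)
```

Students with a grade of 0 in both semesters keep an average of 0, which is one way a separate cluster at zero can appear in a scatter plot.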

## There is a negative correlation between Age at Enrollment and Average Curricular Unit Grade.
## We are 95% confident that the true mean Age at Enrollment falls within the interval:
## 23.04149 to 23.4888

Most students in this plot have an average curricular unit grade between 10 and 15; a few fall between 5 and 10, though these are less frequent in the 18 to 30 age group. There are a few outliers where the average curricular unit grade is above 15. There is a negative correlation between Age at Enrollment and Average Curricular Unit Grade. We are 95% confident that the true mean Age at Enrollment falls within the interval 23.04149 to 23.4888.

The scatter plot divides into three parts: students who have not taken any curricular units, students who have taken only 5 to 20, and those who have taken 10 to 15 credits.
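The method used for the confidence intervals in this report is not shown. Assuming the usual t-based interval for a mean, each 95% interval would have the form:

$$
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
$$

Here $\bar{x}$ is the sample mean of the response variable, $s$ its sample standard deviation, and $n$ the number of observations (4424 rows in this data set). With $n$ this large, $t_{0.975,\,n-1}$ is approximately 1.96, so the interval narrows in proportion to $1/\sqrt{n}$.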

In Set 3 we create a new column, Average Unemployment Rate.

## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.

## Correlation between Average Unemployment Rate and Marital Status: -0.2374
## Correlation between Average Unemployment Rate and Nationality: -0.007593384
## We are 95% confident that the true mean Average Unemployment Rate falls within the interval:
## 11.55941 to 11.57287

Since nationality ‘1’ is far more frequent than any other nationality, the average unemployment rate falls around 11 to 12%. The correlation between Average Unemployment Rate and Marital Status is -0.2374, and the correlation between Average Unemployment Rate and Nationality is -0.007593384. We are 95% confident that the true mean Average Unemployment Rate falls within the interval 11.55941 to 11.57287.

There is only a weak negative correlation (-0.2374) between a student's marital status and the average unemployment rate, and essentially no correlation (-0.007593384) with nationality. The average unemployment rate is around 11.5% for all nationalities.
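The assignment asks for the appropriate correlation coefficient for each combination. As a hedged illustration of what that choice can change, the sketch below compares Pearson's coefficient with the rank-based Spearman coefficient, both through base R's `cor()`. The vectors `x` and `y` are hypothetical and unrelated to the data set.

```r
# Hypothetical values: y increases with x, but not linearly.
x <- 1:6
y <- x^2

# Pearson measures linear association.
r_pearson <- cor(x, y, method = "pearson")

# Spearman works on ranks, so it suits ordered variables and
# monotonic but non-linear relationships.
r_spearman <- cor(x, y, method = "spearman")

cat("Pearson:", r_pearson, "\n")
cat("Spearman:", r_spearman, "\n")
```

For ordered variables, comparing the two coefficients is one way to check whether a weak Pearson value reflects a weak relationship or just a non-linear one.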