title: "Stat 353 Project 1: Exploring College data"
author: "Mujahid Mohammed"
date: "2024-11-17"
output: html_document
library(readxl)
data <- read_excel("data.xlsx")
This report examines the "CollegeScores4yr" dataset using the descriptive techniques covered in Chapter 6 of the course material. The dataset describes colleges and includes variables such as acceptance rates, completion rates, tuition, and average SAT scores. The aim is to summarize the dataset with descriptive statistics (means, medians, variances, and correlations) and to visualize it with histograms, box plots, and scatter plots.
Answering the research questions with these methods gives insight into patterns and distributions in the data, for example whether variables such as average SAT scores differ notably among institutions. All computations and visualizations were produced in RStudio.
The data were obtained from https://www.lock5stat.com/datapage3e.html ("CollegeScores4yr") and imported into RStudio for processing and cleaning. Missing values were handled by omitting rows with missing values for certain questions, by using na.rm = TRUE, or by using use = "complete.obs" in the relevant functions, so that each computation runs on observed values only.
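The three ways of handling missing values mentioned above can be sketched on a small invented data frame. The column names mirror the dataset, but every value below is hypothetical and chosen only to show how each option behaves:

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  Debt  = c(20000, NA, 25000, 30000, 22000, 27000),
  White = c(60, 55, NA, 70, 50, 65),
  Cost  = c(15000, 22000, 40000, NA, 18000, 35000)
)

# Option 1: ignore NAs inside a single-variable summary
mean_cost <- mean(toy$Cost, na.rm = TRUE)

# Option 2: for a correlation, keep only rows where both variables are present
r_debt_white <- cor(toy$Debt, toy$White, use = "complete.obs")

# Option 3: drop every row that has a missing value in any column
toy_complete <- na.omit(toy)

cat("Mean cost (NAs ignored):", mean_cost, "\n")
cat("Rows before / after na.omit():", nrow(toy), "/", nrow(toy_complete), "\n")
```

Note that na.omit() is the most aggressive choice: it discards a row even when the missing value sits in a column the current question does not use, which is one reason to prefer na.rm = TRUE or use = "complete.obs" for single questions.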
The analysis addresses ten questions that cover a range of descriptive statistics methods. Here are the ten questions that guided this analysis:
1 What is the mean tuition for in-state and out-of-state students?
2 What are the minimum and maximum cost?
3 What is the correlation between debt and the percentage of White students?
4 What is the correlation between debt and the percentage of Black students?
5 What is the variance of the average SAT scores across the colleges?
6 What is the distribution of cost?
7 Does cost differ among public, private, and for-profit colleges?
8 What is the relationship between in-state tuition and average SAT score?
9 Is there a correlation between graduation rate and tuition fees?
10 What is the median graduation rate?
For each question, the appropriate descriptive statistics methods were applied. Below is a summary of the methods used and the insights gained from each analysis.
mean_TuitionIn = mean(data$TuitionIn, na.rm = TRUE)
mean_TuitionOut = mean(data$TuitionOut, na.rm = TRUE)
cat("the average tuition fee for in-state students is ", mean_TuitionIn, ".\n", sep = "")
## the average tuition fee for in-state students is 21948.55.
cat("the average tuition fee for out-of-state students is ", mean_TuitionOut, ".\n", sep = "")
## the average tuition fee for out-of-state students is 25336.66.
min_cost = min(data$Cost, na.rm = TRUE)
max_cost = max(data$Cost, na.rm = TRUE)
range(data$Cost, na.rm = TRUE)
## [1] 5950 72717
cat("The min of cost is ", min_cost, ".\n", sep = "") # \n represents new line.
## The min of cost is 5950.
cat("the max of cost is ", max_cost, ".\n", sep = "")
## the max of cost is 72717.
cor(data$Debt, data$White, use = "complete.obs")
## [1] -0.117376
cor(data$Debt, data$Black, use = "complete.obs")
## [1] 0.03675958
var(data$AvgSAT, na.rm = TRUE)
## [1] 16617.2
hist(data$Cost, xlab= "Cost")
boxplot(data$Cost ~ data$Control,xlab = "Control", ylab = "Cost")
plot(data$TuitionIn ~ data$AvgSAT, xlab = "Average SAT", ylab = "In-State Tuition", col = "red")
cor(data$CompRate, data$TuitionIn, use = "complete.obs")
## [1] 0.5477039
median(data$CompRate, na.rm = TRUE)
## [1] 52.45
Mean of In-State and Out-of-State Tuition Fees:
The average tuition fee for in-state students is $21,948.55, while for out-of-state students it is $25,336.66, a difference of about $3,388. Out-of-state students therefore pay more on average, which is consistent with public colleges charging non-residents a higher rate, while private colleges typically charge the same tuition regardless of residency.
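Because mean(..., na.rm = TRUE) drops missing values separately for each column, the two averages may be based on different sets of colleges. A minimal sketch with invented values (not the actual dataset) shows how the difference of the two means can differ from the mean of the per-college differences:

```r
# Hypothetical toy values; NA marks a college that did not report a figure
tuition_in  <- c(10000, 20000, NA, 30000)
tuition_out <- c(15000, 20000, 25000, NA)

# Difference of the two means, each computed on its own non-missing values
diff_of_means <- mean(tuition_out, na.rm = TRUE) - mean(tuition_in, na.rm = TRUE)

# Mean of the per-college differences, using only colleges with both values
mean_paired_diff <- mean(tuition_out - tuition_in, na.rm = TRUE)

cat("Difference of means:", diff_of_means, "\n")
cat("Mean paired difference:", mean_paired_diff, "\n")
```

If the two numbers diverge on the real data, the paired version is arguably the fairer comparison, since it compares each college with itself.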
Minimum and Maximum Cost: The total cost of attendance ranges from a minimum of $5,950 to a maximum of $72,717. This wide range reflects the variability in college affordability depending on factors such as institutional control (public, private, for-profit) and geographic location.
Correlation between Debt and Student Demographics (Q3 and Q4): The correlation between student debt and the percentage of White students is -0.117, a weak negative relationship. This implies that colleges with higher percentages of White students tend to have slightly lower average student debt. The correlation between student debt and the percentage of Black students is 0.037, which is positive but so close to zero that it indicates almost no linear relationship. Correlations this small offer at most a faint hint of disparities in how student debt burdens different racial groups; they do not establish one.
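One way to gauge how weak these correlations are is to square them: in a simple linear setting, r squared is the share of variance in one variable that is linearly associated with the other. The sketch below uses only the two coefficients reported in the output above; the variable names are illustrative:

```r
# Correlation coefficients as reported in the output above
r_white <- -0.117376    # cor(Debt, White)
r_black <- 0.03675958   # cor(Debt, Black)

# Squared correlations: proportion of variance linearly shared
r2_white <- r_white^2
r2_black <- r_black^2

# Express as percentages, rounded for readability
print(round(100 * c(White = r2_white, Black = r2_black), 2))
```

Both shares come out well under 2%, which supports reading these relationships as very weak.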
Variance in Average SAT Scores: The variance in average SAT scores across colleges is 16,617.2, which corresponds to a standard deviation of about 129 points. This shows considerable variability in the academic preparedness of students admitted to different institutions.
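Variance is measured in squared SAT points, so it is often easier to interpret after taking the square root. The sketch below converts the reported variance to a standard deviation and confirms, on a small invented vector, that sd() and sqrt(var()) agree:

```r
# Variance of AvgSAT as reported in the output above
var_sat <- 16617.2

# Standard deviation is the square root of the variance (same units as SAT scores)
sd_sat <- sqrt(var_sat)
cat("Standard deviation of AvgSAT:", round(sd_sat, 1), "\n")

# Hypothetical toy scores, used only to show that sd() equals sqrt(var())
toy_sat <- c(980, 1050, 1120, 1190, 1300, NA)
same <- all.equal(sd(toy_sat, na.rm = TRUE), sqrt(var(toy_sat, na.rm = TRUE)))
```

On the real data, sd(data$AvgSAT, na.rm = TRUE) should give the same figure directly.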
Distribution of Costs: The histogram of costs (Q6) shows a right-skewed distribution, with most colleges falling in the cost range of $20,000–$40,000, but with a few institutions having much higher costs, driving the skewness.
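A right skew seen in a histogram can be cross-checked numerically: when a few large values stretch the upper tail, the mean is typically pulled above the median. The values below are invented to mimic that shape and are not taken from the dataset:

```r
# Hypothetical costs: most values are moderate, one is much higher
toy_cost <- c(21000, 24000, 27000, 30000, 33000, 36000, 68000)

toy_mean   <- mean(toy_cost)
toy_median <- median(toy_cost)

# A mean noticeably above the median is a common sign of right skew
cat("Mean:", round(toy_mean), " Median:", toy_median, "\n")
cat("Mean exceeds median:", toy_mean > toy_median, "\n")
```

Applying the same comparison to data$Cost (with na.rm = TRUE) would give a quick numeric companion to the histogram.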
Comparison of Costs by Institutional Control: From the boxplot (Q7), private colleges generally have the highest median costs, followed by for-profit and public colleges. Public colleges appear to be the most affordable, but the variability in cost among private colleges is much greater.
Correlation between Graduation Rate and Tuition Fees: The correlation between graduation rate and in-state tuition fees is 0.548, indicating a moderate positive relationship. This suggests that colleges with higher tuition fees tend to have higher graduation rates, possibly because they have more resources for student support and academic programs, although a correlation alone cannot show the cause.
Relationship Between In-State Tuition and Average SAT Score: Institutions with higher average SAT scores generally tend to have higher in-state tuition fees. The scatter plot (Q8) suggests that colleges attracting students with higher academic performance may charge more because of their reputation, resources, or perceived value. The spread in tuition values at each SAT range indicates variability, reflecting differences among institutions even for similar academic profiles of admitted students.
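The upward trend in the scatter plot can also be summarized with a least-squares line. This goes slightly beyond the methods used in the report, so treat it as an optional sketch; the data frame below is invented and only borrows the dataset's column names:

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  AvgSAT    = c(950, 1000, 1050, 1100, 1200, 1300),
  TuitionIn = c(9000, 25000, 12000, 30000, 28000, 50000)
)

# Fit a straight line: in-state tuition as a function of average SAT
fit <- lm(TuitionIn ~ AvgSAT, data = toy)

# The slope estimates the change in tuition per additional SAT point
slope <- coef(fit)[["AvgSAT"]]
cat("Estimated slope (dollars per SAT point):", round(slope, 1), "\n")

# Residuals show how far each college sits from the line (the "spread")
resid_range <- range(residuals(fit))
```

On the real data, abline(lm(TuitionIn ~ AvgSAT, data = data)) after the plot() call would draw the fitted line over the scatter plot.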
Median Graduation Rate: The median graduation rate is 52.45%, meaning that half of the colleges graduate fewer than 52.45% of their students and half graduate more. Although it may seem low for only about half of the students to graduate at a typical college, this aligns with national trends in college completion rates.
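The median describes colleges, not individual students: it splits the list of colleges in half by completion rate. A small sketch with invented completion rates illustrates this reading and adds quartiles for context:

```r
# Hypothetical completion rates (percent) for eight colleges; one is missing
toy_rate <- c(30, 41, 48, 52, 57, 66, 80, NA)

# Median of the observed values
toy_med <- median(toy_rate, na.rm = TRUE)

# Share of colleges with a completion rate strictly above the median
share_above <- mean(toy_rate > toy_med, na.rm = TRUE)

# Quartiles give a fuller picture than the median alone
toy_quartiles <- quantile(toy_rate, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

cat("Median:", toy_med, " Share of colleges above it:", round(share_above, 2), "\n")
```

With an odd number of observed values, as here, slightly fewer than half of the colleges fall strictly above the median because one college sits exactly on it.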
Findings:
-1 Tuition costs for out-of-state students are higher on average than for in-state students ($25,336.66 versus $21,948.55).
-2 The distribution of total costs is right-skewed, with the majority of colleges charging between $20,000 and $40,000 annually.
-3 Public colleges are the most affordable, while private colleges are the most expensive and variable in cost.
-4 The weak negative correlation between student debt and White student percentage (-0.117) suggests slightly lower debt at colleges with a higher proportion of White students. The correlation between debt and Black student percentage (0.037) is close to zero, so it offers only a faint hint of systemic disparities in funding or financial aid availability.
-5 Graduation rates are moderately positively correlated with tuition, suggesting a link between higher costs and better student outcomes.
-6 Tuition varies widely among colleges with similar average SAT scores, as the scatter plot shows, suggesting that SAT is not the only factor driving tuition.
This study offers perspectives on the costs and outcomes associated with higher education. Although affordability is a concern, tuition tends to be higher at schools with better graduation rates. In addition, the differences in student debt across demographic groups, though small in this dataset, point to areas that may warrant policy attention. Future studies should investigate cause-and-effect relationships to support informed decisions by institutions and policymakers.