This dataset has college application information from 8 MCPS high schools: Albert Einstein, Bethesda-Chevy Chase, Montgomery Blair, Richard Montgomery, Walt Whitman, Walter Johnson, Winston Churchill, and Thomas S. Wootton. The information was self-reported by students at these schools and collected by MCPS. The data show how many students from each school applied to, were admitted to, and enrolled in various colleges across the nation. To be included in the dataset, a college or university needed at least 6 applications in total across the 8 schools.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(treemap) # provides treemap(), used for the treemaps below
# Long version
MocoUniLong <- MocoUni |>
  pivot_longer(cols = 2:28,           # which columns to pivot
               names_to = "names",    # combines all column names
               values_to = "values")  # combines all values
# Long version
MocoUniLong1 <- MocoUni |>
  pivot_longer(cols = 2:25,           # which columns to pivot
               names_to = "names",    # combines all column names
               values_to = "values")  # combines all values
# Change to numeric
MocoUni1 <- MocoUni |>
  mutate(across(c(2:28), ~ as.numeric(as.character(.))))
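The write-up describes a header fix that happens before this conversion: the real column names sat in the first row, so they were promoted to names and that row was dropped. The following is a minimal sketch of that idea on made-up data, not the code actually used for this project. The data frame `raw`, its values, and the `university` column name are all hypothetical.

```r
library(dplyr)

# Hypothetical raw table: the real headers sit in row 1 and the
# columns are just numbered (X1, X2, X3). All values are made up.
raw <- data.frame(
  X1 = c("university", "College A", "College B"),
  X2 = c("total_applied", "120", "60"),
  X3 = c("total_admitted", "30", "45"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop that row
names(raw) <- unname(unlist(raw[1, ]))
clean <- raw[-1, ]
rownames(clean) <- NULL

# Convert every column except the name column from character to numeric,
# then compute the share of applicants admitted
clean <- clean |>
  mutate(across(-university, as.numeric)) |>
  mutate(percent_accepted = round(total_admitted / total_applied, 4))

clean
```

Doing the header fix first means the later steps can refer to columns by name rather than by position.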
MocoUni2 <- MocoUni1 |>
  mutate(percent_accepted = total_admitted / total_applied)
# Round with an anonymous function; passing extra arguments through
# across(..., round, 4) is deprecated as of dplyr 1.1.0
MocoUni2 <- MocoUni2 |>
  mutate(across(c("percent_accepted"), \(x) round(x, 4))) # Used Google for assistance
# Keep universities with at least 50 applications
HighApplied <- MocoUni2 |>
  filter(total_applied >= 50)
# Keep universities with at most 500 applications (drops the high-volume outliers)
OutlierApplied <- MocoUni2 |>
  filter(total_applied <= 500)
# Percent accepted, all universities
MocoUniTop <- MocoUni2 |>             # creates a top 20 list
  arrange(desc(percent_accepted)) |>  # desc() sorts in descending order, so the top 20 come first
  mutate(column = "top") |>           # add a column marking them as top 20
  head(20)                            # keep 20 rows
MocoUniTop
MocoUniBottom <- MocoUni2 |>          # creates a bottom 20 list
  arrange(percent_accepted) |>        # ascending order, so the bottom 20 come first
  mutate(column = "bottom") |>        # add a column marking them as bottom 20
  head(20)                            # keep 20 rows
MocoUniBottom
# Percent accepted with a minimum of 50 applications
PacceptedHT <- HighApplied |>         # creates a top 20 list
  arrange(desc(percent_accepted)) |>  # desc() sorts in descending order, so the top 20 come first
  mutate(column = "top") |>           # add a column marking them as top 20
  head(20)                            # keep 20 rows
PacceptedHT
PacceptedHB <- HighApplied |>         # creates a bottom 20 list
  arrange(percent_accepted) |>        # ascending order, so the bottom 20 come first
  mutate(column = "bottom") |>        # add a column marking them as bottom 20
  head(20)                            # keep 20 rows
PacceptedHB
MostApplied <- MocoUni2 |>            # creates a top 20 list
  arrange(desc(total_applied)) |>     # desc() sorts in descending order, so the top 20 come first
  mutate(column = "top") |>           # add a column marking them as top 20
  head(20)                            # keep 20 rows
MostApplied
LeastApplied <- MocoUni2 |>           # creates a bottom 20 list (at least 6 applications)
  arrange(total_applied) |>           # ascending order, so the bottom 20 come first
  mutate(column = "bottom") |>        # add a column marking them as bottom 20
  head(20)                            # keep 20 rows
LeastApplied
# Linear plot (geom_smooth() with no method falls back to a loess curve here)
Linreg <- OutlierApplied |>
  ggplot(aes(x = total_admitted, y = total_enrolled)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Total Students Admitted",
       y = "Total Students Enrolled",
       title = "Students Admitted vs Enrolled")
Linreg
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
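As the message above shows, `geom_smooth()` chose a loess curve rather than a straight line. If a linear regression line is what is wanted, one option is to request it explicitly with `method = "lm"`. The sketch below uses made-up counts in a hypothetical data frame `demo`; it is an illustration of the option, not a rerun of the plot above.

```r
library(ggplot2)

# Made-up admitted/enrolled counts, for illustration only
demo <- data.frame(
  total_admitted = c(5, 10, 20, 40, 80),
  total_enrolled = c(1, 3, 5, 11, 19)
)

# Same scatter plot, but with a straight least-squares line
p <- ggplot(demo, aes(x = total_admitted, y = total_enrolled)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Total Students Admitted",
       y = "Total Students Enrolled",
       title = "Students Admitted vs Enrolled (linear fit)")

# The same fit as a model object, to read off slope and intercept
fit <- lm(total_enrolled ~ total_admitted, data = demo)
coef(fit)
```

The slope from `coef(fit)` can be read as the estimated number of additional enrolled students per additional admitted student in the sample data.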
# DO NOT GRADE!!
treemap(PacceptedHT,
        index = "sort",
        vSize = "total_applied",
        vColor = "percent_accepted",
        type = "manual",
        border.col = c("purple"),
        border.lwds = c(5),
        title = "Montgomery County College Applications",
        title.legend = "Percent Accepted",
        palette = "Purples")
# Do grade!
treemap(PacceptedHB,
        index = "sort",
        vSize = "total_applied",
        vColor = "percent_accepted",
        type = "manual",
        border.col = c("purple"),
        border.lwds = c(5),
        title = "Montgomery County College Applications",
        title.legend = "Percent Accepted",
        palette = "Purples")
This dataset needed a fair amount of cleaning. First, the column names were all in the first row, and the actual columns were just numbered. I fixed this by setting the column names from the first row and then deleting that row; this was done with AI assistance. I then converted all of the values from character to numeric, using mutate() with as.numeric(). I performed the basic Saidi Cleaning for the data names. I created a new column giving the share of applicants admitted to each university across all eight schools. From there, I removed the less-applied-to colleges by filtering out any university with fewer than 50 applications, and I used this data in my visualization to show the 20 colleges with the lowest acceptance rates for students from Montgomery County.

I would have liked to split things up by school, and I think I have a general idea of how to do that; however, I don't have the skills in RStudio, nor the time, to learn exactly how that would work for this particular dataset. I would also have used the split by school for the linear regression plot and explored which school has the highest acceptance rate, and so on. This was a really interesting dataset to play with, and I'd like to keep working with it as I gain knowledge of RStudio.
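One possible way to approach the split by school is sketched below. It assumes the per-school columns follow a `school_measure` naming pattern such as `blair_applied` and `blair_admitted`; that pattern, the data frame `wide`, and all of the numbers are hypothetical and would need to be adapted to the real column names. The idea is to let `pivot_longer()` split each column name into a school part and a measure part, then summarise by school.

```r
library(dplyr)
library(tidyr)

# Made-up wide table: one row per university, one column per school/measure
wide <- data.frame(
  university       = c("College A", "College B"),
  blair_applied    = c(40, 10),
  blair_admitted   = c(10, 5),
  whitman_applied  = c(20, 30),
  whitman_admitted = c(10, 15)
)

# Split each column name at "_": the first part becomes the school,
# the second part (".value") becomes its own column (applied, admitted)
by_school <- wide |>
  pivot_longer(cols = -university,
               names_to = c("school", ".value"),
               names_sep = "_")

# Acceptance rate per school, pooled over all universities
school_rates <- by_school |>
  group_by(school) |>
  summarise(total_applied = sum(applied),
            total_admitted = sum(admitted)) |>
  mutate(percent_accepted = round(total_admitted / total_applied, 4)) |>
  arrange(desc(percent_accepted))

school_rates
```

With the data in this long shape, the school column could also be mapped to colour or facets in a ggplot2 chart to compare schools side by side.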