Since CEO salary is a continuous variable and we are modeling it as a function of the predictors (industry, number of employees, and profits), it’s a regression problem.
We are interested in understanding which factors have an impact on CEO salary. This means we are focused on interpreting the relationships between the predictors (industry, number of employees, and profits) and the response (CEO salary), rather than predicting CEO salaries for new firms. Thus, the goal here is inference—we want to identify and understand the importance of each factor and how it influences CEO salaries.
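To make the inference goal concrete, here is a minimal sketch on simulated data. The data frame `firms`, its column names, the industry labels, and every number in the data-generating line are assumptions invented for illustration; none of them come from a real CEO salary dataset. The point is only that, for inference, the output we read is the coefficient table and its confidence intervals rather than predictions for new firms.

```r
# Simulated data only: every value below is made up for illustration.
set.seed(1)
n <- 200
firms <- data.frame(
  industry  = factor(sample(c("energy", "retail", "tech"), n, replace = TRUE)),
  employees = round(runif(n, 50, 5000)),
  profit    = rnorm(n, mean = 10, sd = 4)   # hypothetical units
)

# Hypothetical data-generating process: salary depends on employees and profit
firms$salary <- 300 + 0.05 * firms$employees + 20 * firms$profit +
  rnorm(n, sd = 50)

# Linear regression of the continuous response on all three predictors
fit <- lm(salary ~ industry + employees + profit, data = firms)

# For inference we inspect estimates, standard errors, p-values, and intervals
coef_table <- summary(fit)$coefficients
ci <- confint(fit, level = 0.95)
print(round(coef_table, 3))
print(round(ci, 3))
```

Each row of `coef_table` corresponds to one term in the model, so the size and uncertainty of each factor's estimated effect can be read off directly.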
Since the response variable is binary (success or failure), it’s a classification problem.
We aim to predict whether a new product will be a success or a failure based on historical data about similar products. The predictors include price charged, marketing budget, competition price, and ten other variables. Here, the goal is prediction, as we are not focused on understanding the relationships between predictors and the response but rather on accurately forecasting the outcome (success or failure) for a new product.
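The prediction goal can be sketched in the same spirit. The example below uses simulated data and logistic regression, which is only one of many possible classifiers and is an assumption here, not a method named in the problem. The data frame `products`, its columns, and the coefficients used to simulate outcomes are all hypothetical, and only two predictors are simulated to keep the sketch short.

```r
# Simulated data only: every value below is made up for illustration.
set.seed(2)
n <- 200
products <- data.frame(
  price     = runif(n, 5, 50),
  marketing = runif(n, 0, 100)
)

# Hypothetical rule: higher marketing and lower price raise the success odds
p_success <- plogis(-0.5 - 0.05 * products$price + 0.04 * products$marketing)
products$success <- rbinom(n, 1, p_success)

# Hold out the last 50 rows to stand in for "new" products
train <- products[1:150, ]
test  <- products[151:200, ]

fit  <- glm(success ~ price + marketing, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# For prediction we judge the model by how well it forecasts held-out outcomes
accuracy <- mean(pred == test$success)
print(accuracy)
```

Here the fitted coefficients are never inspected; the only quantity of interest is how often the forecast matches the held-out outcome.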
The response variable is the percent change in the USD/Euro exchange rate, which is continuous, so this is a regression problem.
The objective is to predict the future percentage change in the USD/Euro exchange rate based on weekly changes in stock market indices for the US, British, and German markets. This makes the goal prediction, as we are focused on using the relationships between the predictors and the response to forecast future changes in the exchange rate, not to understand or interpret the relationships themselves.
# Please change the path if running this code on a separate device
college <- read.csv("/Applications/R_Folder/College.csv")
View(college)
rownames(college) <- college[, 1]
View(college)
college <- college[, -1]
View(college)
summary(college)
   Private               Apps           Accept          Enroll
 Length:777         Min.   :   81   Min.   :   72   Min.   :  35
 Class :character   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242
 Mode  :character   Median : 1558   Median : 1110   Median : 434
                    Mean   : 3002   Mean   : 2019   Mean   : 780
                    3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902
                    Max.   :48094   Max.   :26330   Max.   :6392
   Top10perc       Top25perc      F.Undergrad     P.Undergrad
 Min.   : 1.00   Min.   :  9.0   Min.   :  139   Min.   :    1.0
 1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0
 Median :23.00   Median : 54.0   Median : 1707   Median :  353.0
 Mean   :27.56   Mean   : 55.8   Mean   : 3700   Mean   :  855.3
 3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0
 Max.   :96.00   Max.   :100.0   Max.   :31643   Max.   :21836.0
    Outstate       Room.Board       Books           Personal
 Min.   : 2340   Min.   :1780   Min.   :  96.0   Min.   : 250
 1st Qu.: 7320   1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850
 Median : 9990   Median :4200   Median : 500.0   Median :1200
 Mean   :10441   Mean   :4358   Mean   : 549.4   Mean   :1341
 3rd Qu.:12925   3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700
 Max.   :21700   Max.   :8124   Max.   :2340.0   Max.   :6800
      PhD            Terminal       S.F.Ratio      perc.alumni
 Min.   :  8.00   Min.   : 24.0   Min.   : 2.50   Min.   : 0.00
 1st Qu.: 62.00   1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00
 Median : 75.00   Median : 82.0   Median :13.60   Median :21.00
 Mean   : 72.66   Mean   : 79.7   Mean   :14.09   Mean   :22.74
 3rd Qu.: 85.00   3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00
 Max.   :103.00   Max.   :100.0   Max.   :39.80   Max.   :64.00
     Expend        Grad.Rate
 Min.   : 3186   Min.   : 10.00
 1st Qu.: 6751   1st Qu.: 53.00
 Median : 8377   Median : 65.00
 Mean   : 9660   Mean   : 65.46
 3rd Qu.:10830   3rd Qu.: 78.00
 Max.   :56233   Max.   :118.00
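The loading steps above (read the file, copy the first column into the row names, then drop that column) can be collapsed into a single `read.csv()` call. The sketch below demonstrates this on a tiny stand-in file written to a temporary location; the two rows and the object name `toy` are made up and only mimic a layout with names in the first column. Setting `stringsAsFactors = TRUE` is an optional choice, shown here because it makes `summary()` report category counts instead of `Length`/`Class`/`Mode` for text columns.

```r
# Write a small stand-in CSV; these rows are hypothetical, not real data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  '"","Private","Apps","Accept"',
  '"Example College A","Yes",1000,800',
  '"Example College B","No",5000,3500'
), tmp)

# row.names = 1 takes the first column as row names and removes it
# from the data columns in one step
toy <- read.csv(tmp, row.names = 1, stringsAsFactors = TRUE)

print(toy)
print(summary(toy))

# Clean up the temporary file
unlink(tmp)
```

With the real file, the same two arguments would be added to the existing `read.csv()` call, and the separate `rownames()` and column-dropping steps would no longer be needed.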
# Recode Private as 0/1 so that every column passed to pairs() is numeric
college$Private <- ifelse(college$Private == "Yes", 1, 0)
pairs(college[, 1:10], main = "Pairwise Scatterplots of the First Ten Variables")
# Private is now 0 (public) or 1 (private); label the boxes in that order
boxplot(Outstate ~ Private, data = college,
        names = c("Public", "Private"),
        main = "Out-of-State Tuition for Private and Public Universities",
        xlab = "University Type", ylab = "Out-of-State Tuition",
        col = c("lightblue", "lightgreen"))
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
Elite <- as.factor(Elite)
college <- data.frame(college, Elite)
summary(college$Elite)
No Yes
699 78
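The same indicator can also be built in one step with `ifelse()` wrapped in `factor()`. The sketch below runs on a short made-up vector (`top10` is hypothetical, not taken from the dataset) that includes the boundary value 50, to show that the rule `> 50` is strict: a school with exactly 50% is not classed as elite.

```r
# Hypothetical Top10perc values, including the boundary case of exactly 50
top10 <- c(12, 50, 51, 96)

# Fixing the levels keeps "No" first, matching the order in the output above
elite <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))

print(data.frame(top10, elite))
print(table(elite))
```

On the real data this would read `college$Elite <- factor(ifelse(college$Top10perc > 50, "Yes", "No"), levels = c("No", "Yes"))`, which is equivalent to the four lines used above.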
boxplot(Outstate ~ Elite, data = college,
        main = "Outstate Tuition vs Elite Status",
        xlab = "Elite University (No/Yes)", ylab = "Outstate Tuition",
        col = c("pink", "purple"))
# Draw the four histograms in a 2 x 2 grid and remember the previous layout
old_par <- par(mfrow = c(2, 2))
hist(college$Apps, main = "Histogram of Applications", xlab = "Number of Applications", col = "lightblue", breaks = 20)
hist(college$Accept, main = "Histogram of Acceptances", xlab = "Number of Acceptances", col = "lightgreen", breaks = 30)
hist(college$Enroll, main = "Histogram of Enrollments", xlab = "Number of Enrollments", col = "lightpink", breaks = 15)
hist(college$Outstate, main = "Histogram of Outstate Tuition", xlab = "Outstate Tuition", col = "lightgray", breaks = 25)
# Restore the previous layout so later plots are not squeezed into the grid
par(old_par)
summary(college[, c("Apps", "Accept", "Enroll", "Outstate")])
      Apps           Accept          Enroll        Outstate
 Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 2340
 1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.: 7320
 Median : 1558   Median : 1110   Median : 434   Median : 9990
 Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :10441
 3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:12925
 Max.   :48094   Max.   :26330   Max.   :6392   Max.   :21700
The summary shows wide variation across universities. Applications range from 81 to 48,094, with a median of 1,558 and a mean of 3,002; the mean is almost twice the median, so a small number of universities receive far more applications than the rest. Acceptances and enrollments follow the same pattern, with medians of 1,110 and 434 against maxima of 26,330 and 6,392. Out-of-state tuition is more consistent: it ranges from $2,340 to $21,700, its median ($9,990) and mean ($10,441) are close, and half of the universities fall between $7,320 and $12,925.
The histograms show the same patterns. Applications, acceptances, and enrollments are all strongly right-skewed, with a few universities forming a long upper tail. Out-of-state tuition, by contrast, is spread much more symmetrically around its center. Together, these results show how much universities differ in both size and cost.
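One quick numeric check of skewness is the ratio of the mean to the median: a ratio well above 1 suggests a long right tail, while a ratio near 1 suggests a roughly symmetric distribution. The sketch below uses only the means and medians copied from the `summary()` output above; the data frame name `stats` and the `ratio` column are introduced here for illustration. This is a rough heuristic, not a formal skewness measure.

```r
# Means and medians copied from the summary() output for the four variables
stats <- data.frame(
  variable = c("Apps", "Accept", "Enroll", "Outstate"),
  mean     = c(3002, 2019, 780, 10441),
  median   = c(1558, 1110, 434, 9990)
)

# Mean-to-median ratio: values far above 1 indicate right skew
stats$ratio <- round(stats$mean / stats$median, 2)
print(stats)
```

The three count variables come out at roughly 1.8 to 1.9, while out-of-state tuition is about 1.05, which agrees with the shapes seen in the histograms.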