Insert a link here to your potential data source and briefly describe your data. Be sure to address the following questions:
https://www.kaggle.com/wsj/college-salaries/data
- The dataset contains salary information for graduates of different undergraduate majors. It was compiled from PayScale salary data and was originally published by the Wall Street Journal. It includes starting salaries and mid-career salaries across various academic fields. The dataset contains 50 observations (rows) and 8 variables (columns). Each observation represents one undergraduate major and the salary statistics for that major. The data is observational: it records real-world salary outcomes and does not involve manipulating variables or conducting an experiment.
Please include a 4-5 sentence description of your research question or project idea.
- Choosing a college major is an important decision that can affect career opportunities and long-term income. Many students are interested in knowing whether majors with higher starting salaries also lead to higher earnings later in their careers. This project will explore the relationship between starting salaries and mid-career salaries across different undergraduate majors. In particular, I want to examine whether majors that pay more immediately after graduation also tend to have higher salaries during mid-career. Understanding this relationship may help provide insight into how different academic fields influence long-term earning potential.
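One possible way to measure this relationship is a Pearson correlation between the two salary variables. The sketch below is only an illustration under that assumption: the function name `pearson_r` and the numbers are placeholders I made up, not values from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only, NOT values from the dataset.
starting = [40000, 45000, 50000, 60000]
mid_career = [70000, 80000, 85000, 105000]

r = pearson_r(starting, mid_career)
print(round(r, 3))  # a value close to 1 means a strong positive linear relationship
```

A value of r near 1 would suggest that majors with higher starting salaries also tend to have higher mid-career salaries, while a value near 0 would suggest little linear relationship.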
What variables in your data set might you consider to answer your research question? List at least three. For full points, describe each variable as follows:
-
Undergraduate Major:
Type: Categorical (nominal)
Levels: Different majors such as Engineering, Economics, Psychology, Biology, etc.
Description: This variable shows the name of the undergraduate major. Each row in the dataset represents a different major and the salary data associated with it.
Missing values: This variable probably has no missing values, but any that exist could be removed by filtering out those rows.
Starting Median Salary:
Type: Numeric (continuous)
Range: Around $30,000 to about $70,000
Description: This variable shows the median salary people earn early in their careers after graduating with that major. I will use this variable to compare how much people typically make right after college across different majors.
Missing values: If there are missing values, I would likely remove those rows so they don’t affect the analysis.
Mid-Career Median Salary:
Type: Numeric (continuous)
Range: Around $60,000 to about $130,000
Description: This variable represents the median salary people earn later in their careers, usually around 10 or more years after graduating. This will help me compare long-term salary differences between majors.
Missing values: If there are any missing values, they could be filtered out before making graphs or doing analysis.
Mid-Career 90th Percentile Salary:
Type: Numeric (continuous)
Range: Around $100,000 to over $200,000
Description: This variable shows the salary for the top 10% of earners within each major during mid-career. It gives an idea of the highest earning potential within different fields.
Missing values: If there are NAs in this variable, they could be removed during the data cleaning step.
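The missing-value handling described for these variables could be sketched as follows. This is a rough illustration, assuming the salary columns are stored as text such as "$46,000.00" and that missing entries appear as empty strings or "NA". The helper names, column names (`start`, `mid`), and rows are made up for the example and are not taken from the dataset.

```python
def parse_salary(text):
    """Turn a string like "$46,000.00" into 46000.0; return None if missing."""
    if text is None:
        return None
    cleaned = text.replace("$", "").replace(",", "").strip()
    if cleaned == "" or cleaned.upper() in ("NA", "NAN"):
        return None
    return float(cleaned)


def drop_incomplete(rows, columns):
    """Keep only the rows where every listed column has a usable value."""
    return [
        row
        for row in rows
        if all(parse_salary(row.get(col)) is not None for col in columns)
    ]


# Made-up rows for illustration only, NOT values from the dataset.
rows = [
    {"major": "Major A", "start": "$40,000.00", "mid": "$70,000.00"},
    {"major": "Major B", "start": "", "mid": "$80,000.00"},
    {"major": "Major C", "start": "$50,000.00", "mid": "NA"},
]

complete = drop_incomplete(rows, ["start", "mid"])
print([row["major"] for row in complete])  # only Major A has both salaries
```

Filtering on just the columns used in a given graph or analysis keeps as many majors as possible, instead of dropping a row because of a missing value in an unrelated column.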
1. One thing compelling about the visualizations:
- One thing I liked about the visualizations is how they clearly showed
changes over time. The graphs made it easy to see how middle-class jobs
have decreased while other types of jobs increased. The use of color
also helped separate different job categories so it wasn’t
confusing.
2. One thing compelling about the text:
- The text was easy to follow and explained why these changes were
happening. It didn’t just show the data but also connected it to
real-world factors like technology and economic changes, which made it
more interesting.
3. Description of one visualization:
- I think the authors did a good job explaining the graph that showed
job changes over time. The title and labels made it clear what was being
shown. However, one improvement would be a bit more explanation of what
counts as “middle-class jobs,” since that part could be confusing.
4. One idea for my project:
- This article showed me that it’s important to clearly explain what the
categories mean. For my project, I should make sure to explain what each
salary variable represents so people understand the differences.
1. One thing compelling about the visualizations:
- The visualizations were interesting because they combined sports and
politics in a way that was easy to understand. The maps and charts made
it easy to compare different teams and see patterns across regions.
2. One thing compelling about the text:
- The text explained the patterns in a simple way and connected them to
geographic and cultural differences. It helped make sense of why certain
teams had fans with different political views.
3. Description of one visualization:
- The visualization showing team fan bases and political leanings was
clear and well-labeled. The colors helped show differences between
groups. One improvement would be more explanation of how the political
leaning was measured.
4. One idea for my project:
- This article gave me the idea that visuals should be easy to compare
across categories. For my project, I could compare different majors in a
clear way so it’s easy to see which ones have higher salaries.
Choosing a college major is something most students think a lot about because it can affect future jobs and income. Many people assume that picking a major with a high starting salary will automatically lead to higher earnings later in life, but that may not always be true. Different majors can have very different career paths, and some may grow more over time than others. Because of this, it is interesting to look at how salaries change from early career to mid-career across different fields. This project will focus on whether higher starting salaries actually relate to higher long-term earnings.
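One simple way to describe how much a major's salary grows over time is the percent change from the starting salary to the mid-career salary. The sketch below uses a hypothetical helper, `percent_growth`, and made-up numbers that are not taken from the dataset.

```python
def percent_growth(starting, mid_career):
    """Percent change from the starting salary to the mid-career salary."""
    if starting <= 0:
        raise ValueError("starting salary must be positive")
    return (mid_career - starting) / starting * 100


# Placeholder numbers for illustration only, NOT values from the dataset.
lower_start = percent_growth(40000, 70000)
higher_start = percent_growth(60000, 90000)

print(lower_start)   # 75.0
print(higher_start)  # 50.0
```

In this made-up example, the major with the lower starting salary grows by a larger percentage, which is the kind of pattern the project is meant to check for in the real data.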
The data used in this project comes from salary information collected by PayScale and published by the Wall Street Journal. It includes data on different undergraduate majors and their associated salaries at both early career and mid-career stages. Each row in the dataset represents a different major, and the data includes variables such as starting median salary and mid-career median salary. The dataset contains 50 majors and shows salary ranges across different points in a person’s career. This data was collected through surveys of workers in different fields, so it reflects real-world salary outcomes rather than results from an experiment.