1. Insert a link here to your potential data source and briefly describe your data. Be sure to address the following questions:

    https://www.kaggle.com/wsj/college-salaries/data

    - The dataset contains salary information for graduates of different undergraduate majors. It was compiled from PayScale salary data and originally published by the Wall Street Journal, and it covers starting and mid-career salaries across various academic fields. The dataset contains 50 observations (rows) and 8 variables (columns). Each observation represents one undergraduate major and the salary statistics associated with it. The data are observational: they record real-world salary outcomes reported by individuals, and no variables were manipulated as they would be in an experiment.

  1. Please include a 4-5 sentence description of your research question or project idea.

    - Choosing a college major is an important decision that can affect career opportunities and long-term income. Many students are interested in knowing whether majors with higher starting salaries also lead to higher earnings later in their careers. This project will explore the relationship between starting salaries and mid-career salaries across different undergraduate majors. In particular, I want to examine whether majors that pay more immediately after graduation also tend to have higher salaries during mid-career. Understanding this relationship may help provide insight into how different academic fields influence long-term earning potential.

  2. What variables in your data set might you consider to answer your research question? List at least three. For full points, describe each variable as follows:


Undergraduate Major:

Type: Categorical (nominal)

Levels: Different majors such as Engineering, Economics, Psychology, Biology, etc.

Description: This variable shows the name of the undergraduate major. Each row in the dataset represents a different major and the salary data associated with it.

Missing values: There probably aren’t missing values for this variable, but if there are any they could just be removed by filtering those rows out.

Starting Median Salary:

Type: Numeric (continuous)

Range: Around $30,000 to about $70,000

Description: This variable shows the median salary people earn early in their careers after graduating with that major. I will use this variable to compare how much people typically make right after college across different majors.

Missing values: If there are missing values, I would likely remove those rows so they don’t affect the analysis.

Mid-Career Median Salary:

Type: Numeric (continuous)

Range: Around $60,000 to about $130,000

Description: This variable represents the median salary people earn later in their careers, usually around 10 or more years after graduating. This will help me compare long-term salary differences between majors.

Missing values: If there are any missing values, they could be filtered out before making graphs or doing analysis.

Mid-Career 90th Percentile Salary:

Type: Numeric (continuous)

Range: Around $100,000 to over $200,000

Description: This variable shows the salary that the top 10% of earners in each major meet or exceed at mid-career. It gives an idea of the highest earning potential within different fields.

Missing values: If there are NAs in this variable, they could be removed during the data cleaning step.
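
The missing-value handling described for the salary variables above could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the salary columns are stored as currency strings such as "$46,000.00", the helper names `parse_salary` and `clean_rows` are hypothetical, and the sample rows are made up rather than taken from the dataset.

```python
# Column names as listed in the variable descriptions above.
SALARY_COLS = ["Starting Median Salary", "Mid-Career Median Salary"]


def parse_salary(value):
    """Turn a currency string such as "$46,000.00" into a float.

    Returns None when the value is missing (None, empty, or "NA").
    Assumption: salaries are stored as text with "$" and commas.
    """
    if value is None:
        return None
    text = str(value).strip().replace("$", "").replace(",", "")
    if text == "" or text.upper() in {"NA", "NAN"}:
        return None
    return float(text)


def clean_rows(rows, salary_cols):
    """Parse the salary columns and drop any row with a missing salary."""
    cleaned = []
    for row in rows:
        parsed = {col: parse_salary(row.get(col)) for col in salary_cols}
        if any(value is None for value in parsed.values()):
            continue  # filter out rows with a missing salary
        cleaned.append({**row, **parsed})
    return cleaned


# Made-up rows for illustration only (not values from the dataset).
sample_rows = [
    {"Undergraduate Major": "Major A",
     "Starting Median Salary": "$46,000.00",
     "Mid-Career Median Salary": "$77,100.00"},
    {"Undergraduate Major": "Major B",
     "Starting Median Salary": "",
     "Mid-Career Median Salary": "$64,800.00"},
    {"Undergraduate Major": "Major C",
     "Starting Median Salary": "$57,700.00",
     "Mid-Career Median Salary": "$101,000.00"},
]

cleaned = clean_rows(sample_rows, SALARY_COLS)
```

Dropping incomplete rows is the simplest option and matches the plan described above. With only 50 majors, it would be worth noting how many rows are removed before drawing conclusions.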

STAGE 2:

Article 1: The Changing Nature of Middle-Class Jobs

1. One thing compelling about the visualizations:
- One thing I liked about the visualizations is how clearly they showed changes over time. The graphs made it easy to see how middle-class jobs have decreased while other types of jobs have increased. The use of color also helped separate the different job categories so the graphs weren’t confusing.

2. One thing compelling about the text:
- The text was easy to follow and explained why these changes were happening. It didn’t just show the data but also connected it to real-world factors like technology and economic changes, which made it more interesting.

3. Description of one visualization:
- I think the authors did a good job explaining the graph that showed job changes over time. The title and labels made it clear what was being shown. However, one thing that could have been better is adding a bit more explanation about what counts as “middle-class jobs,” since that part could be confusing.

4. One idea for my project:
- This article showed me that it’s important to clearly explain what the categories mean. For my project, I should make sure to explain what each salary variable represents so people understand the differences.

Article 2: How Every NFL Team’s Fans Lean Politically

1. One thing compelling about the visualizations:
- The visualizations were interesting because they combined sports and politics in a way that was easy to understand. The maps and charts made it easy to compare different teams and see patterns across regions.

2. One thing compelling about the text:
- The text explained the patterns in a simple way and connected them to geographic and cultural differences. It helped make sense of why certain teams had fans with different political views.

3. Description of one visualization:
- The visualization showing team fan bases and political leanings was clear and well-labeled. The colors helped show differences between groups. One thing that could have been improved is adding more explanation for how the political leaning was measured.

4. One idea for my project:
- This article gave me the idea that visuals should be easy to compare across categories. For my project, I could compare different majors in a clear way so it’s easy to see which ones have higher salaries.

Draft Introduction

Choosing a college major is something most students think a lot about because it can affect future jobs and income. Many people assume that picking a major with a high starting salary will automatically lead to higher earnings later in life, but that may not always be true. Different majors can lead to very different career paths, and salaries in some fields may grow more over time than in others. Because of this, it is interesting to look at how salaries change from early career to mid-career across different fields. This project will focus on whether higher starting salaries are actually associated with higher long-term earnings.

The data used in this project comes from salary information collected by PayScale and published by the Wall Street Journal. It includes data on different undergraduate majors and their associated salaries at both early career and mid-career stages. Each row in the dataset represents a different major, and the data includes variables such as starting median salary and mid-career median salary. The dataset contains 50 majors and shows salary ranges across different points in a person’s career. This data was collected through surveys of workers in different fields, so it reflects real-world salary outcomes rather than results from an experiment.
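
One possible way to quantify the relationship described in this introduction is a Pearson correlation between starting and mid-career median salaries. The sketch below is an assumption about how the analysis might begin, not a settled method; the function name `pearson_r` is hypothetical and the salary figures are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up salaries for three hypothetical majors (not values from the dataset).
starting = [40000.0, 50000.0, 60000.0]
mid_career = [70000.0, 90000.0, 110000.0]

r = pearson_r(starting, mid_career)
print(f"Pearson r between starting and mid-career salary: {r:.2f}")
```

A value of r near 1 would suggest that majors with higher starting salaries also tend to have higher mid-career salaries. Because each row is a major rather than a person, the result would describe majors, not individual graduates.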