Libraries

library(ggplot2)
library(dplyr)
library(ISLR)
data("Credit")
data("Wage")

Credit Data Analysis

1. Overall Distribution of Credit Scores:

What is the overall distribution of credit scores in the dataset?

ggplot(data = Credit, aes(x = Rating)) +
  geom_histogram(binwidth = 50, fill = "red", color = "black") +
  labs(title = "Distribution of Credit Scores", x = "Credit Score", y = "Frequency")

2. Credit Score by Age Group:

How do credit scores vary across different age groups within the dataset?

Credit <- Credit %>%
  mutate(age_group = cut(Age,
                         breaks = c(18, 30, 45, 60, 75, 100),
                         labels = c("18-30", "31-45", "46-60", "61-75", "76-100"),
                         include.lowest = TRUE))  # keep Age == 18 in the first bin

ggplot(Credit, aes(x = age_group, y = Rating)) +
  geom_boxplot(fill = "blue") +
  labs(title = "Credit Score by Age Group", x = "Age Group", y = "Credit Score")
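The binning step relies on `cut()`, which by default builds right-closed intervals, so a value exactly equal to the lowest break falls outside every bin and becomes `NA` unless `include.lowest = TRUE` is set. The sketch below illustrates this with a few hand-picked ages (illustrative values, not taken from `Credit`):

```r
# Same breaks and labels as the age-group step above.
breaks <- c(18, 30, 45, 60, 75, 100)
labels <- c("18-30", "31-45", "46-60", "61-75", "76-100")

# Illustrative ages chosen to sit on or next to bin edges.
ages <- c(18, 30, 31, 100)

# Default: intervals are (18,30], (30,45], ... so 18 is not covered.
default_bins <- cut(ages, breaks = breaks, labels = labels)

# include.lowest = TRUE closes the first interval on the left: [18,30].
closed_bins <- cut(ages, breaks = breaks, labels = labels, include.lowest = TRUE)

print(default_bins)
print(closed_bins)
```

A quick `sum(is.na(Credit$age_group))` after the `mutate()` call is one way to confirm that no rows were dropped from the boxplot.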

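3. Impact of Payment History on Average Credit Score:

How does payment history affect the average credit score among individuals? The Credit dataset has no payment-history variable, so the number of credit cards (Cards) is used here as a rough proxy.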

payment_history <- Credit %>%
  group_by(Cards) %>%
  summarize(avg_credit_score = mean(Rating))

ggplot(payment_history, aes(x = Cards, y = avg_credit_score)) +
  geom_line() +
  geom_point(size = 3, color = 'lightblue') +
  labs(title = "Impact of Payment History (Cards) on Credit Score", x = "Number of Credit Cards", y = "Average Credit Score")
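A group mean can be misleading when a group holds only a handful of observations, so it may help to report the group size next to each average. The helper below is a minimal sketch; `summarise_by_cards` is a hypothetical name, and the toy data frame exists only to make the example self-contained.

```r
library(dplyr)

# Hypothetical helper: average Rating per Cards value, plus the group size.
summarise_by_cards <- function(df) {
  df %>%
    group_by(Cards) %>%
    summarise(n = n(), avg_credit_score = mean(Rating), .groups = "drop")
}

# Toy data for illustration only (not taken from Credit).
toy <- data.frame(Cards = c(1, 1, 2), Rating = c(100, 200, 300))
res <- summarise_by_cards(toy)
print(res)
```

Calling `summarise_by_cards(Credit)` would give the same averages as `payment_history`, with an extra `n` column showing how many individuals back each point on the line.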

4. Relationship between Credit Utilization Ratio and Credit Score:

How does the credit utilization ratio (balance divided by credit limit) relate to credit score?

Credit$UtilizationRatio <- Credit$Balance / Credit$Limit

ggplot(Credit, aes(x = UtilizationRatio, y = Rating)) +
  geom_point(color = 'black') +
  geom_smooth(method = 'lm', color = 'blue') +
  labs(title = "Credit Utilization Ratio vs Credit Score", 
       x = "Credit Utilization Ratio", 
       y = "Credit Score")
## `geom_smooth()` using formula = 'y ~ x'
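The fitted line shows the direction of the relationship, but not its strength. One way to put a number on it is a Pearson correlation test between the ratio and the rating. This is a sketch under the assumption that a linear association is a reasonable summary; `utilization_cor` is a hypothetical helper, and the toy values are made up for illustration.

```r
# Hypothetical helper: Pearson correlation test between Balance / Limit and Rating.
utilization_cor <- function(df) {
  ratio <- df$Balance / df$Limit
  cor.test(ratio, df$Rating, method = "pearson")
}

# Toy data for illustration only (not taken from Credit).
toy <- data.frame(
  Balance = c(0, 100, 400, 900, 300),
  Limit   = c(1000, 1000, 1000, 1000, 1000),
  Rating  = c(200, 300, 500, 800, 350)
)

res <- utilization_cor(toy)
print(res$estimate)  # correlation coefficient
print(res$p.value)   # p-value for the null hypothesis of zero correlation
```

On the real data, `utilization_cor(Credit)` returns the same kind of object, including a confidence interval in `$conf.int`.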

Wage Data

1. Wage Distribution:

What is the distribution of wages in the dataset, and how does it compare to national averages?

ggplot(Wage, aes(x = wage)) +
  geom_histogram(binwidth = 25, fill = "green", color = "black") +
  labs(title = "Wage Distribution", x = "Wage", y = "Frequency")

2. Average Wage by Industry:

What are the average wages across different industries represented in the dataset? Industry is taken from the jobclass variable.

Wage %>%
  group_by(jobclass) %>%
  summarise(avg_wage = mean(wage, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(jobclass, -avg_wage), y = avg_wage)) +
  geom_col(fill = "orange") +
  coord_flip() +
  labs(title = "Average Wage by Industry", x = "Industry", y = "Average Wage")

3. Wage Growth Over Time:

How have wages changed over time within the dataset? Are there specific periods of growth or decline?

ggplot(Wage, aes(x = year, y = wage)) +
  geom_line(stat = "summary", fun = "mean", color = "black") +
  labs(title = "Wage Growth Over Time", x = "Year", y = "Average Wage")
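To summarise growth or decline in a single number, a simple linear regression of wage on year gives the average change in wage per year. This is only a rough sketch: it assumes a linear trend, and `wage_trend` is a hypothetical helper shown here on made-up data.

```r
# Hypothetical helper: slope of a linear fit of wage on year
# (average change in wage per one-year step).
wage_trend <- function(df) {
  fit <- lm(wage ~ year, data = df)
  coef(fit)[["year"]]
}

# Toy data for illustration only: wage rises by 2 units per year, plus small noise.
toy <- data.frame(
  year = c(2003, 2004, 2005, 2006),
  wage = c(100.5, 101.5, 104.5, 105.5)
)

slope <- wage_trend(toy)
print(slope)
```

`wage_trend(Wage)` would apply the same fit to the full dataset. A positive slope suggests growth over the period and a negative one suggests decline.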

4. Wage by Education Level:

How do wages differ by education level, and is there a significant correlation between education and wage?

Wage %>%
  group_by(education) %>%
  summarise(avg_wage = mean(wage, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(education, -avg_wage), y = avg_wage)) +
  geom_col(fill = "red") +
  labs(title = "Average Wage by Education Level", x = "Education Level", y = "Average Wage")
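The bar chart compares group means but does not say whether the differences are statistically significant. Because education is a categorical variable, one common option is a one-way ANOVA rather than a correlation coefficient. The sketch below assumes roughly normal residuals and similar variance across groups; `education_anova_p` is a hypothetical helper and the toy data are made up.

```r
# Hypothetical helper: p-value of a one-way ANOVA of wage on education.
education_anova_p <- function(df) {
  fit <- aov(wage ~ education, data = df)
  # The first row of the ANOVA table belongs to the education term.
  summary(fit)[[1]][["Pr(>F)"]][1]
}

# Toy data for illustration only (not taken from Wage).
toy <- data.frame(
  education = factor(rep(c("HS Grad", "College Grad"), each = 3)),
  wage      = c(50, 55, 60, 90, 95, 100)
)

p_value <- education_anova_p(toy)
print(p_value)
```

With the real data, `education_anova_p(Wage)` gives the corresponding p-value. If the ANOVA assumptions look doubtful for skewed wages, `kruskal.test(wage ~ education, data = Wage)` is a rank-based alternative.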

5. Impact of Employment Type on Wage:

How do wages vary by employment type? The Wage dataset does not record full-time, part-time, or contract status, so jobclass (Industrial vs. Information) is used instead.

Wage %>%
  group_by(jobclass) %>%
  summarise(avg_wage = mean(wage, na.rm = TRUE)) %>%
  ggplot(aes(x = jobclass, y = avg_wage)) +
  geom_col(fill = "lightblue") +
  labs(title = "Average Wage by Employment Type", x = "Employment Type", y = "Average Wage")