Introduction

Project Overview

This report analyzes simulated credit score and wage data, exploring key trends and insights. Both datasets are generated in R within the report itself. The analysis focuses on the following areas:

  • Credit Score Distribution: Understanding the overall distribution of credit scores.
  • Credit Score by Age Group: Examining variations in credit scores across different age demographics.
  • Impact of Payment History: Investigating how payment history affects average credit scores.
  • Credit Utilization Ratio: Analyzing the relationship between credit utilization ratios and credit scores.
  • Wage Data Analysis: Exploring trends in wage data across different industries and educational levels.

Credit Score Analysis

1. Credit Score Distribution

Purpose: To analyze the overall distribution of credit scores within the dataset.

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(readr)

# Create Credit Score Data
set.seed(123)  # Arbitrary seed so the simulated data is reproducible
credit_data <- data.frame(
  ID = 1:1000,
  age = sample(18:80, 1000, replace = TRUE),  # Age of the individual
  credit_score = sample(300:850, 1000, replace = TRUE),  # Credit score range from 300 to 850
  utilization_ratio = runif(1000, 0, 1),  # Credit utilization ratio between 0 and 1
  payment_history = sample(c("On Time", "Late", "Defaulted"), 1000, replace = TRUE, prob = c(0.7, 0.25, 0.05)),  # Payment history categories
  credit_history_age = sample(1:30, 1000, replace = TRUE),  # Years of credit history
  year = sample(2015:2023, 1000, replace = TRUE),  # Year of the record
  Amount = sample(100:5000, 1000, replace = TRUE),  # Transaction amount or balance
  Frequency = sample(1:50, 1000, replace = TRUE)  # Frequency of credit transactions or actions
)

# Save the credit data as CSV
write_csv(credit_data, "credit_data.csv")

# Plot credit score distribution
ggplot(credit_data, aes(x = credit_score)) +
  geom_histogram(binwidth = 50, boundary = 300, fill = "blue", color = "black") +  # Bins start at the minimum score
  labs(title = "Credit Score Distribution", x = "Credit Score", y = "Frequency")

Findings: The histogram shows which credit score ranges are most common and the overall shape of the distribution. Because the scores are sampled uniformly between 300 and 850, the bars should be roughly level, with no single range dominating.
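A numeric summary can back up what the histogram shows. The sketch below is a minimal base R example, assuming a stand-in vector `scores` drawn the same way as `credit_score` above; the seed and the 50-point bands are illustrative choices, not part of the original analysis.

```r
# Hypothetical stand-in for credit_data$credit_score (uniform 300-850, as simulated above)
set.seed(1)  # arbitrary seed, for reproducibility only
scores <- sample(300:850, 1000, replace = TRUE)

# Minimum, quartiles, mean and maximum
score_summary <- summary(scores)
print(score_summary)

# Count scores in 50-point bands, mirroring a histogram with 50-point bins
bands <- cut(scores, breaks = seq(300, 850, by = 50), include.lowest = TRUE, dig.lab = 4)
band_counts <- table(bands)
print(band_counts)
```

With uniformly sampled scores, the band counts should all be of similar size.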

2. Credit Score by Age Group

Purpose: To investigate how credit scores differ across various age groups.

# Start the first break at 17 so that age 18 falls in "18-25" (cut() intervals are right-closed)
credit_data <- credit_data %>%
    mutate(age_group = cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75, Inf),
                           labels = c("18-25", "26-35", "36-45", "46-55", "56-65", "66-75", "76+")))
# Boxplot of credit score by age group
ggplot(credit_data, aes(x = age_group, y = credit_score)) +
    geom_boxplot(fill = "lightblue") +
    labs(title = "Credit Score by Age Group", x = "Age Group", y = "Credit Score") +
    theme_minimal()

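Findings: The boxplot compares the median and spread of credit scores across age groups. Because age and credit score are sampled independently in this simulated dataset, the boxes should look similar from one group to the next, with no systematic age trend.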
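The boxplot comparison can also be tabulated. The following is a sketch in base R on a small hypothetical data frame `toy` (not the report's `credit_data`). It assumes bin edges that start at 17 so that age 18 is included, and it reports the count, median and interquartile range for each group.

```r
# Hypothetical stand-in data; ages and scores are sampled independently
set.seed(2)  # arbitrary seed
toy <- data.frame(
  age = sample(18:80, 500, replace = TRUE),
  credit_score = sample(300:850, 500, replace = TRUE)
)

# Right-closed bins: (17,25], (25,35], ..., (75,Inf)
toy$age_group <- cut(toy$age,
                     breaks = c(17, 25, 35, 45, 55, 65, 75, Inf),
                     labels = c("18-25", "26-35", "36-45", "46-55", "56-65", "66-75", "76+"))

# Count, median and IQR of credit score per age group
age_summary <- aggregate(credit_score ~ age_group, data = toy,
                         FUN = function(x) c(n = length(x), median = median(x), iqr = IQR(x)))
age_summary <- do.call(data.frame, age_summary)  # flatten the matrix column into plain columns
print(age_summary)
```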

3. Impact of Payment History on Credit Score

Purpose: To assess how various payment histories influence credit scores.

payment_history_avg <- credit_data %>%
    group_by(payment_history) %>%
    summarize(avg_score = mean(credit_score, na.rm = TRUE))

# Bar chart for average credit score by payment history
ggplot(payment_history_avg, aes(x = payment_history, y = avg_score)) +
    geom_bar(stat = "identity", fill = "coral") +
    labs(title = "Average Credit Score by Payment History", x = "Payment History", y = "Average Credit Score") +
    theme_minimal()

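Findings: The bar chart compares the average credit score for each payment history category. Because payment history and credit score are sampled independently in this simulated dataset, the three averages should be close to one another; any gap reflects sampling noise, which is largest for the small Defaulted group.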
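A bar chart of means hides group sizes and variability. One possible check is a one-way ANOVA, sketched below on a hypothetical data frame `toy`. ANOVA is a suggestion here rather than a method used in the report.

```r
# Hypothetical stand-in data with the same category probabilities as above
set.seed(3)  # arbitrary seed
toy <- data.frame(
  payment_history = sample(c("On Time", "Late", "Defaulted"), 600, replace = TRUE,
                           prob = c(0.7, 0.25, 0.05)),
  credit_score = sample(300:850, 600, replace = TRUE)
)

# Group sizes and means side by side
group_n <- table(toy$payment_history)
group_mean <- tapply(toy$credit_score, toy$payment_history, mean)
print(cbind(n = group_n, mean = round(group_mean, 1)))

# One-way ANOVA: do the group means differ by more than chance would explain?
fit <- aov(credit_score ~ payment_history, data = toy)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```

A large p-value is consistent with the group means differing only by chance, which is what independent sampling should produce.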

4. Credit Utilization Ratio and Credit Score

Purpose: To explore the relationship between credit utilization ratios and credit scores.

ggplot(credit_data, aes(x = utilization_ratio, y = credit_score)) +
    geom_point(alpha = 0.6) +
    geom_smooth(method = "lm", color = "red") +
    labs(title = "Credit Utilization vs Credit Score", x = "Credit Utilization Ratio", y = "Credit Score") +
    theme_minimal()

Findings: The scatter plot and fitted regression line show how credit score changes with credit utilization. Because the two variables are simulated independently, the line should be nearly flat, with a correlation close to zero.
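The strength of the relationship can be quantified with a Pearson correlation and its confidence interval. The sketch below uses hypothetical stand-in vectors generated the same way as the simulated columns; the names and seed are assumptions for illustration.

```r
# Hypothetical stand-in vectors, sampled independently of each other
set.seed(4)  # arbitrary seed
utilization_ratio <- runif(500, 0, 1)
credit_score <- sample(300:850, 500, replace = TRUE)

# Pearson correlation with a 95% confidence interval
ct <- cor.test(utilization_ratio, credit_score, method = "pearson")
r <- unname(ct$estimate)
ci <- ct$conf.int
print(c(r = r, lower = ci[1], upper = ci[2]))

# Slope of the same linear fit that geom_smooth(method = "lm") draws
slope <- unname(coef(lm(credit_score ~ utilization_ratio))[2])
print(slope)
```

If the confidence interval for `r` contains zero, the data give no evidence of a linear relationship.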

Wage Data Analysis

1. Wage Distribution

Purpose: To visualize the distribution of wages within the dataset.

# Create Wage Data
wage_data <- data.frame(
  ID = 1:1000,
  age = sample(18:65, 1000, replace = TRUE),  # Age of the individual
  wage = round(rnorm(1000, mean = 50000, sd = 15000), -2),  # Wage with a normal distribution centered around 50,000
  industry = sample(c("Technology", "Healthcare", "Finance", "Education", "Retail", "Manufacturing"), 1000, replace = TRUE),  # Industry sectors
  education_level = sample(c("High School", "Associate Degree", "Bachelor's Degree", "Master's Degree", "Doctorate"), 1000, replace = TRUE, prob = c(0.2, 0.25, 0.35, 0.15, 0.05)),  # Education levels
  employment_type = sample(c("Full-time", "Part-time", "Contract"), 1000, replace = TRUE, prob = c(0.7, 0.2, 0.1)),  # Employment types
  year = sample(2015:2023, 1000, replace = TRUE)  # Year of the record
)

# Floor wages at 20,000 so that no simulated wage is negative or implausibly low
wage_data$wage <- pmax(wage_data$wage, 20000)

# Save the wage data as CSV
write_csv(wage_data, "wage_data.csv")

# Plot wage distribution
ggplot(wage_data, aes(x = wage)) +
    geom_histogram(binwidth = 5000, fill = "orange", color = "black") +
    labs(title = "Wage Distribution", x = "Wage", y = "Frequency") +
    theme_minimal()

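Findings: The histogram is roughly bell-shaped and centered near 50,000, as expected from the normal distribution used to simulate wages. The one anomaly is a small pile-up at 20,000, created by the minimum threshold: about 2% of simulated wages fall two or more standard deviations below the mean and are raised to that floor.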
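The effect of the 20,000 floor can be checked directly. This sketch regenerates wages with the same parameters in a stand-in vector and compares the observed share at the floor with the share the normal distribution predicts; the seed is arbitrary.

```r
# Hypothetical stand-in for wage_data$wage, using the same parameters as above
set.seed(5)  # arbitrary seed
raw_wage <- round(rnorm(1000, mean = 50000, sd = 15000), -2)
wage <- pmax(raw_wage, 20000)

# Observed share of records sitting exactly at the floor
share_at_floor <- mean(wage == 20000)

# Theoretical share below 20,000, i.e. more than 2 SDs under the mean (about 0.023)
expected_below <- pnorm(20000, mean = 50000, sd = 15000)

print(c(observed = share_at_floor, expected = expected_below))
```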

2. Average Wage by Industry

Purpose: To compare average wages across different industries to see which sectors have the highest and lowest average wages.

industry_avg <- wage_data %>%
    group_by(industry) %>%
    summarize(avg_wage = mean(wage, na.rm = TRUE))
# Bar chart for average wage by industry
ggplot(industry_avg, aes(x = reorder(industry, -avg_wage), y = avg_wage)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    labs(title = "Average Wage by Industry", x = "Industry", y = "Average Wage") +
    coord_flip() +
    theme_minimal()

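Findings: The bar chart ranks industries by average wage. Because industry is assigned independently of wage in this simulated dataset, every industry average should fall near 50,000, and the ranking reflects sampling noise rather than real sector differences.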
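To judge whether differences between industry averages are meaningful, each mean can be shown with a 95% confidence interval. The sketch below uses a hypothetical data frame `toy` and a t-based interval; both are assumptions made for illustration.

```r
# Hypothetical stand-in data with the same industries and wage parameters as above
set.seed(6)  # arbitrary seed
industries <- c("Technology", "Healthcare", "Finance", "Education", "Retail", "Manufacturing")
toy <- data.frame(
  industry = sample(industries, 600, replace = TRUE),
  wage = pmax(round(rnorm(600, mean = 50000, sd = 15000), -2), 20000)
)

# Mean wage and t-based 95% confidence interval for each industry
ci_table <- do.call(rbind, lapply(split(toy$wage, toy$industry), function(w) {
  tt <- t.test(w)
  data.frame(n = length(w), mean = mean(w), lower = tt$conf.int[1], upper = tt$conf.int[2])
}))

# Sort from highest to lowest average wage
ci_table <- ci_table[order(-ci_table$mean), ]
print(round(ci_table))
```

Heavily overlapping intervals suggest the ranking is not robust.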

3. Wage Growth Over Time

Purpose: To analyze wage trends over time, identifying periods of significant growth or decline.

wage_trends <- wage_data %>%
    group_by(year) %>%
    summarize(avg_wage = mean(wage, na.rm = TRUE))
# Line chart for wage growth over time
ggplot(wage_trends, aes(x = year, y = avg_wage)) +
    geom_line(color = "green") +
    scale_x_continuous(breaks = 2015:2023) +  # Label whole years only
    labs(title = "Wage Growth Over Time", x = "Year", y = "Average Wage") +
    theme_minimal()

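Findings: The line chart tracks the average wage for each year from 2015 to 2023. Because the year is assigned at random in this simulated dataset, the line should fluctuate around 50,000 without sustained growth or decline. With real data, the same chart would show how economic cycles affect wages.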
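Periods of growth or decline are easier to spot as year-over-year percentage changes. The sketch below computes them in base R on a hypothetical data frame `toy`; the column name `pct_change` is introduced here for illustration.

```r
# Hypothetical stand-in data: year is assigned at random, as in the simulation above
set.seed(7)  # arbitrary seed
toy <- data.frame(
  year = sample(2015:2023, 900, replace = TRUE),
  wage = pmax(round(rnorm(900, mean = 50000, sd = 15000), -2), 20000)
)

# Average wage per year, in chronological order
trend <- aggregate(wage ~ year, data = toy, FUN = mean)
names(trend)[2] <- "avg_wage"
trend <- trend[order(trend$year), ]

# Percentage change relative to the previous year (NA for the first year)
trend$pct_change <- c(NA, diff(trend$avg_wage) / head(trend$avg_wage, -1) * 100)
print(trend)
```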

4. Wage vs. Education Level

Purpose: To compare wages across different education levels and determine if higher education correlates with higher wages.

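# Order education levels from lowest to highest; the default ordering is alphabetical
education_order <- c("High School", "Associate Degree", "Bachelor's Degree", "Master's Degree", "Doctorate")
ggplot(wage_data, aes(x = factor(education_level, levels = education_order), y = wage)) +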
    geom_boxplot(fill = "purple") +
    labs(title = "Wage by Education Level", x = "Education Level", y = "Wage") +
    theme_minimal()

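Findings: The boxplot compares the median and spread of wages at each education level. Because education level is assigned independently of wage in this simulated dataset, the boxes should look similar and show no clear advantage for any level.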
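Whether higher education goes with higher wages can be tested with a rank correlation, since education level is ordinal. The sketch below treats the levels as an ordered factor and computes Spearman's rho on a hypothetical data frame `toy`. Spearman correlation is a suggested check, not one the report performs.

```r
# Hypothetical stand-in data with the same education levels and probabilities as above
set.seed(8)  # arbitrary seed
education_order <- c("High School", "Associate Degree", "Bachelor's Degree",
                     "Master's Degree", "Doctorate")
toy <- data.frame(
  education_level = sample(education_order, 500, replace = TRUE,
                           prob = c(0.2, 0.25, 0.35, 0.15, 0.05)),
  wage = pmax(round(rnorm(500, mean = 50000, sd = 15000), -2), 20000)
)

# Ordered factor, so that levels run from lowest to highest education
toy$education_level <- factor(toy$education_level, levels = education_order, ordered = TRUE)

# Median wage per level, in education order
median_by_edu <- tapply(toy$wage, toy$education_level, median)
print(median_by_edu)

# Spearman rank correlation between education rank (1-5) and wage
rho <- cor(as.integer(toy$education_level), toy$wage, method = "spearman")
print(rho)
```

A rho near zero means wages do not rise or fall consistently with education level.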

Conclusion

This report walks through credit score and wage analyses on simulated data, so the charts illustrate the workflow rather than real-world findings. Applied to real data, the same steps would show how credit utilization and payment history relate to credit scores, and how industry and education relate to wage potential.