library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(pastecs)
##
## Attaching package: 'pastecs'
## The following objects are masked from 'package:dplyr':
##
## first, last
county <- read_csv("https://corgis-edu.github.io/corgis/datasets/csv/county_demographics/county_demographics.csv")
## Rows: 3139 Columns: 43
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): County, State
## dbl (41): Age.Percent 65 and Older, Age.Percent Under 18 Years, Age.Percent ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
texas <- county %>%
  filter(State == "TX" | State == "Texas")
The variable selected for this analysis is median household income: the income level that half of the households in a Texas county fall below and half exceed. Because it is the midpoint of each county's income distribution, it is less sensitive to a few very high earners than the mean, and it gives a sense of the economic well-being of a typical household in each county.
stat.desc(texas$`Income.Median Houseold Income`)
## nbr.val nbr.null nbr.na min max range
## 2.540000e+02 0.000000e+00 0.000000e+00 2.509800e+04 1.009200e+05 7.582200e+04
## sum median mean SE.mean CI.mean.0.95 var
## 1.342329e+07 5.137100e+04 5.284759e+04 8.021239e+02 1.579691e+03 1.634243e+08
## std.dev coef.var
## 1.278375e+04 2.418985e-01
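Several of the values that stat.desc() reports are derived from the others, and recomputing them by hand is a quick way to read the scientific-notation output. The sketch below uses base R only and copies the count, minimum, maximum, mean, and standard deviation from the output above. The variable names are illustrative, and the confidence-interval line assumes a t-based interval with n - 1 degrees of freedom.

```r
# Values copied (and rounded) from the stat.desc() output above
n       <- 254        # nbr.val: number of Texas counties
minimum <- 25098      # min
maximum <- 100920     # max
avg     <- 52847.59   # mean
std_dev <- 12783.75   # std.dev

range_val <- maximum - minimum                 # range = max - min
se_mean   <- std_dev / sqrt(n)                 # standard error of the mean
ci_half   <- qt(0.975, df = n - 1) * se_mean   # 95% CI half-width (assumed t-based)
coef_var  <- std_dev / avg                     # coefficient of variation

# Print the recomputed values in ordinary decimal notation
print(round(c(range = range_val, SE.mean = se_mean,
              CI.mean.0.95 = ci_half, coef.var = coef_var), 4))
```

A coefficient of variation of about 0.24 means the standard deviation is roughly a quarter of the mean county income.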
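# Drop counties with missing income. stat.desc() reported nbr.na = 0,
# so this is a safeguard and removes no rows here.
texas_clean <- texas %>%
  filter(!is.na(`Income.Median Houseold Income`))

# Histogram of the raw income values
hist(texas_clean$`Income.Median Houseold Income`,
     main = "Median Household Income (Texas Counties)",
     xlab = "Income")

# Natural-log transform (all incomes are positive, so log() is defined)
texas_clean <- texas_clean %>%
  mutate(log_income = log(`Income.Median Houseold Income`))

# Histogram of the log-transformed values
hist(texas_clean$log_income,
     main = "Log of Median Household Income",
     xlab = "Log Income")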
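The original income distribution appears right-skewed: most counties cluster in the lower part of the range (minimum $25,098, median $51,371), while a small number of higher-income counties stretch the upper tail toward the maximum of $100,920. The mean ($52,848) sitting above the median is consistent with this skew. After the log transformation the distribution looks more symmetric, which makes the variable better suited to methods that assume approximately normal data.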
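The visual impression from the two histograms can be backed up with a number. The sketch below defines a moment-based skewness helper in base R and applies it before and after a log transform. The skewness() function and the toy_income vector are illustrative assumptions, not part of the original analysis. The toy values are made up to mimic a right-skewed income column and are not the county data.

```r
# Moment-based sample skewness: third central moment divided by sd^3
# (population-style sd, dividing by n). Hypothetical helper, base R only.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Made-up, right-skewed stand-in for an income column (NOT the real data)
toy_income <- c(25000, 40000, 45000, 50000, 52000, 55000, 60000, 75000, 100000)

skew_raw <- skewness(toy_income)        # positive value indicates right skew
skew_log <- skewness(log(toy_income))   # log pulls in the upper tail

print(c(raw = skew_raw, log = skew_log))

# With the real data, the same check would be:
# skewness(texas_clean$`Income.Median Houseold Income`)
# skewness(texas_clean$log_income)
```

A value near zero indicates a roughly symmetric distribution, so a drop in skewness after the transform supports what the histograms suggest.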