The data includes age, which is continuous. It includes the work class: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked. Final weight is represented by “fnlwgt”, which is continuous. Education is divided into Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool. The education-num column is continuous and shows education level. Marital-status is one of Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse. The occupations include: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces. Relationship status includes: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. Race includes: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. Sex is Female or Male. The columns capital-gain, capital-loss, and hours-per-week are continuous.
Setting the working directory and reading in the “income evaluation” dataset
setwd("~/Data 101")
Income <- read.csv("income_evaluation.csv")
Loading libraries and viewing the data
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
view(Income)
Loading the janitor package
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
Using janitor to clean up and format the column names
Income <- Income%>%
janitor::clean_names()
Income%>%
summary()
## age workclass fnlwgt education
## Min. :17.00 Length:32561 Min. : 12285 Length:32561
## 1st Qu.:28.00 Class :character 1st Qu.: 117827 Class :character
## Median :37.00 Mode :character Median : 178356 Mode :character
## Mean :38.58 Mean : 189778
## 3rd Qu.:48.00 3rd Qu.: 237051
## Max. :90.00 Max. :1484705
## education_num marital_status occupation relationship
## Min. : 1.00 Length:32561 Length:32561 Length:32561
## 1st Qu.: 9.00 Class :character Class :character Class :character
## Median :10.00 Mode :character Mode :character Mode :character
## Mean :10.08
## 3rd Qu.:12.00
## Max. :16.00
## race sex capital_gain capital_loss
## Length:32561 Length:32561 Min. : 0 Min. : 0.0
## Class :character Class :character 1st Qu.: 0 1st Qu.: 0.0
## Mode :character Mode :character Median : 0 Median : 0.0
## Mean : 1078 Mean : 87.3
## 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :99999 Max. :4356.0
## hours_per_week native_country income
## Min. : 1.00 Length:32561 Length:32561
## 1st Qu.:40.00 Class :character Class :character
## Median :40.00 Mode :character Mode :character
## Mean :40.44
## 3rd Qu.:45.00
## Max. :99.00
table(Income$marital_status)
##
## Divorced Married-AF-spouse Married-civ-spouse
## 4443 23 14976
## Married-spouse-absent Never-married Separated
## 418 10683 1025
## Widowed
## 993
table(Income$marital_status)/length(Income$marital_status)
##
## Divorced Married-AF-spouse Married-civ-spouse
## 0.1364515832 0.0007063665 0.4599367341
## Married-spouse-absent Never-married Separated
## 0.0128374436 0.3280918891 0.0314793772
## Widowed
## 0.0304966064
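The same shares can also be obtained with base R's `prop.table()`, which divides a table by its total. A minimal sketch on a small made-up vector (the `status` values are illustrative, not rows from the dataset):

```r
# Toy stand-in for Income$marital_status (values are made up for illustration)
status <- c("Divorced", "Never-married", "Never-married", "Widowed")

counts <- table(status)        # frequency of each category
shares <- prop.table(counts)   # each count divided by the total
round(100 * shares, 1)         # shares expressed as percentages
```

On the real data, the equivalent call would be `prop.table(table(Income$marital_status))`.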
table1 <- table(Income$education, Income$occupation)
table1
##
## ? Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 10th 102 38 0 170 24
## 11th 119 67 0 175 34
## 12th 40 38 1 58 13
## 1st-4th 12 0 0 23 4
## 5th-6th 30 6 0 43 1
## 7th-8th 73 11 0 116 19
## 9th 51 14 0 96 13
## Assoc-acdm 47 193 0 115 145
## Assoc-voc 61 167 0 252 150
## Bachelors 173 506 1 226 1369
## Doctorate 15 5 0 2 55
## HS-grad 533 1365 4 1922 807
## Masters 48 68 1 22 501
## Preschool 5 2 0 4 0
## Prof-school 18 9 0 7 52
## Some-college 516 1281 2 868 879
##
## Farming-fishing Handlers-cleaners Machine-op-inspct
## 10th 44 71 101
## 11th 37 123 99
## 12th 16 38 35
## 1st-4th 18 16 23
## 5th-6th 36 40 56
## 7th-8th 70 46 93
## 9th 28 49 76
## Assoc-acdm 14 24 33
## Assoc-voc 52 28 63
## Bachelors 77 50 69
## Doctorate 1 0 1
## HS-grad 404 611 1023
## Masters 10 5 8
## Preschool 9 2 11
## Prof-school 4 0 1
## Some-college 174 267 310
##
## Other-service Priv-house-serv Prof-specialty
## 10th 194 6 9
## 11th 238 14 20
## 12th 85 4 10
## 1st-4th 40 11 4
## 5th-6th 64 14 1
## 7th-8th 98 8 9
## 9th 101 10 3
## Assoc-acdm 78 2 138
## Assoc-voc 115 4 170
## Bachelors 181 7 1495
## Doctorate 1 0 321
## HS-grad 1281 50 233
## Masters 19 1 844
## Preschool 15 2 1
## Prof-school 4 0 452
## Some-college 781 16 430
##
## Protective-serv Sales Tech-support Transport-moving
## 10th 6 81 3 84
## 11th 7 144 6 92
## 12th 6 47 3 39
## 1st-4th 1 8 0 8
## 5th-6th 1 12 1 28
## 7th-8th 9 29 5 60
## 9th 4 32 2 35
## Assoc-acdm 34 144 73 27
## Assoc-voc 48 106 126 40
## Bachelors 100 809 230 62
## Doctorate 0 8 3 1
## HS-grad 215 1069 159 825
## Masters 15 134 37 10
## Preschool 0 0 0 0
## Prof-school 1 18 7 3
## Some-college 202 1009 273 283
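The `?` column in the table above is how this file marks an unknown occupation. One way to treat those entries as missing is to declare the marker when reading the file. A minimal sketch, assuming the marker is a bare `?` that may be padded with spaces; the inline CSV text is made up so the example runs on its own:

```r
# Made-up three-row CSV standing in for income_evaluation.csv
csv_text <- "age,occupation
39, Adm-clerical
50, ?
38, Sales"

demo <- read.csv(
  text = csv_text,
  na.strings = "?",    # treat "?" as a missing value
  strip.white = TRUE   # strip padding spaces so " ?" matches "?"
)

# NA is now reported as its own category instead of "?"
table(demo$occupation, useNA = "ifany")
```

If the real file follows the same convention, passing the same two arguments to `read.csv("income_evaluation.csv", ...)` should have the same effect.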
table2 <- table(Income$race, Income$workclass)
table2
##
## ? Federal-gov Local-gov Never-worked Private
## Amer-Indian-Eskimo 25 19 36 0 190
## Asian-Pac-Islander 65 44 39 0 713
## Black 213 169 288 2 2176
## Other 23 7 10 0 213
## White 1510 721 1720 5 19404
##
## Self-emp-inc Self-emp-not-inc State-gov Without-pay
## Amer-Indian-Eskimo 2 24 15 0
## Asian-Pac-Islander 46 73 58 1
## Black 23 93 159 1
## Other 5 9 4 0
## White 1040 2342 1062 12
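Because White respondents make up most of the rows, raw counts favor that group in every column. Row-wise proportions make the groups easier to compare. A minimal sketch using `prop.table()` with `margin = 1`, on a few counts copied from the table above:

```r
# A few counts copied from the race-by-workclass table above
counts <- matrix(
  c(169,  2176,
    721, 19404),
  nrow = 2, byrow = TRUE,
  dimnames = list(race = c("Black", "White"),
                  workclass = c("Federal-gov", "Private"))
)

# margin = 1 divides each row by its own total, so every row sums to 1.
# Note: the shares here are relative to these two columns only.
row_shares <- prop.table(counts, margin = 1)
round(100 * row_shares, 1)
```

On the full table, `prop.table(table2, margin = 1)` would give the share of each work class within each race.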
# income is categorical, so map it to fill and let geom_bar() count the rows
ggplot(data = Income, aes(x = race, fill = income)) +
geom_bar()+
ggtitle("Income of Each Race")+
xlab("Race")+
ylab("Number of People")

pie_data <- Income%>%
group_by(race,relationship)%>%
summarize(counts = n(),
percentage = n()/ nrow(Income))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
view(pie_data)
# Number of people of each race (row totals of the race-by-workclass table above)
x <- c(311, 1039, 3124, 271, 27816)
labels <- c("Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White")
piepercent <- round(100*x/sum(x), 1)
pie(x, labels = piepercent, main = "Percentage of Individuals by Race", col = rainbow(length(x)))
legend("topleft", labels, cex = 0.6,
fill = rainbow(length(x)))
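Typing counts by hand invites transcription mistakes; the slices can instead be computed from the column itself. A minimal sketch, using a made-up `race` vector in place of `Income$race`:

```r
# Made-up stand-in for Income$race
race <- c("White", "White", "White", "Black", "Other")

counts <- table(race)                          # one count per group
pct <- round(100 * counts / sum(counts), 1)    # percentage per slice
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels,
    main = "Share of Each Race",
    col = rainbow(length(counts)))
```

Putting the group name in each slice label also removes the need for a separate legend.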

hist(Income$age, col="blue",
xlab= "Age",
ylab= "Frequency",
main="Age Distribution")

hist(Income$hours_per_week, col="green",
xlab= "Hours per Week",
ylab= "Frequency",
main="Hours Worked per Week")

view(Income)
boxplot(age ~ marital_status, data = Income, xlab = "Marital Status",
ylab = "Age", main = "Marital Status for Ages")

boxplot(hours_per_week ~ sex, data = Income, xlab = "Sex",
ylab = "Hours per Week", main = "Hours per Week for Each Sex")

These summaries tell us a lot about the dataset. One thing to highlight is that the average work week is about 40 hours (mean 40.44), and some people report working as many as 99 hours a week. The data also shows that most people in this dataset (which stands in for a larger group) are either Married-civ-spouse (about 46%) or Never-married (about 33%). Looking at education and occupation, most occupation categories are filled mainly by people whose education level is high school graduate, some college, or a bachelor’s degree. In every work class (Federal-gov, Local-gov, State-gov, Private, Self-emp-inc, and Self-emp-not-inc), White individuals hold the largest number of positions, which is expected because they make up the large majority of the rows. White individuals also make up the largest group among those who make 50 thousand dollars or more. Within the White group, however, individuals appear to be fairly evenly split between those who make less than 50 thousand and those who make 50 thousand or more.
Overall, the dataset does not cover a huge population; however, it is representative of each subgroup present in it. The summary statistics show that the data includes 32,561 people and that a 40-hour week is typical: the 1st quartile (40.00), median (40.00), and mean (40.44) all sit at about 40 hours, with a maximum of 99 hours.
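Quartiles only bound how common a 40-hour week is: a first quartile and a median that both equal 40 imply that at least about a quarter of respondents report exactly 40 hours. The share can also be measured directly. A minimal sketch on a made-up `hours` vector standing in for `Income$hours_per_week`:

```r
# Made-up stand-in for Income$hours_per_week
hours <- c(20, 40, 40, 40, 45, 60, 40, 35)

# The mean of a logical vector is the fraction of TRUE values
share_40 <- mean(hours == 40)        # fraction reporting exactly 40 hours
share_over_40 <- mean(hours > 40)    # fraction reporting more than 40 hours

round(100 * c(exactly_40 = share_40, over_40 = share_over_40), 1)
```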