The data includes age, which is continuous. It includes the work class: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked. Final weight is represented by “fnlwgt”, which is continuous. Education is divided into Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool. The education-num column is continuous and encodes the education level as a number. Marital-status takes the values Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse. The occupations include: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces. Relationship includes: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. Race includes: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black. Sex is Female or Male. The capital-gain, capital-loss, and hours-per-week columns are continuous. The data also records native-country and income.

Setting the working directory and reading the “income evaluation” dataset

setwd("~/Data 101")
Income <- read.csv("income_evaluation.csv")

Loading libraries and viewing the data

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dplyr)
view(Income)

Loading the janitor package

library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Using janitor to clean up and format the column names, then summarizing the data

Income <- Income %>%
  janitor::clean_names()
Income %>%
  summary()
##       age         workclass             fnlwgt         education        
##  Min.   :17.00   Length:32561       Min.   :  12285   Length:32561      
##  1st Qu.:28.00   Class :character   1st Qu.: 117827   Class :character  
##  Median :37.00   Mode  :character   Median : 178356   Mode  :character  
##  Mean   :38.58                      Mean   : 189778                     
##  3rd Qu.:48.00                      3rd Qu.: 237051                     
##  Max.   :90.00                      Max.   :1484705                     
##  education_num   marital_status      occupation        relationship      
##  Min.   : 1.00   Length:32561       Length:32561       Length:32561      
##  1st Qu.: 9.00   Class :character   Class :character   Class :character  
##  Median :10.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :10.08                                                           
##  3rd Qu.:12.00                                                           
##  Max.   :16.00                                                           
##      race               sex             capital_gain    capital_loss   
##  Length:32561       Length:32561       Min.   :    0   Min.   :   0.0  
##  Class :character   Class :character   1st Qu.:    0   1st Qu.:   0.0  
##  Mode  :character   Mode  :character   Median :    0   Median :   0.0  
##                                        Mean   : 1078   Mean   :  87.3  
##                                        3rd Qu.:    0   3rd Qu.:   0.0  
##                                        Max.   :99999   Max.   :4356.0  
##  hours_per_week  native_country        income         
##  Min.   : 1.00   Length:32561       Length:32561      
##  1st Qu.:40.00   Class :character   Class :character  
##  Median :40.00   Mode  :character   Mode  :character  
##  Mean   :40.44                                        
##  3rd Qu.:45.00                                        
##  Max.   :99.00
table(Income$marital_status)
## 
##               Divorced      Married-AF-spouse     Married-civ-spouse 
##                   4443                     23                  14976 
##  Married-spouse-absent          Never-married              Separated 
##                    418                  10683                   1025 
##                Widowed 
##                    993
table(Income$marital_status)/length(Income$marital_status)
## 
##               Divorced      Married-AF-spouse     Married-civ-spouse 
##           0.1364515832           0.0007063665           0.4599367341 
##  Married-spouse-absent          Never-married              Separated 
##           0.0128374436           0.3280918891           0.0314793772 
##                Widowed 
##           0.0304966064
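The proportions above can also be obtained with `prop.table()`, which divides each count by the total. The following is a minimal sketch, not the original analysis: it re-enters the counts printed above as a named vector (the names `marital_counts` and `marital_share` are illustrative) so that it runs without the CSV file.

```r
# Counts copied from the table(Income$marital_status) output above
marital_counts <- c(
  "Divorced"              = 4443,
  "Married-AF-spouse"     = 23,
  "Married-civ-spouse"    = 14976,
  "Married-spouse-absent" = 418,
  "Never-married"         = 10683,
  "Separated"             = 1025,
  "Widowed"               = 993
)

# prop.table() divides every entry by the overall total
marital_share <- round(100 * prop.table(marital_counts), 1)

# Largest groups first, in percent
sort(marital_share, decreasing = TRUE)
```

With the data frame loaded, the same result should come from `prop.table(table(Income$marital_status))`.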
table1 <- table(Income$education, Income$occupation)
table1
##                
##                    ?  Adm-clerical  Armed-Forces  Craft-repair  Exec-managerial
##    10th          102            38             0           170               24
##    11th          119            67             0           175               34
##    12th           40            38             1            58               13
##    1st-4th        12             0             0            23                4
##    5th-6th        30             6             0            43                1
##    7th-8th        73            11             0           116               19
##    9th            51            14             0            96               13
##    Assoc-acdm     47           193             0           115              145
##    Assoc-voc      61           167             0           252              150
##    Bachelors     173           506             1           226             1369
##    Doctorate      15             5             0             2               55
##    HS-grad       533          1365             4          1922              807
##    Masters        48            68             1            22              501
##    Preschool       5             2             0             4                0
##    Prof-school    18             9             0             7               52
##    Some-college  516          1281             2           868              879
##                
##                  Farming-fishing  Handlers-cleaners  Machine-op-inspct
##    10th                       44                 71                101
##    11th                       37                123                 99
##    12th                       16                 38                 35
##    1st-4th                    18                 16                 23
##    5th-6th                    36                 40                 56
##    7th-8th                    70                 46                 93
##    9th                        28                 49                 76
##    Assoc-acdm                 14                 24                 33
##    Assoc-voc                  52                 28                 63
##    Bachelors                  77                 50                 69
##    Doctorate                   1                  0                  1
##    HS-grad                   404                611               1023
##    Masters                    10                  5                  8
##    Preschool                   9                  2                 11
##    Prof-school                 4                  0                  1
##    Some-college              174                267                310
##                
##                  Other-service  Priv-house-serv  Prof-specialty
##    10th                    194                6               9
##    11th                    238               14              20
##    12th                     85                4              10
##    1st-4th                  40               11               4
##    5th-6th                  64               14               1
##    7th-8th                  98                8               9
##    9th                     101               10               3
##    Assoc-acdm               78                2             138
##    Assoc-voc               115                4             170
##    Bachelors               181                7            1495
##    Doctorate                 1                0             321
##    HS-grad                1281               50             233
##    Masters                  19                1             844
##    Preschool                15                2               1
##    Prof-school               4                0             452
##    Some-college            781               16             430
##                
##                  Protective-serv  Sales  Tech-support  Transport-moving
##    10th                        6     81             3                84
##    11th                        7    144             6                92
##    12th                        6     47             3                39
##    1st-4th                     1      8             0                 8
##    5th-6th                     1     12             1                28
##    7th-8th                     9     29             5                60
##    9th                         4     32             2                35
##    Assoc-acdm                 34    144            73                27
##    Assoc-voc                  48    106           126                40
##    Bachelors                 100    809           230                62
##    Doctorate                   0      8             3                 1
##    HS-grad                   215   1069           159               825
##    Masters                    15    134            37                10
##    Preschool                   0      0             0                 0
##    Prof-school                 1     18             7                 3
##    Some-college              202   1009           273               283
table2 <- table(Income$race, Income$workclass)
table2
##                      
##                           ?  Federal-gov  Local-gov  Never-worked  Private
##    Amer-Indian-Eskimo    25           19         36             0      190
##    Asian-Pac-Islander    65           44         39             0      713
##    Black                213          169        288             2     2176
##    Other                 23            7         10             0      213
##    White               1510          721       1720             5    19404
##                      
##                        Self-emp-inc  Self-emp-not-inc  State-gov  Without-pay
##    Amer-Indian-Eskimo             2                24         15            0
##    Asian-Pac-Islander            46                73         58            1
##    Black                         23                93        159            1
##    Other                          5                 9          4            0
##    White                       1040              2342       1062           12
# income is categorical, so count the people in each income group per race
ggplot(data = Income, aes(x = race, fill = income)) +
  geom_bar() +
  ggtitle("Income of Each Race") +
  xlab("Race") +
  ylab("Number of People")

pie_data <- Income%>% 
  group_by(race,relationship)%>%
  summarize(counts = n(), 
            percentage = n()/ nrow(Income))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
view(pie_data)
# Number of people of each race (the row totals of table2)
x <- c(311, 1039, 3124, 271, 27816)
labels <- c("Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White")

piepercent <- round(100*x/sum(x), 1)
pie(x, labels = piepercent, main = "Relationship According to Race Pie Chart", col = rainbow(length(x)))
legend("topleft", labels, cex = 0.6,
   fill = rainbow(length(x)))
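The counts passed to `pie()` can also be computed from the data with `table()` instead of being typed by hand, which avoids transcription slips. The sketch below uses a small made-up vector, `race`, as a stand-in for `Income$race`; the values are hypothetical and only illustrate the pattern.

```r
# Hypothetical toy data standing in for Income$race
race <- c("White", "White", "Black", "White", "Other", "Black")

# Count each category, then convert the counts to percentages
race_counts <- table(race)
race_pct <- round(100 * prop.table(race_counts), 1)

# Label each slice with its category name and percentage
pie(race_counts,
    labels = paste0(names(race_counts), " (", race_pct, "%)"),
    main = "Share of Each Race (toy data)",
    col = rainbow(length(race_counts)))
```

On the real data, replacing `race` with `Income$race` should give the slices without any hard-coded numbers.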

hist(Income$age, col="blue",
     xlab= "Age",
     ylab= "Frequency",
     main="Distribution of Age")

hist(Income$hours_per_week, col="green",
     xlab= "Hours per Week",
     ylab= "Frequency",
     main="Hours Worked per Week")

view(Income)
boxplot(age ~ marital_status, data = Income, xlab = "Marital Status",
   ylab = "Age", main = "Marital Status for Ages")

boxplot(hours_per_week ~ sex, data = Income, xlab = "Sex",
   ylab = "Hours per Week", main = "Hours per Week for Each Sex")

The output tells us a lot about the dataset. One thing to highlight is that the average is about 40 hours per week (mean 40.44), and some people report working up to 99 hours every week. The data also shows that most people in this dataset (representative of a larger group) are either Never-married (about 33%) or Married-civ-spouse (about 46%). Looking at education and occupation, most occupations are held mainly by people whose education level is high school graduate, some college, or bachelor’s degree. In every work class (Federal-gov, Local-gov, State-gov, Private, Self-emp-inc, and Self-emp-not-inc), most jobs are held by White individuals. White individuals also make up the largest number of individuals who make more than 50 thousand dollars, although they are well represented both among individuals who make 50 thousand or less and among those who make more.
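To check the statement about work class and race, the shares can be computed directly from the race-by-workclass table. This is a minimal sketch under one assumption: the counts are re-entered by hand from the printed `table2` output (the object names below are illustrative), so it runs without the CSV file.

```r
# Counts copied from the race-by-workclass table (table2) printed earlier
workclasses <- c("?", "Federal-gov", "Local-gov", "Never-worked", "Private",
                 "Self-emp-inc", "Self-emp-not-inc", "State-gov", "Without-pay")
races <- c("Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White")

race_by_workclass <- matrix(
  c(  25,  19,   36, 0,   190,    2,   24,   15,  0,
      65,  44,   39, 0,   713,   46,   73,   58,  1,
     213, 169,  288, 2,  2176,   23,   93,  159,  1,
      23,   7,   10, 0,   213,    5,    9,    4,  0,
    1510, 721, 1720, 5, 19404, 1040, 2342, 1062, 12),
  nrow = 5, byrow = TRUE,
  dimnames = list(race = races, workclass = workclasses)
)

# Share of each race within each work class (every column sums to 1)
share_within_workclass <- prop.table(race_by_workclass, margin = 2)

# Percentage of each work class held by White individuals
white_share <- round(100 * share_within_workclass["White", ], 1)
white_share

# Overall percentage of White individuals in the data, for comparison
overall_white <- round(100 * sum(race_by_workclass["White", ]) / sum(race_by_workclass), 1)
overall_white
```

Comparing `white_share` with `overall_white` helps separate two things: how much of each work class is White, and how much of that simply reflects the overall composition of the dataset. With the data frame loaded, `prop.table(table(Income$race, Income$workclass), margin = 2)` should give the same shares.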

Overall, the dataset does not cover a huge population; however, it is representative of each subgroup that is present in the data set. The summary statistics show that the data includes 32,561 people and that a 40-hour week is typical: the 1st Quartile (40.00) and Median (40.00) are both 40 hours, the Mean is 40.44, and the Max is 99 hours.