Basic Data Analysis Project

Tayyab Sarfraz

Import the required libraries

Import the required dataset
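A minimal sketch of the loading step, assuming data.csv is a comma-separated file with a header row. The in-memory sample below is a hypothetical stand-in so the snippet runs without the real file; its rows are illustrative, not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (illustrative rows only).
sample_csv = io.StringIO(
    "age,Job,Jobclass,native-country,income\n"
    "39,Adm-clerical,State-gov,United-States,<=50K\n"
    "50,?,?,United-States,>50K\n"
    "38,Exec-managerial,Private,?,<=50K\n"
)

# With the real file this would be: df = pd.read_csv("data.csv")
df = pd.read_csv(sample_csv)
print(df.head())
```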

How many records are there in data.csv?
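One way to answer this is to read the shape of the dataframe. The sketch below uses a small hypothetical frame in place of the real data.

```python
import pandas as pd

# Hypothetical sample; the real dataframe would come from data.csv.
df = pd.DataFrame({"age": [39, 50, 38], "income": ["<=50K", ">50K", "<=50K"]})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(f"{n_rows} rows and {n_cols} columns")
```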

This dataset has 48842 rows and 10 columns.

In data.csv, there are some missing values, which are denoted by ?

Interestingly, the columns Job and Jobclass contain many missing values. How many records have both of these columns missing?
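A possible way to count these records while missing values are still stored as ?. This sketch assumes the marker is exactly the string "?" with no surrounding whitespace; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample in which "?" marks a missing value.
df = pd.DataFrame(
    {
        "Job": ["Adm-clerical", "?", "?", "Sales"],
        "Jobclass": ["State-gov", "?", "Private", "?"],
    }
)

# True only for rows where both columns hold the missing-value marker.
both_missing = (df["Job"] == "?") & (df["Jobclass"] == "?")
n_both_missing = int(both_missing.sum())
print(n_both_missing)
```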

Replace all ? occurrences with NaN
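A minimal sketch of this replacement with DataFrame.replace, again assuming the marker is exactly "?". The sample frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing marker.
df = pd.DataFrame({"Job": ["Sales", "?"], "age": [25, 40]})

# Replace every cell equal to "?" with NaN; other cells are left unchanged.
df = df.replace("?", np.nan)
print(df.isna().sum())
```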

List unique entries in columns Job, Jobclass and native-country
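The unique entries can be listed with Series.unique, which also reports NaN as one of the values. The sample values below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; NaN stands for a value that was "?" before.
df = pd.DataFrame(
    {
        "Job": ["Sales", "Tech-support", np.nan, "Sales"],
        "Jobclass": ["Private", "Private", np.nan, "State-gov"],
        "native-country": ["United-States", "India", "United-States", np.nan],
    }
)

cols = ["Job", "Jobclass", "native-country"]
# Collect the distinct values of each column, NaN included.
unique_values = {col: df[col].unique().tolist() for col in cols}
for col, values in unique_values.items():
    print(col, values)
```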

Replace NaN in columns Job, Jobclass and native-country with the most frequently occurring value.
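A sketch of mode imputation for these three columns. Series.mode ignores NaN by default and returns the tied values in sorted order, so taking the first element is an assumption about how ties should be broken. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one NaN per column.
df = pd.DataFrame(
    {
        "Job": ["Sales", "Sales", np.nan, "Tech-support"],
        "Jobclass": ["Private", np.nan, "Private", "State-gov"],
        "native-country": [np.nan, "United-States", "United-States", "India"],
    }
)

for col in ["Job", "Jobclass", "native-country"]:
    # Most frequent non-missing value of the column.
    most_frequent = df[col].mode()[0]
    df[col] = df[col].fillna(most_frequent)

print(df)
```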

The income column contains the strings <=50K and >50K. Replace them with 0 and 1, respectively.
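One way to do this is Series.map with an explicit dictionary. The sketch assumes the strings match exactly; any value not in the dictionary would become NaN. The sample is hypothetical.

```python
import pandas as pd

# Hypothetical sample of the income column.
df = pd.DataFrame({"income": ["<=50K", ">50K", "<=50K"]})

# Map each label to its numeric code.
df["income"] = df["income"].map({"<=50K": 0, ">50K": 1})
print(df["income"].value_counts())
```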

List the data types of every column
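The dtypes attribute gives the data type of each column. The frame below is a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample with numeric and text columns.
df = pd.DataFrame(
    {"age": [25, 40], "Job": ["Sales", "Tech-support"], "income": [0, 1]}
)

# A Series indexed by column name, holding each column's dtype.
column_types = df.dtypes
print(column_types)
```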

Convert column income data type to int32
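A minimal sketch of the conversion with Series.astype, assuming income already holds only 0 and 1 with no missing values. The sample is hypothetical.

```python
import pandas as pd

# Hypothetical sample; income is already encoded as 0/1.
df = pd.DataFrame({"income": [0, 1, 0]})

# Cast the column to a 32-bit integer type.
df["income"] = df["income"].astype("int32")
print(df["income"].dtype)
```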

Plot a heat map of the correlation between all the numeric columns in the dataframe, as shown below
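A sketch of such a heat map using plain matplotlib. This is not necessarily the plotting library the notebook used (seaborn is a common alternative); the helper name plot_correlation_heatmap and the sample data are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df):
    """Draw the correlation matrix of the numeric columns and return it."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    # Write each coefficient inside its cell.
    for i in range(corr.shape[0]):
        for j in range(corr.shape[1]):
            ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(image, ax=ax)
    ax.set_title("Correlation between numeric columns")
    return corr, ax


# Hypothetical sample; the text column is excluded from the correlation.
df = pd.DataFrame(
    {
        "age": [25, 40, 52, 31],
        "Job": ["Sales", "Sales", "Tech-support", "Sales"],
        "income": [0, 1, 1, 0],
    }
)
corr, ax = plot_correlation_heatmap(df)
```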

This heatmap shows that there is a relationship between "age" and "income", but not much else.

Plot a boxplot of age against income
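A possible sketch of this boxplot with matplotlib, drawing one box of age per income group. The helper name plot_age_by_income and the sample values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_by_income(df):
    """Draw one box of age values for each income group."""
    grouped = df.groupby("income")["age"]
    labels = [str(key) for key, _ in grouped]
    values = [group.to_numpy() for _, group in grouped]
    fig, ax = plt.subplots()
    ax.boxplot(values)
    # boxplot places the boxes at positions 1..n, one per group.
    ax.set_xticklabels(labels)
    ax.set_xlabel("income")
    ax.set_ylabel("age")
    return ax


# Hypothetical sample with income encoded as 0/1.
df = pd.DataFrame(
    {"age": [25, 30, 45, 52, 38, 60], "income": [0, 0, 1, 1, 0, 1]}
)
ax = plot_age_by_income(df)
```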

Education-level and income are usually related. Plot a bar chart that shows the count of people at each education level, split by income.
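One way to build this chart is to cross-tabulate the two columns and plot the counts as grouped bars. The column name Education-level is an assumption taken from the text above, and the sample rows are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample; "Education-level" is an assumed column name.
df = pd.DataFrame(
    {
        "Education-level": ["HS-grad", "HS-grad", "Bachelors", "Bachelors", "HS-grad"],
        "income": [0, 0, 1, 1, 1],
    }
)

# Rows are education levels, columns are income codes, cells are counts.
counts = pd.crosstab(df["Education-level"], df["income"])
ax = counts.plot(kind="bar")
ax.set_ylabel("count")
print(counts)
```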

This plot describes income across education levels. Among HS-grad, most people earn 50K or less; among Bachelors, on the other hand, most people earn more than 50K.

Plot Job against income
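The text does not say which kind of plot was used here. One possible reading, sketched below, is the share of people earning more than 50K within each job, assuming income is already encoded as 0/1. The sample rows are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with income encoded as 0/1.
df = pd.DataFrame(
    {
        "Job": ["Exec-managerial", "Exec-managerial", "Adm-clerical", "Adm-clerical"],
        "income": [1, 1, 0, 1],
    }
)

# The mean of a 0/1 column is the fraction of rows equal to 1.
share_high_income = df.groupby("Job")["income"].mean().sort_values()
ax = share_high_income.plot(kind="barh")
ax.set_xlabel("share with income >50K")
print(share_high_income)
```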

This plot shows that the higher-income jobs are "Pro-speciality" and "Exec-managerial", while the lower-income jobs are "Adm-clerical" and "other-services".