## dept title salary
## 1732 Fire PHYSICIAN,MD 342784
## 8405 Health & Human Services CHIEF PHYSICIAN,MD 186685
## 3897 Houston Airport System AVIATION DIRECTOR 275000
## 10704 Houston Public Works PUBLIC WORKS DIRECTOR 275000
## 7564 Library LIBRARY DIRECTOR 170000
## 13338 Other CITY ATTORNEY 275000
## 11679 Parks & Recreation PARKS & RECREATION DIRECTOR 150000
## 4413 Police POLICE CHIEF 280000
## 20244 Solid Waste Management SOLID WASTE DIRECTOR 195000
Technically, there are several employees that are at the maximum salary within the different departments. The following table is a count of the employees.
## title dept salary
## 1 AVIATION DIRECTOR Houston Airport System 1
## 2 CHIEF PHYSICIAN,MD Health & Human Services 3
## 3 CITY ATTORNEY Other 1
## 4 COMMUNITY DEVELOPMENT DIRECTOR Other 1
## 5 GENERAL SERVICES DIRECTOR Other 1
## 6 HUMAN RESOURCES DIRECTOR Other 1
## 7 LIBRARY DIRECTOR Library 1
## 8 PARKS & RECREATION DIRECTOR Parks & Recreation 1
## 9 PHYSICIAN,MD Fire 16
## 10 PLANNING DIRECTOR Other 2
## 11 POLICE CHIEF Police 1
## 12 PRESIDING JUDGE OF MUNICIPAL COURTS Other 1
## 13 PUBLIC WORKS DIRECTOR Houston Public Works 1
## 14 SOLID WASTE DIRECTOR Solid Waste Management 1
## Male Female
## 16950 7358
Males appear the most in this data set
## Female Male
## Fire 0.0548 0.9452
## Health & Human Services 0.7295 0.2705
## Houston Airport System 0.3643 0.6357
## Houston Public Works 0.2852 0.7148
## Library 0.7176 0.2824
## Other 0.5393 0.4607
## Parks & Recreation 0.3160 0.6840
## Police 0.2346 0.7654
## Solid Waste Management 0.2520 0.7480
From the data set, It looks like the majority of departments are dominated by males. The two exceptions are the Library department and Health and Human Services.
## dept title hire_date salary sex race
## 13705 Library CUSTOMER SERVICE CLERK 2016-01-19 9912 Female White
## 1183 Library CUSTOMER SERVICE CLERK 2016-01-19 9912 Female Hispanic
## 9838 Library CUSTOMER SERVICE CLERK 2016-02-29 9912 Female Black
The table shows the last three employees in a decreasing order of Salary.
## Female Male
## Asian 488 1059
## Black 3587 5074
## Hispanic 1940 4208
## Native American 39 107
## No Answer 13 14
## White 1291 6488
The table shows the race the count of type of sex with in that race.
I substituted ‘No Answer’ for ‘NA’ because I thought it is easier to understand.
## dept sex Mean 25 % 75 %
## 1 Fire Female 62212.64 45791.20 70181.28
## 2 Fire Male 60479.31 51194.00 66522.56
## 3 Health & Human Services Female 53838.31 36639.50 62496.00
## 4 Health & Human Services Male 59230.43 45109.75 64327.50
## 5 Houston Airport System Female 51099.30 32687.50 61781.00
## 6 Houston Airport System Male 57278.31 37710.00 67987.00
## 7 Houston Public Works Female 51294.45 36462.00 60201.00
## 8 Houston Public Works Male 51490.11 35277.00 58968.00
## 9 Library Female 41126.96 29869.00 49295.75
## 10 Library Male 44399.94 30930.00 53743.00
## 11 Other Female 61232.95 41142.00 72100.00
## 12 Other Male 61687.97 41298.50 72831.00
## 13 Parks & Recreation Female 36952.82 27851.00 40632.50
## 14 Parks & Recreation Male 37081.00 27851.00 39998.00
## 15 Police Female 58374.81 40258.75 71994.00
## 16 Police Male 69443.14 58379.88 75942.10
## 17 Solid Waste Management Female 46682.63 35506.00 52437.00
## 18 Solid Waste Management Male 42776.72 35506.00 42026.00
Here is the table with 25% percentale,mean , and 50% percentale rounded to two decimal places.
import pandas as pan
import numpy as np
employee_data = pan.read_csv("https://raw.githubusercontent.com/Rusty1299/HW_files/877c944316fa5445e972a1606d009dce5781b4d4/employee.csv")
Employee_data_types = employee_data.dtypes
def percentages(x):
copy_of_x = x.copy()
count_of_people =np.array(copy_of_x.loc[:,['Female','Male']])
total_people = np.reshape(count_of_people.sum(axis=1), (9,1))
P_value = count_of_people/total_people
copy_of_x.loc[:,['Female','Male']] = P_value[:,:]
return copy_of_x
def my25(g):
return np.percentile(g, 25)
def my75(g):
return np.percentile(g, 75)
## Finding the Unique values for the dataframe
Job_Titles = pan.unique(employee_data['title'])
Sex = pan.unique(employee_data['sex'])
Race = pan.unique(employee_data['race'])
Departments = pan.unique(employee_data['dept'])
Race_nan = employee_data[pan.isnull(employee_data['race'])]
# Question_1
Max_salary_Index = employee_data.groupby('dept')['salary'].idxmax()
Question_1 = employee_data.loc[Max_salary_Index,['dept','title','salary']]
Employees_with_max_Salary = employee_data[employee_data['salary'].isin(Question_1['salary'])]
Q1_Bonus = Employees_with_max_Salary.pivot_table(index=('title','dept'),aggfunc=('count')).fillna(0)
Q1_Bonus = Q1_Bonus.reset_index()
Q1_Bonus = Q1_Bonus.loc[:,('title','dept','salary')]
# Question 2
Question_2 = employee_data['sex'].value_counts()
# Question 3
Sex_count = employee_data.groupby('sex')['dept'].count()
Counting_males_females_per_dept= employee_data.pivot_table(values='sex', columns=employee_data['sex'],index='dept',aggfunc='count')
Question_3 = percentages(Counting_males_females_per_dept)
print(Question_3)
Employees_with_Salary = employee_data[pan.notnull(employee_data['salary'])]
Question_4 = Employees_with_Salary.sort_values('salary',ascending=False)
print(Question_4.iloc[-3:])
empolyee_copy_data = employee_data.copy()
empolyee_copy_data['race'] = empolyee_copy_data['race'].fillna('No Answer')
Question_5 = empolyee_copy_data.pivot_table(values='sex', columns=employee_data['sex'],index='race',aggfunc='count')
Question_6 =round(Employees_with_Salary.pivot_table(values='salary',index=('dept','sex'),aggfunc=(my25,np.mean,my75)),2)
Question_6.columns =['Mean','25 %',"75 %"]
Question_6 = Question_6.reset_index()
print('\n',Question_6)