Part 1

Sort the employee by salary. Return the top employee of each department by salary. Return just the department, title, and salary columns

##                          dept                       title salary
## 1732                     Fire                PHYSICIAN,MD 342784
## 8405  Health & Human Services          CHIEF PHYSICIAN,MD 186685
## 3897   Houston Airport System           AVIATION DIRECTOR 275000
## 10704    Houston Public Works       PUBLIC WORKS DIRECTOR 275000
## 7564                  Library            LIBRARY DIRECTOR 170000
## 13338                   Other               CITY ATTORNEY 275000
## 11679      Parks & Recreation PARKS & RECREATION DIRECTOR 150000
## 4413                   Police                POLICE CHIEF 280000
## 20244  Solid Waste Management        SOLID WASTE DIRECTOR 195000

Bonus for Part 1

Technically, there are several employees that are at the maximum salary within the different departments. The following table is a count of the employees.

##                                  title                    dept salary
## 1                    AVIATION DIRECTOR  Houston Airport System      1
## 2                   CHIEF PHYSICIAN,MD Health & Human Services      3
## 3                        CITY ATTORNEY                   Other      1
## 4       COMMUNITY DEVELOPMENT DIRECTOR                   Other      1
## 5            GENERAL SERVICES DIRECTOR                   Other      1
## 6             HUMAN RESOURCES DIRECTOR                   Other      1
## 7                     LIBRARY DIRECTOR                 Library      1
## 8          PARKS & RECREATION DIRECTOR      Parks & Recreation      1
## 9                         PHYSICIAN,MD                    Fire     16
## 10                   PLANNING DIRECTOR                   Other      2
## 11                        POLICE CHIEF                  Police      1
## 12 PRESIDING JUDGE OF MUNICIPAL COURTS                   Other      1
## 13               PUBLIC WORKS DIRECTOR    Houston Public Works      1
## 14                SOLID WASTE DIRECTOR  Solid Waste Management      1

Part 2

Which sex and race combination appears the most in the dataset?

##   Male Female 
##  16950   7358

Males appear the most in this data set

Part 3

Find the percentage of each sex by department

##                         Female   Male
## Fire                    0.0548 0.9452
## Health & Human Services 0.7295 0.2705
## Houston Airport System  0.3643 0.6357
## Houston Public Works    0.2852 0.7148
## Library                 0.7176 0.2824
## Other                   0.5393 0.4607
## Parks & Recreation      0.3160 0.6840
## Police                  0.2346 0.7654
## Solid Waste Management  0.2520 0.7480

From the data set, It looks like the majority of departments are dominated by males. The two exceptions are the Library department and Health and Human Services.

Part 4

Drop employee with missing salary. Sort the data by salary in decreasing order. Return the last 3 employees from each department.

##          dept                  title  hire_date salary    sex     race
## 13705 Library CUSTOMER SERVICE CLERK 2016-01-19   9912 Female    White
## 1183  Library CUSTOMER SERVICE CLERK 2016-01-19   9912 Female Hispanic
## 9838  Library CUSTOMER SERVICE CLERK 2016-02-29   9912 Female    Black

The table shows the last three employees in a decreasing order of Salary.

Part 5

Count the number of frequency of occurrence of each race within each sex.

##                 Female Male
## Asian              488 1059
## Black             3587 5074
## Hispanic          1940 4208
## Native American     39  107
## No Answer           13   14
## White             1291 6488

The table shows the race the count of type of sex with in that race.

I substituted ‘No Answer’ for ‘NA’ because I thought it is easier to understand.

Part 6

Create a pivot table of mean salary with index of department and sex and column of salary quartiles. .

##                       dept    sex     Mean     25 %     75 %
## 1                     Fire Female 62212.64 45791.20 70181.28
## 2                     Fire   Male 60479.31 51194.00 66522.56
## 3  Health & Human Services Female 53838.31 36639.50 62496.00
## 4  Health & Human Services   Male 59230.43 45109.75 64327.50
## 5   Houston Airport System Female 51099.30 32687.50 61781.00
## 6   Houston Airport System   Male 57278.31 37710.00 67987.00
## 7     Houston Public Works Female 51294.45 36462.00 60201.00
## 8     Houston Public Works   Male 51490.11 35277.00 58968.00
## 9                  Library Female 41126.96 29869.00 49295.75
## 10                 Library   Male 44399.94 30930.00 53743.00
## 11                   Other Female 61232.95 41142.00 72100.00
## 12                   Other   Male 61687.97 41298.50 72831.00
## 13      Parks & Recreation Female 36952.82 27851.00 40632.50
## 14      Parks & Recreation   Male 37081.00 27851.00 39998.00
## 15                  Police Female 58374.81 40258.75 71994.00
## 16                  Police   Male 69443.14 58379.88 75942.10
## 17  Solid Waste Management Female 46682.63 35506.00 52437.00
## 18  Solid Waste Management   Male 42776.72 35506.00 42026.00

Here is the table with 25% percentale,mean , and 50% percentale rounded to two decimal places.

import pandas as pan 
import numpy as np



employee_data = pan.read_csv("https://raw.githubusercontent.com/Rusty1299/HW_files/877c944316fa5445e972a1606d009dce5781b4d4/employee.csv")
Employee_data_types = employee_data.dtypes

def percentages(x):
    copy_of_x = x.copy()
    count_of_people =np.array(copy_of_x.loc[:,['Female','Male']])
    total_people = np.reshape(count_of_people.sum(axis=1), (9,1))
    P_value = count_of_people/total_people
    copy_of_x.loc[:,['Female','Male']] = P_value[:,:]
    return copy_of_x

def my25(g):
    return np.percentile(g, 25)

def my75(g):
    return np.percentile(g, 75)



## Finding the Unique values for the dataframe

Job_Titles = pan.unique(employee_data['title'])
Sex = pan.unique(employee_data['sex'])
Race = pan.unique(employee_data['race'])
Departments = pan.unique(employee_data['dept'])
Race_nan = employee_data[pan.isnull(employee_data['race'])]








# Question_1


Max_salary_Index = employee_data.groupby('dept')['salary'].idxmax()
Question_1 = employee_data.loc[Max_salary_Index,['dept','title','salary']]

Employees_with_max_Salary = employee_data[employee_data['salary'].isin(Question_1['salary'])]

Q1_Bonus = Employees_with_max_Salary.pivot_table(index=('title','dept'),aggfunc=('count')).fillna(0)

Q1_Bonus = Q1_Bonus.reset_index()
Q1_Bonus = Q1_Bonus.loc[:,('title','dept','salary')]

# Question 2

Question_2 = employee_data['sex'].value_counts()

# Question 3

Sex_count = employee_data.groupby('sex')['dept'].count()

Counting_males_females_per_dept= employee_data.pivot_table(values='sex', columns=employee_data['sex'],index='dept',aggfunc='count')


Question_3 = percentages(Counting_males_females_per_dept)

print(Question_3)


Employees_with_Salary = employee_data[pan.notnull(employee_data['salary'])]

Question_4 = Employees_with_Salary.sort_values('salary',ascending=False)

print(Question_4.iloc[-3:])


empolyee_copy_data = employee_data.copy()


empolyee_copy_data['race'] = empolyee_copy_data['race'].fillna('No Answer')

Question_5 = empolyee_copy_data.pivot_table(values='sex', columns=employee_data['sex'],index='race',aggfunc='count')


Question_6 =round(Employees_with_Salary.pivot_table(values='salary',index=('dept','sex'),aggfunc=(my25,np.mean,my75)),2)
Question_6.columns =['Mean','25 %',"75 %"]

Question_6 = Question_6.reset_index()



print('\n',Question_6)