Objective: Analyze Data
Use the dplyr library to analyze salary data
Load the Libraries
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
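The attach message above means that, once dplyr is loaded, unqualified calls to filter() and lag() resolve to the dplyr versions rather than the ones in stats. A minimal sketch of how both remain usable, with a small made-up vector (x, df and the result names are illustrative and not part of the original analysis):

```r
library(dplyr)

# Illustrative data only; not taken from Salaries.csv.
x <- c(1, 2, 3, 4, 5)

# The masked base function is still reachable through its namespace:
# a centered 3-point moving average (NA at both ends).
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# Unqualified filter() and lag() now refer to dplyr.
df <- data.frame(value = x)
large_values <- filter(df, value > 3)  # keeps rows where value > 3
shifted <- lag(x)                      # shifts the vector by one position

print(moving_avg)
print(large_values)
print(shifted)
```

Writing stats::filter() or dplyr::filter() explicitly is a simple way to avoid ambiguity when both behaviors are needed in the same script.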
The Data
# Load the salary data
# salarios <- read.csv("path to the data file goes here")
salarios <- read.csv("C:/Users/l-RaVeN-l/Documents/Ciencia de los Datos/Datos/Salaries.csv")
# salarios # Printing the whole data frame is no longer needed
str(salarios)
## 'data.frame': 148654 obs. of 13 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ EmployeeName : Factor w/ 110810 levels "A Bernard Fatooh",..: 77636 34712 1560 17232 81101 23164 3271 22709 73975 47938 ...
## $ JobTitle : Factor w/ 2159 levels "Account Clerk",..: 836 298 298 2149 594 135 246 609 246 370 ...
## $ BasePay : num 167411 155966 212739 77916 134402 ...
## $ OvertimePay : num 0 245132 106088 56121 9737 ...
## $ OtherPay : num 400184 137811 16453 198307 182235 ...
## $ Benefits : num NA NA NA NA NA NA NA NA NA NA ...
## $ TotalPay : num 567595 538909 335280 332344 326373 ...
## $ TotalPayBenefits: num 567595 538909 335280 332344 326373 ...
## $ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ Notes : logi NA NA NA NA NA NA ...
## $ Agency : Factor w/ 1 level "San Francisco": 1 1 1 1 1 1 1 1 1 1 ...
## $ Status : Factor w/ 3 levels "","FT","PT": 1 1 1 1 1 1 1 1 1 1 ...
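The str() output shows the text columns stored as factors (EmployeeName alone has 110810 levels) and Status with an empty-string level. That is the behavior of read.csv() when stringsAsFactors is TRUE, which was the default before R 4.0.0. The sketch below shows one possible way to read text as character and treat blank fields as missing; the three-row file is a hypothetical sample written to a temporary path, not the real Salaries.csv.

```r
# Hypothetical three-row sample that mimics a few columns of Salaries.csv.
csv_lines <- c(
  "Id,EmployeeName,BasePay,Status",
  "1,NATHANIEL FORD,167411.18,",
  "2,Kevin Lee,65007.45,FT",
  "3,KEVIN LEE,,PT"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Keep text as character and read empty fields as NA.
sample_df <- read.csv(tmp, stringsAsFactors = FALSE, na.strings = c("NA", ""))
str(sample_df)
```

Whether factors or character vectors are preferable depends on the analysis; for a column with as many distinct values as EmployeeName, character is usually the more natural choice.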
summary(salarios)
## Id EmployeeName
## Min. : 1 Kevin Lee : 13
## 1st Qu.: 37164 Richard Lee : 11
## Median : 74328 Steven Lee : 11
## Mean : 74328 William Wong: 11
## 3rd Qu.:111491 John Chan : 9
## Max. :148654 KEVIN LEE : 9
## (Other) :148590
## JobTitle BasePay
## Transit Operator : 7036 Min. : -166
## Special Nurse : 4389 1st Qu.: 33588
## Registered Nurse : 3736 Median : 65007
## Public Svc Aide-Public Works: 2518 Mean : 66325
## Police Officer 3 : 2421 3rd Qu.: 94691
## Custodian : 2418 Max. :319275
## (Other) :126136 NA's :609
## OvertimePay OtherPay Benefits
## Min. : -0.01 Min. : -7058.6 Min. : -33.89
## 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.:11535.40
## Median : 0.00 Median : 811.3 Median :28628.62
## Mean : 5066.06 Mean : 3648.8 Mean :25007.89
## 3rd Qu.: 4658.18 3rd Qu.: 4236.1 3rd Qu.:35566.86
## Max. :245131.88 Max. :400184.2 Max. :96570.66
## NA's :4 NA's :4 NA's :36163
## TotalPay TotalPayBenefits Year Notes
## Min. : -618.1 Min. : -618.1 Min. :2011 Mode:logical
## 1st Qu.: 36169.0 1st Qu.: 44065.7 1st Qu.:2012 NA's:148654
## Median : 71426.6 Median : 92404.1 Median :2013
## Mean : 74768.3 Mean : 93692.6 Mean :2013
## 3rd Qu.:105839.1 3rd Qu.:132876.5 3rd Qu.:2014
## Max. :567595.4 Max. :567595.4 Max. :2014
##
## Agency Status
## San Francisco:148654 :110535
## FT: 22334
## PT: 15785
##
##
##
##
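The summary above reports negative minimums (for example BasePay at -166) and missing values (609 in BasePay, 36163 in Benefits). A minimal sketch, assuming dplyr is available, of how such rows could be counted and then excluded; the data frame toy and its four rows are made up for illustration, and dropping these rows is only one possible treatment.

```r
library(dplyr)

# Made-up rows that reproduce the kinds of problems seen in summary().
toy <- data.frame(
  BasePay  = c(167411.18, NA, -166.01, 65007.45),
  Benefits = c(NA, NA, 11535.40, 28628.62)
)

# Count missing and negative values per column of interest.
quality <- toy %>%
  summarise(
    missing_base_pay  = sum(is.na(BasePay)),
    negative_base_pay = sum(BasePay < 0, na.rm = TRUE),
    missing_benefits  = sum(is.na(Benefits))
  )
print(quality)

# Keep only rows with a known, non-negative BasePay.
clean <- toy %>%
  filter(!is.na(BasePay), BasePay >= 0)
print(clean)
```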
head(salarios) # The first six records
## Id EmployeeName JobTitle
## 1 1 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY
## 2 2 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT)
## 3 3 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT)
## 4 4 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC
## 5 5 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)
## 6 6 DAVID SULLIVAN ASSISTANT DEPUTY CHIEF II
## BasePay OvertimePay OtherPay Benefits TotalPay TotalPayBenefits Year
## 1 167411.2 0.00 400184.2 NA 567595.4 567595.4 2011
## 2 155966.0 245131.88 137811.4 NA 538909.3 538909.3 2011
## 3 212739.1 106088.18 16452.6 NA 335279.9 335279.9 2011
## 4 77916.0 56120.71 198306.9 NA 332343.6 332343.6 2011
## 5 134401.6 9737.00 182234.6 NA 326373.2 326373.2 2011
## 6 118602.0 8601.00 189082.7 NA 316285.7 316285.7 2011
## Notes Agency Status
## 1 NA San Francisco
## 2 NA San Francisco
## 3 NA San Francisco
## 4 NA San Francisco
## 5 NA San Francisco
## 6 NA San Francisco
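With 13 columns, the head() output wraps across several blocks. A minimal sketch of how dplyr's select() and arrange() could narrow and order the view; the data frame first_rows copies three rows and four columns from the output above, and the choice of columns and sort key is only an illustration.

```r
library(dplyr)

# Three rows copied from the head() output above (subset of columns).
first_rows <- data.frame(
  EmployeeName = c("NATHANIEL FORD", "GARY JIMENEZ", "ALBERT PARDINI"),
  BasePay      = c(167411.2, 155966.0, 212739.1),
  TotalPay     = c(567595.4, 538909.3, 335279.9),
  Year         = c(2011L, 2011L, 2011L),
  stringsAsFactors = FALSE
)

# Keep three columns, sort by BasePay (highest first), show the top two.
top_base <- first_rows %>%
  select(EmployeeName, BasePay, TotalPay) %>%
  arrange(desc(BasePay)) %>%
  head(2)
print(top_base)
```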
Basic Analysis
Show the summary statistics
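# The definitions of these four variables are not shown in the original report.
# TotalPayBenefits is inferred: its mean in summary() matches the value printed below.
Media <- mean(salarios$TotalPayBenefits)
Desviacion <- sd(salarios$TotalPayBenefits)
Maximo <- max(salarios$TotalPayBenefits)
Minimo <- min(salarios$TotalPayBenefits)
paste("Mean of total income", Media)
## [1] "Mean of total income 93692.5548105668"
paste("Standard deviation of total income", Desviacion)
## [1] "Standard deviation of total income 62793.5334832377"
paste("Maximum of total income", Maximo)
## [1] "Maximum of total income 567595.43"
paste("Minimum of total income", Minimo)
## [1] "Minimum of total income -618.13"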
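The stated objective is to use dplyr, so the same four statistics can also be obtained with summarise(), optionally split by group. A minimal sketch under stated assumptions: the data frame toy and its values are made up, and grouping by Year is an illustrative choice that the original report does not make.

```r
library(dplyr)

# Made-up values; only the column names follow Salaries.csv.
toy <- data.frame(
  Year             = c(2011L, 2011L, 2012L, 2012L),
  TotalPayBenefits = c(100, 300, 200, 400)
)

# Mean, standard deviation, maximum and minimum per year.
by_year <- toy %>%
  group_by(Year) %>%
  summarise(
    mean_pay = mean(TotalPayBenefits),
    sd_pay   = sd(TotalPayBenefits),
    max_pay  = max(TotalPayBenefits),
    min_pay  = min(TotalPayBenefits)
  )
print(by_year)
```

Removing the group_by() step yields a single row of statistics for the whole column, which corresponds to the four values printed with paste().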
Final Analysis
In this exercise, the data was downloaded to the local hard drive so that loading it into memory would be faster.
The structure of the data was inspected with "str", a descriptive analysis was obtained with "summary", and the first 6 records were displayed with "head".
Summary statistics (mean, standard deviation, maximum, and minimum) were also computed for one column of the data set under study.