When you make an R Markdown file, always keep this chunk:
By now you have some familiarity with R as a concept. Let’s discuss data sources and how they can be brought into R.
#set your working directory to the folder with "district.xls"
library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district<-read_excel("district.xls")
#notice the quotation marks around "district.xls". R is picky about syntax, so if you get an error, check your quotation marks, parentheses, and spelling.
#also, you can make comments in code by starting the line with "#"
#click on district in the "Global Environment" to the right, and take a moment to consider what you see
# in the "files" section on the lower right, click on "district.lyt"
head(district)
## # A tibble: 6 × 137
## DISTNAME DISTRICT DZCNTYNM REGION DZRATING DZCAMPUS DPETALLC DPETBLAP DPETHISP
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CAYUGA … 001902 001 AND… 07 A 3 574 4.4 11.5
## 2 ELKHART… 001903 001 AND… 07 A 4 1150 4 11.8
## 3 FRANKST… 001904 001 AND… 07 A 3 808 8.5 11.3
## 4 NECHES … 001906 001 AND… 07 A 2 342 8.2 13.5
## 5 PALESTI… 001907 001 AND… 07 B 6 3360 25.1 42.9
## 6 WESTWOO… 001908 001 AND… 07 B 4 1332 19.7 26.2
## # ℹ 128 more variables: DPETWHIP <dbl>, DPETINDP <dbl>, DPETASIP <dbl>,
## # DPETPCIP <dbl>, DPETTWOP <dbl>, DPETECOP <dbl>, DPETLEPP <dbl>,
## # DPETSPEP <dbl>, DPETBILP <dbl>, DPETVOCP <dbl>, DPETGIFP <dbl>,
## # DA0AT21R <dbl>, DA0912DR21R <dbl>, DAGC4X21R <dbl>, DAGC5X20R <dbl>,
## # DAGC6X19R <dbl>, DA0GR21N <dbl>, DA0GS21N <dbl>, DDA00A001S22R <dbl>,
## # DDA00A001222R <dbl>, DDA00A001322R <dbl>, DDA00AR01S22R <dbl>,
## # DDA00AR01222R <dbl>, DDA00AR01322R <dbl>, DDA00AM01S22R <dbl>, …
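If `read_excel()` complains that it cannot find the file, the usual cause is that R is looking in a different folder than the one holding the data. The following is a minimal sketch of how to check, assuming the file is named "district.xls" as above. The object names `data_file` and `file_found` are made up for illustration.

```r
# Where is R currently looking for files?
getwd()

# Illustrative name only; it matches the file used in this lesson
data_file <- "district.xls"

# TRUE if the file sits in the working directory, FALSE otherwise
file_found <- file.exists(data_file)

if (file_found) {
  message("Found ", data_file, "; read_excel() should be able to open it.")
} else {
  message("Could not find ", data_file, " in ", getwd(),
          ". Use setwd() or Session > Set Working Directory in RStudio.")
}
```

Running this before the import line tends to save time compared with re-reading the error message from `read_excel()`.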
There’s a lot going on here:
summary(district)
## DISTNAME DISTRICT DZCNTYNM REGION
## Length:1207 Length:1207 Length:1207 Length:1207
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## DZRATING DZCAMPUS DPETALLC DPETBLAP
## Length:1207 Min. : 1.000 Min. : 4.0 Min. : 0.000
## Class :character 1st Qu.: 2.000 1st Qu.: 337.5 1st Qu.: 0.700
## Mode :character Median : 3.000 Median : 884.0 Median : 2.900
## Mean : 7.428 Mean : 4476.3 Mean : 8.765
## 3rd Qu.: 5.000 3rd Qu.: 2746.0 3rd Qu.:10.750
## Max. :273.000 Max. :193727.0 Max. :98.100
##
## DPETHISP DPETWHIP DPETINDP DPETASIP
## Min. : 0.00 Min. : 0.00 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 21.00 1st Qu.:18.55 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 37.90 Median :44.40 Median : 0.2000 Median : 0.400
## Mean : 43.29 Mean :43.15 Mean : 0.3283 Mean : 1.614
## 3rd Qu.: 61.90 3rd Qu.:67.75 3rd Qu.: 0.4000 3rd Qu.: 1.000
## Max. :100.00 Max. :97.10 Max. :19.8000 Max. :54.300
##
## DPETPCIP DPETTWOP DPETECOP DPETLEPP
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 1.200 1st Qu.: 47.95 1st Qu.: 2.90
## Median : 0.0000 Median : 2.400 Median : 61.90 Median : 7.50
## Mean : 0.1005 Mean : 2.758 Mean : 60.75 Mean : 12.69
## 3rd Qu.: 0.1000 3rd Qu.: 3.900 3rd Qu.: 77.15 3rd Qu.: 17.00
## Max. :14.5000 Max. :15.000 Max. :100.00 Max. :100.00
##
## DPETSPEP DPETBILP DPETVOCP DPETGIFP
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 9.90 1st Qu.: 2.90 1st Qu.:23.00 1st Qu.: 3.100
## Median :12.10 Median : 7.30 Median :27.80 Median : 5.400
## Mean :12.27 Mean : 12.58 Mean :26.47 Mean : 5.574
## 3rd Qu.:14.20 3rd Qu.: 16.80 3rd Qu.:32.90 3rd Qu.: 7.500
## Max. :51.70 Max. :100.00 Max. :82.80 Max. :100.000
##
## DA0AT21R DA0912DR21R DAGC4X21R DAGC5X20R
## Min. : -1.00 Min. :-1.000 Min. : -1.00 Min. : -1.00
## 1st Qu.: 94.05 1st Qu.: 0.000 1st Qu.: 93.20 1st Qu.: 95.50
## Median : 95.40 Median : 0.400 Median : 96.90 Median : 98.30
## Mean : 94.76 Mean : 1.243 Mean : 93.91 Mean : 95.76
## 3rd Qu.: 96.40 3rd Qu.: 1.400 3rd Qu.:100.00 3rd Qu.:100.00
## Max. :100.00 Max. :50.500 Max. :100.00 Max. :100.00
## NA's :4 NA's :112 NA's :133 NA's :141
## DAGC6X19R DA0GR21N DA0GS21N DDA00A001S22R
## Min. : -1.00 Min. : 1.0 Min. : 0.0 Min. : 4.00
## 1st Qu.: 95.20 1st Qu.: 29.0 1st Qu.: 26.0 1st Qu.: 68.00
## Median : 98.20 Median : 69.0 Median : 61.0 Median : 76.00
## Mean : 95.72 Mean : 331.6 Mean : 278.9 Mean : 74.77
## 3rd Qu.:100.00 3rd Qu.: 208.0 3rd Qu.: 167.0 3rd Qu.: 83.00
## Max. :100.00 Max. :11588.0 Max. :9607.0 Max. :100.00
## NA's :149 NA's :126 NA's :126 NA's :5
## DDA00A001222R DDA00A001322R DDA00AR01S22R DDA00AR01222R
## Min. : 0.00 Min. : 0.00 Min. : -1.00 Min. : -1.00
## 1st Qu.:37.00 1st Qu.:15.00 1st Qu.: 70.00 1st Qu.: 43.00
## Median :46.00 Median :20.00 Median : 77.00 Median : 52.00
## Mean :46.48 Mean :21.05 Mean : 76.22 Mean : 52.12
## 3rd Qu.:55.00 3rd Qu.:26.00 3rd Qu.: 84.00 3rd Qu.: 61.00
## Max. :88.00 Max. :64.00 Max. :100.00 Max. :100.00
## NA's :5 NA's :5 NA's :5 NA's :5
## DDA00AR01322R DDA00AM01S22R DDA00AM01222R DDA00AM01322R
## Min. :-1.00 Min. : -1.00 Min. :-1.00 Min. :-1.00
## 1st Qu.:17.00 1st Qu.: 66.00 1st Qu.:30.00 1st Qu.:11.00
## Median :22.00 Median : 74.00 Median :40.00 Median :17.00
## Mean :23.64 Mean : 72.78 Mean :40.51 Mean :18.21
## 3rd Qu.:29.00 3rd Qu.: 82.00 3rd Qu.:50.00 3rd Qu.:23.00
## Max. :66.00 Max. :100.00 Max. :91.00 Max. :65.00
## NA's :5 NA's :5 NA's :5 NA's :5
## DDA00AC01S22R DDA00AC01222R DDA00AC01322R DDA00AS01S22R
## Min. : -1.00 Min. : -1.00 Min. :-1.00 Min. : -1.00
## 1st Qu.: 68.00 1st Qu.: 34.00 1st Qu.:10.00 1st Qu.: 66.00
## Median : 77.00 Median : 44.00 Median :16.00 Median : 75.00
## Mean : 75.04 Mean : 44.29 Mean :17.39 Mean : 73.09
## 3rd Qu.: 85.00 3rd Qu.: 55.00 3rd Qu.:23.00 3rd Qu.: 83.00
## Max. :100.00 Max. :100.00 Max. :56.00 Max. :100.00
## NA's :11 NA's :11 NA's :11 NA's :43
## DDA00AS01222R DDA00AS01322R DDB00A001S22R DDB00A001222R
## Min. : -1.0 Min. :-1.00 Min. : -1.00 Min. : -1.0
## 1st Qu.: 37.0 1st Qu.:17.00 1st Qu.: 50.00 1st Qu.: 21.0
## Median : 46.0 Median :24.00 Median : 63.00 Median : 31.0
## Mean : 45.7 Mean :25.29 Mean : 57.65 Mean : 31.5
## 3rd Qu.: 55.0 3rd Qu.:32.00 3rd Qu.: 74.50 3rd Qu.: 43.0
## Max. :100.0 Max. :77.00 Max. :100.00 Max. :100.0
## NA's :43 NA's :43 NA's :192 NA's :192
## DDB00A001322R DDH00A001S22R DDH00A001222R DDH00A001322R
## Min. :-1.00 Min. : -1.0 Min. : -1.00 Min. :-1.00
## 1st Qu.: 5.00 1st Qu.: 66.0 1st Qu.: 34.00 1st Qu.:12.00
## Median :11.00 Median : 73.0 Median : 41.00 Median :16.00
## Mean :12.45 Mean : 71.7 Mean : 41.63 Mean :16.86
## 3rd Qu.:17.00 3rd Qu.: 80.0 3rd Qu.: 49.00 3rd Qu.:21.00
## Max. :90.00 Max. :100.0 Max. :100.00 Max. :59.00
## NA's :192 NA's :6 NA's :6 NA's :6
## DDW00A001S22R DDW00A001222R DDW00A001322R DDI00A001S22R
## Min. : -1.00 Min. : -1.00 Min. :-1.00 Min. : -1.0
## 1st Qu.: 75.00 1st Qu.: 45.00 1st Qu.:19.00 1st Qu.: -1.0
## Median : 82.00 Median : 54.00 Median :26.00 Median : 66.0
## Mean : 78.69 Mean : 53.24 Mean :26.15 Mean : 50.8
## 3rd Qu.: 88.00 3rd Qu.: 63.00 3rd Qu.:33.00 3rd Qu.: 83.0
## Max. :100.00 Max. :100.00 Max. :79.00 Max. :100.0
## NA's :26 NA's :26 NA's :26 NA's :472
## DDI00A001222R DDI00A001322R DD300A001S22R DD300A001222R
## Min. : -1.00 Min. : -1.00 Min. : -1.0 Min. : -1.00
## 1st Qu.: -1.00 1st Qu.: -1.00 1st Qu.: 50.0 1st Qu.: 17.00
## Median : 35.00 Median : 11.00 Median : 89.0 Median : 67.00
## Mean : 32.08 Mean : 14.35 Mean : 67.8 Mean : 53.76
## 3rd Qu.: 54.00 3rd Qu.: 23.00 3rd Qu.: 96.0 3rd Qu.: 80.00
## Max. :100.00 Max. :100.00 Max. :100.0 Max. :100.00
## NA's :472 NA's :472 NA's :423 NA's :423
## DD300A001322R DD400A001S22R DD400A001222R DD400A001322R
## Min. : -1.00 Min. : -1.00 Min. : -1.00 Min. :-1.00
## 1st Qu.: 0.00 1st Qu.: -1.00 1st Qu.: -1.00 1st Qu.:-1.00
## Median : 37.00 Median : 60.00 Median : 22.50 Median : 0.00
## Mean : 33.08 Mean : 44.08 Mean : 29.23 Mean :14.05
## 3rd Qu.: 52.00 3rd Qu.: 83.00 3rd Qu.: 56.75 3rd Qu.:25.75
## Max. :100.00 Max. :100.00 Max. :100.00 Max. :83.00
## NA's :423 NA's :797 NA's :797 NA's :797
## DD200A001S22R DD200A001222R DD200A001322R DDE00A001S22R
## Min. : -1.00 Min. : -1.00 Min. : -1.00 Min. : -1.00
## 1st Qu.: 64.00 1st Qu.: 33.00 1st Qu.: 10.00 1st Qu.: 63.00
## Median : 76.00 Median : 47.00 Median : 20.00 Median : 70.00
## Mean : 68.18 Mean : 43.98 Mean : 21.06 Mean : 69.68
## 3rd Qu.: 86.00 3rd Qu.: 60.00 3rd Qu.: 30.00 3rd Qu.: 77.00
## Max. :100.00 Max. :100.00 Max. :100.00 Max. :100.00
## NA's :133 NA's :133 NA's :133 NA's :10
## DDE00A001222R DDE00A001322R DA0CT21R DA0CC21R
## Min. : -1.00 Min. :-1.0 Min. : -2.00 Min. :-1.00
## 1st Qu.: 32.00 1st Qu.:12.0 1st Qu.: 40.42 1st Qu.:12.90
## Median : 39.00 Median :15.0 Median : 63.05 Median :23.55
## Mean : 39.23 Mean :15.9 Mean : 60.76 Mean :26.10
## 3rd Qu.: 46.00 3rd Qu.:19.0 3rd Qu.: 85.38 3rd Qu.:37.08
## Max. :100.00 Max. :80.0 Max. :100.00 Max. :97.70
## NA's :10 NA's :10 NA's :125 NA's :147
## DA0CSA21R DA0CAA21R DPSATOFC DPSTTOFC
## Min. : -1.0 Min. :-1.00 Min. : 1.0 Min. : 0.00
## 1st Qu.: 887.0 1st Qu.:16.30 1st Qu.: 58.9 1st Qu.: 30.57
## Median : 973.0 Median :19.00 Median : 144.1 Median : 72.00
## Mean : 823.9 Mean :16.13 Mean : 622.5 Mean : 307.06
## 3rd Qu.:1039.0 3rd Qu.:21.20 3rd Qu.: 405.2 3rd Qu.: 196.50
## Max. :1344.0 Max. :31.40 Max. :23716.2 Max. :10619.50
## NA's :262 NA's :236 NA's :3 NA's :3
## DPSCTOFP DPSSTOFP DPSUTOFP DPSTTOFP
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 1.200 1st Qu.: 2.60 1st Qu.: 4.100 1st Qu.:46.60
## Median : 1.800 Median : 3.10 Median : 6.600 Median :50.80
## Mean : 2.178 Mean : 3.58 Mean : 7.169 Mean :50.98
## 3rd Qu.: 2.600 3rd Qu.: 3.90 3rd Qu.: 9.800 3rd Qu.:54.80
## Max. :14.900 Max. :100.00 Max. :49.700 Max. :88.30
## NA's :3 NA's :3 NA's :3 NA's :3
## DPSETOFP DPSXTOFP DPSCTOSA DPSSTOSA
## Min. : 0.00 Min. : 0.00 Min. : -2 Min. : -2
## 1st Qu.: 9.70 1st Qu.:19.38 1st Qu.: 95459 1st Qu.: 73469
## Median :12.60 Median :23.65 Median :106674 Median : 78723
## Mean :12.95 Mean :23.14 Mean :108039 Mean : 79435
## 3rd Qu.:16.20 3rd Qu.:27.50 3rd Qu.:119540 3rd Qu.: 84945
## Max. :48.80 Max. :55.40 Max. :270000 Max. :192500
## NA's :3 NA's :3 NA's :10 NA's :11
## DPSUTOSA DPSTTOSA DPSAMIFP DPSAKIDR
## Min. : -2 Min. : 36081 Min. : 0.00 Min. : 0.100
## 1st Qu.: 57969 1st Qu.: 50439 1st Qu.: 13.30 1st Qu.: 5.400
## Median : 63015 Median : 53382 Median : 26.60 Median : 6.300
## Mean : 62424 Mean : 53971 Mean : 35.24 Mean : 6.734
## 3rd Qu.: 67941 3rd Qu.: 56919 3rd Qu.: 50.62 3rd Qu.: 7.300
## Max. :228972 Max. :110560 Max. :100.00 Max. :349.100
## NA's :54 NA's :4 NA's :3 NA's :3
## DPSTKIDR DPST05FP DPSTEXPA DPSTADFP
## Min. :-2.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.:10.80 1st Qu.: 24.35 1st Qu.:10.07 1st Qu.:14.88
## Median :12.70 Median : 32.40 Median :12.00 Median :20.90
## Mean :12.56 Mean : 34.88 Mean :11.75 Mean :20.86
## 3rd Qu.:14.40 3rd Qu.: 41.65 3rd Qu.:13.90 3rd Qu.:26.12
## Max. :37.30 Max. :100.00 Max. :22.90 Max. :78.70
## NA's :3 NA's :3 NA's :3 NA's :3
## DPSTURNR DPSTBLFP DPSTHIFP DPSTWHFP
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 14.80 1st Qu.: 0.00 1st Qu.: 4.20 1st Qu.: 58.67
## Median : 19.50 Median : 1.60 Median : 10.10 Median : 82.40
## Mean : 21.51 Mean : 6.99 Mean : 19.05 Mean : 71.57
## 3rd Qu.: 25.90 3rd Qu.: 6.20 3rd Qu.: 22.50 3rd Qu.: 92.60
## Max. :100.00 Max. :100.00 Max. :100.00 Max. :100.00
## NA's :7 NA's :3 NA's :3 NA's :3
## DPSTINFP DPSTASFP DPSTPIFP DPSTTWFP
## Min. : 0.0000 Min. : 0.000 Min. :0.00000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.000 Median :0.00000 Median : 0.0000
## Mean : 0.3566 Mean : 1.118 Mean :0.08206 Mean : 0.7566
## 3rd Qu.: 0.3000 3rd Qu.: 0.900 3rd Qu.:0.00000 3rd Qu.: 1.2000
## Max. :16.7000 Max. :94.800 Max. :7.80000 Max. :11.7000
## NA's :3 NA's :3 NA's :3 NA's :3
## DPSTREFP DPSTSPFP DPSTCOFP DPSTBIFP
## Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 71.00 1st Qu.: 4.500 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 76.90 Median : 7.000 Median : 3.350 Median : 0.000
## Mean : 76.87 Mean : 7.145 Mean : 4.131 Mean : 2.311
## 3rd Qu.: 82.42 3rd Qu.: 9.600 3rd Qu.: 6.125 3rd Qu.: 2.100
## Max. :100.00 Max. :22.800 Max. :32.500 Max. :94.300
## NA's :3 NA's :3 NA's :3 NA's :3
## DPSTVOFP DPSTGOFP DPFVTOTK DPFTADPR
## Min. : 0.000 Min. : 0.000 Min. : 0 Min. :0.0000
## 1st Qu.: 4.800 1st Qu.: 0.000 1st Qu.: 238299 1st Qu.:0.9892
## Median : 7.000 Median : 1.200 Median : 419144 Median :1.1670
## Mean : 7.154 Mean : 2.308 Mean : 665067 Mean :1.0212
## 3rd Qu.: 9.425 3rd Qu.: 3.700 3rd Qu.: 670248 3rd Qu.:1.3017
## Max. :71.900 Max. :30.800 Max. :26416597 Max. :1.7480
## NA's :3 NA's :3 NA's :5 NA's :5
## DPFRAALLT DPFRAALLK DPFRAOPRT DPFRASTAP
## Min. :6.428e+05 Min. : 8923 Min. :6.428e+05 Min. : 1.70
## 1st Qu.:5.828e+06 1st Qu.: 12953 1st Qu.:5.524e+06 1st Qu.: 33.85
## Median :1.381e+07 Median : 14653 Median :1.241e+07 Median : 51.00
## Mean :5.995e+07 Mean : 16365 Mean :5.129e+07 Mean : 49.05
## 3rd Qu.:3.805e+07 3rd Qu.: 17081 3rd Qu.:3.241e+07 3rd Qu.: 64.80
## Max. :2.619e+09 Max. :214078 Max. :2.213e+09 Max. :103.40
## NA's :5 NA's :5 NA's :5 NA's :5
## DZRVLOCP DPFRAFEDP DPFRAORVT DPFUNAB1T
## Min. :-6.20 Min. : 0.00 Min. : -655726 Min. : -746998
## 1st Qu.:21.10 1st Qu.: 8.70 1st Qu.: 98311 1st Qu.: 1226730
## Median :35.30 Median :12.30 Median : 1093332 Median : 3589384
## Mean :37.92 Mean :13.04 Mean : 8659622 Mean : 13498263
## 3rd Qu.:53.70 3rd Qu.:16.20 3rd Qu.: 4471240 3rd Qu.: 9357248
## Max. :97.60 Max. :49.00 Max. :405596099 Max. :662450197
## NA's :5 NA's :5 NA's :5 NA's :5
## DPFUNA4T DPFEAALLT DPFEAOPFT DPFEAOPFK
## Min. : -7033092 Min. :6.229e+05 Min. :6.120e+05 Min. : 6755
## 1st Qu.: 0 1st Qu.:5.475e+06 1st Qu.:4.754e+06 1st Qu.: 10916
## Median : 0 Median :1.328e+07 Median :1.102e+07 Median : 12228
## Mean : 510514 Mean :6.597e+07 Mean :4.951e+07 Mean : 13121
## 3rd Qu.: 0 3rd Qu.:3.787e+07 3rd Qu.:3.038e+07 3rd Qu.: 14012
## Max. :126144201 Max. :2.656e+09 Max. :2.068e+09 Max. :178467
## NA's :5 NA's :5 NA's :5 NA's :5
## DPFEAINSP DZEXADMP DZEXADSP DZEXPLAP
## Min. :18.50 Min. : 2.700 Min. : 0.000 Min. : 0.20
## 1st Qu.:52.00 1st Qu.: 7.125 1st Qu.: 4.900 1st Qu.:10.40
## Median :55.10 Median : 8.800 Median : 5.700 Median :11.80
## Mean :54.73 Mean : 9.606 Mean : 6.015 Mean :12.46
## 3rd Qu.:57.80 3rd Qu.:11.200 3rd Qu.: 6.400 3rd Qu.:13.50
## Max. :84.40 Max. :35.800 Max. :22.700 Max. :43.10
## NA's :5 NA's :5 NA's :5 NA's :5
## DZEXOTHP DPFEAINST DPFEAINSK DPFPAREGP
## Min. : 0.30 Min. :2.439e+05 Min. : 3122 Min. : 0.00
## 1st Qu.:15.30 1st Qu.:2.563e+06 1st Qu.: 6056 1st Qu.:35.12
## Median :18.00 Median :6.013e+06 Median : 6702 Median :39.70
## Mean :17.15 Mean :2.835e+07 Mean : 7074 Mean :39.80
## 3rd Qu.:20.00 3rd Qu.:1.683e+07 3rd Qu.: 7577 3rd Qu.:43.90
## Max. :69.30 Max. :1.177e+09 Max. :54954 Max. :79.10
## NA's :5 NA's :5 NA's :5 NA's :5
## DPFPASPEP DPFPACOMP DPFPABILP DPFPAVOCP
## Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.00
## 1st Qu.: 5.800 1st Qu.: 6.500 1st Qu.: 0.1000 1st Qu.: 2.90
## Median : 8.900 Median : 9.200 Median : 0.4000 Median : 4.10
## Mean : 9.711 Mean : 9.883 Mean : 0.7496 Mean : 3.96
## 3rd Qu.:12.500 3rd Qu.:12.100 3rd Qu.: 1.0000 3rd Qu.: 5.20
## Max. :49.000 Max. :90.600 Max. :26.0000 Max. :19.80
## NA's :5 NA's :5 NA's :5 NA's :5
## DPFPAGIFP DPFPAATHP DPFPAHSAP DPFPREKP
## Min. :0.0000 Min. :0.000 Min. :0.0000 Min. : 0.0000
## 1st Qu.:0.0000 1st Qu.:1.600 1st Qu.:0.0000 1st Qu.: 0.0000
## Median :0.2000 Median :2.900 Median :0.0000 Median : 0.6000
## Mean :0.3823 Mean :2.809 Mean :0.1578 Mean : 0.8909
## 3rd Qu.:0.4000 3rd Qu.:4.000 3rd Qu.:0.1000 3rd Qu.: 1.3000
## Max. :6.9000 Max. :9.000 Max. :3.4000 Max. :31.7000
## NA's :5 NA's :5 NA's :5 NA's :5
## DPFPAOTHP DISTSIZE COMMTYPE PROPWLTH
## Min. : 3.50 Length:1207 Length:1207 Length:1207
## 1st Qu.:25.40 Class :character Class :character Class :character
## Median :28.70 Mode :character Mode :character Mode :character
## Mean :29.22
## 3rd Qu.:32.40
## Max. :76.30
## NA's :5
## TAXRATE
## Length:1207
## Class :character
## Mode :character
##
##
##
##
There is arguably a better way to examine the data type of each column:
str(district)
## tibble [1,207 × 137] (S3: tbl_df/tbl/data.frame)
## $ DISTNAME : chr [1:1207] "CAYUGA ISD" "ELKHART ISD" "FRANKSTON ISD" "NECHES ISD" ...
## $ DISTRICT : chr [1:1207] "001902" "001903" "001904" "001906" ...
## $ DZCNTYNM : chr [1:1207] "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" ...
## $ REGION : chr [1:1207] "07" "07" "07" "07" ...
## $ DZRATING : chr [1:1207] "A" "A" "A" "A" ...
## $ DZCAMPUS : num [1:1207] 3 4 3 2 6 4 2 6 4 5 ...
## $ DPETALLC : num [1:1207] 574 1150 808 342 3360 ...
## $ DPETBLAP : num [1:1207] 4.4 4 8.5 8.2 25.1 19.7 0.3 0.8 15.7 7.2 ...
## $ DPETHISP : num [1:1207] 11.5 11.8 11.3 13.5 42.9 26.2 8.6 68.7 31.2 27.9 ...
## $ DPETWHIP : num [1:1207] 79.1 80.3 75.2 75.1 27.3 48 87 28.2 48.5 60.6 ...
## $ DPETINDP : num [1:1207] 0 0.3 0.4 0.3 0.2 0.7 0 0.3 0.1 0.3 ...
## $ DPETASIP : num [1:1207] 0.5 0.2 1 0.3 0.7 0.5 0.6 0.3 1 1 ...
## $ DPETPCIP : num [1:1207] 0 0 0 0 0.1 0.1 0 0 0.1 0.1 ...
## $ DPETTWOP : num [1:1207] 4.5 3.4 3.6 2.6 3.7 4.9 3.6 1.7 3.4 3 ...
## $ DPETECOP : num [1:1207] 40.8 45.4 54.2 54.1 81.6 74 46.8 49.6 57.8 50.1 ...
## $ DPETLEPP : num [1:1207] 1 2.8 4.1 2 17.7 7.1 0.6 14.2 5.1 6.9 ...
## $ DPETSPEP : num [1:1207] 14.6 12.1 13.1 10.5 13.5 14.5 14.7 10.4 11.6 11.9 ...
## $ DPETBILP : num [1:1207] 1 2.7 4.1 2 16.1 6.8 0.6 15.2 5 6 ...
## $ DPETVOCP : num [1:1207] 30.5 31.8 43.9 29.5 30.6 38.7 37.7 24.8 18.9 34.4 ...
## $ DPETGIFP : num [1:1207] 6.1 4.6 7.3 5.6 2.3 3.2 3.3 6.8 9.2 6 ...
## $ DA0AT21R : num [1:1207] 96.7 96 95.4 95.8 93.7 94.5 96.7 92.8 97.3 95.2 ...
## $ DA0912DR21R : num [1:1207] 0 0.3 0.4 0 0 0 0 0.4 0.4 0.7 ...
## $ DAGC4X21R : num [1:1207] 100 100 95.2 95.8 99 97.8 100 96.8 100 94.1 ...
## $ DAGC5X20R : num [1:1207] 100 98.9 100 97 99.6 97 100 97.2 100 95.6 ...
## $ DAGC6X19R : num [1:1207] 96 98.8 33.3 100 98.6 97.4 100 96.7 100 95.9 ...
## $ DA0GR21N : num [1:1207] 36 91 41 23 201 95 32 293 52 196 ...
## $ DA0GS21N : num [1:1207] 34 79 40 17 198 77 27 238 52 154 ...
## $ DDA00A001S22R: num [1:1207] 84 85 83 90 74 69 86 76 82 86 ...
## $ DDA00A001222R: num [1:1207] 62 59 57 64 46 40 55 47 56 60 ...
## $ DDA00A001322R: num [1:1207] 33 30 25 27 20 16 25 21 30 31 ...
## $ DDA00AR01S22R: num [1:1207] 81 85 84 87 72 70 86 75 82 84 ...
## $ DDA00AR01222R: num [1:1207] 67 64 63 67 48 45 66 50 60 62 ...
## $ DDA00AR01322R: num [1:1207] 39 34 24 30 20 19 31 22 31 31 ...
## $ DDA00AM01S22R: num [1:1207] 88 84 85 94 75 66 81 76 81 88 ...
## $ DDA00AM01222R: num [1:1207] 65 49 57 69 44 34 42 44 53 62 ...
## $ DDA00AM01322R: num [1:1207] 34 23 26 27 20 14 19 21 29 33 ...
## $ DDA00AC01S22R: num [1:1207] 85 86 81 90 78 73 96 75 83 84 ...
## $ DDA00AC01222R: num [1:1207] 54 63 49 54 48 41 45 46 57 52 ...
## $ DDA00AC01322R: num [1:1207] 22 29 21 23 22 15 16 18 27 21 ...
## $ DDA00AS01S22R: num [1:1207] 78 90 74 83 72 68 92 81 82 87 ...
## $ DDA00AS01222R: num [1:1207] 47 63 48 51 42 38 73 50 51 60 ...
## $ DDA00AS01322R: num [1:1207] 21 42 26 26 20 15 38 27 32 36 ...
## $ DDB00A001S22R: num [1:1207] 60 46 74 88 64 56 -1 71 68 71 ...
## $ DDB00A001222R: num [1:1207] 17 22 38 48 33 26 -1 41 38 37 ...
## $ DDB00A001322R: num [1:1207] 3 8 6 19 11 11 -1 13 14 14 ...
## $ DDH00A001S22R: num [1:1207] 74 85 75 91 73 69 87 72 81 81 ...
## $ DDH00A001222R: num [1:1207] 53 56 46 69 44 36 57 42 50 53 ...
## $ DDH00A001322R: num [1:1207] 24 25 19 26 19 12 20 17 24 24 ...
## $ DDW00A001S22R: num [1:1207] 87 88 85 89 83 75 86 84 88 89 ...
## $ DDW00A001222R: num [1:1207] 66 61 62 66 60 48 55 58 67 66 ...
## $ DDW00A001322R: num [1:1207] 35 32 28 29 29 21 26 29 40 35 ...
## $ DDI00A001S22R: num [1:1207] NA 100 80 -1 75 NA NA 83 -1 62 ...
## $ DDI00A001222R: num [1:1207] NA 100 20 -1 50 NA NA 28 -1 8 ...
## $ DDI00A001322R: num [1:1207] NA 100 20 -1 17 NA NA 6 -1 0 ...
## $ DD300A001S22R: num [1:1207] 33 -1 84 -1 85 100 NA 100 93 97 ...
## $ DD300A001222R: num [1:1207] 33 -1 53 -1 77 100 NA 87 73 82 ...
## $ DD300A001322R: num [1:1207] 17 -1 16 -1 44 88 NA 67 53 56 ...
## $ DD400A001S22R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD400A001222R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD400A001322R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
## $ DD200A001S22R: num [1:1207] 83 77 75 -1 74 62 88 85 74 83 ...
## $ DD200A001222R: num [1:1207] 54 46 58 -1 44 38 50 58 48 50 ...
## $ DD200A001322R: num [1:1207] 34 23 28 -1 18 13 6 31 13 29 ...
## $ DDE00A001S22R: num [1:1207] 76 77 77 86 70 65 81 67 77 78 ...
## $ DDE00A001222R: num [1:1207] 50 42 49 53 40 34 45 36 48 46 ...
## $ DDE00A001322R: num [1:1207] 23 19 17 17 16 14 17 14 23 19 ...
## $ DA0CT21R : num [1:1207] 58.3 51.6 92.7 87 43.3 40 12.5 42 9.6 38.3 ...
## $ DA0CC21R : num [1:1207] 19 27.7 36.8 15 49.4 28.9 -1 35.8 60 60 ...
## $ DA0CSA21R : num [1:1207] 980 979 980 1007 1048 ...
## $ DA0CAA21R : num [1:1207] NA -1 -1 18.8 21 -1 -1 22.3 NA 23.1 ...
## $ DPSATOFC : num [1:1207] 99.9 186.6 146.7 60.1 553.4 ...
## $ DPSTTOFC : num [1:1207] 46.7 104.9 74.5 30.2 260.3 ...
## $ DPSCTOFP : num [1:1207] 1.5 1.1 1.4 3.1 2.1 1.1 4.1 1.5 4.5 0.9 ...
## $ DPSSTOFP : num [1:1207] 5 2.1 3.5 5 3.4 4.6 3.4 2.6 3.1 3.9 ...
## $ DPSUTOFP : num [1:1207] 5.4 4.9 2 1.7 8.3 4.4 3 5.8 10 6 ...
## $ DPSTTOFP : num [1:1207] 46.8 56.2 50.8 50.3 47 45.5 56.7 50.8 50 49.7 ...
## $ DPSETOFP : num [1:1207] 14.8 16.2 15 13.7 19.7 19.2 9.8 15.4 11.1 8.2 ...
## $ DPSXTOFP : num [1:1207] 26.5 19.5 27.4 26.2 19.5 25.2 23 23.9 21.4 31.3 ...
## $ DPSCTOSA : num [1:1207] 93333 100313 98293 85537 99324 ...
## $ DPSSTOSA : num [1:1207] 73300 79305 71215 81593 80415 ...
## $ DPSUTOSA : num [1:1207] 59550 60616 58022 77642 63829 ...
## $ DPSTTOSA : num [1:1207] 55570 47916 50382 55346 48825 ...
## $ DPSAMIFP : num [1:1207] 15.6 13.4 10.9 16.3 32.1 29.9 1.9 41.3 22.2 18.8 ...
## $ DPSAKIDR : num [1:1207] 5.7 6.2 5.5 5.7 6.1 5 5.2 7.3 7.4 6.5 ...
## $ DPSTKIDR : num [1:1207] 12.3 11 10.8 11.3 12.9 11 9.3 14.4 14.8 13.2 ...
## $ DPST05FP : num [1:1207] 10.4 23.8 32.7 9.7 33.8 44.8 17.9 21.5 35 21.9 ...
## $ DPSTEXPA : num [1:1207] 16.7 13.5 12.8 14.8 12.7 10.3 15.4 13.8 10.2 13.8 ...
## $ DPSTADFP : num [1:1207] 14.8 19 30.7 9.6 15.4 17.4 16.9 24.3 18.5 22.4 ...
## $ DPSTURNR : num [1:1207] 19.1 13.9 21.6 18.3 17.9 30.6 14.6 11.5 17 9.5 ...
## $ DPSTBLFP : num [1:1207] 8.3 2.9 4 6.5 9.6 11.6 0 1.4 4.4 0.5 ...
## $ DPSTHIFP : num [1:1207] 0 6.7 1.3 0 13.8 6.6 0 25.7 8.9 5.6 ...
## $ DPSTWHFP : num [1:1207] 91.7 90.5 93.3 93.5 74.6 80.9 100 69 86.7 93.9 ...
## $ DPSTINFP : num [1:1207] 0 0 0 0 0 0.8 0 0.3 0 0 ...
## $ DPSTASFP : num [1:1207] 0 0 0 0 0 0 0 0.7 0 0 ...
## $ DPSTPIFP : num [1:1207] 0 0 0 0 0 0 0 0 0 0 ...
## $ DPSTTWFP : num [1:1207] 0 0 1.3 0 1.9 0 0 2.8 0 0 ...
## $ DPSTREFP : num [1:1207] 81.6 71.5 87.6 70 71.4 71.4 61 41.7 82.7 66.4 ...
## $ DPSTSPFP : num [1:1207] 9.9 8.4 7.5 5.5 10.2 6.4 5.8 14.4 6.8 9.6 ...
## $ DPSTCOFP : num [1:1207] 0 4.9 2.7 12 5 6.1 19.2 6.5 7.4 9.2 ...
## [list output truncated]
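With 137 columns, both `summary()` and `str()` print a lot of text. One possible shortcut is to count how many columns there are of each type. The sketch below uses a small made-up data frame (`toy` and its values are invented for illustration); the same two calls should work on `district`, where every column is either character or numeric. Columns with more than one class, such as ordered factors, would need extra handling.

```r
# Made-up data for illustration only
toy <- data.frame(
  DISTNAME = c("A ISD", "B ISD", "C ISD"),
  DZRATING = c("A", "B", "A"),
  DZCAMPUS = c(3, 4, 2),
  stringsAsFactors = FALSE
)

# The class of each column, as a named character vector
col_types <- sapply(toy, class)
col_types

# How many columns there are of each class
type_counts <- table(col_types)
type_counts
```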
mean(district$DZCAMPUS)
## [1] 7.428335
mean(district$TAXRATE)
## Warning in mean.default(district$TAXRATE): argument is not numeric or logical:
## returning NA
## [1] NA
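The second `mean()` call returns NA because TAXRATE was read in as a character column, and R will not do arithmetic on text. Here is a hedged sketch of how one might check a column's type and convert it. The values in `tax_chr` are invented, and the real TAXRATE entries may not convert this cleanly.

```r
# Made-up values for illustration; the real TAXRATE entries may look different
tax_chr <- c("1.05", "0.98", "1.20")

class(tax_chr)        # text, so mean() would return NA with a warning
is.numeric(tax_chr)   # FALSE

# Convert text to numbers; entries that are not valid numbers become NA (with a warning)
tax_num <- as.numeric(tax_chr)
is.numeric(tax_num)   # TRUE
mean(tax_num)
```

If a conversion like this produces many NAs on real data, that is a sign the column holds something other than plain numbers and should be inspected first.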
#"diamonds" ships with ggplot2; this copies it into your Global Environment so you can click on it
diamonds<-diamonds
head(diamonds)
## # A tibble: 6 × 10
## carat cut color clarity depth table price x y z
## <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.2 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
We can pull the levels from an ordered factor:
levels(diamonds$cut)
## [1] "Fair" "Good" "Very Good" "Premium" "Ideal"
This will come in handy later.
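To see what the ordering buys us, here is a small sketch that builds an ordered factor by hand. The `ratings` vector is made up for illustration and is not part of the diamonds data.

```r
# Made-up ratings for illustration
ratings <- factor(c("Good", "Fair", "Ideal", "Good"),
                  levels = c("Fair", "Good", "Ideal"),
                  ordered = TRUE)

levels(ratings)

# Comparisons follow the level order, not the alphabet
at_least_good <- ratings >= "Good"
at_least_good

# The "largest" value is the one with the highest level
best <- max(ratings)
best
```

A plain (unordered) factor would refuse these comparisons, which is one reason `cut`, `color`, and `clarity` are stored as ordered factors.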
How can we summarize characters or factors if we can’t do math on them?
We can count how many there are in the data:
table(diamonds$cut)
##
## Fair Good Very Good Premium Ideal
## 1610 4906 12082 13791 21551
This can also be done more neatly via dplyr:
diamonds %>% count(cut)
## # A tibble: 5 × 2
## cut n
## <ord> <int>
## 1 Fair 1610
## 2 Good 4906
## 3 Very Good 12082
## 4 Premium 13791
## 5 Ideal 21551
We can also compute each category’s proportion of the overall data:
proportions(table(diamonds$cut))
##
## Fair Good Very Good Premium Ideal
## 0.02984798 0.09095291 0.22398962 0.25567297 0.39953652
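The same proportions can be computed in the dplyr style, which keeps the counts and the proportions side by side. This is one possible sketch; the names `cut_share` and `proportion` are just illustrative choices.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

cut_share <- diamonds %>%
  count(cut) %>%                       # one row per cut, with the count in column n
  mutate(proportion = n / sum(n))      # each count divided by the total

cut_share
```

Because the result is a tibble, it can be passed straight on to further dplyr steps or to ggplot.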
Let’s explore the data a bit using graphs (this is very useful)
ggplot(diamonds,aes(x=carat)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
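The message above is ggplot's reminder that it picked 30 bins by default. A sketch of setting the bin width explicitly follows; the value 0.1 carat is an arbitrary starting point, not a recommendation, and `carat_hist` is an illustrative name.

```r
library(ggplot2)

# binwidth is in the units of the x variable (carats here)
carat_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

carat_hist
```

Trying a few different widths is usually worthwhile, since the shape of a histogram can change noticeably with the bin size.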
Remember “cut” from previously?
ggplot(diamonds,aes(cut)) + geom_bar()
We can use geom_point to quickly compare two numerical variables, in this case – carats vs. price for diamonds
ggplot(diamonds,aes(x=carat,y=price)) + geom_point()
What is going on here?
Do price and carat appear to be correlated?
#We can figure this out mathematically!
cor(diamonds$carat,diamonds$price)
## [1] 0.9215913
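For the curious, `cor()` with its default settings computes the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below rebuilds it by hand to show there is nothing mysterious inside; the object names are illustrative.

```r
library(ggplot2)  # provides the diamonds data

x <- diamonds$carat
y <- diamonds$price

# Pearson correlation "by hand"
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in function
r_builtin <- cor(x, y)

r_manual
r_builtin
```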
#We can also add extra dimensions, such as color:
ggplot(diamonds,aes(x=carat,y=price,color=cut)) + geom_point()
Now for a much bigger question: can we compare groups?
Yes. In fact, character variables and ordered factors make good “groups” to compare!
ggplot(diamonds,aes(clarity)) + geom_bar()
ggplot(diamonds,aes(x=clarity,y=price)) + geom_boxplot()
LOTS of outliers here!
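A boxplot shows the comparison visually; the same comparison can be put into numbers by summarizing price within each clarity group. This is a sketch using dplyr's `group_by()` and `summarise()`, with illustrative column names.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

price_by_clarity <- diamonds %>%
  group_by(clarity) %>%
  summarise(
    n = n(),                        # diamonds in this clarity group
    median_price = median(price),   # the line in the middle of each box
    mean_price = mean(price)        # pulled upward by expensive outliers
  )

price_by_clarity
```

Comparing the median and mean columns gives a rough sense of how strongly the outliers affect each group.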
Luckily the “district” data is already tidy, for the most part.
head(district)
## # A tibble: 6 × 137
## DISTNAME DISTRICT DZCNTYNM REGION DZRATING DZCAMPUS DPETALLC DPETBLAP DPETHISP
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 CAYUGA … 001902 001 AND… 07 A 3 574 4.4 11.5
## 2 ELKHART… 001903 001 AND… 07 A 4 1150 4 11.8
## 3 FRANKST… 001904 001 AND… 07 A 3 808 8.5 11.3
## 4 NECHES … 001906 001 AND… 07 A 2 342 8.2 13.5
## 5 PALESTI… 001907 001 AND… 07 B 6 3360 25.1 42.9
## 6 WESTWOO… 001908 001 AND… 07 B 4 1332 19.7 26.2
## # ℹ 128 more variables: DPETWHIP <dbl>, DPETINDP <dbl>, DPETASIP <dbl>,
## # DPETPCIP <dbl>, DPETTWOP <dbl>, DPETECOP <dbl>, DPETLEPP <dbl>,
## # DPETSPEP <dbl>, DPETBILP <dbl>, DPETVOCP <dbl>, DPETGIFP <dbl>,
## # DA0AT21R <dbl>, DA0912DR21R <dbl>, DAGC4X21R <dbl>, DAGC5X20R <dbl>,
## # DAGC6X19R <dbl>, DA0GR21N <dbl>, DA0GS21N <dbl>, DDA00A001S22R <dbl>,
## # DDA00A001222R <dbl>, DDA00A001322R <dbl>, DDA00AR01S22R <dbl>,
## # DDA00AR01222R <dbl>, DDA00AR01322R <dbl>, DDA00AM01S22R <dbl>, …
Let’s examine just district administrator salaries for 2022
#district administrative salaries are kept in "DPSCTOSA" per the data dictionary (district.lyt)
#we can select just the variables we need with dplyr's select() (R is case-sensitive, so "SELECT" would not work)
your_variable_here<-district %>% select(DISTNAME,DPSCTOSA)
head(your_variable_here)
## # A tibble: 6 × 2
## DISTNAME DPSCTOSA
## <chr> <dbl>
## 1 CAYUGA ISD 93333
## 2 ELKHART ISD 100313
## 3 FRANKSTON ISD 98293
## 4 NECHES ISD 85537
## 5 PALESTINE ISD 99324
## 6 WESTWOOD ISD 121228
Must be nice! But there are some problems:
summary(your_variable_here)
## DISTNAME DPSCTOSA
## Length:1207 Min. : -2
## Class :character 1st Qu.: 95459
## Mode :character Median :106674
## Mean :108039
## 3rd Qu.:119540
## Max. :270000
## NA's :10
Ten missing observations and some “-2” salaries?
mean(your_variable_here$DPSCTOSA)
## [1] NA
# computing the mean directly returns NA because some salaries are missing. summary() above drops the NAs behind the scenes.
# that's helpful, but we want to be explicit. Also, the "-2" salaries pull the average down slightly. Let's clean it up!
# note: filter(DPSCTOSA>0) removes the rows where the salary is NA as well as the "-2" rows
your_variable_here_cleaned<-your_variable_here %>% filter(DPSCTOSA>0)
mean(your_variable_here_cleaned$DPSCTOSA)
## [1] 108401.4
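Filtering is one way to handle the problem values; another is to recode the placeholder as NA and ask `mean()` to skip missing values with `na.rm = TRUE`. The sketch below compares both on a tiny made-up table. `toy_salary` and its numbers are invented, and treating every negative salary as missing is an assumption that should be checked against the data dictionary.

```r
library(dplyr)

# Made-up salaries for illustration; -2 stands in for the placeholder seen above
toy_salary <- tibble(
  DISTNAME = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPSCTOSA = c(90000, -2, NA, 110000)
)

# Option 1: keep only positive salaries (this also drops the NA row)
kept <- toy_salary %>% filter(DPSCTOSA > 0)
mean_kept <- mean(kept$DPSCTOSA)

# Option 2: recode negatives as NA (assumption), keep every row, skip NAs in the mean
recoded <- toy_salary %>%
  mutate(DPSCTOSA = if_else(DPSCTOSA < 0, NA_real_, DPSCTOSA))
mean_recoded <- mean(recoded$DPSCTOSA, na.rm = TRUE)

mean_kept
mean_recoded
```

Both options give the same average here. The second keeps all districts in the table, which can matter if other columns are needed later.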
Let’s graph this a bit
ggplot(your_variable_here_cleaned,aes(DPSCTOSA)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
compare_two<-district %>% select(DISTNAME,DPSCTOSA,DPSTTOSA)
compare_two<-compare_two %>% filter(DPSCTOSA>0)
ggplot(compare_two,aes(DPSCTOSA,DPSTTOSA)) + geom_point()
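When two columns are compared, it can help to require valid values in both before plotting or computing a correlation, since either column may contain missing values or placeholders. The sketch below uses a made-up table; `toy_pay` and its numbers are invented for illustration.

```r
library(dplyr)

# Made-up salaries for illustration only
toy_pay <- tibble(
  DPSCTOSA = c(90000, 100000, -2, 120000, 110000),
  DPSTTOSA = c(50000, 52000, 51000, NA, 56000)
)

# Keep rows where BOTH salaries are positive; rows with NA in either column are dropped
both_valid <- toy_pay %>% filter(DPSCTOSA > 0, DPSTTOSA > 0)

nrow(both_valid)

# Correlation between the two salary columns on the cleaned rows
pay_cor <- cor(both_valid$DPSCTOSA, both_valid$DPSTTOSA)
pay_cor
```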
For your homework this week, please (take a deep breath) do the following:
Due two weeks from now: