When you create an R Markdown file, always keep this chunk:

Recap

By now you have some familiarity with R as a concept. Let’s discuss data sources and how they can be brought into R.

#set your working directory to the folder with "district.xls"
library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
district<-read_excel("district.xls")

#notice the quotation marks around "district.xls". R is picky about syntax, so if you get an error, check your quotation marks, parentheses, and spelling.

#also, you can make comments in code by starting the line with "#"

#click on district in the "Global Environment" to the right, and take a moment to consider what you see

# in the "files" section on the lower right, click on "district.lyt"

head(district)
## # A tibble: 6 × 137
##   DISTNAME DISTRICT DZCNTYNM REGION DZRATING DZCAMPUS DPETALLC DPETBLAP DPETHISP
##   <chr>    <chr>    <chr>    <chr>  <chr>       <dbl>    <dbl>    <dbl>    <dbl>
## 1 CAYUGA … 001902   001 AND… 07     A               3      574      4.4     11.5
## 2 ELKHART… 001903   001 AND… 07     A               4     1150      4       11.8
## 3 FRANKST… 001904   001 AND… 07     A               3      808      8.5     11.3
## 4 NECHES … 001906   001 AND… 07     A               2      342      8.2     13.5
## 5 PALESTI… 001907   001 AND… 07     B               6     3360     25.1     42.9
## 6 WESTWOO… 001908   001 AND… 07     B               4     1332     19.7     26.2
## # ℹ 128 more variables: DPETWHIP <dbl>, DPETINDP <dbl>, DPETASIP <dbl>,
## #   DPETPCIP <dbl>, DPETTWOP <dbl>, DPETECOP <dbl>, DPETLEPP <dbl>,
## #   DPETSPEP <dbl>, DPETBILP <dbl>, DPETVOCP <dbl>, DPETGIFP <dbl>,
## #   DA0AT21R <dbl>, DA0912DR21R <dbl>, DAGC4X21R <dbl>, DAGC5X20R <dbl>,
## #   DAGC6X19R <dbl>, DA0GR21N <dbl>, DA0GS21N <dbl>, DDA00A001S22R <dbl>,
## #   DDA00A001222R <dbl>, DDA00A001322R <dbl>, DDA00AR01S22R <dbl>,
## #   DDA00AR01222R <dbl>, DDA00AR01322R <dbl>, DDA00AM01S22R <dbl>, …
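
If district.xls is not at hand, the same import pattern can be tried on a throwaway file. The sketch below is an illustration only: it writes a tiny made-up table to a temporary CSV and reads it back with base R's read.csv(). The values are invented, and read.csv() stands in for read_excel() so that no extra package or file is needed.

```r
# Hypothetical data: these values are made up for illustration.
toy <- data.frame(
  DISTNAME = c("ALPHA ISD", "BETA ISD", "GAMMA ISD"),
  DISTRICT = c("000001", "000002", "000003"),
  DZCAMPUS = c(3, 4, 2),
  stringsAsFactors = FALSE
)

# Write the table to a temporary file so nothing in your project is touched.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# The file name (or path) must be a quoted string, just like "district.xls".
# colClasses keeps the ID column as text so the leading zeros survive.
toy_in <- read.csv(path, colClasses = c(DISTRICT = "character"))

head(toy_in)
dim(toy_in)   # number of rows, number of columns
```

Reading the ID column as text mirrors what head(district) shows above, where DISTRICT is a character column with values such as "001902".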

Data types

There’s a lot going on here:

summary(district)
##    DISTNAME           DISTRICT           DZCNTYNM            REGION         
##  Length:1207        Length:1207        Length:1207        Length:1207       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    DZRATING            DZCAMPUS          DPETALLC           DPETBLAP     
##  Length:1207        Min.   :  1.000   Min.   :     4.0   Min.   : 0.000  
##  Class :character   1st Qu.:  2.000   1st Qu.:   337.5   1st Qu.: 0.700  
##  Mode  :character   Median :  3.000   Median :   884.0   Median : 2.900  
##                     Mean   :  7.428   Mean   :  4476.3   Mean   : 8.765  
##                     3rd Qu.:  5.000   3rd Qu.:  2746.0   3rd Qu.:10.750  
##                     Max.   :273.000   Max.   :193727.0   Max.   :98.100  
##                                                                          
##     DPETHISP         DPETWHIP        DPETINDP          DPETASIP     
##  Min.   :  0.00   Min.   : 0.00   Min.   : 0.0000   Min.   : 0.000  
##  1st Qu.: 21.00   1st Qu.:18.55   1st Qu.: 0.0000   1st Qu.: 0.000  
##  Median : 37.90   Median :44.40   Median : 0.2000   Median : 0.400  
##  Mean   : 43.29   Mean   :43.15   Mean   : 0.3283   Mean   : 1.614  
##  3rd Qu.: 61.90   3rd Qu.:67.75   3rd Qu.: 0.4000   3rd Qu.: 1.000  
##  Max.   :100.00   Max.   :97.10   Max.   :19.8000   Max.   :54.300  
##                                                                     
##     DPETPCIP          DPETTWOP         DPETECOP         DPETLEPP     
##  Min.   : 0.0000   Min.   : 0.000   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 0.0000   1st Qu.: 1.200   1st Qu.: 47.95   1st Qu.:  2.90  
##  Median : 0.0000   Median : 2.400   Median : 61.90   Median :  7.50  
##  Mean   : 0.1005   Mean   : 2.758   Mean   : 60.75   Mean   : 12.69  
##  3rd Qu.: 0.1000   3rd Qu.: 3.900   3rd Qu.: 77.15   3rd Qu.: 17.00  
##  Max.   :14.5000   Max.   :15.000   Max.   :100.00   Max.   :100.00  
##                                                                      
##     DPETSPEP        DPETBILP         DPETVOCP        DPETGIFP      
##  Min.   : 0.00   Min.   :  0.00   Min.   : 0.00   Min.   :  0.000  
##  1st Qu.: 9.90   1st Qu.:  2.90   1st Qu.:23.00   1st Qu.:  3.100  
##  Median :12.10   Median :  7.30   Median :27.80   Median :  5.400  
##  Mean   :12.27   Mean   : 12.58   Mean   :26.47   Mean   :  5.574  
##  3rd Qu.:14.20   3rd Qu.: 16.80   3rd Qu.:32.90   3rd Qu.:  7.500  
##  Max.   :51.70   Max.   :100.00   Max.   :82.80   Max.   :100.000  
##                                                                    
##     DA0AT21R       DA0912DR21R       DAGC4X21R        DAGC5X20R     
##  Min.   : -1.00   Min.   :-1.000   Min.   : -1.00   Min.   : -1.00  
##  1st Qu.: 94.05   1st Qu.: 0.000   1st Qu.: 93.20   1st Qu.: 95.50  
##  Median : 95.40   Median : 0.400   Median : 96.90   Median : 98.30  
##  Mean   : 94.76   Mean   : 1.243   Mean   : 93.91   Mean   : 95.76  
##  3rd Qu.: 96.40   3rd Qu.: 1.400   3rd Qu.:100.00   3rd Qu.:100.00  
##  Max.   :100.00   Max.   :50.500   Max.   :100.00   Max.   :100.00  
##  NA's   :4        NA's   :112      NA's   :133      NA's   :141     
##    DAGC6X19R         DA0GR21N          DA0GS21N      DDA00A001S22R   
##  Min.   : -1.00   Min.   :    1.0   Min.   :   0.0   Min.   :  4.00  
##  1st Qu.: 95.20   1st Qu.:   29.0   1st Qu.:  26.0   1st Qu.: 68.00  
##  Median : 98.20   Median :   69.0   Median :  61.0   Median : 76.00  
##  Mean   : 95.72   Mean   :  331.6   Mean   : 278.9   Mean   : 74.77  
##  3rd Qu.:100.00   3rd Qu.:  208.0   3rd Qu.: 167.0   3rd Qu.: 83.00  
##  Max.   :100.00   Max.   :11588.0   Max.   :9607.0   Max.   :100.00  
##  NA's   :149      NA's   :126       NA's   :126      NA's   :5       
##  DDA00A001222R   DDA00A001322R   DDA00AR01S22R    DDA00AR01222R   
##  Min.   : 0.00   Min.   : 0.00   Min.   : -1.00   Min.   : -1.00  
##  1st Qu.:37.00   1st Qu.:15.00   1st Qu.: 70.00   1st Qu.: 43.00  
##  Median :46.00   Median :20.00   Median : 77.00   Median : 52.00  
##  Mean   :46.48   Mean   :21.05   Mean   : 76.22   Mean   : 52.12  
##  3rd Qu.:55.00   3rd Qu.:26.00   3rd Qu.: 84.00   3rd Qu.: 61.00  
##  Max.   :88.00   Max.   :64.00   Max.   :100.00   Max.   :100.00  
##  NA's   :5       NA's   :5       NA's   :5        NA's   :5       
##  DDA00AR01322R   DDA00AM01S22R    DDA00AM01222R   DDA00AM01322R  
##  Min.   :-1.00   Min.   : -1.00   Min.   :-1.00   Min.   :-1.00  
##  1st Qu.:17.00   1st Qu.: 66.00   1st Qu.:30.00   1st Qu.:11.00  
##  Median :22.00   Median : 74.00   Median :40.00   Median :17.00  
##  Mean   :23.64   Mean   : 72.78   Mean   :40.51   Mean   :18.21  
##  3rd Qu.:29.00   3rd Qu.: 82.00   3rd Qu.:50.00   3rd Qu.:23.00  
##  Max.   :66.00   Max.   :100.00   Max.   :91.00   Max.   :65.00  
##  NA's   :5       NA's   :5        NA's   :5       NA's   :5      
##  DDA00AC01S22R    DDA00AC01222R    DDA00AC01322R   DDA00AS01S22R   
##  Min.   : -1.00   Min.   : -1.00   Min.   :-1.00   Min.   : -1.00  
##  1st Qu.: 68.00   1st Qu.: 34.00   1st Qu.:10.00   1st Qu.: 66.00  
##  Median : 77.00   Median : 44.00   Median :16.00   Median : 75.00  
##  Mean   : 75.04   Mean   : 44.29   Mean   :17.39   Mean   : 73.09  
##  3rd Qu.: 85.00   3rd Qu.: 55.00   3rd Qu.:23.00   3rd Qu.: 83.00  
##  Max.   :100.00   Max.   :100.00   Max.   :56.00   Max.   :100.00  
##  NA's   :11       NA's   :11       NA's   :11      NA's   :43      
##  DDA00AS01222R   DDA00AS01322R   DDB00A001S22R    DDB00A001222R  
##  Min.   : -1.0   Min.   :-1.00   Min.   : -1.00   Min.   : -1.0  
##  1st Qu.: 37.0   1st Qu.:17.00   1st Qu.: 50.00   1st Qu.: 21.0  
##  Median : 46.0   Median :24.00   Median : 63.00   Median : 31.0  
##  Mean   : 45.7   Mean   :25.29   Mean   : 57.65   Mean   : 31.5  
##  3rd Qu.: 55.0   3rd Qu.:32.00   3rd Qu.: 74.50   3rd Qu.: 43.0  
##  Max.   :100.0   Max.   :77.00   Max.   :100.00   Max.   :100.0  
##  NA's   :43      NA's   :43      NA's   :192      NA's   :192    
##  DDB00A001322R   DDH00A001S22R   DDH00A001222R    DDH00A001322R  
##  Min.   :-1.00   Min.   : -1.0   Min.   : -1.00   Min.   :-1.00  
##  1st Qu.: 5.00   1st Qu.: 66.0   1st Qu.: 34.00   1st Qu.:12.00  
##  Median :11.00   Median : 73.0   Median : 41.00   Median :16.00  
##  Mean   :12.45   Mean   : 71.7   Mean   : 41.63   Mean   :16.86  
##  3rd Qu.:17.00   3rd Qu.: 80.0   3rd Qu.: 49.00   3rd Qu.:21.00  
##  Max.   :90.00   Max.   :100.0   Max.   :100.00   Max.   :59.00  
##  NA's   :192     NA's   :6       NA's   :6        NA's   :6      
##  DDW00A001S22R    DDW00A001222R    DDW00A001322R   DDI00A001S22R  
##  Min.   : -1.00   Min.   : -1.00   Min.   :-1.00   Min.   : -1.0  
##  1st Qu.: 75.00   1st Qu.: 45.00   1st Qu.:19.00   1st Qu.: -1.0  
##  Median : 82.00   Median : 54.00   Median :26.00   Median : 66.0  
##  Mean   : 78.69   Mean   : 53.24   Mean   :26.15   Mean   : 50.8  
##  3rd Qu.: 88.00   3rd Qu.: 63.00   3rd Qu.:33.00   3rd Qu.: 83.0  
##  Max.   :100.00   Max.   :100.00   Max.   :79.00   Max.   :100.0  
##  NA's   :26       NA's   :26       NA's   :26      NA's   :472    
##  DDI00A001222R    DDI00A001322R    DD300A001S22R   DD300A001222R   
##  Min.   : -1.00   Min.   : -1.00   Min.   : -1.0   Min.   : -1.00  
##  1st Qu.: -1.00   1st Qu.: -1.00   1st Qu.: 50.0   1st Qu.: 17.00  
##  Median : 35.00   Median : 11.00   Median : 89.0   Median : 67.00  
##  Mean   : 32.08   Mean   : 14.35   Mean   : 67.8   Mean   : 53.76  
##  3rd Qu.: 54.00   3rd Qu.: 23.00   3rd Qu.: 96.0   3rd Qu.: 80.00  
##  Max.   :100.00   Max.   :100.00   Max.   :100.0   Max.   :100.00  
##  NA's   :472      NA's   :472      NA's   :423     NA's   :423     
##  DD300A001322R    DD400A001S22R    DD400A001222R    DD400A001322R  
##  Min.   : -1.00   Min.   : -1.00   Min.   : -1.00   Min.   :-1.00  
##  1st Qu.:  0.00   1st Qu.: -1.00   1st Qu.: -1.00   1st Qu.:-1.00  
##  Median : 37.00   Median : 60.00   Median : 22.50   Median : 0.00  
##  Mean   : 33.08   Mean   : 44.08   Mean   : 29.23   Mean   :14.05  
##  3rd Qu.: 52.00   3rd Qu.: 83.00   3rd Qu.: 56.75   3rd Qu.:25.75  
##  Max.   :100.00   Max.   :100.00   Max.   :100.00   Max.   :83.00  
##  NA's   :423      NA's   :797      NA's   :797      NA's   :797    
##  DD200A001S22R    DD200A001222R    DD200A001322R    DDE00A001S22R   
##  Min.   : -1.00   Min.   : -1.00   Min.   : -1.00   Min.   : -1.00  
##  1st Qu.: 64.00   1st Qu.: 33.00   1st Qu.: 10.00   1st Qu.: 63.00  
##  Median : 76.00   Median : 47.00   Median : 20.00   Median : 70.00  
##  Mean   : 68.18   Mean   : 43.98   Mean   : 21.06   Mean   : 69.68  
##  3rd Qu.: 86.00   3rd Qu.: 60.00   3rd Qu.: 30.00   3rd Qu.: 77.00  
##  Max.   :100.00   Max.   :100.00   Max.   :100.00   Max.   :100.00  
##  NA's   :133      NA's   :133      NA's   :133      NA's   :10      
##  DDE00A001222R    DDE00A001322R     DA0CT21R         DA0CC21R    
##  Min.   : -1.00   Min.   :-1.0   Min.   : -2.00   Min.   :-1.00  
##  1st Qu.: 32.00   1st Qu.:12.0   1st Qu.: 40.42   1st Qu.:12.90  
##  Median : 39.00   Median :15.0   Median : 63.05   Median :23.55  
##  Mean   : 39.23   Mean   :15.9   Mean   : 60.76   Mean   :26.10  
##  3rd Qu.: 46.00   3rd Qu.:19.0   3rd Qu.: 85.38   3rd Qu.:37.08  
##  Max.   :100.00   Max.   :80.0   Max.   :100.00   Max.   :97.70  
##  NA's   :10       NA's   :10     NA's   :125      NA's   :147    
##    DA0CSA21R        DA0CAA21R        DPSATOFC          DPSTTOFC       
##  Min.   :  -1.0   Min.   :-1.00   Min.   :    1.0   Min.   :    0.00  
##  1st Qu.: 887.0   1st Qu.:16.30   1st Qu.:   58.9   1st Qu.:   30.57  
##  Median : 973.0   Median :19.00   Median :  144.1   Median :   72.00  
##  Mean   : 823.9   Mean   :16.13   Mean   :  622.5   Mean   :  307.06  
##  3rd Qu.:1039.0   3rd Qu.:21.20   3rd Qu.:  405.2   3rd Qu.:  196.50  
##  Max.   :1344.0   Max.   :31.40   Max.   :23716.2   Max.   :10619.50  
##  NA's   :262      NA's   :236     NA's   :3         NA's   :3         
##     DPSCTOFP         DPSSTOFP         DPSUTOFP         DPSTTOFP    
##  Min.   : 0.000   Min.   :  0.00   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.: 1.200   1st Qu.:  2.60   1st Qu.: 4.100   1st Qu.:46.60  
##  Median : 1.800   Median :  3.10   Median : 6.600   Median :50.80  
##  Mean   : 2.178   Mean   :  3.58   Mean   : 7.169   Mean   :50.98  
##  3rd Qu.: 2.600   3rd Qu.:  3.90   3rd Qu.: 9.800   3rd Qu.:54.80  
##  Max.   :14.900   Max.   :100.00   Max.   :49.700   Max.   :88.30  
##  NA's   :3        NA's   :3        NA's   :3        NA's   :3      
##     DPSETOFP        DPSXTOFP        DPSCTOSA         DPSSTOSA     
##  Min.   : 0.00   Min.   : 0.00   Min.   :    -2   Min.   :    -2  
##  1st Qu.: 9.70   1st Qu.:19.38   1st Qu.: 95459   1st Qu.: 73469  
##  Median :12.60   Median :23.65   Median :106674   Median : 78723  
##  Mean   :12.95   Mean   :23.14   Mean   :108039   Mean   : 79435  
##  3rd Qu.:16.20   3rd Qu.:27.50   3rd Qu.:119540   3rd Qu.: 84945  
##  Max.   :48.80   Max.   :55.40   Max.   :270000   Max.   :192500  
##  NA's   :3       NA's   :3       NA's   :10       NA's   :11      
##     DPSUTOSA         DPSTTOSA         DPSAMIFP         DPSAKIDR      
##  Min.   :    -2   Min.   : 36081   Min.   :  0.00   Min.   :  0.100  
##  1st Qu.: 57969   1st Qu.: 50439   1st Qu.: 13.30   1st Qu.:  5.400  
##  Median : 63015   Median : 53382   Median : 26.60   Median :  6.300  
##  Mean   : 62424   Mean   : 53971   Mean   : 35.24   Mean   :  6.734  
##  3rd Qu.: 67941   3rd Qu.: 56919   3rd Qu.: 50.62   3rd Qu.:  7.300  
##  Max.   :228972   Max.   :110560   Max.   :100.00   Max.   :349.100  
##  NA's   :54       NA's   :4        NA's   :3        NA's   :3        
##     DPSTKIDR        DPST05FP         DPSTEXPA        DPSTADFP    
##  Min.   :-2.00   Min.   :  0.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.:10.80   1st Qu.: 24.35   1st Qu.:10.07   1st Qu.:14.88  
##  Median :12.70   Median : 32.40   Median :12.00   Median :20.90  
##  Mean   :12.56   Mean   : 34.88   Mean   :11.75   Mean   :20.86  
##  3rd Qu.:14.40   3rd Qu.: 41.65   3rd Qu.:13.90   3rd Qu.:26.12  
##  Max.   :37.30   Max.   :100.00   Max.   :22.90   Max.   :78.70  
##  NA's   :3       NA's   :3        NA's   :3       NA's   :3      
##     DPSTURNR         DPSTBLFP         DPSTHIFP         DPSTWHFP     
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 14.80   1st Qu.:  0.00   1st Qu.:  4.20   1st Qu.: 58.67  
##  Median : 19.50   Median :  1.60   Median : 10.10   Median : 82.40  
##  Mean   : 21.51   Mean   :  6.99   Mean   : 19.05   Mean   : 71.57  
##  3rd Qu.: 25.90   3rd Qu.:  6.20   3rd Qu.: 22.50   3rd Qu.: 92.60  
##  Max.   :100.00   Max.   :100.00   Max.   :100.00   Max.   :100.00  
##  NA's   :7        NA's   :3        NA's   :3        NA's   :3       
##     DPSTINFP          DPSTASFP         DPSTPIFP          DPSTTWFP      
##  Min.   : 0.0000   Min.   : 0.000   Min.   :0.00000   Min.   : 0.0000  
##  1st Qu.: 0.0000   1st Qu.: 0.000   1st Qu.:0.00000   1st Qu.: 0.0000  
##  Median : 0.0000   Median : 0.000   Median :0.00000   Median : 0.0000  
##  Mean   : 0.3566   Mean   : 1.118   Mean   :0.08206   Mean   : 0.7566  
##  3rd Qu.: 0.3000   3rd Qu.: 0.900   3rd Qu.:0.00000   3rd Qu.: 1.2000  
##  Max.   :16.7000   Max.   :94.800   Max.   :7.80000   Max.   :11.7000  
##  NA's   :3         NA's   :3        NA's   :3         NA's   :3        
##     DPSTREFP         DPSTSPFP         DPSTCOFP         DPSTBIFP     
##  Min.   :  0.00   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 71.00   1st Qu.: 4.500   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median : 76.90   Median : 7.000   Median : 3.350   Median : 0.000  
##  Mean   : 76.87   Mean   : 7.145   Mean   : 4.131   Mean   : 2.311  
##  3rd Qu.: 82.42   3rd Qu.: 9.600   3rd Qu.: 6.125   3rd Qu.: 2.100  
##  Max.   :100.00   Max.   :22.800   Max.   :32.500   Max.   :94.300  
##  NA's   :3        NA's   :3        NA's   :3        NA's   :3       
##     DPSTVOFP         DPSTGOFP         DPFVTOTK           DPFTADPR     
##  Min.   : 0.000   Min.   : 0.000   Min.   :       0   Min.   :0.0000  
##  1st Qu.: 4.800   1st Qu.: 0.000   1st Qu.:  238299   1st Qu.:0.9892  
##  Median : 7.000   Median : 1.200   Median :  419144   Median :1.1670  
##  Mean   : 7.154   Mean   : 2.308   Mean   :  665067   Mean   :1.0212  
##  3rd Qu.: 9.425   3rd Qu.: 3.700   3rd Qu.:  670248   3rd Qu.:1.3017  
##  Max.   :71.900   Max.   :30.800   Max.   :26416597   Max.   :1.7480  
##  NA's   :3        NA's   :3        NA's   :5          NA's   :5       
##    DPFRAALLT           DPFRAALLK        DPFRAOPRT           DPFRASTAP     
##  Min.   :6.428e+05   Min.   :  8923   Min.   :6.428e+05   Min.   :  1.70  
##  1st Qu.:5.828e+06   1st Qu.: 12953   1st Qu.:5.524e+06   1st Qu.: 33.85  
##  Median :1.381e+07   Median : 14653   Median :1.241e+07   Median : 51.00  
##  Mean   :5.995e+07   Mean   : 16365   Mean   :5.129e+07   Mean   : 49.05  
##  3rd Qu.:3.805e+07   3rd Qu.: 17081   3rd Qu.:3.241e+07   3rd Qu.: 64.80  
##  Max.   :2.619e+09   Max.   :214078   Max.   :2.213e+09   Max.   :103.40  
##  NA's   :5           NA's   :5        NA's   :5           NA's   :5       
##     DZRVLOCP       DPFRAFEDP       DPFRAORVT           DPFUNAB1T        
##  Min.   :-6.20   Min.   : 0.00   Min.   :  -655726   Min.   :  -746998  
##  1st Qu.:21.10   1st Qu.: 8.70   1st Qu.:    98311   1st Qu.:  1226730  
##  Median :35.30   Median :12.30   Median :  1093332   Median :  3589384  
##  Mean   :37.92   Mean   :13.04   Mean   :  8659622   Mean   : 13498263  
##  3rd Qu.:53.70   3rd Qu.:16.20   3rd Qu.:  4471240   3rd Qu.:  9357248  
##  Max.   :97.60   Max.   :49.00   Max.   :405596099   Max.   :662450197  
##  NA's   :5       NA's   :5       NA's   :5           NA's   :5          
##     DPFUNA4T           DPFEAALLT           DPFEAOPFT           DPFEAOPFK     
##  Min.   : -7033092   Min.   :6.229e+05   Min.   :6.120e+05   Min.   :  6755  
##  1st Qu.:        0   1st Qu.:5.475e+06   1st Qu.:4.754e+06   1st Qu.: 10916  
##  Median :        0   Median :1.328e+07   Median :1.102e+07   Median : 12228  
##  Mean   :   510514   Mean   :6.597e+07   Mean   :4.951e+07   Mean   : 13121  
##  3rd Qu.:        0   3rd Qu.:3.787e+07   3rd Qu.:3.038e+07   3rd Qu.: 14012  
##  Max.   :126144201   Max.   :2.656e+09   Max.   :2.068e+09   Max.   :178467  
##  NA's   :5           NA's   :5           NA's   :5           NA's   :5       
##    DPFEAINSP        DZEXADMP         DZEXADSP         DZEXPLAP    
##  Min.   :18.50   Min.   : 2.700   Min.   : 0.000   Min.   : 0.20  
##  1st Qu.:52.00   1st Qu.: 7.125   1st Qu.: 4.900   1st Qu.:10.40  
##  Median :55.10   Median : 8.800   Median : 5.700   Median :11.80  
##  Mean   :54.73   Mean   : 9.606   Mean   : 6.015   Mean   :12.46  
##  3rd Qu.:57.80   3rd Qu.:11.200   3rd Qu.: 6.400   3rd Qu.:13.50  
##  Max.   :84.40   Max.   :35.800   Max.   :22.700   Max.   :43.10  
##  NA's   :5       NA's   :5        NA's   :5        NA's   :5      
##     DZEXOTHP       DPFEAINST           DPFEAINSK       DPFPAREGP    
##  Min.   : 0.30   Min.   :2.439e+05   Min.   : 3122   Min.   : 0.00  
##  1st Qu.:15.30   1st Qu.:2.563e+06   1st Qu.: 6056   1st Qu.:35.12  
##  Median :18.00   Median :6.013e+06   Median : 6702   Median :39.70  
##  Mean   :17.15   Mean   :2.835e+07   Mean   : 7074   Mean   :39.80  
##  3rd Qu.:20.00   3rd Qu.:1.683e+07   3rd Qu.: 7577   3rd Qu.:43.90  
##  Max.   :69.30   Max.   :1.177e+09   Max.   :54954   Max.   :79.10  
##  NA's   :5       NA's   :5           NA's   :5       NA's   :5      
##    DPFPASPEP        DPFPACOMP        DPFPABILP         DPFPAVOCP    
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000   Min.   : 0.00  
##  1st Qu.: 5.800   1st Qu.: 6.500   1st Qu.: 0.1000   1st Qu.: 2.90  
##  Median : 8.900   Median : 9.200   Median : 0.4000   Median : 4.10  
##  Mean   : 9.711   Mean   : 9.883   Mean   : 0.7496   Mean   : 3.96  
##  3rd Qu.:12.500   3rd Qu.:12.100   3rd Qu.: 1.0000   3rd Qu.: 5.20  
##  Max.   :49.000   Max.   :90.600   Max.   :26.0000   Max.   :19.80  
##  NA's   :5        NA's   :5        NA's   :5         NA's   :5      
##    DPFPAGIFP        DPFPAATHP       DPFPAHSAP         DPFPREKP      
##  Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   : 0.0000  
##  1st Qu.:0.0000   1st Qu.:1.600   1st Qu.:0.0000   1st Qu.: 0.0000  
##  Median :0.2000   Median :2.900   Median :0.0000   Median : 0.6000  
##  Mean   :0.3823   Mean   :2.809   Mean   :0.1578   Mean   : 0.8909  
##  3rd Qu.:0.4000   3rd Qu.:4.000   3rd Qu.:0.1000   3rd Qu.: 1.3000  
##  Max.   :6.9000   Max.   :9.000   Max.   :3.4000   Max.   :31.7000  
##  NA's   :5        NA's   :5       NA's   :5        NA's   :5        
##    DPFPAOTHP       DISTSIZE           COMMTYPE           PROPWLTH        
##  Min.   : 3.50   Length:1207        Length:1207        Length:1207       
##  1st Qu.:25.40   Class :character   Class :character   Class :character  
##  Median :28.70   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :29.22                                                           
##  3rd Qu.:32.40                                                           
##  Max.   :76.30                                                           
##  NA's   :5                                                               
##    TAXRATE         
##  Length:1207       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

There is arguably a better way to examine data types:

str(district)
## tibble [1,207 × 137] (S3: tbl_df/tbl/data.frame)
##  $ DISTNAME     : chr [1:1207] "CAYUGA ISD" "ELKHART ISD" "FRANKSTON ISD" "NECHES ISD" ...
##  $ DISTRICT     : chr [1:1207] "001902" "001903" "001904" "001906" ...
##  $ DZCNTYNM     : chr [1:1207] "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" "001 ANDERSON" ...
##  $ REGION       : chr [1:1207] "07" "07" "07" "07" ...
##  $ DZRATING     : chr [1:1207] "A" "A" "A" "A" ...
##  $ DZCAMPUS     : num [1:1207] 3 4 3 2 6 4 2 6 4 5 ...
##  $ DPETALLC     : num [1:1207] 574 1150 808 342 3360 ...
##  $ DPETBLAP     : num [1:1207] 4.4 4 8.5 8.2 25.1 19.7 0.3 0.8 15.7 7.2 ...
##  $ DPETHISP     : num [1:1207] 11.5 11.8 11.3 13.5 42.9 26.2 8.6 68.7 31.2 27.9 ...
##  $ DPETWHIP     : num [1:1207] 79.1 80.3 75.2 75.1 27.3 48 87 28.2 48.5 60.6 ...
##  $ DPETINDP     : num [1:1207] 0 0.3 0.4 0.3 0.2 0.7 0 0.3 0.1 0.3 ...
##  $ DPETASIP     : num [1:1207] 0.5 0.2 1 0.3 0.7 0.5 0.6 0.3 1 1 ...
##  $ DPETPCIP     : num [1:1207] 0 0 0 0 0.1 0.1 0 0 0.1 0.1 ...
##  $ DPETTWOP     : num [1:1207] 4.5 3.4 3.6 2.6 3.7 4.9 3.6 1.7 3.4 3 ...
##  $ DPETECOP     : num [1:1207] 40.8 45.4 54.2 54.1 81.6 74 46.8 49.6 57.8 50.1 ...
##  $ DPETLEPP     : num [1:1207] 1 2.8 4.1 2 17.7 7.1 0.6 14.2 5.1 6.9 ...
##  $ DPETSPEP     : num [1:1207] 14.6 12.1 13.1 10.5 13.5 14.5 14.7 10.4 11.6 11.9 ...
##  $ DPETBILP     : num [1:1207] 1 2.7 4.1 2 16.1 6.8 0.6 15.2 5 6 ...
##  $ DPETVOCP     : num [1:1207] 30.5 31.8 43.9 29.5 30.6 38.7 37.7 24.8 18.9 34.4 ...
##  $ DPETGIFP     : num [1:1207] 6.1 4.6 7.3 5.6 2.3 3.2 3.3 6.8 9.2 6 ...
##  $ DA0AT21R     : num [1:1207] 96.7 96 95.4 95.8 93.7 94.5 96.7 92.8 97.3 95.2 ...
##  $ DA0912DR21R  : num [1:1207] 0 0.3 0.4 0 0 0 0 0.4 0.4 0.7 ...
##  $ DAGC4X21R    : num [1:1207] 100 100 95.2 95.8 99 97.8 100 96.8 100 94.1 ...
##  $ DAGC5X20R    : num [1:1207] 100 98.9 100 97 99.6 97 100 97.2 100 95.6 ...
##  $ DAGC6X19R    : num [1:1207] 96 98.8 33.3 100 98.6 97.4 100 96.7 100 95.9 ...
##  $ DA0GR21N     : num [1:1207] 36 91 41 23 201 95 32 293 52 196 ...
##  $ DA0GS21N     : num [1:1207] 34 79 40 17 198 77 27 238 52 154 ...
##  $ DDA00A001S22R: num [1:1207] 84 85 83 90 74 69 86 76 82 86 ...
##  $ DDA00A001222R: num [1:1207] 62 59 57 64 46 40 55 47 56 60 ...
##  $ DDA00A001322R: num [1:1207] 33 30 25 27 20 16 25 21 30 31 ...
##  $ DDA00AR01S22R: num [1:1207] 81 85 84 87 72 70 86 75 82 84 ...
##  $ DDA00AR01222R: num [1:1207] 67 64 63 67 48 45 66 50 60 62 ...
##  $ DDA00AR01322R: num [1:1207] 39 34 24 30 20 19 31 22 31 31 ...
##  $ DDA00AM01S22R: num [1:1207] 88 84 85 94 75 66 81 76 81 88 ...
##  $ DDA00AM01222R: num [1:1207] 65 49 57 69 44 34 42 44 53 62 ...
##  $ DDA00AM01322R: num [1:1207] 34 23 26 27 20 14 19 21 29 33 ...
##  $ DDA00AC01S22R: num [1:1207] 85 86 81 90 78 73 96 75 83 84 ...
##  $ DDA00AC01222R: num [1:1207] 54 63 49 54 48 41 45 46 57 52 ...
##  $ DDA00AC01322R: num [1:1207] 22 29 21 23 22 15 16 18 27 21 ...
##  $ DDA00AS01S22R: num [1:1207] 78 90 74 83 72 68 92 81 82 87 ...
##  $ DDA00AS01222R: num [1:1207] 47 63 48 51 42 38 73 50 51 60 ...
##  $ DDA00AS01322R: num [1:1207] 21 42 26 26 20 15 38 27 32 36 ...
##  $ DDB00A001S22R: num [1:1207] 60 46 74 88 64 56 -1 71 68 71 ...
##  $ DDB00A001222R: num [1:1207] 17 22 38 48 33 26 -1 41 38 37 ...
##  $ DDB00A001322R: num [1:1207] 3 8 6 19 11 11 -1 13 14 14 ...
##  $ DDH00A001S22R: num [1:1207] 74 85 75 91 73 69 87 72 81 81 ...
##  $ DDH00A001222R: num [1:1207] 53 56 46 69 44 36 57 42 50 53 ...
##  $ DDH00A001322R: num [1:1207] 24 25 19 26 19 12 20 17 24 24 ...
##  $ DDW00A001S22R: num [1:1207] 87 88 85 89 83 75 86 84 88 89 ...
##  $ DDW00A001222R: num [1:1207] 66 61 62 66 60 48 55 58 67 66 ...
##  $ DDW00A001322R: num [1:1207] 35 32 28 29 29 21 26 29 40 35 ...
##  $ DDI00A001S22R: num [1:1207] NA 100 80 -1 75 NA NA 83 -1 62 ...
##  $ DDI00A001222R: num [1:1207] NA 100 20 -1 50 NA NA 28 -1 8 ...
##  $ DDI00A001322R: num [1:1207] NA 100 20 -1 17 NA NA 6 -1 0 ...
##  $ DD300A001S22R: num [1:1207] 33 -1 84 -1 85 100 NA 100 93 97 ...
##  $ DD300A001222R: num [1:1207] 33 -1 53 -1 77 100 NA 87 73 82 ...
##  $ DD300A001322R: num [1:1207] 17 -1 16 -1 44 88 NA 67 53 56 ...
##  $ DD400A001S22R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
##  $ DD400A001222R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
##  $ DD400A001322R: num [1:1207] NA NA NA NA -1 -1 NA NA -1 -1 ...
##  $ DD200A001S22R: num [1:1207] 83 77 75 -1 74 62 88 85 74 83 ...
##  $ DD200A001222R: num [1:1207] 54 46 58 -1 44 38 50 58 48 50 ...
##  $ DD200A001322R: num [1:1207] 34 23 28 -1 18 13 6 31 13 29 ...
##  $ DDE00A001S22R: num [1:1207] 76 77 77 86 70 65 81 67 77 78 ...
##  $ DDE00A001222R: num [1:1207] 50 42 49 53 40 34 45 36 48 46 ...
##  $ DDE00A001322R: num [1:1207] 23 19 17 17 16 14 17 14 23 19 ...
##  $ DA0CT21R     : num [1:1207] 58.3 51.6 92.7 87 43.3 40 12.5 42 9.6 38.3 ...
##  $ DA0CC21R     : num [1:1207] 19 27.7 36.8 15 49.4 28.9 -1 35.8 60 60 ...
##  $ DA0CSA21R    : num [1:1207] 980 979 980 1007 1048 ...
##  $ DA0CAA21R    : num [1:1207] NA -1 -1 18.8 21 -1 -1 22.3 NA 23.1 ...
##  $ DPSATOFC     : num [1:1207] 99.9 186.6 146.7 60.1 553.4 ...
##  $ DPSTTOFC     : num [1:1207] 46.7 104.9 74.5 30.2 260.3 ...
##  $ DPSCTOFP     : num [1:1207] 1.5 1.1 1.4 3.1 2.1 1.1 4.1 1.5 4.5 0.9 ...
##  $ DPSSTOFP     : num [1:1207] 5 2.1 3.5 5 3.4 4.6 3.4 2.6 3.1 3.9 ...
##  $ DPSUTOFP     : num [1:1207] 5.4 4.9 2 1.7 8.3 4.4 3 5.8 10 6 ...
##  $ DPSTTOFP     : num [1:1207] 46.8 56.2 50.8 50.3 47 45.5 56.7 50.8 50 49.7 ...
##  $ DPSETOFP     : num [1:1207] 14.8 16.2 15 13.7 19.7 19.2 9.8 15.4 11.1 8.2 ...
##  $ DPSXTOFP     : num [1:1207] 26.5 19.5 27.4 26.2 19.5 25.2 23 23.9 21.4 31.3 ...
##  $ DPSCTOSA     : num [1:1207] 93333 100313 98293 85537 99324 ...
##  $ DPSSTOSA     : num [1:1207] 73300 79305 71215 81593 80415 ...
##  $ DPSUTOSA     : num [1:1207] 59550 60616 58022 77642 63829 ...
##  $ DPSTTOSA     : num [1:1207] 55570 47916 50382 55346 48825 ...
##  $ DPSAMIFP     : num [1:1207] 15.6 13.4 10.9 16.3 32.1 29.9 1.9 41.3 22.2 18.8 ...
##  $ DPSAKIDR     : num [1:1207] 5.7 6.2 5.5 5.7 6.1 5 5.2 7.3 7.4 6.5 ...
##  $ DPSTKIDR     : num [1:1207] 12.3 11 10.8 11.3 12.9 11 9.3 14.4 14.8 13.2 ...
##  $ DPST05FP     : num [1:1207] 10.4 23.8 32.7 9.7 33.8 44.8 17.9 21.5 35 21.9 ...
##  $ DPSTEXPA     : num [1:1207] 16.7 13.5 12.8 14.8 12.7 10.3 15.4 13.8 10.2 13.8 ...
##  $ DPSTADFP     : num [1:1207] 14.8 19 30.7 9.6 15.4 17.4 16.9 24.3 18.5 22.4 ...
##  $ DPSTURNR     : num [1:1207] 19.1 13.9 21.6 18.3 17.9 30.6 14.6 11.5 17 9.5 ...
##  $ DPSTBLFP     : num [1:1207] 8.3 2.9 4 6.5 9.6 11.6 0 1.4 4.4 0.5 ...
##  $ DPSTHIFP     : num [1:1207] 0 6.7 1.3 0 13.8 6.6 0 25.7 8.9 5.6 ...
##  $ DPSTWHFP     : num [1:1207] 91.7 90.5 93.3 93.5 74.6 80.9 100 69 86.7 93.9 ...
##  $ DPSTINFP     : num [1:1207] 0 0 0 0 0 0.8 0 0.3 0 0 ...
##  $ DPSTASFP     : num [1:1207] 0 0 0 0 0 0 0 0.7 0 0 ...
##  $ DPSTPIFP     : num [1:1207] 0 0 0 0 0 0 0 0 0 0 ...
##  $ DPSTTWFP     : num [1:1207] 0 0 1.3 0 1.9 0 0 2.8 0 0 ...
##  $ DPSTREFP     : num [1:1207] 81.6 71.5 87.6 70 71.4 71.4 61 41.7 82.7 66.4 ...
##  $ DPSTSPFP     : num [1:1207] 9.9 8.4 7.5 5.5 10.2 6.4 5.8 14.4 6.8 9.6 ...
##  $ DPSTCOFP     : num [1:1207] 0 4.9 2.7 12 5 6.1 19.2 6.5 7.4 9.2 ...
##   [list output truncated]
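
To check types without printing every column, you can ask for the class of a single column or of all columns at once. A minimal sketch on a small made-up data frame (the values are invented; the same calls should work on district):

```r
# Hypothetical three-column table mixing text and numbers.
toy <- data.frame(
  DISTNAME = c("ALPHA ISD", "BETA ISD"),
  DZRATING = c("A", "B"),
  DZCAMPUS = c(3, 4),
  stringsAsFactors = FALSE
)

class(toy$DZCAMPUS)              # type of one column
col_types <- sapply(toy, class)  # type of every column, as a named vector
col_types

# Names of the columns you can do arithmetic on.
names(toy)[sapply(toy, is.numeric)]
```

On data that contains ordered factors, class() returns two values for those columns ("ordered" and "factor"), so sapply() gives back a list rather than a simple vector.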

You can do math on numeric variables, but not on characters (chr)!

mean(district$DZCAMPUS)
## [1] 7.428335
mean(district$TAXRATE)
## Warning in mean.default(district$TAXRATE): argument is not numeric or logical:
## returning NA
## [1] NA
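
When a column holds numbers stored as text, one common approach is to convert it with as.numeric() before doing math. The sketch below uses a made-up character vector, not the real TAXRATE values, whose contents are not shown here. Entries that cannot be read as numbers become NA, so na.rm = TRUE is needed in mean().

```r
# Hypothetical values: numbers stored as text, plus one non-numeric entry.
rate_chr <- c("1.05", "0.98", "1.20", "n/a")

# as.numeric() turns "n/a" into NA and would normally warn about it;
# the warning is silenced here to keep the output short.
rate_num <- suppressWarnings(as.numeric(rate_chr))
rate_num

# Skip the NA when averaging.
mean(rate_num, na.rm = TRUE)
```

Before converting a real column this way, it is worth looking at the values that turn into NA, since they may carry meaning.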

There are more types of data than just numbers and characters. Let’s switch gears a bit and look at the “diamonds” dataset, which ships with ggplot2:

#copy the built-in diamonds dataset into the Global Environment so you can click on it
diamonds<-diamonds
head(diamonds)
## # A tibble: 6 × 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

We can list the levels of an ordered factor, from lowest to highest:

levels(diamonds$cut)
## [1] "Fair"      "Good"      "Very Good" "Premium"   "Ideal"
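
Because the levels are ordered, comparisons such as "better than Good" are meaningful. A small sketch with a made-up vector of cuts (only the level names are taken from the output above; the five values are invented):

```r
# Level order copied from levels(diamonds$cut).
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")

# Hypothetical observations stored as an ordered factor.
cuts <- factor(c("Good", "Ideal", "Fair", "Premium", "Good"),
               levels = cut_levels, ordered = TRUE)

cuts > "Good"      # TRUE where the cut ranks above "Good"
max(cuts)          # the highest-ranked cut present
as.integer(cuts)   # position of each value in the level order
```

The integer positions are what str() prints after the level names for an ordered factor.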

This will come in handy later.

How can we summarize characters or factors if we can’t do math on them?

We can count how many there are in the data:

table(diamonds$cut)
## 
##      Fair      Good Very Good   Premium     Ideal 
##      1610      4906     12082     13791     21551

This can also be done more neatly via dplyr:

diamonds %>% count(cut)
## # A tibble: 5 × 2
##   cut           n
##   <ord>     <int>
## 1 Fair       1610
## 2 Good       4906
## 3 Very Good 12082
## 4 Premium   13791
## 5 Ideal     21551

We can also compute each level’s share of the overall data:

proportions(table(diamonds$cut))
## 
##       Fair       Good  Very Good    Premium      Ideal 
## 0.02984798 0.09095291 0.22398962 0.25567297 0.39953652
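The proportions can also be computed in the dplyr style used for the counts. This is a sketch, assuming dplyr and ggplot2 are installed; cut_props and prop are illustrative names, not part of the original notes.

```r
library(dplyr)
library(ggplot2)  # supplies the diamonds dataset

# count() gives one row per cut with its frequency in column n;
# dividing by the total turns each count into a proportion.
cut_props <- diamonds %>%
  count(cut) %>%
  mutate(prop = n / sum(n))

cut_props
```

Keeping counts and proportions side by side in one tibble makes it easier to reuse them later, for example in a plot.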

BREAK

Let’s explore the data a bit using graphs (this is very useful)

ggplot(diamonds,aes(x=carat)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
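The message above is a hint that the default of 30 bins is rarely the best choice. A minimal sketch of setting the bin width yourself; the value 0.1 carats is only an assumption for illustration, so try a few values and see what changes.

```r
library(ggplot2)

# binwidth is in the units of the x variable (carats here),
# so each bar covers a range of 0.1 carats.
p <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.1)

p
```

Setting binwidth (or bins) explicitly also makes the message go away.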

Remember “cut” from previously?

ggplot(diamonds,aes(cut)) + geom_bar()

Rule of Thumb:

bar charts (geom_bar()) for categorical/ordered variables, histograms (geom_histogram()) for a single numeric variable!!

scatterplots (geom_point()) to compare two numerical variables

We can use geom_point to quickly compare two numerical variables, in this case – carats vs. price for diamonds

ggplot(diamonds,aes(x=carat,y=price)) + geom_point() 

What is going on here?

Do price and carat appear to be correlated?

#We can figure this out mathematically!

cor(diamonds$carat,diamonds$price)
## [1] 0.9215913
#We can also add extra dimensions, such as color:

ggplot(diamonds,aes(x=carat,y=price,color=cut)) + geom_point() 
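For the curious, the number that cor() returned for carat and price is the Pearson correlation: the covariance of the two variables divided by the product of their standard deviations. The sketch below recomputes it by hand; x, y and r_manual are illustrative names.

```r
library(ggplot2)  # supplies the diamonds dataset

x <- diamonds$carat
y <- diamonds$price

# Pearson correlation by hand: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Should agree with the built-in function (up to floating-point rounding)
all.equal(r_manual, cor(x, y))
```

The result always falls between -1 and 1; values near 1 mean the two variables tend to rise together.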

Now for a much bigger question, can we compare groups?

Yes, in fact, character variables and ordered factors make good “groups” to compare!

ggplot(diamonds,aes(clarity)) + geom_bar()

ggplot(diamonds,aes(x=clarity,y=price)) + geom_boxplot()

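LOTS of outliers here! (By default, geom_boxplot() draws a dot for any value more than 1.5 times the interquartile range beyond the box.)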
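The boxplot comparison can also be put into numbers. This is a sketch using dplyr's group_by() and summarise(); the output column names are made up for illustration.

```r
library(dplyr)
library(ggplot2)  # supplies the diamonds dataset

# One row per clarity level, with the group size and typical prices
price_by_clarity <- diamonds %>%
  group_by(clarity) %>%
  summarise(
    n = n(),
    median_price = median(price),
    mean_price = mean(price)
  )

price_by_clarity
```

With this many outliers the median is usually the safer number to compare across groups, since the mean gets pulled toward the expensive diamonds.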

TIDYING DATA

Luckily the “district” data is already tidy, for the most part.

head(district)
## # A tibble: 6 × 137
##   DISTNAME DISTRICT DZCNTYNM REGION DZRATING DZCAMPUS DPETALLC DPETBLAP DPETHISP
##   <chr>    <chr>    <chr>    <chr>  <chr>       <dbl>    <dbl>    <dbl>    <dbl>
## 1 CAYUGA … 001902   001 AND… 07     A               3      574      4.4     11.5
## 2 ELKHART… 001903   001 AND… 07     A               4     1150      4       11.8
## 3 FRANKST… 001904   001 AND… 07     A               3      808      8.5     11.3
## 4 NECHES … 001906   001 AND… 07     A               2      342      8.2     13.5
## 5 PALESTI… 001907   001 AND… 07     B               6     3360     25.1     42.9
## 6 WESTWOO… 001908   001 AND… 07     B               4     1332     19.7     26.2
## # ℹ 128 more variables: DPETWHIP <dbl>, DPETINDP <dbl>, DPETASIP <dbl>,
## #   DPETPCIP <dbl>, DPETTWOP <dbl>, DPETECOP <dbl>, DPETLEPP <dbl>,
## #   DPETSPEP <dbl>, DPETBILP <dbl>, DPETVOCP <dbl>, DPETGIFP <dbl>,
## #   DA0AT21R <dbl>, DA0912DR21R <dbl>, DAGC4X21R <dbl>, DAGC5X20R <dbl>,
## #   DAGC6X19R <dbl>, DA0GR21N <dbl>, DA0GS21N <dbl>, DDA00A001S22R <dbl>,
## #   DDA00A001222R <dbl>, DDA00A001322R <dbl>, DDA00AR01S22R <dbl>,
## #   DDA00AR01222R <dbl>, DDA00AR01322R <dbl>, DDA00AM01S22R <dbl>, …

Let’s examine just school administrator salaries for 2022:

#district administrative salaries are kept in "DPSCTOSA" per the data dictionary (district.lyt)

#we can select just the variables we need with dplyr and select() (all lowercase; R is case-sensitive)

your_variable_here<-district %>% select(DISTNAME,DPSCTOSA)

head(your_variable_here)
## # A tibble: 6 × 2
##   DISTNAME      DPSCTOSA
##   <chr>            <dbl>
## 1 CAYUGA ISD       93333
## 2 ELKHART ISD     100313
## 3 FRANKSTON ISD    98293
## 4 NECHES ISD       85537
## 5 PALESTINE ISD    99324
## 6 WESTWOOD ISD    121228

Must be nice! But there are some problems:

summary(your_variable_here)
##    DISTNAME            DPSCTOSA     
##  Length:1207        Min.   :    -2  
##  Class :character   1st Qu.: 95459  
##  Mode  :character   Median :106674  
##                     Mean   :108039  
##                     3rd Qu.:119540  
##                     Max.   :270000  
##                     NA's   :10

Ten missing observations and some “-2” salaries?

mean(your_variable_here$DPSCTOSA)
## [1] NA
# computing the mean directly returns "NA" because there is missing data. summary() above is dropping the NA's behind the scenes.
# that's helpful, but we want to be exact. Also, the "-2" salaries are slightly skewing the average. Let's clean it up!
# note: filter(DPSCTOSA>0) removes the "-2" rows AND the NA rows, because filter() only keeps rows where the condition is TRUE
your_variable_here_cleaned<-your_variable_here %>% filter(DPSCTOSA>0)

mean(your_variable_here_cleaned$DPSCTOSA)
## [1] 108401.4
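To see exactly what the cleaning step does, here is a sketch on a tiny made-up table. The toy values are invented for illustration and are NOT taken from the district file.

```r
library(dplyr)

# Made-up toy data: one missing salary and one "-2" placeholder
toy <- tibble(
  DISTNAME = c("A ISD", "B ISD", "C ISD", "D ISD"),
  DPSCTOSA = c(90000, NA, -2, 110000)
)

mean(toy$DPSCTOSA)                # NA, because of the missing value
mean(toy$DPSCTOSA, na.rm = TRUE)  # drops the NA but still averages in the -2

# filter() keeps only rows where the condition is TRUE,
# so both the NA row and the -2 row are removed
toy_cleaned <- toy %>% filter(DPSCTOSA > 0)

nrow(toy_cleaned)
mean(toy_cleaned$DPSCTOSA)
```

So na.rm = TRUE alone would not have been enough here: it handles the missing values but leaves the "-2" salaries in the average.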

Let’s graph this a bit

ggplot(your_variable_here_cleaned,aes(DPSCTOSA)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

compare_two<-district %>% select(DISTNAME,DPSCTOSA,DPSTTOSA)

compare_two<-compare_two %>% filter(DPSCTOSA>0)

ggplot(compare_two,aes(DPSCTOSA,DPSTTOSA)) + geom_point()
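When comparing two columns, both of them may contain missing or placeholder values, and cor() returns NA if any are left. A sketch of cleaning both columns before checking the correlation, again on made-up numbers rather than the real district data:

```r
library(dplyr)

# Made-up toy data with problems in both columns
toy_two <- tibble(
  DPSCTOSA = c(90000, 100000, NA, 120000, -2, 110000),
  DPSTTOSA = c(50000, 55000, 52000, NA, 58000, 53000)
)

# Several conditions separated by commas must all be TRUE for a row to stay
toy_two_cleaned <- toy_two %>% filter(DPSCTOSA > 0, DPSTTOSA > 0)

nrow(toy_two_cleaned)
cor(toy_two_cleaned$DPSCTOSA, toy_two_cleaned$DPSTTOSA)
```

Whether the second column needs the same "greater than zero" rule in the real data is an assumption; check it with summary() first.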

Homework

For your homework this week, please (take a deep breath) do the following:

Due two weeks from now:

  1. create an Rmarkdown document with “district” data (like this one)
  2. create a new data frame with “DISTNAME”, “DPETSPEP” (percent special education) and “DPFPASPEP” (money spent on special education). call the dataframe whatever you want
  3. give me “summary()” statistics for both DPETSPEP and DPFPASPEP. You can summarize them separately if you want.
  4. Which variable has missing values?
  5. remove the missing observations. How many are left overall?
  6. Create a point graph (hint: ggplot + geom_point()) to compare DPFPASPEP and DPETSPEP. Are they correlated?
  7. Do a mathematical check (cor()) of DPFPASPEP and DPETSPEP. What is the result?
  8. How would you interpret these results? (No real right or wrong answer – just tell me what you see)
  9. Knit the Rmarkdown and submit to Rpubs for publishing
  10. submit the link to Rpubs on CANVAS