I chose this library dataset because I read about 100 library books each year, usually ebooks and audiobooks, from the Montgomery County Public Library system. I wondered about public library use across the US and how this varied from State to State.

These are steps I took to view, analyze, and plot the data.

1. Pull in the dataset as a csv and assign it the dataframe name publiclibrary.

publiclibrary <- read.csv("PLS_FY17_State_pud17i.csv")

2. Get the dimensions of the dataframe.

dim(publiclibrary)
## [1]  54 120
Wow, that is a lot of variables! There are 54 instances (rows) of 120 variables (columns).

3. Look at a summary of the variables in the dataframe.

summary(publiclibrary)
##      STABR       POPU_LSA        F_POPLSA     POPU_UND       
##  AK     : 1   Min.   :   53883   IF17: 1   Min.   :   53883  
##  AL     : 1   1st Qu.: 1425636   R_17:53   1st Qu.: 1357138  
##  AR     : 1   Median : 4080292             Median : 3668491  
##  AS     : 1   Mean   : 5963516             Mean   : 5811297  
##  AZ     : 1   3rd Qu.: 7096462             3rd Qu.: 6919552  
##  CA     : 1   Max.   :39507301             Max.   :39507301  
##  (Other):48                                                  
##     POPU_ST            CENTLIB      F_CENLIB     BRANLIB      F_BRLIB  
##  Min.   :   53883   Min.   :  1.0   R_17:54   Min.   :  1.0   R_17:54  
##  1st Qu.: 1359070   1st Qu.: 56.0             1st Qu.: 28.5            
##  Median : 3999956   Median : 98.0             Median : 85.5            
##  Mean   : 5971812   Mean   :167.4             Mean   :142.5            
##  3rd Qu.: 6919608   3rd Qu.:232.5             3rd Qu.:188.5            
##  Max.   :39523613   Max.   :755.0             Max.   :959.0            
##                                                                        
##      BKMOB       F_BKMOB       MASTER       F_MASTER     LIBRARIA     
##  Min.   : 0.00   R_17:54   Min.   :   0.0   IF17:23   Min.   :   0.0  
##  1st Qu.: 3.00             1st Qu.: 122.0   R_17:31   1st Qu.: 232.1  
##  Median : 8.00             Median : 308.9             Median : 621.9  
##  Mean   :12.44             Mean   : 623.5             Mean   : 920.0  
##  3rd Qu.:16.50             3rd Qu.: 766.2             3rd Qu.:1212.7  
##  Max.   :74.00             Max.   :3668.8             Max.   :4434.3  
##                                                                       
##  F_LIBRAR     OTHPAID     F_OTHSTF     TOTSTAFF       F_TOTSTF 
##  IF17:21   Min.   :   9   IF17:22   Min.   :   10.0   IF17:21  
##  R_17:33   1st Qu.: 355   R_17:32   1st Qu.:  652.9   R_17:33  
##            Median :1049             Median : 1832.8            
##            Mean   :1717             Mean   : 2636.6            
##            3rd Qu.:2259             3rd Qu.: 3354.5            
##            Max.   :8645             Max.   :12822.5            
##                                                                
##      LOCGVT          F_LOCGVT      STGVT           F_STGVT  
##  Min.   :0.000e+00   IF17:23   Min.   :        0   IF17:22  
##  1st Qu.:3.673e+07   R_17:31   1st Qu.:   600288   R_17:32  
##  Median :1.228e+08             Median :  3987370            
##  Mean   :2.109e+08             Mean   : 16369668            
##  3rd Qu.:2.488e+08             3rd Qu.: 13527890            
##  Max.   :1.466e+09             Max.   :378391351            
##                                                             
##      FEDGVT        F_FEDGVT     OTHINCM          F_OTHINC 
##  Min.   :      0   IF17:24   Min.   :        0   IF17:24  
##  1st Qu.: 100846   R_17:30   1st Qu.:  3245873   R_17:30  
##  Median : 488692             Median :  7804848            
##  Mean   : 874573             Mean   : 17358687            
##  3rd Qu.:1187377             3rd Qu.: 21076727            
##  Max.   :7245771             Max.   :185825422            
##                                                           
##     TOTINCM          F_TOTINC     SALARIES          F_SALX  
##  Min.   :4.080e+05   IF17:24   Min.   :   190298   IF17:25  
##  1st Qu.:5.078e+07   R_17:30   1st Qu.: 25405734   R_17:29  
##  Median :1.358e+08             Median : 64473930            
##  Mean   :2.455e+08             Mean   :111821686            
##  3rd Qu.:2.973e+08             3rd Qu.:149094417            
##  Max.   :1.557e+09             Max.   :655508574            
##                                                             
##     BENEFIT           F_BENX      STAFFEXP         F_TOSTFX 
##  Min.   :        0   IF17:26   Min.   :   209072   IF17:26  
##  1st Qu.:  8235946   R_17:28   1st Qu.: 33077968   R_17:28  
##  Median : 21602092             Median : 86076022            
##  Mean   : 41358633             Mean   :153180320            
##  3rd Qu.: 47853656             3rd Qu.:198609555            
##  Max.   :314906940             Max.   :945490515            
##                                                             
##     PRMATEXP        F_PRMATX     ELMATEXP        F_ELMATX 
##  Min.   :     783   IF17:25   Min.   :       0   IF17:29  
##  1st Qu.: 2955908   R_17:29   1st Qu.:  995829   R_17:25  
##  Median : 9404856             Median : 3987152            
##  Mean   :13957847             Mean   : 6926839            
##  3rd Qu.:18360410             3rd Qu.: 8764068            
##  Max.   :73298018             Max.   :40301843            
##                                                           
##     OTHMATEX        F_OTMATX     TOTEXPCO         F_TOCOLX 
##  Min.   :       0   IF17:27   Min.   :      783   IF17:26  
##  1st Qu.:  609498   R_17:27   1st Qu.:  4314047   R_17:28  
##  Median : 2933638             Median : 16328964            
##  Mean   : 4593255             Mean   : 25477941            
##  3rd Qu.: 6338296             3rd Qu.: 35361758            
##  Max.   :29715225             Max.   :134252461            
##                                                            
##     OTHOPEXP         F_OTHOPX     TOTOPEXP         F_TOTOPX 
##  Min.   :     4980   IF17:25   Min.   :3.540e+05   IF17:26  
##  1st Qu.: 10214168   R_17:29   1st Qu.:4.873e+07   R_17:28  
##  Median : 26288912             Median :1.300e+08            
##  Mean   : 50929753             Mean   :2.296e+08            
##  3rd Qu.: 58241499             3rd Qu.:2.894e+08            
##  Max.   :408888807             Max.   :1.489e+09            
##                                                             
##     LCAP_REV        F_LCAPRV     SCAP_REV        F_SCAPRV 
##  Min.   :       0   IF17:25   Min.   :       0   IF17:26  
##  1st Qu.:  627904   R_17:29   1st Qu.:    1696   R_17:28  
##  Median : 5150793             Median :   86662            
##  Mean   :13771209             Mean   : 1371072            
##  3rd Qu.:20070256             3rd Qu.:  850390            
##  Max.   :78141200             Max.   :11137312            
##                                                           
##     FCAP_REV       F_FCAPRV     OCAP_REV        F_OCAPRV 
##  Min.   :      0   IF17:26   Min.   :       0   IF17:27  
##  1st Qu.:      0   R_17:28   1st Qu.:  214365   R_17:27  
##  Median :   5748             Median : 1111219            
##  Mean   : 230634             Mean   : 5545490            
##  3rd Qu.:  99918             3rd Qu.: 3350573            
##  Max.   :4615950             Max.   :86414059            
##                                                          
##     CAP_REV          F_TCAPRV     CAPITAL          F_TCAPX  
##  Min.   :        0   IF17:27   Min.   :        0   IF17:26  
##  1st Qu.:  3324510   R_17:27   1st Qu.:  4548049   R_17:28  
##  Median : 11062230             Median : 13116358            
##  Mean   : 20918405             Mean   : 23090344            
##  3rd Qu.: 26687619             3rd Qu.: 34128145            
##  Max.   :115049111             Max.   :128773264            
##                                                             
##      BKVOL              EBOOK          F_EBOOK      AUDIO_PH      
##  Min.   :   43208   Min.   :     200   IA17: 1   Min.   :    125  
##  1st Qu.: 3971801   1st Qu.:  964119   IF17:21   1st Qu.: 198587  
##  Median : 8822316   Median : 3627748   R_17:32   Median : 509842  
##  Mean   :13264512   Mean   : 8612249             Mean   : 808678  
##  3rd Qu.:16090848   3rd Qu.: 8578765             3rd Qu.: 947486  
##  Max.   :68128577   Max.   :81436463             Max.   :3533454  
##                                                                   
##  F_AUD_PH     AUDIO_DL        F_AUD_DL     VIDEO_PH       F_VID_PH 
##  IF17:28   Min.   :       0   IA17: 1   Min.   :      6   IF17:27  
##  R_17:26   1st Qu.:  794436   IF17:26   1st Qu.: 348534   R_17:27  
##            Median : 1509235   R_17:27   Median : 806348            
##            Mean   : 4804103             Mean   :1224140            
##            3rd Qu.: 5108128             3rd Qu.:1381205            
##            Max.   :46995879             Max.   :5487048            
##                                                                    
##     VIDEO_DL       F_VID_DL     EC_LO_OT       F_EC_L_O      EC_ST      
##  Min.   :      0   IA17: 1   Min.   :    0.0   IF17:22   Min.   :    0  
##  1st Qu.:  19338   IF17:34   1st Qu.:  298.2   R_17:32   1st Qu.: 1115  
##  Median :  92686   R_17:19   Median :  613.5             Median : 3741  
##  Mean   : 299419             Mean   : 1933.6             Mean   : 5935  
##  3rd Qu.: 271718             3rd Qu.: 1659.0             3rd Qu.: 7302  
##  Max.   :3882019             Max.   :15816.0             Max.   :33993  
##                                                                         
##  F_EC_ST      ELECCOLL     F_ELECOL     SUBSCRIP      F_PRSUB  
##  IF17: 8   Min.   :    0   IF17:21   Min.   :     0   IF17:27  
##  R_17:46   1st Qu.: 2225   R_17:33   1st Qu.:  5069   R_17:27  
##            Median : 4434             Median : 14371            
##            Mean   : 7868             Mean   : 29230            
##            3rd Qu.:10054             3rd Qu.: 34884            
##            Max.   :36594             Max.   :314871            
##                                                                
##     HRS_OPEN       F_HRS_OP      VISITS          F_VISITS 
##  Min.   :   1883   IF17:21   Min.   :    67848   IF17:28  
##  1st Qu.: 270776   R_17:33   1st Qu.:  5952298   R_17:26  
##  Median : 501157             Median : 16269185            
##  Mean   : 691125             Mean   : 24440815            
##  3rd Qu.: 928899             3rd Qu.: 31944010            
##  Max.   :2482431             Max.   :151058515            
##                                                           
##     REFERENC        F_REFER       REGBOR         F_REGBOR 
##  Min.   :     418   IA17: 1   Min.   :   14929   IF17:24  
##  1st Qu.:  755533   IF17:36   1st Qu.:  815876   R_17:30  
##  Median : 2609746   R_17:17   Median : 2114880            
##  Mean   : 4449827             Mean   : 3197103            
##  3rd Qu.: 5086460             3rd Qu.: 3802074            
##  Max.   :26644064             Max.   :22813619            
##                                                           
##      TOTCIR             KIDCIRCL        F_KIDCIR     ELMATCIR       
##  Min.   :    18155   Min.   :   11481   IF17:29   Min.   :       0  
##  1st Qu.:  6679753   1st Qu.: 2374436   R_17:25   1st Qu.:  754897  
##  Median : 24675570   Median : 9168036             Median : 2967962  
##  Mean   : 40072886   Mean   :14211448             Mean   : 4721746  
##  3rd Qu.: 55453038   3rd Qu.:21096745             3rd Qu.: 5966432  
##  Max.   :208171464   Max.   :85181814             Max.   :21842312  
##                                                                     
##     PHYSCIR              ELINFO              ELCONT         
##  Min.   :       -1   Min.   :       -1   Min.   :        0  
##  1st Qu.:  6008730   1st Qu.:  1487888   1st Qu.:  2746017  
##  Median : 22552907   Median :  4716450   Median :  7874830  
##  Mean   : 35357630   Mean   : 14886733   Mean   : 19479759  
##  3rd Qu.: 50820808   3rd Qu.: 10657050   3rd Qu.: 18551435  
##  Max.   :186369124   Max.   :326882313   Max.   :327614571  
##                                                             
##     TOTCOLL              LOANTO         F_LOANTO      LOANFM        
##  Min.   :        0   Min.   :       0   IF17:30   Min.   :       0  
##  1st Qu.:  9686763   1st Qu.:   41038   R_17:24   1st Qu.:   44351  
##  Median : 28705466   Median :  253296             Median :  264258  
##  Mean   : 53438261   Mean   : 1192748             Mean   : 1187276  
##  3rd Qu.: 65707292   3rd Qu.:  785919             3rd Qu.:  772888  
##  Max.   :332931529   Max.   :10494552             Max.   :10311967  
##                                                                     
##  F_LOANFM      TOTPRO       F_TOTPRO      KIDPRO       F_KIDPRO 
##  IF17:29   Min.   :    30   IF17:25   Min.   :    20   IF17:24  
##  R_17:25   1st Qu.: 28246   R_17:29   1st Qu.: 14876   R_17:30  
##            Median : 67359             Median : 38897            
##            Mean   :100432             Mean   : 55088            
##            3rd Qu.:119211             3rd Qu.: 69221            
##            Max.   :640908             Max.   :270054            
##                                                                 
##      YAPRO       F_YAPRO      TOTATTEN        F_TOTATT     KIDATTEN      
##  Min.   :    0   IF17:25   Min.   :    5562   IF17:23   Min.   :   4695  
##  1st Qu.: 2954   R_17:29   1st Qu.:  497408   R_17:31   1st Qu.: 349582  
##  Median : 7306             Median : 1621562             Median :1047597  
##  Mean   : 9832             Mean   : 2196522             Mean   :1467218  
##  3rd Qu.:11670             3rd Qu.: 2856180             3rd Qu.:1944277  
##  Max.   :67178             Max.   :11121961             Max.   :7687968  
##                                                                          
##  F_KIDATT     YAATTEN       F_YAATT      GPTERMS      F_GPTERM 
##  IF17:24   Min.   :     0   IF17:25   Min.   :   20   IF17:23  
##  R_17:30   1st Qu.: 35486   R_17:29   1st Qu.: 1407   R_17:31  
##            Median :112205             Median : 4367            
##            Mean   :151418             Mean   : 5576            
##            3rd Qu.:195825             3rd Qu.: 7272            
##            Max.   :901005             Max.   :23489            
##                                                                
##      PITUSR         F_PITUSR     WIFISESS              STARTDAT 
##  Min.   :    6889   IF17:34   Min.   :    1432   01/01/2016: 4  
##  1st Qu.:  982062   R_17:20   1st Qu.:  636738   01/01/2017:13  
##  Median : 3448202             Median : 3132385   02/01/2016: 1  
##  Mean   : 4783887             Mean   : 6219529   04/01/2016: 2  
##  3rd Qu.: 6010960             3rd Qu.: 8042690   07/01/2016:24  
##  Max.   :27219883             Max.   :42721358   10/01/2016: 8  
##                                                  12/01/2015: 2  
##        ENDDATE      INCITSST         YR_SUB         OBEREG     
##  06/30/2017:24   Min.   : 1.00   Min.   :2017   Min.   :1.000  
##  09/30/2017: 9   1st Qu.:17.25   1st Qu.:2017   1st Qu.:3.000  
##  10/31/2017: 1   Median :30.50   Median :2017   Median :5.000  
##  12/31/2017:20   Mean   :30.96   Mean   :2017   Mean   :4.722  
##                  3rd Qu.:44.75   3rd Qu.:2017   3rd Qu.:6.750  
##                  Max.   :69.00   Max.   :2017   Max.   :9.000  
## 

4. Look at the first few records of the dataframe called publiclibrary.

head(publiclibrary)
##   STABR POPU_LSA F_POPLSA POPU_UND  POPU_ST CENTLIB F_CENLIB BRANLIB
## 1    AK   649636     R_17   649636   737080      79     R_17      16
## 2    AL  4858979     R_17  4858979  4858979     222     R_17      73
## 3    AR  2925255     R_17  2643928  2915918      54     R_17     181
## 4    AS    60300     R_17    60300    60300       1     R_17       1
## 5    AZ 10986792     R_17  6965897  6965897      85     R_17     136
## 6    CA 39507301     R_17 39507301 39523613     165     R_17     959
##   F_BRLIB BKMOB F_BKMOB  MASTER F_MASTER LIBRARIA F_LIBRAR OTHPAID
## 1    R_17     1    R_17   56.90     R_17   105.31     R_17  195.27
## 2    R_17    15    R_17  334.38     R_17   761.26     R_17  948.56
## 3    R_17     3    R_17  154.02     IF17   327.70     IF17  771.01
## 4    R_17     0    R_17    1.00     R_17     1.00     R_17    9.00
## 5    R_17    11    R_17  463.20     IF17   572.64     IF17 1570.02
## 6    R_17    50    R_17 3178.02     IF17  3405.90     IF17 8645.04
##   F_OTHSTF TOTSTAFF F_TOTSTF     LOCGVT F_LOCGVT    STGVT F_STGVT  FEDGVT
## 1     R_17   300.58     R_17   35535907     R_17   967811    R_17  760297
## 2     R_17  1709.82     R_17   95284671     R_17  4289664    R_17  884759
## 3     IF17  1098.71     IF17   71811926     IF17  5093835    IF17    3198
## 4     R_17    10.00     R_17          0     R_17   280252    R_17  116264
## 5     IF17  2142.66     IF17  181981565     IF17  1359870    IF17 1885210
## 6     IF17 12050.94     IF17 1466205141     IF17 16015943    IF17 3450577
##   F_FEDGVT  OTHINCM F_OTHINC    TOTINCM F_TOTINC  SALARIES F_SALX
## 1     R_17  1086505     R_17   38350520     R_17  14825147   IF17
## 2     R_17  9561929     R_17  110021023     R_17  56852173   R_17
## 3     IF17  7001346     IF17   83910305     IF17  36971331   IF17
## 4     R_17    11470     R_17     407986     R_17    190298   R_17
## 5     IF17  6529775     IF17  191756420     IF17  79923881   IF17
## 6     IF17 71094198     IF17 1556765859     IF17 630583575   IF17
##     BENEFIT F_BENX  STAFFEXP F_TOSTFX PRMATEXP F_PRMATX ELMATEXP F_ELMATX
## 1   8817590   IF17  23642737     IF17  1766174     R_17   590787     IF17
## 2  15207301   R_17  72059474     R_17  6236182     R_17  2078897     R_17
## 3  11420197   IF17  48391528     IF17  5218162     IF17  2662117     IF17
## 4     18774   R_17    209072     R_17     2772     R_17        0     R_17
## 5  30807861   IF17 110731742     IF17 15148150     IF17  8132366     IF17
## 6 314906940   IF17 945490515     IF17 73298018     IF17 40301843     IF17
##   OTHMATEX F_OTMATX  TOTEXPCO F_TOCOLX  OTHOPEXP F_OTHOPX   TOTOPEXP
## 1    83053     IF17   2440014     R_17  10232067     R_17   36314818
## 2  2156596     R_17  10471675     R_17  20920399     R_17  103451548
## 3  1831950     IF17   9712229     IF17  19591119     IF17   77694876
## 4    11614     R_17     14386     R_17    130493     R_17     353951
## 5  5337908     IF17  28618424     IF17  41277932     IF17  180628098
## 6 20652600     IF17 134252461     IF17 408888807     IF17 1488631783
##   F_TOTOPX LCAP_REV F_LCAPRV SCAP_REV F_SCAPRV FCAP_REV F_FCAPRV OCAP_REV
## 1     R_17    88549     R_17   882017     R_17        0     R_17   543731
## 2     R_17  5584461     R_17        0     R_17    92425     R_17   282280
## 3     IF17 25424513     IF17   177084     IF17        0     IF17  2423208
## 4     R_17        0     R_17        0     R_17        0     R_17        0
## 5     IF17  7496725     IF17    40896     IF17        0     IF17   225762
## 6     IF17 68435601     IF17    22454     IF17  1067098     IF17 11962515
##   F_OCAPRV  CAP_REV F_TCAPRV  CAPITAL F_TCAPX    BKVOL    EBOOK F_EBOOK
## 1     R_17  1514297     R_17 12964531    R_17  2241681  1190695    R_17
## 2     R_17  5959166     R_17  7740222    R_17  8835816  4551754    R_17
## 3     IF17 28024805     IF17 13268186    IF17  6078553  1426355    IF17
## 4     R_17        0     R_17        0    R_17    43208     1265    R_17
## 5     IF17  7763383     IF17  6039281    IF17  7585916  4377983    R_17
## 6     IF17 81487668     IF17 81808077    IF17 62110372 13594513    IF17
##   AUDIO_PH F_AUD_PH AUDIO_DL F_AUD_DL VIDEO_PH F_VID_PH VIDEO_DL F_VID_DL
## 1   117588     IF17   899185     R_17   301449     R_17    15039     R_17
## 2   435091     R_17  1610947     IF17   674779     IF17   114098     IF17
## 3   256759     IF17  1488248     IF17   526151     IF17    75191     IF17
## 4      895     R_17        0     R_17      337     R_17        0     R_17
## 5   592002     IF17  4534761     IF17  1049065     IF17   285018     IF17
## 6  3328908     IF17 17262208     IF17  5317349     IF17   723077     IF17
##   EC_LO_OT F_EC_L_O EC_ST F_EC_ST ELECCOLL F_ELECOL SUBSCRIP F_PRSUB
## 1      356     R_17  3794    R_17     4150     R_17     4397    R_17
## 2     1077     R_17 13603    R_17    14680     R_17     8543    R_17
## 3      369     IF17  3120    R_17     3489     IF17     9287    IF17
## 4        0     R_17    41    R_17       41     R_17       63    R_17
## 5      646     IF17  2970    R_17     3616     IF17    17410    IF17
## 6     4143     IF17     6    IF17     4149     IF17    86709    IF17
##   HRS_OPEN F_HRS_OP    VISITS F_VISITS REFERENC F_REFER   REGBOR F_REGBOR
## 1   143746     R_17   3403598     IF17   356776    R_17   368512     R_17
## 2   628602     R_17  14583055     R_17  4114709    IF17  2620475     R_17
## 3   322503     IF17  10358181     IF17  2664467    IF17  1665795     IF17
## 4     2129     R_17     67848     R_17      418    R_17    14929     R_17
## 5   486826     IF17  24588717     IF17  5177089    IF17  3238665     IF17
## 6  2386702     IF17 151058515     IF17 18953587    IF17 22813619     IF17
##      TOTCIR KIDCIRCL F_KIDCIR ELMATCIR   PHYSCIR   ELINFO   ELCONT
## 1   4725553  1499911     IF17   608351   4117202   213926   822274
## 2  19974984  6768365     R_17  2654129  17320855  8385998 11040127
## 3  13900209  3918856     IF17  1748493  12151716  4420947  6169440
## 4     18155    11481     R_17       58     18097    40459    40517
## 5  45351653 12427033     IF17  7664214  37682262 14881869 22547840
## 6 208171464 85181814     IF17 21842312 186369124 68631591 89364987
##     TOTCOLL  LOANTO F_LOANTO  LOANFM F_LOANFM TOTPRO F_TOTPRO KIDPRO
## 1   4939476    9403     IF17   10008     IF17  13771     R_17   8873
## 2  28360982  386422     R_17  418634     R_17  51354     R_17  25096
## 3  18321156   27872     IF17   26998     IF17  46138     IF17  26202
## 4     58614       0     R_17       0     R_17    282     R_17    282
## 5  59922415  507669     IF17  485720     IF17  85156     IF17  44026
## 6 254659592 3463927     IF17 3068809     IF17 404614     IF17 242357
##   F_KIDPRO YAPRO F_YAPRO TOTATTEN F_TOTATT KIDATTEN F_KIDATT YAATTEN
## 1     R_17  1520    R_17   266184     R_17   193013     R_17   20407
## 2     R_17  6647    IF17  1273334     R_17   807789     R_17  116403
## 3     IF17  7295    IF17  1136567     IF17   671586     IF17  112913
## 4     R_17     0    R_17     5562     R_17     5562     R_17       0
## 5     IF17  9453    IF17  1708390     IF17  1142827     IF17  139245
## 6     IF17 40791    IF17 10249635     IF17  7687968     IF17  608684
##   F_YAATT GPTERMS F_GPTERM   PITUSR F_PITUSR WIFISESS   STARTDAT
## 1    R_17     983     R_17   617059     IF17   589331 01/01/2016
## 2    IF17    5056     R_17  3729784     IF17  4087113 10/01/2016
## 3    IF17    2811     IF17  1526940     IF17  1623761 01/01/2017
## 4    R_17      20     R_17     6889     R_17     1432 10/01/2016
## 5    IF17    7364     IF17  5577313     IF17 10813345 07/01/2016
## 6    IF17   23489     IF17 27219883     IF17 22950041 07/01/2016
##      ENDDATE INCITSST YR_SUB OBEREG
## 1 06/30/2017        2   2017      8
## 2 09/30/2017        1   2017      5
## 3 12/31/2017        5   2017      5
## 4 09/30/2017       60   2017      9
## 5 06/30/2017        4   2017      6
## 6 06/30/2017        6   2017      8

5. Pull in the tidyverse package to start managing the variables.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages ----------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1     v purrr   0.3.2
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## Warning: package 'tibble' was built under R version 3.5.3
## Warning: package 'tidyr' was built under R version 3.5.3
## Warning: package 'readr' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.3
## Warning: package 'dplyr' was built under R version 3.5.3
## Warning: package 'stringr' was built under R version 3.5.3
## Warning: package 'forcats' was built under R version 3.5.3
## -- Conflicts -------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

6. Choose variables to explore in a smaller sub-dataframe.

Rename the STABR variable to be STATE.
publiclibrarysub <- publiclibrary %>%
  select(STABR, POPU_LSA, CENTLIB, BRANLIB, BKMOB, TOTSTAFF, STAFFEXP, BKVOL, EBOOK, AUDIO_PH, AUDIO_DL, HRS_OPEN, VISITS, REFERENC, REGBOR, TOTCIR, TOTCOLL, GPTERMS, PITUSR, WIFISESS, STARTDAT, ENDDATE, YR_SUB, OBEREG) 

publiclibrarysub <- rename(publiclibrarysub, STATE = STABR) 

head(publiclibrarysub)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    AK   649636      79      16     1   300.58  23642737  2241681
## 2    AL  4858979     222      73    15  1709.82  72059474  8835816
## 3    AR  2925255      54     181     3  1098.71  48391528  6078553
## 4    AS    60300       1       1     0    10.00    209072    43208
## 5    AZ 10986792      85     136    11  2142.66 110731742  7585916
## 6    CA 39507301     165     959    50 12050.94 945490515 62110372
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1  1190695   117588   899185   143746   3403598   356776   368512
## 2  4551754   435091  1610947   628602  14583055  4114709  2620475
## 3  1426355   256759  1488248   322503  10358181  2664467  1665795
## 4     1265      895        0     2129     67848      418    14929
## 5  4377983   592002  4534761   486826  24588717  5177089  3238665
## 6 13594513  3328908 17262208  2386702 151058515 18953587 22813619
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1   4725553   4939476     983   617059   589331 01/01/2016 06/30/2017
## 2  19974984  28360982    5056  3729784  4087113 10/01/2016 09/30/2017
## 3  13900209  18321156    2811  1526940  1623761 01/01/2017 12/31/2017
## 4     18155     58614      20     6889     1432 10/01/2016 09/30/2017
## 5  45351653  59922415    7364  5577313 10813345 07/01/2016 06/30/2017
## 6 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
##   YR_SUB OBEREG
## 1   2017      8
## 2   2017      5
## 3   2017      5
## 4   2017      9
## 5   2017      6
## 6   2017      8
That is still a lot of variables but it is getting better!
The variables in this library dataset include mostly continuous variables with 1 categorical variable (OBEREG). The OBEREG variable represents the Bureau of Economic Analysis (BEA) region code to which that State belongs. The US is divided into 8 regions: 01 New England, 02 Mid East, 03 Great Lakes, 04 Plains, 05 South East, 06 South West, 07 Rocky Mountains, and 08 Far West.
Initially, I wanted to create another categorical variable by grouping the number of staff members (full-time equivalent) in ranges of 1,000. I started doing this on step 9 below, grouping them into “few” (less than 1,000 FTEs), “moderate” (1,000-1,999 FTEs), “high” (2,000-2,999 FTEs), and “veryhigh” (3,000+). I used the mutate command to create new columns indicating the staffing level for each State. I did not end up using this information in my final visualization (final viz is step 15).
As for cleaning up the dataset, the variables were all named correctly (no spaces or symbols) and were mostly integers so I did not change them but only renamed the STABR variable to be STATE as that made more sense to me (in step 6 above). There are dates within the dataset but these simply represent the range of 12-months of data which that State reported. The PLS uses 12-months of data but this does not equate to a particular calendar year so these date variables indicate the 12-month range used by that State.
I used select and filter commands to group the records within sub-dataframes. I used the arrange command to sort the records descending.

7. View the dimensions of the sub-dataframe called publiclibrarysub.

dim(publiclibrarysub)
## [1] 54 24

8. Filter out the territories and retain the 51 States in a dataframe called state.

state <- filter(publiclibrarysub, STATE != "AS", STATE != "GU", STATE != "MP")

head(state)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    AK   649636      79      16     1   300.58  23642737  2241681
## 2    AL  4858979     222      73    15  1709.82  72059474  8835816
## 3    AR  2925255      54     181     3  1098.71  48391528  6078553
## 4    AZ 10986792      85     136    11  2142.66 110731742  7585916
## 5    CA 39507301     165     959    50 12050.94 945490515 62110372
## 6    CO  5505446      96     165    13  3252.12 191787325  9517379
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1  1190695   117588   899185   143746   3403598   356776   368512
## 2  4551754   435091  1610947   628602  14583055  4114709  2620475
## 3  1426355   256759  1488248   322503  10358181  2664467  1665795
## 4  4377983   592002  4534761   486826  24588717  5177089  3238665
## 5 13594513  3328908 17262208  2386702 151058515 18953587 22813619
## 6  2815454   810860  2277173   637132  31975615  3679129  3624818
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1   4725553   4939476     983   617059   589331 01/01/2016 06/30/2017
## 2  19974984  28360982    5056  3729784  4087113 10/01/2016 09/30/2017
## 3  13900209  18321156    2811  1526940  1623761 01/01/2017 12/31/2017
## 4  45351653  59922415    7364  5577313 10813345 07/01/2016 06/30/2017
## 5 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
## 6  62438667  73670109    6413  6490654  4004762 01/01/2017 12/31/2017
##   YR_SUB OBEREG
## 1   2017      8
## 2   2017      5
## 3   2017      5
## 4   2017      6
## 5   2017      8
## 6   2017      7

8. Plot the population against the total collection circulation for each state.

The labels were showing exponents so I adjusted the scale of the labels to make this more legible.
ggplot(data = state) +
  geom_point(mapping = aes(x = POPU_LSA, y = TOTCOLL)) + 
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(labels = scales::comma)

It looks like there is a correlation between population and the total collection circulation, which makes sense, although the outlier is weird. I could explore that more fully but will leave it for now.

9. Plot TOTSTAFF to identify how to break that variable into categorical variables. The TOTSTAFF variable represents the number of full-time equivalent (FTE) staff that are employed by that State’s libraries.

ggplot(data = state) +
  geom_point(mapping = aes(y = TOTSTAFF, x = STATE))

While the States are not legible, it looks like I can break the TOTSTAFF numerical variable into 4 categories in increments of 1,000: few, moderate, high, veryhigh.

10. Segment the TOTSTAFF variable. Use the arrange command to sort descending by TOTSTAFF (FTEs).

staffinglevels <- state %>% 
  mutate(few = (TOTSTAFF >0 & TOTSTAFF <1000), moderate = (TOTSTAFF >=1000 & TOTSTAFF <2000), high = (TOTSTAFF >=3000 & TOTSTAFF <4000), veryhigh = (TOTSTAFF >=4000)) %>%
 arrange(desc(TOTSTAFF))

#This adds columns at the end of my data to show which states are "true" for few staff, or "true" for moderate staff, etc.
#The arrange command sorts the TOTSTAFF column descending.

topstaffing <- head(staffinglevels, n = 10)

#I decided to focus on the highest 10 staffing States instead, so I added n=10 to limit the dataframe for topstaffing.
head(topstaffing)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    NY 19542753     755     312    11 12822.53 943530658 68128577
## 2    CA 39507301     165     959    50 12050.94 945490515 62110372
## 3    OH 11510467     239     477    56  9110.92 475357681 39581237
## 4    IL 11714654     622     162    17  8758.59 496343452 39946369
## 5    TX 25699219     556     326    11  6916.71 373517642 38410864
## 6    FL 20790357      59     474    21  6340.03 334620281 28147847
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1 28445930  3533454  6074003  2482431 100012193 26644064 10397405
## 2 13594513  3328908 17262208  2386702 151058515 18953587 22813619
## 3 81436463  3367638 33758449  1927239  71895854 17735110  8440127
## 4 21830848  2835940  7855094  1921739  66174632 10261005  4995089
## 5 10331800  1954831  5663867  1935320  70521177 13528644 14320123
## 6  5201706  1880107 18368080  1275478  66753349 21779507 11640505
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1 127222803 140044443   20973 17700652 15488574 04/01/2016 12/31/2017
## 2 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
## 3 170442595 191426990   14305 15131182 19547124 01/01/2017 12/31/2017
## 4 107710331  91514945   15834 11617964 20971054 12/01/2015 06/30/2017
## 5 117304751 220885344   21824 13823532 42721358 02/01/2016 12/31/2017
## 6  99660103 110259833   16813 12529944 10257484 10/01/2016 09/30/2017
##   YR_SUB OBEREG   few moderate  high veryhigh
## 1   2017      2 FALSE    FALSE FALSE     TRUE
## 2   2017      8 FALSE    FALSE FALSE     TRUE
## 3   2017      3 FALSE    FALSE FALSE     TRUE
## 4   2017      3 FALSE    FALSE FALSE     TRUE
## 5   2017      6 FALSE    FALSE FALSE     TRUE
## 6   2017      5 FALSE    FALSE FALSE     TRUE
The States with the highest staffing levels are NY, CA, OH, IL, TX, and Fl.
While the step above makes the numerical staffing total into a new categorical variable, I will hold on further staffing analysis of that for the moment.

11. Let’s looks at State populations. I’ll do this by using the arrange command to sort the dataframe, descending, by the population variable and assign the highest population States to a toppopulation dataframe.

populationlevels <- state %>%
  arrange(desc(POPU_LSA))
toppopulation <- head(populationlevels, n = 10)
#I decided to focus on the highest 10 population States so I added n=10 to limit the dataframe for toppopulation.
head(toppopulation)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    CA 39507301     165     959    50 12050.94 945490515 62110372
## 2    TX 25699219     556     326    11  6916.71 373517642 38410864
## 3    FL 20790357      59     474    21  6340.03 334620281 28147847
## 4    NY 19542753     755     312    11 12822.53 943530658 68128577
## 5    PA 12427301     449     172    23  4669.93 245061274 24237863
## 6    IL 11714654     622     162    17  8758.59 496343452 39946369
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1 13594513  3328908 17262208  2386702 151058515 18953587 22813619
## 2 10331800  1954831  5663867  1935320  70521177 13528644 14320123
## 3  5201706  1880107 18368080  1275478  66753349 21779507 11640505
## 4 28445930  3533454  6074003  2482431 100012193 26644064 10397405
## 5 26462269  2115303 46995879  1484972  42251845  7275836  5328788
## 6 21830848  2835940  7855094  1921739  66174632 10261005  4995089
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
## 2 117304751 220885344   21824 13823532 42721358 02/01/2016 12/31/2017
## 3  99660103 110259833   16813 12529944 10257484 10/01/2016 09/30/2017
## 4 127222803 140044443   20973 17700652 15488574 04/01/2016 12/31/2017
## 5  61829907  77918664    8127  6072550  8110110 01/01/2017 12/31/2017
## 6 107710331  91514945   15834 11617964 20971054 12/01/2015 06/30/2017
##   YR_SUB OBEREG
## 1   2017      8
## 2   2017      6
## 3   2017      5
## 4   2017      2
## 5   2017      2
## 6   2017      3
The States with the highest population levels are CA, TX, FL, NY, PA, and IL.

12. Let’s look at total collection circulation of materials. Similar to toppopulation, I will use the arrange command to sort the dataframe, descending, by the total collection circulation variable and assign the highest collection circulation States to a topcollcirculation dataframe.

circulationlevels <- state %>%
  arrange(desc(TOTCOLL))
topcollcirculation <- head(circulationlevels, n = 10)
#I decided to focus on the highest 10 collection circulation States so I added n=10 to limit the dataframe for topcollcirculation.
head(topcollcirculation)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    RI  1424662      47      24     2   645.18  37440495  3461135
## 2    CA 39507301     165     959    50 12050.94 945490515 62110372
## 3    TX 25699219     556     326    11  6916.71 373517642 38410864
## 4    OH 11510467     239     477    56  9110.92 475357681 39581237
## 5    NY 19542753     755     312    11 12822.53 943530658 68128577
## 6    FL 20790357      59     474    21  6340.03 334620281 28147847
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1  2593546   152231   877905   159805   5669309   589418   443396
## 2 13594513  3328908 17262208  2386702 151058515 18953587 22813619
## 3 10331800  1954831  5663867  1935320  70521177 13528644 14320123
## 4 81436463  3367638 33758449  1927239  71895854 17735110  8440127
## 5 28445930  3533454  6074003  2482431 100012193 26644064 10397405
## 6  5201706  1880107 18368080  1275478  66753349 21779507 11640505
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1   6049216 332931529    1480   972285   426930 07/01/2016 06/30/2017
## 2 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
## 3 117304751 220885344   21824 13823532 42721358 02/01/2016 12/31/2017
## 4 170442595 191426990   14305 15131182 19547124 01/01/2017 12/31/2017
## 5 127222803 140044443   20973 17700652 15488574 04/01/2016 12/31/2017
## 6  99660103 110259833   16813 12529944 10257484 10/01/2016 09/30/2017
##   YR_SUB OBEREG
## 1   2017      1
## 2   2017      8
## 3   2017      6
## 4   2017      3
## 5   2017      2
## 6   2017      5
The States with the highest collection circulation levels are RI, CA, TX, OH, NY, and FL.

13. Let’s plot the total collection circulation against total population for the top 10 populated States.

popcirplot <- ggplot(toppopulation, aes(x = POPU_LSA,y = TOTCOLL, color = STATE)) +
  geom_point()
popcirplot + ggtitle("State Population and Library Total Collection Circulation") + xlab("Population") + ylab("Collection Circulation") + labs(fill = "State") +
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(labels = scales::comma)

#Again, I added the scaling to make the axis tick labels display legibly.
This point plot looks ok but let’s keep going.

14. Let’s look at the total registered users for the States. I’ll use the arrange command and sort descending to collect the States with top registered users into a dataframe call topreguser.

registereduserlevels <- state %>%
  arrange(desc(REGBOR))
topreguser <- head(registereduserlevels, n = 10)
#I decided to focus on the highest 10 registered user States so I added n=10 to limit the dataframe for topreguser.
head(topreguser)
##   STATE POPU_LSA CENTLIB BRANLIB BKMOB TOTSTAFF  STAFFEXP    BKVOL
## 1    CA 39507301     165     959    50 12050.94 945490515 62110372
## 2    TX 25699219     556     326    11  6916.71 373517642 38410864
## 3    FL 20790357      59     474    21  6340.03 334620281 28147847
## 4    NY 19542753     755     312    11 12822.53 943530658 68128577
## 5    OH 11510467     239     477    56  9110.92 475357681 39581237
## 6    NC 10155942      70     319    20  3059.03 157518816 15526913
##      EBOOK AUDIO_PH AUDIO_DL HRS_OPEN    VISITS REFERENC   REGBOR
## 1 13594513  3328908 17262208  2386702 151058515 18953587 22813619
## 2 10331800  1954831  5663867  1935320  70521177 13528644 14320123
## 3  5201706  1880107 18368080  1275478  66753349 21779507 11640505
## 4 28445930  3533454  6074003  2482431 100012193 26644064 10397405
## 5 81436463  3367638 33758449  1927239  71895854 17735110  8440127
## 6  6418337   666565  1002503   944993  32621293  6979972  5647452
##      TOTCIR   TOTCOLL GPTERMS   PITUSR WIFISESS   STARTDAT    ENDDATE
## 1 208171464 254659592   23489 27219883 22950041 07/01/2016 06/30/2017
## 2 117304751 220885344   21824 13823532 42721358 02/01/2016 12/31/2017
## 3  99660103 110259833   16813 12529944 10257484 10/01/2016 09/30/2017
## 4 127222803 140044443   20973 17700652 15488574 04/01/2016 12/31/2017
## 5 170442595 191426990   14305 15131182 19547124 01/01/2017 12/31/2017
## 6  49231011  57826436    7611  5938721  4234491 07/01/2016 06/30/2017
##   YR_SUB OBEREG
## 1   2017      8
## 2   2017      6
## 3   2017      5
## 4   2017      2
## 5   2017      3
## 6   2017      5
The States with the highest number of registered users are CA, TX, FL, NY, OH, and NC.

15. I want to compare the number of registered users to the State population for the top registered user States. I will do this using geom_point and will add titles, labels, and color.

popregplot <- ggplot(topreguser, aes(x = POPU_LSA, y = REGBOR, color = STATE)) +
  geom_point(aes(size = 3)) 

popregplot + ggtitle("State Population and Number of Registered Users") + xlab("Population") + ylab("Registered Users") +
  scale_y_continuous(labels = scales::comma) + 
  scale_x_continuous(labels = scales::comma)

#Again, I added the scaling to make the axis ticks display legibly.
#I made the bubble size 3 so the points were more visible.
It is not a surprise that as population increases, the number of registered users increases.

This visualization helps us see several aspects of the data. It is clear that there is a correlation between the population of a State and the number of registered public library users, with California having the highest population and highest number of registered users. This visualization shows us that California is far ahead of the 2nd most populated and registered-user State, Texas.

The visualization also shows a grouping of similarly populated States with similar numbers of registered users. Ohio appears to be an anomaly so that is one State that would be great to investigate further. Additional items I would like to explore are the relationship of population to number of libraries in the State and the number of visits compared to the number of computer terminals and wifi sessions by State. As a bonus, I would like to take the PLS data from other years and try combining the datasets to look at change over time.