1. Choose and load any R dataset (except for diamonds!) that has at least two numeric variables and at least two categorical variables. Identify which variables in your data set are numeric and which are categorical (factors).

Dataset Description:

The Student Weight Status Category Reporting System (SWSCR) collects weight status category data (underweight, healthy weight, overweight, or obese, based on BMI-for-age percentile). The dataset includes separate estimates of the percentage of students who are overweight, obese, and overweight or obese for all reportable grades within each county and/or region, and by grade group (elementary and middle/high). The reported rates are percentages based on counts of students in selected grades (Pre-K, K, 2, 4, 7, 10) reported to the New York State Department of Health (NYSDOH). Because these rates reflect a broad range of factors that vary by school district, comparing observed differences in overweight and obesity rates between school districts requires multivariate statistics.

require("knitr")
## Loading required package: knitr
opts_chunk$set(cache = FALSE)
require(ggplot2)
## Loading required package: ggplot2
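# stringsAsFactors = TRUE reproduces the factor columns in str() below (the default became FALSE in R 4.0.0)
students_wt <- read.csv(file = "C:/CUNY/R/Student_Weight_2010.csv", header = TRUE, stringsAsFactors = TRUE)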

#head(students_wt)
str(students_wt)
## 'data.frame':    3270 obs. of  18 variables:
##  $ LOCATION.CODE          : int  10402 10402 10402 10500 10500 10500 10601 10601 10601 10623 ...
##  $ COUNTY                 : Factor w/ 59 levels "ALBANY","ALLEGANY",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ AREA.NAME              : Factor w/ 745 levels "ADDISON CENTRAL SCHOOL",..: 540 540 540 132 132 132 617 617 617 453 ...
##  $ REGION                 : Factor w/ 8 levels "CENTRAL NEW YORK",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ SCHOOL.YEARS           : Factor w/ 2 levels "2010-2012","2012-2013": 2 2 2 2 2 2 2 2 2 2 ...
##  $ NO..OVERWEIGHT         : int  124 74 50 84 58 26 231 145 86 237 ...
##  $ PCT.OVERWEIGHT         : Factor w/ 244 levels "","1.60%","10%",..: 89 96 80 76 65 106 63 78 43 67 ...
##  $ NO..OBESE              : int  139 72 67 124 89 35 262 169 95 225 ...
##  $ PCT.OBESE              : Factor w/ 323 levels "","1.30%","1.60%",..: 113 92 141 159 152 176 86 108 58 60 ...
##  $ NO..OVERWEIGHT.OR.OBESE: int  263 146 117 208 147 61 493 314 181 462 ...
##  $ PCT.OVERWEIGHT.OR.OBESE: Factor w/ 416 levels "","10.40%","10.70%",..: 254 240 274 287 270 334 201 239 152 179 ...
##  $ GRADE.LEVEL            : Factor w/ 3 levels "DISTRICT TOTAL",..: 1 2 3 1 2 3 1 2 3 1 ...
##  $ AREA.TYPE              : Factor w/ 4 levels "COUNTY","REGION",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ STREET.ADDRESS         : Factor w/ 694 levels "","1 ACADEMY ST",..: 162 162 162 598 598 598 46 46 46 664 ...
##  $ CITY                   : Factor w/ 637 levels "","ACCORD","ADAMS CENTER",..: 474 474 474 123 123 123 7 7 7 311 ...
##  $ STATE                  : Factor w/ 2 levels "","NY": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ZIP.CODE               : int  12143 12143 12143 12047 12047 12047 12205 12205 12205 12110 ...
##  $ Location.1             : Factor w/ 1083 levels "","(40.588, -73.6284)",..: 814 814 814 1039 1039 1039 759 759 759 1069 ...
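The str() output answers the identification question: LOCATION.CODE, NO..OVERWEIGHT, NO..OBESE, NO..OVERWEIGHT.OR.OBESE, and ZIP.CODE are stored as integers, and the other 13 columns are factors. LOCATION.CODE and ZIP.CODE are codes rather than measurements, and the PCT.* columns are factors only because their values carry a % sign. The minimal sketch below lists columns by class and converts a percentage factor to numbers. The helper names column_types() and pct_to_numeric() are hypothetical, it assumes values look like "16.20%" and that blank entries mean missing, and the toy rows are made up so the block runs on its own.

```r
# Split column names by storage class: numeric (integer/double) vs everything else.
column_types <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  list(numeric = names(df)[is_num], categorical = names(df)[!is_num])
}

# Convert a factor/character of percentage strings such as "16.20%" to numbers;
# blank strings are treated as missing (an assumption about this file).
pct_to_numeric <- function(x) {
  x <- trimws(as.character(x))
  x[x == ""] <- NA
  as.numeric(sub("%", "", x, fixed = TRUE))
}

# Hypothetical toy rows (NOT from the SWSCR file) so the sketch is self-contained.
toy <- data.frame(
  NO..OBESE   = c(139L, 72L, NA),
  PCT.OBESE   = factor(c("19.20%", "", "1.30%")),
  GRADE.LEVEL = factor(c("DISTRICT TOTAL", "ELEMENTARY", "MIDDLE/HIGH"))
)
types <- column_types(toy)
toy$PCT.OBESE <- pct_to_numeric(toy$PCT.OBESE)
print(types)
print(toy$PCT.OBESE)
```

On the real data, `column_types(students_wt)` should list the five integer columns, and `students_wt$PCT.OBESE <- pct_to_numeric(students_wt$PCT.OBESE)` (likewise for PCT.OVERWEIGHT and PCT.OVERWEIGHT.OR.OBESE) makes the rates usable in summary() and plots.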
#summary(students_wt)
2. Generate summary-level descriptive statistics: show the mean, median, 25th and 75th percentiles (first and third quartiles), minimum, and maximum for each applicable variable in your data set.
summary(students_wt)
##  LOCATION.CODE            COUNTY    
##  Min.   :     0   SUFFOLK    : 315  
##  1st Qu.:141501   NASSAU     : 258  
##  Median :350000   WESTCHESTER: 183  
##  Mean   :345143   ERIE       : 129  
##  3rd Qu.:572702   CHAUTAUQUA :  84  
##  Max.   :700000   ONONDAGA   :  84  
##                   (Other)    :2217  
##                                  AREA.NAME                      REGION   
##  AKRON CENTRAL SCHOOL                 :   6   NORTHEASTERN NEW YORK:711  
##  ALBION CENTRAL SCHOOL                :   6   NASSAU-SUFFOLK       :576  
##  ALDEN CENTRAL SCHOOL                 :   6   CENTRAL NEW YORK     :549  
##  ALEXANDRIA CENTRAL ELEM. & HIGH SCHO :   6   HUDSON VALLEY        :486  
##  ALFRED ALMOND CENTRAL SCHOOL DISTRICT:   6   WESTERN NEW YORK     :474  
##  ALLEGANY-LIMESTONE CSD               :   6   FINGER LAKES         :342  
##  (Other)                              :3234   (Other)              :132  
##     SCHOOL.YEARS  NO..OVERWEIGHT    PCT.OVERWEIGHT   NO..OBESE      
##  2010-2012:2235   Min.   :    5.0          : 192   Min.   :    5.0  
##  2012-2013:1035   1st Qu.:   27.0   16.20% :  58   1st Qu.:   29.0  
##                   Median :   55.0   16.70% :  55   Median :   57.0  
##                   Mean   :  228.8   15.60% :  51   Mean   :  247.7  
##                   3rd Qu.:  114.8   15.50% :  49   3rd Qu.:  118.0  
##                   Max.   :77813.0   15.90% :  47   Max.   :84578.0  
##                   NA's   :192       (Other):2818   NA's   :198      
##    PCT.OBESE    NO..OVERWEIGHT.OR.OBESE PCT.OVERWEIGHT.OR.OBESE
##         : 198   Min.   :     5.00              : 146           
##  19.20% :  30   1st Qu.:    55.75       33.30% :  34           
##  17.30% :  28   Median :   111.00       34.80% :  27           
##  21.30% :  28   Mean   :   469.00       32%    :  24           
##  13.80% :  27   3rd Qu.:   230.00       32.20% :  24           
##  22.20% :  27   Max.   :162391.00       35.10% :  24           
##  (Other):2932   NA's   :146             (Other):2991           
##          GRADE.LEVEL                       AREA.TYPE   
##  DISTRICT TOTAL:1090   COUNTY                   : 171  
##  ELEMENTARY    :1090   REGION                   :  21  
##  MIDDLE/HIGH   :1090   SCHOOL DISTRICT          :3075  
##                        STATEWIDE (EXCLUDING NYC):   3  
##                                                        
##                                                        
##                                                        
##            STREET.ADDRESS            CITY      STATE        ZIP.CODE    
##                   : 195                : 195     : 195   Min.   : 6390  
##  1 ACADEMY ST     :  12   ROCHESTER    :  30   NY:3075   1st Qu.:11937  
##  26 INSTITUTE ST  :   9   BINGHAMTON   :  18             Median :12953  
##  1 C B GARIEPY AVE:   6   SCHENECTADY  :  18             Mean   :12898  
##  1 CATHERINE ST   :   6   VALLEY STREAM:  18             3rd Qu.:14001  
##  1 DICKINSON ST   :   6   TROY         :  15             Max.   :14905  
##  (Other)          :3036   (Other)      :2976             NA's   :195    
##                Location.1  
##                     :  24  
##  (40.588, -73.6284) :   3  
##  (40.6077, -73.6471):   3  
##  (40.6109, -73.7353):   3  
##  (40.6332, -73.7061):   3  
##  (40.6361, -73.6054):   3  
##  (Other)            :3231
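summary() pools every row, but AREA.TYPE shows that 195 of the 3,270 rows (171 COUNTY + 21 REGION + 3 STATEWIDE) are aggregates rather than school districts, which likely explains maxima such as 77,813 overweight students and pulls the means far above the medians. The sketch below computes the requested statistics for the count columns using school-district rows only. describe_counts() is a hypothetical helper, and the toy rows are made up so the block runs standalone.

```r
# Min, 25th percentile, median, mean, 75th percentile and max for each column in
# `cols`, restricted to rows whose AREA.TYPE is "SCHOOL DISTRICT".
describe_counts <- function(df, cols) {
  districts <- df[which(df$AREA.TYPE == "SCHOOL DISTRICT"), cols, drop = FALSE]
  stats <- vapply(districts, function(x) {
    q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
    c(min = q[1], q1 = q[2], median = q[3],
      mean = mean(x, na.rm = TRUE), q3 = q[4], max = q[5])
  }, numeric(6))
  t(stats)  # one row per variable, one column per statistic
}

# Hypothetical toy rows (NOT from the SWSCR file); the COUNTY row is excluded.
toy <- data.frame(
  AREA.TYPE      = c("SCHOOL DISTRICT", "SCHOOL DISTRICT", "SCHOOL DISTRICT", "COUNTY"),
  NO..OBESE      = c(10, 20, 30, 1000),
  NO..OVERWEIGHT = c(5, NA, 15, 900)
)
res <- describe_counts(toy, c("NO..OBESE", "NO..OVERWEIGHT"))
print(res)
```

Applied to the loaded data, `describe_counts(students_wt, c("NO..OVERWEIGHT", "NO..OBESE", "NO..OVERWEIGHT.OR.OBESE"))` gives district-level figures; the same idea works for the PCT.* columns once they are converted to numbers.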

3. Determine the frequency for one of the categorical variables.

table(students_wt$LOCATION.CODE)
## 
##      0      1      2      3      4      5      6      7      8      9 
##      3      3      3      3      3      3      3      3      3      3 
##     10     11     12     13     14     15     16     17     18     19 
##      3      3      3      3      3      3      3      3      3      3 
##     20     21     22     23     24     25     26     27     28     40 
##      3      3      3      3      3      3      3      3      3      3 
##     41     42     43     44     45     46     47     48     49     50 
##      3      3      3      3      3      3      3      3      3      3 
##     51     52     53     54     55     56     57     58     59     60 
##      3      3      3      3      3      3      3      3      3      3 
##     61     62     63     64     65     66     67     68  10100  10201 
##      3      3      3      3      3      3      3      3      3      3 
##  10306  10402  10500  10601  10615  10623  10701  10802  11003  11200 
##      3      6      6      6      3      6      6      6      3      3 
##  20101  20601  20702  20801  21102  21601  22001  22101  22302  22401 
##      6      3      3      3      6      6      3      6      3      3 
##  22601  22902  30101  30200  30501  30601  30701  31101  31301  31401 
##      6      6      6      6      6      6      6      3      3      3 
##  31501  31502  31601  31701  40204  40302  40901  41101  41401  42302 
##      3      3      3      6      3      6      6      3      6      3 
##  42400  42801  42901  43001  43200  43501  50100  50301  50401  50701 
##      6      3      3      6      6      3      3      6      6      3 
##  51101  51301  51901  60201  60301  60401  60503  60601  60701  60800 
##      3      6      3      3      6      6      3      3      6      6 
##  61001  61101  61501  61503  61601  61700  62201  62301  62401  62601 
##      3      3      6      3      6      3      3      6      3      6 
##  62901  70600  70901  70902  80101  80201  80601  81003  81200  81401 
##      6      6      3      3      3      3      6      6      6      6 
##  81501  82001  90201  90301  90501  90601  90901  91101  91200  91402 
##      3      3      6      6      3      3      6      6      3      3 
## 100000 100501 100902 101001 101300 101401 101601 110101 110200 110304 
##      3      6      3      3      3      3      6      6      3      3 
## 110701 110901 120102 120301 120401 120501 120701 120906 121401 121502 
##      6      6      3      6      3      3      3      6      3      6 
## 121601 121701 121702 121901 130200 130502 130801 131101 131201 131301 
##      3      6      6      6      3      3      3      6      6      3 
## 131500 131601 131602 131701 131801 132101 132201 140101 140201 140203 
##      3      6      6      6      6      3      6      6      3      6 
## 140207 140301 140600 140701 140702 140703 140707 140709 140801 141101 
##      6      3      3      6      3      3      3      3      3      6 
## 141201 141301 141401 141501 141601 141604 141701 141800 141901 142101 
##      6      3      3      6      6      6      3      6      6      6 
## 142201 142301 142500 142601 142801 150203 150301 150601 150801 150901 
##      3      6      3      6      3      3      6      6      3      3 
## 151001 151102 151401 151501 151601 151701 160101 160801 161201 161401 
##      6      3      6      3      3      6      6      3      6      3 
## 161501 161601 161801 170301 170500 170600 170801 170901 171001 171102 
##      3      6      3      3      3      6      3      6      3      6 
## 180202 180300 180701 180901 181001 181101 181201 181302 190301 190401 
##      3      6      3      6      3      6      6      3      6      3 
## 190501 190701 190901 191401 200000 200101 200401 200501 200601 200701 
##      6      3      3      6      3      3      6      3      6      3 
## 200901 210302 210402 210501 210502 210601 210800 211003 211103 211701 
##      3      6      3      6      3      3      6      6      6      3 
## 211901 212001 220101 220202 220301 220401 220701 220909 221001 221301 
##      3      3      6      6      6      3      6      3      3      3 
## 221401 222000 222201 230201 230301 230901 231101 231301 240101 240201 
##      3      3      6      3      6      3      6      6      3      3 
## 240401 240801 240901 241001 241101 241701 250109 250201 250301 250401 
##      6      3      6      6      6      3      6      3      6      6 
## 250701 250901 251101 251400 251501 251601 260101 260401 260501 260801 
##      3      6      3      3      3      6      6      3      3      6 
## 260803 260901 261001 261101 261201 261301 261313 261401 261501 261600 
##      6      3      3      6      3      3      6      3      3      3 
## 261701 261801 261901 262001 270100 270301 270601 270701 271102 280100 
##      6      6      6      3      6      3      3      6      6      6 
## 280201 280202 280203 280204 280205 280206 280207 280208 280209 280210 
##      3      6      6      3      6      6      3      6      6      3 
## 280211 280212 280213 280214 280215 280216 280217 280218 280219 280220 
##      3      3      6      3      6      3      6      3      3      3 
## 280221 280222 280223 280224 280225 280226 280227 280229 280230 280231 
##      6      6      3      3      6      3      6      6      6      3 
## 280251 280252 280253 280300 280401 280402 280403 280404 280405 280406 
##      3      3      3      6      3      3      6      3      6      3 
## 280407 280409 280410 280411 280501 280502 280503 280504 280506 280515 
##      6      6      3      6      6      3      6      6      6      6 
## 280517 280518 280521 280522 280523 300000 400000 400301 400400 400601 
##      6      3      3      3      6      3      3      6      6      3 
## 400701 400800 400900 401001 401201 401301 401501 410401 410601 411101 
##      3      3      3      6      3      6      6      3      3      3 
## 411501 411504 411603 411701 411800 411902 412000 412201 412300 412801 
##      3      6      6      6      6      3      3      6      3      6 
## 412901 412902 420101 420303 420401 420411 420501 420601 420701 420702 
##      3      6      3      3      6      6      3      6      3      6 
## 420807 420901 421001 421101 421201 421501 421504 421601 421800 421902 
##      3      6      6      3      6      3      3      3      6      6 
## 430300 430501 430700 430901 431101 431201 431301 431401 431701 440102 
##      3      3      6      6      6      6      3      3      6      6 
## 440201 440301 440401 440601 440901 441000 441101 441201 441301 441600 
##      6      6      3      3      6      3      6      6      3      6 
## 441800 441903 442101 442111 442115 450101 450607 450704 450801 451001 
##      3      3      6      3      6      6      6      6      3      6 
## 460102 460500 460701 460801 460901 461300 461801 461901 462001 470202 
##      6      6      6      3      3      3      3      6      6      3 
## 470501 470801 470901 471101 471201 471400 471601 471701 472001 472202 
##      6      6      3      3      3      6      3      6      6      3 
## 472506 480101 480102 480401 480404 480503 480601 490101 490202 490301 
##      6      3      6      6      3      3      6      3      6      6 
## 490501 490601 490801 490804 491200 491302 491401 491501 491700 500000 
##      3      3      3      3      3      6      6      6      6      3 
## 500101 500108 500201 500301 500304 500308 500401 500402 510101 510201 
##      3      6      6      6      6      6      3      3      3      3 
## 510401 510501 511101 511201 511301 511602 511901 512001 512101 512201 
##      6      6      3      6      6      3      3      3      6      6 
## 512300 512404 512501 512902 513102 520101 520302 520401 520601 520701 
##      6      6      6      3      3      6      3      6      6      6 
## 521200 521301 521401 521701 521800 522001 522101 530101 530202 530301 
##      3      3      6      3      6      3      3      6      3      6 
## 530501 530515 530600 540801 540901 541001 541102 541201 541401 550101 
##      3      6      3      6      3      6      3      3      6      6 
## 550301 560501 560603 560701 561006 570101 570201 570302 570401 570603 
##      3      3      6      6      3      3      3      6      3      3 
## 571000 571502 571800 571901 572301 572702 572901 573002 580101 580102 
##      3      6      6      6      6      6      6      3      6      3 
## 580103 580104 580105 580106 580107 580109 580201 580203 580205 580206 
##      6      3      3      6      6      3      3      6      6      6 
## 580207 580208 580209 580211 580212 580224 580232 580233 580234 580235 
##      6      3      6      6      3      6      6      6      3      6 
## 580301 580302 580303 580304 580305 580306 580401 580402 580403 580404 
##      3      3      3      3      3      6      6      3      3      6 
## 580405 580406 580410 580413 580501 580502 580503 580504 580505 580506 
##      6      6      3      3      3      6      6      3      3      6 
## 580507 580509 580512 580513 580514 580601 580602 580701 580801 580805 
##      3      6      3      3      6      6      6      6      6      6 
## 580901 580902 580903 580905 580906 580909 580910 580912 580913 580917 
##      3      3      3      3      6      3      3      3      6      6 
## 581002 581004 581005 581010 581012 581015 590501 590801 590901 591201 
##      6      3      3      6      6      6      6      6      3      3 
## 591301 591302 591401 591502 600000 600101 600301 600402 600601 600801 
##      6      6      3      6      3      6      6      3      3      6 
## 600903 610301 610501 610600 610801 610901 611001 620600 620803 620901 
##      3      3      6      6      6      3      3      6      6      3 
## 621001 621101 621201 621601 621801 622002 630101 630202 630300 630601 
##      6      6      3      3      6      3      6      3      6      6 
## 630701 630801 630902 630918 631201 640101 640502 640601 640701 640801 
##      3      3      6      6      3      3      6      6      6      6 
## 641001 641301 641401 641501 641610 641701 650101 650301 650501 650701 
##      3      6      3      3      3      3      3      3      6      6 
## 650801 650901 650902 651201 651402 651501 651503 660101 660102 660202 
##      3      6      3      3      3      6      6      3      6      6 
## 660203 660301 660302 660303 660401 660402 660403 660404 660405 660406 
##      6      3      3      6      3      6      6      6      6      3 
## 660407 660409 660501 660701 660801 660802 660805 660809 660900 661004 
##      6      3      6      3      6      3      6      6      6      3 
## 661100 661201 661301 661401 661402 661500 661601 661800 661901 661904 
##      6      3      6      3      3      3      6      3      6      3 
## 661905 662001 662101 662200 662300 662401 662402 670201 670401 671002 
##      3      6      6      3      3      3      3      3      6      6 
## 671201 671501 680601 680801 700000 
##      3      6      3      6      3
#table(students_wt$COUNTY)
#table(students_wt$AREA.NAME)
#table(students_wt$REGION)
#table(students_wt$SCHOOL.YEARS)
#table(students_wt$NO..OVERWEIGHT)
#table(students_wt$PCT.OVERWEIGHT)
#table(students_wt$NO..OBESE)
#table(students_wt$PCT.OBESE)
#table(students_wt$NO..OVERWEIGHT.OR.OBESE)
#table(students_wt$PCT.OVERWEIGHT.OR.OBESE)  
#table(students_wt$GRADE.LEVEL)
#table(students_wt$AREA.TYPE)
#table(students_wt$STREET.ADDRESS)
#table(students_wt$CITY)
#table(students_wt$STATE)  
#table(students_wt$ZIP.CODE)    
#table(students_wt$Location.1)
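LOCATION.CODE is an integer identifier, so its table mainly shows that each location appears 3 or 6 times. For a factor such as REGION, counts together with percentages are usually more informative. A minimal sketch with a hypothetical helper, freq_table(), sorted from most to least frequent; the toy vector is made up so the block runs on its own.

```r
# Counts and percentages for one categorical variable, most frequent level first.
# useNA = "ifany" keeps a row for missing values if there are any.
freq_table <- function(x) {
  counts <- sort(table(x, useNA = "ifany"), decreasing = TRUE)
  data.frame(
    level = names(counts),
    n     = as.integer(counts),
    pct   = round(100 * as.numeric(counts) / sum(counts), 1),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy vector (NOT from the SWSCR file).
toy_region <- factor(c("HUDSON VALLEY", "FINGER LAKES", "HUDSON VALLEY", "CENTRAL NEW YORK"))
ft <- freq_table(toy_region)
print(ft)
```

On the real data, `freq_table(students_wt$REGION)` returns one row per region.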
4. Determine the frequency of one categorical variable broken down by a different categorical variable.
#table(students_wt$COUNTY, students_wt$AREA.NAME)
#table(students_wt$COUNTY, students_wt$REGION)
#table(students_wt$COUNTY, students_wt$SCHOOL.YEARS)
table(students_wt$COUNTY, students_wt$GRADE.LEVEL)
##               
##                DISTRICT TOTAL ELEMENTARY MIDDLE/HIGH
##   ALBANY                   19         19          19
##   ALLEGANY                 19         19          19
##   BROOME                   19         19          19
##   CATTARAUGUS              19         19          19
##   CAYUGA                   11         11          11
##   CHAUTAUQUA               28         28          28
##   CHEMUNG                   5          5           5
##   CHENANGO                 13         13          13
##   CLINTON                  13         13          13
##   COLUMBIA                  9          9           9
##   CORTLAND                  9          9           9
##   DELAWARE                 19         19          19
##   DUTCHESS                 21         21          21
##   ERIE                     43         43          43
##   ESSEX                    17         17          17
##   FRANKLIN                 11         11          11
##   FULTON                   11         11          11
##   GENESEE                  13         13          13
##   GREENE                   10         10          10
##   HAMILTON                  9          9           9
##   HERKIMER                 17         17          17
##   JEFFERSON                17         17          17
##   LEWIS                     9          9           9
##   LIVINGSTON               13         13          13
##   M/A                       0          0           1
##   MADISON                  16         16          16
##   MONROE                   27         27          27
##   MONTGOMERY                9          9           9
##   N/A                       8          8           7
##   NASSAU                   86         86          86
##   NIAGARA                  16         16          16
##   ONEIDA                   23         23          23
##   ONONDAGA                 28         28          28
##   ONTARIO                  15         15          15
##   ORANGE                   26         26          26
##   ORLEANS                  10         10          10
##   OSWEGO                   15         15          15
##   OTSEGO                   19         19          19
##   PUTNAM                   10         10          10
##   RENSSELAER               19         19          19
##   ROCKLAND                 14         14          14
##   SARATOGA                 19         19          19
##   SCHENECTADY              10         10          10
##   SCHOHARIE                10         10          10
##   SCHUYLER                  4          4           4
##   SENECA                    7          7           7
##   ST. LAWRENCE             27         27          27
##   STEUBEN                  21         21          21
##   SUFFOLK                 105        105         105
##   SULLIVAN                 14         14          14
##   TIOGA                    10         10          10
##   TOMPKINS                 10         10          10
##   ULSTER                   15         15          15
##   WARREN                   15         15          15
##   WASHINGTON               17         17          17
##   WAYNE                    17         17          17
##   WESTCHESTER              61         61          61
##   WYOMING                   9          9           9
##   YATES                     4          4           4
#table(students_wt$COUNTY, students_wt$CITY)
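The COUNTY by GRADE.LEVEL table is balanced (equal counts per grade level) except for the "M/A" and "N/A" levels, which appear to be data-entry codes rather than county names. Adding margins and row percentages makes such imbalances easier to spot. The sketch below assumes a hypothetical helper, cross_tab(), built on base R's table(), addmargins(), and prop.table(); the toy rows are made up.

```r
# Two-way frequency table with totals, plus row percentages.
cross_tab <- function(df, row, col) {
  tab <- table(df[[row]], df[[col]], dnn = c(row, col))
  list(
    counts  = addmargins(tab),                             # adds "Sum" row and column
    row_pct = round(100 * prop.table(tab, margin = 1), 1)  # each row sums to about 100
  )
}

# Hypothetical toy rows (NOT from the SWSCR file).
toy <- data.frame(
  COUNTY      = c("ALBANY", "ALBANY", "N/A", "N/A", "N/A"),
  GRADE.LEVEL = c("ELEMENTARY", "MIDDLE/HIGH", "ELEMENTARY", "ELEMENTARY", "MIDDLE/HIGH")
)
res <- cross_tab(toy, "COUNTY", "GRADE.LEVEL")
print(res$counts)
print(res$row_pct)
```

With the real data, `cross_tab(students_wt, "COUNTY", "GRADE.LEVEL")$row_pct` should show about 33.3% in every column for a balanced county, so rows that deviate stand out.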
5. Create a graph for a single numeric variable.
boxplot(students_wt$ZIP.CODE)
#hist(students_wt$ZIP.CODE)
#boxplot(students_wt$NO..OVERWEIGHT)
#hist(students_wt$NO..OVERWEIGHT)
#boxplot(students_wt$NO..OVERWEIGHT.OR.OBESE)
#hist(students_wt$NO..OVERWEIGHT.OR.OBESE)

#http://www.ceb-institute.org/bbs/wp-content/uploads/2011/09/handout_ggplot2.pdf
# ggplot2:
#qplot(ZIP.CODE, data=students_wt, binwidth= 20)
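# Below is an example that also uses a non-numeric factor variable, REGION, as the fill colour.
# binwidth is a histogram parameter, so geom_histogram() is used instead of geom = "bar".
ggplot(students_wt, aes(x = ZIP.CODE, fill = REGION)) + geom_histogram(binwidth = 20)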
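ZIP.CODE is numeric in storage, but its distribution mostly reflects geography. The count columns are better candidates for a single-variable graph, though they are strongly right-skewed (for NO..OBESE the summary shows a median of 57 against a mean of 247.7), so a log scale helps. A base-R sketch using school-district rows only; plot_log_counts() is a hypothetical helper and the toy rows are made up.

```r
# Histogram of a skewed count column on a log10 scale, school-district rows only.
plot_log_counts <- function(df, col) {
  x <- df[[col]][which(df$AREA.TYPE == "SCHOOL DISTRICT")]
  x <- x[!is.na(x) & x > 0]  # log10 needs positive, non-missing values
  hist(log10(x),
       main = paste("School districts:", col),
       xlab = paste0("log10(", col, ")"),
       col  = "grey")        # hist() returns the "histogram" object invisibly
}

# Hypothetical toy rows (NOT from the SWSCR file); the COUNTY row is dropped.
toy <- data.frame(
  AREA.TYPE = c("SCHOOL DISTRICT", "SCHOOL DISTRICT", "SCHOOL DISTRICT", "COUNTY"),
  NO..OBESE = c(10, 100, 1000, 9999)
)
h <- plot_log_counts(toy, "NO..OBESE")
```

On the loaded data, `plot_log_counts(students_wt, "NO..OBESE")` draws the district-level distribution.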
6. Create a scatterplot of two numeric variables.
#plot(students_wt$ZIP.CODE ~ students_wt$NO..OVERWEIGHT)
plot(ZIP.CODE ~ NO..OVERWEIGHT.OR.OBESE, data = students_wt)
#qplot(ZIP.CODE, NO..OVERWEIGHT, colour = REGION, data = students_wt)
#qplot(ZIP.CODE, NO..OVERWEIGHT, data= students_wt)
#qplot(ZIP.CODE, NO..OVERWEIGHT.OR.OBESE, data= students_wt)
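A scatterplot of two count variables, such as NO..OVERWEIGHT against NO..OBESE, is easier to interpret than one involving ZIP.CODE. The sketch below uses school-district rows, log-log axes, and reports a Spearman rank correlation; the rank correlation is a choice made here because the counts are skewed, not something the original analysis computed. scatter_counts() is a hypothetical helper and the toy rows are made up.

```r
# Log-log scatterplot of two count columns for school-district rows,
# returning the Spearman rank correlation of the plotted points.
scatter_counts <- function(df, xcol, ycol) {
  d <- df[which(df$AREA.TYPE == "SCHOOL DISTRICT"), c(xcol, ycol)]
  d <- d[complete.cases(d) & d[[xcol]] > 0 & d[[ycol]] > 0, ]  # log axes need positives
  plot(d[[xcol]], d[[ycol]], log = "xy", xlab = xcol, ylab = ycol,
       main = "School districts (log-log axes)")
  cor(d[[xcol]], d[[ycol]], method = "spearman")
}

# Hypothetical toy rows (NOT from the SWSCR file); the NA row and COUNTY row are dropped.
toy <- data.frame(
  AREA.TYPE      = c(rep("SCHOOL DISTRICT", 4), "COUNTY"),
  NO..OVERWEIGHT = c(5, 10, 20, 40, 9999),
  NO..OBESE      = c(6, 12, 25, NA, 9999)
)
rho <- scatter_counts(toy, "NO..OVERWEIGHT", "NO..OBESE")
```

On the real data, `scatter_counts(students_wt, "NO..OVERWEIGHT", "NO..OBESE")` plots one point per district-grade row.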