Ayala Cohen’s Week 2 Assignment

For this week’s assignment, I used a CSV file from https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/sandwich/PublicSchools.csv

The file was read into R with the following code:

theURL <- "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/sandwich/PublicSchools.csv"
ps <- read.table(file=theURL, header=TRUE, sep=",")
ps
##                 X Expenditure Income
## 1         Alabama         275   6247
## 2          Alaska         821  10851
## 3         Arizona         339   7374
## 4        Arkansas         275   6183
## 5      California         387   8850
## 6        Colorado         452   8001
## 7     Connecticut         531   8914
## 8        Delaware         424   8604
## 9         Florida         316   7505
## 10        Georgia         265   6700
## 11         Hawaii         403   8380
## 12          Idaho         304   6813
## 13       Illinois         437   8745
## 14        Indiana         345   7696
## 15           Iowa         431   7873
## 16         Kansas         355   8001
## 17       Kentucky         260   6615
## 18      Louisiana         316   6640
## 19          Maine         327   6333
## 20       Maryland         427   8306
## 21  Massachusetts         427   8063
## 22       Michigan         466   8442
## 23      Minnesota         477   7847
## 24    Mississippi         259   5736
## 25       Missouri         274   7342
## 26        Montana         433   7051
## 27       Nebraska         294   7391
## 28         Nevada         359   9032
## 29  New Hampshire         279   7277
## 30     New Jersey         423   8818
## 31     New Mexico         388   6505
## 32       New York         447   8267
## 33 North Carolina         335   6607
## 34   North Dakota         311   7478
## 35           Ohio         322   7812
## 36       Oklahoma         320   6951
## 37         Oregon         397   7839
## 38   Pennsylvania         412   7733
## 39   Rhode Island         342   7526
## 40 South Carolina         315   6242
## 41   South Dakota         321   6841
## 42      Tennessee         268   6489
## 43          Texas         315   7697
## 44           Utah         417   6622
## 45        Vermont         353   6541
## 46       Virginia         356   7624
## 47     Washington         415   8450
## 48  Washington DC         428  10022
## 49  West Virginia         320   6456
## 50      Wisconsin          NA   7597
## 51        Wyoming         500   9096

Tasks in the assignment:

Task 1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(ps)
##           X       Expenditure        Income     
##  Alabama   : 1   Min.   :259.0   Min.   : 5736  
##  Alaska    : 1   1st Qu.:315.2   1st Qu.: 6670  
##  Arizona   : 1   Median :354.0   Median : 7597  
##  Arkansas  : 1   Mean   :373.3   Mean   : 7608  
##  California: 1   3rd Qu.:426.2   3rd Qu.: 8286  
##  Colorado  : 1   Max.   :821.0   Max.   :10851  
##  (Other)   :45   NA's   :1
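
The task also asks for the mean and median of two attributes on their own. A minimal sketch, using a five-row stand-in called ps_small (a hypothetical name, with rows copied from the listing above so the example runs without a download); on the full data the same calls would apply to ps$Expenditure and ps$Income:

```r
# Five rows copied from the listing above; Wisconsin's Expenditure is NA.
ps_small <- data.frame(
  X = c("Alabama", "Alaska", "Arizona", "Wisconsin", "Wyoming"),
  Expenditure = c(275, 821, 339, NA, 500),
  Income = c(6247, 10851, 7374, 7597, 9096)
)

# Without na.rm = TRUE, a single missing value makes the result NA.
exp_mean_raw <- mean(ps_small$Expenditure)                 # NA

# na.rm = TRUE drops the missing value before computing.
exp_mean   <- mean(ps_small$Expenditure, na.rm = TRUE)     # 483.75
exp_median <- median(ps_small$Expenditure, na.rm = TRUE)   # 419.5

# Income has no missing values, so na.rm is not needed here.
inc_mean   <- mean(ps_small$Income)                        # 8233
inc_median <- median(ps_small$Income)                      # 7597
```

The na.rm argument matters for Expenditure because of the missing Wisconsin value; summary() handles that automatically, but mean() and median() do not.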

Task 2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

psdf <- data.frame(ps)
psdf
##                 X Expenditure Income
## 1         Alabama         275   6247
## 2          Alaska         821  10851
## 3         Arizona         339   7374
## 4        Arkansas         275   6183
## 5      California         387   8850
## 6        Colorado         452   8001
## 7     Connecticut         531   8914
## 8        Delaware         424   8604
## 9         Florida         316   7505
## 10        Georgia         265   6700
## 11         Hawaii         403   8380
## 12          Idaho         304   6813
## 13       Illinois         437   8745
## 14        Indiana         345   7696
## 15           Iowa         431   7873
## 16         Kansas         355   8001
## 17       Kentucky         260   6615
## 18      Louisiana         316   6640
## 19          Maine         327   6333
## 20       Maryland         427   8306
## 21  Massachusetts         427   8063
## 22       Michigan         466   8442
## 23      Minnesota         477   7847
## 24    Mississippi         259   5736
## 25       Missouri         274   7342
## 26        Montana         433   7051
## 27       Nebraska         294   7391
## 28         Nevada         359   9032
## 29  New Hampshire         279   7277
## 30     New Jersey         423   8818
## 31     New Mexico         388   6505
## 32       New York         447   8267
## 33 North Carolina         335   6607
## 34   North Dakota         311   7478
## 35           Ohio         322   7812
## 36       Oklahoma         320   6951
## 37         Oregon         397   7839
## 38   Pennsylvania         412   7733
## 39   Rhode Island         342   7526
## 40 South Carolina         315   6242
## 41   South Dakota         321   6841
## 42      Tennessee         268   6489
## 43          Texas         315   7697
## 44           Utah         417   6622
## 45        Vermont         353   6541
## 46       Virginia         356   7624
## 47     Washington         415   8450
## 48  Washington DC         428  10022
## 49  West Virginia         320   6456
## 50      Wisconsin          NA   7597
## 51        Wyoming         500   9096
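
Note that data.frame(ps) copies every row and column. A minimal sketch of how a true subset could be taken, again on a five-row stand-in; the names ps_small, ps_sub and ps_sub2 and the 7500 income cutoff are assumptions chosen only for illustration:

```r
# Five rows copied from the listing above.
ps_small <- data.frame(
  X = c("Alabama", "Alaska", "Arizona", "Wisconsin", "Wyoming"),
  Expenditure = c(275, 821, 339, NA, 500),
  Income = c(6247, 10851, 7374, 7597, 9096)
)

# Bracket indexing: rows where Income exceeds the cutoff, two columns only.
ps_sub <- ps_small[ps_small$Income > 7500, c("X", "Income")]

# The same selection written with subset().
ps_sub2 <- subset(ps_small, Income > 7500, select = c(X, Income))

ps_sub
```

Either form keeps Alaska, Wisconsin and Wyoming here and drops the Expenditure column.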

Task 3. Create new column names for the new data frame.

names(psdf) <- c("State", "State_Expenditures", "State_Income")
psdf
##             State State_Expenditures State_Income
## 1         Alabama                275         6247
## 2          Alaska                821        10851
## 3         Arizona                339         7374
## 4        Arkansas                275         6183
## 5      California                387         8850
## 6        Colorado                452         8001
## 7     Connecticut                531         8914
## 8        Delaware                424         8604
## 9         Florida                316         7505
## 10        Georgia                265         6700
## 11         Hawaii                403         8380
## 12          Idaho                304         6813
## 13       Illinois                437         8745
## 14        Indiana                345         7696
## 15           Iowa                431         7873
## 16         Kansas                355         8001
## 17       Kentucky                260         6615
## 18      Louisiana                316         6640
## 19          Maine                327         6333
## 20       Maryland                427         8306
## 21  Massachusetts                427         8063
## 22       Michigan                466         8442
## 23      Minnesota                477         7847
## 24    Mississippi                259         5736
## 25       Missouri                274         7342
## 26        Montana                433         7051
## 27       Nebraska                294         7391
## 28         Nevada                359         9032
## 29  New Hampshire                279         7277
## 30     New Jersey                423         8818
## 31     New Mexico                388         6505
## 32       New York                447         8267
## 33 North Carolina                335         6607
## 34   North Dakota                311         7478
## 35           Ohio                322         7812
## 36       Oklahoma                320         6951
## 37         Oregon                397         7839
## 38   Pennsylvania                412         7733
## 39   Rhode Island                342         7526
## 40 South Carolina                315         6242
## 41   South Dakota                321         6841
## 42      Tennessee                268         6489
## 43          Texas                315         7697
## 44           Utah                417         6622
## 45        Vermont                353         6541
## 46       Virginia                356         7624
## 47     Washington                415         8450
## 48  Washington DC                428        10022
## 49  West Virginia                320         6456
## 50      Wisconsin                 NA         7597
## 51        Wyoming                500         9096

Task 4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

psdf$State <- as.character(psdf$State)
psdf$State_Expenditures <- as.numeric(psdf$State_Expenditures)
psdf$State_Income <- as.numeric(psdf$State_Income)
summary(psdf)
##     State           State_Expenditures  State_Income  
##  Length:51          Min.   :259.0      Min.   : 5736  
##  Class :character   1st Qu.:315.2      1st Qu.: 6670  
##  Mode  :character   Median :354.0      Median : 7597  
##                     Mean   :373.3      Mean   : 7608  
##                     3rd Qu.:426.2      3rd Qu.: 8286  
##                     Max.   :821.0      Max.   :10851  
##                     NA's   :1
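
To print the mean and median for the same two attributes under their new names, something like the following sketch could be used. It works on a five-row stand-in (ps_small and psdf_small are hypothetical names) and checks that renaming the columns leaves the statistics unchanged:

```r
# Five rows copied from the listing above.
ps_small <- data.frame(
  X = c("Alabama", "Alaska", "Arizona", "Wisconsin", "Wyoming"),
  Expenditure = c(275, 821, 339, NA, 500),
  Income = c(6247, 10851, 7374, 7597, 9096)
)

# Copy the data and apply the new column names from Task 3.
psdf_small <- ps_small
names(psdf_small) <- c("State", "State_Expenditures", "State_Income")

old_cols <- c("Expenditure", "Income")
new_cols <- c("State_Expenditures", "State_Income")

# Mean and median of each numeric column, ignoring the missing value.
new_means   <- sapply(psdf_small[new_cols], mean, na.rm = TRUE)
new_medians <- sapply(psdf_small[new_cols], median, na.rm = TRUE)
old_means   <- sapply(ps_small[old_cols], mean, na.rm = TRUE)

new_means
new_medians

# TRUE: renaming columns does not alter the values they hold.
same_means <- identical(unname(old_means), unname(new_means))
```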

When I compare the two summaries, the main difference is in the first column. After I converted State with as.character(), summary() reports its length, class, and mode instead of counting how often each state name occurs. The statistics for the two numeric columns are identical, and both summaries list one NA in the expenditure column.

Task 5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

I renamed 1 character value in the “State” column of my data frame.

psdf$State[psdf$State == "Washington DC"] <- "Washington, District of Columbia"
psdf[48, ]
##                               State State_Expenditures State_Income
## 48 Washington, District of Columbia                428        10022

Then, I replaced the NA that was listed for Wisconsin in the “State_Expenditures” column:

psdf[50,2]
## [1] NA
psdf[50,2] <- 425
psdf[50,2]
## [1] 425

In this particular data set, it was hard to find a case where I could rename 3 or more matching values in a column, since each state name appears only once in the “State” column.
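
For a column that does contain repeated values, the same logical-index assignment used for "Washington DC" renames every match at once. A minimal sketch based on the task's own “e” to “excellent” example; the grades vector is made up for illustration and is not part of the PublicSchools data:

```r
# Hypothetical column with a repeated value "e" (not from the data set).
grades <- c("e", "g", "e", "p", "e", "g")

# Count the matches before renaming.
n_e <- sum(grades == "e")            # 3

# One assignment replaces every "e" in the vector.
grades[grades == "e"] <- "excellent"

grades
```

In a data frame the left-hand side would take the form df$col[df$col == "e"], matching the pattern used on psdf$State above.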