For this homework, I will use the California Test Score Data, loaded from a local CSV file:
data <- read.csv('/Users/sue/Downloads/CASchools.csv')
The dataset has 420 observations and 15 variables. Both counts are computed with R code and displayed inline.
dim(data)
## [1] 420 15
str(data)
## 'data.frame': 420 obs. of 15 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ district : int 75119 61499 61549 61457 61523 62042 68536 63834 62331 67306 ...
## $ school : chr "Sunol Glen Unified" "Manzanita Elementary" "Thermalito Union Elementary" "Golden Feather Union Elementary" ...
## $ county : chr "Alameda" "Butte" "Butte" "Butte" ...
## $ grades : chr "KK-08" "KK-08" "KK-08" "KK-08" ...
## $ students : int 195 240 1550 243 1335 137 195 888 379 2247 ...
## $ teachers : num 10.9 11.1 82.9 14 71.5 ...
## $ calworks : num 0.51 15.42 55.03 36.48 33.11 ...
## $ lunch : num 2.04 47.92 76.32 77.05 78.43 ...
## $ computer : int 67 101 169 85 171 25 28 66 35 0 ...
## $ expenditure: num 6385 5099 5502 7102 5236 ...
## $ income : num 22.69 9.82 8.98 8.98 9.08 ...
## $ english : num 0 4.58 30 0 13.86 ...
## $ read : num 692 660 636 652 642 ...
## $ math : num 690 662 651 644 640 ...
###summary of the data
head(data)
## X district school county grades students teachers
## 1 1 75119 Sunol Glen Unified Alameda KK-08 195 10.90
## 2 2 61499 Manzanita Elementary Butte KK-08 240 11.15
## 3 3 61549 Thermalito Union Elementary Butte KK-08 1550 82.90
## 4 4 61457 Golden Feather Union Elementary Butte KK-08 243 14.00
## 5 5 61523 Palermo Union Elementary Butte KK-08 1335 71.50
## 6 6 62042 Burrel Union Elementary Fresno KK-08 137 6.40
## calworks lunch computer expenditure income english read math
## 1 0.5102 2.0408 67 6384.911 22.690001 0.000000 691.6 690.0
## 2 15.4167 47.9167 101 5099.381 9.824000 4.583333 660.5 661.9
## 3 55.0323 76.3226 169 5501.955 8.978000 30.000002 636.3 650.9
## 4 36.4754 77.0492 85 7101.831 8.978000 0.000000 651.9 643.5
## 5 33.1086 78.4270 171 5235.988 9.080333 13.857677 641.8 639.9
## 6 12.3188 86.9565 25 5580.147 10.415000 12.408759 605.7 605.4
summary(data)
## X district school county
## Min. : 1.0 Min. :61382 Length:420 Length:420
## 1st Qu.:105.8 1st Qu.:64308 Class :character Class :character
## Median :210.5 Median :67760 Mode :character Mode :character
## Mean :210.5 Mean :67473
## 3rd Qu.:315.2 3rd Qu.:70419
## Max. :420.0 Max. :75440
## grades students teachers calworks
## Length:420 Min. : 81.0 Min. : 4.85 Min. : 0.000
## Class :character 1st Qu.: 379.0 1st Qu.: 19.66 1st Qu.: 4.395
## Mode :character Median : 950.5 Median : 48.56 Median :10.520
## Mean : 2628.8 Mean : 129.07 Mean :13.246
## 3rd Qu.: 3008.0 3rd Qu.: 146.35 3rd Qu.:18.981
## Max. :27176.0 Max. :1429.00 Max. :78.994
## lunch computer expenditure income
## Min. : 0.00 Min. : 0.0 Min. :3926 Min. : 5.335
## 1st Qu.: 23.28 1st Qu.: 46.0 1st Qu.:4906 1st Qu.:10.639
## Median : 41.75 Median : 117.5 Median :5215 Median :13.728
## Mean : 44.71 Mean : 303.4 Mean :5312 Mean :15.317
## 3rd Qu.: 66.86 3rd Qu.: 375.2 3rd Qu.:5601 3rd Qu.:17.629
## Max. :100.00 Max. :3324.0 Max. :7712 Max. :55.328
## english read math
## Min. : 0.000 Min. :604.5 Min. :605.4
## 1st Qu.: 1.941 1st Qu.:640.4 1st Qu.:639.4
## Median : 8.778 Median :655.8 Median :652.5
## Mean :15.768 Mean :655.0 Mean :653.3
## 3rd Qu.:22.970 3rd Qu.:668.7 3rd Qu.:665.9
## Max. :85.540 Max. :704.0 Max. :709.5
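Next, we calculate the average number of students across the 420 observations, which is 2628.7928571. I used R code to compute this and display it inline. The formula for the mean is \(\bar{x} = \frac{\sum_{i=1}^n x_i}{n}\), where \(x_i\) is the number of students in observation \(i\) and \(n\) is the number of observations.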
##average number of students
mean(data$students)
## [1] 2628.793
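As a quick check of the mean formula, the sketch below applies it by hand to the first six `students` values shown in the `head(data)` output and compares the result with R's built-in `mean()`. The variable names `students_head`, `manual_mean`, and `builtin_mean` are introduced here only for illustration. The six values are a small excerpt, so their mean is not the dataset mean reported above.

```r
# First six values of `students`, copied from the head(data) output above
students_head <- c(195, 240, 1550, 243, 1335, 137)

# Mean from the definition: sum of the observations divided by their count
n <- length(students_head)
manual_mean <- sum(students_head) / n

# Built-in mean for comparison
builtin_mean <- mean(students_head)

print(manual_mean)
# all.equal() compares the two values with a small numerical tolerance
print(isTRUE(all.equal(manual_mean, builtin_mean)))
```

On the full dataset, the same comparison would be `sum(data$students) / nrow(data)` against `mean(data$students)`.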
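From the plots, we can see that the percentage of students qualifying for CalWorks (`calworks`) is negatively related to both math and reading scores: observations with a higher CalWorks percentage tend to have lower scores.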
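One way to put a number on this relationship is the Pearson correlation between `calworks` and each test score. The sketch below is an illustration rather than part of the original analysis: `calworks_correlations` is a hypothetical helper, and it is run only on the six rows visible in the `head(data)` output so that the example is self-contained. Six rows are far too few to support any conclusion; they only show how the call works.

```r
# First six rows of the relevant columns, copied from the head(data) output above
head_rows <- data.frame(
  calworks = c(0.5102, 15.4167, 55.0323, 36.4754, 33.1086, 12.3188),
  read     = c(691.6, 660.5, 636.3, 651.9, 641.8, 605.7),
  math     = c(690.0, 661.9, 650.9, 643.5, 639.9, 605.4)
)

# Hypothetical helper (not part of the original analysis):
# Pearson correlation between calworks and each test score
calworks_correlations <- function(df) {
  c(
    read = cor(df$calworks, df$read),
    math = cor(df$calworks, df$math)
  )
}

r_head <- calworks_correlations(head_rows)
print(r_head)
```

With the full data frame loaded, the same helper could be called as `calworks_correlations(data)`. A negative value for both entries would be consistent with the pattern described from the plots.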