library(curl)
## Using libcurl 7.64.1 with Schannel
teaching_ratings_url <- "https://raw.githubusercontent.com/moiyajosephs/R-Homework-2/main/TeachingRatings.csv"
teaching_ratings <- read.csv(curl(teaching_ratings_url))
summary(teaching_ratings)
##        X            minority              age           gender
##  Min.   :  1.0   Length:463         Min.   :29.00   Length:463
##  1st Qu.:116.5   Class :character   1st Qu.:42.00   Class :character
##  Median :232.0   Mode  :character   Median :48.00   Mode  :character
##  Mean   :232.0                      Mean   :48.37
##  3rd Qu.:347.5                      3rd Qu.:57.00
##  Max.   :463.0                      Max.   :73.00
##    credits              beauty                eval         division
##  Length:463         Min.   :-1.4504940   Min.   :2.100   Length:463
##  Class :character   1st Qu.:-0.6562689   1st Qu.:3.600   Class :character
##  Mode  :character   Median :-0.0680143   Median :4.000   Mode  :character
##                     Mean   : 0.0000001   Mean   :3.998
##                     3rd Qu.: 0.5456024   3rd Qu.:4.400
##                     Max.   : 1.9700230   Max.   :5.000
##     native             tenure             students       allstudents
##  Length:463         Length:463         Min.   :  5.00   Min.   :  8.00
##  Class :character   Class :character   1st Qu.: 15.00   1st Qu.: 19.00
##  Mode  :character   Mode  :character   Median : 23.00   Median : 29.00
##                                        Mean   : 36.62   Mean   : 55.18
##                                        3rd Qu.: 40.00   3rd Qu.: 60.00
##                                        Max.   :380.00   Max.   :581.00
##       prof
##  Min.   : 1.00
##  1st Qu.:20.00
##  Median :44.00
##  Mean   :45.43
##  3rd Qu.:70.50
##  Max.   :94.00
mean(teaching_ratings$age)
## [1] 48.36501
median(teaching_ratings$age)
## [1] 48
The mean age of the teachers in the data set is 48.36501, and the median age is 48.
mean(teaching_ratings$eval)
## [1] 3.998272
median(teaching_ratings$eval)
## [1] 4
The mean evaluation score is 3.998272, and the median is 4.
# Keep the first 20 rows and five columns: X, minority, age, gender (1-4) and eval (7).
teacher_stats <- teaching_ratings[1:20, c(1:4, 7)]
head(teacher_stats)
##   X minority age gender eval
## 1 1      yes  36 female  4.3
## 2 2       no  59   male  4.5
## 3 3       no  51   male  3.7
## 4 4       no  40 female  4.3
## 5 5       no  31 female  4.4
## 6 6       no  62   male  4.2
colnames(teacher_stats) <- c("ID", "teacher_minority", "teacher_age","sex","scores")
head(teacher_stats)
##   ID teacher_minority teacher_age    sex scores
## 1  1              yes          36 female    4.3
## 2  2               no          59   male    4.5
## 3  3               no          51   male    3.7
## 4  4               no          40 female    4.3
## 5  5               no          31 female    4.4
## 6  6               no          62   male    4.2
summary(teacher_stats)
##        ID        teacher_minority    teacher_age        sex
##  Min.   : 1.00   Length:20          Min.   :31.00   Length:20
##  1st Qu.: 5.75   Class :character   1st Qu.:36.75   Class :character
##  Median :10.50   Mode  :character   Median :45.50   Mode  :character
##  Mean   :10.50                      Mean   :44.75
##  3rd Qu.:15.25                      3rd Qu.:51.00
##  Max.   :20.00                      Max.   :62.00
##      scores
##  Min.   :2.900
##  1st Qu.:3.625
##  Median :4.000
##  Mean   :3.920
##  3rd Qu.:4.300
##  Max.   :4.500
Taking a subset of the rows and columns and running summary() again shows that the statistics of the data set change. The original data frame has 463 rows, while the subset keeps only the first 20. With fewer data points, the statistics can vary and may not represent the complete data set. This is visible when the mean and median of the attributes from step 1, computed on the original data set, are compared with the same statistics on the reduced data.
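One way to keep a subset from depending on how the rows happen to be ordered is to draw the rows at random instead of taking the first 20. The sketch below is not part of the analysis above: the helper `sample_rows()` and the stand-in data frame `demo` are hypothetical names, used only so the example runs on its own.

```r
# Hypothetical helper (not part of the original analysis):
# draw n rows at random, with a seed so the draw is reproducible.
sample_rows <- function(df, n, seed = 123) {
  set.seed(seed)
  df[sample(nrow(df), n), , drop = FALSE]
}

# Stand-in data so the sketch is self-contained; these are NOT the real ratings.
demo <- data.frame(id = 1:100, value = 1:100)

first20  <- demo[1:20, ]
random20 <- sample_rows(demo, 20)

# The first 20 rows of ordered data give a mean far from the full mean,
# while a random draw is not tied to the row order.
c(full     = mean(demo$value),
  first20  = mean(first20$value),
  random20 = mean(random20$value))
```

With the real data, the equivalent call would be `sample_rows(teaching_ratings, 20)`. A random subset of 20 rows will still differ from the full data set, only without a systematic dependence on row position.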
mean(teacher_stats$teacher_age)
## [1] 44.75
median(teacher_stats$teacher_age)
## [1] 45.5
Further, for the age attribute, the mean changed from 48.36501 to 44.75 and the median changed from 48 to 45.5.
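The subset values can be checked by hand. The sketch below copies the 20 ages as they appear in the 20-row printout of `teacher_stats` further down; the vector names are illustrative only.

```r
# Ages of the first 20 rows, copied from the printed subset.
ages_first20 <- c(36, 59, 51, 40, 31, 62, 33, 51, 33, 47,
                  35, 37, 42, 49, 37, 45, 56, 48, 46, 57)

# Mean: the ages sum to 895, and 895 / 20 = 44.75.
sum(ages_first20) / length(ages_first20)

# With an even number of values, the median is the average of the
# two middle values after sorting: (45 + 46) / 2 = 45.5.
sorted_ages <- sort(ages_first20)
(sorted_ages[10] + sorted_ages[11]) / 2
```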
mean(teacher_stats$scores)
## [1] 3.92
median(teacher_stats$scores)
## [1] 4
For the evaluation scores, which I renamed to scores, the mean changed from 3.998272 to 3.92. However, the median remained the same at 4.
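The changes reported so far can be collected in one small table. This is only a sketch built from the numbers printed above; the data frame name `comparison` and its column names are chosen for illustration.

```r
# Values reported above for the full data set (463 rows) and the subset (20 rows).
comparison <- data.frame(
  statistic = c("mean age", "median age", "mean score", "median score"),
  full      = c(48.36501, 48, 3.998272, 4),
  subset    = c(44.75, 45.5, 3.92, 4)
)

# Negative values mean the subset statistic is lower than the full-data one.
comparison$difference <- comparison$subset - comparison$full
comparison
```

Laid out this way, the age statistics shift by a few years while the score statistics barely move.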
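The teacher evaluations were rated on a scale of 1-5. I replaced each rating with a text label according to the range it falls in: from 1 up to (but not including) 2 is poor, from 2 up to 3 is fair, from 3 up to 4 is good, from 4 up to 5 is very good, and exactly 5 is excellent.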
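Before applying the rule to the data, it can help to try it on a few boundary values. The sketch below uses base R's `findInterval()`; the helper name `label_score()` is hypothetical and is not used elsewhere in this analysis.

```r
# Hypothetical helper: map a numeric rating to its text label.
label_score <- function(x) {
  labels <- c("poor", "fair", "good", "very good", "excellent")
  # findInterval() returns 1 for [1, 2), 2 for [2, 3), ..., and 5 for x >= 5.
  idx <- findInterval(x, c(1, 2, 3, 4, 5))
  # Ratings below 1 fall outside the scale, so they get NA instead of a label.
  idx[idx == 0] <- NA
  labels[idx]
}

label_score(c(1, 1.9, 2, 2.9, 3, 3.9, 4, 4.9, 5))
# "poor" "poor" "fair" "fair" "good" "good" "very good" "very good" "excellent"
```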
# Bin the numeric scores in a single pass. Assigning the labels one range at a
# time turns the column into character after the first assignment, so the later
# comparisons silently become string comparisons.
teacher_stats$scores <- as.character(cut(
  teacher_stats$scores,
  breaks = c(1, 2, 3, 4, 5, Inf),
  labels = c("poor", "fair", "good", "very good", "excellent"),
  right = FALSE  # intervals are [1, 2), [2, 3), [3, 4), [4, 5), [5, Inf)
))
head(teacher_stats, n=20)
##    ID teacher_minority teacher_age    sex    scores
## 1   1              yes          36 female very good
## 2   2               no          59   male very good
## 3   3               no          51   male      good
## 4   4               no          40 female very good
## 5   5               no          31 female very good
## 6   6               no          62   male very good
## 7   7               no          33 female very good
## 8   8               no          51 female      good
## 9   9               no          33 female very good
## 10 10               no          47   male      good
## 11 11              yes          35   male      good
## 12 12               no          37   male very good
## 13 13               no          42   male      good
## 14 14               no          49   male      good
## 15 15               no          37 female      fair
## 16 16               no          45   male very good
## 17 17               no          56 female very good
## 18 18               no          48   male      good
## 19 19               no          46 female very good
## 20 20               no          57 female      good
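Once the scores are text labels, a natural next step is to count how many teachers fall into each category. The sketch below copies the 20 labels from the printout above so that it runs on its own; the names `score_labels` and `score_factor` are illustrative.

```r
# Labels of the 20 rows, copied in order from the printout above.
score_labels <- c("very good", "very good", "good", "very good", "very good",
                  "very good", "very good", "good", "very good", "good",
                  "good", "very good", "good", "good", "fair",
                  "very good", "very good", "good", "very good", "good")

# Fixing the factor levels keeps the table in rating order
# and also shows the categories that have no rows.
score_factor <- factor(score_labels,
                       levels = c("poor", "fair", "good", "very good", "excellent"))
table(score_factor)
# poor 0, fair 1, good 8, very good 11, excellent 0
```

With the data frame itself, the same count would come from `table(factor(teacher_stats$scores, levels = c("poor", "fair", "good", "very good", "excellent")))`.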