Homework 2

library(curl)

## Using libcurl 7.64.1 with Schannel

teaching_ratings_url <- "https://raw.githubusercontent.com/moiyajosephs/R-Homework-2/main/TeachingRatings.csv"
teaching_ratings <- read.csv(curl(teaching_ratings_url))

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes of your data.

Display a summary of the teaching ratings

summary(teaching_ratings)

##        X           minority              age           gender         
##  Min.   :  1.0   Length:463         Min.   :29.00   Length:463        
##  1st Qu.:116.5   Class :character   1st Qu.:42.00   Class :character  
##  Median :232.0   Mode  :character   Median :48.00   Mode  :character  
##  Mean   :232.0                      Mean   :48.37                     
##  3rd Qu.:347.5                      3rd Qu.:57.00                     
##  Max.   :463.0                      Max.   :73.00                     
##    credits              beauty                eval         division        
##  Length:463         Min.   :-1.4504940   Min.   :2.100   Length:463        
##  Class :character   1st Qu.:-0.6562689   1st Qu.:3.600   Class :character  
##  Mode  :character   Median :-0.0680143   Median :4.000   Mode  :character  
##                     Mean   : 0.0000001   Mean   :3.998                     
##                     3rd Qu.: 0.5456024   3rd Qu.:4.400                     
##                     Max.   : 1.9700230   Max.   :5.000                     
##     native             tenure             students       allstudents    
##  Length:463         Length:463         Min.   :  5.00   Min.   :  8.00  
##  Class :character   Class :character   1st Qu.: 15.00   1st Qu.: 19.00  
##  Mode  :character   Mode  :character   Median : 23.00   Median : 29.00  
##                                        Mean   : 36.62   Mean   : 55.18  
##                                        3rd Qu.: 40.00   3rd Qu.: 60.00  
##                                        Max.   :380.00   Max.   :581.00  
##       prof      
##  Min.   : 1.00  
##  1st Qu.:20.00  
##  Median :44.00  
##  Mean   :45.43  
##  3rd Qu.:70.50  
##  Max.   :94.00

The mean and median of the age attribute

mean(teaching_ratings$age)

## [1] 48.36501

median(teaching_ratings$age)

## [1] 48

The mean age of the teachers in the data set is 48.36501, and the median age is 48.

The mean and the median of the evaluation attribute

mean(teaching_ratings$eval)

## [1] 3.998272

median(teaching_ratings$eval)

## [1] 4

The mean evaluation score is 3.998272, and the median is 4.
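Both statistics can also be collected for several numeric columns in one call. The following is a minimal sketch, assuming a small stand-in data frame (`ratings_demo`, a name introduced here) that holds only the first five `age` and `eval` values from the data rather than the full download:

```r
# Stand-in data frame: the first five age and eval values from the data set.
# ratings_demo is a hypothetical name used only for this sketch.
ratings_demo <- data.frame(
  age  = c(36, 59, 51, 40, 31),
  eval = c(4.3, 4.5, 3.7, 4.3, 4.4)
)

# Apply mean() and median() to every column; the result is a matrix with
# one row per statistic and one column per attribute.
center_stats <- sapply(
  ratings_demo,
  function(x) c(mean = mean(x), median = median(x))
)
print(center_stats)
```

With the real data, the same call could be pointed at `teaching_ratings[c("age", "eval")]`.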

2. Create a new data frame with a subset of the columns AND rows.

# Keep the first 20 rows and five columns: X, minority, age, gender (1:4) and eval (7)
teacher_stats <- teaching_ratings[1:20, c(1:4, 7)]
head(teacher_stats)

##   X minority age gender eval
## 1 1      yes  36 female  4.3
## 2 2       no  59   male  4.5
## 3 3       no  51   male  3.7
## 4 4       no  40 female  4.3
## 5 5       no  31 female  4.4
## 6 6       no  62   male  4.2

3. Create new column names for each column in the new data frame created in step 2.

colnames(teacher_stats) <- c("ID", "teacher_minority", "teacher_age","sex","scores")
head(teacher_stats)

##   ID teacher_minority teacher_age    sex scores
## 1  1              yes          36 female    4.3
## 2  2               no          59   male    4.5
## 3  3               no          51   male    3.7
## 4  4               no          40 female    4.3
## 5  5               no          31 female    4.4
## 6  6               no          62   male    4.2
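Assigning a full vector to `colnames()` depends on the column order. As a hedged alternative, the sketch below renames columns by their old names instead, using a small stand-in data frame (`demo`) and a lookup vector (`new_names`), both hypothetical names introduced for illustration:

```r
# Stand-in data frame that reuses three of the original column names.
demo <- data.frame(X = 1:3, age = c(36, 59, 51), eval = c(4.3, 4.5, 3.7))

# Lookup vector: the element names are the old column names,
# the values are the new ones.
new_names <- c(X = "ID", age = "teacher_age", eval = "scores")

# match() finds the position of each old name, so the renaming still
# works if the columns appear in a different order.
positions <- match(names(new_names), names(demo))
names(demo)[positions] <- unname(new_names)
print(names(demo))
```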

4. Use the summary function to create an overview of your new data frame created in step 2. Then print the mean and median for the same two attributes. Please compare (i.e. tell me how the values changed and why).

Summary

summary(teacher_stats)

##        ID        teacher_minority    teacher_age        sex           
##  Min.   : 1.00   Length:20          Min.   :31.00   Length:20         
##  1st Qu.: 5.75   Class :character   1st Qu.:36.75   Class :character  
##  Median :10.50   Mode  :character   Median :45.50   Mode  :character  
##  Mean   :10.50                      Mean   :44.75                     
##  3rd Qu.:15.25                      3rd Qu.:51.00                     
##  Max.   :20.00                      Max.   :62.00                     
##      scores     
##  Min.   :2.900  
##  1st Qu.:3.625  
##  Median :4.000  
##  Mean   :3.920  
##  3rd Qu.:4.300  
##  Max.   :4.500

Taking a subset of the rows and columns changes the summary statistics. The original data frame has 463 rows, while the subset keeps only the first 20. With fewer data points, each value carries more weight, so the statistics can drift away from those of the complete data set. Because the 20 rows are the first ones in the file rather than a random sample, they may not represent the full data well. Comparing the mean and median of the same two attributes from step 1 shows how the values change when the data is reduced.

The mean and median of the subset's age attribute

mean(teacher_stats$teacher_age)

## [1] 44.75

median(teacher_stats$teacher_age)

## [1] 45.5

For the age attribute, the mean changed from 48.36501 to 44.75, and the median changed from 48 to 45.5.

The mean and median of the subset's scores/eval attribute

mean(teacher_stats$scores)

## [1] 3.92

median(teacher_stats$scores)

## [1] 4

For the evaluation scores, which I renamed to scores, the mean changed from 3.998272 to 3.92. However, the median remained the same at 4.
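The comparison in this step can be gathered into one small table. The sketch below is only an illustration: `center()` is a hypothetical helper, and `full_ages` is a made-up vector of 463 simulated ages, not the real TeachingRatings values, so its numbers will not match the ones reported above.

```r
# Hypothetical helper: mean and median of a numeric vector.
center <- function(x) c(mean = mean(x), median = median(x))

set.seed(42)  # fixed seed so the simulated values are reproducible

# Made-up stand-in for the age column: 463 simulated ages between 29 and 73.
full_ages <- round(runif(463, min = 29, max = 73))

# Positional subset of the first 20 values, as in step 2.
subset_ages <- full_ages[1:20]

# One row per data set, with the number of observations alongside.
comparison <- rbind(full = center(full_ages), first_20 = center(subset_ages))
comparison <- cbind(comparison, n = c(length(full_ages), length(subset_ages)))
print(comparison)
```

Putting `n` next to the statistics makes it easier to see that the second row rests on far fewer observations.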

5. Rename 3 distinct values in a column.

The teacher evaluations are rated on a scale of 1 to 5. I replaced the ratings within each range with a text label (1 to <2 = poor, 2 to <3 = fair, 3 to <4 = good, 4 to <5 = very good, 5 = excellent).

# Compare against a numeric copy: once a label is assigned, the scores column
# becomes character, and comparing it with numbers would compare strings.
score_values <- teacher_stats$scores
teacher_stats$scores[score_values >= 1 & score_values < 2] <- "poor"
teacher_stats$scores[score_values >= 2 & score_values < 3] <- "fair"
teacher_stats$scores[score_values >= 3 & score_values < 4] <- "good"
teacher_stats$scores[score_values >= 4 & score_values < 5] <- "very good"
teacher_stats$scores[score_values == 5] <- "excellent"



#teacher_stats1 = replace(teacher_stats$scores, teacher_stats$scores >= 3 & teacher_stats$scores < 4, "very good")
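The range-to-label mapping described in this step can also be written as a single call to base R's `cut()`. This is a sketch of an alternative, not the approach used above; the `scores` vector holds made-up values chosen to hit each range and its boundaries.

```r
# Made-up scores covering every range, including the boundary values.
scores <- c(1.5, 2.9, 3.0, 3.7, 4.0, 4.5, 5.0)

# right = FALSE makes each interval closed on the left, e.g. [3, 4),
# and the final break at Inf places a score of exactly 5 in "excellent".
rating <- cut(
  scores,
  breaks = c(1, 2, 3, 4, 5, Inf),
  labels = c("poor", "fair", "good", "very good", "excellent"),
  right  = FALSE
)
print(data.frame(scores, rating))
```

`cut()` returns a factor, so the labels keep their order from poor to excellent; wrapping the call in `as.character()` would give plain text instead.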

6. Display enough rows to see examples of all of steps 1-5 above. This means use a function to show me enough row values that I can see the changes.

head(teacher_stats, n=20)

##    ID teacher_minority teacher_age    sex    scores
## 1   1              yes          36 female very good
## 2   2               no          59   male very good
## 3   3               no          51   male      good
## 4   4               no          40 female very good
## 5   5               no          31 female very good
## 6   6               no          62   male very good
## 7   7               no          33 female very good
## 8   8               no          51 female      good
## 9   9               no          33 female very good
## 10 10               no          47   male      good
## 11 11              yes          35   male      good
## 12 12               no          37   male very good
## 13 13               no          42   male      good
## 14 14               no          49   male      good
## 15 15               no          37 female      fair
## 16 16               no          45   male very good
## 17 17               no          56 female very good
## 18 18               no          48   male      good
## 19 19               no          46 female very good
## 20 20               no          57 female      good
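A frequency table is a quick way to check the relabelled column. The sketch below copies the 20 labels from the output above into a vector (`score_labels`, a name introduced here) so that it runs on its own:

```r
# Labels copied, in row order, from the 20 rows displayed above.
score_labels <- c(
  "very good", "very good", "good", "very good", "very good",
  "very good", "very good", "good", "very good", "good",
  "good", "very good", "good", "good", "fair",
  "very good", "very good", "good", "very good", "good"
)

# Listing all five levels keeps the labels in rating order and shows
# categories with no rows as zero counts.
rating_levels <- c("poor", "fair", "good", "very good", "excellent")
label_counts <- table(factor(score_labels, levels = rating_levels))
print(label_counts)
```

In these 20 rows, 11 are very good, 8 are good, and 1 is fair; poor and excellent do not occur.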