Introduction

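This analysis explores student exam performance data (1,000 students, 8 variables) using R.
We load the data, summarize it, plot the distribution of reading scores and of math scores by gender, and fit a linear regression of math score on reading and writing scores.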

Load Data

library(readr)  # provides read_csv()
data <- read_csv("C:\\Users\\kalyani kumar\\OneDrive\\Desktop\\StudentsPerformance.csv")
## Rows: 1000 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): gender, race/ethnicity, parental level of education, lunch, test pr...
## dbl (3): math score, reading score, writing score
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 8
##   gender `race/ethnicity` parental level of educa…¹ lunch test preparation cou…²
##   <chr>  <chr>            <chr>                     <chr> <chr>                 
## 1 female group B          bachelor's degree         stan… none                  
## 2 female group C          some college              stan… completed             
## 3 female group B          master's degree           stan… none                  
## 4 male   group A          associate's degree        free… none                  
## 5 male   group C          some college              stan… none                  
## 6 female group B          associate's degree        stan… none                  
## # ℹ abbreviated names: ¹​`parental level of education`,
## #   ²​`test preparation course`
## # ℹ 3 more variables: `math score` <dbl>, `reading score` <dbl>,
## #   `writing score` <dbl>
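
The column names above contain spaces and a slash, so every later reference needs backticks. As a minimal sketch (not part of the original analysis), the names could be converted to syntactic ones right after loading. The object `sample_rows` is a hypothetical stand-in built from the first six rows printed above, and the underscore naming convention is an assumption.

```r
# Hypothetical stand-in for `data`: six rows copied from the head()/str() output.
sample_rows <- data.frame(
  gender          = c("female", "female", "female", "male", "male", "female"),
  `math score`    = c(72, 69, 90, 47, 76, 71),
  `reading score` = c(72, 90, 95, 57, 78, 83),
  `writing score` = c(74, 88, 93, 44, 75, 78),
  check.names = FALSE  # keep the original names with spaces
)

# Replace each run of non-alphanumeric characters with a single underscore
# (assumed convention; any syntactic naming scheme would do).
names(sample_rows) <- gsub("[^A-Za-z0-9]+", "_", names(sample_rows))

names(sample_rows)
mean(sample_rows$math_score)  # no backticks needed any more
```

The same `gsub()` call should work on the full tibble, although the rest of this report keeps the original backticked names.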

Data Summary

summary(data)
##     gender          race/ethnicity     parental level of education
##  Length:1000        Length:1000        Length:1000                
##  Class :character   Class :character   Class :character           
##  Mode  :character   Mode  :character   Mode  :character           
##                                                                   
##                                                                   
##                                                                   
##     lunch           test preparation course   math score     reading score   
##  Length:1000        Length:1000             Min.   :  0.00   Min.   : 17.00  
##  Class :character   Class :character        1st Qu.: 57.00   1st Qu.: 59.00  
##  Mode  :character   Mode  :character        Median : 66.00   Median : 70.00  
##                                             Mean   : 66.09   Mean   : 69.17  
##                                             3rd Qu.: 77.00   3rd Qu.: 79.00  
##                                             Max.   :100.00   Max.   :100.00  
##  writing score   
##  Min.   : 10.00  
##  1st Qu.: 57.75  
##  Median : 69.00  
##  Mean   : 68.05  
##  3rd Qu.: 79.00  
##  Max.   :100.00
str(data)
## spc_tbl_ [1,000 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ gender                     : chr [1:1000] "female" "female" "female" "male" ...
##  $ race/ethnicity             : chr [1:1000] "group B" "group C" "group B" "group A" ...
##  $ parental level of education: chr [1:1000] "bachelor's degree" "some college" "master's degree" "associate's degree" ...
##  $ lunch                      : chr [1:1000] "standard" "standard" "standard" "free/reduced" ...
##  $ test preparation course    : chr [1:1000] "none" "completed" "none" "none" ...
##  $ math score                 : num [1:1000] 72 69 90 47 76 71 88 40 64 38 ...
##  $ reading score              : num [1:1000] 72 90 95 57 78 83 95 43 64 60 ...
##  $ writing score              : num [1:1000] 74 88 93 44 75 78 92 39 67 50 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   gender = col_character(),
##   ..   `race/ethnicity` = col_character(),
##   ..   `parental level of education` = col_character(),
##   ..   lunch = col_character(),
##   ..   `test preparation course` = col_character(),
##   ..   `math score` = col_double(),
##   ..   `reading score` = col_double(),
##   ..   `writing score` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
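
For the five character columns, `summary()` only reports length, class, and mode. One possible way to get level counts instead is to convert those columns to factors first. The sketch below assumes a hypothetical `sample_rows` data frame holding the first six rows shown by `head(data)`; it is an illustration, not a step from the original analysis.

```r
# Hypothetical stand-in for `data`: two categorical columns from the first six rows.
sample_rows <- data.frame(
  gender = c("female", "female", "female", "male", "male", "female"),
  `test preparation course` = c("none", "completed", "none", "none", "none", "none"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Find the character columns and turn each one into a factor.
is_chr <- vapply(sample_rows, is.character, logical(1))
sample_rows[is_chr] <- lapply(sample_rows[is_chr], factor)

summary(sample_rows)       # now shows counts per level
table(sample_rows$gender)  # counts for a single column
```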

Reading Score Histogram

hist(data$`reading score`, col = "skyblue", main = "Reading Score Distribution")
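
A slightly more explicit variant of this histogram might fix the bin edges and label the x-axis. The sketch below is an assumption about presentation, not the original plot: the 10-point bins are a choice made here, and `reading_sample` is a hypothetical vector holding the first ten reading scores shown by `str(data)`.

```r
# Hypothetical sample: the first ten reading scores from the str() output.
reading_sample <- c(72, 90, 95, 57, 78, 83, 95, 43, 64, 60)

# Fixed 10-point bins from 0 to 100 (assumed), with an explicit axis label.
h <- hist(
  reading_sample,
  breaks = seq(0, 100, by = 10),
  col    = "skyblue",
  main   = "Reading Score Distribution",
  xlab   = "Reading score"
)

# hist() also returns the bin counts, which can be inspected directly.
h$counts
```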

Boxplot of Math Score by Gender

boxplot(`math score` ~ gender, data = data, col = "lightblue", main = "Math Score by Gender")
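
The center line of each box is the group median, which can also be computed directly. The following sketch uses a hypothetical `sample_rows` data frame with the first six rows from `head(data)`, so the values it prints describe only those six students, not the full data set.

```r
# Hypothetical stand-in for `data`: gender and math score for the first six rows.
sample_rows <- data.frame(
  gender       = c("female", "female", "female", "male", "male", "female"),
  `math score` = c(72, 69, 90, 47, 76, 71),
  check.names  = FALSE
)

# Median math score within each gender group.
medians <- tapply(sample_rows$`math score`, sample_rows$gender, median)
medians

# Group sizes, useful for judging how much weight each box deserves.
table(sample_rows$gender)
```

On the full data, the same `tapply()` call with `data` in place of `sample_rows` should give the medians drawn in the boxplot.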

Regression Example

model <- lm(`math score` ~ `reading score` + `writing score`, data = data)
summary(model)
## 
## Call:
## lm(formula = `math score` ~ `reading score` + `writing score`, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -23.8779  -6.1750   0.2693   6.0184  24.8727 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      7.52409    1.32823   5.665 1.93e-08 ***
## `reading score`  0.60129    0.06304   9.538  < 2e-16 ***
## `writing score`  0.24942    0.06057   4.118 4.14e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.667 on 997 degrees of freedom
## Multiple R-squared:  0.674,  Adjusted R-squared:  0.6733 
## F-statistic:  1031 on 2 and 997 DF,  p-value: < 2.2e-16
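
The coefficient table can be read as a prediction formula: predicted math score = 7.52409 + 0.60129 × reading score + 0.24942 × writing score. As a worked sketch, the code below applies the reported estimates by hand to the first student in the data (reading 72, writing 74, math 72). The variable names are illustrative; with the fitted model available, `predict(model)` would normally be used instead.

```r
# Coefficient estimates copied from the summary(model) output above.
coefs <- c(intercept = 7.52409, reading = 0.60129, writing = 0.24942)

# First student in the data set (values from the str() output).
reading <- 72
writing <- 74
math    <- 72

# Apply the fitted linear equation by hand.
predicted <- coefs[["intercept"]] +
  coefs[["reading"]] * reading +
  coefs[["writing"]] * writing

# Residual = observed minus predicted.
residual <- math - predicted

predicted
residual
```

For this student the prediction is about 69.3, so the residual of roughly 2.7 points sits well inside the residual standard error of 8.667 reported above.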