Practical Data Operations in R: Sorting, Grouping, and Assignment

Summary of Task

Write an R script to perform a practical student project distribution task.

Objectives

  1. Read Student Data from Excel
    • The dataset includes:
      • Student names
      • Exam roll numbers
      • CGPA
  2. Sort Students by CGPA (Descending Order)
    • Students are ranked from highest to lowest academic performance.
  3. Assign Students to Sequential Groups
    • Students are divided into groups where each group size equals the number of teachers (e.g., 23).
    • The final group may contain fewer students if the total number is not divisible by the number of teachers.
  4. Randomly Assign Teachers to Each Student
    • For full-size groups:
      • Each student is randomly assigned a unique teacher.
    • For the last (smaller) group:
      • Only the first m teachers (m < total number of teachers) are considered.
      • Teacher assignments are randomized within this subset.
  5. Create an Output-Ready Dataset
    • The final data frame includes:
      • Student information
      • Merit position (implicitly defined by CGPA sorting)
      • Group number
      • Assigned teacher
  6. Write Output to Excel
    • Export the final project distribution dataset to an Excel file.

Student–Teacher Assignment Flow

             ┌───────────────────────────────┐
             │   12 Students                 │
             │   Sorted by CGPA (High → Low) │
             └───────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────┐
             │   Divide into 4 Groups        │
             │   (Each group size = 3)       │
             └───────────────────────────────┘
                               │
                               ▼
    ┌──────────────────┬──────────────────┬──────────────────┬───────────────────┐
    │     Group 1      │     Group 2      │     Group 3      │     Group 4       │
    │   Top 3 Students │  Next 3 Students │  Next 3 Students │ Bottom 3 Students │
    └──────────────────┴──────────────────┴──────────────────┴───────────────────┘
                               │
                               ▼
             ┌───────────────────────────────┐
             │     FOR EACH GROUP            │
             │                               │
             │  • Take 3 students            │
             │  • Shuffle teachers A, B, C   │
             │  • Assign sequentially:       │
             │      Student 1 → Teacher 1    │
             │      Student 2 → Teacher 2    │
             │      Student 3 → Teacher 3    │
             └───────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────┐
             │        FINAL RESULT           │
             └───────────────────────────────┘
                               │
                               ▼
    ┌──────────────────┬──────────────────┬──────────────────┐
    │    Teacher A     │    Teacher B     │    Teacher C     │
    │ 1 Top + 1 Good   │ 1 Top + 1 Good   │ 1 Top + 1 Good   │
    │ +1 Avg +1 Weak   │ +1 Avg +1 Weak   │ +1 Avg +1 Weak   │
    └──────────────────┴──────────────────┴──────────────────┘
require(readxl)
library(writexl)
project.df<-read_excel("C:\\Users\\Dell\\OneDrive\\Desktop\\STATISTICS\\2nd Year\\R programming\\varsity\\Project_Distribution_Data.xlsx")

# View data structure
str(project.df)
head(project.df)

Step 1: Sort Data in descending order of CGPA

sorted_df <- project.df[order(project.df$CGPA, decreasing = TRUE), ]
sorted_df

Step 2: Vector of teacher initials (suppose we have currently 23 techers in active service)

teachers <- LETTERS[1:23]
teachers

Step 3: Assign students to groups sequentially, starting from 1, 2, 3… and repeating, based on the number of teachers

n_stu <- nrow(sorted_df)
n_tea <- length(teachers)
n_groups <- ceiling(n_stu / n_tea)
sorted_df$Group <- rep(1:n_groups, each = n_tea, length.out = n_stu)
sorted_df

Step 4: block-wise random assignment of teachers!

set.seed(209)
sorted_df$AssignedT <- NA
groups <- unique(sorted_df$Group)    # Get all unique group numbers


for (g in groups) {
  # Find indices of students in this group
  idx <- which(Group == g)
  n_stu_group <- length(idx)
  
  # Randomly assign teachers to these students
  # Since teachers repeat across groups, we can sample all teachers
  sorted_df$AssignedT[idx] <- sample(teachers)[1:n_stu_group]
}
  

Step 5: Check Results

head(sorted_df, 20)

# Check distribution
table(sorted_df$AssignedT)

# Verify no teacher gets too many students
teacher_counts <- table(sorted_df$AssignedT)
summary(as.numeric(teacher_counts))

Step 6: Save/ Write df to an Excel file

write_xlsx(sorted_df, "Assigned_Projects.xlsx")

R Data Analysis Project: Student Performance Dataset

Project Overview

You are given a dataset containing student information including Exam Roll numbers, Sex, CGPA, Group assignment, and Assigned Teacher. Your task is to prepare this data for analysis and perform a comprehensive statistical investigation using R.


Part 1: Data Preparation & Variable Type Conversion

Instructions

Before conducting any analysis, you must properly convert variable types in R. Explain your reasoning for each conversion in your code comments.

Variables to Convert

  1. ExamRoll
    • Currently: numeric
    • Convert to: character (or factor)
    • Reason: Exam roll numbers are identifiers, not numeric quantities for mathematical operations. Keeping them numeric could cause unintended statistical operations.
  2. Sex
    • Currently: character
    • Convert to: factor
    • Reason: This is a categorical variable with fixed levels (e.g., “M”, “F”).
  3. CGPA
    • Currently: numeric
    • Keep as: numeric
    • Reason: Continuous quantitative variable suitable for mathematical operations.
  4. Group
    • Currently: numeric
    • Convert to: factor
    • Reason: Represents class/section categories (1, 2, 3…), not measurable quantities.
  5. AssignedTeacher
    • Currently: character
    • Convert to: factor
    • Reason: Teacher names or initials represent categorical groups.

Part 2: Statistical Analysis

A. Descriptive Statistics

Perform the following analyses and report your findings:

  1. Univariate Statistics for CGPA
    • Mean
    • Median
    • Standard deviation
  2. Frequency Distributions
    • Sex
    • Group
    • AssignedTeacher
  3. Group Comparisons
    • CGPA by Sex (mean, SD, count)
    • CGPA by Group (mean, SD, count)

B. Data Visualization

Create the following plots with appropriate titles and labels:

  1. Distribution Plots
    • Histogram of CGPA
    • Density plot of CGPA
  2. Categorical Data Plots
    • Bar plot of Sex
    • Bar plot of Group
    • Bar plot of AssignedTeacher
  3. Comparative Plots
    • Boxplot of CGPA by Sex
    • Boxplot of CGPA by Group

C. Inferential Statistics

Perform the following hypothesis tests and interpret the results:

  1. Mean Comparisons
    • Independent samples t-test: CGPA by Sex
    • One-way ANOVA: CGPA by Group
    • One-way ANOVA: CGPA by AssignedTeacher
  2. Association Tests
    • Chi-square test: Sex vs Group
    • Chi-square test: Sex vs AssignedTeacher

D. Regression Analysis

  1. Simple Linear Regression
    • Model 1: CGPA ~ Sex
    • Model 2: CGPA ~ Group
  2. Multiple Regression
    • Full model: CGPA ~ Sex + Group + AssignedTeacher

Part 3: Diagnostic Checks & Assumptions

A. Outlier Detection

  • Detect outliers in CGPA using the IQR rule
  • Visualize outliers using boxplots

B. Regression Diagnostics

For the multiple regression model (CGPA ~ Sex + Group + AssignedTeacher), check:

  1. Linearity
    • Residuals vs Fitted plot
  2. Normality
    • Q–Q plot of residuals
    • Shapiro–Wilk test on residuals
  3. Homoscedasticity
    • Scale–Location plot
  4. Independence
    • Durbin–Watson test (if appropriate)
  5. Multicollinearity
    • Variance Inflation Factor (VIF)

Part 4: Submission Requirements

What to Submit

  1. R Script
    • Contains all code with clear and meaningful comments
  2. Report (R Markdown / Word / PDF)
    • Introduction and objectives
    • Data preparation summary
    • Key findings from each analysis
    • Interpretation of statistical results
    • Visualizations with captions
    • Conclusions and limitations

Code Organization

Your R script should follow this structure:

Code Organization Structure

Part 1: Data Preparation

  • Variable type conversion
  • Data cleaning (if required)

Part 2A: Descriptive Statistics

  • Summary statistics for CGPA
  • Frequency tables for categorical variables
  • Group-wise summaries

Part 2B: Data Visualization

  • Distribution plots (histogram, density plot)
  • Bar plots for categorical variables
  • Boxplots for group comparisons

Remaining Sections

  • Inferential statistics
  • Regression analysis
  • Diagnostic checks and assumption testing

Note: Each section in the R script should be clearly separated using comments and written in a logical, sequential manner for readability and grading.

Student Performance Analysis – Workflow Flowchart

             ┌───────────────────────────────────────────┐
             │            START PROJECT                  │
             │   Student Performance Dataset in R        │
             └───────────────────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────────────────┐
             │            LOAD PACKAGES                  │
             │     readxl, dplyr                         │
             └───────────────────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────────────────┐
             │              READ DATA                    │
             │  Import Excel Student Dataset             │
             └───────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │              PART 1: DATA PREPARATION                  │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Convert Variable Types                                 │
    │                                                        │
    │ ExamRoll         → character (identifier)              │
    │ Sex              → factor (categorical)                │
    │ CGPA             → numeric (continuous)                │
    │ Group            → factor (category)                   │
    │ AssignedTeacher  → factor (category)                   │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────────────────┐
             │          VERIFY STRUCTURE                 │
             │               str(data)                   │
             └───────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 2A: DESCRIPTIVE STATISTICS                 │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Univariate Statistics (CGPA)                           │
    │ Mean • Median • Standard Deviation                     │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Frequency Distributions                                │
    │ Sex • Group • AssignedTeacher                          │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Group-wise CGPA Comparison                             │
    │ CGPA by Sex  (Mean, SD, Count)                         │
    │ CGPA by Group (Mean, SD, Count)                        │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 2B: DATA VISUALIZATION                     │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Distribution Plots                                     │
    │ Histogram of CGPA                                      │
    │ Density Plot of CGPA                                   │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Categorical Bar Plots                                  │
    │ Sex • Group • AssignedTeacher                          │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Comparative Boxplots                                   │
    │ CGPA by Sex                                            │
    │ CGPA by Group                                          │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 2C: INFERENTIAL STATISTICS                 │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Mean Comparison Tests                                  │
    │ t-test      : CGPA ~ Sex                               │
    │ ANOVA       : CGPA ~ Group                             │
    │ ANOVA       : CGPA ~ AssignedTeacher                   │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Association Tests                                      │
    │ Chi-square: Sex vs Group                               │
    │ Chi-square: Sex vs AssignedTeacher                     │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 2D: REGRESSION ANALYSIS                    │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Simple Regression Models                               │
    │ CGPA ~ Sex                                             │
    │ CGPA ~ Group                                           │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Multiple Regression Model                              │
    │ CGPA ~ Sex + Group + AssignedTeacher                   │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 3A: OUTLIER DETECTION                      │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ IQR Rule for CGPA                                      │
    │ Identify and Count Outliers                            │
    │ Boxplot Visualization                                  │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │        PART 3B: REGRESSION DIAGNOSTICS                 │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
    ┌────────────────────────────────────────────────────────┐
    │ Diagnostic Checks                                      │
    │ Residual Plots                                         │
    │ Normality (Q-Q Plot, Shapiro-Wilk Test)                │
    └────────────────────────────────────────────────────────┘
                               │
                               ▼
             ┌───────────────────────────────────────────┐
             │            END OF ANALYSIS                │
             │      Results Ready for Report             │
             └───────────────────────────────────────────┘
# ===================================================================
# Student Performance Analysis - R Script
# ===================================================================

# Load required packages
library(readxl)
library(dplyr)

# Read data
data <- read_excel("C:/Users/Dell/OneDrive/Desktop/STATISTICS/2nd Year/R programming/varsity/Project_Distribution.xlsx")

# Part 1: Data Preparation
# =========================

# Convert variable types
# ExamRoll: Convert to character (identifiers, not for calculations)
data$ExamRoll <- as.character(data$ExamRoll)

# Sex: Convert to factor (categorical variable with fixed levels)
data$Sex <- as.factor(data$Sex)

# CGPA: Keep as numeric (continuous quantitative variable)
# No conversion needed

# Group: Convert to factor (class categories, not measurable quantities)
data$Group <- as.factor(data$Group)

# AssignedTeacher: Convert to factor (teacher names as categorical groups)
data$AssignedTeacher <- as.factor(data$AssignedTeacher)

# Check conversions
str(data)
tibble [80 × 5] (S3: tbl_df/tbl/data.frame)
 $ ExamRoll       : chr [1:80] "2316669" "2316632" "2316615" "2316643" ...
 $ Sex            : Factor w/ 2 levels "F","M": 1 2 2 2 1 1 2 2 2 1 ...
 $ CGPA           : num [1:80] 3.99 3.95 3.94 3.94 3.91 ...
 $ Group          : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
 $ AssignedTeacher: Factor w/ 23 levels "A","B","C","D",..: 2 15 3 6 13 20 11 9 5 7 ...
# Part 2A: Descriptive Statistics
# ===============================

# 1. Univariate Statistics for CGPA
cat("CGPA Statistics:\n")
CGPA Statistics:
cat("Mean:", mean(data$CGPA), "\n")
Mean: 3.415225 
cat("Median:", median(data$CGPA), "\n")
Median: 3.484 
cat("Standard Deviation:", sd(data$CGPA), "\n\n")
Standard Deviation: 0.4411517 
# 2. Frequency Distributions
cat("Frequency Tables:\n")
Frequency Tables:
cat("Sex:\n")
Sex:
print(table(data$Sex))

 F  M 
30 50 
cat("\nGroup:\n")

Group:
print(table(data$Group))

 1  2  3  4 
23 23 23 11 
cat("\nAssignedTeacher:\n")

AssignedTeacher:
print(table(data$AssignedTeacher))

 A  B  C  D  E FT  G  H  I  J  K  L MT  N  O  P  Q  R  S  T  U  V  W 
 4  4  4  4  4  4  4  4  4  4  4  3  3  3  3  3  3  3  3  3  3  3  3 
cat("\n")
# 3. Group Comparisons
cat("CGPA by Sex:\n")
CGPA by Sex:
print(data %>% group_by(Sex) %>% 
      summarise(Mean = mean(CGPA), SD = sd(CGPA), Count = n()))

cat("\nCGPA by Group:\n")

CGPA by Group:
print(data %>% group_by(Group) %>% 
      summarise(Mean = mean(CGPA), SD = sd(CGPA), Count = n()))

# Part 2B: Data Visualization
# ============================

# 1. Distribution Plots
# Histogram of CGPA
hist(data$CGPA, main = "Histogram of CGPA", xlab = "CGPA", ylab = "Frequency")


# Density plot of CGPA
plot(density(data$CGPA), main = "Density Plot of CGPA", xlab = "CGPA", ylab = "Density")


# 2. Categorical Data Plots
# Bar plot of Sex
barplot(table(data$Sex), main = "Count by Sex", xlab = "Sex", ylab = "Count")


# Bar plot of Group
barplot(table(data$Group), main = "Count by Group", xlab = "Group", ylab = "Count")


# Bar plot of AssignedTeacher
barplot(table(data$AssignedTeacher), main = "Count by Assigned Teacher", 
        xlab = "Teacher", ylab = "Count", las = 2)


# 3. Comparative Plots
# Boxplot of CGPA by Sex
boxplot(CGPA ~ Sex, data = data, main = "CGPA by Sex", xlab = "Sex", ylab = "CGPA")


# Boxplot of CGPA by Group
boxplot(CGPA ~ Group, data = data, main = "CGPA by Group", xlab = "Group", ylab = "CGPA")


# Part 2C: Inferential Statistics
# ===============================

# 1. Mean Comparisons
cat("\nT-test: CGPA by Sex\n")

T-test: CGPA by Sex
print(t.test(CGPA ~ Sex, data = data))

    Welch Two Sample t-test

data:  CGPA by Sex
t = 1.7854, df = 70.874, p-value = 0.07848
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
 -0.01995516  0.36152849
sample estimates:
mean in group F mean in group M 
       3.521967        3.351180 
cat("\nANOVA: CGPA by Group\n")

ANOVA: CGPA by Group
print(summary(aov(CGPA ~ Group, data = data)))
            Df Sum Sq Mean Sq F value Pr(>F)    
Group        3 13.670   4.557   203.1 <2e-16 ***
Residuals   76  1.705   0.022                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
cat("\nANOVA: CGPA by AssignedTeacher\n")

ANOVA: CGPA by AssignedTeacher
print(summary(aov(CGPA ~ AssignedTeacher, data = data)))
                Df Sum Sq Mean Sq F value Pr(>F)
AssignedTeacher 22   1.78 0.08091   0.339  0.997
Residuals       57  13.60 0.23850               
# 2. Association Tests
cat("\nChi-square: Sex vs Group\n")

Chi-square: Sex vs Group
print(chisq.test(table(data$Sex, data$Group)))

    Pearson's Chi-squared test

data:  table(data$Sex, data$Group)
X-squared = 4.0095, df = 3, p-value = 0.2604
cat("\nChi-square: Sex vs AssignedTeacher\n")

Chi-square: Sex vs AssignedTeacher
print(chisq.test(table(data$Sex, data$AssignedTeacher)))

    Pearson's Chi-squared test

data:  table(data$Sex, data$AssignedTeacher)
X-squared = 25.244, df = 22, p-value = 0.2855
# Part 2D: Regression Analysis
# ============================

# Simple Linear Regression
cat("\nRegression: CGPA ~ Sex\n")

Regression: CGPA ~ Sex
print(summary(lm(CGPA ~ Sex, data = data)))

Call:
lm(formula = CGPA ~ Sex, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.29818 -0.22513  0.09743  0.28478  0.60182 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.5220     0.0796  44.244   <2e-16 ***
SexM         -0.1708     0.1007  -1.696   0.0938 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.436 on 78 degrees of freedom
Multiple R-squared:  0.03557,   Adjusted R-squared:  0.02321 
F-statistic: 2.877 on 1 and 78 DF,  p-value: 0.09384
cat("\nRegression: CGPA ~ Group\n")

Regression: CGPA ~ Group
print(summary(lm(CGPA ~ Group, data = data)))

Call:
lm(formula = CGPA ~ Group, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48527 -0.07246  0.00333  0.10026  0.44173 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.83774    0.03123 122.887  < 2e-16 ***
Group2      -0.26913    0.04417  -6.094 4.23e-08 ***
Group3      -0.57900    0.04417 -13.110  < 2e-16 ***
Group4      -1.29947    0.05491 -23.668  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1498 on 76 degrees of freedom
Multiple R-squared:  0.8891,    Adjusted R-squared:  0.8847 
F-statistic: 203.1 on 3 and 76 DF,  p-value: < 2.2e-16
# Multiple Regression
cat("\nMultiple Regression: CGPA ~ Sex + Group + AssignedTeacher\n")

Multiple Regression: CGPA ~ Sex + Group + AssignedTeacher
full_model <- lm(CGPA ~ Sex + Group + AssignedTeacher, data = data)
print(summary(full_model))

Call:
lm(formula = CGPA ~ Sex + Group + AssignedTeacher, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.37629 -0.07189  0.01049  0.06907  0.29031 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        3.697292   0.079024  46.787  < 2e-16 ***
SexM               0.053618   0.042178   1.271   0.2092    
Group2            -0.278455   0.043500  -6.401 4.14e-08 ***
Group3            -0.588325   0.043500 -13.525  < 2e-16 ***
Group4            -1.321622   0.058344 -22.652  < 2e-16 ***
AssignedTeacherB   0.225250   0.102817   2.191   0.0329 *  
AssignedTeacherC   0.183846   0.103356   1.779   0.0810 .  
AssignedTeacherD   0.066154   0.103356   0.640   0.5249    
AssignedTeacherE   0.093191   0.104957   0.888   0.3786    
AssignedTeacherFT  0.033096   0.103356   0.320   0.7501    
AssignedTeacherG   0.041346   0.103356   0.400   0.6907    
AssignedTeacherH   0.130846   0.103356   1.266   0.2111    
AssignedTeacherI   0.007691   0.104957   0.073   0.9419    
AssignedTeacherJ   0.260404   0.103356   2.519   0.0148 *  
AssignedTeacherK   0.264250   0.102817   2.570   0.0130 *  
AssignedTeacherL   0.005017   0.114247   0.044   0.9651    
AssignedTeacherMT  0.202096   0.111863   1.807   0.0765 .  
AssignedTeacherN  -0.007777   0.112184  -0.069   0.9450    
AssignedTeacherO   0.175557   0.112184   1.565   0.1236    
AssignedTeacherP   0.143302   0.113299   1.265   0.2115    
AssignedTeacherQ   0.021557   0.112184   0.192   0.8484    
AssignedTeacherR   0.193096   0.111863   1.726   0.0901 .  
AssignedTeacherS   0.087351   0.114247   0.765   0.4479    
AssignedTeacherT   0.113557   0.112184   1.012   0.3160    
AssignedTeacherU   0.023890   0.112184   0.213   0.8322    
AssignedTeacherV   0.165351   0.114247   1.447   0.1537    
AssignedTeacherW   0.211429   0.111863   1.890   0.0642 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1454 on 53 degrees of freedom
Multiple R-squared:  0.9271,    Adjusted R-squared:  0.8914 
F-statistic: 25.93 on 26 and 53 DF,  p-value: < 2.2e-16
# Part 3A: Outlier Detection
# ==========================

# Detect outliers using IQR rule
Q1 <- quantile(data$CGPA, 0.25)
Q3 <- quantile(data$CGPA, 0.75)
IQR_val <- Q3 - Q1
lower <- Q1 - 1.5 * IQR_val
upper <- Q3 + 1.5 * IQR_val

outliers <- data[data$CGPA < lower | data$CGPA > upper, ]
cat("\nOutliers detected:", nrow(outliers), "\n")

Outliers detected: 4 
# Visualize outliers
boxplot(data$CGPA, main = "Boxplot of CGPA", ylab = "CGPA")

# Part 3B: Regression Diagnostics
# ===============================

# Diagnostic plots for full model
par(mfrow = c(2, 2))

plot(full_model)
par(mfrow = c(1, 1))


# Shapiro-Wilk test
cat("\nShapiro-Wilk Test on Residuals:\n")

Shapiro-Wilk Test on Residuals:
print(shapiro.test(residuals(full_model)))

    Shapiro-Wilk normality test

data:  residuals(full_model)
W = 0.97797, p-value = 0.1824
---
title: "R Notebook"
output: html_notebook
---
# Practical Data Operations in R: Sorting, Grouping, and Assignment

## Summary of Task
Write an R script to perform a practical student project distribution task.

## Objectives

1. **Read Student Data from Excel**  
   - The dataset includes:
     - Student names  
     - Exam roll numbers  
     - CGPA  

2. **Sort Students by CGPA (Descending Order)**  
   - Students are ranked from highest to lowest academic performance.

3. **Assign Students to Sequential Groups**  
   - Students are divided into groups where each group size equals the number of teachers (e.g., 23).
   - The final group may contain fewer students if the total number is not divisible by the number of teachers.

4. **Randomly Assign Teachers to Each Student**
   - **For full-size groups**:  
     - Each student is randomly assigned a **unique teacher**.
   - **For the last (smaller) group**:  
     - Only the first *m* teachers (*m* < total number of teachers) are considered.
     - Teacher assignments are randomized within this subset.

5. **Create an Output-Ready Dataset**
   - The final data frame includes:
     - Student information  
     - Merit position (implicitly defined by CGPA sorting)  
     - Group number  
     - Assigned teacher  

6. **Write Output to Excel**
   - Export the final project distribution dataset to an Excel file.



## Student–Teacher Assignment Flow

                 ┌───────────────────────────────┐
                 │   12 Students                 │
                 │   Sorted by CGPA (High → Low) │
                 └───────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────┐
                 │   Divide into 4 Groups        │
                 │   (Each group size = 3)       │
                 └───────────────────────────────┘
                                   │
                                   ▼
        ┌──────────────────┬──────────────────┬──────────────────┬───────────────────┐
        │     Group 1      │     Group 2      │     Group 3      │     Group 4       │
        │   Top 3 Students │  Next 3 Students │  Next 3 Students │ Bottom 3 Students │
        └──────────────────┴──────────────────┴──────────────────┴───────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────┐
                 │     FOR EACH GROUP            │
                 │                               │
                 │  • Take 3 students            │
                 │  • Shuffle teachers A, B, C   │
                 │  • Assign sequentially:       │
                 │      Student 1 → Teacher 1    │
                 │      Student 2 → Teacher 2    │
                 │      Student 3 → Teacher 3    │
                 └───────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────┐
                 │        FINAL RESULT           │
                 └───────────────────────────────┘
                                   │
                                   ▼
        ┌──────────────────┬──────────────────┬──────────────────┐
        │    Teacher A     │    Teacher B     │    Teacher C     │
        │ 1 Top + 1 Good   │ 1 Top + 1 Good   │ 1 Top + 1 Good   │
        │ +1 Avg +1 Weak   │ +1 Avg +1 Weak   │ +1 Avg +1 Weak   │
        └──────────────────┴──────────────────┴──────────────────┘

```{r}
require(readxl)
library(writexl)
project.df<-read_excel("C:\\Users\\Dell\\OneDrive\\Desktop\\STATISTICS\\2nd Year\\R programming\\varsity\\Project_Distribution_Data.xlsx")

# View data structure
str(project.df)
head(project.df)
```
Step 1: Sort Data in descending order of CGPA
```{r}
sorted_df <- project.df[order(project.df$CGPA, decreasing = TRUE), ]
sorted_df
```
Step 2: Vector of teacher initials (suppose we have currently 23 techers in active service)
```{r}
teachers <- LETTERS[1:23]
teachers
```
Step 3: Assign students to groups sequentially, starting from 1, 2, 3… and repeating, based on the number of teachers
```{r}
n_stu <- nrow(sorted_df)
n_tea <- length(teachers)
n_groups <- ceiling(n_stu / n_tea)
sorted_df$Group <- rep(1:n_groups, each = n_tea, length.out = n_stu)
sorted_df
```
Step 4: block-wise random assignment of teachers!
```{r}
set.seed(209)
sorted_df$AssignedT <- NA
groups <- unique(sorted_df$Group)    # Get all unique group numbers


for (g in groups) {
  # Find indices of students in this group
  idx <- which(Group == g)
  n_stu_group <- length(idx)
  
  # Randomly assign teachers to these students
  # Since teachers repeat across groups, we can sample all teachers
  sorted_df$AssignedT[idx] <- sample(teachers)[1:n_stu_group]
}
  

```
Step 5: Check Results
```{r}
head(sorted_df, 20)

# Check distribution
table(sorted_df$AssignedT)

# Verify no teacher gets too many students
teacher_counts <- table(sorted_df$AssignedT)
summary(as.numeric(teacher_counts))
```
Step 6: Save/ Write df to an Excel file
```{r}
write_xlsx(sorted_df, "Assigned_Projects.xlsx")
```

# R Data Analysis Project: Student Performance Dataset

## Project Overview
You are given a dataset containing student information including Exam Roll numbers, Sex, CGPA, Group assignment, and Assigned Teacher. Your task is to prepare this data for analysis and perform a comprehensive statistical investigation using R.

---

## Part 1: Data Preparation & Variable Type Conversion

### Instructions
Before conducting any analysis, you must properly convert variable types in R. **Explain your reasoning** for each conversion in your code comments.

### Variables to Convert

1. **ExamRoll**
   - Currently: `numeric`
   - Convert to: `character` (or `factor`)
   - **Reason**: Exam roll numbers are identifiers, not numeric quantities for mathematical operations. Keeping them numeric could cause unintended statistical operations.

2. **Sex**
   - Currently: `character`
   - Convert to: `factor`
   - **Reason**: This is a categorical variable with fixed levels (e.g., "M", "F").

3. **CGPA**
   - Currently: `numeric`
   - Keep as: `numeric`
   - **Reason**: Continuous quantitative variable suitable for mathematical operations.

4. **Group**
   - Currently: `numeric`
   - Convert to: `factor`
   - **Reason**: Represents class/section categories (1, 2, 3…), not measurable quantities.

5. **AssignedTeacher**
   - Currently: `character`
   - Convert to: `factor`
   - **Reason**: Teacher names or initials represent categorical groups.

---

## Part 2: Statistical Analysis

### A. Descriptive Statistics
Perform the following analyses and report your findings:

1. **Univariate Statistics for CGPA**
   - Mean
   - Median
   - Standard deviation

2. **Frequency Distributions**
   - Sex
   - Group
   - AssignedTeacher

3. **Group Comparisons**
   - CGPA by Sex (mean, SD, count)
   - CGPA by Group (mean, SD, count)

---

### B. Data Visualization
Create the following plots with appropriate titles and labels:

1. **Distribution Plots**
   - Histogram of CGPA
   - Density plot of CGPA

2. **Categorical Data Plots**
   - Bar plot of Sex
   - Bar plot of Group
   - Bar plot of AssignedTeacher

3. **Comparative Plots**
   - Boxplot of CGPA by Sex
   - Boxplot of CGPA by Group

---

### C. Inferential Statistics
Perform the following hypothesis tests and interpret the results:

1. **Mean Comparisons**
   - Independent samples t-test: CGPA by Sex
   - One-way ANOVA: CGPA by Group
   - One-way ANOVA: CGPA by AssignedTeacher

2. **Association Tests**
   - Chi-square test: Sex vs Group
   - Chi-square test: Sex vs AssignedTeacher

---

### D. Regression Analysis

1. **Simple Linear Regression**
   - Model 1: CGPA ~ Sex
   - Model 2: CGPA ~ Group

2. **Multiple Regression**
   - Full model: CGPA ~ Sex + Group + AssignedTeacher

---

## Part 3: Diagnostic Checks & Assumptions

### A. Outlier Detection
- Detect outliers in CGPA using the IQR rule
- Visualize outliers using boxplots

### B. Regression Diagnostics
For the multiple regression model (`CGPA ~ Sex + Group + AssignedTeacher`), check:

1. **Linearity**
   - Residuals vs Fitted plot

2. **Normality**
   - Q–Q plot of residuals
   - Shapiro–Wilk test on residuals

3. **Homoscedasticity**
   - Scale–Location plot

4. **Independence**
   - Durbin–Watson test (if appropriate)

5. **Multicollinearity**
   - Variance Inflation Factor (VIF)

---

## Part 4: Submission Requirements

### What to Submit

1. **R Script**
   - Contains all code with clear and meaningful comments

2. **Report** (R Markdown / Word / PDF)
   - Introduction and objectives
   - Data preparation summary
   - Key findings from each analysis
   - Interpretation of statistical results
   - Visualizations with captions
   - Conclusions and limitations

---

### Code Organization
Your R script should follow this structure:


## Code Organization Structure

### Part 1: Data Preparation
- Variable type conversion  
- Data cleaning (if required)

---

### Part 2A: Descriptive Statistics
- Summary statistics for CGPA  
- Frequency tables for categorical variables  
- Group-wise summaries

---

### Part 2B: Data Visualization
- Distribution plots (histogram, density plot)  
- Bar plots for categorical variables  
- Boxplots for group comparisons

---

### Remaining Sections
- Inferential statistics  
- Regression analysis  
- Diagnostic checks and assumption testing  

> **Note:** Each section in the R script should be clearly separated using comments and written in a logical, sequential manner for readability and grading.




## Student Performance Analysis – Workflow Flowchart

                 ┌───────────────────────────────────────────┐
                 │            START PROJECT                  │
                 │   Student Performance Dataset in R        │
                 └───────────────────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────────────────┐
                 │            LOAD PACKAGES                  │
                 │     readxl, dplyr                         │
                 └───────────────────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────────────────┐
                 │              READ DATA                    │
                 │  Import Excel Student Dataset             │
                 └───────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │              PART 1: DATA PREPARATION                  │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Convert Variable Types                                 │
        │                                                        │
        │ ExamRoll         → character (identifier)              │
        │ Sex              → factor (categorical)                │
        │ CGPA             → numeric (continuous)                │
        │ Group            → factor (category)                   │
        │ AssignedTeacher  → factor (category)                   │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────────────────┐
                 │          VERIFY STRUCTURE                 │
                 │               str(data)                   │
                 └───────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 2A: DESCRIPTIVE STATISTICS                 │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Univariate Statistics (CGPA)                           │
        │ Mean • Median • Standard Deviation                     │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Frequency Distributions                                │
        │ Sex • Group • AssignedTeacher                          │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Group-wise CGPA Comparison                             │
        │ CGPA by Sex  (Mean, SD, Count)                         │
        │ CGPA by Group (Mean, SD, Count)                        │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 2B: DATA VISUALIZATION                     │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Distribution Plots                                     │
        │ Histogram of CGPA                                      │
        │ Density Plot of CGPA                                   │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Categorical Bar Plots                                  │
        │ Sex • Group • AssignedTeacher                          │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Comparative Boxplots                                   │
        │ CGPA by Sex                                            │
        │ CGPA by Group                                          │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 2C: INFERENTIAL STATISTICS                 │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Mean Comparison Tests                                  │
        │ t-test      : CGPA ~ Sex                               │
        │ ANOVA       : CGPA ~ Group                             │
        │ ANOVA       : CGPA ~ AssignedTeacher                   │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Association Tests                                      │
        │ Chi-square: Sex vs Group                               │
        │ Chi-square: Sex vs AssignedTeacher                     │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 2D: REGRESSION ANALYSIS                    │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Simple Regression Models                               │
        │ CGPA ~ Sex                                             │
        │ CGPA ~ Group                                           │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Multiple Regression Model                              │
        │ CGPA ~ Sex + Group + AssignedTeacher                   │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 3A: OUTLIER DETECTION                      │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ IQR Rule for CGPA                                      │
        │ Identify and Count Outliers                            │
        │ Boxplot Visualization                                  │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │        PART 3B: REGRESSION DIAGNOSTICS                 │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
        ┌────────────────────────────────────────────────────────┐
        │ Diagnostic Checks                                      │
        │ Residual Plots                                         │
        │ Normality (Q-Q Plot, Shapiro-Wilk Test)                │
        └────────────────────────────────────────────────────────┘
                                   │
                                   ▼
                 ┌───────────────────────────────────────────┐
                 │            END OF ANALYSIS                │
                 │      Results Ready for Report             │
                 └───────────────────────────────────────────┘




```{r}
# ===================================================================
# Student Performance Analysis - R Script
# ===================================================================

# Load required packages
library(readxl)
library(dplyr)

# Read data
data <- read_excel("C:/Users/Dell/OneDrive/Desktop/STATISTICS/2nd Year/R programming/varsity/Project_Distribution.xlsx")

# Part 1: Data Preparation
# =========================

# Convert variable types
# ExamRoll: Convert to character (identifiers, not for calculations)
data$ExamRoll <- as.character(data$ExamRoll)

# Sex: Convert to factor (categorical variable with fixed levels)
data$Sex <- as.factor(data$Sex)

# CGPA: Keep as numeric (continuous quantitative variable)
# No conversion needed

# Group: Convert to factor (class categories, not measurable quantities)
data$Group <- as.factor(data$Group)

# AssignedTeacher: Convert to factor (teacher names as categorical groups)
data$AssignedTeacher <- as.factor(data$AssignedTeacher)

# Check conversions
str(data)

# Part 2A: Descriptive Statistics
# ===============================

# 1. Univariate Statistics for CGPA
cat("CGPA Statistics:\n")
cat("Mean:", mean(data$CGPA), "\n")
cat("Median:", median(data$CGPA), "\n")
cat("Standard Deviation:", sd(data$CGPA), "\n\n")

# 2. Frequency Distributions
cat("Frequency Tables:\n")
cat("Sex:\n")
print(table(data$Sex))
cat("\nGroup:\n")
print(table(data$Group))
cat("\nAssignedTeacher:\n")
print(table(data$AssignedTeacher))
cat("\n")

# 3. Group Comparisons
cat("CGPA by Sex:\n")
print(data %>% group_by(Sex) %>% 
      summarise(Mean = mean(CGPA), SD = sd(CGPA), Count = n()))

cat("\nCGPA by Group:\n")
print(data %>% group_by(Group) %>% 
      summarise(Mean = mean(CGPA), SD = sd(CGPA), Count = n()))

# Part 2B: Data Visualization
# ============================

# 1. Distribution Plots
# Histogram of CGPA
hist(data$CGPA, main = "Histogram of CGPA", xlab = "CGPA", ylab = "Frequency")

# Density plot of CGPA
plot(density(data$CGPA), main = "Density Plot of CGPA", xlab = "CGPA", ylab = "Density")

# 2. Categorical Data Plots
# Bar plot of Sex
barplot(table(data$Sex), main = "Count by Sex", xlab = "Sex", ylab = "Count")

# Bar plot of Group
barplot(table(data$Group), main = "Count by Group", xlab = "Group", ylab = "Count")

# Bar plot of AssignedTeacher
barplot(table(data$AssignedTeacher), main = "Count by Assigned Teacher", 
        xlab = "Teacher", ylab = "Count", las = 2)

# 3. Comparative Plots
# Boxplot of CGPA by Sex
boxplot(CGPA ~ Sex, data = data, main = "CGPA by Sex", xlab = "Sex", ylab = "CGPA")

# Boxplot of CGPA by Group
boxplot(CGPA ~ Group, data = data, main = "CGPA by Group", xlab = "Group", ylab = "CGPA")

# Part 2C: Inferential Statistics
# ===============================

# 1. Mean Comparisons
cat("\nT-test: CGPA by Sex\n")
print(t.test(CGPA ~ Sex, data = data))

cat("\nANOVA: CGPA by Group\n")
print(summary(aov(CGPA ~ Group, data = data)))

cat("\nANOVA: CGPA by AssignedTeacher\n")
print(summary(aov(CGPA ~ AssignedTeacher, data = data)))

# 2. Association Tests
cat("\nChi-square: Sex vs Group\n")
print(chisq.test(table(data$Sex, data$Group)))

cat("\nChi-square: Sex vs AssignedTeacher\n")
print(chisq.test(table(data$Sex, data$AssignedTeacher)))

# Part 2D: Regression Analysis
# ============================

# Simple Linear Regression
cat("\nRegression: CGPA ~ Sex\n")
print(summary(lm(CGPA ~ Sex, data = data)))

cat("\nRegression: CGPA ~ Group\n")
print(summary(lm(CGPA ~ Group, data = data)))

# Multiple Regression
cat("\nMultiple Regression: CGPA ~ Sex + Group + AssignedTeacher\n")
full_model <- lm(CGPA ~ Sex + Group + AssignedTeacher, data = data)
print(summary(full_model))

# Part 3A: Outlier Detection
# ==========================

# Detect outliers using IQR rule
Q1 <- quantile(data$CGPA, 0.25)
Q3 <- quantile(data$CGPA, 0.75)
IQR_val <- Q3 - Q1
lower <- Q1 - 1.5 * IQR_val
upper <- Q3 + 1.5 * IQR_val

outliers <- data[data$CGPA < lower | data$CGPA > upper, ]
cat("\nOutliers detected:", nrow(outliers), "\n")

# Visualize outliers
boxplot(data$CGPA, main = "Boxplot of CGPA", ylab = "CGPA")

# Part 3B: Regression Diagnostics
# ===============================

# Diagnostic plots for full model
par(mfrow = c(2, 2))
plot(full_model)
par(mfrow = c(1, 1))

# Shapiro-Wilk test
cat("\nShapiro-Wilk Test on Residuals:\n")
print(shapiro.test(residuals(full_model)))
```

