Social Media Impact on Students

Loading libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.3

Introduction:- This analysis examines a dataset covering students' stress level, daily social media use, sleep hours, screen time before sleep, depression, anxiety, addiction, and physical activity.

Teen_health<-read.csv("C:/R studio/Project/ca2/Teen_Mental_Health_Dataset.csv")
View(Teen_health)

--------------------------------------------------------------------------------
Level 1: Understanding the Data (Basic Exploration)
--------------------------------------------------------------------------------

Question 1.1: What is the structure of the dataset (number of rows, columns, and data types)? Also list the column names and show the first six and last six rows.

# 1. Structure of the dataset
str(Teen_health)
## 'data.frame':    1200 obs. of  13 variables:
##  $ age                     : int  14 19 17 15 15 19 18 16 19 15 ...
##  $ gender                  : chr  "male" "female" "female" "male" ...
##  $ daily_social_media_hours: num  7.9 1.9 1.3 7.4 4.7 7.4 2.5 4 3.3 1.9 ...
##  $ platform_usage          : chr  "Instagram" "TikTok" "Instagram" "TikTok" ...
##  $ sleep_hours             : num  7.4 8 7.6 6.9 4.9 4.4 6.4 4.2 5 4.9 ...
##  $ screen_time_before_sleep: num  2.9 2.9 0.5 1.6 3 2.4 2.4 0.5 2.1 1.5 ...
##  $ academic_performance    : num  3.01 3.22 3.92 3.48 2.37 2.63 2.63 2.4 2.04 3.77 ...
##  $ physical_activity       : num  1.5 0.8 0 0.8 1.4 0.6 0.7 1.3 0.9 1.1 ...
##  $ social_interaction_level: chr  "low" "high" "high" "medium" ...
##  $ stress_level            : int  2 8 2 1 3 3 2 6 1 1 ...
##  $ anxiety_level           : int  2 1 4 7 5 5 2 10 10 1 ...
##  $ addiction_level         : int  1 10 2 9 2 7 5 5 9 4 ...
##  $ depression_label        : int  0 0 0 0 0 0 0 0 0 0 ...
# Column names
names(Teen_health)
##  [1] "age"                      "gender"                  
##  [3] "daily_social_media_hours" "platform_usage"          
##  [5] "sleep_hours"              "screen_time_before_sleep"
##  [7] "academic_performance"     "physical_activity"       
##  [9] "social_interaction_level" "stress_level"            
## [11] "anxiety_level"            "addiction_level"         
## [13] "depression_label"
# First and last rows
head(Teen_health)
##   age gender daily_social_media_hours platform_usage sleep_hours
## 1  14   male                      7.9      Instagram         7.4
## 2  19 female                      1.9         TikTok         8.0
## 3  17 female                      1.3      Instagram         7.6
## 4  15   male                      7.4         TikTok         6.9
## 5  15 female                      4.7           Both         4.9
## 6  19 female                      7.4           Both         4.4
##   screen_time_before_sleep academic_performance physical_activity
## 1                      2.9                 3.01               1.5
## 2                      2.9                 3.22               0.8
## 3                      0.5                 3.92               0.0
## 4                      1.6                 3.48               0.8
## 5                      3.0                 2.37               1.4
## 6                      2.4                 2.63               0.6
##   social_interaction_level stress_level anxiety_level addiction_level
## 1                      low            2             2               1
## 2                     high            8             1              10
## 3                     high            2             4               2
## 4                   medium            1             7               9
## 5                   medium            3             5               2
## 6                     high            3             5               7
##   depression_label
## 1                0
## 2                0
## 3                0
## 4                0
## 5                0
## 6                0
tail(Teen_health)
##      age gender daily_social_media_hours platform_usage sleep_hours
## 1195  17   male                      2.0           Both         4.5
## 1196  18 female                      6.8      Instagram         6.6
## 1197  16   male                      2.3           Both         8.0
## 1198  14 female                      1.7           Both         8.7
## 1199  15   male                      3.9           Both         8.5
## 1200  16 female                      4.7         TikTok         6.5
##      screen_time_before_sleep academic_performance physical_activity
## 1195                      1.7                 2.65               0.0
## 1196                      2.0                 2.76               1.0
## 1197                      1.9                 2.12               0.4
## 1198                      0.7                 3.98               0.8
## 1199                      2.1                 3.19               0.6
## 1200                      1.0                 2.91               0.9
##      social_interaction_level stress_level anxiety_level addiction_level
## 1195                   medium            9             4               2
## 1196                      low            3             4               4
## 1197                     high            7             4               4
## 1198                     high            1             1               1
## 1199                     high            7             9               9
## 1200                   medium            5             7               3
##      depression_label
## 1195                0
## 1196                0
## 1197                0
## 1198                0
## 1199                0
## 1200                0
# 2. Missing values per column
colSums(is.na(Teen_health))
##                      age                   gender daily_social_media_hours 
##                        0                        0                        0 
##           platform_usage              sleep_hours screen_time_before_sleep 
##                        0                        0                        0 
##     academic_performance        physical_activity social_interaction_level 
##                        0                        0                        0 
##             stress_level            anxiety_level          addiction_level 
##                        0                        0                        0 
##         depression_label 
##                        0

Interpretation: The dataset contains 1200 records and 13 columns. str() shows the data type of each column (integer, numeric, or character), names() lists the column names, and head() and tail() show the first and last six rows, including each student's gender. The variables cover phone use, sleep, stress, anxiety, and depression, so the later sections can check whether heavier phone use and shorter sleep go together with higher stress. The dataset has no missing values, so no imputation is needed before filtering, grouping, and feature engineering.
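The missing-value check above returns all zeros, so it is hard to see what it would report otherwise. A minimal sketch on a made-up data frame (the `toy` values are assumptions for illustration, not rows from the dataset) shows what the same call looks like when values are missing:

```r
# Toy data frame with two missing values (illustrative only)
toy <- data.frame(age = c(14, NA, 17),
                  sleep_hours = c(7.4, 8.0, NA))

# Same check as in the report: number of NA values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
print(complete_rows)
```

If any count were non-zero in the real data, those rows would need to be dropped or imputed before the later steps.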

--------------------------------------------------------------------------------
Level 2: Understanding the Data and Converting to Categories
--------------------------------------------------------------------------------

Question:- A data analyst wants to summarize the Teen_health dataset, check its size, and ensure that categorical variables like gender are properly formatted for analysis.

# Summary statistics
summary(Teen_health)
##       age           gender          daily_social_media_hours platform_usage    
##  Min.   :13.00   Length:1200        Min.   :1.000            Length:1200       
##  1st Qu.:14.00   Class :character   1st Qu.:2.800            Class :character  
##  Median :16.00   Mode  :character   Median :4.500            Mode  :character  
##  Mean   :15.93                      Mean   :4.537                              
##  3rd Qu.:18.00                      3rd Qu.:6.300                              
##  Max.   :19.00                      Max.   :8.000                              
##   sleep_hours    screen_time_before_sleep academic_performance
##  Min.   :4.000   Min.   :0.50             Min.   :2.00        
##  1st Qu.:5.200   1st Qu.:1.10             1st Qu.:2.50        
##  Median :6.500   Median :1.80             Median :2.99        
##  Mean   :6.449   Mean   :1.74             Mean   :2.99        
##  3rd Qu.:7.600   3rd Qu.:2.40             3rd Qu.:3.48        
##  Max.   :9.000   Max.   :3.00             Max.   :4.00        
##  physical_activity social_interaction_level  stress_level    anxiety_level   
##  Min.   :0.000     Length:1200              Min.   : 1.000   Min.   : 1.000  
##  1st Qu.:0.500     Class :character         1st Qu.: 3.000   1st Qu.: 3.000  
##  Median :1.000     Mode  :character         Median : 5.000   Median : 6.000  
##  Mean   :1.014                              Mean   : 5.446   Mean   : 5.637  
##  3rd Qu.:1.500                              3rd Qu.: 8.000   3rd Qu.: 8.000  
##  Max.   :2.000                              Max.   :10.000   Max.   :10.000  
##  addiction_level  depression_label 
##  Min.   : 1.000   Min.   :0.00000  
##  1st Qu.: 3.000   1st Qu.:0.00000  
##  Median : 6.000   Median :0.00000  
##  Mean   : 5.565   Mean   :0.02583  
##  3rd Qu.: 8.000   3rd Qu.:0.00000  
##  Max.   :10.000   Max.   :1.00000
# Dimensions (rows, columns)
dim(Teen_health)
## [1] 1200   13
# Convert gender to a factor
Teen_health$gender <- as.factor(Teen_health$gender)

Interpretation: summary() gives the minimum, quartiles, mean, and maximum of each numeric column. For character columns such as gender it reports only the length and class, not frequencies. The mean of depression_label is 0.02583, so about 2.6% of students are labelled as depressed. dim() confirms 1200 rows and 13 columns. Converting gender to a factor marks it as categorical, which helps with grouping, comparison, and visualization (such as bar charts), and lets summary() report a count per category.
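The effect of the factor conversion can be seen on a small made-up vector (the values below are assumptions for illustration, not taken from the dataset):

```r
# Toy vector (illustrative only)
gender <- c("male", "female", "female", "male", "female")

# For a character vector, summary() reports only length, class and mode
print(summary(gender))

# After conversion, summary() reports one count per category
gender_factor <- as.factor(gender)
counts <- summary(gender_factor)
print(counts)

# table() gives the same counts and also works on the character vector
print(table(gender))
```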

--------------------------------------------------------------------------------
Level 3: Filtering and Small Analysis
--------------------------------------------------------------------------------

Question:- A student wants to identify students with high stress levels and extract their basic details (age, gender, and stress level) for further analysis.

# 3. High-stress students
high_stress <- Teen_health %>%
  filter(stress_level > 7) %>%
  select(age, gender, stress_level)
head(high_stress)
##   age gender stress_level
## 1  19 female            8
## 2  16   male           10
## 3  16 female           10
## 4  14 female            8
## 5  18   male            8
## 6  18 female            9

Interpretation:- filter() keeps only the students with a stress level greater than 7, and select() keeps only the age, gender, and stress_level columns.

--------------------------------------------------------------------------------

Question 3.1:- Identify the top 10 students with the highest stress levels and extract their basic details (age, gender, and stress level) for deeper analysis.

# 4. Top 10 highest-stress students
top_stress <- Teen_health %>%
  arrange(desc(stress_level)) %>%
  select(age, gender, stress_level) %>%
  head(10)
top_stress
##    age gender stress_level
## 1   16   male           10
## 2   16 female           10
## 3   13 female           10
## 4   14   male           10
## 5   19   male           10
## 6   15   male           10
## 7   14   male           10
## 8   17 female           10
## 9   17 female           10
## 10  13   male           10

Interpretation:- arrange(desc(stress_level)) sorts the dataset in descending order of stress, select() keeps only the relevant columns (age, gender, and stress level), and head(10) retrieves the first 10 rows. All 10 students shown have the maximum stress level of 10, so which of the tied students appear depends on their original row order.

--------------------------------------------------------------------------------

Question 3.2:- Rank the students by stress level to identify which students are most and least stressed.

# 5. Ranking students by stress
rank_stress <- Teen_health %>%
  arrange(desc(stress_level)) %>%
  mutate(rank = row_number())
head(rank_stress)
##   age gender daily_social_media_hours platform_usage sleep_hours
## 1  16   male                      3.1           Both         6.1
## 2  16 female                      6.7           Both         6.8
## 3  13 female                      6.6         TikTok         7.3
## 4  14   male                      6.4      Instagram         5.7
## 5  19   male                      1.6      Instagram         8.6
## 6  15   male                      4.0           Both         8.8
##   screen_time_before_sleep academic_performance physical_activity
## 1                      0.8                 2.11               1.9
## 2                      1.9                 3.08               1.4
## 3                      1.6                 3.27               0.1
## 4                      0.9                 2.06               1.8
## 5                      2.4                 3.81               0.4
## 6                      1.3                 3.59               1.9
##   social_interaction_level stress_level anxiety_level addiction_level
## 1                     high           10             7              10
## 2                     high           10             9               1
## 3                   medium           10             2               1
## 4                     high           10             4               2
## 5                      low           10             4               1
## 6                   medium           10             6              10
##   depression_label rank
## 1                0    1
## 2                0    2
## 3                0    3
## 4                0    4
## 5                0    5
## 6                0    6

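Interpretation:- arrange() sorts the students in descending order of stress and mutate() adds a new rank column using row_number(). head() shows the most stressed students; tail(rank_stress) would show the least stressed. Because row_number() gives tied students different ranks, the order among students with the same stress level only reflects their original row order.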
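When many students share the same stress level, a ranking that lets ties share a rank may be easier to read. A minimal sketch, assuming a small made-up data frame (`toy` and its columns are illustrative) and using dplyr's `min_rank()` next to `row_number()`:

```r
library(dplyr)

# Toy data (illustrative only)
toy <- data.frame(student = c("a", "b", "c", "d"),
                  stress_level = c(10, 7, 10, 3))

ranked <- toy %>%
  arrange(desc(stress_level)) %>%
  mutate(rank_rows = row_number(),                  # ties get different ranks
         rank_ties = min_rank(desc(stress_level)))  # ties share the same rank

print(ranked)

# The last row after sorting is the least stressed student
print(tail(ranked, 1))
```

Here the two students with stress level 10 both receive rank 1 in `rank_ties`, and the next student receives rank 3.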

--------------------------------------------------------------------------------
Level 4: Data Visualization and Trend Analysis
--------------------------------------------------------------------------------

Question:- Analyze the distribution of stress levels among students to understand how stress is spread across the dataset.

# 7. Distribution of stress levels
ggplot(Teen_health, aes(x = stress_level)) +
  geom_histogram(binwidth = 2, fill = "red", color = "black") +
  labs(title = "Distribution of Stress Levels")

Interpretation:- The ggplot2 histogram shows how many students fall into the low, medium, and high stress ranges. Visualization makes patterns and trends in the data easier to see.

--------------------------------------------------------------------------------
Level 5: Feature Engineering and Creating a New Column
--------------------------------------------------------------------------------

Question:- An analyst wants to classify students into stress categories (High, Medium, Low) based on their stress level and see the number of students in each category.

# 8. Stress categories
Teen_health$stress_group <- ifelse(Teen_health$stress_level >= 7, "High",
                                   ifelse(Teen_health$stress_level >= 4, 
                                          "Medium", "Low"))
ggplot(Teen_health, aes(x = stress_group, fill = stress_group)) +
  geom_bar() +
  labs(title = "Stress Categories")

Interpretation:- Nested ifelse() calls create a new column, stress_group, that converts the numeric stress level into three categories: High (7 or above), Medium (4 to 6), and Low (below 4). The bar chart then compares the number of students in each category.

Question:- Check the relationship between daily social media usage and stress level, to see whether higher usage leads to higher stress.
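The question also asks for the number of students in each category, which the bar chart shows only visually. One possible way to get the counts, sketched here on made-up scores (the `stress_level` vector below is an assumption for illustration), is `cut()` followed by `table()`. Unlike plain character labels, `cut()` returns a factor whose levels stay in Low, Medium, High order instead of alphabetical order:

```r
# Toy stress scores (illustrative only)
stress_level <- c(1, 4, 7, 10, 3, 6, 8)

# Intervals are right-closed: (-Inf, 3], (3, 6], (6, Inf)
# For whole-number scores this matches the ifelse() thresholds above
stress_group <- cut(stress_level,
                    breaks = c(-Inf, 3, 6, Inf),
                    labels = c("Low", "Medium", "High"))

# Number of students per category, in Low / Medium / High order
group_counts <- table(stress_group)
print(group_counts)
```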

# 9. Social media vs stress
ggplot(Teen_health, aes(x = daily_social_media_hours, y = stress_level)) +
  geom_point(color = "blue") +
  labs(title = "Social Media vs Stress")

Interpretation:- The scatter plot shows no clear relationship between daily social media hours and stress level. The data points are randomly scattered, with no visible upward or downward trend.

Question:- Compare the average stress levels between genders to understand whether stress varies by gender.

# 10. Average stress by gender
gender_avg <- Teen_health %>%
  group_by(gender) %>%
  summarise(avg_stress = mean(stress_level))
ggplot(gender_avg, aes(x = gender, y = avg_stress, fill = gender)) +
  geom_bar(stat = "identity") +
  labs(title = "Average Stress by Gender")

Interpretation:- group_by() groups the rows by gender, summarise() computes the average stress level for each group, and the bar chart compares the averages between genders.

--------------------------------------------------------------------------------
Data Visualization and Trend Analysis
--------------------------------------------------------------------------------

Question:- Analyze the distribution of sleep hours among students to understand their sleeping pattern.

# 11. Sleep distribution
ggplot(Teen_health, aes(x = sleep_hours)) +
  geom_histogram(binwidth = 1, fill = "green", color = "black") +
  labs(title = "Sleep Hours Distribution")

Interpretation:- The histogram shows how sleep hours are spread across different ranges. It helps show whether most students get adequate sleep and gives a picture of their sleeping behaviour.

Question:- Check the relationship between sleep hours and stress levels among students to determine whether less sleep leads to higher stress.

# 12. Sleep vs stress
ggplot(Teen_health, aes(x = sleep_hours, y = stress_level)) +
  geom_jitter(color = "darkblue", width = 0.2, height = 0.2) +
  geom_smooth(method = "lm", color = "red", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:- The scatter plot with a regression line shows a slight negative relationship between sleep hours and stress level. As sleep hours increase, stress level tends to decrease slightly, but the trend is very weak.

Question:- Analyze how screen time before sleep relates to students' total sleep hours to understand its impact on sleep patterns.

# 13. Screen time before sleep vs sleep hours
ggplot(Teen_health, aes(x = screen_time_before_sleep, y = sleep_hours)) +
  geom_point(color = "purple") +
  geom_smooth(method = "lm", color = "blue", se = FALSE)+
  labs(title = "Screen Time Before Sleep vs Sleep",
       x = "Screen Time Before Sleep",
       y = "Sleep Hours")
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:- The scatter plot with a regression line shows a slight positive relationship between screen time before sleep and sleep hours. As screen time increases, sleep hours also increase slightly, but the relationship is very weak.

Question:- Check the relationship between physical activity and stress levels to see whether increased physical activity reduces stress among students.

# 14. Physical activity vs stress
ggplot(Teen_health, aes(x = physical_activity, y = stress_level)) +
  geom_point(color = "orange") +
  geom_smooth(method = "lm", color = "blue") +
  labs(title = "Physical Activity vs Stress")
## `geom_smooth()` using formula = 'y ~ x'

Interpretation:- The scatter plot with a regression line shows a very slight positive relationship between physical activity and stress level. However, the slope is almost flat, indicating that the relationship is extremely weak and practically negligible.

Question:- Analyze which social media platforms are most used by students and compare the number of users for each platform.

# 15. Number of students per social media platform
ggplot(Teen_health, aes(x = platform_usage, fill = platform_usage)) +
  geom_bar() +
  labs(title = "Number of Students Using Each Social Media App",
       x = "Social Media Platform",
       y = "Number of Students") 

Interpretation:- The bar chart compares the number of students using each social media platform. The tallest bar identifies the most popular platform among students.

CA 3 questions start here.

Question:- Among male students, what percentage uses each platform?

male_data <- Teen_health[Teen_health$gender == "male", ]

values <- table(male_data$platform_usage)
labels <- paste(names(values),
                round(values/sum(values)*100), "%")

pie(values, labels = labels,
    col = rainbow(length(values)),
    main = "Male Platform Usage (%)")

Interpretation:- The pie chart shows the percentage of male students using each social media platform. The largest slice identifies the most popular platform among the male students.
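The percentage labels come from dividing each count by the total. A minimal sketch of the same calculation using base R's `prop.table()`, on a made-up vector (the `platform` values are assumptions for illustration):

```r
# Toy platform choices (illustrative only)
platform <- c("Instagram", "TikTok", "Both", "Both")

# Counts per platform, converted to rounded percentages
shares <- round(prop.table(table(platform)) * 100)
print(shares)
```

`prop.table(table(x))` is equivalent to `values / sum(values)` in the code above.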

Question:- How do sleep hours affect the stress level of students?

model <- lm(stress_level ~ sleep_hours, data = Teen_health)
summary(model)
## 
## Call:
## lm(formula = stress_level ~ sleep_hours, data = Teen_health)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5000 -2.4469 -0.4094  2.5487  4.6105 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.58833    0.38422   14.54   <2e-16 ***
## sleep_hours -0.02209    0.05814   -0.38    0.704    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.904 on 1198 degrees of freedom
## Multiple R-squared:  0.0001205,  Adjusted R-squared:  -0.0007141 
## F-statistic: 0.1444 on 1 and 1198 DF,  p-value: 0.704
predict(model, newdata = data.frame(sleep_hours = 6))
##        1 
## 5.455763

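Interpretation:- The slope for sleep_hours is -0.02209, so each extra hour of sleep is associated with only a very slightly lower stress level. The p-value is 0.704, well above 0.05, so the relationship is not statistically significant. R-squared is 0.0001205, meaning sleep hours explain only about 0.012% of the variation in stress. For a student who sleeps 6 hours, the model predicts a stress level of about 5.46. These data therefore do not show that more sleep reduces stress.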
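The prediction from predict() can be checked by hand from the coefficient table. The sketch below plugs in the rounded coefficients printed in the summary above, so the result agrees with the predict() output only to about three decimal places:

```r
# Rounded coefficients copied from the summary() output above
intercept <- 5.58833
slope <- -0.02209

# Predicted stress for a student who sleeps 6 hours
sleep <- 6
predicted <- intercept + slope * sleep
print(predicted)

# R-squared expressed as a percentage of variance explained
r_squared <- 0.0001205
print(r_squared * 100)
```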

Question:-What relationship exists between daily social media hours and screen time before sleep?

pairs(Teen_health[, c("daily_social_media_hours", "screen_time_before_sleep")], main = "Pair Plot: Social Media vs Screen Time", pch = 19, col = "blue")

Interpretation:- The pair plot shows the relationship between daily social media usage and screen time before sleep. The data points are scattered randomly, indicating no clear or strong relationship between the two variables.

Question:-What is the cumulative distribution of students’ physical activity levels?

# Sort the values; the cumulative frequency of each value is its position
x <- sort(Teen_health$physical_activity)
y <- seq_along(x)
plot(x, y,
     type = "l",
     col = "blue",
     main = "CFD of Physical Activity",
     xlab = "Physical Activity",
     ylab = "Cumulative Frequency")

Interpretation:- The cumulative frequency graph shows how the number of students accumulates as physical activity increases. Reading the line at a given activity level gives the number of students at or below that level.
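Base R also provides `ecdf()`, which returns the cumulative distribution as a function giving proportions rather than counts. A minimal sketch on made-up values (the `activity` vector is an assumption for illustration):

```r
# Toy physical activity values (illustrative only)
activity <- c(0.0, 0.5, 1.0, 1.5, 2.0)

# ecdf() returns a function; calling it gives the proportion
# of observations at or below the supplied value
cdf <- ecdf(activity)
share_at_or_below_1 <- cdf(1.0)
print(share_at_or_below_1)
```

Calling `plot(cdf)` would draw the same kind of curve as the plot above, with proportions on the y-axis.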

Question:-How is stress level distributed among students and are there any outliers?

boxplot(Teen_health$stress_level,
        col = "lightblue",
        main = "Box Plot of Stress Level",
        ylab = "Stress Level")

Interpretation:- The box plot shows the distribution of stress levels among students: the median, the spread of the middle half of the data, and any outliers. With the quartiles reported by summary() (3 and 8), the whiskers can reach the full 1 to 10 range, so no outliers are expected.
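Whether a box plot can flag outliers here can be checked with the usual 1.5 × IQR rule, which is the default used by boxplot(). The sketch below uses the quartiles from the summary() output earlier in the report; boxplot() works with hinges, which may differ slightly from these quartiles:

```r
# Quartiles of stress_level taken from the summary() output
q1 <- 3
q3 <- 8

# Interquartile range and the 1.5 * IQR fences
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
print(c(lower = lower_fence, upper = upper_fence))

# Stress is measured on a 1 to 10 scale, so compare the scale to the fences
scale_inside_fences <- (1 >= lower_fence) && (10 <= upper_fence)
print(scale_inside_fences)
```

Both fences lie outside the 1 to 10 scale, so no observation can be drawn as an outlier.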

Question:-What is the distribution of social media addiction levels among students?

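# Create usage categories (Low, Medium, High) in a new column so that the
# original numeric addiction_level column is not overwritten
Teen_health$usage_group <- case_when(
  Teen_health$daily_social_media_hours <= 2 ~ "Low",
  Teen_health$daily_social_media_hours <= 5 ~ "Medium",
  Teen_health$daily_social_media_hours <= 10 ~ "High"
)
# Fix the level order so the colours map to Low, Medium, High
# (table() would otherwise sort the labels alphabetically)
Teen_health$usage_group <- factor(Teen_health$usage_group,
                                  levels = c("Low", "Medium", "High"))
values <- table(Teen_health$usage_group)

labels <- paste(names(values),
                round(values/sum(values)*100), "%")

pie(values,
    labels = labels,
    col = c("green","yellow","red"),
    main = "Social Media Addiction Level (%)")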

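Interpretation:- The pie chart shows the percentage of students in each category. The categories are derived from daily social media hours (up to 2 hours is Low, above 2 and up to 5 is Medium, above 5 is High), so they describe usage rather than the dataset's original addiction score. table() counts the students in each category.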

Question:-Is there a relationship between sleep hours and stress level among students?

cor(Teen_health$sleep_hours,
    Teen_health$stress_level)
## [1] -0.01097922

Interpretation:- The correlation of -0.011 is very close to 0, which indicates essentially no linear relationship between sleep hours (the independent variable) and stress level (the dependent variable).
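For a regression with a single predictor, R-squared equals the square of the Pearson correlation, which links this value to the regression of stress level on sleep hours. A short check using the correlation printed above:

```r
# Correlation between sleep_hours and stress_level, copied from the output
r <- -0.01097922

# Squared correlation; compare with "Multiple R-squared" in the lm() summary
r_squared <- r^2
print(r_squared)
```

The result matches the Multiple R-squared of 0.0001205 reported by lm(stress_level ~ sleep_hours).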

Question:-Is there a relationship between social media usage and stress level among students?

cor(Teen_health[, c("sleep_hours",
                   "daily_social_media_hours",
                   "stress_level")])
##                           sleep_hours daily_social_media_hours stress_level
## sleep_hours               1.000000000             -0.009472174  -0.01097922
## daily_social_media_hours -0.009472174              1.000000000   0.03069774
## stress_level             -0.010979224              0.030697742   1.00000000

Interpretation:- The correlation matrix covers three pairs: sleep vs social media (-0.009), sleep vs stress (-0.011), and social media vs stress (0.031). All values are close to 0, which indicates no meaningful linear relationship between social media usage and stress level.

Question:- How do sleep hours affect stress level among students?

model1 <- lm(stress_level ~ sleep_hours, data = Teen_health)
summary(model1)
## 
## Call:
## lm(formula = stress_level ~ sleep_hours, data = Teen_health)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5000 -2.4469 -0.4094  2.5487  4.6105 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.58833    0.38422   14.54   <2e-16 ***
## sleep_hours -0.02209    0.05814   -0.38    0.704    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.904 on 1198 degrees of freedom
## Multiple R-squared:  0.0001205,  Adjusted R-squared:  -0.0007141 
## F-statistic: 0.1444 on 1 and 1198 DF,  p-value: 0.704

Interpretation:- The regression analysis shows that sleep hours do not have a significant effect on stress level, the dependent variable (p = 0.704). Although there is a slight negative relationship, it is very weak and not meaningful.

Question:-Does social media usage affect stress level among students?

model2 <- lm(stress_level ~ daily_social_media_hours, data = Teen_health)
summary(model2)
## 
## Call:
## lm(formula = stress_level ~ daily_social_media_hours, data = Teen_health)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5979 -2.4541 -0.3476  2.5426  4.7051 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               5.24662    0.20529  25.557   <2e-16 ***
## daily_social_media_hours  0.04391    0.04131   1.063    0.288    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.903 on 1198 degrees of freedom
## Multiple R-squared:  0.0009424,  Adjusted R-squared:  0.0001084 
## F-statistic:  1.13 on 1 and 1198 DF,  p-value: 0.288

Interpretation:- The regression analysis shows that social media usage does not have a significant effect on stress level (p = 0.288). The relationship between the two variables is very weak, with R-squared below 0.1%.

Question:-How do sleep hours, social media usage, and physical activity together affect stress level among students?

model_multi <- lm(stress_level ~ sleep_hours +daily_social_media_hours + physical_activity,data = Teen_health)
summary(model_multi)
## 
## Call:
## lm(formula = stress_level ~ sleep_hours + daily_social_media_hours + 
##     physical_activity, data = Teen_health)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.6588 -2.4574 -0.3404  2.5474  4.7961 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               5.33157    0.45043  11.837   <2e-16 ***
## sleep_hours              -0.02181    0.05816  -0.375    0.708    
## daily_social_media_hours  0.04334    0.04135   1.048    0.295    
## physical_activity         0.05746    0.14417   0.399    0.690    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.905 on 1196 degrees of freedom
## Multiple R-squared:  0.001189,   Adjusted R-squared:  -0.001316 
## F-statistic: 0.4747 on 3 and 1196 DF,  p-value: 0.7

Interpretation:- The multiple regression analysis shows that none of the three variables significantly affects stress level: every p-value is above 0.29 and the overall F-test has p = 0.7. The relationships are weak, and the model explains only about 0.12% of the variation in stress.
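The p-values and R-squared can also be pulled out of a fitted model programmatically instead of being read from the printed summary. A minimal sketch on simulated data (the `sim` data frame, its column values, and the seed are assumptions for illustration, not the Teen_health dataset):

```r
# Simulated data with no built-in relationship (illustrative only)
set.seed(1)
sim <- data.frame(sleep_hours = runif(100, 4, 9),
                  daily_social_media_hours = runif(100, 1, 8),
                  physical_activity = runif(100, 0, 2),
                  stress_level = sample(1:10, 100, replace = TRUE))

fit <- lm(stress_level ~ sleep_hours + daily_social_media_hours +
            physical_activity, data = sim)
fit_summary <- summary(fit)

# Coefficient table: one row per term, p-values in the "Pr(>|t|)" column
p_values <- coef(fit_summary)[-1, "Pr(>|t|)"]   # drop the intercept row
print(p_values)

# Share of variance explained, plain and adjusted
print(fit_summary$r.squared)
print(fit_summary$adj.r.squared)
```

The same extraction applied to `summary(model_multi)` would return the values shown in the output above.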

Conclusion:-