By:

  • Ammar Ateeq
  • Muhammad Hashim Naveed
  • Sidra Aziz

2 Motivation & Overview

Objective 1: The study’s primary objective is to examine the relationship between the COVID-19 lockdown and the mental health of students who took online semesters, mainly the Summer and Winter semesters of 2020/2021. By correlation, we mean whether there was a pattern of students struggling with online study schedules. Such a pattern would reflect what students went through during a year of confinement to dorms or shared home spaces and how, for better or worse, it affected their mental health.

The factors that should help us find this correlation include the student’s semester, demographics such as whether they were stranded in their home country or staying in a dorm room, how the institute communicated the steps for attending online classes, their internet connection, and their grades. Furthermore, we include a subset of the GHQ-12 questionnaire, asking students about their concentration span, ability to make decisions, feelings of unhappiness or depression, sleep deprivation, constant feeling of stress, and financial situation during this period.

Objective 2: The second objective of the study is to perform and visualize a sentiment analysis of the text data gathered through the survey, to find out the sentiments students express about their online study period. Examples include sentiment polarity scores ranging from negative through neutral to positive, and emotions such as happiness, fear, or trust.

Objective 3: Our third objective is to predict the grades of students who were confined in similar demographic situations and share broadly similar sentiments. This study does not diagnose depression in any student; it only tries to understand whether there are adverse effects on students who spent a year with little to no social interaction while also going through a strenuous schedule of studies and/or part-time jobs.

Questionnaire: A Short Survey About Online Studies And Its Effects

3 Data

3.1 Data Cleaning & Manipulation

Data cleaning was not a major issue for this dataset. Because the answers were fixed options and required, there were no NAs except in the open-ended questions that asked for reasoning. It made no sense to remove those rows, as we would lose a lot of information if we dropped every row with an NA in those questions.

Furthermore, all the categorical answers were converted to factor variables with consistent levels (No-Maybe-Yes).
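The conversion to factors with a fixed level order can be sketched as follows. This is only a minimal illustration in base R: the answers in `raw` and the data frame `survey` are made up for the example and are not the survey data.

```r
# Hypothetical raw answers for one survey column.
raw <- c("Yes", "No", "Maybe", "No", "Yes")

# Fix the level order so every categorical column is coded the same way.
answer_levels <- c("No", "Maybe", "Yes")
sleep <- factor(raw, levels = answer_levels)

levels(sleep)   # level order follows answer_levels, not the alphabet
table(sleep)    # counts are reported in the No-Maybe-Yes order

# Applying the same conversion to several columns of a data frame.
survey <- data.frame(sleep = raw, internet = rev(raw), stringsAsFactors = FALSE)
survey[] <- lapply(survey, factor, levels = answer_levels)
```

Passing the levels explicitly matters because factor() would otherwise sort them alphabetically (Maybe, No, Yes), which makes plots and model output harder to read.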

3.2 Overview

This section gives a brief overview of the demographics of the students who filled out the questionnaire. A total of 58 students filled out the form. The graphs illustrate the students’ nationalities and age ranges. The focus then shifts to the exploratory analysis of the data.

A total of 29 questions were asked in the questionnaire. The answer options for the questions were ordinal or categorical.

The pie chart below shows that 45 out of the 58 students are studying in Germany. Exactly half of the students are in the age range of 26 to 30. Almost 40% are 21 to 25 years of age. The rest are either below 20 or above 30.

3.3 World Map of Students’ Nationalities

Most of the students are from the subcontinent region, mainly Pakistan & India.

4 Exploratory Analysis

This section looks at the basic distribution of the data across the different variables, giving a glimpse of the students’ answers for the selected variables.

4.1 Correlations Between Variables

This section correlates pairs of variables to give better insight into how some factors affect others. The analysis shows that when a student’s concentration and motivation were poor, the student’s grades were poor as well. This is an obvious finding, but it is still important because it also gives an indication of the validity of the students’ answers.

The analysis shows that students living in shared apartments were the most unhappy during online studies. We had initially expected students in single apartments to be the unhappiest because of loneliness. The living setup is not necessarily the sole reason for the unhappiness, but it is a major factor.

Losing a job also leads to unhappiness, which can be seen in figure 15.

4.2 Germany Specific

It seems rather logical to look a little deeper into just the students studying in Germany. Upon analysis, it was found that none of the students studying in Germany are living with their families.

Out of the 45 students who study in Germany, 22 said their results got worse and 13 said ‘maybe’. That is a very high share compared to the students outside Germany.

None of the students outside Germany lost their jobs during the pandemic, whereas 14 of the 45 students in Germany did.

71.11% of students studying in Germany show high levels of unhappiness or anxiety, compared to 38.46% of students studying outside.

68.89% of students studying in Germany faced a delay in their study schedule, compared to only 23.08% of students studying outside.
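The percentages in the two comparisons above can be reproduced with simple arithmetic. The group sizes (45 in Germany, 13 outside) come from the text. The counts per group are not stated and are back-calculated here as the whole numbers that match the reported percentages, so they should be read as an assumption.

```r
# Group sizes from the text: 45 of the 58 respondents study in Germany.
group_size <- c(Germany = 45, Outside = 58 - 45)

# Counts back-calculated from the reported percentages (assumed, not given).
unhappy <- c(Germany = 32, Outside = 5)
delayed <- c(Germany = 31, Outside = 3)

pct_unhappy <- round(100 * unhappy / group_size, 2)
pct_delayed <- round(100 * delayed / group_size, 2)

pct_unhappy   # Germany 71.11, Outside 38.46
pct_delayed   # Germany 68.89, Outside 23.08
```

With only 13 students in the outside group, a single respondent shifts that group’s percentage by almost 8 points, which is one reason the comparison should be read with caution.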

Students outside Germany faced internet problems much more often than students in Germany. A likely reason is that most of these students are from the subcontinent region, where electricity or internet issues are not uncommon.

Students in Germany faced financial problems much more often than students outside Germany. For future analysis, it would be interesting to look at the semester number of the students. New international students must deposit a minimum amount in a bank account, sufficient for one year. Our hypothesis, therefore, is that most of the students facing financial problems are in their second semester or later, or have been in Germany for more than a year.

Concentration levels also suffered during online studies for students in Germany.

While this analysis does not prove that students in Germany experienced more anxiety during online studies, it does point in that direction. The comparison would be more conclusive with an equal number of students from within and outside Germany; only then could it be stated with more conviction.

5 Data Modeling

5.1 Random Forest

The train() function from the caret package is used for model building. Several functions from other R packages are also used for pre-processing, data wrangling, and visualization. caret includes a tuning procedure that helps us get better results and gives us the freedom to tune more parameters in a more systematic way. We tried to balance two main parameters, mtry and ntree. mtry is the number of variables randomly sampled as candidates at each split, and ntree is the number of trees grown in the forest.
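A minimal sketch of tuning mtry with caret while holding ntree fixed is shown below. It assumes the caret and randomForest packages are installed. The built-in iris data stands in for the survey data, and the grid values, the 5-fold cross-validation, and ntree = 200 are illustrative choices rather than the settings used in this study.

```r
library(caret)  # train(), trainControl(); method = "rf" also needs randomForest

set.seed(123)

# Candidate mtry values: the number of predictors sampled at each split.
grid <- expand.grid(mtry = 1:3)

# 5-fold cross-validation to compare the candidates (an assumption; the
# resampling scheme of the original analysis is not stated).
ctrl <- trainControl(method = "cv", number = 5)

# ntree (the number of trees in the forest) is not part of caret's tuning
# grid for "rf", so it is passed through to randomForest() as a fixed value.
fit <- train(Species ~ ., data = iris, method = "rf",
             tuneGrid = grid, trControl = ctrl, ntree = 200)

fit$bestTune   # the mtry value with the best cross-validated accuracy
fit$results    # Accuracy and Kappa for every candidate mtry
```

To compare several ntree values, one option is to repeat the train() call once per value and compare the resampled accuracies.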

We trained 3 different models to get the most information out of the data. The model formulae are as follows:

Model 1: Whether the student was able to concentrate on studies (concentration) is the dependent variable. The independent variables are: how much strain the student is under (education_strain), which country the student is studying in (country_study), whether the student is getting enough sleep (sleep), the delay of the student’s graduation (graduation_effect), internet problems (internet), whether the student lost a job due to COVID-19 (job_lost), whether the student was stranded outside Germany (stranded_outside), and whether there were any financial problems (financial_problem).

              train(concentration ~ education_strain + country_study + sleep + graduation_effect +
                      internet + job_lost + stranded_outside + financial_problem,
                    data = so_training, method = "rf")

Model 2: Whether the results got better (results_better) is the dependent variable. The independent variables are: how much strain the student is under (education_strain), whether the student is getting enough sleep (sleep), the delay of the student’s graduation (graduation_effect), whether the student lost a job due to COVID-19 (job_lost), whether the student was stranded outside Germany (stranded_outside), internet problems (internet), and whether there were any financial problems (financial_problem).

              train(results_better ~ education_strain + sleep + graduation_effect +
                      internet + job_lost + stranded_outside + financial_problem,
                    data = so_training, method = "rf")

Model 3: How motivated the student feels (motivation) is the dependent variable. The independent variables are: how unhappy the student feels (feeling_unhappiniess), whether the student is getting enough sleep (sleep), the delay of the student’s graduation (graduation_effect), whether the student was stranded outside Germany (stranded_outside), internet problems (internet), and whether the student lost a job due to COVID-19 (job_lost).

              train(motivation ~ feeling_unhappiniess + sleep + graduation_effect +
                      stranded_outside + internet + job_lost,
                    data = so_training, method = "rf")

We compare two major aspects: (1) training and testing accuracies for each model, and (2) training accuracies against Kappa values for each model.

THE GRAPHS:

The first scatter plot compares the training and testing accuracies of the three models. Model 1 (concentration) is the best-modelled dependent variable, with a training accuracy of just over 70% and a testing accuracy of just under 45%. The gap between the two suggests that the model does not generalize well on such a small sample. The remaining two models behave differently from the first.

The second scatter plot again compares the three models, this time plotting training accuracy against Kappa. Model 1 (concentration) again shows the strongest agreement, with a Kappa value greater than 0.36 (36%).

5.2 Multinomial Logistic Regression

The target variable is categorical with 3 levels (‘Yes’, ‘No’, ‘Maybe’). Although a logistic regression model was the obvious choice, it is not straightforward to apply to 3 levels. This is where multinomial logistic regression comes into play: it handles outcome variables with more than 2 levels, as opposed to the standard glm function, which is used for binomial outcomes.

We looked into encoding our variables as ‘0’, ‘1’, ‘2’, but two of the variables, ‘education_strain’ and ‘feeling_unhappiness’, have 5 levels, so the encoding did not make sense with unequal numbers of levels. We therefore proceeded with the factor variables.

The multinom function from the nnet package is used to estimate a multinomial logistic regression model. Functions from other R packages can also be used to perform multinomial regression. We chose the multinom function since it does not require data reshaping (as the mlogit package does) and it closely mirrors Hilbe’s Logistic Regression Models sample code. Multinomial logistic regression is a straightforward extension of binary logistic regression that allows for more than two categories of the dependent or outcome variable. Our model is fitted with multinom:

model <- multinom( results_better ~ study_in_germany + graduation_effect + job_lost , data = dfModelTest)

The predictors for the model are whether the student studies in Germany (study_in_germany), the delay of the student’s graduation (graduation_effect), and whether the student lost a job (job_lost). The response variable (results_better) indicates whether the student’s results got better or worse during online studies.

To create a training and testing data set, first shuffle the data and divide it into two equal data frames. We use caret’s varImp to check the most influential variables once the model has converged. Then, on the testing data set, we use the predict function to predict results. There are two methods for calculating predictions: ‘class’ and ‘probs.’
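The shuffle, split, fit, and predict steps described above can be sketched as follows. This is an illustration under stated assumptions, not the original analysis code: the data frame `survey` is randomly generated stand-in data, and the names `train_set` and `test_set` are hypothetical. Only the formula and the two prediction types come from the text. The varImp step is left out to keep the sketch dependent on nnet alone.

```r
library(nnet)  # multinom()

set.seed(42)
n  <- 60
lv <- c("No", "Maybe", "Yes")

# Randomly generated stand-in for the survey data (values are made up).
survey <- data.frame(
  results_better    = factor(sample(lv, n, replace = TRUE), levels = lv),
  study_in_germany  = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  graduation_effect = factor(sample(lv, n, replace = TRUE), levels = lv),
  job_lost          = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Shuffle the rows, then split them into two halves of equal size.
shuffled  <- survey[sample(nrow(survey)), ]
half      <- nrow(shuffled) %/% 2
train_set <- shuffled[seq_len(half), ]
test_set  <- shuffled[(half + 1):nrow(shuffled), ]

# Fit on the training half; trace = FALSE silences the iteration log.
fit <- multinom(results_better ~ study_in_germany + graduation_effect + job_lost,
                data = train_set, trace = FALSE)

# 'class' returns the most likely level; 'probs' returns one probability
# per level for every row.
pred_class <- predict(fit, newdata = test_set, type = "class")
pred_probs <- predict(fit, newdata = test_set, type = "probs")

table(Predicted = pred_class, Actual = test_set$results_better)
```

Because the stand-in data is random, the resulting table carries no meaning; the sketch only shows the shape of the workflow.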

We run the summary of the model and look at its interpretation:

Keeping all other variables constant, a one-unit increase, meaning a move from Maybe to Yes, is associated with a decrease of 13.02 in the log-odds relative to the result being bad. Similarly, keeping all other variables constant and relative to the result being bad, a one-unit increase in Yes is associated with a decrease of 17.9.

Keeping all other variables in the model constant, relative to the grade being bad, a one-unit increase from Maybe to Yes is associated with a decrease of 34.1, whereas a one-unit increase from Yes to No is associated with an increase of 30.

Perhaps the most significant variable is the study_in_germany factor. Keeping all other variables in the model constant, relative to the result being bad, a one-unit change from Yes to No is associated with a decrease of 17.23, whereas a one-unit increase from Maybe to Yes is associated with an increase of 7.5.
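Coefficients from multinom are on the log-odds scale, so exponentiating them gives odds ratios relative to the reference outcome. The short sketch below applies this to the values quoted above, as a reading aid only.

```r
# Coefficients quoted in the text above (log-odds scale).
log_odds <- c(-13.02, -17.9, -34.1, 30, -17.23, 7.5)

# exp() turns a log-odds coefficient into an odds ratio relative to the
# reference outcome.
odds_ratio <- exp(log_odds)

signif(odds_ratio, 3)
```

Odds ratios this close to zero or this large are often a sign that some answer combinations occur for only one outcome in a small sample, so the size of these coefficients should be interpreted with care.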

Accuracy, Kappa & Misclassification

We use the postResample function from caret to assess the model’s accuracy. It reports the mean squared error and R-squared for numeric vectors, and the overall agreement rate and Kappa for factors. The values reported below are rounded to two decimal places.

Accuracy & Kappa: 0.87, 0.72

The misclassification rate is: 0.13

Confusion Matrix for Predicted & Actual Classification

                  Actual
Predicted    No   Maybe   Yes
No            9       0     1
Maybe         0       1     0
Yes           1       0     3

Confusion Matrix for Predicted & Actual Classification (In percentage)

                  Actual
Predicted    No   Maybe   Yes
No           90       0    10
Maybe         0     100     0
Yes          25       0    75
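The accuracy, Kappa, and misclassification rate reported above can be recomputed directly from the confusion matrix. The sketch below uses base R only. The counts are copied from the first table, and the rows are taken to be the predicted classes, which is an assumption about the table’s orientation (accuracy and Kappa do not depend on it).

```r
lv <- c("No", "Maybe", "Yes")

# Counts copied from the confusion matrix above.
cm <- matrix(c(9, 0, 1,
               0, 1, 0,
               1, 0, 3),
             nrow = 3, byrow = TRUE,
             dimnames = list(Predicted = lv, Actual = lv))

n           <- sum(cm)
accuracy    <- sum(diag(cm)) / n                       # observed agreement
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)  # Cohen's Kappa

round(c(Accuracy = accuracy, Kappa = cohen_kappa,
        Misclassification = 1 - accuracy), 2)
# Accuracy 0.87, Kappa 0.72, Misclassification 0.13

# Row-wise percentages, matching the second table.
row_pct <- round(100 * prop.table(cm, margin = 1))
row_pct
```

The matrix covers 15 test observations, so each misclassified student changes the accuracy by almost 7 percentage points.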

Multinomial Logistic Model Plot

The graph shows the predicted probability that a student gets a bad result, with the actual classification indicated by 3 different colors. The predicted probability is higher where the actual results are bad, meaning the model predicts those instances with much higher accuracy. The biggest constraint is the lack of data: with a larger sample size, the predictions would improve, and cross-validation could be used to obtain a more reliable model.

6 Sentiment Analysis

We use text mining techniques from the libraries and packages offered by R, including tm, tidyverse, tidytext, glue, wordcloud, and dplyr, to understand the sentiments behind the textual data. The study also aims to perform and visualise a sentiment analysis of the text gathered through the survey, to find out the sentiments students express about their online study period, for example sentiment polarity scores from negative through neutral to positive, and feelings of unhappiness, fear, or trust.

The sentiment analysis shows that most of the words used across the questions had a negative inclination. Only two columns (‘motivation_reason’, ‘addional_notes’) had an overall positive sentiment.
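The idea behind a lexicon-based polarity score can be illustrated with a small base R sketch. It does not use tm or tidytext and is not the pipeline used in this study: the lexicon, the stopword list, and the example answers are all tiny, hand-made stand-ins chosen only to show the mechanics of lower-casing, stopword removal, and scoring.

```r
# Tiny hand-made lexicon and stopword list, purely illustrative.
lexicon   <- c(good = 1, motivated = 1, flexible = 1,
               bad = -1, stressed = -1, lonely = -1, tired = -1)
stopwords <- c("i", "was", "and", "the", "a", "but", "very", "at")

# Hypothetical free-text answers standing in for one survey column.
answers <- c("I was stressed and lonely",
             "Flexible schedule, but I was very tired at home",
             "Good and motivated")

score_answer <- function(text) {
  # Lower-case, strip punctuation, and split into words.
  words <- strsplit(gsub("[^a-z ]", "", tolower(text)), " +")[[1]]
  # Drop empty tokens and stopwords.
  words <- words[nzchar(words) & !(words %in% stopwords)]
  # Words missing from the lexicon yield NA and are ignored in the sum.
  sum(lexicon[words], na.rm = TRUE)
}

scores   <- vapply(answers, score_answer, numeric(1), USE.NAMES = FALSE)
polarity <- ifelse(scores > 0, "positive",
                   ifelse(scores < 0, "negative", "neutral"))

data.frame(answer = answers, score = scores, polarity = polarity)
```

Summing such scores per column gives one overall sentiment per question, which is the kind of column-level comparison described above.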

7 Final Analysis

What did you learn about the data?

We curated the survey questions based on our own experience as students who have seen the academic and professional ecosystem shift over the last 1.5 years. This inspired the study: we wanted to understand what students have been feeling during this time, what their significant concerns are, what affected their studies, and whether they got better results. The data included not only open-text questions but also Likert-scale scores and categorical answers. We also learned about the demographics of the survey respondents, such as their age and location.

For future studies, we can incorporate further questions, such as the students’ semesters and whether they or their loved ones were diagnosed with COVID-19.

How did you answer the questions?

We started with the basic pre-processing steps of cleaning the data, removing NAs, converting text data to lower case, and removing stopwords. We answered our research questions by performing a sentiment analysis on the text data. We also looked for correlations between different variables, which was integral to finding out whether one variable contributes to another. In addition, we ran ML models such as logistic regression to predict the grades of Masters students. The biggest challenge for us was not having enough data. We strongly recommend future work with a higher number of survey respondents, ideally more than 100, so that the training and test split is better and the study can offer more reliable results.

8 Screencast & Website

Project Screencast

Project Website

9 Questionnaire

The following is the list of questions used in our survey:

Timestamp
What is your age?
Which country are you from?
Do you study in Germany?
Which country were you in while taking online classes?
What is your living setup?
Were you motivated to take online classes as compared to regular classes?
Please give reason for your previous question’s answer below:
How has been your concentration span during online studies vs. in-person classes/lectures?
Please share some thoughts or explanation on the concentration span in online studies.
Has your sleep been affected during online classes?
Please share some thoughts on how has your sleep been affected during online classes if you marked previous question as ‘yes’:
From 1 being lowest and 5 the highest, how much are you in constant strain around the time of assignment submissions, projects, exams or classes in online setting?
Did the online class setting effect your graduation timeline or plans?
If your answer is ‘yes’ to previous question, would you like to take a few seconds and describe to us how has it impacted your graduation timeline or future plans?
From scale of 1 to 5 where 1 being the lowest and 5 being the highest, how much has the feeling of unhappiness or anxiety been felt by you during your online classes in COVID-19 situation?
Would you like to share your thoughts on what has been the cause of anxiety or worry during this time for you?
How was the assistance from your Institute during COVID-19 when it comes to getting comfortable in online-class decorum? 1 being ‘least assistive’ to 5 being ‘most assistive’.
Has the internet connection affected you during online classes or online exams?
If you answered ‘yes’ to previous question, can you please share thoughts on what kind of effects have you experienced due to poor internet connection?
Did you lose your job due to Covid-19 pendemic?
Were you able to take all the courses that you wanted?
If your answer is ‘no’ to previous question, what were your immediate thoughts when you were unable to take all the courses in due time?
Were you stranded outside your country of education?
If your answer is ‘yes’ to previous question, how did you cope with online studies at that time?
Did you have financial problem due to COVID-19?
If your answer is ‘yes’ to the question above, how do you think it affected your studies?
Did you get better results in online-exam setting where you could take the exam in premise of your own home?
Please share any thoughts in addition that you feel were not addressed in above questions.
