Introduction

During the pandemic, the world shut down and all of us stayed at home… Right?

Think about it for a second.

One would think that ALL people, regardless of wealth and social class, were staying put at home. But is that truly the case?

Did people who were considered low class stay at home less often than their middle and high class counterparts?

I mean…when you have all the money in the world, plenty of space for activities, and the ability to buy everything you need in bulk, it makes sense that the high and middle class might stay at home more than those in the low class.

The people who can afford to live in each of the following three homes lead vastly different lives.

[Images: three homes]

So let’s take a closer look!! Below is raw data (don’t ask me how they calculated this) on smartphone users’ time spent at home in Jakarta, Indonesia from February to November 2020. Each row contains the date, the wealth class, the mean share of time spent at home, and the standard error of the mean (SEM).

In this analysis, high class represents people in the 80th to 100th percentile of wealth (the top 20%), middle class (labeled "Medium" in the data) represents those in the 40th to 80th percentile, and low class represents those in the bottom 40% of the population.

Prep Work

# Load relevant libraries (install any that are missing with install.packages() first)
library(knitr)
library(dplyr)
library(tibble)
library(ggplot2)
library(tidyverse)
# Setting working directory
setwd("/Users/victorzheng/Documents/NYU/R")

# Importing the data
data <- read.csv("/Users/victorzheng/Documents/NYU/R/smartphone_jakarta.csv")

# View first few rows of data, but not all of it so it doesn't overwhelm the page
head(data, n=5)
##         date wealth_label_home      mean         sem
## 1 2020-02-01   High (80%-100%) 0.8256003 0.001123439
## 2 2020-02-01      Low (0%-40%) 0.8225373 0.001815352
## 3 2020-02-01  Medium (40%-80%) 0.8295173 0.001003396
## 4 2020-02-02   High (80%-100%) 0.7195918 0.001147772
## 5 2020-02-02      Low (0%-40%) 0.7408457 0.001732143
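
One thing worth checking before the analysis: `read.csv` does not turn the `date` column into a Date, and the wealth labels sort alphabetically (High, Low, Medium) rather than from low to high. Here is a minimal sketch of how that could be tidied up, assuming the column names shown above. `sample_rows` is a hypothetical stand-in that hard-codes the five rows printed by `head()` so the snippet runs on its own.

```r
# Hypothetical stand-in for the real CSV: the five rows printed above
sample_rows <- data.frame(
  date = c("2020-02-01", "2020-02-01", "2020-02-01", "2020-02-02", "2020-02-02"),
  wealth_label_home = c("High (80%-100%)", "Low (0%-40%)", "Medium (40%-80%)",
                        "High (80%-100%)", "Low (0%-40%)"),
  mean = c(0.8256003, 0.8225373, 0.8295173, 0.7195918, 0.7408457),
  sem = c(0.001123439, 0.001815352, 0.001003396, 0.001147772, 0.001732143),
  stringsAsFactors = FALSE
)

# Parse the text dates into Date objects so they sort and plot chronologically
sample_rows$date <- as.Date(sample_rows$date, format = "%Y-%m-%d")

# Order the wealth classes from low to high instead of alphabetically
sample_rows$wealth_label_home <- factor(
  sample_rows$wealth_label_home,
  levels = c("Low (0%-40%)", "Medium (40%-80%)", "High (80%-100%)")
)

# Basic checks: no missing values, and shares of time lie between 0 and 1
stopifnot(!anyNA(sample_rows))
stopifnot(all(sample_rows$mean >= 0 & sample_rows$mean <= 1))

str(sample_rows)
```

The same steps should apply to the full data frame, though I have only tried the idea on these five rows.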

Data Analysis

In this section, I will be using dplyr’s summarize, count, and group_by functions to produce a report that describes two key insights that answer my question. Then I will use ggplot2 to visualize those insights.

# Using the group_by function, I group the data by the column "wealth_label_home", which splits it into high, middle, and low class.

grouped_data <- data %>% group_by(wealth_label_home)
# Using the summarize function, I am able to find a summary of results for each group that I just created

summarized_data <- grouped_data %>% summarize(
  Mean_Mean = mean(mean),
  Max_Mean = max(mean),
  Mean_SEM = mean(sem)
)

# I will also use the summarize function to showcase the 25th, 50th, and 75th percentiles for the respective groups
percentiles <- grouped_data %>%
  summarize(
    Q1_percentile = quantile(mean, 0.25),
    Q2_percentile = quantile(mean, 0.5),
    Q3_percentile = quantile(mean, 0.75)
  )
# The count function is not necessary for this specific data analysis, but I will use it to count how many days worth of data we are observing
num_days <- data %>%
  count(date)

print(paste0("We are observing ", nrow(num_days), " days worth of data"))
## [1] "We are observing 289 days worth of data"
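
The same count can be obtained in one step with dplyr's `n_distinct()`, and `count()` can double as a check that every day has one row per wealth class. A small sketch, where `toy` is a hypothetical two-day stand-in for the real data (only the two columns needed here):

```r
library(dplyr)

# Hypothetical stand-in: two days, three wealth classes each
toy <- data.frame(
  date = rep(c("2020-02-01", "2020-02-02"), each = 3),
  wealth_label_home = rep(
    c("High (80%-100%)", "Low (0%-40%)", "Medium (40%-80%)"),
    times = 2
  )
)

# Number of distinct days in one step
n_days <- n_distinct(toy$date)

# Rows per day; anything other than 3 would flag a missing or duplicated class
rows_per_day <- toy %>% count(date, name = "n_classes")
incomplete_days <- rows_per_day %>% filter(n_classes != 3)

print(paste0("We are observing ", n_days, " days worth of data"))
print(nrow(incomplete_days))
```

If `incomplete_days` comes back non-empty on the real data, the per-class summaries would be built from different sets of days, which is worth knowing before comparing them.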

Including Plots

# Use ggplot to plot the mean values
ggplot(data = grouped_data, aes(x = wealth_label_home, y = mean)) +
  geom_point() 

# Use ggplot to plot the SEM
ggplot(data = grouped_data, aes(x = wealth_label_home, y = sem)) +
  geom_point() 
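
Because the point plots stack every day on top of each other in a single column per class, a boxplot may be easier to read: it draws the median and the 25th and 75th percentiles directly. This is a sketch of that alternative, not the plot used in this post. `sample_rows` is a hypothetical stand-in built from the five rows shown by `head()`, so the boxes here are only illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for the real data: the five rows printed by head()
sample_rows <- data.frame(
  wealth_label_home = c("High (80%-100%)", "Low (0%-40%)", "Medium (40%-80%)",
                        "High (80%-100%)", "Low (0%-40%)"),
  mean = c(0.8256003, 0.8225373, 0.8295173, 0.7195918, 0.7408457)
)

# Put the classes in low-to-high order on the x axis
sample_rows$wealth_label_home <- factor(
  sample_rows$wealth_label_home,
  levels = c("Low (0%-40%)", "Medium (40%-80%)", "High (80%-100%)")
)

# One box per wealth class, summarizing the distribution of daily means
p <- ggplot(sample_rows, aes(x = wealth_label_home, y = mean)) +
  geom_boxplot() +
  labs(
    x = "Wealth class",
    y = "Mean share of time spent at home"
  )

print(p)
```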

Take a look at the plots above and the data below. What do you notice?

kable(summarized_data)

|wealth_label_home | Mean_Mean|  Max_Mean|  Mean_SEM|
|:-----------------|---------:|---------:|---------:|
|High (80%-100%)   | 0.8470972| 0.9464578| 0.0008750|
|Low (0%-40%)      | 0.8388483| 0.9274056| 0.0012922|
|Medium (40%-80%)  | 0.8499366| 0.9424363| 0.0007587|

kable(percentiles)

|wealth_label_home | Q1_percentile| Q2_percentile| Q3_percentile|
|:-----------------|-------------:|-------------:|-------------:|
|High (80%-100%)   |     0.8287607|     0.8514946|     0.8868856|
|Low (0%-40%)      |     0.8232536|     0.8385944|     0.8701810|
|Medium (40%-80%)  |     0.8343730|     0.8530853|     0.8865011|

Takeaway #1: The highest daily average for the low class was about 92.7% of time spent at home. In contrast, the daily averages for the middle and high classes peaked at 94.2% and 94.6% respectively. This can most likely be attributed to those in the low class living hand to mouth and in overcrowded areas where staying at home is difficult. The low class of Jakarta were the most vulnerable when it came to lockdown measures. Pay special attention to the average SEM, or standard error of the mean. The higher SEM for the low class means that its sample mean is a less precise estimate of the true population mean, which can come from a smaller sample, more variability between individuals, or both. Given all that we know about the struggles of those in the low class during a pandemic, it is possible that the true time they spent at home differs from what the data suggests, although the SEM alone does not tell us in which direction.
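
To make the SEM a bit more concrete: under a normal approximation, an approximate 95% confidence interval for a daily mean is the mean plus or minus about 1.96 times the SEM. The sketch below applies that to the Low and High rows for 2020-02-01 printed by `head()` earlier. The normal approximation is my assumption, since I do not know how the SEM in this data set was calculated.

```r
# Values copied from the 2020-02-01 rows printed by head()
low_mean  <- 0.8225373
low_sem   <- 0.001815352
high_mean <- 0.8256003
high_sem  <- 0.001123439

# Critical value for a two-sided 95% interval (about 1.96)
z <- qnorm(0.975)

# Approximate 95% confidence intervals: mean +/- z * SEM
low_ci  <- low_mean  + c(-1, 1) * z * low_sem
high_ci <- high_mean + c(-1, 1) * z * high_sem

# Width of each interval in percentage points
low_width_pp  <- diff(low_ci) * 100
high_width_pp <- diff(high_ci) * 100

print(round(low_ci, 4))
print(round(high_ci, 4))
print(c(low = low_width_pp, high = high_width_pp))
```

The interval for the low class is wider, but both intervals are well under one percentage point wide. A wider interval means a less precise estimate in both directions and says nothing about whether the true value is higher or lower.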

Takeaway #2: The low class sits below the middle and high classes at the 25th, 50th, and 75th percentiles, and its increase from the 25th to the 50th percentile (about 1.5 percentage points) is smaller than that of the middle class (about 1.9) and the high class (about 2.3). Although one could argue that the differences are not drastic, small margins add up over time and lead to a higher likelihood of exposure to the virus.
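
The gaps between percentiles can be computed directly from the percentile table. A short sketch: `percentiles_tbl` is a hard-coded copy of the table above (a name I made up so it does not overwrite `percentiles`), and the new column names are also my own.

```r
# Hard-coded copy of the percentile table shown above
percentiles_tbl <- data.frame(
  wealth_label_home = c("High (80%-100%)", "Low (0%-40%)", "Medium (40%-80%)"),
  Q1_percentile = c(0.8287607, 0.8232536, 0.8343730),
  Q2_percentile = c(0.8514946, 0.8385944, 0.8530853),
  Q3_percentile = c(0.8868856, 0.8701810, 0.8865011)
)

# Gap between the 25th and 50th percentiles, in percentage points
percentiles_tbl$Q1_to_Q2_pp <-
  (percentiles_tbl$Q2_percentile - percentiles_tbl$Q1_percentile) * 100

# Interquartile range (25th to 75th percentile), in percentage points
percentiles_tbl$IQR_pp <-
  (percentiles_tbl$Q3_percentile - percentiles_tbl$Q1_percentile) * 100

print(percentiles_tbl[, c("wealth_label_home", "Q1_to_Q2_pp", "IQR_pp")])
```

With the real data, the same two columns could be added inside the `summarize()` call instead of being computed from a copied table.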