Investigation of Screen Time Across Different User Groups

Steve Eisner

import kagglehub
import pandas as pd
import glob
import matplotlib.pyplot as plt
import seaborn as sns
import squarify

# Download latest version
path = kagglehub.dataset_download("valakhorasani/mobile-device-usage-and-user-behavior-dataset")
path += '/*.csv'
print("Path to dataset files:", path)

## Path to dataset files: /Users/steven/.cache/kagglehub/datasets/valakhorasani/mobile-device-usage-and-user-behavior-dataset/versions/1/*.csv

files_to_read = glob.glob(path)

# Read every CSV in the download directory into a single DataFrame
df = pd.concat((pd.read_csv(f) for f in files_to_read), ignore_index=True)
print(len(df.index))

## 700

The Data

Below is a small snapshot of the data. It covers different users’ phone habits, their choice of mobile device, and a few identifying attributes such as age and gender. The data can be found at https://www.kaggle.com/datasets/valakhorasani/mobile-device-usage-and-user-behavior-dataset

df.head()

##    User ID    Device Model Operating System  ...  Age  Gender  User Behavior Class
## 0        1  Google Pixel 5          Android  ...   40    Male                    4
## 1        2       OnePlus 9          Android  ...   47  Female                    3
## 2        3    Xiaomi Mi 11          Android  ...   42    Male                    2
## 3        4  Google Pixel 5          Android  ...   20    Male                    3
## 4        5       iPhone 12              iOS  ...   31  Female                    3
## 
## [5 rows x 11 columns]

Question: How Do Age and Gender Correlate with Screen Time?

Below we will look at two charts. The first asks whether screen time for men differs from that for women. To answer this, we will use two histograms to compare how many people fall into each bucket of screen time.

male_users = df[df['Gender'] == 'Male']
female_users = df[df['Gender'] == 'Female']


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12,6))

ax1.hist(male_users['Screen On Time (hours/day)'], bins=10, alpha=0.5, label="Male Users", color="blue")
ax2.hist(female_users['Screen On Time (hours/day)'], bins=10, alpha=0.5, label="Female Users", color="pink")
ax1.set_xlabel('Average Hours of Screentime Per Day (Male Users)')
ax1.set_ylabel('Frequency')
ax2.set_xlabel('Average Hours of Screentime Per Day (Female Users)')
ax2.set_ylabel('Frequency')
#plt.legend(loc='upper right')

plt.show()

We see here that men and women show no meaningful difference in screen time. In both groups, most users fall into the “low” bucket of approximately two hours of screen time per day, and the counts generally decline as screen time increases.
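The visual comparison can be backed by a few summary statistics per gender. The following is a minimal sketch, not part of the original analysis: the helper name `screen_time_by_gender` and the toy values are made up for illustration, and only the column names come from the dataset.

```python
import pandas as pd

SCREEN_COL = "Screen On Time (hours/day)"

def screen_time_by_gender(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, median and std of screen time for each gender."""
    return (
        frame.groupby("Gender")[SCREEN_COL]
        .agg(["count", "mean", "median", "std"])
        .round(2)
    )

# Made-up sample so the sketch runs on its own; with the real data,
# call screen_time_by_gender(df) instead.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    SCREEN_COL: [2.0, 2.5, 6.0, 5.5, 10.0, 9.0],
})
summary = screen_time_by_gender(toy)
print(summary)
```

If the means and medians for the two groups sit close together on the real data, that would support the reading of the histograms above.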

So, if gender represents no meaningful difference, what about age? To answer this question, let’s use a scatter plot to represent all of our users.

plt.figure(figsize=(10, 6))
plt.scatter(male_users['Age'], male_users['Screen On Time (hours/day)'], label='Male Users', color='blue', alpha=0.5)
plt.scatter(female_users['Age'], female_users['Screen On Time (hours/day)'], label='Female Users', color='pink', alpha=0.5)
plt.xlabel('Age')
plt.ylabel('Screen On Time (hours/day)')
plt.title('Scatter Plot of Screen On Time (hours/day) against Age for Male and Female Users')
plt.legend(loc='upper right')
plt.show()

Intuitively, we might expect a negative trend, with younger users spending more time on their screens and older users being more screen-averse. However, what we see here is a largely even distribution of screen time across ages. As before, demographics alone tell us little about screen time. So, let’s try to learn more about the different groups of users we have.
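One way to put a number on the flat scatter plot is a Pearson correlation between age and screen time. This is a hedged sketch rather than a step from the original analysis: the function name `age_screen_time_correlation` is hypothetical, and the toy data is deliberately perfectly linear so the result is easy to check.

```python
import pandas as pd

def age_screen_time_correlation(
    frame: pd.DataFrame,
    age_col: str = "Age",
    screen_col: str = "Screen On Time (hours/day)",
) -> float:
    """Pearson correlation between age and daily screen time."""
    return frame[age_col].corr(frame[screen_col])

# Made-up data in which screen time falls by exactly 2 hours per decade,
# so the correlation is -1. With the real data, pass df instead.
toy = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Screen On Time (hours/day)": [8.0, 6.0, 4.0, 2.0],
})
r = age_screen_time_correlation(toy)
print(f"Pearson r = {r:.2f}")
```

On the real data, a value near zero would match the even spread seen in the scatter plot, while a clearly negative value would point to the age effect we initially expected.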

Identifying Super Users and their Choices

Here, we’ll define “Super Users” as the users who fall into the top screen time bracket we observed in our histograms.

Before we do, we’ll want to get a general idea of which devices all users prefer. This will be relevant later.

device_counts = df['Device Model'].value_counts()
# Create the figure with the desired size (assigning plt.figsize has no effect)
plt.figure(figsize=(14, 12))
device_counts.plot.pie(labels=[f'{label} ({count})' for label, count in zip(device_counts.index, device_counts)])
plt.title('Number of Users by Device Model')
plt.ylabel('')
plt.show()

This pie chart shows that devices are distributed fairly evenly across users, with a slight skew toward the Xiaomi Mi 11. I suspect this evenness says more about how the dataset was assembled, with its creators aiming for an even spread, than about real-world device preferences. Still, it will be useful to keep in mind for the evaluations that follow.
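The "slight skew" is easier to judge as percentages than as pie slices. Below is a small sketch under stated assumptions: `device_shares` is a hypothetical helper, and the device names in the toy frame are placeholders rather than values from the dataset.

```python
import pandas as pd

def device_shares(frame: pd.DataFrame, device_col: str = "Device Model") -> pd.Series:
    """Percentage of users per device, largest share first."""
    return (frame[device_col].value_counts(normalize=True) * 100).round(1)

# Placeholder devices for illustration; with the real data, pass df instead.
toy = pd.DataFrame({"Device Model": ["Phone A", "Phone A", "Phone B", "Phone C"]})
shares = device_shares(toy)
print(shares)
```

With five devices, a perfectly even split would give each one 20% of users, so the distance from 20% shows how strong the skew really is.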

As mentioned before, let’s gather some analytics on the various groups using a tree map, keeping a more careful eye on the top usage group.

def assign_usage_group(screen_on_time, increment=2.5):
    # Bucket screen time into 2.5-hour bins: [0, 2.5) -> '1.0', [2.5, 5) -> '2.0', ...
    group_number = (screen_on_time // increment) + 1
    return str(group_number)

df['Screentime Usage Group'] = df['Screen On Time (hours/day)'].apply(assign_usage_group)

# Sort the bins from highest to lowest so they line up with the labels below.
# Sorting the bin numbers as strings only works while there are fewer than ten bins.
unique_groups = sorted(df['Screentime Usage Group'].unique(), reverse=True)
# Note: 'Medium Usage' appears twice, so those two bins are merged by the groupby below.
usage_labels = ['High Usage', "Medium High Usage", "Medium Usage", "Medium Usage", "Low Usage"]

mapping_dict = {str(group): usage_labels[i] for i, group in enumerate(unique_groups)}
df['Usage Group'] = df['Screentime Usage Group'].map(mapping_dict)

def most_popular_device(devices):
    return devices.value_counts().idxmax()
grouped_df = df.groupby('Usage Group').agg({
    'Screen On Time (hours/day)': 'mean',
    'Device Model': most_popular_device,
    'Age': 'mean'
}).reset_index()

# Double-quoted f-string so the inner single quotes also parse on Python versions before 3.12
labels = [f"{row['Usage Group']}\nScreen Time: {row['Screen On Time (hours/day)']:.1f}\nDevices: {row['Device Model']}\nAvg Age: {row['Age']:.1f}"
          for _, row in grouped_df.iterrows()]


plt.figure(figsize=(12,8))
colors=plt.cm.tab20.colors
squarify.plot(sizes=grouped_df['Screen On Time (hours/day)'], color=colors[:len(grouped_df)],label=labels, alpha=0.8)
plt.axis('off')

## (np.float64(0.0), np.float64(100.0), np.float64(0.0), np.float64(100.0))

plt.title('Tree Map of Screen Time Usage Groups')
plt.show()
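The 2.5-hour binning used for the tree map can also be expressed with `pandas.cut`, which makes the bin edges explicit. This is an alternative sketch, not the code used above. It assumes that screen time stays below 12.5 hours per day, and it uses five distinct labels of my own choosing, unlike the label list above, which reuses "Medium Usage".

```python
import pandas as pd

# Assumed edges: 2.5-hour bins, closed on the left like the floor division above.
BIN_EDGES = [0, 2.5, 5, 7.5, 10, 12.5]
# Hypothetical labels, ordered from the lowest bin to the highest.
BIN_LABELS = [
    "Low Usage",
    "Medium Low Usage",
    "Medium Usage",
    "Medium High Usage",
    "High Usage",
]

def label_usage(screen_hours: pd.Series) -> pd.Series:
    """Map hours of screen time to an ordered usage category."""
    return pd.cut(screen_hours, bins=BIN_EDGES, labels=BIN_LABELS, right=False)

hours = pd.Series([1.0, 2.5, 7.4, 12.0])
groups = label_usage(hours)
print(list(groups))
```

Because the result is an ordered categorical, grouping and sorting by it keeps the bins in low-to-high order without sorting bin numbers as strings. Values at or above the last edge come back as missing, so the edges need to cover the full range of the data.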

Note that although users appear to be split fairly evenly across devices, the high usage users seem to prefer the Xiaomi Mi 11. So now we can ask one key question to gain insight: why?
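The tree map only reports the single most popular device per group, so it cannot show how strong that preference is. A row-normalized cross-tabulation gives the share of every device within each usage group. The sketch below is illustrative only: `device_share_by_group` is a hypothetical helper, and the toy rows and device names are made up.

```python
import pandas as pd

def device_share_by_group(
    frame: pd.DataFrame,
    group_col: str = "Usage Group",
    device_col: str = "Device Model",
) -> pd.DataFrame:
    """Share of each device within each usage group (each row sums to about 1)."""
    return pd.crosstab(frame[group_col], frame[device_col], normalize="index").round(2)

# Made-up rows for illustration; with the real data, pass df instead.
toy = pd.DataFrame({
    "Usage Group": ["High Usage", "High Usage", "High Usage", "Low Usage", "Low Usage"],
    "Device Model": ["Phone A", "Phone A", "Phone B", "Phone A", "Phone B"],
})
group_shares = device_share_by_group(toy)
print(group_shares)
```

If the top device's share in the high usage group is only a little above the shares of the others, the preference may simply reflect the overall device distribution rather than a choice specific to heavy users.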

To answer this question, let’s consider one factor that all mobile phone users keep in the backs of their minds: battery life. More specifically, we can form the hypothesis that the super users will gravitate toward the phones that have better overall battery life. So, if we map out battery life against screen time, we should see the Xiaomi Mi 11 perform the best.

We will use a trellis of line charts, one per device, to plot battery drain against screen time, with color indicating how each device ranks. Green marks the best-performing battery, followed by blue, yellow, and orange, with red for the worst.

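# Rank devices by their maximum recorded daily battery drain, lowest first.
max_battery_drain = df.groupby('Device Model')['Battery Drain (mAh/day)'].max().sort_values()

# Colors follow that ascending order: the device with the lowest peak drain gets
# red and the one with the highest gets green. This assumes exactly five devices.
rank_colors = ['red', 'orange', 'yellow', 'blue', 'green']
color_mapping = dict(zip(max_battery_drain.index, rank_colors))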
g = sns.FacetGrid(df, col="Device Model", col_wrap=2, hue='Device Model',palette=color_mapping)
g.map(sns.lineplot, "Screen On Time (hours/day)", "Battery Drain (mAh/day)", errorbar=None)

# Set the axis labels and title
g.set_axis_labels("Screen On Time (hours/day)", "Battery Drain (mAh/day)")

g.figure.suptitle("Battery Consumption vs Screentime for Different Device Types", y=1.05)

# Show the plot
plt.show()

The chart is consistent with our hypothesis: the Xiaomi Mi 11 ranks as having the best overall battery. Interestingly, none of the devices shows a perfectly linear relationship between battery drain and screen time, and the iPhone 12 has the most pronounced peaks and valleys in its graph.
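How linear each device's drain curve is, and how much battery each extra hour of screen time costs, can be estimated with a straight-line fit per device. This is a sketch under assumptions rather than part of the original analysis: `drain_fit_by_device` is a hypothetical helper, the toy numbers are made up, and treating mAh drained per hour as a proxy for battery efficiency ignores differences in battery capacity between phones.

```python
import numpy as np
import pandas as pd

SCREEN_COL = "Screen On Time (hours/day)"
DRAIN_COL = "Battery Drain (mAh/day)"

def drain_fit_by_device(frame: pd.DataFrame) -> pd.DataFrame:
    """Fit drain = slope * hours + intercept for each device and report R^2."""
    rows = {}
    for device, grp in frame.groupby("Device Model"):
        slope, _intercept = np.polyfit(grp[SCREEN_COL], grp[DRAIN_COL], deg=1)
        r = grp[SCREEN_COL].corr(grp[DRAIN_COL])
        rows[device] = {"mAh_per_hour": slope, "r_squared": r ** 2}
    return pd.DataFrame.from_dict(rows, orient="index")

# Made-up readings: "Phone A" is exactly linear, "Phone B" is not.
toy = pd.DataFrame({
    "Device Model": ["Phone A"] * 3 + ["Phone B"] * 3,
    SCREEN_COL: [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    DRAIN_COL: [300.0, 600.0, 900.0, 500.0, 900.0, 1900.0],
})
fit = drain_fit_by_device(toy)
print(fit.round(3))
```

A lower slope means less drain per additional hour of screen time, and an R² close to 1 means the relationship is close to linear. Comparing slopes across devices is a more direct check of the battery hypothesis than comparing peak drain alone.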

Conclusion