How RAM might impact battery life

Author

Ayomide Joe-Adigwe

Battery life comparison

I first discovered this dataset on Kaggle and later traced it to its original source, GSMArena (https://www.gsmarena.com/res.php3?sSearch=phones+2024), from which the information was scraped. I chose this dataset because of my passion for gadgets, particularly smartphones. I’m always drawn to exploring phones’ technical specifications, such as RAM, battery capacity, and performance features, which play a significant role in determining user experience. The dataset provides a rich variety of information to work with, including both hardware details and categorical variables like brand and operating system. Its breadth and diversity leave plenty of room for analysis and visualization, enabling me to uncover trends, patterns, and insights about the smartphone industry.

By working with this dataset, I get to combine my love for technology with data analysis, making the project both meaningful and enjoyable. It also allows me to deepen my understanding of how different technical specifications relate to each other, such as how battery capacity (a rough proxy for battery life) relates to RAM, or how price varies across brands. This hands-on exploration feels especially rewarding because I’m not only familiar with the variables but also genuinely curious about the insights they can reveal.

Moreover, as a tech enthusiast, I’m excited by the potential to present this information in an engaging way, using visualizations and statistical techniques to make sense of the data. This project is an opportunity to translate my personal interest into actionable insights while honing my data analysis and storytelling skills.

library(tidyverse) # for data manipulation, exploration, and visualization
library(janitor) # helpful for cleaning and preparing data

library(highcharter) # For interactivity
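# Set the working directory (this path is specific to the author's machine;
# change it to the folder that contains processed_data2.csv)
setwd("/Users/ayomidealagbada/AYOMIDE'S DATAVISUALITIOM")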
# Load the dataset
cities500 <- read_csv("processed_data2.csv")
Rows: 1708 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (23): phone_brand, phone_model, store, dimensions, display_type, displa...
dbl  (13): price_usd, storage, ram, weight, display_size, nfc, battery, fold...
date  (2): launch_date, year

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Subset of dataset overview
head(cities500)
# A tibble: 6 × 38
  phone_brand phone_model   store price_usd storage   ram launch_date dimensions
  <chr>       <chr>         <chr>     <dbl>   <dbl> <dbl> <date>      <chr>     
1 apple       Apple iPhone… Amaz…     1358.     256     8 2024-09-20  149.6 x 7…
2 apple       Apple iPhone… Amaz…     1493.     512     8 2024-09-20  149.6 x 7…
3 apple       Apple iPhone… Amaz…     1705.    1000     8 2024-09-20  149.6 x 7…
4 apple       Apple iPhone… Amaz…     1565.     512     8 2024-09-20  163 x 77.…
5 apple       Apple iPhone… Amaz…      247.     128     4 2020-11-13  131.5 x 6…
6 apple       Apple iPhone… Amaz…      320.     256     4 2020-11-13  131.5 x 6…
# ℹ 30 more variables: weight <dbl>, display_type <chr>, display_size <dbl>,
#   display_resolution <chr>, os <chr>, nfc <dbl>, usb <chr>, battery <dbl>,
#   features_sensors <chr>, colors <chr>, video <chr>, chipset <chr>,
#   cpu <chr>, gpu <chr>, year <date>, foldable <dbl>, ppi_density <dbl>,
#   quantile_10 <dbl>, quantile_50 <dbl>, quantile_90 <dbl>, price_range <chr>,
#   os_type <chr>, os_version <chr>, battery_size <chr>,
#   colors_available <dbl>, chip_company <chr>, cpu_core <chr>, …

Define the variables and initial exploration

Categorical: Brand, Operating System

Quantitative: Battery Capacity, RAM

Exploration Questions:

  1. Which brand offers the highest average RAM?

  2. What is the relationship between battery capacity and RAM?

  3. How are phone prices distributed across brands?

Filter and clean dataset

# Keep only rows that have both a RAM and a battery value; filtering on these two
# columns (rather than na.omit()) avoids dropping rows that only miss unrelated fields
cities500 <- cities500 %>% 
  filter(!is.na(ram) & !is.na(battery))
# Add a price category: "High" above $500 USD, otherwise "Low" (missing prices stay NA)
cities500 <- cities500 %>% 
  mutate(price_category = ifelse(price_usd > 500, "High", "Low"))
# dplyr Commands:
# 1. Grouping and summarizing average RAM by brand
brand_ram_summary <- cities500 %>% 
  group_by(phone_brand) %>% 
  summarize(avg_ram = mean(ram, na.rm = TRUE)) %>% 
  arrange(desc(avg_ram))
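# 2. Keeping the top 5 brands by average RAM; with_ties = FALSE returns exactly 5 rows,
#    which matches the 5 fill colors supplied to plot2
top_brands <- brand_ram_summary %>% 
  slice_max(order_by = avg_ram, n = 5, with_ties = FALSE)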
# 3. Creating a summarized dataset for visualization
battery_ram_summary <- cities500 %>% 
  group_by(phone_brand) %>% 
  summarize(avg_battery = mean(battery, na.rm = TRUE), 
            avg_ram = mean(ram, na.rm = TRUE))
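The cleaning and summary steps above can be mirrored in base R on a small, made-up table, which makes it easy to see what each step keeps or drops. This is a sketch on invented values, not the real processed_data2.csv, and the names phones, clean, brand_ram and top are hypothetical. It illustrates three details of the pipeline:

- A targeted NA filter keeps more rows than na.omit().
- ifelse(price_usd > 500, ...) labels a phone priced at exactly $500 as "Low".
- For the top-n step, head() always returns exactly n rows. By contrast, slice_max() keeps tied rows by default and can return more than n.

```r
# Hypothetical toy table: every value is invented for illustration
phones <- data.frame(
  phone_brand = c("a", "a", "b", "b", "c", "d"),
  ram         = c(8, 12, NA, 6, 16, 4),
  battery     = c(4500, 5000, 4800, NA, 5500, 4000),
  price_usd   = c(700, 500, 300, 250, NA, 150),
  colors      = c("black", NA, "blue", "green", "red", "white"),
  stringsAsFactors = FALSE
)

# 1. Targeted NA filter, the base-R analogue of filter(!is.na(ram) & !is.na(battery))
clean <- phones[!is.na(phones$ram) & !is.na(phones$battery), ]
nrow(clean)            # 4: only rows missing ram or battery are dropped
nrow(na.omit(phones))  # 2: na.omit() also drops rows missing price_usd or colors

# 2. Price category: the strict ">" puts a $500 phone in "Low"; a missing price stays NA
clean$price_category <- ifelse(clean$price_usd > 500, "High", "Low")
clean$price_category   # "High" "Low" NA "Low"

# 3. Mean RAM per brand, sorted from highest to lowest
brand_ram <- aggregate(ram ~ phone_brand, data = clean, FUN = mean)
names(brand_ram)[names(brand_ram) == "ram"] <- "avg_ram"
brand_ram <- brand_ram[order(brand_ram$avg_ram, decreasing = TRUE), ]

# Top 2 brands; head() returns exactly n rows whether or not there are ties
top <- head(brand_ram, 2)
top                    # brand "c" (16 GB), then brand "a" (10 GB)
```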
# Scatter plot showing the relationship between RAM (GB) and Battery Capacity (mAh) for different phone brands.


plot1 <- ggplot(cities500, aes(x = ram, y = battery, color = phone_brand)) +
  geom_point(alpha = 0.7, size = 3) +  # Add transparency
  # I added a linear trend line to visualize the overall trend. The plot uses color to differentiate phone brands.
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +  # Add trend line
  labs(
    title = "RAM vs Battery Capacity Across Phone Brands",
    subtitle = "Relationship between device memory and battery capacity",
    x = "RAM (GB)",
    y = "Battery Capacity (mAh)",
    color = "Phone Brand"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right",
    axis.title = element_text(face = "italic")
  ) +
  scale_color_brewer(palette = "Paired", na.value = "grey70")  # Paired has only 12 colors; extra brands are drawn in grey instead of being dropped
print(plot1)
`geom_smooth()` using formula = 'y ~ x'
Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Paired is 12
Returning the palette you asked for with that many colors
Warning: Removed 1114 rows containing missing values or values outside the scale range
(`geom_point()`).

This code creates a scatter plot that explores the relationship between RAM and battery capacity across phone brands. Semi-transparent points (alpha = 0.7), colored by brand, reduce overplotting so that dense regions remain visible. A black dashed linear trend line, fitted to all phones at once rather than per brand, summarizes the overall relationship between RAM and battery capacity. theme_minimal provides a clean, uncluttered background. The custom theme settings bold the title, italicize the axis labels, and place the legend on the right. The “Paired” palette supplies distinct colors, but only 12 of them. As the warnings above show, the dataset contains more brands than that. The warning that 1,114 rows were removed from geom_point() is consistent with points from the brands beyond the twelfth being left without a color and therefore not drawn. The trend line still uses every row.
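One way to give every brand its own color is to generate exactly as many colors as there are brands with grDevices::hcl.colors() and pass them to scale_color_manual(). This sketch assumes a qualitative palette of the right length is acceptable. The brand count below is a placeholder, not the dataset's real count. With this many brands, neighboring hues become harder to tell apart, so grouping small brands into an "Other" category or faceting are alternatives worth considering.

```r
# Placeholder brand count; on the real data this would be
# length(unique(cities500$phone_brand))
n_brands <- 20

# hcl.colors() (grDevices, R >= 3.6) returns exactly n colors from a
# qualitative HCL palette, so no brand is left without a color
brand_colors <- hcl.colors(n_brands, palette = "Dynamic")
length(brand_colors)

# In the ggplot2 code these would replace the Brewer scale, e.g.
#   + scale_color_manual(values = brand_colors)
```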

library(plotly)

plot2 <- ggplot(top_brands, aes(x = reorder(phone_brand, avg_ram), y = avg_ram, fill = phone_brand)) +
  geom_col(show.legend = FALSE) +  # geom_col() is shorthand for geom_bar(stat = "identity")
  coord_flip() +
  labs(
    title = "Top 5 Brands with Highest Average RAM",
    x = "Brand",
    y = "Average RAM (GB)"
  ) +
  theme_classic() +
  scale_fill_manual(values = c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"))

print(plot2)

This code creates a horizontal bar chart with ggplot2 showing the five smartphone brands with the highest average RAM. Each bar's length is the brand's average RAM in GB, and each brand gets its own manually chosen fill color. reorder() sorts the bars by average RAM. coord_flip() turns the bars horizontal so the brand names are easy to read. theme_classic() gives a plain background with axis lines and no gridlines.
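The bar order comes from reorder(phone_brand, avg_ram). This call converts the brand names to a factor whose levels are sorted by average RAM. The small sketch below uses made-up brands and values (hypothetical, not from the dataset) to show the resulting level order. After coord_flip(), the first level is drawn at the bottom, so the brand with the highest average RAM ends up at the top.

```r
# Hypothetical brand averages, invented for illustration
brand   <- c("alpha", "beta", "gamma")
avg_ram <- c(8, 12, 6)

# reorder() returns a factor whose levels are sorted in ascending order
# of avg_ram (it applies mean() within each level by default)
brand_f <- reorder(brand, avg_ram)
levels(brand_f)   # "gamma" "alpha" "beta"
```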

# Interactivity Example (Highcharter)
hchart(cities500, "scatter", hcaes(x = ram, y = battery, group = phone_brand)) %>%
  hc_title(text = "Interactive RAM vs Battery Capacity") %>%
  hc_xAxis(title = list(text = "RAM (GB)")) %>%
  hc_yAxis(title = list(text = "Battery Capacity (mAh)"))

This code creates an interactive scatter plot with highcharter showing RAM (GB) against battery capacity (mAh). Each point is one row of the dataset, positioned by its RAM and battery capacity. Because the points are grouped by phone_brand, each brand becomes a separate series with its own color. Readers can hover over a point to see its values and click a brand in the legend to hide or show that brand's points, which makes it easy to compare brands on these two specifications.

library(highcharter)


hchart(top_brands, "bar", hcaes(x = reorder(phone_brand, avg_ram), y = avg_ram, color = phone_brand)) %>%
  hc_title(text = "Top 5 Brands with Highest Average RAM") %>%
  hc_xAxis(title = list(text = "Brand")) %>%
  hc_yAxis(title = list(text = "Average RAM (GB)")) %>%
  hc_colors(c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e")) %>%
  hc_tooltip(
    pointFormat = "{point.y:.1f} GB"  # show the average rounded to one decimal place
  ) %>%
  hc_legend(enabled = FALSE)  # hide the legend; the axis labels already name each brand

This code creates an interactive bar chart with highcharter showing the average RAM of the top 5 smartphone brands. It uses custom colors and axis titles, adds a tooltip that reports the average in GB, and disables the legend.

# Statistical Component: Linear Regression
lm_model <- lm(battery ~ ram, data = cities500)
summary(lm_model)

Call:
lm(formula = battery ~ ram, data = cities500)

Residuals:
    Min      1Q  Median      3Q     Max 
-2632.0  -286.2   177.8   423.9  5267.8 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4329.965     41.760  103.69   <2e-16 ***
ram           41.016      4.808    8.53   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 680.5 on 1706 degrees of freedom
Multiple R-squared:  0.04091,   Adjusted R-squared:  0.04035 
F-statistic: 72.77 on 1 and 1706 DF,  p-value: < 2.2e-16
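To put the fitted coefficients in concrete terms, the sketch below plugs a few common RAM sizes into the fitted line battery = 4329.965 + 41.016 × ram. It uses only numbers printed in the summary above. It also recovers the correlation coefficient from R-squared: in a model with one predictor, |r| is the square root of R-squared, with the sign of the slope.

```r
# Coefficients and R-squared copied from summary(lm_model) above
intercept <- 4329.965   # mAh predicted at 0 GB of RAM (an extrapolation)
slope     <- 41.016     # extra mAh per additional GB of RAM
r_squared <- 0.04091

# Fitted battery capacity at a few common RAM sizes
ram_gb     <- c(4, 8, 12, 16)
fitted_mah <- intercept + slope * ram_gb
round(fitted_mah)   # 4494 4658 4822 4986

# With a single predictor, |r| = sqrt(R^2); the sign follows the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 3)         # 0.202
```

Going from 4 GB to 16 GB of RAM corresponds to roughly 490 mAh more capacity on average. That is smaller than the residual standard error of 680.5 mAh. An R-squared of about 0.04 means RAM accounts for only about 4% of the variation in battery capacity. The very small p-value mainly reflects the sample size (1,708 rows) rather than a strong effect.

Two further cautions apply. First, the head() output appears to list the same model several times with different storage options, so the rows are not fully independent. Second, the model describes an association with battery capacity, not an effect of RAM on battery life.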
# Diagnostic plots for the regression model
par(mfrow = c(2, 2))
plot(lm_model)
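In a one-predictor model, the slope's t-test is equivalent to a Pearson correlation test. As a result, cor.test(cities500$ram, cities500$battery) should reproduce the t statistic and p-value above, and it also gives a confidence interval for r. The sketch below checks that equivalence on simulated data (invented values, not the smartphone dataset).

```r
set.seed(42)
# Simulated data only; the numbers are invented for illustration
ram     <- sample(c(4, 6, 8, 12, 16), size = 200, replace = TRUE)
battery <- 4300 + 40 * ram + rnorm(200, sd = 650)

fit <- lm(battery ~ ram)
ct  <- cor.test(ram, battery)   # Pearson correlation test

# The slope's t statistic and the correlation test's t statistic coincide,
# because t = r * sqrt(n - 2) / sqrt(1 - r^2) in both cases
slope_t <- summary(fit)$coefficients["ram", "t value"]
c(slope_t = slope_t, cor_t = unname(ct$statistic))

# The squared correlation equals the model's R-squared
c(r_squared_from_cor = unname(ct$estimate)^2, r_squared_from_lm = summary(fit)$r.squared)
```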

# Save each ggplot to a PNG file (ggsave() infers the file type from the extension
# and, without width/height, uses the current graphics device size)
ggsave("plot1_battery_vs_ram.png", plot = plot1)
Saving 7 x 5 in image
`geom_smooth()` using formula = 'y ~ x'
Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Paired is 12
Returning the palette you asked for with that many colors
Warning: Removed 1114 rows containing missing values or values outside the scale range
(`geom_point()`).
ggsave("plot2_top5_brands.png", plot = plot2)
Saving 7 x 5 in image

Caption

# used to display caption
cat("Screenshot of GSMArena’s 2024 phone listings page. For more details, visit: https://www.gsmarena.com/res.php3?sSearch=phones+2024\n")
Screenshot of GSMArena’s 2024 phone listings page. For more details, visit: https://www.gsmarena.com/res.php3?sSearch=phones+2024