Executive summary

Our main research question is “What are the demographic factors that determine e-book usage, and what types of books do e-book users enjoy?” With the boom of “text-hip” culture and the growing number of e-book users, we wanted to find ways to revitalize the e-book market. To do this we needed to identify a specific target, so we started with three hypotheses about relationships in the data that we expected to matter for defining that target. By analyzing selected variables from the National Adult Reading Survey (genre, monthly average household income, age, annual reading volume of e-books, and others), we arrived at three strategies for improving the accessibility of e-books. We considered people who already read e-books and devised promotions focused on the genres they would be interested in. We also focused on those who are not yet familiar with e-books, to make e-books more accessible to them. Each strategy is applicable to publishing companies, public institutions, and local communities.

Research Question

“What are the demographic factors that determine e-book usage, and what types of books do e-book users enjoy?”

With the advent of the digital age, reading culture is rapidly evolving, moving away from traditional forms. In particular, the spread of the “text-hip” culture has redefined reading, transforming it from a mere means of acquiring information into a stylish and sophisticated cultural activity. Text-hip culture revolves around the perception that reading books is cool and trendy, with younger generations using social media to share their reading experiences as a way to express their individuality and intellectual tastes. This phenomenon has elevated the act of reading itself into a social icon, creating new interest and perceptions about reading through digital platforms.

This growing interest in reading presents a significant opportunity to energize the e-book industry. E-books suit the way the younger generation enjoys reading as a trend in a digital environment, and the growth of the e-book market will play an important role not only in revitalizing reading culture, but also in improving access to information and building a sustainable reading environment.

With the rapid advancement of mobile devices and information communication technology, alongside the deepening of the digital age, the e-book market has experienced significant growth. E-books have evolved far beyond being a mere reproduction of print books; they have developed into a unique content format, incorporating various new attributes and functionalities that differentiate them from traditional print media. This transition has allowed e-books to carve out their own space in the larger content ecosystem and establish a growing, dynamic market.

One of the key drivers behind the expansion of digital content services, including e-books, is the increasing emphasis on personalization and recommendations. The major advantage of modern content services is the ability to tailor messages and content to individual consumers, allowing for automatic delivery of personalized recommendations based on their preferences and behaviors. This has become a fundamental aspect of the digital content landscape, not just for e-books but for a wide range of online services.

For the e-book market to continue thriving, it is essential that industry stakeholders adapt their operations to this trend by embracing personalization and leveraging data-driven insights. Understanding consumer characteristics, such as reading preferences, habits, and demographics, is crucial for developing targeted marketing strategies and enhancing the overall user experience. The ability to provide tailored content that resonates with readers will be key to building a loyal customer base and fostering long-term engagement with e-books.

In this report, we aim to explore strategies for developing and activating the e-book market, focusing on consumer understanding and the role of personalization in shaping future industry trends. By analyzing preferences and behaviors of e-book users, we seek to offer insights that will not only help businesses better engage with their audience but also provide a fresh perspective on marketing strategies that can drive further growth in the digital reading sector.

In addition, by proposing implementation strategies that various stakeholders can use to increase access to e-books, we aim to ultimately enhance the accessibility of e-books.

First, we will support the importance of the topic by showing a graph of the rising e-book consumption trends, followed by the presentation of three hypotheses.

Hypothesis 1) People who read more e-books annually will consume more in the self-development genre.

The main advantages of e-books are portability and convenience. People who read e-books frequently are likely to use their time efficiently, and people with this disposition might have a strong desire for self-improvement, which is why we expect them to read more self-development books.

Hypothesis 2) The lower the average household income class, the higher the average reading volume of e-books.

If the low-cost aspect of e-books is considered a major advantage compared to print books, it can be expected that economically disadvantaged groups, who need to be more mindful of costs, would consume more e-books than the wealthier classes.

Hypothesis 3) The younger the age group, the more e-books they will consume.

The younger generation is likely to find using electronic media more natural and easier, while older generations, who have been reading print books for a long time, are expected to have resistance to e-books.

Data background, description

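library(dplyr)  # provides %>%, select(), rename()

# Load the 2023 survey file (EUC-KR encoded) and preview the four variables of interest
adread_23 <- read.csv("2023_ad.csv", fileEncoding = "euc-kr")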
adread_23 %>%
    select("X.독서.생활.지난.1년간.독서량_전자책", "X.독서.행태..가장.많이.읽는.독서.분야..2..전자책_1순위", "X.응답자.특성.월.평균.가구.소득", "연령") %>% 
    rename("prefered_genre" = "X.독서.행태..가장.많이.읽는.독서.분야..2..전자책_1순위", "income" = "X.응답자.특성.월.평균.가구.소득", "ebook_read" = "X.독서.생활.지난.1년간.독서량_전자책", "age" = "연령") %>% 
    head(10)

##    ebook_read prefered_genre income age
## 1          NA             NA      7  19
## 2          NA             NA      6  22
## 3          NA             NA      4  22
## 4           6              2      6  23
## 5           5              7      6  23
## 6           4              7      7  24
## 7          20              2      7  24
## 8           2              2      6  24
## 9          NA             NA      7  24
## 10         10              2      7  25

The data set we are using is the ‘2023 National Reading Survey’ released by the Ministry of Culture, Sports and Tourism. Some data from surveys of other years are also used. The survey covered 5,000 adults aged 19 and older across the country and examined the national reading rate, reading habits, and attitudes toward reading, among other topics. We obtained the data from the MicroData Integrated Service (MDIS) of Statistics Korea (KOSTAT).

This survey is conducted by a national institution, so we expect it to be highly reliable and to give a relatively up-to-date picture of society. We also judged it to be an effective source for identifying the factors the e-book industry needs in order to seize opportunities and grow further, given the current trend of “text-hip” culture.

This data has been processed to be suitable for analysis by correcting errors from the original data and removing personal information. The households within the sample survey units were visited, and face-to-face surveys were conducted with one eligible household member aged 19 or older.

There were 4 main variables that we were particularly interested in: ‘Annual Reading Volume’, ‘Genre’, ‘Monthly Average Household Income’, and ‘Age’. With these 4 variables we examined three relationships: ‘Annual Reading Volume’ and ‘Genre’, ‘Monthly Average Household Income’ and ‘Annual Reading Volume’, and ‘Age’ and ‘Annual Reading Volume’.

First, the variable ‘Annual Reading Volume’ consists of 5 categories: 1~5 books, 6~10 books, 11~15 books, 16~20 books, and 21 or more books. Second, the variable ‘Genre’ consists of 9 items: novels, hobby/entertainment/travel/health, economy/business management, self-help books, philosophy/ideology/religion, history/geography, politics/society/current events, science/technology/computer, and other. Third, the variable ‘Monthly Average Household Income’ consists of 8 brackets: under 1 million won, 1~2 million won, 2~3 million won, 3~4 million won, 4~5 million won, 5~6 million won, 6~7 million won, and over 7 million won. The last variable, ‘Age’, is classified into 5 groups: twenties and under, thirties, forties, fifties, and sixties and over.
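As a minimal sketch of how raw yearly counts could be mapped onto the five reading-volume categories listed above, base R's cut() can be used. The input values here are hypothetical and are not survey data, and treating the last category as "21 or more" is an assumption.

```r
# Hypothetical raw counts of e-books read in a year (not survey data)
ebook_read <- c(1, 5, 6, 10, 11, 20, 21, 40)

# cut() uses right-closed intervals by default:
# (0,5], (5,10], (10,15], (15,20], (20,Inf]
volume_group <- cut(
  ebook_read,
  breaks = c(0, 5, 10, 15, 20, Inf),
  labels = c("1-5", "6-10", "11-15", "16-20", "21+")
)

# Number of respondents in each category
table(volume_group)
```

With these breaks a count of 0 falls outside every interval and becomes NA, so non-readers would need their own category if they are kept in the data.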

The first analysis tracks the trend of the ‘Annual Reading Volume’ of e-books from 2017 to 2023 (the survey is biennial).

The second analysis shows the relationship between ‘Annual Reading Volume’ and ‘Genre’. We focus specifically on ‘self-help books’ within the ‘Genre’ variable to look for a meaningful connection between preference for this genre and the number of books read in a year.

The third analysis shows the relationship between ‘Monthly Average Household Income’ and ‘Annual Reading Volume’. By dividing the household income responses into three categories (low, middle, and high income), we compare the trends of each group.

The last analysis shows the connection between ‘Age’ and ‘Annual Reading Volume’.

Individual figures

Figure 1

The annual reading volume (e-books) 2017-2023

To make the first graph, we loaded each data file and added a year variable with mutate for comparison. We then selected the annual e-book reading amount and year variables, unified their names, and counted missing values as 0. We combined the data frames into one and grouped by the 'year' variable to calculate the average reading amount. We then created a scatter plot with 'year' on the x-axis and the average reading amount on the y-axis, connecting the points with a line to show the trend. Last, we adjusted the y-axis range for readability and added a title and axis labels.

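library(dplyr)    # data manipulation: mutate(), select(), rename(), bind_rows()
library(ggplot2)  # plotting

# Load each survey year and tag it with its year
adread_23 <- read.csv("2023_ad.csv", fileEncoding = "euc-kr") %>%
    mutate(year = "2023")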
adread_21 <- read.csv("2021_ad.csv", fileEncoding = "euc-kr") %>% 
    mutate(year = "2021")
adread_19 <- read.csv("2019_ad.csv", fileEncoding = "euc-kr") %>% 
    mutate(year = "2019")
adread_17 <- read.csv("2017_ad.csv", fileEncoding = "euc-kr") %>% 
    mutate(year = "2017")

adreadeb_17 <- adread_17 %>%
  select("X2.전자책연간독서량", "year") %>% 
    rename("ebook_read" = "X2.전자책연간독서량")

adreadeb_19 <- adread_19 %>%
  select("X2..전자책.독서량", "year") %>% 
    rename("ebook_read" = "X2..전자책.독서량")

adreadeb_21 <- adread_21 %>%
  select("문6_2..전자책.독서량", "year") %>% 
    rename("ebook_read" = "문6_2..전자책.독서량")

adreadeb_23 <- adread_23 %>% 
    select("X.독서.생활.지난.1년간.독서량_전자책", "year") %>% 
    rename("ebook_read" = "X.독서.생활.지난.1년간.독서량_전자책") %>% 
    mutate(ebook_read = ifelse(is.na(ebook_read), 0, ebook_read))  # count missing 2023 values as 0 books

ebookread <- bind_rows(adreadeb_17, adreadeb_19, adreadeb_21, adreadeb_23)

ebookread_year <- ebookread %>% 
  group_by(year) %>% 
  summarise(ebook_read = mean(ebook_read))

p <- ggplot(data = ebookread_year,
            mapping = aes(x = year,
                          y = ebook_read))

p + geom_point(size=2) + 
  geom_line(group = 1, linewidth  = 0.5) +
  scale_y_continuous(limits = c(0, 2.5)) + 
  labs(title = "The annual reading volume of e-books", x = "Year", y = "Average ebook read per year")

Source : Ministry of Culture, Sports and Tourism.

Most existing research has focused on the usage rate of e-books. In addition, we analyzed e-book usage volume. Comparing the two, we found that usage volume increased much faster than the usage rate over the same period. The graph shows usage volume roughly doubling from 2019 to 2023. This indicates that not only is the number of people trying e-books increasing, but consumers are also becoming more accustomed to e-books, which are gradually taking a larger share of reading time from print books. This suggests that while it is important for the e-book industry to attract new users, it is equally important to make people more familiar with e-books and encourage them to choose e-books over print books.
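The distinction between usage rate and usage volume can be made concrete with a small sketch. The toy data frame below is hypothetical and not taken from the survey; it assumes, as in Figure 1, that 0 means the respondent read no e-books that year.

```r
# Hypothetical toy data; 0 means the respondent read no e-books that year
toy <- data.frame(
  year = c("2019", "2019", "2019", "2019", "2023", "2023", "2023", "2023"),
  ebook_read = c(0, 0, 0, 4, 0, 0, 4, 8)
)

# Usage rate: share of respondents who read at least one e-book
usage_rate <- tapply(toy$ebook_read > 0, toy$year, mean)

# Usage volume: mean number of e-books over all respondents
usage_volume <- tapply(toy$ebook_read, toy$year, mean)

# Growth factor of each measure between the two years
rate_growth <- as.numeric(usage_rate["2023"] / usage_rate["2019"])
volume_growth <- as.numeric(usage_volume["2023"] / usage_volume["2019"])
```

In this toy example the rate doubles while the volume triples, which is the kind of gap between the two measures that the paragraph above describes.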

Figure 2

Hypothesis 1 : Genre - People who read more e-books annually will consume more in the genre of “self-help books”.

First, we renamed and selected the target variables (annual e-book reading amount and genre) and removed missing values. We then converted the genre variable from numeric to character type and recoded each value with its genre name. We created a box plot with genre on the x-axis (ordered by the number of respondents) and reading amount on the y-axis, using a different color for each genre. For readability, we put the wide-ranging y-axis on a log scale, flipped the x and y axes, and removed the legend. Lastly, we added a title and an axis label.

adread_gen <- adread_23 %>%
  rename("ebook_read" = "X.독서.생활.지난.1년간.독서량_전자책", "genre" = "X.독서.행태..가장.많이.읽는.독서.분야..2..전자책_1순위") %>% 
  select(ebook_read, genre) %>% 
  na.omit() 

adread_gen$genre <- as.character(adread_gen$genre)

adread_gen <- adread_gen %>% 
  mutate(genre = recode(genre, 
                       "1" = "Poetry", 
                       "2" = "Novel", 
                       "3" = "Essay",
                       "4" = "Picture Book",
                       "5" = "Philosophy",
                       "6" = "Politics",
                       "7" = "Economics",
                       "8" = "History",
                       "9" = "Art",
                       "10" = "Science",
                       "11" = "Lifestyle",
                       "12" = "Language",
                       "13" = "Hobbies",
                       "14" = "Finance",
                       "15" = "Self-Help")) %>% 
    filter(ebook_read != 0)

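# Order genres by number of respondents (FUN = length);
# without FUN, reorder() would sort by the mean of ebook_read instead
p <- ggplot(data = adread_gen,
            mapping = aes(x = reorder(genre, ebook_read, FUN = length),
                          y = ebook_read))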

p + geom_boxplot(aes(fill = genre)) +
  scale_y_log10() +
  coord_flip() + 
  theme(legend.position = "none") +
  labs(title = "The Relationship Between E-book Consumption and Consumed Genres", x = NULL, y = "E-book Consumption")
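The ordering used on the genre axis can be checked without drawing the plot. The sketch below uses hypothetical values, not survey data. It shows that reorder() sorts levels by the mean of the second argument unless another summary function is supplied, so ordering by number of respondents requires FUN = length.

```r
# Hypothetical toy data (not survey data)
toy <- data.frame(
  genre = c("Novel", "Novel", "Novel", "Essay", "Essay", "Self-Help"),
  ebook_read = c(2, 3, 4, 5, 7, 20)
)

# Default: levels sorted by mean(ebook_read) per genre, ascending
by_mean <- levels(reorder(toy$genre, toy$ebook_read))

# Levels sorted by number of respondents per genre, ascending
by_count <- levels(reorder(toy$genre, toy$ebook_read, FUN = length))

by_mean
by_count
```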

Using the 2023 survey, we identified which genres are preferred by people who read many e-books. From each participant's most-read genre and e-book usage volume, we created a boxplot and listed the genres in order of popularity for clarity. The analysis revealed that while the most popular genre was the novel, the genre whose readers had the highest e-book consumption was self-help. Self-help had the highest median and the highest upper quartile (75th percentile) of e-book consumption across all genres. This indicates that the group that prefers self-help books generally has higher overall e-book consumption than those who prefer other genres. This suggests that portability and convenience, often cited as advantages of e-books, appeal to those who value efficient use of time, and these individuals may feel a greater need for self-development.
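The median and upper-quartile comparison read off the boxplot can also be tabulated directly. This is a minimal sketch on hypothetical values, not the survey results; the column names mirror the ones used in the code above.

```r
# Hypothetical toy data (not survey data)
toy <- data.frame(
  genre = c("Novel", "Novel", "Novel", "Novel",
            "Self-Help", "Self-Help", "Self-Help"),
  ebook_read = c(1, 2, 3, 10, 4, 8, 12)
)

# Per-genre respondent count, median, and 75th percentile
n_by_genre <- tapply(toy$ebook_read, toy$genre, length)
median_by_genre <- tapply(toy$ebook_read, toy$genre, median)
q3_by_genre <- tapply(toy$ebook_read, toy$genre, quantile, probs = 0.75)

genre_stats <- data.frame(
  genre = names(n_by_genre),
  n = as.vector(n_by_genre),
  median = as.vector(median_by_genre),
  q3 = as.vector(q3_by_genre)
)
genre_stats
```

A table like this separates popularity (n) from reading intensity (median, q3), which are the two things the paragraph above compares.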

Figure 3

Hypothesis 2 : Household income - The lower the average household income class, the higher the average reading volume of e-books.

adread_23 <- read.csv("2023_ad.csv", fileEncoding = "euc-kr")
adread_21 <- read.csv("2021_ad.csv", fileEncoding = "euc-kr")
adread_19 <- read.csv("2019_ad.csv", fileEncoding = "euc-kr")
adread_17 <- read.csv("2017_ad.csv", fileEncoding = "euc-kr")

adread_inc23 <- adread_23 %>%
  rename("capital" = "X.응답자.특성.월.평균.가구.소득", "ebook_read" = "X.독서.생활.지난.1년간.독서량_전자책") %>%
  select(capital, ebook_read) %>%
  mutate(year = "2023") %>%
  na.omit()

adread_inc21 <- adread_21 %>%
  rename("capital" = "D3_가구소득", "ebook_read" = "문6_2..전자책.독서량") %>%
  select(capital, ebook_read) %>%
  mutate(year = "2021") %>%
  na.omit()

adread_inc19 <- adread_19 %>%
  rename("capital" = "가구.소득", "ebook_read" = "X2..전자책.독서량") %>%
  select(capital, ebook_read) %>%
  mutate(year = "2019") %>%
  na.omit()

adread_inc17 <- adread_17 %>%
  rename("capital" = "가구월평균소득", "ebook_read" = "X2.전자책연간독서량") %>%
  select(capital, ebook_read) %>%
  mutate(year = "2017") %>%
  na.omit()

adread_inc <- bind_rows(adread_inc17, adread_inc19, adread_inc21)
adread_inc %>% 
    head(10)

##    capital ebook_read year
## 1        6          0 2017
## 2        5          0 2017
## 3        3          0 2017
## 4        5          0 2017
## 5        3          0 2017
## 6        4          0 2017
## 7        4          0 2017
## 8        6          3 2017
## 9        4          0 2017
## 10       4          0 2017

We loaded the dplyr package and the CSV files (adult reading survey data for 2017, 2019, 2021, and 2023). In each yearly data frame we renamed the monthly household income column and the e-book reading volume column to the common names “capital” and “ebook_read”. We kept these two columns, added a new column called “year”, and removed rows with missing values using the na.omit function, giving one data frame per year. We then checked whether the amount of data was the same across the four years (2017, 2019, 2021, 2023). Because it was the same for 2017, 2019, and 2021, we combined the selected columns (“capital”, “ebook_read”, “year”) for those three years into a data frame named “adread_inc” and handled 2023 separately. Note that the 2023 rows previewed earlier contain NA in the e-book column and na.omit drops those respondents, so the 2023 averages are computed only over respondents with a recorded value.
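How missing values are handled changes the averages considerably. The sketch below uses hypothetical values and contrasts dropping NA rows, as na.omit does here, with replacing NA by 0, as was done for Figure 1. Whether NA in the 2023 file actually means "read no e-books" is an assumption.

```r
# Hypothetical 2023-style rows: NA where no e-book volume was recorded
toy_2023 <- data.frame(
  capital = c(2, 2, 7, 7, 7),
  ebook_read = c(NA, 6, NA, NA, 12)
)

# Option 1: drop rows with NA, so the mean covers only recorded values
mean_dropped <- mean(na.omit(toy_2023)$ebook_read)

# Option 2: treat NA as 0 books, so the mean covers all respondents
zero_filled <- ifelse(is.na(toy_2023$ebook_read), 0, toy_2023$ebook_read)
mean_zero_filled <- mean(zero_filled)

c(dropped = mean_dropped, zero_filled = mean_zero_filled)
```

Averages computed with the first option are not on the same scale as averages computed over rows where non-readers are stored as 0, which is worth keeping in mind when comparing 2023 with the earlier years.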

adread_incc <- adread_inc %>%
  mutate(class = case_when(
    capital %in% c(1, 2, 3) ~ "Low",
    capital %in% c(4, 5) ~ "Medium",
    capital %in% c(6, 7, 8) ~ "High"
  ))

adread_avginc <- adread_incc %>%
  group_by(year, class) %>%
  summarise(avg_incc = mean(ebook_read, na.rm = TRUE), .groups = "drop")
adread_avginc

## # A tibble: 9 × 3
##   year  class  avg_incc
##   <chr> <chr>     <dbl>
## 1 2017  High      1.67 
## 2 2017  Low       0.729
## 3 2017  Medium    1.11 
## 4 2019  High      1.53 
## 5 2019  Low       0.523
## 6 2019  Medium    1.04 
## 7 2021  High      2.16 
## 8 2021  Low       0.750
## 9 2021  Medium    1.54

adread_inc23c <- adread_inc23 %>%
  mutate(class = case_when(
    capital %in% c(1, 2, 3) ~ "Low",
    capital %in% c(4, 5) ~ "Medium",
    capital %in% c(6, 7, 8) ~ "High"
  ))

adread_avg23inc <- adread_inc23c %>%
  group_by(year, class) %>%
  summarise(avg_inc23c = mean(ebook_read, na.rm = TRUE), .groups = "drop")
adread_avg23inc

## # A tibble: 3 × 3
##   year  class  avg_inc23c
##   <chr> <chr>       <dbl>
## 1 2023  High        10.9 
## 2 2023  Low          9.05
## 3 2023  Medium       6.96

We used the mutate function with case_when to regroup “capital” into three classes, “Low”, “Medium”, and “High”, stored in a new column named “class” (the same process was applied to the combined 2017, 2019, 2021 data frame and to the 2023 data frame). We then used the group_by and summarise functions to calculate the average e-book consumption of adults by income class and year (again doing the same for the 2023 data frame, “adread_inc23c”).

p <- ggplot(data = adread_avginc,
            mapping = aes(x = factor(class, levels = c("Low", "Medium", "High")), y = avg_incc)) +
  geom_bar(aes(fill = year), stat = "identity") +   
  facet_wrap(~ year, nrow = 1) +                                
  labs(
    title = "Ebook reading volume by Capital income of households (2017, 2019, 2021)",
    x = "Class of capital income of households",
    y = "Average of Annual reading volume (ebook)"
  ) +
  theme_minimal()

p

e <- ggplot(data = adread_avg23inc, aes(x = factor(class, levels = c("Low", "Medium", "High")), y = avg_inc23c)) +
  geom_bar(fill = "orange", stat = "identity") + 
  labs(
    title = "Ebook reading volume by Capital income of households (2023)",
    x = "Class of capital income of households",
    y = "Average of Annual reading volume (ebook)"
  ) +
  theme_minimal()

e

We used the ggplot function to draw bar graphs of the average e-book consumption of adults by income class for all four years. The facet_wrap function places the 2017, 2019, and 2021 graphs side by side in one figure. The levels argument of factor() fixes the order of the income classes as Low, Medium, High. The labs function sets the title and the x- and y-axis labels. We saved the graphs as .png files with the ggsave function.

After processing, visualizing, and analyzing the data, we found, surprisingly, that the high-income group consumed the most e-books in all four years. This result is completely contrary to our hypothesis. According to a paper on e-book use, e-book consumers are most sensitive to price: the higher the e-book price, the lower their satisfaction with e-book use. Consumers consider about 30 percent of the print price to be appropriate for an e-book, but most e-books currently cost about 70 percent of the print price. Moreover, according to a survey by the Korea Consumer Agency, current e-book consumers are least satisfied with price. Our goal is to increase e-book consumption among low- and middle-income customers, and we think addressing the current price problem can achieve it. Our proposed solution is to offer them a lower-priced e-book service or to provide spaces where they can use e-books for free.

Figure 4

Hypothesis 3 : Age - The younger the age group, the more e-books they will read.

adread_23 %>%
  select("연령", "X.독서.생활.지난.1년간.독서량_전자책") %>% 
  rename("age"="연령","ebook_read"="X.독서.생활.지난.1년간.독서량_전자책") %>% 
  mutate(ebook_read = ifelse(is.na(ebook_read),0,ebook_read)) -> adread_23

adread_21 %>%
  select("SQ1_연령", "문6_2..전자책.독서량") %>% 
  rename("age"="SQ1_연령","ebook_read"="문6_2..전자책.독서량") %>% 
  mutate(ebook_read = ifelse(is.na(ebook_read),0,ebook_read)) -> adread_21

adread_19 %>%
  select("연령", "X2..전자책.독서량") %>% 
  rename("age"="연령","ebook_read"="X2..전자책.독서량") %>% 
  mutate(ebook_read = ifelse(is.na(ebook_read),0,ebook_read)) -> adread_19

adread_17 %>%
  select("연령", "X2.전자책연간독서량") %>% 
  rename("age"="연령","ebook_read"="X2.전자책연간독서량") %>% 
  mutate(ebook_read = ifelse(is.na(ebook_read),0,ebook_read)) -> adread_17

Select the two columns for age and the volume of e-books read in a year, and rename them ‘age’ and ‘ebook_read’. If a value is missing (NA), treat it as 0; otherwise keep the value. Save the modified data frame back into the variable ‘adread_(year)’. Do this for every year: 2023, 2021, 2019, 2017.
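Because the same three steps are repeated for every year, they could be wrapped in a small helper. The function below is a hypothetical sketch in base R, not part of the original analysis; its name, its arguments, and the toy column names are all assumptions for illustration.

```r
# Hypothetical helper: keep the age and e-book columns under common names
# and count missing e-book values as 0
standardise_year <- function(df, age_col, read_col) {
  out <- data.frame(age = df[[age_col]], ebook_read = df[[read_col]])
  out$ebook_read[is.na(out$ebook_read)] <- 0
  out
}

# Toy input with made-up column names standing in for one survey year
toy_raw <- data.frame(raw_age = c(19, 35, 62), raw_read = c(NA, 4, 0))
toy_clean <- standardise_year(toy_raw, "raw_age", "raw_read")
toy_clean
```

Returning a new data frame instead of overwriting the loaded one keeps the original columns available for later steps.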

p23 <- ggplot(data=adread_23,mapping=aes(x=age,y=ebook_read))
p23 + geom_point(color="black",alpha=0.3) +
  labs(x="age",y="23ebook_read") +
  theme_minimal()

p21 <- ggplot(data=adread_21,mapping=aes(x=age,y=ebook_read))
p21 + geom_point(color="black",alpha=0.3) +
  labs(x="age",y="21ebook_read") +
  theme_minimal()

p19 <- ggplot(data=adread_19,mapping=aes(x=age,y=ebook_read))
p19 + geom_point(color="black",alpha=0.3) +
  labs(x="age",y="19ebook_read") +
  scale_x_continuous(breaks = c(1, 2, 3, 4, 5), 
                     labels = c("19-29", "30-39", "40-49", "50-59", "over60")) +
  theme_minimal()

p17 <- ggplot(data=adread_17,mapping=aes(x=age,y=ebook_read))
p17 + geom_point(color="black",alpha=0.3) +
  labs(x="age",y="17ebook_read") +
  theme_minimal()

Use the ggplot() function to visualize each year's data: the data is the data frame made earlier, the x-axis is ‘age’, and the y-axis is ‘ebook_read’. Draw a scatter plot with geom_point, using black dots with a transparency (alpha) of 0.3. Label the x-axis ‘age’ and the y-axis ‘(year)ebook_read’. Save the graphs as png files. Do this for every year: 2023, 2021, 2019, 2017.

The third hypothesis of our research question is that “the younger the age group, the more e-books they will consume”. We made a scatter plot for each of the four survey years (2017, 2019, 2021, 2023) to test it. Each dot represents one respondent, with their age on the x-axis and the number of e-books they read that year on the y-axis. Because the dots are semi-transparent, darker areas mean that more respondents of that age read the same number of books. What matters is the density of dots at each age and how it compares across the graphs. Age was recorded in categories in 2019, so the axis labels differ for that year. In all four graphs, the dots for older ages are sparse and cluster near 0 e-books, while younger ages show a higher density of dots, which means younger people read more e-books. Looking at the graphs in chronological order, the gap between young people and the elderly widens as the years go by. The insight from this analysis is that the younger the age group, the more e-books they consume.
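The visual reading of the scatter plots could be complemented by a numeric summary. The sketch below uses hypothetical values, not survey data, to compute the mean e-book count per age group and a Spearman rank correlation between age and e-book count. The group boundaries follow the 2019 axis labels, and the choice of Spearman correlation is an assumption rather than the method used in this report.

```r
# Hypothetical toy data (not survey data)
toy <- data.frame(
  age = c(19, 24, 28, 35, 42, 47, 55, 63, 70),
  ebook_read = c(12, 8, 10, 6, 3, 4, 1, 0, 0)
)

# Age groups with right-closed intervals: (18,29], (29,39], ...
toy$age_group <- cut(
  toy$age,
  breaks = c(18, 29, 39, 49, 59, Inf),
  labels = c("19-29", "30-39", "40-49", "50-59", "60+")
)

# Mean number of e-books per age group
mean_by_group <- tapply(toy$ebook_read, toy$age_group, mean)

# Rank correlation: a negative value means older respondents read fewer e-books
rho <- cor(toy$age, toy$ebook_read, method = "spearman")

mean_by_group
rho
```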

Results

We drew one insight from each of our three hypotheses and concluded that genre-focused promotions are needed, along with better e-book accessibility for low-income households and the elderly. From this we derived three strategies for our research question. The first is for publishing companies to set up booths at expositions for ‘self-help’ e-books, the genre whose readers showed the highest e-book consumption in our analysis. The second is for public institutions to provide spaces in public libraries for reading e-books, which will help increase the accessibility of e-books. The third is for local communities to offer media literacy programs that teach the elderly how to use e-books. Returning to our research question, “What are the demographic factors that determine e-book usage, and what types of books do e-book users enjoy?”, these are our conclusions and our strategies for revitalizing e-book usage.

Proposals to revitalize E-books

임도원, 임준용, 전영민

2024.12.11