In this project, we explore the relationship between citations and subscribers for economics journals, addressing the following question: is the number of citations received by journals in different fields correlated with the number of subscribers they attract? Through data analysis and visualization, we aim to uncover insights into the impact and readership of scholarly journals in economics.

#Load packages
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
#Load data from Github
JournalsDataset <- read_csv("https://raw.githubusercontent.com/Ismoo225/RBRIDGE/main/Journals.csv")
## New names:
## • `` -> `...1`
## Rows: 180 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): ...1, title, publisher, society, field
## dbl (6): price, pages, charpp, citations, foundingyear, subs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(JournalsDataset, 10)

Let’s start by exploring the economics journal dataset to understand its structure and get an overview from its summary statistics.

summary(JournalsDataset)
##      ...1              title            publisher           society         
##  Length:180         Length:180         Length:180         Length:180        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      price            pages            charpp       citations      
##  Min.   :  20.0   Min.   : 167.0   Min.   :1782   Min.   :  21.00  
##  1st Qu.: 134.5   1st Qu.: 548.8   1st Qu.:2715   1st Qu.:  97.75  
##  Median : 282.0   Median : 693.0   Median :3010   Median : 262.50  
##  Mean   : 417.7   Mean   : 827.7   Mean   :3233   Mean   : 647.06  
##  3rd Qu.: 540.8   3rd Qu.: 974.2   3rd Qu.:3477   3rd Qu.: 656.00  
##  Max.   :2120.0   Max.   :2632.0   Max.   :6859   Max.   :8999.00  
##   foundingyear       subs           field          
##  Min.   :1844   Min.   :   2.0   Length:180        
##  1st Qu.:1963   1st Qu.:  52.0   Class :character  
##  Median :1973   Median : 122.5   Mode  :character  
##  Mean   :1967   Mean   : 196.9                     
##  3rd Qu.:1982   3rd Qu.: 268.2                     
##  Max.   :1996   Max.   :1098.0
#Displaying mean and median of both Citations and subscribers
mean(JournalsDataset$citations)
## [1] 647.0556
median(JournalsDataset$citations)
## [1] 262.5
mean(JournalsDataset$subs)
## [1] 196.8667
median(JournalsDataset$subs)
## [1] 122.5

These figures show considerable variation in both citations and subscribers. For each variable the mean is well above the median (647.06 vs. 262.5 citations, 196.87 vs. 122.5 subscribers), which points to right-skewed distributions: a few journals have very high citation and subscriber counts, while most have far fewer.
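As a quick check on that skew, the mean-to-median ratio can be computed directly from the figures printed above. This is only a sketch: the data frame `stats` and its column names are introduced here for illustration, and the numbers are copied from the output rather than recomputed from the dataset.

```r
# Mean and median values copied from the output above
stats <- data.frame(
  variable = c("citations", "subs"),
  mean     = c(647.0556, 196.8667),
  median   = c(262.5, 122.5)
)

# A ratio well above 1 indicates a long right tail
stats$mean_to_median <- stats$mean / stats$median

print(stats)
```

A ratio close to 1 would indicate a roughly symmetric distribution. Here both ratios are clearly above 1, and the one for citations is the larger of the two.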

Now we need to create a subset of the data that includes the relevant columns. Additionally, we will create a new column to calculate the ratio of citations to subscribers for each journal.

Subset_journals <- JournalsDataset %>%
  select(title, citations, subs,field)
print(Subset_journals)
## # A tibble: 180 × 4
##    title                                               citations  subs field    
##    <chr>                                                   <dbl> <dbl> <chr>    
##  1 Asian-Pacific Economic Literature                          21    14 General  
##  2 South African Journal of Economic History                  22    59 Economic…
##  3 Computational Economics                                    22    17 Speciali…
##  4 MOCT-MOST Economic Policy in Transitional Economics        22     2 Area Stu…
##  5 Journal of Socio-Economics                                 24    96 Interdis…
##  6 Labour Economics                                           24    15 Labor    
##  7 Environment and Development Economics                      24    14 Developm…
##  8 Review of Radical Political Economics                      27   202 Speciali…
##  9 Economics of Planning                                      28    46 Area Stu…
## 10 Metroeconomica                                             30    46 General  
## # ℹ 170 more rows
#Creating new column citations_per_subscribers
Subset_journals <- Subset_journals %>%
  mutate(citations_per_subscribers = citations / subs)

print(Subset_journals)
## # A tibble: 180 × 5
##    title                            citations  subs field citations_per_subscr…¹
##    <chr>                                <dbl> <dbl> <chr>                  <dbl>
##  1 Asian-Pacific Economic Literatu…        21    14 Gene…                  1.5  
##  2 South African Journal of Econom…        22    59 Econ…                  0.373
##  3 Computational Economics                 22    17 Spec…                  1.29 
##  4 MOCT-MOST Economic Policy in Tr…        22     2 Area…                 11    
##  5 Journal of Socio-Economics              24    96 Inte…                  0.25 
##  6 Labour Economics                        24    15 Labor                  1.6  
##  7 Environment and Development Eco…        24    14 Deve…                  1.71 
##  8 Review of Radical Political Eco…        27   202 Spec…                  0.134
##  9 Economics of Planning                   28    46 Area…                  0.609
## 10 Metroeconomica                          30    46 Gene…                  0.652
## # ℹ 170 more rows
## # ℹ abbreviated name: ¹​citations_per_subscribers
#Let's list from highest to lowest ratio
Subset_journals <- Subset_journals %>%
  arrange(desc(citations_per_subscribers))

print(Subset_journals)
## # A tibble: 180 × 5
##    title                            citations  subs field citations_per_subscr…¹
##    <chr>                                <dbl> <dbl> <chr>                  <dbl>
##  1 Research Policy                        922    34 Busi…                   27.1
##  2 Econometrica                          7943   346 Gene…                   23.0
##  3 Journal of Econometrics               2479   129 Econ…                   19.2
##  4 Health Economics                       544    29 Heal…                   18.8
##  5 Journal of Economic Theory            2514   165 Theo…                   15.2
##  6 Ecological Economics                   499    40 Natu…                   12.5
##  7 Journal of Financial Economics        2676   231 Fina…                   11.6
##  8 Economics Letters                      930    81 Gene…                   11.5
##  9 MOCT-MOST Economic Policy in Tr…        22     2 Area…                   11  
## 10 Journal of Economic Dynamics & …       636    58 Theo…                   11.0
## # ℹ 170 more rows
## # ℹ abbreviated name: ¹​citations_per_subscribers

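“Research Policy” has the highest citations-per-subscriber ratio, about 27.1 (922 citations against 34 subscribers). One interpretation is that the journal is cited heavily relative to the size of its subscriber base, which suggests that its content is relevant, influential and valuable to the academic community it serves. The ratio does not tell us who is citing the journal, though, and a very small subscriber count can inflate it: “MOCT-MOST Economic Policy in Transitional Economics” ranks ninth with only 22 citations and 2 subscribers. It is also interesting that “Research Policy” belongs to the business field.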
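Since the question is framed in terms of fields, one possible next step is to summarise the ratio by field. The sketch below is an assumption about how that could be done, not part of the original analysis. It uses only three rows copied from the printed output, chosen because their field labels are not truncated there; `field_sample` and `field_summary` are names introduced for illustration.

```r
library(dplyr)

# Three journals copied from the printed tibble above, illustration only
field_sample <- data.frame(
  title     = c("Asian-Pacific Economic Literature", "Metroeconomica",
                "Labour Economics"),
  citations = c(21, 30, 24),
  subs      = c(14, 46, 15),
  field     = c("General", "General", "Labor")
)

# Median citations-per-subscriber ratio and journal count for each field
field_summary <- field_sample %>%
  mutate(citations_per_subscribers = citations / subs) %>%
  group_by(field) %>%
  summarise(
    journals     = n(),
    median_ratio = median(citations_per_subscribers),
    .groups      = "drop"
  ) %>%
  arrange(desc(median_ratio))

print(field_summary)
```

On the full data, the same pipeline could start from `Subset_journals` instead of `field_sample`. The median is used rather than the mean so that a single journal with very few subscribers does not dominate its field.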

Let’s visualize the relationship between citations and subscribers using a scatter plot. We will also create a histogram of the “citations_per_subscribers” ratio and, further below, a box plot of the number of subscribers.

#Scatter plot: Citation vs subscribers
scatter_plot<- ggplot(Subset_journals, aes(x=citations, y= subs)) +
  geom_point() +
  labs(title = "Citations vs Subscribers",
       x= "Number of Citations",
       y="Number of Subscribers")
print(scatter_plot)

#Histogram: Citations per Subscribers
Histogram_plot <- ggplot(Subset_journals,aes(x=citations_per_subscribers)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Distribution of Citations per Subscribers",
       x= "Citation per Subscriber",
       y= "Frequency")
print(Histogram_plot)

The scatter plot shows an upward trend from left to right, which indicates a positive association between the two variables: journals with more citations tend to have more subscribers.
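The visual impression can be backed by a correlation coefficient. The sketch below defines a hypothetical helper, `correlation_summary()`, that reports the Pearson and Spearman correlations and the Pearson correlation on log-transformed values, which is less sensitive to the skew noted earlier. The ten rows in `sample_journals` are copied from the first printed tibble and are the ten least-cited journals, so they are not representative. They only show that the helper runs.

```r
# Hypothetical helper: correlations between citations and subscribers.
# Assumes `df` has numeric columns `citations` and `subs`, all values > 0.
correlation_summary <- function(df) {
  c(
    pearson     = cor(df$citations, df$subs, method = "pearson"),
    spearman    = cor(df$citations, df$subs, method = "spearman"),
    pearson_log = cor(log(df$citations), log(df$subs))
  )
}

# Ten rows copied from the printed output above, illustration only
sample_journals <- data.frame(
  citations = c(21, 22, 22, 22, 24, 24, 24, 27, 28, 30),
  subs      = c(14, 59, 17, 2, 96, 15, 14, 202, 46, 46)
)

result <- correlation_summary(sample_journals)
print(round(result, 3))
```

To measure the relationship discussed in the text, call `correlation_summary(Subset_journals)` on the full dataset.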

Let’s also see what a box plot of the number of subscribers can tell us.

box_plot <- ggplot(Subset_journals, aes(x = 1, y = subs)) +
  geom_boxplot() +
  labs(title = "Box Plot of Subscribers",
       x = "",
       y = "Number of Subscribers")
print(box_plot)
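To read the box plot, it helps to know where the whiskers stop. By default `geom_boxplot()` draws whiskers up to 1.5 times the interquartile range beyond the box and plots anything further out as individual points. The sketch below works out approximate limits from the rounded quartiles in the `summary()` output, so the results are approximate. The variable names are introduced here for illustration.

```r
# Quartiles and maximum of `subs`, copied from the summary() output (rounded)
q1       <- 52.0
q3       <- 268.2
max_subs <- 1098

# Interquartile range and the 1.5 * IQR fences
iqr         <- q3 - q1
upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

cat("IQR:", iqr, "\n")
cat("Upper fence:", upper_fence, "\n")
cat("Maximum exceeds upper fence:", max_subs > upper_fence, "\n")
```

The lower fence is negative, so no journal can fall below it. Only unusually high subscriber counts appear as separate points.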

After exploring the dataset and performing some data wrangling, we analyzed the relationship between the number of citations and the number of subscribers for economics journals. The scatter plot helped us visualize the relationship between the two variables, and the box plot provided insight into the spread and central tendency of the number of subscribers. Additionally, the histogram depicted the distribution of the “citations_per_subscribers” ratio.

Through this analysis, we observe a positive association between citations and subscribers: journals that are cited more tend to attract more subscribers. This is based on visual inspection of the scatter plot, and we did not break the relationship down by field. The findings can still be valuable for publishers, authors, and researchers in understanding the dynamics of journal readership and scholarly impact.