In this project, we explore the relationship between citations and subscribers for economics journals, addressing the following question: is the number of citations received by journals in different fields correlated with the number of subscribers they attract? Through data analysis and visualization, we aim to uncover insights into the impact and readership of scholarly journals in economics.
#Load packages
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
#Load data from Github
JournalsDataset <- read_csv("https://raw.githubusercontent.com/Ismoo225/RBRIDGE/main/Journals.csv")
## New names:
## • `` -> `...1`
## Rows: 180 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): ...1, title, publisher, society, field
## dbl (6): price, pages, charpp, citations, foundingyear, subs
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(JournalsDataset, 10)
Let’s start by exploring the economics journal dataset to understand its structure and gain some initial insights from summary statistics.
summary(JournalsDataset)
## ...1 title publisher society
## Length:180 Length:180 Length:180 Length:180
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## price pages charpp citations
## Min. : 20.0 Min. : 167.0 Min. :1782 Min. : 21.00
## 1st Qu.: 134.5 1st Qu.: 548.8 1st Qu.:2715 1st Qu.: 97.75
## Median : 282.0 Median : 693.0 Median :3010 Median : 262.50
## Mean : 417.7 Mean : 827.7 Mean :3233 Mean : 647.06
## 3rd Qu.: 540.8 3rd Qu.: 974.2 3rd Qu.:3477 3rd Qu.: 656.00
## Max. :2120.0 Max. :2632.0 Max. :6859 Max. :8999.00
## foundingyear subs field
## Min. :1844 Min. : 2.0 Length:180
## 1st Qu.:1963 1st Qu.: 52.0 Class :character
## Median :1973 Median : 122.5 Mode :character
## Mean :1967 Mean : 196.9
## 3rd Qu.:1982 3rd Qu.: 268.2
## Max. :1996 Max. :1098.0
#Displaying mean and median of both Citations and subscribers
mean(JournalsDataset$citations)
## [1] 647.0556
median(JournalsDataset$citations)
## [1] 262.5
mean(JournalsDataset$subs)
## [1] 196.8667
median(JournalsDataset$subs)
## [1] 122.5
These figures show considerable variation in citations and subscribers among the journals. For both variables the mean is well above the median (647.1 vs. 262.5 citations, 196.9 vs. 122.5 subscribers), which indicates right-skewed distributions: a few journals have very high citation and subscriber counts, while most have far lower ones.
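The gap between mean and median can be made explicit with a small helper. The following is a minimal sketch, not part of the original analysis: `spread_summary` is a hypothetical helper name, and the ten-row sample is copied from the first ten rows of the dataset as printed in this report, so its numbers describe only that sample and not all 180 journals.

```r
# Hypothetical helper: centre and spread of a numeric vector.
spread_summary <- function(x) {
  c(mean = mean(x),
    median = median(x),
    sd = sd(x),
    iqr = IQR(x),
    mean_to_median = mean(x) / median(x))  # values above 1 hint at right skew
}

# Illustrative sample: the first ten rows shown in this report.
sample_journals <- data.frame(
  citations = c(21, 22, 22, 22, 24, 24, 24, 27, 28, 30),
  subs      = c(14, 59, 17, 2, 96, 15, 14, 202, 46, 46)
)

# One column of statistics per variable.
sapply(sample_journals, spread_summary)

# On the full data (assuming JournalsDataset is loaded as above):
# sapply(JournalsDataset[c("citations", "subs")], spread_summary)
```

A mean-to-median ratio well above 1, together with a standard deviation that is large relative to the median, is a quick signal that a log scale or rank-based statistics may be more informative than raw means.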
Now we need to create a subset of the data that includes the relevant columns. Additionally, we will create a new column to calculate the ratio of citations to subscribers for each journal.
Subset_journals <- JournalsDataset %>%
  select(title, citations, subs, field)
print(Subset_journals)
## # A tibble: 180 × 4
## title citations subs field
## <chr> <dbl> <dbl> <chr>
## 1 Asian-Pacific Economic Literature 21 14 General
## 2 South African Journal of Economic History 22 59 Economic…
## 3 Computational Economics 22 17 Speciali…
## 4 MOCT-MOST Economic Policy in Transitional Economics 22 2 Area Stu…
## 5 Journal of Socio-Economics 24 96 Interdis…
## 6 Labour Economics 24 15 Labor
## 7 Environment and Development Economics 24 14 Developm…
## 8 Review of Radical Political Economics 27 202 Speciali…
## 9 Economics of Planning 28 46 Area Stu…
## 10 Metroeconomica 30 46 General
## # ℹ 170 more rows
# Create the new column citations_per_subscribers
Subset_journals <- Subset_journals %>%
  mutate(citations_per_subscribers = citations / subs)
print(Subset_journals)
## # A tibble: 180 × 5
## title citations subs field citations_per_subscr…¹
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Asian-Pacific Economic Literatu… 21 14 Gene… 1.5
## 2 South African Journal of Econom… 22 59 Econ… 0.373
## 3 Computational Economics 22 17 Spec… 1.29
## 4 MOCT-MOST Economic Policy in Tr… 22 2 Area… 11
## 5 Journal of Socio-Economics 24 96 Inte… 0.25
## 6 Labour Economics 24 15 Labor 1.6
## 7 Environment and Development Eco… 24 14 Deve… 1.71
## 8 Review of Radical Political Eco… 27 202 Spec… 0.134
## 9 Economics of Planning 28 46 Area… 0.609
## 10 Metroeconomica 30 46 Gene… 0.652
## # ℹ 170 more rows
## # ℹ abbreviated name: ¹citations_per_subscribers
# Sort from highest to lowest ratio
Subset_journals <- Subset_journals %>%
  arrange(desc(citations_per_subscribers))
print(Subset_journals)
## # A tibble: 180 × 5
## title citations subs field citations_per_subscr…¹
## <chr> <dbl> <dbl> <chr> <dbl>
## 1 Research Policy 922 34 Busi… 27.1
## 2 Econometrica 7943 346 Gene… 23.0
## 3 Journal of Econometrics 2479 129 Econ… 19.2
## 4 Health Economics 544 29 Heal… 18.8
## 5 Journal of Economic Theory 2514 165 Theo… 15.2
## 6 Ecological Economics 499 40 Natu… 12.5
## 7 Journal of Financial Economics 2676 231 Fina… 11.6
## 8 Economics Letters 930 81 Gene… 11.5
## 9 MOCT-MOST Economic Policy in Tr… 22 2 Area… 11
## 10 Journal of Economic Dynamics & … 636 58 Theo… 11.0
## # ℹ 170 more rows
## # ℹ abbreviated name: ¹citations_per_subscribers
“Research Policy” has the highest citations-per-subscriber ratio, at about 27.1. This means the journal is cited heavily relative to the size of its subscriber base, which may indicate that its content is relevant, influential and valuable to the academic community it serves. It does not mean that each subscriber cites the journal, since citations come from the wider literature. It is also interesting that this journal belongs to the business field. Note that a very small subscriber count can inflate the ratio: “MOCT-MOST Economic Policy in Transitional Economics” ranks ninth with only 22 citations and 2 subscribers.
Let’s visualize the relationship between citations and subscribers using a scatter plot. We will also create a histogram to visualize the distribution of the citations_per_subscribers ratio, and a box plot to examine the distribution of subscribers.
# Scatter plot: citations vs. subscribers
scatter_plot <- ggplot(Subset_journals, aes(x = citations, y = subs)) +
  geom_point() +
  labs(title = "Citations vs Subscribers",
       x = "Number of Citations",
       y = "Number of Subscribers")
print(scatter_plot)
# Histogram: citations per subscriber
Histogram_plot <- ggplot(Subset_journals, aes(x = citations_per_subscribers)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Distribution of Citations per Subscribers",
       x = "Citations per Subscriber",
       y = "Frequency")
print(Histogram_plot)
The scatter plot shows an upward trend from left to right, which suggests a positive correlation between the two variables: journals with more citations tend to have more subscribers.
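The visual impression can be checked numerically. The sketch below is a minimal example and not part of the original analysis: it uses base R's `cor()`, `correlation_check` is a hypothetical helper name, and the ten-row sample reuses the top-ten rows by ratio printed above, so the resulting values describe only that sample. Because both variables are strongly right-skewed, the rank-based Spearman coefficient and a Pearson coefficient on log-transformed values are included alongside the plain Pearson coefficient.

```r
# Hypothetical helper: three views of the citations/subscribers correlation.
correlation_check <- function(df) {
  c(pearson     = cor(df$citations, df$subs, method = "pearson"),
    spearman    = cor(df$citations, df$subs, method = "spearman"),
    # log10 is safe here because all counts in the data are positive
    pearson_log = cor(log10(df$citations), log10(df$subs)))
}

# Illustrative sample: the ten journals with the highest ratio (printed above).
top_ratio_journals <- data.frame(
  citations = c(922, 7943, 2479, 544, 2514, 499, 2676, 930, 22, 636),
  subs      = c(34, 346, 129, 29, 165, 40, 231, 81, 2, 58)
)

correlation_check(top_ratio_journals)

# On the full data (assuming Subset_journals from the steps above):
# correlation_check(Subset_journals)
# cor.test(Subset_journals$citations, Subset_journals$subs, method = "spearman")
```

If the three coefficients differ substantially on the full data, that would suggest a few very large journals are driving the Pearson value.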
Let’s also see what a box plot of the subscriber counts can tell us.
box_plot <- ggplot(Subset_journals, aes(x = 1, y = subs)) +
  geom_boxplot() +
  labs(title = "Box Plot of Subscribers",
       x = "",
       y = "Number of Subscribers")
print(box_plot)
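Points drawn individually beyond the whiskers of a box plot are those lying more than 1.5 times the interquartile range from the box. As a rough sketch of that rule, base R's `boxplot.stats()` returns these values directly. The sample below is an assumption for illustration: it takes the subscriber counts from the first ten rows printed earlier in this report. `boxplot.stats()` uses hinges rather than ggplot2's quartiles, so borderline points may be classified slightly differently from the plot.

```r
# Illustrative sample: subscriber counts from the first ten rows of the dataset.
sample_subs <- c(14, 59, 17, 2, 96, 15, 14, 202, 46, 46)

# coef = 1.5 is the default whisker multiplier (the usual 1.5 * IQR rule).
stats <- boxplot.stats(sample_subs, coef = 1.5)

stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # values flagged as outliers

# On the full data (assuming Subset_journals from the steps above):
# boxplot.stats(Subset_journals$subs)$out
```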
After exploring the dataset and performing data wrangling, we analyzed the relationship between the number of citations and the number of subscribers for economics journals. The scatter plot helped us visualize the relationship between citations and subscribers, and the box plot provided insight into the spread and central tendency of subscriber counts. Additionally, the histogram depicted the distribution of the citations_per_subscribers ratio.
Through this analysis, we observe a positive association between the two variables: journals with more citations tend to have more subscribers. This conclusion rests mainly on visual inspection of the scatter plot, and it offers a starting point for conclusions about the popularity and impact of economics journals in different fields. The findings can be valuable for publishers, authors, and researchers in understanding the dynamics of journal readership and scholarly impact.