library(ggplot2)
library(tidyr)
# Mock search results: the number of authors on each of eight papers returned for three search topics.
supernovas = c(3,7,10,30,2,1,1,25)
blackholes = c(110,55,55,15,3,10,12,1)
exoplanet = c(5,12,8,100,60,50,12,2)
# Results from the ADS will not come back as simple vectors like these, but as an object made up of nested lists that will need to be indexed properly.
# In Python, once the authors attribute is identified, I will need to loop through the list of lists and count the elements in each authors list to get the number of authors per paper per topic.
# Data types will need to be taken into account during iteration.
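The loop described above might look something like this minimal sketch. The nested layout (a `response` dict holding a `docs` list, each doc holding an `author` list) is an assumption loosely modelled on a JSON search response, not a confirmed ADS schema, and `mock_results` and `authors_per_paper` are hypothetical names. Missing or empty author fields are counted as zero authors and a bare string is treated as one author, which is one way to handle the data-type issue.

```python
# Hypothetical nested results: topic -> response -> docs -> author list.
# These field names are assumptions for illustration, not a confirmed ADS schema.
mock_results = {
    "supernovas": {"response": {"docs": [
        {"title": ["Paper A"], "author": ["Smith, J.", "Doe, A.", "Lee, K."]},
        {"title": ["Paper B"], "author": ["Wu, X."]},
        {"title": ["Paper C"]},  # record with no author field at all
    ]}},
    "blackholes": {"response": {"docs": [
        {"title": ["Paper D"], "author": ["Kim, H.", "Patel, R."]},
        {"title": ["Paper E"], "author": None},  # author field present but empty
    ]}},
}


def authors_per_paper(results):
    """Return {topic: [number of authors on each paper]}."""
    counts = {}
    for topic, payload in results.items():
        # Fall back to empty containers if a level of nesting is missing.
        docs = payload.get("response", {}).get("docs", [])
        topic_counts = []
        for doc in docs:
            authors = doc.get("author") or []
            # A lone author might arrive as a bare string; count it as one.
            if isinstance(authors, str):
                authors = [authors]
            topic_counts.append(len(authors))
        counts[topic] = topic_counts
    return counts


print(authors_per_paper(mock_results))
# {'supernovas': [3, 1, 0], 'blackholes': [2, 0]}
```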
# Here I create a data frame of the results. This may not be necessary in Python if the counts can be handled cleanly without one, but I will use the pandas module if it seems advantageous.
space_df <- data.frame(supernovas, blackholes, exoplanet)
space_df
##   supernovas blackholes exoplanet
## 1          3        110         5
## 2          7         55        12
## 3         10         55         8
## 4         30         15       100
## 5          2          3        60
## 6          1         10        50
## 7          1         12        12
## 8         25          1         2
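If a data frame does turn out to be useful in Python, a minimal pandas sketch of the same table might look like the following. It simply re-enters the mock counts; note that pandas numbers rows from 0 rather than 1.

```python
import pandas as pd

# The same mock author counts as the R vectors above, one column per search.
space_df = pd.DataFrame({
    "supernovas": [3, 7, 10, 30, 2, 1, 1, 25],
    "blackholes": [110, 55, 55, 15, 3, 10, 12, 1],
    "exoplanet": [5, 12, 8, 100, 60, 50, 12, 2],
})
print(space_df)
```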
# I reshaped the data frame into long ("tidy") format with tidyr's gather(), giving one row per paper, which makes it easier to summarize and visualize.
tidy_space <- space_df %>% gather(search, auth_count)
tidy_space
##        search auth_count
## 1  supernovas          3
## 2  supernovas          7
## 3  supernovas         10
## 4  supernovas         30
## 5  supernovas          2
## 6  supernovas          1
## 7  supernovas          1
## 8  supernovas         25
## 9  blackholes        110
## 10 blackholes         55
## 11 blackholes         55
## 12 blackholes         15
## 13 blackholes          3
## 14 blackholes         10
## 15 blackholes         12
## 16 blackholes          1
## 17  exoplanet          5
## 18  exoplanet         12
## 19  exoplanet          8
## 20  exoplanet        100
## 21  exoplanet         60
## 22  exoplanet         50
## 23  exoplanet         12
## 24  exoplanet          2
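In pandas, `DataFrame.melt()` plays roughly the role of `gather()`. A minimal sketch, reusing the mock counts; the `var_name` and `value_name` arguments are chosen only to match the R column names.

```python
import pandas as pd

space_df = pd.DataFrame({
    "supernovas": [3, 7, 10, 30, 2, 1, 1, 25],
    "blackholes": [110, 55, 55, 15, 3, 10, 12, 1],
    "exoplanet": [5, 12, 8, 100, 60, 50, 12, 2],
})

# Wide -> long: every column is stacked in order, giving one row per paper
# with the original column name in "search" and the count in "auth_count".
tidy_space = space_df.melt(var_name="search", value_name="auth_count")
print(tidy_space)
```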
# In Python I may use SciPy to get the mean number of authors for each search. Here I demonstrate the same calculation in R using tapply().
tapply(tidy_space$auth_count, tidy_space$search, FUN=mean)
## supernovas blackholes  exoplanet
##      9.875     32.625     31.125
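A plain mean does not actually need SciPy; pandas can compute it directly. The sketch below swaps in `groupby()` on the long table in place of SciPy. Passing `sort=False` keeps the groups in the order they first appear rather than sorting them alphabetically.

```python
import pandas as pd

space_df = pd.DataFrame({
    "supernovas": [3, 7, 10, 30, 2, 1, 1, 25],
    "blackholes": [110, 55, 55, 15, 3, 10, 12, 1],
    "exoplanet": [5, 12, 8, 100, 60, 50, 12, 2],
})
tidy_space = space_df.melt(var_name="search", value_name="auth_count")

# Mean number of authors per search, analogous to tapply(..., FUN = mean).
means = tidy_space.groupby("search", sort=False)["auth_count"].mean()
print(means)
# supernovas 9.875, blackholes 32.625, exoplanet 31.125
```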
# I would then like to show a box plot of the author counts for visual inspection. Below I create a box plot and a violin plot (with the individual papers overlaid as points) using ggplot2, and will try to do the same in Python.
space_box <- ggplot(tidy_space, aes(y= auth_count, x= search, fill = search)) + geom_boxplot()
space_box
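One possible Python counterpart to the ggplot2 box plot is sketched below with matplotlib. matplotlib is my own choice here, not something the text specifies, and the output file name `space_box.png` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Mock author counts per search, copied from the R example.
counts = {
    "supernovas": [3, 7, 10, 30, 2, 1, 1, 25],
    "blackholes": [110, 55, 55, 15, 3, 10, 12, 1],
    "exoplanet": [5, 12, 8, 100, 60, 50, 12, 2],
}
labels = list(counts)
data = [counts[name] for name in labels]
positions = list(range(1, len(labels) + 1))  # boxplot's default x positions

fig, ax = plt.subplots(figsize=(6, 4))
box = ax.boxplot(data)  # one box per search
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("search")
ax.set_ylabel("auth_count")
fig.savefig("space_box.png")
```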

space_v <- ggplot(tidy_space, aes(y= auth_count, x= search, fill = search)) + geom_violin() + geom_point()
space_v
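The violin plot with overlaid points could be approximated in matplotlib roughly as follows. Again, matplotlib and the file name `space_violin.png` are assumptions; the points are drawn at each group's x position without jitter, as `geom_point()` does by default.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

counts = {
    "supernovas": [3, 7, 10, 30, 2, 1, 1, 25],
    "blackholes": [110, 55, 55, 15, 3, 10, 12, 1],
    "exoplanet": [5, 12, 8, 100, 60, 50, 12, 2],
}
labels = list(counts)
data = [counts[name] for name in labels]
positions = list(range(1, len(labels) + 1))

fig, ax = plt.subplots(figsize=(6, 4))
parts = ax.violinplot(data, positions=positions, showmedians=True)

# Overlay each paper's author count, similar to geom_point() in the R version.
for pos, values in zip(positions, data):
    ax.scatter([pos] * len(values), values, s=12, color="black", zorder=3)

ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel("search")
ax.set_ylabel("auth_count")
fig.savefig("space_violin.png")
```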

An example of some of the Python code I have been experimenting with to carry out the analysis described above can be seen here.