At some point during their PhD, many students take a comprehensive exam (comps for short) to demonstrate their knowledge of their chosen field. In the Organizational Behavior group at McGill University, where I am currently doing my PhD, the comprehensive exam is tailored to each student's research interests. My research draws on two streams, (1) novelty reception and (2) social networks and gender, so my reading list for the exam includes 100+ research papers that examine one of two questions:

When studying for the exam, I applied Natural Language Processing (NLP) techniques to the paper abstracts to find the overarching themes across the papers. These analyses, in conjunction with reading the papers, helped me develop a better understanding of what scientists know so far about novelty reception and about social networks and gender.

In this post, I walk through the first step of this process: extracting information about the research papers on my reading list from their citations, using the R packages tidytext and stringr. This information includes the year each paper was published, its authors, its title, and the journal it was published in. Please note that the code I wrote extracts paper information from APA citations, and it might need to be adjusted for other citation styles.

First, let’s read the Google Sheet I created that contains my reading list and pick 5 random rows to see what the data looks like.

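# load the packages used throughout this post (dplyr, tidyr, stringr, ggplot2)
library(tidyverse)

# read the Google Sheet that holds my reading list
gsheet::gsheet2tbl('https://docs.google.com/spreadsheets/d/1ic1Zc3CpXZyiYfD5whrHLizMN83WXyxjipN5moUtVG8/edit#gid=0') -> reading

# load the helper functions I wrote, including kbl_2(), which prints a table inside a scroll box
source("/Users/mac/Library/CloudStorage/OneDrive-McGillUniversity/Work/Projects/Social cap and gender/Film-maker-network/functions.R")

# fix the seed so the same 5 rows are sampled every time
set.seed(09011996)
reading %>% 
  sample_n(5) %>% 
  kbl_2()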
Citation Abstract Topic
Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617. In structural analyses of innovation, one substantive question looms large: What makes radical innovation possible if peripheral actors are more likely to originate radical ideas but are poorly positioned to promote them? An inductive study of the rise of Cubism, a revolutionary paradigm that overthrew classic principles of representation in art, results in a model where not only the periphery moves toward the core through collective action, as typically asserted, but the core also moves toward the periphery, becoming more receptive to radical ideas. The fragmentation of the art market in early 20th-century Paris served as the trigger. The proliferation of market niches and growing ambiguity over evaluation standards dramatically reduced the costs of experimentation in the periphery and the ability of the core to suppress radical ideas. A multilevel analysis linking individual creativity, peer networks, and the art field reveals how market developments fostered Spanish Cubist Pablo Picasso’s experiments and facilitated their diffusion in the absence of public support, a coherent movement, and even his active involvement. If past research attests to the importance of framing innovations and mobilizing resources in their support, this study brings attention to shifts in the structure of opportunities to do so. Novelty reception
Mannucci, P. V., & Perry-Smith, J. E. (2021). “Who are you going to call?” Network activation in creative idea generation and elaboration. Academy of Management Journal, (ja). Considering creativity as a journey beyond idea generation, scholars have theorized that different ties are beneficial in different phases. As individuals usually possess different types of ties, selecting the optimal ties in each phase and changing ties as needed are central activities for creative success. We identify the types of ties (weak or strong) that are helpful in idea generation and idea elaboration, and given this understanding, whether individuals activate ties in each phase accordingly. In an experimental study of individuals conversing with their ties, we provide evidence of the causal effects of weak and strong ties on idea generation and idea elaboration. We also find that individuals do not always activate ties optimally and identify network size and risk as barriers. Our results in a series of studies reveal that individuals with large networks, despite providing more opportunity to activate both strong and weak ties, activate fewer weak ties and are less likely to switch ties across phases than individuals with smaller networks, particularly when creativity is perceived as a high-risk endeavor. Finally, we find that activating the wrong ties leads to either dropping creative ideas or pursuing uncreative ones. Novelty reception
DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118. Students of social inequality have noted the presence of mechanisms militating toward cumulative advantage and increasing inequality. Social scientists have established that individuals’ choices are influenced by those of their network peers in many social domains. We suggest that the ubiquity of network effects and tendencies toward cumulative advantage are related. Inequality is exacerbated when effects of individual differences are multiplied by social networks: when persons must decide whether to adopt beneficial practices; when network externalities, social learning, or normative pressures influence adoption decisions; and when networks are homophilous with respect to individual characteristics that predict such decisions. We review evidence from literatures on network effects on technology, labor markets, education, demography, and health; identify several mechanisms through which networks may generate higher levels of inequality than one would expect based on differences in initial endowments alone; consider cases in which network effects may ameliorate inequality; and describe research priorities. Network and gender
McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549. Non-searchers - people who get their jobs without engaging in a job search - are often excluded from investigations of the role of personal relationships in job finding processes. This practice fails to capture the scope of informal job matching activity and underestimates the effectiveness of social capital. Moreover, studies typically obtain average estimates of social capital effectiveness across broad age ranges, obscuring variation across the life course. Analysis of early career and mid-career job matching shows that non-searching is associated with significant advantages over formal job searching. However, these benefits accrue only during mid-career and primarily among highly experienced male non-searchers. The results highlight the need to examine life course variations in social capital effectiveness and the role of non-searching as an important informal mechanism in the maintenance of gender inequality. Network and gender
Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468. Betting on the most promising new ideas is key to creativity and innovation in organizations, but predicting the success of novel ideas can be difficult. To select the best ideas, creators and managers must excel at creative forecasting, the skill of predicting the outcomes of new ideas. Using both a field study of 339 professionals in the circus arts industry and a lab experiment, I examine the conditions for accurate creative forecasting, focusing on the effect of creators’ and managers’ roles. In the field study, creators and managers forecasted the success of new circus acts with audiences, and the accuracy of these forecasts was assessed using data from 13,248 audience members. Results suggest that creators were more accurate than managers when forecasting about others’ novel ideas, but not their own. This advantage over managers was undermined when creators previously had poor ideas that were successful in the marketplace anyway. Results from the lab experiment show that creators’ advantage over managers in predicting success may be tied to the emphasis on both divergent thinking (idea generation) and convergent thinking (idea evaluation) in the creator role, while the manager role emphasizes only convergent thinking. These studies highlight that creative forecasting is a critical bridge linking creativity and innovation, shed light on the importance of roles in creative forecasting, and advance theory on why creative success is difficult to sustain over time. Novelty reception

From the table above, we can see that for each paper, we have its citation, its abstract, and whether it is about novelty reception or network and gender (i.e., the two research streams I focus on for my exam). From the paper citation, we can create additional variables about the paper. Let’s start with the year the paper was published!

Publication year

We can create a variable for publication year by extracting the first 4-digit number in the citation that starts with 20 (i.e., articles published in the 21st century) or with 19 (i.e., articles published in the 20th century).

Once we get that done, we can use publication year to create a variable for the decade the paper was published in. To this end, we take the integer part of the year divided by 10 (i.e., integer division), multiply it by 10, and append an “s” (e.g., 2016 becomes 2010s).

reading %>% 
  # convert column name to lower case 
  janitor::clean_names() %>% 
  mutate(
    # extract publication year from citation
    year = str_extract(citation, "(19|20)[0-9]{2}") %>% as.numeric(),
    # create publication decade from publication year
    decade = (year %/% 10) * 10, 
    decade = paste0(decade, "s")) -> reading
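
To make the two steps concrete, here is a minimal sketch on two citations from the sample above. It assumes that the first 19xx or 20xx number in an APA citation is the publication year, and it uses str_extract(), which keeps only the first match per citation.

```r
library(stringr)

citations <- c(
  "Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468.",
  "McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549."
)

# first 4-digit number starting with 19 or 20 in each citation
year <- as.numeric(str_extract(citations, "(19|20)[0-9]{2}"))

# integer division drops the last digit; multiplying by 10 restores the scale
decade <- paste0((year %/% 10) * 10, "s")

year    # 2016 2006
decade  # "2010s" "2000s"
```

Page ranges such as 433-468 are ignored because they do not start with 19 or 20, but a citation whose author field or title contains such a number before the year would need a stricter pattern.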

Let’s see what the data looks like now.

set.seed(09011996)
reading %>% 
  sample_n(5) %>% 
  kbl_2()
citation abstract topic year decade
Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617. In structural analyses of innovation, one substantive question looms large: What makes radical innovation possible if peripheral actors are more likely to originate radical ideas but are poorly positioned to promote them? An inductive study of the rise of Cubism, a revolutionary paradigm that overthrew classic principles of representation in art, results in a model where not only the periphery moves toward the core through collective action, as typically asserted, but the core also moves toward the periphery, becoming more receptive to radical ideas. The fragmentation of the art market in early 20th-century Paris served as the trigger. The proliferation of market niches and growing ambiguity over evaluation standards dramatically reduced the costs of experimentation in the periphery and the ability of the core to suppress radical ideas. A multilevel analysis linking individual creativity, peer networks, and the art field reveals how market developments fostered Spanish Cubist Pablo Picasso’s experiments and facilitated their diffusion in the absence of public support, a coherent movement, and even his active involvement. If past research attests to the importance of framing innovations and mobilizing resources in their support, this study brings attention to shifts in the structure of opportunities to do so. Novelty reception 2013 2010s
Mannucci, P. V., & Perry-Smith, J. E. (2021). “Who are you going to call?” Network activation in creative idea generation and elaboration. Academy of Management Journal, (ja). Considering creativity as a journey beyond idea generation, scholars have theorized that different ties are beneficial in different phases. As individuals usually possess different types of ties, selecting the optimal ties in each phase and changing ties as needed are central activities for creative success. We identify the types of ties (weak or strong) that are helpful in idea generation and idea elaboration, and given this understanding, whether individuals activate ties in each phase accordingly. In an experimental study of individuals conversing with their ties, we provide evidence of the causal effects of weak and strong ties on idea generation and idea elaboration. We also find that individuals do not always activate ties optimally and identify network size and risk as barriers. Our results in a series of studies reveal that individuals with large networks, despite providing more opportunity to activate both strong and weak ties, activate fewer weak ties and are less likely to switch ties across phases than individuals with smaller networks, particularly when creativity is perceived as a high-risk endeavor. Finally, we find that activating the wrong ties leads to either dropping creative ideas or pursuing uncreative ones. Novelty reception 2021 2020s
DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118. Students of social inequality have noted the presence of mechanisms militating toward cumulative advantage and increasing inequality. Social scientists have established that individuals’ choices are influenced by those of their network peers in many social domains. We suggest that the ubiquity of network effects and tendencies toward cumulative advantage are related. Inequality is exacerbated when effects of individual differences are multiplied by social networks: when persons must decide whether to adopt beneficial practices; when network externalities, social learning, or normative pressures influence adoption decisions; and when networks are homophilous with respect to individual characteristics that predict such decisions. We review evidence from literatures on network effects on technology, labor markets, education, demography, and health; identify several mechanisms through which networks may generate higher levels of inequality than one would expect based on differences in initial endowments alone; consider cases in which network effects may ameliorate inequality; and describe research priorities. Network and gender 2012 2010s
McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549. Non-searchers - people who get their jobs without engaging in a job search - are often excluded from investigations of the role of personal relationships in job finding processes. This practice fails to capture the scope of informal job matching activity and underestimates the effectiveness of social capital. Moreover, studies typically obtain average estimates of social capital effectiveness across broad age ranges, obscuring variation across the life course. Analysis of early career and mid-career job matching shows that non-searching is associated with significant advantages over formal job searching. However, these benefits accrue only during mid-career and primarily among highly experienced male non-searchers. The results highlight the need to examine life course variations in social capital effectiveness and the role of non-searching as an important informal mechanism in the maintenance of gender inequality. Network and gender 2006 2000s
Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468. Betting on the most promising new ideas is key to creativity and innovation in organizations, but predicting the success of novel ideas can be difficult. To select the best ideas, creators and managers must excel at creative forecasting, the skill of predicting the outcomes of new ideas. Using both a field study of 339 professionals in the circus arts industry and a lab experiment, I examine the conditions for accurate creative forecasting, focusing on the effect of creators’ and managers’ roles. In the field study, creators and managers forecasted the success of new circus acts with audiences, and the accuracy of these forecasts was assessed using data from 13,248 audience members. Results suggest that creators were more accurate than managers when forecasting about others’ novel ideas, but not their own. This advantage over managers was undermined when creators previously had poor ideas that were successful in the marketplace anyway. Results from the lab experiment show that creators’ advantage over managers in predicting success may be tied to the emphasis on both divergent thinking (idea generation) and convergent thinking (idea evaluation) in the creator role, while the manager role emphasizes only convergent thinking. These studies highlight that creative forecasting is a critical bridge linking creativity and innovation, shed light on the importance of roles in creative forecasting, and advance theory on why creative success is difficult to sustain over time. Novelty reception 2016 2010s

Looks like everything is as it should be. Let’s see how many papers on my list were published in each year.

reading %>% 
  count(year) %>% 
  ggplot(aes(year, n)) +
  geom_col(show.legend = FALSE, fill = "#ca225e") +
  labs(y = "Number of papers", x = NULL)

It looks like most of the papers on my list were published within the last 10 years. The few papers published before 2000 are most likely the seminal papers on the topics I study.

Let’s see if the number of papers by year differs across my two comps themes: novelty reception and network and gender.

reading %>% 
  count(year, topic, sort = TRUE) %>% 
  ggplot(aes(year, n, fill = topic)) + 
  geom_col(show.legend = FALSE) +
  facet_grid(rows = vars(topic)) +
  scale_fill_manual(values = c("#be1558", "#fbcbc9")) +
  scale_y_continuous(breaks = scales::pretty_breaks()) +
  labs(y = "Number of papers", x = NULL)

It seems that the majority of the papers in my list on both novelty reception and network and gender are recent papers published within the last 10 years.

Authors

Next, we can create a variable for author names along with the publication year (e.g., Guilbeault, D., & Centola, D. 2021) by keeping all characters in the citation that appear before the first closing parenthesis and then removing the opening parenthesis.

We can also get the name of the first author along with the publication year (e.g., Guilbeault 2021) by extracting the first word of the authors variable and merging it with the publication year.

reading %>% 
  # get author information
  mutate(authors = str_replace(citation, "\\).*$", ""), 
         authors = str_replace(authors, "\\(", ""),
         # get first author information
         first_author = word(authors, 1),
         first_author = str_remove_all(first_author, ",")) %>% 
  # merge first author name with publication year
  unite(first_author, c("first_author", "year"), sep = " ", remove = F) -> reading
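
Here is a minimal sketch of the same logic on a single citation from the sample above. The last three lines are an assumption on my part rather than something the reading list needs: they use a made-up author string to show that taking the first word shortens a multi-word surname, and that keeping everything before the first comma is one possible alternative.

```r
library(stringr)

citation <- "DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118."

# drop everything from the first ")" onward, then drop the opening "("
authors <- str_remove(str_remove(citation, "\\).*$"), "\\(")
authors
# "DiMaggio, P., & Garip, F. 2012"

# first word of the author string, with the trailing comma removed
first_author <- str_remove_all(word(authors, 1), ",")
first_author
# "DiMaggio"

# made-up author string with a multi-word surname (not from the reading list)
multi_word <- "De la Cruz, A. 2020"
word(multi_word, 1)                # "De"
str_extract(multi_word, "^[^,]+")  # "De la Cruz"
```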

Once we do this, let’s see what the data looks like now.

set.seed(09011996)
reading %>% 
  sample_n(5) %>% 
  kbl_2()
citation abstract topic first_author year decade authors
Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617. In structural analyses of innovation, one substantive question looms large: What makes radical innovation possible if peripheral actors are more likely to originate radical ideas but are poorly positioned to promote them? An inductive study of the rise of Cubism, a revolutionary paradigm that overthrew classic principles of representation in art, results in a model where not only the periphery moves toward the core through collective action, as typically asserted, but the core also moves toward the periphery, becoming more receptive to radical ideas. The fragmentation of the art market in early 20th-century Paris served as the trigger. The proliferation of market niches and growing ambiguity over evaluation standards dramatically reduced the costs of experimentation in the periphery and the ability of the core to suppress radical ideas. A multilevel analysis linking individual creativity, peer networks, and the art field reveals how market developments fostered Spanish Cubist Pablo Picasso’s experiments and facilitated their diffusion in the absence of public support, a coherent movement, and even his active involvement. If past research attests to the importance of framing innovations and mobilizing resources in their support, this study brings attention to shifts in the structure of opportunities to do so. Novelty reception Sgourev 2013 2013 2010s Sgourev, S. V. 2013
Mannucci, P. V., & Perry-Smith, J. E. (2021). “Who are you going to call?” Network activation in creative idea generation and elaboration. Academy of Management Journal, (ja). Considering creativity as a journey beyond idea generation, scholars have theorized that different ties are beneficial in different phases. As individuals usually possess different types of ties, selecting the optimal ties in each phase and changing ties as needed are central activities for creative success. We identify the types of ties (weak or strong) that are helpful in idea generation and idea elaboration, and given this understanding, whether individuals activate ties in each phase accordingly. In an experimental study of individuals conversing with their ties, we provide evidence of the causal effects of weak and strong ties on idea generation and idea elaboration. We also find that individuals do not always activate ties optimally and identify network size and risk as barriers. Our results in a series of studies reveal that individuals with large networks, despite providing more opportunity to activate both strong and weak ties, activate fewer weak ties and are less likely to switch ties across phases than individuals with smaller networks, particularly when creativity is perceived as a high-risk endeavor. Finally, we find that activating the wrong ties leads to either dropping creative ideas or pursuing uncreative ones. Novelty reception Mannucci 2021 2021 2020s Mannucci, P. V., & Perry-Smith, J. E. 2021
DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118. Students of social inequality have noted the presence of mechanisms militating toward cumulative advantage and increasing inequality. Social scientists have established that individuals’ choices are influenced by those of their network peers in many social domains. We suggest that the ubiquity of network effects and tendencies toward cumulative advantage are related. Inequality is exacerbated when effects of individual differences are multiplied by social networks: when persons must decide whether to adopt beneficial practices; when network externalities, social learning, or normative pressures influence adoption decisions; and when networks are homophilous with respect to individual characteristics that predict such decisions. We review evidence from literatures on network effects on technology, labor markets, education, demography, and health; identify several mechanisms through which networks may generate higher levels of inequality than one would expect based on differences in initial endowments alone; consider cases in which network effects may ameliorate inequality; and describe research priorities. Network and gender DiMaggio 2012 2012 2010s DiMaggio, P., & Garip, F. 2012
McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549. Non-searchers - people who get their jobs without engaging in a job search - are often excluded from investigations of the role of personal relationships in job finding processes. This practice fails to capture the scope of informal job matching activity and underestimates the effectiveness of social capital. Moreover, studies typically obtain average estimates of social capital effectiveness across broad age ranges, obscuring variation across the life course. Analysis of early career and mid-career job matching shows that non-searching is associated with significant advantages over formal job searching. However, these benefits accrue only during mid-career and primarily among highly experienced male non-searchers. The results highlight the need to examine life course variations in social capital effectiveness and the role of non-searching as an important informal mechanism in the maintenance of gender inequality. Network and gender McDonald 2006 2006 2000s McDonald, S., & Elder Jr, G. H. 2006
Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468. Betting on the most promising new ideas is key to creativity and innovation in organizations, but predicting the success of novel ideas can be difficult. To select the best ideas, creators and managers must excel at creative forecasting, the skill of predicting the outcomes of new ideas. Using both a field study of 339 professionals in the circus arts industry and a lab experiment, I examine the conditions for accurate creative forecasting, focusing on the effect of creators’ and managers’ roles. In the field study, creators and managers forecasted the success of new circus acts with audiences, and the accuracy of these forecasts was assessed using data from 13,248 audience members. Results suggest that creators were more accurate than managers when forecasting about others’ novel ideas, but not their own. This advantage over managers was undermined when creators previously had poor ideas that were successful in the marketplace anyway. Results from the lab experiment show that creators’ advantage over managers in predicting success may be tied to the emphasis on both divergent thinking (idea generation) and convergent thinking (idea evaluation) in the creator role, while the manager role emphasizes only convergent thinking. These studies highlight that creative forecasting is a critical bridge linking creativity and innovation, shed light on the importance of roles in creative forecasting, and advance theory on why creative success is difficult to sustain over time. Novelty reception Berg 2016 2016 2010s Berg, J. M. 2016

Title

Next, we can extract the title of the paper from its citation. To this end, we keep the part of the citation that appears after the first closing parenthesis (which removes the authors and publication year) and before the first period that follows (which removes the journal and page numbers).

reading %>% 
  # extract string after the closing bracket
  mutate(title = str_extract(citation, "\\).*$") %>% 
           # remove everything from the beginning until the first white space
           str_replace("^\\S* ", "") %>%
           # remove everything starting from the first period
           str_replace("\\..*$", "")) -> reading
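
The three steps are easier to follow one at a time. The sketch below runs them on a single citation from the sample above; the intermediate variable names are only for illustration.

```r
library(stringr)

citation <- "Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617."

# step 1: keep everything from the first ")" to the end of the citation
after_year <- str_extract(citation, "\\).*$")

# step 2: drop the leftover ")." and the space that follows it
no_prefix <- str_replace(after_year, "^\\S* ", "")

# step 3: drop everything from the first period onward (journal, volume, pages)
title <- str_replace(no_prefix, "\\..*$", "")

title
# "How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation"
```

Parentheses inside the title are safe because step 1 starts at the first closing parenthesis, which belongs to the year. Because step 3 cuts at the first period, a title that itself contains a period (an abbreviation, for example) would be truncated at that point.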

Again, let’s see what the data looks like now.

set.seed(09011996)
reading %>% 
  sample_n(5) %>% 
  kbl_2()
citation abstract topic first_author year decade authors title
Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617. In structural analyses of innovation, one substantive question looms large: What makes radical innovation possible if peripheral actors are more likely to originate radical ideas but are poorly positioned to promote them? An inductive study of the rise of Cubism, a revolutionary paradigm that overthrew classic principles of representation in art, results in a model where not only the periphery moves toward the core through collective action, as typically asserted, but the core also moves toward the periphery, becoming more receptive to radical ideas. The fragmentation of the art market in early 20th-century Paris served as the trigger. The proliferation of market niches and growing ambiguity over evaluation standards dramatically reduced the costs of experimentation in the periphery and the ability of the core to suppress radical ideas. A multilevel analysis linking individual creativity, peer networks, and the art field reveals how market developments fostered Spanish Cubist Pablo Picasso’s experiments and facilitated their diffusion in the absence of public support, a coherent movement, and even his active involvement. If past research attests to the importance of framing innovations and mobilizing resources in their support, this study brings attention to shifts in the structure of opportunities to do so. Novelty reception Sgourev 2013 2013 2010s Sgourev, S. V. 2013 How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation
Mannucci, P. V., & Perry-Smith, J. E. (2021). “Who are you going to call?” Network activation in creative idea generation and elaboration. Academy of Management Journal, (ja). Considering creativity as a journey beyond idea generation, scholars have theorized that different ties are beneficial in different phases. As individuals usually possess different types of ties, selecting the optimal ties in each phase and changing ties as needed are central activities for creative success. We identify the types of ties (weak or strong) that are helpful in idea generation and idea elaboration, and given this understanding, whether individuals activate ties in each phase accordingly. In an experimental study of individuals conversing with their ties, we provide evidence of the causal effects of weak and strong ties on idea generation and idea elaboration. We also find that individuals do not always activate ties optimally and identify network size and risk as barriers. Our results in a series of studies reveal that individuals with large networks, despite providing more opportunity to activate both strong and weak ties, activate fewer weak ties and are less likely to switch ties across phases than individuals with smaller networks, particularly when creativity is perceived as a high-risk endeavor. Finally, we find that activating the wrong ties leads to either dropping creative ideas or pursuing uncreative ones. Novelty reception Mannucci 2021 2021 2020s Mannucci, P. V., & Perry-Smith, J. E. 2021 “Who are you going to call?” Network activation in creative idea generation and elaboration
DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118. Students of social inequality have noted the presence of mechanisms militating toward cumulative advantage and increasing inequality. Social scientists have established that individuals’ choices are influenced by those of their network peers in many social domains. We suggest that the ubiquity of network effects and tendencies toward cumulative advantage are related. Inequality is exacerbated when effects of individual differences are multiplied by social networks: when persons must decide whether to adopt beneficial practices; when network externalities, social learning, or normative pressures influence adoption decisions; and when networks are homophilous with respect to individual characteristics that predict such decisions. We review evidence from literatures on network effects on technology, labor markets, education, demography, and health; identify several mechanisms through which networks may generate higher levels of inequality than one would expect based on differences in initial endowments alone; consider cases in which network effects may ameliorate inequality; and describe research priorities. Network and gender DiMaggio 2012 2012 2010s DiMaggio, P., & Garip, F. 2012 Network effects and social inequality
McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549. Non-searchers - people who get their jobs without engaging in a job search - are often excluded from investigations of the role of personal relationships in job finding processes. This practice fails to capture the scope of informal job matching activity and underestimates the effectiveness of social capital. Moreover, studies typically obtain average estimates of social capital effectiveness across broad age ranges, obscuring variation across the life course. Analysis of early career and mid-career job matching shows that non-searching is associated with significant advantages over formal job searching. However, these benefits accrue only during mid-career and primarily among highly experienced male non-searchers. The results highlight the need to examine life course variations in social capital effectiveness and the role of non-searching as an important informal mechanism in the maintenance of gender inequality. Network and gender McDonald 2006 2006 2000s McDonald, S., & Elder Jr, G. H. 2006 When does social capital matter? Non-searching for jobs across the life course
Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468. Betting on the most promising new ideas is key to creativity and innovation in organizations, but predicting the success of novel ideas can be difficult. To select the best ideas, creators and managers must excel at creative forecasting, the skill of predicting the outcomes of new ideas. Using both a field study of 339 professionals in the circus arts industry and a lab experiment, I examine the conditions for accurate creative forecasting, focusing on the effect of creators’ and managers’ roles. In the field study, creators and managers forecasted the success of new circus acts with audiences, and the accuracy of these forecasts was assessed using data from 13,248 audience members. Results suggest that creators were more accurate than managers when forecasting about others’ novel ideas, but not their own. This advantage over managers was undermined when creators previously had poor ideas that were successful in the marketplace anyway. Results from the lab experiment show that creators’ advantage over managers in predicting success may be tied to the emphasis on both divergent thinking (idea generation) and convergent thinking (idea evaluation) in the creator role, while the manager role emphasizes only convergent thinking. These studies highlight that creative forecasting is a critical bridge linking creativity and innovation, shed light on the importance of roles in creative forecasting, and advance theory on why creative success is difficult to sustain over time. Novelty reception Berg 2016 2016 2010s Berg, J. M. 2016 Balancing on the creative highwire: Forecasting the success of novel ideas in organizations

Journal and field

Next, we will create a variable holding a shorthand version of each journal name (e.g., AMJ for the Academy of Management Journal). To this end, we will use regex to detect strings in the citation that reflect journal names (e.g., "Academy of Management journal") and assign the matching shorthand (e.g., AMJ) to the journal variable.

reading %>% 
  mutate(journal = case_when(  
      str_detect(citation, regex("American Sociological Review", ignore_case = T)) ~ "ASR",
      str_detect(citation, regex("Academy of Management journal", ignore_case = T)) ~ "AMJ",
      str_detect(citation, regex("Academy of Management review", ignore_case = T)) ~ "AMR",
      str_detect(citation, regex("Academy of Management discoveries", ignore_case = T)) ~ "Discoveries",
      str_detect(citation, regex("Academy of Management Annals", ignore_case = T)) ~ "Annals",
      str_detect(citation, regex("Academy of Management learning", ignore_case = T)) ~ "Learning",
      str_detect(citation, regex("Administrative Science Quarterly", ignore_case = T)) ~ "ASQ",
      str_detect(citation, "Management science") ~ "Management sci",
      str_detect(citation, regex("American journal of sociology", ignore_case = T)) ~ "AJS",
      str_detect(citation, "Scientific reports") ~ "Nature",
      str_detect(citation, regex("nature", ignore_case = T)) ~ "Nature",      
      str_detect(citation, regex("Social forces", ignore_case = T)) ~ "Soc forces",
      str_detect(citation, "Entrepreneurship Theory and Practice") ~ "ETP",
      
      # Soc Net has to come after PSPB, Annual review, JoM, and Org Sci because a few papers in these journals have the phrase "social networks" in the title. Therefore, we categorize papers in PSPB, Annual review, JoM, and Org Sci first, then assign the rest of the papers with the phrase "social networks" in the citation to the journal Social Networks
      str_detect(citation, regex("Personality and Social Psychology Bulletin", ignore_case = T)) ~ "PSPB",
      str_detect(citation, "Journal of Management") ~ "JoM",
      str_detect(citation, regex("Organization Science", ignore_case = T)) ~ "Org Sci",
      
      str_detect(citation, regex("annual", ignore_case = T)) ~ "Annual review",
      str_detect(citation, regex("Social Networks", ignore_case = T)) ~ "Soc Net", 
      str_detect(citation, regex("Social Science Research", ignore_case = T)) ~ "Social science research",
      
      str_detect(citation, regex("Journal of experimental social psychology", ignore_case = T)) ~ "JESP",
      str_detect(citation, "Proceedings of the National Academy of Sciences") ~ "PNAS",
      str_detect(citation, "Psychological review") ~ "Psyc review",
      str_detect(citation, "Psychological science") ~ "Psyc science",
      str_detect(citation, "Strategic Management Journal") ~ "SMJ",
      str_detect(citation, "Journal of personality and social psychology") ~ "JPSP",
      str_detect(citation, "Journal of Applied Psychology") ~ "JAP",

      str_detect(citation, regex("Research Policy", ignore_case = T)) ~ "Research policy",
      str_detect(citation, regex("\\bScience\\b", ignore_case = T)) ~ "Science",
      str_detect(citation, regex("venturing", ignore_case = T)) ~ "JBV",
      str_detect(citation, regex("consumer", ignore_case = T)) ~ "Consumer research",
      str_detect(citation, regex("rationality", ignore_case = T)) ~ "Rationality",
      str_detect(citation, regex("Social psychology quarterly", ignore_case = T)) ~ "Social psyc quarterly",
      str_detect(citation, regex("Journal of Political Economy", ignore_case = T)) ~ "Journal of political economy",
      str_detect(citation, regex("Journal of Small Business Management", ignore_case = T)) ~ "Journal of small business management",
      TRUE ~ "others"),
      
      # manually recode book chapters: any paper with the words "ugly", "spreading", or "Routledge" in its citation is labelled as a book chapter
  journal = case_when( 
    str_detect(citation, paste(c("ugly", "spreading", "Routledge" ), collapse = '|')) ~ "book chapter",
    
    TRUE ~ as.character(journal))) -> reading
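
The order of the branches matters because case_when() assigns the label of the first condition that matches. Here is a minimal sketch of that behaviour on two made-up citations (the toy tibble and its authors are hypothetical and not part of my reading list):

```r
library(dplyr)
library(stringr)

# Hypothetical citations, for illustration only
toy <- tibble(citation = c(
  "Author, A. (2020). Gender and social networks at work. Organization Science, 31(1), 1-20.",
  "Author, B. (2019). Brokerage in teams. Social Networks, 58, 10-25."
))

toy %>%
  mutate(
    # specific journal first, generic phrase second
    right_order = case_when(
      str_detect(citation, regex("Organization Science", ignore_case = TRUE)) ~ "Org Sci",
      str_detect(citation, regex("Social Networks", ignore_case = TRUE)) ~ "Soc Net",
      TRUE ~ "others"),
    # generic phrase first: the title of the first paper triggers "Soc Net"
    wrong_order = case_when(
      str_detect(citation, regex("Social Networks", ignore_case = TRUE)) ~ "Soc Net",
      str_detect(citation, regex("Organization Science", ignore_case = TRUE)) ~ "Org Sci",
      TRUE ~ "others")
  ) -> toy_labelled

toy_labelled %>% select(right_order, wrong_order)
```

In the wrong_order column, the Organization Science paper is mislabelled as Soc Net because its title contains the phrase "social networks".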

Next, we can use the journal a paper is published in to create a variable for its field, based on our knowledge of whether a journal belongs to a specific field, such as psychology, or publishes research across different fields.

reading %>% 
  mutate(field = case_when(
      # management journals
      journal == "AMJ" | journal == "Discoveries" | journal == "Annals" | 
      journal == "AMR" | journal == "Learning" | journal == "ASQ" | journal == "Management sci" | 
      journal == "JoM" | journal == "Org Sci" | journal == "SMJ" | journal == "Research policy"  ~ "management",
      
      # sociology journals
      journal == "ASR" | journal == "AJS" | journal == "Soc forces" | journal == "Soc Net" | journal == "Rationality" ~ "sociology",
      
      # multidisciplinary science journals
      journal == "Nature" | journal == "PNAS" | journal == "Science" | journal == "Social science research" ~ "multidisciplinary science",
      
      # entrepreneurship journals
      journal == "ETP" | journal == "JBV" | journal == "Journal of small business management" ~ "entrepreneurship",
      
      # psychology journals
      journal == "PSPB" | journal == "JESP" | journal == "Psyc review" | journal == "Psyc science" |
      journal == "JPSP" | journal == "JAP" | journal == "Social psyc quarterly" | journal == "Consumer research" ~ "psychology",
      
      # reviews
      journal == "Annual review" | journal == "book chapter" ~ "reviews",
      
      # economics
      journal == "Journal of political economy" ~ "economics"
  )) -> reading
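
One thing worth checking after this step is whether any journal label fell through all the branches. Because this case_when() has no catch-all branch, a paper whose journal is "others" gets NA as its field. Below is a minimal sketch of such a check on toy data (the toy tibble is hypothetical and covers only two of the branches):

```r
library(dplyr)

# Hypothetical journal labels, for illustration only
toy <- tibble(journal = c("AMJ", "ASR", "others", "AMJ"))

toy %>%
  mutate(field = case_when(
    journal %in% c("AMJ", "ASQ") ~ "management",
    journal %in% c("ASR", "AJS") ~ "sociology"
    # no TRUE ~ ... branch, so unmatched labels become NA
  )) -> toy_field

# list the journal labels that did not receive a field
toy_field %>%
  filter(is.na(field)) %>%
  count(journal) -> unmapped

unmapped
```

Running the same filter on the reading data would show which journals, if any, still need a field.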

Let’s see what the data looks like now.

set.seed(09011996)
reading %>% 
  sample_n(5) %>% 
  kbl_2()
citation abstract topic first_author year decade authors title journal field
Sgourev, S. V. (2013). How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation. Organization Science, 24(6), 1601-1617. In structural analyses of innovation, one substantive question looms large: What makes radical innovation possible if peripheral actors are more likely to originate radical ideas but are poorly positioned to promote them? An inductive study of the rise of Cubism, a revolutionary paradigm that overthrew classic principles of representation in art, results in a model where not only the periphery moves toward the core through collective action, as typically asserted, but the core also moves toward the periphery, becoming more receptive to radical ideas. The fragmentation of the art market in early 20th-century Paris served as the trigger. The proliferation of market niches and growing ambiguity over evaluation standards dramatically reduced the costs of experimentation in the periphery and the ability of the core to suppress radical ideas. A multilevel analysis linking individual creativity, peer networks, and the art field reveals how market developments fostered Spanish Cubist Pablo Picasso’s experiments and facilitated their diffusion in the absence of public support, a coherent movement, and even his active involvement. If past research attests to the importance of framing innovations and mobilizing resources in their support, this study brings attention to shifts in the structure of opportunities to do so. Novelty reception Sgourev 2013 2013 2010s Sgourev, S. V. 2013 How Paris gave rise to Cubism (and Picasso): Ambiguity and fragmentation in radical innovation Org Sci management
Mannucci, P. V., & Perry-Smith, J. E. (2021). “Who are you going to call?” Network activation in creative idea generation and elaboration. Academy of Management Journal, (ja). Considering creativity as a journey beyond idea generation, scholars have theorized that different ties are beneficial in different phases. As individuals usually possess different types of ties, selecting the optimal ties in each phase and changing ties as needed are central activities for creative success. We identify the types of ties (weak or strong) that are helpful in idea generation and idea elaboration, and given this understanding, whether individuals activate ties in each phase accordingly. In an experimental study of individuals conversing with their ties, we provide evidence of the causal effects of weak and strong ties on idea generation and idea elaboration. We also find that individuals do not always activate ties optimally and identify network size and risk as barriers. Our results in a series of studies reveal that individuals with large networks, despite providing more opportunity to activate both strong and weak ties, activate fewer weak ties and are less likely to switch ties across phases than individuals with smaller networks, particularly when creativity is perceived as a high-risk endeavor. Finally, we find that activating the wrong ties leads to either dropping creative ideas or pursuing uncreative ones. Novelty reception Mannucci 2021 2021 2020s Mannucci, P. V., & Perry-Smith, J. E. 2021 “Who are you going to call?” Network activation in creative idea generation and elaboration AMJ management
DiMaggio, P., & Garip, F. (2012). Network effects and social inequality. Annual review of sociology, 38, 93-118. Students of social inequality have noted the presence of mechanisms militating toward cumulative advantage and increasing inequality. Social scientists have established that individuals’ choices are influenced by those of their network peers in many social domains. We suggest that the ubiquity of network effects and tendencies toward cumulative advantage are related. Inequality is exacerbated when effects of individual differences are multiplied by social networks: when persons must decide whether to adopt beneficial practices; when network externalities, social learning, or normative pressures influence adoption decisions; and when networks are homophilous with respect to individual characteristics that predict such decisions. We review evidence from literatures on network effects on technology, labor markets, education, demography, and health; identify several mechanisms through which networks may generate higher levels of inequality than one would expect based on differences in initial endowments alone; consider cases in which network effects may ameliorate inequality; and describe research priorities. Network and gender DiMaggio 2012 2012 2010s DiMaggio, P., & Garip, F. 2012 Network effects and social inequality Annual review reviews
McDonald, S., & Elder Jr, G. H. (2006). When does social capital matter? Non-searching for jobs across the life course. Social Forces, 85(1), 521-549. Non-searchers - people who get their jobs without engaging in a job search - are often excluded from investigations of the role of personal relationships in job finding processes. This practice fails to capture the scope of informal job matching activity and underestimates the effectiveness of social capital. Moreover, studies typically obtain average estimates of social capital effectiveness across broad age ranges, obscuring variation across the life course. Analysis of early career and mid-career job matching shows that non-searching is associated with significant advantages over formal job searching. However, these benefits accrue only during mid-career and primarily among highly experienced male non-searchers. The results highlight the need to examine life course variations in social capital effectiveness and the role of non-searching as an important informal mechanism in the maintenance of gender inequality. Network and gender McDonald 2006 2006 2000s McDonald, S., & Elder Jr, G. H. 2006 When does social capital matter? Non-searching for jobs across the life course Soc forces sociology
Berg, J. M. (2016). Balancing on the creative highwire: Forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433-468. Betting on the most promising new ideas is key to creativity and innovation in organizations, but predicting the success of novel ideas can be difficult. To select the best ideas, creators and managers must excel at creative forecasting, the skill of predicting the outcomes of new ideas. Using both a field study of 339 professionals in the circus arts industry and a lab experiment, I examine the conditions for accurate creative forecasting, focusing on the effect of creators’ and managers’ roles. In the field study, creators and managers forecasted the success of new circus acts with audiences, and the accuracy of these forecasts was assessed using data from 13,248 audience members. Results suggest that creators were more accurate than managers when forecasting about others’ novel ideas, but not their own. This advantage over managers was undermined when creators previously had poor ideas that were successful in the marketplace anyway. Results from the lab experiment show that creators’ advantage over managers in predicting success may be tied to the emphasis on both divergent thinking (idea generation) and convergent thinking (idea evaluation) in the creator role, while the manager role emphasizes only convergent thinking. These studies highlight that creative forecasting is a critical bridge linking creativity and innovation, shed light on the importance of roles in creative forecasting, and advance theory on why creative success is difficult to sustain over time. Novelty reception Berg 2016 2016 2010s Berg, J. M. 2016 Balancing on the creative highwire: Forecasting the success of novel ideas in organizations ASQ management

We can graph the number of papers in my list that are published in journals across different fields.

reading %>% 
  count(field, sort = T) %>% 
  ggplot(aes(forcats::fct_reorder(field, n), n, fill = field)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Number of papers",
       x = NULL,
       title = "How many papers were published in field-specific journals?") +
  theme(plot.title = element_text(size = 13))

Let’s see if these numbers differ across my two comps themes: novelty reception and network and gender.

reading %>% 
  group_by(topic) %>% 
  count(field, sort = T) %>% 
  ungroup()  %>% 
  mutate(field_1 = field,
         field = reorder_within(field, n, topic)
         ) %>% 
  ggplot(aes(field, n, fill = field_1)) +
  geom_col(show.legend = FALSE) +
  scale_x_reordered() +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  scale_y_continuous(expand = c(0,0)) +
  labs(y = "Number of papers",
       x = NULL,
       title = "How many papers were published in field-specific journals?") +
  theme(plot.title = element_text(size = 13))

It seems that my reading list contains a lot of papers published in management journals, which makes sense since I study how people think and behave at work. The two other types of journals that I read quite a lot are sociology and psychology journals, although the sociology papers here are mostly about network and gender, whereas the psychology papers are mostly about novelty reception.

Methodology

Finally, we can categorize a paper by its methodology, specifically whether it is a theoretical, empirical, or review paper, based on the journal it is published in. For example, I know that the papers in my reading list that are published in Academy of Management Annals are review papers, papers published in Academy of Management Review are theory papers, etc. Based on our own knowledge of the journals in the list, we can create a variable for the methodology of each paper.

reading %>% 
  mutate(method = case_when(
    # review papers
    journal == "Annals" | journal == "book chapter" | journal == "Annual review" | journal == "JoM" |str_detect(title, 'distinctiveness') ~ "review",
    
    # theory papers             
    journal == "AMR" ~ "theory",
    
    # empirical paper            
    TRUE ~ "empirical")) -> reading

Once this is done, let’s count how many papers in my reading list are review, theory, and empirical papers.

reading %>% 
  count(method) %>% 
  ggplot(aes(forcats::fct_reorder(method, -n), n, fill = method)) +
  geom_col(show.legend = FALSE) +
  labs(y = "Number of papers",
       x = NULL,
       title = "How many papers are empirical, theory, and review papers?") +
  scale_fill_manual(values = c("#F9A12EFF", "#FC766AFF", "#9B4A97FF")) +
  theme(plot.title = element_text(size = 13))

Let’s see if this differs across my two comps themes: novelty reception and network and gender.

reading %>% 
  group_by(topic) %>% 
  count(method, sort = T) %>% 
  ungroup() %>% 
  mutate(method = reorder(method, -n)) %>%
  ggplot(aes(method, n, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic) +
  scale_fill_manual(values = c("#F9A12EFF", "#FC766AFF", "#9B4A97FF")) +
  labs(x = NULL, 
       y = "Number of papers",
       title = "How many papers are theoretical, empirical, and reviews?") 

The graphs show that most of my readings are empirical papers, while theoretical papers are quite rare. This makes sense since I am more interested in data and empirics than in theory, and this has apparently influenced which papers I chose to read for my research and comprehensive exam.

Now that we have created additional variables for each paper, we can move on to more interesting text analysis techniques. Before that, let’s create a unique ID for each paper and save the data for future analyses.

reading %>% mutate(
  # create id for each paper based on the row number
  id = row_number()) -> reading

saveRDS(reading, "reading.rds")
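
Before relying on the saved file, it may be worth confirming that the IDs are unique and that the data reads back unchanged. Here is a minimal sketch on a hypothetical stand-in for the reading data, written to a temporary file rather than to reading.rds:

```r
library(dplyr)

# Hypothetical stand-in for the reading data
toy <- tibble(citation = c("Paper A", "Paper B", "Paper C")) %>%
  mutate(id = row_number())

# each paper should have exactly one id
stopifnot(n_distinct(toy$id) == nrow(toy))

# write to a temporary file and read it back
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
restored <- readRDS(path)

# compare the original and the restored data
all.equal(toy, restored)
```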

In future posts, I will demonstrate how we can find the most frequent words or phrases in the paper abstracts, as well as the words that are distinctive of some groups of papers compared with others. In addition, I will show how we can use topic modeling, an unsupervised machine learning technique, to find the common themes across the papers based on the words the authors use in the abstracts.