Querying the New York Times Article Search API in R

Heather Geiger

Introduction - general idea of project

Our goal here is to query the New York Times article search API, and get a data frame with the information from the result.

First, we sign up for an API key here (http://developer.nytimes.com/signup), and save our key in a file.

Then, we can use this key to run our API requests right from R.

Finally, we use the jsonlite library plus the tidyverse to convert from JSON to data frame.
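To see where the long column names handled later in this post (such as `response.docs.web_url`) come from, here is a minimal sketch on a tiny JSON string. The string is made up and only mimics the nesting of an Article Search response: a `status` field, then a `response` object holding `docs` and `meta`. It is not real API output.

```r
library(jsonlite)

# Made-up JSON that only mimics the nesting of an Article Search response
sample_json <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"web_url": "https://example.com/a", "headline": {"main": "First"}},
      {"web_url": "https://example.com/b", "headline": {"main": "Second"}}
    ],
    "meta": {"hits": 2, "offset": 0}
  }
}'

# flatten = TRUE turns the nested "headline" object into a headline.main column
parsed <- fromJSON(sample_json, flatten = TRUE)

# data.frame() on the nested list prefixes each column with its path and
# repeats the one-off values (status, meta) on every row
sample_df <- data.frame(parsed)
colnames(sample_df)
```

The one-off `meta` values repeated on every row are why `meta.hits` and similar columns appear in the real data frames further down.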

Much of the code below is a modified version of the code in this tutorial:

http://www.storybench.org/working-with-the-new-york-times-api-in-r/

Our specific question

In this analysis, we are going to look for New York Times articles about Facebook from around the time the Cambridge Analytica scandal broke.

We’ll also look for news about Facebook from around the time of another scandal of theirs, when news broke in 2014 that they had performed psychological experiments on users (https://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html).

Load libraries.

library(jsonlite)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(stringr)

Querying the API and making a data frame for each time range

In both cases, we’ll keep it simple with a basic search for “facebook”.

This article (http://www.straitstimes.com/world/united-states/timeline-of-facebook-cambridge-analytica-scandal) puts the start of the Cambridge Analytica scandal at March 17, 2018, so we’ll go from there to the date of this report (March 26, 2018).

The actual PNAS article about the Facebook experiment is dated June 17, 2014 (http://www.pnas.org/content/111/24/8788.full).

However, most articles I found when I Googled it were from later in June.

Let’s look from June 18 to July 6, 2014 and see what we get.
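The code below passes these dates to the API as `YYYYMMDD` strings (`begin_date` and `end_date`). If you would rather keep the two windows as real `Date` objects, a minimal sketch of deriving those strings from them is shown here. It also computes each window's length, which matters when comparing article counts later.

```r
# The two date windows discussed above, kept as Date objects
windows <- data.frame(
  topic = c("2014 psychological experiment", "2018 Cambridge Analytica"),
  begin = as.Date(c("2014-06-18", "2018-03-17")),
  end   = as.Date(c("2014-07-06", "2018-03-26")),
  stringsAsFactors = FALSE
)

# The begin_date/end_date strings used in the queries (YYYYMMDD)
windows$begin_date <- format(windows$begin, "%Y%m%d")
windows$end_date   <- format(windows$end, "%Y%m%d")

# Length of each window in days, counting both endpoints
windows$days <- as.integer(windows$end - windows$begin) + 1L
windows
```

The 2014 window covers 19 days and the 2018 window only 10.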

We run through all of the search results page by page (the API returns 10 results per page) and save each page’s results in a list object.

When the loop finishes, we save the combined data frame to disk. That way, if we run this code again, we can load the saved data instead of querying the API again.
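Results come back 10 per page, and pages are numbered from 0. A query with `hits` matches therefore needs `ceiling(hits / 10)` pages. Here is a small sketch of that arithmetic. The helper name `page_numbers` is hypothetical; it is not part of jsonlite or the API.

```r
# Hypothetical helper: page numbers to request for a given number of hits,
# assuming 10 results per page and pages numbered from 0
page_numbers <- function(hits, per_page = 10) {
  n_pages <- ceiling(hits / per_page)
  if (n_pages == 0) return(integer(0))
  seq_len(n_pages) - 1L
}

page_numbers(290)  # pages 0 to 28
page_numbers(295)  # pages 0 to 29; the last page holds the final 5 results
page_numbers(0)    # nothing to request
```

Rounding `hits / 10 - 1` instead of taking the ceiling would give 28 for 295 hits, because R rounds 28.5 to the even number. That would drop the last page.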

NYTIMES_KEY <- readLines("nytimes_api_key.txt")

term <- "facebook"
begin_date <- "20140618"
end_date <- "20140706"

baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
                  "&begin_date=",begin_date,"&end_date=",end_date,
                  "&facet_filter=true&api-key=",NYTIMES_KEY)

initialQuery <- fromJSON(baseurl)
# 10 results per page, pages numbered from 0, so the last page is ceiling(hits/10) - 1
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1

pages_2014 <- vector("list",length=maxPages+1)

for(i in 0:maxPages){
    nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame() 
    pages_2014[[i+1]] <- nytSearch 
    Sys.sleep(6) # I got frequent errors with 1 second between calls; the API's documented limit is now 10 requests per minute, i.e. one every 6 seconds.
}
facebook_2014_articles <- rbind_pages(pages_2014)

save(facebook_2014_articles,file="facebook_2014_articles.Rdata")
term <- "facebook"
begin_date <- "20180317"
end_date <- "20180326"

baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
                  "&begin_date=",begin_date,"&end_date=",end_date,
                  "&facet_filter=true&api-key=",NYTIMES_KEY)

initialQuery <- fromJSON(baseurl)
# 10 results per page, pages numbered from 0, so the last page is ceiling(hits/10) - 1
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1

pages_2018 <- vector("list",length=maxPages+1)

for(i in 0:maxPages){
    nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()
    pages_2018[[i+1]] <- nytSearch
    Sys.sleep(6) # stay under the documented limit of 10 requests per minute
}
facebook_2018_articles <- rbind_pages(pages_2018)

save(facebook_2018_articles,file="facebook_2018_articles.Rdata")
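The two query blocks above differ only in their dates. One way to avoid repeating the URL construction is a small function. The sketch below is hypothetical and does not come from the Storybench tutorial. It also URL-encodes the search term, so multi-word queries are safe to send.

```r
# Hypothetical helper that builds one Article Search request URL
build_nyt_url <- function(term, begin_date, end_date, api_key, page = 0) {
  paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json",
         "?q=", utils::URLencode(term, reserved = TRUE),  # e.g. spaces become %20
         "&begin_date=", begin_date,
         "&end_date=", end_date,
         "&facet_filter=true",
         "&page=", page,
         "&api-key=", api_key)
}

# "MY_KEY" is a placeholder; in practice pass the key read from nytimes_api_key.txt
url <- build_nyt_url("cambridge analytica", "20180317", "20180326", "MY_KEY", page = 2)
url
```

Inside the paging loop, this would be called as `build_nyt_url(term, begin_date, end_date, NYTIMES_KEY, page = i)`.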

Exploring and cleaning data frames

First, let’s replace the column names of both data frames with something a little shorter and more readable.

colnames(facebook_2014_articles) <- str_replace(colnames(facebook_2014_articles),
                pattern='response\\.',replacement='')
colnames(facebook_2014_articles) <- str_replace(colnames(facebook_2014_articles),
                pattern='docs\\.',replacement='')

colnames(facebook_2018_articles) <- str_replace(colnames(facebook_2018_articles),
                pattern='response\\.',replacement='')
colnames(facebook_2018_articles) <- str_replace(colnames(facebook_2018_articles),
                pattern='docs\\.',replacement='')

colnames(facebook_2014_articles)
##  [1] "status"                  "copyright"              
##  [3] "web_url"                 "snippet"                
##  [5] "print_page"              "source"                 
##  [7] "multimedia"              "keywords"               
##  [9] "pub_date"                "document_type"          
## [11] "new_desk"                "type_of_material"       
## [13] "_id"                     "word_count"             
## [15] "score"                   "uri"                    
## [17] "section_name"            "abstract"               
## [19] "headline.main"           "headline.kicker"        
## [21] "headline.content_kicker" "headline.print_headline"
## [23] "headline.name"           "headline.seo"           
## [25] "headline.sub"            "byline.original"        
## [27] "byline.person"           "byline.organization"    
## [29] "meta.hits"               "meta.offset"            
## [31] "meta.time"
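The two `str_replace()` calls per data frame can also be collapsed into one anchored regular expression. Here is a minimal sketch on a few of the column names listed above, using `stringr::str_remove()`.

```r
library(stringr)

raw_names <- c("status", "response.docs.web_url", "response.docs.headline.main",
               "response.meta.hits")

# Drop a leading "response." and, if present, the "docs." that follows it
clean_names <- str_remove(raw_names, "^response\\.(docs\\.)?")
clean_names
```

Anchoring with `^` means the pattern can only match at the start of a name, so a "docs." appearing elsewhere would be left alone.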

Remove some columns that always (or almost always) contain the same value, or that don’t provide much useful information.

Some of these may be specific to this data set. For example, “print_page” may be useful for a legacy data set, but for a recent data set I don’t think it will be very useful. Similar idea for “headline.print_headline”.

The “multimedia” column just provides information like the name of files for any associated pictures, plus size in pixels, etc.

While images or videos can be an important part of an article, we are not given enough information here to be able to get anything useful out of these.

Finally, in 2014 there was an “abstract” column, which is not there in 2018. This is probably mostly similar to the “snippet” column present in both data sets. Let’s set this aside just in case we want it later, but remove it from the main data frame for now.

columns_to_remove <- c("status","copyright","print_page","source","_id","uri",
                       "headline.print_headline","headline.name","headline.seo","headline.sub",
                       grep('meta',colnames(facebook_2014_articles),value=TRUE),
                       "abstract","multimedia")
columns_to_keep <- setdiff(colnames(facebook_2014_articles),columns_to_remove)

facebook_2014_abstracts <- as.vector(facebook_2014_articles$abstract)

# all_of() marks columns_to_keep as an external character vector of column names
facebook_2014_articles <- facebook_2014_articles %>% select(all_of(columns_to_keep))
facebook_2018_articles <- facebook_2018_articles %>% select(all_of(columns_to_keep))
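Here `columns_to_keep` is derived from the 2014 column names only. The 2018 `select()` therefore works only because every kept column also exists in 2018. A variant that tolerates columns missing from one year can use tidyselect's `any_of()`, which dplyr 1.0.0 and later re-export. It is sketched below on made-up toy data frames.

```r
library(dplyr)

# Toy stand-ins: like the real 2018 results, toy_2018 has no "abstract" column
toy_2014 <- data.frame(web_url = "u1", abstract = "a", status = "OK", stringsAsFactors = FALSE)
toy_2018 <- data.frame(web_url = "u2", status = "OK", stringsAsFactors = FALSE)
to_remove <- c("status", "abstract")

# any_of() silently skips names that are absent, so one call works for both years
kept_2014 <- toy_2014 %>% select(-any_of(to_remove))
kept_2018 <- toy_2018 %>% select(-any_of(to_remove))
colnames(kept_2014)
colnames(kept_2018)
```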

Let’s look at the class of each variable. If any of the remaining columns are lists, we’ll see if we can convert to something simpler.

for(i in seq_len(ncol(facebook_2014_articles))){
    # Print the name of each list column; these need extra processing
    if(is.list(facebook_2014_articles[[i]])){
        print(colnames(facebook_2014_articles)[i])
    }
}
## [1] "keywords"
## [1] "byline.person"
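A loop-free way to find list columns, and to check what their elements actually are, is sketched below on a made-up toy data frame. Checking element classes matters here because in the 2018 results some `keywords` entries are not data frames. That is why the 2018 keyword code below checks each entry's class.

```r
# Toy data frame with a list column mimicking "keywords": the first element
# is a data frame, the second an empty list
toy <- data.frame(web_url = c("u1", "u2"), stringsAsFactors = FALSE)
toy$keywords <- list(data.frame(value = "Facebook Inc", stringsAsFactors = FALSE), list())

# Names of all list columns
list_cols <- names(toy)[vapply(toy, is.list, logical(1))]
list_cols

# The class of each element of the list column
element_classes <- vapply(toy$keywords, function(x) class(x)[1], character(1))
element_classes
```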

The “keywords” and “byline.person” columns each hold one data frame per article.

Both of these provide information that could be useful depending on circumstances.

However, I don’t think we’ll be looking in detail at specific authors for this analysis, so let’s just remove byline.person.

Then, let’s work on collapsing keywords into one data frame for all articles.

facebook_2014_articles <- facebook_2014_articles %>% select(-byline.person)
facebook_2018_articles <- facebook_2018_articles %>% select(-byline.person)
# Number of keyword rows per article (0 if an entry is not a data frame)
num_rows_keywords <- vapply(facebook_2014_articles$keywords,
                            function(x) if(is.data.frame(x)) nrow(x) else 0L, integer(1))

facebook_2014_keywords_collapsed <- NULL  # start empty so re-running the loop does not append duplicate rows
for(article_num in which(num_rows_keywords > 0)){
    this_article_keywords <- data.frame(Article.index = article_num,
            facebook_2014_articles$keywords[[article_num]],
            stringsAsFactors=FALSE)
    facebook_2014_keywords_collapsed <- rbind(facebook_2014_keywords_collapsed,this_article_keywords)
}

head(facebook_2014_keywords_collapsed);tail(facebook_2014_keywords_collapsed)
##   Article.index          name                                   value rank
## 1             1       subject                            Social Media    1
## 2             1       subject                                Research    2
## 3             1 organizations                            Facebook Inc    3
## 4             1       subject                                Emotions    4
## 5             1 organizations University of California, San Francisco    5
## 6             1 organizations                      Cornell University    6
##   major
## 1     N
## 2     N
## 3     N
## 4     N
## 5     N
## 6     N
##      Article.index    name                    value rank major
## 1368           290 subject Recession and Depression    2     N
## 1369           290 subject               Child Care    3     N
## 1370           290 persons         Taylor, Shanesha    4     N
## 1371           290 subject  Child Abuse and Neglect    5     N
## 1372           290 subject             Unemployment    6     N
## 1373           290 subject    United States Economy    7     N
# Some 2018 keyword entries are not data frames; count those as 0 rows
num_rows_keywords <- vapply(facebook_2018_articles$keywords,
                            function(x) if(is.data.frame(x)) nrow(x) else 0L, integer(1))

facebook_2018_keywords_collapsed <- NULL  # start empty so re-running the loop does not append duplicate rows
for(article_num in which(num_rows_keywords > 0)){
    this_article_keywords <- data.frame(Article.index = article_num,
            facebook_2018_articles$keywords[[article_num]],
            stringsAsFactors=FALSE)
    facebook_2018_keywords_collapsed <- rbind(facebook_2018_keywords_collapsed,this_article_keywords)
}

head(facebook_2018_keywords_collapsed);tail(facebook_2018_keywords_collapsed)
##   Article.index          name                              value rank
## 1             1       subject         Computers and the Internet    1
## 2             1       subject                       Social Media    2
## 3             1       subject Data-Mining and Database Marketing    3
## 4             1 organizations                Cambridge Analytica    4
## 5             1 organizations                       Facebook Inc    5
## 6             1       persons                   Kogan, Aleksandr    6
##   major
## 1     N
## 2     N
## 3     N
## 4     N
## 5     N
## 6     N
##     Article.index          name                     value rank major
## 938           476       subject           Women and Girls    2     N
## 939           476       subject         Sexual Harassment    3     N
## 940           476       persons    MacKinnon, Catharine A    4     N
## 941           476       persons         Carlson, Gretchen    5     N
## 942           476 organizations Miss America Organization    6     N
## 943           476 organizations          Fox News Channel    7     N
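The same collapsing step can also be written with dplyr’s `bind_rows()` and its `.id` argument. When given a named list, `.id` records which element each row came from. Here is a minimal sketch on a made-up `keywords` list (not the real data).

```r
library(dplyr)

# Made-up stand-in for a keywords list column: article 2 has no keywords
keywords <- list(
  data.frame(name = "subject", value = "Social Media", rank = 1L,
             stringsAsFactors = FALSE),
  list(),
  data.frame(name = c("organizations", "persons"),
             value = c("Facebook Inc", "Kogan, Aleksandr"),
             rank = 1:2, stringsAsFactors = FALSE)
)

# Name each element by its article index, keep only non-empty data frames,
# then stack them with the index stored in an Article.index column
names(keywords) <- seq_along(keywords)
has_rows <- vapply(keywords, function(x) is.data.frame(x) && nrow(x) > 0, logical(1))
keywords_collapsed <- bind_rows(keywords[has_rows], .id = "Article.index")
keywords_collapsed$Article.index <- as.integer(keywords_collapsed$Article.index)
keywords_collapsed
```

Building the table in one call also avoids growing it with `rbind()` on every iteration.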

Now, let’s remove the keywords column from both data frames before combining the two data frames.

We’ll also combine the two “keywords_collapsed” data frames.

facebook_2014_articles <- facebook_2014_articles %>% select(setdiff(colnames(facebook_2014_articles),"keywords"))
facebook_2018_articles <- facebook_2018_articles %>% select(setdiff(colnames(facebook_2018_articles),"keywords"))

head(facebook_2014_articles);tail(facebook_2014_articles)
##                                                                                                             web_url
## 1            https://www.nytimes.com/2014/07/01/opinion/jaron-lanier-on-lack-of-transparency-in-facebook-study.html
## 2 https://www.nytimes.com/video/technology/personaltech/100000002975706/edit-your-facebook-and-twitter-history.html
## 3        https://www.nytimes.com/2014/06/26/technology/personaltech/finding-old-posts-on-the-facebook-timeline.html
## 4                         https://bits.blogs.nytimes.com/2014/06/30/facebook-says-its-sorry-weve-heard-that-before/
## 5                       https://bits.blogs.nytimes.com/2014/06/19/facebook-service-restored-after-worldwide-outage/
## 6                   https://boss.blogs.nytimes.com/2014/06/23/when-advertising-on-facebook-can-be-a-waste-of-money/
##                                                                                                                                                                       snippet
## 1                                                                                                              As guinea pigs, we deserve to know what researchers are doing.
## 2                                                                                            Molly Wood explains how to download and delete activity on Facebook and Twitter.
## 3                                                                                    Plus, how to paste text from a web page without also getting all the strange formatting.
## 4                            Facebook apologized for its study of how people’s emotions are affected by social media posts. And it’s not the first mea culpa for the company.
## 5 Users were unable to log into their accounts for more than a half-hour, though Facebook said later it had “resolved the issue quickly, and we are now back to 100 percent.”
## 6 If a business buys ads from Facebook and the ads generate likes from people or accounts who aren’t really interested in the business, advertising dollars have been wasted.
##                   pub_date document_type                   new_desk
## 1 2014-06-30T23:30:04+0000       article                       OpEd
## 2     2014-07-02T17:28:43Z    multimedia Technology / Personal Tech
## 3     2014-06-26T00:00:00Z       article                   Business
## 4     2014-06-30T16:06:45Z      blogpost                   Business
## 5     2014-06-19T05:16:30Z      blogpost                   Business
## 6     2014-06-23T14:00:31Z      blogpost                   Business
##   type_of_material word_count        score   section_name
## 1            Op-Ed        861 8.349110e-05           <NA>
## 2            Video         13 7.497751e-05  Personal Tech
## 3         Question        608 7.035457e-05  Personal Tech
## 4             Blog        670 6.946501e-05           <NA>
## 5             Blog        460 6.899050e-05           <NA>
## 6             Blog       1369 6.894341e-05 Small Business
##                                          headline.main
## 1                    Should Facebook Manipulate Users?
## 2               Edit Your Facebook and Twitter History
## 3           Finding Old Posts on the Facebook Timeline
## 4   Facebook Says It's Sorry. We've Heard That Before.
## 5     Facebook Service Restored After Worldwide Cutoff
## 6 When Advertising on Facebook Can Be a Waste of Money
##        headline.kicker headline.content_kicker
## 1    Op-Ed Contributor       Op-Ed Contributor
## 2                 <NA>                    <NA>
## 3                  Q&A                     Q&A
## 4                 Bits                    <NA>
## 5                 Bits                    <NA>
## 6 You&#039;re the Boss                    <NA>
##                                      byline.original byline.organization
## 1                                    By JARON LANIER                <NA>
## 2 Vanessa Perez, Rebekah Fergusson and Jason Blalock                <NA>
## 3                               By J. D. BIERSDORFER                <NA>
## 4                                      By MIKE ISAAC                <NA>
## 5                      By MARK SCOTT and DAVID JOLLY                <NA>
## 6                                By EILENE ZIMMERMAN                <NA>
##                                                                                                     web_url
## 285 https://www.nytimes.com/2014/07/06/us/foreign-couples-heading-to-america-for-surrogate-pregnancies.html
## 286                   https://www.nytimes.com/2014/07/02/us/many-sharp-turns-in-bergdahls-path-to-army.html
## 287                  https://sports.blogs.nytimes.com/2014/06/24/italy-vs-uruguay-world-cup-2014-live-blog/
## 288             https://www.nytimes.com/2014/06/22/travel/unplugging-in-the-unofficial-capital-of-yoga.html
## 289              https://thecaucus.blogs.nytimes.com/2014/06/24/updates-from-the-mississippi-senate-runoff/
## 290                         https://www.nytimes.com/2014/06/22/business/a-job-seekers-desperate-choice.html
##                                                                                                                                                                                                      snippet
## 285 Foreign citizens now make up most of the clients at large surrogacy agencies in the United States, highlighting a divide between the country and much of the world over fundamental questions on family.
## 286                                           People who knew Sgt. Bowe Bergdahl in Idaho paint a fairly consistent portrait: hard-working and socially awkward, full of restless energy and romantic plans.
## 287                             These teams might have expected to be fighting for first place in the group in this game. Instead they both lost to the astounding Costa Rica and are fighting for survival.
## 288                                                                                           With her cellphone turned off, the author journeys to Rishikesh, India, where some people think yoga was born.
## 289            Voters head to the polls Tuesday in a Republican Senate primary runoff between Senator Thad Cochran, a six-term incumbent, and State Senator Chris McDaniel, his Tea Party-backed challenger.
## 290                                                          The story of Shanesha Taylor, a mother who had a job interview but was unable to find child care, shows the harsh realities of today’s economy.
##                     pub_date document_type       new_desk type_of_material
## 285 2014-07-05T21:11:20+0000       article       National             News
## 286 2014-07-02T00:30:59+0000       article       National             News
## 287     2014-06-24T09:57:38Z      blogpost         Sports             Blog
## 288     2014-06-22T00:00:00Z       article         Travel             News
## 289     2014-06-24T09:50:43Z      blogpost       National             Blog
## 290 2014-06-21T15:34:43+0000       article SundayBusiness             News
##     word_count        score section_name
## 285       3375 6.217635e-08         <NA>
## 286       2715 6.101015e-08         <NA>
## 287       2937 5.718930e-08         <NA>
## 288       2728 5.600999e-08         <NA>
## 289       2806 5.497112e-08     Politics
## 290       3085 5.392906e-08         <NA>
##                                              headline.main
## 285          Coming to U.S. for Baby, and Womb to Carry It
## 286            Many Sharp Turns in Bergdahl’s Path to Army
## 287            Italy vs. Uruguay: World Cup 2014 Live Blog
## 288           Unplugging in the Unofficial Capital of Yoga
## 289 Highlights From the Mississippi Senate Primary Runoff 
## 290                        A Job Seeker’s Desperate Choice
##       headline.kicker headline.content_kicker
## 285 Pregnancy for Pay       Pregnancy for Pay
## 286                                          
## 287            Sports                    <NA>
## 288          Pursuits                Pursuits
## 289        The Caucus                    <NA>
## 290                                          
##                     byline.original byline.organization
## 285                  By TAMAR LEWIN                <NA>
## 286 By KIRK JOHNSON and MATT FURBER                <NA>
## 287               By JEFFREY MARCUS                <NA>
## 288                   By MARY PILON                <NA>
## 289           By THE NEW YORK TIMES  The New York Times
## 290                 By SHAILA DEWAN                <NA>
head(facebook_2018_articles);tail(facebook_2018_articles)
##                                                                                                                   web_url
## 1                                                https://www.nytimes.com/2018/03/19/technology/facebook-data-sharing.html
## 2                             https://www.nytimes.com/2018/03/24/opinion/sunday/delete-facebook-does-not-fix-problem.html
## 3                                                 https://www.nytimes.com/2018/03/21/opinion/facebook-trump-election.html
## 4                                                 https://www.nytimes.com/2018/03/19/opinion/facebook-privacy-breach.html
## 5                                    https://www.nytimes.com/2018/03/19/opinion/facebook-cambridge-analytica-privacy.html
## 6 https://www.nytimes.com/video/technology/100000005811544/why-leaving-facebook-doesnt-always-mean-quitting-facebook.html
##                                                                                                                                                                                                                                                      snippet
## 1                                                                                                              Sure, third-party Facebook apps collected data about users’ lives. But they seemed convenient and harmless, and, really, what could go wrong?
## 2                                                                                                                                                          Getting rid of your Facebook account will only offload the platform’s problems onto someone else.
## 3                                                                                                                                                                                                    Was it the reason Trump won? That’s the wrong question.
## 4                                                                                                                                                                                                    Readers suggest ways of preventing the next occurrence.
## 5                                                                                                         After learning how advisers to Donald Trump exploited the company’s vulnerabilities to get him elected, Congress needs to strengthen privacy laws.
## 6 In the wake of the Cambridge Analytica scandal, in which data from over 50 million Facebook profiles was secretly scraped and mined for voter insights, many Facebook users have decided to delete their accounts — but untangling yourself from a site...
##                   pub_date document_type  new_desk type_of_material
## 1 2018-03-19T23:46:17+0000       article  Business             News
## 2 2018-03-24T18:22:27+0000       article      OpEd            Op-Ed
## 3 2018-03-21T11:57:10+0000       article      OpEd            Op-Ed
## 4 2018-03-19T16:41:45+0000       article   Letters           Letter
## 5 2018-03-20T01:26:51+0000       article Editorial        Editorial
## 6 2018-03-22T02:11:15+0000    multimedia      <NA>            Video
##   word_count     score  section_name
## 1       1034 1.6520802          <NA>
## 2       1031 1.4591359 Sunday Review
## 3        522 1.1376616          <NA>
## 4        145 1.1008341          <NA>
## 5        681 0.9324561          <NA>
## 6        421 0.8981754          <NA>
##                                          headline.main headline.kicker
## 1 How Facebook’s Data Sharing Went From Feature to Bug       The Shift
## 2        Don’t Delete Facebook. Do Something About It.         Opinion
## 3                              Facebook Doesn’t Get It Op-Ed Columnist
## 4                                   Crisis at Facebook         Letters
## 5        Facebook Leaves Its Users’ Privacy Vulnerable       Editorial
## 6    Why Leaving Facebook Doesn’t Always Mean Quitting            <NA>
##   headline.content_kicker
## 1               The Shift
## 2                 Opinion
## 3         Op-Ed Columnist
## 4                 Letters
## 5               Editorial
## 6                    <NA>
##                                          byline.original
## 1                                         By KEVIN ROOSE
## 2                                  By SIVA VAIDHYANATHAN
## 3                                     By DAVID LEONHARDT
## 4                                                   <NA>
## 5                                 By THE EDITORIAL BOARD
## 6 By AINARA TIEFENTHÄLER, DEBORAH ACOSTA and ROBIN STEIN
##   byline.organization
## 1                <NA>
## 2                <NA>
## 3                <NA>
## 4                <NA>
## 5 THE EDITORIAL BOARD
## 6                <NA>
##                                                                                   web_url
## 471 https://www.nytimes.com/2018/03/24/world/europe/sweden-gender-neutral-preschools.html
## 472                   https://www.nytimes.com/2018/03/19/dining/maryland-stuffed-ham.html
## 473                      https://www.nytimes.com/2018/03/19/style/hashtag-open-house.html
## 474        https://www.nytimes.com/2018/03/24/world/asia/afghanistan-kabul-terrorism.html
## 475       https://www.nytimes.com/2018/03/23/us/sexual-harassment-workplace-response.html
## 476 https://www.nytimes.com/2018/03/17/business/catharine-mackinnon-gretchen-carlson.html
##                                                                                                                                                                               snippet
## 471               The state curriculum urges teachers to “counteract traditional gender roles,” and at one school, girls are encouraged to shout “No!” and boys run the play kitchen.
## 472                                                                   It’s a lot of work, but this traditional dish remains one of America’s most regional, and revered, specialties.
## 473                                        Influencers in Los Angeles and New York are saving us from dreary old open houses. Maybe you can get some cheap rent out of your Snapchat?
## 474             A Times reporter in Kabul has experienced eight suicide bombings since 2016, seven in the last year. Sick of it, she still goes to the scene. “If I don’t, who will?”
## 475                                       Corporations, entrepreneurs and lawmakers are stepping up efforts to prevent sexual harassment and expand worker protections. Is it enough?
## 476 The two discuss sexual harassment in the workplace, how to change corporate culture in meaningful and sustained ways, and whether Miss America can be relevant in the #MeToo era.
##                     pub_date document_type       new_desk type_of_material
## 471 2018-03-24T09:30:10+0000       article        Foreign             News
## 472 2018-03-19T15:56:05+0000       article         Dining             News
## 473 2018-03-19T09:00:01+0000       article         Styles             News
## 474 2018-03-24T07:00:05+0000       article        Insider             News
## 475 2018-03-23T09:00:07+0000       article  Investigative             News
## 476 2018-03-17T09:00:08+0000       article SundayBusiness             News
##     word_count        score section_name
## 471       1785 0.0010275886       Europe
## 472       1869 0.0010267762         <NA>
## 473       1796 0.0010259928         <NA>
## 474       1607 0.0010162268         <NA>
## 475       2070 0.0009004306         <NA>
## 476       3102 0.0007119151         <NA>
##                                                           headline.main
## 471 In Sweden’s Preschools, Boys Learn to Dance and Girls Learn to Yell
## 472             In This Corner of Maryland, Holidays Mean a Stuffed Ham
## 473                                                  Hashtag Open House
## 474                     This Is What I Do When I Hear the Bombs Explode
## 475      #MeToo Called for an Overhaul. Are Workplaces Really Changing?
## 476   Catharine MacKinnon and Gretchen Carlson Have a Few Things to Say
##     headline.kicker headline.content_kicker    byline.original
## 471            <NA>                    <NA>     By ELLEN BARRY
## 472            <NA>                    <NA>    By KIM SEVERSON
## 473            <NA>                    <NA> By CANDACE JACKSON
## 474            <NA>                    <NA>    By FATIMA FAIZI
## 475            <NA>                    <NA>     By JODI KANTOR
## 476 Table for Three         Table for Three  By PHILIP GALANES
##     byline.organization
## 471                <NA>
## 472                <NA>
## 473                <NA>
## 474                <NA>
## 475                <NA>
## 476                <NA>
facebook_2014_articles <- data.frame(Targeted.topic = "2014 Facebook psychological experiment scandal",
            Article.index = 1:nrow(facebook_2014_articles),
            facebook_2014_articles,
            stringsAsFactors=FALSE)

facebook_2018_articles <- data.frame(Targeted.topic = "2018 Facebook Cambridge Analytica scandal",
            Article.index = 1:nrow(facebook_2018_articles),
            facebook_2018_articles,
            stringsAsFactors=FALSE)

facebook_2014_vs_2018_scandal_articles <- rbind(facebook_2014_articles,facebook_2018_articles)

facebook_2014_keywords_collapsed <- data.frame(Targeted.topic = "2014 Facebook psychological experiment scandal",
            facebook_2014_keywords_collapsed,
            stringsAsFactors=FALSE)

facebook_2018_keywords_collapsed <- data.frame(Targeted.topic = "2018 Facebook Cambridge Analytica scandal",
            facebook_2018_keywords_collapsed,
            stringsAsFactors=FALSE)

facebook_2014_vs_2018_keywords <- rbind(facebook_2014_keywords_collapsed,facebook_2018_keywords_collapsed)

The head and tail results show that some articles are probably not what we were looking for. For example, the last few 2014 results cover surrogacy, the World Cup and a Senate runoff rather than Facebook.

Let’s keep only articles where value = “Facebook Inc” in the keywords table.

articles_matching_keywords_info <- facebook_2014_vs_2018_keywords %>% filter(value == "Facebook Inc") %>% select(c("Targeted.topic","Article.index"))
articles_matching_keywords_info <- articles_matching_keywords_info[!duplicated(articles_matching_keywords_info),]

facebook_2014_vs_2018_scandal_articles <- merge(facebook_2014_vs_2018_scandal_articles,articles_matching_keywords_info,by=c("Targeted.topic","Article.index"))
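An alternative to deduplicating the matches and then calling `merge()` is dplyr’s `semi_join()`. It keeps each row of the first table that has at least one match in the second, never duplicates rows, and preserves the original row order, whereas `merge()` sorts by the key columns by default. Here is a sketch on made-up toy tables.

```r
library(dplyr)

# Made-up toy tables with the same key columns as the real data
articles <- data.frame(Targeted.topic = c("A", "A", "B"),
                       Article.index = c(1L, 2L, 1L),
                       headline = c("h1", "h2", "h3"),
                       stringsAsFactors = FALSE)
keywords <- data.frame(Targeted.topic = c("A", "A", "A", "B"),
                       Article.index = c(1L, 1L, 2L, 1L),
                       value = c("Facebook Inc", "Facebook Inc", "Weather", "Facebook Inc"),
                       stringsAsFactors = FALSE)

# Article A-1 matches twice but appears only once in the result
fb_keywords <- filter(keywords, value == "Facebook Inc")
matched <- semi_join(articles, fb_keywords, by = c("Targeted.topic", "Article.index"))
matched
```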

How many articles are we left with for each time period to be studied?

table(facebook_2014_vs_2018_scandal_articles$Targeted.topic)
## 
## 2014 Facebook psychological experiment scandal 
##                                             24 
##      2018 Facebook Cambridge Analytica scandal 
##                                             47

That is not really enough for as much detailed analysis as I had hoped to do.

Let’s convert pub_date to an actual Date by keeping the first 10 characters (YYYY-MM-DD) of each timestamp, then sort the articles by date.

facebook_2014_vs_2018_scandal_articles$pub_date <- as.Date(substr(facebook_2014_vs_2018_scandal_articles$pub_date,1,10))
facebook_2014_vs_2018_scandal_articles <- facebook_2014_vs_2018_scandal_articles %>% arrange(pub_date)
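Keeping the first 10 characters gives the calendar date in UTC, since the timestamps end in `+0000` or `Z`. Articles published late in the evening in New York therefore land on the following day. For example, the editorial stamped `2018-03-20T01:26:51+0000` appeared on the evening of March 19, New York time. If local dates matter, the sketch below parses the full timestamp and converts it. Using `America/New_York` as the reference time zone is an assumption.

```r
# Two timestamps in the two formats seen in the pub_date column
pub_date <- c("2018-03-20T01:26:51+0000", "2014-07-02T17:28:43Z")

# Rewrite a trailing "Z" as "+0000" so a single format string parses both
normalised <- sub("Z$", "+0000", pub_date)
timestamps <- as.POSIXct(normalised, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")

# Calendar date in New York rather than UTC
local_dates <- as.Date(timestamps, tz = "America/New_York")
local_dates
```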

Let’s show some of the information we might be most interested in for each topic.

After that, I think the data are ready to save and analyze another time.

facebook_2014_vs_2018_scandal_articles %>% 
filter(Targeted.topic == "2014 Facebook psychological experiment scandal") %>%
select(c("pub_date","snippet","headline.main"))
##      pub_date
## 1  2014-06-18
## 2  2014-06-19
## 3  2014-06-21
## 4  2014-06-23
## 5  2014-06-23
## 6  2014-06-25
## 7  2014-06-26
## 8  2014-06-26
## 9  2014-06-27
## 10 2014-06-27
## 11 2014-06-27
## 12 2014-06-28
## 13 2014-06-29
## 14 2014-06-30
## 15 2014-06-30
## 16 2014-06-30
## 17 2014-07-01
## 18 2014-07-02
## 19 2014-07-02
## 20 2014-07-03
## 21 2014-07-03
## 22 2014-07-03
## 23 2014-07-03
## 24 2014-07-06
##                                                                                                                                                                                                                                               snippet
## 1                                                                       The social media giant announced that it had created a new kind of computer networking switch, potentially capable of shifting data rapidly through the largest data centers.
## 2                                                                         Users were unable to log into their accounts for more than a half-hour, though Facebook said later it had “resolved the issue quickly, and we are now back to 100 percent.”
## 3  Social Sweepster, a new service, says it can scan photos for telltale signs of youthful indiscretions, like red party cups. It joins several services trying to help people erase evidence of behavior prospective employers may not find amusing.
## 4                                                                                    The Breakthrough Prize in Mathematics, financed by Yuri Milner, a Russian investor, and Mark Zuckerberg, the founder of Facebook, comes with a $3 million award.
## 5                                                                         If a business buys ads from Facebook and the ads generate likes from people or accounts who aren’t really interested in the business, advertising dollars have been wasted.
## 6                                                                            The social networking company disclosed that 31 percent of its workers globally are women. In the United States, Facebook’s management is overwhelmingly white and male.
## 7                   Facebook has argued that Manhattan prosecutors violated the constitutional rights of its users last year by demanding the nearly complete account data of 381 people, from pages they liked to their photos and private messages.
## 8                                                                                                                                                            Plus, how to paste text from a web page without also getting all the strange formatting.
## 9                                                                                   It is a rare product misstep for Facebook. Its Home software was supposed to turn an Android smartphone into a Facebook phone. But it never caught on with users.
## 10                                                                                      The New York district attorney’s office demanded account details of 381 people for an investigation that led to indictments on Social Security fraud charges.
## 11                                                                                      The New York district attorney’s office demanded account details of 381 people for an investigation that led to indictments on Social Security fraud charges.
## 12                                                                                                                         Instead of combing through profiles on dating sites, some people prefer making their connections on Facebook or Instagram.
## 13                                                               The Islamic State in Iraq and Syria has demonstrated modern sophistication in its adoption of social media, particularly Twitter, where its hashtags have gained jihadist followers.
## 14                                                                                                                                                                                     As guinea pigs, we deserve to know what researchers are doing.
## 15                                                                    Last week Facebook revealed that it had manipulated the news feeds of over half a million randomly selected users to change the number of positive and negative posts they saw.
## 16                                                                                                   Facebook apologized for its study of how people’s emotions are affected by social media posts. And it’s not the first mea culpa for the company.
## 17                                                                                 A man says the violent rap lyrics he posted on social media were art, but he was locked up for threatening his ex-wife. The Supreme Court will weigh in next term.
## 18                                                                The social network is facing potential investigations in Europe on whether it broke local privacy laws by manipulating the emotional content of users’ posts without their consent.
## 19                                                                                                                                                                   Molly Wood explains how to download and delete activity on Facebook and Twitter.
## 20                                                                                              Studying how social media sites are used can provide valuable insight into human behavior and may also help curb their power and potential for abuse.
## 21                                                                                                                                                             A reader responds to an Op-Ed article, &#8220;Should Facebook Manipulate Users?&#8221;
## 22                                                                                                  Those concerned about their online profiles have a range of options, from deactivating their accounts completely to limiting who sees past posts.
## 23                                                                    The Electronic Privacy Information Center said Facebook violated a consent decree with regulators by manipulating some of its users’ news feeds without their explicit consent.
## 24                                                                                                                                                               Notable quotes from business articles that appeared in The New York Times last week.
##                                                                        headline.main
## 1                                  Facebook Makes Its Own Computer Networking Switch
## 2                                   Facebook Service Restored After Worldwide Cutoff
## 3               New Offering for Job Seekers: Fewer Embarrassing Social Media Photos
## 4                            The Multimillion-Dollar Minds of 5 Mathematical Masters
## 5                               When Advertising on Facebook Can Be a Waste of Money
## 6                                 Facebook Mirrors Tech Industry's Lack of Diversity
## 7                            Facebook Case Over Search Warrants for User Information
## 8                                         Finding Old Posts on the Facebook Timeline
## 9                       What Happened to the Facebook Phone? Not Very Much, It Seems
## 10  Daily Report: Effort by Facebook to Safeguard Data From the Law Fails, For Now  
## 11                                   Forced to Hand Over Data, Facebook Files Appeal
## 12                                     Cupid&rsquo;s Arrows Fly on Social Media, Too
## 13 Iraq’s Sunni Militants Take to Social Media to Advance Their Cause and Intimidate
## 14                                                 Should Facebook Manipulate Users?
## 15    Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry
## 16                                Facebook Says It's Sorry. We've Heard That Before.
## 17               On the Next Docket: How the First Amendment Applies to Social Media
## 18      After Uproar, European Regulators Question Facebook on Psychological Testing
## 19                                            Edit Your Facebook and Twitter History
## 20                              A Bright Side to Facebook’s Experiments on Its Users
## 21                                                             Timeless Manipulators
## 22                                  Swear Off Social Media, for Good or Just for Now
## 23                    Privacy Group Complains to F.T.C. About Facebook Emotion Study
## 24                                                    The Chatter for Sunday, July 6
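Two of the 2014 results (snippet 21 and headline 12) still contain raw HTML entities such as `&#8220;` and `&rsquo;`. Below is a minimal base R sketch for decoding them before text analysis. It replaces all entities in a single pass and handles decimal numeric entities plus a small, hand-picked set of named ones. The function name and the named-entity list are assumptions covering what appears here, not a complete HTML entity table; a full HTML parser such as the xml2 package would be more thorough.

```r
# Hypothetical helper: decode HTML entities in a character vector.
# Named entities are a hand-picked subset (assumption), not the full HTML list.
decode_html_entities <- function(x) {
  named <- c(rsquo = "\u2019", lsquo = "\u2018", rdquo = "\u201d",
             ldquo = "\u201c", quot = "\"", amp = "&")
  m <- gregexpr("&(#[0-9]+|[a-z]+);", x)
  # All matches are replaced at once, so "&amp;rsquo;" becomes "&rsquo;", not a quote
  regmatches(x, m) <- lapply(regmatches(x, m), function(ents) {
    vapply(ents, function(e) {
      body <- substr(e, 2, nchar(e) - 1)            # strip leading & and trailing ;
      if (startsWith(body, "#")) {
        intToUtf8(as.integer(substring(body, 2)))   # decimal code point
      } else if (body %in% names(named)) {
        named[[body]]
      } else {
        e                                           # unknown entity: leave as-is
      }
    }, character(1), USE.NAMES = FALSE)
  })
  x
}

decode_html_entities(c("Cupid&rsquo;s Arrows Fly on Social Media, Too",
                       "&#8220;Should Facebook Manipulate Users?&#8221;"))
```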

facebook_2014_vs_2018_scandal_articles %>%
  filter(Targeted.topic == "2018 Facebook Cambridge Analytica scandal") %>%
  select(c("pub_date", "snippet", "headline.main"))
##      pub_date
## 1  2018-03-18
## 2  2018-03-18
## 3  2018-03-19
## 4  2018-03-19
## 5  2018-03-19
## 6  2018-03-19
## 7  2018-03-19
## 8  2018-03-19
## 9  2018-03-19
## 10 2018-03-19
## 11 2018-03-19
## 12 2018-03-20
## 13 2018-03-20
## 14 2018-03-20
## 15 2018-03-20
## 16 2018-03-20
## 17 2018-03-20
## 18 2018-03-20
## 19 2018-03-20
## 20 2018-03-20
## 21 2018-03-21
## 22 2018-03-21
## 23 2018-03-21
## 24 2018-03-21
## 25 2018-03-21
## 26 2018-03-21
## 27 2018-03-21
## 28 2018-03-21
## 29 2018-03-22
## 30 2018-03-22
## 31 2018-03-22
## 32 2018-03-22
## 33 2018-03-22
## 34 2018-03-22
## 35 2018-03-22
## 36 2018-03-22
## 37 2018-03-22
## 38 2018-03-22
## 39 2018-03-22
## 40 2018-03-22
## 41 2018-03-23
## 42 2018-03-23
## 43 2018-03-23
## 44 2018-03-23
## 45 2018-03-24
## 46 2018-03-24
## 47 2018-03-24
##                                                                                                                                                                                                                                                       snippet
## 1                                                                                   American and British lawmakers called on Facebook to explain how a political data firm tied to the Trump campaign harvested private data from more than 50 million users.
## 2                                                                                   American and British lawmakers called on Facebook to explain how a political data firm tied to the Trump campaign harvested private data from more than 50 million users.
## 3                                                                                                               Sure, third-party Facebook apps collected data about users’ lives. But they seemed convenient and harmless, and, really, what could go wrong?
## 4                                                                                                            As the social network’s list of woes grows, its 33-year-old founder, Mark Zuckerberg, will have to prove somehow he is not in way over his head.
## 5                                                                                                                                                       It’s true that the Cambridge Analytica incident wasn’t a security breach. It was something far worse.
## 6                                                                                          A political data firm tied to the Trump campaign gained access to information on 50 million Facebook users. Here is how it happened, and the uproar it has caused.
## 7                                                                                                                                 Larry Ellison is teaming up with Dr. David Agus to start a hydroponic farming firm focused on creating more healthful food.
## 8                                                                                                A European Union plan would hit Silicon Valley’s technology giants especially hard, further straining relations with the United States over taxes and trade.
## 9                                                                                                                             There are some practical solutions to safeguard some of your data, like installing software to block web tracking technologies.
## 10                                                                                                                                                                                                    Readers suggest ways of preventing the next occurrence.
## 11                                                                                                                             Shares of technology companies plunged as investors fretted that tougher government oversight could hurt the sector’s profits.
## 12                                                                                                                              How serious of an issue do you think this misuse of data is? Will it make you reconsider how you use social media in any way?
## 13                                                                                                    Alex Stamos, Facebook’s chief information security officer, who plans to leave the company by August, is known in Silicon Valley for his strong stands.
## 14                                                                                            Fed policymakers conclude their March rate-setting meeting on Wednesday. It is the first such session with Jerome H. Powell as the central bank’s new chairman.
## 15                                                                                                                                                                              Undercover video shows the president’s digital consultants acting like thugs.
## 16                                                                                          Your interest in Kim Kardashian West can tell researchers how extroverted (very), how conscientious (more than most) and how open-minded (only somewhat) you are.
## 17                                                                                                                                                                                                       We need to figure out how to avoid future tragedies.
## 18                                                                                                                The company’s board said it was suspending the chief executive, Alexander Nix, with immediate effect, pending an independent investigation.
## 19                                                                                                                                             The Federal Trade Commission is said to be examining whether the social media giant violated a 2011 agreement.
## 20                                                                                                         After learning how advisers to Donald Trump exploited the company’s vulnerabilities to get him elected, Congress needs to strengthen privacy laws.
## 21                                                                                                        Brian Acton, one of the creators of WhatsApp, sold his company to the internet giant for $19 billion. Now he’s telling people to “#deletefacebook.”
## 22                                                                                            In his first public statements concerning a scandal involving Cambridge Analytica, Mark Zuckerberg said “there’s more to do, and we need to step up and do it.”
## 23                                                                                          Amid a data scandal this week, Mr. Zuckerberg, Facebook’s chief executive, and Sheryl Sandberg, chief operating officer, have been nowhere to be found in public.
## 24                                                                                                                 How did the brains behind Cambridge Analytica, the political research firm that worked with the Trump campaign, become its whistle-blower?
## 25                                                                                                          The social network may be too large to truly quit. Our personal tech columnist answers questions from readers who are contemplating deactivation.
## 26                                                                                                                                                                                                  It was television, not Facebook, that made him president.
## 27                                                                                                                                                                                                    Was it the reason Trump won? That’s the wrong question.
## 28                                                                                           Patrons of the social network are deleting their profiles in protest over reports that the company allowed a political data firm to harvest private information.
## 29                                                                                              Five days after details about Cambridge Analytica’s data mining were made public, Mark Zuckerberg, Facebook’s chief executive, spoke with The New York Times.
## 30                                                                                              Five days after details about Cambridge Analytica’s data mining were made public, Mark Zuckerberg, Facebook’s chief executive, spoke with The New York Times.
## 31                                                                                                                                                                        Readers discuss how Cambridge Analytica provided information to the Trump campaign.
## 32                                                                                                      Mark Zuckerberg said Facebook’s reliance on advertising aligned with its mission to build a community. But what if Facebook cost $5 per month to use?
## 33                                                                                                                    Facebook’s chief executive spoke with The New York Times about data privacy of users, Cambridge Analytica and the company’s next steps.
## 34                                                                                                                                  The changes appear to address some common gripes, like how some posts can keep appearing on your feed seemingly for days.
## 35                                                                                                                                     We can blame Facebook and Cambridge Analytica for the damage they’ve done, but the responsibility lies with all of us.
## 36                                                                                                                                                                                                      Ross Trudeau takes social media to a whole new level.
## 37                                                                                                Till now investors might have thought they could take Mr. Trump’s trade pronouncements in stride. But after Thursday’s selloff, will their optimism return?
## 38                                                                                                                                                                                                      Think first before you retweet that bit of fake news.
## 39                                                                                                                                                                                                      Think first before you retweet that bit of fake news.
## 40 In the wake of the Cambridge Analytica scandal, in which data from over 50 million Facebook profiles was secretly scraped and mined for voter insights, many Facebook users have decided to delete their accounts — but untangling yourself from a site...
## 41                                                                   The political action committee founded by John Bolton hired Cambridge Analytica specifically to develop psychological profiles of voters — and it knew the firm was using Facebook data.
## 42                                                                                           The company harvested data from 50 million Facebook users to develop psychological profiles on behalf of political campaigns, including that of President Trump.
## 43                                                                                             Mr. Musk deleted the Facebook pages of two of his companies, SpaceX and Tesla. He and the Facebook C.E.O., Mark Zuckerberg, have, er, not always gotten along.
## 44                                                                                               Mark Zuckerberg held a Wednesday meeting with staff, followed by a regularly scheduled meeting on Friday, partly to discuss the Cambridge Analytica scandal.
## 45                                                                                        The suffering and spirit of San Juan, P.R. Kayaking across the Atlantic Ocean at 70 years old (for the third time). David Bowie as you’ve never seen him, and more.
## 46                                                                                                         In 2011, the F.T.C. first required a company to create a comprehensive data privacy program for consumers. Europe will soon take another big step.
## 47                                                                                                                       Internet companies were built on a model in which people gave up their information for free services. Now, that idea is under siege.
##                                                                                  headline.main
## 1                             Facebook’s Role in Data Misuse Sets Off Storms on Two Continents
## 2                             Facebook’s Role in Data Misuse Sets Off Storms on Two Continents
## 3                                         How Facebook’s Data Sharing Went From Feature to Bug
## 4                                           Is It Time for More Adult Supervision at Facebook?
## 5                                                              Facebook’s Surveillance Machine
## 6                    Facebook and Cambridge Analytica: What You Need to Know as Fallout Widens
## 7                      Oracle’s Ellison Unveils Hydroponic Farming Start-Up: DealBook Briefing
## 8                                    Europe’s Planned Digital Tax Heightens Tensions With U.S.
## 9                                       How to Protect Yourself (and Your Friends) on Facebook
## 10                                                                          Crisis at Facebook
## 11                                   Facebook and Other Tech Companies Drag Down Stock Markets
## 12 Teaching Activities for: ‘Facebook’s Role in Data Misuse Sets Off Storms on Two Continents’
## 13                                                  The End for Facebook’s Security Evangelist
## 14                           What to Expect From Powell’s First Fed Meeting: DealBook Briefing
## 15                                                          Trump’s High-Tech Dirty Tricksters
## 16                       How Researchers Learned to Use Facebook ‘Likes’ to Sway Your Thinking
## 17                                                                 Lessons From the Uber Crash
## 18                              Cambridge Analytica Suspends C.E.O. Amid Facebook Data Scandal
## 19                             Facebook Faces Growing Pressure Over Data and Privacy Inquiries
## 20                                               Facebook Leaves Its Users’ Privacy Vulnerable
## 21                                         Facebook Made Him a Billionaire. Now He’s a Critic.
## 22                      Zuckerberg, Facing Facebook’s Worst Crisis Yet, Pledges Better Privacy
## 23                                             Missing From Facebook’s Crisis: Mark Zuckerberg
## 24                                                  Listen to ‘The Daily’: The Data Harvesters
## 25                                                        Want to #DeleteFacebook? You Can Try
## 26                                                Trump Hacked the Media Right Before Our Eyes
## 27                                                                     Facebook Doesn’t Get It
## 28                               For Many Facebook Users, a ‘Last Straw’ That Led Them to Quit
## 29                                               Listen to ‘The Daily’: Can Facebook Be Fixed?
## 30                                               Listen to ‘The Daily’: Can Facebook Be Fixed?
## 31                                                          Facebook’s Apology, and Next Steps
## 32          Kevin’s Week in Tech: Zuckerberg’s Answers to Privacy Scandal Raise More Questions
## 33                                  Mark Zuckerberg’s Reckoning: ‘This Is a Major Trust Issue’
## 34                                            Instagram Is Changing Its Algorithm. Here’s How.
## 35                                                          How Democracy Can Survive Big Data
## 36                                                                    One With a Lot of Tweets
## 37                           What’s Next for Stocks After the China Tariffs: DealBook Briefing
## 38                                       How to Prevent Smart People From Spreading Dumb Ideas
## 39                                       How to Prevent Smart People From Spreading Dumb Ideas
## 40                                           Why Leaving Facebook Doesn’t Always Mean Quitting
## 41                         Bolton Was Early Beneficiary of Cambridge Analytica’s Facebook Data
## 42                                   British Authorities Search Offices of Cambridge Analytica
## 43                                    Elon Musk Joins #DeleteFacebook With a Barrage of Tweets
## 44                                           Zuckerberg Takes Steps to Calm Facebook Employees
## 45                                                                11 of Our Best Weekend Reads
## 46                                       Timeline: Facebook and Google Under Regulators’ Glare
## 47                            How Calls for Privacy May Upend Business for Facebook and Google
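The 2018 table contains exact repeats: rows 1 and 2, 29 and 30, and 38 and 39 share both snippet and headline. Rows 10 and 11 of the 2014 table share a snippet but have different headlines. Below is a minimal base R sketch that treats rows with the same topic, date, headline, and snippet as duplicates, ignoring Article.index, which may differ between repeats. It then saves the result with `saveRDS()`. The toy rows, the choice of key columns, and the file name are all hypothetical.

```r
# Hypothetical rows mimicking the repeats seen in the printed output
scandal_articles <- data.frame(
  Targeted.topic = rep("2018 Facebook Cambridge Analytica scandal", 3),
  Article.index  = c(5, 6, 7),
  pub_date       = as.Date(c("2018-03-22", "2018-03-22", "2018-03-22")),
  headline.main  = c("Headline X", "Headline X", "Headline Y"),
  snippet        = c("Snippet X", "Snippet X", "Snippet Y"),
  stringsAsFactors = FALSE
)

# Columns that, if all equal, we treat as the same article (an assumption)
key_cols <- c("Targeted.topic", "pub_date", "headline.main", "snippet")
deduped <- scandal_articles[!duplicated(scandal_articles[, key_cols]), ]

# Save for later analysis; readRDS() restores the data frame, Date column included
out_file <- file.path(tempdir(), "facebook_scandal_articles.rds")  # hypothetical path
saveRDS(deduped, out_file)
reloaded <- readRDS(out_file)
```

Unlike a CSV file, an .rds file keeps column types, so pub_date comes back as a Date without re-parsing.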