Our goal here is to query the New York Times Article Search API and turn the results into a data frame.
First, we sign up for an API key here (http://developer.nytimes.com/signup), and save our key in a file.
Then, we can use this key to run our API requests right from R.
Finally, we use the jsonlite library plus the tidyverse to convert from JSON to a data frame.
I’ll be using a modified version of a lot of the code here:
http://www.storybench.org/working-with-the-new-york-times-api-in-r/
In this analysis, we are going to look for New York Times articles about Facebook from around the time the Cambridge Analytica scandal broke.
We’ll also look for news about Facebook from around the time of another scandal of theirs, when news broke in 2014 that they had performed psychological experiments on users (https://www.nytimes.com/2014/06/30/technology/facebook-tinkers-with-users-emotions-in-news-feed-experiment-stirring-outcry.html).
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(stringr)
In both cases, we’ll keep it simple with a basic search for “facebook”.
This article (http://www.straitstimes.com/world/united-states/timeline-of-facebook-cambridge-analytica-scandal) puts the start of the Cambridge Analytica scandal at March 17, 2018, so we’ll go from there to the date of this report (March 26, 2018).
The actual PNAS article about the Facebook experiment is dated June 17, 2014 (http://www.pnas.org/content/111/24/8788.full).
However, most articles I found when I Googled it were from later in June.
Let’s look from June 18 to July 6, 2014 and see what we get.
Run through all search results page by page (they come 10 at a time), and save each page’s results in a list object.
Save the combined results when complete, so that if we run this code again we can use the saved data instead of querying the API again.
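Because results come 10 to a page and the page parameter starts at 0, the index of the last page is the number of hits divided by 10, rounded up, minus 1. A minimal sketch of that arithmetic, where last_page is a hypothetical helper name and not part of the API:

```r
# Hypothetical helper: index of the last results page, given the total
# number of hits, with 10 results per page and pages numbered from 0.
last_page <- function(hits, per_page = 10) {
  ceiling(hits / per_page) - 1
}

last_page(290)  # 29 full pages, numbered 0 to 28
last_page(473)  # 47 full pages plus a partial one, numbered 0 to 47
```

Rounding up matters when the hit count is not a multiple of 10: with 473 hits, rounding to the nearest integer would stop at page 46 and miss the final 3 results.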
NYTIMES_KEY <- readLines("nytimes_api_key.txt")
term <- "facebook"
begin_date <- "20140618"
end_date <- "20140706"
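baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
"&begin_date=",begin_date,"&end_date=",end_date,
"&facet_filter=true&api-key=",NYTIMES_KEY) #paste0 never inserts a separator, so no sep argument is needed.
initialQuery <- fromJSON(baseurl)
#Pages are numbered from 0 and hold 10 results each, so the last page is ceiling(hits/10) - 1.
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1
pages_2014 <- vector("list",length=maxPages+1) #One slot per page, for pages 0 through maxPages.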
for(i in 0:maxPages){
nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()
pages_2014[[i+1]] <- nytSearch
Sys.sleep(5) #I was getting errors more often when I waited only 1 second between calls. 5 seconds seems to work better.
}
facebook_2014_articles <- rbind_pages(pages_2014)
save(facebook_2014_articles,file="facebook_2014_articles.Rdata")
term <- "facebook"
begin_date <- "20180317"
end_date <- "20180326"
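baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
"&begin_date=",begin_date,"&end_date=",end_date,
"&facet_filter=true&api-key=",NYTIMES_KEY) #paste0 never inserts a separator, so no sep argument is needed.
initialQuery <- fromJSON(baseurl)
#Pages are numbered from 0 and hold 10 results each, so the last page is ceiling(hits/10) - 1.
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1
pages_2018 <- vector("list",length=maxPages+1) #One slot per page, for pages 0 through maxPages.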
for(i in 0:maxPages){
nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()
pages_2018[[i+1]] <- nytSearch
Sys.sleep(5)
}
facebook_2018_articles <- rbind_pages(pages_2018)
save(facebook_2018_articles,file="facebook_2018_articles.Rdata")
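To actually reuse the saved data on a later run, the query can be wrapped in a check for the saved file. The sketch below is one way to do that, under some assumptions: cached_result is a hypothetical helper, and it uses saveRDS/readRDS rather than the save() call above so that the stored object can be returned directly.

```r
# Hypothetical helper: return the object stored at `path` if the file exists;
# otherwise call `compute()`, save its result to `path`, and return it.
cached_result <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Toy demonstration with a temporary file standing in for the saved articles.
cache_file <- tempfile(fileext = ".rds")
n_calls <- 0
slow_query <- function() {
  n_calls <<- n_calls + 1   # count how often the "API" is really queried
  data.frame(headline = c("a", "b"), stringsAsFactors = FALSE)
}

first  <- cached_result(cache_file, slow_query)  # runs the query and saves
second <- cached_result(cache_file, slow_query)  # reads the saved file instead
```

In the real analysis, compute would be a function that runs the paging loop and returns the combined data frame.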
First, let’s replace the column names of both data frames with something a little shorter and more readable.
colnames(facebook_2014_articles) <- str_replace(colnames(facebook_2014_articles),
pattern='response\\.',replace='')
colnames(facebook_2014_articles) <- str_replace(colnames(facebook_2014_articles),
pattern='docs\\.',replace='')
colnames(facebook_2018_articles) <- str_replace(colnames(facebook_2018_articles),
pattern='response\\.',replace='')
colnames(facebook_2018_articles) <- str_replace(colnames(facebook_2018_articles),
pattern='docs\\.',replace='')
colnames(facebook_2014_articles)
## [1] "status" "copyright"
## [3] "web_url" "snippet"
## [5] "print_page" "source"
## [7] "multimedia" "keywords"
## [9] "pub_date" "document_type"
## [11] "new_desk" "type_of_material"
## [13] "_id" "word_count"
## [15] "score" "uri"
## [17] "section_name" "abstract"
## [19] "headline.main" "headline.kicker"
## [21] "headline.content_kicker" "headline.print_headline"
## [23] "headline.name" "headline.seo"
## [25] "headline.sub" "byline.original"
## [27] "byline.person" "byline.organization"
## [29] "meta.hits" "meta.offset"
## [31] "meta.time"
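The same renaming can be done in one pass by matching either prefix in a single pattern. A small sketch using base R's gsub in place of str_replace, run on illustrative column names (the names in raw_names are made up for the example):

```r
# Illustrative column names, with prefixes like the ones removed above.
raw_names <- c("status", "response.docs.web_url",
               "response.docs.headline.main", "response.meta.hits")

# "response\\.|docs\\." matches either prefix; the dot is escaped so that it
# is treated as a literal "." rather than "any character".
clean_names <- gsub("response\\.|docs\\.", "", raw_names)
clean_names
```

Note that gsub removes every match in a name, while str_replace removes only the first; for names like these, the result is the same.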
Remove some columns that are always/almost always the same values, or that don’t provide much useful information.
Some of these may be specific to this data set. For example, “print_page” may be useful for a legacy data set, but for a recent data set I don’t think it will be very useful. Similar idea for “headline.print_headline”.
The “multimedia” column just provides information like the name of files for any associated pictures, plus size in pixels, etc.
While images or videos can be an important part of an article, we are not given enough information here to be able to get anything useful out of these.
Finally, in 2014 there was an “abstract” column, which is not there in 2018. This is probably mostly similar to the “snippet” column present in both data sets. Let’s set this aside just in case we want it later, but remove it from the main data frame for now.
columns_to_remove <- c("status","copyright","print_page","source","_id","uri","headline.print_headline","headline.name","headline.seo","headline.sub",grep('meta',colnames(facebook_2014_articles),value=TRUE),"abstract","multimedia")
columns_to_keep <- setdiff(colnames(facebook_2014_articles),columns_to_remove)
facebook_2014_abstracts <- as.vector(facebook_2014_articles$abstract)
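#all_of() tells select() that columns_to_keep is a character vector of column names (requires dplyr 1.0.0 or later).
facebook_2014_articles <- facebook_2014_articles %>% select(all_of(columns_to_keep))
facebook_2018_articles <- facebook_2018_articles %>% select(all_of(columns_to_keep))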
Let’s look at the class of each variable. If any of the remaining columns are lists, we’ll see if we can convert to something simpler.
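#Print the name of every column that is stored as a list.
for(i in 1:ncol(facebook_2014_articles))
{
if(is.list(facebook_2014_articles[,i])){
print(colnames(facebook_2014_articles)[i])
#To also see the first few entries, wrap head() in print(); inside a loop, head() alone displays nothing.
}
}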
## [1] "keywords"
## [1] "byline.person"
Looks like the “keywords” and “byline.person” columns have a data frame for each article.
Both of these provide information that could be useful depending on circumstances.
However, I don’t think we’ll be looking in detail at specific authors for this analysis, so let’s just remove byline.person.
Then, let’s work on collapsing keywords into one data frame for all articles.
facebook_2014_articles <- facebook_2014_articles %>% select(setdiff(colnames(facebook_2014_articles),"byline.person"))
facebook_2018_articles <- facebook_2018_articles %>% select(setdiff(colnames(facebook_2018_articles),"byline.person"))
num_rows_keywords <- unlist(lapply(facebook_2014_articles$keywords,function(x)nrow(x)))
for(article_num in which(num_rows_keywords > 0)){
this_article_keywords <- data.frame(Article.index = article_num,
facebook_2014_articles$keywords[[article_num]],
stringsAsFactors=FALSE)
if(exists("facebook_2014_keywords_collapsed") == FALSE){facebook_2014_keywords_collapsed <- this_article_keywords;next}
facebook_2014_keywords_collapsed <- rbind(facebook_2014_keywords_collapsed,this_article_keywords)
}
head(facebook_2014_keywords_collapsed);tail(facebook_2014_keywords_collapsed)
## Article.index name value rank
## 1 1 subject Social Media 1
## 2 1 subject Research 2
## 3 1 organizations Facebook Inc 3
## 4 1 subject Emotions 4
## 5 1 organizations University of California, San Francisco 5
## 6 1 organizations Cornell University 6
## major
## 1 N
## 2 N
## 3 N
## 4 N
## 5 N
## 6 N
## Article.index name value rank major
## 1368 290 subject Recession and Depression 2 N
## 1369 290 subject Child Care 3 N
## 1370 290 persons Taylor, Shanesha 4 N
## 1371 290 subject Child Abuse and Neglect 5 N
## 1372 290 subject Unemployment 6 N
## 1373 290 subject United States Economy 7 N
#In the 2018 data, not every entry of the keywords column is a data frame. Count those entries as 0 rows.
num_rows_keywords <- sapply(facebook_2018_articles$keywords,function(x) if(is.data.frame(x)) nrow(x) else 0)
for(article_num in which(num_rows_keywords > 0)){
this_article_keywords <- data.frame(Article.index = article_num,
facebook_2018_articles$keywords[[article_num]],
stringsAsFactors=FALSE)
if(exists("facebook_2018_keywords_collapsed") == FALSE){facebook_2018_keywords_collapsed <- this_article_keywords;next}
facebook_2018_keywords_collapsed <- rbind(facebook_2018_keywords_collapsed,this_article_keywords)
}
head(facebook_2018_keywords_collapsed);tail(facebook_2018_keywords_collapsed)
## Article.index name value rank
## 1 1 subject Computers and the Internet 1
## 2 1 subject Social Media 2
## 3 1 subject Data-Mining and Database Marketing 3
## 4 1 organizations Cambridge Analytica 4
## 5 1 organizations Facebook Inc 5
## 6 1 persons Kogan, Aleksandr 6
## major
## 1 N
## 2 N
## 3 N
## 4 N
## 5 N
## 6 N
## Article.index name value rank major
## 938 476 subject Women and Girls 2 N
## 939 476 subject Sexual Harassment 3 N
## 940 476 persons MacKinnon, Catharine A 4 N
## 941 476 persons Carlson, Gretchen 5 N
## 942 476 organizations Miss America Organization 6 N
## 943 476 organizations Fox News Channel 7 N
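Growing a data frame with rbind inside a loop works, but the exists() check means that re-running the loop in the same session appends the same rows a second time. One alternative is to build all the per-article pieces first and bind them once. The following is a minimal sketch on toy data, where collapse_keywords is a hypothetical helper and the toy keywords list only imitates the structure of the real column:

```r
# Toy stand-in for the keywords column: one entry per article, where
# an article without keywords has an empty list instead of a data frame.
keywords <- list(
  data.frame(name = c("subject", "organizations"),
             value = c("Social Media", "Facebook Inc"),
             stringsAsFactors = FALSE),
  list(),
  data.frame(name = "persons", value = "Kogan, Aleksandr",
             stringsAsFactors = FALSE)
)

# Hypothetical helper: tag each article's keywords with the article's
# position in the list, then bind all pieces together in a single call.
collapse_keywords <- function(keywords) {
  has_rows <- vapply(keywords,
                     function(x) is.data.frame(x) && nrow(x) > 0,
                     logical(1))
  pieces <- lapply(which(has_rows), function(i) {
    data.frame(Article.index = i, keywords[[i]], stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

keywords_collapsed <- collapse_keywords(keywords)
keywords_collapsed
```

Because the result is assigned in one step, running the code again simply overwrites it, and the same function can be applied to both the 2014 and the 2018 data.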
Now, let’s remove the keywords column from both data frames before combining them.
We’ll also combine the two “keywords_collapsed” data frames.
facebook_2014_articles <- facebook_2014_articles %>% select(setdiff(colnames(facebook_2014_articles),"keywords"))
facebook_2018_articles <- facebook_2018_articles %>% select(setdiff(colnames(facebook_2018_articles),"keywords"))
head(facebook_2014_articles);tail(facebook_2014_articles)
## web_url
## 1 https://www.nytimes.com/2014/07/01/opinion/jaron-lanier-on-lack-of-transparency-in-facebook-study.html
## 2 https://www.nytimes.com/video/technology/personaltech/100000002975706/edit-your-facebook-and-twitter-history.html
## 3 https://www.nytimes.com/2014/06/26/technology/personaltech/finding-old-posts-on-the-facebook-timeline.html
## 4 https://bits.blogs.nytimes.com/2014/06/30/facebook-says-its-sorry-weve-heard-that-before/
## 5 https://bits.blogs.nytimes.com/2014/06/19/facebook-service-restored-after-worldwide-outage/
## 6 https://boss.blogs.nytimes.com/2014/06/23/when-advertising-on-facebook-can-be-a-waste-of-money/
## snippet
## 1 As guinea pigs, we deserve to know what researchers are doing.
## 2 Molly Wood explains how to download and delete activity on Facebook and Twitter.
## 3 Plus, how to paste text from a web page without also getting all the strange formatting.
## 4 Facebook apologized for its study of how people’s emotions are affected by social media posts. And it’s not the first mea culpa for the company.
## 5 Users were unable to log into their accounts for more than a half-hour, though Facebook said later it had “resolved the issue quickly, and we are now back to 100 percent.”
## 6 If a business buys ads from Facebook and the ads generate likes from people or accounts who aren’t really interested in the business, advertising dollars have been wasted.
## pub_date document_type new_desk
## 1 2014-06-30T23:30:04+0000 article OpEd
## 2 2014-07-02T17:28:43Z multimedia Technology / Personal Tech
## 3 2014-06-26T00:00:00Z article Business
## 4 2014-06-30T16:06:45Z blogpost Business
## 5 2014-06-19T05:16:30Z blogpost Business
## 6 2014-06-23T14:00:31Z blogpost Business
## type_of_material word_count score section_name
## 1 Op-Ed 861 8.349110e-05 <NA>
## 2 Video 13 7.497751e-05 Personal Tech
## 3 Question 608 7.035457e-05 Personal Tech
## 4 Blog 670 6.946501e-05 <NA>
## 5 Blog 460 6.899050e-05 <NA>
## 6 Blog 1369 6.894341e-05 Small Business
## headline.main
## 1 Should Facebook Manipulate Users?
## 2 Edit Your Facebook and Twitter History
## 3 Finding Old Posts on the Facebook Timeline
## 4 Facebook Says It's Sorry. We've Heard That Before.
## 5 Facebook Service Restored After Worldwide Cutoff
## 6 When Advertising on Facebook Can Be a Waste of Money
## headline.kicker headline.content_kicker
## 1 Op-Ed Contributor Op-Ed Contributor
## 2 <NA> <NA>
## 3 Q&A Q&A
## 4 Bits <NA>
## 5 Bits <NA>
## 6 You're the Boss <NA>
## byline.original byline.organization
## 1 By JARON LANIER <NA>
## 2 Vanessa Perez, Rebekah Fergusson and Jason Blalock <NA>
## 3 By J. D. BIERSDORFER <NA>
## 4 By MIKE ISAAC <NA>
## 5 By MARK SCOTT and DAVID JOLLY <NA>
## 6 By EILENE ZIMMERMAN <NA>
## web_url
## 285 https://www.nytimes.com/2014/07/06/us/foreign-couples-heading-to-america-for-surrogate-pregnancies.html
## 286 https://www.nytimes.com/2014/07/02/us/many-sharp-turns-in-bergdahls-path-to-army.html
## 287 https://sports.blogs.nytimes.com/2014/06/24/italy-vs-uruguay-world-cup-2014-live-blog/
## 288 https://www.nytimes.com/2014/06/22/travel/unplugging-in-the-unofficial-capital-of-yoga.html
## 289 https://thecaucus.blogs.nytimes.com/2014/06/24/updates-from-the-mississippi-senate-runoff/
## 290 https://www.nytimes.com/2014/06/22/business/a-job-seekers-desperate-choice.html
## snippet
## 285 Foreign citizens now make up most of the clients at large surrogacy agencies in the United States, highlighting a divide between the country and much of the world over fundamental questions on family.
## 286 People who knew Sgt. Bowe Bergdahl in Idaho paint a fairly consistent portrait: hard-working and socially awkward, full of restless energy and romantic plans.
## 287 These teams might have expected to be fighting for first place in the group in this game. Instead they both lost to the astounding Costa Rica and are fighting for survival.
## 288 With her cellphone turned off, the author journeys to Rishikesh, India, where some people think yoga was born.
## 289 Voters head to the polls Tuesday in a Republican Senate primary runoff between Senator Thad Cochran, a six-term incumbent, and State Senator Chris McDaniel, his Tea Party-backed challenger.
## 290 The story of Shanesha Taylor, a mother who had a job interview but was unable to find child care, shows the harsh realities of today’s economy.
## pub_date document_type new_desk type_of_material
## 285 2014-07-05T21:11:20+0000 article National News
## 286 2014-07-02T00:30:59+0000 article National News
## 287 2014-06-24T09:57:38Z blogpost Sports Blog
## 288 2014-06-22T00:00:00Z article Travel News
## 289 2014-06-24T09:50:43Z blogpost National Blog
## 290 2014-06-21T15:34:43+0000 article SundayBusiness News
## word_count score section_name
## 285 3375 6.217635e-08 <NA>
## 286 2715 6.101015e-08 <NA>
## 287 2937 5.718930e-08 <NA>
## 288 2728 5.600999e-08 <NA>
## 289 2806 5.497112e-08 Politics
## 290 3085 5.392906e-08 <NA>
## headline.main
## 285 Coming to U.S. for Baby, and Womb to Carry It
## 286 Many Sharp Turns in Bergdahl’s Path to Army
## 287 Italy vs. Uruguay: World Cup 2014 Live Blog
## 288 Unplugging in the Unofficial Capital of Yoga
## 289 Highlights From the Mississippi Senate Primary Runoff
## 290 A Job Seeker’s Desperate Choice
## headline.kicker headline.content_kicker
## 285 Pregnancy for Pay Pregnancy for Pay
## 286
## 287 Sports <NA>
## 288 Pursuits Pursuits
## 289 The Caucus <NA>
## 290
## byline.original byline.organization
## 285 By TAMAR LEWIN <NA>
## 286 By KIRK JOHNSON and MATT FURBER <NA>
## 287 By JEFFREY MARCUS <NA>
## 288 By MARY PILON <NA>
## 289 By THE NEW YORK TIMES The New York Times
## 290 By SHAILA DEWAN <NA>
head(facebook_2018_articles);tail(facebook_2018_articles)
## web_url
## 1 https://www.nytimes.com/2018/03/19/technology/facebook-data-sharing.html
## 2 https://www.nytimes.com/2018/03/24/opinion/sunday/delete-facebook-does-not-fix-problem.html
## 3 https://www.nytimes.com/2018/03/21/opinion/facebook-trump-election.html
## 4 https://www.nytimes.com/2018/03/19/opinion/facebook-privacy-breach.html
## 5 https://www.nytimes.com/2018/03/19/opinion/facebook-cambridge-analytica-privacy.html
## 6 https://www.nytimes.com/video/technology/100000005811544/why-leaving-facebook-doesnt-always-mean-quitting-facebook.html
## snippet
## 1 Sure, third-party Facebook apps collected data about users’ lives. But they seemed convenient and harmless, and, really, what could go wrong?
## 2 Getting rid of your Facebook account will only offload the platform’s problems onto someone else.
## 3 Was it the reason Trump won? That’s the wrong question.
## 4 Readers suggest ways of preventing the next occurrence.
## 5 After learning how advisers to Donald Trump exploited the company’s vulnerabilities to get him elected, Congress needs to strengthen privacy laws.
## 6 In the wake of the Cambridge Analytica scandal, in which data from over 50 million Facebook profiles was secretly scraped and mined for voter insights, many Facebook users have decided to delete their accounts — but untangling yourself from a site...
## pub_date document_type new_desk type_of_material
## 1 2018-03-19T23:46:17+0000 article Business News
## 2 2018-03-24T18:22:27+0000 article OpEd Op-Ed
## 3 2018-03-21T11:57:10+0000 article OpEd Op-Ed
## 4 2018-03-19T16:41:45+0000 article Letters Letter
## 5 2018-03-20T01:26:51+0000 article Editorial Editorial
## 6 2018-03-22T02:11:15+0000 multimedia <NA> Video
## word_count score section_name
## 1 1034 1.6520802 <NA>
## 2 1031 1.4591359 Sunday Review
## 3 522 1.1376616 <NA>
## 4 145 1.1008341 <NA>
## 5 681 0.9324561 <NA>
## 6 421 0.8981754 <NA>
## headline.main headline.kicker
## 1 How Facebook’s Data Sharing Went From Feature to Bug The Shift
## 2 Don’t Delete Facebook. Do Something About It. Opinion
## 3 Facebook Doesn’t Get It Op-Ed Columnist
## 4 Crisis at Facebook Letters
## 5 Facebook Leaves Its Users’ Privacy Vulnerable Editorial
## 6 Why Leaving Facebook Doesn’t Always Mean Quitting <NA>
## headline.content_kicker
## 1 The Shift
## 2 Opinion
## 3 Op-Ed Columnist
## 4 Letters
## 5 Editorial
## 6 <NA>
## byline.original
## 1 By KEVIN ROOSE
## 2 By SIVA VAIDHYANATHAN
## 3 By DAVID LEONHARDT
## 4 <NA>
## 5 By THE EDITORIAL BOARD
## 6 By AINARA TIEFENTHÄLER, DEBORAH ACOSTA and ROBIN STEIN
## byline.organization
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 THE EDITORIAL BOARD
## 6 <NA>
## web_url
## 471 https://www.nytimes.com/2018/03/24/world/europe/sweden-gender-neutral-preschools.html
## 472 https://www.nytimes.com/2018/03/19/dining/maryland-stuffed-ham.html
## 473 https://www.nytimes.com/2018/03/19/style/hashtag-open-house.html
## 474 https://www.nytimes.com/2018/03/24/world/asia/afghanistan-kabul-terrorism.html
## 475 https://www.nytimes.com/2018/03/23/us/sexual-harassment-workplace-response.html
## 476 https://www.nytimes.com/2018/03/17/business/catharine-mackinnon-gretchen-carlson.html
## snippet
## 471 The state curriculum urges teachers to “counteract traditional gender roles,” and at one school, girls are encouraged to shout “No!” and boys run the play kitchen.
## 472 It’s a lot of work, but this traditional dish remains one of America’s most regional, and revered, specialties.
## 473 Influencers in Los Angeles and New York are saving us from dreary old open houses. Maybe you can get some cheap rent out of your Snapchat?
## 474 A Times reporter in Kabul has experienced eight suicide bombings since 2016, seven in the last year. Sick of it, she still goes to the scene. “If I don’t, who will?”
## 475 Corporations, entrepreneurs and lawmakers are stepping up efforts to prevent sexual harassment and expand worker protections. Is it enough?
## 476 The two discuss sexual harassment in the workplace, how to change corporate culture in meaningful and sustained ways, and whether Miss America can be relevant in the #MeToo era.
## pub_date document_type new_desk type_of_material
## 471 2018-03-24T09:30:10+0000 article Foreign News
## 472 2018-03-19T15:56:05+0000 article Dining News
## 473 2018-03-19T09:00:01+0000 article Styles News
## 474 2018-03-24T07:00:05+0000 article Insider News
## 475 2018-03-23T09:00:07+0000 article Investigative News
## 476 2018-03-17T09:00:08+0000 article SundayBusiness News
## word_count score section_name
## 471 1785 0.0010275886 Europe
## 472 1869 0.0010267762 <NA>
## 473 1796 0.0010259928 <NA>
## 474 1607 0.0010162268 <NA>
## 475 2070 0.0009004306 <NA>
## 476 3102 0.0007119151 <NA>
## headline.main
## 471 In Sweden’s Preschools, Boys Learn to Dance and Girls Learn to Yell
## 472 In This Corner of Maryland, Holidays Mean a Stuffed Ham
## 473 Hashtag Open House
## 474 This Is What I Do When I Hear the Bombs Explode
## 475 #MeToo Called for an Overhaul. Are Workplaces Really Changing?
## 476 Catharine MacKinnon and Gretchen Carlson Have a Few Things to Say
## headline.kicker headline.content_kicker byline.original
## 471 <NA> <NA> By ELLEN BARRY
## 472 <NA> <NA> By KIM SEVERSON
## 473 <NA> <NA> By CANDACE JACKSON
## 474 <NA> <NA> By FATIMA FAIZI
## 475 <NA> <NA> By JODI KANTOR
## 476 Table for Three Table for Three By PHILIP GALANES
## byline.organization
## 471 <NA>
## 472 <NA>
## 473 <NA>
## 474 <NA>
## 475 <NA>
## 476 <NA>
facebook_2014_articles <- data.frame(Targeted.topic = "2014 Facebook psychological experiment scandal",
Article.index = 1:nrow(facebook_2014_articles),
facebook_2014_articles,
stringsAsFactors=FALSE)
facebook_2018_articles <- data.frame(Targeted.topic = "2018 Facebook Cambridge Analytica scandal",
Article.index = 1:nrow(facebook_2018_articles),
facebook_2018_articles,
stringsAsFactors=FALSE)
facebook_2014_vs_2018_scandal_articles <- rbind(facebook_2014_articles,facebook_2018_articles)
facebook_2014_keywords_collapsed <- data.frame(Targeted.topic = "2014 Facebook psychological experiment scandal",
facebook_2014_keywords_collapsed,
stringsAsFactors=FALSE)
facebook_2018_keywords_collapsed <- data.frame(Targeted.topic = "2018 Facebook Cambridge Analytica scandal",
facebook_2018_keywords_collapsed,
stringsAsFactors=FALSE)
facebook_2014_vs_2018_keywords <- rbind(facebook_2014_keywords_collapsed,facebook_2018_keywords_collapsed)
Based on the head and tail results, we see that some articles are probably not really what we were looking for.
Let’s filter for articles where value = “Facebook Inc” in the keywords table.
articles_matching_keywords_info <- facebook_2014_vs_2018_keywords %>% filter(value == "Facebook Inc") %>% select(c("Targeted.topic","Article.index"))
articles_matching_keywords_info <- articles_matching_keywords_info[!duplicated(articles_matching_keywords_info),]
facebook_2014_vs_2018_scandal_articles <- merge(facebook_2014_vs_2018_scandal_articles,articles_matching_keywords_info,by=c("Targeted.topic","Article.index"))
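The filter works in three steps: pick the keyword rows with the wanted value, reduce them to unique (topic, article index) pairs, and merge those pairs with the articles table so that only matching articles remain. A small sketch of the same idea in base R on toy data (all values here are made up for illustration):

```r
# Toy articles table: two topics, identified by topic plus article index.
articles <- data.frame(Targeted.topic = c("A", "A", "B"),
                       Article.index = c(1, 2, 1),
                       headline = c("h1", "h2", "h3"),
                       stringsAsFactors = FALSE)

# Toy keywords table; article 1 of topic A lists "Facebook Inc" twice.
keywords <- data.frame(Targeted.topic = c("A", "A", "A", "B"),
                       Article.index = c(1, 1, 2, 1),
                       value = c("Facebook Inc", "Facebook Inc",
                                 "Twitter", "Facebook Inc"),
                       stringsAsFactors = FALSE)

# Keep one row per article that has the keyword. Without unique(), an article
# whose keyword appears twice would show up twice after the merge.
matching <- unique(keywords[keywords$value == "Facebook Inc",
                            c("Targeted.topic", "Article.index")])

# merge() defaults to an inner join, so articles without a match are dropped.
kept <- merge(articles, matching, by = c("Targeted.topic", "Article.index"))
kept
```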
How many articles are we left with for each time period to be studied?
table(facebook_2014_vs_2018_scandal_articles$Targeted.topic)
##
## 2014 Facebook psychological experiment scandal
## 24
## 2018 Facebook Cambridge Analytica scandal
## 47
That is not really enough for the kind of detailed analysis I had hoped to do.
Let’s convert pub_date from a character string to an actual Date.
facebook_2014_vs_2018_scandal_articles$pub_date <- as.Date(substr(facebook_2014_vs_2018_scandal_articles$pub_date,1,10))
facebook_2014_vs_2018_scandal_articles <- facebook_2014_vs_2018_scandal_articles %>% arrange(pub_date)
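Both timestamp styles seen in pub_date (ending in "+0000" or in "Z") start with the date in YYYY-MM-DD form, which is why keeping the first 10 characters is enough. A small sketch on two values copied from the output above:

```r
# Two pub_date values, one in each of the timestamp styles that appear.
pub_date <- c("2014-06-30T23:30:04+0000", "2014-07-02T17:28:43Z")

# Characters 1 to 10 hold the date; as.Date() parses "YYYY-MM-DD" by default.
dates <- as.Date(substr(pub_date, 1, 10))

format(dates)  # the dates as "YYYY-MM-DD" strings
diff(dates)    # Date objects support arithmetic, such as days between articles
```

Both styles mark the time as UTC, so the result is the UTC calendar date, which can differ from the New York date for articles published late in the evening.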
Let’s show some of the information we might be most interested in for each topic.
Then I think we are good to save for now, keeping the data ready to analyze another time.
facebook_2014_vs_2018_scandal_articles %>%
filter(Targeted.topic == "2014 Facebook psychological experiment scandal") %>%
select(c("pub_date","snippet","headline.main"))
## pub_date
## 1 2014-06-18
## 2 2014-06-19
## 3 2014-06-21
## 4 2014-06-23
## 5 2014-06-23
## 6 2014-06-25
## 7 2014-06-26
## 8 2014-06-26
## 9 2014-06-27
## 10 2014-06-27
## 11 2014-06-27
## 12 2014-06-28
## 13 2014-06-29
## 14 2014-06-30
## 15 2014-06-30
## 16 2014-06-30
## 17 2014-07-01
## 18 2014-07-02
## 19 2014-07-02
## 20 2014-07-03
## 21 2014-07-03
## 22 2014-07-03
## 23 2014-07-03
## 24 2014-07-06
## snippet
## 1 The social media giant announced that it had created a new kind of computer networking switch, potentially capable of shifting data rapidly through the largest data centers.
## 2 Users were unable to log into their accounts for more than a half-hour, though Facebook said later it had “resolved the issue quickly, and we are now back to 100 percent.”
## 3 Social Sweepster, a new service, says it can scan photos for telltale signs of youthful indiscretions, like red party cups. It joins several services trying to help people erase evidence of behavior prospective employers may not find amusing.
## 4 The Breakthrough Prize in Mathematics, financed by Yuri Milner, a Russian investor, and Mark Zuckerberg, the founder of Facebook, comes with a $3 million award.
## 5 If a business buys ads from Facebook and the ads generate likes from people or accounts who aren’t really interested in the business, advertising dollars have been wasted.
## 6 The social networking company disclosed that 31 percent of its workers globally are women. In the United States, Facebook’s management is overwhelmingly white and male.
## 7 Facebook has argued that Manhattan prosecutors violated the constitutional rights of its users last year by demanding the nearly complete account data of 381 people, from pages they liked to their photos and private messages.
## 8 Plus, how to paste text from a web page without also getting all the strange formatting.
## 9 It is a rare product misstep for Facebook. Its Home software was supposed to turn an Android smartphone into a Facebook phone. But it never caught on with users.
## 10 The New York district attorney’s office demanded account details of 381 people for an investigation that led to indictments on Social Security fraud charges.
## 11 The New York district attorney’s office demanded account details of 381 people for an investigation that led to indictments on Social Security fraud charges.
## 12 Instead of combing through profiles on dating sites, some people prefer making their connections on Facebook or Instagram.
## 13 The Islamic State in Iraq and Syria has demonstrated modern sophistication in its adoption of social media, particularly Twitter, where its hashtags have gained jihadist followers.
## 14 As guinea pigs, we deserve to know what researchers are doing.
## 15 Last week Facebook revealed that it had manipulated the news feeds of over half a million randomly selected users to change the number of positive and negative posts they saw.
## 16 Facebook apologized for its study of how people’s emotions are affected by social media posts. And it’s not the first mea culpa for the company.
## 17 A man says the violent rap lyrics he posted on social media were art, but he was locked up for threatening his ex-wife. The Supreme Court will weigh in next term.
## 18 The social network is facing potential investigations in Europe on whether it broke local privacy laws by manipulating the emotional content of users’ posts without their consent.
## 19 Molly Wood explains how to download and delete activity on Facebook and Twitter.
## 20 Studying how social media sites are used can provide valuable insight into human behavior and may also help curb their power and potential for abuse.
## 21 A reader responds to an Op-Ed article, “Should Facebook Manipulate Users?”
## 22 Those concerned about their online profiles have a range of options, from deactivating their accounts completely to limiting who sees past posts.
## 23 The Electronic Privacy Information Center said Facebook violated a consent decree with regulators by manipulating some of its users’ news feeds without their explicit consent.
## 24 Notable quotes from business articles that appeared in The New York Times last week.
## headline.main
## 1 Facebook Makes Its Own Computer Networking Switch
## 2 Facebook Service Restored After Worldwide Cutoff
## 3 New Offering for Job Seekers: Fewer Embarrassing Social Media Photos
## 4 The Multimillion-Dollar Minds of 5 Mathematical Masters
## 5 When Advertising on Facebook Can Be a Waste of Money
## 6 Facebook Mirrors Tech Industry's Lack of Diversity
## 7 Facebook Case Over Search Warrants for User Information
## 8 Finding Old Posts on the Facebook Timeline
## 9 What Happened to the Facebook Phone? Not Very Much, It Seems
## 10 Daily Report: Effort by Facebook to Safeguard Data From the Law Fails, For Now
## 11 Forced to Hand Over Data, Facebook Files Appeal
## 12 Cupid’s Arrows Fly on Social Media, Too
## 13 Iraq’s Sunni Militants Take to Social Media to Advance Their Cause and Intimidate
## 14 Should Facebook Manipulate Users?
## 15 Facebook Tinkers With Users’ Emotions in News Feed Experiment, Stirring Outcry
## 16 Facebook Says It's Sorry. We've Heard That Before.
## 17 On the Next Docket: How the First Amendment Applies to Social Media
## 18 After Uproar, European Regulators Question Facebook on Psychological Testing
## 19 Edit Your Facebook and Twitter History
## 20 A Bright Side to Facebook’s Experiments on Its Users
## 21 Timeless Manipulators
## 22 Swear Off Social Media, for Good or Just for Now
## 23 Privacy Group Complains to F.T.C. About Facebook Emotion Study
## 24 The Chatter for Sunday, July 6
facebook_2014_vs_2018_scandal_articles %>%
filter(Targeted.topic == "2018 Facebook Cambridge Analytica scandal") %>%
select(c("pub_date","snippet","headline.main"))
## pub_date
## 1 2018-03-18
## 2 2018-03-18
## 3 2018-03-19
## 4 2018-03-19
## 5 2018-03-19
## 6 2018-03-19
## 7 2018-03-19
## 8 2018-03-19
## 9 2018-03-19
## 10 2018-03-19
## 11 2018-03-19
## 12 2018-03-20
## 13 2018-03-20
## 14 2018-03-20
## 15 2018-03-20
## 16 2018-03-20
## 17 2018-03-20
## 18 2018-03-20
## 19 2018-03-20
## 20 2018-03-20
## 21 2018-03-21
## 22 2018-03-21
## 23 2018-03-21
## 24 2018-03-21
## 25 2018-03-21
## 26 2018-03-21
## 27 2018-03-21
## 28 2018-03-21
## 29 2018-03-22
## 30 2018-03-22
## 31 2018-03-22
## 32 2018-03-22
## 33 2018-03-22
## 34 2018-03-22
## 35 2018-03-22
## 36 2018-03-22
## 37 2018-03-22
## 38 2018-03-22
## 39 2018-03-22
## 40 2018-03-22
## 41 2018-03-23
## 42 2018-03-23
## 43 2018-03-23
## 44 2018-03-23
## 45 2018-03-24
## 46 2018-03-24
## 47 2018-03-24
## snippet
## 1 American and British lawmakers called on Facebook to explain how a political data firm tied to the Trump campaign harvested private data from more than 50 million users.
## 2 American and British lawmakers called on Facebook to explain how a political data firm tied to the Trump campaign harvested private data from more than 50 million users.
## 3 Sure, third-party Facebook apps collected data about users’ lives. But they seemed convenient and harmless, and, really, what could go wrong?
## 4 As the social network’s list of woes grows, its 33-year-old founder, Mark Zuckerberg, will have to prove somehow he is not in way over his head.
## 5 It’s true that the Cambridge Analytica incident wasn’t a security breach. It was something far worse.
## 6 A political data firm tied to the Trump campaign gained access to information on 50 million Facebook users. Here is how it happened, and the uproar it has caused.
## 7 Larry Ellison is teaming up with Dr. David Agus to start a hydroponic farming firm focused on creating more healthful food.
## 8 A European Union plan would hit Silicon Valley’s technology giants especially hard, further straining relations with the United States over taxes and trade.
## 9 There are some practical solutions to safeguard some of your data, like installing software to block web tracking technologies.
## 10 Readers suggest ways of preventing the next occurrence.
## 11 Shares of technology companies plunged as investors fretted that tougher government oversight could hurt the sector’s profits.
## 12 How serious of an issue do you think this misuse of data is? Will it make you reconsider how you use social media in any way?
## 13 Alex Stamos, Facebook’s chief information security officer, who plans to leave the company by August, is known in Silicon Valley for his strong stands.
## 14 Fed policymakers conclude their March rate-setting meeting on Wednesday. It is the first such session with Jerome H. Powell as the central bank’s new chairman.
## 15 Undercover video shows the president’s digital consultants acting like thugs.
## 16 Your interest in Kim Kardashian West can tell researchers how extroverted (very), how conscientious (more than most) and how open-minded (only somewhat) you are.
## 17 We need to figure out how to avoid future tragedies.
## 18 The company’s board said it was suspending the chief executive, Alexander Nix, with immediate effect, pending an independent investigation.
## 19 The Federal Trade Commission is said to be examining whether the social media giant violated a 2011 agreement.
## 20 After learning how advisers to Donald Trump exploited the company’s vulnerabilities to get him elected, Congress needs to strengthen privacy laws.
## 21 Brian Acton, one of the creators of WhatsApp, sold his company to the internet giant for $19 billion. Now he’s telling people to “#deletefacebook.”
## 22 In his first public statements concerning a scandal involving Cambridge Analytica, Mark Zuckerberg said “there’s more to do, and we need to step up and do it.”
## 23 Amid a data scandal this week, Mr. Zuckerberg, Facebook’s chief executive, and Sheryl Sandberg, chief operating officer, have been nowhere to be found in public.
## 24 How did the brains behind Cambridge Analytica, the political research firm that worked with the Trump campaign, become its whistle-blower?
## 25 The social network may be too large to truly quit. Our personal tech columnist answers questions from readers who are contemplating deactivation.
## 26 It was television, not Facebook, that made him president.
## 27 Was it the reason Trump won? That’s the wrong question.
## 28 Patrons of the social network are deleting their profiles in protest over reports that the company allowed a political data firm to harvest private information.
## 29 Five days after details about Cambridge Analytica’s data mining were made public, Mark Zuckerberg, Facebook’s chief executive, spoke with The New York Times.
## 30 Five days after details about Cambridge Analytica’s data mining were made public, Mark Zuckerberg, Facebook’s chief executive, spoke with The New York Times.
## 31 Readers discuss how Cambridge Analytica provided information to the Trump campaign.
## 32 Mark Zuckerberg said Facebook’s reliance on advertising aligned with its mission to build a community. But what if Facebook cost $5 per month to use?
## 33 Facebook’s chief executive spoke with The New York Times about data privacy of users, Cambridge Analytica and the company’s next steps.
## 34 The changes appear to address some common gripes, like how some posts can keep appearing on your feed seemingly for days.
## 35 We can blame Facebook and Cambridge Analytica for the damage they’ve done, but the responsibility lies with all of us.
## 36 Ross Trudeau takes social media to a whole new level.
## 37 Till now investors might have thought they could take Mr. Trump’s trade pronouncements in stride. But after Thursday’s selloff, will their optimism return?
## 38 Think first before you retweet that bit of fake news.
## 39 Think first before you retweet that bit of fake news.
## 40 In the wake of the Cambridge Analytica scandal, in which data from over 50 million Facebook profiles was secretly scraped and mined for voter insights, many Facebook users have decided to delete their accounts — but untangling yourself from a site...
## 41 The political action committee founded by John Bolton hired Cambridge Analytica specifically to develop psychological profiles of voters — and it knew the firm was using Facebook data.
## 42 The company harvested data from 50 million Facebook users to develop psychological profiles on behalf of political campaigns, including that of President Trump.
## 43 Mr. Musk deleted the Facebook pages of two of his companies, SpaceX and Tesla. He and the Facebook C.E.O., Mark Zuckerberg, have, er, not always gotten along.
## 44 Mark Zuckerberg held a Wednesday meeting with staff, followed by a regularly scheduled meeting on Friday, partly to discuss the Cambridge Analytica scandal.
## 45 The suffering and spirit of San Juan, P.R. Kayaking across the Atlantic Ocean at 70 years old (for the third time). David Bowie as you’ve never seen him, and more.
## 46 In 2011, the F.T.C. first required a company to create a comprehensive data privacy program for consumers. Europe will soon take another big step.
## 47 Internet companies were built on a model in which people gave up their information for free services. Now, that idea is under siege.
## headline.main
## 1 Facebook’s Role in Data Misuse Sets Off Storms on Two Continents
## 2 Facebook’s Role in Data Misuse Sets Off Storms on Two Continents
## 3 How Facebook’s Data Sharing Went From Feature to Bug
## 4 Is It Time for More Adult Supervision at Facebook?
## 5 Facebook’s Surveillance Machine
## 6 Facebook and Cambridge Analytica: What You Need to Know as Fallout Widens
## 7 Oracle’s Ellison Unveils Hydroponic Farming Start-Up: DealBook Briefing
## 8 Europe’s Planned Digital Tax Heightens Tensions With U.S.
## 9 How to Protect Yourself (and Your Friends) on Facebook
## 10 Crisis at Facebook
## 11 Facebook and Other Tech Companies Drag Down Stock Markets
## 12 Teaching Activities for: ‘Facebook’s Role in Data Misuse Sets Off Storms on Two Continents’
## 13 The End for Facebook’s Security Evangelist
## 14 What to Expect From Powell’s First Fed Meeting: DealBook Briefing
## 15 Trump’s High-Tech Dirty Tricksters
## 16 How Researchers Learned to Use Facebook ‘Likes’ to Sway Your Thinking
## 17 Lessons From the Uber Crash
## 18 Cambridge Analytica Suspends C.E.O. Amid Facebook Data Scandal
## 19 Facebook Faces Growing Pressure Over Data and Privacy Inquiries
## 20 Facebook Leaves Its Users’ Privacy Vulnerable
## 21 Facebook Made Him a Billionaire. Now He’s a Critic.
## 22 Zuckerberg, Facing Facebook’s Worst Crisis Yet, Pledges Better Privacy
## 23 Missing From Facebook’s Crisis: Mark Zuckerberg
## 24 Listen to ‘The Daily’: The Data Harvesters
## 25 Want to #DeleteFacebook? You Can Try
## 26 Trump Hacked the Media Right Before Our Eyes
## 27 Facebook Doesn’t Get It
## 28 For Many Facebook Users, a ‘Last Straw’ That Led Them to Quit
## 29 Listen to ‘The Daily’: Can Facebook Be Fixed?
## 30 Listen to ‘The Daily’: Can Facebook Be Fixed?
## 31 Facebook’s Apology, and Next Steps
## 32 Kevin’s Week in Tech: Zuckerberg’s Answers to Privacy Scandal Raise More Questions
## 33 Mark Zuckerberg’s Reckoning: ‘This Is a Major Trust Issue’
## 34 Instagram Is Changing Its Algorithm. Here’s How.
## 35 How Democracy Can Survive Big Data
## 36 One With a Lot of Tweets
## 37 What’s Next for Stocks After the China Tariffs: DealBook Briefing
## 38 How to Prevent Smart People From Spreading Dumb Ideas
## 39 How to Prevent Smart People From Spreading Dumb Ideas
## 40 Why Leaving Facebook Doesn’t Always Mean Quitting
## 41 Bolton Was Early Beneficiary of Cambridge Analytica’s Facebook Data
## 42 British Authorities Search Offices of Cambridge Analytica
## 43 Elon Musk Joins #DeleteFacebook With a Barrage of Tweets
## 44 Zuckerberg Takes Steps to Calm Facebook Employees
## 45 11 of Our Best Weekend Reads
## 46 Timeline: Facebook and Google Under Regulators’ Glare
## 47 How Calls for Privacy May Upend Business for Facebook and Google