PROJECT TEAM INFORMATION:

README: This RMD contains PART1 of the project. A separate RMD will be submitted for PART2.

PART1: Contains the ELA/Math score analysis and the NYTimes API data analysis.
PART2: Contains the web scrape of a Wikipedia page and the neo4j data model and implementation.

Team Members: Banu Boopalan & Sadia Perveen. For our final project, we decided to combine our initial proposals into one RMD file. The first proposal was to take the assessment dataset on ELA and Math, combine it with FreeReducedPriceLunch.csv, and perform analysis and visualization. The second proposal was to work with data from the NYTimes API and perform sentiment analysis and topic modeling, following https://www.tidytextmining.com/. We also tried to write data to neo4j and query it back into R, to show querying of the web-scraped text data stored as nodes and relationships.

We collaborated over the phone, WhatsApp, and GoToMeeting to share our initial ideas. We thought we could look for education-related topics on ELA and Math assessments for NY schools in the data retrieved from the NYTimes API, but we were not able to figure this out. A possible enhancement to this code would be to connect the dataset analysis to Twitter data or data from other websites to understand sentiment related to ELA/Math performance.

SADIA PERVEEN PROPOSAL: Project Proposal: Since I currently work in the educational field, I wanted to analyze district socioeconomic status (SES) compared with yearly test scores. My two data sources would come from https://data.nysed.gov/downloads.php. This database contains assessment data for grades 3-8 on ELA and Math at the state, county, district, and school level, broken down by various subgroups. This is where I have data on ELA and Math test scores for NY schools and districts. The question I am looking to answer is whether schools with higher SES have higher ELA and Math scores. First, I would need to combine data from both of my sources. Second, I would need to conduct an analysis to see whether the difference is significant. Third, I would need to display my results in a visualization that is easy to understand.

BANU BOOPALAN PROPOSAL: https://rpubs.com/BanuB/551009.

ALL DATA SOURCES:

ANALYSIS 2:

Topic of interest: #MeToo. We reviewed the most popular NYTimes articles shared on Twitter and Facebook. We then looked at October and November of 2017 and 2018, at the height of the movement, to see the topics in these datasets. Although data was available, the Archive did not contain a significant number of articles related to the #MeToo movement for those months. The primary topics found were related to President Trump. The API returned a significant amount of data even with a page count of 5-8; the JSON API call allows a maximum of 100 pages. The analysis was therefore performed only on the NYTimes des_facet descriptions and the article summary text returned in these fields. We are not sure whether there is a better way to obtain the full NYTimes articles (I heard that NYTimes articles cannot be scraped directly); for comparison, other news agency APIs such as BBC and Al Jazeera could be used to pull articles on this topic. The topic model used LDA to extract topics. We tried to run a cluster dendrogram but had issues with it.

Conclusions: A single run of the LDA model took so long that I had to quit the session, and my R session crashed multiple times when I tried to run the RMD file. I might need to understand the LDA model better to figure out how to run it. Running bigram and unigram token analysis to discover the vertices and plotting them with igraph was a great way to visualize the data.
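To illustrate how bigram tokens can become the vertices and edges of a word graph, here is a minimal base R sketch. It is not the tidytext pipeline used in this project; the function names `bigrams` and `bigram_edges` are hypothetical, and the two sample texts are abstracts that appear later in this report.

```r
# Split one text into lowercase words and pair each word with the next one.
# Anything that is not an ASCII letter becomes a separator in this sketch.
bigrams <- function(text) {
  words <- strsplit(tolower(gsub("[^A-Za-z]", " ", text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# Count bigrams over several texts and return an edge list (from, to, n).
# Each distinct word is a vertex; each bigram is a weighted edge.
bigram_edges <- function(texts) {
  counts <- table(unlist(lapply(texts, bigrams)))
  parts <- do.call(rbind, strsplit(names(counts), " "))
  data.frame(from = parts[, 1], to = parts[, 2], n = as.integer(counts),
             stringsAsFactors = FALSE)
}

texts <- c("Was she the reason he was alive today?",
           "My brother dishes on Trump, impeachment and the 2020 field.")
edges <- bigram_edges(texts)
head(edges[order(-edges$n), ])
```

An edge list of this shape could then be passed to a graph library for plotting; filtering on `n` before plotting keeps the graph readable.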

First API call and dataset: all articles shared in the current month.
https://api.nytimes.com/svc/mostpopular/v2/shared/30/twitter.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG
https://api.nytimes.com/svc/mostpopular/v2/shared/30/facebook.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG

Second API call and dataset: Here I tried to run the Archive call for the years 2019, 2018, and 2017, looking at months 10, 11, and 12, to see the frequency of #MeToo in the des_facet field of the returned pages. I renamed these 2 datasets and saved them to Rdata so that I can reload them instead of calling the API over and over.
https://api.nytimes.com/svc/archive/v1/2018/11.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG
https://api.nytimes.com/svc/archive/v1/2017/12.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG
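The call-once-then-reload pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code used for this report: the helper names `nyt_archive_url` and `fetch_cached` and the environment variable `NYT_API_KEY` are hypothetical, and the sketch caches with `saveRDS()`/`readRDS()` rather than an Rdata file. The default fetcher is `jsonlite::fromJSON`, which is only needed when the API is actually called.

```r
# Build an Archive API URL for one year/month.
# The key is read from an environment variable (hypothetical name NYT_API_KEY)
# so that it does not have to be written into the document.
nyt_archive_url <- function(year, month, api_key = Sys.getenv("NYT_API_KEY")) {
  sprintf("https://api.nytimes.com/svc/archive/v1/%d/%d.json?api-key=%s",
          as.integer(year), as.integer(month), api_key)
}

# Return the cached response if the file exists; otherwise call the API once
# and save the result so that later runs do not hit the API again.
fetch_cached <- function(url, cache_file, fetch = jsonlite::fromJSON) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  response <- fetch(url)
  saveRDS(response, cache_file)
  response
}

# Example usage (not run here because it needs a valid key and network access):
# nov2018 <- fetch_cached(nyt_archive_url(2018, 11), "archive_2018_11.rds")
```

Passing the fetch function as an argument also makes it possible to try the caching logic with a stand-in function before spending real API calls.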

Summarize both responses and print what was collected as shared on Facebook

#facebook
summary(myresponse4)
##             Length Class      Mode     
## status       1     -none-     character
## copyright    1     -none-     character
## num_results  1     -none-     numeric  
## results     24     data.frame list
summary(myresponse3)
##             Length Class      Mode     
## status       1     -none-     character
## copyright    1     -none-     character
## num_results  1     -none-     numeric  
## results     24     data.frame list
length(myresponse4$results)
## [1] 24
length(myresponse3$results)
## [1] 24
print(myresponse4$results$url[1])
## [1] "https://www.nytimes.com/2019/07/31/us/politics/trump-navy-seal-war-crimes.html"
print(myresponse4$results$abstract[1])
## [1] "President Trump lashed out at military lawyers who tried the case of Edward Gallagher, a Navy SEAL who was acquitted of killing a captured teenage Islamic State fighter."
print(myresponse4$results$adx_keywords[1])
## [1] "United States Defense and Military Forces;United States Politics and Government;Awards, Decorations and Honors;War Crimes, Genocide and Crimes Against Humanity;United States Navy;Gallagher, Edward (1979- );Trump, Donald J"
print(myresponse4$results$url[2])
## [1] "https://www.nytimes.com/2019/11/27/nyregion/hull-o-farm-catskills.html"
print(myresponse4$results$abstract[2])
## [1] "The aging owners of a Catskills farm say it “has to close so we can survive.”"
print(myresponse4$results$adx_keywords[2])
## [1] "Agriculture and Farming;Hull-O Farms (Durham, NY);Catskills (NYS Area);Family Business;Pensions and Retirement Plans"

Summarize and Print what was collected as shared on Twitter

#twitter
print(myresponse3$results$url[1])
## [1] "https://www.nytimes.com/2019/12/08/nyregion/auschwitz-love-story.html"
print(myresponse3$results$abstract[1])
## [1] "Was she the reason he was alive today?"
print(myresponse3$results$adx_keywords[1])
## [1] "Holocaust and the Nazi Era;Concentration Camps;Jews and Judaism;Auschwitz Concentration Camp;Europe;Manhattan (NYC);World War II (1939-45);War Crimes, Genocide and Crimes Against Humanity;Wisnia, David;Tichauer, Helen (1918-2018);Dating and Relationships"
print(myresponse3$results$url[2])
## [1] "https://www.nytimes.com/2019/12/07/opinion/sunday/katie-hill-resignation.html"
print(myresponse3$results$abstract[2])
## [1] "I overcame the desperation I felt after stepping down from Congress, and I’m still in the fight."
print(myresponse3$results$adx_keywords[2])
## [1] "United States Politics and Government;Elections, House of Representatives;Women and Girls;House of Representatives;Democratic Party;United States;California;Hill, Katie (1987- )"

Bind the Facebook and Twitter shared-article datasets together

#bind both results together to a new dataset for facebook/twitter shares
new.df <- rbind(df3, df4) 
head(new.df)
##                                                                                             url
## 1                         https://www.nytimes.com/2019/12/08/nyregion/auschwitz-love-story.html
## 2                 https://www.nytimes.com/2019/12/07/opinion/sunday/katie-hill-resignation.html
## 3                        https://www.nytimes.com/2019/11/13/movies/tom-hanks-mister-rogers.html
## 4 https://www.nytimes.com/2019/11/30/business/david-boies-pottinger-jeffrey-epstein-videos.html
## 5                        https://www.nytimes.com/2019/11/29/us/politics/kamala-harris-2020.html
## 6                          https://www.nytimes.com/2019/12/05/magazine/alex-jones-infowars.html
##                                                                                                                                                                                                                                                                                                             adx_keywords
## 1                                                         Holocaust and the Nazi Era;Concentration Camps;Jews and Judaism;Auschwitz Concentration Camp;Europe;Manhattan (NYC);World War II (1939-45);War Crimes, Genocide and Crimes Against Humanity;Wisnia, David;Tichauer, Helen (1918-2018);Dating and Relationships
## 2                                                                                                                                      United States Politics and Government;Elections, House of Representatives;Women and Girls;House of Representatives;Democratic Party;United States;California;Hill, Katie (1987- )
## 3                                                                                                                                                                                                                                     Hanks, Tom;Actors and Actresses;Movies;A Beautiful Day in the Neighborhood (Movie)
## 4 Epstein, Jeffrey E (1953- );Kessler, Patrick;Boies, David;Pottinger, John Stanley;Sex Crimes;Child Abuse and Neglect;Prostitution;Human Trafficking;Video Recordings, Downloads and Streaming;#MeToo Movement;Legal Profession;High Net Worth Individuals;Extortion and Blackmail;Whistle-Blowers;Frauds and Swindling
## 5                                                                                                                                                                                      Harris, Kamala D;Presidential Election of 2020;Democratic Party;Iowa;Primaries and Caucuses;United States Politics and Government
## 6                                                                                                                                                                                  Infowars;Jones, Alex (1974- );Fringe Groups and Movements;United States Politics and Government;Radio;Islamberg (NY);Muslim Americans
##      subsection share_count     count_type  column eta_id  section    id
## 1                         1 SHARED-TWITTER    <NA>      0 New York 1e+14
## 2 sunday review           2 SHARED-TWITTER    <NA>      0  Opinion 1e+14
## 3                         3 SHARED-TWITTER    <NA>      0   Movies 1e+14
## 4                         4 SHARED-TWITTER    <NA>      0 Business 1e+14
## 5      politics           5 SHARED-TWITTER    <NA>      0     U.S. 1e+14
## 6                         6 SHARED-TWITTER Feature      0 Magazine 1e+14
##   asset_id nytdsection
## 1    1e+14    new york
## 2    1e+14     opinion
## 3    1e+14      movies
## 4    1e+14    business
## 5    1e+14        u.s.
## 6    1e+14    magazine
##                                                                       byline
## 1                                                         By KEREN BLANKFELD
## 2                                                              By KATIE HILL
## 3                                                   By TAFFY BRODESSER-AKNER
## 4 By JESSICA SILVER-GREENBERG, EMILY STEEL, JACOB BERNSTEIN and DAVID ENRICH
## 5                  By JONATHAN MARTIN, ASTEAD W. HERNDON and ALEXANDER BURNS
## 6                                                              By JOSH OWENS
##      type
## 1 Article
## 2 Article
## 3 Article
## 4 Article
## 5 Article
## 6 Article
##                                                                title
## 1 Lovers in Auschwitz, Reunited 72 Years Later. He Had One Question.
## 2                                Katie Hill: It’s Not Over After All
## 3                   This Tom Hanks Story Will Help You Feel Less Bad
## 4              Jeffrey Epstein, Blackmail and a Lucrative ‘Hot List’
## 5                             How Kamala Harris’s Campaign Unraveled
## 6                              I Worked for Alex Jones. I Regret It.
##                                                                                                                                                                                                                         abstract
## 1                                                                                                                                                                                         Was she the reason he was alive today?
## 2                                                                                                                               I overcame the desperation I felt after stepping down from Congress, and I’m still in the fight.
## 3                                                                                                         Hanks is playing Mister Rogers in a new movie and is just as nice as you think he is. Please read this article anyway.
## 4                                                                              A shadowy hacker claimed to have the financier’s sex tapes. Two top lawyers wondered: What would the men in those videos pay to keep them secret?
## 5 Ms. Harris is the only 2020 Democrat who has fallen hard out of the top tier of candidates. She has proved to be an uneven campaigner who changes her message and tactics to little effect and has a staff torn into factions.
## 6                                                                                  I dropped out of film school to edit video for the conspiracy theorist because I believed in his worldview. Then I saw what it did to people.
##   published_date             source             updated
## 1     2019-12-08 The New York Times 2019-12-11 19:58:55
## 2     2019-12-07 The New York Times 2019-12-09 17:05:38
## 3     2019-11-13 The New York Times 2019-12-04 14:51:05
## 4     2019-11-30 The New York Times 2019-12-09 18:42:14
## 5     2019-11-29 The New York Times 2019-12-06 16:50:46
## 6     2019-12-05 The New York Times 2019-12-11 19:58:55
##                                                                                     des_facet
## 1                                        HOLOCAUST AND THE NAZI ERA, DATING AND RELATIONSHIPS
## 2 UNITED STATES POLITICS AND GOVERNMENT, ELECTIONS, HOUSE OF REPRESENTATIVES, WOMEN AND GIRLS
## 3                                                                                      MOVIES
## 4                                 VIDEO RECORDINGS, DOWNLOADS AND STREAMING, LEGAL PROFESSION
## 5                                                               PRESIDENTIAL ELECTION OF 2020
## 6                                     UNITED STATES POLITICS AND GOVERNMENT, MUSLIM AMERICANS
##                                                                                                                                                                           org_facet
## 1                                     CONCENTRATION CAMPS, JEWS AND JUDAISM, AUSCHWITZ CONCENTRATION CAMP, WORLD WAR II (1939-45), WAR CRIMES, GENOCIDE AND CRIMES AGAINST HUMANITY
## 2                                                                                                                                        HOUSE OF REPRESENTATIVES, DEMOCRATIC PARTY
## 3                                                                                                                                                              ACTORS AND ACTRESSES
## 4 SEX CRIMES, CHILD ABUSE AND NEGLECT, PROSTITUTION, HUMAN TRAFFICKING, #METOO MOVEMENT, HIGH NET WORTH INDIVIDUALS, EXTORTION AND BLACKMAIL, WHISTLE-BLOWERS, FRAUDS AND SWINDLING
## 5                                                                                                   DEMOCRATIC PARTY, PRIMARIES AND CAUCUSES, UNITED STATES POLITICS AND GOVERNMENT
## 6                                                                                                                                      INFOWARS, FRINGE GROUPS AND MOVEMENTS, RADIO
##                                                                              per_facet
## 1                                           WISNIA, DAVID, TICHAUER, HELEN (1918-2018)
## 2                                                                 HILL, KATIE (1987- )
## 3                                                                           HANKS, TOM
## 4 EPSTEIN, JEFFREY E (1953- ), KESSLER, PATRICK, BOIES, DAVID, POTTINGER, JOHN STANLEY
## 5                                                                     HARRIS, KAMALA D
## 6                                                                 JONES, ALEX (1974- )
##                   geo_facet
## 1   EUROPE, MANHATTAN (NYC)
## 2 UNITED STATES, CALIFORNIA
## 3                          
## 4                          
## 5                      IOWA
## 6            ISLAMBERG (NY)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        media
## 1                                                                                                                                             image, photo, David Wisnia at his home in Pennsylvania., Danna Singer for The New York Times, 1, https://static01.nyt.com/images/2019/12/08/nyregion/08aushchwitz-lovers1/08aushchwitz-lovers1-thumbStandard.jpg, https://static01.nyt.com/images/2019/12/08/nyregion/08aushchwitz-lovers1/08aushchwitz-lovers1-mediumThreeByTwo210-v3.jpg, https://static01.nyt.com/images/2019/12/08/nyregion/08aushchwitz-lovers1/08aushchwitz-lovers1-mediumThreeByTwo440-v3.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 2                                                                                                                                                                                         image, photo, , Damon Winter/The New York Times, 1, https://static01.nyt.com/images/2019/12/08/opinion/08Hill/08Hill-thumbStandard.jpg, https://static01.nyt.com/images/2019/12/08/opinion/08Hill/merlin_165012912_141192c4-7734-4f90-b842-017f5a9deab4-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/12/08/opinion/08Hill/merlin_165012912_141192c4-7734-4f90-b842-017f5a9deab4-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 3         image, photo, “I recognized in myself a long time ago that I don’t instill fear in anybody,” Hanks said. “Now, that’s different than being nice, you know? I think I have a cache of mystery. But it’s not one of malevolence.”, Daniel Dorsa for The New York Times, 1, https://static01.nyt.com/images/2019/11/17/arts/17tom-hanks2-promo/17tom-hanks2-thumbStandard.jpg, https://static01.nyt.com/images/2019/11/17/arts/17tom-hanks2-promo/17tom-hanks2-promo-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/11/17/arts/17tom-hanks2-promo/17tom-hanks2-promo-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 4                                                                                                                                                                                                                              image, photo, NA, Stephanie Diani for The New York Times, 1, https://static01.nyt.com/images/2019/12/01/autossell/30hotlist-promo/_Horizontal-thumbStandard.jpg, https://static01.nyt.com/images/2019/12/01/autossell/30hotlist-promo/_Horizontal-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/12/01/autossell/30hotlist-promo/_Horizontal-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 5 image, photo, Senator Kamala Harris’s abundant political skills convinced many Democrats that she had the potential to take on President Trump., Daniel Acker for The New York Times, 1, https://static01.nyt.com/images/2019/11/30/us/politics/30harris-jump1-alt/29harris1-thumbStandard.jpg, https://static01.nyt.com/images/2019/11/30/us/politics/30harris-jump1-alt/merlin_162352476_24d90952-c16b-4099-8860-d097259f3c74-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/11/30/us/politics/30harris-jump1-alt/merlin_162352476_24d90952-c16b-4099-8860-d097259f3c74-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 6                                                                                                                                                                                                                                             image, photo, , Illustration by Eric Yahnker, 1, https://static01.nyt.com/images/2019/12/08/magazine/08Mag-Jones-1/08Mag-Jones-1-thumbStandard.jpg, https://static01.nyt.com/images/2019/12/08/magazine/08Mag-Jones-1/08Mag-Jones-1-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/12/08/magazine/08Mag-Jones-1/08Mag-Jones-1-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
##                                                  uri
## 1 nyt://article/8edf8c46-8166-5577-992e-2f4cd9cd081c
## 2 nyt://article/1d5686e2-3ef5-537b-a505-8ad67b29a042
## 3 nyt://article/8837752f-6207-55e9-8cc3-b811091e3a86
## 4 nyt://article/488da5b2-6476-5809-9718-6691d6cc5b65
## 5 nyt://article/70bf6582-a21c-5d54-8207-6db42dccfb30
## 6 nyt://article/bd5f4671-3587-5025-96f8-802a2a4b4463
#str(new.df)
#summary(new.df)

NRC and Valence Sentiment Analysis

Get the NRC sentiment scores and a valence value for each shared article, based on the abstract field returned by the API

#get nrc sentiment for abstract of articles shared on facebook and twitter
nrc_data <- get_nrc_sentiment(new.df$abstract)
valence <- (nrc_data[, 9]*-1) + nrc_data[, 10]

#dataset with valence
final_data <- cbind(new.df$abstract, new.df$count_type,nrc_data[1:10],valence)%>% kable() %>% kable_styling() %>% scroll_box(width = "910px", height = "400px")
final_data
new.df$abstract new.df$count_type anger anticipation disgust fear joy sadness surprise trust negative positive valence
Was she the reason he was alive today? SHARED-TWITTER 0 1 0 0 1 0 0 1 0 2 2
I overcame the desperation I felt after stepping down from Congress, and I’m still in the fight. SHARED-TWITTER 1 0 1 1 0 0 0 1 1 0 -1
Hanks is playing Mister Rogers in a new movie and is just as nice as you think he is. Please read this article anyway. SHARED-TWITTER 0 0 0 0 0 0 0 0 0 0 0
A shadowy hacker claimed to have the financier’s sex tapes. Two top lawyers wondered: What would the men in those videos pay to keep them secret? SHARED-TWITTER 0 3 0 0 2 0 0 4 0 3 3
Ms. Harris is the only 2020 Democrat who has fallen hard out of the top tier of candidates. She has proved to be an uneven campaigner who changes her message and tactics to little effect and has a staff torn into factions. SHARED-TWITTER 0 1 0 1 0 0 0 2 2 1 -1
I dropped out of film school to edit video for the conspiracy theorist because I believed in his worldview. Then I saw what it did to people. SHARED-TWITTER 0 0 0 1 0 0 0 2 0 0 0
Fred Rogers wasn’t just a brilliant educator and a profoundly moral person. He was an uncompromising artist. SHARED-TWITTER 1 1 0 0 1 0 0 2 0 2 2
For 40 years, journalists chronicled the eccentric royal family of Oudh, deposed aristocrats who lived in a ruined palace in the Indian capital. It was a tragic, astonishing story. But was it true? SHARED-TWITTER 1 0 1 1 0 1 0 0 2 0 -2
Top military officials threatened to resign or be fired if their plans to remove Chief Gallagher from the SEALs were halted by President Trump, administration officials said. SHARED-TWITTER 2 1 1 3 0 2 1 3 2 2 0
“Cancel culture” has always existed — for the powerful, at least. Now, social media has democratized it. SHARED-TWITTER 1 1 1 1 1 1 0 1 1 2 1
My brother dishes on Trump, impeachment and the 2020 field. SHARED-TWITTER 0 0 0 0 0 0 1 1 1 1 0
In 1964, with “Seven Up!” Michael Apted stumbled into making what has become the most profound documentary series in the history of cinema. Fifty-five years later, the project is reaching its conclusion. SHARED-TWITTER 0 0 0 0 0 0 0 1 0 0 0
Former President Barack Obama, in an address to liberal donors, warned candidates not to go too far left and sought to calm those who were concerned about the state of the Democratic primary. SHARED-TWITTER 0 1 0 2 0 1 1 1 1 4 3
President Trump’s former press secretary has returned to Arkansas a bona fide star eager to play a new role in a post-Trump Republican Party. SHARED-TWITTER 0 2 0 0 2 0 2 3 0 3 3
You might want to think twice before plugging in at an airport or on the train. SHARED-TWITTER 0 1 0 0 0 0 0 0 0 0 0
Chief Petty Officer Edward Gallagher is expected to be formally notified of the action on Wednesday. SHARED-TWITTER 0 1 0 0 0 0 0 1 1 2 1
A reader asks how best to advise her child who wants to stop getting a monthly period. SHARED-TWITTER 0 1 0 0 1 0 0 1 0 3 3
I know Karens are hard. As a member of Gen X, I grew up surrounded by them. SHARED-TWITTER 0 0 0 0 0 0 0 0 0 0 0
Moscow has run a yearslong operation to blame Ukraine for its own 2016 election interference. Republicans have used similar talking points to defend President Trump in impeachment proceedings. SHARED-TWITTER 1 0 1 2 0 0 1 2 3 2 -1
Two-thirds of battleground state voters who chose Trump in 2016 but selected Democrats in the midterms say they will return to the president next year. SHARED-TWITTER 0 0 0 0 0 0 1 1 0 1 1
President Trump lashed out at military lawyers who tried the case of Edward Gallagher, a Navy SEAL who was acquitted of killing a captured teenage Islamic State fighter. SHARED-FACEBOOK 1 0 0 3 0 2 1 2 2 2 0
The aging owners of a Catskills farm say it “has to close so we can survive.” SHARED-FACEBOOK 0 1 0 0 0 0 0 0 0 1 1
After a bone marrow transplant, a man with leukemia found that his donor’s DNA traveled to unexpected parts of his body. A crime lab is now studying the case. SHARED-FACEBOOK 2 1 0 3 3 2 1 2 4 3 -1
We asked 18 families to show us what they have for dinner on a typical weeknight. SHARED-FACEBOOK 0 0 0 0 0 0 0 1 0 1 1
Thirty years ago, Deborah Copaken thought her boyfriend had stood her up. The real story was more complicated. So she used it as a cautionary tale to help reunite another couple. SHARED-FACEBOOK 0 1 0 1 0 0 0 1 1 2 1
Mary Cain’s male coaches were convinced she had to get “thinner, and thinner, and thinner.” Then her body started breaking down. SHARED-FACEBOOK 0 0 0 0 0 0 0 1 0 0 0
Was she the reason he was alive today? SHARED-FACEBOOK 0 1 0 0 1 0 0 1 0 2 2
Multiple police officers in Brooklyn say they were told by a commander that white and Asian people should be left alone. SHARED-FACEBOOK 0 1 0 1 1 0 0 2 0 2 2
Four African countries have reported new cases of polio linked to the oral vaccine, as global health numbers show there are now more children being paralyzed by viruses originating in vaccines than in the wild. SHARED-FACEBOOK 1 0 0 2 0 2 2 1 3 2 -1
Top military officials threatened to resign or be fired if their plans to remove Chief Gallagher from the SEALs were halted by President Trump, administration officials said. SHARED-FACEBOOK 2 1 1 3 0 2 1 3 2 2 0
For 40 years, journalists chronicled the eccentric royal family of Oudh, deposed aristocrats who lived in a ruined palace in the Indian capital. It was a tragic, astonishing story. But was it true? SHARED-FACEBOOK 1 0 1 1 0 1 0 0 2 0 -2
In a new memoir, the bassist describes how he expanded his consciousness, found his muse and landed in a storied rock band. SHARED-FACEBOOK 0 0 0 0 1 0 0 1 0 4 4
Hanks is playing Mister Rogers in a new movie and is just as nice as you think he is. Please read this article anyway. SHARED-FACEBOOK 0 0 0 0 0 0 0 0 0 0 0
The justice was admitted after experiencing chills and a fever and expects to be released as early as Sunday morning, a Supreme Court spokeswoman said. SHARED-FACEBOOK 1 1 0 2 0 0 0 1 0 2 2
Elon Musk’s car company presented its long-awaited pickup truck, but it didn’t quite go as planned. (Broken windows were involved.) SHARED-FACEBOOK 1 1 0 1 0 1 0 1 1 0 -1
Fred Rogers wasn’t just a brilliant educator and a profoundly moral person. He was an uncompromising artist. SHARED-FACEBOOK 1 1 0 0 1 0 0 2 0 2 2
Eight hours, 322 sales, two cops, one America’s Next Top Model, and one very persistent drug dealer. SHARED-FACEBOOK 0 1 0 0 0 0 0 1 0 3 3
Accusations of sexual abuse led to the expulsion of Mr. Morris, a former coach of the United States Olympic team. SHARED-FACEBOOK 2 0 2 2 0 2 0 3 2 1 -1
Two-thirds of battleground state voters who chose Trump in 2016 but selected Democrats in the midterms say they will return to the president next year. SHARED-FACEBOOK 0 0 0 0 0 0 1 1 0 1 1
The editors of The Times Book Review choose the best fiction and nonfiction titles this year. SHARED-FACEBOOK 0 0 0 0 0 0 0 0 0 0 0
barplot(
  sort(colSums(prop.table(nrc_data))), 
  horiz = TRUE, 
  cex.names = 0.7, 
  las = 1, 
  main = "Emotions in Sample text", xlab="Proportion"
)
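As a small worked illustration of the two calculations above, the sketch below recomputes valence (positive minus negative) and the share of each emotion for four rows copied from the table: the first three Twitter rows and the second Facebook row. The data frame name `scores` is hypothetical. Unlike the bar plot above, which uses all ten columns, this sketch restricts the shares to the eight emotion columns; that is a design choice of the example, not of the original analysis.

```r
# Four rows copied from the sentiment table in this report
scores <- data.frame(
  count_type   = c("SHARED-TWITTER", "SHARED-TWITTER", "SHARED-TWITTER", "SHARED-FACEBOOK"),
  anger        = c(0, 1, 0, 0),
  anticipation = c(1, 0, 0, 1),
  disgust      = c(0, 1, 0, 0),
  fear         = c(0, 1, 0, 0),
  joy          = c(1, 0, 0, 0),
  sadness      = c(0, 0, 0, 0),
  surprise     = c(0, 0, 0, 0),
  trust        = c(1, 1, 0, 0),
  negative     = c(0, 1, 0, 0),
  positive     = c(2, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Valence: positive score minus negative score, referenced by column name
scores$valence <- scores$positive - scores$negative

# Share of each emotion across the sample (eight emotion columns only)
emotions <- c("anger", "anticipation", "disgust", "fear",
              "joy", "sadness", "surprise", "trust")
emotion_totals <- colSums(scores[, emotions])
emotion_share <- emotion_totals / sum(emotion_totals)

# Mean valence per sharing channel
mean_valence <- tapply(scores$valence, scores$count_type, mean)

sort(emotion_share)
mean_valence
```

Referring to columns by name rather than by position (9 and 10) keeps the valence calculation correct even if the column order changes.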

Get the NRC sentiment for the des_facet field of the returned NYTimes API data, which categorizes articles by descriptive subject terms

#split list for des_facet1 across all observations
new.df2 <- data.frame()
length(new.df$des_facet[[3]])
## [1] 1
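for (i in seq_len(nrow(new.df))) {
  id <- as.numeric(i)
  type <- new.df$count_type[i]
  # default to an empty string when an article has no des_facet terms
  des_facet1 <- ''
  if (length(new.df$des_facet[[i]]) > 0) {
    des_facet1 <- new.df$des_facet[[i]]
  }
  # one output row per des_facet term; the article columns are recycled
  new.df2 <- rbind(new.df2, cbind(new.df[i, c(1, 2, 14)], id, type, des_facet1))
}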
## Warning in data.frame(..., check.names = FALSE): row names were found from
## a short variable and have been discarded
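The loop above produces one row per des_facet term by recycling the article columns, which is what triggers the row-name warning. A loop-free alternative is sketched below on a small stand-in data frame. The names `articles`, `facets`, and `long` are hypothetical, the third URL is a made-up placeholder for an article without facets, and this is an illustration rather than the code that produced the results in this report.

```r
# Stand-in for new.df: a list column holds zero or more facet terms per article
articles <- data.frame(
  url = c("https://www.nytimes.com/2019/12/08/nyregion/auschwitz-love-story.html",
          "https://www.nytimes.com/2019/11/13/movies/tom-hanks-mister-rogers.html",
          "https://example.com/no-facets"),  # hypothetical article with no facets
  count_type = c("SHARED-TWITTER", "SHARED-TWITTER", "SHARED-FACEBOOK"),
  stringsAsFactors = FALSE
)
articles$des_facet <- list(
  c("HOLOCAUST AND THE NAZI ERA", "DATING AND RELATIONSHIPS"),
  "MOVIES",
  character(0)
)

# Replace empty facet vectors with "" so every article keeps at least one row
facets <- lapply(articles$des_facet, function(f) if (length(f) > 0) f else "")
n_terms <- lengths(facets)

# Repeat each article's columns once per facet term, then flatten the terms
long <- data.frame(
  id         = rep(seq_len(nrow(articles)), times = n_terms),
  url        = rep(articles$url, times = n_terms),
  type       = rep(articles$count_type, times = n_terms),
  des_facet1 = unlist(facets),
  stringsAsFactors = FALSE
)
long
```

Because the repetition is explicit, no recycling takes place and the data frame is built in a single step instead of growing with `rbind` on every iteration.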

head(new.df2)
##                                                                              url
## 1          https://www.nytimes.com/2019/12/08/nyregion/auschwitz-love-story.html
## 2          https://www.nytimes.com/2019/12/08/nyregion/auschwitz-love-story.html
## 3  https://www.nytimes.com/2019/12/07/opinion/sunday/katie-hill-resignation.html
## 4  https://www.nytimes.com/2019/12/07/opinion/sunday/katie-hill-resignation.html
## 5  https://www.nytimes.com/2019/12/07/opinion/sunday/katie-hill-resignation.html
## 31        https://www.nytimes.com/2019/11/13/movies/tom-hanks-mister-rogers.html
##                                                                                                                                                                                                                                                      adx_keywords
## 1  Holocaust and the Nazi Era;Concentration Camps;Jews and Judaism;Auschwitz Concentration Camp;Europe;Manhattan (NYC);World War II (1939-45);War Crimes, Genocide and Crimes Against Humanity;Wisnia, David;Tichauer, Helen (1918-2018);Dating and Relationships
## 2  Holocaust and the Nazi Era;Concentration Camps;Jews and Judaism;Auschwitz Concentration Camp;Europe;Manhattan (NYC);World War II (1939-45);War Crimes, Genocide and Crimes Against Humanity;Wisnia, David;Tichauer, Helen (1918-2018);Dating and Relationships
## 3                                                                               United States Politics and Government;Elections, House of Representatives;Women and Girls;House of Representatives;Democratic Party;United States;California;Hill, Katie (1987- )
## 4                                                                               United States Politics and Government;Elections, House of Representatives;Women and Girls;House of Representatives;Democratic Party;United States;California;Hill, Katie (1987- )
## 5                                                                               United States Politics and Government;Elections, House of Representatives;Women and Girls;House of Representatives;Democratic Party;United States;California;Hill, Katie (1987- )
## 31                                                                                                                                                                             Hanks, Tom;Actors and Actresses;Movies;A Beautiful Day in the Neighborhood (Movie)
##                                                                 title id
## 1  Lovers in Auschwitz, Reunited 72 Years Later. He Had One Question.  1
## 2  Lovers in Auschwitz, Reunited 72 Years Later. He Had One Question.  1
## 3                                 Katie Hill: It’s Not Over After All  2
## 4                                 Katie Hill: It’s Not Over After All  2
## 5                                 Katie Hill: It’s Not Over After All  2
## 31                   This Tom Hanks Story Will Help You Feel Less Bad  3
##              type                            des_facet1
## 1  SHARED-TWITTER            HOLOCAUST AND THE NAZI ERA
## 2  SHARED-TWITTER              DATING AND RELATIONSHIPS
## 3  SHARED-TWITTER UNITED STATES POLITICS AND GOVERNMENT
## 4  SHARED-TWITTER   ELECTIONS, HOUSE OF REPRESENTATIVES
## 5  SHARED-TWITTER                       WOMEN AND GIRLS
## 31 SHARED-TWITTER                                MOVIES
new.df2$des_facet1[1:10]
##  [1] HOLOCAUST AND THE NAZI ERA               
##  [2] DATING AND RELATIONSHIPS                 
##  [3] UNITED STATES POLITICS AND GOVERNMENT    
##  [4] ELECTIONS, HOUSE OF REPRESENTATIVES      
##  [5] WOMEN AND GIRLS                          
##  [6] MOVIES                                   
##  [7] VIDEO RECORDINGS, DOWNLOADS AND STREAMING
##  [8] LEGAL PROFESSION                         
##  [9] PRESIDENTIAL ELECTION OF 2020            
## [10] UNITED STATES POLITICS AND GOVERNMENT    
## 69 Levels: DATING AND RELATIONSHIPS ... OLYMPIC GAMES (1960)
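
The row-by-row loop above expands the `des_facet` list column so that each article gets one row per facet. A minimal base R sketch of the same idea without growing a data frame inside a loop is shown below. The `toy` data frame and its values are hypothetical stand-ins for `new.df`, not API output.

```r
# Toy stand-in for new.df (hypothetical data, not API output)
toy <- data.frame(title = c("A", "B", "C"), stringsAsFactors = FALSE)
toy$des_facet <- list(c("MOVIES", "ACTORS"), character(0), "POLITICS")

# Replace empty facet vectors with "" so every article keeps at least one row
facets <- lapply(toy$des_facet, function(x) if (length(x) > 0) x else "")

# Repeat each article row once per facet, then attach the flattened facets
n_per_row <- lengths(facets)
expanded <- toy[rep(seq_len(nrow(toy)), n_per_row), "title", drop = FALSE]
expanded$id <- rep(seq_len(nrow(toy)), n_per_row)
expanded$des_facet1 <- unlist(facets)
rownames(expanded) <- NULL

expanded
```

Building the result in one step avoids the repeated `rbind()` calls, which is likely also where the row-name warnings come from.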
new.df2 %>%
  count(type, des_facet1) %>%
  top_n(50) %>%
  ungroup() %>%
  mutate(des_facet1 = reorder(des_facet1, n)) %>%
  ggplot(aes(des_facet1, n, fill = type)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~type, scales = "free_y") +
  labs(y = "Count",
       x = NULL) +
  coord_flip()
## Selecting by n

Tokenize the abstract field of the articles returned by the API, build a corpus, and create a word cloud

#get token words from $abstract of returned results on most popular articles
p_word_v <- get_tokens(new.df$abstract, pattern = "\\W")

#build corpus
words <- Corpus(VectorSource(p_word_v))
# Convert the text to lower case
words <- tm_map(words, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(words, content_transformer(tolower)):
## transformation drops documents
# Remove numbers
words <- tm_map(words, removeNumbers)
## Warning in tm_map.SimpleCorpus(words, removeNumbers): transformation drops
## documents
# Remove english common stopwords
words <- tm_map(words, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(words, removeWords, stopwords("english")):
## transformation drops documents
# Remove punctuation
words <- tm_map(words, removePunctuation)
## Warning in tm_map.SimpleCorpus(words, removePunctuation): transformation
## drops documents
# Eliminate extra white spaces
words <- tm_map(words, stripWhitespace)
## Warning in tm_map.SimpleCorpus(words, stripWhitespace): transformation
## drops documents
dtm <- TermDocumentMatrix(words)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)

set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words=100, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))
## Warning in wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words
## = 100, : midterms could not be fit on page. It will not be plotted.
## Warning in wordcloud(words = d$word, freq = d$freq, min.freq = 1, max.words
## = 100, : republican could not be fit on page. It will not be plotted.

barplot(d[1:25,]$freq, las = 2, names.arg = d[1:25,]$word,
        col ="lightblue", main ="Most frequent words",
        ylab = "Word frequencies")
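
The term-document matrix step above reduces to counting how often each cleaned word occurs. A minimal base R sketch of that counting is shown below. The two sentences and the tiny stop-word list are hypothetical, and the simple regular-expression split is an assumption that will not match the `tm` tokenizer exactly.

```r
# Hypothetical abstracts, used only for illustration
abstracts <- c("The president met the press",
               "Press coverage of the president")

# Lower-case, split on non-letters, and drop empty strings and stop words
tokens <- unlist(strsplit(tolower(abstracts), "[^a-z]+"))
toy_stopwords <- c("the", "of")  # hypothetical, much shorter than stopwords("english")
tokens <- tokens[tokens != "" & !(tokens %in% toy_stopwords)]

# Count each word and sort from most to least frequent
freq <- sort(table(tokens), decreasing = TRUE)
d_toy <- data.frame(word = names(freq), freq = as.integer(freq),
                    stringsAsFactors = FALSE)
d_toy
```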

Compare the sentiment reported by the syuzhet, bing, afinn, and nrc methods

syuzhet_vector <- get_sentiment(new.df$abstract, method="syuzhet")
bing_vector <- get_sentiment(new.df$abstract, method="bing")
afinn_vector <- get_sentiment(new.df$abstract, method="afinn")
nrc_vector <- get_sentiment(new.df$abstract, method="nrc", lang = "english")

# sign() maps positive values to 1, negative values to -1, and leaves zero as 0
rbind(
  sign(syuzhet_vector),
  sign(bing_vector),
  sign(afinn_vector),
  sign(nrc_vector)
)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,]    1   -1    1   -1   -1   -1    1   -1   -1     1    -1     1     1
## [2,]    0   -1    1    0   -1   -1    1   -1    1     1     1     0    -1
## [3,]    1   -1    1    1   -1   -1    1   -1   -1     1     0     1     0
## [4,]    1   -1    0    1   -1    0    1   -1    0     1     0     0     1
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## [1,]     1     0    -1     1    -1    -1    -1    -1     1    -1     1
## [2,]     1     0    -1     1    -1    -1     1     0     0    -1     0
## [3,]     1     1     0     1    -1    -1     0    -1     0    -1     0
## [4,]     1     0     1     1     0    -1     1     0     1    -1     1
##      [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35]
## [1,]    -1     0     1    -1    -1    -1    -1     1     1     1    -1
## [2,]    -1     0     0     0    -1     1    -1     0     1     0    -1
## [3,]     1     1     1    -1     0    -1    -1     0     1     1    -1
## [4,]     1     0     1     1    -1     0    -1     1     0     1    -1
##      [,36] [,37] [,38] [,39] [,40]
## [1,]     1     1    -1    -1     0
## [2,]     1     1    -1     1     0
## [3,]     1     1    -1     0     1
## [4,]     1     1    -1     1     0
# Overall emotional valence of the abstracts of the most-shared articles on Facebook and Twitter: a negative sum indicates overall negative sentiment, a positive sum overall positive sentiment
sum(syuzhet_vector)
## [1] 1.8
mean(syuzhet_vector)
## [1] 0.045
summary(syuzhet_vector)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -2.0000 -0.7625 -0.1000  0.0450  0.8750  2.3000
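
The sign matrix printed above can also be summarized by how often the four methods agree. A minimal base R sketch is shown below. It uses only the first four columns of that output as a small example, and the names `unanimous` and `agreement` are illustrative.

```r
# First four columns of the sign matrix above; rows are methods
signs <- rbind(
  syuzhet = c( 1, -1,  1, -1),
  bing    = c( 0, -1,  1,  0),
  afinn   = c( 1, -1,  1,  1),
  nrc     = c( 1, -1,  0,  1)
)

# A document is unanimous when every method returns the same sign
unanimous <- apply(signs, 2, function(col) length(unique(col)) == 1)

# Share of documents on which each pair of methods agrees
idx <- seq_len(nrow(signs))
agreement <- outer(idx, idx,
                   Vectorize(function(i, j) mean(signs[i, ] == signs[j, ])))
dimnames(agreement) <- list(rownames(signs), rownames(signs))

unanimous
agreement
```

Running the same code on the full matrix would show which lexicons tend to disagree on these abstracts.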

Build a corpus for the des_facet field only and create a word cloud

#build corpus for des_facet1
words <- Corpus(VectorSource(new.df2$des_facet1))
# Convert the text to lower case
words <- tm_map(words, content_transformer(tolower))
## Warning in tm_map.SimpleCorpus(words, content_transformer(tolower)):
## transformation drops documents
# Remove numbers
words <- tm_map(words, removeNumbers)
## Warning in tm_map.SimpleCorpus(words, removeNumbers): transformation drops
## documents
# Remove english common stopwords
words <- tm_map(words, removeWords, stopwords("english"))
## Warning in tm_map.SimpleCorpus(words, removeWords, stopwords("english")):
## transformation drops documents
# Remove punctuation
words <- tm_map(words, removePunctuation)
## Warning in tm_map.SimpleCorpus(words, removePunctuation): transformation
## drops documents
# Eliminate extra white spaces
words <- tm_map(words, stripWhitespace)
## Warning in tm_map.SimpleCorpus(words, stripWhitespace): transformation
## drops documents
dtm <- TermDocumentMatrix(words)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)

set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words=100, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))

NYTimes Archive API for two timeframes: retrieve the November 2018 and December 2017 data.

Loop through page JSON

Because there are many pages, we loop over only the first few of them (the November 2018 pull below stops at page index 5). We save the result to an Rdata file so we can reload it without reconnecting to the API. The pull took about 3 minutes.

myurlX <- "https://api.nytimes.com/svc/archive/v1/2018/11.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG"

initialQuery <- fromJSON(myurlX)
maxPages <- round((initialQuery$response$meta$hits[1] / 10)-1) 

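# cap the number of requests: when there are 10 or more pages, stop at page index 5
maxPages <- ifelse(maxPages >= 10, 5, maxPages)

# one list slot per request (page indices 0 to maxPages)
pages_2018 <- vector("list", length = maxPages + 1)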

for(i in 0:maxPages){
  nytSearch <- fromJSON(paste0(myurlX , "&page=", i), flatten = TRUE) %>% data.frame() 
  pages_2018[[i+1]] <- nytSearch 
  Sys.sleep(5)
}

nytimes_2018_Novarchive_12072019 <- rbind_pages(pages_2018)
save(nytimes_2018_Novarchive_12072019 ,file="nytimes_2018_Novarchive_12072019.Rdata")

#str(nytimes_2018_Novarchive_12072019[1:10])
nrow(nytimes_2018_Novarchive_12072019)
## [1] 40362
colnames(nytimes_2018_Novarchive_12072019)
##  [1] "copyright"                            
##  [2] "response.hits"                        
##  [3] "response.docs.web_url"                
##  [4] "response.docs.snippet"                
##  [5] "response.docs.blog"                   
##  [6] "response.docs.source"                 
##  [7] "response.docs.multimedia"             
##  [8] "response.docs.keywords"               
##  [9] "response.docs.pub_date"               
## [10] "response.docs.document_type"          
## [11] "response.docs.news_desk"              
## [12] "response.docs.type_of_material"       
## [13] "response.docs._id"                    
## [14] "response.docs.word_count"             
## [15] "response.docs.score"                  
## [16] "response.docs.uri"                    
## [17] "response.docs.slideshow_credits"      
## [18] "response.docs.print_page"             
## [19] "response.docs.section_name"           
## [20] "response.docs.headline.main"          
## [21] "response.docs.headline.kicker"        
## [22] "response.docs.headline.content_kicker"
## [23] "response.docs.headline.print_headline"
## [24] "response.docs.headline.name"          
## [25] "response.docs.headline.seo"           
## [26] "response.docs.headline.sub"           
## [27] "response.docs.byline.original"        
## [28] "response.docs.byline.person"          
## [29] "response.docs.byline.organization"
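
The `save()` call above stores the combined data on disk so that later runs do not need to reconnect to the API. A minimal sketch of that cache-or-fetch pattern is shown below. The helper `load_or_fetch()` and the `fake_fetch()` stand-in are hypothetical, and the sketch uses `saveRDS()`/`readRDS()` rather than the `save()` call used in this report.

```r
# Return the cached object if the file exists; otherwise fetch and cache it.
# load_or_fetch() is a hypothetical helper, not part of any package.
load_or_fetch <- function(path, fetch_fn) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fn()
  saveRDS(result, path)
  result
}

# Stand-in for the slow API loop; counts how often it is actually called
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(x = 1:3)
}

cache_path <- tempfile(fileext = ".rds")
first  <- load_or_fetch(cache_path, fake_fetch)  # fetches and writes the file
second <- load_or_fetch(cache_path, fake_fetch)  # reads the file, no fetch
```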
nytimes_2018_Novarchive_12072019$response.docs.snippet[1:50]
##  [1] "Tips for how to combine pleasure with wellness when you’re on the road, because isn’t living your best life the point of a vacation in the first place?"                                            
##  [2] "The district voted for Donald J. Trump and Mitt Romney by comfortable margins, but it has strong Democratic roots. Democrats have a registration advantage."                                        
##  [3] "Galas were held last week for the New York Restoration Project, Fashion Group International, Aperture and the Children’s Museums of Manhattan."                                                     
##  [4] "The Pulitzer Prize-winning author pays homage to Montreal’s patron saint, crooning the artist’s 1967 ballad ‘‘Suzanne,’’ with the musician Cassandra Jenkins on guitar."                            
##  [5] "Watch: The Met invited the break dancers to perform with the museum’s collection of breastplates, gauntlets, hoods and other heraldry live in its galleries."                                       
##  [6] "Austerity in a slump, stimulus in a boom."                                                                                                                                                          
##  [7] "Being a chef is tough on the feet and back. Here’s what Mr. Mattos does to stay fit and fresh. "                                                                                                    
##  [8] "The tenor Michael Fabiano wed Bryan McCalister at the Metropolitan Opera House, where they met in 2017 after Mr. Fabiano’s performance in “La Traviata.”"                                           
##  [9] "The elite field of the New York City Marathon includes 40-somethings like Abdi Abdirahman and Bernard Lagat. Increasingly, aging amateur runners are maintaining their speed. "                     
## [10] "Mr. Baker, who immigrated to New York from Nevis, like Hamilton, helped to persuade blacks to leave the Republican Party and later, to run for public office."                                      
## [11] "The scholar and author, most recently, of “Why Religion?” tends to avoid reading science fiction: “Religious traditions already are packed with fantasy stories.”"                                  
## [12] "P-Tech schools team up with business to provide lower-income students with much-needed STEM skills and even a job at IBM if they want one."                                                         
## [13] "Jay-Z channeled the 1968 Olympics for his Halloween look."                                                                                                                                          
## [14] "Ahead of the midterm elections, we asked young evangelicals to tell The Times about the relationship between their faith and their politics."                                                       
## [15] "The tenor Michael Fabiano wed Bryan McCalister at the Metropolitan Opera House, where they met in 2017 after Mr. Fabiano’s performance in “La Traviata.”"                                           
## [16] "The elite field of the New York City Marathon includes 40-somethings like Abdi Abdirahman and Bernard Lagat. Increasingly, aging amateur runners are maintaining their speed. "                     
## [17] "The Republican Party hopes to beat back the Democrats on Tuesday with a big push from the Christian right."                                                                                         
## [18] "Austerity in a slump, stimulus in a boom."                                                                                                                                                          
## [19] "Being a chef is tough on the feet and back. Here’s what Mr. Mattos does to stay fit and fresh. "                                                                                                    
## [20] "Though it involves a small number of people, the case deepens the party’s troubles over charges that it has not acted against anti-Semitism in its ranks."                                          
## [21] "Workers in the company’s offices around the world protested how it has handled cases of sexual harassment and misconduct."                                                                          
## [22] "By threatening to penalize Swift, the financial messaging service, the U.S. is alienating European allies and could undercut the dollar’s dominance."                                               
## [23] "The artist has pushed well beyond his popular “Soundsuits,” and has opened a multidisciplinary space in Chicago to support community collaborations."                                               
## [24] "The short-term political interests of both sides are in a collision with their long-term political and economic interests — not to mention the global economy’s."                                   
## [25] "Why is the Rust Belt trending blue for the midterms? The collapse of community may provide an answer."                                                                                              
## [26] "New York offers city financing to encourage developers to build co-living projects that are below market rate."                                                                                     
## [27] "A leaked school application has prompted debate about whether children in China’s test-crazed education system are being raised as soulless strivers."                                              
## [28] "Thursday: A protest in response to a Times article on sexual harassment, the political feud between Gavin Newsom and President Trump, and the last Battle of the Bay."                              
## [29] "If you’re a single parent looking for family vacation options that don’t assume two adults, these cruises and travel agencies — among other destinations — are here to help."                       
## [30] "The scholar and author, most recently, of “Why Religion?” tends to avoid reading science fiction: “Religious traditions already are packed with fantasy stories.”"                                  
## [31] "Following the money isn’t easy in the opaque world of venture capital."                                                                                                                             
## [32] "The former first couple’s production company will turn the Michael Lewis book “The Fifth Risk,” which examined the haphazard changeover in administration, into a series."                          
## [33] "The co-creative directors Laura Kim and Fernando Garcia have maintained some ‘Oscarisms’ while bringing their own vision to the brand."                                                             
## [34] "By threatening to penalize Swift, the financial messaging service, the U.S. is alienating European allies and could undercut the dollar’s dominance."                                               
## [35] "Workers in the company’s offices around the world protested how it has handled cases of sexual harassment and misconduct."                                                                          
## [36] "The artist has pushed well beyond his popular “Soundsuits,” and has opened a multidisciplinary space in Chicago to support community collaborations."                                               
## [37] "He created widely popular panoramic fictional universes that inspired movies and video games and that have led some to compare him to J.R.R. Tolkien."                                              
## [38] "A former president of Vassar College offers caveats."                                                                                                                                               
## [39] "A vandal scrawled “Kill,” followed by the slur, on a plaque at the monument."                                                                                                                       
## [40] "The order came a little over a week after the court granted the administration a partial victory in the case by temporarily blocking the deposition of Wilbur Ross, the commerce secretary."        
## [41] "The police said that someone was apparently holding a parking space for Mr. Baldwin, but another car pulled into the space and an altercation ensued."                                              
## [42] "In the On Politics newsletter: While the president stokes fears of an invasion, Beto O’Rourke pushes a message of positivity; plus new polls, and the latest from Opinion."                         
## [43] "Bob McAdoo won the N.B.A.’s Most Valuable Player Award in 1975, but it wasn’t until he came off the bench for the Lakers in the 80s that he found true success."                                    
## [44] "The star shares his five favorite movie musicals, and, yes, he says, “Labyrinth” counts: “It is totally a David Bowie movie musical.”"                                                              
## [45] "What are the president’s priorities? "                                                                                                                                                              
## [46] "In South Texas, Customs and Border Protection agents were already conducting military-style exercises as Army troops prepared to deploy along the border."                                          
## [47] "The decision of the president to invoke Ms. Abrams’s background so broadly was an escalation in his attacks on her bid to become the first black woman to be elected governor in the United States."
## [48] "Mr. Baker, who immigrated to New York from Nevis, like Hamilton, helped to persuade blacks to leave the Republican Party and later, to run for public office."                                      
## [49] "Community colleges are relying more and more on technology to help their students succeed."                                                                                                         
## [50] "High in the Andes Mountains, conservators are testing traditional methods for strengthening adobe buildings. "
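
The snippets printed above contain repeats (for example, entries [8] and [15] are identical), which suggests that the same articles were returned by more than one request. A minimal base R sketch of dropping such repeats by document id before further analysis is shown below. The `toy` data frame is hypothetical, and treating `response.docs._id` as a unique article key is an assumption.

```r
# Hypothetical rows that mimic the combined result, with repeated ids
toy <- data.frame(
  response.docs._id     = c("a1", "b2", "a1", "c3", "b2"),
  response.docs.snippet = c("first", "second", "first", "third", "second"),
  stringsAsFactors = FALSE
)

# Keep only the first row seen for each document id
deduped <- toy[!duplicated(toy$response.docs._id), ]
nrow(deduped)
```

Counts computed on data that still contains repeats, such as bigram frequencies, are inflated by the number of repeats.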

Bigram analysis on the November 2018 data and a network graph with igraph

new_tibble <- enframe(nytimes_2018_Novarchive_12072019$response.docs.snippet[1:50], name = NULL)
my_frame <- data.frame(nytimes_2018_Novarchive_12072019$response.docs.snippet)
names(my_frame) <- c('snippetcol')
nrow(my_frame)
## [1] 40362
snippet_bigrams <-  my_frame %>%
  unnest_tokens(bigram, snippetcol, token = "ngrams", n = 2)
nrow(snippet_bigrams)
## [1] 846533
snippet_bigrams %>%
  count(bigram, sort = TRUE)
## # A tibble: 57,007 x 2
##    bigram       n
##    <chr>    <int>
##  1 of the    5532
##  2 in the    4230
##  3 to the    2136
##  4 for the   1644
##  5 and the   1578
##  6 in a      1524
##  7 on the    1506
##  8 at the    1458
##  9 of a      1362
## 10 new york  1302
## # ... with 56,997 more rows
bigrams_separated <- snippet_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

bigrams_filtered <- bigrams_separated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

# new bigram counts:
bigram_counts <- bigrams_filtered %>% 
  count(word1, word2, sort = TRUE)

bigram_counts
## # A tibble: 17,092 x 3
##    word1     word2              n
##    <chr>     <chr>          <int>
##  1 president trump            834
##  2 midterm   elections        648
##  3 couple    met              420
##  4 president trump’s          384
##  5 york      city             306
##  6 york      times            294
##  7 trump     administration   258
##  8 white     house            240
##  9 world     war              216
## 10 los       angeles          210
## # ... with 17,082 more rows
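
The bigram steps above pair each word with the word that follows it and then drop any pair that contains a stop word. A minimal base R sketch of that logic on a single sentence is shown below. The sentence and the short `stop_list` are hypothetical, and the regular-expression split only approximates what `unnest_tokens()` does.

```r
# Hypothetical snippet, used only for illustration
snippet <- "the midterm elections are in the news"

# Split into lower-case words
words <- strsplit(tolower(snippet), "[^a-z']+")[[1]]
words <- words[words != ""]

# Pair each word with its successor to form bigrams
bigrams <- data.frame(word1 = head(words, -1),
                      word2 = tail(words, -1),
                      stringsAsFactors = FALSE)

# Keep only bigrams in which neither word is a stop word
stop_list <- c("the", "are", "in")  # hypothetical, much shorter than stop_words$word
kept <- bigrams[!(bigrams$word1 %in% stop_list) &
                !(bigrams$word2 %in% stop_list), ]
kept
```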
# filter for only relatively common combinations
bigram_graph <- bigram_counts %>%
  filter(n > 70) %>%
  graph_from_data_frame()

bigram_graph
## IGRAPH 151ac89 DN-- 125 72 -- 
## + attr: name (v/c), n (e/n)
## + edges from 151ac89 (vertex names):
##  [1] president  ->trump          midterm    ->elections     
##  [3] couple     ->met            president  ->trump’s       
##  [5] york       ->city           york       ->times         
##  [7] trump      ->administration white      ->house         
##  [9] world      ->war            los        ->angeles       
## [11] corrections->appearing      week       ->ahead         
## [13] week’s     ->properties     social     ->media         
## [15] movie      ->news           prime      ->minister      
## + ... omitted several edges
set.seed(2017)

ggraph(bigram_graph, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1)

#ggsave("bigramgraph3.pdf", bigramgraph3, dpi = 200) 

set.seed(2016)

a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

ggraph(bigram_graph, layout = "kk") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

#ggsave("bigramgraph4.pdf", bigramgraph4, dpi = 200) 


# length(nytimes_2018_Novarchive_12032019$response.docs.snippet)
# my_corpus <- corpus(nytimes_2018_Novarchive_12032019$response.docs.snippet)
# my_sentences <- corpus_reshape(my_corpus, to = "sentences")
# ndoc(my_sentences)
# texts(my_sentences)[1]

NYTimes Archive API: retrieve the December 2017 data.

myurlX <- "https://api.nytimes.com/svc/archive/v1/2017/12.json?api-key=wl3OA7v4AV7cjxGya142nvRGGv46HdNG"

initialQuery <- fromJSON(myurlX)
maxPages <- round((initialQuery$response$meta$hits[1] / 10)-1) 

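# cap the number of requests: when there are 10 or more pages, stop at page index 8
maxPages <- ifelse(maxPages >= 10, 8, maxPages)

# one list slot per request (page indices 0 to maxPages)
pages_2017 <- vector("list", length = maxPages + 1)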

for(i in 0:maxPages){
  nytSearch <- fromJSON(paste0(myurlX , "&page=", i), flatten = TRUE) %>% data.frame() 
  pages_2017[[i+1]] <- nytSearch 
  Sys.sleep(5)
}

nytimes_2017_Decarchive_12072019 <- rbind_pages(pages_2017)
save(nytimes_2017_Decarchive_12072019 ,file="nytimes_2017_Decarchive_12072019.Rdata")

#str(nytimes_2017_Decarchive_12072019[1:10])
nrow(nytimes_2017_Decarchive_12072019)
## [1] 59391
colnames(nytimes_2017_Decarchive_12072019)
##  [1] "copyright"                            
##  [2] "response.hits"                        
##  [3] "response.docs.web_url"                
##  [4] "response.docs.snippet"                
##  [5] "response.docs.blog"                   
##  [6] "response.docs.source"                 
##  [7] "response.docs.multimedia"             
##  [8] "response.docs.keywords"               
##  [9] "response.docs.pub_date"               
## [10] "response.docs.document_type"          
## [11] "response.docs.type_of_material"       
## [12] "response.docs._id"                    
## [13] "response.docs.word_count"             
## [14] "response.docs.score"                  
## [15] "response.docs.uri"                    
## [16] "response.docs.news_desk"              
## [17] "response.docs.slideshow_credits"      
## [18] "response.docs.section_name"           
## [19] "response.docs.print_page"             
## [20] "response.docs.abstract"               
## [21] "response.docs.headline.main"          
## [22] "response.docs.headline.kicker"        
## [23] "response.docs.headline.content_kicker"
## [24] "response.docs.headline.print_headline"
## [25] "response.docs.headline.name"          
## [26] "response.docs.headline.seo"           
## [27] "response.docs.headline.sub"           
## [28] "response.docs.byline.original"        
## [29] "response.docs.byline.person"          
## [30] "response.docs.byline.organization"
nytimes_2017_Decarchive_12072019$response.docs.snippet[1:50]
##  [1] "30 million people will suffer from eating disorders in their lifetime, yet decades after Karen Carpenter died from anorexia, myths about eating disorders continue."                                                                      
##  [2] "Chris Wilson, a former professional football player, returned to his Michigan hometown to coach its last remaining public high school team, offering lessons that go far beyond the field."                                               
##  [3] "The growing popularity of the NYC Ferry along the East River is helping to reshape the mostly industrial coastline."                                                                                                                      
##  [4] "15 things we love in her inimitable SoHo loft."                                                                                                                                                                                           
##  [5] "The first test of Prince Harry and Meghan Markle's fairy-tale romance? A British citizenship exam for Ms. Markle. Thousands of immigrants take it each year, and many fail miserably. Could you ace the test? Take our quiz and find out."
##  [6] "The president is trying to appease evangelical supporters and powerful Jewish donors even as he works to avoid derailing his administration’s peace efforts."                                                                             
##  [7] "Many of the journalists who stand accused of sexual harassment covered the 2016 presidential campaign."                                                                                                                                   
##  [8] "New York’s attorney general brought the charges, stemming from the shooting of a black motorist in Troy, against Rensselaer County D.A. Joel Abelove."                                                                                    
##  [9] "The president plans to repeal protections on as much as two million acres of public land in Utah."                                                                                                                                        
## [10] "Just another day at the Southeastern Conference. An athletic director fired, a $75 million coaching contract, and a postseason ban for Ole Miss."                                                                                         
## [11] "A by-no-means exhaustive list of the things our editors (and a few contributors) find interesting on a given week. "                                                                                                                      
## [12] "The city’s priciest sale last month was a penthouse occupying the entire 20th floor of the Norman Foster-designed condo at 551 West 21st Street."                                                                                         
## [13] "Secretary of State Rex W. Tillerson has been dogged for months by rumors he would resign but said any White House plan for him to resign was “laughable.”"                                                                                
## [14] "“8980: Book of Travelers,” a song cycle based on conversations the songwriter had on a nearly 9,000-mile Amtrak journey, comes to BAM."                                                                                                   
## [15] "Because the price to sign Ohtani will not be that prohibitive, many major league teams, even small-market ones, may want to get his attention."                                                                                           
## [16] "Romantic gifts from Andy Warhol to Jon Gould include what is believed to be one of the Pop artist’s last sculptures."                                                                                                                     
## [17] "New taxes are not the answer. A radical reorientation of their mission is."                                                                                                                                                               
## [18] "He joined Thor Heyerdahl in 1970 on the Ra II, just one of the adventures that turned a Brooklyn boy into an unconventional world traveler."                                                                                              
## [19] "The Hypocrites’ blithe production of the Gilbert and Sullivan classic offers the perfect restorative alternative to holiday drudgery."                                                                                                    
## [20] "More than 20 years after visiting East Africa, I am still weighed down by the hunger I saw there. Today, millions of people are hungry again."                                                                                            
## [21] "The couple was set up by friends and were wed at Deerfield, a golf course in Newark, Del."                                                                                                                                                
## [22] "Luis Rodriguez, a luxury real estate agent, and Ron Davis, a documentary filmmaker who made “Harry & Snowman,” wed Dec. 2 in Wellington, Fla."                                                                                            
## [23] "The couple met in July 2013 in New York while attending Grits and Biscuits, an event for young professionals."                                                                                                                            
## [24] "Everything you need to be healthier, wiser and more centered in the new year."                                                                                                                                                            
## [25] "An exhibition in Oxford, England, shows how the artist’s preoccupations with socialism, anti-fascism and a love of the land shaped her life and work."                                                                                    
## [26] "Corrections appearing in print on Friday, December 1, 2017."                                                                                                                                                                              
## [27] "Because the price to sign Ohtani will not be that prohibitive, many major league teams, even small-market ones, may want to get his attention."                                                                                           
## [28] "New taxes are not the answer. A radical reorientation of their mission is."                                                                                                                                                               
## [29] "The 83-year-old emperor will step down on April 30, 2019, becoming the first Japanese monarch to do so in two centuries."                                                                                                                 
## [30] "Romantic gifts from Andy Warhol to Jon Gould include what is believed to be one of the Pop artist’s last sculptures."                                                                                                                     
## [31] "Corrections appearing in print on Friday, December 1, 2017."                                                                                                                                                                              
## [32] "The 83-year-old emperor will step down on April 30, 2019, becoming the first Japanese monarch to do so in two centuries."                                                                                                                 
## [33] "The president plans to repeal protections on as much as two million acres of public land in Utah."                                                                                                                                        
## [34] "Just another day at the Southeastern Conference. An athletic director fired, a $75 million coaching contract, and a postseason ban for Ole Miss."                                                                                         
## [35] "The party, which criticized the growth of the federal debt under President Barack Obama, has rallied behind a tax cut that would send it to new heights."                                                                                 
## [36] "The president is trying to appease evangelical supporters and powerful Jewish donors even as he works to avoid derailing his administration’s peace efforts."                                                                             
## [37] "Many of the journalists who stand accused of sexual harassment covered the 2016 presidential campaign."                                                                                                                                   
## [38] "Mark Diehl"                                                                                                                                                                                                                               
## [39] "New York’s attorney general brought the charges, stemming from the shooting of a black motorist in Troy, against Rensselaer County D.A. Joel Abelove."                                                                                    
## [40] "Filthy scary alien monster cyborg goth drag is in the club."                                                                                                                                                                              
## [41] "New York’s attorney general brought the charges, stemming from the shooting of a black motorist in Troy, against Rensselaer County D.A. Joel Abelove."                                                                                    
## [42] "The president plans to repeal protections on as much as two million acres of public land in Utah."                                                                                                                                        
## [43] "Just another day at the Southeastern Conference. An athletic director fired, a $75 million coaching contract, and a postseason ban for Ole Miss."                                                                                         
## [44] "In the Giants’ first game without Manning starting since 2004, the Geno Smith-led offense looked mostly inept in a loss to the Oakland Raiders."                                                                                          
## [45] "Mr. Mayer produced orchestral works, operas, chamber pieces, music for children and more, sometimes displaying a whimsical streak."                                                                                                       
## [46] "We all lose because of the biases of Wall Street and Silicon Valley."                                                                                                                                                                     
## [47] "Corrections appearing in print on Sunday, December 3, 2017."                                                                                                                                                                              
## [48] "The couple met at New York University, from which they graduated, each magna cum laude."                                                                                                                                                  
## [49] "The couple was set up by friends and were wed at Deerfield, a golf course in Newark, Del."                                                                                                                                                
## [50] "Luis Rodriguez, a luxury real estate agent, and Ron Davis, a documentary filmmaker who made “Harry & Snowman,” wed Dec. 2 in Wellington, Fla."

Bigram Sentiment analysis on 2017 December data and igraph

#new_tibble <- enframe(nytimes_2018_Novarchive_12032019$response.docs.snippet[1:50], name = NULL)
my_frame2 <- data.frame(nytimes_2017_Decarchive_12072019$response.docs.snippet)
names(my_frame2) <- c('snippetcol')
nrow(my_frame2)
## [1] 59391
snippet_bigrams2 <-  my_frame2 %>%
  unnest_tokens(bigram, snippetcol, token = "ngrams", n = 2)
nrow(snippet_bigrams2)
## [1] 1219121
snippet_bigrams2 %>%
  count(bigram, sort = TRUE)
## # A tibble: 55,898 x 2
##    bigram       n
##    <chr>    <int>
##  1 of the    8118
##  2 in the    6534
##  3 on the    2808
##  4 to the    2763
##  5 new york  2556
##  6 at the    2439
##  7 in a      2331
##  8 for the   2304
##  9 and the   1962
## 10 and a     1665
## # ... with 55,888 more rows
bigrams_separated2 <- snippet_bigrams2 %>%
  separate(bigram, c("word1", "word2"), sep = " ")

bigrams_filtered2 <- bigrams_separated2 %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)

# new bigram counts:
bigram_counts2 <- bigrams_filtered2 %>% 
  count(word1, word2, sort = TRUE)

head(bigram_counts2)
## # A tibble: 6 x 3
##   word1     word2          n
##   <chr>     <chr>      <int>
## 1 president trump       1062
## 2 sexual    harassment   603
## 3 york      times        594
## 4 president trump’s      585
## 5 tax       bill         531
## 6 york      city         477
# filter for only relatively common combinations
bigram_graph2 <- bigram_counts2 %>%
  filter(n > 70) %>%
  graph_from_data_frame()

bigram_graph2
## IGRAPH 6508548 DN-- 220 141 -- 
## + attr: name (v/c), n (e/n)
## + edges from 6508548 (vertex names):
##  [1] president  ->trump          sexual     ->harassment    
##  [3] york       ->times          president  ->trump’s       
##  [5] tax        ->bill           york       ->city          
##  [7] sexual     ->misconduct     corrections->appearing     
##  [9] white      ->house          couple     ->met           
## [11] social     ->media          trump      ->administration
## [13] york       ->times’s        los        ->angeles       
## [15] climatetech->conference     health     ->care          
## + ... omitted several edges
set.seed(2017)

ggraph(bigram_graph2, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1)

#ggsave("bigramigraph1.pdf", bigramigraph1, dpi = 200) 


set.seed(2016)

a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

ggraph(bigram_graph2, layout = "kk") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

#ggsave("bigramigraph2.pdf", bigramigraph2, dpi = 200) 


# length(nytimes_2018_Novarchive_12032019$response.docs.snippet)
# my_corpus <- corpus(nytimes_2018_Novarchive_12032019$response.docs.snippet)
# my_sentences <- corpus_reshape(my_corpus, to = "sentences")
# ndoc(my_sentences)
# texts(my_sentences)[1]

MODEL LDA: Run a topic model using quanteda and topicmodels with k = 20 topics

length(nytimes_2018_Novarchive_12072019$response.docs.snippet)
## [1] 40362
my_corpus <- corpus(nytimes_2018_Novarchive_12072019$response.docs.snippet[1:10000])

quant_dfm <- dfm(my_corpus, 
                remove_punct = TRUE, remove_numbers = TRUE, remove = stopwords("english"))
quant_dfm <- dfm_trim(quant_dfm, min_termfreq = 4, max_docfreq = 10)
quant_dfm
## Document-feature matrix of: 10,000 documents, 4,838 features (99.9% sparse).
set.seed(100)
if (require(topicmodels)) {
    my_lda_fit5 <- LDA(convert(quant_dfm, to = "topicmodels"), k = 20)
    get_terms(my_lda_fit5, 10)
}
##       Topic 1       Topic 2          Topic 3     Topic 4     
##  [1,] "advantage"   "من"             "scored"    "bit"       
##  [2,] "monument"    "negotiating"    "keeps"     "damage"    
##  [3,] "similar"     "interviews"     "fortune"   "threatened"
##  [4,] "visit"       "rent"           "pasta"     "swing"     
##  [5,] "kavanaugh"   "communications" "nowhere"   "dallas"    
##  [6,] "shifts"      "delicate"       "mode"      "fourth"    
##  [7,] "arthur"      "miller"         "describes" "boards"    
##  [8,] "pita's"      "preside"        "tool"      "honest"    
##  [9,] "dance-drama" "pbs"            "eager"     "arguments" 
## [10,] "committed"   "ease"           "straight"  "taste"     
##       Topic 5       Topic 6       Topic 7     Topic 8      Topic 9      
##  [1,] "vacation"    "answer"      "loud"      "wildlife"   "counts"     
##  [2,] "protested"   "motorcycle"  "broader"   "motivated"  "consensus"  
##  [3,] "essential"   "hart"        "extremism" "increasing" "escalating" 
##  [4,] "appearance"  "style"       "landslide" "blood"      "association"
##  [5,] "harm"        "safer"       "gay"       "campus"     "rocket"     
##  [6,] "strangers"   "difference"  "available" "spectacle"  "brexit"     
##  [7,] "crimson"     "engaging"    "p.m"       "bans"       "delays"     
##  [8,] "walt"        "unthinkable" "decisions" "deepening"  "highlighted"
##  [9,] "potent"      "feared"      "eighth"    "creator"    "goldman"    
## [10,] "democracies" "reitman"     "worker"    "silence"    "partnered"  
##       Topic 10     Topic 11     Topic 12       Topic 13       Topic 14    
##  [1,] "tiger"      "long-term"  "crow"         "fewer"        "apple"     
##  [2,] "buildings"  "honor"      "inaccurate"   "high-profile" "mount"     
##  [3,] "incumbents" "swarming"   "citizenship"  "mississippi"  "uncertain" 
##  [4,] "obstacles"  "situation"  "civilians"    "cranberry"    "amazing"   
##  [5,] "roundup"    "soon-to-be" "psychiatrist" "waking"       "lift"      
##  [6,] "menu"       "positioned" "bronx"        "foreigners"   "anti-trump"
##  [7,] "warm"       "tense"      "activism"     "shootings"    "waiting"   
##  [8,] "co-founder" "spain"      "labor"        "masks"        "raise"     
##  [9,] "artistic"   "peru"       "gold"         "scorched"     "parliament"
## [10,] "selected"   "gap"        "stake"        "broad"        "baltimore" 
##       Topic 15     Topic 16       Topic 17       Topic 18   
##  [1,] "boy"        "applications" "doctor"       "newsom"   
##  [2,] "corruption" "beierle"      "tenor"        "feelings" 
##  [3,] "charter"    "identifying"  "expanded"     "1970s"    
##  [4,] "jones"      "involuntary"  "insisted"     "guy"      
##  [5,] "allows"     "celibates"    "20th-century" "ultimate" 
##  [6,] "ambitions"  "railing"      "blight"       "ongoing"  
##  [7,] "ministers"  "interracial"  "spitz"        "generous" 
##  [8,] "jail"       "b"            "comeback"     "dissident"
##  [9,] "sexually"   "rice"         "pianist"      "cyber"    
## [10,] "redrawn"    "felt"         "reaching"     "wine"     
##       Topic 19       Topic 20       
##  [1,] "club's"       "leg"          
##  [2,] "apparently"   "galas"        
##  [3,] "revival"      "installed"    
##  [4,] "achievements" "spicy"        
##  [5,] "settled"      "fx"           
##  [6,] "op-ed"        "governments"  
##  [7,] "propaganda"   "inner"        
##  [8,] "register"     "weird"        
##  [9,] "reviews"      "contradictory"
## [10,] "targeted"     "bus"
topics <- tidy(my_lda_fit5, matrix = "beta")
topics
## # A tibble: 96,760 x 3
##    topic term          beta
##    <int> <chr>        <dbl>
##  1     1 wellness 1.13e-321
##  2     2 wellness 4.80e-322
##  3     3 wellness 4.55e-322
##  4     4 wellness 9.00e-322
##  5     5 wellness 2.94e-  3
##  6     6 wellness 8.55e-322
##  7     7 wellness 2.87e-311
##  8     8 wellness 6.95e-322
##  9     9 wellness 1.04e-321
## 10    10 wellness 1.02e-321
## # ... with 96,750 more rows
top_terms <- topics %>%
  group_by(topic) %>%
  top_n(5, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_reordered()

#ggsave("topicplot.pdf",topicplot, dpi = 300) 

Error running a cluster dendrogram

We tried to run a cluster dendrogram on the corpus but were unable to resolve the NaN errors it produced. The intention was to show a dendrogram of the documents. The attempted code is kept below, commented out.

# quant_dfm <- dfm(my_corpus,
#                 remove_punct = TRUE, remove_numbers = TRUE, remove = stopwords("english"))
# quant_dfm <- dfm_trim(quant_dfm, min_termfreq = 4, max_docfreq = 10)
# # add-one smoothing so that no document has an all-zero row
# quant_dfm1 <- dfm_smooth(quant_dfm, smoothing = 1)
# 
# # hierarchical clustering - get distances on normalized dfm
# quant_dfm_mat <- dfm_weight(quant_dfm1, scheme = "prop") %>%
#     textstat_dist(method = "euclidean") %>% 
#     as.dist()
# 
# # inspect the distances after dropping non-finite values (NaRV.omit() is from IDPmisc)
# quant_dfm_new <- NaRV.omit(as.matrix(quant_dfm_mat))
# head(quant_dfm_new)
# nrow(as.matrix(quant_dfm_mat))
# 
# # hierarchical clustering of the distance object;
# # hclust() needs a "dist" object, not a plain matrix
# quant_dfm_cluster <- hclust(quant_dfm_mat)
# 
# # label with document names (taken from the dfm, not from the cluster object)
# quant_dfm_cluster$labels <- docnames(quant_dfm1)
# 
# # plot as a dendrogram
# plot(quant_dfm_cluster, xlab = "", sub = "", 
#      main = "Euclidean Distance on Normalized Token Frequency")
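
One common way NaN values can appear in this kind of pipeline is a document that has no features left after trimming: converting its counts to proportions divides 0 by 0. Whether that is what happened with our corpus is not confirmed. The sketch below uses only base R and a small made-up count matrix (the `counts` values and document names are hypothetical, not NYTimes data) to show the problem and one possible workaround, which is to drop empty documents before normalizing.

```r
# Toy document-term counts (hypothetical data, not the NYT corpus).
counts <- matrix(
  c(2, 0, 1, 0,
    0, 3, 0, 1,
    0, 0, 0, 0,   # empty document: every term was trimmed away
    1, 1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("tax", "bill", "trump", "city"))
)

# Row-normalizing an all-zero row gives 0 / 0 = NaN.
props_all <- counts / rowSums(counts)
any(is.nan(props_all))

# Drop empty documents first, then normalize each row to proportions.
keep <- rowSums(counts) > 0
kept_counts <- counts[keep, , drop = FALSE]
props <- kept_counts / rowSums(kept_counts)

# hclust() expects a "dist" object, so pass the result of dist() directly.
doc_dist <- dist(props, method = "euclidean")
doc_cluster <- hclust(doc_dist)

plot(doc_cluster, xlab = "", sub = "",
     main = "Euclidean Distance on Normalized Token Frequency")
```

With the full 10,000-document corpus the distance object would be large, so clustering a sample of documents first may be a more practical starting point.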

Conclusions

ANALYSIS 1 SECTION

The correlation coefficient is 0.5788396. This indicates a positive linear relationship, so we can hypothesize that there may be a significant relationship between the two variables, such that as the mean scale score increases, the percentage of students with reduced-price lunches increases as well. From this analysis we see that my assumption was incorrect: the data actually show the opposite trend.

Now we can look at the p-value to determine whether the relationship is significant. My null hypothesis is that there is no statistically significant relationship between the two variables.

We can see the fitted line and the statistical analysis below.

Our p-value is 0.000519, which is below 0.05, so we have strong evidence against the null hypothesis. We conclude that there is a statistically significant relationship between the two variables. My analysis was very limited, as it only included a small dataset.
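
As a minimal sketch of the steps described above, the correlation coefficient and its p-value can be obtained in base R with `cor.test()`, and the slope test from `lm()` gives the same p-value for a single predictor. The data here are simulated and the variable names `pct_reduced_lunch` and `mean_scale_score` are assumptions for illustration; they are not the NYSED values used in the analysis.

```r
set.seed(1)

# Hypothetical data: 30 made-up districts, not the NYSED values.
pct_reduced_lunch <- runif(30, min = 5, max = 90)
mean_scale_score  <- 600 + 0.3 * pct_reduced_lunch + rnorm(30, sd = 8)

# Pearson correlation and its test against H0: the true correlation is 0.
ct <- cor.test(pct_reduced_lunch, mean_scale_score, method = "pearson")
ct$estimate
ct$p.value

# With one predictor, the slope test in a linear regression
# is equivalent to the correlation test above.
fit <- lm(mean_scale_score ~ pct_reduced_lunch)
summary(fit)$coefficients
```

A p-value below 0.05 in either output would lead to rejecting the null hypothesis of no linear relationship.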

ANALYSIS 2 SECTION:

Extracting data from the NYtimes API, learning how to extract page-wise data, and building a corpus from it were challenging. Running an LDA topic model and evaluating its fit also need to be explored further. On further reading, combining all the articles from an entire month into a single corpus to understand topics might not be the right approach. A better analysis in the future would be to segment the monthly articles and examine the gamma (per-document topic) distribution of the LDA fit to study topic terms and mismatches in more detail. Upon reviewing the frequency of terms related to Me Too, based on the sentiment and bigram analysis, the presence of this term in the retrieved NYtimes articles was not significant compared to topics related to President Trump within the period of October 2018 or October, November, December 2017. The height of the Me Too movement was in October 2017, so I expected to see more on the wordcloud or in the articles circulated and shared on NYtimes related to this topic. Another option would be to retrieve the maximum of 100 pages from the API as opposed to only 8 pages, although 8 pages alone already produced a large corpus. The next method to try would be Twitter, which may have better data if searched on celebrities and specifically on the #metoo hashtag related to the Me Too movement. We also scraped the Wikipedia Me Too main page, conducted a bigram analysis on it, and stored the data in neo4j for future relationship queries (PART2).
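
One thing worth checking in a follow-up: the bigram pipeline removes any bigram whose words appear in `stop_words`, so if both "me" and "too" are in that list, "me too" would be filtered out before counting. A rough way to compare phrase frequencies without that filter is to match the raw snippets directly. The sketch below uses base R only; the `snippets` vector and the `count_matches()` helper are hypothetical and stand in for `response.docs.snippet`.

```r
# Hypothetical snippets standing in for response.docs.snippet.
snippets <- c(
  "President Trump plans to repeal protections on public land.",
  "The Me Too movement has changed how newsrooms handle harassment.",
  "A #MeToo reckoning reaches the statehouse.",
  "President Trump's tax bill heads to the Senate."
)

# Count the snippets that match a pattern, ignoring case.
count_matches <- function(pattern, text) {
  sum(grepl(pattern, text, ignore.case = TRUE, perl = TRUE))
}

# "\\bme\\s?too\\b" matches "me too" and "metoo" as whole words only,
# so that phrases such as "came too late" are not counted.
phrase_counts <- c(
  me_too          = count_matches("\\bme\\s?too\\b", snippets),
  president_trump = count_matches("\\bpresident trump", snippets)
)
phrase_counts
```

Running the same counts on each month's snippets would give a direct comparison across October 2017, December 2017, and October 2018.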