A disseminação de informações falsas se tornou um dos principais desafios sociais contemporâneos. Notícias falsas circulam com velocidade e alcance maiores que notícias verdadeiras, explorando emoções negativas e narrativas sensacionalistas. Entender os padrões linguísticos e emocionais que caracterizam fake news permite desenvolver técnicas de verificação automática, apoiar jornalistas, pesquisadores e formuladores de políticas públicas.
Neste projeto utilizamos Processamento de Linguagem Natural (NLP) com enfoque em:
tokenização e limpeza de texto
contagem de frequência
TF-IDF
bigramas
análise de sentimentos (AFINN e NRC)
correlação de palavras
As principais técnicas aplicadas são:
remoção de ruído textual (stopwords, URLs, símbolos)
extração de variáveis textuais (número de palavras, média do tamanho da palavra)
mineração de texto com tidytext
visualização comparativa entre classes Fake e Real
Esta análise beneficia:
jornalistas e checadores de fatos
desenvolvedores de sistemas de detecção automática
pesquisadores em comunicação
plataformas de mídia que buscam reduzir desinformação
library(tidyverse)
library(tidytext)
library(textclean)
library(readr)
library(stringr)
library(lubridate)
library(ggridges)
library(scales)
library(knitr)
library(kableExtra)
library(wordcloud)
library(textdata)
library(widyr)
library(Matrix)
library(tibble)
| Pacote | Proposito | Uso_no_Projeto |
|---|---|---|
| tidyverse | Manipulação e limpeza de dados | Organização e transformação do texto |
| tidytext | Tokenização e análise textual | TF-IDF, stopwords, bigramas |
| textclean | Limpeza profunda de strings | Remoção de HTML, URLs e contrações |
| lubridate | Manipulação de datas | Padronização e limpeza de datas |
| readr | Importação de dados | Leitura dos CSV Fake.csv e True.csv |
| ggplot2 | Visualização de dados | Gráficos comparativos entre categorias |
| textdata | Sentimentos léxicos | AFINN e NRC para sentimentos e emoções |
| widyr | Correlação entre palavras | pairwise_cor entre tokens |
| kableExtra | Formatação de tabelas | Tabelas bonitas no HTML |
Utilizamos o dataset público:
Kaggle – Fake and Real News Dataset ~44.000 notícias categorizadas em Fake e Real, contendo título, texto, data e autor (nem sempre presente).
Principais características:
Texto não estruturado
Ausência de padronização
Presença de ruído textual (URLs, hífens, símbolos, HTML, datas estranhas)
fake_raw <- read_csv("Fake.csv")
real_raw <- read_csv("True.csv")
set.seed(42)
fake <- fake_raw %>% sample_n(15000) %>% mutate(label = "Fake")
real <- real_raw %>% sample_n(15000) %>% mutate(label = "Real")
news <- bind_rows(fake, real) %>%
mutate(id = row_number()) %>%
select(id, label, title, text, subject, date)
clean_text <- function(x){
x <- replace_url(x, " ")
x <- replace_html(x)
x <- str_to_lower(x)
x <- str_replace_all(x, "[^[:alnum:]\\s]", " ")
x <- str_squish(x)
x
}
news <- news %>%
mutate(text_clean = map_chr(text, ~ ifelse(is.na(.x), "", clean_text(.x))))
news <- news %>%
mutate(
n_words = str_count(text_clean, "\\S+"),
n_chars = nchar(text_clean),
avg_word_len = if_else(n_words > 0, n_chars / n_words, NA_real_)
)
head(news) %>% kable() %>% kable_styling()
| id | label | title | text | subject | date | text_clean | n_words | n_chars | avg_word_len |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Fake | 200 SOCIALIST RADICALS Storm Heritage Foundation In DC: “We’re shutting it down!” [Video] | A group called People s Action just held their big gathering in Washington, DC yesterday. They had the usual suspects like Keith Ellison and Socialist Bernie Sanders speak at the meeting. They were protesting against cuts from the budget The bottom line is the radicals don t want any cuts to big government because they want to keep the cradle to grave freebies going. Socialist Bernie Sanders was ready to fill their minds with visions of freebies from big government.EARLIER TODAY, the group decided to hit Heritage Foundation headquarters by storming the offices with 200 thugs.HERE S WHAT THEY SAID: We re shutting it down at @Heritage because it continues to be @realDonaldTrump s think tank. #RiseUp2017 #Budget4ThePeopleWe re shutting it down at @Heritage because it continues to be @realDonaldTrump s think tank. #RiseUp2017 #Budget4ThePeople pic.twitter.com/bnerOnBBa0 People s Action (@PplsAction) April 25, 2017Soros funded groups are all over DC with their push to keep Obama s agenda going. They got some help from Socialist Bernie Sanders who wants justice, justice, justice He s playing to his base in the video below:We must stand with @PplsAction and people all over this country in the fight for justice. Mr. Trump, you will not divide us up. pic.twitter.com/7kYBB7ZfvX Bernie Sanders (@SenSanders) April 24, 2017PEOPLE S ACTION SHEEPLE! | left-news | Apr 25, 2017 | agroupcalledpeoplesactionjustheldtheirbiggatheringinwashington dcyesterday theyhadtheusualsuspectslikekeithellisonandsocialistberniesandersspeakatthemeeting theywereprotestingagainstcutsfromthebudgetthebottomlineistheradicalsdontwantanycutstobiggovernmentbecausetheywanttokeepthecradletogravefreebiesgoing socialistberniesanderswasreadytofilltheirmindswithvisionsoffreebiesfrombiggovernment earliertoday thegroupdecidedtohitheritagefoundationheadquartersbystormingtheofficeswith200thugs hereswhattheysaid wereshuttingitdownat heritagebecauseitcontinuestobe realdonaldtrumpsthinktank riseup2017 budget4thepeoplewereshuttingitdownat heritagebecauseitcontinuestobe realdonaldtrumpsthinktank riseup2017 budget4thepeoplepic twitter com bneronbba0peoplesaction pplsaction april25 2017sorosfundedgroupsarealloverdcwiththeirpushtokeepobamasagendagoing theygotsomehelpfromsocialistberniesanderswhowantsjustice justice justicehesplayingtohisbaseinthevideobelow wemuststandwith pplsactionandpeoplealloverthiscountryinthefightforjustice mr trump youwillnotdivideusup pic twitter com 7kybb7zfvxberniesanders sensanders april24 2017peoplesactionsheeple | 38 | 1138 | 29.94737 |
| 2 | Fake | [VIDEO] FAIRFAX, VA BOARD VOTES TO TEACH STUDENTS GRADES K- 8 ABOUT GENDER IDENTITY AND GAY MARRIAGE In A Way Parents Cannot Opt out | Fairfax, VA Public Schools are not unique in their desire to force a deviant sexual agenda on our children. Why is the left so hell-bent on destroying the innocence of our youth?Public schools in Fairfax County, Virginia, are preparing to include gender identity in its curriculum, despite objections from parents.The district s Family Life Education (FLE) lessons will include teachings on heterosexual, homosexual, and bisexual, and transgender identity. The school board voted in May to add gender identity to the list.The move has angered many parents over what they see as forcing them to expose their children to issues that are not even part of state requirements. Starting in kindergarten, students will be taught about same-sex or gay marriage and the parents will not be able to opt out, Andrea Lafferty, president of the Traditional Values Coalition, told CBN News.Fairfax County Public Schools wrote a letter to parents in response to misperceptions about the new curriculum. Most sections in the FLECAC committee s report have been a part of the curriculum in past years, with the difference being that many of the instructional objectives now meet the Virginia Dept. of Education s (VDOE) general Health Standards of Learning, the board wrote. As-such (they) no longer have an opt-out option. These topics include conflict resolution skills, respecting individual differences such as disabilities, ethnicities and cultures and mental health areas, they wrote.Lafferty said students in 8th grade will be discussing President Bill Clinton s activity, along with oral and anal. Fourth graders will receive instruction about incest, she said. One of the big issues is in Virginia parents can opt their children out of certain parts of the Family Life Education. And so now what they re doing is trying to move parts of it from FLE to Health, which means parents cannot opt their children out, she said.Some parents are outraged that the proposed lessons are not even required by law but they are still being forced upon them. It s not a part of the state law, it s not a part of the state school board instruction, but they ve decided to add it against the will of many of the parents. We are very concerned that they are doing it here in Fairfax County and perhaps other places without the parents knowledge or consent, Lafferty said. It s just bizarre. They want to force this on the kids in Fairfax County when in fact it s not a part of SOLs or the required education, Lafferty added.Via: CBN News | left-news | Jun 11, 2015 | fairfax vapublicschoolsarenotuniqueintheirdesiretoforceadeviantsexualagendaonourchildren whyistheleftsohell bentondestroyingtheinnocenceofouryouth publicschoolsinfairfaxcounty virginia arepreparingtoincludegenderidentityinitscurriculum despiteobjectionsfromparents thedistrictsfamilylifeeducation fle lessonswillincludeteachingsonheterosexual homosexual andbisexual andtransgenderidentity theschoolboardvotedinmaytoaddgenderidentitytothelist themovehasangeredmanyparentsoverwhattheyseeasforcingthemtoexposetheirchildrentoissuesthatarenotevenpartofstaterequirements startinginkindergarten studentswillbetaughtaboutsame sexorgaymarriageandtheparentswillnotbeabletooptout andrealafferty presidentofthetraditionalvaluescoalition toldcbnnews fairfaxcountypublicschoolswrotealettertoparentsinresponsetomisperceptionsaboutthenewcurriculum mostsectionsintheflecaccommitteesreporthavebeenapartofthecurriculuminpastyears withthedifferencebeingthatmanyoftheinstructionalobjectivesnowmeetthevirginiadept ofeducations vdoe generalhealthstandardsoflearning theboardwrote as such they nolongerhaveanopt outoption thesetopicsincludeconflictresolutionskills respectingindividualdifferencessuchasdisabilities ethnicitiesandculturesandmentalhealthareas theywrote laffertysaidstudentsin8thgradewillbediscussingpresidentbillclintonsactivity alongwithoralandanal fourthgraderswillreceiveinstructionaboutincest shesaid oneofthebigissuesisinvirginiaparentscanopttheirchildrenoutofcertainpartsofthefamilylifeeducation andsonowwhattheyredoingistryingtomovepartsofitfromfletohealth whichmeansparentscannotopttheirchildrenout shesaid someparentsareoutragedthattheproposedlessonsarenotevenrequiredbylawbuttheyarestillbeingforceduponthem itsnotapartofthestatelaw itsnotapartofthestateschoolboardinstruction buttheyvedecidedtoadditagainstthewillofmanyoftheparents weareveryconcernedthattheyaredoingithereinfairfaxcountyandperhapsotherplaceswithouttheparentsknowledgeorconsent laffertysaid itsjustbizarre theywanttoforcethisonthekidsinfairfaxcountywheninfactitsnotapartofsolsortherequirededucation laffertyadded via cbnnews | 57 | 2091 | 36.68421 |
| 3 | Fake | CONFUSED PROTESTERS Swarm Outside Trump NYC Fundraiser: ‘Tax the Rich, Not Working People’ | The protesters in NYC must be confused They yelled tax the rich, not working people . Aren t the rich working people too? Many of the rich worked like crazy to be where they are today. Also, the rich pay plenty in taxes already. Don t you love how these people don t want anyone to be successful but want them to give away their hard-earned money?Dozens of protesters yelling shame gathered across the street from Cipriani on East 42nd Street early Saturday where President Trump was to attend a breakfast fundraiser. Members of SEIU aka Obama s Purple Army were out in force doing what they ve been trained to do PROTESTPresidential motorcade departs Cipriani earlier today. We will have the story later on @NY1. pic.twitter.com/uw6kWHdmH0 Shannan Ferry (@ShannanFerry) December 2, 2017 New York hates Trump, several protesters shouted. I believe it s time to stop lining the pockets of the rich and stealing from the poor, said John Eng, a 54-year-old real estate agent from Manhattan who was holding a sign declaring, The poor will have to eat the rich. The protesters were particularly enraged by the Senate s early-morning passage of a $1.2 trillion tax reform bill supported by Trump. A few shouted Kill the bill, don t kill us. That tax plan is an abomination, said Melissa Carpenter, a 51-year-old lawyer from Bayside who wore a Guy Fawkes mask for the occasion. He has some balls to come here. Read more NYP | politics | Dec 3, 2017 | theprotestersinnycmustbeconfusedtheyyelledtaxtherich notworkingpeople arenttherichworkingpeopletoo manyoftherichworkedlikecrazytobewheretheyaretoday also therichpayplentyintaxesalready dontyoulovehowthesepeopledontwantanyonetobesuccessfulbutwantthemtogiveawaytheirhard earnedmoney dozensofprotestersyellingshamegatheredacrossthestreetfromciprianioneast42ndstreetearlysaturdaywherepresidenttrumpwastoattendabreakfastfundraiser membersofseiuakaobamaspurplearmywereoutinforcedoingwhattheyvebeentrainedtodoprotestpresidentialmotorcadedepartsciprianiearliertoday wewillhavethestorylateron ny1 pic twitter com uw6kwhdmh0shannanferry shannanferry december2 2017newyorkhatestrump severalprotestersshouted ibelieveitstimetostopliningthepocketsoftherichandstealingfromthepoor saidjohneng a54 year oldrealestateagentfrommanhattanwhowasholdingasigndeclaring thepoorwillhavetoeattherich theprotesterswereparticularlyenragedbythesenatesearly morningpassageofa 1 2trilliontaxreformbillsupportedbytrump afewshoutedkillthebill dontkillus thattaxplanisanabomination saidmelissacarpenter a51 year oldlawyerfrombaysidewhoworeaguyfawkesmaskfortheoccasion hehassomeballstocomehere readmorenyp | 39 | 1170 | 30.00000 |
| 4 | Fake | Trump Willing To Discuss Solar Powered Border Wall If Everyone Agrees It Was His Idea | Trump took some time today during a White House meeting with Republican congressional leaders to brag about his new ideas for the border wall. According to three meeting attendees, Trump proposed that the wall could be covered with solar panels, and pay for the cost of the wall with the electricity it generated.Trump was happy about his idea of a wall that was up to 50 feet high and solar panel covered. He bragged that they would be beautiful structures, because most walls are only 14 or 15 feet high, and these ones would be taller and better.Trump did have one caveat going forward with solar panel border wall discussion he insisted that lawmakers could talk about it as long as they told everyone it was his idea! The truth is, obviously, it wasn t. But remember, we have a toddler for a President.A solar panel covered wall was actually proposed in a bid submitted during a U.S. requests for wall designs at the beginning of the year, according to a report by the AP. The companies who get the contracts will likely be announced later this month.This begs the question can we use the President s toddler tendencies for the good of the planet? Maybe all we need to do to get Trump to support renewable energy is to give him credit for coming up with it and give him extra praise like a three-year-old who ate his vegetables. You did it, Trumpy!Photo by Olivier Douliery-Pool/Getty Images | News | June 6, 2017 | trumptooksometimetodayduringawhitehousemeetingwithrepublicancongressionalleaderstobragabouthisnewideasfortheborderwall accordingtothreemeetingattendees trumpproposedthatthewallcouldbecoveredwithsolarpanels andpayforthecostofthewallwiththeelectricityitgenerated trumpwashappyabouthisideaofawallthatwasupto50feethighandsolarpanelcovered hebraggedthattheywouldbebeautifulstructures becausemostwallsareonly14or15feethigh andtheseoneswouldbetallerandbetter trumpdidhaveonecaveatgoingforwardwithsolarpanelborderwalldiscussionheinsistedthatlawmakerscouldtalkaboutitaslongastheytoldeveryoneitwashisidea thetruthis obviously itwasnt butremember wehaveatoddlerforapresident asolarpanelcoveredwallwasactuallyproposedinabidsubmittedduringau s requestsforwalldesignsatthebeginningoftheyear accordingtoareportbytheap thecompanieswhogetthecontractswilllikelybeannouncedlaterthismonth thisbegsthequestioncanweusethepresidentstoddlertendenciesforthegoodoftheplanet maybeallweneedtodotogettrumptosupportrenewableenergyistogivehimcreditforcomingupwithitandgivehimextrapraiselikeathree year oldwhoatehisvegetables youdidit trumpy photobyolivierdouliery pool gettyimages | 28 | 1149 | 41.03571 |
| 5 | Fake | BREAKING: SOUTH CAROLINA SENATE CAVES: VOTES TO REMOVE CONFEDERATE FLAG FROM STATEHOUSE GROUNDS…BECAUSE IT’S ALL ABOUT THE FLAG, YA KNOW | Another successful cleansing of our history like it or not what s next?Just a reminder of something Michelle Obama said in 2008 on the campaign trail in Puerto Rico: MICHELLE OBAMA: Barack knows that we are going to have to make sacrifices; we are going to have to change our conversation; we re going to have to change our traditions, our history; we re going to have to move into a different place as a nation. Change our traditions and change our history. What did she mean by that?Changing history means not just telling the same old tall tales of the free market system and the Founders. No, it s the history according to progressives. And it s not merely spinning the old facts; it s taking current events and molding them to fit the progressive agenda and, in this case, completely ignoring history. HERE S A BACKWARDS TIMELINE OF WHAT HAPPENED TODAY VIA THE POST AND COURIER:Members of the South Carolina Senate have voted 37-3 to remove the Confederate battle flag from the Statehouse grounds.Sen. Lee Bright, R-Roebuck, objected to giving the bill automatic third reading, which is usually a procedural vote, on Tuesday. For the bill to be sent to the House, it will need a two-thirds vote.Monday s three nay votes were from Bright, and Sens. Harvey Peeler and Danny Verdin. Plus, for the bill to be amended on third reading, it would need a three-fifths vote.Senate is scheduled to return Tuesday at 10 a.m.3:20 p.m. update: The senate has voted to table amendments that would have pushed the vote on the Confederate flag issue to a statewide referendum (36-3), allow the flag to flown on Statehouse grounds on Confederate Memorial Day (22-17) or replace the current flag with the First National Flag of the Confederate States of America (34-6). Now, various senators are taking turns speaking about the issue. No one has yet made a motion to vote on the bill that would remove the Confederate battle from the Statehouse grounds.1:50 p.m. update: After a short break, the Senate returned to debate the fate of the Statehouse s Confederate battle flag just after 1:15 p.m.Roebuck Republican Sen. Lee Bright s amendment has already died on a 36-3 vote. It would have placed the fate of the flag in the hands of voters.The Senate has now moved to discuss an amendment by Sen. Danny Verdin, R-Laurens. It would allow for the flag to be flown at the Confederate Soldier Monument on Confederate Memorial Day, which is May 10. Verdin has the floor.COLUMBIA It s been a morning of impassioned speeches in the South Carolina Senate, as lawmakers brace for discussion on a bill that will determine the fate of the Confederate battle flag on the Statehouse s grounds.The Senate is on recess until 1 p.m. Senate President Pro Tempore Hugh Leatherman said the heads of both the GOP and Democratic Caucus asked for body to break for a recess so that the caucuses could meet. But lawmakers are still planning on discussing the bill today. My intent is to give it second reading today and my intent would be to give it third reading tomorrow, Leatherman said. Will the Senate do that? Don t know. But we ll try to head in that direction. If the bill follows Leatherman s planned track, it ll be before the House for a vote on Wednesday. Only one amendment has been proposed in the Senate so far.Roebuck Republican Sen. Lee Bright s amendment would place the fate of the banner in the hands of voters. When the bill crosses the hall, it ll likely be met with an amendment by Rep. Mike Pitts, R-Laurens, who said he d like to see the battle flag replaced with Bonnie Blue.Meanwhile, members from both sides of the aisle have made speeches calling for the flag s removal this morning, including Pickens Republican Sen. Larry Martin, who said his view on the flag changed after the shooting that took the lives of nine churchgoers in Charleston on June 17.Martin said he looked at the flag as if it was given some sort of official status, because it flies on the capitol s grounds. That doesn t represent all of the people of South Carolina, Martin said. It isn t part of our future. It s part of our past. A two-thirds vote in each chamber is needed to do anything with any monument on the capitol s grounds, including the battle flag which is part of the Confederate Soldier Monument. That vote threshold has been met, according to a survey by The Post and Courier.Outside the Statehouse, dozens of protesters began to arrive Monday morning. Some called for the flag to come down. Others, such as Nelson Waller in his rebel flag tie, said the state was giving in to Northern liberals and civil rights activists. Waller carried a sign that read Keep the flag. Dump Nikki! Two decades ago, he carried a Dump Beasley sign after then-Gov. David Beasley made an unsuccessful attempt to get the Confederate flag off the Statehouse dome. | politics | Jul 6, 2015 | anothersuccessfulcleansingofourhistorylikeitornotwhatsnext justareminderofsomethingmichelleobamasaidin2008onthecampaigntrailinpuertorico michelleobama barackknowsthatwearegoingtohavetomakesacrifices wearegoingtohavetochangeourconversation weregoingtohavetochangeourtraditions ourhistory weregoingtohavetomoveintoadifferentplaceasanation changeourtraditionsandchangeourhistory whatdidshemeanbythat changinghistorymeansnotjusttellingthesameoldtalltalesofthefreemarketsystemandthefounders no itsthehistoryaccordingtoprogressives anditsnotmerelyspinningtheoldfacts itstakingcurrenteventsandmoldingthemtofittheprogressiveagendaand inthiscase completelyignoringhistory heresabackwardstimelineofwhathappenedtodayviathepostandcourier membersofthesouthcarolinasenatehavevoted37 3toremovetheconfederatebattleflagfromthestatehousegrounds sen leebright r roebuck objectedtogivingthebillautomaticthirdreading whichisusuallyaproceduralvote ontuesday forthebilltobesenttothehouse itwillneedatwo thirdsvote mondaysthreenayvoteswerefrombright andsens harveypeeleranddannyverdin plus forthebilltobeamendedonthirdreading itwouldneedathree fifthsvote senateisscheduledtoreturntuesdayat10a m 3 20p m update thesenatehasvotedtotableamendmentsthatwouldhavepushedthevoteontheconfederateflagissuetoastatewidereferendum 36 3 allowtheflagtoflownonstatehousegroundsonconfederatememorialday 22 17 orreplacethecurrentflagwiththefirstnationalflagoftheconfederatestatesofamerica 34 6 now varioussenatorsaretakingturnsspeakingabouttheissue noonehasyetmadeamotiontovoteonthebillthatwouldremovetheconfederatebattlefromthestatehousegrounds 1 50p m update afterashortbreak thesenatereturnedtodebatethefateofthestatehousesconfederatebattleflagjustafter1 15p m roebuckrepublicansen leebrightsamendmenthasalreadydiedona36 3vote itwouldhaveplacedthefateoftheflaginthehandsofvoters thesenatehasnowmovedtodiscussanamendmentbysen dannyverdin r laurens itwouldallowfortheflagtobeflownattheconfederatesoldiermonumentonconfederatememorialday whichismay10 verdinhasthefloor columbiaitsbeenamorningofimpassionedspeechesinthesouthcarolinasenate aslawmakersbracefordiscussiononabillthatwilldeterminethefateoftheconfederatebattleflagonthestatehousesgrounds thesenateisonrecessuntil1p m senatepresidentprotemporehughleathermansaidtheheadsofboththegopanddemocraticcaucusaskedforbodytobreakforarecesssothatthecaucusescouldmeet butlawmakersarestillplanningondiscussingthebilltoday myintentistogiveitsecondreadingtodayandmyintentwouldbetogiveitthirdreadingtomorrow leathermansaid willthesenatedothat dontknow butwelltrytoheadinthatdirection ifthebillfollowsleathermansplannedtrack itllbebeforethehouseforavoteonwednesday onlyoneamendmenthasbeenproposedinthesenatesofar roebuckrepublicansen leebrightsamendmentwouldplacethefateofthebannerinthehandsofvoters whenthebillcrossesthehall itlllikelybemetwithanamendmentbyrep mikepitts r laurens whosaidhedliketoseethebattleflagreplacedwithbonnieblue meanwhile membersfrombothsidesoftheaislehavemadespeechescallingfortheflagsremovalthismorning includingpickensrepublicansen larrymartin whosaidhisviewontheflagchangedaftertheshootingthattookthelivesofninechurchgoersincharlestononjune17 martinsaidhelookedattheflagasifitwasgivensomesortofofficialstatus becauseitfliesonthecapitolsgrounds thatdoesntrepresentallofthepeopleofsouthcarolina martinsaid itisntpartofourfuture itspartofourpast atwo thirdsvoteineachchamberisneededtodoanythingwithanymonumentonthecapitolsgrounds includingthebattleflagwhichispartoftheconfederatesoldiermonument thatvotethresholdhasbeenmet accordingtoasurveybythepostandcourier outsidethestatehouse dozensofprotestersbegantoarrivemondaymorning somecalledfortheflagtocomedown others suchasnelsonwallerinhisrebelflagtie saidthestatewasgivingintonorthernliberalsandcivilrightsactivists wallercarriedasignthatreadkeeptheflag dumpnikki twodecadesago hecarriedadumpbeasleysignafterthen gov davidbeasleymadeanunsuccessfulattempttogettheconfederateflagoffthestatehousedome | 124 | 3972 | 32.03226 |
| 6 | Fake | Fox News Host Calls For American Muslims With Links To ISIS To Be Executed Without Trial (VIDEO) | Fox News host,Judge Jeanine Pirro appeared to endorse the summary execution of Muslims suspected of links to ISIS-related terrorism without due process under the law.Pierre asked her viewers if the U.S. should apply the death penalty for ISIS-related violence. There was no real definition of what scale of violence or links to ISIS would apply. And the U.S. already lists 41 crimes as capital meaning they are punishable by death including murder, assassination, espionage, treason, and death resulting from aircraft hijacking. It is not supposed to matter what religious, racial or other motivation a suspect has of committing these crimes the court is only interested in whether they committed the acts or not.In short, there is no need for a new capital crime specific to Muslims. Unless of course, you re a Fox News blowhard with an axe to grind.The responses from her viewers began as expected.A viewer named Josephine raged: They aren t afraid to kill us, so why should we be afraid to give them the death penalty? While Kevin wrote: We ve got a few guns down here in Texas, and more than happy to handle it!, Pierre reads this out with a grin reaching from ear-to-ear as if she s about to break out into a Happy Birthday song or something when she is infact, calling for the summary execution of Muslims.She reads out Al s comment with the same hysterical smirk on her face: Why waste taxpayer money holding them in prison? They d kill us in a heartbeat. Adding her own snarky reply: Hey Al, and if we held them in prison Obama would let them out! But even amongst the viewership of Fox News, there are some people who have a respect for law and order and the need to actually investigate a person suspected of committing a crime, rather than arbitrarily killing them. One such viewer, named Kimberly, said: Don t get ahead of yourselves. Everyone deserves a fair trial. Let the courts decide. This sent Pirro into a rage-induced tailspin of a rant in which she saw fit to lecture Kimberly. Hey Kimberly, she snorts. I drafted legislation all the time to increase sentences for punishment. The courts are the ones that implement the laws that we draft. Here we have a Fox News host openly endorsing the summary execution of Muslims, presumably Muslim-Americans, who are even suspected of ISIS-related violence. A channel which calls it unconstitutional to ask a white christian to register his gun, thinks it s totally appropriate to summarily execute a Muslim. Fair and balanced? Not a chance.Featured Image vis Screengrab | News | January 11, 2016 | foxnewshost judgejeaninepirroappearedtoendorsethesummaryexecutionofmuslimssuspectedoflinkstoisis relatedterrorismwithoutdueprocessunderthelaw pierreaskedherviewersiftheu s shouldapplythedeathpenaltyforisis relatedviolence therewasnorealdefinitionofwhatscaleofviolenceorlinkstoisiswouldapply andtheu s alreadylists41crimesascapitalmeaningtheyarepunishablebydeathincludingmurder assassination espionage treason anddeathresultingfromaircrafthijacking itisnotsupposedtomatterwhatreligious racialorothermotivationasuspecthasofcommittingthesecrimesthecourtisonlyinterestedinwhethertheycommittedtheactsornot inshort thereisnoneedforanewcapitalcrimespecifictomuslims unlessofcourse youreafoxnewsblowhardwithanaxetogrind theresponsesfromherviewersbeganasexpected aviewernamedjosephineraged theyarentafraidtokillus sowhyshouldwebeafraidtogivethemthedeathpenalty whilekevinwrote wevegotafewgunsdownhereintexas andmorethanhappytohandleit pierrereadsthisoutwithagrinreachingfromear to earasifshesabouttobreakoutintoahappybirthdaysongorsomethingwhensheisinfact callingforthesummaryexecutionofmuslims shereadsoutalscommentwiththesamehystericalsmirkonherface whywastetaxpayermoneyholdingtheminprison theydkillusinaheartbeat addingherownsnarkyreply heyal andifweheldtheminprisonobamawouldletthemout butevenamongsttheviewershipoffoxnews therearesomepeoplewhohavearespectforlawandorderandtheneedtoactuallyinvestigateapersonsuspectedofcommittingacrime ratherthanarbitrarilykillingthem onesuchviewer namedkimberly said dontgetaheadofyourselves everyonedeservesafairtrial letthecourtsdecide thissentpirrointoarage inducedtailspinofarantinwhichshesawfittolecturekimberly heykimberly shesnorts idraftedlegislationallthetimetoincreasesentencesforpunishment thecourtsaretheonesthatimplementthelawsthatwedraft herewehaveafoxnewshostopenlyendorsingthesummaryexecutionofmuslims presumablymuslim americans whoareevensuspectedofisis relatedviolence achannelwhichcallsitunconstitutionaltoaskawhitechristiantoregisterhisgun thinksitstotallyappropriatetosummarilyexecuteamuslim fairandbalanced notachance featuredimagevisscreengrab | 63 | 2097 | 33.28571 |
news %>%
group_by(label) %>%
summarise(
mean_words = mean(n_words),
sd_words = sd(n_words),
median_words = median(n_words)
) %>%
kable() %>%
kable_styling(full_width = FALSE)
| label | mean_words | sd_words | median_words |
|---|---|---|---|
| Fake | 56.66653 | 56.86752 | 46 |
| Real | 53.71467 | 40.72551 | 47 |
data("stop_words")
tokens <- news %>%
unnest_tokens(word, text_clean) %>%
filter(!word %in% stop_words$word)
top_words <- tokens %>%
count(label, word, sort = TRUE) %>%
group_by(label) %>%
slice_max(order_by = n, n = 15)
ggplot(top_words,
aes(x = reorder_within(word, n, label), y = n, fill = label)) +
geom_col(show.legend = FALSE) +
facet_wrap(~label, scales="free") +
coord_flip()
O gráfico revela as palavras mais frequentes em cada categoria de notícia. Em geral, observamos que termos presentes em notícias falsas tendem a estar relacionados a eventos dramáticos, polêmicas políticas e linguagem de urgência, indicando apelo emocional ou sensacionalista. Já as notícias verdadeiras apresentam vocabulário mais técnico e factual, com maior presença de nomes de instituições, cargos e eventos específicos. Essa diferença sugere que o tipo de informação priorizada em cada classe é distinto, apontando padrões lexicais úteis para modelagem preditiva.
tfidf <- tokens %>%
count(label, word) %>%
bind_tf_idf(word, label, n)
tfidf_top <- tfidf %>%
group_by(label) %>%
slice_max(tf_idf, n = 15)
tfidf_top %>% kable() %>% kable_styling()
| label | word | n | tf | idf | tf_idf |
|---|---|---|---|---|---|
| Fake | gettyimages | 2019 | 0.0025332 | 0.6931472 | 0.0017559 |
| Fake | https | 1803 | 0.0022622 | 0.6931472 | 0.0015681 |
| Fake | 21wire | 564 | 0.0007077 | 0.6931472 | 0.0004905 |
| Fake | featuredimage | 512 | 0.0006424 | 0.6931472 | 0.0004453 |
| Fake | js | 466 | 0.0005847 | 0.6931472 | 0.0004053 |
| Fake | youtube | 404 | 0.0005069 | 0.6931472 | 0.0003514 |
| Fake | becomeamember | 313 | 0.0003927 | 0.6931472 | 0.0002722 |
| Fake | fjs | 279 | 0.0003501 | 0.6931472 | 0.0002426 |
| Fake | featuredimageviavideoscreencapture | 262 | 0.0003287 | 0.6931472 | 0.0002279 |
| Fake | youtu | 258 | 0.0003237 | 0.6931472 | 0.0002244 |
| Fake | seehisstoryhere | 249 | 0.0003124 | 0.6931472 | 0.0002166 |
| Fake | forentirestory | 246 | 0.0003087 | 0.6931472 | 0.0002139 |
| Fake | heresthevideoviayoutube | 236 | 0.0002961 | 0.6931472 | 0.0002052 |
| Fake | dailymail | 220 | 0.0002760 | 0.6931472 | 0.0001913 |
| Fake | featuredimageviascreengrab | 215 | 0.0002698 | 0.6931472 | 0.0001870 |
| Real | beijing | 354 | 0.0004732 | 0.6931472 | 0.0003280 |
| Real | ly | 262 | 0.0003502 | 0.6931472 | 0.0002428 |
| Real | seoul | 173 | 0.0002312 | 0.6931472 | 0.0001603 |
| Real | 8presidentialelection | 158 | 0.0002112 | 0.6931472 | 0.0001464 |
| Real | mexicocity | 158 | 0.0002112 | 0.6931472 | 0.0001464 |
| Real | madrid | 155 | 0.0002072 | 0.6931472 | 0.0001436 |
| Real | tmsnrt | 142 | 0.0001898 | 0.6931472 | 0.0001316 |
| Real | selection | 136 | 0.0001818 | 0.6931472 | 0.0001260 |
| Real | baghdad | 127 | 0.0001698 | 0.6931472 | 0.0001177 |
| Real | itwasindependentlycreatedbythereuterseditorialstaff | 121 | 0.0001617 | 0.6931472 | 0.0001121 |
| Real | saphadnoeditorialinvolvementinitscreationorproduction | 121 | 0.0001617 | 0.6931472 | 0.0001121 |
| Real | thisarticlewasfundedinpartbysap | 121 | 0.0001617 | 0.6931472 | 0.0001121 |
| Real | manila | 119 | 0.0001591 | 0.6931472 | 0.0001103 |
| Real | hetoldreuters | 111 | 0.0001484 | 0.6931472 | 0.0001028 |
| Real | fdp | 109 | 0.0001457 | 0.6931472 | 0.0001010 |
| Real | onechina | 109 | 0.0001457 | 0.6931472 | 0.0001010 |
Enquanto a análise de frequência destaca termos comuns, o TF-IDF evidencia palavras específicas de cada categoria. Os termos com maior TF-IDF em notícias falsas indicam narrativas repetidas apenas nesse grupo, possivelmente associadas a boatos populares ou teorias conspiratórias. Nos textos verdadeiros, predominam termos ligados a fatos comprováveis, locais geográficos, datas e declarações oficiais. Essa distinção reforça a hipótese de que notícias falsas exploram palavras carregadas de sugestão ou impacto, enquanto notícias reais utilizam linguagem descritiva e detalhada, baseada em precisão e contexto.
bigrams <- news %>%
unnest_tokens(bigram, text_clean, token="ngrams", n=2)
big_count <- bigrams %>%
count(label, bigram, sort=TRUE)
big_count %>% head(20) %>% kable() %>% kable_styling()
| label | bigram | n |
|---|---|---|
| Real | washington reuters | 4706 |
| Real | u s | 4549 |
| Fake | twitter com | 4208 |
| Fake | t co | 2122 |
| Real | theu s | 1964 |
| Real | reuters u | 1879 |
| Fake | pic twitter | 1689 |
| Fake | trump realdonaldtrump | 1090 |
| Fake | https t | 1085 |
| Fake | donaldj trump | 880 |
| Real | au s | 582 |
| Real | reuters theu | 580 |
| Real | newyork reuters | 574 |
| Fake | NA | 549 |
| Real | london reuters | 536 |
| Fake | 21wire tv | 503 |
| Real | moscow reuters | 443 |
| Real | s presidentdonaldtrump | 440 |
| Real | s president | 384 |
| Fake | u s | 380 |
A análise de bigramas mostra combinações típicas de palavras que aparecem juntas. Nos textos falsos, observam-se expressões como “breaking news”, “top story” ou frases que remetem a alarme, urgência e impacto imediato, característica de discursos sensacionalistas. Nas notícias verdadeiras, há mais bigramas contendo nomes próprios, instituições e cargos (“white house official”, “federal government”), indicando maior referência a fontes verificáveis. Esses padrões permitem compreender estruturas linguísticas recorrentes e possíveis “assinaturas” discursivas de cada tipo de texto.
tokens_freq <- tokens %>%
count(word) %>%
filter(n >= 50)
tokens_filt <- tokens %>%
semi_join(tokens_freq)
wcorr <- tokens_filt %>%
pairwise_cor(word, id, sort = TRUE)
head(wcorr, 20) %>%
kable() %>%
kable_styling()
| item1 | item2 | correlation |
|---|---|---|
| 00pmcst | 00pmpst | 1 |
| 00pmestforthisspecialbroadcast | 00pmpst | 1 |
| 00pmpst | 00pmcst | 1 |
| 00pmestforthisspecialbroadcast | 00pmcst | 1 |
| 00pmpst | 00pmestforthisspecialbroadcast | 1 |
| 00pmcst | 00pmestforthisspecialbroadcast | 1 |
| document | function | 1 |
| function | document | 1 |
| prisonsentencecommutedtoexpireonjuly28 | commutationgrant | 1 |
| commutationgrant | prisonsentencecommutedtoexpireonjuly28 | 1 |
| werepostedtotheverifiedtwitteraccountsofu | thefollowingstatements | 1 |
| thefollowingstatements | werepostedtotheverifiedtwitteraccountsofu | 1 |
| joinusforuncensored | tuneintothealternatecurrentradionetwork | 1 |
| uninterruptibletalkradio | tuneintothealternatecurrentradionetwork | 1 |
| madeforbarflyphilosophers | tuneintothealternatecurrentradionetwork | 1 |
| misguidedmoralists | tuneintothealternatecurrentradionetwork | 1 |
| masochists | tuneintothealternatecurrentradionetwork | 1 |
| streetcornerevangelists | tuneintothealternatecurrentradionetwork | 1 |
| maniacs | tuneintothealternatecurrentradionetwork | 1 |
| savants | tuneintothealternatecurrentradionetwork | 1 |
A análise de correlação entre palavras identifica pares de termos que tendem a aparecer juntos com frequência nas mesmas notícias. Nas notícias falsas, observamos correlações concentradas em narrativas polarizadas e repetitivas, sugerindo que certas expressões são utilizadas em conjunto para reforçar mensagens específicas. Em contrapartida, nas notícias verdadeiras, pares fortes refletem contexto factual — nomes de pessoas, eventos e datas. Esse resultado indica que, enquanto fake news “reciclam” estruturas narrativas fixas, textos reais apresentam dependências linguísticas mais diversificadas e factuais.
afinn <- get_sentiments("afinn")
sent <- tokens %>%
inner_join(afinn, by="word", relationship="many-to-many") %>%
group_by(id, label) %>%
summarise(score = sum(value))
ggplot(sent, aes(x=score, fill=label)) +
geom_density(alpha=.5)
A distribuição dos valores AFINN revela diferenças claras no uso de palavras emocionalmente carregadas. Notícias falsas apresentam uma cauda mais longa na direção negativa e maior dispersão geral, sugerindo maior polarização emocional — tanto positiva quanto negativa. Já os textos verdadeiros concentram-se em torno de valores próximos de zero, indicando maior neutralidade e objetividade. Esse padrão sustenta a hipótese de que estratégias emocionais mais intensas são utilizadas para aumentar engajamento e viralização de conteúdo falso.
nrc <- get_sentiments("nrc")
emo <- tokens %>%
inner_join(nrc, by="word", relationship="many-to-many") %>%
count(label, sentiment)
emo %>%
group_by(label) %>%
mutate(prop = n/sum(n)) %>%
ggplot(aes(x=sentiment, y=prop, fill=label)) +
geom_col(position="dodge") +
theme(axis.text.x=element_text(angle=45,hjust=1))
A análise NRC mostra a proporção relativa de emoções básicas presentes nos textos. Notícias falsas revelam predominância de emoções negativas como raiva, medo, desconfiança e tristeza, características frequentemente associadas à mobilização emocional e propaganda. Notícias verdadeiras, por sua vez, mostram maior presença de confiança, antecipação e alegria, sugerindo foco em informação factual, expectativa e divulgação de resultados. Essa diferença reforça que a dimensão emocional pode ser um critério eficiente na detecção automática de fake news.
Palavras exclusivas e bigramas mostram temas distintos entre Fake e Real.
TF-IDF revela linguagem mais repetitiva e sensacionalista em Fake.
Sentimento AFINN mostra maior polarização emocional em Fake.
NRC mostra emoções negativas predominantes (medo, raiva, desconfiança) em textos falsos.
Variáveis simples como número de palavras e média de comprimento diferenciam bem as classes.
Limitações
Apenas texto, sem imagens ou metadados sociais.
Apenas inglês.
Amostragem por 30.000 notícias.
Próximos Passos
Classificação supervisionada
Modelos ML (Random Forest, XGBoost)
Extração de tópicos (LDA)