This text mining project explores a movie dataset of roughly 14,800 plot synopses, each associated with one or more of about 70 tags that capture heterogeneous characteristics of the plots (a multi-label setting).
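A multi-label association between tags and synopses is often stored as a multi-hot vector per movie. The following is a minimal Python sketch of that idea, not the project's own code; the tag names and movie keys are hypothetical placeholders, since the actual tag vocabulary is not listed here.

```python
# Hypothetical tag vocabulary; the real dataset has about 70 tags.
TAGS = ["murder", "romantic", "violence", "comedy"]


def multi_hot(tags, vocabulary):
    """Return a 0/1 vector with one slot per tag in the vocabulary."""
    present = set(tags)
    return [1 if tag in present else 0 for tag in vocabulary]


# Hypothetical records: each movie can carry several tags at once.
records = {
    "movie_a": ["murder", "violence"],
    "movie_b": ["romantic", "comedy"],
    "movie_c": ["romantic"],
}

encoded = {movie: multi_hot(tags, TAGS) for movie, tags in records.items()}
for movie, vector in encoded.items():
    print(movie, vector)
```

Each position in a vector corresponds to one tag, so a movie with several tags simply has several ones.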
The project explores the dataset along two dimensions: topic modeling and sentiment analysis. Topic modeling aims to uncover the underlying themes and subjects that recur across the movies. By applying topic modeling algorithms to the text, the project identifies the prevalent topics in the collection, which gives a more nuanced view of the content and makes it possible to categorize movies by theme.
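The text does not say which topic modeling algorithm is used. As one common choice, latent Dirichlet allocation (LDA) can be fitted by collapsed Gibbs sampling. The sketch below is a minimal Python illustration under that assumption; the toy documents, the number of topics, and the hyperparameters `alpha` and `beta` are all made up for the example.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Hypothetical, already-tokenized toy documents.
toy_docs = [
    ["vampire", "blood", "corpse", "night", "blood"],
    ["corpse", "ghost", "night", "vampire"],
    ["teacher", "music", "school", "band", "music"],
    ["school", "students", "band", "teacher"],
]

vocab, doc_topic, topic_word = lda_gibbs(toy_docs, n_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

On real synopses the text would first need tokenizing and stop-word removal, and a library implementation would be a more practical choice than this hand-written sampler.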
In parallel, the project applies sentiment analysis to the same dataset, gauging the emotional tone and subjective reactions associated with each film. It uses natural language processing and machine learning methods to estimate the sentiment expressed towards movies. By combining topic modeling with sentiment analysis, the project aims to contribute to film analytics and to improve our understanding of both cinematic content and audience responses.
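To make the idea of scoring emotional tone concrete, here is a minimal Python sketch of a lexicon-based scorer. It is a deliberately simple stand-in, not the machine learning approach the project describes; the tiny `LEXICON` and its weights are hypothetical, and the two example sentences are taken from the synopses in the dataset.

```python
import re

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "delighted": 1,
    "happy": 1,
    "love": 1,
    "terrified": -1,
    "fear": -1,
    "murder": -1,
}


def sentiment_score(text, lexicon=LEXICON):
    """Average lexicon weight per token; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(token, 0) for token in tokens) / len(tokens)


negative = sentiment_score("The terrified woman begs for forgiveness")
positive = sentiment_score("Delighted, he hands out bonuses to all")
print(negative, positive)
```

A score below zero indicates mostly negative vocabulary and a score above zero mostly positive. Words missing from the lexicon count as neutral, which is the main weakness of this approach on long synopses.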
Each record in the dataset consists of the following attributes:
## [1] 14828 6
##     imdb_id                                         title
## 1 tt0057603                       I tre volti della paura
## 2 tt1733125 Dungeons & Dragons: The Book of Vile Darkness
## 3 tt0033045                    The Shop Around the Corner
## 4 tt0113862                            Mr. Holland's Opus
## 5 tt0086250                                      Scarface
## 6 tt1315981                                  A Single Man
## plot_synopsis
## 1 Note: this synopsis is for the orginal Italian release with the segments in this certain order.Boris Karloff introduces three horror tales of the macabre and the supernatural known as the 'Three Faces of Fear'.THE TELEPHONERosy (Michele Mercier) is an attractive, high-priced Parisian call-girl who returns to her spacious, basement apartment after an evening out when she immediately gets beset by a series of strange phone calls. The caller soon identified himself as Frank, her ex-pimp who has recently escaped from prison. Rosy is terrified for it was her testimony that landed the man in jail. Looking for solace, Rosy phones her lesbian lover Mary (Lynda Alfonsi). The two women have been estranged for some time, but Rosy is certain that she is the only one who can help her. Mary agrees to come over that night. Seconds later, Frank calls again, promising that no matter who she calls for protection, he will have his revenge. Unknown to Rosy, Mary is the caller impersonating Frank. Marry arrives at Rosy's apartment soon after, and does her best to calm Rosy's nerves. She gives the panic-struck woman a tranquillizer and puts her to bed.Later that night as Rosy sleeps, Mary gets up out of bed, and pens a note of confession: she was the one making the strange phone calls when she learned of Franks escape from prison. Knowing that Rosy would call on her for help, she explains that she felt it was her way of coming back into her life after their breakup. While she is busy writing, she fails to notice an intruder in the apartment. This time it is Frank, for real. He creeps up behind Mary and strangles her to death with one of Rosys nylon stockings. The sound of the struggle awaken Rosy and she gasps in fright. The murderous pimp realizes that he just killed the wrong woman, and slowly makes his way to Rosy's bed. However, earlier that night, Rosy had placed a butcher knife under her pillow at Mary's suggestion. 
Rosy seizes the knife and stabs Frank with it as he's beginning to strangle her. Rosy drops the knife and breaks down in hysteria, surrounded by the two corpses of her former lovers.THE WURDALAKIn 19th Century Russia, Vladimir D'Urfe is a young nobleman on a long trip. During the course of his journey, he finds a beheaded corpse with a knife plunged into its heart. He withdraws the blade and takes it as a souvenir.Later that night, Vladimir stops at a small rural cottage to ask for shelter. He notices several daggers hanging up on one of the walls, and a vacant space that happens to fit the one he has discovered. Vladimir is surprised by the entrance of Giorgio (Glauco Onorato), who explains that the knife belongs to his father, who has not been seen for five days. Giorgio offers a room to the young count, and subsequently introduces him to the rest of the family: his wife (Rika Dialina), their young son Ivan, Giorgio's younger brother Pietro (Massimo Righi), and sister Sdenka (Susy Anderson). It subsequently transpires that they are eagerly anticipating the arrival of their father, Gorcha, as well as the reason for his absence: he has gone to do battle with the outlaw and dreaded wurdalak Ali Beg. Vladimir is confused by the term, and Sdenka explains that a wurdalak is a walking cadaver who feeds on the blood of the living, preferably close friends and family members. Giorgio and Pietro are certain that the corpse Vladimir had discovered is that of Ali Beg, but also realize that there is a strong possibility that their father has been infected by the blood curse too. They warn the count to leave, but he decides to stay and await the old mans return.At the stroke of midnight, Gorcha (Boris Karloff) returns to the cottage. His sour demeanor and unkempt appearance bode the worse, and the two brothers are torn: they realize that it is their duty to kill Gorcha before he feeds on the family, but their love for him makes it difficult to reach a decision. 
Later that night, both Ivan and Pietro are attacked by Gorcha who drains them of blood, and then flees the cottage. Giorgio stakes and beheads Pietro to prevent him from reviving as a wurdalak. But he is prevented from doing so to Ivan when his wife threatens to commit suicide. Reluntantly, he agrees to bury the child without taking the necessary precautions.That same night, the child rises from his grave and begs to be invited into the cottage. The mother runs to her son's aid, stabbing Giorgio when he attempts to stop her, only to be greeted at the front door by Gorcha. The old man bits and infects his daughter-in-law, who then does the same for her husband. Vladimir and Sdenka flee from the cottage and go on the run and hide out in the ruins of an abandoned cathedral as dawn breaks. Vladimir is optimistic that a long and happy life lies with them. But Sdenka is reluctant to relinquish her family ties. She believes that she is meant to stay with the family.Sdenka's fears about her family are confirmed when that evening, Gorcha and her siblings show up at the abandoned Abby. As Vladimir sleeps, Sdenka is lured into their loving arms where they bite to death. Awakened by her screams, Vladimir rushes to her aid, but the family has already taken her home, forcing the lover to follow suite. The young nobleman finds her, lying motionless on her bed. Sdenka awakens, and a distinct change is visible on her face. No longer caring, Vladimir embraces her, and she bites and infects him too.THE DROP OF WATERIn Victorian London, England, Nurse Helen Chester (Jacqueline Pierreux) is called to a large house to prepare the corpse of an elderly medium for her burial. As she dressed the body, she notices an elaborate diamond ring on its finger. Tempted by greed, Nurse Chester steals it. As she does, a glass tips over, and drops of water begin to splash on the floor. She is also assailed by a fly, no doubt attracted by the odor of the body. 
Unsettled but pleased by her acquisition, she finishes the job and returns home to her small East End flat.After returning home, Nurse Chester is assailed by strange events. The buzzing fly returns and continues to pester her. Then the lights in her apartment go out, and the sounds of the dripping water continues with maddening regularity. She sees the old womans corpse lying on her bed, and coming towards her. The terrified woman begs for forgiveness, but she ultimately strangles herself, imaging that the medium's hands are gripping her throat.The next morning, the concierge (Harriet White Medin) discovers Nurse Chester's body and calls the police. The investigator on the scene (Gustavo de Nardo) quickly concludes that its a simple case and that Nurse Chester "died of fright". The pathologist arrives on the scene to examine the body before it's taken away and he notes that the only sign of violence is a small bruise on her left finger, mostly likely caused when someone pried a ring from her finger. As the doctor makes this observation, the concierge appears distressed, as she has apparently took the ring from the dead Nurse Chester, and is further distracted by the sound of a fly swooping about in the air....Boris Karloff makes a final appearance as Gorcha riding on his horse as he concludes the three tales of fear and tells the viewers to be careful while walking home at night for ghosts and vampires have no fear. The image pulls back to actually reveal him sitting on a prop fake horse with a camera crew and various crewmen moving branches around to simulate the scene of riding through the forest from the Wurdalak segment.
## 2 Two thousand years ago, Nhagruul the Foul, a sorcerer who reveled in corrupting the innocent and the spread of despair, neared the end of his mortal days and was dismayed. Consumed by hatred for the living, Nhagruul sold his soul to the demon Lords of the abyss so that his malign spirit would survive. In an excruciating ritual, Nhagrulls skin was flayed into pages, his bones hammered into a cover, and his diseased blood became the ink to pen a book most vile. Creatures vile and depraved rose from every pit and unclean barrow to partake in the fever of destruction. The kingdoms of Karkoth were consumed by this plague of evil until an order of holy warriors arose from the ashes. The Knights of the New Sun swore an oath to resurrect hope in the land. The purity of their hearts was so great that Pelor, the God of Light, gave the Knights powerful amulets with which to channel his power. Transcendent with divine might, the Knights of the New Sun pierced the shadow that had darkened the land for twelve hundred years and cast it asunder. But not all were awed by their glory. The disciples of Nhagruul disassembled the book and bribed three greedy souls to hide the pieces until they could be retrieved. The ink was discovered and destroyed but, despite years of searching, the cover and pages were never found. Peace ruled the land for centuries and the Knights got lost in the light of their own glory. As memory of the awful events faded so did the power of servants of Pelor. They unwittingly abandoned themselves in the incorrect belief that the Book of Vile Darkness could never again be made whole.Now, the remaining pieces have been discovered, and an ancient evil is attempting to bring them together and restore the relic and the evil it brought. But at the same time a potential new paladin has been named to the Knights of the New Sun to attempt to renew their power to fight this evil. 
But, to do so, he may need to go against all that he has held dear, risking more that just his own soul in his quest to destroy the evil that surrounds him at every turn.
## 3 Matuschek's, a gift store in Budapest, is the workplace of Alfred Kralik (James Stewart) and the newly hi Ed\nKlara Novak (Margaret Sullavan). At work they constantly irritate each other, but this daily aggravation is tempered by the fact that each has a secret pen pal with which they trade long soul-searching letters. Romantic correspondence is sent back and forth, and while Alfred and Klara trade barbs at work, they dream of someday meeting their sensitive, caring and unknown pen pal.Christmas is fast approaching, and the store is busy. Alfred had been with the store for some time, and has always been treated well by Mr. Matuschek (Frank Morgan), but lately his attitude has changed. Alfred is at a loss, and Matuschek avoids any explanation, finally telling Alfred that it would be best if he left. Stunned, Alfred accepts his last paycheck and says goodbye to everyone, including Klara. For once they are civil to each other.A long awaited meeting of the secret pen pals was planned for that night, and Alfred having just lost his job has no desire to go. Finding he can't fight his curiosity, he wanders to the restaurant where they'd agreed to meet and peeks in the window with his fellow employee. Of course, Klara is there waiting for him, with the chosen book and wearing a red carnation they'd agreed to use as a signal. Realizing that he'd been wrong about her all along, and that his irritation with her was actually masking his attraction, he finally enters and goes over to her table, but does not reveal his true reason for being there although he is aware she will be hurt that her pen pal doesn't show up. Alfred, hurt by her rudeness, finally leaves, knowing that she will wait all night for someone who is no longer coming.Meanwhile, back at the store, Mr. Matuschek has a late-night meeting with a private detective. He knows that his wife has been having an affair with one of his employees, and was convinced it was his trusted friend, Alfred. 
The detective however tells Matuschek it is in fact another employee and, heart-broken over his wife's infidelity he retires to his office. The delivery boy, returning late, enters and prevents Matuschek shooting himself with a pistol. Collapsing in grief and shame, Matuschekis rushed to the hospital.The next day Alfred visits Mr. Matuschek in his sick bed, where he asks for Alfred's forgiveness and puts him back to work, now as manager of the store. The delivery boy is rewarded with a raise to a store clerk. Klara arrives at work late, obviously heartbroken after the failure of her correspondent to materialize last night. When she finds Alfred in the manager's office she doesn't believe him and when she discovers it is true she faints in the middle of his office. Later, as she is resting at home, Alfred pays her a visit, and while he is there her aunt brings her another letter from her secret pen pal that explains his not being at the meeting because he saw her there with Alfred. Relieved about the misunderstanding she swears to Alfred she'll be back at work in the morning. Alfred is obviously working on a plan to reveal himself to Klara.Christmas Eve is here, and everyone works through the day. Mr. Matuschek has nearly recovered from his sickness and stops by to see how things are going, and when the final tally is made, the store has had its best sales day since 1928. Delighted, he hands out bonuses to all, and takes the new stock boy out for Christmas dinner. Alfred and Klara are getting ready to leave, and she has another date with her mystery pen pal, but Alfred delays her with a few questions. She's never yet seen him and doesn't even know his name but is convince end she will be engaged when she comes back to work. 
He tells her that the mysterious pen pal stopped by to see him earlier, and he is in fact fat, bald, older and unemployed and quite willing to live off Klara's income.\nAlfred reveals himself when he puts a red carnation in his lapel and suddenly eveything becomes clear to her.
## 4 Glenn Holland, not a morning person by anyone's standards, is woken up by his wife Iris early one bright September morning in 1964. Glenn has taken a job as a music teacher at the newly renamed John F. Kennedy High School. He intends his job to be a sabbatical from being a touring musician, during which he hopes to have more "free time" to compose. However, he soon finds that his job as a teacher is more time-consuming than he first thought.As he arrives at the school for the first time, he meets Vice Principal Wolters, who comments on his Corvair, the model of car the Ralph Nader wrote a book about. Inside the building, he meets Principal Helen Jacobs. Having got off to an awkward start with both of them, he goes to the music room and meets his students for the first time. The students are dull, apathetic, and mostly terrible musicians. At lunchtime, he meets the football coach, Bill Meister, and strikes up a friendship with him. At the end of his stressful first day, Glenn and Iris talk about their future. If everything goes according to plan, between his paychecks and what she made with her photography, he should be able to quit in four years and go back to his music, including composing.Glenn notices one dedicated but inept clarinet player, Gertrude Lang, and starts working with her individually. He continues attempting to teach the class about music and continues working on his music at home as time passes. Grading papers gradually replaces working on his own music during his home time, much to his chagrin. After several months, Glenn grows exasperated when it seems that none of his students have learned anything from his classes. Gertrude, despite diligent practice, does not improve her clarinet-playing. Glenn's exasperation is further compounded when the Principal Jacobs chastises him for not focusing properly on his students. She has noticed that he is even happier to leave at the end of each day than most of the students. 
Later, Glenn expounds his frustration to Iris, who then informs him that she's pregnant. Glenn is dumbstruck, and his muteness upsets Iris. To comfort her, he tells her a story about how he discovered John Coltrane (his favorite musician) records as a teenager, the point being he could get used to this turn of affairs.After some soul-searching, Glenn decides to try some unconventional methods of teaching music appreciation, including the use of 'Rock and Roll' to interest students, demonstrating to them the similarities between Bach's "Minuet in G" and rock-and-roll in the form of the Toys' "Lovers Concerto". For the first time, the students are interested in the class, and Glenn appears much happier as he relates this to Iris as they assemble a crib. Their apartment is getting more and more crowded, and Glenn suggests that they get a house. Iris is overjoyed, even though it means using their savings and Glenn sacrificing his summer vacation, which he intended to use to work on his composing, in order to make extra money teaching Driver's Ed. Glenn does right by his family but he knows he can forget about getting out of the teaching gig for the foreseeable future.Continuing his new, unorthodox teaching methods, he finally gets Gertrude, who was on the verge of giving up, to have a breakthrough and become a more skilled clarinet player. She rediscovers her joy of playing, and the now-competent band go on to play at the 1965 graduation. Summer vacation begins, and Glenn follows through on his plan to teach Driver's Ed, having a series of near-death experiences at the hands of new drivers. Glenn and Iris move into their new house. Soon, we see the Driver's Ed car once again, except this time it is Glenn himself driving like a maniac, breaking every traffic law - so that he could get to the hospital to see his newborn son, Coltrane ("Cole") Holland.Glenn's unorthodox teaching methods do not go unnoticed by Principal Jacobs, or Vice-Principal Wolters. 
They, along with the conservative School Board and the parents of the community, are hostile to rock-and-roll. Glenn is able to convince the principal that he believes strongly that teaching the students about all music, including rock-and-roll, will help them appreciate it all the more. The principal and vice principal also hand him a new assignment, to get a marching band together for the football team. Glenn is at a loss with this concept, until his Bill Meister agrees to help, in exchange for Glenn putting one of his football players, Louis Russ, in the band to allow him to earn an extra curricular activity credit, which he needs in order to stay eligible for the sports teams. Louis knows absolutely nothing about music, but takes up drums. He has trouble keeping time and always finds himself out of place with the rest of the band.Later, Glenn and Bill are chatting while playing chess. Bill, a bachelor, wants to know about Glenn's stories of debauchery as a traveling musician, but Glenn doesn't want to talk about the past, as he is a different person in a different time. Glenn instead tells Bill he is pessimistic about Louis Russ. Bill encourages him to keep trying. Much as he worked with Gertrude earlier, Glenn starts working one-on-one with Louis, helping to get a feel for the tempo of music. After some hard work, Louis gets it, and later, he marches with the band in the local parade, much to the delight of his family.Immediately behind the Kennedy band in the parade is a fire engine, and its deafeningly loud horn catches everyone by surprise. Iris looks into Cole's stroller to check on him, but the noise hasn't awakened Cole - the boy is deaf. The revelation drives a wedge between Glenn and his son, as it seems that his son cannot understand what he does. 
A more somber Glenn teaches his students about Beethoven, the deaf composer.Time passes, and we see a montage of events from the late '60s, as Glenn picks away as his composition a little at a time and watches Iris work with Cole. We stop again in the early '70s, with Glenn still directing his high school band. Cole is old enough to enter school. Because of her mounting frustration with her inability to communicate with Cole, she insists on sending him to a special school for the deaf, whatever the cost. The three of them visit the school. Glenn winces at the cost, but they enroll Cole and set about to learn sign language themselves, though Iris puts more effort into it than Glenn.Apathetic students still go through Glenn's classes, and one of them, named Stadler, is stoned. Glenn is chewing him out when Glenn receives bad news. He tells Stadler to meet him on Saturday. On that day, they appear at a funeral. Louis Russ has come home from Vietnam after being killed in action. Coach Meister is there, and he and Glenn mourn. At the end of that academic year, Bill reveals that he finally has a steady girl-friend, and Principal Jacobs retires from the high school, praising Glenn for what he has done.We see another montage of events, this time in the 1970s. Glenn continues teaching Driver's Ed in the summer. We see the class of 1980 being welcomed back, suggesting that it is now September 1979. Glenn and Bill Meister team up to help the Drama Department, when it is rumored that funding may be pulled. Glenn and Bill tell Wolters, now the principal, have an idea to be certain the school will make money rather than lose it; it will be a musical revue of Gershwin classics. During auditions for the musical revue, Glenn becomes entranced and interested in a talented young singer named Rowena Morgan. At home, the teenage Cole comes home and tells Glenn about the science fair, which Glenn missed. Iris is fluent at sign language now, but Glenn is still only fair. 
Iris reproaches him for spending so much time with the school projects and the students while neglecting his own son. Glenn is frustrated, realizing that his own musical composing has been on the back burner for 15 years now.Rowena visits Glenn at a diner, where he has gotten into the habit of going to get out of the house and have someplace quiet to work. Unknown to Iris, Glenn writes a small piece that he titles "Rowena's Theme," and takes an interest when Rowena states that she wants to leave town and go to New York to sing professionally. Glenn's life at home is still strained. Iris agrees to come to the school play on Saturday, because she had a meeting with Cole's teachers on opening night (Friday).The school revue arrives at last, and is a big hit, playing to a packed house. In the audience we see Coach Meister wearing a ring (he married the woman we saw earlier), and Sarah, the drama teacher, shows Principal Wolters something on a new invention, a handheld calculator (presumably, showing him how much gate money made it into the school coffers, as Wolters looks impressed). After the revue, Rowena comes to see Glenn in the auditorium, and she tells him she intends to pursue her dream of singing by going to New York the very next night, after the second and last performance of the revue. Glenn is taken aback. Rowena hints that she'd like Glenn to come with her. Glenn goes home and looks at his photo album, looking at pictures of his family and pictures of his old life as a traveling musician, now half a lifetime in the past. He is tempted to leave everything behind and go with Rowena to restore his old life as a musician. However, he realizes he is no longer the same person as he was then. He visits Rowena at the bus stop and sees her off, giving her the names of someone in New York who will help her find lodging. Glenn watches her depart, and goes home, content in his love for Iris.The timeline then shifts to late 1980, when John Lennon is killed. 
Glenn goes home and finds Cole working on Glenn's old Corvair. When Cole asks what is wrong, Glenn tries to explain, but then gives up, feeling that his son wouldn't understand John Lennon or his music. This infuriates Cole who (through Iris), explains that he does care about Glenn and knows about John Lennon, but that Glenn does not seem to be at all interested in communicating with him. Cole berates his father for putting so much effort in to teaching his students and very little towards him, calling him an asshole in sign language as he stalks off. Glenn then makes an effort, and even provides a concert at the high school, which also features lights and other items to enhance the show for deaf members of the school where Cole attends. Glenn, having become somewhat more proficient in sign language, even does an interpretation of Lennon's song 'Beautiful Boy,' dedicated to Cole. Later, Glenn discovers Cole listening to records by sitting on the speakers and feeling the vibrations through his body, and they can start healing the rift between them, even as Glenn's composition continues to gather dust.Time passes. It's now 1995. Glenn goes to see Principal Wolters, who announces that Art, Music and Drama have been cut from the school curriculum, and Glenn would be out of a job shortly. Glenn, who has become a cynical old man, tells Wolters that to cut the fine arts would lead to a generation of students who would be proficient at reading and writing and math (maybe) but would have nothing to read or write about. Wolters offers to write Glenn a reference, but Glenn, who is now 60 years old, fully recognizes the futility of the gesture. His working days are over and he knows it. Then he looks up at the picture on the wall of the long-departed former Principal Jacobs. He says Jacobs would have fought the budget cuts, and he will too. Glenn pleads to the school board to reconsider, but they refuse.At home, Iris reads a letter from the now-grown Cole. 
He has become a teacher himself, and was considering an offer from a university for the deaf in Washington, D.C. He also has taken Glenn's old car, the Corvair that we saw at the beginning and that Cole was working on in his teens, and jokingly writes that he will never give it back. Despondent, Glenn walks through the school on his last day, and he talks to Coach Bill, whose job as football coach is safe, though he can't be far from retirement himself. Glenn figures that he will bring in some money teaching piano lessons on the side, but he's unprepared to be forced into early retirement.On Glenn's final day at the school, Cole shows up driving the Corvair. School's out for him, too. Glenn is surprised when Iris and Cole lead him to the school auditorium, where they have organized a surprise going-away celebration for him. He sees many of his former students in the audience, including Stadler, the pothead from years before. Arriving next is Gertrude Lang, the clarinetist who Glenn helped in the 60s, who has since become the state's governor. Gertrude thanks Glenn for his dedication, and Glenn is very moved. He is moved to tears when she gives him a baton and asks him to conduct his own composition, which she had got hold of. The curtains open and a band, filled with more of Glenn's former students, is assembled and ready to play. Governor Lang picks up her clarinet and takes her place among them, and they play, for the first time, the musical Opus that Glenn had been picking away at for three decades.
## 5 In May 1980, a Cuban man named Tony Montana (Al Pacino) claims asylum, in Florida, USA, and is in search of the "American Dream" after departing Cuba in the Mariel boatlift of 1980. When questioned by three tough-talking INS officials, they notice a tattoo on Tony's left arm of a black heart with a pitchfork through it, which identifies him as a hitman, and detain him in a camp called 'Freedomtown' with other Cubans, including Tony's best friend and former Cuban Army buddy Manolo "Manny Ray" Ribiera (Steven Bauer), under the local I-95 expressway while the government evaluates their visa petitions.After 30 days of governmental dithering and camp rumors, Manny receives an offer from the Cuban Mafia which he quickly relays to Tony. If they kill Emilio Rebenga (Roberto Contreras) a former aide to Fidel Castro who is now detained in Freedomtown, they will receive green cards. Tony agrees, and kills Rebenga during a riot at Freedomtown. The murder of Rebenga was requested by Frank López, a wealthy, politically astute man who deals cars and trades in cocaine, as Rebenga had tortured López's brother to death while still in Cuba many years earlier.After getting their Green Cards, Tony Montana and Manny Ray find work as dishwashers in a corner sandwich/taco shop. Some weeks later, a López henchman and underboss, Omar Suárez (F. Murray Abraham), the man who contacted Manny for the Rebenga hit job, offers Tony and Manny a low-risk job of unloading marijuana from a boat from Mexico to arrive in Miami the following night for $500 each. Tony insults Suárez by turning down the job over the little money they will receive, and demands at least $1,000 for the work. 
After an altercation, Suárez sets Tony up on another job to purchase two kilograms of cocaine worth around $25,000 a piece from a Colombian dealer, named Hector The Toad, a medium to high-risk job for which Tony and Manny will receive $5,000 for their work.That weekend, Tony, Manny, and two other Marielitos in his crew whom they met in Freedomtown, Angel Fernández (Pepe Serna), and Chi Chi (Ángel Salazar) then set out to meet "Hector the Toad" (Al Israel) at a seedy motel on the boulevard in Miami Beach. While Manny and Chi Chi wait in the car on the street, Tony and Angel go up to the hotel room to meet with Hector. The meeting does not go smoothly, as Tony grows irritated with Hector, who is slow to give him the cocaine in exchange for money. Suddenly, Tony and Angel are double-crossed by the Colombian. It becomes apparent that Hector does not intend to sell Tony the cocaine he has; he only wants to steal the money Tony has been given to purchase the product. To convince Tony to give over the cash, Hector dismembers Angel in a shower stall with a chainsaw. After Angel is dead, Tony, about to suffer the same fate, is saved by Chi Chi and Manny who arrive in the nick of time to gun down Hector's henchmen. Manny receives a minor bullet wound in his shoulder when his Uzi sub-machine gun jams. Hector escapes but Tony vengefully confronts him in the street and shoots him dead in the middle of the crowded Ocean Drive (the now famous Miami South Beach boulevard). Tony and his crew then get away with both the cocaine and the money before the police arrive.The following night, Tony and Manny meet López (Robert Loggia) at his house for the first time where Tony impresses Lopez with not only the return of his cash but with a gift of the cocaine, a prize from the botched rip off. Frank immediately hires Tony and his crew into his criminal hierarchy, a representative of a Cuban mafia. 
But during this initial get together Tony also meets Lopez's lady, the blond and beautiful Elvira Hancock (Michelle Pfeiffer), who will eventually become the source of tension between the two men. Taking Tony and Manny out to a local nightclub, called The Babylon Club where Frank frequently attends, Tony and Manny see first-hand the high standard of living they have come to acquire. Though Frank actually warns against these excesses, Tony is seduced by them regardless. Thus, Tony Montana begins his rise through the ranks of the Miami cocaine underworld.Three months later. Tony Montana attempts to make amends by meeting with his estranged family. It is implied that Tony's father, a former U.S. Navy sailor, abandoned the family when Tony was little. Since then, his mother (Miriam Colon) and younger 19-year-old sister Gina (Mary Elizabeth Mastrantonio) have been living in Miami. Tony shows up at his mother's and Gina's house one evening, fashionably dressed, and offers them $1,000 in cash for financial support. Gina is overjoyed to see her older brother whom they have not seen for five years. But Tony's mother has only scorn for him since he turned his back on them many years ago for the quick and easy life of crime back in Cuba, and wants nothing to do with Tony, and she is too full of pride to accept his money despite being financially stricken. But Gina, who idolizes her brother, follows him outside where he slips her the money secretly. Gina tells Tony that she wants in on the flashy life that he has going for him. Tony's love for Gina is clearly genuine for she's the only person that he trusts, and is also very protective of her. 
Afterwords, Manny makes a comment to Tony about how attractive Gina is, but Tony angrily warns him to avoid courting her.Several months later, Tony is sent to Bolivia to help Omar set up a new distribution deal with Bolivian kingpin Alejandro Sosa (Paul Shenar), since Frank is having legal troubles that preclude him from leaving the country. Though Tony was supposed to let Omar do all the talking, Omar proves to be a poor negotiator, prompting Tony to step in and save the deal. They seem to negotiate a deal that, on the surface seems favorable to both sides, but Omar insists that Frank would not approve. Sosa then sides with Omar on this and suggests that Omar use his phone to call Frank. A few minutes later, Sosa hands Tony binoculars, and he sees two menacing assassins, Alberto the Shadow (Mark Margolis) and the Skull (Geno Silva), execute Omar by hanging him by the neck from an airborne helicopter. Sosa reveals that Alberto recognized Omar as once being an informant for the police several years ago, and he has a zero tolerance for disloyalty. Tony insists that he never goes back on his word, and that he never trusted Omar. Sosa agrees to bring Tony on board with him as his North American distributor of cocaine and other drugs. But upon their agreement, Sosa sternly warns Tony never to betray or double-cross him in any way.Upon his return to Florida, Tony is chewed out by Frank for overstepping his authority as well as hearing about Omar's death. Tony explains to Frank that for a price of $18 million to pay Sosa for the manufacturing and transportation costs, they will receive 2,000 kilograms of cocaine from Bolivia for nationwide sale and distribution which will earn them $75 million over a period of one year. 
Frank is worried because he does not have the many millions to pay Sosa for the cocaine, but Tony says that he is in tight with Sosa and he has established a "credit line" with him as well as work out a payment plan where they will pay Sosa $5 million up front and the rest in monthly installments. Plus, in case Frank comes up short a few million, Tony will earn the money needed through his own street contacts. Frank angrily tells Tony that he did not negotiate a good deal, for Sosa merely tricked him into thinking he did. Tony replies that it's time for them to "think big," and to expand the cartel for nationwide distribution of cocaine. With them as the main North American distributors and wholesalers of the Sosa cartel, they will make millions and become the biggest cartel in the continent. Frank warns Tony that Sosa cannot be trusted and that he will sooner or later turn against them for any slight deviation or compromise of his business. Frank orders Tony to stall his deal with Sosa for the time being. Frank then promptly tells Tony that ambitious drug dealers, such at himself, who want too much and crave money, power and attention, do not last long in the business. Tony leaves shrugging with indifference and strikes out on his own.(Note: Frank López's two warnings about Tony's greed and Sosa's violence will be later proven true.)After this incident, Tony then seeks out Elvira to whom he makes an unexpected marriage proposal. She is shaken by this, but agrees to think about it. Frank López is none too happy when he hears about this and decides to take out Tony.At the Babylon nightclub that evening, Tony is approached and shaken down by a Miami police detective, named Mel Bernstein (Harris Yulin). He proposes to "tax" Tony on his transactions in return for police protection and information. Tony is distracted by the sight of Gina dancing with a local drug dealer. He follows the two to a restroom stall where he berates Gina for her promiscuous conduct. 
He asks Manny to take her home. On the way Gina admits she is attracted to Manny. Manny wards her off, mindful of Tony's extreme protectiveness.Back at the nightclub, Tony is attacked by two gunmen but manages to escape, killing them both despite being wounded by a gunshot to his left shoulder. Suspecting Frank sent Bernstein and the hitmen, Tony asks one of his bodyguards, Nick The Pig, to call Frank after Tony arrives at Frank's office at 3:00 a.m. that very night and inform him the hit failed. Tony, Manny and Chi Chi visit Frank at his car dealership back office, who is with Det. Bernstein. Nick calls Frank, who confirms his involvement by playing the call off as Elvira telling him she'll be late home. When it becomes apparent that Bernstein (who is armed) will not help him, Frank begs for Tony's forgiveness, saying that he can have Elvira and ten million dollars in exchange for sparing his life. Tony will have none of it, and Manny coldly executes Frank. Bernstein insists that he could be a valuable ally for Tony, but Tony disagrees, and kills him too.His problems apparently solved, Tony begins a profitable relationship with Sosa, marries Elvira, buys a new mansion complete with surveillance cameras and numerous luxury items, and Tony even sets Gina up in business with her own beauty salon. Manny and Gina soon begin a romantic relationship, but they keep it secret from Tony who had firmly stated to Gina that he does not want her dating anybody.As Tony's business grows, so does his cocaine addiction and paranoia, and he begins to spiral out of control... the beginning of the end. His wife, who becomes further addicted to cocaine, becomes bored and emotionally distant. Tony's banker informs him that laundering the increasing flow of drug money has become increasingly difficult, so he will be charging higher fees, up to 10%. A Jewish mob boss, Mel Seidelbaum (Ted Beniades), contacts Manny, offering his assistance. 
However, as they are cleaning out the money, Seidelbaum reveals himself to be an undercover cop and arrests Tony. After posting a $5 million in bail, Tony's corrupt lawyer, Sheffield, tells him that although he may get him cleared of the corruption and money laundering charges, Tony will probably have to serve at least three years in prison for tax evasion. Manny suggests that he take it, as the American prison system is nowhere near as harsh as its Cuban counterpart, and the right legal loopholes could trim the sentence down to six months. However, the strung-out Tony yells that he would rather die than spend a single day in jail.After hearing about Tony's arrest, Sosa, not wanting to lose his main distributor, steps in to intervene by offering Tony a way out of going to prison. He calls Tony back to Bolivia where he introduces him to his cocaine "board of directors" a group that includes a sugar baron, Bolivia's military chief, and a mysterious American named Charles Goodson (Gregg Henry). We assume he is a corrupt CIA officer because Sosa guarantees that the IRS will not be able to send Tony to jail. But this help comes at a price. A Bolivian journalist is attempting to expose the ongoing corruption in the Bolivian government involvement in drug trafficking, and his crusade is beginning to hurt Sosa and his partners. Sosa will be sending Alberto to New York assassinate the journalist, but he needs Tony and his crew to provide some extra muscle. Tony is clearly disturbed by the assassination since it is against his custom to kill a man whom he sees as a civilian, plus Tony has never killed anybody who didn't wrong him personally. But seeing no other options, Tony reluctantly agrees to help Sosa with the hit.In the meantime, Tony's marriage with Elvira finally ends when after a bitter altercation at a local restaurant, she finally expresses her contempt for him and the lives he had led her on, and walks out of the restaurant, and out of his life. 
Tony, punch-drunk on cocaine, tells the restaurant's other patrons that his existence is necessary since society needs a man like him to call a criminal. Tony also informs Manny to look after things while he travels to New York on business (but he doesn't tell him about the Sosa assassination deal.)Tony, with his henchmen, Chi Chi and Reuben, and Alberto travel to New York City and Alberto places a bomb under the journalist's car in with the intention of detonating it outside the UN building before the journalist addresses the General Assembly and exposes Sosa's cartel. But Tony has second thoughts when the journalist unexpectedly picks up his wife and children. Tony, saying that the team was only supposed to kill only the journalist, shoots Alberto to prevent the journalist's family from being killed. When authorities later discover the unexploded bomb underneath the journalist's car, they realize that an execution had been planned and increase the amount of security protecting the journalist. Sosa is now the primary suspect and Sosa vows to get even with Tony.Returning to Miami, Tony discovers that Gina and Manny (who opposed the trip to New York) have disappeared. Tony has long harbored an apparent unnatural obsession for his sister and is overly protective of her for reasons that he may not understand himself. Tony visits his mother again where she angrily tells him about Gina's descent into an immoral life and accuses him of corrupting her with his flashy lifestyle. After getting Gina's home address from Mrs. Montana, who doesn't know who else lives there, Tony goes to the house in nearby Palm Grove. Much to Tony's surprise, Manny unexpectedly opens the door in his bathrobe. Tony then sees Gina in a night gown at the top of the stairs. Enraged that another man has obviously slept with his sister, Tony shoots Manny dead. Hysterical, Gina reveals to Tony that they had just been married and were going to surprise him. 
Tony, riddled with guilt, has Gina taken back to his mansion.In revenge for Tony's failure to kill the journalist, who has now exposed Sosa and his partners to the world as drug lords, Sosa sends a Latino mercenary hit squad (the size of a large platoon), to Tony's mansion to kill him that very evening. Sitting at his desk snorting from an enormous pile of cocaine, Tony realizes and regrets what he has done to his best friend. When Tony is contemplating his actions, Sosa's mercenaries breach the main gate at Tony's estate and quietly begin to kill all the guards around the mansion. At the same time, a distraught Gina, wearing only an unbuttoned sleep shirt and armed with a revolver, enters Tony's office to confront him with the truth about his feelings for her. She now realizes that Tony loves her in an unnatural way and demands, at gunpoint, that he make love to her. She begins to shoot at him while demanding he take her. A Sosa assassin hiding on the balcony, thinking Gina is shooting at him, leaps in and riddles her with bullets. An enraged Tony throws the man off the balcony and kills him with his sub-machine gun creating a storm of chaos at the mansion. At this point, the mercenaries, robbed of the element of surprise by the gunshots, swarm in to attack Tony's mansion from all directions.As all his men are being killed, Tony, still delirious from the cocaine, leans over Gina's dead body begging for forgiveness, at the same time the mercs break into the mansion, Chi Chi opens fire with an Uzi as he falls back and ends up banging on the door to Tony's office (it has been locked from the inside by Gina who was planning to kill Tony). Unfortunately, Tony does not seem to hear him. Chi Chi is shot in the back and Tony sees it on the security cameras.As the hit men prepare to storm his office, Tony finally snaps out of his drug-induced state, arms himself with an AR-15 assault rifle with an under-mounted M203 grenade launcher and blows down the door. 
A huge climatic gun battle begins as Tony takes position atop the grand staircase and guns down dozens of Sosa's men who try to storm the balcony. Tony is hit a number of times by return fire, but he keeps shooting. With most of Sosa's men dead, Tony, strung-out on drugs, defiantly yells out at the assassins, not realizing that the Skull has sneaked into the room behind him. The Skull shoots Tony in the back with a 12-Gauge shotgun. Tony falls off the balcony and into a reflecting pool at the base of the grand staircase. In the final shot, as the Skull and the few surviving assassins look on, Tony Montana lies dead... face down in the reflecting pool which is located below a large brass globe that says: THE WORLD IS YOURS.
## 6 George Falconer (Colin Firth) approaches a car accident in the middle of a snow-white scenery. There is a bloodied man there and he kisses him. He wakes up: he was dreaming about the moment when his partner of 16 years, Jim (Mathew Goode), died--though he was not there with him because Jim was visiting his disapproving family on his own. George remembers the phone ringing on that fateful day, when Jim's cousin told him about the fatal accident, and how George was not welcome to attend the funeral, because of the family's homophobia (common for the period and later). George remembers breaking down to Charley (Julianne Moore) that day, his best friend from his life in London, who had also relocated to LA; once briefly sexually attached to George before he was completely honest with himself, she may still feel attracted to him.George showers and dresses. It's November 30, 1962, the eve of the Cuban missile crisis. Though British, he is now a professor of English at UCLA. He is depressed, never having recovered from his loss; and when he leaves for work, he packs a gun in his briefcase.He tells his cleaning lady Alva (Paulette Lamori) that she has always been wonderful - in spite of her having forgotten to take out the bread from the fridge. George hugs her, which leaves her utterly confused.On campus, George notices a couple of students, chain-smoking Lois (Nicole Steinwedell) and a boy. One of the secretaries (Keri Lynn Pratt) tells him that she has given his address to some nice new student; it turns out to be this boy, Kenny Potter (Nicholas Hoult), who talks to him after class about the speech George has just given out in the classroom concerning minorities and fear. Kenny discusses recreational drug use with Kenny who tells him that he had never heard George express himself so openly in class as he had that day. 
He buys George a pencil sharpener as a token of gratitude for George's talking with him.George phones Charley, who is dressing for the dinner they have planned at her home. George gets into his car, and picks his gun after having cleaned up his office. However, Kenny appears once again, and invites him to go for a drink, observing George's depression and having noticed that he has cleaned out the desk in his office. George tells him it will have to be some other time. He goes to the bank to pick up various things from his safe deposit box, and when looking at a photo of his deceased lover, recalls a conversation with him on the beach.After buying some bullets, he goes to a convenience store. There, Carlos (Jon Kortajarena) bumps onto him, breaking the bottle of Scotch he has just bought. George buys a new bottle of Scotch and they talk. They smoke a few cigarettes and drink a bottle of gin together. George leaves, refusing Carlos' offer of company, saying that this is a serious day for him and that he's trying to get over an old love.At home, he puts on a record and remembers a conversation with Jim while each one was reading a different book on a couch. He pretends shooting himself as practice for later that night, but in a semi-comic scene, can't find the best position in which to accomplish it. Charley calls to remind him of their dinner plans, which he grudgingly attends after leaving a note and some money for Alva. They dance and talk about London, life, Charley's ex-husband's abandonment, and she offends George by suggesting that they might have had a "normal" life together if he hadn't been a "poof." Charley says George doesn't look well, reminding him of the heart attack he suffered near the time of Jim's death. Charley tries to convince George to spend the night at her home, but he leaves.The scene flashes back to 1946 when Jim and George had met when at a bar. Jim was on leave from the Army, right after the second world war. 
Returning to1962, we see George returning to the same bar, near his home; now a quiet place where he asks for a Scotch.Kenny has followed him there. They talk and then go to the beach and swim naked. They go to George's place. As George's forehead is bleeding, Kenny tends to it, and sees in the medicine's cabinet a nude photo of Jim. George sees Kenny strip off his wet clothes, but does nothing. Kenny says that he and Lois are not romantically involved. Not unlike George and Charley in the distant past, Kenny explains that they had a brief sexual liason. Kenny and George do not have sex, and Kenny stays on the couch, given the very late hour.George wakes in a few hours, and finds his gun under Kenny's covers and removes it, locking it up as Kenny sleeps. When he returns to bed, George dies of a heart attack, seeing the image of Jim kissing his forehead.
## tags
## 1 cult, horror, gothic, murder, atmospheric
## 2 violence
## 3 romantic
## 4 inspiring, romantic, stupid, feel-good
## 5 cruelty, murder, dramatic, cult, violence, atmospheric, action, romantic, revenge, sadist
## 6 romantic, queer, flashback
## split synopsis_source
## 1 train imdb
## 2 train imdb
## 3 test imdb
## 4 train imdb
## 5 val imdb
## 6 val imdb
“imdb_id” (IMDb Identifier): This column contains the unique identifier for each movie in the IMDb database.
“title”: This column contains the title of the movie.
“plot_synopsis”: This column contains a brief overview of the movie’s plot.
“tags”: This column contains comma-separated keywords or tags for the movie, providing further descriptors of its features.
“split”: This column indicates the partition of the dataset the movie belongs to: train, test, or val.
“synopsis_source”: This column indicates where the plot synopsis was obtained (for example, imdb).
I will set up separate tidy data frames for the title, plot synopsis and tags, keeping the dataset ids for each so that I can connect them later in the analysis if necessary.
# Movies title
movies.title= tibble(id = movies$imdb_id,
title = movies$title)
movies.title %>% head(5)
## # A tibble: 5 × 2
## id title
## <chr> <chr>
## 1 tt0057603 I tre volti della paura
## 2 tt1733125 Dungeons & Dragons: The Book of Vile Darkness
## 3 tt0033045 The Shop Around the Corner
## 4 tt0113862 Mr. Holland's Opus
## 5 tt0086250 Scarface
# movies plot synopsis
movies.syno= tibble(id= movies$imdb_id,
syno=movies$plot_synopsis)
movies.syno %>% head(5)
## # A tibble: 5 × 2
## id syno
## <chr> <chr>
## 1 tt0057603 "Note: this synopsis is for the orginal Italian release with the se…
## 2 tt1733125 "Two thousand years ago, Nhagruul the Foul, a sorcerer who reveled …
## 3 tt0033045 "Matuschek's, a gift store in Budapest, is the workplace of Alfred …
## 4 tt0113862 "Glenn Holland, not a morning person by anyone's standards, is woke…
## 5 tt0086250 "In May 1980, a Cuban man named Tony Montana (Al Pacino) claims asy…
# movies tags
movies.tag= tibble(id= movies$imdb_id,
tag=movies$tags)
movies.tag %>% head(5)
## # A tibble: 5 × 2
## id tag
## <chr> <chr>
## 1 tt0057603 cult, horror, gothic, murder, atmospheric
## 2 tt1733125 violence
## 3 tt0033045 romantic
## 4 tt0113862 inspiring, romantic, stupid, feel-good
## 5 tt0086250 cruelty, murder, dramatic, cult, violence, atmospheric, action, rom…
movies.title %>%
unnest_tokens(word,title) %>%
anti_join(stop_words) -> movies.title #29,599
movies.title %>% head(10)
## # A tibble: 10 × 2
## id word
## <chr> <chr>
## 1 tt0057603 tre
## 2 tt0057603 volti
## 3 tt0057603 della
## 4 tt0057603 paura
## 5 tt1733125 dungeons
## 6 tt1733125 dragons
## 7 tt1733125 book
## 8 tt1733125 vile
## 9 tt1733125 darkness
## 10 tt0033045 shop
movies.syno %>%
unnest_tokens(word,syno) %>%
anti_join(stop_words) -> movies.syno # 5,991,811
movies.syno %>% head(10)
## # A tibble: 10 × 2
## id word
## <chr> <chr>
## 1 tt0057603 note
## 2 tt0057603 synopsis
## 3 tt0057603 orginal
## 4 tt0057603 italian
## 5 tt0057603 release
## 6 tt0057603 segments
## 7 tt0057603 order.boris
## 8 tt0057603 karloff
## 9 tt0057603 introduces
## 10 tt0057603 horror
movies.tag %>%
separate_rows(tag, sep = ", ") %>%
mutate(tag = trimws(tag)) -> movies.tag
movies.tag$tag %>% unique() # 71 tags
## [1] "cult" "horror" "gothic"
## [4] "murder" "atmospheric" "violence"
## [7] "romantic" "inspiring" "stupid"
## [10] "feel-good" "cruelty" "dramatic"
## [13] "action" "revenge" "sadist"
## [16] "queer" "flashback" "mystery"
## [19] "suspenseful" "neo noir" "prank"
## [22] "psychedelic" "tragedy" "autobiographical"
## [25] "home movie" "good versus evil" "depressing"
## [28] "realism" "boring" "haunting"
## [31] "sentimental" "paranormal" "historical"
## [34] "storytelling" "comedy" "fantasy"
## [37] "philosophical" "adult comedy" "cute"
## [40] "entertaining" "bleak" "humor"
## [43] "plot twist" "christian film" "pornographic"
## [46] "insanity" "brainwashing" "sci-fi"
## [49] "dark" "claustrophobic" "psychological"
## [52] "melodrama" "historical fiction" "absurd"
## [55] "satire" "alternate reality" "alternate history"
## [58] "comic" "grindhouse film" "thought-provoking"
## [61] "clever" "western" "blaxploitation"
## [64] "whimsical" "intrigue" "allegory"
## [67] "anti war" "avant garde" "suicidal"
## [70] "magical realism" "non fiction"
What are the most common words in the movie titles and synopses?
## # A tibble: 30 × 2
## word n
## <chr> <int>
## 1 2 249
## 2 la 175
## 3 night 152
## 4 ii 133
## 5 dead 127
## 6 house 108
## 7 love 108
## 8 3 103
## 9 de 103
## 10 black 90
## # ℹ 20 more rows
## # A tibble: 30 × 2
## word n
## <chr> <int>
## 1 tells 32060
## 2 time 17470
## 3 house 17183
## 4 home 16250
## 5 takes 14469
## 6 father 14073
## 7 car 13607
## 8 police 12986
## 9 night 12564
## 10 day 12506
## # ℹ 20 more rows
Words like “tells”, “time” and “house” are used very often in the movie synopses. For the later analysis I want to remove digits, tokens such as “la”, “ii”, “de”, “le” and “iii”, and people’s names from these data frames, since they are not meaningful in most cases.
Remove the meaningless words
data("babynames")
babynames$name %>% unique() %>% tolower()->names
my.stopwords = tibble(word= c(as.character(1:100),
"i","ii","iii","iv","v","vi","vii","viii","ix","x",
"la","de","le",
names))
movies.title %>% anti_join(my.stopwords)->movies.title
movies.title %>%
count(word,sort = T) %>%
mutate(word=fct_reorder(word,n)) %>%
top_n(14) %>%
ggplot(aes(x=word,y=n)) +
geom_col(aes(fill=factor(word)))+
coord_flip()+
theme(legend.position = "none") +
scale_fill_manual(values =rev(c(dark_cols,light_cols))) +
ggtitle("Movies title Top Word Counts") -> pic02
pic02
movies.syno %>% anti_join(my.stopwords) -> movies.syno
movies.syno%>%
count(word,sort = T) %>%
mutate(word=fct_reorder(word,n)) %>%
top_n(14) %>%
ggplot(aes(x=word,y=n)) +
geom_col(aes(fill=factor(word)))+
coord_flip()+
theme(legend.position = "none") +
scale_fill_manual(values =rev(c(dark_cols,light_cols))) +
ggtitle("Movies synopsis Top Word Counts") -> pic03
pic03
As a next step, I want to examine which words commonly occur together in the movie titles and synopses. I can then examine word networks for these fields; this may help us see, for example, which words tend to appear in the same titles or synopses.
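The pair tables plotted in the networks that follow (`title.word.pairs`, `syno.word.pairs`, `tag.word.pairs`) are not shown being built. As a minimal sketch of what such a co-occurrence count involves, the base R code below counts how many ids contain each unordered word pair. The toy data and the helper name `count_word_pairs` are hypothetical and only illustrate the idea; with the tidy data frames above, a call such as `widyr::pairwise_count(word, id, sort = TRUE, upper = FALSE)` would produce the same kind of table, though whether that is how the tables here were created is an assumption.

```r
# Toy tidy table: one row per (movie id, title word). Hypothetical data.
toy.title <- data.frame(
  id   = c("m1", "m1", "m1", "m2", "m2", "m3", "m3"),
  word = c("star", "wars", "episode", "star", "trek", "star", "wars"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count how many ids contain each unordered word pair.
count_word_pairs <- function(df) {
  pairs_per_id <- lapply(split(df$word, df$id), function(words) {
    words <- sort(unique(words))
    if (length(words) < 2) return(NULL)
    pairs <- t(combn(words, 2))  # every unordered pair within one id
    data.frame(item1 = pairs[, 1], item2 = pairs[, 2],
               stringsAsFactors = FALSE)
  })
  all_pairs <- do.call(rbind, pairs_per_id)
  counts <- aggregate(n ~ item1 + item2,
                      data = cbind(all_pairs, n = 1L), FUN = sum)
  counts <- counts[order(-counts$n), ]
  rownames(counts) <- NULL
  counts
}

toy.pairs <- count_word_pairs(toy.title)
print(toy.pairs)
```

Filtering such a table on `n`, as the plotting code does with `filter(n>=7)`, keeps only the pairs that co-occur often enough to be worth drawing as edges.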
Visualizing title networks
set.seed(1234)
title.word.pairs %>%
filter(n>=7) %>% #left 92 pairs
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha=n,
edge_width=n),
edge_colour=dark_cols[5]) +
geom_node_point(size=4)+
geom_node_text(aes(label =name), repel=T,
point.padding = unit(0.2, "lines")) +
theme_void() +
labs(title = "Word network in the movies titles") -> pic04
pic04
## Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
In the movie title network, I can see clear clustering of title words. For example: 1. “trek”, “star”, “wars” and “episode” tend to go together, reflecting franchise titles such as Star Trek and Star Wars. 2. “story” and “love” tend to go together in romance titles.
Visualizing plot synopsis networks
syno.word.pairs %>%
filter(n>=2600) %>% #left 162 pairs
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha=n,
edge_width=n),
edge_colour=dark_cols[4]) +
geom_node_point(size=4)+
geom_node_text(aes(label =name), repel=T,
point.padding = unit(0.3, "lines")) +
theme_void() +
labs(title = "Word network in the movies synopsis")-> pic05
pic05
In the movie synopsis network, the top dozen or so words (words like “time”, “take”, “home”, “life” and “tell”) are so strongly connected that we do not see a clear clustering structure. We may want to use tf-idf as a metric to find characteristic words for each synopsis, instead of looking at raw word counts.
Visualizing tags networks
Next, let’s make a network of the tags to see which tags commonly occur together in the same movies.
tag.word.pairs %>%
filter(n>=200) %>% #left 162 pairs
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha=n,
edge_width=n),
edge_colour=dark_cols[2]) +
geom_node_point(size=4)+
geom_node_text(aes(label =name), repel=T,
point.padding = unit(0.3, "lines")) +
theme_void() +
labs(title = "tags network in the movies tags")-> pic06
pic06
In the movie tags network, the tags (such as “murder”, “violence” and “revenge”) are so strongly connected that we do not see a clear clustering structure.
Tf-idf, the term frequency times the inverse document frequency, identifies words that are especially important to a document within a collection of documents.
So I will apply that approach to the synopsis field of this movies dataset.
syno.tf.idf = movies.syno %>%
count(id,word, sort = T) %>%
bind_tf_idf(word,id,n)
syno.tf.idf %>%
arrange(desc(tf_idf)) %>%
head(10)
## # A tibble: 10 × 6
## id word n tf idf tf_idf
## <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 tt1470023 macgruber 32 0.151 9.60 1.45
## 2 tt0107180 bamm 24 0.188 7.66 1.44
## 3 tt0082841 mlle 28 0.166 8.51 1.41
## 4 tt0255643 sreekrishnan 9 0.143 9.60 1.37
## 5 tt0480669 héctor 55 0.183 7.30 1.34
## 6 tt0086802 snorks 12 0.135 9.60 1.29
## 7 tt1309181 dünya 11 0.134 9.60 1.29
## 8 tt0172234 roop 18 0.15 8.51 1.28
## 9 tt0040178 jayakrishnan 25 0.132 9.60 1.27
## 10 tt0403702 sheeni 20 0.129 9.60 1.24
These are the most important words in the synopsis fields as measured by tf-idf: they occur many times within one synopsis but in very few synopses overall.
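To make the `tf`, `idf` and `tf_idf` columns concrete, here is a minimal base R sketch of the definitions that `bind_tf_idf()` applies: `tf` is a word's count divided by the total word count of its document, and `idf` is the natural log of the number of documents divided by the number of documents containing the word. The toy counts are hypothetical and only mirror the shape of `count(id, word)`.

```r
# Toy per-document word counts (hypothetical data), same shape as count(id, word).
toy.counts <- data.frame(
  id   = c("d1", "d1", "d2", "d2", "d3"),
  word = c("police", "macgruber", "police", "house", "police"),
  n    = c(2, 6, 3, 1, 4),
  stringsAsFactors = FALSE
)

n_docs     <- length(unique(toy.counts$id))
doc_totals <- tapply(toy.counts$n, toy.counts$id, sum)  # total words per document
doc_freq   <- table(toy.counts$word)                    # documents containing each word

toy.counts$tf     <- toy.counts$n / as.numeric(doc_totals[toy.counts$id])
toy.counts$idf    <- log(n_docs / as.numeric(doc_freq[toy.counts$word]))
toy.counts$tf_idf <- toy.counts$tf * toy.counts$idf
print(toy.counts[order(-toy.counts$tf_idf), ])

# idf of a word that appears in exactly one of 14,828 documents
idf_single_doc <- log(14828 / 1)
print(round(idf_single_doc, 2))
```

In the toy data, "police" appears in every document, so its idf and tf-idf are zero, while "macgruber" appears in only one document and scores highest. The same logic explains the table above: assuming all 14,828 synopses are in the tokenized data, a word found in a single synopsis has idf = ln(14828) ≈ 9.60, which matches the idf reported for words such as “macgruber”, and 0.151 × 9.60 ≈ 1.45 reproduces its tf_idf.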
Connecting synopsis fields to tags.
We now know which words in the synopses have high tf-idf, and we also have labels for these synopses in the tags. Let’s do a full join of the tags data frame and the data frame of synopsis words with tf-idf, and then find the highest tf-idf words for a given tag.
syno.tf.idf = full_join(syno.tf.idf, movies.tag ,by="id")
syno.tf.idf %>%
filter(tag %in% top10.tag$tag[1:9] ) %>%
arrange(desc(tf_idf)) %>%
group_by(tag) %>%
#distinct(word,tag,.keep_all = T) %>%
slice_max(tf_idf, n=15,with_ties = F) %>%
ungroup() %>%
mutate(word = factor(word, levels=rev(unique(word)))) %>%
ggplot(aes(tf_idf,word,fill=factor(tag))) +
geom_col(show.legend = F) +
scale_fill_manual(values = c(dark_cols,light_cols))+
facet_wrap(~tag, ncol=3, scales="free") +
labs(title = " Distribution of tf-idf for words from movies synopsis labeled with belonged tag",
subtitle = "Highest tf-idf words in movies synopsis fields",
x = "tf-idf",
y = NULL)-> pic07
ggsave("output/pic07.jpg",plot = pic07, width = 8, height = 10, units = "in",device = "jpg")
pic07
Using tf-idf has allowed us to identify important synopsis words for each of these tags. However, since a movie plot synopsis can be associated with multiple tags, certain high tf-idf words like “MacGruber” and “Mordrid” may simultaneously appear across several tags. This highlights the challenge of disambiguating unique associations in the context of multi-tagged synopses.
How many topics will we tell the algorithm to make? This is a question much like in k-means clustering; we don’t really know ahead of time. I tried the following modeling procedure using 4, 8, 16 topics; I found that at ? topics, documents are still getting sorted into topics cleanly but going much beyond that caused the distributions of γ, the probability that each document belongs in each topic, to look worrisome. We will show more details on this later.
top_terms_by_topic_LDA <- function(doc_vec,
doc_names,
plot = T,
number_of_topics = 4) # number of topics (4 by default)
{
# create a document term matrix with textmineR::CreateDtm (unigrams to trigrams)
DTM=CreateDtm(doc_vec =doc_vec ,
doc_names =doc_names ,
ngram_window = c(1,3),
lower=T,
stopword_vec = c(stopwords::stopwords("en"),
stopwords::stopwords(source = "smart"),
my.stopwords$word
),
remove_punctuation = T,
remove_numbers = T,
verbose=F)
# remove any empty rows in our document term matrix
# (otherwise we'll get an error when we try to run our LDA)
non_zero_docs <- rowSums(DTM) > 0
DTM <- DTM[non_zero_docs, ]
# perform LDA & get the words/topic in a tidy text format
lda <- LDA(DTM, method = "Gibbs",k = number_of_topics, control = list(seed = 1234))
topics <- tidy(lda, matrix = "beta")
# get the top 15 terms for each topic
top_terms <- topics %>% # take the topics data frame and..
group_by(topic) %>%
slice_max(beta,n=15,with_ties = F) %>%
ungroup() %>%
arrange(topic,-beta)%>%
mutate(term=reorder_within(term,beta,topic))
lda.gamma=tidy(lda,matrix="gamma")
# if the user asks for a plot (TRUE by default)
if (plot) {
# plot the top 15 terms for each topic in order
plot1 <- ggplot(data = top_terms, aes(beta, term, fill = as.factor(topic))) +
geom_col(show.legend = FALSE) +
scale_y_reordered() +
labs(title = "Top 15 terms in each LDA topic",
x = expression(beta), y = NULL) +
facet_wrap(~ topic, ncol = 2, scales = "free")
plot2 <- ggplot(lda.gamma, aes(gamma, fill = as.factor(topic))) +
geom_histogram(alpha = 0.8, show.legend = FALSE) +
facet_wrap(~ topic, ncol = 2) +
scale_y_log10() +
labs(title = "Distribution of probability for each topic",
y = "Number of documents", x = expression(gamma))
return(list(topics = topics, top_terms = top_terms, lda_gamma = lda.gamma, beta_plot = plot1, gamma_plot = plot2))
} else {
# if the user does not request a plot
# return a list of sorted terms instead
return(list(topics = topics, top_terms = top_terms, lda_gamma = lda.gamma))
}
}movies.4topic=top_terms_by_topic_LDA(
doc_vec = movies$plot_synopsis ,
doc_names =movies$imdb_id ,
plot = T,
number_of_topics = 4)
saveRDS(movies.4topic,file="output/movies.4topic.rds")## # A tibble: 10 × 3
## topic term beta
## <int> <chr> <dbl>
## 1 1 aa_ab 0.0000000189
## 2 2 aa_ab 0.000000193
## 3 3 aa_ab 0.0000000152
## 4 4 aa_ab 0.0000000183
## 5 1 aa_ab_laut 0.0000000189
## 6 2 aa_ab_laut 0.000000193
## 7 3 aa_ab_laut 0.0000000152
## 8 4 aa_ab_laut 0.0000000183
## 9 1 aa_batteries 0.0000000189
## 10 2 aa_batteries 0.0000000176
The beta (β) column gives the per-topic-per-term probability, that is, the probability of that term being generated from that topic. Notice that some β values are extremely low while others are considerably higher. What is each topic about? Let’s examine the top terms for each topic.
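As a quick sanity check of this interpretation, the β values within one topic should sum to 1 across the vocabulary. The following is a minimal sketch on a made-up three-term vocabulary; `toy_beta` is hypothetical and only mimics the shape of `tidy(lda, matrix = "beta")`, it is not taken from the fitted model.

```r
library(dplyr)

# Hypothetical toy table shaped like tidy(lda, matrix = "beta")
toy_beta <- tibble::tibble(
  topic = c(1L, 1L, 1L, 2L, 2L, 2L),
  term  = rep(c("ship", "family", "police"), times = 2),
  beta  = c(0.7, 0.2, 0.1, 0.1, 0.6, 0.3)
)

# Each topic is a probability distribution over terms,
# so beta sums to 1 within every topic
beta_check <- toy_beta %>%
  group_by(topic) %>%
  summarise(total_beta = sum(beta),
            top_term   = term[which.max(beta)])

beta_check
```

The same `group_by(topic)` summary could be run on `movies.4topic$topics` to confirm that each topic's β values add up to 1.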
## # A tibble: 30 × 3
## topic term beta
## <int> <fct> <dbl>
## 1 1 back___1 0.00113
## 2 1 find___1 0.000885
## 3 1 man___1 0.000695
## 4 1 time___1 0.000690
## 5 1 ship___1 0.000672
## 6 1 king___1 0.000631
## 7 1 world___1 0.000563
## 8 1 finds___1 0.000558
## 9 1 tells___1 0.000558
## 10 1 earth___1 0.000522
## # ℹ 20 more rows
For topic 1, words like “back”, “find”, “man”, and “time” have the highest probability (the ___1 suffix is appended by reorder_within() to keep terms ordered within each topic facet). It is not easy to interpret the topics from a data frame like this, so let’s look at this information visually in the next figure.
The top 15 terms per topic in the 4-topic LDA model suggest four clear movie themes. Topic 1: fiction concepts such as time and discovery. Topic 2: family and love. Topic 3: police investigation. Topic 4: war and killing.
Next, we can see the probabilities of documents belonging to these 4 topics.
## # A tibble: 59,312 × 3
## document topic gamma
## <chr> <int> <dbl>
## 1 tt0057603 1 0.229
## 2 tt1733125 1 0.712
## 3 tt0033045 1 0.145
## 4 tt0113862 1 0.0223
## 5 tt0086250 1 0.0447
## 6 tt1315981 1 0.0510
## 7 tt0249380 1 0.174
## 8 tt0408790 1 0.0777
## 9 tt0021079 1 0.0864
## 10 tt1615065 1 0.0257
## # ℹ 59,302 more rows
These are the probabilities of documents belonging to topics. Notice that some of the probabilities are low and some are higher. Our model has assigned a probability to each synopsis belonging to each of the topics we constructed from the sets of words. How are the probabilities distributed? Let’s visualize it.
ggplot(movies.4topic$lda_gamma , aes(gamma)) +
geom_histogram(alpha = 0.8) +
scale_y_log10() +
labs(title = "Distribution of probabilities for all 4 topics",
y = "Number of documents", x = expression(gamma)) -> topic4.all
topic4.all

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This figure shows the distribution of probabilities across all 4 LDA topics. The y-axis is on a logarithmic scale so that details remain visible. The variable γ (gamma) ranges from 0 to 1 and represents the probability that a given document belongs to a specific topic. Many values cluster near zero, which means that for many document-topic pairs the document does not belong to that topic. Conversely, values near γ=1 mark documents that are confidently assigned to a topic. This distribution shows that documents are well discriminated as either belonging or not belonging to a topic. The next figure shows how the probabilities are distributed within each of the 4 topics.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
These 4 panels show the distribution of probabilities within each topic of the 4-topic LDA model. Many documents have γ close to 1; these are the documents that do belong to the topic according to the model. Many documents also have γ close to 0; these are the documents that do not belong to the topic. Each document appears in every panel of this plot, and its γ for that topic tells us that document’s probability of belonging to that topic.
Although these topics are quite clear, I also tried more than 4 topics (8 and 16) to check whether documents could be sorted into topics even better.
ggsave("output/movies.4topic.beta.jpg",plot=movies.4topic$beta_plot,
device="jpg", width=5,height=8,units="in")
ggsave("output/movies.4topic.gamma.jpg",plot=movies.4topic$gamma_plot,
device="jpg", width=5,height=8,units="in")
ggsave("output/movies.4topic.gamma.all.jpg",plot= topic4.all,
device="jpg", width=8,height=5,units="in")

movies.8topic=top_terms_by_topic_LDA(
doc_vec = movies$plot_synopsis ,
doc_names =movies$imdb_id ,
plot = TRUE,
number_of_topics = 8)
#save models and plots
saveRDS(movies.8topic,file="output/movies.8topic.rds")

The top 15 terms per topic in the 8-topic LDA model still suggest 8 clear movie themes. Topic 1: police, gangs, and crime. Topic 2: war and the military. Topic 3: zombies. Topic 4: schools. Topic 5: vampires, monsters, and the supernatural. Topic 6: home and family. Topic 7: witches and evil elements. Topic 8: missionaries and magic.
I favor the LDA model with 8 topics because it allows a richer exploration of diverse concepts. The finer granularity of 8 topics draws clearer distinctions between themes, which makes the model easier to interpret.
ggplot(movies.8topic$lda_gamma , aes(gamma)) +
geom_histogram(alpha = 0.8) +
scale_y_log10() +
labs(title = "Distribution of probabilities for all 8 topics",
y = "Number of documents", x = expression(gamma)) -> topic8.all
topic8.all

This distribution shows that documents are being well discriminated as belonging to the 8 topics. Values near γ=1 mark documents that are confidently associated with a topic, while values near γ=0 mark document-topic pairs where the document does not belong to that topic.
From this plot, I find documents cleanly sorted into some topics. For topics 3 and 6, many documents have γ close to 1; these are the documents that do belong to these topics according to the model. This plot displays the type of information I used to choose the number of topics for the topic modeling procedure. When I tried more than 8 topics (such as 16), the distribution of γ moved away from γ=1 and documents were no longer sorted cleanly into topics.
ggsave("output/movies.8topic.beta.jpg",plot=movies.8topic$beta_plot,
device="jpg", width=5,height=8,units="in")
ggsave("output/movies.8topic.gamma.jpg",plot=movies.8topic$gamma_plot ,
device="jpg", width=5,height=8,units="in")
ggsave("output/movies.8topic.gamma.all.jpg",plot= topic8.all,
device="jpg", width=8,height=5,units="in")

movies.16topic=top_terms_by_topic_LDA(
doc_vec = movies$plot_synopsis ,
doc_names =movies$imdb_id ,
plot = TRUE,
number_of_topics = 16)
saveRDS(movies.16topic,file="output/movies.16topic.rds")

ggplot(movies.16topic$lda_gamma , aes(gamma)) +
geom_histogram(alpha = 0.8) +
scale_y_log10() +
labs(title = "Distribution of probabilities for all 16 topics",
y = "Number of documents", x = expression(gamma)) -> topic16.all
topic16.all

movies.8topic$gamma_plot
ggsave("output/movies.16topic.beta.jpg",plot=movies.16topic$beta_plot,
device="jpg", width=5,height=15,units="in")
ggsave("output/movies.16topic.gamma.jpg",plot=movies.16topic$gamma_plot ,
device="jpg", width=5,height=15,units="in")
ggsave("output/movies.16topic.gamma.all.jpg",plot= topic16.all,
device="jpg", width=8,height=5,units="in")

Comparing the models with 4, 8, and 16 topics, I consider the 8-topic model the best choice. This conclusion rests on the clarity and interpretability of the topics it generates: the topics in the 8-topic model have distinct and easily understandable themes. In addition, the distribution of γ, the probability that a document belongs to a given topic, tends to be closer to 1, which indicates that the model confidently assigns documents to specific topics. The 8-topic model therefore offers the best balance between granularity and coherence.
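The comparison above relies on reading the γ histograms by eye. One way to put a number on it, offered here only as a sketch and not as the procedure actually used in this analysis, is the share of documents whose largest γ exceeds a cut-off. The helper `share_confident()` and the table `toy_gamma` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical helper: share of documents whose largest gamma exceeds a cut-off
share_confident <- function(lda_gamma, cutoff = 0.5) {
  lda_gamma %>%
    group_by(document) %>%
    summarise(max_gamma = max(gamma), .groups = "drop") %>%
    summarise(share = mean(max_gamma > cutoff)) %>%
    pull(share)
}

# Toy data shaped like tidy(lda, matrix = "gamma"): 2 documents, 2 topics
toy_gamma <- tibble::tibble(
  document = c("d1", "d1", "d2", "d2"),
  topic    = c(1L, 2L, 1L, 2L),
  gamma    = c(0.9, 0.1, 0.5, 0.5)
)

# d1 is confidently assigned, d2 is split evenly between both topics
share_confident(toy_gamma)
```

Applied to `movies.4topic$lda_gamma`, `movies.8topic$lda_gamma`, and `movies.16topic$lda_gamma`, such a summary would give one comparable figure per model. A higher share means more documents are cleanly sorted.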
Having modeled the movie plot synopses with 8 LDA topics, my next step is to understand the opinion or emotion in the text. This task is known as sentiment analysis.
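There are a variety of dictionaries for evaluating the opinion or emotion in text. The tidytext package provides access to three general-purpose sentiment lexicons through get_sentiments(): nrc, bing, and AFINN. The tables below count the entries of each lexicon by category.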
## # A tibble: 10 × 2
## sentiment n
## <chr> <int>
## 1 anger 1245
## 2 anticipation 837
## 3 disgust 1056
## 4 fear 1474
## 5 joy 687
## 6 negative 3316
## 7 positive 2308
## 8 sadness 1187
## 9 surprise 532
## 10 trust 1230
## # A tibble: 2 × 2
## sentiment n
## <chr> <int>
## 1 negative 4781
## 2 positive 2005
## # A tibble: 11 × 2
## value n
## <dbl> <int>
## 1 -5 16
## 2 -4 43
## 3 -3 264
## 4 -2 966
## 5 -1 309
## 6 0 1
## 7 1 208
## 8 2 448
## 9 3 172
## 10 4 45
## 11 5 5
The nrc lexicon categorizes words in a binary fashion
into categories of positive, negative, anger, anticipation,
disgust, fear, joy, sadness, surprise, and trust.
The bing lexicon categorizes words in a binary fashion
into positive and negative categories.
The AFINN lexicon assigns words with a score that runs
between -5 and 5, with negative scores indicating
negative sentiment and positive scores indicating
positive sentiment.
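To make the difference between a categorical lexicon and a numeric one concrete, here is a minimal sketch of the join that the later analysis relies on. The miniature lexicons `toy_bing` and `toy_afinn` and the token table `toy_tokens` are hypothetical stand-ins; the real lexicons come from get_sentiments() and are far larger.

```r
library(dplyr)

# Hypothetical miniature lexicons (the real ones come from get_sentiments())
toy_bing  <- tibble::tibble(word = c("love", "murder"),
                            sentiment = c("positive", "negative"))
toy_afinn <- tibble::tibble(word = c("love", "murder"),
                            value = c(3, -2))

# One token per row, as produced by tokenizing a synopsis
toy_tokens <- tibble::tibble(
  id   = "m1",
  word = c("love", "murder", "murder", "ship")
)

# Categorical lexicon: count matched words per sentiment class
bing_counts <- toy_tokens %>%
  inner_join(toy_bing, by = "word") %>%
  count(sentiment)

# Numeric lexicon: sum the scores of the matched words
afinn_score <- toy_tokens %>%
  inner_join(toy_afinn, by = "word") %>%
  summarise(score = sum(value)) %>%
  pull(score)
```

Words that are absent from a lexicon, such as "ship" here, are dropped by the inner join and do not contribute to either result.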
I will use these lexicons to assess the sentiments represented across the movie plot synopses. The synopses are first tokenized into one word per row:
## # A tibble: 6 × 2
## id word
## <chr> <chr>
## 1 tt0057603 note
## 2 tt0057603 synopsis
## 3 tt0057603 orginal
## 4 tt0057603 italian
## 5 tt0057603 release
## 6 tt0057603 segments
movies.syno %>%
inner_join(get_sentiments("bing")) %>%
count(sentiment,sort=T) -> overall.sentiment.bing
ggplot(overall.sentiment.bing,
aes(x=sentiment,y=n)) +
geom_col(aes(fill=factor(sentiment))) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 30,
vjust = 0.6))

This overall sentiment plot shows that negative words clearly outnumber positive words in the movie plot synopses.
The NRC lexicon tags specific emotions, so we can see which negative or positive emotions appear in the movie plot synopses.
movies.syno %>%
inner_join(get_sentiments("nrc")) %>%
count(sentiment,sort=T) %>%
mutate(sentiment = fct_reorder(sentiment,desc(n)) )-> overall.sentiment.nrc
ggplot(overall.sentiment.nrc,
aes(x=sentiment,y=n)) +
geom_col(aes(fill=factor(sentiment))) +
theme(legend.position = "none",
axis.text.x = element_text(angle = 30,
vjust = 0.6))

The overall nrc result also shows more negative than positive sentiment. The negative emotions “fear”, “sadness”, and “anger” are relatively frequent in the movie plot synopses, while positive emotions such as “joy” are relatively rare.
The AFINN lexicon assigns each word a score for how positive or negative it is, so we can compute an overall score for the movie plot synopses.
movies.syno %>%
inner_join(get_sentiments("afinn")) %>%
summarise(sum(value))-> overall.sentiment.afinn
cat("Sentiment score for all movie plot synopses is negative, and the score is ",overall.sentiment.afinn$`sum(value)`)

## Sentiment score for all movie plot synopses is negative, and the score is -599144

cat("Average sentiment score for each plot synopsis is also negative, and the score is ",overall.sentiment.afinn$`sum(value)`/length(unique(movies.syno$id)))

## Average sentiment score for each plot synopsis is also negative, and the score is -40.40626
So overall there is more negative than positive sentiment. But how does sentiment vary across individual movie plot synopses?
#bing lexicon
movies.syno %>%
group_by(id) %>%
inner_join(get_sentiments("bing")) %>%
count(id, sentiment) %>%
ungroup() %>%
pivot_wider(names_from = sentiment,
values_from = n) %>%
mutate(sentiment= positive-negative) %>%
select(id,sentiment) %>%
mutate(type="bing") -> each.sentiment.bing

## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("bing")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 71466 of `x` matches multiple rows in `y`.
## ℹ Row 4723 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
#nrc
movies.syno %>%
group_by(id) %>%
inner_join(get_sentiments("nrc")) %>%
filter(sentiment %in% c("positive","negative")) %>% #90,467 words
count(id,sentiment) %>%
ungroup() %>%
pivot_wider(names_from = sentiment,
values_from = n) %>%
mutate(sentiment=positive-negative) %>%
select(id,sentiment) %>%
mutate(type="nrc")-> each.sentiment.nrc

## Joining with `by = join_by(word)`
## Warning in inner_join(., get_sentiments("nrc")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 10 of `x` matches multiple rows in `y`.
## ℹ Row 1709 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
#afinn
movies.syno %>%
group_by(id) %>%
inner_join(get_sentiments("afinn")) %>% # 65,094 words
summarise(sentiment=sum(value)) %>% #2,181
ungroup() %>%
select(id,sentiment) %>%
mutate(type="afinn")-> each.sentiment.afinn

## Joining with `by = join_by(word)`
rbind(each.sentiment.bing, each.sentiment.nrc, each.sentiment.afinn) ->
alltype.sentiment
alltype.sentiment$type= factor(alltype.sentiment$type,
levels = c("bing","nrc","afinn"),
labels = c("bing","nrc","afinn"))
ggplot(alltype.sentiment,
aes(x=id,y=sentiment,fill=type)) +
geom_col() +
facet_wrap(~type,ncol = 1) +
theme(legend.position = "none",
axis.text.x = element_blank()) -> each.sentiment.plot
each.sentiment.plot

## Warning: Removed 462 rows containing missing values (`position_stack()`).

#ggsave("output/pic08.jpg",plot = each.sentiment.plot, width = 6, height = 10, units = "in",device = "jpg")

The three lexicons give results that differ in an absolute sense but show fairly similar relative trends across the movie plot synopses. We see similar dips and peaks in sentiment at about the same places, but the absolute values differ significantly. In some instances, the AFINN lexicon appears to find more negative sentiment than the NRC and Bing lexicons, and NRC finds more positive sentiment than Bing and AFINN. This lets us compare how movie plot synopses differ in their sentiment, in both direction and magnitude.
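The warning about removed rows in the per-synopsis plot has one plausible cause, stated here as an assumption that was not checked against the data: a synopsis with only negative (or only positive) matches gets NA in the missing column after pivot_wider(), so positive - negative is NA. The sketch below reproduces this on a hypothetical toy table `toy_counts` and shows how the `values_fill` argument of pivot_wider() avoids it.

```r
library(dplyr)
library(tidyr)

# Toy per-document counts: document "b" has no positive words at all
toy_counts <- tibble::tibble(
  id        = c("a", "a", "b"),
  sentiment = c("positive", "negative", "negative"),
  n         = c(5L, 2L, 4L)
)

# Without values_fill, the missing positive count for "b" becomes NA
net_na <- toy_counts %>%
  pivot_wider(names_from = sentiment, values_from = n) %>%
  mutate(sentiment = positive - negative)

# With values_fill = 0, an absent class counts as zero
net_filled <- toy_counts %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)
```

With the fill in place, document "b" receives a net score of -4 instead of being dropped from the plot.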
After fitting the 8-topic model, can we see how sentiment differs between topics? First, I keep only the document-topic entries whose probability γ exceeds a cut-off value; I chose 0.5, so each retained document belongs to one specific topic with γ>0.5. Then I examine which sentiment is associated with each topic.
## # A tibble: 10 × 3
## document topic gamma
## <chr> <int> <dbl>
## 1 tt1270479 1 0.536
## 2 tt0429493 1 0.695
## 3 tt0097579 1 0.536
## 4 tt0119488 1 0.554
## 5 tt1366365 1 0.637
## 6 tt0082817 1 0.505
## 7 tt0317740 1 0.534
## 8 tt0831887 1 0.564
## 9 tt0065214 1 0.521
## 10 tt1152836 1 0.582
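The code that builds this filtered table is not shown. A minimal sketch of the cut-off step follows, assuming the table is derived from a γ table shaped like `movies.8topic$lda_gamma`; `toy_gamma` and `toy_topic_sentiment` are hypothetical names used only for illustration.

```r
library(dplyr)

# Toy data shaped like movies.8topic$lda_gamma: 3 documents, 2 topics
toy_gamma <- tibble::tibble(
  document = c("d1", "d1", "d2", "d2", "d3", "d3"),
  topic    = c(1L, 2L, 1L, 2L, 1L, 2L),
  gamma    = c(0.8, 0.2, 0.45, 0.55, 0.5, 0.5)
)

# Keep only confident assignments. Because the gammas of one document
# sum to 1, at most one topic per document can exceed 0.5.
toy_topic_sentiment <- toy_gamma %>%
  filter(gamma > 0.5) %>%
  arrange(topic)

toy_topic_sentiment
```

Document d3 is split evenly between the two topics and therefore drops out, which is the intended effect of the cut-off: only documents with a dominant topic are carried into the per-topic sentiment comparison.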
topic.sentiment %>%
left_join(each.sentiment.bing, by=join_by(document==id)) ->topic.sentiment.bing
topic.sentiment.bing %>% head(10)

## # A tibble: 10 × 5
## document topic gamma sentiment type
## <chr> <int> <dbl> <int> <chr>
## 1 tt1270479 1 0.536 -46 bing
## 2 tt0429493 1 0.695 -18 bing
## 3 tt0097579 1 0.536 -70 bing
## 4 tt0119488 1 0.554 -78 bing
## 5 tt1366365 1 0.637 -14 bing
## 6 tt0082817 1 0.505 -35 bing
## 7 tt0317740 1 0.534 -13 bing
## 8 tt0831887 1 0.564 -33 bing
## 9 tt0065214 1 0.521 -49 bing
## 10 tt1152836 1 0.582 -72 bing
ggplot(topic.sentiment.bing,
aes(x=document,y=sentiment,fill=factor(topic))) +
geom_col() +
facet_wrap(~topic,ncol = 2,scales = "free_x") +
theme(legend.position = "none",
axis.text.x = element_blank()) +
labs(x="Movies plot synopsis")-> topic.sentiment.bing.plot
ggsave("output/topic.bing.jpg",plot = topic.sentiment.bing.plot, width = 6, height = 10, units = "in",device = "jpg")

The sentiment analysis of movie plot synopses by topic reveals several patterns. Most movies exhibit predominantly negative sentiment, which is consistent with plots that involve conflict, adversity, or darker narratives. Topic 4, associated with “schools”, stands out with relatively more positive sentiment, suggesting that movies centered on educational settings may carry more uplifting or optimistic elements. In contrast, Topic 2, revolving around “war” and “military”, and Topic 6, focused on “home” and “family”, show a higher prevalence of negative sentiment. This matches the expectation that war and family-related themes often involve complex and emotionally charged storytelling.