link: https://rpubs.com/FaizAslam_s2125991/Tutorial_3
How many new cases are there per day?
How many people were vaccinated per day?
How many deaths are there per day?
What are the common symptoms and the early symptoms?
What are the age groups of the people who died?
Which places did patients visit before testing positive?
What is each patient's medical history, so that further complications can be avoided?
How is an individual affected by each vaccine?
How long does the vaccine work in an individual?
What is the most effective gap, in days, between the first and second doses of the vaccine?
Does one need a booster shot?
What will the number of new cases be tomorrow, in a week, in a month and in 6 months?
How does the movement of people affect the number of daily cases?
How does the COVID wave move from one area to another, based on the movement of traffic?
library(rvest)
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
read_html() downloads the page at the link below and parses its HTML so that R can query it
link = "https://www.imdb.com/search/title/?title_type=feature&genres=adventure&explore=genres&view=advanced"
page = read_html(link)
name = page %>%
  html_nodes(".lister-item-header a") %>%
  html_text()
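The selector step can be tried offline before pointing it at the live site. The sketch below parses a small hand-written HTML fragment and reads back the link text and the href attribute. The fragment is an assumption modelled on the `.lister-item-header a` selector and the Dune link used in this tutorial; it is not a copy of IMDb's real markup, and the `demo_` names are illustrative only.

```r
library(rvest)

# Hand-written fragment for illustration only; the real IMDb markup may differ.
snippet <- paste0(
  '<html><body><h3 class="lister-item-header">',
  '<a href="/title/tt1160419/?ref_=adv_li_tt">Dune</a>',
  '</h3></body></html>'
)

# read_html() also accepts a literal HTML string, so no network access is needed
demo_page <- read_html(snippet)

# Select every <a> inside an element with class "lister-item-header"
demo_nodes <- demo_page %>% html_nodes(".lister-item-header a")

demo_name <- html_text(demo_nodes)          # visible link text
demo_href <- html_attr(demo_nodes, "href")  # value of the href attribute

print(demo_name)
print(demo_href)
```

If a selector returns an empty vector on a fragment like this, it will most likely return nothing on the live page either, which makes this a cheap first check.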
Get the link to each movie. The linked title page does not show the full details. For example, the link for the movie Dune, https://www.imdb.com/title/tt1160419/?ref_=adv_li_tt, does not show the full cast, so “?ref_=adv_li_tt” has to be replaced with “fullcredits?ref_=tt_ov_st_sm”
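movie_links = page %>%
  html_nodes(".lister-item-header a") %>%
  html_attr("href") %>%
  # the href values are assumed to start with "/", so the domain has no trailing slash
  paste0("https://www.imdb.com", .)
# fixed = TRUE matches "?" literally instead of as a regex quantifier
movie_links = gsub("?ref_=adv_li_tt", "fullcredits?ref_=tt_ov_st_sm",
                   movie_links, fixed = TRUE)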
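The replacement can be checked on the single Dune link quoted above. In a regular expression "?" is a quantifier, and how an unescaped leading "?" is handled can depend on the regex engine, so this sketch passes fixed = TRUE to treat the pattern as plain text. The variable names are illustrative only.

```r
# The Dune link quoted in this tutorial
dune_url <- "https://www.imdb.com/title/tt1160419/?ref_=adv_li_tt"

# fixed = TRUE: the pattern is matched character by character, not as a regex
full_credits <- gsub("?ref_=adv_li_tt", "fullcredits?ref_=tt_ov_st_sm",
                     dune_url, fixed = TRUE)

# Equivalent regex form, with the "?" escaped
full_credits_regex <- gsub("\\?ref_=adv_li_tt", "fullcredits?ref_=tt_ov_st_sm",
                           dune_url)

print(full_credits)
# [1] "https://www.imdb.com/title/tt1160419/fullcredits?ref_=tt_ov_st_sm"
```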
year = page %>%
  html_nodes(".text-muted.unbold") %>%
  html_text()
The extracted summary text still contains line-break characters left over from the HTML, so it has to be cleaned before it is used
summary = page %>%
  html_nodes(".text-muted+ .text-muted , .ratings-bar+ .text-muted") %>%
  html_text()
summary = str_replace_all(summary, "[\r\n]", "")
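The cleaning step can be seen on a single string. This is a minimal sketch with a made-up piece of text standing in for a scraped summary; the extra str_trim() call is an addition that the tutorial's own code does not use.

```r
library(stringr)

# Made-up raw text, shaped like a scraped summary with surrounding line breaks
raw_summary <- "\nA short example summary.\r\n"

# Remove every carriage return and newline character
clean_summary <- str_replace_all(raw_summary, "[\r\n]", "")

# str_trim() additionally drops leading and trailing whitespace, if any remains
clean_summary <- str_trim(clean_summary)

print(clean_summary)
# [1] "A short example summary."
```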
genre = page %>%
  html_nodes(".genre") %>%
  html_text()
genre = str_replace_all(genre, "[\r\n]", "")
director = page %>%
  html_nodes(".text-muted~ .text-muted+ p a:nth-child(1)") %>%
  html_text()
df <- data.frame(name, genre, year, director, summary, stringsAsFactors = FALSE)
head(df, 10)
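data.frame() needs every column to have the same number of rows, and scraped vectors can end up with different lengths if a selector misses a field for some entry. A minimal sketch of a guard follows; same_length() is a hypothetical helper and all the vectors are made-up stand-ins for scraped columns.

```r
# Hypothetical helper: TRUE when all supplied vectors have the same length
same_length <- function(...) {
  lens <- lengths(list(...))
  length(unique(lens)) == 1
}

# Made-up vectors standing in for scraped columns
demo_name     <- c("Movie A", "Movie B", "Movie C")
demo_year     <- c("(2021)", "(2020)", "(2019)")
demo_director <- c("Director A", "Director B")  # one entry missing

ok_pair  <- same_length(demo_name, demo_year)
bad_trio <- same_length(demo_name, demo_year, demo_director)

if (ok_pair) {
  demo_df <- data.frame(demo_name, demo_year, stringsAsFactors = FALSE)
  print(demo_df)
}
if (!bad_trio) {
  message("Column lengths differ; check the selectors before calling data.frame().")
}
```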
The movie Dune was used for this example
movie_link = "https://www.imdb.com/title/tt1160419/fullcredits?ref_=tt_ov_st_sm"
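# Download a movie's full-credits page and return its cast as one comma-separated string
get_cast = function(movie_link) {
  movie_page = read_html(movie_link)
  movie_cast = movie_page %>%
    html_nodes(".primary_photo+ td a") %>%
    html_text()
  # remove the line breaks around each name, then join the names with commas
  movie_cast = str_replace_all(movie_cast, "[\r\n]", "") %>%
    paste(collapse = ",")
  # print() displays the string and also returns it
  print(movie_cast)
}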
get_cast(movie_link)
## [1] " Timothée Chalamet, Rebecca Ferguson, Oscar Isaac, Jason Momoa, Stellan Skarsgård, Stephen McKinley Henderson, Josh Brolin, Javier Bardem, Sharon Duncan-Brewster, Chang Chen, Dave Bautista, David Dastmalchian, Zendaya, Charlotte Rampling, Babs Olusanmokun, Benjamin Clémentine, Souad Faress, Golda Rosheuvel, Roger Yuan, Seun Shote, Neil Bell, Oliver Ryan, Stephen Collins, Charlie Rawes, Richard Carter, Ben Dilloway, Elmi Rashid Elmi, Tachia Newall, Gloria Obianyo, Fehinti Balogun, Dora Kápolnai-Schvab, Joelle, Jimmy Walker, Paul Bullion, Milena Sidorova, János Timkó, Jean Gilpin, Marianne Faithfull, Ellen Dubin, Károly Baksai, Björn Freiberg, Balázs Megyeri, Michael Nardone, Duncan Pow, Ferenc Iván Szabó, Laszlo Szilagyi, Peter Sztojanov Jr., István Áldott"
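The example above handles a single movie. One possible next step, not taken in the source, is to call the function once per entry of movie_links with sapply() and keep the results as a column. The sketch below shows the pattern offline: fake_get_cast() is a hypothetical stand-in that returns canned text instead of downloading a page, and the two links are placeholders.

```r
# Hypothetical stand-in for get_cast(): returns canned text, no network access
fake_get_cast <- function(movie_link) {
  paste("cast scraped from", movie_link)
}

# Placeholder links; in the tutorial these would come from movie_links
demo_links <- c("https://example.com/movie-1", "https://example.com/movie-2")

# Call the function once per link; USE.NAMES = FALSE keeps the result unnamed
demo_cast <- sapply(demo_links, FUN = fake_get_cast, USE.NAMES = FALSE)

demo_df <- data.frame(link = demo_links, cast = demo_cast,
                      stringsAsFactors = FALSE)
print(demo_df)
```

With the real get_cast(), each call downloads one page, so a short pause between requests (for example with Sys.sleep()) would be a considerate addition.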