First, load the lottery data from the hard disk:
load("lottery.Rdata")
Next, load the libraries needed.
library(rvest)
library(stringr)
I’ll show you how to scrape the data for a single lottery. To collect
every lottery, wrap the same code in a for loop over overall_lottery.
# Suppose we select the second lottery in the data
lottery <- overall_lottery[2]
lottery
## [1] "@what2do 发布的抽奖活动「(已报备)老婆们可以帮忙投个小比赛相关的票吗 抽...」(https://douc.cc/2CPZJa),回复该讨论帖参加抽奖,预定于 2023-05-02 00:00:00 开奖,符合条件的抽奖人数共 13 名,中奖人数 2 名。现已开奖! 中奖豆友为: @momo @算了一起死吧"
To get the lottery info, use the following code:
link <- str_extract(lottery,"https://.{14}")
lottery_info <- read_html(link) %>%
html_node("script[type='application/ld+json']") %>%
html_text2()
lottery_info
## [1] "{ \"@context\": \"http://schema.org\", \"@type\": \"Conversation\", \"text\": \"投24小时冲刺班\r 投完截下图抽两个老婆6.6\r 谢谢老婆们老婆们天天开心👉🥺👈\", \"name\": \"(已报备)老婆们可以帮忙投个小比赛相关的票吗 抽两个6.6👉👈\", \"url\": \"https://www.douban.com/group/topic/287728195/\", \"commentCount\": \"14\", \"dateCreated\": \"2023-05-01T16:41:51\", \"interactionStatistic\": { \"@type\": \"InteractionCounter\", \"interactionType\": \"http://schema.org/LikeAction\", \"userInteractionCount\": 0 } }"
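The pattern `https://.{14}` works for this announcement because the short link happens to be exactly 14 characters long after the scheme. The following is a minimal sketch of a length-independent alternative, combined with the for loop mentioned earlier. The helper names `extract_link()`, `fetch_json_ld()`, and `scrape_lotteries()`, as well as the `pause` parameter, are hypothetical and not part of the original code; the sketch assumes every announcement contains at most one link of interest.

```r
# Hypothetical helpers (not part of the original code).

# Pull the first URL out of each announcement without assuming a fixed length.
extract_link <- function(text) {
  # Match the scheme, then only characters that commonly appear in short links,
  # so the match stops at the closing parenthesis.
  m <- regexpr("https?://[A-Za-z0-9./_-]+", text)
  hit <- !is.na(m) & m > -1L
  out <- rep(NA_character_, length(text))
  out[hit] <- regmatches(text, m)
  out
}

# Download one topic page and return its JSON-LD block as text.
# Requires the rvest package and network access.
fetch_json_ld <- function(link) {
  page <- rvest::read_html(link)
  node <- rvest::html_node(page, "script[type='application/ld+json']")
  rvest::html_text2(node)
}

# Loop over every announcement. `fetch` is a parameter so that the loop can be
# tried offline with a stand-in function.
scrape_lotteries <- function(lotteries, fetch = fetch_json_ld, pause = 1) {
  links <- extract_link(lotteries)
  info <- rep(NA_character_, length(lotteries))
  for (i in seq_along(lotteries)) {
    if (is.na(links[i])) next  # no link found in this announcement
    # Keep going if one page fails to load; record NA for it instead.
    info[i] <- tryCatch(fetch(links[i]), error = function(e) NA_character_)
    Sys.sleep(pause)           # pause between requests to go easy on the server
  }
  data.frame(link = links, info = info, stringsAsFactors = FALSE)
}
```

With the data loaded, `scrape_lotteries(overall_lottery)` would return one row per announcement. The one-second pause is an arbitrary choice and can be adjusted.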
Before extracting information from lottery_info, clean it up with the
following code.
# remove the double quotes, then collapse line breaks, tabs, and repeated spaces into single spaces
lottery_info <- gsub("\"", "", lottery_info)
lottery_info <- gsub("[[:space:]]+", " ", lottery_info)
lottery_info
## [1] "{ @context: http://schema.org, @type: Conversation, text: 投24小时冲刺班 投完截下图抽两个老婆6.6 谢谢老婆们老婆们天天开心👉🥺👈, name: (已报备)老婆们可以帮忙投个小比赛相关的票吗 抽两个6.6👉👈, url: https://www.douban.com/group/topic/287728195/, commentCount: 14, dateCreated: 2023-05-01T16:41:51, interactionStatistic: { @type: InteractionCounter, interactionType: http://schema.org/LikeAction, userInteractionCount: 0 } }"
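Once the text is in this `key: value` form, individual fields can be pulled out by key. The sketch below uses base R so that it runs without extra packages. The helper `extract_field()` and the variable names are hypothetical, the sample string is a shortened, ASCII-only excerpt used for illustration, and treating the timestamp as UTC is an assumption because the page does not state a time zone.

```r
# Hypothetical helper: return the text captured by the first group of
# `pattern`, or NA when the pattern does not match.
extract_field <- function(text, pattern) {
  m <- regmatches(text, regexec(pattern, text))[[1]]
  if (length(m) >= 2) m[2] else NA_character_
}

# Shortened, ASCII-only excerpt in the cleaned format (for illustration).
sample_info <- paste0(
  "{ url: https://www.douban.com/group/topic/287728195/, ",
  "commentCount: 14, dateCreated: 2023-05-01T16:41:51 }"
)

# The URL runs until the next comma or space.
topic_url <- extract_field(sample_info, "url: (https?://[^, ]+)")

# The comment count is a run of digits.
comment_count <- as.integer(extract_field(sample_info, "commentCount: ([0-9]+)"))

# The timestamp carries no time zone; UTC is an assumption made here.
created_at <- as.POSIXct(
  extract_field(sample_info, "dateCreated: ([0-9T:-]+)"),
  format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"
)
```

Free-text fields such as `text` and `name` can themselves contain commas, so this key-based approach is most dependable for the short, structured fields shown here.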