Lottery Scraping for Douban

First, load the lottery data from the hard disk:

load("lottery.Rdata")

Next, load the required libraries:

library(rvest)
library(stringr)

I’ll show how to scrape the data for a single lottery. To collect the info for every lottery, wrap the same code in a for loop over overall_lottery.

# Suppose we select the second lottery in the data
lottery <- overall_lottery[2]
lottery

## [1] "@what2do 发布的抽奖活动「（已报备）老婆们可以帮忙投个小比赛相关的票吗 抽...」（https://douc.cc/2CPZJa），回复该讨论帖参加抽奖，预定于 2023-05-02 00:00:00 开奖，符合条件的抽奖人数共 13 名，中奖人数 2 名。现已开奖！ 中奖豆友为： @momo @算了一起死吧"

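To get the lottery info, extract the short link from the announcement, open the page it points to, and read the JSON-LD metadata embedded in the page: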

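# the short link is "https://" followed by exactly 14 characters (e.g. douc.cc/2CPZJa)
link <- str_extract(lottery,"https://.{14}")
# read the page and keep the text of its <script type="application/ld+json"> element
lottery_info <- read_html(link) %>%
  html_node("script[type='application/ld+json']") %>%
  html_text2()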
lottery_info

## [1] "{ \"@context\": \"http://schema.org\", \"@type\": \"Conversation\", \"text\": \"投24小时冲刺班\r 投完截下图抽两个老婆6.6\r 谢谢老婆们老婆们天天开心👉🥺👈\", \"name\": \"（已报备）老婆们可以帮忙投个小比赛相关的票吗 抽两个6.6👉👈\", \"url\": \"https://www.douban.com/group/topic/287728195/\", \"commentCount\": \"14\", \"dateCreated\": \"2023-05-01T16:41:51\", \"interactionStatistic\": { \"@type\": \"InteractionCounter\", \"interactionType\": \"http://schema.org/LikeAction\", \"userInteractionCount\": 0 } }"
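To repeat this for every lottery, the steps above can be wrapped in a function and called in a loop. The following is only a sketch: the helper names extract_link and scrape_lottery_info are hypothetical, the link pattern assumes the short code after douc.cc/ contains only letters and digits, and the one-second pause between requests is an arbitrary choice.

```r
library(rvest)
library(stringr)

# Hypothetical helper: pull the douc.cc short link out of one announcement.
# Assumes the short code after "douc.cc/" contains only letters and digits.
extract_link <- function(lottery) {
  str_extract(lottery, "https://douc\\.cc/[A-Za-z0-9]+")
}

# Hypothetical helper: fetch the JSON-LD metadata for one lottery.
# Returns NA when no link is found or the request fails.
scrape_lottery_info <- function(lottery) {
  link <- extract_link(lottery)
  if (is.na(link)) return(NA_character_)
  tryCatch(
    read_html(link) %>%
      html_node("script[type='application/ld+json']") %>%
      html_text2(),
    error = function(e) NA_character_
  )
}

# Loop over every announcement, pausing between requests.
# Guarded so the sketch also runs when overall_lottery has not been loaded.
if (exists("overall_lottery")) {
  all_info <- character(length(overall_lottery))
  for (i in seq_along(overall_lottery)) {
    all_info[i] <- scrape_lottery_info(overall_lottery[i])
    Sys.sleep(1)  # arbitrary pause; adjust as needed
  }
}
```

Matching the link by pattern rather than by a fixed length of 14 characters keeps the extraction working if a short code turns out to be longer or shorter, and the tryCatch lets the loop continue when a single page cannot be read.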

Use the following code to extract information from lottery_info. Start by cleaning up the string:

# remove the double quotes, then collapse line breaks, tabs and runs of spaces into a single space
# (\\s stays outside [...]; inside a bracket expression R's default engine would match a literal "s")
lottery_info <- gsub("\"","",lottery_info)
lottery_info <- gsub("\\s+"," ",lottery_info)
lottery_info

## [1] "{ @context: http://schema.org, @type: Conversation, text: 投24小时冲刺班 投完截下图抽两个老婆6.6 谢谢老婆们老婆们天天开心👉🥺👈, name: （已报备）老婆们可以帮忙投个小比赛相关的票吗 抽两个6.6👉👈, url: https://www.douban.com/group/topic/287728195/, commentCount: 14, dateCreated: 2023-05-01T16:41:51, interactionStatistic: { @type: InteractionCounter, interactionType: http://schema.org/LikeAction, userInteractionCount: 0 } }"
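Once the quotes are gone, individual fields can be pulled out by matching on the field name. The sketch below works on a short excerpt of the cleaned string rather than on a live page; the variable names are illustrative, and the UTC time zone is an assumption, since the page does not state one.

```r
library(stringr)

# Excerpt of the cleaned string, limited to the two fields used here.
cleaned <- "commentCount: 14, dateCreated: 2023-05-01T16:41:51, "

# A lookbehind keeps only the value that follows each field name.
comment_count <- as.integer(str_extract(cleaned, "(?<=commentCount: )\\d+"))

# Time zone is an assumption; replace "UTC" with the zone the site actually uses.
date_created <- as.POSIXct(
  str_extract(cleaned, "(?<=dateCreated: )[0-9T:-]+"),
  format = "%Y-%m-%dT%H:%M:%S",
  tz = "UTC"
)
```

Converting the values to integer and POSIXct at this point makes it easier to sort or filter the lotteries later, for example by comment count or creation date.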