I. What are we doing?

We'll learn how to use Beautiful Soup, following this article as a reference:

Super Simple Way to Scrape BBC News Articles in Python

II. Creating a virtual environment

This is covered in the article referenced in section I.

III. Fetching data with Beautiful Soup

1. Load the BBC News top page

# import libraries
import requests
from bs4 import BeautifulSoup as bs
# read the page and parse it
page = requests.get('https://www.bbc.com/news')
soup = bs(page.content, 'html.parser')
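The request above assumes the download succeeds. A slightly more defensive variant might look like the sketch below, assuming requests and beautifulsoup4 are installed; the helper name fetch_soup, the 10-second timeout and the User-Agent value are hypothetical choices, not part of the reference article.

```python
import requests
from bs4 import BeautifulSoup as bs


def fetch_soup(url, timeout=10):
    """Download url and parse it with html.parser (hypothetical helper)."""
    # Assumption: a browser-like User-Agent; some sites treat scripts differently
    headers = {"User-Agent": "Mozilla/5.0"}
    # timeout stops the call from hanging forever if the server never answers
    page = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on a 4xx/5xx status instead of parsing an error page
    page.raise_for_status()
    return bs(page.content, "html.parser")


# Usage (needs network access):
# soup = fetch_soup("https://www.bbc.com/news")
```

Calling fetch_soup("https://www.bbc.com/news") returns the same kind of soup object as the code above, but fails loudly if the page could not be fetched.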

2. Find the HTML class= attribute used by Most watched and Most read

Most watched contains 5 headlines and Most read contains 10.

In Brave Browser, open View > Developer > Inspect Elements. Depending on the browser, you may first need to enable the Developer menu in the settings.

Hovering the pointer over an element on the page highlights it (Command+Shift+C).

The element below shows that the headline text sits in elements whose class is "gs-c-promo-heading__title gel-pica-bold":

<span class="gs-c-promo-heading__title gel-pica-bold">Homes and buildings destroyed in Israel and Gaza</span>

3. Get every element with that class

top5_10 = soup.find_all(class_="gs-c-promo-heading__title gel-pica-bold")
top5_10
## [<span class="gs-c-promo-heading__title gel-pica-bold">The cost of calling out a 'rape joke'</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Homes and buildings destroyed in Israel and Gaza</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Netanyahu defends press building bombing</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Lebanon: A country in free-fall</span>, <span class="gs-c-promo-heading__title gel-pica-bold">One-minute World News</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Gaza says Sunday was 'deadliest day' so far</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Why is Gaza blurry on Google Maps?</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Hugs and holidays resume as rules ease for millions</span>, <span class="gs-c-promo-heading__title gel-pica-bold">India's Covid crisis hits vaccine-sharing scheme</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Taiwan orders toughest curbs amid Covid spike</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Four arrested in anti-Semitism video investigation</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Missing Houston tiger handed into police</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Tom Cruise signs shirts for Covid-hit football club</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Two killed in collapse at West Bank synagogue</span>, <span class="gs-c-promo-heading__title gel-pica-bold">Netanyahu says Gaza strikes to continue 'at force'</span>]
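One subtlety: when class_ is given a string containing a space, Beautiful Soup matches it against the whole class attribute as that exact string, so the same two classes written in a different order would not match. A CSS selector passed to select() matches any tag that has both classes, in any order. A small offline illustration on made-up HTML, assuming beautifulsoup4 is installed:

```python
from bs4 import BeautifulSoup as bs

# Made-up HTML: the second span lists the same two classes in reverse order
html = """
<span class="gs-c-promo-heading__title gel-pica-bold">First</span>
<span class="gel-pica-bold gs-c-promo-heading__title">Second</span>
"""
demo = bs(html, "html.parser")

# Exact string match on the class attribute: only "First" matches
exact = [t.get_text() for t in demo.find_all(class_="gs-c-promo-heading__title gel-pica-bold")]

# CSS selector: any span carrying both classes, in any order
css = [t.get_text() for t in demo.select("span.gs-c-promo-heading__title.gel-pica-bold")]

print(exact)  # ['First']
print(css)    # ['First', 'Second']
```

On the BBC page both forms happened to work because the classes appear in the same order everywhere; select() is the safer choice if that ever changes.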

4. Extract the text

get_text() pulls out the text inside each element. Very handy!

*results unpacks the list results and passes its items to print() as separate arguments; sep="\n" then prints each one on its own line.
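To see what the asterisk does, here is a standard-library-only comparison with placeholder headlines: unpacking prints the same text as joining the list with newlines, while passing the list itself prints its repr on one line.

```python
import io
from contextlib import redirect_stdout

items = ["headline A", "headline B", "headline C"]  # placeholder headlines

# print(*items, sep="\n") is the same call as print("headline A", "headline B", "headline C", sep="\n")
buf_unpacked = io.StringIO()
with redirect_stdout(buf_unpacked):
    print(*items, sep="\n")

# Equivalent output built with str.join
buf_joined = io.StringIO()
with redirect_stdout(buf_joined):
    print("\n".join(items))

# Without *, print receives a single list object, so sep has nothing to separate
buf_list = io.StringIO()
with redirect_stdout(buf_list):
    print(items, sep="\n")

print(buf_unpacked.getvalue() == buf_joined.getvalue())  # True
print(buf_list.getvalue(), end="")  # ['headline A', 'headline B', 'headline C']
```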

Using a for loop

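results = []
# iterate over the tags directly rather than range(15), so the loop still works
# if the page returns a different number of headlines
for heading in top5_10:
    title = heading.get_text()
    results.append(title)
# print list items line by line
print(*results, sep="\n")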
## The cost of calling out a 'rape joke'
## Homes and buildings destroyed in Israel and Gaza
## Netanyahu defends press building bombing
## Lebanon: A country in free-fall
## One-minute World News
## Gaza says Sunday was 'deadliest day' so far
## Why is Gaza blurry on Google Maps?
## Hugs and holidays resume as rules ease for millions
## India's Covid crisis hits vaccine-sharing scheme
## Taiwan orders toughest curbs amid Covid spike
## Four arrested in anti-Semitism video investigation
## Missing Houston tiger handed into police
## Tom Cruise signs shirts for Covid-hit football club
## Two killed in collapse at West Bank synagogue
## Netanyahu says Gaza strikes to continue 'at force'
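A variation on the loop that also numbers each headline with enumerate() and trims surrounding whitespace with get_text(strip=True). This is a sketch on made-up HTML so it runs offline, assuming beautifulsoup4 is installed; the list name numbered is illustrative.

```python
from bs4 import BeautifulSoup as bs

# Made-up markup standing in for the BBC page; note the stray whitespace
html = """
<span class="gs-c-promo-heading__title gel-pica-bold">  Headline one </span>
<span class="gs-c-promo-heading__title gel-pica-bold">Headline two</span>
"""
demo = bs(html, "html.parser")

numbered = []
# enumerate(..., start=1) yields (1, tag), (2, tag), ...
for rank, tag in enumerate(demo.find_all(class_="gs-c-promo-heading__title gel-pica-bold"), start=1):
    numbered.append(f"{rank}. {tag.get_text(strip=True)}")

print(*numbered, sep="\n")
# 1. Headline one
# 2. Headline two
```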

Using a list comprehension

results2 = [heading.get_text() for heading in top5_10]
# print list items line by line
print(*results2, sep="\n")
## The cost of calling out a 'rape joke'
## Homes and buildings destroyed in Israel and Gaza
## Netanyahu defends press building bombing
## Lebanon: A country in free-fall
## One-minute World News
## Gaza says Sunday was 'deadliest day' so far
## Why is Gaza blurry on Google Maps?
## Hugs and holidays resume as rules ease for millions
## India's Covid crisis hits vaccine-sharing scheme
## Taiwan orders toughest curbs amid Covid spike
## Four arrested in anti-Semitism video investigation
## Missing Houston tiger handed into police
## Tom Cruise signs shirts for Covid-hit football club
## Two killed in collapse at West Bank synagogue
## Netanyahu says Gaza strikes to continue 'at force'
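Since Most watched has 5 items and Most read has 10, the combined list can be split with slices. This assumes the Most watched headings come first, which matches the output above but may change whenever BBC updates its layout; the names headlines, most_watched and most_read are illustrative, and the sketch uses placeholder strings so it runs offline.

```python
# Placeholder list standing in for results2 (15 headline strings)
headlines = [f"headline {i}" for i in range(1, 16)]

# Assumption: the first 5 are "Most watched", the next 10 are "Most read"
most_watched = headlines[:5]
most_read = headlines[5:15]

print("Most watched:", *most_watched, sep="\n")
print()
print("Most read:", *most_read, sep="\n")
```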

5. Translate in Google Sheets

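I pasted the headlines into a Google Sheet and translated each one with the following formula, where cell is the reference of the cell holding the English headline (for example, A1):

=googletranslate(cell,"en","ja")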

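The translations are a bit rough, but given how easy this is, we can let that slide. With some effort, you could probably build an app that translates the top news using Google Translate... though it would likely take quite a lot of effort.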
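Instead of copying and pasting, the headlines could also be written to a CSV file and brought into Google Sheets with File > Import, after which the translation formula can be added in a neighbouring column. A minimal sketch using only the standard library csv module; the file name headlines.csv and the column header are hypothetical.

```python
import csv

# Placeholder list standing in for results2 from the previous step
headlines = ["The cost of calling out a 'rape joke'", "Lebanon: A country in free-fall"]

# newline="" is recommended by the csv module docs when writing files
with open("headlines.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["headline"])  # header row (hypothetical column name)
    for h in headlines:
        writer.writerow([h])  # one headline per row
```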