article summary

The Medium article, Swedish Question Answering with BERT, proposes an approach for building a BERT-based QA (question answering) model in a language that has no native QA dataset. Using SQuAD 2.0, a reading comprehension dataset whose contexts are drawn from Wikipedia articles, Dr. Susumu Okazawa details the machine-learning process for constructing a Swedish QA system with a BERT-family model. His team follows the method outlined below (a short code sketch follows the list):

  1. translate the English dataset into Swedish with the Google Translate API
  2. fine-tune the National Library of Sweden’s pre-trained Swedish BERT
  3. use the fine-tuned Swedish BERT for Swedish QA
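As a concrete illustration of where the pipeline ends up, here is a minimal sketch of step 3 using the Hugging Face transformers QA pipeline. The checkpoint path is a hypothetical stand-in for the fine-tuned model from step 2 (the pre-trained base being the National Library of Sweden's Swedish BERT on the Hugging Face hub); the article itself may structure this differently.

```python
# Sketch of step 3: extractive QA with the fine-tuned Swedish BERT,
# via the Hugging Face transformers library. Assumes steps 1-2 are done
# and the fine-tuned model was saved locally (the path is hypothetical).
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="./swedish-bert-finetuned-squad",      # hypothetical checkpoint from step 2
    tokenizer="./swedish-bert-finetuned-squad",
)

context = "Stockholm är Sveriges huvudstad."     # "Stockholm is Sweden's capital."
question = "Vad är Sveriges huvudstad?"          # "What is Sweden's capital?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])         # expected: "Stockholm" plus a confidence score
```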

To prepare the dataset for extractive question answering, the team first had to solve a translation problem: SQuAD answers are recorded as character positions within a context, and translating the context scrambles those positions. Their fix was to (1) insert a marker to isolate the answer within the context, (2) translate the marked context, and (3) extract the marked span from the translation. The model was then fine-tuned and evaluated on the resulting dataset – performance metrics can be seen in the article!
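Since the article's code isn't reproduced here, the sketch below is a rough take on that marker trick, assuming the google-cloud-translate v2 client; the marker strings and helper name are illustrative, and it assumes the marker survives translation intact (examples where it doesn't are dropped).

```python
# Rough sketch of the marker method for one SQuAD-style example:
# 1) wrap the answer span in markers, 2) translate the marked context,
# 3) recover the translated answer and its new character offset.
# Assumes google-cloud-translate (v2 client) with credentials configured;
# marker strings and the helper name are illustrative, not the article's code.
from google.cloud import translate_v2 as translate

MARK_L, MARK_R = "[[", "]]"  # assumed markers, chosen to survive translation

def translate_example(context, answer_start, answer_text, client, target="sv"):
    end = answer_start + len(answer_text)
    marked = context[:answer_start] + MARK_L + answer_text + MARK_R + context[end:]
    translated = client.translate(marked, target_language=target)["translatedText"]
    start, stop = translated.find(MARK_L), translated.find(MARK_R)
    if start == -1 or stop == -1 or stop < start:
        return None  # marker lost or mangled in translation; drop this example
    answer_sv = translated[start + len(MARK_L):stop]
    # Strip the markers; the answer now starts where MARK_L used to be.
    clean_context = translated.replace(MARK_L, "", 1).replace(MARK_R, "", 1)
    return clean_context, start, answer_sv

# Usage: translate_example("Stockholm is the capital of Sweden.", 0, "Stockholm",
#                          translate.Client())
```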

medium stats

publication            publication_date   article_claps
Towards Data Science   2021-02-25         22

author & publication

Susumu Okazawa, PhD, is a machine learning engineer at Savantic AB, a Stockholm-based research institute dedicated to raising the level of AI competency in Sweden through industry consulting (providing data-driven solutions) and public education (offering courses, talks, and workshops).

discussion

what do i think?

I thought it was super cool to see the problem-solving process of a machine learning engineering team – as someone interested in NLP and its applications, I found the reconciliation of non-English language models and translation tasks fascinating. It was surprising to learn that the Swedish BERT model was developed by the National Library of Sweden! I'd love to further explore their model hub and see what other ML projects the lab has released.


kable and plotly