Data Science Capstone project presentation

Lian Rui
September 27, 2018.

Background and Objectives.

This is the summary report on the capstone project of the Data Science Specialization offered by Johns Hopkins University.

Methodology.

The n-gram approach was used to build the language model.
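A minimal sketch of the n-gram counting step, for illustration only. It assumes lowercase whitespace tokenization and a maximum order of 3 (trigrams). The helper names `tokenize` and `count_ngrams` are hypothetical, and this is not the project's original code.

```python
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase the text and split on whitespace.
    return text.lower().split()


def count_ngrams(tokens: List[str], max_n: int = 3) -> Dict[int, Counter]:
    """Count every n-gram of order 1..max_n; n-grams are stored as tuples."""
    counts: Dict[int, Counter] = {}
    for n in range(1, max_n + 1):
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


if __name__ == "__main__":
    tokens = tokenize("the cat sat on the mat the cat ran")
    counts = count_ngrams(tokens, max_n=3)
    print(counts[1][("the",)])               # 3
    print(counts[2][("the", "cat")])         # 2
    print(counts[3][("the", "cat", "sat")])  # 1
```

The counts for each order are stored separately, so a back-off model can look up a trigram, then a bigram, then a unigram.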

Further work.

This simple approach has several limitations:

  • 1. I applied the stupid backoff method of Brants et al., whose key assumption is a very large training corpus. I do not yet know whether the current training set is large enough to satisfy that assumption;
  • 2. I did not compare stupid backoff with more sophisticated smoothing methods, for example Kneser-Ney smoothing;
  • 3. I did not hold out a test set to quantify the accuracy of the model;
  • 4. In terms of programming, the current code is somewhat cumbersome, and further work could clearly improve its efficiency.
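The stupid backoff scoring in item 1 can be sketched roughly as follows. This is an illustrative Python sketch, not the project's original code. The function names, whitespace tokenization, trigram order and toy corpus are all assumptions. The back-off factor of 0.4 is the value used by Brants et al. The resulting scores are relative and are not normalized probabilities. A hypothetical `top_k_accuracy` helper is included to show one way item 3 (a held-out test set) might be addressed.

```python
from collections import Counter
from typing import Dict, List, Tuple

ALPHA = 0.4  # back-off factor; 0.4 is the value used by Brants et al.


def count_ngrams(tokens: List[str], max_n: int) -> Dict[int, Counter]:
    # Count all n-grams of order 1..max_n, keyed by order.
    return {
        n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for n in range(1, max_n + 1)
    }


def stupid_backoff(word: str, context: Tuple[str, ...],
                   counts: Dict[int, Counter], alpha: float = ALPHA) -> float:
    """Score S(word | context); a relative score, not a probability."""
    max_n = max(counts)
    # Keep at most the last max_n - 1 context words.
    context = context[-(max_n - 1):] if max_n > 1 else ()
    penalty = 1.0
    while context:
        ngram = context + (word,)
        hit = counts[len(ngram)][ngram]
        if hit > 0:
            # Seen n-gram: relative frequency, scaled by accumulated penalties.
            return penalty * hit / counts[len(context)][context]
        # Unseen n-gram: drop the oldest context word and apply the penalty.
        context = context[1:]
        penalty *= alpha
    total = sum(counts[1].values())
    return penalty * counts[1][(word,)] / total if total else 0.0


def predict_next(context: Tuple[str, ...], counts: Dict[int, Counter],
                 k: int = 3) -> List[str]:
    # Rank every vocabulary word by its stupid backoff score (ties keep
    # first-seen order because Python's sort is stable).
    vocab = [w for (w,) in counts[1]]
    ranked = sorted(vocab, key=lambda w: stupid_backoff(w, context, counts),
                    reverse=True)
    return ranked[:k]


def top_k_accuracy(test_tokens: List[str], counts: Dict[int, Counter],
                   n: int = 3, k: int = 3) -> float:
    # Fraction of held-out positions whose true next word is in the top k.
    hits = trials = 0
    for i in range(n - 1, len(test_tokens)):
        context = tuple(test_tokens[i - n + 1:i])
        if test_tokens[i] in predict_next(context, counts, k):
            hits += 1
        trials += 1
    return hits / trials if trials else 0.0


if __name__ == "__main__":
    train = "the cat sat on the mat the cat ran".split()
    counts = count_ngrams(train, max_n=3)
    print(predict_next(("the", "cat"), counts))  # ['sat', 'ran', 'the']
    print(top_k_accuracy(["the", "cat", "ran"], counts, k=1))  # 0.0
```

Because no probability mass is redistributed to unseen n-grams, this scheme is cheap to compute. That cheapness is the trade-off against methods such as Kneser-Ney smoothing mentioned in item 2.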

Acknowledgements.

I have learned a great deal from this capstone project and the JHU Data Science Specialization.

  • Many thanks to the great professors of the Data Science Specialization: Jeff Leek, Roger D. Peng and Brian Caffo;
  • Many thanks to the mentors in the Data Science Specialization forums;
  • Many thanks to the students who reviewed and commented on each of my assignments.