Slide 3: App data arrangement
Because the training data is large, we did a few things to optimize:
- Indexed dictionary: each word is encoded as a number (i.e., its index in the dictionary)
- Put ngramCountTable into files: we saved each n-gram count into its own file, using the n-gram index as the file name
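
The two optimizations above can be sketched roughly as follows. This is only an illustration, not the app's actual code: the function names (build_index, count_ngrams, save_counts, load_count), the toy sentence, and the file layout are all assumptions. In particular, it assumes the "n-gram index" is the word indexes joined by "_" and that each file holds a single count.

```python
import tempfile
from collections import Counter
from pathlib import Path


def build_index(tokens):
    """Map each distinct word to an integer index, in order of first appearance."""
    index = {}
    for word in tokens:
        if word not in index:
            index[word] = len(index)
    return index


def count_ngrams(tokens, index, n):
    """Count n-grams, keyed by tuples of word indexes rather than strings."""
    ids = [index[word] for word in tokens]
    return Counter(tuple(ids[i:i + n]) for i in range(len(ids) - n + 1))


def save_counts(counts, directory):
    """Write one file per n-gram (assumed layout: indexes joined by '_')."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    for ngram, count in counts.items():
        name = "_".join(str(i) for i in ngram)
        (directory / name).write_text(str(count))


def load_count(ngram, directory):
    """Read one n-gram count from disk; a missing file means a count of 0."""
    path = Path(directory) / "_".join(str(i) for i in ngram)
    return int(path.read_text()) if path.exists() else 0


# Toy corpus standing in for the real training data.
tokens = "the cat sat on the mat the cat ran".split()
index = build_index(tokens)
bigrams = count_ngrams(tokens, index, 2)

out_dir = Path(tempfile.mkdtemp()) / "bigrams"
save_counts(bigrams, out_dir)

# Look up a single count without loading the whole table into memory.
the_cat = load_count((index["the"], index["cat"]), out_dir)
```

The point of this layout is that a lookup only touches one small file, so the full count table never has to sit in memory at once.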