# Folder holding the en_US corpus files
data_dir <- "E:/Nada/Others/Courses/Data Science Specialization/Ex/Capstone/final/en_US"
# Read each file, marking text as UTF-8; skipNul drops embedded nul bytes, warn = FALSE silences missing-final-EOL/nul warnings
news    <- readLines(file.path(data_dir, "en_US.news.txt"),    encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
blogs   <- readLines(file.path(data_dir, "en_US.blogs.txt"),   encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
twitter <- readLines(file.path(data_dir, "en_US.twitter.txt"), encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
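The news row in the summary table below shows far fewer lines than blogs or twitter. One frequently reported cause with these Capstone files is that, on Windows, a connection opened in text mode can stop reading at an embedded Ctrl-Z (SUB, 0x1A) byte. A minimal sketch, assuming that is the issue here, reads through a binary-mode connection instead; `read_lines_binary()` is a hypothetical helper name, not part of the original report.

```r
# Sketch: read a text file through a binary-mode ("rb") connection.
# Assumption: text-mode reads on Windows may treat an embedded Ctrl-Z
# (SUB, 0x1A) byte as end-of-file; binary mode reads past it.
read_lines_binary <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))  # close the connection even if readLines() fails
  readLines(con, encoding = "UTF-8", skipNul = TRUE, warn = FALSE)
}

# Small self-check: a temporary file with a SUB byte in the middle line
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("first line\nsecond \x1a line\nthird line\n"), tmp)
demo_lines <- read_lines_binary(tmp)
length(demo_lines)  # 3: the line containing 0x1A does not end the read
```

To try it on the corpus, the `readLines()` call for the news file would be swapped for `read_lines_binary()` on the same path, and the summary re-run to see whether the news counts change.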
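This part calculates, for each of the three files, the number of lines, characters and words, together with the minimum, mean and maximum number of words per line (WPL_Min, WPL_Mean and WPL_Max in the table).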
##   Dataset   Lines     Chars    Words WPL_Min WPL_Mean WPL_Max
## 1    news   77259  15639408  2651432       1 34.61779    1123
## 2   blogs  899288 206824382 37570839       0 41.75107    6726
## 3 twitter 2360148 162096241 30451170       1 12.75065      47
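The code behind this table does not appear in the report. A minimal base-R sketch of how such a summary could be computed is below; `summarise_text()` is a hypothetical helper, and it assumes a word is any run of non-whitespace characters, which may not match the rule used to produce the table.

```r
# Sketch: per-file summary of lines, characters and words per line (WPL).
# Assumption: a "word" is any run of non-whitespace characters; the report
# does not state its own counting rule, so figures may differ.
summarise_text <- function(name, lines) {
  # gregexpr() returns -1 for a line with no match, so count positive positions
  wpl <- vapply(gregexpr("\\S+", lines), function(m) sum(m > 0), integer(1))
  data.frame(
    Dataset  = name,
    Lines    = length(lines),
    Chars    = sum(nchar(lines)),
    Words    = sum(wpl),
    WPL_Min  = min(wpl),
    WPL_Mean = mean(wpl),
    WPL_Max  = max(wpl)
  )
}

# Toy input: a four-word line, an empty line and a two-word line
demo_stats <- summarise_text("demo", c("the quick brown fox", "", "hello   world "))
print(demo_stats)
```

For the corpus itself, `rbind(summarise_text("news", news), summarise_text("blogs", blogs), summarise_text("twitter", twitter))` would give one row per file in the same layout as the table. Under this rule an empty line counts as 0 words, which is one possible explanation for the blogs minimum of 0.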