Mukesh Saxena
1-Aug-2018
A MOOC (massive open online course) is an online education system offering courses that aim at large-scale interactive participation and open access via the web. MOOCs aim to deliver education online through features such as videos, study materials, quizzes, and online exams, and try to make it more effective than classroom teaching by removing time and location constraints. MOOCs also provide interactive discussion forums that help build a community of students and professors.
Coursera is an online learning platform founded by Stanford professors Andrew Ng and Daphne Koller that offers courses, specializations, and degrees. Open edX is an open-source platform for building MOOCs with various advanced features to make online education more effective. Standard features provided by Coursera include:
-> Peer Grading.
-> Self Grading.
-> Staff Grading.
Data Science is a rapidly accelerating field that combines expertise in the management, analysis and visualization of large-scale and complex data. The Johns Hopkins Bloomberg School of Public Health is expanding its open education offerings in this area with a structured and comprehensive Data Science Specialization offered through Coursera, a leading provider of Massive Open Online Courses (MOOCs).
What makes the specialization unique is that it covers the complete set of skills for data science from soup to nuts.
This course was an introduction to the tools and ideas that we would see throughout the rest of the Data Science Specialization. The course track was focused on providing us with two things:
1) An introduction to the key ideas behind working with data in a scientific way that will produce new and reproducible insight.
2) An introduction to the tools that will allow us to execute on a data analytic strategy, from raw data in a database to a completed report with interactive graphics.
This course was primarily focused on getting set up with the appropriate tools and accounts we would need for the rest of the specialization, and on giving us a solid grounding in the key conceptual ideas. We also installed RStudio, set up a GitHub account, and were introduced to the discussion forums and how to find help there.
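As a minimal illustration (my own, not from the course materials), the kind of commands one might run to verify this setup could look like the following; the user name shown is a placeholder:

    R.version.string                    # confirm the R installation works
    install.packages("devtools")        # a package commonly needed later in the track
    system("git --version")             # confirm Git is installed for GitHub use
    system('git config --global user.name "Your Name"')  # placeholder identity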
This is the second course in the Data Science Specialization, and it focuses on the nuts and bolts of using R as a programming language. The course started with an overview and history of R, followed by data types, objects, vectors, lists, data frames, matrices, factors, dealing with missing values, reading tables, and subsetting.
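A few illustrative snippets (my own examples, not course code) of the topics listed above:

    x <- c(1, 2, NA, 4)                   # a numeric vector with a missing value
    mean(x, na.rm = TRUE)                 # dealing with missing values
    l <- list(id = 1L, name = "R")        # a list can mix data types
    m <- matrix(1:6, nrow = 2)            # a 2 x 3 matrix
    f <- factor(c("low", "high", "low"))  # a factor for categorical data
    df <- data.frame(id = 1:3, score = c(10, NA, 8))
    df[!is.na(df$score), ]                # subsetting: keep complete rows
    # df2 <- read.table("scores.txt", header = TRUE)  # reading a table (hypothetical file)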
This course focused on preparing us to collect and clean data for downstream analysis and sharing. One of the major components of a data scientist's job is to collect and clean data: whether at a small organization or a major enterprise, the first step in using data is getting, cleaning, and understanding it. The course covered R packages and a few outside tools for collecting data from a variety of sources, from Excel files to databases like MySQL. We were also taught about a variety of formats, including JSON, XML, and flat files (.csv, .txt).
The course also covered reading data from the web and APIs, the dplyr library, and managing and merging different data sets. The emphasis throughout was on creating tidy data sets that can be used in downstream analyses.
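A hedged sketch of that collect-and-tidy workflow, using dplyr and the jsonlite package (the file and column names here are hypothetical):

    library(dplyr)
    library(jsonlite)

    raw  <- read.csv("survey.csv", stringsAsFactors = FALSE)  # a flat file (hypothetical)
    meta <- fromJSON("metadata.json")                         # a JSON source (hypothetical)

    tidy <- raw %>%
      filter(!is.na(score)) %>%            # drop incomplete records
      mutate(score = as.numeric(score)) %>%
      left_join(meta, by = "id") %>%       # merge the two data sets
      arrange(desc(score))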
This is the fourth course in the Data Science Specialization. Exploratory data analysis (EDA) is a key element of data science because it allows you to develop a rough idea of what your data look like and what kinds of questions might be answered by them. EDA is often the “fun part” of data analysis, where you get to play around with the data and, well, explore!
The course emphasized the different operations that can be performed on data, such as statistical summaries, plotting, using graphics devices, clustering, and dimension reduction.
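An illustrative EDA session (my own example) using R's built-in iris data:

    data(iris)
    summary(iris)                               # basic statistical summaries

    png("pairs.png")                            # open a file graphics device
    pairs(iris[, 1:4], col = iris$Species)      # exploratory scatterplot matrix
    dev.off()

    hc <- hclust(dist(scale(iris[, 1:4])))      # hierarchical clustering
    plot(hc)

    pc <- prcomp(iris[, 1:4], scale. = TRUE)    # dimension reduction (PCA)
    plot(pc$x[, 1:2], col = iris$Species)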
In this course we learned about the ideas of reproducible research and reporting of statistical analyses. Topics covered included literate programming tools, evidence-based data analysis, and organizing data analyses. We also learned to write a document using R Markdown, integrate live R code into a literate statistical program, compile R Markdown documents using knitr and related tools, publish reproducible documents to the web (RPubs), and organize a data analysis so that it is reproducible and accessible to others.
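As a small illustration (my own, not a course assignment), a minimal R Markdown document that knitr can compile might look like this:

    ---
    title: "A Reproducible Report"
    output: html_document
    ---

    The result below is reproducible because the code and seed are embedded:

    ```{r}
    set.seed(42)        # fixing the seed makes the numbers reproducible
    mean(rnorm(100))
    ```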
This course presents the fundamentals of statistical inference that you will need throughout the rest of the Data Science track.
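For instance (my own example, not course material), a confidence interval and one-sample t-test of the kind the course covers:

    set.seed(1)
    x <- rnorm(50, mean = 5, sd = 2)   # a simulated sample
    t.test(x, mu = 4.5)                # one-sample t-test with a 95% confidence interval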
This course presents the fundamentals of regression modeling that you will need for the rest of the specialization and ultimately for your work in the field of data science. Linear models, as their name implies, relate an outcome to a set of predictors of interest using linear assumptions. Regression models, a subset of linear models, are the most important statistical analysis tool in a data scientist's toolkit. This course covers regression analysis, least squares, and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA, were also covered, and analysis of residuals and variability was investigated.
The course also covered modern thinking on model selection and novel uses of regression models, including scatterplot smoothing. Regression Models represents both a fundamental and foundational component of the series, and it presents the single most practical data analysis toolset, using only a bare minimum of mathematics.
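A minimal regression sketch (my own, using R's built-in mtcars data) of the techniques listed above:

    fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)  # linear model with a factor term
    summary(fit)          # least-squares estimates and inference
    confint(fit)          # confidence intervals for the coefficients
    plot(fit, which = 1)  # residuals vs fitted values
    anova(fit)            # ANOVA table for the model terms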
This course focuses on developing the tools and techniques for understanding, building, and testing prediction functions.
These tools are at the center of the Data Science revolution. Many researchers, companies, and governmental organizations would like to use the cheap and abundant data they are collecting to predict what customers will like, what services to offer, or how to improve people's lives.
One of the most common tasks performed by data scientists and data analysts is prediction and machine learning. This course covered the basic components of building and applying prediction functions, with an emphasis on practical applications, and provided a basic grounding in concepts such as training and test sets, overfitting, and error rates.
The course also introduced a range of model-based and algorithmic machine learning methods, including regression, classification trees, Naive Bayes, and random forests, and covered the complete process of building prediction functions: data collection, feature creation, algorithms, and evaluation.
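A hedged sketch (my own example) of a train/test prediction workflow using the caret package, with a random forest as the method:

    library(caret)       # the randomForest package must also be installed
    set.seed(123)
    inTrain  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
    training <- iris[inTrain, ]      # training set
    testing  <- iris[-inTrain, ]     # held-out test set

    fit  <- train(Species ~ ., data = training, method = "rf")  # random forest
    pred <- predict(fit, testing)
    confusionMatrix(pred, testing$Species)    # accuracy and error rates on the test set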
A data product is the production output of a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data-informed model, algorithm, or inference. This course covered the basics of creating data products using Shiny, R packages, and interactive graphics, focusing on the fundamentals of creating a data product that can be used to tell a story about data to a mass audience.
In this class we learned a variety of core tools for creating data products in R and RStudio.
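A minimal Shiny sketch (my own) of the sort of interactive data product the course builds toward:

    library(shiny)

    ui <- fluidPage(
      sliderInput("n", "Sample size:", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )

    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n), main = "A reactive histogram")  # redraws as the slider moves
      })
    }

    shinyApp(ui, server)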
The objective of this Capstone Project is to produce a predictive text algorithm, written in R, that suggests the 8 most likely next words based on a user's text input.
As the user types, the entered characters will be compared against a word list; the predicted word will be the one with the highest probability of following the previous word or multi-word phrase.
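The project's actual model is still being built; the toy sketch below (my own, ignoring sentence boundaries for simplicity) shows the underlying bigram idea of ranking candidate next words by observed frequency:

    corpus <- c("the cat sat on the mat", "the cat ran")     # toy corpus
    words  <- unlist(strsplit(tolower(corpus), "\\s+"))

    bigrams <- paste(head(words, -1), tail(words, -1))       # consecutive word pairs
    counts  <- sort(table(bigrams), decreasing = TRUE)       # pair frequencies

    predict_next <- function(w, k = 8) {
      cand <- counts[startsWith(names(counts), paste0(w, " "))]
      head(sub("^\\S+\\s+", "", names(cand)), k)             # top-k most likely next words
    }

    predict_next("the")   # e.g. "cat", "mat"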
At the current project stage, the dataset has been downloaded from Coursera and SwiftKey. Some initial exploratory data analysis has been performed, along with some data preparation, in order to proceed with the predictive modeling and construction of the end-user application.
The next objective is to find the optimal sample size from the dataset required to build a corpus on which to train the prediction algorithm.