My Courses in Coursera

My journey into R

Edmund Julian Ofilada
Data Scientist

Press the arrow keys to move in the direction you want to go
press 'f' to enable fullscreen mode or
press 'o' for overview mode
press 'w' to make the screen narrow or wide
press 'g' to jump to a particular slide
press '?' to search for a particular word
Pressing Esc exits all of these modes.

This presentation was created using R in RStudio

In 2012 I decided to write a paper describing the results of POHJD, Project Oral Health for Juvenile Diabetics. My initial foray into research led me to the wonderful world of R programming and the community of Coursera.

Johns Hopkins Data Science Specialization in Coursera

Data Science is a rapidly accelerating field that combines expertise in the management, analysis and visualization of large-scale and complex data. The Johns Hopkins Bloomberg School of Public Health offers a structured and comprehensive Data Science Specialization through Coursera, a leading provider of Massive Open Online Courses (MOOCs).

The Data Science Specialization introduced me to R, beginning with the Data Scientist's Toolbox course and culminating in the 10th and final course, the capstone project. The capstone project was designed to test the learner's acquired knowledge and skills from the previous courses.

Course certificates may be verified by clicking on the course names

The Data Scientist's Toolbox

The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio. Grade Achieved: 96.0%

R Programming

The course covers practical issues in statistical computing which includes programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples. Grade Achieved: 100.0%

Getting and Cleaning Data

The course covers obtaining data from the web, from APIs, from databases and from colleagues in various formats. It also covers the basics of data cleaning and how to make data "tidy". Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data. Grade Achieved: 100.0%

Exploratory Data Analysis

This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Grade Achieved: 98.0%

Plotting Air Pollution Data

Reproducible Research

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. Grade Achieved: 100.0%

Demonstrating the effects of different types of imputation
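
The idea behind this figure can be illustrated with a minimal sketch. It is written in Python rather than the R used for the original slide, the data are hypothetical, and the two strategies shown (filling gaps with the observed mean or the observed median) are assumptions chosen for illustration, not necessarily the methods used in the course project.

```python
import statistics

# Hypothetical measurements; None marks a missing value.
values = [12.0, None, 7.5, 9.0, None, 15.5, 8.0, 11.0]

observed = [v for v in values if v is not None]
mean_obs = statistics.fmean(observed)
median_obs = statistics.median(observed)


def impute(data, fill):
    """Replace every missing entry with a single fill value."""
    return [fill if v is None else v for v in data]


mean_imputed = impute(values, mean_obs)
median_imputed = impute(values, median_obs)

# Compare the centre and spread of each version of the data.
for label, series in [
    ("observed only", observed),
    ("mean imputed", mean_imputed),
    ("median imputed", median_imputed),
]:
    print(f"{label:<15} mean = {statistics.fmean(series):.2f}  "
          f"sd = {statistics.stdev(series):.2f}")
```

Mean imputation leaves the mean unchanged but shrinks the standard deviation, because the filled-in values add no spread. Median imputation shifts the mean slightly whenever the observed data are skewed.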

Statistical Inference

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference including statistical modeling, data-oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, ...) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. Grade Achieved: 100.0%

Demonstrating the Central Limit Theorem through Simulation
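
A simulation of this kind can be sketched as follows. This is a minimal Python illustration, not the R code behind the original slide; the exponential distribution, the rate of 0.2, the sample size of 40 and the 1000 repetitions are all assumed settings.

```python
import random
import statistics

# Assumed settings for illustration only (not taken from the slide).
RATE = 0.2
SAMPLE_SIZE = 40
N_SIMULATIONS = 1000


def simulate_sample_means(rate, sample_size, n_simulations, seed=1):
    """Return n_simulations means, each of sample_size exponential draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
        for _ in range(n_simulations)
    ]


sample_means = simulate_sample_means(RATE, SAMPLE_SIZE, N_SIMULATIONS)

# Theory: the exponential distribution has mean 1/rate and sd 1/rate, so the
# sample mean has expectation 1/rate and standard error (1/rate)/sqrt(n).
theoretical_mean = 1 / RATE
theoretical_se = (1 / RATE) / SAMPLE_SIZE ** 0.5

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("theoretical mean:    ", theoretical_mean)
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3))
print("theoretical se:      ", round(theoretical_se, 3))
```

Although individual exponential draws are strongly skewed, the simulated means cluster around the theoretical mean with a spread close to the theoretical standard error, which is what the Central Limit Theorem predicts.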

Regression Models

This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA are covered as well. Analysis of residuals and variability will be investigated. The course also covers modern thinking on model selection and novel uses of regression models including scatterplot smoothing. Grade Achieved: 100.0%


Linear Regression

Practical Machine Learning

This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation. Grade Achieved: 94.0%

Classification and Regression Tree Models

Developing Data Products

A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference. The course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course focusses on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience. Grade Achieved: 100.0%


Data Science Capstone

The capstone project class allows students to create a usable/public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners. Grade Achieved: 96.5%

Plotting Word Clouds - Natural Language Processing

Statistics with R Specialization - Duke University

To reinforce and expand on what I learned previously, I took another statistics specialization (5 courses) with a whole course devoted to Bayesian Statistics:

Introduction to Probability and Data
Inferential Statistics
Linear Regression and Modeling
Bayesian Statistics
Capstone Project

Course projects I submitted as a requirement for this specialization may be viewed in this repository

Introduction to Probability and Data

The course introduces the learner to sampling and exploring data, as well as basic probability theory and Bayes' rule. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. Grade Achieved: 99.4%

Prevalence of eye complications across age groups among diabetic subjects. Data from the 2013 Behavioral Risk Factor Surveillance System (BRFSS)

Inferential Statistics

The course covers commonly used statistical inference methods for numerical and categorical data. The student will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of his analysis using R and RStudio (free statistical software). Grade Achieved: 98.8%

Is confidence in educational institutions associated with opinions regarding Government spending for education? Data is a subset from the General Social Survey (GSS) Cumulative File 1972-2012

Linear Regression and Modeling

The course introduces simple and multiple linear regression models. These models allow the learner to assess the relationship between variables in a data set and a continuous response variable. Grade Achieved: 99.4%

Residual diagnostics - Predicting movie popularity using a dataset from Rotten Tomatoes and the Internet Movie Database

Bayesian Statistics

The course applies Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing in R (free statistical software) the final posterior distribution. Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. Grade Achieved: 97.5%

Choosing a model by the Bayesian Information Criterion (BIC)
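
Model choice by BIC can be sketched in a few lines. The example below is a minimal Python illustration under stated assumptions, not the R analysis behind the original slide: the data are simulated, only two candidate models are compared, and the BIC is computed for a least-squares fit with Gaussian errors, up to an additive constant, as n ln(RSS/n) + k ln(n).

```python
import math
import random


def bic_gaussian(rss, n_obs, n_params):
    """BIC for a least-squares fit with Gaussian errors, up to a constant."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def fit_intercept_only(y):
    """Residual sum of squares when every prediction is the mean of y."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def fit_simple_regression(x, y):
    """Residual sum of squares for the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: y depends linearly on x plus Gaussian noise.
rng = random.Random(42)
x = [i / 10 for i in range(100)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

n = len(y)
bic_null = bic_gaussian(fit_intercept_only(y), n, n_params=1)
bic_line = bic_gaussian(fit_simple_regression(x, y), n, n_params=2)

# Lower BIC is preferred; the extra slope must reduce the RSS enough to pay
# the ln(n) penalty for one more parameter.
best = "line" if bic_line < bic_null else "intercept only"
print(f"BIC intercept only: {bic_null:.1f}")
print(f"BIC with slope:     {bic_line:.1f}")
print("preferred model:", best)
```

Because both models share the same error variance parameter, leaving it out of the parameter count does not change which model has the lower BIC.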

Statistics with R Capstone

For the capstone project, a large and complex dataset was provided to the learners. The analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling as well as interpretations of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss in context of the data how these two approaches are similar and different. Grade Achieved: 92.5%

Predicting House Prices in Ames, Iowa. Data from the Ames Assessor's Office from 2006 to 2010

Mastering Software Development in R Specialization - Johns Hopkins University

To strengthen my R programming skills, I took this specialization on software development in R. The courses described in the slides that follow are:

The R Programming Environment
Advanced R Programming
Building R Packages
Building Data Visualization Tools

The R Programming Environment

This course provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Grade Achieved: 100.0%

Exploring Zika cases in Brazil - Data from the Center for Disease Control Epidemic Prediction Initiative

Advanced R Programming

This course covers advanced topics in R programming that are necessary for developing powerful, robust, and reusable data science tools. Topics covered include functional programming in R, robust error handling, object oriented programming, profiling and benchmarking, debugging, and proper design of functions. Grade Achieved: 97.5%

Creating and profiling functions

Building R Packages

Code must be organized and distributed in a manner that adheres to community-based standards and provides a good user experience. The course covers R package development, writing good documentation and vignettes, writing robust software, cross-platform development, continuous integration tools, and distributing packages via CRAN and GitHub. Grade Achieved: 97.5%

Building Data Visualization Tools

Visualization remains one of the most powerful ways to draw conclusions from data, but the influx of new data types requires the development of new visualization techniques and building blocks. This course provides the skills for creating new visualization building blocks using the ggplot2 framework and describes how to use and extend the system to suit the specific needs of organizations or teams. Grade Achieved: 97.0%

Plotting Wind Speed and Radius of Typhoon Haiyan

Other courses from Coursera

The Unix Workbench

The goal of this course is to acquaint learners with Unix-like operating systems like macOS and Linux distributions like Ubuntu. Grade Achieved: 93.7%

Version Control with Git

The course builds a strong foundational knowledge of the Git version control system. Developers and members of IT (Information Technology) are the primary audience. Grade Achieved: 90.3%

Interprofessional Healthcare Informatics

Interprofessional Healthcare Informatics is a graduate-level, hands-on interactive exploration of real informatics tools and techniques offered by the University of Minnesota and the University of Minnesota's National Center for Interprofessional Practice and Education. It incorporates technology-enabled educational innovations to bring the subject matter to life and create a vital online learning community and healthcare informatics network. Grade Achieved: 95.6%

Systems Thinking in Public Health

This course provides an introduction to systems thinking and systems models in public health. Problems in public health and health policy tend to be complex, with many actors, institutions and risk factors involved. If an outcome depends on many interacting and adaptive parts and actors, the outcome cannot be analyzed or predicted with traditional statistical methods. Systems thinking is a core skill in public health and helps health policymakers build programs and policies that are aware of and prepared for unintended consequences. Grade Achieved: 90.0%

Introduction to Systems Thinking and Understanding Complex Adaptive Systems

Introduction to Systematic Review and Meta-Analysis

The course introduces learners to methods for performing systematic reviews and meta-analyses of clinical trials. We will cover how to formulate an answerable research question, define inclusion and exclusion criteria, search for the evidence, extract data, assess the risk of bias in clinical trials, and perform a meta-analysis. Grade Achieved: 89.9%


Results from 13 studies examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis.

Design and Interpretation of Clinical Trials

The course will explain the basic principles for design of randomized clinical trials and how they should be reported. The first part of the course will introduce students to terminology used in clinical trials and the several common designs used for clinical trials, such as parallel and cross-over designs. The course will also explain some of the mechanics of clinical trials, like randomization and blinding of treatment. Grade Achieved: 100.0%

Plotting Treatment Outcomes

Statistical Reasoning for Public Health 1: Estimation, Inference, & Interpretation

A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics. Grade Achieved: 96.5%

Creating and interpreting Kaplan-Meier Survival Plots
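
The estimator behind such plots can be sketched briefly. The Python example below is a minimal illustration with hypothetical follow-up data, not the R code or the dataset behind the original slide. It computes the product-limit estimate, multiplying the running survival probability by (1 - deaths / at risk) at each observed event time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months); 0 marks a censored subject.
times = [2, 3, 3, 5, 6, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored subjects never lower the curve directly. They only shrink the risk set, so later events produce larger drops. A subject censored at the same time as an event is counted as still at risk at that time, which is the usual convention.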

Statistical Reasoning for Public Health 2: Regression Methods

A practical and example filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction. Grade Achieved: 75.0%

Predicting arm circumference among Nepalese children

Mathematical Biostatistics Boot Camp 1

This course presents the fundamental probability and statistical concepts used in elementary data analysis. Grade Achieved: 96.1%

Bayesian Statistics

Epidemiology: The Basic Science of Public Health

Often called "the cornerstone" of public health, epidemiology is the study of the distribution and determinants of diseases, health conditions, or events among populations and the application of that study to control health problems. By applying the concepts learned in this course to current public health problems and issues, students will understand the practice of epidemiology as it relates to real life and makes for a better appreciation of public health programs and policies. This course explores public health issues like cardiovascular and infectious diseases - both locally and globally - through the lens of epidemiology. Grade Achieved: 84.4%

Coursera Mentor Community and Training Course

The purpose of this Center is to train you to be the best Community Mentors you can be. You will learn Community Management best practices, how to respond to participants in the forums, and how to respond to common student questions. After you complete training, this Resource Center will become a Portal for you to contact members of the Coursera Community Team, engage with fellow Mentors from other courses, and continue to learn best practices for Community Mentoring. Grade Achieved: 98.8%

Coursera Graders Course

This grading training course is designed to help you become better at assessing and leaving feedback on peer assignments. For anyone wanting to become an official grader, completion of this course is mandatory. Grade Achieved: 100%

Coursera Beta Testing Course

My Portfolio

Projects that I submitted for the Data Science Specialization may be found at my Datascience repository

My capstone project and related submissions may be found at the following links:

- https://docofi.shinyapps.io/ShinyApp10/
- http://www.rpubs.com/DocOfi/170100
- http://www.rpubs.com/DocOfi/170079
- https://docofi.shinyapps.io/capstone/
- http://www.rpubs.com/DocOfi/194236

Repository for my Statistics with R Project

Plotting_Earthquake_Data_in_R

My Data Visualization Projects

Projects I created on my own initiative.