Edmund Julian Ofilada
Data Scientist
This presentation was created using R in RStudio
In 2012 I decided to write a paper describing the results of POHJD, Project Oral Health for Juvenile Diabetics. My initial foray into research led me to the wonderful world of R programming and the community of Coursera.
Data Science is a rapidly accelerating field that combines expertise in the management, analysis and visualization of large-scale and complex data. The Johns Hopkins Bloomberg School of Public Health offers a structured and comprehensive Data Science Specialization through Coursera, a leading provider of Massive Open Online Courses (MOOCs).
The Data Science Specialization introduced me to R, beginning with the Data Scientist's Toolbox course and culminating in the 10th and final course, the capstone project. The capstone project was designed to test the learners' acquired knowledge and skills from the previous courses.
Course certificates may be verified by clicking on the course names
The course gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to this course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools that will be used in the program like version control, markdown, git, GitHub, R, and RStudio. Grade Achieved: 96.0%
The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis provide working examples. Grade Achieved: 100.0%
The course covers obtaining data from the web, from APIs, from databases and from colleagues in various formats. It also covers the basics of data cleaning and how to make data "tidy". Tidy data dramatically speed downstream data analysis tasks. The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data. Grade Achieved: 100.0%
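As a minimal illustration of what "tidy" means here, the sketch below reshapes a small wide table (one column per visit) into a long one with one observation per row. It is written in Python for brevity, although the course itself works in R; the table, the column names, and the helper `to_tidy` are all hypothetical and not taken from the course material.

```python
# Hypothetical "wide" table: one row per patient, one column per visit.
wide_rows = [
    {"patient": "A", "visit_1": 5.1, "visit_2": 4.8},
    {"patient": "B", "visit_1": 6.0, "visit_2": 5.7},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every tidy row
            tidy.append({id_key: row[id_key], var_name: column, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, "patient", "visit", "score")
for tidy_row in tidy_rows:
    print(tidy_row)
```

Each tidy row now holds a single measurement together with the variables that identify it, which is the layout most downstream analysis tools expect.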
This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Grade Achieved: 98.0%

This course focuses on the concepts and tools behind reporting modern data analyses in a reproducible manner. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. Grade Achieved: 100.0%

Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies and explicit use of designs and randomization in analyses. Furthermore, there are broad theories (frequentist, Bayesian, likelihood, design-based, ...) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference. A practitioner can often be left in a debilitating maze of techniques, philosophies and nuance. This course presents the fundamentals of inference in a practical approach for getting things done. Grade Achieved: 100.0%

This course covers regression analysis, least squares and inference using regression models. Special cases of the regression model, ANOVA and ANCOVA are covered as well. Analysis of residuals and variability will be investigated. The course also covers modern thinking on model selection and novel uses of regression models including scatterplot smoothing. Grade Achieved: 100.0%
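To make the least squares idea concrete, here is a small sketch of simple linear regression with a residual check. It is written in Python for brevity, although the course itself uses R; the data points and the function name `fit_line` are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the ratio of the x-y cross-deviations to the x squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, residuals


# Hypothetical data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, residuals = fit_line(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"sum of residuals={sum(residuals):.6f}")
```

With an intercept in the model, the residuals sum to zero up to floating-point error, which is a quick sanity check on any least squares fit.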

Linear Regression
This course will cover the basic components of building and applying prediction functions with an emphasis on practical applications. The course will provide basic grounding in concepts such as training and test sets, overfitting, and error rates. The course will also introduce a range of model-based and algorithmic machine learning methods including regression, classification trees, Naive Bayes, and random forests. The course will cover the complete process of building prediction functions including data collection, feature creation, algorithms, and evaluation. Grade Achieved: 94.0%

A data product is the production output from a statistical analysis. Data products automate complex analysis tasks or use technology to expand the utility of a data informed model, algorithm or inference. The course covers the basics of creating data products using Shiny, R packages, and interactive graphics. The course focusses on the statistical fundamentals of creating a data product that can be used to tell a story about data to a mass audience. Grade Achieved: 100.0%

The capstone project class allows students to create a usable/public data product that can be used to show their skills to potential employers. Projects will be drawn from real-world problems and will be conducted with industry, government, and academic partners. Grade Achieved: 96.5%

To reinforce and expand on what I learned previously, I took another statistics specialization (5 courses) with a whole course devoted to Bayesian Statistics.
Course projects I submitted as a requirement for this specialization may be viewed in this repository.
The course introduces the learner to sampling and exploring data, as well as basic probability theory and Bayes' rule. A variety of exploratory data analysis techniques will be covered, including numeric summary statistics and basic data visualization. Grade Achieved: 99.4%
The course covers commonly used statistical inference methods for numerical and categorical data. The student will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of his analysis using R and RStudio (free statistical software). Grade Achieved: 98.8%

The course introduces simple and multiple linear regression models. These models allow the learner to assess the relationship between variables in a data set and a continuous response variable. Grade Achieved: 99.4%

The course applies Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing in R (free statistical software) the final posterior distribution. Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. Grade Achieved: 97.5%

For the capstone project, a large and complex dataset was provided to the learners. The analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling as well as interpretations of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss in context of the data how these two approaches are similar and different. Grade Achieved: 92.5%
This course provides a rigorous introduction to the R programming language, with a particular focus on using R for software development in a data science setting. Grade Achieved: 100.0%

This course covers advanced topics in R programming that are necessary for developing powerful, robust, and reusable data science tools. Topics covered include functional programming in R, robust error handling, object oriented programming, profiling and benchmarking, debugging, and proper design of functions. Grade Achieved: 97.5%

Code must be organized and distributed in a manner that adheres to community-based standards and provides a good user experience. The course covers R package development, writing good documentation and vignettes, writing robust software, cross-platform development, continuous integration tools, and distributing packages via CRAN and GitHub. Grade Achieved: 97.5%
Visualization remains one of the most powerful ways to draw conclusions from data, but the influx of new data types requires the development of new visualization techniques and building blocks. This course provides the skills for creating new visualization building blocks using the ggplot2 framework and describes how to use and extend the system to suit the specific needs of organizations or teams. Grade Achieved: 97.0%
The goal of this course is to acquaint learners with Unix-like operating systems like macOS and Linux distributions like Ubuntu. Grade Achieved: 93.7%
The course builds a strong foundational knowledge of the Git version control system. Developers and members of IT (Information Technology) are the primary audience. Grade Achieved: 90.3%
Interprofessional Healthcare Informatics is a graduate-level, hands-on interactive exploration of real informatics tools and techniques offered by the University of Minnesota and the University of Minnesota's National Center for Interprofessional Practice and Education. It incorporates technology-enabled educational innovations to bring the subject matter to life and create a vital online learning community and healthcare informatics network. Grade Achieved: 95.6%
This course provides an introduction to systems thinking and systems models in public health. Problems in public health and health policy tend to be complex with many actors, institutions and risk factors involved. If an outcome depends on many interacting and adaptive parts and actors the outcome cannot be analyzed or predicted with traditional statistical methods. Systems thinking is a core skill in public health and helps health policymakers build programs and policies that are aware of and prepared for unintended consequences. Grade Achieved: 90.0%

The course introduces learners to methods for performing systematic reviews and meta-analyses of clinical trials. We will cover how to formulate an answerable research question, define inclusion and exclusion criteria, search for the evidence, extract data, assess the risk of bias in clinical trials, and perform a meta-analysis. Grade Achieved: 89.9%
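The final step, pooling study results, can be sketched with a fixed-effect inverse-variance estimate, which is one common approach and not necessarily the one the course emphasizes (random-effects models are a frequent alternative). The sketch is in Python for brevity; the three log risk ratios and standard errors below are made-up numbers, not data from any real trial, and `pooled_fixed_effect` is a hypothetical helper name.

```python
import math


def pooled_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of study-level effect estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical study results on the log risk ratio scale (illustration only).
log_rr = [-0.5, -0.2, -0.8]
se = [0.2, 0.1, 0.4]

pooled, pooled_se = pooled_fixed_effect(log_rr, se)
lower = pooled - 1.96 * pooled_se  # approximate 95% confidence limits
upper = pooled + 1.96 * pooled_se
print(f"pooled log RR = {pooled:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"pooled RR = {math.exp(pooled):.3f}")
```

The pooled estimate sits closest to the study with the smallest standard error, which is the intended behavior of inverse-variance weighting.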

Results from 13 studies examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis.
The course will explain the basic principles for design of randomized clinical trials and how they should be reported. The first part of the course will introduce students to terminology used in clinical trials and the several common designs used for clinical trials, such as parallel and cross-over designs. The course will also explain some of the mechanics of clinical trials, like randomization and blinding of treatment. Grade Achieved: 100.0%

A conceptual and interpretive public health approach to some of the most commonly used methods from basic statistics. Grade Achieved: 96.5%

A practical and example-filled tour of simple and multiple regression techniques (linear, logistic, and Cox PH) for estimation, adjustment and prediction. Grade Achieved: 75.0%

This course presents the fundamental probability and statistical concepts used in elementary data analysis. Grade Achieved: 96.1%

Often called "the cornerstone" of public health, epidemiology is the study of the distribution and determinants of diseases, health conditions, or events among populations and the application of that study to control health problems. By applying the concepts learned in this course to current public health problems and issues, students will understand the practice of epidemiology as it relates to real life and gain a better appreciation of public health programs and policies. This course explores public health issues like cardiovascular and infectious diseases, both locally and globally, through the lens of epidemiology. Grade Achieved: 84.4%
The purpose of this Center is to train you to be the best Community Mentors you can be. You will learn Community Management best practices, how to respond to participants in the forums, and how to respond to common student questions. After you complete training, this Resource Center will become a Portal for you to contact members of the Coursera Community Team, engage with fellow Mentors from other courses, and continue to learn best practices for Community Mentoring. Grade Achieved: 98.8%
This grading training course is designed to help you become better at assessing and leaving feedback on peer assignments. For anyone wanting to become an official grader, completion of this course is mandatory. Grade Achieved: 100%
Projects that I submitted to the Data Science Specialization may be found at my Datascience repository.
My capstone project and related submissions may be found at the following links:
- https://docofi.shinyapps.io/ShinyApp10/
- http://www.rpubs.com/DocOfi/170100
- http://www.rpubs.com/DocOfi/170079
- https://docofi.shinyapps.io/capstone/
- http://www.rpubs.com/DocOfi/194236

Projects I created on my own initiative.