About the course

This course provides an overview of the fundamental concepts behind an effective data science project and introduces tools and techniques for data wrangling, statistical modelling, visualisation and reproducible reporting using R, a free and open-source language for data analysis. R provides a rich and flexible environment for working with data, especially data to be used for statistical modelling or graphics.

The R system has an extensive library of contributed packages that offer state-of-the-art capabilities. Many of the analyses they offer are not available in standard statistical software packages. R lets you escape the restrictive environments and sterile analyses of commonly used statistical software, and it makes experimentation and exploration easy, which improves data analysis. R also supports reporting modern data analyses in a reproducible manner, which makes an analysis more useful to others because the data and the code that actually produced it can be made available. As such, R has become the lingua franca of quantitative research.

The course starts by introducing the fundamentals of R: basic use of the R console through the RStudio IDE, inputting and importing data, record keeping and good practice for an R project workflow. It then moves on to basic statistical concepts and statistical modelling techniques. Statistical concepts that may seem complex in theory can often be communicated more effectively through visualisation. The formal abstractness of Statistics can thus be demystified by visualising its application context, which is why the course focuses on building appropriate visualisations for a given data analysis problem and on intelligent, reproducible reporting of data analyses using R Markdown. This course will benefit anyone who wants to discover effective and attractive ways to visually analyse and communicate data.

With the knowledge gained in this course, you will be ready to undertake your very first data analysis of your own.

Course Aims

  • To provide a framework for developing analytical skills for handling a range of data sets and the appropriate analytical methodologies.
  • To familiarise you with R/RStudio’s data handling facilities, which expand the range of Data Science problems that can be effectively analysed.
  • To provide the tools and technical skills to enable a range of statistical analyses to be undertaken.
  • To enable the intelligent, reproducible reporting of the results of a statistical analysis to target audiences with diverse levels of numerical and statistical understanding.
  • To provide a sufficient base for pursuing more complex statistical analyses.

Learning Outcomes

On successful completion of this course you will be able to:

  1. Use R for data manipulation and visualisation;
  2. Understand the conceptual issues involved in data analysis;
  3. Critically evaluate the modelling, estimation, validation cycle for data analysis;
  4. Effectively use R as an essential aid to analyse realistic data science problems.

Teaching and Learning Strategy

Essential data handling and statistical modelling techniques are introduced during the tutoring sessions, where the conceptual models are brought to life through their application in R. Students are then expected to use their own time to deepen their understanding of these models and to hone the data handling expertise they have acquired. Students are given the opportunity to test their knowledge, both conceptual and practical, on a weekly basis through interactive tutoring sessions.

Students are expected to participate fully in all of these delivery modes. In particular, they are expected to have attempted any pre-set work and to come fully prepared to discuss any problems encountered and to debate the ideas and any issues raised.

Indicative Syllabus

Week 1

RStudio IDE; R language; Data classification and summary statistics.

In this lesson you will learn how to use the RStudio IDE for R, from installation to customisation and file navigation. You will learn good habits and workflow practice for an R project. Once you are comfortable with the RStudio working environment, you will move on to mastering the fundamental concepts of the R language.

What you will learn:

• Basic use of R/RStudio console

• Good habits for workflow

• Inputting and importing different data types

• R environment: record keeping

• Data classification

• Descriptive summary statistics

• R graphics
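
The Week 1 topics can be sketched in a few lines of base R. This is only an illustrative sketch: it assumes the built-in `iris` data set as a stand-in for course data, and the names `csv_path` and `flowers` are hypothetical.

```r
# Write a built-in data set to a temporary CSV file so that the import
# step below has something to read (stand-in for a real data file).
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Import the file; text columns are converted to factors (attribute variables).
flowers <- read.csv(csv_path, stringsAsFactors = TRUE)

# Data classification: inspect the structure and the class of each column.
str(flowers)
sapply(flowers, class)

# Descriptive summary statistics for a measured and an attribute variable.
summary(flowers$Sepal.Length)
table(flowers$Species)

# Base R graphics: a histogram and a grouped boxplot.
hist(flowers$Sepal.Length, main = "Sepal length", xlab = "Length (cm)")
boxplot(Sepal.Length ~ Species, data = flowers, ylab = "Sepal length (cm)")
```

Reading a file written from a built-in data set keeps the example self-contained; in practice the path would point to your own data file.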

Week 2

Introduction to Data Analysis (DA) Methodology and Measured vs. Attribute (MvA) type bivariate DA

In this lesson you will learn the fundamental concepts of statistical modelling. You will start by exploring the data using appropriate plots and descriptive statistics, and then move on to the inferential statistics of parameter estimation and hypothesis testing. You will learn how to match data types with an appropriate statistical model, with the focus on the ‘Measured vs Attribute’ type of bivariate data analysis problem. With the knowledge from this lesson you will be able to conduct a basic ‘MvA’ type statistical analysis, and to interpret and report its outcomes in an appropriate manner.

What you will learn:

• Concept of statistical distribution

• Exploring different data types

• Common data-analysis methodology

• To investigate relationships between M and A variables

  • Two-tailed t-test

  • One-way ANOVA

• Statistical reporting
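
As a rough sketch of an ‘MvA’ analysis, the example below uses the built-in `PlantGrowth` data set (a measured variable `weight` and an attribute variable `group`). The choice of data set and the object names `two_groups`, `tt` and `fit` are assumptions made for illustration only.

```r
# Explore first: a boxplot and the group means of the measured variable.
boxplot(weight ~ group, data = PlantGrowth, ylab = "Dried weight")
aggregate(weight ~ group, data = PlantGrowth, FUN = mean)

# Two-tailed t-test: compare two of the three groups.
# droplevels() removes the unused third level of the factor.
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))
tt <- t.test(weight ~ group, data = two_groups)  # two-sided Welch test by default
print(tt)

# One-way ANOVA: compare the means of all three groups at once.
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)
```

The t-test handles an attribute with exactly two levels, while the one-way ANOVA extends the same question to three or more levels.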

Week 3

Regression Modelling: MvM type bivariate DA

You will learn how to describe relationships between two measured variables. You will depict these relationships graphically, in the form of summary statistics, and through simple linear regression models. Hence, in this lesson regression modelling is the key modelling construct that is developed. You will learn the importance of selecting an appropriate causal model depending upon particular circumstances. With the knowledge from this lesson you will be able to conduct simple regression analysis, interpret and report its outcomes in an appropriate manner.

What you will learn:

• to investigate relationships between M and M variables by:

  • fitting a linear model

  • estimating parameters

  • validating the model: the coefficient of determination \(R^2\)

  • interpretation of the parameters and reporting of the nature of the relationship

• about the conceptual issues involved in data analysis

• to critically evaluate the modelling, estimation, validation cycle for data analysis
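
The fit, estimate and validate steps listed above might look like the following in base R. This is a minimal sketch that assumes the built-in `cars` data set (stopping distance `dist` against `speed`); the names `fit` and `r2` are illustrative.

```r
# Depict the relationship between the two measured variables.
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")

# Fit a simple linear regression model: dist = b0 + b1 * speed + error.
fit <- lm(dist ~ speed, data = cars)
abline(fit)  # add the fitted line to the scatter plot

# Estimated parameters: the intercept and the slope.
coef(fit)

# Validate the model with the coefficient of determination R^2.
r2 <- summary(fit)$r.squared
r2

# For a simple linear regression, R^2 equals the squared correlation.
cor(cars$speed, cars$dist)^2
```

The slope is read as the estimated change in stopping distance for each additional unit of speed, which is the kind of interpretation the reporting step asks for.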

Week 4

Wrangling and Visualising Data

In this lesson you will learn some of the fundamental techniques for data exploration and transformation using the dplyr package. This tidyverse package helps make your exploration intuitive to write and easy to read. You will learn dplyr’s key verbs for data manipulation, which help you uncover and shape the information within the data so that it is easy to turn into informative plots. Using the grammar of graphics plotting concepts implemented in the ggplot2 package, you will be able to create meaningful exploratory plots. You will develop an understanding of how to think about the data transformations and summaries that lead to an informative visualisation.

What you will learn:

• dplyr’s key data manipulation verbs:

  • select,

  • mutate,

  • filter,

  • arrange and

  • summarise/summarize

• to aggregate data by groups

• to chain data manipulation operations using the pipe operator

• to specify ggplot2 building blocks and combine them to create a graphical display

• about the philosophy that guides ggplot2: grammatical elements (layers) and aesthetic mappings.
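
The verbs, grouping, piping and layered plotting listed above can be combined as in the sketch below. It assumes the dplyr and ggplot2 packages are installed and uses the built-in `mtcars` data set; the column `wt_kg` and the objects `summary_tbl` and `p` are hypothetical names introduced for illustration.

```r
library(dplyr)
library(ggplot2)

# Chain dplyr verbs with the pipe operator.
summary_tbl <- mtcars %>%
  select(mpg, cyl, wt) %>%                 # keep only the columns of interest
  mutate(wt_kg = wt * 453.592) %>%         # wt is recorded in units of 1000 lbs
  filter(mpg > 15) %>%                     # keep the more economical cars
  group_by(cyl) %>%                        # aggregate by number of cylinders
  summarise(mean_mpg = mean(mpg),
            mean_wt_kg = mean(wt_kg),
            n_cars = n()) %>%
  arrange(desc(mean_mpg))                  # sort by average fuel economy

summary_tbl

# Build a plot from ggplot2 blocks: data, aesthetic mappings and layers.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                                            # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: fitted lines
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Each `+` adds one layer or component to the plot, while the aesthetic mappings in `aes()` are shared by all layers unless a layer overrides them.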

Week 5

Reproducible reporting using R Markdown

In this lesson you will learn how to turn your analyses into high-quality documents and presentations with R Markdown. You will design reproducible reports by automating the reporting process. With the knowledge from this lesson you will be able to create reports straight from your R code, allowing you to document your analysis and its results as an HTML, PDF, slideshow or Microsoft Word document.

What you will learn:

• Authoring R Markdown Reports

• Embedding R Code

• knitr to compile dynamic R code

• \(\LaTeX\) to incorporate mathematical expressions

• Deploying the document
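
The sketch below shows, under some assumptions, how the pieces above fit together: it writes a small R Markdown file containing a YAML header, inline R code, a LaTeX expression and an R code chunk, and then renders it to HTML. The file name `report.Rmd`, the report title and the object names are hypothetical. Rendering is attempted only if the rmarkdown package and pandoc are available.

```r
# Build the chunk delimiter programmatically so that this example can itself
# be shown inside a fenced code block.
fence <- strrep("`", 3)

# The lines of a minimal R Markdown document.
rmd_lines <- c(
  "---",
  "title: \"Stopping distance report\"",
  "output: html_document",
  "---",
  "",
  "The cars data set has `r nrow(cars)` observations.",            # inline R code
  "",
  "The fitted model is $y = \\beta_0 + \\beta_1 x + \\varepsilon$.", # LaTeX maths
  "",
  paste0(fence, "{r scatter, echo=FALSE}"),                        # embedded R chunk
  "plot(dist ~ speed, data = cars)",
  fence
)

# Save the document to a temporary directory.
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Compile the report; knitr runs the R code and pandoc produces the HTML file.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document",
                                quiet = TRUE)
  print(out_file)
}
```

In RStudio the same document would normally be typed directly into an .Rmd file and compiled with the Knit button; writing it from R here only keeps the example self-contained.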