Question 1/A - Certification

The certificate of completion for the LinkedIn Learning course R for Data Science: Analysis and Visualization, taught by Professor Poulson, follows on the next page.

Certification Image

Question 1/B - Summary

Part B of Question 1 asks for a summary of the LinkedIn Learning course R for Data Science: Analysis and Visualization by Professor Poulson.

The R for Data Science course, offered on the LinkedIn Learning platform, is a comprehensive introduction to R and RStudio. It consists of short video sections detailing the R programming language and how it can be used for data exploration, visualization, and statistical modeling. Throughout the course, the instructor, Barton Poulson, uses RStudio to demonstrate how to get started with R and provides beginner files so that students can code alongside the videos. The course covers the RStudio environment as well as tools and packages that facilitate data analysis.

The course begins by discussing where R performs well, a useful framing given how prevalent Python and other programming languages are in computer science and mathematics. R is still the language of choice for statistical modeling, so it is an essential tool for anyone in mathematics or related fields. Section two introduces students to working in RStudio using basic commands, and following along with the videos allows students to begin learning the language immediately. Students are also introduced to the RStudio environment, including how to use headers, comments, and piping, which lets users chain commands so that the output of one step becomes the input of the next. Necessary packages, such as the tidyverse (which includes ggplot2), are discussed, as are sample datasets, importing data from spreadsheets, and creating databases in R.
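As a small illustration of piping, the sketch below computes the same value twice, once with nested function calls and once with a pipe. It assumes the dplyr package is installed (it supplies the %>% operator also used in Question 2); the choice of the hp column and the names nested and piped are only for illustration and are not taken from the course.

```r
library(dplyr)  # provides the %>% pipe (assumed to be installed)

# Nested form: must be read from the inside out
nested <- round(mean(sqrt(mtcars$hp)), 1)

# Piped form: each step passes its result to the next, read top to bottom
piped <- mtcars$hp %>%
  sqrt() %>%
  mean() %>%
  round(1)

# Both forms produce the same result
identical(nested, piped)
```

The piped version does the same work as the nested one; its advantage is that the order of the code matches the order of the operations.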

Section three covers data visualization in R and the importance of visualizing results for comprehending the information that has been collected. Students learn how to create bar charts, which suit categorical data; histograms, which show the distribution of continuous data; and box plots, which show the spread of the data and help to identify outliers. More advanced charts, such as scatterplots, line charts, and cluster charts, make it possible to look at multi-variable relationships and trends over time.
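The chart types described above can be sketched with the built-in mtcars dataset. This example uses base R graphics so that it runs without extra packages; that is a simplification on my part, since the course also discusses ggplot2, and the variables chosen here are only illustrative.

```r
# Bar chart for a categorical variable: number of cars per cylinder class
cyl_counts <- table(mtcars$cyl)
barplot(cyl_counts, xlab = "Cylinders", ylab = "Number of cars")

# Histogram for the distribution of a continuous variable
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")

# Box plot of mpg by cylinder class; points beyond the whiskers are
# potential outliers. boxplot() invisibly returns the plotted statistics.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon")

# Scatterplot for the relationship between two continuous variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```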

Section four addresses data wrangling. Grouping data or focusing on specific information is often helpful, particularly with large datasets. Professor Poulson discusses creating subgroups of data and the importance of excluding data when needed. The data may not be pristine, so recoding and computing new variables are often needed before analysis; for instance, character strings may need to be transformed into factors.
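A minimal sketch of these wrangling steps, assuming the dplyr package is installed. The specific choices (dropping six-cylinder cars, recoding am, and the names wrangled and wt_lbs) are hypothetical examples rather than steps taken from the course.

```r
library(dplyr)  # assumed to be installed

wrangled <- mtcars %>%
  # Exclude a subgroup: keep only the 4- and 8-cylinder cars
  filter(cyl != 6) %>%
  mutate(
    # Recode the 0/1 transmission code into a labeled factor
    am = factor(am, levels = c(0, 1), labels = c("automatic", "manual")),
    # Compute a new variable: wt is stored in thousands of pounds
    wt_lbs = wt * 1000
  ) %>%
  # Focus on the columns of interest
  select(mpg, cyl, am, wt_lbs)

head(wrangled)
```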

The course’s final section details what to do with the data once it has been input and cleaned for numerical analysis. Frequencies can be computed to summarize categorical variables, and descriptive statistics, including the median, mean, and standard deviation, can be computed quickly and efficiently for numeric variables. Contingency tables, correlations, and regression lines are used to analyze relationships between variables and to reveal trends in the dataset.
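The analyses named above can be sketched in base R with the mtcars dataset. The variables and object names below are my own choices for illustration; the course may use different data and functions.

```r
# Frequencies for a categorical variable
freq_cyl <- table(mtcars$cyl)

# Descriptive statistics for a numeric variable
mpg_stats <- c(
  mean   = mean(mtcars$mpg),
  median = median(mtcars$mpg),
  sd     = sd(mtcars$mpg)
)

# Contingency table: cylinder class by transmission type (0/1 code)
cyl_by_am <- table(cylinders = mtcars$cyl, transmission = mtcars$am)

# Correlation between two numeric variables
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)

# Simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope of the fitted line
```

A negative slope for wt would indicate that heavier cars tend to get fewer miles per gallon, which agrees with the grouped averages in Question 2.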

Overall, the course is concise and gives new R users valuable tools so that they can begin working in R immediately. Professor Poulson highlights everything needed to start, giving students a platform to explore R and RStudio.

References

Poulson, B. (2023, January 23). R for data science - R for Data Science: Analysis and Visualization. LinkedIn. https://www.linkedin.com/learning/r-for-data-science-analysis-and-visualization/r-for-data-science?u=222312266

Question 2 - “mtcars” dataset

Use the built-in “mtcars” dataset to find the average miles per gallon and weight when grouped by the number of cylinders.

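# dplyr provides %>%, group_by(), and summarise()
library(dplyr)

# Average mpg and weight (wt is in 1000 lbs) for each cylinder count
mtcars %>%
  group_by(cyl) %>%
  summarise(
    avg_mpg = mean(mpg),
    avg_wt = mean(wt)
  )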

The wt variable is recorded in thousands of pounds, so the averages below are multiplied by 1,000.

Cars with 4 cylinders on average get approximately 26.7 miles per gallon and weigh approximately 2,286 lbs.

Cars with 6 cylinders on average get approximately 19.7 miles per gallon and weigh approximately 3,117 lbs.

Cars with 8 cylinders on average get approximately 15.1 miles per gallon and weigh approximately 3,999 lbs.
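As a cross-check, the same grouped averages can be computed without dplyr. This is a sketch using base R's aggregate(); the object name by_cyl and the added wt_lbs column are my own, with wt_lbs converting the weight from thousands of pounds to pounds.

```r
# Mean mpg and wt for each cylinder count, using base R only
by_cyl <- aggregate(cbind(mpg, wt) ~ cyl, data = mtcars, FUN = mean)

# wt is stored in thousands of pounds; convert the average to pounds
by_cyl$wt_lbs <- by_cyl$wt * 1000

by_cyl
```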