This is where you give details about what you have been collecting and how much data you have; why you chose this data to collect; how you managed issues with the quality and frequency of collection; what you did to anonymise or de-identify the data; and how you dealt with the storage and sharing of data within the group.
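As an illustration of the de-identification step, here is a minimal sketch in base R. The data frame, its column names, and all values are invented for this example; your own data will look different. It replaces names with random participant codes and coarsens timestamps to dates. Treat this as pseudonymisation only: it does not guarantee that individuals cannot be re-identified from the remaining columns.

```r
# Hypothetical raw data: names and values are invented for illustration only.
raw <- data.frame(
  name      = c("Alex", "Alex", "Sam", "Jo"),
  timestamp = as.POSIXct(c("2022-03-01 07:42:10", "2022-03-02 22:15:03",
                           "2022-03-01 09:05:47", "2022-03-03 13:30:00"),
                         tz = "UTC"),
  steps     = c(8200, 10400, 5300, 7600)
)

# Build a lookup from each distinct name to a randomly assigned code (P01, P02, ...).
people <- unique(as.character(raw$name))
key <- setNames(sprintf("P%02d", sample(seq_along(people))), people)

# Keep only the code, a coarsened date, and the measurement; drop the name.
deid <- data.frame(
  participant = unname(key[as.character(raw$name)]),
  date        = as.Date(raw$timestamp, tz = "UTC"),
  steps       = raw$steps
)

print(deid)
```

If you keep the `key` lookup at all, store it separately from the shared data, and explain in your report who had access to it.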
Criterion 1: Strength of justification for the method to obtain data from multiple sources, for gaining insight into a chosen problem, including analysis of data quality issues in the individual and group data. Consider, do you tell the reader:
What are you exploring and why?
How might data help inform this issue? What ethical and privacy considerations have you made (you should consider these throughout the report)?
What have you collected, how, and what are the issues with this method?
You can include a simple table here to give an overview. But make sure you also explain the table, and connect what you have collected and the method to (1) the need or issues you’re addressing, and (2) issues such as privacy, ethics, data quality, potential of data science vs other approaches, etc.
What data cleaning, missing data, validation, etc. have you considered? What are the impacts of any issues in your data?
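One way to start answering the data quality questions above is to count missing values and flag implausible ones. The sketch below uses base R only; the `sleep_log` data frame, its columns, and the 0 to 24 hour validity range are assumptions made up for this example.

```r
# Hypothetical sleep log: all values are invented for illustration only.
sleep_log <- data.frame(
  participant = c("P01", "P01", "P02", "P02", "P03"),
  hours_sleep = c(7.5, NA, 6.0, 26, 8.2),
  mood        = c(4, 3, NA, 5, 2)
)

# Count missing values in each column.
missing_per_column <- colSums(is.na(sleep_log))

# Flag values outside a plausible range (assumed here: 0 to 24 hours).
out_of_range <- !is.na(sleep_log$hours_sleep) &
  (sleep_log$hours_sleep < 0 | sleep_log$hours_sleep > 24)

# Treat implausible values as missing rather than silently keeping them.
sleep_log$hours_sleep_clean <- ifelse(out_of_range, NA, sleep_log$hours_sleep)

# How many rows remain usable for an analysis that needs both variables?
complete_rows <- sum(complete.cases(sleep_log[, c("hours_sleep_clean", "mood")]))

print(missing_per_column)
print(sum(out_of_range))
print(complete_rows)
```

Whatever rule you choose, report how many observations it affected and discuss how that might bias your conclusions.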
Explain your investigation and data analysis.
You should include examples of all the datasets, including (1) your personal data, and (2) the pooled group unstructured and structured data. Draw on external sources of summary data as a comparison.
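A rough sketch of pooling individual datasets and comparing them with an external summary figure might look like the following. Everything here is hypothetical: the two per-person data frames, the participant codes, and the `external_benchmark` value are placeholders, so substitute your own data and a properly cited source.

```r
# Hypothetical per-person data: values are invented for illustration only.
p01_data <- data.frame(date  = as.Date(c("2022-03-01", "2022-03-02")),
                       steps = c(8200, 10400))
p02_data <- data.frame(date  = as.Date(c("2022-03-01", "2022-03-02")),
                       steps = c(5300, 7100))
individual <- list(P01 = p01_data, P02 = p02_data)

# Stack the individual data frames, adding a participant code to each row.
pooled <- do.call(
  rbind,
  Map(function(id, df) data.frame(participant = id, df),
      names(individual), individual)
)
rownames(pooled) <- NULL

# Mean daily steps per participant.
group_means <- aggregate(steps ~ participant, data = pooled, FUN = mean)

# Placeholder for a published summary figure; replace with a cited value.
external_benchmark <- 7500
group_means$diff_from_benchmark <- group_means$steps - external_benchmark

print(pooled)
print(group_means)
```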
Criterion 2: Insightfulness in the analysis of the obtained data, including quality issues, to draw conclusions in a professional and engaging manner. Consider, do you:
You should include your analysis here using R code, or by loading images and tables you’ve saved from other programs, like this. You’ll see I set a pixel width for the first one.
knitr::include_graphics(here::here("AT2_default_template/data/uts_logo_new.png"), dpi = NA)
Or like this:
You can include numbers and analysis manually (by typing).
You can also output analysis such as correlation coefficients in-line with code: 0.418684. That looked like this:
cor.test(as.numeric(mtcars$mpg),as.numeric(mtcars$qsec))$estimate[[1]]
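For reporting, you will usually want to round the coefficient and give some indication of uncertainty. The sketch below is one possible way to do that with the built-in `mtcars` data; the wording of the sentence and the choice of two decimal places are only suggestions.

```r
# Run the test once and reuse the result, rather than recomputing it inline.
ct <- cor.test(mtcars$mpg, mtcars$qsec)
r  <- unname(ct$estimate)

# Build a sentence with the rounded estimate and its 95% confidence interval.
sentence <- sprintf(
  "mpg and qsec are positively correlated (r = %.2f, 95%% CI %.2f to %.2f).",
  r, ct$conf.int[1], ct$conf.int[2]
)
print(sentence)
```

In the R Markdown source you could place `round(r, 2)` in an inline R expression, so the number in your text updates whenever the data changes.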
If you downloaded the full folder, then the csv files you need are there.
You must always cross-reference your figures; never include a visualisation that is not explained explicitly in the text. See Figure 2.1.
par(mar = c(4, 4, .2, .1))
plot(cars) # a scatterplot
Figure 2.1: The cars data.
# Packages this chunk relies on: tibble() from tibble, %>% from dplyr, cards() from bs4cards
library(tibble)
library(dplyr)
library(bs4cards)

viz <- tibble(
  long_name = c("ggplot2","ggiraph","RAWGraphs","Tableau"),
  short_name = c("ggplot2","ggiraph","RAWGraphs","Tableau"),
  blurb = c("graphics grammar for R","Build interactive htmlwidget ggplots","A non-code easy to use but very impressive free visualisation tool, highly recommend","A non-code visualisation tool, lots of functions with a bit of a learning curve. Free for students"),
  viz_url = c("https://ggplot2.tidyverse.org/","https://davidgohel.github.io/ggiraph/articles/offcran/using_ggiraph.html","https://rawgraphs.io/gallery","https://www.tableau.com/academic/students#form"),
  image_url = c("https://raw.githubusercontent.com/tidyverse/ggplot2/main/man/figures/logo.png","https://davidgohel.github.io/ggiraph/reference/figures/logo.png","https://raw.githubusercontent.com/densitydesign/raw/master/imgs/raw_header.jpg","https://upload.wikimedia.org/wikipedia/en/thumb/0/06/Tableau_logo.svg/250px-Tableau_logo.svg.png"),
  features = c("general-viz; R","general-viz; interactive; R", "general-viz; GUI","general-viz; GUI")
)
viz %>%
  cards(
    title = long_name,
    text = blurb,
    link = viz_url,
    image = image_url,
    tags = paste("all;", features),
    width = 4,
    footer = features,
    layout = "label-right"
  )
# Add cards here for anything else around EDA and visualisation. The demo below shows some ideas. Make sure to review the materials from weeks 4-6 in Canvas.
You can see in the code chunk below that we read in a csv, and then create a visualisation (or include an image file) and other statistical analyses. Here we’re just outputting Figure 2.2, but you’d do more analysis (and discuss it properly in a paragraph).
# readr_example() is namespaced so the chunk works without library(readr)
mt_data <- readr::read_csv(readr::readr_example("mtcars.csv"))
mt_data %>%
  ggplot(aes(x = mpg, y = qsec)) +
  geom_point()
Figure 2.2: Read mtcars data and basic scatterplot.
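As one example of the "more analysis" you might add after a scatterplot, the sketch below fits a simple linear model to the same two variables. It uses the built-in `mtcars` data frame, on the assumption that it matches the csv read above, and a linear fit is only one of several reasonable choices here.

```r
# Fit a straight line predicting quarter-mile time (qsec) from fuel economy (mpg).
fit <- lm(qsec ~ mpg, data = mtcars)

# Pull out the quantities you would normally discuss in the text.
slope     <- unname(coef(fit)["mpg"])
r_squared <- summary(fit)$r.squared

cat(sprintf("Slope: %.3f seconds per mpg\n", slope))
cat(sprintf("R-squared: %.3f\n", r_squared))
```

For a single predictor, R-squared is the square of the correlation coefficient, so a low value here is a prompt to discuss what else might explain the variation.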
Criterion 3: Insightfulness in identification, contextualisation and reflection on ethical, privacy, and legal issues relevant to the collection, analysis, and use of one’s own and others’ personal data. The AT2 is designed to give you experience in collecting and working with personal data. It can be confronting at times, and you should consider issues of privacy and ethics, with regard to both the specifics of what you did and the implications for data science, connecting to legal and ethical frameworks. You could imagine what would happen if a data science troll (malicious agent) gained access to your data. We want you to consider:
Think about: Is there relevant ethical or legal guidance from Australian law, professional organisations (medical, sales/marketing), non-disclosure and consent agreements, the terms and conditions that you sign when using various apps and websites, or ethical frameworks, and how does it help us navigate the benefits and risks of actions? Consider tools like the ODI Data Ethics Canvas or the Omidyar EthicalOS Risk Mitigation Checklist.
Discuss aspects of the process that you see as important. For example, what difficulties did you encounter, and how could you avoid problems if you did it again? Include a general reflection on what you learnt during this task. What are you unsure about? What would you do differently if you had to do it all again?
What are the implications of your work for data science? What issues have you encountered that exist in wider data science, or that data science practices help address? Are there any innovations that arise from your work (for individual people, for companies building QS applications, for data science work such as cleaning data, for policy, etc.)?
Criterion 4: Strength of connection between the individual experience of this QS project to the practice of data science (and the preceding three criteria). When you’re writing, think about both the specifics of your own analysis and insights, and what your work tells us about the wider practice and implications of data science, drawing on sources to contextualise and support your claims.
You might break the above sections into multiple subheadings. You can include a final conclusion (overall what you found, personally, about the data or QS, or about data science generally), but make sure it addresses the assessment criteria.
If you are submitting any additional materials, such as slide elements, video, audio, or anything else, please ensure I have access to them. Remember to check any included links!
Diagrams, figures, charts and illustrations must be labelled, and explained, and must be referred to from somewhere in the report. If drawn from another source, then the source must be provided.
Include samples of your data, enough to give a sense of what your data looks like. Consider how to maintain privacy. Including samples of images, text, etc. can provide important insight into this rich non-numeric data.
If you did analysis you didn’t include in the main report, you can include some of it here (just things you think might be interesting please).
If it was a creative piece (a sketch, or anything else) you can include a photo if that’s easiest.
You can download it and include it like this (play around with the settings to make it display sensibly). You can also include it as an HTML file, etc.
knitr::include_graphics(here::here("AT2_default_template/data/DSI_copy_ODI_Data_Ethics_Canvas.pdf"))
if (knitr::is_html_output()) knitr::include_url("https://docs.google.com/document/d/14qEn_yoOpInUAvA8ndNGMNuhxgywQJ3ODmCKcxYZPGQ/edit?usp=sharing")
Three approaches here:
Add the R packages you used to the references.
# If you leave this chunk as it is, as long as the libraries are installed it should work.
library("RefManageR")
#library("bibtex")
knitr::write_bib(.packages(), here::here("AT2_default_template/packages.bib"), tweak = TRUE)
print(RefManageR::ReadBib(here::here("AT2_default_template/packages.bib"), check = FALSE),
      .opts = list(style = "text", bib.style = "authoryear"))
Henry, L. and H. Wickham (2020). purrr: Functional Programming Tools. R package version 0.3.4. <URL: https://CRAN.R-project.org/package=purrr>.
Müller, K. (2020). here: A Simpler Way to Find Your Files. R package version 1.0.1. <URL: https://CRAN.R-project.org/package=here>.
Müller, K. and H. Wickham (2021). tibble: Simple Data Frames. R package version 3.1.6. <URL: https://CRAN.R-project.org/package=tibble>.
McLean, M. W. (2014). Straightforward Bibliography Management in R Using the RefManager Package. arXiv: 1403.2036 [cs.DL]. <URL: https://arxiv.org/abs/1403.2036>.
—–— (2017). “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software. DOI: 10.21105/joss.00338.
—–— (2020). RefManageR: Straightforward BibTeX and BibLaTeX Bibliography Management. R package version 1.3.0. <URL: https://github.com/ropensci/RefManageR/>.
Navarro, D. (2021). bs4cards: Generate Bootstrap Cards. R package version 0.1.1. <URL: https://github.com/djnavarro/bs4cards>.
R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. <URL: https://www.R-project.org/>.
Rinker, T. W. and D. Kurkiewicz (2018). pacman: Package Management for R. version 0.5.0. Buffalo, New York. <URL: http://github.com/trinker/pacman>.
Rinker, T. and D. Kurkiewicz (2019). pacman: Package Management Tool. R package version 0.5.1. <URL: https://github.com/trinker/pacman>.
Sievert, C. and J. Cheng (2021). bslib: Custom Bootstrap ‘Sass’ Themes for shiny and rmarkdown. R package version 0.3.1. <URL: https://CRAN.R-project.org/package=bslib>.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN: 978-3-319-24277-4. <URL: https://ggplot2.tidyverse.org>.
—–— (2021a). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.5.1. <URL: https://CRAN.R-project.org/package=forcats>.
—–— (2021b). tidyr: Tidy Messy Data. R package version 1.1.4. <URL: https://CRAN.R-project.org/package=tidyr>.
—–— (2021c). tidyverse: Easily Install and Load the Tidyverse. R package version 1.3.1. <URL: https://CRAN.R-project.org/package=tidyverse>.
—–— (2022). stringr: Simple, Consistent Wrappers for Common String Operations. http://stringr.tidyverse.org.
Wickham, H., M. Averick, J. Bryan, et al. (2019). “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43, p. 1686. DOI: 10.21105/joss.01686.
Wickham, H., W. Chang, L. Henry, et al. (2021). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.5. <URL: https://CRAN.R-project.org/package=ggplot2>.
Wickham, H., R. François, and L. D’Agostino McGowan (2022). emo: Easily Insert Emoji. R package version 0.0.0.9000. <URL: https://github.com/hadley/emo>.
Wickham, H., R. François, L. Henry, et al. (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.7. <URL: https://CRAN.R-project.org/package=dplyr>.
Wickham, H., J. Hester, and J. Bryan (2022). readr: Read Rectangular Text Data. R package version 2.1.2. <URL: https://CRAN.R-project.org/package=readr>.
[Knight and Shum (2020)]