1 Introduction

This is where you give details about what you’ve been collecting and how much data you have; why you chose to collect this data; how you managed data quality and collection-frequency issues; what you did to anonymise or de-identify the data; and how you dealt with storing and sharing data within the group.
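One common way to de-identify pooled group data is pseudonymisation. Each participant's name is replaced with an arbitrary ID, and the lookup key is kept apart from the shared file.

The sketch below assumes a data frame with hypothetical `participant` and `steps` columns; adapt the names to your own data. Pseudonymised data can still be re-identifiable if other columns (locations, routines) single someone out.

```r
# Minimal pseudonymisation sketch. The data, column names and the
# pseudonymise() helper are hypothetical examples, not a fixed method.
group_data <- data.frame(
  participant = c("Alex", "Sam", "Alex", "Jo"),
  steps = c(8200, 5400, 10100, 7600),
  stringsAsFactors = FALSE
)

pseudonymise <- function(df, id_col) {
  people <- unique(df[[id_col]])
  # Lookup key mapping each real name to an ID such as "P1", "P2", ...
  key <- data.frame(
    name = people,
    id = paste0("P", seq_along(people)),
    stringsAsFactors = FALSE
  )
  df[[id_col]] <- key$id[match(df[[id_col]], key$name)]
  list(data = df, key = key)
}

result <- pseudonymise(group_data, "participant")
shared_data <- result$data  # names replaced by IDs; this is what gets shared
# result$key links IDs back to people: store it separately, or delete it
print(shared_data)
```

Here IDs follow the order in which people first appear. If that order is itself revealing, shuffle `people` with `sample()` before assigning IDs.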

Criterion 1: Strength of justification for the method used to obtain data from multiple sources to gain insight into a chosen problem, including analysis of data quality issues in the individual and group data. Consider whether you tell the reader:

  • What data you’re collecting
  • Why that data is interesting to collect
  • How you’re collecting the data, and why that method is justified (evaluate the benefits and issues of the app or method you chose, etc.)
  • What the quality of your data is (missing data, other issues in the data, etc.) and what that implies.
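A quick summary of missing values and skipped days can go straight into the report as part of the data-quality discussion. This base R sketch uses a hypothetical daily log (`daily_log`, with a `sleep_hours` column). The log has one missing value and one day with no entry at all; both count as missing data but have different causes.

```r
# Hypothetical daily log: sleep is missing on 2 March, and 3 March has no row.
daily_log <- data.frame(
  date = as.Date(c("2022-03-01", "2022-03-02", "2022-03-04", "2022-03-05")),
  sleep_hours = c(7.5, NA, 6.0, 8.0)
)

# Missing values per column
missing_counts <- colSums(is.na(daily_log))
print(missing_counts)

# Days with no entry at all: compare against the full calendar range
all_days <- seq(min(daily_log$date), max(daily_log$date), by = "day")
missing_days <- all_days[!all_days %in% daily_log$date]
print(missing_days)

# Share of calendar days with a usable sleep value
coverage <- sum(!is.na(daily_log$sleep_hours)) / length(all_days)
print(coverage)
```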

1.1 Problem Definition: Understanding People And Data

What are you exploring and why?

How might data help inform this issue? What ethical and privacy considerations have you made (you should consider these throughout the report)?

1.2 Data Understanding and Investigation

What have you collected, how, and what are the issues with this method?

You can include a simple table here to give an overview. But make sure you also explain the table, and connect what you have collected and the method to (1) the need or issues you’re addressing, and (2) issues such as privacy, ethics, data quality, potential of data science vs other approaches, etc.
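Such an overview table can be built from a small data frame describing your sources. The rows below are hypothetical placeholders, not recommended sources. `knitr::kable()` formats the table in the knitted report, and the code falls back to `print()` if knitr isn't available.

```r
# Hypothetical overview of data sources; replace with your own.
overview <- data.frame(
  source = c("Phone screen-time log", "Daily mood survey", "Group photo diary"),
  type = c("structured", "structured", "unstructured"),
  method = c("Automatic (app export)", "Manual (form, once per day)", "Manual (photos)"),
  frequency = c("Hourly", "Daily", "Ad hoc"),
  privacy_notes = c("Usage can reveal routines", "Self-reported; sensitive",
                    "May capture other people"),
  stringsAsFactors = FALSE
)

# Render as a table when knitting; otherwise print to the console.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::kable(overview, caption = "Overview of data collected (example values).")
} else {
  print(overview)
}
```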

1.3 Preparing the Data

What data cleaning, missing data, validation, etc. have you considered? What are the impacts of any issues in your data?
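Validation often starts with range checks, which flag values that cannot be true given what you know about the measurement. The `tracked` data frame and the thresholds used here (0 to 24 hours of sleep, non-negative step counts) are illustrative assumptions.

```r
# Hypothetical tracked data: 25 hours of sleep and a negative step count,
# e.g. from a typo or a sync error.
tracked <- data.frame(
  date = as.Date("2022-03-01") + 0:4,
  sleep_hours = c(7.5, 25, 6.0, NA, 8.0),
  steps = c(8200, 5400, -10, 7600, 9100)
)

# Plausible ranges are assumptions; set them from what you know about the data.
# Existing NAs pass the check here and are handled as missing data elsewhere.
valid_sleep <- is.na(tracked$sleep_hours) |
  (tracked$sleep_hours >= 0 & tracked$sleep_hours <= 24)
valid_steps <- is.na(tracked$steps) | tracked$steps >= 0

tracked$flag <- !(valid_sleep & valid_steps)
flagged <- tracked[tracked$flag, ]
print(flagged)

# Recode implausible values as NA rather than dropping whole rows
tracked$sleep_hours[!valid_sleep] <- NA
tracked$steps[!valid_steps] <- NA
```

Recoding implausible values as `NA`, rather than deleting rows, keeps the other measurements from those days. It also makes the cleaning decision visible in your missing-data counts. Either way, report what you changed.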

2 Analysis

Explain your investigation and data analysis.

You should include examples of all the datasets, including (1) your personal data, and (2) the pooled group unstructured and structured data. Draw on external sources of summary data as a comparison.

Criterion 2: Insightfulness in the analysis of the obtained data, including quality issues, to draw conclusions in a professional and engaging manner. Consider whether you:

  • Provide analysis, presenting both a range of visualisations/data summaries and conclusions drawn from them?
  • Contextualise your findings, noting your particular context (you know about this data! There were holidays, assignment due dates, etc.) and the limitations of the analysis?
  • Use appropriate analysis (we discuss this in class)?
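Context can be added to the data directly, so that summaries and plots can be split by what was happening at the time. This sketch assumes hypothetical daily screen-time data plus hand-entered due dates and holidays. With only a week of data, the group means are purely illustrative.

```r
# Hypothetical daily screen-time data and contextual dates; substitute your own.
screen <- data.frame(
  date = as.Date("2022-04-11") + 0:6,
  minutes = c(210, 240, 380, 400, 190, 150, 160)
)
due_dates <- as.Date("2022-04-14")
holidays <- as.Date("2022-04-15")  # e.g. a public holiday

# Label each day so patterns can be read against what was happening
screen$context <- ifelse(
  screen$date %in% due_dates, "assignment due",
  ifelse(screen$date %in% holidays, "holiday", "regular day")
)

# Average screen time by context (tiny groups: interpret with care)
by_context <- aggregate(minutes ~ context, data = screen, FUN = mean)
print(by_context)
```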

2.1 Inspiration

You should include your analysis here using R code, or by loading images and tables you’ve saved from other programs, like this. You’ll see I set a pixel width for the first one.

knitr::include_graphics(here::here("AT2_default_template/data/uts_logo_new.png"), dpi = NA)

Or like this, using Markdown image syntax: [image: A logo]

You can include numbers and analysis manually (by typing).

You can also output analysis results, such as correlation coefficients, in-line with code: **0.418684**. That looked like this: `` `r cor.test(as.numeric(mtcars$mpg), as.numeric(mtcars$qsec))$estimate[[1]]` ``

If you downloaded the full folder, then the csv files you need are there.

You must always cross reference your figures, never have a visualisation that is not explained explicitly in the text. See Figure 2.1.

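# Tighten the plot margins (bottom, left, top, right, in lines of text)
par(mar = c(4, 4, .2, .1))
plot(cars)  # a scatterplot of stopping distance against speed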

Figure 2.1: The cars data.

library(dplyr)     # provides tibble() and the %>% pipe
library(bs4cards)  # provides cards()

viz <- tibble(
  long_name = c("ggplot2","ggiraph","RAWGraphs","Tableau"),
  short_name = c("ggplot2","ggiraph","RAWGraphs","Tableau"),
  blurb = c("graphics grammar for R","Build interactive htmlwidget ggplots","A non-code easy to use but very impressive free visualisation tool, highly recommend","A non-code visualisation tool, lots of functions with a bit of a learning curve. Free for students"),
  viz_url = c("https://ggplot2.tidyverse.org/","https://davidgohel.github.io/ggiraph/articles/offcran/using_ggiraph.html","https://rawgraphs.io/gallery","https://www.tableau.com/academic/students#form"),
  image_url = c("https://raw.githubusercontent.com/tidyverse/ggplot2/main/man/figures/logo.png","https://davidgohel.github.io/ggiraph/reference/figures/logo.png","https://raw.githubusercontent.com/densitydesign/raw/master/imgs/raw_header.jpg","https://upload.wikimedia.org/wikipedia/en/thumb/0/06/Tableau_logo.svg/250px-Tableau_logo.svg.png"),
  features = c("general-viz; R","general-viz; interactive; R", "general-viz; GUI","general-viz; GUI")
)

viz %>% 
  cards(
    title = long_name,
    text = blurb,
    link = viz_url,
    image = image_url,
    tags = paste("all;", features),
    width = 4,
    footer = features,
    layout = "label-right"
  )


# Add more cards here for other EDA and visualisation tools; the cards above demonstrate some ideas.

Make sure to review the materials from weeks 4-6 in Canvas.

2.2 Examples

You can see in the code chunk below that we read in a csv, then create a visualisation (or include an image file) and run other statistical analyses. Here we’re just outputting Figure 2.2, but you’d do more analysis (and discuss it properly in a paragraph).

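library(ggplot2)
library(dplyr)  # for the %>% pipe

# readr_example() returns the path to an example mtcars.csv bundled with readr
mt_data <- readr::read_csv(readr::readr_example("mtcars.csv"))

mt_data %>% 
  ggplot(aes(x = mpg, y = qsec)) +
  geom_point()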

Figure 2.2: Read mtcars data and basic scatterplot.
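The “other statistical analyses” mentioned above could start with the sketch below. It uses R's built-in `mtcars` data, so it runs without any files.

- `cor.test()` reports a confidence interval and p-value alongside the coefficient shown earlier.
- `lm()` fits a simple linear model.

Whether these are appropriate depends on your data, so treat this as a starting point rather than a prescribed analysis.

```r
# mtcars ships with R (datasets package), so no download is needed.

# Pearson correlation between fuel economy (mpg) and quarter-mile time (qsec),
# with a confidence interval and p-value.
mpg_qsec <- cor.test(mtcars$mpg, mtcars$qsec)
print(mpg_qsec)

# Simple linear model: how quarter-mile time changes with mpg.
fit <- lm(qsec ~ mpg, data = mtcars)
summary(fit)
```

Check how wide the confidence interval is before drawing conclusions from the point estimate, and say in the text what the relationship does and does not show.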

3 Evaluation, Discussion, and Conclusions

3.1 Ethics and Impact

Criterion 3: Insightfulness in the identification, contextualisation and reflection on ethical, privacy, and legal issues relevant to the collection, analysis, and use of one’s own and others’ personal data. AT2 is designed to give you experience in collecting and working with personal data. It can be confronting at times, and you should consider issues of privacy and ethics, with regard to both the specifics of what you did and the implications for data science, connecting to legal and ethical frameworks. You could imagine what would happen if a data science troll (a malicious agent) gained access to your data. We want you to consider:

  • What harm can be done? (i.e., what are the risks, and how likely are they to occur?) You could consider this both in relation to insights from your own data and the implications of those insights over much larger datasets. Imagine that we conducted your QS project on many more people, and now have millions of datapoints on the variables you collected: what insights could be gained (by individuals? by companies?), and how do the concerns balance out? What are the legal and privacy implications (i.e., what laws or principles would be breached?) and the ethical considerations (i.e., what insight do ethical frameworks provide to help us navigate the issue?)
  • What strategies could be adopted to do differently? Again, consider both your own case (“We should have…”) and wider practice (“App policies should…”)
  • What examples exist in wider data science that relate to your work? How were these issues dealt with? You might use the AI Incidents database or the AI and algorithmic incidents and controversies repository. Mozilla’s Privacy Not Included project analyses apps in different categories (with links to news stories, focusing on privacy) and may also be useful.
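When imagining the project scaled up to many people, one concrete risk is re-identification: even without names, a combination of ordinary attributes can single a person out. k-anonymity is a standard privacy measure, though this template does not name it. It checks that every combination of quasi-identifiers is shared by at least k records. The sketch below uses hypothetical pooled data and treats `suburb` and `age_band` as the quasi-identifiers.

```r
# Hypothetical pooled QS data: no names, but quasi-identifiers remain.
pooled <- data.frame(
  suburb = c("Ultimo", "Ultimo", "Glebe", "Glebe", "Newtown"),
  age_band = c("20-24", "20-24", "25-29", "25-29", "30-34"),
  wake_time = c("06:30", "07:00", "08:00", "08:15", "05:45"),
  stringsAsFactors = FALSE
)

# Count records for each combination of quasi-identifiers
quasi_ids <- c("suburb", "age_band")
group_sizes <- aggregate(
  list(n = rep(1, nrow(pooled))),
  by = pooled[quasi_ids],
  FUN = sum
)

# k is the size of the smallest group; groups of 1 are uniquely identifiable
k <- min(group_sizes$n)
at_risk <- group_sizes[group_sizes$n < 2, ]
print(group_sizes)
print(k)
```

Here k is 1. The one person in the Newtown/30-34 group could be identified from those two columns alone, and their wake time would then reveal part of their routine. Generalising (e.g. wider age bands) or suppressing small groups raises k, at some cost to analytic detail.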

Think about whether there is relevant ethical or legal guidance from Australian law, professional organisations (medical, sales/marketing), non-disclosure and consent agreements, the terms and conditions you agree to when using various apps and websites, or ethical frameworks, and how that guidance helps us navigate the benefits and risks of actions. Tools such as the ODI Data Ethics Canvas or the Omidyar EthicalOS Risk Mitigation Checklist can help.

3.2 Limitations, Future Work, Reflection

Discuss aspects of the process that you see as important. For example, what difficulties did you encounter, and how could you avoid those problems next time? Include a general reflection on what you learnt during this task. What are you unsure about? What would you do differently if you had to do it all again?

What are the implications of your work for data science? What issues have you encountered that exist in wider data science, or that data science practices help address? Are there any innovations that arise from your work? (For individual people, for companies building QS applications, for data science work such as data cleaning, for policy, etc.)

Criterion 4: Strength of connection between the individual experience of this QS project and the practice of data science (and the preceding three criteria). When you’re writing, think about both the specifics of your own analysis and insights, and what your work tells us about the wider practice and implications of data science, drawing on sources to contextualise and support your claims.

3.2.1 Final conclusion

You might break the above sections into multiple subheadings. You can include a final conclusion (overall what you found, personally, about the data or QS, or about data science generally), but make sure it addresses the assessment criteria.

4 Appendices

If you are submitting any additional materials, such as slide elements, video, audio, or anything else, please ensure I have access to them. Remember to check any included links!

Diagrams, figures, charts and illustrations must be labelled, and explained, and must be referred to from somewhere in the report. If drawn from another source, then the source must be provided.

4.1 Data sample

Include samples of your data: enough to give a sense of what your data looks like. Consider how to maintain privacy. Including samples of images, text, and other non-numeric data can provide important insight into this rich data.
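The sketch below prepares a privacy-conscious sample from a hypothetical raw export (`raw`) with precise timestamps and a free-text `note` column. It does three things:

- takes a small, reproducible random sample;
- coarsens timestamps to date and hour;
- drops the free text, which in this example names other people.

```r
set.seed(2022)  # makes the random sample reproducible

# Hypothetical raw export with precise timestamps and free-text notes
raw <- data.frame(
  timestamp = as.POSIXct(c("2022-03-01 07:12:45", "2022-03-01 22:40:10",
                           "2022-03-02 06:58:03", "2022-03-02 23:05:51",
                           "2022-03-03 07:30:00"), tz = "UTC"),
  heart_rate = c(62, 71, 60, 74, 65),
  note = c("gym with Sam", "late call", "", "party at Jo's", ""),
  stringsAsFactors = FALSE
)

# Take 3 rows at random, keeping them in time order
sample_rows <- raw[sort(sample(nrow(raw), 3)), ]

# Generalise: keep only date and hour, and leave out the free-text column
data_sample <- data.frame(
  date = as.Date(sample_rows$timestamp),
  hour = as.integer(format(sample_rows$timestamp, "%H")),
  heart_rate = sample_rows$heart_rate
)
print(data_sample)
```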

4.2 Extra analyses, sketches, painting, knitted data blankets, data sculpture, etc.

If you did analysis you didn’t include in the main report, you can include some of it here (please include only the things you think are interesting).

If it was a creative piece (a sketch, or anything else) you can include a photo if that’s easiest.

4.3 ODI Data Ethics Canvas

You can download it and include it like this (play around with the settings to make it display sensibly). You can also include it as a html file, etc.

knitr::include_graphics(here::here("AT2_default_template/data/DSI_copy_ODI_Data_Ethics_Canvas.pdf"))
if(knitr::is_html_output()) knitr::include_url("https://docs.google.com/document/d/14qEn_yoOpInUAvA8ndNGMNuhxgywQJ3ODmCKcxYZPGQ/edit?usp=sharing")

5 References and footnote methods

Three approaches here:

  1. .bib file: references will automatically appear here if you use the .bib file, and the code chunk below will automatically add any R packages you used to the references.
  2. Footnotes¹: you’ll see the footnote texts below.² You’ll need to add the formatted citation to these footnotes.
  3. Manually adding references 🤮: this is the worst option, because you have to remember what you said in the text and make sure you include the citation here. I strongly recommend a service such as Zotero, which integrates with RStudio and Word. Zotero creates an automatically formatted bibliography from your citations, and lets you collect correct citation information at the click of a button while reading articles.

5.1 Packages used

# If you leave this chunk as it is, it should work as long as the libraries are installed.

library("RefManageR")
#library("bibtex")
knitr::write_bib(.packages(),  here::here("AT2_default_template/packages.bib"), tweak = TRUE) 
print(RefManageR::ReadBib( here::here("AT2_default_template/packages.bib"), check = FALSE),
    .opts = list(style = "text", bib.style = "authoryear"))

Henry, L. and H. Wickham (2020). purrr: Functional Programming Tools. R package version 0.3.4. <URL: https://CRAN.R-project.org/package=purrr>.

Müller, K. (2020). here: A Simpler Way to Find Your Files. R package version 1.0.1. <URL: https://CRAN.R-project.org/package=here>.

Müller, K. and H. Wickham (2021). tibble: Simple Data Frames. R package version 3.1.6. <URL: https://CRAN.R-project.org/package=tibble>.

McLean, M. W. (2014). Straightforward Bibliography Management in R Using the RefManager Package. arXiv: 1403.2036 [cs.DL]. <URL: https://arxiv.org/abs/1403.2036>.

McLean, M. W. (2017). “RefManageR: Import and Manage BibTeX and BibLaTeX References in R”. In: The Journal of Open Source Software. DOI: 10.21105/joss.00338.

McLean, M. W. (2020). RefManageR: Straightforward BibTeX and BibLaTeX Bibliography Management. R package version 1.3.0. <URL: https://github.com/ropensci/RefManageR/>.

Navarro, D. (2021). bs4cards: Generate Bootstrap Cards. R package version 0.1.1. <URL: https://github.com/djnavarro/bs4cards>.

R Core Team (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. <URL: https://www.R-project.org/>.

Rinker, T. W. and D. Kurkiewicz (2018). pacman: Package Management for R. version 0.5.0. Buffalo, New York. <URL: http://github.com/trinker/pacman>.

Rinker, T. and D. Kurkiewicz (2019). pacman: Package Management Tool. R package version 0.5.1. <URL: https://github.com/trinker/pacman>.

Sievert, C. and J. Cheng (2021). bslib: Custom Bootstrap ‘Sass’ Themes for shiny and rmarkdown. R package version 0.3.1. <URL: https://CRAN.R-project.org/package=bslib>.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN: 978-3-319-24277-4. <URL: https://ggplot2.tidyverse.org>.

Wickham, H. (2021a). forcats: Tools for Working with Categorical Variables (Factors). R package version 0.5.1. <URL: https://CRAN.R-project.org/package=forcats>.

Wickham, H. (2021b). tidyr: Tidy Messy Data. R package version 1.1.4. <URL: https://CRAN.R-project.org/package=tidyr>.

Wickham, H. (2021c). tidyverse: Easily Install and Load the Tidyverse. R package version 1.3.1. <URL: https://CRAN.R-project.org/package=tidyverse>.

Wickham, H. (2022). stringr: Simple, Consistent Wrappers for Common String Operations. <URL: http://stringr.tidyverse.org>.

Wickham, H., M. Averick, J. Bryan, et al. (2019). “Welcome to the tidyverse”. In: Journal of Open Source Software 4.43, p. 1686. DOI: 10.21105/joss.01686.

Wickham, H., W. Chang, L. Henry, et al. (2021). ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. R package version 3.3.5. <URL: https://CRAN.R-project.org/package=ggplot2>.

Wickham, H., R. François, and L. D’Agostino McGowan (2022). emo: Easily Insert Emoji. R package version 0.0.0.9000. <URL: https://github.com/hadley/emo>.

Wickham, H., R. François, L. Henry, et al. (2021). dplyr: A Grammar of Data Manipulation. R package version 1.0.7. <URL: https://CRAN.R-project.org/package=dplyr>.

Wickham, H., J. Hester, and J. Bryan (2022). readr: Read Rectangular Text Data. R package version 2.1.2. <URL: https://CRAN.R-project.org/package=readr>.

[Knight and Shum (2020)]

References and Footnotes

Knight, Simon, and Simon Buckingham Shum. 2020. “Artificial Intelligence Holds Great Potential for Both Students and Teachers – but Only If Used Wisely.” The Conversation. https://theconversation.com/artificial-intelligence-holds-great-potential-for-both-students-and-teachers-but-only-if-used-wisely-81024.

  1. This is the text of the footnote, which you can see at the bottom of the page.

  2. This is the second one.