class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 1](https://bookdown.org/a_shaker/STM1001_Topic_1/) Lecture

## Introduction to statistics and presenting data

### La Trobe University

This lecture complements the [Topic 1 readings](https://bookdown.org/a_shaker/STM1001_Topic_1/)

---

# Topic 1: Related Links

## Readings

[Topic 1 Readings](https://bookdown.org/a_shaker/STM1001_Topic_1/)

## Maths Background

[Percentages and proportions](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#percentages-and-proportions)

[Rounding and decimal places of accuracy](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#rounding-and-decimal-places-of-accuracy)

[Intervals](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#intervals)

---

# Topic 1: Introduction to statistics and presenting data

**Outline:**

<iframe src="https://bookdown.org/a_shaker/STM1001_Topic_1/" width="100%" height="400px" data-external="1"></iframe>

---

# Where are we headed in this subject?

* In this subject, we will be learning how to ***Make Sense of Data***

--

* One of the most important tools we can use to do so is Statistics

.content-box-blue[
.center[
**What is Statistics?**
]

Statistics allows us to make sense of data. It involves **collecting**, **describing**, and **analysing** data, and sometimes **drawing conclusions** from data.
]

--

* How should we **collect** the data (including study design)?
* Once we have the data, can we **describe** it?
* Can we draw **conclusions**, or **inferences**, about what we are seeing?

---

# Introduction to statistics and presenting data

.content-box-blue[
.center[
**Descriptive Statistics**
]

Descriptive statistics involves summarising and displaying data via graphical and numerical means.
]

--

* We will first consider some variable types, and then look at some examples...
* We will consider an example from the `survey` data set from the R package `MASS` (Venables & Ripley, 2002)
* University students studying statistics were asked questions including which hand they write with, how often they smoke, and their height

---

# Types of variables

How we present data often depends on what type of ***variable(s)*** we are looking at.

.content-box-blue[
.center[
**Categorical (qualitative) Variable**

A variable that is separated into groups. Categorical variables can be either:
]

* **Nominal**: Where the groups are characterised by names, labels or categories. For example, eye colour (blue, brown, green, etc.), car brand (Hyundai, Toyota, Holden, etc.), or state (VIC, NSW, SA, etc.).
* **Ordinal**: Where the groups can be arranged into a specific order. For example, how much a person smokes (never, occasional, regular, heavy), or level of exercise (none, some, frequent).
]

---

# Types of variables

.content-box-blue[
.center[
**Numerical (quantitative) Variable**

A numerical variable is one that represents counts or measurements. The two types of numerical variables we will be looking at are:
]

* **Discrete**: Where the set of all possible values is countable. For example, the number of heartbeats per minute, or the number of heads observed when flipping a coin five times.
* **Continuous**: Where the variable can take an infinite number of values within a certain range. For example, height, weight or age.
]

---

# Frequency Table: Nominal data

* We display the **number** of students in each category

|Writing Hand | Frequency|
|:------------|---------:|
|Left         |        18|
|Right        |       218|

---

# Relative Frequency Table: Nominal data

* We display the **percentage** of students in each category

|Writing Hand | Relative Frequency (%)|
|:------------|----------------------:|
|Left         |                   7.63|
|Right        |                  92.37|

--

* Relative frequencies can be expressed as **decimals** (or **proportions**), **fractions**, or **percentages**.
--

* **Percentages** can be converted into **proportions** by dividing by 100

--

* Conversely, **proportions** can be converted into **percentages** by multiplying by 100

--

* If you are asked to provide relative frequencies in an assessment, be sure to provide them in the format requested in the question

---

# Cumulative frequency and relative frequency tables: Ordinal data

* For ordinal data, as well as frequencies and relative frequencies, we can display the cumulative frequencies:

|Smoke        | Frequency| Cumulative Frequency| Relative Frequency (%)| Cumulative Relative Frequency (%)|
|:------------|---------:|--------------------:|----------------------:|---------------------------------:|
|Never        |       189|                  189|                  80.08|                             80.08|
|Occasionally |        19|                  208|                   8.05|                             88.14|
|Regularly    |        17|                  225|                   7.20|                             95.34|
|Heavy        |        11|                  236|                   4.66|                            100.00|

---

# Bar chart

* Same information presented visually:

<img src="data:image/png;base64,#Topic_1_Lecture_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

---

# Pie chart

<img src="data:image/png;base64,#Topic_1_Lecture_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

# Cumulative frequency and relative frequency tables: Numerical data

* For numerical data, we first need to group the data into equal-width, non-overlapping intervals.
* For example, the table below presents the number of students in each 10 cm range:

|Height    | Frequency| Cumulative Frequency| Relative Frequency (%)| Cumulative Relative Frequency (%)|
|:---------|---------:|--------------------:|----------------------:|---------------------------------:|
|[150,160) |        19|                   19|                   9.09|                              9.09|
|[160,170) |        65|                   84|                  31.10|                             40.19|
|[170,180) |        69|                  153|                  33.01|                             73.21|
|[180,190) |        45|                  198|                  21.53|                             94.74|
|[190,200) |        10|                  208|                   4.78|                             99.52|
|[200,210) |         1|                  209|                   0.48|                            100.00|

---

# Cumulative frequency and relative frequency tables: Numerical data

* A square bracket, such as "[", means that the interval includes that particular number, whereas a round bracket, such as ")", means that the interval goes all the way up (or down) to, but does not include, that particular number.

--

* For example, the interval [190, 200) starts at (and includes) 190, and goes all the way up to, but does not include, 200. This means that the intervals do not overlap.

---

# Histograms

* Like bar charts, but for numerical data.
* At a glance, we can quickly see the shape of the data. For example, does it look bell-shaped, or does it seem to be skewed to the left or to the right?

<img src="data:image/png;base64,#Topic_1_Lecture_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---

# Drawing conclusions (inferences)

* After observing data via **descriptive statistics**, we may wish to draw **conclusions**, or **inferences**, about what we are seeing
* This is called ***inferential statistics***

.content-box-blue[
.center[
**Inferential Statistics**
]

Inferential statistics involves drawing conclusions from data.
]

---

# Drawing conclusions (inferences)

* Normally, we use data available to us in a **sample** to make **inferences** about a **population**



---

# Drawing conclusions (inferences)

* When we take a **sample**, we hope it is **representative** of the **population**

--

* But realistically, each time we take a random **sample**, we could get a different **estimate**
* How close might our **sample estimates** be to the true **population parameters**?

--

* We will usually not know for certain, but ***statistics*** gives us tools to factor the uncertainty into our conclusions
* We will be covering ***inferential statistics*** later on in this subject

---

# What you need to do before this week's core computer lab

### Complete the Week 1 Checklist (see LMS) which will include the following:

1. I have checked which stream I am in by finding my student number in the Student stream allocations document in the "Start Here" tile on LMS.
1. I have installed jamovi (Sci/Health) or R (Data Science) on my computer
1. I have allocated myself to classes via Allocate+

---

# Checking your stream

* If you are not sure which stream you are in, please check the **Student stream allocations** document on LMS
* Computer lab classes are stream-specific. Therefore, you must be allocated to a stream in order to attend the correct computer lab

---
name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot

## Go to [www.kahoot.it](https://www.kahoot.it) and use

## the code provided

---
background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

Don't forget to submit the **Week 1 checklist** on LMS before your computer labs!

---

# References

Venables, W. N., and B. D. Ripley. 2002. *Modern Applied Statistics with S*. Fourth edition. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.
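---

# Worked example: relative and cumulative relative frequencies

* As a sketch of the arithmetic behind the frequency tables in this lecture (the formula layout here is illustrative and is not taken from the readings), a relative frequency is a category's frequency divided by the total number of responses:

$$\text{relative frequency} = \frac{\text{frequency}}{\text{total}} \times 100\%$$

* For the students who write with their left hand:

$$\frac{18}{18 + 218} \times 100\% = \frac{18}{236} \times 100\% \approx 7.63\%$$

* For the cumulative relative frequency up to and including "Occasionally" in the smoking table:

$$\frac{189 + 19}{236} \times 100\% = \frac{208}{236} \times 100\% \approx 88.14\%$$

* Adding the already-rounded percentages instead gives 80.08 + 8.05 = 88.13, so cumulative values are best computed from the counts and rounded at the end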
---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>