| title: “DSci110 Homework 1 Target” |
| author: “Rosario Ciancio” |
| date: “01 20, 2020” |
| output: |
| pdf_document: |
| number_sections: TRUE |
| fontsize: 12pt |
You will need to either generate the PDF directly or convert a Word or HTML file to PDF.
This is where you tell the reader what it is you are going to be showing them. Typically, if this is an article, you will be outlining a background thesis, and presenting support for your conclusions.
What follows is a math document that describes how you calculate the average of a set of numbers. Remember, it isn’t the math we are trying to do here. This is why we picked a very simple formula. We are trying to learn how to communicate data science ideas effectively!
So in particular: we are not trying to write a
We are trying to describe an analysis. So here we go:
Let \(x_{1}, x_{2}, x_{3}, \ldots, x_{N} = \{x_{k}\}_{k=1}^{N}\) be a set of data values. We define the average of the values \(x_{k}\) as \(\frac{1}{N}\sum_{k = 1}^{N} x_{k}\).
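As a quick check of this definition, here is a minimal sketch in R. The data vector `x` and the names `N` and `avg` are made up for illustration and are not part of the assignment.

```r
# Hypothetical example data (not from the assignment)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# N is the number of data values
N <- length(x)

# Average: add up the values and divide by N
avg <- sum(x) / N

print(avg)
```

For this vector the values add up to 40 and there are 8 of them, so the average is 5, which matches what the built-in `mean(x)` returns.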
Now we can also describe a very important measure in data terms, namely the standard deviation.
This is a measure of how “spread out” a set of data values is. If we let \(\tilde{x}\) be the average of our data, then we define this “standard” measure of spread-outness as \(s\), given by
\[ s = \left[\frac{1}{N}\sum_{k = 1}^{N} \left(x_{k}^{2}-\tilde{x}^{2}\right)\right]^{1/2} \]
Note that these values are called sample statistics, which means they were derived from a set of data. An overall population can have a population mean, which is often denoted by \(\mu\), and a population standard deviation, denoted by \(\sigma\).
This formula can be simplified a bit, using the “bar” notation for average, as
\[s = \left[\overline{x^2}-\tilde{x}^2\right]^{1/2}\] where the symbol \(\overline{x^2}\) denotes the average of the squares of the data values:
\[\overline{x^2} = \frac{1}{N}\sum_{k = 1}^{N} x_{k}^{2}\]
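To see why the simplified form agrees with the original definition of \(s\), it may help to split the sum inside the square root. A short sketch, using only the symbols already introduced:

\[
\frac{1}{N}\sum_{k = 1}^{N}\left(x_{k}^{2}-\tilde{x}^{2}\right)
= \frac{1}{N}\sum_{k = 1}^{N} x_{k}^{2} - \frac{1}{N}\cdot N\,\tilde{x}^{2}
= \overline{x^2}-\tilde{x}^{2}
\]

The second step works because \(\tilde{x}^{2}\) does not depend on \(k\), so summing it \(N\) times just gives \(N\tilde{x}^{2}\). Taking the square root of both ends gives the two expressions for \(s\).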
Basic R code (without leveraging the built-in functions) could be as follows:
std <- sqrt( sum(x^2)/length(x) - (sum(x)/length(x))^2)
We chose “std” since it is a bit more suggestive as a variable name than just “s”. Recall that the mean, \(\tilde{x}\) (or \(\mu\) for a population), is:
mu <- sum(x)/length(x)
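To try these two lines out, here is a minimal sketch on a small made-up vector (the data and the names `sd_builtin` and `rescaled` are illustrative assumptions, not part of the assignment). It also compares the result with R's built-in `sd()`, which divides by \(N-1\) rather than \(N\), so the two numbers differ unless we rescale.

```r
# Hypothetical example data (not from the assignment)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
N <- length(x)

# Mean and standard deviation from the formulas in the text (divide by N)
mu  <- sum(x) / N
std <- sqrt(sum(x^2) / N - mu^2)

# Built-in sd() divides by N - 1, so it comes out slightly larger
sd_builtin <- sd(x)

# Rescaling the built-in value recovers the divide-by-N version
rescaled <- sd_builtin * sqrt((N - 1) / N)

print(c(std = std, sd_builtin = sd_builtin, rescaled = rescaled))
```

For this vector the hand-computed `std` is 2, while `sd(x)` is a little larger. The gap shrinks as \(N\) grows, but it is worth knowing which convention a function uses before reporting a number.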
Eventually, we will generate fancier graphics similar to this, using other R packages: