In this technical assignment, you will have the opportunity to apply the statistical knowledge that you’ve learned so far. The assignment will require you to use RStudio and import data. We recognize that doing both of these things may be new to you, so please make sure to ask your Lab TA, lecture TA, or Dr. Woodward any questions that may come up. Though the assignment is not due until Friday, it is strongly recommended that you start the assignment well before then.
I worked by myself.
dataS <- read.csv("~/Downloads/dataS.csv")
There are two different types of variables in the dataset (chr and dbl). I’m assuming they are divided into nominal data (chr) and interval/ratio data (dbl).
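One way to check the variable types directly is to ask R for the class of each column. The sketch below uses a small made-up data frame (`toy` and its values are illustrative assumptions, not the real dataS) with column names borrowed from the dataset.

```r
# Hypothetical stand-in for dataS; the values are invented for illustration
toy <- data.frame(
  job_title = c("Data Scientist", "Data Engineer", "Data Analyst"),
  salary_in_usd = c(100000, 120000, 80000),
  stringsAsFactors = FALSE
)

# Class of every column: text columns are "character", number columns are "numeric"
col_types <- sapply(toy, class)
print(col_types)

# str() gives a compact overview of the structure, including the types
str(toy)
```

Character columns usually hold nominal categories, while numeric columns hold values that can be averaged, which fits the nominal versus interval/ratio split described above.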
4) Print the salary_in_usd column.
table(dataS$salary_in_usd)
##
## 2859 4000 5409 5679 5707 5882 6072 8000 9272 9466 10000
## 1 2 1 1 1 1 2 1 1 1 2
## 10354 12000 12103 12901 13400 15966 16228 16904 18000 18053 18442
## 1 3 1 1 1 1 1 1 1 1 2
## 18907 19609 20000 20171 21637 21669 21844 21983 22611 24000 24342
## 1 1 5 1 1 1 1 1 1 1 1
## 24823 25000 25532 26005 28016 28369 28399 28476 28609 29751 30428
## 2 1 1 1 1 1 1 1 1 1 1
## 31615 31875 32974 33511 33808 35590 35735 36259 36643 37236 37300
## 1 1 3 1 1 1 1 1 1 1 1
## 37825 38400 38776 39263 39916 40000 40038 40189 40481 40570 41689
## 1 1 1 2 1 1 1 1 1 1 1
## 42000 42197 43331 43966 45391 45618 45760 45807 45896 46597 46759
## 1 1 1 2 1 1 1 3 1 1 1
## 46809 47282 47899 48000 49268 49461 49646 50000 50180 51064 51321
## 1 1 1 1 1 2 1 5 1 1 1
## 51519 52000 52351 52396 53192 54000 54094 54238 54742 54957 55000
## 1 1 3 1 1 1 1 1 1 3 2
## 56000 56256 56738 58000 58035 58255 58894 59102 59303 60000 60757
## 1 1 1 3 1 1 1 2 1 5 1
## 61300 61467 61896 62000 62649 62651 62726 63711 63810 63831 63900
## 2 1 1 1 1 1 2 1 1 2 1
## 64849 65000 65013 65438 65949 66022 66265 67000 68147 68428 69000
## 1 2 1 3 2 1 1 1 1 1 1
## 69336 69741 69999 70000 70139 70500 70912 71444 71786 71982 72000
## 1 2 1 2 1 1 1 1 1 1 1
## 72212 72500 73000 74000 74130 75000 75774 76833 76940 76958 77364
## 1 1 1 1 1 4 1 3 2 1 1
## 77684 78000 78526 78791 79039 79197 79833 80000 81000 81666 82500
## 1 1 4 2 1 1 2 8 1 2 1
## 82528 82744 82900 84900 85000 86703 87000 87425 87738 87932 88654
## 2 1 1 1 4 1 1 1 1 4 3
## 89294 90000 90320 90700 90734 91000 91237 91614 93000 93150 93427
## 1 6 5 1 2 1 1 2 1 1 1
## 93700 94564 94665 95550 95746 96113 96282 98000 98158 99000 99050
## 2 1 1 1 1 1 1 1 3 2 1
## 99100 99360 99703 100000 100800 101570 102100 102839 103000 103160 103691
## 1 1 1 15 1 1 2 1 1 1 1
## 104702 104890 105000 105400 106000 106260 108800 109000 109024 109280 110000
## 2 1 5 1 1 2 1 1 1 2 5
## 110037 110500 110925 111775 112000 112300 112872 112900 113000 113476 114047
## 1 1 1 1 1 1 1 4 1 1 1
## 115000 115500 115934 116000 116150 116914 117104 117789 118000 118187 119059
## 5 1 2 1 1 1 1 2 1 1 1
## 120000 120160 120600 122346 123000 124190 124333 125000 126000 126500 127221
## 12 1 1 1 3 1 1 3 1 2 1
## 128875 129000 130000 130026 130800 132000 132320 135000 136000 136600 136620
## 2 1 8 1 1 1 3 9 1 1 1
## 136994 137141 138000 138350 138600 140000 140250 140400 141300 141846 144000
## 1 1 1 1 1 8 1 3 1 1 3
## 144854 145000 146000 147000 147800 148261 150000 150075 150260 151000 152000
## 1 2 1 1 1 1 12 1 1 1 1
## 152500 153000 153667 154000 154600 155000 156600 157000 158200 159000 160000
## 1 2 1 1 2 3 1 1 1 1 8
## 160080 161342 162674 164000 164996 165000 165220 165400 167000 167875 168000
## 2 1 1 1 2 4 1 2 2 1 1
## 170000 173762 174000 175000 175100 176000 177000 180000 181940 183228 183600
## 8 1 2 2 1 1 1 5 1 1 1
## 184700 185000 185100 187442 188000 189650 190000 190200 192400 192564 192600
## 1 2 1 1 1 2 1 1 1 1 1
## 195000 196979 200000 200100 205300 206699 208775 209100 210000 211500 213120
## 1 1 10 1 3 1 1 2 5 1 1
## 214000 215300 216000 220000 220110 224000 225000 230000 235000 240000 241000
## 1 2 1 3 2 1 2 3 2 1 1
## 242000 243900 250000 256000 260000 266400 270000 276000 324000 325000 380000
## 1 1 2 1 2 1 1 1 1 1 1
## 405000 412000 416000 423000 450000 600000
## 1 1 1 1 2 1
It’s quantitative (ratio-level) data about the salaries of people in the sample. The table also shows the frequency of each value.
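To illustrate the difference between printing a column and tabulating it, here is a minimal sketch using a handful of made-up salaries (the vector `salaries` is hypothetical and not taken from dataS).

```r
# Hypothetical salaries; not taken from dataS
salaries <- c(50000, 60000, 60000, 80000, 90000, 90000, 90000)

# Printing the vector shows every raw value in order
print(salaries)

# table() collapses the values into a count for each distinct value
freq <- table(salaries)
print(freq)

# summary() reports the minimum, quartiles, mean, and maximum
print(summary(salaries))
```

For a column with many distinct values, `summary()` is often easier to read than a long frequency table.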
Bar graph
library(ggplot2)
ggplot(dataS, aes(job_title)) +
  geom_bar()
Note: you may not be able to see all of the names on your x axis. If
this occurs, you can add this code (without the quotes) to the end of
your graph code: “+theme(axis.text.x = element_text(angle = 90, vjust =
0.5, hjust=1))”
You can also click the “zoom” button if you cannot see your graph clearly.
Histogram
library(ggplot2)
ggplot(dataS, aes(salary_in_usd)) +
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
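The message above suggests choosing a bin width explicitly. A minimal sketch of how that might look, using made-up salaries (`toy` is hypothetical, and the bin width of 20000 is only an illustrative choice, not a recommendation for dataS):

```r
library(ggplot2)

# Hypothetical salaries; not taken from dataS
toy <- data.frame(
  salary_in_usd = c(40000, 55000, 60000, 62000, 75000,
                    80000, 95000, 100000, 120000, 150000)
)

# binwidth = 20000 groups the salaries into $20,000-wide bins
p <- ggplot(toy, aes(salary_in_usd)) +
  geom_histogram(binwidth = 20000)
print(p)
```

Setting `binwidth` also silences the `stat_bin()` message, since the default of 30 bins is no longer used.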
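# Manual z-score: subtract the mean, then divide by the standard deviation
dataS$Z_score <- (dataS$salary_in_usd - mean(dataS$salary_in_usd)) / sd(dataS$salary_in_usd)
# Equivalent using scale(); as.numeric() drops the one-column matrix that scale() returns
dataS$Z_score <- as.numeric(scale(dataS$salary_in_usd))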
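As a quick sanity check on the z-score calculation, the sketch below compares the manual formula with `scale()` on a few made-up salaries (the vector `salaries` is hypothetical, not taken from dataS) and confirms that the result has mean 0 and standard deviation 1.

```r
# Hypothetical salaries; not taken from dataS
salaries <- c(50000, 60000, 60000, 80000, 90000, 120000)

# Manual z-score: distance from the mean in units of standard deviation
z_manual <- (salaries - mean(salaries)) / sd(salaries)

# scale() centers and scales the same way but returns a one-column matrix,
# so as.numeric() converts it back to a plain vector
z_scale <- as.numeric(scale(salaries))

print(round(z_manual, 3))
print(all.equal(z_manual, z_scale))                  # the two approaches agree
print(c(mean = mean(z_manual), sd = sd(z_manual)))   # approximately 0 and 1
```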
12) Make a graph of the z-scores you’ve created.
library(ggplot2)
ggplot(dataS, aes(Z_score)) +
  geom_boxplot()
The median is a little below 0, and most of the data falls between -0.5 and 0.5. The bottom 25% of the data is less spread out than the top 75%. There are 8 outliers in the upper region of the data.
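The quartiles and the outlier count read off a boxplot can also be computed numerically. The sketch below does this for a few made-up z-scores (the vector `z` is hypothetical, not the real Z_score column), using base R's `boxplot.stats()`, which applies the same kind of 1.5 × IQR rule that a boxplot uses to flag outliers.

```r
# Hypothetical z-scores with one unusually large value; not taken from dataS
z <- c(-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 4.0)

# Quartiles mark the edges of the box; the middle one is the median
q <- quantile(z, probs = c(0.25, 0.5, 0.75))
print(q)

# boxplot.stats() flags points more than 1.5 * IQR beyond the hinges
out <- boxplot.stats(z)$out
print(out)
print(length(out))
```

Running the same two calls on `dataS$Z_score` would give numbers to compare against what the graph appears to show.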