Overview

In this technical assignment, you will apply the statistical knowledge that you’ve learned so far. The assignment requires you to use RStudio and import data. We recognize that both of these things may be new to you, so please ask your Lab TA, lecture TA, or Dr. Woodward any questions that come up. Though the assignment is not due until Friday, we strongly recommend that you start it before then.

Getting Started:

  1. Please write the names of the classmates you’ve worked with. If you worked by yourself, please write “I worked by myself”

I worked by myself

  2. Download the data science salary data from Canvas. Open this data in RStudio. (Hint: Select “Import Dataset” from the Global Environment pane and choose “From Text (base).”)
dataS <- read.csv("~/Downloads/dataS.csv")

Understanding your Data:

  3. View your dataset in RStudio. (Do NOT write this code in your markdown file; run it only in the console.) What do you notice about it? (e.g., What variables are in the file?)

There are two different types of variables in the dataset (chr and dbl). I’m assuming they are divided into nominal and interval/ratio data.
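The variable types mentioned above can also be checked from the console. The following is a minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical stand-ins, not the Canvas data); on the real data the same calls would be applied to `dataS`.

```r
# Hypothetical stand-in for the salary data; the values are made up.
toy <- data.frame(
  job_title = c("Data Scientist", "Data Analyst", "Data Engineer"),
  salary_in_usd = c(100000, 62000, 120000),
  stringsAsFactors = FALSE
)

# str() lists every column with its type and the first few values.
str(toy)

# sapply() with class() returns one type label per column.
col_types <- sapply(toy, class)
print(col_types)
```

Text columns such as `job_title` are reported as character (nominal data), while `salary_in_usd` is numeric (ratio data).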

  4. Print the salary_in_usd column.

table(dataS$salary_in_usd)
## 
##   2859   4000   5409   5679   5707   5882   6072   8000   9272   9466  10000 
##      1      2      1      1      1      1      2      1      1      1      2 
##  10354  12000  12103  12901  13400  15966  16228  16904  18000  18053  18442 
##      1      3      1      1      1      1      1      1      1      1      2 
##  18907  19609  20000  20171  21637  21669  21844  21983  22611  24000  24342 
##      1      1      5      1      1      1      1      1      1      1      1 
##  24823  25000  25532  26005  28016  28369  28399  28476  28609  29751  30428 
##      2      1      1      1      1      1      1      1      1      1      1 
##  31615  31875  32974  33511  33808  35590  35735  36259  36643  37236  37300 
##      1      1      3      1      1      1      1      1      1      1      1 
##  37825  38400  38776  39263  39916  40000  40038  40189  40481  40570  41689 
##      1      1      1      2      1      1      1      1      1      1      1 
##  42000  42197  43331  43966  45391  45618  45760  45807  45896  46597  46759 
##      1      1      1      2      1      1      1      3      1      1      1 
##  46809  47282  47899  48000  49268  49461  49646  50000  50180  51064  51321 
##      1      1      1      1      1      2      1      5      1      1      1 
##  51519  52000  52351  52396  53192  54000  54094  54238  54742  54957  55000 
##      1      1      3      1      1      1      1      1      1      3      2 
##  56000  56256  56738  58000  58035  58255  58894  59102  59303  60000  60757 
##      1      1      1      3      1      1      1      2      1      5      1 
##  61300  61467  61896  62000  62649  62651  62726  63711  63810  63831  63900 
##      2      1      1      1      1      1      2      1      1      2      1 
##  64849  65000  65013  65438  65949  66022  66265  67000  68147  68428  69000 
##      1      2      1      3      2      1      1      1      1      1      1 
##  69336  69741  69999  70000  70139  70500  70912  71444  71786  71982  72000 
##      1      2      1      2      1      1      1      1      1      1      1 
##  72212  72500  73000  74000  74130  75000  75774  76833  76940  76958  77364 
##      1      1      1      1      1      4      1      3      2      1      1 
##  77684  78000  78526  78791  79039  79197  79833  80000  81000  81666  82500 
##      1      1      4      2      1      1      2      8      1      2      1 
##  82528  82744  82900  84900  85000  86703  87000  87425  87738  87932  88654 
##      2      1      1      1      4      1      1      1      1      4      3 
##  89294  90000  90320  90700  90734  91000  91237  91614  93000  93150  93427 
##      1      6      5      1      2      1      1      2      1      1      1 
##  93700  94564  94665  95550  95746  96113  96282  98000  98158  99000  99050 
##      2      1      1      1      1      1      1      1      3      2      1 
##  99100  99360  99703 100000 100800 101570 102100 102839 103000 103160 103691 
##      1      1      1     15      1      1      2      1      1      1      1 
## 104702 104890 105000 105400 106000 106260 108800 109000 109024 109280 110000 
##      2      1      5      1      1      2      1      1      1      2      5 
## 110037 110500 110925 111775 112000 112300 112872 112900 113000 113476 114047 
##      1      1      1      1      1      1      1      4      1      1      1 
## 115000 115500 115934 116000 116150 116914 117104 117789 118000 118187 119059 
##      5      1      2      1      1      1      1      2      1      1      1 
## 120000 120160 120600 122346 123000 124190 124333 125000 126000 126500 127221 
##     12      1      1      1      3      1      1      3      1      2      1 
## 128875 129000 130000 130026 130800 132000 132320 135000 136000 136600 136620 
##      2      1      8      1      1      1      3      9      1      1      1 
## 136994 137141 138000 138350 138600 140000 140250 140400 141300 141846 144000 
##      1      1      1      1      1      8      1      3      1      1      3 
## 144854 145000 146000 147000 147800 148261 150000 150075 150260 151000 152000 
##      1      2      1      1      1      1     12      1      1      1      1 
## 152500 153000 153667 154000 154600 155000 156600 157000 158200 159000 160000 
##      1      2      1      1      2      3      1      1      1      1      8 
## 160080 161342 162674 164000 164996 165000 165220 165400 167000 167875 168000 
##      2      1      1      1      2      4      1      2      2      1      1 
## 170000 173762 174000 175000 175100 176000 177000 180000 181940 183228 183600 
##      8      1      2      2      1      1      1      5      1      1      1 
## 184700 185000 185100 187442 188000 189650 190000 190200 192400 192564 192600 
##      1      2      1      1      1      2      1      1      1      1      1 
## 195000 196979 200000 200100 205300 206699 208775 209100 210000 211500 213120 
##      1      1     10      1      3      1      1      2      5      1      1 
## 214000 215300 216000 220000 220110 224000 225000 230000 235000 240000 241000 
##      1      2      1      3      2      1      2      3      2      1      1 
## 242000 243900 250000 256000 260000 266400 270000 276000 324000 325000 380000 
##      1      1      2      1      2      1      1      1      1      1      1 
## 405000 412000 416000 423000 450000 600000 
##      1      1      1      1      2      1
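Note that the output above comes from `table()`, which counts how often each distinct value occurs rather than listing the raw values. A small sketch of the difference, using made-up salaries (the vector `salaries` is hypothetical):

```r
# Made-up values, only to contrast printing with tabulating.
salaries <- c(50000, 60000, 50000, 80000)

# Printing shows every value in its original order.
print(salaries)

# table() shows each distinct value (top row) and its count (bottom row).
freq <- table(salaries)
print(freq)
```

In the frequency table, the top row of each pair is the salary and the bottom row is the number of people with that salary.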
  5. What type of information do you think is contained in this variable? (There is no one correct answer; I haven’t told you what these values mean. We want you to think critically about the information being provided to you.)

It’s quantitative data about the salaries of people in the sample. The table also shows the frequency of each value.

Graphing

  6. What type of graph would be best for the job_title variable?

Bar graph

  7. Create the graph that you identified in #6.
library(ggplot2)
ggplot(dataS, aes(job_title))+
geom_bar()

Note: You may not be able to see all of the names on your x-axis. If this occurs, you can add this code (without the quotes) to the end of your graph code: “+theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))”

You can also click the “Zoom” button if you cannot see your graph clearly.
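As a sketch of where the rotation code from the note goes, the example below attaches it to a bar graph built from made-up job titles (the data frame `toy` is hypothetical; on the real data it would be `dataS`).

```r
library(ggplot2)

# Made-up job titles, only to illustrate the label rotation.
toy <- data.frame(
  job_title = c("Data Scientist", "Data Analyst", "Data Scientist",
                "Machine Learning Engineer", "Data Engineer")
)

# The theme() call is added as one more layer after geom_bar().
p <- ggplot(toy, aes(job_title)) +
  geom_bar() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

print(p)
```

Each `+` must sit at the end of a line, not the start of the next one, or R will treat the graph as finished before the theme is added.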

  8. What type of graph would be appropriate for the salary_in_usd column?

Histogram

  9. Create the graph you chose in #8.
library(ggplot2)
ggplot(dataS, aes(salary_in_usd))+
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
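The message above is ggplot2 noting that it fell back to 30 bins. One way to act on it is to set `binwidth` explicitly. The sketch below uses made-up salaries and an assumed bin width of 25,000 USD; both are illustrative choices, not values taken from the assignment.

```r
library(ggplot2)

# Made-up salaries, only to illustrate setting a bin width.
toy <- data.frame(
  salary_in_usd = c(40000, 55000, 62000, 80000, 100000,
                    100000, 120000, 150000, 200000, 450000)
)

# binwidth is in the units of the x variable, so each bar spans 25,000 USD.
p <- ggplot(toy, aes(salary_in_usd)) +
  geom_histogram(binwidth = 25000)

print(p)
```

With `binwidth` set, the `stat_bin()` message no longer appears.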

Standard Scores and Z scores

  10. Calculate z scores for all salary_in_usd scores.

dataS$Z_score <- (dataS$salary_in_usd - mean(dataS$salary_in_usd)) / sd(dataS$salary_in_usd)
dataS$Z_score

  11. Save your standard scores (z scores) from #10 to the dataset. (Hint: Use data$column <- code to make standard scores.)
dataS$Z_score <- as.numeric(scale(dataS$salary_in_usd))  # scale() returns a matrix; store a plain numeric column
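The two approaches used in this section, the manual formula and `scale()`, should give the same z scores. A minimal check on made-up values (the vector `x` is hypothetical):

```r
# Made-up salaries, only to compare the two ways of standardizing.
x <- c(40000, 60000, 80000, 100000, 120000)

# Manual formula: subtract the mean, then divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() centers and scales in one step; as.numeric() drops the matrix wrapper.
z_scaled <- as.numeric(scale(x))

# all.equal() compares the two results within floating-point tolerance.
same <- isTRUE(all.equal(z_manual, z_scaled))
print(same)
```

Either way, the resulting z scores have a mean of 0 and a standard deviation of 1.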

  12. Make a graph of the z scores you’ve created.

library(ggplot2)
ggplot(dataS, aes(Z_score))+
geom_boxplot()

  13. What do you notice about the two graphs?

write your answer here

Calculating Standard Scores

  14. Aria wants to compare scores that students receive on case study assignments. They know that the average case study score is 88.7 and has a standard deviation of 3.8. What is the standard score for a student who received a 93?
  15. Interpret the z score (standard score) that you’ve calculated.

The median is a little below 0, and most of the data is between -0.5 and 0.5. The bottom 25% of the data is less varied than the top 75%. There are 8 outliers in the upper region of the data.
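For the case study question, the standard score follows from z = (score - mean) / standard deviation. A minimal sketch of the arithmetic, using only the numbers given in the prompt (the variable names are illustrative):

```r
# Values taken from the case study prompt.
score <- 93
avg   <- 88.7
sdev  <- 3.8

# z = (score - mean) / standard deviation = 4.3 / 3.8
z <- (score - avg) / sdev
print(round(z, 2))  # 1.13
```

A z score of about 1.13 means the score of 93 sits a little more than one standard deviation above the average case study score.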