This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
#Nominal
gender <- c("male","female","male","female")
#This is nominal because the data can be categorized as male or female, but female isn't considered higher or lower than male
#Ordinal
spice_level <- c("mild","spicy","extra spicy")
#This is ordinal because the data can be categorized by spice level and ranked from least to most spicy
#Interval
time <- c(1,2,3,4,5,6,7,8,9,10)
#This is interval because it's evenly spaced and in order but 0 in time doesn't mean no time
#Ratio
income <- c(1000,2000,3000,4000)
#This is ratio because it carries the elements of interval but has a natural zero point as 0 means no income
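The nominal/ordinal distinction can also be encoded in R itself. The following is a minimal sketch, assuming the same example values as above; using `factor()` with `ordered = TRUE` is one possible way to record the ranking, not something the original vectors do.

```r
# Nominal: categories with no inherent order
gender <- factor(c("male", "female", "male", "female"))

# Ordinal: categories with an explicit low-to-high order
spice_level <- factor(
  c("mild", "spicy", "extra spicy"),
  levels = c("mild", "spicy", "extra spicy"),
  ordered = TRUE
)

is.ordered(gender)       # FALSE
is.ordered(spice_level)  # TRUE

# Comparisons are only meaningful for the ordered factor
spice_level[1] < spice_level[3]  # TRUE, "mild" ranks below "extra spicy"

# Ratio data supports statements such as "twice as much"
income <- c(1000, 2000, 3000, 4000)
income[2] / income[1]  # 2
```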
I combined the x and y data into a data frame, then used the ggplot2 library to put x on the x-axis and y on the y-axis. I used the geom_point() function to draw the scatter plot and chose the minimal theme.
library(e1071)
library(ggplot2)
#Use = (not <-) inside data.frame() so the columns are actually named x and y
mydata <- data.frame(
x = c(12,17,12,11,10,7,5,14),
y = c(4.2,5,6,9,11,12,8,15) )
ggplot(mydata, aes(x=x, y=y))+
geom_point()+
theme_minimal()
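When building a data frame for ggplot(), it can help to check the column names, because data.frame() only names columns from arguments written with `=`. An argument written with `<-` assigns a variable in the workspace and leaves the column with an auto-generated name. A minimal sketch with made-up values (the names `bad` and `good` are illustrative only):

```r
# Arguments written with <- are assignments, not column names
bad <- data.frame(x <- c(1, 2, 3))
names(bad)   # an auto-generated name, not "x"

# Arguments written with = name the column
good <- data.frame(x = c(1, 2, 3))
names(good)  # "x"
```

A plot built from a data frame like `bad` may still render, since aes(x=x) can fall back to the workspace variable, but the data frame itself would not contain a column called x.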
I assigned values to the event and percentage variables and used %>% to pass the result of one step to the next. First I created a new data frame, data4, from event and percentage. Within data4, I created a new variable called label, which appends a "%" sign to each percentage value. Using the ggplot2 library, I created a pie chart with x as an empty string, y as the percentage variable, and fill=event, so the slices are grouped by event. For the labels, I mapped the label variable to show the percentage of each slice, and I used coord_polar(theta="y") to convert the stacked bar chart into a pie chart.
For part b, there are 1100 total respondents and 40% think a cure for cancer would be found first, so we need 40% of 1100, which is 0.40*1100. I entered this calculation into R and got 440 people.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
event <- c("cure for cancer found","end of dependence on oil", "signs of life in outer space", "peace in the middle east", "other", "none will happen")
percentage <- c(40,27,12,8,6,7)
data4 <- data.frame(event,percentage) %>%
mutate(label=paste0(percentage,"%"))
ggplot(data4,aes(x="",y=percentage,fill=event))+
geom_col()+
geom_label(aes(label=label),
position=position_stack(vjust=0.5),
show.legend=FALSE)+
coord_polar(theta="y")+
theme_void()
#b. how many people think cancer would be found first
0.40*1100
## [1] 440
#There are 440 people since 40% of 1100 is 0.40*1100=440.
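The same arithmetic can be applied to every event at once. This is a small sketch, assuming the 1100 respondents from part b; the names `total_respondents` and `respondents` are introduced here for illustration.

```r
event <- c("cure for cancer found", "end of dependence on oil",
           "signs of life in outer space", "peace in the middle east",
           "other", "none will happen")
percentage <- c(40, 27, 12, 8, 6, 7)
total_respondents <- 1100

# Convert each percentage into a head count
respondents <- percentage * total_respondents / 100
names(respondents) <- event
respondents

# Sanity checks: percentages sum to 100, counts sum to the total
sum(percentage)   # 100
sum(respondents)  # 1100
```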
I chose the team Mia. First we load the libraries and read in the Excel data.
library(readxl)
library(ggplot2)
nbadata <- read_excel("2020-2021 NBA Stats Player Box Score Advanced Metrics.xlsx")
I used subset() to count the number of columns and rows where TEAM=="Mia".
ncol(subset(nbadata, TEAM=="Mia")) #29 columns for Mia Team
## [1] 29
nrow(subset(nbadata,TEAM=="Mia")) #16 rows for Mia Team
## [1] 16
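As a cross-check, dim() reports rows and columns in a single call. The sketch below uses a small made-up data frame (`toy`, with hypothetical values) in place of the Excel file, so it can run on its own.

```r
# Hypothetical stand-in for nbadata
toy <- data.frame(
  TEAM = c("Mia", "Bos", "Mia", "Lal"),
  POS  = c("F", "G", "C-F", "G"),
  MPG  = c(14.2, 30.0, 33.9, 25.1)
)

mia <- subset(toy, TEAM == "Mia")
dim(mia)   # rows first, then columns: 2 3
nrow(mia)  # 2
ncol(mia)  # 3
```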
I created a new data set with only TEAM=="Mia". Then I created another data set with only the 8 selected variables and used newdata2[1:3,] to display the first 3 rows.
newdata1 <- subset(nbadata, TEAM=="Mia")
newdata2 <- newdata1[c("FULL NAME", "TEAM", "POS", "AGE", "GP", "MPG", "2P%","3P%")]
newdata2[1:3,]
## # A tibble: 3 × 8
## `FULL NAME` TEAM POS AGE GP MPG `2P%` `3P%`
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Precious Achiuwa Mia F 21.4 31 14.2 0.573 0
## 2 Bam Adebayo Mia C-F 23.6 29 33.9 0.573 0.333
## 3 Avery Bradley Mia G 30.2 10 21.1 0.536 0.421
mean(newdata2$MPG)
## [1] 21.43125
median(newdata2$MPG)
## [1] 21.3
library("modeest")
##
## Attaching package: 'modeest'
## The following object is masked from 'package:e1071':
##
## skewness
mfv(newdata2$MPG)
## [1] 8.8 9.8 10.7 10.9 14.2 15.0 15.4 21.1 21.5 26.0 26.5 29.5 32.9 33.3 33.4
## [16] 33.9
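mfv() returned all 16 MPG values, which suggests that every value occurs exactly once and the data have no single mode. A minimal base R sketch to confirm this, with the values copied from the mfv() output above into a vector called `mpg` (a name introduced here for illustration):

```r
mpg <- c(8.8, 9.8, 10.7, 10.9, 14.2, 15.0, 15.4, 21.1,
         21.5, 26.0, 26.5, 29.5, 32.9, 33.3, 33.4, 33.9)

# Count how often each value appears
counts <- table(mpg)
max(counts)              # 1, so no value repeats

# anyDuplicated() returns 0 when every value is unique
anyDuplicated(mpg) == 0  # TRUE

# The mean and median agree with the results reported earlier
mean(mpg)
median(mpg)
```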
ggplot(newdata2,aes(x=MPG))+
geom_histogram(binwidth=5, fill="purple", color="yellow")+
labs(x="MPG",y="count")+
theme_minimal()
sd(newdata2$`3P%`)
## [1] 0.2043189
max(newdata2$`3P%`)
## [1] 1
min(newdata2$`3P%`)
## [1] 0
library(DescTools)
## Registered S3 method overwritten by 'httr':
## method from
## print.response rmutil
range(newdata2$`3P%`)
## [1] 0 1
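Base R's range() returns the minimum and maximum rather than a single number. If the range is wanted as one value (maximum minus minimum), diff(range()) is one way to get it. A short sketch with an illustrative vector (`three_pt` is a hypothetical stand-in, not the full 3P% column):

```r
# Hypothetical 3-point percentages spanning 0 to 1
three_pt <- c(0, 0.333, 0.421, 1)

range(three_pt)        # 0 1 (minimum and maximum)
diff(range(three_pt))  # 1   (maximum minus minimum)
```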
ggplot(newdata2,aes(x=AGE, y=POS))+
geom_boxplot(fill="lightgreen")+
labs(title="Age by POS",
x="Age",
y="POS")+
theme_minimal()
This chart shows the distribution of age grouped by POS. From the chart we can tell that POS G-F has the highest median and the highest age, since its whisker stretches farthest to the right and there are no outliers in this data set. POS G has the lowest age in the data set, as its whisker reaches farthest to the left. Also, POS C-F has only one observation, so its box collapses to a single vertical line.
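Readings taken from a boxplot can be checked numerically by computing the median and the extremes per group. The sketch below uses a made-up data frame (`toy`, with hypothetical ages) rather than newdata2, so the numbers are illustrative only.

```r
# Hypothetical ages by position
toy <- data.frame(
  POS = c("G", "G", "G", "F", "F", "C-F"),
  AGE = c(22.0, 25.5, 30.2, 21.4, 27.0, 23.6)
)

# Median age per position (the line inside each box)
med <- tapply(toy$AGE, toy$POS, median)
med

# Youngest and oldest player per position (the whisker ends when there are no outliers)
tapply(toy$AGE, toy$POS, min)
tapply(toy$AGE, toy$POS, max)
```

A group with a single observation, such as C-F here, has the same minimum, median, and maximum, which is why its box appears as a single line.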
freq_table <- newdata2 %>%
group_by(POS) %>%
count()
ggplot(newdata2, aes(x=POS))+
geom_bar(fill="lightpink", color="darkred")+
labs(x="POS",y="frequency",title="Frequency of POS")+
theme_minimal()
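The counts that geom_bar() draws can also be printed as a table. A minimal base R sketch, using a made-up vector of positions (`pos` is hypothetical, not the Mia roster); table() is shown here as an alternative to the dplyr group_by() and count() steps, not as the method used above.

```r
# Hypothetical positions
pos <- c("F", "C-F", "G", "G", "G-F", "F")

# Frequency of each position
freq <- table(pos)
freq

# The same counts as a data frame with a count column named n
freq_df <- as.data.frame(freq, responseName = "n")
freq_df
```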