To find descriptive statistics, use df_stats(~variable, data=Dataset, statistic of interest). The df_stats and gf_boxplot commands come with the mosaic package, so run library(mosaic) first.
As an example:
df_stats(~variable, data=Dataset, mean) – gives the mean
df_stats(~variable, data=Dataset, median) – gives the median
df_stats(~variable, data=Dataset, sd) – gives the standard deviation
df_stats(~variable, data=Dataset, var) – gives the variance
df_stats(~variable, data=Dataset, range) – gives the range (as the minimum and the maximum)
df_stats(~variable, data=Dataset, summary) – gives the 5 number summary and the mean
The advantages of this method are that you can do the following:
1. df_stats(~variable, data=Dataset, mean, median, sd, var) – finds more than one descriptive statistic at the same time.
2. df_stats(quantitative variable ~ qualitative variable, data=Dataset, mean) – gives the mean for the subgroups in the dataset.
To draw a boxplot, use gf_boxplot(~variable, data=Dataset, title="type in what you want the title to be") – produces a vertical modified boxplot with a title.
Example:
Cancer<-read.csv("https://krkozak.github.io/MAT160/cancer.csv")
df_stats(~survival, data=Cancer, mean)
## mean_survival
## 1 558.625
df_stats(~survival, data=Cancer, range)
## range_survival_1 range_survival_2
## 1 20 3808
df_stats(~survival, data=Cancer, sd)
## sd_survival
## 1 776.4787
df_stats(~survival, data=Cancer, var)
## var_survival
## 1 602919.1
df_stats(~survival, data=Cancer, summary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 20 102.5 265.5 558.625 721 3808
df_stats(~survival, data=Cancer, mean, median, sd)
## mean_survival median_survival sd_survival
## 1 558.625 265.5 776.4787
gf_boxplot(~survival, data=Cancer, title="Survival time since chemotherapy")
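Two of the outputs above are related in ways worth checking. The range command returns the minimum and the maximum (20 and 3808), so the range as a single number is their difference, and the variance is the square of the standard deviation. Here is a small sketch in base R using the values printed above; the names low, high, and s are just for illustration.

```r
# Minimum and maximum survival times, copied from the range output above
low <- 20
high <- 3808
high - low   # the range as a single number

# Standard deviation, copied from the sd output above
s <- 776.4787
s^2          # about 602919, which agrees with the variance output up to rounding
```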
To filter based on the value of a variable, put that variable on the right side of the ~ in the formula. For example, if you want to find the mean and standard deviation of survival time for each type of cancer in the dataset, and create a boxplot for each type, follow this example:
df_stats(survival~organ, data=Cancer, mean, sd)
## organ mean_survival sd_survival
## 1 Breast 1395.9091 1238.9667
## 2 Bronchus 211.5882 209.8586
## 3 Colon 457.4118 427.1686
## 4 Ovary 884.3333 1098.5788
## 5 Stomach 286.0000 346.3096
gf_boxplot(survival~organ, data=Cancer, title="Survival time since chemotherapy")
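To see what a formula like survival~organ is doing, here is a rough sketch of the same group-by-group idea in base R, without df_stats. It uses R's built-in chickwts dataset (chick weight by feed type) instead of the Cancer data, so it is only an illustration and not the method used above. The names group_means and group_sds are made up for this sketch.

```r
# Built-in dataset: weight is quantitative, feed is qualitative
data(chickwts)

# Split weight by feed type, then apply mean and sd to each group
group_means <- tapply(chickwts$weight, chickwts$feed, mean)
group_sds <- tapply(chickwts$weight, chickwts$feed, sd)

# Put the results side by side, one row per feed type
data.frame(feed = names(group_means),
           mean_weight = as.vector(group_means),
           sd_weight = as.vector(group_sds))
```

Each row of the result plays the same role as a row in the df_stats output: one subgroup, with its own mean and standard deviation.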
To find a weighted average, use weighted.mean(x, p), where x and p are put in as variables and not as datasets.
Suppose you scored 95 on an assignment that is worth 15% of your grade, 83 on an assignment that is worth 25% of your grade, 76 on something worth 25%, and 84 on something worth 35%. What grade do you have in the class?
First, summarize the information. This would be:
grade: 95, 83, 76, 84
weight: 0.15, 0.25, 0.25, 0.35
Notice that the weights add to 1, and that 0.25 appears twice because you have to give a weight for each assignment. To put the data in as variables, call the grades x (you can call it whatever you want), and type in x<-c(95, 83, 76, 84) to tell R what the grades are. Do the same for the weights: call them p (or whatever you want) and write the percentages as decimals. The command for finding the weighted average in R for this example is
x<-c(95, 83, 76, 84)
p<-c(0.15, 0.25, 0.25, 0.35)
weighted.mean(x,p)
## [1] 83.4
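As a check on what weighted.mean is computing, the weighted average is each grade multiplied by its weight, added up, and divided by the sum of the weights. Here is a minimal sketch in base R; the name by_hand is only for illustration.

```r
x <- c(95, 83, 76, 84)          # grades
p <- c(0.15, 0.25, 0.25, 0.35)  # weights written as decimals

# Multiply each grade by its weight, add the products, divide by the total weight
by_hand <- sum(x * p) / sum(p)
by_hand

# Compare with the built-in command (all.equal allows for tiny rounding differences)
all.equal(by_hand, weighted.mean(x, p))
```

Because the weights here add to 1, dividing by sum(p) changes nothing, but it keeps the calculation correct if the weights are entered as whole percentages such as 15, 25, 25, 35.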