To find descriptive statistics, use df_stats(~variable, data=Dataset, statistic of interest). The df_stats() function comes from the mosaic package, so run library(mosaic) first.
For example:
df_stats(~variable, data=Dataset, mean) gives the mean
df_stats(~variable, data=Dataset, median) gives the median
df_stats(~variable, data=Dataset, sd) gives the standard deviation
df_stats(~variable, data=Dataset, var) gives the variance
df_stats(~variable, data=Dataset, range) gives the range (reported as the minimum and the maximum)
df_stats(~variable, data=Dataset, summary) gives the five-number summary and the mean
df_stats(~variable, data=Dataset, mean, median, sd, var) finds more than one descriptive statistic at the same time.
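The statistics passed to df_stats() are ordinary R functions (mean, median, sd, var, range, summary), so one way to see what each computes is to call it directly on a plain vector. This is a minimal sketch using a small made-up vector x (hypothetical data, not from any dataset in these notes) and base R only.

```r
# Hypothetical toy data, not from any dataset in these notes
x <- c(2, 4, 4, 5, 7, 9, 11)

mean(x)     # arithmetic mean: 6
median(x)   # middle value of the sorted data: 5
sd(x)       # sample standard deviation (divides by n - 1)
var(x)      # sample variance, the square of sd(x): 10
range(x)    # minimum and maximum, returned as two numbers: 2 11
summary(x)  # minimum, quartiles, median, mean, maximum
```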
Using the dataset Cancer:
head(Cancer)
## survival organ
## 1 124 Stomach
## 2 42 Stomach
## 3 25 Stomach
## 4 45 Stomach
## 5 412 Stomach
## 6 51 Stomach
Find the mean:
df_stats(~survival, data=Cancer, mean)
## response mean
## 1 survival 558.625
Find the range:
df_stats(~survival, data=Cancer, range)
## response range_1 range_2
## 1 survival 20 3808
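Here range_1 is the minimum and range_2 is the maximum. If you want the range as a single number (maximum minus minimum), a quick sketch using the two values from the output above:

```r
# Minimum and maximum survival reported by df_stats(~survival, data=Cancer, range)
survival_range <- c(20, 3808)

# The range as a single number is maximum minus minimum
diff(survival_range)   # 3788
```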
Find the standard deviation:
df_stats(~survival, data=Cancer, sd)
## response sd
## 1 survival 776.4787
Find the variance:
df_stats(~survival, data=Cancer, var)
## response var
## 1 survival 602919.1
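The variance is the square of the standard deviation, which gives a quick consistency check between the last two outputs. This sketch squares the rounded sd printed above, so the result only matches the var output up to rounding (approximately 602919).

```r
# Standard deviation of survival as printed by df_stats(~survival, data=Cancer, sd)
survival_sd <- 776.4787

# Variance = standard deviation squared (small difference comes from rounding the sd)
survival_sd^2
```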
Find the five-number summary (summary also reports the mean):
df_stats(~survival, data=Cancer, summary)
## response Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 survival 20 102.5 265.5 558.625 721 3808
Finding several statistics at the same time:
df_stats(~survival, data=Cancer, mean, median, sd)
## response mean median sd
## 1 survival 558.625 265.5 776.4787
gf_boxplot(~variable, data=Dataset, title="type in what you want the title to be") produces a modified boxplot (outliers are drawn as separate points) with a title.
gf_boxplot(~survival, data=Cancer, title="Survival time since chemotherapy", fill="blue", color="blue")
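In a modified boxplot, points more than 1.5 times the interquartile range (IQR) beyond the quartiles are plotted individually as outliers; this is the default whisker rule of ggplot2's boxplot, which gf_boxplot() builds on. As a rough sketch, the outlier fences for survival can be computed from the quartiles in the five-number summary output above.

```r
# Quartiles taken from the df_stats(~survival, data=Cancer, summary) output
q1 <- 102.5
q3 <- 721

iqr <- q3 - q1                 # interquartile range: 618.5
lower_fence <- q1 - 1.5 * iqr  # -825.25
upper_fence <- q3 + 1.5 * iqr  # 1648.75

c(iqr = iqr, lower = lower_fence, upper = upper_fence)
```

Since the lower fence is negative and the smallest survival time is 20, no low outliers are possible; survival times above about 1649 are drawn as individual points.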
To split your data into groups by a categorical variable (here called faceting), put the quantitative variable on the left of the ~ and the categorical variable on the right: df_stats(quantitative variable ~ categorical variable, data=Dataset, mean). This gives the mean for each group in the dataset, and you can request several descriptive statistics per group in the same way. Boxplots can be split by group too: use quantitative variable ~ categorical variable in place of ~variable in gf_boxplot().
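To make the idea of per-group statistics concrete, here is a minimal sketch on a tiny hypothetical data frame (toy, time and group are made-up names). It uses base R's tapply() rather than df_stats(), so it is only an analogue of the mosaic command: split the quantitative variable by the categorical one, then apply the statistic within each group.

```r
# Hypothetical toy data: a quantitative variable (time) and a categorical one (group)
toy <- data.frame(
  time  = c(10, 20, 30, 5, 15),
  group = c("A", "A", "A", "B", "B")
)

# Base R analogue of df_stats(time ~ group, data = toy, mean):
# split time by group, then take the mean within each group
res <- tapply(toy$time, toy$group, mean)
res   # A = 20, B = 10
```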
Suppose you want to find the mean and standard deviation of survival time for each type of cancer in the dataset, and create a boxplot for each cancer type. The process is:
df_stats(survival~organ, data=Cancer, mean, sd)
## response organ mean sd
## 1 survival Breast 1395.9091 1238.9667
## 2 survival Bronchus 211.5882 209.8586
## 3 survival Colon 457.4118 427.1686
## 4 survival Ovary 884.3333 1098.5788
## 5 survival Stomach 286.0000 346.3096
gf_boxplot(survival~organ, data=Cancer, title="Survival time since chemotherapy", ylab="Time (days)", xlab="Cancer Type", fill=~organ, color=~organ)
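To find a weighted average, use weighted.mean(x, p), where x is a vector of values and p is a vector of the matching weights. Both are entered as variables (vectors) rather than as columns of a dataset, so there is no data= argument.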
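For reference, the weighted average multiplies each value x_i by its weight p_i, adds these products, and divides by the total weight:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} p_i x_i}{\sum_{i=1}^{n} p_i}$$

When the weights add to 1, the denominator is 1 and the weighted average is simply the sum of the products.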
Suppose you scored 95 on homework that is worth 15% of your grade, 83 on test 1 worth 25%, 76 on test 2 worth 25%, and 84 on the final exam worth 35%. What grade do you have in the class? First, summarize the information in a table:
| Assignment | Grade | Weight |
|---|---|---|
| homework | 95 | 0.15 |
| test 1 | 83 | 0.25 |
| test 2 | 76 | 0.25 |
| final exam | 84 | 0.35 |
Notice that the weights add to 1, and that 0.25 appears twice because each assignment needs its own weight. To enter the data as variables, name the grades grade (you can call it whatever you want) and type grade<-c(95, 83, 76, 84) to tell R what the grades are. Do the same for the weights: call them weights (or whatever you prefer), write the percentages as decimals, and list them in the same order as the grades. The command for finding the weighted average in R for this example is:
grade<-c(95, 83, 76, 84)
weights<-c(0.15, 0.25, 0.25, 0.35)
weighted.mean(grade,weights)
## [1] 83.4
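To check what weighted.mean() is doing, the same answer can be computed by hand: multiply each grade by its weight, add the products, and divide by the sum of the weights. A minimal sketch with the grades and weights from the table:

```r
grade   <- c(95, 83, 76, 84)
weights <- c(0.15, 0.25, 0.25, 0.35)

# Products: 14.25 + 20.75 + 19 + 29.4, divided by the total weight of 1
by_hand <- sum(grade * weights) / sum(weights)
by_hand   # 83.4, the same as weighted.mean(grade, weights)
```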