Descriptive Statistics with R

To find descriptive statistics, use df_stats(~variable, data=Dataset, statistic), where statistic is the name of the statistic of interest.

As an example:
df_stats(~variable, data=Dataset, mean) gives the mean
df_stats(~variable, data=Dataset, median) gives the median
df_stats(~variable, data=Dataset, sd) gives the standard deviation
df_stats(~variable, data=Dataset, var) gives the variance
df_stats(~variable, data=Dataset, range) gives the range (the smallest and largest values)
df_stats(~variable, data=Dataset, summary) gives the 5 number summary and the mean

The advantages of this method are that you can:
1. Find more than one descriptive statistic at the same time: df_stats(~variable, data=Dataset, mean, median, sd, var)
2. Find a statistic for each subgroup in the dataset: df_stats(quantitative variable ~ qualitative variable, data=Dataset, mean) gives the mean for each subgroup.

To draw a boxplot, use gf_boxplot(~variable, data=Dataset, title="type in what you want the title to be"), which produces a modified boxplot with a title.

Example:

library(mosaic) # loads df_stats() and gf_boxplot()
Cancer<-read.csv("https://krkozak.github.io/MAT160/cancer.csv")
df_stats(~survival, data=Cancer, mean) 
##   mean_survival
## 1       558.625
df_stats(~survival, data=Cancer, range) 
##   range_survival_1 range_survival_2
## 1               20             3808
df_stats(~survival, data=Cancer, sd)
##   sd_survival
## 1    776.4787
df_stats(~survival, data=Cancer, var)
##   var_survival
## 1     602919.1
df_stats(~survival, data=Cancer, summary)
##   Min. 1st Qu. Median    Mean 3rd Qu. Max.
## 1   20   102.5  265.5 558.625     721 3808
df_stats(~survival, data=Cancer, mean, median, sd)
##   mean_survival median_survival sd_survival
## 1       558.625           265.5    776.4787
gf_boxplot(~survival, data=Cancer, title="Survival time since chemotherapy")
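
The statistic names passed to df_stats() above (mean, median, sd, var, range, summary) are ordinary R functions, so they can also be applied directly to a variable. The sketch below uses a small made-up vector v, which is an assumption for illustration and is not the Cancer data.

```r
# Hypothetical toy values, not taken from the Cancer dataset
v <- c(2, 4, 4, 6, 9)

mean(v)     # arithmetic mean
median(v)   # middle value of the sorted data
sd(v)       # sample standard deviation
var(v)      # sample variance, equal to sd(v)^2
range(v)    # smallest and largest values
summary(v)  # 5 number summary plus the mean
```

This can be handy as a quick check on a single column, for example mean(Cancer$survival), while df_stats() remains the more convenient choice when several statistics or subgroups are needed.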

Filtering

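To compute statistics separately for each value of a variable, put the quantitative variable to the left of the ~ and the qualitative variable to the right. For example, if you want to find the mean and standard deviation of survival time for each type of cancer in the dataset, and create a boxplot for each type, follow the example: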

df_stats(survival~organ, data=Cancer, mean, sd)
##      organ mean_survival sd_survival
## 1   Breast     1395.9091   1238.9667
## 2 Bronchus      211.5882    209.8586
## 3    Colon      457.4118    427.1686
## 4    Ovary      884.3333   1098.5788
## 5  Stomach      286.0000    346.3096
gf_boxplot(survival~organ, data=Cancer, title="Survival time since chemotherapy")
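
To see what the formula survival~organ is doing, the same kind of grouping can be sketched in base R with tapply(). The data frame toy and its columns time and group are made up for illustration and are not part of the Cancer dataset.

```r
# Hypothetical toy data (not the Cancer dataset): a quantitative
# variable 'time' and a qualitative variable 'group'
toy <- data.frame(
  time  = c(10, 20, 30, 5, 15),
  group = c("A", "A", "A", "B", "B")
)

# Apply a statistic to 'time' separately within each level of 'group'
group_means <- tapply(toy$time, toy$group, mean)
group_sds   <- tapply(toy$time, toy$group, sd)

print(group_means)
print(group_sds)
```

Each group gets its own mean and standard deviation, which is the same idea as the one-row-per-organ table that df_stats(survival~organ, data=Cancer, mean, sd) returns.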

Weighted Average

To find a weighted average, use weighted.mean(x, p), where x and p are entered as variables and not as datasets.

Suppose you scored 95 on an assignment that is worth 15% of your grade, 83 on an assignment that is worth 25% of your grade, 76 on something worth 25%, and 84 on something worth 35%. What grade do you have in the class?

First, summarize the information:
grade: 95, 83, 76, 84
weight: 0.15, 0.25, 0.25, 0.35

Notice that the weights add to 1, and that 0.25 appears twice because you have to give a weight for each assignment. To put the data in as variables, call the grades x (you can call it whatever you want), and type in x<-c(95, 83, 76, 84) to tell R what the grades are. Do the same for the weights: call them p (or whatever you want) and write the percentages as decimals. The commands for finding the weighted average in R for this example are

x<-c(95, 83, 76, 84)
p<-c(0.15, 0.25, 0.25, 0.35)
weighted.mean(x,p)
## [1] 83.4
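
As a check on the result above, the weighted average can also be worked out by hand: multiply each grade by its weight, add the products, and divide by the sum of the weights. The name by_hand is introduced here only for illustration.

```r
x <- c(95, 83, 76, 84)          # grades from the example above
p <- c(0.15, 0.25, 0.25, 0.35)  # weights from the example above

# The weights should add to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

# 95(0.15) + 83(0.25) + 76(0.25) + 84(0.35)
#   = 14.25 + 20.75 + 19 + 29.4 = 83.4
by_hand <- sum(x * p) / sum(p)
by_hand

# Compare with the built-in function (all.equal allows for rounding error)
isTRUE(all.equal(by_hand, weighted.mean(x, p)))
```

Dividing by sum(p) makes no difference here because the weights add to 1, but it keeps the calculation correct if the weights are entered as percentages such as 15, 25, 25, 35.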