Descriptive Statistics with R

Descriptive statistics

To find descriptive statistics, use df_stats(~variable, data=Dataset, statistic) from the mosaic package, where statistic is the name of the function you want to apply, such as mean or sd.

As an example:

df_stats(~variable, data=Dataset, mean) –gives mean

df_stats(~variable, data=Dataset, median) –gives median

df_stats(~variable, data=Dataset, sd) –gives standard deviation

df_stats(~variable, data=Dataset, var) –gives variance

df_stats(~variable, data=Dataset, range) –gives range

df_stats(~variable, data=Dataset, summary) –gives the five-number summary and the mean

Use df_stats(~variable, data=Dataset, mean, median, sd, var) to find more than one descriptive statistic at the same time.

Example:

Using the dataset Cancer:

head(Cancer)

##   survival   organ
## 1      124 Stomach
## 2       42 Stomach
## 3       25 Stomach
## 4       45 Stomach
## 5      412 Stomach
## 6       51 Stomach

Find the mean:

df_stats(~survival, data=Cancer, mean)

##   response    mean
## 1 survival 558.625

Find the range:

df_stats(~survival, data=Cancer, range)

##   response range_1 range_2
## 1 survival      20    3808

Find the standard deviation:

df_stats(~survival, data=Cancer, sd)

##   response       sd
## 1 survival 776.4787

Find the variance:

df_stats(~survival, data=Cancer, var)

##   response      var
## 1 survival 602919.1
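
The variance and standard deviation reported above are linked: the variance is the square of the standard deviation. Here 776.4787 squared is about 602919.2, which matches the printed variance up to rounding of the displayed digits. A minimal base-R check of that relationship follows. It uses a small made-up vector x (hypothetical values, not the Cancer dataset):

```r
# Hypothetical sample values, for illustration only (not from Cancer)
x <- c(20, 45, 51, 124, 412)

s <- sd(x)    # sample standard deviation
v <- var(x)   # sample variance

# TRUE when the variance equals the squared standard deviation
# (compared with a small tolerance for floating-point rounding)
same <- isTRUE(all.equal(v, s^2))
same
```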

Find the five-number summary (summary also reports the mean):

df_stats(~survival, data=Cancer, summary)

##   response Min. 1st Qu. Median    Mean 3rd Qu. Max.
## 1 survival   20   102.5  265.5 558.625     721 3808
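
For readers who want to see where these numbers come from, the sketch below compares summary() with quantile() in base R. It assumes R's default quantile method, and x is a small made-up vector (hypothetical values, not the Cancer dataset):

```r
# Hypothetical sample values, for illustration only (not from Cancer)
x <- c(20, 45, 51, 124, 412, 721)

s <- summary(x)                              # Min, Q1, Median, Mean, Q3, Max
q <- quantile(x, c(0, 0.25, 0.5, 0.75, 1))   # the five-number part only

# Drop the mean (4th entry) from summary() and compare with quantile()
five <- as.numeric(s)[-4]
matches <- isTRUE(all.equal(five, as.numeric(q)))
matches
```

The only value summary() adds beyond the five quantiles is the mean, which is why the df_stats output has six columns of statistics.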

Finding several statistics at the same time:

df_stats(~survival, data=Cancer, mean, median, sd)

##   response    mean median       sd
## 1 survival 558.625  265.5 776.4787

Boxplot:

gf_boxplot(~variable, data=Dataset, title="type in what you want the title to be") –produces a vertical modified boxplot with a title.

gf_boxplot(~survival, data=Cancer, title="Survival time since chemotherapy", fill="blue", color="blue")

Boxplot of cancer data
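
A modified boxplot draws unusually large or small values as individual points. The sketch below assumes the usual rule of 1.5 times the interquartile range (the default in ggplot2's geom_boxplot, which gf_boxplot is built on) and uses the quartiles from the summary output above to work out the cutoffs:

```r
# Quartiles of survival, copied from the summary output above
q1 <- 102.5
q3 <- 721

iqr <- q3 - q1                   # interquartile range: 618.5
lower_fence <- q1 - 1.5 * iqr    # -825.25
upper_fence <- q3 + 1.5 * iqr    # 1648.75

c(lower_fence, upper_fence)
```

Under that assumption, survival times above 1648.75 (such as the maximum, 3808) would appear as separate points. No value can fall below the lower cutoff, since the minimum is 20.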

Faceting

If you wish to split your data by a categorical variable, called faceting, use this command: df_stats(quantitative variable ~ categorical variable, data=Dataset, mean). This gives the mean of the quantitative variable for each category. You can request multiple descriptive statistics for facets as well. You can also facet box plots by using quantitative variable ~ categorical variable in place of ~variable.

Example:

Suppose you want to find the mean and standard deviation of survival time for each type of cancer in the dataset, and create a boxplot for each cancer type. The commands are:

df_stats(survival~organ, data=Cancer, mean, sd)

##   response    organ      mean        sd
## 1 survival   Breast 1395.9091 1238.9667
## 2 survival Bronchus  211.5882  209.8586
## 3 survival    Colon  457.4118  427.1686
## 4 survival    Ovary  884.3333 1098.5788
## 5 survival  Stomach  286.0000  346.3096

gf_boxplot(survival~organ, data=Cancer, title="Survival time since chemotherapy", ylab="Time (months)", xlab="Cancer Type", fill=~organ, color=~organ)

Boxplot of cancer data faceted by organ
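
The grouped statistics can be cross-checked without df_stats. The following is a base-R sketch of the same idea, not the method used in this section. The data frame toy and its values are hypothetical and stand in for the real Cancer data:

```r
# Hypothetical miniature dataset, for illustration only (not the real Cancer data)
toy <- data.frame(
  survival = c(124, 42, 25, 1235, 24, 1581),
  organ    = c("Stomach", "Stomach", "Stomach", "Breast", "Breast", "Breast")
)

# Split survival by organ and apply each statistic to every group
group_mean <- tapply(toy$survival, toy$organ, mean)
group_sd   <- tapply(toy$survival, toy$organ, sd)

# One row per group, similar in spirit to the df_stats table
result <- data.frame(organ = names(group_mean),
                     mean  = as.numeric(group_mean),
                     sd    = as.numeric(group_sd))
result
```

Each row of result corresponds to one category of organ, just as each row of the df_stats output does.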

Weighted Average

To find a weighted average, use weighted.mean(x, p), where x is a vector of values and p is a vector of the corresponding weights. Both are entered as variables (vectors), not as datasets.

Example:

Suppose you scored 95 on homework that is worth 15% of your grade, 83 on test 1 that is worth 25% of your grade, 76 on test 2 worth 25%, and 84 on the final exam worth 35%. What grade do you have in the class? First, summarize the information in a table.

Assignment	Grade	Weight
homework	95	0.15
test 1	83	0.25
test 2	76	0.25
final exam	84	0.35

Notice that the weights add to 1, and that 0.25 appears twice because each assignment needs its own weight. To put the data in as variables, choose a name such as grade (you can call it whatever you want) and type grade<-c(95, 83, 76, 84) to tell R what the grades are. Do the same for the weights: call them weights (or whatever you prefer) and write the percentages as decimals. The commands for finding the weighted average in R for this example are

grade<-c(95, 83, 76, 84)
weights<-c(0.15, 0.25, 0.25, 0.35)
weighted.mean(grade,weights)

## [1] 83.4
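
To see what weighted.mean is doing, the same result can be computed by hand: multiply each grade by its weight, add the products, and divide by the total weight. A short sketch follows (the name manual is just an illustrative choice):

```r
grade   <- c(95, 83, 76, 84)
weights <- c(0.15, 0.25, 0.25, 0.35)

# Weighted average by hand: sum of grade * weight, divided by total weight
# 95*0.15 + 83*0.25 + 76*0.25 + 84*0.35 = 14.25 + 20.75 + 19 + 29.4 = 83.4
manual <- sum(grade * weights) / sum(weights)
manual

# TRUE when the hand calculation agrees with weighted.mean()
agrees <- isTRUE(all.equal(manual, weighted.mean(grade, weights)))
agrees
```

Because the weights here add to 1, dividing by sum(weights) changes nothing. It matters only when the weights are given in another form, such as points or percentages that do not sum to 1.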