Task One

Structure of the browsers data frame

## 'data.frame':    5 obs. of  2 variables:
##  $ names      : Factor w/ 5 levels "Chrome","Firefox",..: 4 5 2 3 1
##  $ proportions: num  1 9 20 26 44

This shows the structure of the data. There are two variables, “names” and “proportions”, and five observations.

Charts for the data

This pie chart plots the browser names against the proportion of users for each browser.

The bar graph plots the browser names against the proportion of users.
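The two charts could be produced along the following lines. This is a minimal sketch in base R, not the code used for the report. The labels “Chrome” and “Firefox” come from the structure output above; the other three labels are hypothetical placeholders, because the output truncates them. They are chosen only so that the factor codes match the 4 5 2 3 1 shown by str().

```r
# "Chrome" and "Firefox" are from the str() output; "Other A/B/C" are
# hypothetical placeholders for the three truncated factor levels.
browsers <- data.frame(
  names       = factor(c("Other B", "Other C", "Firefox", "Other A", "Chrome")),
  proportions = c(1, 9, 20, 26, 44)
)

str(browsers)  # two variables, five observations

# Pie chart: one segment per browser, sized by its proportion of users.
pie(browsers$proportions,
    labels = as.character(browsers$names),
    main   = "Proportion of users by browser")

# Bar graph: one bar per browser, which makes the values easier to compare.
barplot(browsers$proportions,
        names.arg = as.character(browsers$names),
        xlab      = "Browser",
        ylab      = "Proportion of users",
        main      = "Proportion of users by browser")
```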

Discussion

A pie chart is best used to represent parts of a whole, as it cannot show changes over time. Bar graphs can be used to represent differences between groups of data or to track data over a period of time. Visually, the bar graph is the better representation of the data above, as it shows the proportion of users for each browser in relation to the proportions for the other browsers. The pie chart, by contrast, makes it difficult to see some of the segments in relation to the rest of the data.

When presenting data with few data points, like the data above, a simple table may be a better way to represent the data. However, when there are many data points, the data can become difficult to read in a table. In that case a bar graph (or another type of chart) would be a better choice, as it makes it easier to see quickly which values are greater or smaller than the rest.

Task Two

## 'data.frame':    8 obs. of  2 variables:
##  $ Treatment: num  32.1 34.9 25.5 33.6 28 ...
##  $ Control  : num  45 28.3 36.9 34.9 47.5 ...

This shows that there are two variables, “Treatment” and “Control”, with 8 observations.

##    Treatment        Control     
##  Min.   :23.72   Min.   :28.26  
##  1st Qu.:26.02   1st Qu.:35.02  
##  Median :30.08   Median :36.41  
##  Mean   :30.06   Mean   :37.94  
##  3rd Qu.:33.92   3rd Qu.:41.31  
##  Max.   :36.38   Max.   :47.46

Plotting the data

Task Three

## 'data.frame':    8 obs. of  2 variables:
##  $ Treatment: num  32.1 34.9 25.5 33.6 28 ...
##  $ Control  : num  45 28.3 36.9 34.9 47.5 ...

This is the structure of the datOL data. It shows that there are two variables, “Treatment” and “Control”, and 8 observations. This data set also contains outliers that may have been recorded.

These boxplots and strip charts show the spread of both data sets, one with and one without outliers.
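Plots of this kind could be drawn in base R roughly as follows. This is a sketch under stated assumptions, not the original code. It uses only the first five observations, as printed in the structure output; the remaining three are truncated there and are left out instead of being guessed. The outlier version is hypothetical: it adds a single Control reading of 8000, and where the outliers sit in the real datOL data is not known from the output.

```r
# First five observations as printed by str(); the last three are
# truncated in the output and deliberately omitted.
dat <- data.frame(
  Treatment = c(32.1, 34.9, 25.5, 33.6, 28.0),
  Control   = c(45.0, 28.3, 36.9, 34.9, 47.5)
)

# Hypothetical outlier version: one extra Control reading of 8000.
groups_ol <- list(
  Treatment = dat$Treatment,
  Control   = c(dat$Control, 8000)
)

op <- par(mfrow = c(2, 2))  # 2 x 2 grid: boxplot and strip chart per data set

boxplot(dat, main = "Without outlier", ylab = "Value")
stripchart(dat, vertical = TRUE, method = "jitter", pch = 16,
           main = "Without outlier", ylab = "Value")

boxplot(groups_ol, main = "With outlier", ylab = "Value")
stripchart(groups_ol, vertical = TRUE, method = "jitter", pch = 16,
           main = "With outlier", ylab = "Value")

par(op)  # restore the previous plot layout
```

With a value as extreme as 8000 on the same axis, the remaining points are squeezed together near the bottom of the plot, which is itself a sign that the outlier dominates the scale.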

Robustness of statistics

##    Treatment        Control       
##  Min.   :23.72   Min.   :  28.26  
##  1st Qu.:26.02   1st Qu.:  35.65  
##  Median :30.08   Median :  40.94  
##  Mean   :30.06   Mean   :1528.55  
##  3rd Qu.:33.92   3rd Qu.:1035.59  
##  Max.   :36.38   Max.   :8000.00

The term robust in statistics refers to the strength of statistical models, tests and procedures under the specific conditions of the analysis. Certain statistics are also considered robust. For example, the median and interquartile range are robust, whereas the mean and the standard deviation are not. In this case the mean, maximum and third quartile of Control are strongly affected by the outliers: the mean rises from 37.94 to 1528.55, the third quartile from 41.31 to 1035.59 and the maximum from 47.46 to 8000. This could be due to there being only a few data points. The median (36.41 to 40.94) and first quartile (35.02 to 35.65), however, change only slightly.
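The difference between robust and non-robust statistics can be illustrated with a small example. The sketch below uses hypothetical readings, not the assignment data, and the helper name spread_summary is made up for illustration. It replaces one value with an outlier and compares the mean, standard deviation, median and interquartile range before and after.

```r
# Hypothetical readings, not the assignment data.
x     <- c(30, 32, 34, 36, 38)
x_out <- c(30, 32, 34, 36, 8000)  # last reading replaced by an outlier

# Hypothetical helper: non-robust (mean, sd) and robust (median, IQR)
# statistics side by side.
spread_summary <- function(v) {
  c(mean = mean(v), sd = sd(v), median = median(v), IQR = IQR(v))
}

# One row per data set, rounded for display.
round(rbind(clean = spread_summary(x), outlier = spread_summary(x_out)), 2)
```

In this example the median stays at 34 and the interquartile range at 4, while the mean rises from 34 to 1626.4 and the standard deviation increases sharply.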

Unit 6: Assignment one

Michelle Pantling

22 November 2017