Quickstart

Martin Barner

2019-01-30

Pretext

The hypegrammaR package implements the IMPACT quantitative data analysis guidelines. While this guide works on its own, all of this will make a lot more sense if you read those first.

What you need:

So you want to analyse some data. Cool. Get all your stuff as csv files first. "All your stuff" means:

1. The simplest case (simple random sampling, no labels, no skiplogic, no special handling of select_multiple questions)

2. For stratified/cluster sampling:

3. To analyse select_multiple, to add labels and to correctly analyse skiplogic:

Simplest Case

Let’s assume for now that we have nothing but the data from a simple random survey design (not weighted, no cluster sampling).

Preparation

Install the hypegrammaR package (this line you only have to run once when using hypegrammaR for the first time, or to update to a new version):

devtools::install_github("mabafaba/hypegrammaR",build_opts = c())

(the build_opts = c() makes sure the vignettes are built during installation, so the package includes the extra help pages & documentation)

Load the hypegrammaR package:

library(hypegrammaR)

Load the data:

mydata <- read.csv("../tests/testthat/data.csv") # replace with the path to your own csv file

Run the analysis

Identify your analysis parameters.

Now we can use the function analyse_indicator with the above as parameters:

result <- analyse_indicator(
                  data = mydata,
                  dependent.var = "nutrition_need",
                  independent.var = "region",
                  hypothesis.type = "group_difference",
                  independent.var.type = "categorical",
                  dependent.var.type = "numerical",
                  sampling.strategy.cluster = FALSE,
                  sampling.strategy.stratified = FALSE)

You can find out exactly what you can/should enter for these parameters by running ?analyse_indicator, which opens the help page for the function. This works for any function.
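Since analyse_indicator takes column names as strings, a typo in dependent.var or independent.var is an easy mistake. The sketch below checks, before running the analysis, that the names you plan to pass actually exist in the data and that the numerical variable was read as numbers. The inline csv text and its values are made up for illustration; with your own data you would skip the read.csv(text = ...) part and use your own mydata.

```r
# Hypothetical csv content standing in for your own file (values are made up)
csv_text <- "nutrition_need,region
0.21,east
0.35,north
0.18,east
0.30,north"
mydata <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The column names you intend to pass as dependent.var and independent.var
needed <- c("nutrition_need", "region")
missing_cols <- setdiff(needed, names(mydata))
if (length(missing_cols) > 0) {
  stop("columns not found in the data: ", paste(missing_cols, collapse = ", "))
}

# A numerical dependent variable should have been read as numbers, not as text
stopifnot(is.numeric(mydata$nutrition_need))
str(mydata)
```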

See the results

The analyse_indicator function gives you a number of things:

The log message

First, a message telling you how it went:

result$message
#> [1] "success (or unidentified issue)"

That’s what we want to see. If something went wrong, it should tell you here what happened.

Meta information

result$parameters
#> $dependent.var
#> [1] "nutrition_need"
#> 
#> $independent.var
#> [1] "region"
#> 
#> $dependent.var.type
#> [1] "numerical"
#> 
#> $independent.var.type
#> [1] "categorical"
#> 
#> $hypothesis.type
#> [1] "group_difference"
#> 
#> $sampling.strategy.cluster
#> [1] FALSE
#> 
#> $sampling.strategy.stratified
#> [1] FALSE
#> 
#> $case
#> [1] "CASE_group_difference_numerical_categorical"
#> attr(,"class")
#> [1] "analysis_case"

As you can see, it remembers what your input parameters were. It also adds a standardised name for the analysis case.
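Judging from this one example, the case name appears to be built from the hypothesis type and the two variable types. The sketch below reproduces that pattern with paste(); this is an assumption inferred from the printed output above, not taken from the package source.

```r
# Parameter values as printed above
hypothesis.type <- "group_difference"
dependent.var.type <- "numerical"
independent.var.type <- "categorical"

# Assumed naming pattern, inferred from a single printed example
case <- paste("CASE", hypothesis.type, dependent.var.type, independent.var.type, sep = "_")
print(case)
```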

The summary statistic

result$summary.statistic
#>                dependent.var.value independent.var.value   numbers
#> capitalcentral                  NA        capitalcentral 0.2724824
#> east                            NA                  east 0.2006114
#> north                           NA                 north 0.2956114
#> northeast                       NA             northeast 0.2002275
#> south                           NA                 south 0.1142410
#> southeast                       NA             southeast 0.2470808
#> west                            NA                  west 0.2521008
#>                         se       min       max
#> capitalcentral 0.006409252 0.2599205 0.2850443
#> east           0.007828254 0.1852683 0.2159545
#> north          0.007581281 0.2807523 0.3104704
#> northeast      0.007792918 0.1849537 0.2155014
#> south          0.005627861 0.1032106 0.1252714
#> southeast      0.009321706 0.2288106 0.2653510
#> west           0.008486603 0.2354674 0.2687343

In this case, “numbers” are averages, because the dependent variable was numerical. min and max are the lower and upper bounds of the corresponding 95% confidence interval. dependent.var.value gives the corresponding variable values if the dependent variable is categorical (NA otherwise). The summary statistic is always organised with exactly these columns, no matter what analysis you did. This is so that if you add a new visualisation or output format, it will work for any output from this function.
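The printed min and max are consistent with a normal-approximation 95% interval, numbers ± 1.96 × se. The base R sketch below recomputes the interval from the first two rows printed above; it only checks the arithmetic and is not necessarily how hypegrammaR computes the interval internally.

```r
# Values copied from the summary statistic printed above (first two rows)
numbers <- c(capitalcentral = 0.2724824, east = 0.2006114)
se      <- c(capitalcentral = 0.006409252, east = 0.007828254)

# Two-sided 95% interval: critical value of about 1.96
z <- qnorm(0.975)
lower <- numbers - z * se
upper <- numbers + z * se

# Compare with the min and max columns above
print(round(cbind(lower, upper), 7))
```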

The hypothesis test

Next, there’s information on which (if any) hypothesis test was used and the p value:

result$hypothesis.test
#> $result
#> $result$t
#> [1] -7.103762
#> 
#> $result$p.value
#> [1] 1.25167e-12
#> 
#> 
#> $parameters
#> $parameters$df
#> [1] 21655
#> 
#> 
#> $name
#> [1] "two sample ttest on difference in means (two sided)"

You’ll probably be most interested in the p-value and the type of test that was used.
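The p-value sits two levels deep in the returned list. The sketch below builds a stand-in list with the same structure and values as the output printed above (with a real analysis you would use result$hypothesis.test directly), pulls out the p-value and compares it to a significance level. It also recomputes the two-sided p-value from t and df with pt(), assuming the usual two-sided t-test formula; the 0.05 threshold is an illustrative choice, not something the package sets.

```r
# Hypothetical stand-in mirroring the structure printed above;
# with a real analysis, use result$hypothesis.test instead
hypothesis.test <- list(
  result = list(t = -7.103762, p.value = 1.25167e-12),
  parameters = list(df = 21655),
  name = "two sample ttest on difference in means (two sided)"
)

t_stat   <- hypothesis.test$result$t
deg_free <- hypothesis.test$parameters$df
p        <- hypothesis.test$result$p.value

# Two-sided p-value from the t statistic and degrees of freedom
p_check <- 2 * pt(-abs(t_stat), deg_free)

# Significance level, chosen before looking at the results
alpha <- 0.05
if (p < alpha) {
  message(hypothesis.test$name, ": significant at the 5% level (p = ", signif(p, 3), ")")
} else {
  message(hypothesis.test$name, ": not significant at the 5% level (p = ", signif(p, 3), ")")
}
```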

The visualisation

Finally, you can get a plot function that is appropriate for visualising the type of analysis that was performed:

labels_summary_statistic<-function(x){x}
visualise<-map_to_visualisation(result$parameters$case)

myvisualisation<-visualise(result)

myvisualisation

For advanced users (who know ggplot2): the visualisation function returns a ggplot object, so you can add or overwrite ggplot layers; for example:

library(ggplot2) # coord_polar() comes from ggplot2
myvisualisation + coord_polar()
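Because the summary statistic always has the same columns, you can also build your own chart from it. The sketch below is a hand-written ggplot2 bar chart with confidence-interval error bars, using the summary statistic printed above rounded to three decimals; it is not the chart that map_to_visualisation produces, just one way to plot numbers, min and max yourself.

```r
library(ggplot2)

# Summary statistic from the output above, rounded to 3 decimals
summary.statistic <- data.frame(
  independent.var.value = c("capitalcentral", "east", "north", "northeast",
                            "south", "southeast", "west"),
  numbers = c(0.272, 0.201, 0.296, 0.200, 0.114, 0.247, 0.252),
  min     = c(0.260, 0.185, 0.281, 0.185, 0.103, 0.229, 0.235),
  max     = c(0.285, 0.216, 0.310, 0.216, 0.125, 0.265, 0.269)
)

# One bar per region, with the confidence interval as error bars
bar_chart <- ggplot(summary.statistic, aes(x = independent.var.value, y = numbers)) +
  geom_col() +
  geom_errorbar(aes(ymin = min, ymax = max), width = 0.2) +
  labs(x = "region", y = "mean nutrition_need")

print(bar_chart)
```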

Save/export the results

To save/export any results, you can use the generic map_to_file function. For example:

map_to_file(result$summary.statistic, "summary_stat.csv")
map_to_file(myvisualisation, "barchart.jpg", width = 5, height = 5)

You will find the files in your current working directory (you can check which folder that is with getwd()).
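The summary statistic is a plain data frame, so base R's write.csv works as well if you prefer to control the file options yourself. The sketch below writes a two-row stand-in for result$summary.statistic (values copied from the output above) to a temporary folder and reads it back; the temporary path is only there so the example does not clutter your working directory.

```r
# Hypothetical stand-in for result$summary.statistic (first two rows from above)
summary.statistic <- data.frame(
  independent.var.value = c("capitalcentral", "east"),
  numbers = c(0.2724824, 0.2006114),
  se      = c(0.006409252, 0.007828254)
)

# Write without row names, then read the file back to check it
out <- file.path(tempdir(), "summary_stat.csv")
write.csv(summary.statistic, out, row.names = FALSE)
back <- read.csv(out)
print(back)
```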

Cluster/Stratified Samples

Using the Questionnaire