Setting Your Working Directory

Remember, your files and your script need to be in the same folder location. Additionally, R has to know where to find those files. If you are not sure that your files are in the same location that you are working in, you can check by doing:

getwd()

This will print out your current working directory. If the folder it shows you is NOT the folder where your script and data files are stored, you will need to set the working directory to the correct folder. To do this follow these steps:

Go to “Session”
Select “Set Working Directory”
Select “Choose Directory”
Choose the folder that holds your script and data files
Rerun the code getwd() to ensure you are in the correct working directory now.
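If you prefer to set the working directory with code rather than through the menus, base R provides setwd(). The sketch below is only an illustration: it uses R's temporary folder as a stand-in, because the real path to your folder is not known here.

```r
# Remember where we started so we can return afterwards
old_wd <- getwd()

# tempdir() is a stand-in for your own folder; in practice you would write
# something like setwd("path/to/your/folder") (a hypothetical path)
setwd(tempdir())

# Confirm the change and list the files R can now see in that folder
getwd()
list.files()

# Return to the original working directory
setwd(old_wd)
```

If list.files() does not show your script and data files, R is still looking in the wrong folder.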

Installing and Loading Packages

You will need to install a package the very first time you use it, and reinstall your packages whenever you get a new computer or update R. The general code for installing a package is install.packages(). An example can be seen below. Notice that you do need quotation marks around the package name at this stage.

Please note that you will need to change the text inside the quotation marks to list the proper package name.

                  #Insert package name here, inside the quotation marks.
install.packages("ggplot2")

After you install a package, or restart R, you will need to load the library. Note that this must be done each time you open R, not just after installing the package. This is done with the general code library(). Notice you do not need the quotation marks around the package name at this stage.

        #Insert package name here.
library(ggplot2)
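One optional pattern, shown here as a sketch rather than a required step, is to install a package only when it is missing and then load it. The base package "stats" is used as a stand-in so the example runs without downloading anything; swap in the package you actually need.

```r
pkg <- "stats"  # stand-in package name; replace with e.g. "ggplot2"

# requireNamespace() returns FALSE when the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() accept a name stored in a variable
library(pkg, character.only = TRUE)

# Check that the package is now attached (TRUE means it is loaded)
paste0("package:", pkg) %in% search()
```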

If you forget to load the library of a package before you run a command that requires that package, you will get an error similar to this one: “Error in ggplot(): could not find function "ggplot"”.

When you get an error that states it cannot find the function, you want to check two things: (1) you loaded the library for the function and (2) there are no typos in the function name. If you are not sure which package holds the function, you can find out by using the command

# type a question mark followed by the function command you want to check
 ?ggplot

This will print the help page for the function. At the very top of the page you will see the function and package written as “function {package name}”. This page also includes helpful information on how to format the code for the function and examples of how to write it.
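Another quick check, offered as a supplement to the help page, is the base R function find(), which reports where a function name is found among the packages that are currently loaded. The function names below are chosen only for illustration.

```r
# median() lives in the "stats" package, which R loads automatically
find("median")

# A name that is not available in any loaded package returns an empty result,
# which hints that a library has not been loaded or the name has a typo
find("not_a_real_function_name")
```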

Formatting and Reading in Data

We often give you data that is in “wide” format. R handles data that is in “long” format more efficiently for the analyses that you do.

For more information on wide vs long data, check this out
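To make the difference concrete, here is a small sketch with made-up numbers (the column names Control and Treatment are hypothetical). It uses the base R function stack() to turn a wide table into a long one, so you can see what each layout looks like.

```r
# Wide format: one measured variable spread across two columns
wide <- data.frame(Control   = c(0.08, 0.10, 0.53),
                   Treatment = c(0.25, 0.41, 0.66))
wide

# stack() returns long format: the first column holds the measurements and
# the second holds the name of the column each measurement came from
long <- stack(wide)
names(long) <- c("Measurement", "Group")
long
```

In the long version, every row is one observation, and the grouping information sits in its own column.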

Make sure your data is in long format before you try to read it into R. This means that each column of your dataset should contain a single variable, and one variable should not be spread out over multiple columns. If the data is in the proper format, we use this code to read in the data: Rdataname=read.csv("filename.csv", header=T). An example is shown below.

#You can name the data anything you'd like. But you are going to reference it many times as you move forward. 
#So it should be easy to type and intuitive. In this case, I named mine "ExDat" because it is the example dataset. 
      #We use "read.csv" because it is a CSV file type. There are other options, but this is the one we use. 
ExDat=read.csv("ExDataFile.csv", header =T)
                #This file name must match EXACTLY as it appears in your folder. 
                                # header=T tells R that the very first row of your excel CSV file actually is your variable names, and not a data point.

Some common missteps here are misusing the header=T part of the code or forgetting the “.csv” at the end of the file name. So watch out for those potential issues. If your code has run properly, you should see the dataframe listed in your environment (top right hand corner of the screen), and the following code should let you preview your data:

#please note that you will have to change "ExDat" to whatever you named your specific dataframe
head(ExDat)

##   Categorical Subcategory Numerical1 Numerical2 Integer
## 1           A           1       0.08       0.25       1
## 2           B           2       0.10       0.41       2
## 3           C           1       0.10       0.66       1
## 4           D           2       0.53       0.04       4
## 5           A           1       0.00       0.69       2
## 6           B           1       0.54       0.73       4
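The sketch below walks through the same steps on a tiny made-up file so you can run it without any data of your own. The file name, column names, and values are invented for illustration.

```r
# Write a small CSV file to R's temporary folder (stand-in for your data file)
csv_path <- file.path(tempdir(), "DemoFile.csv")
writeLines(c("Group,Value", "A,0.08", "B,0.10", "C,0.53"), csv_path)

# file.exists() is a quick way to confirm the name and location are right
file.exists(csv_path)

# header = TRUE treats the first row as variable names
demo <- read.csv(csv_path, header = TRUE)
head(demo)

# With header = FALSE the variable names are read as a row of data,
# so the dataframe ends up with one extra row
wrong <- read.csv(csv_path, header = FALSE)
nrow(demo)
nrow(wrong)
```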

Sometimes the way that we enter data in Excel can be misleading for R. Remember, R is simply a computer program. It interprets the data and code exactly as it is written, sometimes to a fault. So before you move on, you want to check that each column of the data is in the proper format. For example, certain statistical tests and graphs require either categorical or continuous data. So you want to be sure that your data is being interpreted correctly. In order to check the “class” assigned to each variable, you can use the code str(), which prints the “structure” of the dataframe.

#please note that you will have to change "ExDat" to whatever you named your specific dataframe
str(ExDat)

## 'data.frame':    41 obs. of  5 variables:
##  $ Categorical: chr  "A" "B" "C" "D" ...
##  $ Subcategory: int  1 2 1 2 1 1 2 1 2 1 ...
##  $ Numerical1 : num  0.08 0.1 0.1 0.53 0 0.54 0.6 0.23 0.4 0.67 ...
##  $ Numerical2 : num  0.25 0.41 0.66 0.04 0.69 0.73 0.73 0.87 0.38 0.72 ...
##  $ Integer    : int  1 2 1 4 2 4 5 2 5 3 ...

At this point, the Subcategory column is being read as an “int” or integer. We want it to be a categorical variable. In order to change from one class to another, we can use the general code format of DFname$Variable=as.class(DFname$Variable) where you enter your dataframe name as the “DFname”, the variable you want to change as the “Variable”, and you put the type of data you want it to be classified as in the “as.class” portion. An example can be seen below:

#I want to change the variables "Categorical" and "Subcategory" in the dataframe "ExDat" to factor (categorical) variables. This code tells it to look at that variable, change it to a factor, and overwrite the values in the original dataframe with the new factor based values. 
ExDat$Categorical=as.factor(ExDat$Categorical)
ExDat$Subcategory=as.factor(ExDat$Subcategory)

If you want to go to/from other data types, you can start to type as. and the list of class options will pop up for you to choose from. If you are not sure which one to use, you can always double check by using the ? before the function to read the help summary. Some commonly used options are:

as.factor()
as.numeric()
as.integer()
as.character()
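Here is a short sketch of these conversions on made-up values. It also shows one pitfall worth knowing about: converting a factor straight to numeric returns the internal level codes, not the original numbers, so the usual workaround is to go through as.character() first.

```r
# Made-up values for illustration
plot_id <- c(1L, 2L, 1L, 2L)

# Integer to factor: the numbers become category labels
plot_factor <- as.factor(plot_id)
class(plot_factor)
levels(plot_factor)

# Pitfall: a factor whose labels look like numbers
dose <- factor(c("10", "20", "10"))

# This returns the level codes (1, 2, 1), not the doses
as.numeric(dose)

# Converting to character first recovers the original numbers (10, 20, 10)
as.numeric(as.character(dose))
```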

For more information on R variable types and how to choose the right one, check this out

After recategorizing a variable, you want to double check that it worked by looking at the structure of your data again.

#please note that you will have to change "ExDat" to whatever you named your specific dataframe
str(ExDat)

## 'data.frame':    41 obs. of  5 variables:
##  $ Categorical: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Subcategory: Factor w/ 2 levels "1","2": 1 2 1 2 1 1 2 1 2 1 ...
##  $ Numerical1 : num  0.08 0.1 0.1 0.53 0 0.54 0.6 0.23 0.4 0.67 ...
##  $ Numerical2 : num  0.25 0.41 0.66 0.04 0.69 0.73 0.73 0.87 0.38 0.72 ...
##  $ Integer    : int  1 2 1 4 2 4 5 2 5 3 ...

You can see that the data is now set up with “Categorical” and “Subcategory” as factor based variables, with “Numerical1” and “Numerical2” as “num” or numerical variables. Integer data remains an “int” or integer. We are now ready to move on.

Producing Summary Statistics

There are a number of reasons we produce summary statistics. The first is simply that it gives us a quick view into how our data might look. The second is that when we run statistical tests, they often produce p-values and test statistics that aren’t particularly intuitive. We need descriptive statistics of the data in order to explain the results to someone in a way they will understand.

One way to get a quick view of summary statistics is by using the summary() command. Using this command, you will get a printout showing summary statistics for each column in your dataframe. For continuous data (numerical and integer) you will get the Mean, Median, Min, Max, and 1st and 3rd Quartiles for each column. For categorical data, you will get the number of observations within each group.

#please note that you will have to change "ExDat" to whatever you named your specific dataframe
summary(ExDat)

##  Categorical Subcategory   Numerical1       Numerical2        Integer     
##  A:11        1:24        Min.   :0.0000   Min.   :0.0400   Min.   :1.000  
##  B:10        2:17        1st Qu.:0.2300   1st Qu.:0.2900   1st Qu.:2.000  
##  C:10                    Median :0.5300   Median :0.5300   Median :3.000  
##  D:10                    Mean   :0.4961   Mean   :0.5332   Mean   :3.049  
##                          3rd Qu.:0.7200   3rd Qu.:0.7300   3rd Qu.:4.000  
##                          Max.   :0.9800   Max.   :0.9900   Max.   :5.000

However, when you have large dataframes, sometimes this is way more detail than you want or need. Sometimes you just want summary statistics for a specific column. To do that, you can specify a variable: summary(DFname$Variable). If you only want to check the data type of that column, you can use class(DFname$Variable).

#please note that you will have to change "ExDat" to whatever you named your specific dataframe and Numerical1 to whatever variable you want descriptive statistics for
summary(ExDat$Numerical1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2300  0.5300  0.4961  0.7200  0.9800

#To check only the data type of a single column, use the "class" command
class(ExDat$Numerical1)

## [1] "numeric"

You will notice that this provided a single mean for the whole column of “Numerical1”. But what if I want to know the mean for each category? In that case, we use a specialized command called summarySE() that is housed within the Rmisc package. You will need to install and load the package as we discussed before.

#Remember, you only need to install the package the first time you use it, if you get a new computer, or if you update R. 
install.packages("Rmisc")

#Load the library. 
#You will need to do this each and every time you open R. 
library(Rmisc)

You can now set up the code. This can be read as “Within the dataset ExDat, I want descriptive statistics on the variable we measured (Numerical1) grouped by my categorical variable (Categorical), and do it by removing all missing data (NAs in the dataset)”.

This prints the category name, the number of observations within each category, the mean for your measurevar (“Numerical1”), the standard deviation, standard error, and confidence interval values for each grouping (Categorical).

#summarySE command will tell it to get summary statistics
          #Put your dataframe name after "data="
                      #the measurevar is the numerical variable you want summary statistics for
                                                #groupvars is the variable that you want your group means divided by (one mean for each category)
                                                                        #remove any missing data from the dataset in order to run
summarySE(data=ExDat, measurevar="Numerical1", groupvars="Categorical", na.rm=T)

##   Categorical  N Numerical1        sd         se        ci
## 1           A 11  0.4063636 0.3079699 0.09285642 0.2068970
## 2           B 10  0.4490000 0.2264435 0.07160773 0.1619879
## 3           C 10  0.6640000 0.3077950 0.09733333 0.2201833
## 4           D 10  0.4740000 0.2985223 0.09440104 0.2135500
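For reference, the se and ci columns printed above are consistent with the usual formulas: the standard error is the standard deviation divided by the square root of N, and ci is the standard error multiplied by a t critical value for a 95% interval. The sketch below checks this against the row for category A, using values copied from the output; the variable names are made up for the example.

```r
# Values copied from the summarySE output for category A
n_A  <- 11
sd_A <- 0.3079699

# Standard error of the mean
se_A <- sd_A / sqrt(n_A)

# Half-width of the 95% confidence interval (mean plus or minus ci)
ci_A <- se_A * qt(0.975, df = n_A - 1)

# Compare with the printed se (0.09285642) and ci (0.2068970)
c(se = se_A, ci = ci_A)
```

This also shows that ci is a distance above and below the mean, not the two endpoints of the interval.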

When you type it this way, it prints the answer for you immediately, but the results are not saved as a dataframe or in a variable. We sometimes want to save the results as a dataframe so that you can reference the values for statistical tests and graphics. To do so, assign it to a name before you run it.

#I am saving the results as "sSE.results"
sSE.results=summarySE(data=ExDat, measurevar="Numerical1", groupvars="Categorical", na.rm=T)

When you do this, it does not automatically print the results. You have to “call” them to view them. You can do that by simply typing the name you assigned to it. You will also see the results in your environment now (top right hand corner of the screen). This is how you know that you can reference it in future commands.

sSE.results

##   Categorical  N Numerical1        sd         se        ci
## 1           A 11  0.4063636 0.3079699 0.09285642 0.2068970
## 2           B 10  0.4490000 0.2264435 0.07160773 0.1619879
## 3           C 10  0.6640000 0.3077950 0.09733333 0.2201833
## 4           D 10  0.4740000 0.2985223 0.09440104 0.2135500

When we ask you to write results summaries, you will reference these numbers to compare the means of different comparison groups. You need these values in addition to the results (T value, F value, p-value) provided by statistical tests.

You can mathematically compare the means given within the table as well. If I wanted to report the difference in Numerical2 between Subcategories, I would use the following code. Test yourself! See if you can interpret the line of code below without any annotations provided to you.

sSE.results2=summarySE(data=ExDat, measurevar="Numerical2", groupvars="Subcategory", na.rm=T)
sSE.results2

##   Subcategory  N Numerical2        sd         se        ci
## 1           1 24  0.5300000 0.2842840 0.05802923 0.1200426
## 2           2 17  0.5376471 0.2978366 0.07223598 0.1531334

From here, I can compare the specific means given within the sSE.results2 table. In order to calculate the relative difference between two groups, you subtract value 2 from value 1, and divide by value 2. Try to think about this logically first. Suppose I told you that you got an 85 on the first test and a 93 on the second test, and I wanted to know how much you improved. You would do (93-85)/85. If you do that in R: (93-85)/85, you get 0.094. This would be equivalent to a 9.4% increase in score, which makes sense.

You can follow the same logic here. I want to know the difference in Numerical2 between Subcategory 1 and Subcategory 2. Within my “sSE.results2” table, the mean of Subcategory 1 is in Row 1, Column 3. The notation of that is [1,3]. The Subcategory 2 mean is in Row 2, Column 3 of the table, which is noted as [2,3]. You can reference those locations in your code to subtract the values.

#Take the value in row 1, column 3 of sSE.results2 (mean of Subcategory 1)
                    #Take the value in row 2, column 3 of sSE.results2 (mean of Subcategory 2)
                                      #divide by the value in row 2, column 3 of sSE.results2 (mean of Subcategory 2)
(sSE.results2[1,3]-sSE.results2[2,3])/sSE.results2[2,3]

## [1] -0.01422319

This provides you with the difference as a proportion (a decimal rather than a percent). Now you multiply by 100 to get a “percent difference” between the means of the groups. So the difference between my groups would be 1.4%. This is equal to 0.014 * 100. The negative sign means that Subcategory 1 was lower than Subcategory 2. You can also see the direction by simply looking at your table to see which mean is lower. So in this case, Numerical2 was 1.4% lower in Subcategory 1 than in Subcategory 2.
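The same calculation can be sketched by selecting values by name instead of by row and column number, which is less likely to break if the table layout changes. The small table below is rebuilt by hand from the two means printed above, and the object names res2, mean1, mean2, and pct_diff are made up for this example.

```r
# Test score example: improvement from 85 to 93, as a percent
(93 - 85) / 85 * 100

# Rebuild the two group means from the printed sSE.results2 table
res2 <- data.frame(Subcategory = c("1", "2"),
                   Numerical2  = c(0.5300000, 0.5376471))

# Select each mean by column name and group label
mean1 <- res2$Numerical2[res2$Subcategory == "1"]
mean2 <- res2$Numerical2[res2$Subcategory == "2"]

# Percent difference of Subcategory 1 relative to Subcategory 2
pct_diff <- (mean1 - mean2) / mean2 * 100
round(pct_diff, 1)
```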

T-Tests

Remember, for a t-test to work, your categorical variable must have two factor levels. If you have more than two, you need a different test (see next section). You can start by looking at your data again. If you look at the output, my “Subcategory” variable has two levels and is appropriate for a t-test.

str(ExDat)

## 'data.frame':    41 obs. of  5 variables:
##  $ Categorical: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Subcategory: Factor w/ 2 levels "1","2": 1 2 1 2 1 1 2 1 2 1 ...
##  $ Numerical1 : num  0.08 0.1 0.1 0.53 0 0.54 0.6 0.23 0.4 0.67 ...
##  $ Numerical2 : num  0.25 0.41 0.66 0.04 0.69 0.73 0.73 0.87 0.38 0.72 ...
##  $ Integer    : int  1 2 1 4 2 4 5 2 5 3 ...

When comparing two groups, we use a t-test to do the statistical comparison. The general code for setting up a t-test is t.test(NumVar~CatVar, data=DFname), where you will need to enter your numerical variable for “NumVar”, your categorical variable for “CatVar”, and the dataframe name that holds your data after “data=”. Use the ?t.test command to learn more about how to set up a t-test in R.

#The command is to complete a t-test
      #always put your measured (numerical) variable first
                    #put your categorical variable with two levels after the numerical
                                  #tell it where to find the data in the dataframe
t.test(Numerical1 ~ Subcategory, data=ExDat)

## 
##  Welch Two Sample t-test
## 
## data:  Numerical1 by Subcategory
## t = -0.19106, df = 36.631, p-value = 0.8495
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -0.2057111  0.1702699
## sample estimates:
## mean in group 1 mean in group 2 
##       0.4887500       0.5064706

#This can be read as "perform a t-test to examine difference in numerical1 between subcategories within the ExDat dataframe".

Make sure to check the manual on how to properly report t-test results. It will look something like: “t([insert df]) = [insert T-value], p = [insert p-value]”. Example based on code above: (t(36.6) = -0.19, p = 0.85)
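If you would rather pull the numbers out of the test than copy them by hand, the result of t.test() can be saved and its pieces extracted by name. This is a sketch using R's built-in mtcars data as a stand-in (mpg compared between the two levels of am); the rounding and the layout of the sentence are assumptions, so follow the manual for the required format.

```r
# Save the test result instead of only printing it
tt <- t.test(mpg ~ am, data = mtcars)

# Extract the pieces needed for reporting
t_val <- unname(tt$statistic)  # t value
df    <- unname(tt$parameter)  # degrees of freedom
p_val <- tt$p.value            # p-value

# Assemble a reporting string with rounded values
report <- sprintf("t(%.1f) = %.2f, p = %.3f", df, t_val, p_val)
report
```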

ANOVA and Post-hoc Analysis

An ANOVA is used when you have more than 2 groups within your categorical variable. For example, if you have “Control”, “Treatment 1”, and “Treatment 2” as your categories, you would need to do multiple t-tests to compare them all (Control vs. Treatment 1; Control vs. Treatment 2; Treatment 1 vs. Treatment 2), which is statistically inappropriate. Remember, this results in cumulative error across the repeated tests, which leads to an increased risk of Type I error.
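A rough sketch of why the error accumulates: if each t-test uses a 0.05 cutoff and, as a simplifying assumption, the tests are treated as independent, the chance of at least one false positive across all of them can be calculated directly.

```r
alpha   <- 0.05          # cutoff used for each individual test
n_tests <- choose(3, 2)  # number of pairwise comparisons among 3 groups

# Probability of at least one false positive across all tests,
# assuming the tests are independent (a simplification)
familywise_error <- 1 - (1 - alpha)^n_tests
familywise_error
```

With three groups the risk rises from 5% to roughly 14%, and it grows quickly as more groups are added.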

The ANOVA allows you to compare all three treatments in one, single test. Like the t-test, you need one numerical variable and one categorical variable (this time with 3+ categories).

str(ExDat)

## 'data.frame':    41 obs. of  5 variables:
##  $ Categorical: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Subcategory: Factor w/ 2 levels "1","2": 1 2 1 2 1 1 2 1 2 1 ...
##  $ Numerical1 : num  0.08 0.1 0.1 0.53 0 0.54 0.6 0.23 0.4 0.67 ...
##  $ Numerical2 : num  0.25 0.41 0.66 0.04 0.69 0.73 0.73 0.87 0.38 0.72 ...
##  $ Integer    : int  1 2 1 4 2 4 5 2 5 3 ...

Looking at the data set, either Numerical1 or Numerical2 can be our numerical variable. Subcategory only has two levels (categories), so we used a t-test to test it. However, “Categorical” has 4 levels (categories), and therefore is tested using an ANOVA. The general code for setting up an ANOVA is aov(NumVar~CatVar, data=DFname), where you will need to enter your numerical variable for “NumVar”, your categorical variable for “CatVar”, and the dataframe name that holds your data after “data=”. Use the ?aov command to learn more about how to set up an ANOVA in R.

#save your results as an object so you can call it later. This can be whatever name you want, but make sure it is intuitive. 
      #use the aov command to run an ANOVA
          #First, list your numerical variable you are testing
                        #Next call your categorical variable
                                    #indicate which dataframe your variables are housed in.
ex.aov=aov(Numerical1 ~ Categorical, data=ExDat)
#This can be read as "Compare Numerical1 by Categorical in the ExDat dataframe". 

#Call your ANOVA results object to see the printed output
ex.aov

## Call:
##    aov(formula = Numerical1 ~ Categorical, data = ExDat)
## 
## Terms:
##                 Categorical Residuals
## Sum of Squares    0.3975511 3.0646245
## Deg. of Freedom           3        37
## 
## Residual standard error: 0.287798
## Estimated effects may be unbalanced

This prints the default ANOVA results; however, it does not include most of the information you need to report your statistical results. In order to see that information, like the F and p-value, you need to view a summary of the test using summary().

summary(ex.aov)

##             Df Sum Sq Mean Sq F value Pr(>F)
## Categorical  3 0.3976 0.13252     1.6  0.206
## Residuals   37 3.0646 0.08283

You now have your p-value for your ANOVA, along with the F-value (ANOVA test statistic) and your degrees of freedom. Make sure to check the manual on how to properly report ANOVA results. It will look something like: “F([DOFn, DOFd]) = [insert F-value], p = [insert p-value]”. DOF in this case is the Degrees of Freedom reported within the test. Example based on code above: (F(3,37) = 1.6, p = 0.206).

From here, you can see a single p-value that tells you if there is a significant difference among your treatments. However, you don’t know which groups are driving that result. To get pairwise comparisons, similar to what you get from t-tests, we do what is referred to as a “post-hoc” test. This test adjusts the p-values to account for repeated testing, which corrects for the compounded error we warn you against when doing multiple t-tests instead of an ANOVA.

The general code for completing the post-hoc with a Tukey P-value adjustment is TukeyHSD().

#run the TukeyHSD command and call/reference the object that stores your ANOVA results. You will need to change the term inside your parentheses to match whatever you saved your ANOVA under. 
TukeyHSD(ex.aov)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Numerical1 ~ Categorical, data = ExDat)
## 
## $Categorical
##            diff         lwr       upr     p adj
## B-A  0.04263636 -0.29559518 0.3808679 0.9863643
## C-A  0.25763636 -0.08059518 0.5958679 0.1890128
## D-A  0.06763636 -0.27059518 0.4058679 0.9491994
## C-B  0.21500000 -0.13119102 0.5611910 0.3533652
## D-B  0.02500000 -0.32119102 0.3711910 0.9973555
## D-C -0.19000000 -0.53619102 0.1561910 0.4616992

This output gives you pairwise comparisons across each of your categories. In this example, none of the pairwise relationships are significant. However, you would want to reference these relationships in your comparisons when you write your results summary. Remember, these are the statistical comparisons, but to see the means for each Category, you would need to run summarySE to get the means for each group.
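As a sketch of the whole workflow on data you can run yourself, the example below uses R's built-in PlantGrowth dataset as a stand-in (plant weight for a control and two treatments). It also shows one way to pull the F-value, p-value, and adjusted pairwise p-values out of the results by name; the object names and the rounding are assumptions made for this example.

```r
# One-way ANOVA: weight compared across the three levels of group
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; the first element is the ANOVA table
tab   <- summary(fit)[[1]]
f_val <- tab[["F value"]][1]
p_val <- tab[["Pr(>F)"]][1]

# Assemble a reporting string from the table
sprintf("F(%d,%d) = %.2f, p = %.3f",
        as.integer(tab$Df[1]), as.integer(tab$Df[2]), f_val, p_val)

# Post-hoc pairwise comparisons with Tukey-adjusted p-values
tuk <- TukeyHSD(fit)
tuk$group[, "p adj"]

# Group means to report alongside the statistics
tapply(PlantGrowth$weight, PlantGrowth$group, mean)
```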

Graphics

ggplot is an incredibly powerful graphics tool. While it can take a bit of work to learn the code, once you learn it, there are very few graphics you can’t make using the same base code.

For more help on the logic of building a graph

This is also helpful

#You will need to install the package if you have not already 
install.packages("ggplot2")

#don't forget to read in the appropriate library!
library(ggplot2)

The first line of code always does the same thing: ggplot(DFname, aes(x=Var, y=Var)). This line uses the command “ggplot” to produce a plot from a specific dataframe (DFname), and you define the X and Y axes of the plot. “aes” is short for aesthetics, a word that “deals with the principles of beauty and artistic taste”. When you see “aes”, you are telling ggplot which variables to show and how you want the plot to look.

The second line of the code describes the type of plot you are going to make. When you use the code geom_bar() you are telling ggplot that you want it to produce a barplot. When you put geom_bar(stat="identity") you are indicating what the bars will show. The heights of the bars commonly represent one of two things: (1) a count of cases in each group, or (2) the values in a column of the data frame, such as a mean. By default, geom_bar uses stat="count". This makes the height of each bar equal to the number of cases in each group (#1 above) and will likely produce an error when you have also mapped a variable to the y aesthetic. If you want the heights of the bars to represent values (such as means) in the data, use stat="identity" and map a value to the y aesthetic. A barplot is one of MANY plots you can make with ggplot. The different types of geom_ options include:

Geom Graphic Types

All lines of code following these two are variable depending on what you want the graph to look like. The first two lines are mandatory. You must tell it what data to use, and what type of plot to make. Those lines alone will produce a graph. Everything afterward is to change the appearance of the graph or provide additional information on the plot.

ggplot(sSE.results, aes(x=Categorical, y=Numerical1)) +
  geom_bar(stat="identity")
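As a side note, ggplot2 also provides geom_col(), which draws bars whose heights come straight from the y variable, the same as geom_bar(stat="identity"). The sketch below rebuilds the group means by hand from the summarySE output shown earlier so that it runs on its own; the object names plot_dat and p are made up for this example.

```r
library(ggplot2)

# Group means copied from the summarySE output for Numerical1
plot_dat <- data.frame(
  Categorical = c("A", "B", "C", "D"),
  Numerical1  = c(0.4063636, 0.4490000, 0.6640000, 0.4740000)
)

# geom_col() uses the y values as the bar heights
p <- ggplot(plot_dat, aes(x = Categorical, y = Numerical1)) +
  geom_col()
p
```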

In your code, the third line is used to add error bars to the plot. We ask you to base your error bars on the standard error. If you remember, you found standard error as part of your summarySE output. Let’s take a look at mine:

sSE.results

##   Categorical  N Numerical1        sd         se        ci
## 1           A 11  0.4063636 0.3079699 0.09285642 0.2068970
## 2           B 10  0.4490000 0.2264435 0.07160773 0.1619879
## 3           C 10  0.6640000 0.3077950 0.09733333 0.2201833
## 4           D 10  0.4740000 0.2985223 0.09440104 0.2135500

You can see that I have a column that is labeled “se”, one that is labeled “Numerical1”, and another called “Categorical”. Remember to always look at the object you are pulling your data from to ensure that you call the correct variables.

The general code to add errorbars to a graphic is geom_errorbar(aes(ymin=NumVar - se, ymax=NumVar + se), width=0.5). This code tells it to add an errorbar to each bar (each category) that extends a certain value below (ymin=NumVar - se) and above (ymax=NumVar + se) the mean for each bar. When you use summarySE, the standard error column is automatically named “se”. So you will not have to change it, but you will have to replace “NumVar” with your numerical variable.

ggplot(sSE.results, aes(x=Categorical, y=Numerical1)) +
  geom_bar(stat="identity") +
  geom_errorbar(aes(ymin=Numerical1-se, ymax=Numerical1+se), width=0.5)

The “width=0.5” part of the command tells R how wide to make the errorbars visually. Try changing it from 0.5 to 0.2 and see what happens!

ggplot(sSE.results, aes(x=Categorical, y=Numerical1)) +
  geom_bar(stat="identity") +
  geom_errorbar(aes(ymin=Numerical1-se, ymax=Numerical1+se), width=0.2)

The xlab() and ylab() commands rename your x and y axes to something that makes sense. Because of how we feed data into R, sometimes the way we label variables doesn’t make much sense to someone else who isn’t familiar with the data. So here, we can rename them! Whatever you put into the parentheses will rename your axis labels, but will not alter the data itself in any way. In this example, I have renamed “Numerical1” to “Numerical Variable Label” and “Categorical” to “Categorical Variable Label”. The code below also adds theme_classic(), which replaces the default gray background and gridlines with a plain white background and axis lines.

ggplot(sSE.results, aes(x=Categorical, y=Numerical1)) +
  geom_bar(stat="identity") +
  geom_errorbar(aes(ymin=Numerical1-se, ymax=Numerical1+se), width=0.5) + 
  theme_classic() +
  xlab("Categorical Variable Label") +
  ylab("Numerical Variable Label")

One more option is to strategically place text on your graphs to provide the viewers with more information. You can do that using the geom_text() command. There are many ways you can use geom_text, but when we are trying to add labels to our bars, we follow this format: geom_text(label = c("label1", "label2", etc..), aes(y=NumVar + se + VALUE, x=CatVar), size=3). This code is telling ggplot to add text that you have assigned (label1, label2, label3, etc… can be anything you want) to each bar (x=CatVar) at a certain height above the bar on the y-axis (y=NumVar + se + VALUE).

Common Mistakes

The number of labels you have must match the total number of bars you have!
The value that you enter in the y= statement determines how far above the bar your label is. If you have too much white space above the bar, decrease this value. If the labels are on top of the bar, increase the value.
size= only changes the FONT size of the label. It does not move it in any way.
There is often a missing/additional parenthesis in this line of code. Check here first if you are getting an error related to “label”.

When you add all of these components together, you get a fully functional plot! It will have a bar showing the mean for each category, with the standard error displayed using error bars, and then the statistical significance is indicated using text labels above each bar.

ggplot(sSE.results, aes(x=Categorical, y=Numerical1)) +
  geom_bar(stat="identity") +
  geom_errorbar(aes(ymin=Numerical1-se, ymax=Numerical1+se), width=0.5) + 
  theme_classic() +
  xlab("Categorical Variable Label") +
  ylab("Numerical Variable Label") +
  geom_text(label=c("A", "A", "A", "A"), aes(y=(Numerical1+se+0.1), x=Categorical), size=5)
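One way to avoid a mismatch between the number of labels and the number of bars, offered here as an optional sketch, is to store the labels as a column in the same table that holds the means and map that column inside aes(). The table below is rebuilt by hand from the summarySE output so the example runs on its own; the names plot_dat, sig_label, and p are made up for this example.

```r
library(ggplot2)

# Means and standard errors copied from the summarySE output,
# plus one label per row (all "A" because no groups differed)
plot_dat <- data.frame(
  Categorical = c("A", "B", "C", "D"),
  Numerical1  = c(0.4063636, 0.4490000, 0.6640000, 0.4740000),
  se          = c(0.09285642, 0.07160773, 0.09733333, 0.09440104),
  sig_label   = c("A", "A", "A", "A")
)

p <- ggplot(plot_dat, aes(x = Categorical, y = Numerical1)) +
  geom_bar(stat = "identity") +
  geom_errorbar(aes(ymin = Numerical1 - se, ymax = Numerical1 + se), width = 0.5) +
  # label comes from the data, so there is always one label per bar
  geom_text(aes(label = sig_label, y = Numerical1 + se + 0.1), size = 5) +
  theme_classic() +
  xlab("Categorical Variable Label") +
  ylab("Numerical Variable Label")
p
```

Because each label sits in the same row as its mean, reordering or removing a category keeps the labels attached to the correct bars.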

If you get stuck with ggplot, GOOGLE IT! There are millions of resources available. It is an incredibly popular tool, and someone before has very likely had the same issue. If you google an error, or a goal that you want to meet, there is likely code online to show you how to do it.

Scripting ‘Cheat Sheets’

Abby Beatty

2024-02-15