Introduction

Now that you’re comfortable with RStudio’s interface, it’s time to learn the fundamental building blocks of R programming. Everything we’ll learn here applies directly to volleyball analysis, and we’ll use volleyball examples throughout.

By the end of this tutorial, you’ll be able to:

  • Store data in variables
  • Work with different types of data
  • Create and manipulate lists of data (vectors)
  • Organize data in tables (data frames)
  • Perform calculations on your data

Why These Fundamentals Matter

Before you can analyze DVW files and create scouting reports, you need to understand how R stores and works with data. Think of this as learning the basic skills before running complex plays.


Section 1: Variables and Assignment

What is a Variable?

A variable is a named container that stores information. Think of it like labeling a box so you can find what you put in it later.

# Create a variable
kills <- 15

# Now "kills" contains the value 15
# You can use it anywhere:
print(kills)

# Do math with it:
kills + 5

# Use it in calculations:
kills * 2

The Assignment Operator: <-

We use <- to assign values to variables. You can read it as “gets” or “is assigned.”

kills <- 15        # kills gets 15
errors <- 3        # errors gets 3
attempts <- 35     # attempts gets 35

# You can also use = but <- is preferred in R
# Both work the same:
kills = 15
kills <- 15  # This is the R convention

Why <- instead of =? While both work for assignment, <- is the R convention and makes your code more readable to other R users. We’ll use <- throughout these tutorials.

Naming Variables

Rules for variable names:

  1. Can contain letters, numbers, underscores _, and periods .
  2. Must start with a letter (or with a period that is not followed by a number)
  3. Cannot contain spaces
  4. Case-sensitive: Kills and kills are different
  5. Cannot use special R words like TRUE, FALSE, if, for, etc.

# Good variable names:
player_kills <- 15
player1_kills <- 15
kills.per.set <- 4.5
totalKills <- 15

# Bad variable names (will cause errors):
# 1kills <- 15           # Can't start with number
# player kills <- 15     # No spaces allowed
# player-kills <- 15     # Minus sign not allowed
# TRUE <- 15             # Can't use reserved words

Naming conventions (recommendations):

  • Use descriptive names: hitting_eff is better than h or x
  • Use underscores for multi-word names: player_kills (called “snake_case”)
  • Be consistent in your style throughout your code
  • Avoid using names of existing functions (like mean, sum, c)

Variables in Action: Volleyball Stats

# Player statistics from a match
player_name <- "Sarah Smith"
kills <- 15
errors <- 3
attempts <- 35
blocks <- 2
aces <- 1

# Calculate hitting efficiency
# Formula: (Kills - Errors) / Attempts
hitting_eff <- (kills - errors) / attempts

# Calculate points responsible for  
# Formula: Kills + Blocks + Aces
points_responsible <- kills + blocks + aces

# Display results
print(player_name)
print(hitting_eff)
print(points_responsible)

Key concept: Once you store a value in a variable, you can reuse it multiple times without retyping the value.


Section 2: Data Types

R works with different types of data. Understanding these types helps you avoid errors and use the right functions.

Numeric Data

Numbers used for calculations.

# Whole numbers and decimals are both numeric
kills <- 15
hitting_eff <- 0.343
set_score <- 25
pass_rating <- 2.34

# You can do math with numeric data
kills + 5
hitting_eff * 100  # Convert to percentage
set_score - 20

# Check the type
class(kills)          # Returns "numeric"
class(hitting_eff)    # Returns "numeric"

Character Data (Text)

Text data, enclosed in quotes (single or double).

# Text must be in quotes
player_name <- "Sarah Smith"
position <- "Outside Hitter"
team_name <- "Nebraska"

# You can also use single quotes
position <- 'Outside Hitter'

# Without quotes, R looks for a variable:
# team_name <- Nebraska  # Error! R looks for a variable called Nebraska

# Check the type
class(player_name)  # Returns "character"

Common mistake: Forgetting quotes around text

# Wrong:
# player <- Sarah  # Error: object 'Sarah' not found

# Right:
player <- "Sarah"

Logical Data (TRUE/FALSE)

Used for yes/no, true/false conditions. Always written in ALL CAPS without quotes.

# Logical values
is_starter <- TRUE
is_injured <- FALSE
won_set <- TRUE

# Logical operators create logical values
kills > 10              # TRUE if kills is greater than 10
hitting_eff >= 0.300    # TRUE if hitting efficiency is at least .300
errors == 0             # TRUE if errors equals exactly 0 (note: two equal signs!)

# Examples with volleyball data
kills <- 15
errors <- 3
attempts <- 35

# Check conditions
kills > 10                    # TRUE
errors == 0                   # FALSE (errors is 3)
attempts >= 30                # TRUE
hitting_eff <- (kills - errors) / attempts
hitting_eff >= 0.300          # TRUE (0.343 is greater than 0.300)

# Check the type
class(is_starter)  # Returns "logical"

Important comparison operators:

  • == equal to (note: TWO equal signs)
  • != not equal to
  • > greater than
  • < less than
  • >= greater than or equal to
  • <= less than or equal to

Why Data Types Matter

R treats different data types differently:

# Numbers can be added:
10 + 5              # Returns 15

# Text cannot be added with +:
# "Sarah" + "Smith"   # ERROR! Can't add text

# Correct way to combine text:
paste("Sarah", "Smith")  # Returns "Sarah Smith"

# Be careful with types:
number_as_text <- "15"
# number_as_text + 5  # ERROR! Can't add text to a number

# Convert types when needed:
as.numeric("15") + 5     # Returns 20
as.character(15)         # Returns "15" (text)
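
Conversion does not always succeed. As a small sketch of what to expect: text that is not a number converts to NA (R's marker for a missing value) with a warning, and logical values convert to 1 and 0. The variable name bad_value is just for illustration, and suppressWarnings() is used here only to keep the output quiet:

```r
# Text that looks like a number converts cleanly
as.numeric("15")          # returns 15

# Text that is not a number becomes NA (R also prints a warning)
bad_value <- suppressWarnings(as.numeric("fifteen"))
is.na(bad_value)          # returns TRUE

# Logical values convert to 1 and 0
as.numeric(TRUE)          # returns 1
as.numeric(FALSE)         # returns 0
```

The TRUE-as-1 behavior is why sum() can count how many TRUE values a logical vector holds.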

Section 3: Vectors

What is a Vector?

A vector is a list of values of the same type. Think of it like a stat sheet column - all the kills from each set, or all the player names on a roster.

# Create a vector with c() function
# c stands for "combine" or "concatenate"
set_scores <- c(25, 23, 25, 25)

# A vector of player names
rotation <- c("Sarah", "Emma", "Olivia", "Mia", "Ava", "Isabella")

# A vector of whether sets were won
sets_won <- c(TRUE, FALSE, TRUE, TRUE)

# Display them
print(set_scores)
print(rotation)

Creating Vectors

The c() function combines values into a vector:

# Numeric vector - kills by set
kills_by_set <- c(4, 5, 6, 4, 3)

# Character vector - positions
positions <- c("OH", "OH", "S", "MB", "MB", "L")

# Logical vector - whether each serve was an ace
serve_results <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)

# You can also create sequences
set_numbers <- 1:5              # Creates c(1, 2, 3, 4, 5)
set_numbers <- c(1, 2, 3, 4, 5) # Same values, typed out in full

rotation_positions <- 1:6       # Creates c(1, 2, 3, 4, 5, 6)

Vector Operations

You can do math on entire vectors at once:

# Kills by set
kills <- c(4, 5, 6, 4, 3)

# Double all values
kills * 2  # Returns c(8, 10, 12, 8, 6)

# Add a constant to all values
kills + 1  # Returns c(5, 6, 7, 5, 4)

# Operations between vectors (element-by-element)
kills <- c(15, 12, 18, 10, 14)
errors <- c(3, 5, 2, 4, 3)
attempts <- c(35, 30, 40, 28, 32)

# Calculate hitting efficiency for each match
hitting_eff <- (kills - errors) / attempts
print(hitting_eff)

Useful Vector Functions

# Example: Set scores from a match
set_scores <- c(25, 23, 25, 22, 15)

# How many elements?
length(set_scores)  # Returns 5

# Sum of all elements
sum(set_scores)     # Returns 110 total points

# Average (mean)
mean(set_scores)    # Returns 22

# Minimum and maximum
min(set_scores)     # Returns 15
max(set_scores)     # Returns 25

# More examples with player stats
kills <- c(15, 12, 18, 10, 14, 16, 11)

sum(kills)          # Total kills: 96
mean(kills)         # Average: 13.7
max(kills)          # Best performance: 18
min(kills)          # Lowest: 10
length(kills)       # Number of matches: 7

Accessing Vector Elements

Use square brackets [] to access specific elements:

# Player rotation
rotation <- c("Sarah", "Emma", "Olivia", "Mia", "Ava", "Isabella")

# Get the first player
rotation[1]  # Returns "Sarah"

# Get the third player
rotation[3]  # Returns "Olivia"

# Get the last player
rotation[6]  # Returns "Isabella"

# Get multiple elements
rotation[c(1, 3, 5)]  # Returns "Sarah", "Olivia", "Ava"

# Get a range
rotation[1:3]  # Returns "Sarah", "Emma", "Olivia" (first three)

# Get all except certain elements
rotation[-1]       # All except first
rotation[-c(1,3)]  # All except first and third

Important: R uses 1-based indexing, meaning the first element is at position 1 (not 0 like some other programming languages).
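
Square brackets also accept a logical vector, which keeps only the elements where the condition is TRUE. A minimal sketch using example kill totals:

```r
kills <- c(15, 12, 18, 10, 14)

# A comparison on a vector gives one TRUE/FALSE per element
kills > 12             # returns TRUE FALSE TRUE FALSE TRUE

# Put that comparison inside [] to keep only the matching elements
kills[kills > 12]      # returns 15 18 14

# which() gives the positions instead of the values
which(kills > 12)      # returns 1 3 5
```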

Modifying Vectors

# Start with kills by set
kills <- c(4, 5, 6, 4)

# Change the third set's kills
kills[3] <- 7
print(kills)  # Now c(4, 5, 7, 4)

# Add a new element to the end
kills <- c(kills, 5)  # Add set 5
print(kills)  # Now c(4, 5, 7, 4, 5)

# Combine two vectors
first_half <- c(4, 5)
second_half <- c(6, 4)
all_kills <- c(first_half, second_half)
print(all_kills)

Section 4: Data Frames

What is a Data Frame?

A data frame is a table with rows and columns, like a spreadsheet. Each column is a vector, and all columns have the same length.

Think of it as a stat sheet where:

  • Each row is a player
  • Each column is a statistic
  • All players have the same stats tracked

# Create a data frame of player statistics
player_stats <- data.frame(
  name = c("Sarah", "Emma", "Olivia", "Mia", "Ava"),
  kills = c(15, 12, 18, 10, 14),
  errors = c(3, 5, 2, 4, 3),
  attempts = c(35, 30, 40, 28, 32)
)

# View the data frame
print(player_stats)

# Better view in RStudio:
View(player_stats)  # Opens in a spreadsheet-like viewer
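
The requirement that all columns have the same length is enforced by R. The sketch below deliberately gives one column too few values and uses tryCatch(), a function not covered in this tutorial, to capture the error message as text so the script keeps running. The variable name result is just for illustration:

```r
# Three names but only two kill totals: the columns do not line up
result <- tryCatch(
  data.frame(
    name = c("Sarah", "Emma", "Olivia"),
    kills = c(15, 12)            # one value short
  ),
  error = function(e) conditionMessage(e)
)

# result now holds the error message instead of a data frame
print(result)
```

If you see an error about a differing number of rows when building a data frame, count the values in each column first.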

Creating Data Frames

# Method 1: Create from scratch
match_stats <- data.frame(
  set_number = c(1, 2, 3, 4, 5),
  our_score = c(25, 23, 25, 22, 15),
  opp_score = c(23, 25, 18, 25, 10),
  our_kills = c(14, 13, 16, 12, 11),
  our_errors = c(4, 5, 3, 4, 2)
)

print(match_stats)

# Method 2: Create from existing vectors
player_names <- c("Sarah", "Emma", "Olivia")  # not "names", which is also an R function
positions <- c("OH", "OH", "MB")
heights <- c(72, 70, 74)  # inches

roster <- data.frame(
  player_name = player_names,
  position = positions,
  height_inches = heights
)

print(roster)

Accessing Data Frame Elements

Access Columns

# Using $ notation (most common)
player_stats$kills       # Get the kills column
player_stats$name        # Get the name column

# Using bracket notation
player_stats[, "kills"]  # Get the kills column
player_stats[, 2]        # Get the second column (kills)

# Get multiple columns
player_stats[, c("name", "kills")]

Access Rows

# Get first row (first player)
player_stats[1, ]

# Get third row
player_stats[3, ]

# Get multiple rows
player_stats[c(1, 3, 5), ]  # First, third, and fifth players

# Get range of rows
player_stats[1:3, ]  # First three players

Access Specific Elements

# Get element in row 2, column 3
player_stats[2, 3]  # Emma's errors

# Get element by name
player_stats[2, "errors"]  # Also Emma's errors

# Get Sarah's kills
player_stats[1, "kills"]
player_stats$kills[1]  # Alternative way

Understanding Data Frame Structure

# See structure
str(player_stats)

# Dimensions (rows, columns)
dim(player_stats)     # Returns c(5, 4) - 5 rows, 4 columns
nrow(player_stats)    # Number of rows: 5
ncol(player_stats)    # Number of columns: 4

# Column names
names(player_stats)
colnames(player_stats)  # Same thing

# First few rows
head(player_stats)      # First 6 rows by default
head(player_stats, 3)   # First 3 rows

# Last few rows
tail(player_stats)      # Last 6 rows by default
tail(player_stats, 2)   # Last 2 rows

# Summary statistics
summary(player_stats)

Adding Columns

# Calculate and add a new column
player_stats$hitting_eff <- (player_stats$kills - player_stats$errors) / 
                             player_stats$attempts

# View updated data frame
print(player_stats)

# Add another column: kill percentage (kills per attempt)
player_stats$kill_pct <- player_stats$kills / player_stats$attempts

print(player_stats)

# Add a text column
player_stats$position <- c("OH", "OH", "MB", "OH", "S")

print(player_stats)

Modifying Data

# Change a specific value
player_stats$kills[1] <- 16  # Change Sarah's kills to 16

# Change multiple values
player_stats$kills[c(1,3)] <- c(16, 19)  # Change Sarah and Olivia's kills

# Recalculate hitting efficiency after changes
player_stats$hitting_eff <- (player_stats$kills - player_stats$errors) / 
                             player_stats$attempts

print(player_stats)

Filtering Data Frames (Preview)

We’ll learn much more about filtering in Tutorial 02, but here’s a preview:

# Get players with more than 12 kills
player_stats[player_stats$kills > 12, ]

# Get players with hitting efficiency above .300
player_stats[player_stats$hitting_eff > 0.300, ]

# Get just the names of high-efficiency players
player_stats$name[player_stats$hitting_eff > 0.300]
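
Filters can also combine more than one condition. The sketch below rebuilds the example data frame so it runs on its own, then uses & (and) to require both conditions. It also shows base R's subset() function as an alternative way to write the same filter; the variable name efficient is just for illustration:

```r
player_stats <- data.frame(
  name = c("Sarah", "Emma", "Olivia", "Mia", "Ava"),
  kills = c(15, 12, 18, 10, 14),
  errors = c(3, 5, 2, 4, 3),
  attempts = c(35, 30, 40, 28, 32)
)

# Both conditions must hold: more than 12 kills AND fewer than 3 errors
player_stats[player_stats$kills > 12 & player_stats$errors < 3, ]

# subset() expresses the same filter without repeating the data frame name
efficient <- subset(player_stats, kills > 12 & errors < 3)
efficient$name   # returns "Olivia"
```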

Section 5: Bringing It All Together

Complete Example: Match Statistics

Let’s create a complete analysis combining everything we’ve learned:

# Create a match statistics data frame
match_stats <- data.frame(
  set_number = 1:5,
  our_score = c(25, 23, 25, 22, 15),
  opponent_score = c(23, 25, 18, 25, 10),
  our_kills = c(14, 13, 16, 12, 11),
  our_errors = c(4, 5, 3, 4, 2),
  our_attempts = c(32, 35, 38, 33, 28),
  our_aces = c(2, 1, 3, 1, 2),
  opp_errors = c(3, 4, 5, 3, 2)
)

# Calculate additional statistics
match_stats$our_hitting_eff <- (match_stats$our_kills - match_stats$our_errors) / 
                                match_stats$our_attempts

match_stats$set_won <- match_stats$our_score > match_stats$opponent_score

match_stats$point_differential <- match_stats$our_score - match_stats$opponent_score

# View the complete data frame
print(match_stats)

# Calculate match totals
total_kills <- sum(match_stats$our_kills)
total_errors <- sum(match_stats$our_errors)
total_attempts <- sum(match_stats$our_attempts)
total_aces <- sum(match_stats$our_aces)

# Calculate match hitting efficiency
match_hitting_eff <- (total_kills - total_errors) / total_attempts

# How many sets did we win?
sets_won <- sum(match_stats$set_won)

# Display match summary
cat("Match Summary:\n")
cat("Sets Won:", sets_won, "out of", nrow(match_stats), "\n")
cat("Total Kills:", total_kills, "\n")
cat("Total Aces:", total_aces, "\n")
cat("Match hitting efficiency:", round(match_hitting_eff, 3), "\n")

# Find best and worst sets by hitting efficiency
best_set <- match_stats$set_number[which.max(match_stats$our_hitting_eff)]
worst_set <- match_stats$set_number[which.min(match_stats$our_hitting_eff)]

cat("Best Set (hitting %):", best_set, "\n")
cat("Worst Set (hitting %):", worst_set, "\n")
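
Beyond the single best and worst sets, you may want every set ranked. A minimal sketch using base R's order() function, which this tutorial has not introduced; the data frame is rebuilt with only the columns needed so the example runs on its own, and ranked is an illustrative name:

```r
match_stats <- data.frame(
  set_number = 1:5,
  our_kills = c(14, 13, 16, 12, 11),
  our_errors = c(4, 5, 3, 4, 2),
  our_attempts = c(32, 35, 38, 33, 28)
)
match_stats$our_hitting_eff <- (match_stats$our_kills - match_stats$our_errors) /
  match_stats$our_attempts

# order() returns row positions sorted by the given column;
# decreasing = TRUE puts the highest efficiency first
ranked <- match_stats[order(match_stats$our_hitting_eff, decreasing = TRUE), ]

print(ranked[, c("set_number", "our_hitting_eff")])
ranked$set_number   # returns 3 5 1 4 2
```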

Practice Exercise: Team Roster

Create your own data frame and perform calculations:

# Create a team roster data frame with at least 5 players
# Include: name, position, height (inches), year (Fr/So/Jr/Sr)
# Example structure:

roster <- data.frame(
  name = c("Player1", "Player2", "Player3", "Player4", "Player5"),
  position = c("OH", "MB", "S", "OH", "L"),
  height = c(72, 74, 70, 71, 68),
  year = c("Jr", "Sr", "So", "Fr", "Jr")
)

# Tasks:
# 1. Calculate average team height
# 2. Find the tallest player
# 3. Count how many outside hitters (OH) are on the roster
# 4. Add a column for height in feet (height in inches / 12)
# 5. Filter to show only players 6 feet or taller

# Your code here:

Solution:

# 1. Average height
mean(roster$height)

# 2. Tallest player
roster$name[which.max(roster$height)]

# 3. Count outside hitters
sum(roster$position == "OH")

# 4. Add height in feet column
roster$height_feet <- roster$height / 12

# 5. Players 6 feet or taller (72 inches)
roster[roster$height >= 72, ]

Section 6: Common Patterns and Best Practices

Working with Missing Data

Sometimes data is missing. R represents this as NA (Not Available).

# Create data with missing values
kills <- c(15, NA, 18, 12, NA)

# Many functions will return NA if there's any missing data
mean(kills)  # Returns NA

# Remove missing values for calculation
mean(kills, na.rm = TRUE)  # Returns mean of 15, 18, 12

# Check for missing values
is.na(kills)  # Returns TRUE/FALSE for each element

# Count missing values
sum(is.na(kills))  # Returns 2

# Remove missing values from the vector
kills[!is.na(kills)]  # Returns c(15, 18, 12)
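
The same idea extends to data frames, where you usually want to drop whole rows that contain a missing value. A short sketch using base R's complete.cases() and na.omit(); the data here is made up for illustration:

```r
player_stats <- data.frame(
  name = c("Sarah", "Emma", "Olivia", "Mia"),
  kills = c(15, NA, 18, 12),
  attempts = c(35, 30, NA, 28)
)

# complete.cases() is TRUE for rows with no missing values
complete.cases(player_stats)        # returns TRUE FALSE FALSE TRUE

# Keep only the complete rows
complete_rows <- player_stats[complete.cases(player_stats), ]
complete_rows$name                  # returns "Sarah" "Mia"

# na.omit() drops the incomplete rows in one step
nrow(na.omit(player_stats))         # returns 2
```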

Meaningful Variable Names

Always use descriptive names:

# Bad - unclear what these represent
x <- 0.345
y <- 15
z <- 35

# Good - clear and descriptive
hitting_percentage <- 0.345
kills <- 15
attempts <- 35

# Bad - too abbreviated
k <- 15
e <- 3
a <- 35

# Good - clear but concise
kills <- 15
errors <- 3
attempts <- 35

Code Organization

# Organize your scripts with clear sections

# ============================================================================
# SETUP: Load packages and set parameters
# ============================================================================
# Install once beforehand with install.packages(c("dplyr", "ggplot2"))
library(dplyr)
library(ggplot2)

# ============================================================================
# LOAD DATA: Read in match data
# ============================================================================
# (We'll learn this in Tutorial 02)

# ============================================================================
# CALCULATE STATISTICS: Hitting efficiency and related metrics
# ============================================================================
kills <- 15
errors <- 3
attempts <- 35
hitting_eff <- (kills - errors) / attempts

# ============================================================================
# CREATE VISUALIZATIONS: Charts and graphs
# ============================================================================
# (We'll learn this in Tutorial 03)

# ============================================================================
# EXPORT RESULTS: Save outputs
# ============================================================================
# (We'll learn this in Tutorial 03)

Comments Are Your Friend

# Good commenting practice

# Calculate team hitting efficiency
# Formula: (Total Kills - Total Errors) / Total Attempts
team_kills <- 66
team_errors <- 18
team_attempts <- 178

team_hitting_eff <- (team_kills - team_errors) / team_attempts

# Note: hitting efficiency ranges from -1 to 1
# (it is negative when errors exceed kills)
# Values above 0.300 are considered excellent
if (team_hitting_eff > 0.300) {
  print("Excellent hitting performance!")
}
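
Because errors are subtracted from kills, the formula gives a negative value when a player commits more errors than kills. The sketch below shows that case and extends the if statement with else if and else branches. The stats and the rating labels are made up for illustration, not official thresholds:

```r
# A rough night: more errors than kills
kills <- 2
errors <- 5
attempts <- 10

hitting_eff <- (kills - errors) / attempts
hitting_eff   # returns -0.3

# if / else if / else picks exactly one branch
if (hitting_eff > 0.300) {
  rating <- "Excellent"
} else if (hitting_eff >= 0) {
  rating <- "Positive"
} else {
  rating <- "Negative: more errors than kills"
}

print(rating)
```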

Summary and Key Takeaways

What You Learned

  1. Variables: Store data with meaningful names using <-
  2. Data Types:
    • Numeric (numbers)
    • Character (text in quotes)
    • Logical (TRUE/FALSE)
  3. Vectors: Lists of same-type data created with c()
  4. Data Frames: Tables with rows and columns
  5. Accessing Data: Use $, [], and indexing
  6. Calculations: Perform operations on entire vectors/columns

Essential Functions to Remember

# Creating data
c()              # Combine values into vector
data.frame()     # Create data frame

# Examining data
head()           # First few rows
tail()           # Last few rows
str()            # Structure of data
summary()        # Summary statistics
class()          # Type of object
length()         # Length of vector
dim()            # Dimensions of data frame
names()          # Column names

# Calculations
sum()            # Total
mean()           # Average
min()            # Minimum
max()            # Maximum
round()          # Round numbers

# Accessing data
$                # Access column: df$column
[]               # Index: vector[1] or df[1,2]
which.max()      # Index of maximum value
which.min()      # Index of minimum value

Practice Before Moving On

Before Tutorial 02, make sure you can:

  1. ✓ Create variables and assign values
  2. ✓ Distinguish between numeric, character, and logical data
  3. ✓ Create vectors with c()
  4. ✓ Access elements of vectors with []
  5. ✓ Create data frames
  6. ✓ Access columns with $
  7. ✓ Add new calculated columns
  8. ✓ Use basic functions like mean(), sum(), max()

Additional Practice Challenges

Try these on your own:

Challenge 1: Player Statistics

Create a data frame with 6 players including: name, kills, errors, attempts, blocks, aces. Calculate hitting efficiency and kills + blocks + aces (total points contributed).

Challenge 2: Set Analysis

Create vectors for 5 sets with: our score, opponent score. Determine which sets we won, calculate point differential, find our average score.

Challenge 3: Rotation Stats

Create a data frame with 6 rotations including: rotation number, attacks, kills, errors. Calculate hitting efficiency per rotation and find the strongest rotation.


What’s Next?

In Tutorial 02: Working with DVW Files, you’ll learn:

  • How to set up R Projects for better organization
  • Using the here package for file paths
  • Loading actual volleyball match data from DVW files
  • Using dplyr to filter and manipulate play-by-play data
  • Grouping and summarizing statistics by player, rotation, and more

You now have the foundation to start working with real volleyball data!


Quick Reference Sheet

# Variables and Assignment
variable_name <- value

# Data Types
123                    # numeric
"text"                 # character  
TRUE, FALSE            # logical

# Vectors
c(1, 2, 3, 4)         # create vector
vector[1]              # first element
vector[1:3]            # first three elements
length(vector)         # number of elements
sum(vector)            # total
mean(vector)           # average

# Data Frames
data.frame(col1 = c(), col2 = c())  # create
df$column              # access column
df[1, ]                # first row
df[, "column"]         # column by name
df$new <- calculation  # add column
nrow(df)               # number of rows
ncol(df)               # number of columns

# Useful Functions
head(data)             # first few rows
summary(data)          # summary statistics
str(data)              # structure
class(object)          # type of object

Save this reference sheet. You’ll use these commands constantly in volleyball analysis!