Now that you’re comfortable with RStudio’s interface, it’s time to learn the fundamental building blocks of R programming. Everything we’ll learn here applies directly to volleyball analysis - we’ll use volleyball examples throughout.
By the end of this tutorial, you’ll be able to:
Before you can analyze DVW files and create scouting reports, you need to understand how R stores and works with data. Think of this as learning the basic skills before running complex plays.
A variable is a named container that stores information. Think of it like labeling a box so you can find what you put in it later.
We use <- to assign values to variables. You can read it as “gets” or “is assigned.”
kills <- 15 # kills gets 15
errors <- 3 # errors gets 3
attempts <- 35 # attempts gets 35
# You can also use = but <- is preferred in R
# Both work the same:
kills = 15
kills <- 15 # This is the R convention
Why <- instead of =?
While both work for assignment, <- is the R convention and makes your code more readable to other R users. We’ll use <- throughout these tutorials.
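One place where the two operators behave differently is inside a function call, where = names an argument instead of creating a variable. A minimal sketch, with set_kills as a made-up variable name used only for illustration:

```r
# Inside a function call, = matches an argument by name.
# Here x is mean()'s first argument; no variable is created.
avg_a <- mean(x = c(4, 5, 6))

# With <-, the vector is first stored in set_kills, then passed to mean().
avg_b <- mean(set_kills <- c(4, 5, 6))

print(avg_a)      # 5
print(avg_b)      # 5
print(set_kills)  # 4 5 6
```

Writing each assignment on its own line with <- avoids this ambiguity.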
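Rules for variable names:
Cannot start with a number
Can contain letters, numbers, underscores _, and periods .
Cannot contain spaces or minus signs
Case-sensitive: Kills and kills are different
Cannot be reserved words such as TRUE, FALSE, if, for, etc.
# Good variable names: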
player_kills <- 15
player1_kills <- 15
kills.per.set <- 4.5
totalKills <- 15
# Bad variable names (will cause errors):
# 1kills <- 15 # Can't start with number
# player kills <- 15 # No spaces allowed
# player-kills <- 15 # Minus sign not allowed
# TRUE <- 15 # Can't use reserved words
Naming conventions (recommendations):
Use descriptive names: hitting_eff is better than h or x
Separate words with underscores: player_kills (called “snake_case”)
Avoid names of existing functions (mean, sum, c)
# Player statistics from a match
player_name <- "Sarah Smith"
kills <- 15
errors <- 3
attempts <- 35
blocks <- 2
aces <- 1
# Calculate hitting efficiency
# Formula: (Kills - Errors) / Attempts
hitting_eff <- (kills - errors) / attempts
# Calculate points responsible for
# Formula: Kills + Blocks + Aces
points_responsible <- kills + blocks + aces
# Display results
print(player_name)
print(hitting_eff)
print(points_responsible)
Key concept: Once you store a value in a variable, you can reuse it multiple times without retyping the value.
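One thing to keep in mind when reusing variables is that a calculated value is a snapshot. As a small sketch (the numbers are illustrative), changing kills afterwards does not update hitting_eff until the formula is run again:

```r
kills <- 15
errors <- 3
attempts <- 35
hitting_eff <- (kills - errors) / attempts
round(hitting_eff, 3)   # 0.343

kills <- 18             # correct the kill count
round(hitting_eff, 3)   # still 0.343: the old result is unchanged

hitting_eff <- (kills - errors) / attempts  # recalculate
round(hitting_eff, 3)   # 0.429
```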
R works with different types of data. Understanding these types helps you avoid errors and use the right functions.
Numbers used for calculations.
# Whole numbers and decimals are both numeric
kills <- 15
hitting_eff <- 0.343
set_score <- 25
pass_rating <- 2.34
# You can do math with numeric data
kills + 5
hitting_eff * 100 # Convert to percentage
set_score - 20
# Check the type
class(kills) # Returns "numeric"
class(hitting_eff) # Returns "numeric"
Text data, enclosed in quotes (single or double).
# Text must be in quotes
player_name <- "Sarah Smith"
position <- "Outside Hitter"
team_name <- "Nebraska"
# You can also use single quotes
position <- 'Outside Hitter'
# Without quotes, R looks for a variable:
# team_name <- Nebraska # Error! R looks for a variable called Nebraska
# Check the type
class(player_name) # Returns "character"
Common mistake: Forgetting quotes around text
Used for yes/no, true/false conditions. Always written in ALL CAPS without quotes.
# Logical values
is_starter <- TRUE
is_injured <- FALSE
won_set <- TRUE
# Logical operators create logical values
kills > 10 # TRUE if kills is greater than 10
hitting_eff >= 0.300 # TRUE if hitting efficiency is at least .300
errors == 0 # TRUE if errors equals exactly 0 (note: two equal signs!)
# Examples with volleyball data
kills <- 15
errors <- 3
attempts <- 35
# Check conditions
kills > 10 # TRUE
errors == 0 # FALSE (errors is 3)
attempts >= 30 # TRUE
hitting_eff <- (kills - errors) / attempts
hitting_eff >= 0.300 # TRUE (0.343 is greater than 0.300)
# Check the type
class(is_starter) # Returns "logical"
Important comparison operators:
== equal to (note: TWO equal signs)
!= not equal to
> greater than
< less than
>= greater than or equal to
<= less than or equal to
R treats different data types differently:
# Numbers can be added:
10 + 5 # Returns 15
# Text cannot be combined with +:
"Sarah" + "Smith" # ERROR! Can't add text
# Correct way to combine text:
paste("Sarah", "Smith") # Returns "Sarah Smith"
# Be careful with types:
number_as_text <- "15"
number_as_text + 5 # ERROR! Can't add text to number
# Convert types when needed:
as.numeric("15") + 5 # Returns 20
as.character(15) # Returns "15" (text)
A vector is a sequence of values of the same type. Think of it like a stat sheet column - all the kills from each set, or all the player names on a roster.
# Create a vector with c() function
# c stands for "combine" or "concatenate"
set_scores <- c(25, 23, 25, 25)
# A vector of player names
rotation <- c("Sarah", "Emma", "Olivia", "Mia", "Ava", "Isabella")
# A vector of whether sets were won
sets_won <- c(TRUE, FALSE, TRUE, TRUE)
# Display them
print(set_scores)
print(rotation)
The c() function combines values into a vector:
# Numeric vector - kills by set
kills_by_set <- c(4, 5, 6, 4, 3)
# Character vector - positions
positions <- c("OH", "OH", "S", "MB", "MB", "L")
# Logical vector - whether each serve was an ace
serve_results <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
# You can also create sequences
set_numbers <- 1:5 # Creates c(1, 2, 3, 4, 5)
set_numbers <- c(1, 2, 3, 4, 5) # Same values (stored as numeric instead of integer)
rotation_positions <- 1:6 # Creates c(1, 2, 3, 4, 5, 6)
You can do math on entire vectors at once:
# Kills by set
kills <- c(4, 5, 6, 4, 3)
# Double all values
kills * 2 # Returns c(8, 10, 12, 8, 6)
# Add a constant to all values
kills + 1 # Returns c(5, 6, 7, 5, 4)
# Operations between vectors (element-by-element)
kills <- c(15, 12, 18, 10, 14)
errors <- c(3, 5, 2, 4, 3)
attempts <- c(35, 30, 40, 28, 32)
# Calculate hitting efficiency for each match
hitting_eff <- (kills - errors) / attempts
print(hitting_eff)
# Example: Set scores from a match
set_scores <- c(25, 23, 25, 22, 15)
# How many elements?
length(set_scores) # Returns 5
# Sum of all elements
sum(set_scores) # Returns 110 total points
# Average (mean)
mean(set_scores) # Returns 22
# Minimum and maximum
min(set_scores) # Returns 15
max(set_scores) # Returns 25
# More examples with player stats
kills <- c(15, 12, 18, 10, 14, 16, 11)
sum(kills) # Total kills: 96
mean(kills) # Average: 13.7
max(kills) # Best performance: 18
min(kills) # Lowest: 10
length(kills) # Number of matches: 7
Use square brackets [] to access specific elements:
# Player rotation
rotation <- c("Sarah", "Emma", "Olivia", "Mia", "Ava", "Isabella")
# Get the first player
rotation[1] # Returns "Sarah"
# Get the third player
rotation[3] # Returns "Olivia"
# Get the last player
rotation[6] # Returns "Isabella"
# Get multiple elements
rotation[c(1, 3, 5)] # Returns "Sarah", "Olivia", "Ava"
# Get a range
rotation[1:3] # Returns "Sarah", "Emma", "Olivia" (first three)
# Get all except certain elements
rotation[-1] # All except first
rotation[-c(1,3)] # All except first and third
Important: R uses 1-based indexing, meaning the first element is at position 1 (not 0 like some other programming languages).
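Because positions start at 1, index 0 and positions past the end behave in ways that can surprise newcomers. A short sketch using the same rotation vector:

```r
rotation <- c("Sarah", "Emma", "Olivia", "Mia", "Ava", "Isabella")

# Index 0 selects nothing: the result is an empty character vector
rotation[0]
length(rotation[0])          # 0

# An index past the end returns NA rather than an error
rotation[7]                  # NA

# length() gives the last position without hard-coding 6
rotation[length(rotation)]   # "Isabella"
```

Neither mistake raises an error, so it is worth checking results when an index is computed rather than typed by hand.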
# Start with kills by set
kills <- c(4, 5, 6, 4)
# Change the third set's kills
kills[3] <- 7
print(kills) # Now c(4, 5, 7, 4)
# Add a new element to the end
kills <- c(kills, 5) # Add set 5
print(kills) # Now c(4, 5, 7, 4, 5)
# Combine two vectors
first_half <- c(4, 5)
second_half <- c(6, 4)
all_kills <- c(first_half, second_half)
print(all_kills)
A data frame is a table with rows and columns, like a spreadsheet. Each column is a vector, and all columns have the same length.
Think of it as a stat sheet where:
# Create a data frame of player statistics
player_stats <- data.frame(
name = c("Sarah", "Emma", "Olivia", "Mia", "Ava"),
kills = c(15, 12, 18, 10, 14),
errors = c(3, 5, 2, 4, 3),
attempts = c(35, 30, 40, 28, 32)
)
# View the data frame
print(player_stats)
# Better view in RStudio:
View(player_stats) # Opens in a spreadsheet-like viewer
# Method 1: Create from scratch
match_stats <- data.frame(
set_number = c(1, 2, 3, 4, 5),
our_score = c(25, 23, 25, 22, 15),
opp_score = c(23, 25, 18, 25, 10),
our_kills = c(14, 13, 16, 12, 11),
our_errors = c(4, 5, 3, 4, 2)
)
print(match_stats)
# Method 2: Create from existing vectors
# (player_names rather than names, to avoid masking the built-in names() function)
player_names <- c("Sarah", "Emma", "Olivia")
positions <- c("OH", "OH", "MB")
heights <- c(72, 70, 74) # inches
roster <- data.frame(
player_name = player_names,
position = positions,
height_inches = heights
)
print(roster)
# See structure
str(player_stats)
# Dimensions (rows, columns)
dim(player_stats) # Returns c(5, 4) - 5 rows, 4 columns
nrow(player_stats) # Number of rows: 5
ncol(player_stats) # Number of columns: 4
# Column names
names(player_stats)
colnames(player_stats) # Same thing
# First few rows
head(player_stats) # First 6 rows by default
head(player_stats, 3) # First 3 rows
# Last few rows
tail(player_stats) # Last 6 rows by default
tail(player_stats, 2) # Last 2 rows
# Summary statistics
summary(player_stats)
# Calculate and add a new column
player_stats$hitting_eff <- (player_stats$kills - player_stats$errors) /
player_stats$attempts
# View updated data frame
print(player_stats)
# Add another column: kill percentage (kills per attempt)
player_stats$kill_pct <- player_stats$kills / player_stats$attempts
print(player_stats)
# Add a text column
player_stats$position <- c("OH", "OH", "MB", "OH", "S")
print(player_stats)
# Change a specific value
player_stats$kills[1] <- 16 # Change Sarah's kills to 16
# Change multiple values
player_stats$kills[c(1,3)] <- c(16, 19) # Change Sarah and Olivia's kills
# Recalculate hitting efficiency after changes
player_stats$hitting_eff <- (player_stats$kills - player_stats$errors) /
player_stats$attempts
print(player_stats)
We’ll learn much more about filtering in Tutorial 02, but here’s a preview:
# Get players with more than 12 kills
player_stats[player_stats$kills > 12, ]
# Get players with hitting efficiency above .300
player_stats[player_stats$hitting_eff > 0.300, ]
# Get just the names of high-efficiency players
player_stats$name[player_stats$hitting_eff > 0.300]
Let’s create a complete analysis combining everything we’ve learned:
# Create a match statistics data frame
match_stats <- data.frame(
set_number = 1:5,
our_score = c(25, 23, 25, 22, 15),
opponent_score = c(23, 25, 18, 25, 10),
our_kills = c(14, 13, 16, 12, 11),
our_errors = c(4, 5, 3, 4, 2),
our_attempts = c(32, 35, 38, 33, 28),
our_aces = c(2, 1, 3, 1, 2),
opp_errors = c(3, 4, 5, 3, 2)
)
# Calculate additional statistics
match_stats$our_hitting_eff <- (match_stats$our_kills - match_stats$our_errors) /
match_stats$our_attempts
match_stats$set_won <- match_stats$our_score > match_stats$opponent_score
match_stats$point_differential <- match_stats$our_score - match_stats$opponent_score
# View the complete data frame
print(match_stats)
# Calculate match totals
total_kills <- sum(match_stats$our_kills)
total_errors <- sum(match_stats$our_errors)
total_attempts <- sum(match_stats$our_attempts)
total_aces <- sum(match_stats$our_aces)
# Calculate match hitting efficiency
match_hitting_eff <- (total_kills - total_errors) / total_attempts
# How many sets did we win?
sets_won <- sum(match_stats$set_won)
# Display match summary
cat("Match Summary:\n")
cat("Sets Won:", sets_won, "out of", nrow(match_stats), "\n")
cat("Total Kills:", total_kills, "\n")
cat("Total Aces:", total_aces, "\n")
cat("Match hitting efficiency:", round(match_hitting_eff, 3), "\n")
# Find best and worst sets by hitting efficiency
best_set <- match_stats$set_number[which.max(match_stats$our_hitting_eff)]
worst_set <- match_stats$set_number[which.min(match_stats$our_hitting_eff)]
cat("Best Set (hitting %):", best_set, "\n")
cat("Worst Set (hitting %):", worst_set, "\n")
Create your own data frame and perform calculations:
# Create a team roster data frame with at least 5 players
# Include: name, position, height (inches), year (Fr/So/Jr/Sr)
# Example structure:
roster <- data.frame(
name = c("Player1", "Player2", "Player3", "Player4", "Player5"),
position = c("OH", "MB", "S", "OH", "L"),
height = c(72, 74, 70, 71, 68),
year = c("Jr", "Sr", "So", "Fr", "Jr")
)
# Tasks:
# 1. Calculate average team height
# 2. Find the tallest player
# 3. Count how many outside hitters (OH) are on the roster
# 4. Add a column for height in feet (height in inches / 12)
# 5. Filter to show only players 6 feet or taller
# Your code here:
Solution:
# 1. Average height
mean(roster$height)
# 2. Tallest player
roster$name[which.max(roster$height)]
# 3. Count outside hitters
sum(roster$position == "OH")
# 4. Add height in feet column
roster$height_feet <- roster$height / 12
# 5. Players 6 feet or taller (72 inches)
roster[roster$height >= 72, ]
Sometimes data is missing. R represents this as NA (Not Available).
# Create data with missing values
kills <- c(15, NA, 18, 12, NA)
# Many functions will return NA if there's any missing data
mean(kills) # Returns NA
# Remove missing values for calculation
mean(kills, na.rm = TRUE) # Returns 15 (the mean of 15, 18, 12)
# Check for missing values
is.na(kills) # Returns TRUE/FALSE for each element
# Count missing values
sum(is.na(kills)) # Returns 2
# Remove missing values from the vector
kills[!is.na(kills)] # Returns c(15, 18, 12)
Always use descriptive names:
# Organize your scripts with clear sections
# ============================================================================
# SETUP: Load packages and set parameters
# ============================================================================
library(dplyr)
library(ggplot2)
# ============================================================================
# LOAD DATA: Read in match data
# ============================================================================
# (We'll learn this in Tutorial 02)
# ============================================================================
# CALCULATE STATISTICS: Hitting efficiency
# ============================================================================
kills <- 15
errors <- 3
attempts <- 35
hitting_eff <- (kills - errors) / attempts
# ============================================================================
# CREATE VISUALIZATIONS: Charts and graphs
# ============================================================================
# (We'll learn this in Tutorial 03)
# ============================================================================
# EXPORT RESULTS: Save outputs
# ============================================================================
# (We'll learn this in Tutorial 03)
Assigning values with <-
Creating vectors with c()
Working with data frames using $, [], and indexing
# Creating data
c() # Combine values into vector
data.frame() # Create data frame
# Examining data
head() # First few rows
tail() # Last few rows
str() # Structure of data
summary() # Summary statistics
class() # Type of object
length() # Length of vector
dim() # Dimensions of data frame
names() # Column names
# Calculations
sum() # Total
mean() # Average
min() # Minimum
max() # Maximum
round() # Round numbers
# Accessing data
$ # Access column: df$column
[] # Index: vector[1] or df[1,2]
which.max() # Index of maximum value
which.min() # Index of minimum value
Before Tutorial 02, make sure you can:
Create vectors with c()
Access elements with []
Access data frame columns with $
Use functions like mean(), sum(), max()
Try these on your own:
Create a data frame with 6 players including: name, kills, errors, attempts, blocks, aces. Calculate hitting efficiency and kills + blocks + aces (total points contributed).
Create vectors for 5 sets with: our score, opponent score. Determine which sets we won, calculate point differential, find our average score.
Create a data frame with 6 rotations including: rotation number, attacks, kills, errors. Calculate hitting efficiency per rotation and find the strongest rotation.
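As a starting point, here is one possible sketch of the second exercise. The scores are hypothetical and only illustrate the approach; the other two exercises follow the same pattern with data frames.

```r
# Hypothetical scores for a five-set match
our_score <- c(25, 21, 25, 19, 15)
opp_score <- c(20, 25, 22, 25, 12)

set_won <- our_score > opp_score      # TRUE where we won the set
point_diff <- our_score - opp_score   # positive means we outscored them

which(set_won)      # 1 3 5
sum(set_won)        # 3
point_diff          # 5 -4 3 -6 3
mean(our_score)     # 21
```

which() returns the positions of the TRUE values, so it reports the set numbers directly.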
In Tutorial 02: Working with DVW Files, you’ll learn:
Using the here package for file paths
Using dplyr to filter and manipulate play-by-play data
You now have the foundation to start working with real volleyball data!
# Variables and Assignment
variable_name <- value
# Data Types
123 # numeric
"text" # character
TRUE, FALSE # logical
# Vectors
c(1, 2, 3, 4) # create vector
vector[1] # first element
vector[1:3] # first three elements
length(vector) # number of elements
sum(vector) # total
mean(vector) # average
# Data Frames
data.frame(col1 = c(1, 2), col2 = c("a", "b")) # create
df$column # access column
df[1, ] # first row
df[, "column"] # column by name
df$new <- calculation # add column
nrow(df) # number of rows
ncol(df) # number of columns
# Useful Functions
head(data) # first few rows
summary(data) # summary statistics
str(data) # structure
class(object) # type of object
Save this reference sheet - you’ll use these commands constantly in volleyball analysis!
Comments Are Your Friend