Now that you understand R fundamentals, it’s time to work with real volleyball data! In this tutorial, you’ll learn how to set up an R Project, organize your files, use the here package for reliable file paths, and load DVW match files with the datavolley package.

Why this matters: Good organization now will save you hours of frustration later. You’ll be able to find your files, share your work, and avoid the dreaded “file not found” errors.
In Tutorial 00, we learned to set our working directory with setwd(). While it works, it has serious problems:
# This works on YOUR computer:
setwd("C:/Users/YourName/Documents/Volleyball")
# But breaks if:
# - You move your files to a different folder
# - You work on a different computer
# - Someone else tries to run your code
# - You rename a folder in the path

The issue: Hard-coded file paths make your code fragile and not portable.
The solution: R Projects + the here package create reliable, portable file paths that work anywhere.
An R Project is a special folder that RStudio recognizes as a workspace. When you open an R Project, RStudio sets the working directory to the project folder automatically, so you no longer need setwd().

Think of an R Project like a team binder: everything related to that team (rosters, stats, scouting reports) stays together in one place.
Let’s create a project for your volleyball analysis work.
Step-by-step:
In RStudio, click: File > New Project
Choose New Directory
Choose New Project
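Fill in the details: for the directory name, enter Volleyball_Analysis (or whatever you prefer)
Click Create Project
What just happened? RStudio created a folder named Volleyball_Analysis, placed a project file called Volleyball_Analysis.Rproj inside it, and opened the new project.
How to verify: the project name should appear in the top-right corner of RStudio, and getwd() should return the path to your project folder.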
Once you’ve created a project, here’s how to open it again:
Method 1: Double-click the .Rproj file in your file explorer
Method 2: In RStudio: File > Open Project and select the .Rproj file
Method 3: In RStudio: Click the project menu (top-right corner) and select from recent projects
Pro tip: Always open your project before starting work! This ensures all your file paths work correctly.
As you create analyses, you’ll accumulate DVW files, R scripts, plots, tables, and reports.
Without organization, you’ll waste time searching for files and risk mixing up data from different matches or seasons.
Inside your R Project folder, create this structure:
Volleyball_Analysis/
├── Data/ (or Data_Raw - your DVW files go here)
├── Scripts/ (your R code goes here)
├── Outputs/ (plots, tables, images go here)
│   ├── Data/ (optional subfolder)
│   ├── Plots/ (optional subfolder)
│   └── Tables/ (optional subfolder)
├── Reports/ (final scouting reports go here)
└── Volleyball_Analysis.Rproj (the project file)
In some projects, I maintain a Data_Raw directory for DVW files and a separate Data_Processed directory for cleaned outputs, rather than nesting an additional Data folder inside Outputs. This layout is a suggestion, not a rule. The only requirements are internal consistency and a structure that makes the state of the data immediately clear: whether a file is raw, transformed, or final should be obvious at a glance, so there is no ambiguity when you return to the project later.
You can create folders two ways:
Method 1: In your file explorer
Method 2: In R
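Method 2 can be sketched with base R's dir.create(). This is a minimal example rather than the only way to do it: create_project_folders() is a hypothetical helper name, and the demo builds the layout inside a temporary directory so it does not touch your real files. In your own project you would pass the project folder instead (for example getwd() with the project open).

```r
# Hypothetical helper: create the suggested folder layout under `root`
create_project_folders <- function(root) {
  folders <- c(
    "Data",
    "Scripts",
    file.path("Outputs", "Data"),
    file.path("Outputs", "Plots"),
    file.path("Outputs", "Tables"),
    "Reports"
  )
  for (folder in folders) {
    # recursive = TRUE also creates missing parent folders (e.g. "Outputs");
    # showWarnings = FALSE keeps it quiet if a folder already exists
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(folders)
}

# Demo in a throwaway location (assumption: swap in your project folder)
demo_root <- file.path(tempdir(), "Volleyball_Analysis")
create_project_folders(demo_root)

# List the top-level folders that were created
list.dirs(demo_root, full.names = FALSE, recursive = FALSE)
```

Because existing folders are skipped silently, the helper is safe to run more than once.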
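What goes in each folder:
Data/ (or Data_Raw/): your DVW files
Scripts/: your R code, for example 01_load_data.R, 02_calculate_stats.R, opponent_scouting_report.R
Outputs/: plots, tables, and images
Reports/: final scouting reports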
Good file names:
nebraska_vs_wisconsin_2024-11-15.dvw
opponent_hitting_chart.R
rotation_analysis_nebraska.png

Bad file names:
data.dvw (too generic)
final FINAL v2 (1).R (confusing, spaces, unclear)
11-15 match.dvw (ambiguous date format)

Best practices: use descriptive names, avoid spaces, and write dates as YYYY-MM-DD so they are unambiguous and sort correctly.
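One way to keep names consistent is to generate them instead of typing them by hand. The sketch below is only an illustration: make_match_filename() is a hypothetical helper, and the "team_vs_team_date" pattern is copied from the good example above.

```r
# Hypothetical helper: build "teama_vs_teamb_YYYY-MM-DD.ext"
make_match_filename <- function(team_a, team_b, match_date, ext = "dvw") {
  clean <- function(x) {
    x <- tolower(trimws(x))
    x <- gsub("[^a-z0-9]+", "_", x)  # spaces and punctuation become underscores
    gsub("^_+|_+$", "", x)           # drop leading/trailing underscores
  }
  paste0(
    clean(team_a), "_vs_", clean(team_b), "_",
    format(as.Date(match_date), "%Y-%m-%d"),  # unambiguous ISO date
    ".", ext
  )
}

make_match_filename("Nebraska", "Wisconsin", "2024-11-15")
# "nebraska_vs_wisconsin_2024-11-15.dvw"

make_match_filename("Texas A&M", "Penn State", as.Date("2024-09-01"))
# "texas_a_m_vs_penn_state_2024-09-01.dvw"
```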
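The here Package

The here package solves the file path problem elegantly. It finds your project root automatically and builds file paths relative to it, so the same code works wherever the project folder lives.

How here Works

# Instead of this (breaks easily):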
setwd("C:/Users/YourName/Documents/Volleyball_Analysis")
data <- read.csv("Data/nebraska_match.csv")
# Use this (works everywhere):
library(here)
data <- read.csv(here("Data", "nebraska_match.csv"))
# here() builds the full path automatically:
# "C:/Users/YourName/Documents/Volleyball_Analysis/Data/nebraska_match.csv"

Using here in Practice

# Load the here package
library(here)
# Check where here thinks your project root is
here()
# Should show your project folder path
# Build paths to files
here("Data", "match.dvw") # Path to a DVW file
here("Scripts", "analysis.R") # Path to a script
here("Outputs", "Plots", "chart.png") # Path to save a plot
# Use here() when reading files
library(datavolley)
match_data <- dv_read(here("Data", "opponent_match.dvw"))
# Use here() when saving files
# (ggsave() comes from ggplot2 and saves the most recent plot)
ggsave(here("Outputs", "Plots", "hitting_chart.png"))

Why here is Better

# Fragile approach (breaks when you move files):
match_data <- dv_read("C:/Users/YourName/Desktop/Volleyball/Data/match.dvw")
# Relative path (breaks when working directory changes):
match_data <- dv_read("Data/match.dvw")
# here approach (works wherever the project folder is):
library(here)
match_data <- dv_read(here("Data", "match.dvw"))
# Benefits:
# 1. Works on any computer
# 2. Works even if you move your project folder
# 3. Works when someone else runs your code
# 4. Explicit about file location (easier to debug)

From now on, always use here() for file paths in your scripts!
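To see why a path anchored at the project root is more robust than a bare relative path, here is a small base R experiment. It does not call here itself: project_root is a stand-in (an assumption for this demo) for the folder that here() would report, and everything happens inside a temporary directory.

```r
# Stand-in for the project root that here() would detect (demo assumption)
project_root <- file.path(tempdir(), "Volleyball_Analysis_demo")
dir.create(file.path(project_root, "Data"), recursive = TRUE, showWarnings = FALSE)

# Write a tiny placeholder file into Data/
demo_file <- file.path(project_root, "Data", "demo_match.csv")
write.csv(data.frame(skill = c("Serve", "Attack")), demo_file, row.names = FALSE)

# Simulate a working directory that is NOT the project folder
old_wd <- setwd(tempdir())

relative_found <- file.exists(file.path("Data", "demo_match.csv"))
rooted_found   <- file.exists(file.path(project_root, "Data", "demo_match.csv"))

setwd(old_wd)  # restore the original working directory

relative_found  # FALSE: the relative path depends on the working directory
rooted_found    # TRUE: the path built from the project root still resolves
```

In a real project, here("Data", "demo_match.csv") plays the role of the rooted path, with the project root found for you.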
The datavolley Package

The datavolley package reads DVW files and converts them into R data frames we can analyze.

Key functions:
dv_read() - Load a DVW file
plays() - Extract play-by-play data

# Load required packages
library(datavolley)
library(here)
library(dplyr) # We'll use this for exploring the data
# Load a DVW file
# Let's start with the first Texas A&M match
match_data <- dv_read(here("Data", "match_1.dvw"))
# What did we get?
class(match_data) # Shows it's a "datavolley" object
# The match_data object contains multiple components:
# - Match metadata (teams, date, players)
# - Play-by-play data
# - And more

When you load a DVW file, you get a complex object with several parts, including the match metadata (match_data$meta) and the play-by-play data (match_data$plays).

The most important part of a DVW file is the plays data, the detailed record of every action:
# Extract the play-by-play data
plays <- plays(match_data)
# OR use the $ notation:
plays <- match_data$plays
# Now plays is a data frame - you can work with it using Tutorial 01 skills!
# Look at the first few rows
head(plays)
# Check the structure
str(plays)
# See what columns are available
names(plays)
# How many plays in this match?
nrow(plays)

The plays data frame has many columns. Here are the most important ones:
# Team information
plays$team # Which team performed this action
plays$player_name # Player who performed the action
plays$player_id # Unique player identifier
plays$player_number # Jersey number
# Action information
plays$skill # Type of action: Serve, Reception, Attack, Block, Dig, Set, Freeball
plays$skill_type # Subtype of the skill
plays$evaluation_code # Quality rating: #, +, !, -, /, =
plays$evaluation # Text description of the result
plays$attack_code # Specific attack type (for attacks)
# Match context
plays$point_id # Which rally/point
plays$set_number # Which set (1-5)
plays$home_team_score # Score after this play
plays$visiting_team_score
# Court position
plays$start_zone # Where action started (1-9)
plays$end_zone # Where action ended
plays$start_coordinate_x # X coordinate
plays$start_coordinate_y # Y coordinate
plays$end_coordinate_x
plays$end_coordinate_y
# Match metadata
plays$match_id # Unique match identifier
plays$home_team # Home team name
plays$visiting_team # Visiting team name
# There are more columns - explore with names(plays)!

Always check your data after loading to make sure everything looks correct:
# Load packages
library(datavolley)
library(here)
library(dplyr)
# Load data (using first match as example)
match_data <- dv_read(here("Data", "match_1.dvw"))
plays <- plays(match_data)
# ============================================================================
# BASIC CHECKS
# ============================================================================
# How many total plays?
nrow(plays)
# Typical match: 300-600 plays depending on length
# What teams played?
unique(plays$team)
# Should show two team names
# What skills are in the data?
table(plays$skill)
# Should show counts for: Serve, Reception, Set, Attack, Block, Dig, Freeball
# Which sets were played?
unique(plays$set_number)
# Should show 1 through 3, 4, or 5 depending on match length
# Date of the match (stored in the match metadata)
match_data$meta$match$date
# Player roster
unique(plays$player_name[plays$team == "Your Team Name"])
# Shows all players from your team

# Look at just serves
serves <- plays |>
  filter(skill == "Serve")
head(serves)
nrow(serves) # How many serves total?
# Look at just attacks
attacks <- plays |>
  filter(skill == "Attack")
head(attacks)
nrow(attacks) # How many attacks total?
# Look at receptions (passes)
receptions <- plays |>
  filter(skill == "Reception")
head(receptions)
# Common evaluation codes for receptions
table(receptions$evaluation_code)
# Should show: #, +, !, -, /, =

Understanding evaluation codes is crucial for calculating stats:
# For RECEPTIONS (passes):
# # = Perfect pass (4 points in some systems, 3 in others)
# + = Good pass (3 or 2 points)
# ! = Medium/OK pass (2 or 1 point)
# - = Poor pass (1 or 0 points)
# / = Overpass (went over net)
# = = Error (ace for serving team)
# For ATTACKS:
# # = Kill (point scored)
# + = Positive attack (not killed but difficult for opponent)
# ! = Blocked for reattack (ball comes back playable to the attacking team)
# - = Poor attack (easily dug by opponent)
# / = Blocked for point
# = = Attack error
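As a sketch of how these codes feed into stats, the example below converts reception codes to an average pass rating and attack codes to a hitting efficiency. It uses the lower of the two point scales mentioned in the comments above (# = 3, + = 2, ! = 1, - = 0). Scoring overpasses (/) and errors (=) as 0 is an assumption, as is the common efficiency formula (kills - errors - blocked) / attempts. The code vectors are made-up toy data, not taken from a real match.

```r
# Toy data (made up for illustration)
reception_codes <- c("#", "+", "+", "!", "-", "=", "#", "/")
attack_codes    <- c("#", "#", "+", "-", "=", "/", "#", "!")

# Assumed 0-3 pass scale; adjust to your program's system
pass_points <- c("#" = 3, "+" = 2, "!" = 1, "-" = 0, "/" = 0, "=" = 0)

# Look up each code's points by name, then average
pass_rating <- mean(unname(pass_points[reception_codes]))
pass_rating
# 1.375

# Hitting efficiency: (kills - errors - blocked) / attempts
kills   <- sum(attack_codes == "#")
errors  <- sum(attack_codes == "=")
blocked <- sum(attack_codes == "/")
hitting_efficiency <- (kills - errors - blocked) / length(attack_codes)
hitting_efficiency
# 0.125
```

With real data, the toy vectors would be replaced by the evaluation_code column of the filtered receptions and attacks.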
# View reception quality distribution
receptions <- plays |> filter(skill == "Reception")
table(receptions$evaluation_code)
# View attack outcomes
attacks <- plays |> filter(skill == "Attack")
table(attacks$evaluation_code)

Always validate your data to catch problems early:
# ============================================================================
# CHECK FOR MISSING DATA
# ============================================================================
# Are there any rows with missing team names?
sum(is.na(plays$team))
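# Any plays without a skill assigned?
# (Rows that record things like points or rotation markers typically have
# no skill, so a non-zero count here is expected)
sum(is.na(plays$skill))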
# Any attacks without an attack code?
attacks <- plays |> filter(skill == "Attack")
sum(is.na(attacks$attack_code))
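The checks above look at one column at a time. A possible shortcut is colSums(is.na(...)), which counts missing values in every column at once. The toy_plays data frame below is invented for illustration; with real data you would pass your own plays data frame.

```r
# Invented stand-in for a plays data frame
toy_plays <- data.frame(
  team        = c("Home", "Away", NA, "Home"),
  skill       = c("Serve", "Reception", "Attack", NA),
  attack_code = c(NA, NA, "X5", NA),
  stringsAsFactors = FALSE
)

# Count missing values per column
na_counts <- colSums(is.na(toy_plays))
na_counts

# Keep only the columns that have missing values, largest count first
sort(na_counts[na_counts > 0], decreasing = TRUE)
```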
# ============================================================================
# CHECK FOR INCONSISTENT NAMES
# ============================================================================
# Player names should be consistent
unique(plays$player_name)
# Look for variations like "Smith, Sarah" vs "Sarah Smith" vs "S. Smith"
# Team names should be consistent
unique(plays$team)
# Should be exactly 2 teams, spelled consistently
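Scanning unique() output by eye works for a single match, but a small helper can flag the easy cases. The sketch below only catches spellings that differ in capitalization or spacing. It will not match "Smith, Sarah" to "S. Smith", and the names are invented for illustration.

```r
# Invented example names with inconsistent case and spacing
player_names <- c("Sarah Smith", "sarah smith", "Sarah  Smith ", "Jane Doe")

# Normalize: trim the ends, collapse repeated spaces, lower-case
normalized <- tolower(gsub("\\s+", " ", trimws(player_names)))

# Group the original spellings by their normalized form
variants <- split(player_names, normalized)

# Keep only names that appear with more than one spelling
flagged <- variants[lengths(lapply(variants, unique)) > 1]
flagged
```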
# ============================================================================
# CHECK COORDINATES
# ============================================================================
# Court coordinates should be within bounds
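# In datavolley's court units (not meters), the sidelines sit at x = 0.5 and
# x = 3.5, the baselines at y = 0.5 and y = 6.5, and the net at y = 3.5.
# Values slightly outside that range can be balls landing out of bounds.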
summary(plays$start_coordinate_x)
summary(plays$start_coordinate_y)
# Check for impossible coordinates (outliers)
plays |>
  filter(start_coordinate_x < 0 | start_coordinate_x > 4) |>
  nrow() # Should be 0 or very few
# ============================================================================
# CHECK MATCH SCORES
# ============================================================================
# Winning scores should make sense: at least 25 (15 in set 5), with a
# two-point margin over the losing score
plays |>
  group_by(set_number) |>
  summarise(
    max_home = max(home_team_score, na.rm = TRUE),
    max_visiting = max(visiting_team_score, na.rm = TRUE)
  )

Create a quick overview of the match:
# Load packages
library(dplyr)
# Get plays data
plays <- plays(match_data)
# Match summary
print("========================================")
print("MATCH SUMMARY")
print("========================================")
print(paste("Teams:", unique(plays$home_team), "vs", unique(plays$visiting_team)))
print(paste("Date:", match_data$meta$match$date)) # date is stored in the match metadata
print(paste("Total Plays:", nrow(plays)))
print(paste("Sets Played:", max(plays$set_number, na.rm = TRUE))) # ignore rows without a set number
print("")
# Skills breakdown
print("Skills Breakdown:")
plays |>
  group_by(skill) |>
  summarise(count = n()) |>
  arrange(desc(count)) |>
  print()
print("")
# Team breakdown
print("Plays by Team:")
plays |>
  group_by(team) |>
  summarise(total_plays = n()) |>
  print()

Often you’ll want to analyze multiple matches (e.g., all matches vs an opponent):
# Load packages
library(datavolley)
library(here)
library(dplyr)
# Method 1: Load individually and combine (you've already done this in Section 9!)
match1 <- dv_read(here("Data", "match_1.dvw"))
match2 <- dv_read(here("Data", "match_2.dvw"))
match3 <- dv_read(here("Data", "match_3.dvw"))
# Extract plays from each
plays1 <- plays(match1)
plays2 <- plays(match2)
plays3 <- plays(match3)
# Combine into one data frame
all_plays <- bind_rows(plays1, plays2, plays3)
# Now all_plays contains data from all three matches!
nrow(all_plays)
# Method 2: Loop through multiple files (more advanced)
# List all DVW files in your Data folder
dvw_files <- list.files(here("Data"), pattern = "\\.dvw$", full.names = TRUE)
# Initialize empty list
all_matches <- list()
# Load each file
for (i in seq_along(dvw_files)) {
  match <- dv_read(dvw_files[i])
  all_matches[[i]] <- plays(match)
}
# Combine all into one data frame
all_plays <- bind_rows(all_matches)
# Check the result
print(paste("Loaded", length(dvw_files), "matches"))
print(paste("Total plays:", nrow(all_plays)))
# How many plays per match?
all_plays |>
  group_by(match_id) |>
  summarise(plays = n())

When working with multiple matches, add identifying information:
# Load multiple matches with labels
match1 <- dv_read(here("Data", "nebraska_match1.dvw"))
match2 <- dv_read(here("Data", "nebraska_match2.dvw"))
plays1 <- plays(match1) |>
  mutate(match_label = "Match 1 vs Nebraska")
plays2 <- plays(match2) |>
  mutate(match_label = "Match 2 vs Nebraska")
# Combine
all_plays <- bind_rows(plays1, plays2)
# Now you can filter by match
all_plays |>
  filter(match_label == "Match 1 vs Nebraska") |>
  nrow()

After loading and cleaning data, you might want to save it as a CSV for faster loading later:
# Load and process data
match_data <- dv_read(here("Data", "opponent_match.dvw"))
plays <- plays(match_data)
# Filter to just the opponent's plays
opponent_plays <- plays |>
  filter(team == "Opponent Team Name")
# Save as CSV for faster loading next time
write.csv(opponent_plays,
          here("Outputs", "Data", "opponent_processed.csv"),
          row.names = FALSE)
# Later, you can load this much faster:
opponent_plays <- read.csv(here("Outputs", "Data", "opponent_processed.csv"))

Always save your R scripts in the Scripts/ folder:
# In RStudio: File > Save As
# Navigate to your Scripts folder
# Name it descriptively: "01_load_opponent_data.R"
# At the top of your script, always include:
# ============================================================================
# Script: 01_load_opponent_data.R
# Purpose: Load and validate DVW files for opponent scouting
# Author: Your Name
# Date: 2024-11-15
# ============================================================================
# This helps you (and others) understand what the script does

Here’s a complete workflow from project setup to loaded data:
# ============================================================================
# SETUP: Load packages
# ============================================================================
library(datavolley)
library(here)
library(dplyr)
# ============================================================================
# VERIFY: Check project and folders
# ============================================================================
# Confirm we're in the right project
here()
# Check that Data folder exists
if (!dir.exists(here("Data"))) {
  dir.create(here("Data"))
  print("Created Data folder")
}
# List DVW files available
print("Available DVW files:")
list.files(here("Data"), pattern = "\\.dvw$")
# ============================================================================
# LOAD: Read DVW file
# ============================================================================
# Load the match
match_data <- dv_read(here("Data", "your_match.dvw"))
# Extract plays
plays <- plays(match_data)
# ============================================================================
# VALIDATE: Check the data
# ============================================================================
print("========================================")
print("DATA VALIDATION")
print("========================================")
# Basic checks
print(paste("Total plays:", nrow(plays)))
print(paste("Teams:", paste(unique(plays$team), collapse = " vs ")))
print(paste("Sets:", paste(unique(plays$set_number), collapse = ", ")))
print(paste("Date:", match_data$meta$match$date)) # date is stored in the match metadata
print("")
# Skills breakdown
print("Skills breakdown:")
table(plays$skill) |> print()
# Check for missing data
print("Missing data check:")
print(paste("Missing team:", sum(is.na(plays$team))))
print(paste("Missing skill:", sum(is.na(plays$skill))))
print(paste("Missing evaluation:", sum(is.na(plays$evaluation))))
# ============================================================================
# EXPLORE: Initial look at the data
# ============================================================================
print("========================================")
print("SAMPLE DATA")
print("========================================")
# Show first few attacks
print("First 5 attacks:")
plays |>
  filter(skill == "Attack") |>
  select(team, player_name, attack_code, evaluation_code, evaluation) |>
  head(5) |>
  print()
# Show first few serves
print("First 5 serves:")
plays |>
  filter(skill == "Serve") |>
  select(team, player_name, evaluation_code, evaluation) |>
  head(5) |>
  print()
print("========================================")
print("DATA LOADED SUCCESSFULLY!")
print("========================================")

# Error: cannot open file 'match.dvw': No such file or directory
# Diagnose:
# 1. Check your project root (where here() is looking)
here()
# 2. List files in your Data folder
list.files(here("Data"))
# 3. Check for typos in filename
# Common mistakes:
# - Wrong capitalization: "Match.dvw" vs "match.dvw"
# - Wrong extension: "match.DVW" vs "match.dvw"
# - Spaces in filename: "my match.dvw" (use "my_match.dvw")
# Solution: Use the EXACT filename as it appears in your folder

# If your project isn't opening or behaving strangely:
# 1. Make sure you opened the .Rproj file (not just RStudio)
# 2. Check the project name in top-right corner of RStudio
# 3. Verify working directory is correct:
getwd()
# If still having issues, restart R:
# Session > Restart R
# Then reopen your project:
# File > Open Project > Select your .Rproj file

# If dv_read() gives an error:
# 1. Make sure file is a valid DVW file (not corrupted)
# 2. Check that file path is correct
# 3. Try loading with full error messages:
tryCatch({
  match_data <- dv_read(here("Data", "match_1.dvw"))
  print("File loaded successfully!")
}, error = function(e) {
  print("Error loading file:")
  print(e$message)
})
# If file is corrupted, you may need to re-export from DataVolley software

here Package: Build reliable file paths that work anywhere
Use dv_read() to load volleyball match data

# R Projects
# Create: File > New Project
# Open: Double-click .Rproj file
# here package
library(here)
here() # Show project root
here("Data", "match.dvw") # Build file path
# Loading DVW files
library(datavolley)
match_data <- dv_read(here("Data", "file.dvw")) # Load DVW
plays <- plays(match_data) # Extract play-by-play
# Initial exploration
head(plays) # First rows
names(plays) # Column names
nrow(plays) # Number of plays
unique(plays$team) # Teams in match
table(plays$skill) # Skills breakdown
# Data validation
sum(is.na(plays$column)) # Count missing values
unique(plays$player_name) # Check player names
summary(plays$start_coordinate_x) # Check coordinates

Before starting any new analysis:
Load your packages: library(datavolley), library(here), library(dplyr)
Load your data: dv_read(here("Data", "file.dvw"))
Extract the plays: plays <- plays(match_data)

Before Tutorial 03, make sure you can:
Use here() to build file paths
Load a DVW file with dv_read()
Extract play-by-play data with plays()