Introduction

Now that you understand R fundamentals, it’s time to work with real volleyball data! In this tutorial, you’ll learn how to:

Set up R Projects for organized, reproducible work
Create a proper folder structure for your analyses
Use the here package for reliable file paths
Load DVW (DataVolley) files into R
Explore the structure of volleyball play-by-play data
Validate that your data loaded correctly

Why this matters: Good organization now will save you hours of frustration later. You’ll be able to find your files, share your work, and avoid the dreaded “file not found” errors.

Section 1: The Problem with `setwd()`

In Tutorial 00, we learned about setwd() to set our working directory. While it works, it has serious problems:

# This works on YOUR computer:
setwd("C:/Users/YourName/Documents/Volleyball")

# But breaks if:
# - You move your files to a different folder
# - You work on a different computer
# - Someone else tries to run your code
# - You rename a folder in the path

The issue: Hard-coded file paths make your code fragile and not portable.

The solution: R Projects + the here package create reliable, portable file paths that work anywhere.

Section 2: R Projects - Your New Best Friend

What is an R Project?

An R Project is a special folder that RStudio recognizes as a workspace. When you open an R Project:

RStudio automatically sets the working directory to the project folder
Your scripts, data, and outputs stay organized in one place
File paths work reliably without setwd()
You can easily switch between different analyses

Think of an R Project like a team binder - everything related to that team (rosters, stats, scouting reports) stays together in one place.

Creating Your First R Project

Let’s create a project for your volleyball analysis work.

Step-by-step:

In RStudio, click: File > New Project
Choose New Directory
Choose New Project
Fill in the details:
- Directory name: Volleyball_Analysis (or whatever you prefer)
- Create project as subdirectory of: Choose where you want this folder (Desktop, Documents, etc.)
- Leave other options as default
Click Create Project

What just happened?

RStudio created a new folder called Volleyball_Analysis
Inside that folder, it created a file called Volleyball_Analysis.Rproj
RStudio opened this project automatically
Your working directory is now set to this project folder

How to verify:

# Check your working directory
getwd()

# Should show something like:
# "C:/Users/YourName/Documents/Volleyball_Analysis"
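If you would rather not judge the path by eye, a small sketch like the one below can confirm that the working directory really contains a project file. The helper name in_project_folder() is made up for this illustration, and it uses only base R.

```r
# Hypothetical helper: TRUE if the folder contains at least one .Rproj file
in_project_folder <- function(path = getwd()) {
  rproj_files <- list.files(path, pattern = "\\.Rproj$")
  length(rproj_files) > 0
}

# Check the current working directory
in_project_folder()
```

If this returns FALSE, you most likely opened RStudio without opening the project.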

Opening an Existing Project

Once you’ve created a project, here’s how to open it again:

Method 1: Double-click the .Rproj file in your file explorer

Method 2: In RStudio: File > Open Project and select the .Rproj file

Method 3: In RStudio: Click the project menu (top-right corner) and select from recent projects

Pro tip: Always open your project before starting work! This ensures all your file paths work correctly.

Section 3: Folder Structure Best Practices

Why Organize Your Files?

As you create analyses, you’ll accumulate:

DVW files (match data)
R scripts (your code)
Plots and tables (outputs)
Scouting reports (final products)

Without organization, you’ll waste time searching for files and risk mixing up data from different matches or seasons.

Recommended Folder Structure

Inside your R Project folder, create this structure:

Volleyball_Analysis/
├── Data/                      (or Data_Raw - your DVW files go here)
├── Scripts/                   (your R code goes here)
├── Outputs/                   (plots, tables, images go here)
│   ├── Data/                  (optional subfolder)
│   ├── Plots/                 (optional subfolder)
│   └── Tables/                (optional subfolder)
├── Reports/                   (final scouting reports go here)
└── Volleyball_Analysis.Rproj  (the project file)

In some projects, I keep DVW files in a Data_Raw directory and cleaned outputs in a separate Data_Processed directory, rather than nesting another Data folder inside Outputs. Neither layout is mandatory. What matters is that you apply one layout consistently and that it makes the state of each file (raw, transformed, or final) obvious at a glance, so nothing is ambiguous when you return to the project later.

Creating Your Folders

You can create folders two ways:

Method 1: In your file explorer

Navigate to your project folder
Create new folders as shown above

Method 2: In R

# Create folders using R
dir.create("Data")
dir.create("Scripts")
dir.create("Outputs")
dir.create("Outputs/Data")
dir.create("Outputs/Plots")
dir.create("Outputs/Tables")
dir.create("Reports")

# Check that they were created
list.files()
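Calling dir.create() on a folder that already exists produces a warning, so you may prefer a version that is safe to re-run. The sketch below is one way to do that; create_project_folders() is a hypothetical helper name, and the demo writes to a temporary folder so it does not touch your real project.

```r
# Hypothetical helper: create the tutorial's folders under `root`.
# Safe to re-run: existing folders are left alone and no warning is shown.
create_project_folders <- function(root = ".") {
  folders <- c("Data", "Scripts", "Outputs/Data", "Outputs/Plots",
               "Outputs/Tables", "Reports")
  for (folder in folders) {
    dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
  }
  # Return TRUE/FALSE for each folder so you can confirm the result
  dir.exists(file.path(root, folders))
}

# Try it in a temporary folder first (swap in "." to use your project folder)
folders_demo_root <- file.path(tempdir(), "Volleyball_Analysis_demo")
create_project_folders(folders_demo_root)
```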

What Goes Where?

Data/ (or Data_Raw/)

All your DVW files
Any CSV files with team colors, rosters, etc.
Original, unmodified data files
Never edit files in this folder - keep your raw data pristine

Scripts/

Your R code files (.R or .Rmd)
Name them descriptively: 01_load_data.R, 02_calculate_stats.R, opponent_scouting_report.R

Outputs/

PNG images of plots
CSV files with calculated statistics
Any intermediate files you create
These files can be regenerated from your scripts

Reports/

Final scouting reports
Presentation-ready documents
Files you’ll share with coaches

Naming Conventions

Good file names:

nebraska_vs_wisconsin_2024-11-15.dvw
opponent_hitting_chart.R
rotation_analysis_nebraska.png

Bad file names:

data.dvw (too generic)
final FINAL v2 (1).R (confusing, spaces, unclear)
11-15 match.dvw (ambiguous date format)

Best practices:

Use descriptive names
Include dates in YYYY-MM-DD format (sorts correctly)
Use underscores or hyphens, not spaces
Use lowercase (easier to type)
Be consistent across all files
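One way to stay consistent is to let R build the names for you. The sketch below is only an illustration of the conventions above; make_match_filename() is a made-up helper, not part of any package.

```r
# Hypothetical helper: build a file name that follows the conventions above
make_match_filename <- function(home, visiting, match_date, ext = "dvw") {
  clean <- function(x) {
    x <- tolower(x)
    x <- gsub("[^a-z0-9]+", "_", x)   # replace spaces and symbols with "_"
    gsub("^_+|_+$", "", x)            # trim leading/trailing underscores
  }
  paste0(clean(home), "_vs_", clean(visiting), "_",
         format(as.Date(match_date), "%Y-%m-%d"), ".", ext)
}

make_match_filename("Nebraska", "Wisconsin", "2024-11-15")
# "nebraska_vs_wisconsin_2024-11-15.dvw"
```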

Section 4: The `here` Package

Why Use `here`?

The here package solves the file path problem elegantly. It:

Automatically finds your project root directory
Builds file paths relative to your project
Works on any computer (Windows, Mac, Linux)
Makes your code portable and shareable
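To make the idea of "finding the project root" concrete, here is a simplified base R sketch. It is not the here package's actual code, and find_project_root() is a hypothetical name. It assumes the root is the nearest folder, walking upward, that contains an .Rproj file.

```r
# Simplified illustration of root-finding; NOT the here package's actual code.
# Walk up from `start` until a folder containing an .Rproj file is found.
find_project_root <- function(start = getwd()) {
  current <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (length(list.files(current, pattern = "\\.Rproj$")) > 0) return(current)
    parent <- dirname(current)
    if (identical(parent, current)) stop("No .Rproj file found above ", start)
    current <- parent
  }
}

# Demo in a temporary project with a nested Scripts folder
root_demo <- file.path(tempdir(), "root_demo")
dir.create(file.path(root_demo, "Scripts"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root_demo, "root_demo.Rproj"))

root <- find_project_root(file.path(root_demo, "Scripts"))
file.path(root, "Data", "match.dvw")   # same idea as here("Data", "match.dvw")
```

Because the search starts from wherever the code runs and works upward, the resulting path does not depend on which subfolder you happen to be in.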

How `here` Works

# Instead of this (breaks easily):
setwd("C:/Users/YourName/Documents/Volleyball_Analysis")
data <- read.csv("Data/nebraska_match.csv")

# Use this (works everywhere):
library(here)
data <- read.csv(here("Data", "nebraska_match.csv"))

# here() builds the full path automatically:
# "C:/Users/YourName/Documents/Volleyball_Analysis/Data/nebraska_match.csv"

Installing and Loading `here`

# Install (only once)
install.packages("here")

# Load (every session)
library(here)

Using `here` in Practice

# Load the here package
library(here)

# Check where here thinks your project root is
here()
# Should show your project folder path

# Build paths to files
here("Data", "match.dvw")              # Path to a DVW file
here("Scripts", "analysis.R")          # Path to a script
here("Outputs", "Plots", "chart.png")  # Path to save a plot

# Use here() when reading files
library(datavolley)
match_data <- dv_read(here("Data", "opponent_match.dvw"))

# Use here() when saving files
# ggsave() comes from ggplot2 and saves the most recent plot by default
library(ggplot2)
ggsave(here("Outputs", "Plots", "hitting_chart.png"))

Why `here` is Better

# Fragile approach (breaks when you move files):
match_data <- dv_read("C:/Users/YourName/Desktop/Volleyball/Data/match.dvw")

# Relative path (breaks when working directory changes):
match_data <- dv_read("Data/match.dvw")

# here approach (works everywhere, always):
library(here)
match_data <- dv_read(here("Data", "match.dvw"))

# Benefits:
# 1. Works on any computer
# 2. Works even if you move your project folder
# 3. Works when someone else runs your code
# 4. Explicit about file location (easier to debug)

From now on, always use here() for file paths in your scripts!

Section 5: Loading DVW Files

The `datavolley` Package

The datavolley package reads DVW files and converts them into R data frames we can analyze.

Key functions:

dv_read() - Load a DVW file
plays() - Extract play-by-play data

# Install datavolley (only once)
install.packages("remotes")
remotes::install_github("openvolley/datavolley")

# Load the package (every session)
library(datavolley)

Your First DVW Load

# Load required packages
library(datavolley)
library(here)
library(dplyr)  # We'll use this for exploring the data

# Load a DVW file
# Let's start with the first Texas A&M match
match_data <- dv_read(here("Data", "match_1.dvw"))

# What did we get?
class(match_data)  # Shows it's a "datavolley" object

# The match_data object contains multiple components:
# - Match metadata (teams, date, players)
# - Play-by-play data
# - And more

Understanding DVW File Structure

When you load a DVW file, you get a complex object with several parts:

# Load a match
match_data <- dv_read(here("Data", "match_1.dvw"))

# See the high-level structure
str(match_data, max.level = 1)

# Key components:
# - meta: Match information (teams, date, players, etc.)
# - plays: Play-by-play data (this is what we'll use most!)

Extracting Play-by-Play Data

The most important part of a DVW file is the plays data - the detailed record of every action:

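# Extract the play-by-play data
# (Naming the result "plays" is fine: R still finds the plays() function
# whenever the name is followed by parentheses)
plays <- plays(match_data)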

# OR use the $ notation:
plays <- match_data$plays

# Now plays is a data frame - you can work with it using Tutorial 01 skills!

# Look at the first few rows
head(plays)

# Check the structure
str(plays)

# See what columns are available
names(plays)

# How many plays in this match?
nrow(plays)

Key Columns in the Plays Data Frame

The plays data frame has many columns. Here are the most important ones:

# Team information
plays$team              # Which team performed this action
plays$player_name       # Player who performed the action
plays$player_id         # Unique player identifier
plays$player_number     # Jersey number

# Action information
plays$skill             # Type of action: Serve, Reception, Attack, Block, Dig, Set, Freeball
plays$skill_type        # Subtype of the skill
plays$evaluation_code   # Quality rating: #, +, !, -, /, =
plays$evaluation        # Text description of the result
plays$attack_code       # Specific attack type (for attacks)

# Match context
plays$point_id          # Which rally/point
plays$set_number        # Which set (1-5)
plays$home_team_score   # Score after this play
plays$visiting_team_score

# Court position
plays$start_zone        # Where action started (1-9)
plays$end_zone          # Where action ended
plays$start_coordinate_x  # X coordinate
plays$start_coordinate_y  # Y coordinate
plays$end_coordinate_x
plays$end_coordinate_y

# Match metadata
plays$match_id          # Unique match identifier
plays$home_team         # Home team name
plays$visiting_team     # Visiting team name

# There are more columns - explore with names(plays)!
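With this many columns, it often helps to work with a narrower view. The sketch below uses a small made-up data frame that borrows the column names listed above, so it runs without a DVW file. It also drops rows with no skill, on the assumption that your file contains some non-action rows (a later check counts them with sum(is.na(plays$skill))).

```r
# Toy stand-in for a plays data frame (made-up values, same column names)
toy_plays <- data.frame(
  team            = c("Team A", "Team B", NA, "Team B"),
  player_name     = c("Player One", "Player Two", NA, "Player Three"),
  skill           = c("Serve", "Reception", NA, "Set"),
  evaluation_code = c("+", "#", NA, "+"),
  set_number      = c(1, 1, 1, 1),
  point_id        = c(1, 1, 1, 1)
)

# Keep only rows that record a skill, and only the columns of interest
key_cols <- c("set_number", "point_id", "team", "player_name",
              "skill", "evaluation_code")
actions <- toy_plays[!is.na(toy_plays$skill), key_cols]
actions
```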

Section 6: Exploring Your Data

Initial Data Checks

Always check your data after loading to make sure everything looks correct:

# Load packages
library(datavolley)
library(here)
library(dplyr)

# Load data (using first match as example)
match_data <- dv_read(here("Data", "match_1.dvw"))
plays <- plays(match_data)

# ============================================================================
# BASIC CHECKS
# ============================================================================

# How many total plays?
nrow(plays)
# Typical match: 300-600 plays depending on length

# What teams played?
unique(plays$team)
# Should show two team names

# What skills are in the data?
table(plays$skill)
# Should show counts for: Serve, Reception, Set, Attack, Block, Dig, Freeball

# Which sets were played?
unique(plays$set_number)
# Should show 3, 4, or 5 depending on match length

# Date of the match (stored in the match metadata, not in the plays data frame)
match_data$meta$match$date

# Player roster
unique(plays$player_name[plays$team == "Your Team Name"])
# Shows all players from your team
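Once you know which checks matter to you, you can bundle them so that a bad file stops your script with a clear message. This is a minimal sketch on made-up data; check_plays() is a hypothetical helper, and the three rules it enforces are assumptions you should adapt.

```r
# Toy stand-in for a plays data frame (made-up values, same column names)
sample_plays <- data.frame(
  team       = c("Team A", "Team B", "Team B", "Team A", NA),
  skill      = c("Serve", "Reception", "Set", "Dig", NA),
  set_number = c(1, 1, 1, 1, 1)
)

# Hypothetical helper: stop with a clear message if a basic check fails
check_plays <- function(plays) {
  teams <- unique(plays$team[!is.na(plays$team)])
  stopifnot(
    "plays has no rows"        = nrow(plays) > 0,
    "expected exactly 2 teams" = length(teams) == 2,
    "set numbers must be 1-5"  = all(na.omit(plays$set_number) %in% 1:5)
  )
  invisible(TRUE)
}

check_plays(sample_plays)
```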

Focused Exploration: Looking at Specific Skills

# Look at just serves
serves <- plays |>
  filter(skill == "Serve")

head(serves)
nrow(serves)  # How many serves total?

# Look at just attacks
attacks <- plays |>
  filter(skill == "Attack")

head(attacks)
nrow(attacks)  # How many attacks total?

# Look at receptions (passes)
receptions <- plays |>
  filter(skill == "Reception")

head(receptions)

# Common evaluation codes for receptions
table(receptions$evaluation_code)
# Should show: #, +, !, -, /, =

Examining Evaluation Codes

Understanding evaluation codes is crucial for calculating stats:

# For RECEPTIONS (passes):
# # = Perfect pass (4 points in some systems, 3 in others)
# + = Good pass (3 or 2 points)
# ! = Medium/OK pass (2 or 1 point)
# - = Poor pass (1 or 0 points)
# / = Overpass (went over net)
# = = Error (ace for serving team)

# For ATTACKS:
# # = Kill (point scored)
# + = Positive attack (not killed but difficult for opponent)
# ! = Blocked for reattack (ball comes back and the rally continues)
# - = Poor attack (easily dug by opponent)
# / = Blocked for point
# = = Attack error

# View reception quality distribution
receptions <- plays |> filter(skill == "Reception")
table(receptions$evaluation_code)

# View attack outcomes
attacks <- plays |> filter(skill == "Attack")  
table(attacks$evaluation_code)
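As a preview of how these codes turn into stats, here is a small sketch on made-up codes. The efficiency formula (kills minus errors minus blocked, divided by attempts) is the common hitting-efficiency definition. The 0-3 passing scale follows the lower values listed above, and scoring "/" and "=" as 0 is an assumption, so adjust it to your program's system.

```r
# Made-up evaluation codes, standing in for attacks$evaluation_code
attack_codes <- c("#", "#", "=", "/", "+", "-", "#", "!")

kills    <- sum(attack_codes == "#")
errors   <- sum(attack_codes == "=")
blocked  <- sum(attack_codes == "/")
attempts <- length(attack_codes)

kill_pct   <- kills / attempts                       # 3 / 8 = 0.375
efficiency <- (kills - errors - blocked) / attempts  # (3 - 1 - 1) / 8 = 0.125

# Made-up reception codes, scored on a 0-3 scale.
# Assumption: "/" and "=" both score 0; adjust to your program's system.
reception_codes  <- c("#", "+", "!", "-", "=", "#")
reception_points <- c("#" = 3, "+" = 2, "!" = 1, "-" = 0, "/" = 0, "=" = 0)

pass_avg <- mean(reception_points[reception_codes])  # 9 / 6 = 1.5
```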

Checking for Data Issues

Always validate your data to catch problems early:

# ============================================================================
# CHECK FOR MISSING DATA
# ============================================================================

# Are there any rows with missing team names?
sum(is.na(plays$team))

# Any plays without a skill assigned?
sum(is.na(plays$skill))

# Any attacks without an attack code?
attacks <- plays |> filter(skill == "Attack")
sum(is.na(attacks$attack_code))

# ============================================================================
# CHECK FOR INCONSISTENT NAMES
# ============================================================================

# Player names should be consistent
unique(plays$player_name)
# Look for variations like "Smith, Sarah" vs "Sarah Smith" vs "S. Smith"

# Team names should be consistent
unique(plays$team)
# Should be exactly 2 teams, spelled consistently

# ============================================================================
# CHECK COORDINATES
# ============================================================================

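# Court coordinates should be within bounds
# datavolley uses its own court units, not meters:
# X coordinates: roughly 0.5 to 3.5 across the width of the court
# Y coordinates: roughly 0.5 to 6.5 along its length, with the net at 3.5
# Balls landing out of bounds fall slightly outside these ranges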

summary(plays$start_coordinate_x)
summary(plays$start_coordinate_y)

# Check for impossible coordinates (outliers)
plays |>
  filter(start_coordinate_x < 0 | start_coordinate_x > 4) |>
  nrow()  # Should be 0 or very few

# ============================================================================
# CHECK MATCH SCORES
# ============================================================================

# Final scores should make sense: the set winner has at least 25 points
# (15 in set 5) and leads by at least 2
plays |>
  group_by(set_number) |>
  summarise(
    max_home = max(home_team_score, na.rm = TRUE),
    max_visiting = max(visiting_team_score, na.rm = TRUE)
  )
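If you want R to flag suspicious scores for you, the rule can be written as a small function. The sketch below assumes standard indoor scoring (sets to 25, a deciding fifth set to 15, win by 2); set_score_ok() is a hypothetical helper, and the scores are made up.

```r
# Hypothetical helper: is a final set score plausible under standard indoor
# rules (sets to 25, deciding fifth set to 15, win by 2)?
set_score_ok <- function(home, visiting, set_number) {
  target <- ifelse(set_number == 5, 15, 25)
  winner <- pmax(home, visiting)
  margin <- abs(home - visiting)
  (winner == target & margin >= 2) | (winner > target & margin == 2)
}

# Made-up final scores for a five-set match
final_scores <- data.frame(
  set_number   = 1:5,
  max_home     = c(25, 23, 27, 25, 15),
  max_visiting = c(20, 25, 25, 27, 13)
)
final_scores$plausible <- set_score_ok(final_scores$max_home,
                                       final_scores$max_visiting,
                                       final_scores$set_number)
final_scores
```

A FALSE in the plausible column usually points to a scouting typo or a file that was exported before the set finished.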

Quick Match Summary

Create a quick overview of the match:

# Load packages
library(dplyr)

# Get plays data
plays <- plays(match_data)

# Match summary
print("========================================")
print("MATCH SUMMARY")
print("========================================")
print(paste("Teams:", unique(plays$home_team), "vs", unique(plays$visiting_team)))
print(paste("Date:", match_data$meta$match$date))
print(paste("Total Plays:", nrow(plays)))
print(paste("Sets Played:", max(plays$set_number, na.rm = TRUE)))
print("")

# Skills breakdown
print("Skills Breakdown:")
plays |>
  group_by(skill) |>
  summarise(count = n()) |>
  arrange(desc(count)) |>
  print()

print("")

# Team breakdown
print("Plays by Team:")
plays |>
  group_by(team) |>
  summarise(total_plays = n()) |>
  print()

Section 7: Working with Multiple DVW Files

Loading Multiple Matches

Often you’ll want to analyze multiple matches (e.g., all matches vs an opponent):

# Load packages
library(datavolley)
library(here)
library(dplyr)

# Method 1: Load individually and combine (the same dv_read() call you used in Section 5)
match1 <- dv_read(here("Data", "match_1.dvw"))
match2 <- dv_read(here("Data", "match_2.dvw"))
match3 <- dv_read(here("Data", "match_3.dvw"))

# Extract plays from each
plays1 <- plays(match1)
plays2 <- plays(match2)
plays3 <- plays(match3)

# Combine into one data frame
all_plays <- bind_rows(plays1, plays2, plays3)

# Now all_plays contains data from all three matches!
nrow(all_plays)

# Method 2: Loop through multiple files (more advanced)
# List all DVW files in your Data folder
dvw_files <- list.files(here("Data"), pattern = "\\.dvw$", full.names = TRUE)

# Initialize empty list
all_matches <- list()

# Load each file
for (i in seq_along(dvw_files)) {
  match <- dv_read(dvw_files[i])
  all_matches[[i]] <- plays(match)
}

# Combine all into one data frame
all_plays <- bind_rows(all_matches)

# Check the result
print(paste("Loaded", length(dvw_files), "matches"))
print(paste("Total plays:", nrow(all_plays)))

# How many plays per match?
all_plays |>
  group_by(match_id) |>
  summarise(plays = n())

Organizing Multi-Match Data

When working with multiple matches, add identifying information:

# Load multiple matches with labels
match1 <- dv_read(here("Data", "nebraska_match1.dvw"))
match2 <- dv_read(here("Data", "nebraska_match2.dvw"))

plays1 <- plays(match1) |>
  mutate(match_label = "Match 1 vs Nebraska")

plays2 <- plays(match2) |>
  mutate(match_label = "Match 2 vs Nebraska")

# Combine
all_plays <- bind_rows(plays1, plays2)

# Now you can filter by match
all_plays |>
  filter(match_label == "Match 1 vs Nebraska") |>
  nrow()
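Typing a label for every match gets tedious once you have more than a few files. One option is to derive the label from the file name. The sketch below shows the pattern in base R on made-up data frames standing in for the output of plays(); the object names are illustrative only.

```r
# Made-up stand-ins for the data frames that plays() would return
plays_by_file <- list(
  "nebraska_match1.dvw" = data.frame(team  = c("Team A", "Nebraska"),
                                     skill = c("Serve", "Reception")),
  "nebraska_match2.dvw" = data.frame(team  = c("Nebraska", "Team A", "Team A"),
                                     skill = c("Serve", "Reception", "Set"))
)

# Label every row with its source file name (extension removed)
labelled <- Map(function(df, file_name) {
  df$match_label <- tools::file_path_sans_ext(file_name)
  df
}, plays_by_file, names(plays_by_file))

# Stack the labelled data frames into one
all_plays_demo <- do.call(rbind, unname(labelled))
rownames(all_plays_demo) <- NULL

table(all_plays_demo$match_label)
```

If you are already using dplyr, bind_rows() with its .id argument on a named list achieves a similar result.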

Section 8: Saving Your Work

Saving Processed Data

After loading and cleaning data, you might want to save it as a CSV for faster loading later:

# Load and process data
match_data <- dv_read(here("Data", "opponent_match.dvw"))
plays <- plays(match_data)

# Filter to just the opponent's plays
opponent_plays <- plays |>
  filter(team == "Opponent Team Name")

# Save as CSV for faster loading next time
write.csv(opponent_plays, 
          here("Outputs", "Data", "opponent_processed.csv"),
          row.names = FALSE)

# Later, you can load this much faster:
opponent_plays <- read.csv(here("Outputs", "Data", "opponent_processed.csv"))
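One thing to keep in mind is that a CSV stores only text, so column types such as dates are not restored automatically when you read the file back. As an alternative (not something this tutorial requires), base R's saveRDS() and readRDS() keep the types intact. The data and temporary paths below are made up for the demonstration.

```r
# Made-up processed data with a Date column
processed <- data.frame(
  player_name = c("Player One", "Player Two"),
  match_date  = as.Date(c("2024-11-15", "2024-11-16")),
  kills       = c(12L, 9L)
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(processed, csv_path, row.names = FALSE)
saveRDS(processed, rds_path)

class(read.csv(csv_path)$match_date)  # "character": the Date class is lost
class(readRDS(rds_path)$match_date)   # "Date": column types are preserved
```

In your project you would build the path with here(), for example here("Outputs", "Data", "opponent_processed.rds"). CSV remains the better choice when someone needs to open the file outside R.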

Saving Your Scripts

Always save your R scripts in the Scripts/ folder:

# In RStudio: File > Save As
# Navigate to your Scripts folder
# Name it descriptively: "01_load_opponent_data.R"

# At the top of your script, always include:
# ============================================================================
# Script: 01_load_opponent_data.R
# Purpose: Load and validate DVW files for opponent scouting
# Author: Your Name
# Date: 2024-11-15
# ============================================================================

# This helps you (and others) understand what the script does

Section 9: Putting It All Together

Complete Workflow Example

Here’s a complete workflow from project setup to loaded data:

# ============================================================================
# SETUP: Load packages
# ============================================================================
library(datavolley)
library(here)
library(dplyr)

# ============================================================================
# VERIFY: Check project and folders
# ============================================================================

# Confirm we're in the right project
here()

# Check that Data folder exists
if (!dir.exists(here("Data"))) {
  dir.create(here("Data"))
  print("Created Data folder")
}

# List DVW files available
print("Available DVW files:")
list.files(here("Data"), pattern = "\\.dvw$")

# ============================================================================
# LOAD: Read DVW file
# ============================================================================

# Load the match
match_data <- dv_read(here("Data", "your_match.dvw"))

# Extract plays
plays <- plays(match_data)

# ============================================================================
# VALIDATE: Check the data
# ============================================================================

print("========================================")
print("DATA VALIDATION")
print("========================================")

# Basic checks
print(paste("Total plays:", nrow(plays)))
print(paste("Teams:", paste(unique(plays$team), collapse = " vs ")))
print(paste("Sets:", paste(unique(plays$set_number), collapse = ", ")))
print(paste("Date:", match_data$meta$match$date))
print("")

# Skills breakdown
print("Skills breakdown:")
table(plays$skill) |> print()

# Check for missing data
print("Missing data check:")
print(paste("Missing team:", sum(is.na(plays$team))))
print(paste("Missing skill:", sum(is.na(plays$skill))))
print(paste("Missing evaluation:", sum(is.na(plays$evaluation))))

# ============================================================================
# EXPLORE: Initial look at the data
# ============================================================================

print("========================================")
print("SAMPLE DATA")
print("========================================")

# Show first few attacks
print("First 5 attacks:")
plays |>
  filter(skill == "Attack") |>
  select(team, player_name, attack_code, evaluation_code, evaluation) |>
  head(5) |>
  print()

# Show first few serves
print("First 5 serves:")
plays |>
  filter(skill == "Serve") |>
  select(team, player_name, evaluation_code, evaluation) |>
  head(5) |>
  print()

print("========================================")
print("DATA LOADED SUCCESSFULLY!")
print("========================================")

Section 10: Troubleshooting Common Issues

Issue 1: “Cannot find file”

# Error: cannot open file 'match.dvw': No such file or directory

# Diagnose:
# 1. Check where here() thinks your project root is
here()

# 2. List files in your Data folder
list.files(here("Data"))

# 3. Check for typos in filename
# Common mistakes:
# - Wrong capitalization: "Match.dvw" vs "match.dvw"
# - Wrong extension: "match.DVW" vs "match.dvw"
# - Spaces in filename: "my match.dvw" (use "my_match.dvw")

# Solution: Use the EXACT filename as it appears in your folder
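Capitalization mistakes are hard to spot by eye, so a small helper can do the comparison for you. This is a sketch in base R; find_case_mismatch() is a hypothetical name, and the demo uses a temporary folder instead of your Data folder.

```r
# Hypothetical helper: list files whose names match `file_name` when
# capitalization is ignored
find_case_mismatch <- function(folder, file_name) {
  candidates <- list.files(folder)
  candidates[tolower(candidates) == tolower(file_name)]
}

# Demo in a temporary folder containing "Match.DVW"
case_demo <- file.path(tempdir(), "case_demo")
dir.create(case_demo, showWarnings = FALSE)
file.create(file.path(case_demo, "Match.DVW"))

find_case_mismatch(case_demo, "match.dvw")
```

In your project you would call it as find_case_mismatch(here("Data"), "match.dvw") and copy the exact name it returns.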

Issue 2: “Could not find function ‘here’”

# Error: could not find function "here"

# Solution: Load the package first
library(here)

# If that doesn't work, install it:
install.packages("here")
library(here)

Issue 3: “Could not find function ‘dv_read’”

# Error: could not find function "dv_read"

# Solution: Load the datavolley package
library(datavolley)

# If that doesn't work, install it:
install.packages("remotes")
remotes::install_github("openvolley/datavolley")
library(datavolley)

Issue 4: Project Not Opening Correctly

# If your project isn't opening or behaving strangely:

# 1. Make sure you opened the .Rproj file (not just RStudio)
# 2. Check the project name in top-right corner of RStudio
# 3. Verify working directory is correct:
getwd()

# If still having issues, restart R:
# Session > Restart R

# Then reopen your project:
# File > Open Project > Select your .Rproj file

Issue 5: DVW File Won’t Load

# If dv_read() gives an error:

# 1. Make sure file is a valid DVW file (not corrupted)
# 2. Check that file path is correct
# 3. Try loading with full error messages:

tryCatch({
  match_data <- dv_read(here("Data", "match_1.dvw"))
  print("File loaded successfully!")
}, error = function(e) {
  print("Error loading file:")
  print(e$message)
})

# If file is corrupted, you may need to re-export from DataVolley software

Summary and Key Takeaways

What You Learned

R Projects: Create organized, portable workspaces
Folder Structure: Keep data, scripts, and outputs organized
The here Package: Build reliable file paths that work anywhere
Loading DVW Files: Use dv_read() to load volleyball match data
Exploring Data: Validate and understand play-by-play structure
Data Validation: Check for issues and inconsistencies

Essential Functions to Remember

# R Projects
# Create: File > New Project
# Open: Double-click .Rproj file

# here package
library(here)
here()                                    # Show project root
here("Data", "match.dvw")                # Build file path

# Loading DVW files
library(datavolley)
match_data <- dv_read(here("Data", "file.dvw"))  # Load DVW
plays <- plays(match_data)               # Extract play-by-play

# Initial exploration
head(plays)                              # First rows
names(plays)                             # Column names
nrow(plays)                              # Number of plays
unique(plays$team)                       # Teams in match
table(plays$skill)                       # Skills breakdown

# Data validation
sum(is.na(plays$skill))                  # Count missing values in a column
unique(plays$player_name)                # Check player names
summary(plays$start_coordinate_x)        # Check coordinates

Your Complete Project Setup Checklist

Before starting any new analysis:

✓ Create an R Project
✓ Create folder structure (Data, Scripts, Outputs, Reports)
✓ Place DVW files in Data folder
✓ Create a new script in Scripts folder
✓ Load packages: library(datavolley), library(here), library(dplyr)
✓ Load your DVW file with dv_read(here("Data", "file.dvw"))
✓ Extract plays: plays <- plays(match_data)
✓ Validate your data (check teams, skills, missing values)
✓ Save your script with a descriptive name

Practice Before Moving On

Before Tutorial 03, make sure you can:

✓ Create a new R Project
✓ Set up proper folder structure
✓ Use here() to build file paths
✓ Load a DVW file with dv_read()
✓ Extract play-by-play data with plays()
✓ Explore the data structure
✓ Check for data quality issues
✓ Load multiple DVW files and combine them

Tutorial 02: Structuring Volleyball Analytics Workflows

Organize projects, manage data paths, and load DVW files to maintain reliable, reproducible analysis

Ty Cogdill - Nerd Above the Net

Dec 24, 2025