Methods of Data Input in R

Authors	Rasouli, H., Moein, F.
Date	December 20, 2025
Affiliation	ABSA, Tarbiat Modares University, Tehran, Iran
Workshop	R club for researchers

In the R programming language, data can be imported in various ways depending on the source of the data (manual entry, files, databases, web, etc.). The main methods are summarized below.

1️⃣ Manual Data Entry

Suitable for small datasets or educational examples.

Creating a Vector

# Create a numeric vector
x <- c(10, 20, 30, 40)

# Create a character vector
genes <- c("TP53", "BRCA1", "EGFR", "MYC")

Creating a Data Frame

# Create a data frame manually
patient_data <- data.frame(
  patient_id = c("P001", "P002", "P003", "P004"),
  age = c(45, 62, 33, 58),
  treatment = c("Drug_A", "Drug_B", "Drug_A", "Placebo"),
  response = c(1.2, 0.8, 1.5, -0.3)
)
print(patient_data)

2️⃣ Importing Data from Text Files

CSV Files (Comma-Separated Values)

# Basic import
gene_expression <- read.csv("gene_expression.csv")

# With explicit import options
clinical_data <- read.csv(
  "clinical_data.csv",
  header = TRUE,           # First row contains column names
  stringsAsFactors = FALSE # Keep text columns as character (the default since R 4.0.0)
)
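
To try `read.csv()` without an external file, the sketch below writes a small table to a temporary file and reads it back. The column names and values are invented for illustration and are not part of the workshop data.

```r
# Hypothetical expression table, invented for this example
demo <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  expression = c(5.2, 3.8, 7.1)
)

# Write to a temporary CSV file (row.names = FALSE avoids an extra index column)
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Read it back with the same options as above
demo_in <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(demo_in)
```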

Tab-Delimited or Custom Delimiter Files

# Tab-delimited files (common in bioinformatics)
rnaseq_data <- read.table(
  "rnaseq_counts.txt",
  header = TRUE,
  sep = "\t",              # Tab separator
  row.names = 1            # Use first column as row names
)

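# Space-delimited files
# (sep = " " splits on every single space; the default sep = "" accepts any run of whitespace)
variant_data <- read.table(
  "snp_data.txt",
  sep = " ",
  na.strings = "NA"        # Strings to treat as missing values
)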
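
With `row.names = 1`, the identifiers in the first column become row names and only the numeric columns remain, so the table can be converted to a matrix. A minimal sketch with made-up counts (the file content and sample names are assumptions for illustration):

```r
# Made-up count table: first column holds gene IDs, the rest are samples
counts_txt <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tsample_1\tsample_2",
  "TP53\t120\t98",
  "EGFR\t45\t60"
), counts_txt)

counts <- read.table(counts_txt, header = TRUE, sep = "\t", row.names = 1)

rownames(counts)                    # gene IDs are now row names
count_matrix <- as.matrix(counts)   # numeric matrix, genes x samples
dim(count_matrix)
```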

3️⃣ Importing Data from Excel Files

Requires the readxl package, which handles both .xls and .xlsx formats.

# Install the package (needed only once), then load it
install.packages("readxl")
library(readxl)

# Read specific sheet
clinical_trials <- read_excel(
  "clinical_study.xlsx",
  sheet = "Patient_Data",    # Sheet name or number
  range = "A1:E100",         # Specific cell range
  col_types = c("text", "numeric", "text", "date", "numeric") # Column types
)

# List all sheets in workbook
excel_sheets("clinical_study.xlsx")

4️⃣ Importing Data from Statistical Software Files

The haven package imports data from SPSS, Stata, and SAS formats while preserving variable labels and value labels.

library(haven)

# SPSS files (.sav)
survey_data <- read_sav("survey_data.sav")

# View the label of one variable (replace variable_name with a column in your data)
attr(survey_data$variable_name, "label")

# Stata files (.dta)
economic_data <- read_dta("economic_data.dta")

# SAS files (.sas7bdat)
clinical_sas <- read_sas("clinical.sas7bdat")

# SAS transport files (.xpt)
nhanes_data <- read_xpt("demo_f.xpt")

5️⃣ Importing Data from Databases

SQLite Example

library(DBI)
library(RSQLite)

# Connect to database
con <- dbConnect(RSQLite::SQLite(), "genomics.db")

# List tables
dbListTables(con)

# Read entire table
variants <- dbReadTable(con, "snp_variants")

# Execute custom SQL query (against the same snp_variants table)
high_impact_variants <- dbGetQuery(con, "
  SELECT chrom, position, gene, consequence
  FROM snp_variants
  WHERE impact = 'HIGH'
  AND allele_freq < 0.01
")

# Disconnect when done
dbDisconnect(con)
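
The same workflow can be tried without an existing database file. The sketch below uses a temporary in-memory SQLite database and a parameterized query, which passes the filter values separately from the SQL text. The table `demo_variants` and its contents are invented for illustration.

```r
library(DBI)
library(RSQLite)

# Temporary in-memory database, so no file is needed
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up variant table, invented for this example
demo_variants <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  impact = c("HIGH", "LOW", "HIGH"),
  allele_freq = c(0.002, 0.150, 0.030)
)
dbWriteTable(con, "demo_variants", demo_variants)

# Parameterized query: each ? is filled from params, in order
rare_high <- dbGetQuery(
  con,
  "SELECT gene, allele_freq FROM demo_variants
   WHERE impact = ? AND allele_freq < ?",
  params = list("HIGH", 0.01)
)
print(rare_high)

dbDisconnect(con)
```

Parameterized queries are usually preferable to pasting values into the SQL string, especially when the values come from user input.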

MySQL/PostgreSQL Example

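library(DBI)
library(RMySQL)

# MySQL connection; for PostgreSQL, load RPostgres and use RPostgres::Postgres() as the driver
con <- dbConnect(MySQL(),
                 user = "username",
                 password = "password",
                 dbname = "database",
                 host = "localhost")

# Disconnect when done
dbDisconnect(con)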

6️⃣ Importing Data from the Web

Direct Download of Online Files

# CSV from URL
covid_data <- read.csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")

# Using read.table for any delimiter
remote_data <- read.table(
  "https://example.com/data.txt",
  header = TRUE,
  sep = ","
)
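
Remote reads can fail when the network is unavailable or a file has moved. One possible way to handle this is to wrap the import in `tryCatch()`, as sketched below. The helper name `safe_read_csv()` is hypothetical, and a non-existent local path stands in for an unreachable URL.

```r
# Hypothetical helper: returns a data frame, or NULL if the source cannot be read
safe_read_csv <- function(source, ...) {
  tryCatch(
    read.csv(source, ...),
    error = function(e) {
      message("Could not read '", source, "': ", conditionMessage(e))
      NULL
    }
  )
}

# A path that does not exist stands in for an unreachable URL here
result <- suppressWarnings(safe_read_csv("no_such_file_for_demo.csv"))
is.null(result)
```

A script can then test `is.null(result)` and stop with a clear message, or fall back to a local copy of the file.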

JSON Data from APIs

library(jsonlite)

# From API endpoint
api_data <- fromJSON("https://api.genome.gov/variant/rs429358?format=json")

# From local JSON file
config <- fromJSON("analysis_config.json")
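
`fromJSON()` also accepts JSON text directly, which makes it easy to see how a response is converted to R objects. In this sketch the JSON string is made up and only stands in for an API response.

```r
library(jsonlite)

# Made-up JSON text standing in for an API response
json_text <- '[
  {"gene": "TP53", "chrom": "17", "score": 0.91},
  {"gene": "EGFR", "chrom": "7",  "score": 0.47}
]'

# An array of objects with the same fields is simplified to a data frame
genes_df <- fromJSON(json_text)
str(genes_df)

# Convert back to JSON, for example to save a result
toJSON(genes_df, pretty = TRUE)
```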

7️⃣ Importing Data via Graphical User Interface (GUI)

RStudio Import Dataset Feature

1. In RStudio, go to the Environment pane.
2. Click Import Dataset.
3. Choose one of the following:
- From Text (readr)
- From Excel
- From SPSS, SAS, Stata
4. Use the interactive interface to preview the data and set import options.
5. Click Import to generate and execute the code.

8️⃣ Importing Data from the Clipboard

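Useful for quick transfers from Excel or other spreadsheet software. The "clipboard" connection shown here works on Windows; on macOS, use pipe("pbpaste") in its place.

# Copy data from Excel (Ctrl+C) then run:
copied_data <- read.table(
  "clipboard",
  header = TRUE,
  sep = "\t"               # Excel puts copied cells on the clipboard as tab-separated text
)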

# For comma-separated clipboard data
clipboard_csv <- read.table(
  "clipboard",
  header = TRUE,
  sep = ","
)
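
Where a clipboard connection is not available (for example on a remote server), pasted text can be placed in the script and parsed with the `text` argument of `read.table()`. A minimal sketch with made-up values:

```r
# Made-up tab-separated text, as it would look after pasting from a spreadsheet
pasted <- "sample\tconcentration\nS1\t0.52\nS2\t0.61\nS3\t0.48"

pasted_data <- read.table(
  text = pasted,   # parse the string instead of a file or the clipboard
  header = TRUE,
  sep = "\t"
)
print(pasted_data)
```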

9️⃣ Saving and Reloading R Data

RDS Format (Single Object)

# Save a single object
saveRDS(gene_expression, "gene_expression.rds")

# Read it back (object name can be different)
loaded_data <- readRDS("gene_expression.rds")

# Check object is identical
identical(gene_expression, loaded_data)

RData Format (Multiple Objects)

# Save multiple objects
save(clinical_data, variants, gene_expression, 
     file = "analysis_workspace.RData")

# Load all objects
load("analysis_workspace.RData")

# Save entire workspace
save.image("full_workspace.RData")
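
Note that `load()` restores objects under their original names and overwrites any existing objects with the same names. One way to avoid surprises is to load into a separate environment, as in this sketch (the objects `scores` and `groups` are made up for the demonstration):

```r
# Made-up objects for the demonstration
scores <- c(a = 1.5, b = 2.5)
groups <- c("control", "treated")

rdata_path <- tempfile(fileext = ".RData")
save(scores, groups, file = rdata_path)

# Load into a new environment instead of the global workspace
loaded <- new.env()
restored_names <- load(rdata_path, envir = loaded)

restored_names                      # names of the objects that were restored
identical(loaded$scores, scores)    # compare the restored copy with the original
```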

🔟 Interactive Data Input

Using `scan()` for Vector Input

# Interactive numeric input
cat("Enter blood pressure readings (press Enter twice to finish):\n")
bp_readings <- scan()

# Interactive character input
cat("Enter gene names (one per line, blank line to finish):\n")
gene_names <- scan(what = character())

Using `readline()` for Single Values

# Get user input
patient_id <- readline("Enter patient ID: ")
age <- as.numeric(readline("Enter patient age: "))

# Create data frame from interactive input
patient <- data.frame(
  id = patient_id,
  age = age,
  date = Sys.Date()
)
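
`readline()` always returns text, and in non-interactive sessions (for example a script run with Rscript) it returns an empty string without waiting. Typed values are therefore worth validating before use. The helper `parse_age()` below is a hypothetical sketch, and its 0 to 120 range is an assumption.

```r
# Hypothetical helper: convert typed text to an age, or NA if it is not plausible
parse_age <- function(input, min_age = 0, max_age = 120) {
  value <- suppressWarnings(as.numeric(trimws(input)))
  if (is.na(value) || value < min_age || value > max_age) {
    return(NA_real_)
  }
  value
}

parse_age("45")     # valid input
parse_age(" 62 ")   # surrounding spaces are ignored
parse_age("abc")    # not a number, gives NA
parse_age("")       # empty input, gives NA
```

In an interactive session it could be used as `age <- parse_age(readline("Enter patient age: "))`.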

📊 Specialized Biological Data Formats

BED Files (Genomic Regions)

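# rtracklayer, VariantAnnotation, and ShortRead are Bioconductor packages;
# install them with BiocManager::install(), e.g. BiocManager::install("rtracklayer")
library(rtracklayer)
bed_data <- import("regions.bed")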

VCF Files (Genetic Variants)

library(VariantAnnotation)
vcf <- readVcf("variants.vcf", genome = "hg38")

FASTQ Files (Sequencing Reads)

library(ShortRead)
fastq <- readFastq("sample_R1.fastq.gz")
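
Some of these formats are plain tab-delimited text, so a simple file can also be inspected with base R when the Bioconductor packages are not installed. The sketch below reads a minimal three-column BED file; the coordinates are made up. BED start positions are 0-based and end positions are exclusive, which the specialized importers account for.

```r
# Made-up three-column BED content (chromosome, start, end), no header line
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t1000\t5000",
  "chr2\t200\t800"
), bed_path)

bed_df <- read.table(
  bed_path,
  sep = "\t",
  header = FALSE,
  col.names = c("chrom", "start", "end")
)

# BED starts are 0-based and ends are exclusive, so length is end - start
bed_df$width <- bed_df$end - bed_df$start
print(bed_df)
```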

📌 Summary Table

Data Source	Primary Function	Package	Common Use Cases
Manual Entry	`c()`, `data.frame()`	base R	Small datasets, examples
Text Files (CSV, TSV)	`read.csv()`, `read.table()`	base R	Most tabular data
Excel Files	`read_excel()`	readxl	Clinical data, lab results
SPSS/Stata/SAS	`read_sav()`, `read_dta()`, `read_sas()`	haven	Social science, clinical trials
Databases	`dbReadTable()`, `dbGetQuery()`	DBI	Large datasets, patient records
Web/API	`read.csv()`, `fromJSON()`	base R, jsonlite	Public datasets, API data
Clipboard	`read.table("clipboard")`	base R	Quick Excel transfers
R Binary	`readRDS()`, `load()`	base R	Saving analysis results
Bioinformatics	Specialized importers	Bioconductor	Genomic, sequencing data

💡 Best Practices

Always check data after import

str(data)        # Structure
head(data)       # First few rows
summary(data)    # Summary statistics
dim(data)        # Dimensions
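
These checks can also be automated so that a script stops early when an import goes wrong. The helper `check_import()` below is a hypothetical sketch, and the example table is made up.

```r
# Hypothetical helper: stop early if an imported table is not as expected
check_import <- function(df, expected_cols) {
  stopifnot(is.data.frame(df), nrow(df) > 0)
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Report the number of missing values per column
  colSums(is.na(df))
}

# Made-up table with one missing age
demo <- data.frame(patient_id = c("P001", "P002"), age = c(45, NA))
check_import(demo, expected_cols = c("patient_id", "age"))
```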

Specify column types explicitly

read.csv("data.csv", 
         colClasses = c("character", "numeric", "factor"))
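
A common reason to set column types is to protect identifiers with leading zeros. In this sketch the CSV text is made up, and `colClasses` is given as a named vector so that only the ID column is fixed:

```r
# Made-up CSV text: sample IDs with leading zeros
csv_text <- "sample_id,dose\n007,1.5\n012,2.0"

# Default type detection turns the IDs into integers (7 and 12)
auto <- read.csv(text = csv_text)

# Declaring the column as character keeps the IDs exactly as written
typed <- read.csv(text = csv_text, colClasses = c(sample_id = "character"))

auto$sample_id
typed$sample_id
```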

Handle missing values consistently

read.table("data.txt", 
           na.strings = c("NA", "", ".", "-999"))
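
The effect of `na.strings` is easiest to see by reading the same text with and without it. The values below are made up for illustration:

```r
# Made-up CSV text using several different missing-value codes
csv_text <- "id,weight\nA,70.5\nB,-999\nC,.\nD,"

raw <- read.csv(text = csv_text)
clean <- read.csv(text = csv_text, na.strings = c("NA", "", ".", "-999"))

class(raw$weight)         # the "." prevents the column from being read as numeric
class(clean$weight)       # all codes become NA, so the column is numeric
sum(is.na(clean$weight))  # number of values recognized as missing
```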

Use relative paths for reproducibility

# Instead of:
# data <- read.csv("C:/Users/Name/Project/data.csv")

# Use:
data <- read.csv("data/raw/clinical_data.csv")

Save import commands in scripts
- Create a dedicated 01_data_import.R script
- Document data sources and modifications
- Use comments for special handling instructions

🔧 Troubleshooting Common Issues

Problem	Solution
Column type mismatches	Use `colClasses` argument or import then convert
Encoding issues	Specify `fileEncoding = "UTF-8"` (or the file's actual encoding) in `read.csv()` / `read.table()`
Large files	Use `data.table::fread()` or `readr::read_csv()`
Memory limits	Read in chunks or use database connection
Date format problems	Specify `col_types` (readxl, readr) or convert after import with `as.Date()`
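
As an illustration of the "read in chunks" suggestion, the sketch below processes a file a few rows at a time through an open connection, keeping only a running total in memory. The file is a small made-up example and the chunk size of 4 is an arbitrary assumption; real files would use a much larger value.

```r
# Write a 10-row demonstration file
big_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 1.5), big_path,
          row.names = FALSE, quote = FALSE)

# Open the file once; each read continues where the previous one stopped
con <- file(big_path, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]

chunk_size <- 4
total <- 0
n_chunks <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
    error = function(e) NULL   # treat a read error at end of file as "no more data"
  )
  # Stop when nothing more could be read
  if (is.null(chunk) || nrow(chunk) == 0) break
  total <- total + sum(chunk$value)
  n_chunks <- n_chunks + 1
}
close(con)

total      # sum of the value column across all chunks
n_chunks   # number of chunks processed
```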

This guide covers the essential methods for data import in R. The choice of method depends on your data source, size, and analysis requirements. Always validate imported data before proceeding with analysis.