Methods of Data Input in R

Authors: Rasouli, H., Moein, F.
Date: December 20, 2025
Affiliation: ABSA, Tarbiat Modares University, Tehran, Iran
Workshop: R club for researchers

In the R programming language, data can be imported in various ways depending on the source of the data (manual entry, files, databases, web, etc.). The main methods are summarized below.


1️⃣ Manual Data Entry

Suitable for small datasets or educational examples.

Creating a Vector

# Create a numeric vector
x <- c(10, 20, 30, 40)

# Create a character vector
genes <- c("TP53", "BRCA1", "EGFR", "MYC")

Creating a Data Frame

# Create a data frame manually
patient_data <- data.frame(
  patient_id = c("P001", "P002", "P003", "P004"),
  age = c(45, 62, 33, 58),
  treatment = c("Drug_A", "Drug_B", "Drug_A", "Placebo"),
  response = c(1.2, 0.8, 1.5, -0.3)
)
print(patient_data)

2️⃣ Importing Data from Text Files

CSV Files (Comma-Separated Values)

# Basic import
gene_expression <- read.csv("gene_expression.csv")

# With additional options for biological data
clinical_data <- read.csv(
  "clinical_data.csv",
  header = TRUE,           # First row contains column names
  stringsAsFactors = FALSE # Keep character columns as characters
)

Tab-Delimited or Custom Delimiter Files

# Tab-delimited files (common in bioinformatics)
rnaseq_data <- read.table(
  "rnaseq_counts.txt",
  header = TRUE,
  sep = "\t",              # Tab separator
  row.names = 1            # Use first column as row names
)

# Space-delimited files
variant_data <- read.table(
  "snp_data.txt",
  sep = " ",
  na.strings = "NA"        # Specify missing values
)

3️⃣ Importing Data from Excel Files

Requires the readxl package, which handles both .xls and .xlsx formats.

# Install and load package
install.packages("readxl")
library(readxl)

# Read specific sheet
clinical_trials <- read_excel(
  "clinical_study.xlsx",
  sheet = "Patient_Data",    # Sheet name or number
  range = "A1:E100",         # Specific cell range
  col_types = c("text", "numeric", "text", "date", "numeric") # Column types
)

# List all sheets in workbook
excel_sheets("clinical_study.xlsx")

4️⃣ Importing Data from Statistical Software Files

The haven package imports data from SPSS, Stata, and SAS formats while preserving variable labels and value labels.

library(haven)

# SPSS files (.sav)
survey_data <- read_sav("survey_data.sav")

# View variable labels
attributes(survey_data$variable_name)$label

# Stata files (.dta)
economic_data <- read_dta("economic_data.dta")

# SAS files (.sas7bdat)
clinical_sas <- read_sas("clinical.sas7bdat")

# SAS transport files (.xpt)
nhanes_data <- read_xpt("demo_f.xpt")
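
Value labels imported by haven can be converted into ordinary R factors. A minimal sketch, assuming survey_data contains labelled SPSS columns (the column name region is illustrative):

# Convert one labelled column: value labels become factor levels
survey_data$region <- haven::as_factor(survey_data$region)

# Convert every labelled column in the data frame at once
survey_factors <- haven::as_factor(survey_data)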

5️⃣ Importing Data from Databases

SQLite Example

library(DBI)
library(RSQLite)

# Connect to database
con <- dbConnect(RSQLite::SQLite(), "genomics.db")

# List tables
dbListTables(con)

# Read entire table
variants <- dbReadTable(con, "snp_variants")

# Execute custom SQL query
high_impact_variants <- dbGetQuery(con, "
  SELECT chrom, position, gene, consequence 
  FROM variants 
  WHERE impact = 'HIGH' 
  AND allele_freq < 0.01
")

# Disconnect when done
dbDisconnect(con)

MySQL/PostgreSQL Example

library(RMySQL)
con <- dbConnect(MySQL(),
                 user = "username",
                 password = "password",
                 dbname = "database",
                 host = "localhost")
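
A PostgreSQL connection follows the same DBI pattern, using the RPostgres package instead; for MySQL/MariaDB servers, the newer RMariaDB package can be used the same way. A minimal sketch (credentials, database, and table names are placeholders):

library(DBI)
library(RPostgres)

con <- dbConnect(RPostgres::Postgres(),
                 dbname = "database",
                 host = "localhost",
                 port = 5432,
                 user = "username",
                 password = "password")

# The same DBI functions work regardless of the backend
dbListTables(con)
patients <- dbGetQuery(con, "SELECT * FROM patients LIMIT 10")
dbDisconnect(con)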

6️⃣ Importing Data from the Web

Direct Download of Online Files

# CSV from URL
covid_data <- read.csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")

# Using read.table for any delimiter
remote_data <- read.table(
  "https://example.com/data.txt",
  header = TRUE,
  sep = ","
)

JSON Data from APIs

library(jsonlite)

# From API endpoint
api_data <- fromJSON("https://api.genome.gov/variant/rs429358?format=json")

# From local JSON file
config <- fromJSON("analysis_config.json")

7️⃣ Importing Data via Graphical User Interface (GUI)

RStudio Import Dataset Feature

  1. In RStudio, go to the Environment pane
  2. Click Import Dataset
  3. Choose from:
    • From Text (readr)
    • From Excel
    • From SPSS, SAS, Stata
  4. Use the interactive interface to preview and set import options
  5. Click Import to generate and execute the code (an example of the generated code is shown below)
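
The dialog writes the import call to the console so it can be copied into a script; the generated code typically looks like the following (the file path and object name are illustrative):

library(readr)
clinical_data <- read_csv("data/clinical_data.csv")
View(clinical_data)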

Interactive File Selection in Base R

  • The base R GUI has no Import Dataset menu; interactive file selection is done with file.choose()
  • The dialog returns the selected path, which can be passed to any import function (see the sketch below)
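
A minimal sketch, assuming a CSV file is selected in the dialog:

# file.choose() opens a file-selection dialog and returns the chosen path
clinical_data <- read.csv(file.choose())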

8️⃣ Importing Data from the Clipboard

Useful for quick transfers from Excel or other spreadsheet software.

# Copy data from Excel (Ctrl+C) then run:
copied_data <- read.table(
  "clipboard", 
  header = TRUE,
  sep = "\t"               # Excel copies cells as tab-separated text
)

# For comma-separated clipboard data
clipboard_csv <- read.table(
  "clipboard",
  header = TRUE,
  sep = ","
)

9️⃣ Saving and Reloading R Data

RDS Format (Single Object)

# Save a single object
saveRDS(gene_expression, "gene_expression.rds")

# Read it back (object name can be different)
loaded_data <- readRDS("gene_expression.rds")

# Check object is identical
identical(gene_expression, loaded_data)

RData Format (Multiple Objects)

# Save multiple objects
save(clinical_data, variants, gene_expression, 
     file = "analysis_workspace.RData")

# Load all objects
load("analysis_workspace.RData")

# Save entire workspace
save.image("full_workspace.RData")

🔟 Interactive Data Input

Using scan() for Vector Input

# Interactive numeric input
cat("Enter blood pressure readings (press Enter twice to finish):\n")
bp_readings <- scan()

# Interactive character input
cat("Enter gene names (one per line, blank line to finish):\n")
gene_names <- scan(what = character())

Using readline() for Single Values

# Get user input
patient_id <- readline("Enter patient ID: ")
age <- as.numeric(readline("Enter patient age: "))

# Create data frame from interactive input
patient <- data.frame(
  id = patient_id,
  age = age,
  date = Sys.Date()
)

📊 Specialized Biological Data Formats

BED Files (Genomic Regions)

library(rtracklayer)
bed_data <- import("regions.bed")

VCF Files (Genetic Variants)

library(VariantAnnotation)
vcf <- readVcf("variants.vcf", genome = "hg38")

FASTQ Files (Sequencing Reads)

library(ShortRead)
fastq <- readFastq("sample_R1.fastq.gz")
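
These importers are distributed on Bioconductor rather than CRAN, so they are installed through BiocManager; a minimal sketch:

# Install Bioconductor packages (run once)
install.packages("BiocManager")
BiocManager::install(c("rtracklayer", "VariantAnnotation", "ShortRead"))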

📌 Summary Table

Data Source | Primary Function | Package | Common Use Cases
Manual Entry | c(), data.frame() | base R | Small datasets, examples
CSV Files | read.csv(), read.table() | base R | Most tabular data
Excel Files | read_excel() | readxl | Clinical data, lab results
SPSS/Stata/SAS | read_sav(), read_dta(), read_sas() | haven | Social science, clinical trials
Databases | dbReadTable(), dbGetQuery() | DBI | Large datasets, patient records
Web/API | read.csv(), fromJSON() | base R, jsonlite | Public datasets, API data
Clipboard | read.table("clipboard") | base R | Quick Excel transfers
R Binary | readRDS(), load() | base R | Saving analysis results
Bioinformatics | Specialized importers | Bioconductor | Genomic, sequencing data

💡 Best Practices

  1. Always check data after import

    str(data)        # Structure
    head(data)       # First few rows
    summary(data)    # Summary statistics
    dim(data)        # Dimensions
  2. Specify column types explicitly

    read.csv("data.csv", 
             colClasses = c("character", "numeric", "factor"))
  3. Handle missing values consistently

    read.table("data.txt", 
               na.strings = c("NA", "", ".", "-999"))
  4. Use relative paths for reproducibility

    # Instead of:
    # data <- read.csv("C:/Users/Name/Project/data.csv")
    
    # Use:
    data <- read.csv("data/raw/clinical_data.csv")
  5. Save import commands in scripts

    • Create a dedicated 01_data_import.R script
    • Document data sources and modifications
    • Use comments for special handling instructions (a minimal example script follows)
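
A minimal sketch of such a script (file names, paths, and the -999 missing-value code are illustrative):

# 01_data_import.R
# Purpose: import all raw data for the project in one documented place

library(readxl)

# Clinical export codes missing values as -999
clinical_data <- read.csv(
  "data/raw/clinical_data.csv",
  na.strings = c("NA", "", "-999")
)

# Lab results arrive as an Excel workbook; the first sheet holds the measurements
lab_results <- read_excel("data/raw/lab_results.xlsx", sheet = 1)

# Save a clean binary copy for downstream scripts
saveRDS(clinical_data, "data/processed/clinical_data.rds")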

🔧 Troubleshooting Common Issues

Problem | Solution
Column type mismatches | Use the colClasses argument, or import then convert
Encoding issues | Specify encoding = "UTF-8" or the appropriate encoding
Large files | Use data.table::fread() or readr::read_csv() (see the sketch below)
Memory limits | Read in chunks or use a database connection
Date format problems | Specify col_types or convert after import
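
For the large-file case, a minimal sketch of the two faster readers mentioned above (file names are illustrative):

# data.table::fread() detects the delimiter automatically and reads with multiple threads
library(data.table)
counts_dt <- fread("rnaseq_counts.txt")

# readr::read_csv() is a faster, more consistent alternative to read.csv()
library(readr)
clinical_tbl <- read_csv("clinical_data.csv")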

This guide covers the essential methods for data import in R. The choice of method depends on your data source, size, and analysis requirements. Always validate imported data before proceeding with analysis.