| Authors | Rasouli, H., Moein, F. |
| Date | December 20, 2025 |
| Affiliation | ABSA, Tarbiat Modares University, Tehran, Iran |
| Workshop | R club for researchers |
In R, data can be imported in several ways, depending on its source (manual entry, files, databases, the web, and so on). The main methods are summarized below.
Manual entry is suitable for small datasets or educational examples.
# Create a numeric vector
x <- c(10, 20, 30, 40)
# Create a character vector
genes <- c("TP53", "BRCA1", "EGFR", "MYC")
# Create a data frame manually
patient_data <- data.frame(
patient_id = c("P001", "P002", "P003", "P004"),
age = c(45, 62, 33, 58),
treatment = c("Drug_A", "Drug_B", "Drug_A", "Placebo"),
response = c(1.2, 0.8, 1.5, -0.3)
)
print(patient_data)
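For a small table it can be more convenient to type the values as one block of text than to build each column separately. The following is a minimal sketch of that approach using the `text` argument of read.table(); the sample names and values are made up for illustration and do not come from any real dataset.

```r
# Hypothetical example values, typed inline as whitespace-separated text
inline_text <- "
sample_id condition expression
S1 control 5.2
S2 treated 7.9
S3 treated NA
"

# read.table() parses the string as if it were the contents of a file;
# blank lines are skipped and "NA" is treated as a missing value by default
inline_data <- read.table(
  text = inline_text,
  header = TRUE,
  stringsAsFactors = FALSE
)

str(inline_data)
```

This keeps the data readable in the script itself, which can be handy for teaching examples or for posting a reproducible question.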
# Basic import
gene_expression <- read.csv("gene_expression.csv")
# With additional options for biological data
clinical_data <- read.csv(
"clinical_data.csv",
header = TRUE, # First row contains column names
stringsAsFactors = FALSE # Keep character columns as characters (the default since R 4.0.0)
)
# Tab-delimited files (common in bioinformatics)
rnaseq_data <- read.table(
"rnaseq_counts.txt",
header = TRUE,
sep = "\t", # Tab separator
row.names = 1 # Use first column as row names
)
# Space-delimited files
variant_data <- read.table(
"snp_data.txt",
sep = " ",
na.strings = "NA" # Specify missing values
)
Importing Excel files requires the readxl package, which handles both .xls and .xlsx formats.
# Install and load package
install.packages("readxl")
library(readxl)
# Read specific sheet
clinical_trials <- read_excel(
"clinical_study.xlsx",
sheet = "Patient_Data", # Sheet name or number
range = "A1:E100", # Specific cell range
col_types = c("text", "numeric", "text", "date", "numeric") # Column types
)
# List all sheets in workbook
excel_sheets("clinical_study.xlsx")
The haven package imports data from SPSS, Stata, and SAS
formats while preserving variable labels and value labels.
library(haven)
# SPSS files (.sav)
survey_data <- read_sav("survey_data.sav")
# View variable labels
attributes(survey_data$variable_name)$label
# Stata files (.dta)
economic_data <- read_dta("economic_data.dta")
# SAS files (.sas7bdat)
clinical_sas <- read_sas("clinical.sas7bdat")
# SAS transport files (.xpt)
nhanes_data <- read_xpt("demo_f.xpt")
library(DBI)
library(RSQLite)
# Connect to database
con <- dbConnect(RSQLite::SQLite(), "genomics.db")
# List tables
dbListTables(con)
# Read entire table
variants <- dbReadTable(con, "snp_variants")
# Execute custom SQL query
high_impact_variants <- dbGetQuery(con, "
SELECT chrom, position, gene, consequence
FROM snp_variants
WHERE impact = 'HIGH'
AND allele_freq < 0.01
")
# Disconnect when done
dbDisconnect(con)
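When filter values come from variables or user input, pasting them into the SQL string is error-prone. The sketch below shows one possible alternative, a parameterized query passed through the `params` argument of dbGetQuery(). The in-memory database, the `toy_variants` table, and its values are assumptions made purely for illustration.

```r
library(DBI)

# In-memory SQLite database, so no file is created on disk
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table standing in for real variant data
toy_variants <- data.frame(
  gene = c("TP53", "BRCA1", "EGFR"),
  impact = c("HIGH", "LOW", "HIGH"),
  allele_freq = c(0.002, 0.150, 0.040)
)
dbWriteTable(con, "toy_variants", toy_variants)

# Placeholders (?) are filled from `params`, in order
rare_high <- dbGetQuery(
  con,
  "SELECT gene, allele_freq FROM toy_variants
   WHERE impact = ? AND allele_freq < ?",
  params = list("HIGH", 0.01)
)

dbDisconnect(con)
print(rare_high)
```

Only the TP53 row satisfies both conditions in this toy table, so the result has a single row.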
# Connect to a MySQL database (replace the placeholder credentials with your own)
library(RMySQL)
con <- dbConnect(MySQL(),
user = "username",
password = "password",
dbname = "database",
host = "localhost")
# CSV from URL
covid_data <- read.csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
# Using read.table for any delimiter
remote_data <- read.table(
"https://example.com/data.txt",
header = TRUE,
sep = ","
)
library(jsonlite)
# From API endpoint
api_data <- fromJSON("https://api.genome.gov/variant/rs429358?format=json")
# From local JSON file
config <- fromJSON("analysis_config.json")
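Reading from the clipboard is useful for quick transfers from Excel or other spreadsheet software.
# Copy data from Excel (Ctrl+C), then run the code below
# (on macOS, use pipe("pbpaste") in place of "clipboard")
copied_data <- read.table(
"clipboard",
header = TRUE,
sep = "\t" # Excel copies cells as tab-separated text
)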
# For comma-separated clipboard data
clipboard_csv <- read.table(
"clipboard",
header = TRUE,
sep = ","
)
# Save a single object
saveRDS(gene_expression, "gene_expression.rds")
# Read it back (object name can be different)
loaded_data <- readRDS("gene_expression.rds")
# Check object is identical
identical(gene_expression, loaded_data)
# Save multiple objects
save(clinical_data, variants, gene_expression,
file = "analysis_workspace.RData")
# Load all objects
load("analysis_workspace.RData")
# Save entire workspace
save.image("full_workspace.RData")
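Note that load() restores objects under the names they were saved with, and it overwrites any existing object of the same name without warning. One way to avoid that, sketched here with made-up objects and a temporary file, is to load into a separate environment and pull out only what is needed.

```r
# Hypothetical objects, saved to a temporary file for the demonstration
demo_counts <- c(a = 1, b = 2)
demo_label <- "run_01"
rdata_file <- tempfile(fileext = ".RData")
save(demo_counts, demo_label, file = rdata_file)

# Load into a separate environment instead of the global workspace
loaded_env <- new.env()
loaded_names <- load(rdata_file, envir = loaded_env)

# load() returns the names of the restored objects
print(loaded_names)

# Retrieve a single object from that environment
restored_counts <- get("demo_counts", envir = loaded_env)
```

For a single object, saveRDS() and readRDS() remain the simpler choice, because the caller decides the name on reading.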
scan() for Vector Input
# Interactive numeric input
cat("Enter blood pressure readings (press Enter twice to finish):\n")
bp_readings <- scan()
# Interactive character input
cat("Enter gene names (one per line, blank line to finish):\n")
gene_names <- scan(what = character())
readline() for Single Values
# Get user input
patient_id <- readline("Enter patient ID: ")
age <- as.numeric(readline("Enter patient age: "))
# Create data frame from interactive input
patient <- data.frame(
id = patient_id,
age = age,
date = Sys.Date()
)
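# BED files (genomic regions), via Bioconductor's rtracklayer
library(rtracklayer)
bed_data <- import("regions.bed")
# VCF files (variant calls), via VariantAnnotation
library(VariantAnnotation)
vcf <- readVcf("variants.vcf", genome = "hg38")
# FASTQ files (sequencing reads), via ShortRead
library(ShortRead)
fastq <- readFastq("sample_R1.fastq.gz")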
| Data Source | Primary Function | Package | Common Use Cases |
|---|---|---|---|
| Manual Entry | c(), data.frame() | base R | Small datasets, examples |
| CSV Files | read.csv(), read.table() | base R | Most tabular data |
| Excel Files | read_excel() | readxl | Clinical data, lab results |
| SPSS/Stata/SAS | read_sav(), read_dta(), read_sas() | haven | Social science, clinical trials |
| Databases | dbReadTable(), dbGetQuery() | DBI | Large datasets, patient records |
| Web/API | read.csv(), fromJSON() | base R, jsonlite | Public datasets, API data |
| Clipboard | read.table("clipboard") | base R | Quick Excel transfers |
| R Binary | readRDS(), load() | base R | Saving analysis results |
| Bioinformatics | Specialized importers | Bioconductor | Genomic, sequencing data |
Always check data after import
str(data) # Structure
head(data) # First few rows
summary(data) # Summary statistics
dim(data) # Dimensions
Specify column types explicitly
read.csv("data.csv",
colClasses = c("character", "numeric", "factor"))
Handle missing values consistently
read.table("data.txt",
na.strings = c("NA", "", ".", "-999"))
Use relative paths for reproducibility
# Instead of:
# data <- read.csv("C:/Users/Name/Project/data.csv")
# Use:
data <- read.csv("data/raw/clinical_data.csv")
Save import commands in scripts
Keep them in a dedicated script, for example 01_data_import.R.

| Problem | Solution |
|---|---|
| Column type mismatches | Use colClasses argument or import then convert |
| Encoding issues | Specify fileEncoding = "UTF-8" (or the file's actual encoding) in read.csv() or read.table() |
| Large files | Use data.table::fread() or readr::read_csv() |
| Memory limits | Read in chunks or use database connection |
| Date format problems | Specify col_types or convert after import |
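Two of the solutions in the table, reading in chunks and converting dates after import, can be sketched in base R as follows. The file, column names, chunk size, and values are all hypothetical; the example writes its own small CSV to a temporary file so that it is self-contained.

```r
# Hypothetical data written to a temporary file for the demonstration
csv_file <- tempfile(fileext = ".csv")
visits <- data.frame(
  patient_id = sprintf("P%03d", 1:10),
  visit_date = format(as.Date("2025-01-01") + 0:9, "%d/%m/%Y"),
  score = 1:10
)
write.csv(visits, csv_file, row.names = FALSE)

# Read the header once, then pull fixed-size chunks from the open connection
con <- file(csv_file, open = "r")
col_names <- strsplit(readLines(con, n = 1), ",")[[1]]
col_names <- gsub('"', "", col_names)

chunk_size <- 4
total_score <- 0
rows_read <- 0
repeat {
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size,
             col.names = col_names, stringsAsFactors = FALSE),
    error = function(e) NULL  # read.csv() raises an error once no lines remain
  )
  if (is.null(chunk) || nrow(chunk) == 0) break
  # Convert the date column after import, stating the format explicitly
  chunk$visit_date <- as.Date(chunk$visit_date, format = "%d/%m/%Y")
  # Only running summaries are kept, so memory use stays bounded by chunk_size
  total_score <- total_score + sum(chunk$score)
  rows_read <- rows_read + nrow(chunk)
}
close(con)

rows_read
total_score
```

Because the connection stays open between calls, each read.csv() call continues where the previous one stopped. For very large files, data.table::fread() or a database connection, as listed in the table, is usually the more practical route.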
This guide covers the essential methods for data import in R. The choice of method depends on your data source, size, and analysis requirements. Always validate imported data before proceeding with analysis.
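As one possible way to make that validation step routine, the sketch below defines a small helper that stops with an error when expected columns are missing or the table is too short, and otherwise returns the number of missing values per column. The function name `validate_import`, its arguments, and the example data are hypothetical and not part of any package.

```r
# Hypothetical helper: stop early if an imported data frame looks wrong
validate_import <- function(df, expected_cols, min_rows = 1) {
  stopifnot(is.data.frame(df))
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (nrow(df) < min_rows) {
    stop("Expected at least ", min_rows, " rows, found ", nrow(df))
  }
  # Count missing values per column and return them invisibly
  na_counts <- colSums(is.na(df))
  invisible(na_counts)
}

# Illustrative data, not from a real study
example_df <- data.frame(
  patient_id = c("P001", "P002", "P003"),
  age = c(45, NA, 33)
)
na_counts <- validate_import(example_df, expected_cols = c("patient_id", "age"))
print(na_counts)
```

Calling such a helper right after each import makes problems surface at the import step instead of later in the analysis.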