通过搜索或程序列表打开终端(Terminal)

一、OMIM 数据处理

[tab] to complete input text;

pwd: print work directory;

pwd
## /home/wangzg/Nutstore Files/Nutstore/备课/2025春/数据处理编程与公共数据分析/bash_test

mkdir: make directory;

cd: change directory;

mkdir -p my_folder
cd my_folder
# cd /
# cd ..
# cd

ssh 连接到远程服务器

# ssh user@remote_host

下载文件与解压文件

# wget https://gitee.com/ceapple/student-pumc/raw/master/morbidmap.zip
# curl https://gitee.com/ceapple/student-pumc/raw/master/morbidmap.zip > morbidmap.zip
# unzip morbidmap.zip
# wget https://gitee.com/ceapple/student-pumc/raw/master/Corona_virus_T1.xlsx

nano or vi用于编辑文本文件;

ls: list files;

ls 
## Corona_virus_T1.xlsx
## morbidmap
## sa.txt
## t1.csv
## t2.csv
## t3.csv
## t_bind.csv
## test.db
## t_query.csv

cat: output file content;

head 或 tail: view file head/tail lines (example: head -n 10 test.txt);

less: view very big file, [q] for quit;

# cat morbidmap
head -n 10 morbidmap
# less morbidmap
## 17,20-lyase deficiency, isolated, 202110 (3)  |CYP17A1, CYP17, P450C17|609300|10q24.3
## 17-alpha-hydroxylase/17,20-lyase deficiency, 202110 (3)  |CYP17A1, CYP17, P450C17|609300|10q24.3
## 17-beta-hydroxysteroid dehydrogenase X deficiency, 300438 (3)  |HSD17B10, HADH2, ERAB, MRXS10, MRX17, MRX31, DUPXp11.22|300256|Xp11.2
## 1p36 deletion syndrome (2)  |SKI|164780|1p36.3
## 2-methylbutyrylglycinuria, 610006 (3)  |ACADSB, SBCAD|600301|10q25-q26
## 3-M syndrome, 273750 (3)  |CUL7|609577|6p21.1
## 3-Methylcrotonyl-CoA carboxylase 1 deficiency, 210200 (3)  |MCCC1, MCCA|609010|3q25-q27
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  |MCCC2, MCCB|609014|5q12-q13
## 3-beta-hydroxysteroid dehydrogenase, type II, deficiency (3)  |HSD3B2|201810|1p13.1
## 3-hydroxyacyl-CoA dehydrogenase deficiency, 231530 (3)  |HADHSC, SCHAD, HHF4|601609|4q22-q26

diff 比较两个文件差异

find 查找文件

find ./ -name '*.csv' -type f
## ./t3.csv
## ./t_bind.csv
## ./t_query.csv
## ./t2.csv
## ./t1.csv

wc -l: count lines

wc -l morbidmap
## 5504 morbidmap

cut: 从文本文件中提取列 (example: cut -f 2 test.txt means cut column 2);

cut -f 4 -d '|' morbidmap | head -n 6
## 10q24.3
## 10q24.3
## Xp11.2
## 1p36.3
## 10q25-q26
## 6p21.1

grep: 模式匹配

搜索疾病基因型与表型强关联的数据

grep "(3)" morbidmap > sa.txt
head sa.txt
wc -l sa.txt
## 17,20-lyase deficiency, isolated, 202110 (3)  |CYP17A1, CYP17, P450C17|609300|10q24.3
## 17-alpha-hydroxylase/17,20-lyase deficiency, 202110 (3)  |CYP17A1, CYP17, P450C17|609300|10q24.3
## 17-beta-hydroxysteroid dehydrogenase X deficiency, 300438 (3)  |HSD17B10, HADH2, ERAB, MRXS10, MRX17, MRX31, DUPXp11.22|300256|Xp11.2
## 2-methylbutyrylglycinuria, 610006 (3)  |ACADSB, SBCAD|600301|10q25-q26
## 3-M syndrome, 273750 (3)  |CUL7|609577|6p21.1
## 3-Methylcrotonyl-CoA carboxylase 1 deficiency, 210200 (3)  |MCCC1, MCCA|609010|3q25-q27
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  |MCCC2, MCCB|609014|5q12-q13
## 3-beta-hydroxysteroid dehydrogenase, type II, deficiency (3)  |HSD3B2|201810|1p13.1
## 3-hydroxyacyl-CoA dehydrogenase deficiency, 231530 (3)  |HADHSC, SCHAD, HHF4|601609|4q22-q26
## 3-hydroxyisobutryl-CoA hydrolase deficiency, 250620 (3)  |HIBCH|610690|2q32.2
## 4116 sa.txt

grep -i : 忽略大小写

grep -i "breast cancer" morbidmap | head
## Androgen insensitivity, partial, with or without breast cancer, 312300 (3)  |AR, DHTR, TFM, SBMA, KD, SMAX1, HYSP1|313700|Xq11-q12
## Breast cancer (1)  |BCPR|113721|17p13.3
## Breast cancer (1)  |ESR1, ESR|133430|6q25.1
## Breast cancer (3)  |TSG101|601387|11p15.2-p15.1
## Breast cancer, 114480 (3)  |PPM1D, WIP1|605100|17q22-q23
## Breast cancer, 114480 (3)  |SLC22A1L, BWSCR1A, IMPT1|602631|11p15.5
## Breast cancer, 114480 (3)  |TP53, P53, LFS1|191170|17p13.1
## Breast cancer, 11:22 translocation associated (1)  |BRCATA|600048|11q23
## Breast cancer, early-onset, 114480 (3)  |BRIP1, BACH1, FANCJ|605882|17q22
## Breast cancer, invasive ductal, 114480 (3)  |RAD54L, HR54, HRAD54|603615|1p32
grep -P "[|]5q[0-9]+" morbidmap | head
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  |MCCC2, MCCB|609014|5q12-q13
## Achondrogenesis Ib, 600972 (3)  |SLC26A2, DTD, DTDST, D5S1708, EDM4|606718|5q32-q33.1
## Adenoma, periampullary (3)  |APC, GS, FPC, BTPS2|611731|5q21-q22
## Adenomatous polyposis coli, 175100 (3)  |APC, GS, FPC, BTPS2|611731|5q21-q22
## Anemia, megaloblastic, due to DHFR deficiency (1) (?)  |DHFR|126060|5q11.2-q13.2
## Angioedema, hereditary, type III, 610618 (3)  |F12, HAF, HAE3|610619|5q33-qter
## Aortic aneurysm, familial thoracic 2 (2)  |AAT2, FAA2|607087|5q13-q14
## Arthrogryposis multiplex congenita, neurogenic (2)  |AMCN, AMCN1|208100|5q35
## Atelosteogenesis II, 256050 (3)  |SLC26A2, DTD, DTDST, D5S1708, EDM4|606718|5q32-q33.1
## Atopy, 147050 (3)  |SPINK5, LEKTI|605010|5q32

定位到5q染色体区域的疾病数目

grep -P "[|]5q[0-9]+" morbidmap | wc -l
## 190

grep -Eo -E: 扩展的正则表达式; -o: 仅输出匹配的部分; -n也输出行号

cut -f 4 -d '|' morbidmap | sed 's/Chr\.//g' | grep -Eon '^[0-9]+|X|Y' | head -n 10
## 1:10
## 2:10
## 3:X
## 4:1
## 5:10
## 6:6
## 7:3
## 8:5
## 9:1
## 10:4

sed ‘s/Chr.//g’ # 例如Chr.X 会被替换成X

sort: 排序 sort -g , sort compare according to general numerical value

cat morbidmap | sort

uniq -c; 对每个一个独立的值计数, 定位到各染色体的疾病数目

cut -f 4 -d '|' morbidmap | sed 's/Chr\.//g' | grep -Eo '^[0-9]+|X|Y' | sort -g | uniq -c 
##     413 X
##      10 Y
##     515 1
##     379 2
##     328 3
##     221 4
##     259 5
##     315 6
##     250 7
##     189 8
##     206 9
##     229 10
##     364 11
##     252 12
##     126 13
##     160 14
##     155 15
##     206 16
##     319 17
##      85 18
##     217 19
##     129 20
##      51 21
##     126 22

awk: 抽取符合某种模式的数据块

提取第一个数据块

awk -F '|' '{print $1}' morbidmap | head 
## 17,20-lyase deficiency, isolated, 202110 (3)  
## 17-alpha-hydroxylase/17,20-lyase deficiency, 202110 (3)  
## 17-beta-hydroxysteroid dehydrogenase X deficiency, 300438 (3)  
## 1p36 deletion syndrome (2)  
## 2-methylbutyrylglycinuria, 610006 (3)  
## 3-M syndrome, 273750 (3)  
## 3-Methylcrotonyl-CoA carboxylase 1 deficiency, 210200 (3)  
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  
## 3-beta-hydroxysteroid dehydrogenase, type II, deficiency (3)  
## 3-hydroxyacyl-CoA dehydrogenase deficiency, 231530 (3)

匹配整行有cancer的记录,并输出第一列

awk -F "|" '/cancer/ {print $1}' morbidmap | head 
## Androgen insensitivity, partial, with or without breast cancer, 312300 (3)  
## Bladder cancer, 109800 (3)  
## Bladder cancer, 109800 (3)  
## Bladder cancer, 109800 (3)  
## Breast cancer (1)  
## Breast cancer (1)  
## Breast cancer (3)  
## Breast cancer, 114480 (3)  
## Breast cancer, 114480 (3)  
## Breast cancer, 114480 (3)

仅匹配第一列有cancer的记录

awk -F "|" '$1 ~ /cancer/ {print $1}' morbidmap | head
# 另一种写法 awk -F "|" '{if ($1~ "cancer") print $1}' morbidmap | head
## Androgen insensitivity, partial, with or without breast cancer, 312300 (3)  
## Bladder cancer, 109800 (3)  
## Bladder cancer, 109800 (3)  
## Bladder cancer, 109800 (3)  
## Breast cancer (1)  
## Breast cancer (1)  
## Breast cancer (3)  
## Breast cancer, 114480 (3)  
## Breast cancer, 114480 (3)  
## Breast cancer, 114480 (3)

第一列有cancer的记录有多少行?

awk -F "|" '$1 ~ /cancer/ {print $1}' morbidmap | wc -l
## 180

将第一列转换成小写,再匹配,即忽略大小写

awk -F "|" 'tolower($1) ~ /cancer/ {print $1}' morbidmap | wc -l 
## 181
# awk -F "|" '/cancer|Cancer/ {print $1}' morbidmap | head

仅匹配第一列没有cancer的记录

awk -F "|" '$0 !~ /cancer/ {print $1}' morbidmap | head
## 17,20-lyase deficiency, isolated, 202110 (3)  
## 17-alpha-hydroxylase/17,20-lyase deficiency, 202110 (3)  
## 17-beta-hydroxysteroid dehydrogenase X deficiency, 300438 (3)  
## 1p36 deletion syndrome (2)  
## 2-methylbutyrylglycinuria, 610006 (3)  
## 3-M syndrome, 273750 (3)  
## 3-Methylcrotonyl-CoA carboxylase 1 deficiency, 210200 (3)  
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  
## 3-beta-hydroxysteroid dehydrogenase, type II, deficiency (3)  
## 3-hydroxyacyl-CoA dehydrogenase deficiency, 231530 (3)
cat morbidmap | awk -F '|' '{print $1"; "$4}'  | head
## 17,20-lyase deficiency, isolated, 202110 (3)  ; 10q24.3
## 17-alpha-hydroxylase/17,20-lyase deficiency, 202110 (3)  ; 10q24.3
## 17-beta-hydroxysteroid dehydrogenase X deficiency, 300438 (3)  ; Xp11.2
## 1p36 deletion syndrome (2)  ; 1p36.3
## 2-methylbutyrylglycinuria, 610006 (3)  ; 10q25-q26
## 3-M syndrome, 273750 (3)  ; 6p21.1
## 3-Methylcrotonyl-CoA carboxylase 1 deficiency, 210200 (3)  ; 3q25-q27
## 3-Methylcrotonyl-CoA carboxylase 2 deficiency, 210210 (3)  ; 5q12-q13
## 3-beta-hydroxysteroid dehydrogenase, type II, deficiency (3)  ; 1p13.1
## 3-hydroxyacyl-CoA dehydrogenase deficiency, 231530 (3)  ; 4q22-q26

sed: 流编辑器

将所有的OMIM ID前加00

awk -F "|" '{print $3}' morbidmap | sed -r 's/[0-9]+/00&/' | head
## 00609300
## 00609300
## 00300256
## 00164780
## 00600301
## 00609577
## 00609010
## 00609014
## 00201810
## 00601609

将染色体定位仅保留数字和XY部分

cut -f 4 -d "|" morbidmap | sed -E 's/([0-9XY]+)[A-Za-z]*.*/\1/;s/^Chr\.([0-9XY]+).*/\1/' | head
## 10
## 10
## X
## 1
## 10
## 6
## 3
## 5
## 1
## 4

二、冠状病毒药物数据 use csvkit

Read data

# sudo apt-get install csvkit
in2csv -n Corona_virus_T1.xlsx
## Sheet1
## Sheet2
## Sheet3

将数据转换文本格式

in2csv --sheet Sheet1  Corona_virus_T1.xlsx > t1.csv
in2csv --sheet Sheet2  Corona_virus_T1.xlsx > t2.csv
in2csv --sheet Sheet3  Corona_virus_T1.xlsx > t3.csv

csvstat对数据进行简单统计

csvstat t1.csv
##   1. "Disease"
## 
##  Type of data:          Text
##  Contains null values:  False
##  Non-null values:       22
##  Unique values:         4
##  Longest value:         17 characters
##  Most common values:    MERS-CoV (10x)
##                         SARS-CoV (8x)
##                         SARS (3x)
##                         HKU4-CoV and mers (1x)
## 
##   2. "Drug name"
## 
##  Type of data:          Text
##  Contains null values:  False
##  Non-null values:       22
##  Unique values:         18
##  Longest value:         47 characters
##  Most common values:    Ribavirin (3x)
##                         6-mercaptopurine (2x)
##                         6-thioguanine (2x)
##                         Lopinavir (1x)
##                         ritonavir (1x)
## 
##   3. "Chinese name"
## 
##  Type of data:          Boolean
##  Contains null values:  True (excluded from calculations)
##  Non-null values:       0
##  Unique values:         1
##  Most common values:    None (22x)
## 
##   4. "ChEBI ID"
## 
##  Type of data:          Text
##  Contains null values:  True (excluded from calculations)
##  Non-null values:       21
##  Unique values:         16
##  Longest value:         14 characters
##  Most common values:    CHEBI_63580  (4x)
##                         CHEBI_50667 (2x)
##                         CHEBI_9555  (2x)
##                         CHEBI_90705  (2x)
##                         CHEBI_31781 (1x)
## 
## Row count: 22

csvlook: 打印文件

csvlook t1.csv
## | Disease           | Drug name                                       | Chinese name | ChEBI ID       |
## | ----------------- | ----------------------------------------------- | ------------ | -------------- |
## | SARS              | Ribavirin                                       |              | CHEBI_63580    |
## | SARS              | Lopinavir                                       |              | CHEBI_31781    |
## | SARS              | ritonavir                                       |              | CHEBI_45409    |
## | MERS-CoV          | IFNα                                            |              | 2a: CHEBI_5937 |
## | MERS-CoV          | IFNβ1b                                          |              | CHEBI_5938     |
## | MERS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | MERS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | SARS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | SARS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | HKU4-CoV and mers | SG85                                            |              |                |
## | MERS-CoV          | Mycophenolic acid                               |              | CHEBI_168396   |
## | MERS-CoV          | chloroquine                                     |              | CHEBI_3638     |
## | MERS-CoV          | chlorpromazine                                  |              | CHEBI_3647     |
## | MERS-CoV          | loperamide                                      |              | CHEBI_6532     |
## | MERS-CoV          | cyclosporine A                                  |              | CHEBI:4031     |
## | SARS-CoV          | SB203580                                        |              | CHEBI_90705    |
## | MERS-CoV          | SB203581                                        |              | CHEBI_90705    |
## | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | SARS-CoV          | (Pulsed) methylprednisolone                     |              | CHEBI_6888     |
## | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | SARS-CoV          | Ribavirin lopinavir–ritonavir + corticosteroids |              | CHEBI_63580    |
## | SARS-CoV          | Interferon alfa‑1+corticosteroid                |              | ,CHEBI_50858   |
csvlook t3.csv
## | Drug name         | DrON ID        | NDF-RT ID    |
## | ----------------- | -------------- | ------------ |
## | Mycophenolic acid | DRON_00016769  | N0000148832  |
## | chloroquine       | DRON_00001135  | N0000147767  |
## | chlorpromazine    | DRON_00021685  | N0000146214  |
## | loperamide        | DRON_00022294  | N0000147893  |

csvcut: 按照列名称或列号提取数据

csvcut -n t1.csv
##   1: Disease
##   2: Drug name
##   3: Chinese name
##   4: ChEBI ID
csvcut -c 1 t1.csv 
## Disease
## SARS
## SARS
## SARS
## MERS-CoV
## MERS-CoV
## MERS-CoV
## MERS-CoV
## SARS-CoV
## SARS-CoV
## HKU4-CoV and mers
## MERS-CoV
## MERS-CoV
## MERS-CoV
## MERS-CoV
## MERS-CoV
## SARS-CoV
## MERS-CoV
## SARS-CoV
## SARS-CoV
## SARS-CoV
## SARS-CoV
## SARS-CoV
csvcut -c "Drug name" t1.csv
## Drug name
## Ribavirin
## Lopinavir
## ritonavir
## IFNα
## IFNβ1b
## 6-mercaptopurine
## 6-thioguanine
## 6-mercaptopurine
## 6-thioguanine
## SG85
## Mycophenolic acid
## chloroquine
## chlorpromazine
## loperamide 
## cyclosporine A
## SB203580
## SB203581
## Ribavirin
## (Pulsed) methylprednisolone
## Ribavirin
## Ribavirin lopinavir–ritonavir + corticosteroids
## Interferon alfa‑1+corticosteroid
csvcut -c 1,2 t1.csv
## Disease,Drug name
## SARS,Ribavirin
## SARS,Lopinavir
## SARS,ritonavir
## MERS-CoV,IFNα
## MERS-CoV,IFNβ1b
## MERS-CoV,6-mercaptopurine
## MERS-CoV,6-thioguanine
## SARS-CoV,6-mercaptopurine
## SARS-CoV,6-thioguanine
## HKU4-CoV and mers,SG85
## MERS-CoV,Mycophenolic acid
## MERS-CoV,chloroquine
## MERS-CoV,chlorpromazine
## MERS-CoV,loperamide 
## MERS-CoV,cyclosporine A
## SARS-CoV,SB203580
## MERS-CoV,SB203581
## SARS-CoV,Ribavirin
## SARS-CoV,(Pulsed) methylprednisolone
## SARS-CoV,Ribavirin
## SARS-CoV,Ribavirin lopinavir–ritonavir + corticosteroids
## SARS-CoV,Interferon alfa‑1+corticosteroid
csvcut -c "Disease","ChEBI ID" t1.csv
## Disease,ChEBI ID
## SARS,CHEBI_63580 
## SARS,CHEBI_31781
## SARS,CHEBI_45409 
## MERS-CoV,2a: CHEBI_5937
## MERS-CoV,CHEBI_5938 
## MERS-CoV,CHEBI_50667
## MERS-CoV,CHEBI_9555 
## SARS-CoV,CHEBI_50667
## SARS-CoV,CHEBI_9555 
## HKU4-CoV and mers,
## MERS-CoV,CHEBI_168396 
## MERS-CoV,CHEBI_3638 
## MERS-CoV,CHEBI_3647 
## MERS-CoV,CHEBI_6532 
## MERS-CoV,CHEBI:4031
## SARS-CoV,CHEBI_90705 
## MERS-CoV,CHEBI_90705 
## SARS-CoV,CHEBI_63580 
## SARS-CoV,CHEBI_6888 
## SARS-CoV,CHEBI_63580 
## SARS-CoV,CHEBI_63580 
## SARS-CoV,",CHEBI_50858 "

csvgrep 对Disease进行模式匹配

csvgrep -c "Disease" -m "SARS-CoV" t1.csv
## Disease,Drug name,Chinese name,ChEBI ID
## SARS-CoV,6-mercaptopurine,,CHEBI_50667
## SARS-CoV,6-thioguanine,,CHEBI_9555 
## SARS-CoV,SB203580,,CHEBI_90705 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,(Pulsed) methylprednisolone,,CHEBI_6888 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,Ribavirin lopinavir–ritonavir + corticosteroids,,CHEBI_63580 
## SARS-CoV,Interferon alfa‑1+corticosteroid,,",CHEBI_50858 "

?i: 忽略大小写

csvgrep -c "Disease" -r "^(?i)SARS" t1.csv
## /home/wangzg/.local/lib/python3.10/site-packages/csvkit/utilities/csvgrep.py:64: DeprecationWarning: Flags not at the start of the expression '^(?i)SARS' but at position 1
## Disease,Drug name,Chinese name,ChEBI ID
## SARS,Ribavirin,,CHEBI_63580 
## SARS,Lopinavir,,CHEBI_31781
## SARS,ritonavir,,CHEBI_45409 
## SARS-CoV,6-mercaptopurine,,CHEBI_50667
## SARS-CoV,6-thioguanine,,CHEBI_9555 
## SARS-CoV,SB203580,,CHEBI_90705 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,(Pulsed) methylprednisolone,,CHEBI_6888 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,Ribavirin lopinavir–ritonavir + corticosteroids,,CHEBI_63580 
## SARS-CoV,Interferon alfa‑1+corticosteroid,,",CHEBI_50858 "

按行连接文件

csvstack  t1.csv t2.csv
## Disease,Drug name,Chinese name,ChEBI ID
## SARS,Ribavirin,,CHEBI_63580 
## SARS,Lopinavir,,CHEBI_31781
## SARS,ritonavir,,CHEBI_45409 
## MERS-CoV,IFNα,,2a: CHEBI_5937
## MERS-CoV,IFNβ1b,,CHEBI_5938 
## MERS-CoV,6-mercaptopurine,,CHEBI_50667
## MERS-CoV,6-thioguanine,,CHEBI_9555 
## SARS-CoV,6-mercaptopurine,,CHEBI_50667
## SARS-CoV,6-thioguanine,,CHEBI_9555 
## HKU4-CoV and mers,SG85,,
## MERS-CoV,Mycophenolic acid,,CHEBI_168396 
## MERS-CoV,chloroquine,,CHEBI_3638 
## MERS-CoV,chlorpromazine,,CHEBI_3647 
## MERS-CoV,loperamide ,,CHEBI_6532 
## MERS-CoV,cyclosporine A,,CHEBI:4031
## SARS-CoV,SB203580,,CHEBI_90705 
## MERS-CoV,SB203581,,CHEBI_90705 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,(Pulsed) methylprednisolone,,CHEBI_6888 
## SARS-CoV,Ribavirin,,CHEBI_63580 
## SARS-CoV,Ribavirin lopinavir–ritonavir + corticosteroids,,CHEBI_63580 
## SARS-CoV,Interferon alfa‑1+corticosteroid,,",CHEBI_50858 "
## SARS-CoV,Triparanol,,CHEBI_135714
## SARS-CoV,Anisomycin,,CHEBI_338412 
## SARS-CoV,Cycloheximide,,CHEBI_27641
## SARS-CoV,Homoharringtonine,,CHEBI_71019 
## SARS-CoV,Benztropine mesylate,,CHEBI_3049 
## SARS-CoV,Fluspirilene,,
## SARS-CoV,Thiothixene,,CHEBI_9571 
## SARS-CoV,Fluphenazine hydrochloride,,CHEBI_5126
## SARS-CoV,Promethazine hydrochloride,,CHEBI_8462 
## SARS-CoV,Astemizole,,CHEBI_2896 
## SARS-CoV,Chlorphenoxamine hydrochloride,,CHEBI_135288 

按行连接文件,并增加source列

csvstack -g  "T1","T2" t1.csv t2.csv > t_bind.csv
csvstack -g  "T1","T2" -n "source" t1.csv t2.csv > t_bind.csv
csvlook t_bind.csv
## | source | Disease           | Drug name                                       | Chinese name | ChEBI ID       |
## | ------ | ----------------- | ----------------------------------------------- | ------------ | -------------- |
## | T1     | SARS              | Ribavirin                                       |              | CHEBI_63580    |
## | T1     | SARS              | Lopinavir                                       |              | CHEBI_31781    |
## | T1     | SARS              | ritonavir                                       |              | CHEBI_45409    |
## | T1     | MERS-CoV          | IFNα                                            |              | 2a: CHEBI_5937 |
## | T1     | MERS-CoV          | IFNβ1b                                          |              | CHEBI_5938     |
## | T1     | MERS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | T1     | MERS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | T1     | SARS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | T1     | SARS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | T1     | HKU4-CoV and mers | SG85                                            |              |                |
## | T1     | MERS-CoV          | Mycophenolic acid                               |              | CHEBI_168396   |
## | T1     | MERS-CoV          | chloroquine                                     |              | CHEBI_3638     |
## | T1     | MERS-CoV          | chlorpromazine                                  |              | CHEBI_3647     |
## | T1     | MERS-CoV          | loperamide                                      |              | CHEBI_6532     |
## | T1     | MERS-CoV          | cyclosporine A                                  |              | CHEBI:4031     |
## | T1     | SARS-CoV          | SB203580                                        |              | CHEBI_90705    |
## | T1     | MERS-CoV          | SB203581                                        |              | CHEBI_90705    |
## | T1     | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | T1     | SARS-CoV          | (Pulsed) methylprednisolone                     |              | CHEBI_6888     |
## | T1     | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | T1     | SARS-CoV          | Ribavirin lopinavir–ritonavir + corticosteroids |              | CHEBI_63580    |
## | T1     | SARS-CoV          | Interferon alfa‑1+corticosteroid                |              | ,CHEBI_50858   |
## | T2     | SARS-CoV          | Triparanol                                      |              | CHEBI_135714   |
## | T2     | SARS-CoV          | Anisomycin                                      |              | CHEBI_338412   |
## | T2     | SARS-CoV          | Cycloheximide                                   |              | CHEBI_27641    |
## | T2     | SARS-CoV          | Homoharringtonine                               |              | CHEBI_71019    |
## | T2     | SARS-CoV          | Benztropine mesylate                            |              | CHEBI_3049     |
## | T2     | SARS-CoV          | Fluspirilene                                    |              |                |
## | T2     | SARS-CoV          | Thiothixene                                     |              | CHEBI_9571     |
## | T2     | SARS-CoV          | Fluphenazine hydrochloride                      |              | CHEBI_5126     |
## | T2     | SARS-CoV          | Promethazine hydrochloride                      |              | CHEBI_8462     |
## | T2     | SARS-CoV          | Astemizole                                      |              | CHEBI_2896     |
## | T2     | SARS-CoV          | Chlorphenoxamine hydrochloride                  |              | CHEBI_135288   |

依据Drug name合并数据

csvjoin t1.csv t3.csv -c "Drug name" | csvlook
## | Disease  | Drug name         | Chinese name | ChEBI ID      | DrON ID        | NDF-RT ID    |
## | -------- | ----------------- | ------------ | ------------- | -------------- | ------------ |
## | MERS-CoV | Mycophenolic acid |              | CHEBI_168396  | DRON_00016769  | N0000148832  |
## | MERS-CoV | chloroquine       |              | CHEBI_3638    | DRON_00001135  | N0000147767  |
## | MERS-CoV | chlorpromazine    |              | CHEBI_3647    | DRON_00021685  | N0000146214  |
## | MERS-CoV | loperamide        |              | CHEBI_6532    | DRON_00022294  | N0000147893  |

保留非匹配部分

csvjoin t1.csv t3.csv -c "Drug name" --outer
## Disease,Drug name,Chinese name,ChEBI ID,Drug name2,DrON ID,NDF-RT ID
## SARS,Ribavirin,,CHEBI_63580 ,,,
## SARS,Lopinavir,,CHEBI_31781,,,
## SARS,ritonavir,,CHEBI_45409 ,,,
## MERS-CoV,IFNα,,2a: CHEBI_5937,,,
## MERS-CoV,IFNβ1b,,CHEBI_5938 ,,,
## MERS-CoV,6-mercaptopurine,,CHEBI_50667,,,
## MERS-CoV,6-thioguanine,,CHEBI_9555 ,,,
## SARS-CoV,6-mercaptopurine,,CHEBI_50667,,,
## SARS-CoV,6-thioguanine,,CHEBI_9555 ,,,
## HKU4-CoV and mers,SG85,,,,,
## MERS-CoV,Mycophenolic acid,,CHEBI_168396 ,Mycophenolic acid,DRON_00016769 ,N0000148832
## MERS-CoV,chloroquine,,CHEBI_3638 ,chloroquine,DRON_00001135 ,N0000147767
## MERS-CoV,chlorpromazine,,CHEBI_3647 ,chlorpromazine,DRON_00021685 ,N0000146214 
## MERS-CoV,loperamide ,,CHEBI_6532 ,loperamide ,DRON_00022294 ,N0000147893 
## MERS-CoV,cyclosporine A,,CHEBI:4031,,,
## SARS-CoV,SB203580,,CHEBI_90705 ,,,
## MERS-CoV,SB203581,,CHEBI_90705 ,,,
## SARS-CoV,Ribavirin,,CHEBI_63580 ,,,
## SARS-CoV,(Pulsed) methylprednisolone,,CHEBI_6888 ,,,
## SARS-CoV,Ribavirin,,CHEBI_63580 ,,,
## SARS-CoV,Ribavirin lopinavir–ritonavir + corticosteroids,,CHEBI_63580 ,,,
## SARS-CoV,Interferon alfa‑1+corticosteroid,,",CHEBI_50858 ",,,

命令串联

csvlook t1.csv; csvstat t1.csv
## | Disease           | Drug name                                       | Chinese name | ChEBI ID       |
## | ----------------- | ----------------------------------------------- | ------------ | -------------- |
## | SARS              | Ribavirin                                       |              | CHEBI_63580    |
## | SARS              | Lopinavir                                       |              | CHEBI_31781    |
## | SARS              | ritonavir                                       |              | CHEBI_45409    |
## | MERS-CoV          | IFNα                                            |              | 2a: CHEBI_5937 |
## | MERS-CoV          | IFNβ1b                                          |              | CHEBI_5938     |
## | MERS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | MERS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | SARS-CoV          | 6-mercaptopurine                                |              | CHEBI_50667    |
## | SARS-CoV          | 6-thioguanine                                   |              | CHEBI_9555     |
## | HKU4-CoV and mers | SG85                                            |              |                |
## | MERS-CoV          | Mycophenolic acid                               |              | CHEBI_168396   |
## | MERS-CoV          | chloroquine                                     |              | CHEBI_3638     |
## | MERS-CoV          | chlorpromazine                                  |              | CHEBI_3647     |
## | MERS-CoV          | loperamide                                      |              | CHEBI_6532     |
## | MERS-CoV          | cyclosporine A                                  |              | CHEBI:4031     |
## | SARS-CoV          | SB203580                                        |              | CHEBI_90705    |
## | MERS-CoV          | SB203581                                        |              | CHEBI_90705    |
## | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | SARS-CoV          | (Pulsed) methylprednisolone                     |              | CHEBI_6888     |
## | SARS-CoV          | Ribavirin                                       |              | CHEBI_63580    |
## | SARS-CoV          | Ribavirin lopinavir–ritonavir + corticosteroids |              | CHEBI_63580    |
## | SARS-CoV          | Interferon alfa‑1+corticosteroid                |              | ,CHEBI_50858   |
##   1. "Disease"
## 
##  Type of data:          Text
##  Contains null values:  False
##  Non-null values:       22
##  Unique values:         4
##  Longest value:         17 characters
##  Most common values:    MERS-CoV (10x)
##                         SARS-CoV (8x)
##                         SARS (3x)
##                         HKU4-CoV and mers (1x)
## 
##   2. "Drug name"
## 
##  Type of data:          Text
##  Contains null values:  False
##  Non-null values:       22
##  Unique values:         18
##  Longest value:         47 characters
##  Most common values:    Ribavirin (3x)
##                         6-mercaptopurine (2x)
##                         6-thioguanine (2x)
##                         Lopinavir (1x)
##                         ritonavir (1x)
## 
##   3. "Chinese name"
## 
##  Type of data:          Boolean
##  Contains null values:  True (excluded from calculations)
##  Non-null values:       0
##  Unique values:         1
##  Most common values:    None (22x)
## 
##   4. "ChEBI ID"
## 
##  Type of data:          Text
##  Contains null values:  True (excluded from calculations)
##  Non-null values:       21
##  Unique values:         16
##  Longest value:         14 characters
##  Most common values:    CHEBI_63580  (4x)
##                         CHEBI_50667 (2x)
##                         CHEBI_9555  (2x)
##                         CHEBI_90705  (2x)
##                         CHEBI_31781 (1x)
## 
## Row count: 22

Chaining 运行不成功因为没有t100.csv

# csvlook t100.csv && csvstat t1.csv
# Error: No such file or directory: 't100.csv'

Write & read database

csvsql --db "sqlite:///test.db" --insert t1.csv
sql2csv --db "sqlite:///test.db" --query "select * from t1" | head
## /usr/lib/python3/dist-packages/agatesql/table.py:183: RemovedIn20Warning: Deprecated API features detected! These feature(s) are not compatible with SQLAlchemy 2.0. To prevent incompatible upgrades prior to updating applications, ensure requirements files are pinned to "sqlalchemy<2.0". Set environment variable SQLALCHEMY_WARN_20=1 to show all deprecation warnings.  Set environment variable SQLALCHEMY_SILENCE_UBER_WARNING=1 to silence this message. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)
## OperationalError: (sqlite3.OperationalError) table t1 already exists
## [SQL: 
## CREATE TABLE t1 (
##  "Disease" VARCHAR NOT NULL, 
##  "Drug name" VARCHAR NOT NULL, 
##  "Chinese name" BOOLEAN, 
##  "ChEBI ID" VARCHAR
## )
## 
## ]
## (Background on this error at: https://sqlalche.me/e/14/e3q8)
## Disease,Drug name,Chinese name,ChEBI ID
## SARS,Ribavirin,,CHEBI_63580 
## SARS,Lopinavir,,CHEBI_31781
## SARS,ritonavir,,CHEBI_45409 
## MERS-CoV,IFNα,,2a: CHEBI_5937
## MERS-CoV,IFNβ1b,,CHEBI_5938 
## MERS-CoV,6-mercaptopurine,,CHEBI_50667
## MERS-CoV,6-thioguanine,,CHEBI_9555 
## SARS-CoV,6-mercaptopurine,,CHEBI_50667
## SARS-CoV,6-thioguanine,,CHEBI_9555 

关机

sudo shutdown -h now