NCBI Genome Download Tool
BioNote 2022-08-18
# References
# Downloading the tool
Download `datasets`, the command-line tool for fetching data from NCBI:
wget -c https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets
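The file fetched by `wget` is a plain binary and normally arrives without the executable bit, so it has to be marked executable before `./datasets` can run. A minimal sketch, assuming the binary was saved as `datasets` in the current directory; the helper name `make_executable` is hypothetical and only for illustration:

```shell
# Mark a downloaded binary as executable and print a short status line.
# (make_executable is an illustrative helper, not part of the NCBI tool.)
make_executable() {
    bin_path="$1"
    if [ ! -f "$bin_path" ]; then
        echo "not found: $bin_path" >&2
        return 1
    fi
    chmod +x "$bin_path"
    echo "ready: $bin_path"
}

# Only act when the download is present in the current directory.
if [ -f ./datasets ]; then
    make_executable ./datasets
fi
```

After this step the tool can be invoked as `./datasets` from the same directory.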
Usage: run the `download` subcommand with no arguments to print its help text:
./datasets download
Usage
datasets download [flags]
datasets download [command]
Examples
datasets download genome accession GCF_000001405.40 --chromosomes X,Y --exclude-gff3 --exclude-rna
datasets download genome taxon "bos taurus"
datasets download gene gene-id 672
datasets download gene symbol brca1 --taxon mouse
datasets download gene accession NP_000483.3
datasets download virus genome taxon sars-cov-2 --host dog
datasets download virus protein S --host dog --filename SARS2-spike-dog.zip
datasets download --input-json request_file.json --filename output.zip
Available Commands
gene download a gene dataset
genome download a genome dataset
virus download a coronavirus dataset
ortholog download an ortholog dataset
Global Flags
-a, --annotated only include genomes with annotation
--api-key string NCBI Datasets API Key
--assembly-level string restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
--assembly-source string restrict assemblies to refseq or genbank only
--chromosomes strings limit to a specified, comma-delimited list of chromosomes (default [all])
--dehydrated download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
--exclude-genomic-cds exclude cds_from_genomic.fna (genomic cds file)
--exclude-gff3 exclude genomic.gff (gff3 annotation file)
--exclude-protein exclude protein.faa (protein sequence file)
--exclude-rna exclude rna.fna (transcript sequence file)
--exclude-seq exclude genomic.fna (genomic sequence file)
--filename string specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
--include-gbff include genomic.gbff (GenBank flat file sequence and annotation), if available
--include-gtf include genomic.gtf (gtf annotation file), if available
--no-progressbar hide progress bar
--reference limit to reference and representative (GCF_ and GCA_) assemblies
--released-before string only include genomes that have been released before a specified date (MM/DD/YYYY)
--released-since string only include genomes that have been released after a specified date (MM/DD/YYYY)
--search strings only include genomes that have the specified text in the
searchable fields: species and infraspecies, assembly name and submitter
To provide multiple strings '--search' can be included multiple times
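The flags listed above can be combined into one genome download. The sketch below only uses options shown in this help text (`--exclude-rna`, `--exclude-protein`, `--filename`). It assumes the binary sits in the current directory as `./datasets`. The helper `genome_download_cmd` and the `DRY_RUN` variable are hypothetical names introduced for illustration. Other releases of the tool may list different flags, so it is worth checking `./datasets download genome --help` first.

```shell
# Compose a genome download command for one assembly accession.
# Arguments: accession, optional output zip name (defaults to the
# tool's documented default, ncbi_dataset.zip).
genome_download_cmd() {
    accession="$1"
    out_zip="${2:-ncbi_dataset.zip}"
    echo "./datasets download genome accession $accession --exclude-rna --exclude-protein --filename $out_zip"
}

# Print the command; it is executed only when DRY_RUN=0 is set,
# because a real run needs network access to NCBI.
cmd="$(genome_download_cmd GCF_000001405.40 human.zip)"
echo "$cmd"
if [ "${DRY_RUN:-1}" = "0" ]; then
    $cmd
fi
```

The download is a zip archive, which can then be unpacked with a standard tool such as `unzip`.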