How to Install and Use NCBI EDirect for Sequence Retrieval from the Command Line




A step-by-step guide to installing and using NCBI EDirect for sequence retrieval, covering mRNA, protein, and genomic sequences as well as batch retrieval of all proteins for a taxon.
First, what is NCBI EDirect? Entrez Direct (EDirect) provides access to NCBI's suite of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a Unix terminal window. Search terms are entered as command-line arguments, individual operations are connected with Unix pipes to build multi-step queries, and the selected records can then be retrieved in a variety of formats.
To install EDirect, copy the following command and paste it into your terminal.
Installing NCBI EDirect from the command line:
sh -c "$(wget -q https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"
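The command above needs wget, which is not installed by default on every system (macOS, for example, ships curl instead). A minimal sketch, assuming the HTTPS address of the same installer script, picks whichever downloader is available. The function name download_installer and the variable EDIRECT_URL are hypothetical, not part of EDirect.

```shell
# Assumption: the installer script is served over HTTPS at this address.
EDIRECT_URL="https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh"

# Hypothetical helper: print the installer script to stdout using
# curl if it is available, otherwise wget.
download_installer() {
  if command -v curl >/dev/null 2>&1; then
    curl -fsSL "$EDIRECT_URL"
  elif command -v wget >/dev/null 2>&1; then
    wget -q "$EDIRECT_URL" -O -
  else
    echo "Neither curl nor wget is installed" >&2
    return 1
  fi
}

# Uncomment to download and run the installer:
# sh -c "$(download_installer)"
```

The last line is left commented out so that defining the helper does not download anything by itself.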
After the installation finishes, check the files that were created (by default, EDirect is installed in an edirect folder in your home directory) by typing:
ls -al
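If the shell cannot find esearch afterwards, the edirect folder is probably not on your PATH yet. The sketch below assumes the default $HOME/edirect location; it adds that folder to PATH for the current session and reports which of the main tools (esearch, efetch, elink, xtract) the shell can now find.

```shell
# Assumption: EDirect was installed in its default location, $HOME/edirect.
# Add it to PATH for the current shell session.
export PATH="${HOME}/edirect:${PATH}"

# To make this permanent, append the same line to your shell profile, e.g.:
# echo 'export PATH="${HOME}/edirect:${PATH}"' >> "${HOME}/.bashrc"

# Report whether each main EDirect tool can now be found.
for tool in esearch efetch elink xtract; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
  fi
done
```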
Now let’s move on to sequence retrieval using EDirect.
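Download a nucleotide sequence in FASTA format from NCBI using its accession number. Note that NC_001552 is a RefSeq genomic record (NC_ prefix); RefSeq mRNA accessions start with NM_ or XM_.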
esearch -db nucleotide -query "NC_001552" | efetch -format fasta > output.fasta
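If the search matches nothing, or the request fails, the output file may end up empty or contain something other than FASTA. A small check like the hypothetical is_fasta helper below (not part of EDirect) can catch that before the file is used downstream. It only tests that the file is non-empty and that its first line is a ">" header.

```shell
# Hypothetical helper: succeed only if the file is non-empty and its
# first line is a FASTA header (starts with ">").
is_fasta() {
  [ -s "$1" ] && head -n 1 "$1" | grep -q '^>'
}

# Example use after a download (requires network and EDirect):
# esearch -db nucleotide -query "NC_001552" | efetch -format fasta > output.fasta
# is_fasta output.fasta || echo "output.fasta is empty or not FASTA" >&2
```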
Download a protein sequence in FASTA format from NCBI using its accession number
esearch -db protein -query "NP_001277121.1" | efetch -format fasta > protein.fasta
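To retrieve several proteins at once, efetch's -id option also accepts a comma-separated list of accessions, so no esearch step is needed. Assuming your accessions are stored one per line in a text file (accessions.txt is a hypothetical name), the hypothetical join_ids helper below builds that list, skipping blank lines and stripping Windows line endings.

```shell
# Hypothetical helper: turn a file with one accession per line into a
# comma-separated list (blank lines and carriage returns removed).
join_ids() {
  tr -d '\r' < "$1" | grep -v '^[[:space:]]*$' | paste -sd, -
}

# Example use (requires network and EDirect):
# efetch -db protein -id "$(join_ids accessions.txt)" -format fasta > proteins.fasta
```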
Download a genomic sequence in FASTA format from NCBI using its accession number
esearch -db nucleotide -query "NC_035444.2" | efetch -format fasta > genomic.fasta
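A quick way to confirm that a genomic download is complete is to look at the length of each sequence. The hypothetical seq_lengths helper below is a plain awk sketch, not an EDirect tool. It prints each record's ID (the first word of its header) and its length in bases, which you can compare with the length shown on the record's NCBI page.

```shell
# Hypothetical helper: print "<id><TAB><length>" for every record in a
# FASTA file; <id> is the first word of the header without the ">".
seq_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # finish the previous record
      id = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }   # sequence line
    END { if (id != "") print id "\t" len }
  ' "$1"
}

# Example use:
# seq_lengths genomic.fasta
```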
Batch retrieval of all proteins for a taxon ID. This example downloads all virus proteins (taxon ID 10239, Viruses) in FASTA format, which makes for a very large download.
esearch -db protein -query "txid10239[Organism]" | efetch -format fasta > virus_proteins.fasta
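Because a taxon-wide query can match a very large number of records, it is worth checking the hit count before downloading. A sketch, assuming EDirect's xtract tool is installed alongside esearch: the esearch result carries a Count field, which xtract can print. Both functions are hypothetical helpers; taxon_query only builds the txid...[Organism] search string used above.

```shell
# Hypothetical helper: build the Entrez query string for a taxon ID.
taxon_query() {
  printf 'txid%s[Organism]' "$1"
}

# Hypothetical helper: print how many protein records match a taxon ID,
# read from the Count field of the esearch result (needs network and EDirect).
count_taxon_proteins() {
  esearch -db protein -query "$(taxon_query "$1")" |
    xtract -pattern ENTREZ_DIRECT -element Count
}

# Example (not run here):
# count_taxon_proteins 10239
```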
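Get all CDS (coding sequence) features from a genome. Starting from a protein ID (302315370), elink finds the linked nucleotide record, efetch -format ft returns its feature table, and grep prints each CDS feature with the four lines that follow it: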
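esearch -db protein -query 302315370 | elink -target nuccore | efetch -format ft | grep -A 4 --no-group-separator CDS
To get the CDS nucleotide sequences in FASTA format instead, use -format fasta_cds_na; here grep prints only the FASTA header line that mentions YP_003815423.1:
esearch -db protein -query 302315370 | elink -target nuccore | efetch -format fasta_cds_na | grep YP_003815423.1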
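Filtering fasta_cds_na output with grep YP_003815423.1 returns only the matching header line, because grep works line by line. To keep the sequence as well, a small awk filter can print every line of the record whose header contains a given string. extract_record is a hypothetical helper, not an EDirect command; it reads the named file or, if none is given, standard input.

```shell
# Hypothetical helper: print every FASTA record (header plus sequence lines)
# whose header contains the given fixed string.
extract_record() {
  awk -v pat="$1" '
    /^>/ { keep = (index($0, pat) > 0) }   # decide at each header line
    keep                                   # print lines of matching records
  ' "${2:--}"
}

# Example use (requires network and EDirect):
# esearch -db protein -query 302315370 | elink -target nuccore |
#   efetch -format fasta_cds_na | extract_record YP_003815423.1
```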