How to install and use NCBI EDirect for sequence retrieval in the command-line interface!


August 12, 2022 by Abdul Rehman
A step-by-step guide to installing and using NCBI EDirect for sequence retrieval, including batch retrieval of proteins, mRNA, genomic sequences, and more.

First, what is NCBI EDirect? Entrez Direct (EDirect) provides access to NCBI's suite of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a Unix terminal window. Search terms are entered as command-line arguments, individual operations are connected with Unix pipes to build multi-step queries, and the selected records can then be retrieved in a variety of formats.

To install NCBI EDirect on your system, copy the following command and paste it into your terminal.

Installing NCBI EDirect (built on the NCBI E-utilities) from the command line:

sh -c "$(wget -q https://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/install-edirect.sh -O -)"

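After the installation finishes, check the files that were created. By default, the installer places EDirect in a folder named edirect in your home directory, so from your home directory type:

ls -al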
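If commands such as esearch or efetch are not found afterwards, the edirect folder is probably not on your PATH yet. The sketch below assumes the default install location ($HOME/edirect) and uses a hypothetical helper, check_edirect (not part of EDirect), to confirm that the main EDirect programs can be found:

```shell
#!/bin/sh
# Sketch: put EDirect on PATH for this session and confirm the core tools resolve.
# Assumption: the installer used its default location, $HOME/edirect.
export PATH="${HOME}/edirect:${PATH}"

# check_edirect (hypothetical helper name): report each tool and
# return non-zero if any of them cannot be found on PATH.
check_edirect() {
  status=0
  for tool in esearch efetch elink xtract; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}
```

To make the PATH change permanent, the same export line can be added to your shell startup file (for example ~/.bash_profile).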

Now let's move on to sequence retrieval using EDirect.

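Download an mRNA sequence in FASTA format from NCBI using its accession number. Note that RefSeq mRNA accessions start with NM_ or XM_; NC_ accessions, such as the one used in this example, are complete genomic molecules, so substitute an NM_ or XM_ accession to retrieve an actual mRNA.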

esearch -db nucleotide -query "NC_001552" | efetch -format fasta > output.fasta
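A download can silently produce an empty file, or something that is not FASTA, for example if the accession is mistyped. As a hedged sketch, the hypothetical helper fasta_summary below checks that a file is non-empty and starts with a '>' header, then prints how many records it contains:

```shell
#!/bin/sh
# Sketch: sanity-check a downloaded FASTA file before using it.
# fasta_summary (hypothetical helper name) prints the number of records,
# or fails if the file is empty/missing or does not start with a '>' header.
fasta_summary() {
  file=$1
  if [ ! -s "$file" ]; then
    echo "error: $file is empty or missing" >&2
    return 1
  fi
  # First character of the first line must be '>' for a FASTA file
  first=$(head -n 1 "$file" | cut -c1)
  if [ "$first" != ">" ]; then
    echo "error: $file does not look like FASTA" >&2
    return 1
  fi
  # Each record starts with exactly one header line
  grep -c '^>' "$file"
}
```

For example, running fasta_summary output.fasta after the command above should print 1 for a single-record download.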

Download a protein sequence in FASTA format from NCBI using its accession number

esearch -db protein -query "NP_001277121.1" | efetch -format fasta > protein.fasta
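When you already have a list of accession numbers, efetch can also take them directly through its -id option as a comma-separated list, so no esearch step is needed. The sketch below assumes a hypothetical file, ids.txt, with one accession per line (blank lines and lines starting with '#' are ignored); join_ids and fetch_ids are hypothetical helper names, not EDirect commands:

```shell
#!/bin/sh
# Sketch: fetch many records in one efetch call from a file of accessions.
# Assumption: the input file has one accession per line; blank lines and
# '#' comment lines are skipped, stray spaces and Windows line endings removed.

# join_ids FILE: turn the accession file into one comma-separated list.
join_ids() {
  grep -v -e '^[[:space:]]*$' -e '^#' "$1" | tr -d '[:blank:]\r' | paste -sd, -
}

# fetch_ids DB FILE OUT: pass the joined list to efetch's -id option.
fetch_ids() {
  efetch -db "$1" -id "$(join_ids "$2")" -format fasta > "$3"
}
```

For example, fetch_ids protein ids.txt proteins.fasta would write every listed protein into one FASTA file.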

Download a genomic sequence in FASTA format from NCBI using its accession number

esearch -db nucleotide -query "NC_035444.2" | efetch -format fasta > genomic.fasta
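For genomic downloads it is often worth checking that the whole molecule arrived. The sketch below uses a hypothetical helper, seq_lengths, built on plain awk, to print the identifier (first word of each header) and the sequence length of every record in a FASTA file:

```shell
#!/bin/sh
# Sketch: report the length of every sequence in a FASTA file.
# seq_lengths (hypothetical helper name) prints "<id><TAB><length>",
# where <id> is the first word of the header without the leading '>'.
seq_lengths() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # emit the previous record
      id = substr($1, 2)                # first word of the header, minus ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")               # ignore stray whitespace
      len += length($0)
    }
    END { if (id != "") print id "\t" len }
  ' "$1"
}
```

For example, seq_lengths genomic.fasta prints one line for the downloaded record, which can be compared with the length shown on the record's NCBI page.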

Batch retrieval of all proteins for a taxonomy ID. This example downloads all protein sequences for viruses (NCBI Taxonomy ID 10239) in FASTA format. Be aware that this is a very large download.

esearch -db protein -query "txid10239[Organism]" | efetch -format fasta > viral_proteins.fasta
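Because a taxon-wide query can match a huge number of records, it can help to look at the hit count first. esearch reports the number of matches in the Count field of its output; EDirect's own idiom for reading it is esearch ... | xtract -pattern ENTREZ_DIRECT -element Count. The sketch below reads the same field with sed instead and wraps everything in a hypothetical helper, fetch_if_small, which runs the search once and downloads only when the count is at or below a limit you choose:

```shell
#!/bin/sh
# Sketch: check how many records a query matches before downloading them.

# hit_count: read esearch output on stdin and print its <Count> value.
hit_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p'
}

# fetch_if_small DB QUERY LIMIT OUT (hypothetical helper name):
# search once, then download with efetch only if the count is <= LIMIT.
fetch_if_small() {
  result=$(esearch -db "$1" -query "$2")
  n=$(printf '%s\n' "$result" | hit_count)
  if [ -z "$n" ]; then
    echo "error: could not read a count for: $2" >&2
    return 2
  fi
  if [ "$n" -gt "$3" ]; then
    echo "$n records exceed the limit of $3; skipping download" >&2
    return 1
  fi
  # Reuse the saved search result instead of searching a second time
  printf '%s\n' "$result" | efetch -format fasta > "$4"
}
```

For example, fetch_if_small protein "txid10239[Organism]" 50000 viral_proteins.fasta would stop with a message if the virus query matches more than 50,000 records (the limit here is only an illustrative value).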

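Get all CDS (coding sequences) from a genome. Both commands start from the protein UID (GI) 302315370 and use elink to move to the linked nucleotide record. The first prints the CDS entries from that record's feature table (each line containing CDS plus the four lines after it); the second downloads the CDS nucleotide sequences in FASTA format and keeps only the lines that mention YP_003815423.1.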

esearch -db protein -query 302315370 | elink -target nuccore | efetch -format ft | grep -A 4 --no-group-separator CDS

esearch -db protein -query 302315370 | elink -target nuccore | efetch -format fasta_cds_na | grep YP_003815423.1
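Because grep works line by line, the last command above prints only the FASTA header that mentions YP_003815423.1, not the sequence lines under it. A minimal sketch that keeps the whole record instead, assuming each header carries a [protein_id=ACCESSION] tag as NCBI's CDS FASTA headers normally do; get_cds is a hypothetical helper name that reads FASTA on stdin:

```shell
#!/bin/sh
# Sketch: pull one complete record (header plus sequence lines) out of a
# CDS FASTA stream, rather than only the matching header line.
# Assumption: headers contain "[protein_id=ACCESSION]"; matching the full
# tag avoids also selecting e.g. ACCESSION.10 when asking for ACCESSION.1.
get_cds() {
  awk -v tag="[protein_id=$1]" '
    /^>/ { keep = (index($0, tag) > 0) }   # decide at each header line
    keep                                    # print lines of the chosen record
  '
}
```

For example, replacing the final grep with get_cds YP_003815423.1 > cds.fasta would save that coding sequence as a complete FASTA record.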

Copyright 2022 Bioinformatics.pk