... | ... | @@ -3,6 +3,7 @@ title: Functional annotations |
---
Functional annotation helps bring biological meaning to genetic sequences. It is usually obtained through protein sequence similarity: if two sequences from two organisms are very similar, one can infer that they likely encode the same biological function.
There are several main parameters that will impact the process of functional annotation:
- how distant the species of interest is from the species that was actually annotated with experimental data (the reference)
... | ... | @@ -12,7 +13,7 @@ There are several main parameters that will impact the process of functional ann |
Several tools to functionally annotate sequences exist. They do not all require the same input, nor will they deliver the same output, and are therefore complementary.
# Requirements
In Ortho_KB, we integrate functional annotation from EggNOG, MapMan, InterPro and TRAPID.
In Ortho_KB, we integrate functional annotation from EggNOG, MapMan, InterPro and TRAPID (giving access to the Gene Ontology and Gene/RNA Family).
## EggNOG
... | ... | @@ -24,7 +25,7 @@ First, install emapper, which can be done for example with conda: |
Then create the eggNOG database, which contains ortholog groups and their functional annotation. In our case, we will choose the taxonomic group Viridiplantae, whose code is 33090 (see [http://eggnog5.embl.de/#/app/downloads](http://eggnog5.embl.de/#/app/downloads)).
`download_eggnog_data.py --data_dir eggnog_db -F -P -M -H --dbname 33090`
`download_eggnog_data.py --data_dir eggnog_db --dbname 33090`
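
The download step above can be wrapped in a few defensive lines. This is only a minimal sketch: it assumes that the script expects the directory passed to `--data_dir` to exist already, and it skips the download with a message when `download_eggnog_data.py` is not on the `PATH` (for example when the conda environment is not activated).

```shell
# Directory name taken from the command above.
data_dir="eggnog_db"

# Assumption: the data directory should exist before the download starts.
mkdir -p "$data_dir"

# Only run the download if the eggnog-mapper helper script is available.
if command -v download_eggnog_data.py >/dev/null 2>&1; then
    download_eggnog_data.py --data_dir "$data_dir" --dbname 33090
else
    echo "download_eggnog_data.py not found; install eggnog-mapper first" >&2
fi
```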
Finally, the emapper program can be run to annotate each genome:
... | ... | @@ -116,7 +117,9 @@ Few genes are annotated by this category. |
## InterPro
InterPro is a database that integrates a range of sources for the annotation of protein function. InterProScan scans the submitted protein sequences (FASTA) against InterPro.
InterPro is a database that integrates a range of sources for the annotation of protein function. InterProScan scans the submitted protein sequences (FASTA) against InterPro and retrieves the associated annotations.
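
Because InterProScan takes protein sequences in FASTA format, it can help to clean the input first. InterProScan rejects sequences that contain the stop-codon character `*`, which many gene prediction pipelines write into their protein files. The sketch below strips `*` from sequence lines only. The file names are hypothetical, and the small demo input is written only when no input file exists, so that the snippet runs on its own.

```shell
# Hypothetical file names; adjust to your own data.
input_fasta="proteins.fasta"
clean_fasta="proteins.clean.fasta"

# Tiny demo input so the snippet is self-contained; replace with a real proteome.
if [ ! -f "$input_fasta" ]; then
    printf '>gene1\nMKTAYIAKQR*\n>gene2\nMSL*VNE\n' > "$input_fasta"
fi

# Remove '*' from sequence lines; header lines (starting with '>') are left untouched.
sed '/^>/!s/\*//g' "$input_fasta" > "$clean_fasta"
```

Note that this also removes internal `*` characters, which silently joins the flanking residues. Sequences with internal stops may deserve a closer look rather than automatic cleaning.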
Please check https://interproscan-docs.readthedocs.io/en/latest/HowToDownload.html for the links to download the latest version of InterProScan.
```
mkdir my_interproscan
cd my_interproscan
... | ... | |