---
title: Functional annotations
---

Functional annotation helps bring biological meaning to genetic sequences. It is usually obtained through protein sequence similarity: if two sequences from two organisms are very similar, one can infer that they likely encode the same biological function.

Several main parameters affect the process of functional annotation:

- how distant the species of interest is from the species that was annotated with experimental data (the reference)

@@ -54,10 +46,20 @@ The EggNOG-mapper output file includes several categories of annotation, includi

The online annotation tool Mercator4 ([https://www.plabipd.de/mercator_main.html](https://www.plabipd.de/mercator_main.html)) annotates sequences with the MapMan ontology. This ontology is characterized by synthetic descriptors of gene functions. A gene is usually associated with one or two MapMan bins (= terms).

Mercator4 accepts DNA or protein input (we use protein, which can provide more annotations).

The annotation should finish after about 10 min.

The `MapMan mapping file` can then be downloaded.

Here are the first lines of the annotation file for _Medicago truncatula_:

```
BINCODE NAME IDENTIFIER DESCRIPTION TYPE
'1.1.1.1.1' 'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb1/2/3)' 'mtruna17chr2g0318521.1' 'mercator4v5.0: component *(LHCb1/2/3) of LHC-II complex & original description: none' T
'1.1.1.1.2' 'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb4)' 'mtruna17chr3g0113311.1' 'mercator4v5.0: component *(LHCb4) of LHC-II complex & original description: none' T
'1.1.1.1.4' 'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb6)' 'mtruna17chr2g0279001.1' 'mercator4v5.0: component *(LHCb6) of LHC-II complex & original description: none' T
```

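Each row lists a bin together with a gene annotated with it. Because every `BINCODE` encodes its position in the hierarchy (for example, `1.1.1.1.1` is nested under `1.1.1.1`), the structure of the file can be used to recreate the MapMan ontology.
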
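As an illustration of how the bin codes carry the hierarchy, the sketch below counts genes per bin at every level of the ontology. It assumes the mapping file is tab-separated with single-quoted fields and that bin rows without a gene have an empty `IDENTIFIER`; the temporary file and the shortened descriptions are stand-ins, not real Mercator4 output.

```shell
# Stand-in for a Mercator4 mapping file: the header plus the three rows from
# the excerpt, written tab-separated (an assumption; descriptions shortened).
mapping=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  "BINCODE" "NAME" "IDENTIFIER" "DESCRIPTION" "TYPE" \
  "'1.1.1.1.1'" "'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb1/2/3)'" "'mtruna17chr2g0318521.1'" "'shortened'" "T" \
  "'1.1.1.1.2'" "'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb4)'" "'mtruna17chr3g0113311.1'" "'shortened'" "T" \
  "'1.1.1.1.4'" "'Photosynthesis.photophosphorylation.photosystem II.LHC-II complex.component *(LHCb6)'" "'mtruna17chr2g0279001.1'" "'shortened'" "T" \
  > "$mapping"

# For each gene row, walk up the bin code (1.1.1.1.1 -> 1.1.1.1 -> ... -> 1)
# and add one to the gene count of every ancestor bin.
bin_counts=$(awk -F'\t' -v q="'" '
  NR == 1 { next }                 # skip the header row
  {
    gsub(q, "", $1)                # strip the single quotes around BINCODE
    gsub(q, "", $3)                # strip the single quotes around IDENTIFIER
    if ($3 == "") next             # bin rows that carry no gene
    n = split($1, part, ".")
    code = ""
    for (i = 1; i <= n; i++) {
      code = (i == 1) ? part[i] : code "." part[i]
      count[code]++
    }
  }
  END { for (c in count) print c "\t" count[c] }
' "$mapping" | sort)

echo "$bin_counts"
```

To use it on a real download, point `mapping` at the downloaded `MapMan mapping file` instead of the temporary file. A gene assigned to two bins is counted once in each.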
## TRAPID

@@ -76,6 +78,23 @@ The **online** annotation tool TRAPID (https://bioinformatics.psb.ugent.be/trapi

11. Click on `PROCESS TRANSCRIPTS`

Proceed as shown in the picture above.

14. Wait for the Status to switch from `processing` to `finished` (it can take a few hours)
15. Go to Export data
16. Download the `Gene Family data` (`TRANSCRIPT WITH GF`)

## InterPro

InterPro is a database that integrates a range of sources for the annotation of protein function. InterProScan scans the submitted protein sequences (FASTA) against InterPro.

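Protein FASTA files derived from gene predictions often end each sequence with a `*` (the translated stop codon), which InterProScan is known to reject as an invalid character. The sketch below strips a trailing `*` from sequence lines only. The file names and sequences are hypothetical stand-ins; internal `*` characters are deliberately left in place, since they may point to a real problem in the gene model.

```shell
# Hypothetical protein FASTA whose sequences end with "*".
raw=$(mktemp)
clean=$(mktemp)
printf '%s\n' \
  '>geneA desc *kept in header' \
  'MKTAYIAKQR*' \
  '>geneB' \
  'MSLLN' \
  'QWERT*' > "$raw"

# On every line that is not a header (does not start with ">"),
# delete a "*" found at the end of the line.
sed '/^>/!s/\*$//' "$raw" > "$clean"

cat "$clean"
```

The cleaned file plays the role of the input passed to InterProScan with `-i` (for example `mtrun.fa`).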
```
mkdir my_interproscan
cd my_interproscan
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.65-97.0/interproscan-5.65-97.0-64-bit.tar.gz
wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.65-97.0/interproscan-5.65-97.0-64-bit.tar.gz.md5

# Recommended checksum to confirm the download was successful:
md5sum -c interproscan-5.65-97.0-64-bit.tar.gz.md5
# Must return *interproscan-5.65-97.0-64-bit.tar.gz: OK*

# Extract the archive (creates the interproscan-5.65-97.0 directory):
tar -pxvzf interproscan-5.65-97.0-64-bit.tar.gz

# Scan the protein FASTA with InterPro and GO term lookups, writing TSV output:
./interproscan-5.65-97.0/interproscan.sh -cpu 15 -iprlookup -goterms -f TSV -i mtrun.fa -o mtru.tsv
```
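
Once the run has finished, the TSV output can be condensed into one line per protein and GO term. The sketch below assumes the standard InterProScan TSV layout (no header, protein accession in column 1, InterPro accession in column 12, GO terms in column 14 separated by `|`, and `-` for missing values). The three rows are a hypothetical stand-in for `mtru.tsv`, not real results.

```shell
# Hypothetical stand-in for mtru.tsv: 14 tab-separated columns, no header.
tsv=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  protA md5a 250 Pfam PF00069 "Protein kinase domain" 10 240 1.0E-50 T 01-01-2024 IPR000719 "Protein kinase domain" "GO:0004672|GO:0006468" \
  protA md5a 250 SMART SM00220 "Serine/Threonine protein kinases" 10 240 1.0E-40 T 01-01-2024 IPR000719 "Protein kinase domain" "GO:0004672|GO:0006468" \
  protB md5b 120 Coils Coil Coil 5 30 - T 01-01-2024 - - - \
  > "$tsv"

# Emit each (protein, GO term) pair once, skipping rows without GO terms.
go_table=$(awk -F'\t' '
  $14 != "" && $14 != "-" {
    n = split($14, term, /[|]/)
    for (i = 1; i <= n; i++) {
      t = term[i]
      sub(/\(.*/, "", t)           # drop a trailing "(source)" tag if present
      if (!seen[$1 SUBSEP t]++) print $1 "\t" t
    }
  }
' "$tsv" | sort)

echo "$go_table"
```

Replacing `"$tsv"` with `mtru.tsv` applies the same summary to the real output. Proteins without any GO term, such as `protB` here, do not appear in the table.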