
Error with the GenBank data access script

The script uses the GenBank database and runs for a long time (about four hours in the log below) before raising the following error:

See the scripts.

(snakemake-5.13.0-env) mba@front:/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow$ cat log/snakejob.extract_genbank_data.2.sh.e322424
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       extract_genbank_data
        1

[Wed Apr 13 03:12:47 2022]
rule extract_genbank_data:
    input: ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt, /db/genbank/current/flat
    output: corpora/genbank/GenBank_extraction_20210127.tsv
    jobid: 0

python3 softwares/scripts/extractGB.py --taxoref ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt --dbpath /db/genbank/current/flat --fout corpora/genbank/GenBank_extraction_20210127.tsv
Activating conda environment: /work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/.snakemake/conda/e6e60c83
Traceback (most recent call last):
  File "/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/softwares/scripts/extractGB.py", line 126, in <module>
    accession, length, species, strain, taxID, journal, source, host, country = get_values(record)
  File "/work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/softwares/scripts/extractGB.py", line 48, in get_values
    taxID = feature.qualifiers['db_xref'][0].replace('taxon:', '')
KeyError: 'db_xref'
[Wed Apr 13 07:34:02 2022]
Error in rule extract_genbank_data:
    jobid: 0
    output: corpora/genbank/GenBank_extraction_20210127.tsv
    conda-env: /work_projet/omnicrobe_data/tm_workflow/text-mining-workflow/.snakemake/conda/e6e60c83
    shell:
        python3 softwares/scripts/extractGB.py --taxoref ancillaries/extended-microorganisms-taxonomy/taxa+id_microorganisms.txt --dbpath /db/genbank/current/flat --fout corpora/genbank/GenBank_extraction_20210127.tsv
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job extract_genbank_data since they might be corrupted:
corpora/genbank/GenBank_extraction_20210127.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
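
The traceback shows that line 48 of extractGB.py reads `feature.qualifiers['db_xref']` unconditionally, so a single record whose feature has no `db_xref` qualifier aborts the whole run. A possible workaround is sketched below. It assumes that `qualifiers` behaves like a dictionary mapping qualifier names to lists of strings, as the failing line suggests. The helper name `extract_taxid` and the sample dictionaries are hypothetical and are not taken from extractGB.py.

```python
def extract_taxid(qualifiers):
    """Return the taxon ID found in the 'db_xref' qualifier, or None.

    Hypothetical helper: `qualifiers` is assumed to map qualifier names
    to lists of strings, e.g. {'db_xref': ['taxon:562']}.
    """
    # .get() avoids the KeyError when the record has no 'db_xref' entry.
    for xref in qualifiers.get('db_xref', []):
        # A feature may carry several cross-references; keep only the taxon one.
        if xref.startswith('taxon:'):
            return xref[len('taxon:'):]
    return None


# Made-up qualifier dictionaries, for illustration only.
with_xref = {'organism': ['Escherichia coli'], 'db_xref': ['taxon:562']}
without_xref = {'organism': ['unidentified bacterium']}

print(extract_taxid(with_xref))     # 562
print(extract_taxid(without_xref))  # None
```

With a lookup of this kind, the caller can decide whether to skip records without a taxon ID or to write an empty field, instead of losing several hours of extraction to one incomplete record.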
Edited by Mouhamadou Ba