ToxIR: an accurate RNA-seq pipeline for high-precision toxin transcriptome profiling, validated in Odontobuthus doriae venom glands

 


Abstract

Transcriptome analysis of complex tissues remains challenging because of assembly errors, isoform diversity, and annotation bias, which calls for optimized computational pipelines. Scorpion venoms are a treasure trove of bioactive peptides with significant biomedical potential, but their complexity complicates transcriptome profiling. We present ToxIR (Toxin Identification and Recognition), an RNA-seq pipeline optimized for accurate toxin transcriptome analysis and validated in Odontobuthus doriae venom glands. ToxIR combines deep sequencing, rnaSPAdes-based de novo assembly, and a tailored annotation strategy to detect low-abundance toxins and resolve isoforms with high accuracy. It incorporates rigorous quality control (FastQC, Trimmomatic), homology searches against curated UniProt toxin sequences, and integrated structural analyses (SignalP, TMHMM, Pfam, InterProScan) to prioritize candidates based on signal peptides, cysteine content, and toxin-specific domains. Unlike general-purpose or earlier toxin pipelines, ToxIR minimizes misassemblies and annotation bias through its modular design, automated structural queries, and SQLite-backed data integration. The pipeline identified 378 putative toxin candidates, including 192 high-confidence candidates (Group A) and 23 novel, divergent toxins (Group C). These included 180 sodium channel, 111 potassium channel, and 69 chloride channel toxins. By enabling flexible cross-species use and enhancing annotation precision, ToxIR provides a robust framework that accelerates the discovery of therapeutic toxins.
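
The prioritization step described above (signal peptide, cysteine content, toxin-specific domain, stored in SQLite) can be illustrated with a minimal Python sketch. This is not the ToxIR implementation: the scoring rule, the `min_cys` threshold, the table schema, the function names, and the toy sequences are all assumptions made for illustration. In a real run, the signal-peptide and domain flags would come from tools such as SignalP and Pfam or InterProScan.

```python
import sqlite3


def cysteine_count(seq: str) -> int:
    """Count cysteine residues (one-letter code C) in a peptide sequence."""
    return seq.upper().count("C")


def priority_score(seq: str, has_signal_peptide: bool, has_toxin_domain: bool,
                   min_cys: int = 4) -> int:
    """Hypothetical score: one point per satisfied criterion (0 to 3).

    min_cys is an illustrative threshold, not a value taken from the paper.
    """
    score = 0
    if has_signal_peptide:
        score += 1
    if cysteine_count(seq) >= min_cys:
        score += 1
    if has_toxin_domain:
        score += 1
    return score


def build_db(records):
    """Load (id, sequence, signal_peptide, toxin_domain) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE candidates ("
        "id TEXT PRIMARY KEY, seq TEXT, signal_peptide INTEGER, "
        "toxin_domain INTEGER, cysteines INTEGER, score INTEGER)"
    )
    for rec_id, seq, sp, dom in records:
        conn.execute(
            "INSERT INTO candidates VALUES (?, ?, ?, ?, ?, ?)",
            (rec_id, seq, int(sp), int(dom),
             cysteine_count(seq), priority_score(seq, sp, dom)),
        )
    conn.commit()
    return conn


def ranked_ids(conn):
    """Return candidate ids ordered by descending score, ties broken by id."""
    rows = conn.execute(
        "SELECT id FROM candidates ORDER BY score DESC, id ASC"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    toy = [
        ("t1", "MKCCLLCGCA", True, True),
        ("t2", "MKALLA", False, False),
        ("t3", "MKCLCLCLC", True, False),
    ]
    print(ranked_ids(build_db(toy)))
```

Keeping the per-candidate evidence in one table means later steps can filter or re-rank with plain SQL, which is one plausible reading of what "SQLite-backed data integration" provides.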

Ebadi, M., & Soorki, M. N. (2025). ToxIR: An accurate RNA-seq pipeline for high-precision toxin transcriptome profiling, validated in Odontobuthus doriae venom glands. Scientific Reports. https://doi.org/10.1038/s41598-025-33632-0