
BTEC Software

AbsIDconvert: Absolute Gene ID Conversion Tools

With the availability of gene- and protein-centric databases (NCBI, Ensembl, UCSC, and others), as well as the wide variety of platforms for measuring gene expression (Affymetrix, Agilent, custom arrays, and RNA-Seq), biological researchers need reliable methods for converting identifiers from one type to another. AbsIDconvert is based on the idea that genomic identifiers can be converted to genomic intervals, so converting between identifier types only requires finding overlapping intervals. Learn more
Available as web interface and virtual machine
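
The interval-overlap idea can be illustrated with a minimal Python sketch. It is not AbsIDconvert's implementation: the `Interval` type, the `convert_ids` function, the half-open coordinate convention, and all identifiers and coordinates below are assumptions made up for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates are treated as half-open [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def convert_ids(source: dict[str, Interval], target: dict[str, Interval]) -> dict[str, list[str]]:
    """Map each source identifier to the target identifiers whose intervals overlap it."""
    return {
        src_id: sorted(t_id for t_id, t_iv in target.items() if overlaps(src_iv, t_iv))
        for src_id, src_iv in source.items()
    }


# Hypothetical identifiers and coordinates, for illustration only.
probes = {
    "probe_A": Interval("chr1", 100, 150),
    "probe_B": Interval("chr2", 500, 560),
}
genes = {
    "gene_X": Interval("chr1", 90, 400),
    "gene_Y": Interval("chr2", 600, 900),
}

mapping = convert_ids(probes, genes)
print(mapping)  # {'probe_A': ['gene_X'], 'probe_B': []}
```

The all-pairs comparison keeps the sketch short. A tool working at genome scale would more likely use sorted intervals or an interval tree.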


Asteroid is a novel algorithm to simultaneously reconstruct transcripts and estimate their abundance. Learn more
Available as source code


categoryCompare is a methodology for cross-platform and cross-sample comparison of high-throughput data at the annotation level (such as Gene Ontology (GO) terms, KEGG pathways, and gene sets (GSEA)). This approach allows datasets from heterogeneous platforms to be compared. categoryCompare provides a visualization built on Cytoscape that lets users quickly view the features shared between annotations. It is available as an R/Bioconductor package. A web version of categoryCompare that employs Cytoscape.js is currently under construction. Learn more
Available as R/Bioconductor package


DiffSplice is a novel tool for discovering and quantitating the alternative splicing variants present in an RNA-seq dataset, without relying on an annotated transcriptome or predetermined splice patterns. For two groups of samples, DiffSplice further uses a non-parametric permutation test to identify significant differences in expression at both the gene level and the transcription level. DiffSplice takes as input SAM files containing the alignments of the RNA-seq reads to the reference genome, obtained from an RNA-seq aligner such as MapSplice. The results of DiffSplice are summarized as a decomposition of the genome and can be visualized using the UCSC Genome Browser. Learn more
Available as source code
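
As a generic illustration of a non-parametric permutation test between two sample groups, here is a minimal Python sketch. The test statistic (absolute difference of group means), the function name `permutation_test`, and the expression values are assumptions; the description above does not specify the statistic DiffSplice actually uses.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding in the means.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression estimates for one gene in two groups of samples.
p_diff = permutation_test([10.1, 9.8, 10.4, 10.0], [14.9, 15.3, 15.1, 14.7])
p_same = permutation_test([1, 2, 3, 4], [1, 2, 3, 4])
print(p_diff, p_same)
```

With clearly separated groups the estimated p-value is small. With identical groups every permutation is at least as extreme as the observed difference of zero, so the estimate is 1.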

Flow Difference Metric

FDM, or Flow Difference Metric, identifies regions of differential RNA-transcript expression between pairs of splice graphs, without the need for an underlying gene model or catalog of transcripts. This novel non-parametric statistical test is applied between splice graphs to assess the significance of differential transcription, and is extended to group-wise comparisons that incorporate sample replicates. Learn more
Available as source code

Genome Scaffolder

Genomer is command-line glue for genome projects. It simplifies the small but tedious tasks required when finishing a genome. Genomer makes it easy to reorganise contigs in a genome, map annotations onto the genome, and generate the files required to submit a genome. Furthermore, Genomer aims to make genome projects more reproducible and robust. It is designed to work well with build tools such as GNU Make and revision control tools such as git, which makes genome projects easy to share and reproduce. Learn more
Download installer


MapPER is a probabilistic framework to predict the alignment to the genome of all RNA-seq paired-end read (PER) transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, MapPER constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment. Learn more
Available as source code

Multiple Primer Design

MPrime is an interface for the efficient, high-throughput design of multiple primers or oligonucleotides for genic regions in the human, mouse, rat, zebrafish, or fruit fly genomes. To choose the regions of interest for primer or oligo design, you select the organism and then the genic regions of interest. Genic regions can be identified by gene name, GenBank or RefSeq accession, or keyword. Additionally, MPrime1.3 now allows you to enter FASTA-formatted sequences. Before primers are designed, you will be sent to a page where you can select the genic regions you wish to use. Learn more
Available as web interface


MultiSplice implements a general linear framework for accurate transcript quantification using a set of new structural features: MultiSplices. Our software has several desirable features: 1. It uses all the information implied in the read alignments and alleviates identifiability issues. 2. By solving the linear system using LASSO, it can recover the most accurate set of dominantly expressed transcripts. 3. It is very efficient; for example, the analysis of the human transcriptome can be finished in less than one hour. Learn more
Available as source code


P-NONMEM combines the global search strategy of particle swarm optimization (PSO) with the local estimation strategy of NONMEM. In the proposed algorithm, initial values (particles) are generated randomly by PSO, and NONMEM is run for each particle to find a local optimum for the fixed effects and variance parameters. P-NONMEM guarantees global optimization for the fixed effects and variance parameters. Under certain regularity conditions, it also leads to global optimization for the random effects. Because P-NONMEM doesn't run a PSO search for random effect estimation, it avoids a tremendous computational burden. In simulation studies, we have shown that P-NONMEM has much better convergence performance than NONMEM. Even when the initial values were far from the global optima, P-NONMEM converged nicely for all fixed effects, random effects, and variance components. Learn more
Available as source code upon request
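
The general pattern described above (random starting particles, a local optimizer run from each, and the best result kept) can be sketched in Python as follows. This is a heavily simplified illustration, not P-NONMEM: the local step is plain gradient descent standing in for NONMEM, there are no PSO velocity updates, and the one-dimensional objective, function names, and parameter values are all assumptions.

```python
import random


def objective(x):
    """Toy objective with a local minimum near x = 1.13 and a global one near x = -1.30."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Analytic derivative of the toy objective."""
    return 4 * x**3 - 6 * x + 1


def local_search(x, step=0.01, iterations=2000):
    """Plain gradient descent; a stand-in for the local estimation step."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def multi_start_optimize(n_particles=10, low=-2.0, high=2.0, seed=0):
    """Run the local search from random starting particles and keep the best optimum."""
    rng = random.Random(seed)
    starts = [rng.uniform(low, high) for _ in range(n_particles)]
    optima = [local_search(x0) for x0 in starts]
    return min(optima, key=objective)


best_x = multi_start_optimize()
best_f = objective(best_x)
print(best_x, best_f)
```

A single local search started near x = 1.5 would settle in the shallower minimum. Spreading the starting points across the search range is what lets the procedure reach the deeper one.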


PYNAC is an algorithm and software system capable of handling high volumes of stable isotope-resolved metabolomics data, while including quality control methods for maintaining data quality. We validate this new algorithm against a previous single isotope correction algorithm in a two-step cross-validation. Next, we demonstrate the algorithm and correct for the effects of natural abundance for both 13C and 15N isotopes on a set of raw isotopologue intensities of UDP-N-acetyl-D-glucosamine derived from a 13C/15N-tracing experiment. Finally, we demonstrate the algorithm on a full omics-level dataset.
Available as source code

RBF-TSS: Transcription Start Site Detector

RBF-TSS is a novel method for identifying transcription start sites (TSSs) that improves upon published TSS detection models. RBF-TSS incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function network for identifying transcription start sites is created using non-overlapping chunks (windows) of sizes 50 and 500 on the human genome. Learn more
Available as source code

rMotifGen: Random Motif Sequence Generator for Genomic Sequences

rMotifGen generates a number of random DNA or amino acid sequences containing short sequence motifs. Each motif consensus can be either user-defined or randomly generated. Insertions and mutations within these motifs are created according to user-defined parameters. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. Learn more
Available as a web interface and source code
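
To make the idea concrete, here is a minimal Python sketch of generating random DNA sequences that each contain one copy of a motif. It is not rMotifGen's code: the function names, parameters, and example motif are assumptions, and it models only substitutions, not insertions or amino acid sequences.

```python
import random

DNA = "ACGT"


def mutate(motif, rate, rng):
    """Replace each motif base with a different random base with probability `rate`."""
    out = []
    for base in motif:
        if rng.random() < rate:
            out.append(rng.choice([b for b in DNA if b != base]))
        else:
            out.append(base)
    return "".join(out)


def generate_sequences(n, length, motif, mutation_rate=0.0, seed=0):
    """Return n random DNA sequences of the given length, each with one embedded motif copy."""
    if len(motif) > length:
        raise ValueError("motif is longer than the requested sequence length")
    rng = random.Random(seed)
    sequences = []
    for _ in range(n):
        background = [rng.choice(DNA) for _ in range(length)]
        # Pick a start position that leaves room for the whole motif.
        pos = rng.randrange(length - len(motif) + 1)
        background[pos:pos + len(motif)] = mutate(motif, mutation_rate, rng)
        sequences.append("".join(background))
    return sequences


# Hypothetical user-defined motif, embedded without mutation.
seqs = generate_sequences(n=5, length=30, motif="TATAAT")
for s in seqs:
    print(s)
```

Raising `mutation_rate` above zero produces degraded motif copies, which is the kind of input useful for probing how much noise a motif finder tolerates.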
Page updated June 7, 2024