Minerva Gen*NY*Sis Center for Excellence in Cancer Genomics
University at Albany, State University of New York


TUTR - The UAlbany Training UTR Collection


The untranslated regions (UTRs) of many mRNAs contain sequence and structural motifs that regulate the stability, localization and translatability of the mRNA. Unfortunately, the consensus sequences for these motifs are only loosely characterized and can vary significantly at any given position. Simple alignment tools are therefore often inadequate for discovering previously unidentified RNA regulatory motifs.
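To make the idea of positional variability concrete, here is a minimal sketch of scanning a sequence for a degenerate consensus written with IUPAC ambiguity codes. The consensus strings used (the AUUUA pentamer and the nonamer UUAUUUAWW, both commonly cited for AU-rich elements), the example sequence, and the function names are illustrative assumptions, not definitions taken from this collection. Note that this handles sequence-level degeneracy only; structural motifs need other methods.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[c] for c in consensus.upper())

def find_motif(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # Accept DNA-style input by converting T to U.
    seq = sequence.upper().replace("T", "U")
    # A lookahead lets the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq)]

# Made-up sequence for illustration only.
example_utr = "GCCAUUUAUUUAUUGCA"
print(find_motif(example_utr, "AUUUA"))      # pentamer hits
print(find_motif(example_utr, "UUAUUUAWW"))  # nonamer hits
```

Each ambiguity code widens the set of matching sequences, so a loosely defined consensus quickly produces many chance matches, which is one reason plain pattern matching and alignment fall short here.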

Many new-generation software tools use adaptive techniques that require training. One way to train and evaluate such software is to use validated sets of sequence data known to contain a defined motif.

The UTResource provides a database of UTRs that contain consensus sequences of experimentally discovered regulatory motifs.

From this database, we generated a collection of Training UTR datasets. The sequence files are meant to be used as blind test sets that simulate the results one might expect from an experimental process such as immunoprecipitation. Each sequence in a given file contains a previously characterized RNA motif conforming to a defined consensus.

To date, twelve basic training sets have been generated. Each comes with an index and an answer set that identify where the previously characterized RNA motif (e.g. the IRE, ARE, SECIS, etc.) resides in each sequence. The current incarnation of the collection represents what could be thought of as ideal experimental data: each sequence is positive for at least one occurrence of the set's motif. We plan to eventually introduce noise into the sets in the form of negative sequences, and potentially to increase the length of the sequences containing the motifs.
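As an illustration of how an answer set might be used to score a motif-finding tool, the sketch below computes the fraction of sequences in which at least one predicted site overlaps the annotated motif. The data layout (dictionaries keyed by sequence ID, with half-open 0-based intervals), the identifiers, and the function names are all hypothetical; the actual index and answer files may be organized differently.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] < b[1] and b[0] < a[1]

def score_predictions(answers, predictions):
    """Fraction of sequences whose annotated motif is hit by at least one prediction.

    answers:     {seq_id: (start, end)} annotated motif location (assumed layout)
    predictions: {seq_id: [(start, end), ...]} sites reported by the tool under test
    """
    if not answers:
        return 0.0
    found = [
        seq_id
        for seq_id, true_span in answers.items()
        if any(overlaps(true_span, p) for p in predictions.get(seq_id, []))
    ]
    return len(found) / len(answers)

# Hypothetical data for illustration only.
answers = {"seq1": (10, 20), "seq2": (5, 12), "seq3": (30, 40)}
predictions = {
    "seq1": [(15, 25)],          # overlaps the annotated motif
    "seq2": [(12, 18)],          # adjacent, but no shared position
    "seq3": [(0, 5), (35, 36)],  # second prediction overlaps
}
print(score_predictions(answers, predictions))
```

Because every sequence in the current sets is positive, this measures sensitivity only. Estimating a false-positive rate would require negative sequences of the kind planned for future versions.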