SpliceGrapher Quick Start Guide

Downloading and Installation

  • You may download SpliceGrapher from SourceForge.
  • Requires PyML version 0.7.9 or higher for classifying splice sites.
  • Requires matplotlib version 1.1.0 or higher for the extensive graphics tools.
  • The optional IsoLasso pipeline requires IsoLasso version 2.6.1 or higher, along with the gtfToGenePred and genePredToBed programs from UCSC.
  • The optional PSGInfer pipeline requires PSGInfer version 1.1.3.

Unix/Linux and Mac OS X are currently supported. A setup.py script is provided, so installation follows the standard Python procedure:

python setup.py build
python setup.py install

To check that SpliceGrapher is installed correctly, run the python interpreter and type the following:

>>> import SpliceGrapher
>>> SpliceGrapher.__version__
'0.2.3'
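
The same check can be extended to the dependencies listed above. The following is a minimal sketch and not part of SpliceGrapher: the helper names (version_tuple, meets_minimum, check_module, report) are hypothetical, and it assumes that PyML and matplotlib import under the module names PyML and matplotlib and expose a __version__ attribute.

```python
import importlib
import re

# Minimum versions taken from the requirements list above.
# The module names are assumptions about how each package is imported.
REQUIREMENTS = {"PyML": "0.7.9", "matplotlib": "1.1.0"}


def version_tuple(version):
    """Turn a version string such as '1.1.0' into the tuple (1, 1, 0)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits, so a suffix such as 'rc1' is ignored.
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, required):
    """Return True if version string `found` is at least `required`."""
    a, b = version_tuple(found), version_tuple(required)
    # Pad with zeros so that '1.1' and '1.1.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_module(name, minimum=None):
    """Return (installed, version, ok) for the module called `name`."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return (False, None, False)
    version = getattr(module, "__version__", None)
    if minimum is None:
        return (True, version, True)
    # An installed module with no version string cannot be confirmed.
    ok = version is not None and meets_minimum(version, minimum)
    return (True, version, ok)


def report(requirements=REQUIREMENTS):
    """Print one status line per required module."""
    for name, minimum in sorted(requirements.items()):
        installed, version, ok = check_module(name, minimum)
        status = "OK" if ok else ("too old or unknown" if installed else "missing")
        print("{0}: found {1}, need >= {2}: {3}".format(name, version, minimum, status))
```

Calling report() from the interpreter prints one line per dependency. A module reported as missing may simply be installed under a different name than the one assumed here.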

To test your installation more thoroughly, use the test script in the SpliceGrapher examples sub-directory:

cd examples
run_tests.sh

Filtering Alignments

SpliceGrapher uses highly accurate splice-site models to filter the output of popular spliced alignment tools such as TopHat and MapSplice, removing potentially spurious novel junctions (see our publication in Genome Biology for details). A simple script is provided for this:

sam_filter.py SAM-file classifiers -f FASTA-reference -m gene-models -o output-SAM
  • SAM-file is the input spliced alignment file that you wish to filter.
  • classifiers are input zip archives that contain organism-specific classifier data. We provide pre-built classifiers for over 100 organisms in the classifiers sub-directory.
  • FASTA-reference is the input reference genome that was used for the alignments.
  • gene-models are the gene models (GTF or GFF3 format) that correspond to the reference genome.
  • output-SAM is the output file to which the filtered alignments are written.

For example, reads aligned to the GRCh37 version of the H. sapiens genome with TopHat could be filtered as follows:

sam_filter.py accepted_hits.sam Homo_sapiens.zip -f Homo_sapiens.GRCh37.68.dna.toplevel.fa -m Homo_sapiens.GRCh37.68.gtf -o filtered.sam
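
When many samples need filtering, the command above can be assembled and launched from Python. This is only a sketch under stated assumptions: build_sam_filter_command and run_sam_filter are hypothetical helpers, not SpliceGrapher functions, and sam_filter.py is assumed to be executable and on the PATH. Only the arguments and options shown in the usage line above are used.

```python
import subprocess


def build_sam_filter_command(sam_file, classifiers, fasta, gene_models, output_sam):
    """Assemble the sam_filter.py argument list in the order shown above."""
    return [
        "sam_filter.py", sam_file, classifiers,
        "-f", fasta,          # reference genome used for the alignments
        "-m", gene_models,    # GTF or GFF3 gene models
        "-o", output_sam,     # filtered output file
    ]


def run_sam_filter(sam_file, classifiers, fasta, gene_models, output_sam):
    """Run sam_filter.py; raises CalledProcessError on a non-zero exit status."""
    command = build_sam_filter_command(
        sam_file, classifiers, fasta, gene_models, output_sam)
    subprocess.check_call(command)
    return command
```

For the H. sapiens example, run_sam_filter("accepted_hits.sam", "Homo_sapiens.zip", "Homo_sapiens.GRCh37.68.dna.toplevel.fa", "Homo_sapiens.GRCh37.68.gtf", "filtered.sam") issues the same command as the line above. Passing the arguments as a list avoids shell quoting problems with unusual file names.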

Predicting Splice Graphs

SpliceGrapher’s primary objective is to predict splice graphs, which is done with the program predict_graphs.py. It can combine evidence from gene models, EST alignments and NGS alignments to predict the splice forms of a gene. The most common approach is to use NGS alignments:

predict_graphs.py SAM-file -d prediction-directory

For example, assuming reads have been filtered as outlined in the H. sapiens example above, one could make predictions for all human genes as follows, passing the gene models with -m:

predict_graphs.py filtered.sam -m Homo_sapiens.GRCh37.68.gtf -d predicted

Splice graphs are stored in the predicted directory, organized by chromosome. For example, the output graph for the gene SLC35B2 from chromosome 6 would appear in the file predicted/6/SLC35B2.gff. This GFF splice graph format is used by all of SpliceGrapher’s plotting and viewing tools.
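
The directory layout described above can be walked with a few lines of Python, for instance to count how many graphs were predicted per chromosome. This sketch assumes only the directory/chromosome/gene.gff layout implied by the SLC35B2 example. The functions graph_path and list_graphs are hypothetical helpers rather than part of SpliceGrapher, and the exact spelling or case of chromosome and gene names in real output may differ.

```python
import glob
import os


def graph_path(prediction_dir, chromosome, gene):
    """Build the expected path of one splice graph file."""
    return os.path.join(prediction_dir, str(chromosome), gene + ".gff")


def list_graphs(prediction_dir):
    """Return {chromosome: [gene names]} for every .gff file one level down."""
    graphs = {}
    pattern = os.path.join(prediction_dir, "*", "*.gff")
    for path in sorted(glob.glob(pattern)):
        # The parent directory name is the chromosome,
        # the file name without its extension is the gene.
        chromosome = os.path.basename(os.path.dirname(path))
        gene = os.path.splitext(os.path.basename(path))[0]
        graphs.setdefault(chromosome, []).append(gene)
    return graphs
```

For example, list_graphs("predicted") returns a dictionary keyed by chromosome name, and graph_path("predicted", 6, "SLC35B2") gives the path used in the example above.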

Viewing Splice Graphs

To view one or more splice graphs, use the view_splicegraphs.py script. For example, to view the graph stored in SLC35B2.gff, simply enter:

view_splicegraphs.py SLC35B2.gff

This produces a figure of the splice graph with minimal annotations; the script also provides options for adding detail to the figure. See also the plotter.py tool, which produces publication-quality figures.

Additional Information

See the User’s Guide for more details on these and other scripts. See our publication in Genome Biology for details on our approach and our results.