SpliceGrapher Quick Start Guide
Downloading and Installation
- SpliceGrapher can be downloaded from SourceForge.
- Requires PyML version 0.7.9 or higher for classifying splice sites.
- Requires matplotlib version 1.1.0 or higher for the extensive graphics tools.
- The optional IsoLasso pipeline requires IsoLasso version 2.6.1 or higher, along with the gtfToGenePred and genePredToBed programs from UCSC.
- The optional PSGInfer pipeline requires PSGInfer version 1.1.3.
Unix/Linux and Mac OS X are currently supported. A setup.py script is provided, so installation follows the standard Python procedure:
python setup.py build
python setup.py install
To check that SpliceGrapher is installed correctly, start the Python interpreter and type the following:
>>> import SpliceGrapher
>>> SpliceGrapher.__version__
'0.2.3'
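The same check can be scripted, which is convenient when verifying the PyML and matplotlib requirements listed above as well. The following is a minimal sketch, not part of SpliceGrapher: the helper names are hypothetical, and it assumes each package exposes a `__version__` string the way SpliceGrapher does.

```python
import importlib
import re


def version_tuple(version):
    """Convert a dotted version string such as '0.2.3' into a tuple of ints.

    Only the leading digits of each component are used, so '1.1.0rc1'
    becomes (1, 1, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__ string, or None if the module
    cannot be imported or does not define __version__ (an assumption)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def meets_minimum(module_name, minimum):
    """Return True if the installed module is at least the minimum version."""
    found = installed_version(module_name)
    if found is None:
        return False
    have, need = version_tuple(found), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that '1.1' compares equal to '1.1.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For instance, `meets_minimum("matplotlib", "1.1.0")` and `meets_minimum("PyML", "0.7.9")` would report whether the two required packages satisfy the versions given in the installation list.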
To test your installation more thoroughly, run the test script in the SpliceGrapher examples sub-directory:
cd examples
run_tests.sh
Filtering Alignments
SpliceGrapher uses highly accurate splice-site models to filter the output of popular spliced alignment tools such as TopHat and MapSplice by removing potentially spurious novel junctions (see our publication in Genome Biology for details).
We provide a simple script to do this:
sam_filter.py SAM-file classifiers -f FASTA-reference -m gene-models -o output-SAM
- SAM-file is the input spliced alignment file that you wish to filter.
- classifiers are input zip archives that contain organism-specific classifier data. We provide pre-built classifiers for over 100 organisms in the classifiers sub-directory.
- FASTA-reference is the input reference genome that was used for the alignments.
- gene-models are the gene models (GTF or GFF3 format) that correspond to the reference genome.
For example, reads aligned to the GRCh37 version of the H. sapiens genome with TopHat could be filtered as follows:
sam_filter.py accepted_hits.sam Homo_sapiens.zip -f Homo_sapiens.GRCh37.68.dna.toplevel.fa -m Homo_sapiens.GRCh37.68.gtf -o filtered.sam
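When several alignment files need filtering, it can help to assemble this command line from Python. The sketch below is only an illustration: the helper names are hypothetical, and the only options used are the `-f`, `-m` and `-o` flags shown in the usage line above. It also assumes `sam_filter.py` is on the `PATH`.

```python
import subprocess


def sam_filter_command(sam_file, classifiers, fasta, gene_models, output_sam):
    """Build the argument list for sam_filter.py using the options
    documented above (-f reference, -m gene models, -o output)."""
    return [
        "sam_filter.py", sam_file, classifiers,
        "-f", fasta,
        "-m", gene_models,
        "-o", output_sam,
    ]


def run_sam_filter(sam_file, classifiers, fasta, gene_models, output_sam):
    """Run sam_filter.py and raise CalledProcessError if it exits non-zero.

    Assumes sam_filter.py is installed and on the PATH.
    """
    command = sam_filter_command(
        sam_file, classifiers, fasta, gene_models, output_sam)
    subprocess.check_call(command)
```

Calling `run_sam_filter("accepted_hits.sam", "Homo_sapiens.zip", "Homo_sapiens.GRCh37.68.dna.toplevel.fa", "Homo_sapiens.GRCh37.68.gtf", "filtered.sam")` would issue the same command as the H. sapiens example.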
Predicting Splice Graphs
SpliceGrapher’s primary purpose is to predict splice graphs, which it does with the predict_graphs.py program. The program can combine evidence from gene models, EST alignments and NGS alignments to predict splice forms for a gene. The most common approach is to use NGS alignments:
predict_graphs.py SAM-file -d prediction-directory
For example, assuming reads have been filtered as outlined in the H. sapiens example above, one could make predictions for all human genes as follows:
predict_graphs.py filtered.sam -m Homo_sapiens.GRCh37.68.gtf -d predicted
Splice graphs are stored in the predicted directory, organized by chromosome. For example, the output graph for the gene SLC35B2 from chromosome 6 would appear in the file predicted/6/SLC35B2.gff. The splice graph output GFF format is used by all of SpliceGrapher’s plotting and viewing tools.
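Because the output follows a fixed directory/chromosome/gene layout, predicted graphs are easy to locate from a script. The following sketch assumes only the layout described above (one sub-directory per chromosome, one `.gff` file per gene); the function names are hypothetical and are not part of SpliceGrapher.

```python
import os


def graph_path(prediction_dir, chromosome, gene):
    """Return the expected location of a gene's splice graph file,
    e.g. predicted/6/SLC35B2.gff on Unix-like systems."""
    return os.path.join(prediction_dir, str(chromosome), gene + ".gff")


def list_graphs(prediction_dir):
    """Map each chromosome sub-directory to a sorted list of the gene
    names that have a .gff splice graph file inside it."""
    graphs = {}
    for chromosome in sorted(os.listdir(prediction_dir)):
        chrom_dir = os.path.join(prediction_dir, chromosome)
        if not os.path.isdir(chrom_dir):
            continue  # skip stray files at the top level
        genes = sorted(
            name[:-len(".gff")]
            for name in os.listdir(chrom_dir)
            if name.endswith(".gff")
        )
        if genes:
            graphs[chromosome] = genes
    return graphs
```

For example, `graph_path("predicted", 6, "SLC35B2")` gives the file named in the text, and `list_graphs("predicted")` summarizes which genes received predictions on each chromosome.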
Viewing Splice Graphs
To view one or more splice graphs, use the view_splicegraphs.py script. For example, to view the graph stored in SLC35B2.gff, simply enter:
view_splicegraphs.py SLC35B2.gff
This produces a figure of the splice graph with minimal annotations; the script also provides options for adding detail to the figure. See also the plotter.py tool, which provides facilities for plotting publication-quality figures.
Additional Information
See the User’s Guide for more details on these and other scripts. See our publication in Genome Biology for details on our approach and our results.