IntSplice is a tool to predict the splicing consequence of an SNV at intronic positions -50 to -3, close to the 3' end of an intron, in the human genome. Please cite PMID 27009626 for IntSplice.
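
As a rough illustration of the position range above, the following sketch checks whether a variant falls within intronic positions -50 to -3. It assumes the usual convention that position -1 is the last nucleotide of the intron, so that -1 and -2 are the AG dinucleotide outside the range. The function names, the 1-based genomic coordinates, and the strand encoding are hypothetical and are not part of IntSplice itself.

```python
def intronic_offset(variant_pos, exon_first_base, strand):
    """Offset of a variant relative to the first base of the downstream exon.

    Hypothetical helper: coordinates are 1-based genomic positions, and
    exon_first_base is the first exonic base in transcript direction
    (the lowest exon coordinate on '+', the highest on '-').
    Only negative results are meaningful as intronic positions.
    """
    if strand == "+":
        return variant_pos - exon_first_base
    if strand == "-":
        return exon_first_base - variant_pos
    raise ValueError("strand must be '+' or '-'")


def in_intsplice_window(variant_pos, exon_first_base, strand):
    """True if the variant lies at intronic positions -50 to -3."""
    return -50 <= intronic_offset(variant_pos, exon_first_base, strand) <= -3


if __name__ == "__main__":
    # Made-up coordinates: an exon starting at 1000 on the plus strand.
    for pos in (949, 950, 997, 998):
        print(pos, intronic_offset(pos, 1000, "+"), in_intsplice_window(pos, 1000, "+"))
```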

Single nucleotide variations (SNVs) affecting intronic splicing cis-elements potentially compromise splicing. IntSplice is a support vector machine (SVM)-based model to differentiate pathogenic from normal SNVs at intronic positions -50 to -3 close to the 3' end of an intron [1]. Based on an effect size analysis of each intronic nucleotide on normal alternative splicing, we extracted 96 parameters that may dictate the strength of splicing signals. The parameters included individual nucleotides close to the 3' end of an intron, the predicted branch point sequence, the predicted polypyrimidine tract, and binding motifs of RNA-binding proteins registered in the SpliceAid database [2]. Using the 96 parameters, we generated SVM models to directly differentiate pathogenic SNVs in the Human Gene Mutation Database (HGMD) from normal SNVs in dbSNP134. The models were subjected to 5-fold cross-validation. The generated models discriminated HGMD and dbSNP variants in the 5-fold validation datasets with sensitivities of 0.772 ± 0.028 (mean and SD) and specificities of 0.898 ± 0.023. We compared the performance of our models with PSSM [6] and MaxEntScan::score3ss [7]. For PSSM, we made a new position weight matrix covering intronic positions -50 to -3 by analyzing Ensembl release 67 annotated on GRCh37/hg19. We found that the sensitivity, as well as the sum of sensitivity and specificity, of our models was higher than those of PSSM and MaxEntScan::score3ss.
IntSplice ver. 1.1 was generated using the whole dataset of 1,167 intronic mutations in HGMD and 1,167 randomly selected normal SNPs in dbSNP134. In IntSplice ver. 1.0 [1], in addition to the 96 parameters, we included percent spliced in (PSI) values at each 3' splice site. PSIs were calculated with MISO [3] using 14 RNA-seq datasets of normal human tissues under accession numbers GSE13652 [4] and GSE12946 [5]. We, however, excluded the 14 PSIs in ver. 1.1 to expedite the calculation of IntSplice. We confirmed that the results are essentially the same between ver. 1.0 and ver. 1.1.
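
The cross-validation figures above are per-fold sensitivities and specificities summarized as mean and SD. A minimal sketch of that summary is shown here, assuming labels of 1 for pathogenic (HGMD) and 0 for normal (dbSNP) and assuming the sample standard deviation; the source does not state which SD was used. The function names and the toy labels are hypothetical and are not IntSplice code or data.

```python
from statistics import mean, stdev


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = pathogenic, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def summarize_folds(folds):
    """Summarize per-fold metrics; folds is a list of (y_true, y_pred) pairs."""
    pairs = [sensitivity_specificity(t, p) for t, p in folds]
    sens = [s for s, _ in pairs]
    spec = [s for _, s in pairs]
    return {
        "sensitivity": (mean(sens), stdev(sens)),  # assumption: sample SD
        "specificity": (mean(spec), stdev(spec)),
        "sum": mean(sens) + mean(spec),  # quantity compared against other tools
    }


if __name__ == "__main__":
    # Toy predictions for two folds, made up for illustration only.
    toy_folds = [
        ([1, 1, 0, 0], [1, 0, 0, 0]),
        ([1, 1, 0, 0], [1, 1, 1, 0]),
    ]
    print(summarize_folds(toy_folds))
```

The sum of the mean sensitivity and mean specificity is the quantity used in the comparison with PSSM and MaxEntScan::score3ss.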

[1] Shibata A, Okuno T, Rahman MA, Azuma Y, Takeda J, Masuda A, Selcen D, Engel AG, Ohno K. IntSplice: Prediction of the splicing consequences of intronic single-nucleotide variations in the human genome. J Hum Genet. 2016;61:633-40.
[2] Piva F, Giulietti M, Nocchi L, Principato G. SpliceAid: a database of experimental RNA target motifs bound by splicing proteins in humans. Bioinformatics. 2009;25:1211-3.
[3] Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010;7:1009-15.
[4] Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470-6.
[5] Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40:1413-5.
[6] Shapiro MB, Senapathy P. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 1987;15:7155-74.
[7] Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004;11:377-94.