Please cite reference [1] for IntSplice.
Note: Upload a VCF (variant call format) file to predict whether a single nucleotide variation (SNV) at intronic positions -50 to -3 is pathogenic. Genomic coordinates must be based on GRCh37/hg19, and only human SNVs can be analyzed. You can obtain the genomic coordinate of an SNV with a BLAT server.
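Because IntSplice is defined on intronic positions counted back from the 3' end of an intron, it can help to see how a genomic coordinate maps onto that scale. The sketch below is illustrative only and is not IntSplice code. It assumes 1-based, inclusive intron coordinates taken from a gene annotation and numbers the last intronic nucleotide (the G of the 3' AG) as -1, so that -2 and -1 are the AG dinucleotide. The function names `intronic_position` and `in_intsplice_window` are hypothetical.

```python
from typing import Optional


def intronic_position(pos: int, intron_start: int, intron_end: int, strand: str) -> Optional[int]:
    """Return the position of `pos` relative to the 3' end of its intron.

    Assumption: 1-based, inclusive intron coordinates; the last intronic
    nucleotide is numbered -1. Returns None when `pos` lies outside the intron.
    """
    if not intron_start <= pos <= intron_end:
        return None
    if strand == "+":
        # Plus strand: the 3' end of the intron is its highest coordinate.
        return pos - intron_end - 1
    if strand == "-":
        # Minus strand: the 3' end of the intron is its lowest coordinate.
        return intron_start - pos - 1
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def in_intsplice_window(rel_pos: Optional[int]) -> bool:
    """True if a relative position falls within intronic positions -50 to -3."""
    return rel_pos is not None and -50 <= rel_pos <= -3
```

For a hypothetical plus-strand intron spanning 1000 to 2000, position 1998 maps to -3 and lies in the window, whereas 1999 maps to -2 (part of the AG) and is outside it.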
The VCF file should include 7 columns: "Chr" for the chromosome; "Pos" for the genomic position (GRCh37/hg19); "ID", which is required in a VCF file but is not used by IntSplice; "Ref" for the reference nucleotide; "Alt" for the variant nucleotide; "Qual", which is required in a VCF file but is not used by IntSplice; and "Filter", which is required in a VCF file but is not used by IntSplice. IntSplice uses only "Chr", "Pos", "Ref", and "Alt", but the other columns must still be present.
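To check a file before uploading it, a small parser can confirm that every data line has the 7 columns and extract the four fields IntSplice uses. This is a minimal sketch, not part of IntSplice. It assumes tab-delimited columns (as in standard VCF), skips header lines beginning with `#`, and silently drops rows that are not single-base substitutions; whether IntSplice itself skips or rejects such rows is not stated here. The names `SNV` and `read_snvs` are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class SNV(NamedTuple):
    chrom: str
    pos: int  # GRCh37/hg19 coordinate
    ref: str
    alt: str


BASES = {"A", "C", "G", "T"}


def read_snvs(lines: Iterable[str]) -> List[SNV]:
    """Extract Chr, Pos, Ref, and Alt from VCF data lines, keeping only SNVs."""
    snvs = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"line {lineno}: expected 7 columns, found {len(fields)}")
        # ID (column 3), Qual (6), and Filter (7) must be present but are not used.
        chrom, pos = fields[0], fields[1]
        ref, alt = fields[3].upper(), fields[4].upper()
        if ref in BASES and alt in BASES and ref != alt:
            snvs.append(SNV(chrom, int(pos), ref, alt))
    return snvs
```

For example, given the lines `"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER"`, `"1\t12345\t.\tA\tG\t.\t."`, and `"1\t12400\t.\tAT\tA\t.\t."`, the function keeps only the first data line, because the second is a deletion rather than an SNV.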

About: Single nucleotide variations (SNVs) that affect intronic splicing cis-elements can compromise splicing. IntSplice is a support vector machine (SVM)-based model that differentiates pathogenic from normal SNVs at intronic positions -50 to -3, close to the 3' end of an intron. Based on an effect-size analysis of each intronic nucleotide on normal alternative splicing, we extracted 111 parameters that potentially dictate the strength of splicing signals. The parameters included individual nucleotides close to the 3' end of an intron, the predicted branch point sequence, the predicted polypyrimidine tract, and the binding motifs of RNA-binding proteins registered in the SpliceAid database [2]. We calculated percent spliced in (PSI) values at individual 3' splice sites with MISO [3] using 14 RNA-seq datasets of normal human tissues (accession numbers GSE13652 [4] and GSE12946 [5]).
We first generated support vector regression models with the 111 parameters. Although the correlation coefficients between the calculated and predicted PSIs were less than 0.3, we used the predicted PSIs in the subsequent modeling. We then generated SVM models that directly differentiate pathogenic SNVs in the Human Gene Mutation Database (HGMD) from normal SNVs in dbSNP134, using the 111 parameters from the support vector regression modeling together with the predicted PSIs. The models were trained on four-fifths of the whole dataset (training dataset) and validated on the remaining one-fifth (validation dataset). In the validation datasets, the models discriminated HGMD from dbSNP SNVs with a sensitivity of 0.772 ± 0.028 (mean ± SD) and a specificity of 0.898 ± 0.023.
We compared the performance of our models with that of PSSM [6] and MaxEntScan::score3ss [7]. For PSSM, we made a new position weight matrix covering intronic positions -50 to -3 by analyzing ENSEMBL release 67 annotated on GRCh37/hg19.
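To make the position weight matrix idea concrete, the sketch below builds a generic log2-odds matrix from aligned 48-nt windows (intronic positions -50 to -3) and scores an SNV as the change in total score. It is an illustration under stated assumptions, not the authors' matrix and not the Shapiro-Senapathy scoring formula [6]: it uses a pseudocount of 1 and a uniform background of 0.25, and the training windows would hypothetically be extracted from an annotation such as ENSEMBL release 67. The names `build_pwm`, `score`, and `snv_delta` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"
WINDOW_START = -50  # first intronic position covered by the matrix
WINDOW_END = -3     # last intronic position covered by the matrix


def build_pwm(seqs: Sequence[str], pseudocount: float = 1.0,
              background: float = 0.25) -> List[Dict[str, float]]:
    """Log2-odds position weight matrix from equal-length, aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in seqs:
            base = s[i].upper()
            if base in counts:  # ignore N and other non-ACGT symbols
                counts[base] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Sum of per-position log-odds; seq must contain only A, C, G, T."""
    return sum(col[base] for col, base in zip(pwm, seq.upper()))


def snv_delta(pwm: List[Dict[str, float]], seq: str, rel_pos: int, alt: str) -> float:
    """Score change when the base at intronic position rel_pos becomes alt."""
    if not WINDOW_START <= rel_pos <= WINDOW_END:
        raise ValueError("rel_pos must be between -50 and -3")
    i = rel_pos - WINDOW_START  # -50 -> index 0, -3 -> index 47
    mutant = seq[:i] + alt.upper() + seq[i + 1:]
    return score(pwm, mutant) - score(pwm, seq)
```

A negative `snv_delta` means the variant moves the window away from the training consensus; in this simple form each position contributes independently, which is one reason a model combining many features, such as the SVM described above, can behave differently.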
We found that the sensitivity of our models, as well as the sum of their sensitivity and specificity, was higher than that of PSSM and MaxEntScan::score3ss. To generate the SVM model for the IntSplice web service, we did not split the dataset into training and validation datasets but instead used the whole dataset of 1,167 intronic mutations in HGMD and 1,167 randomly selected normal SNPs in dbSNP134.
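The evaluation above rests on sensitivity (the fraction of pathogenic SNVs called pathogenic), specificity (the fraction of normal SNVs called normal), and their sum, reported as mean and SD over validation splits. The following sketch shows these calculations under two assumptions not stated in the text: labels are encoded as 1 for pathogenic (HGMD) and 0 for normal (dbSNP), and the SD is the sample standard deviation. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Labels: 1 = pathogenic (e.g. HGMD), 0 = normal (e.g. dbSNP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def summarize(per_split: List[Tuple[float, float]]) -> Dict[str, object]:
    """Mean and sample SD of sensitivity and specificity over validation splits."""
    sens = [s for s, _ in per_split]
    spec = [sp for _, sp in per_split]
    return {
        "sensitivity": (statistics.mean(sens), statistics.stdev(sens)),
        "specificity": (statistics.mean(spec), statistics.stdev(spec)),
        # Mean of (sensitivity + specificity), the combined measure used for comparison.
        "mean_sum": statistics.mean(s + sp for s, sp in per_split),
    }
```

With 8 toy predictions in which one pathogenic and one normal SNV are misclassified, `sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])` gives `(0.75, 0.75)`. `summarize` needs at least two splits, because a sample SD is undefined for a single value.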

[1] Shibata A, Okuno T, Rahman MA, Azuma Y, Takeda J, Masuda A, Selcen D, Engel AG, Ohno K. IntSplice: Prediction of the splicing consequences of intronic single-nucleotide variations in the human genome. J Hum Genet. 2016;61:633-40.
[2] Piva F, Giulietti M, Nocchi L, Principato G. SpliceAid: a database of experimental RNA target motifs bound by splicing proteins in humans. Bioinformatics. 2009;25:1211-3.
[3] Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010;7:1009-15.
[4] Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470-6.
[5] Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40:1413-5.
[6] Shapiro MB, Senapathy P. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 1987;15:7155-74.
[7] Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004;11:377-94.