About InMeRF

InMeRF is a tool to predict the pathogenicity of nonsynonymous SNVs (nsSNVs) using 150 discriminant models independently generated for all possible amino acid (AA) substitutions.
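
The figure of 150 can be checked against the standard genetic code. The sketch below assumes that "all possible AA substitutions" means every ordered pair of distinct amino acids reachable by changing a single nucleotide in a codon, with stop codons excluded. That reading is an assumption for illustration, not something stated above, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... and "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_aa_substitutions():
    """Collect ordered (reference AA, alternate AA) pairs one base change apart."""
    substitutions = set()
    for codon, ref_aa in CODON_TABLE.items():
        if ref_aa == "*":
            continue  # a stop codon is not a reference amino acid
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                alt_codon = codon[:position] + base + codon[position + 1:]
                alt_aa = CODON_TABLE[alt_codon]
                # Skip synonymous changes and changes that create a stop codon.
                if alt_aa != "*" and alt_aa != ref_aa:
                    substitutions.add((ref_aa, alt_aa))
    return substitutions


substitutions = single_nucleotide_aa_substitutions()
print(len(substitutions))  # → 150
```

Direction matters here: A to V and V to A count as two substitutions, which is why each would get its own model.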

Publication

NAR Genom Bioinform. 2020 May 26;2(2):lqaa038.

PMID: 33543123

Materials and Methods

A total of 72,556 pathogenic nsSNVs were extracted from the Human Gene Mutation Database (HGMD) Pro 2015.2 [CLASS = DM (disease-causing mutation)] included in dbNSFP v4.0a.
A total of 166,161 common nsSNV candidates were extracted from dbNSFP v4.0a (based on dbSNP build 151) by requiring that at least one of the minor allele frequencies (MAFs) 1000Gp3_AF, UK10K_AF, ExAC_AF, gnomAD_exomes_AF and gnomAD_genomes_AF be > 0.001. Removing the candidates that were included in HGMD or annotated in dbNSFP v4.0a with “clinvar_clnsig = Pathogenic or Likely_pathogenic” left 162,918 common nsSNVs.
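
The two filtering steps for common nsSNVs can be sketched as follows. The frequency field names, the 0.001 threshold and the clinvar_clnsig values come from the text. The record layout (a dict per variant), the `variant_id` key, the `hgmd_ids` set and the function names are assumptions made for illustration, and values that cannot be parsed as a number are simply treated as missing.

```python
# Frequency columns named in the text; a variant is a common candidate when
# at least one of them exceeds the threshold.
MAF_FIELDS = (
    "1000Gp3_AF",
    "UK10K_AF",
    "ExAC_AF",
    "gnomAD_exomes_AF",
    "gnomAD_genomes_AF",
)
MAF_THRESHOLD = 0.001
PATHOGENIC_CLNSIG = {"Pathogenic", "Likely_pathogenic"}


def parse_frequency(value):
    """Return the frequency as a float, or None if it is missing or unparsable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_common_candidate(record):
    """True when at least one frequency field is strictly above the threshold."""
    for field in MAF_FIELDS:
        frequency = parse_frequency(record.get(field))
        if frequency is not None and frequency > MAF_THRESHOLD:
            return True
    return False


def select_common(records, hgmd_ids):
    """Keep common candidates that are neither in HGMD nor ClinVar-pathogenic."""
    kept = []
    for record in records:
        if not is_common_candidate(record):
            continue
        if record["variant_id"] in hgmd_ids:  # hypothetical identifier key
            continue
        if record.get("clinvar_clnsig") in PATHOGENIC_CLNSIG:
            continue
        kept.append(record)
    return kept


# Toy records, not real variants.
records = [
    {"variant_id": "v1", "ExAC_AF": "0.02", "gnomAD_exomes_AF": "."},
    {"variant_id": "v2", "ExAC_AF": "0.0005"},
    {"variant_id": "v3", "UK10K_AF": "0.01"},
    {"variant_id": "v4", "1000Gp3_AF": "0.3", "clinvar_clnsig": "Likely_pathogenic"},
]
common = select_common(records, hgmd_ids={"v3"})
print([record["variant_id"] for record in common])  # → ['v1']
```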
Each nsSNV was classified into one of 150 different nonsynonymous AA substitutions. The pathogenic nsSNVs were sorted in ascending order of MAF, and the common nsSNVs were sorted in descending order of MAF. The same numbers of pathogenic and common nsSNVs were extracted for each AA substitution for random forest (RF) modeling.
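
A minimal sketch of this balancing step is shown below. It assumes that the number kept for each substitution is the size of the smaller group and that variants are taken from the top of each sorted list (rarest pathogenic, most frequent common). The text states only that equal numbers were extracted, so both points are assumptions, as are the dict keys and the function name.

```python
from collections import defaultdict


def balance_by_substitution(pathogenic, common):
    """Return {substitution: (pathogenic subset, common subset)} of equal size."""
    grouped = defaultdict(lambda: ([], []))
    for variant in pathogenic:
        grouped[variant["substitution"]][0].append(variant)
    for variant in common:
        grouped[variant["substitution"]][1].append(variant)

    balanced = {}
    for substitution, (patho, comm) in grouped.items():
        n = min(len(patho), len(comm))  # assumed rule: size of the smaller group
        if n == 0:
            continue  # no model can be trained without both classes
        rare_first = sorted(patho, key=lambda v: v["maf"])
        frequent_first = sorted(comm, key=lambda v: v["maf"], reverse=True)
        balanced[substitution] = (rare_first[:n], frequent_first[:n])
    return balanced


# Toy variants, not real data.
pathogenic = [
    {"id": "p1", "substitution": "A>V", "maf": 0.0002},
    {"id": "p2", "substitution": "A>V", "maf": 0.0},
    {"id": "p3", "substitution": "R>H", "maf": 0.0001},
]
common = [
    {"id": "c1", "substitution": "A>V", "maf": 0.05},
    {"id": "c2", "substitution": "R>H", "maf": 0.2},
    {"id": "c3", "substitution": "R>H", "maf": 0.01},
]
balanced = balance_by_substitution(pathogenic, common)
for substitution, (patho, comm) in sorted(balanced.items()):
    print(substitution, [v["id"] for v in patho], [v["id"] for v in comm])
```

Sorting in opposite directions keeps the pathogenic variants that are rarest in the population and the common variants that are most frequent, so the two classes are as clearly separated by frequency as the data allow.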
Among the 37 tools in dbNSFP v4.0a, 3 had very low nsSNV coverage for either pathogenic or common nsSNVs. Therefore, the rank scores of the remaining 34 tools in dbNSFP v4.0a were used as feature values (Table 1). To build the RF models, nsSNVs that lacked one or more of the 34 rank scores in dbNSFP v4.0a were excluded. Pathogenic and common nsSNVs were then discriminated using the machine learning library scikit-learn with Python 3.7. Finally, a total of 150 RF models were generated (Figure 1).
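
As an illustration of the modeling step for one AA substitution, the sketch below drops variants with any missing rank score and fits a scikit-learn random forest on the rest. The hyperparameters (`n_estimators=100`, a fixed `random_state`), the function name and the synthetic data are assumptions, since the settings actually used are not given here. It requires NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_FEATURES = 34  # one rank score per retained tool


def train_substitution_model(rank_scores, labels, random_state=0):
    """Fit one RF model on variants that have all rank scores present."""
    X = np.asarray(rank_scores, dtype=float)
    y = np.asarray(labels)
    complete = ~np.isnan(X).any(axis=1)  # exclude rows with a missing score
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X[complete], y[complete])
    return model, int(complete.sum())


# Synthetic rank scores for a single substitution: pathogenic variants
# (label 1) score high, common variants (label 0) score low.
rng = np.random.default_rng(0)
pathogenic_scores = rng.uniform(0.5, 1.0, size=(50, N_FEATURES))
common_scores = rng.uniform(0.0, 0.5, size=(50, N_FEATURES))
X = np.vstack([pathogenic_scores, common_scores])
X[0, 3] = np.nan  # one variant lacks a score and will be excluded
y = np.array([1] * 50 + [0] * 50)

model, n_used = train_substitution_model(X, y)
query = np.full((1, N_FEATURES), 0.9)
pathogenic_probability = model.predict_proba(query)[0, 1]
print(n_used, pathogenic_probability)
```

Repeating this for each substitution, with its own balanced training set, would give one model per substitution.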

Tool	Type	Rate (%) in all nsSNVs (77,195,651)	Rate (%) in pathogenic nsSNVs (72,556)	Rate (%) in common nsSNVs (162,918)	Feature values used for RF models (O = used, X = not used)
SIFT	prediction	92.65	97.31	89.62	O
SIFT4G		95.63	98.12	93.46	O
Polyphen2_HDIV		87.13	92.14	80.88	O
Polyphen2_HVAR		87.13	92.14	80.88	O
LRT		82.39	93.93	72.45	O
MutationTaster		96.89	99.94	95.72	O
MutationAssessor		82.45	89.32	76.07	O
FATHMM		88.83	98.27	87.35	O
PROVEAN		93.15	98.29	90.39	O
VEST4		97.31	99.35	95.72	O
MetaSVM		95.82	99.40	94.08	O
MetaLR		95.82	99.40	94.08	O
M-CAP		95.90	97.39	37.24	X
REVEL		95.82	99.40	94.08	O
MutPred		90.22	81.09	6.21	X
MVP		97.80	99.12	73.85	O
MPC		83.00	91.76	75.79	O
PrimateAI		89.88	96.72	85.13	O
DEOGEN2		91.13	94.52	86.73	O
CADD		99.97	100.00	100.00	O
DANN		99.41	100.00	100.00	O
fathmm-MKL		99.41	100.00	100.00	O
fathmm-XF		92.62	86.76	92.20	O
Eigen		92.49	87.62	92.02	O
Eigen-PC		92.49	87.62	92.02	O
GenoCanyon		99.41	100.00	100.00	O
integrated_fitCons		95.45	87.68	97.44	O
LINSIGHT		2.04	0.07	3.52	X
GERP++	conservation	98.95	99.98	98.51	O
phyloP100way_vertebrate		99.99	100.00	99.97	O
phyloP30way_mammalian		99.96	100.00	99.94	O
phyloP17way_primate		99.92	100.00	99.90	O
phastCons100way_vertebrate		99.99	100.00	99.97	O
phastCons30way_mammalian		99.96	100.00	99.94	O
phastCons17way_primate		99.92	100.00	99.90	O
SiPhy		97.98	99.88	97.09	O
bStatistic		98.24	98.93	98.02	O
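
The exclusion of low-coverage tools can be reproduced from the rates in the table. The sketch below uses a handful of rows copied from it and a cutoff of 50%. That cutoff is a hypothetical value chosen only because it separates the three excluded tools from the rest; no numeric threshold is stated in the text.

```python
# tool: (rate in pathogenic nsSNVs, rate in common nsSNVs), in percent,
# copied from a subset of the table rows.
COVERAGE = {
    "SIFT": (97.31, 89.62),
    "LRT": (93.93, 72.45),
    "M-CAP": (97.39, 37.24),
    "MutPred": (81.09, 6.21),
    "MVP": (99.12, 73.85),
    "LINSIGHT": (0.07, 3.52),
    "CADD": (100.00, 100.00),
}
MIN_COVERAGE = 50.0  # hypothetical cutoff, not taken from the text


def split_by_coverage(coverage, min_coverage):
    """Keep tools whose coverage meets the cutoff in BOTH variant sets."""
    kept, dropped = [], []
    for tool, rates in coverage.items():
        if min(rates) >= min_coverage:
            kept.append(tool)
        else:
            dropped.append(tool)
    return kept, dropped


kept, dropped = split_by_coverage(COVERAGE, MIN_COVERAGE)
print(dropped)  # → ['M-CAP', 'MutPred', 'LINSIGHT']
```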