Fig. 9 | BioData Mining

Fig. 9

From: Grasping frequent subgraph mining for bioinformatics applications

Overview of statistical significance testing for subgraphs. The significance of each subgraph in a list of frequent subgraphs, e.g. G, found in the graph database D can be tested by comparison with a background database B. This database is often built by randomly permuting either the labels or the edges of D, so that it represents a relevant background set. The permutation procedure is repeated many times (typically thousands) and thus generates a large set of random graphs. The support of G is then counted within B, which establishes the distribution of the support of G at random; this distribution can be presented as a density plot, as seen here. The statistical significance can then be reported as a P-value, which corresponds to the chance of seeing, among the random graphs collected in B, a support value at least as high as the support observed in D. If this P-value falls below a predetermined significance cut-off (often 0.05, with a correction for multiple testing), the subgraph is reported as significant.
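
The procedure described in the caption can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of any particular tool: each graph is a hypothetical `(labels, edges)` pair, the "subgraph" G is simplified to a single labelled edge so that no subgraph isomorphism routine is needed, the background B is built by permuting node labels (edge permutation would be the other option named above), the P-value uses the common "add one" convention, and the multiple testing correction shown is Bonferroni, which is just one possible choice. All function names are made up for illustration.

```python
import random

# Assumption: a graph is (labels, edges), where labels maps node -> label
# and edges is a list of (node, node) pairs. The pattern G is simplified
# to one labelled edge, given as a pair of labels such as ("A", "B").

def has_pattern(graph, pattern):
    """True if some edge of the graph joins the two labels in the pattern."""
    labels, edges = graph
    target = tuple(sorted(pattern))
    return any(tuple(sorted((labels[u], labels[v]))) == target for u, v in edges)

def support(database, pattern):
    """Number of graphs in the database that contain the pattern."""
    return sum(has_pattern(g, pattern) for g in database)

def permute_labels(graph, rng):
    """Return a copy of the graph with its node labels randomly shuffled."""
    labels, edges = graph
    nodes = list(labels)
    shuffled = [labels[n] for n in nodes]
    rng.shuffle(shuffled)
    return dict(zip(nodes, shuffled)), edges

def empirical_p_value(database, pattern, n_permutations=1000, seed=0):
    """Compare the observed support in D with supports in permuted copies of D."""
    rng = random.Random(seed)
    observed = support(database, pattern)
    null_supports = []
    for _ in range(n_permutations):
        background = [permute_labels(g, rng) for g in database]
        null_supports.append(support(background, pattern))
    # Fraction of random databases with support >= observed.
    # The +1 terms are an assumed convention that avoids reporting P = 0.
    at_least = sum(s >= observed for s in null_supports)
    p_value = (at_least + 1) / (n_permutations + 1)
    return observed, null_supports, p_value

def is_significant(p_value, alpha=0.05, n_tests=1):
    """Bonferroni-corrected decision (one possible multiple testing correction)."""
    return p_value <= alpha / n_tests

# Toy database D: five identical path graphs A - B - C.
toy_graph = ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)])
D = [toy_graph] * 5
observed, null_supports, p = empirical_p_value(D, ("A", "B"))
print(observed, p, is_significant(p, n_tests=1))
```

The list `null_supports` holds the random support values whose distribution the figure shows as a density plot. For real patterns with more than one edge, `has_pattern` would have to be replaced by a proper subgraph isomorphism test.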
