Counting small patterns in networks

Networks are an often employed tool that can help us visualize and analyze binary relationships by representing the entities as a set of nodes and the relations between them as edges in the network. One type of relations in the field of bioinformatics that is often modeled by networks are interactio...

Full description

Bibliographic Details
Main Author: Hočevar , Tomaž
Format: Thesis
Language:unknown
Published: 2017
Subjects:
Online Access:http://eprints.fri.uni-lj.si/4034/
http://eprints.fri.uni-lj.si/4034/1/Ho%C4%8Devar_%2D_doktorska_disertacija.pdf
Description
Summary:Networks are an often employed tool that can help us visualize and analyze binary relationships by representing the entities as a set of nodes and the relations between them as edges in the network. One type of relations in the field of bioinformatics that is often modeled by networks are interactions between pairs of proteins. Recent studies have focused on analyzing the local structure of such networks by observing small connected patterns consisting of 4 or 5 nodes, which are also known as graphlets. The nodes of graphlets are further divided into orbits by their "roles" or symmetries. The number of times a node from the network participates in each orbit forms a signature of the node's local network topology. Working under the assumption that the node's local topology is correlated with its function in the network, researchers have successfully used graphlets to predict new protein functions. The bottleneck of graphlet-based approaches is usually in the time required to count them. This restriction is becoming even more pronounced with a growing amount of available data. This dissertation focuses on improving existing graphlet counting techniques that are based on simple exhaustive enumeration. We present the algorithm Orca that counts graphlets and their orbits instead of enumerating them. It exploits relations between orbit counts to construct a system of equations that can be set up efficiently. Orca achieves this by enumerating (k-1)-node graphlets to count k-node graphlets, effectively obtaining a speed-up by a factor proportional to the maximum degree of a node in the network. In practical terms, it counts graphlets in larger protein-protein interaction networks about 50-100 times faster. Orca was designed for counting graphlets with 4 and 5 nodes. However, we adapt the approach to counting edge-orbits in addition to the original node-orbits with the same gains in run time. We also show that this approach can be generalized to graphlets of arbitrary size by identifying the necessary conditions and ...