Improvements on ORCA for Fast Computation of Graphlet Degree Vectors of any Graphlet Order

It is increasingly common to find real-life structures or behaviors represented as graphs in many areas of the computing sciences. Comparing these graphs is a hard task, especially when we are interested in assigning a non-binary similarity score between two large graphs based on some domain-specifi...

Full description

Bibliographic Details
Main Author: Nunez, Wilberto Z
Format: Text
Language:unknown
Published: RIT Scholar Works 2018
Subjects:
Online Access:https://scholarworks.rit.edu/theses/9892
https://scholarworks.rit.edu/context/theses/article/11050/viewcontent/WNunezPichardoThesis8_22_2018.pdf
Description
Summary:It is increasingly common to find real-life structures or behaviors represented as graphs in many areas of the computing sciences. Comparing these graphs is a hard task, especially when we are interested in assigning a non-binary similarity score between two large graphs based on some domain-specific context. In bioinformatics, social network analysis and other areas is frequently necessary to compute graph similarities based on the local topological information of each vertex of the given graphs. This is why graphlet degree vectors have become more and more popular in these areas. They provide a simple yet detailed representation of a vertex's topology by counting the number of times such vertex touches a list of small predefined sub-structures called graphlets. In this thesis, we study the state-of-the-art algorithm to compute graphlet degree vectors, the Orbit Counting Algorithm (ORCA). ORCA generates a triangular system of linear equations that can be quickly solved to obtain the graphlet degree vector of a vertex. We make theoretical and practical improvements to this algorithm and measure the difference in speed after these improvements. The theoretical improvement consists of finding automorphisms of graphlets given a fixed vertex that is required to map to itself in such automorphisms. We observe that one piece of the algorithm runs much faster than before with this improvement, especially for larger graphlet orders. This helps the algorithm take less time in generating the linear system that we use to find the desired graphlet degree vector. The practical improvement consists of making a flexible implementation of the algorithm, which can take any graphlet size as input, any number of input graphs, and compute the graphlet degree vector for every vertex in each one of those graphs.