Maximum Clique Algorithm

Summary

  • MaxCliqueDyn is a fast exact algorithm for finding a maximum clique in an undirected graph described in Ref. [1] developed by Janez Konc. A clique is a fully connected subgraph of a graph and a maximum clique is the clique with the largest number of vertices in a given graph.
  • Maximum clique algorithms differ from maximal clique algorithms (e.g., Bron-Kerbosch algorithm). The maximal search is for all maximal cliques in a graph (cliques that cannot be enlarged), while the maximum clique algorithms find a maximum clique (a clique with the largest number of vertices). This makes maximum clique algorithms about an order of magnitude faster.

Download

  • New C++ source code of the MaxCliqueDyn algorithm (mcqd.zip) for finding a maximum clique in an undirected graph is released under the GNU General Public License. The old source code is still available. The new implementation is up to 20% faster, has an object interface, which allows easy use and adaptability, and handles graphs of any number of vertices and edges. The MaxCliqueDyn algorithm is described in Ref. [1].
  • GitLab project with the latest version of the source code is available here.
  • A parallel maximum clique algorithm MaxCliquePara is here (runs multi-core/single node).

Improvement

  • The MaxCliqueDyn algorithm described in [1] and implemented in mcqd.h improves on earlier algorithms (see [1]) with two new features:
    1. an improved approximate coloring algorithm (ColorSort), which maintains the vertices in the candidate set R in the favorable decreasing order according to their degrees. This is based on the realization that assignation of vertices into color classes (which disrupts the order) is only needed above a certain threshold, which we calculate as Kmin = |Qmax| - |Q| + 1, where |Qmax| is the size of the current maximum clique and |Q| is the size of the clique found on the current branch of the search tree. Vertices with their colors below Kmin can remain in their original order. This idea consistently reduces the number of steps needed to find a maximum clique as well as the time required to find a maximum clique.
    2. by applying tighter, more computationally expensive upper bounds on a fraction of the search space, it is possible to reduce the time to find the maximum clique. Thus, we re-sort vertices in R by their degrees (DegreeSort) on the lower branches of the search tree enabled by a counter (S) and parameter (Tlimit) that turns on/off the sorting during the algorithm's execution dynamically. This heuristics increases overall performance of the algorithm on a large number of DIMACS and random graphs, and can be tuned to specific graphs by changing the Tlimit parameter.

Installation Instructions


  • Unzip: unzip mcqd.zip
  • Compile with: g++ -O3 mcqd.cpp -o mcqd
  • Run: mcqd test.clq

To Use MaxCliqueDyn in Your Program


  • Do the following:
    1. include the header file: #include "mcqd.h"
    2. provide adjacency matrix of the graph: bool **conn;
    3. provide the number of vertices of the graph (number of rows or columns of the adjacency matrix): int size;
    4. create the object: Maxclique m(conn, size);
    5. provide an empty array (and size) to which the maximum clique will be written: int *qmax; int qsize;
    6. run the maximum clique algorithm variant 1: m.mcq(qmax, qsize);
    7. or the maximum clique algorithm variant 2 (usually faster): m.mcqdyn(qmax, qsize);
    8. that's all. The maximum clique is now in the qmax array, and its size is in qsize.
  • For detailed example see mcqd.cpp file in the downloaded package.

Notes on Protein Comparisons

  • The MaxCliqueDyn algorithm for finding a maximum clique in an undirected graph is used for comparing protein surface structures (see ProBiS Web Server).
  • It was developed with purpose of quickly comparing protein structures. Protein surfaces are first represented as protein graphs and then an associative graph (or product graph) G1 X G2 is constructed from two compared protein graphs. A maximum clique in such an associative graph corresponds to the maximum common subgraph of the graphs G1 and G2, which translates to the biggest similarity in topology or properties of the compared proteins. Maximum clique algorithms are preffered to maximal clique algorithms when comparing protein structures, because knowing all (smaller) cliques is not necessary when comparing rigid 2D or 3D objects. Only the largest one is important. The MaxCliqueDyn algorithm is also considerably faster than the algorithms of Tomita et al. (LNCS, 2003) and Oestergard (Discrete Applied Mathematics, 2002) on DIMACS and random graphs.

Reference