TuGraph Built-in Algorithm Description|V3.3.3|OceanBase TuGraph| docs|Distributed Database

This document mainly introduces the TuGraph built-in algorithm program in detail, the community version of 6 algorithms can refer to the basic algorithm newspaper

Introduction of # #

TuGraph currently contains the following 6 basic algorithms and 28 extended algorithms, a total of 34 graph algorithms:

Basic algorithms:

Algorithm name	The program name
Breadth-First Search	bfs
Pagerank	pagerank
Single-Source Shortest Path	sssp
Weakly Connected Components	wcc
Local Clustering Coefficient	lcc
Label Propagation Algorithm	lpa

Extended algorithms:

Algorithm name	The program name
All-Pair Shortest Path	apsp
Betweenness Centrality	bc
Belief Propagation	bp
Closeness Centrality	cc
Common Neighborhood	cn
Degree Correlation	dc
Dimension Estimation	de
EgoNet	en
Hyperlink-Induced Topic Search	hits
Jaccard Index	ji
K-core	kcore
Louvain	louvain
Multiple-source Shortest Paths	mssp
Personalized PageRank	ppr
Strongly Connected Components	scc
Speaker-listener Label Propagation Algorithm	slpa
Single-Pair Shortest Path	spsp
Triangle Counting	triangle
Trustrank	trustrank
Weighted Label Propagation Algorithm	wlpa
Weighted Pagerank Algorithm	wpagerank
Maximal independent set	mis
Sybil Rank	sybilrank
Subgraph Isomorphism	subgraph_isomorphism
Motif	motif
Kcliques	kcliques
Ktruss	ktruss
Leiden	leiden

Basic algorithms

Breadth-First Search

Breadth first Search implements the Breadth-first Search algorithm, which starts from the root vertex and traverses all accessible vertices along the width of the graph. Returns the number of vertices traversed. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Breadth-first_search。

Pagerank

Web page sorting program to achieve the commonly used Pagerank algorithm. The algorithm calculates the importance ranking of all vertices according to the edge and edge weight in the graph. The higher the PageRank value, the higher the importance of the vertex in the graph. The reciprocal of the number of vertices is used as the initial Rank value of each vertex, and then the Rank value of the vertices is transferred to the adjacent vertices on average according to the outgoing edges, and the transfer process is repeated until the given convergence threshold is met or the given number of iterations is reached. At the end of each pass round, a certain proportion of the Rank values of all vertices will be randomly transferred to any vertex. Please refer to the algorithm content https://en.wikipedia.org/wiki/PageRank。

Single-Source Shortest Path

The Single Source Shortest Path algorithm realizes the Single Source Shortest Path algorithm. It calculates the shortest path length from a given source vertex to any other vertex. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Shortest_path_problem。

Weakly Connected Components

The program of Weakly Connected Components has implemented the algorithm. It can calculate all the weakly connected components in the graph. A weakly connected component is a subgraph of a graph in which reachable paths exist between any two points. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Connectedcomponent(graph_theory)。

Local Clustering Coefficient

The average Clustering Coefficient program implements the Local Clustering Coefficient algorithm to calculate the coefficient of the degree of clustering between vertices in the graph. The returned results include the overall clustering coefficient and the vertex clustering coefficient. The overall agglomeration coefficient reflects the evaluation of the overall agglomeration degree in the graph, and the vertex agglomeration coefficient includes the agglomeration coefficient of any vertex, which reflects the agglomeration degree near the vertex. The higher the agglomeration coefficient, the higher the agglomeration degree. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Clustering_coefficient。

Label Propagation Algorithm

The program implements Label Propagation Algorithm. This algorithm is a community discovery algorithm based on tag propagation and computes unauthorised graphs. In label passing, each vertex adds up all the labels received, and randomly selects one of the labels with the highest sum. The iteration converges or the algorithm terminates after a given number of rounds. The final output is a label for each vertex, and vertices with the same label value are considered to be in the same community. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Label_Propagation_Algorithm。

Extended algorithms

All-Pair Shortest Path

The All-Pair Shortest Path program realizes the all-pair Shortest Path algorithm and calculates the shortest path between any two points in the graph. Returns the shortest path length between any pair of vertices where the path exists. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Floyd-Warshall_algorithm

Betweenness Centrality

Betweenness Centrality algorithm is implemented in Betweenness centrality program to estimate the betweenness centrality value of all vertices in a graph. The value of intermediate centrality reflects the possibility that any shortest path in the graph passes through the vertex, and the higher the value, the more shortest paths pass through the vertex. During calculation, the number of sampling points should be given, and the calculation should be carried out based on these sampling points. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Betweenness_centrality。

Belief Propagation

The Belief Propagation algorithm is implemented in the confidence propagation program. Given the edge distribution of the observed vertices, the algorithm estimates the edge distribution of the unobserved vertices by using the mechanism of passing messages among vertices. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Belief_propagation。

Closeness Centrality

The Distance Centrality program implements an algorithm to estimate the average length of an arbitrary vertex to Closeness the shortest path to another vertex in the graph. The smaller the distance from the center, the smallest the average shortest distance from the vertex to other vertices, which means that the vertex is more centrally located in the graph from the geometric point of view. During calculation, the number of sampling points should be given, and the calculation should be carried out based on these sampling points. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Closeness_centrality。

Common Neighborhood

The Common Neighborhood program implements the common Neighborhood algorithm to count the number of common neighbors between any given pair of neighboring vertices. Given several pairs of vertices to be queried, the result is the number of common neighbors of any pair of vertices to be queried.

Degree Correlation

The Degree Correlation algorithm is implemented in the degree correlation program. It calculates the degree correlation degree of a graph by calculating the Pearson correlation coefficient between any adjacent vertex pairs, which can be used to characterize the correlation degree between high-degree vertices in the graph. A higher degree of correlation indicates a higher degree of correlation between vertices of higher degree in the graph. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Pearson_correlation_coefficient

Dimension Estimation

The diameter Estimation program implements the Dimension Estimation algorithm. The algorithm calculates the length of the longest shortest path in the graph, which is used to characterize the diameter of the graph. Please refer to the algorithm contenthttp://mathworld.wolfram.com/GraphDiameter.html。

EgoNet

The EgoNet algorithm requires a given root vertex and K value, and takes the root vertex as the source vertex to conduct a width-first search to find the subgraph composed of all neighbors within K degrees. The found subgraph is called the EgoNet of the root vertex.

Hyperlink-Induced Topic Search

The Hyperlink Topic Search algorithm implements the Hyperlink-induced topic search algorithm, which assumes that each vertex has two attributes: Authority and Hub. A good hub vertex should point to many vertices with high authority. And a good authority vertex should be pointed to by many vertices of high pivot type. The algorithm returns the authority value and the hub value for each vertex. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/HITS_algorithm。

Jaccard Index

The Jaccard coefficient program implements the Jaccard Index algorithm. The algorithm calculates the Jaccard coefficient between a given pair of vertices, which can be used to represent the similarity of the two vertices. A higher Jaccard coefficient indicates a higher degree of similarity between pairs of vertices. Given several pairs of vertices with the query, the Jaccard coefficients of these pairs are returned. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Jaccard_index。

K-core

k accounting method implements k-core algorithm. The algorithm computes the number of kernels of all vertices, or finds all K-nucleon graphs in the graph. K-nucleon graph is a special subgraph in which the degree of any vertex is not less than a given K value. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Degeneracy_(graph_theory)。

Louvain

The Louvain community discovery program implements the Fast-unfolding algorithm. The algorithm is a community discovery algorithm based on modularity. It maximizes the modularity of the graph by constantly merging vertex communities, and can discover the hierarchical community structure. Please refer to the algorithm content https://en.wikipedia.org/wiki/Louvain_Modularity。

Multiple-source Shortest Paths

The multisource Shortest Paths program implements the multiple-source Shortest Paths algorithm to calculate the shortest path value to any vertex from the given Multiple source vertices. Where, the shortest path value of multiple source vertices to a vertex is the minimum value of the shortest path from each source vertex to the vertex. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Shortest_path_problem。

Personalized PageRank

Personalized web ranking program has Personalized PageRank algorithm. According to the given source vertex, the algorithm calculates the importance ranking of all vertices to the source vertex based on the source vertex. The higher the Rank value, the more important the vertex is to the source vertex. Unlike PageRank, the source vertex Rank value is 1, the rest of the vertex Rank value is 0; And a certain proportion of Rank value will be immediately transferred back to the source vertex after each round of transmission. Please refer to the algorithm content https://cs.stanford.edu/people/plofgren/Fast-PPR_KDD_Talk.pdf。

Strongly Connected Components

Strongly Connected component program realizes the Strongly Connected Components algorithm. The algorithm computes all strongly connected components of the graph, which is a subgraph of the graph that can start from any vertex to any other vertex. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Strongly_connected_component。

Speaker-listener Label Propagation Algorithm

The program realizes the Speaker-listener Label Propagation Algorithm. This algorithm is a community discovery algorithm based on tag propagation and historical tag record, which is an extension of tag propagation algorithm. Different from the label propagation algorithm, this algorithm records the historical labels of all vertices. When the labels are accumulated in the iteration, the historical labels are also counted. The final output is all the historical label records for each vertex. Please refer to the paper for the algorithm content:“SLPA: Uncovering Overlapping Communities in Social Networks via a Speaker-Listener Interaction Dynamic Process”。

Single-Pair Shortest Path

The program of the shortest path between two points implements the Bidirectional point-first Search algorithm. It searches forward width First along the outgoing edge from the starting point and reverse width first along the incoming edge from the end point on the directed undirected graph. The shortest path length from the starting point to the ending point is determined by traversing the vertices of the starting point and the ending point together. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/Bidirectional_search。

Triangle Counting

The Triangle-counting algorithm is implemented to calculate the number of triangles in an undirected graph, which can be used to characterize the correlation degree of the vertices in the graph. The higher the number of triangles, the higher the degree of correlation of the vertices in the graph. For the algorithm content, please refer to the paper, "Finding, Counting and Listing All Triangles in Large Graphs, an Experimental Study".

Trustrank

信任索引排序算法实现了 Trustrank 算法，可以根据给定的白名单计算任意顶点的信任索引。信任指数越高，顶点非法的可能性越小。请参考算法内容 https://en.wikipedia.org/wiki/TrustRank。

Weighted Label Propagation Algorithm

Weighted Label Propagation Algorithm is implemented in the program. = Different from the label propagation algorithm, the label transmission is related to the weight of the edge. During label transmission, the weight of each vertex will be accumulated according to the incoming edge of the label, and the label with the highest sum will be randomly selected. Please refer to the algorithm content https://en.wikipedia.org/wiki/Label_Propagation_Algorithm。

Weighted Pagerank Algorithm

Weighted Pagerank is implemented in the weighted Pagerank algorithm. Different from PageRank algorithm, the transfer of Rank value is related to the weight of the edge. The Rank value of the vertex will be transferred to the adjacent vertices according to the weight of the edge. Please refer to the algorithm contenthttps://en.wikipedia.org/wiki/PageRank。

Maximal independent set

Maximal independent set algorithm implements Maximal Independent Set algorithm. A maximum independent set means that there are no vertices outside the independent set that can join it. A graph may have many MIS that vary greatly in size, and the algorithm finds one of them. Please refer to the algorithm content https://en.wikipedia.org/wiki/Maximal_independent_set#Sequential_algorithm。

Sybil Rank

Sybil detection algorithm implements Sybil Rank algorithm. The SybilRank algorithm starts from non-Sybil nodes and performs a random walk with premature termination. Please refer to the paper for the algorithm content:“Aiding the Detection of Fake Accounts in Large Scale Social Online Services”。

Subgraph Isomorphism

The subgraph matching algorithm implements the subgraph_isomorphism algorithm. The subgraph_isomorphism algorithm matches all nodes in the whole graph and outputs the number of times each node is matched. Algorithm Content Reference https://www.jsjkx.com/CN/article/openArticlePDF.jsp?id=18105

Motif

Pattern matching algorithm implements motif algorithm. motif algorithm matches k-order subgraphs for the specified nodes, and finally outputs the number of each kind of K-order subgraphs for each specified node. Each K-order subgraph is represented by a 64-bit integer, and the $i \times k + j$bit is 1, indicating that there is an edge i->j in the subgraph. Algorithm Content Reference https://en.wikipedia.org/wiki/Network_motif#mfinder

Kcliques

The K-order clique counting algorithm implements the k-cliques algorithm. The k-cliques algorithm calculates the number of all K-order complete subgraphs in the graph, and finally outputs the number of k-order complete subgraphs of each node. Algorithm Content Reference https://en.wikipedia.org/wiki/Clique_problem#Cliques_of_fixed_size

Ktruss

The k-order truss counting algorithm implements the k-truss algorithm. A k-truss is a subgraph in which each edge is the edge of at least k-2 triangles. The k-truss algorithm finds out the k-truss subgraph of the graph, and finally outputs the neighbor node list of each node in the subgraph. Algorithm Content Referencehttps://louridas.github.io/rwa/assignments/finding-trusses/

Leiden

Leiden algorithm implements Leiden's algorithm. leiden algorithm is a community discovery algorithm based on modularity. Compared with louvain algorithm, leiden algorithm has the advantage that Leiden algorithm detects the broken links in the community to ensure that each community has good connectivity. Algorithm Content Reference https://www.nature.com/articles/s41598-019-41695-z#Sec4