This document will introduce how to use the OlapBase API
1. The overview
This manual will introduce the configuration of the TuGraph graph computing system. Combined with the code, several important files and interfaces in TuGraph are introduced.
2. Configuration requirements
To develop and compile applications using the TuGraph graph computing system, the configuration required:
linux operating system, currently running successfully on Ubuntu16.04.2 and Centos7 systems.
Compiler that supports C++ 14, requires GCC version 5.4.1 or later.
3. Atomic operations
TuGraph uses multi-threading technology for batch operations, in which case memory access conflicts may occur. In order to ensure the correctness of modification operations in parallel computing, TuGraph implements atomic operations. The code section can be found in the lgraph_atomic.cpp file under the lgraph folder.
TuGraph also customizes 4 commonly used atomic operations. When we need to modify the data of vertices in multithreaded mode, we should use atomic operations to ensure the correctness of the modification operation in parallel environment. In addition to these four atomic operations, users can also use "cas" to build their own atomic operation functions.
bool cas(T \* ptr, T oldv, T newv): If the value pointed to by ptr is equal to oldv, the value pointed to by ptr is assigned to newv and returns true. Otherwise, false is returnedbool write_min(T \*a, T b): If b is smaller than the value pointed to by a, then assign the value pointed to by a to v and return true, otherwise return false.bool write_max(T \*a, T b): If b is larger than the value pointed to by a, then assign the value pointed to by a to v and return true, otherwise return false.void write_add(T \*a, T b): adds the value of b to the value pointed to by a.void write_sub(T \*a, T b): subtract the value of b from the value pointed to by a.
4. Vertex Class ParallelBitset
When using TuGraph for batch operations, you need to use a vertex set to represent the vertices you want to process. ParallelBitset implements a vertex collection class that represents vertices in bits and thus saves a significant amount of memory. The corresponding code can be found in the olap_base.h file in the lgraph folder.
4.1 ParallelBitset Class
size_t Size():Indicates the number of vertices in the Bitmap.ParallelBitset(size_t size):Initialize size and data, the length of data is (size >> 6)+1.void Clear():clears the collection.void Fill():add all vertices to the set.bool Has(size_t i):check if vertex i is in the set.bool Add(size_t i):add vertex i to the set.void Swap(ParallelBitset &other):exchange elements with another set of ParallelBitset.
5. Vertex Array Class ParallelVector
When using TuGraph for batch operations, you need to use an array of vertices to represent the result of processing vertices. ParallelVector implements the vertex array class. The corresponding code can be found in the olap_base.h file in the lgraph folder.
5.1 ParallelVector Class
ParallelVector(size_t capacity)Builds ParallelVector. capacity is the initial size of the vertex arrayT& operator[](size_t i): data with subscript iT \*begin(): ParallelVector the starting pointerT \*end(): ParallelVector the end pointer to Parallelvector. begin and end are similar to the begin and end Pointers of a vector. You can use these Pointers to access an array sequentiallyT&Back (): ParallelVector the last dataT \*Data(): represents the data in the array itself- `void Destroy()' : empty the ParallelVector array and delete the array
size_t Size(): indicates the number of data in ParallelVectorvoid Resize(size_t size): Change ParallelVector to size, which must be greater than or equal to the size before the changevoid Clear(): empty the data in ParallelVectorvoid ReAlloc(size_t capacity): ParallelVector is allocated with new capacity. If data exists in the array, it is migrated to the new memoryvoid Fill(T elem): Assign elem to all data on ParallelVectorvoid Append(const T&elem, bool atomic = true): Add a piece of data at the end of ParallelVectorvoid Swap(ParallelVector<t> &other): to exchange data with other parallelvectorsParallelVector<t> Copy(): copies the current ParallelVector data into the Copy array
6. Customize data structures
6.1 Basic data types
We customize the data structure representation of points and edges to save memory space while covering all vertices:
- 'Empty' : indicates a special data type with empty content.
6.2 Combining data structures
In order to facilitate calculation, we define several data structures of point and edge data according to different calculation scenarios, which are as follows:
EdgeUnit<edgedata>: represents an edge of weight type EdgeData, used to parse the input file, and contains three member variables:size_t src: the starting vertex of an edgesize_t dst: the end of an edgeEdgeData edge_data: edge weightAdjUnit<edgedata>: represents an edge of weight type EdgeData, used during batch computation, and contains two member variables:size_t neighbour: indicates the neighbor vertex of an edgeEdgeData edge_data: edge weightAdjList<edgedata>: Adjacency list of vertices with weights of type EdgeData, often used to represent the set of incoming and outgoing edges of vertices and containing two member variables:AdjUnit<t> \* begin: indicates the start pointer of the listAdjUnit<t> \* end: The end pointer of the list. begin and end are similar to the begin and end Pointers of a vector. You can use these Pointers to loop through the adjacency list.
7. Graph class OlapBase
Graph class OlapBase is the main class for TuGraph to load graphs and calculate graphs. OlapBase is commonly used to represent graphs with weights of type EdgeData. See olap_base.hpp under lgraph folder for the code. This chapter covers the types and API interfaces commonly used in Graph classes. The classes used by the Procedure, Embed, and Standalone functions described above are all subclasses of this class.
7.1 Basic Information
size_t NumVertices(): obtains the number of verticessize_t NumEdges(): Gets the number of edgessize_t OutDegree(size_t vid): indicates the outdegree of the vid of the vertexsize_t InDegree(size_t vid): indicates the input degree of vertex vid
7.2 node sets and edge sets and their related operations
ParallelVector AllocVertexArray(): Allocates an array of type VertexData with size as the number of verticesvoid fill_vertex_array(V \* array, V value): assigns values to all elements in the arrayParallelBitset AllocVertexSubset(): Assigns a subset of ParallelBitsets to denote whether the state of all vertices is activatedAdjList OutEdges(size_t vid)': gets the set of all outgoing edges of vertex vAdjList InEdges(size_t vid): Obtains the set of all incoming edges of vertex vvoid Transpose(): transpose of a directed graphLoadFromArray(char \* edge_array, VertexId input_vertices, EdgeId input_edges, EdgeDirectionPolicy (edge_direction_policy): Loads the graph data from the array, contains four parameters, the meaning of which are respectively:
edge_array : reads the data from the array into the graph. Normally, the array contains multiple edges.
input_vertices: specifies the number of vertices read into the graph by the array.
input_edges : specifies the number of edges that the array reads into the image.
edge_direction_policy : indicates that the graph is directed or undirected. The graph can be divided into three modes: DUAL_DIRECTION, MAKE_SYMMETRIC, and INPUT_SYMMETRIC. For details, see 'enum EdgeDirectionPolicy' in the config.h file in the core folder.
7.3 Locking mechanism
TuGraph implements a pair of locks to control the program's access to vertex data. Respectively is:
void AcquireVertexLock(size_t vid): locks a vertex vid and prohibits other threads from accessing the vertex data corresponding to this lockvoid ReleaseVertexLock(size_t vid): unlocks the vertex vid and all threads can access the vertex data corresponding to the lockVertexLockGuard GuardVertexLock(size_t vid): When the vid operation is performed, the vertex vid is locked, and the lock is automatically released upon exiting the scope
7.4 Batch Processing Operations
TuGraph provides two batch operations to do a point-centered batch process in parallel. Respectively is:
/*
Function Name:ReducedSum ProcessVertexInRange(std::function<ReducedSum(size_t)> work, size_t lower, size_t upper,
ReducedSum zero = 0,std::function<ReducedSum(ReducedSum, ReducedSum)> reduce =reduce_plus<ReducedSum>)
The work function executes the work function on nodes whose numbers are between lower and upper in the Graph. The fourth parameter indicates the accumulated base, which defaults to 0.The fifth parameter indicates that the iteration reduce function operation is performed on the return value of each node processed by work, and the default operation is the accumulation operation.
For details, please refer to include/lgraph/olap_base.h
Example: Count the number of vertices in the parent array that have outgoing edges
*/
auto vertex_num = graph.ProcessVertexInRange<size_t>(
[&](size_t i) {
if (graph.OutDegree(parent[i]) > 0) {
return 1;
}
},
0, parent.Size()
);
printf("the number is %lu\n",vertex_num);
graph is the instantiated object of graph class OlapBase
/*
Function Name:ReducedSum ProcessVertexActive(std::function<ReducedSum(size_t)> work, ParallelBitset &active_vertices,
ReducedSum zero = 0,std::function<ReducedSum(ReducedSum, ReducedSum)> reduce =reduce_plus<ReducedSum>)
Function: Execute the work function for the node that corresponds to 1 in active_vertices. The third parameter represents the cumulative cardinality, which is 0 by default.
The fourth parameter indicates that the iteration reduce function operation is performed on the return value of each node processed by work, and the default operation is the accumulation operation.
For specific implementation, please refer to the specific code in /include/lgraph/olap_base
Example: Output all the out-degree neighbors of nodes 1, 2, and 3 in the Graph, and count the total out-degree of these three nodes
*/
auto active_in = graph.AllocVertexSubset();
active_in.Add(1);
active_in.Add(2);
active_in.Add(3);
auto total_outdegree = graph.ProcessVertexActive<size_t>(
[&](size_t vi) {
size_t local_outdegree = 0;
for (auto & edge : graph.OutEdges(vi)) {
size_t dst = edge.neighbour;
printf("node %lu has neighbour %lu\n",vi,dst);
local_outdegree += 1;
}
return local_outdegree;
},
active_in
);
printf("total outdegree of node1,2,3 is %lu\n",total_outdegree);