OlapBase API|V3.3.3|OceanBase TuGraph| docs|Distributed Database

This document will introduce how to use the OlapBase API

1. The overview

This manual will introduce the configuration of the TuGraph graph computing system. Combined with the code, several important files and interfaces in TuGraph are introduced.

2. Configuration requirements

To develop and compile applications using the TuGraph graph computing system, the configuration required:

linux operating system, currently running successfully on Ubuntu16.04.2 and Centos7 systems.
Compiler that supports C++ 14, requires GCC version 5.4.1 or later.

3. Atomic operations

TuGraph uses multi-threading technology for batch operations, in which case memory access conflicts may occur. In order to ensure the correctness of modification operations in parallel computing, TuGraph implements atomic operations. The code section can be found in the lgraph_atomic.cpp file under the lgraph folder.

TuGraph also customizes 4 commonly used atomic operations. When we need to modify the data of vertices in multithreaded mode, we should use atomic operations to ensure the correctness of the modification operation in parallel environment. In addition to these four atomic operations, users can also use "cas" to build their own atomic operation functions.

bool cas(T \* ptr, T oldv, T newv) : If the value pointed to by ptr is equal to oldv, the value pointed to by ptr is assigned to newv and returns true. Otherwise, false is returned
bool write_min(T \*a, T b) : If b is smaller than the value pointed to by a, then assign the value pointed to by a to v and return true, otherwise return false.
bool write_max(T \*a, T b) : If b is larger than the value pointed to by a, then assign the value pointed to by a to v and return true, otherwise return false.
void write_add(T \*a, T b) : adds the value of b to the value pointed to by a.
void write_sub(T \*a, T b) : subtract the value of b from the value pointed to by a.

4. Vertex Class ParallelBitset

When using TuGraph for batch operations, you need to use a vertex set to represent the vertices you want to process. ParallelBitset implements a vertex collection class that represents vertices in bits and thus saves a significant amount of memory. The corresponding code can be found in the olap_base.h file in the lgraph folder.

4.1 ParallelBitset Class

size_t Size()：Indicates the number of vertices in the Bitmap.
ParallelBitset(size_t size)：Initialize size and data, the length of data is (size >> 6)+1.
void Clear()：clears the collection.
void Fill()：add all vertices to the set.
bool Has(size_t i)：check if vertex i is in the set.
bool Add(size_t i)：add vertex i to the set.
void Swap(ParallelBitset &other)：exchange elements with another set of ParallelBitset.

5. Vertex Array Class ParallelVector

When using TuGraph for batch operations, you need to use an array of vertices to represent the result of processing vertices. ParallelVector implements the vertex array class. The corresponding code can be found in the olap_base.h file in the lgraph folder.

5.1 ParallelVector Class

ParallelVector(size_t capacity) Builds ParallelVector. capacity is the initial size of the vertex array
T& operator[](size_t i) : data with subscript i
T \*begin() : ParallelVector the starting pointer
T \*end() : ParallelVector the end pointer to Parallelvector. begin and end are similar to the begin and end Pointers of a vector. You can use these Pointers to access an array sequentially
T&Back () : ParallelVector the last data
T \*Data() : represents the data in the array itself
`void Destroy()' : empty the ParallelVector array and delete the array
size_t Size() : indicates the number of data in ParallelVector
void Resize(size_t size) : Change ParallelVector to size, which must be greater than or equal to the size before the change
void Clear() : empty the data in ParallelVector
void ReAlloc(size_t capacity) : ParallelVector is allocated with new capacity. If data exists in the array, it is migrated to the new memory
void Fill(T elem) : Assign elem to all data on ParallelVector
void Append(const T&elem, bool atomic = true) : Add a piece of data at the end of ParallelVector
void Swap(ParallelVector<t> &other) : to exchange data with other parallelvectors
ParallelVector<t> Copy() : copies the current ParallelVector data into the Copy array

6. Customize data structures

6.1 Basic data types

We customize the data structure representation of points and edges to save memory space while covering all vertices:

'Empty' : indicates a special data type with empty content.

6.2 Combining data structures

In order to facilitate calculation, we define several data structures of point and edge data according to different calculation scenarios, which are as follows:

EdgeUnit<edgedata> : represents an edge of weight type EdgeData, used to parse the input file, and contains three member variables:
size_t src : the starting vertex of an edge
size_t dst : the end of an edge
EdgeData edge_data : edge weight
AdjUnit<edgedata> : represents an edge of weight type EdgeData, used during batch computation, and contains two member variables:
size_t neighbour : indicates the neighbor vertex of an edge
EdgeData edge_data : edge weight
AdjList<edgedata> : Adjacency list of vertices with weights of type EdgeData, often used to represent the set of incoming and outgoing edges of vertices and containing two member variables:
AdjUnit<t> \* begin : indicates the start pointer of the list
AdjUnit<t> \* end : The end pointer of the list. begin and end are similar to the begin and end Pointers of a vector. You can use these Pointers to loop through the adjacency list.

7. Graph class OlapBase

Graph class OlapBase is the main class for TuGraph to load graphs and calculate graphs. OlapBase is commonly used to represent graphs with weights of type EdgeData. See olap_base.hpp under lgraph folder for the code. This chapter covers the types and API interfaces commonly used in Graph classes. The classes used by the Procedure, Embed, and Standalone functions described above are all subclasses of this class.

7.1 Basic Information

size_t NumVertices() : obtains the number of vertices
size_t NumEdges() : Gets the number of edges
size_t OutDegree(size_t vid) : indicates the outdegree of the vid of the vertex
size_t InDegree(size_t vid) : indicates the input degree of vertex vid

ParallelVector AllocVertexArray() : Allocates an array of type VertexData with size as the number of vertices
void fill_vertex_array(V \* array, V value) : assigns values to all elements in the array
ParallelBitset AllocVertexSubset() : Assigns a subset of ParallelBitsets to denote whether the state of all vertices is activated
AdjList OutEdges(size_t vid)': gets the set of all outgoing edges of vertex v
AdjList InEdges(size_t vid) : Obtains the set of all incoming edges of vertex v
void Transpose() : transpose of a directed graph
LoadFromArray(char \* edge_array, VertexId input_vertices, EdgeId input_edges, EdgeDirectionPolicy (edge_direction_policy): Loads the graph data from the array, contains four parameters, the meaning of which are respectively:

edge_array : reads the data from the array into the graph. Normally, the array contains multiple edges.

input_vertices: specifies the number of vertices read into the graph by the array.

input_edges : specifies the number of edges that the array reads into the image.

edge_direction_policy : indicates that the graph is directed or undirected. The graph can be divided into three modes: DUAL_DIRECTION, MAKE_SYMMETRIC, and INPUT_SYMMETRIC. For details, see 'enum EdgeDirectionPolicy' in the config.h file in the core folder.

7.3 Locking mechanism

TuGraph implements a pair of locks to control the program's access to vertex data. Respectively is:

void AcquireVertexLock(size_t vid) : locks a vertex vid and prohibits other threads from accessing the vertex data corresponding to this lock
void ReleaseVertexLock(size_t vid) : unlocks the vertex vid and all threads can access the vertex data corresponding to the lock
VertexLockGuard GuardVertexLock(size_t vid) : When the vid operation is performed, the vertex vid is locked, and the lock is automatically released upon exiting the scope

7.4 Batch Processing Operations

TuGraph provides two batch operations to do a point-centered batch process in parallel. Respectively is:

/*
    Function Name:ReducedSum ProcessVertexInRange(std::function<ReducedSum(size_t)> work, size_t lower, size_t upper,
                ReducedSum zero = 0,std::function<ReducedSum(ReducedSum, ReducedSum)> reduce =reduce_plus<ReducedSum>)

    The work function executes the work function on nodes whose numbers are between lower and upper in the Graph. The fourth parameter indicates the accumulated base, which defaults to 0.The fifth parameter indicates that the iteration reduce function operation is performed on the return value of each node processed by work, and the default operation is the accumulation operation.
For details, please refer to include/lgraph/olap_base.h

    Example: Count the number of vertices in the parent array that have outgoing edges
*/

auto vertex_num = graph.ProcessVertexInRange<size_t>(
    [&](size_t i) {
        if (graph.OutDegree(parent[i]) > 0) {
            return 1;
        }
    },
    0, parent.Size()
);
printf("the number is %lu\n",vertex_num);

graph is the instantiated object of graph class OlapBase

/*
    Function Name:ReducedSum ProcessVertexActive(std::function<ReducedSum(size_t)> work, ParallelBitset &active_vertices,
                ReducedSum zero = 0,std::function<ReducedSum(ReducedSum, ReducedSum)> reduce =reduce_plus<ReducedSum>)

    Function: Execute the work function for the node that corresponds to 1 in active_vertices. The third parameter represents the cumulative cardinality, which is 0 by default.

    The fourth parameter indicates that the iteration reduce function operation is performed on the return value of each node processed by work, and the default operation is the accumulation operation.

    For specific implementation, please refer to the specific code in /include/lgraph/olap_base



    Example: Output all the out-degree neighbors of nodes 1, 2, and 3 in the Graph, and count the total out-degree of these three nodes
*/

auto active_in = graph.AllocVertexSubset();
active_in.Add(1);
active_in.Add(2);
active_in.Add(3);
auto total_outdegree = graph.ProcessVertexActive<size_t>(
    [&](size_t vi) {
        size_t local_outdegree = 0;
        for (auto & edge : graph.OutEdges(vi)) {
            size_t dst = edge.neighbour;
            printf("node %lu has neighbour %lu\n",vi,dst);
            local_outdegree += 1;
        }
        return local_outdegree;
    },
    active_in
);
printf("total outdegree of node1,2,3 is %lu\n",total_outdegree);

OlapBase API