Bootstrap program|V3.3.3|OceanBase TuGraph| docs|Distributed Database

This document is a bootstrap program designed for TuGraph users. Before reading the detailed documents, users should read this document first to have a general understanding of TuGraph's graph computing operation process, and it will be more convenient to read the detailed documents later. The bootstrap program is a simple instance of a bfs (breadth-first search) program based on Tugraph, and we will focus on how it is used.

1. Introduction to TuGraph-Graph Analysis Engine

TuGraph's graph analysis engine, oriented to the scene is mainly full graph/full data analysis class tasks. With the help of TuGraph's C++ graph analysis engine API, users can quickly derive a complex subgraph from different data sources, and then run iterative graph algorithms such as PageRank, LPA, WCC on the subgraph, and finally make corresponding countermeasures according to the running results.

In TuGraph, the process of both export and computation can be accelerated by parallel processing in memory, so as to achieve near-real-time processing and analysis. Compared with traditional methods, the overhead of data export and disk drop can be avoided, and the ideal performance of computation can be obtained by using compact graph data structure.

TuGraph has 6 built-in algorithms in the community version and 25 built-in algorithms in the commercial version. Users hardly need to implement the specific graph calculation process themselves. For details, please refer to algorithms.md.

According to different data sources and implementations, it can be divided into Procedure, Embed and Standalone, which all inherit from OlapBase API. OlapBase API interface documentation can be found in OlAPBase-APi.md.

The data source of Procedure and Embed is the db data preloaded in the graph database, which can be compiled to generate the.so file used by TuGraph-Web loading and the embed file used by the background terminal respectively. The input graph data are loaded in the form of db. The interface document can refer to olapondb-api.md.

Standalone is used to compile and generate a standalone file. Different from the former, the input graph data of the file is loaded in the form of txt, binary, and ODPS files, and the interface document can refer to olapondisk-api.md.

2. Procedure compile and run

This mode is mainly used for visual loading and running on the TuGraph-web interface. The usage method is as follows:

Run bash make_so.sh bfs in the TuGraph/plugins directory to the BFs.so file in the TuGraph/plugins directory, upload the file as a plug-in to Tugraph-web, input parameters and then execute.

Example: Compile the.so algorithm file at TuGraph/plugins bash make_so.sh bfs

After loading the bfs.so file as a plug-in to TuGraph-web, enter the following json parameters:

{
  "root_id": "0",
  "label": "node",
  "field": "id"
}

The following result

result:"{"core_cost":0.013641119003295898, "found_vertices":3829, "num_edges":88234, "num_vertices":4039, "output_cost":8.821487426757813e-06, "prepare_cost":0.03479194641113281, "total_cost":0.04844188690185547}"

The output ：

num_edges: the number of edges in the graph
num_vertices: the number of nodes in the graph
prepare_cost: Represents the time required for the preprocessing phase. The preprocessing phase works: loading parameters, graph data loading, index initialization.
core_cost: Represents the time required for the algorithm to run.
found_vertices: Number of vertices found.
output_cost: the time it takes for the algorithm result to be written back to db.。
total_cost: The overall running time of the algorithm is executed.

make_so.sh This file is used to compile the graph algorithm files involved in TuGraph-OLAP into a.so file available for TuGraph-web.

3. Embed compile and run

This way is mainly used for TuGraph in the background program on the preloaded db graph data algorithm analysis. Its use method is as follows:

To complete the embed_main.cpp file in TuGraph/plugins directory, add the data name, input parameters, data path and other information, as shown in the following example:

#include <iostream>
#include "lgraph/lgraph.h"
#include "lgraph/olap_base.h"
using namespace std;

extern "C" bool Process(lgraph_api::GraphDB &db, const std::string &request, std::string &response);

int main(int argc, char **argv) {
    // db_path Specifies the path for saving the preloaded graph data
    std::string db_path = "../fb_db/";
    if (argc > 1)
        db_path = argv[1];
    lgraph_api::Galaxy g(db_path);
    g.SetCurrentUser("admin", "73@TuGraph");
    // Specifies the name of the graph
    lgraph_api::GraphDB db = g.OpenGraph("fb_db");
    std::string resp;
    // Enter the algorithm parameters in json format
    bool r = Process(db, "{\"root_id\":\"0\", \"label\":\"node\",\"field\":\"id\"}", resp);
    cout << r << endl;
    cout << resp << endl;
    return 0;
}

Run bash make_so.sh bfs in TuGraph/plugins to bfs_procedure in TuGraph/plugins/cpp. bash make_embed.sh bfs

Do this in the TuGraph/plugins folder ./cpp/bfs_procedure The result is returned.

Input: {"root_id":"0", "label":"node","field":"id"} found_vertices = 3829 {"core_cost":0.025603055953979492, "found_vertices":3829, "num_edges":88234, "num_vertices":4039, "output_cost":9.059906005859375e-06, "prepare_cost":0.056738853454589844, "total_cost":0.0823509693145752}

Input indicates the input parameters.The other parameters are described as above.

4. Standalone compile and run

This file is mainly used to load the graph data directly at the terminal and run the printout results. Here's how to use it:

In TuGraph/build directory make bfs_standalone can get bfs_standalone file, the file is generated with TuGraph/build/output/algo folder.

Run: Go to the TuGraph/build directory and do./output/algo/bfs_standalone -- type [type] -- input_dir [input_dir] -- vertices [vertices] --root (root) - output_dir [output_dir]

Ready to run.

[type] : indicates the type source of the input graph file, including text file, BINARY_FILE binary file, and ODPS source.
[input_dir] : indicates the path of the input graph file folder, which may contain one or more input files. TuGraph will read all the files under [input_dir] when reading the input file. It is required that [input_dir] can only contain the input file and cannot contain other files. Parameters cannot be omitted.
[vertices] : represents the number of vertices in the graph. 0 indicates the number of vertices that the user wants the system to automatically recognize. When is non-zero, it indicates the number of vertices that the user wants to customize. The number of user-defined vertices must be greater than the maximum vertex ID. This parameter can be omitted. The default value is 0.
[root] : indicates the starting vertex id for bfs. The argument cannot be omitted.
[output_dir] : indicates the path of the folder where the output data is saved. The output content is saved to this file, and the parameter cannot be omitted.

Example Compile the standalone algorithm program at TuGraph/build

make bfs_standalone

Run the text source file in TuGraph/build/output

    ./output/algo/bfs_standalone --type text --input_dir ../test/integration/data/algo/fb_unweighted --root 0

Result：

prepare_cost = 0.10(s)
core_cost = 0.02(s)
found_vertices = 3829
output_cost = 0.00(s)
total_cost = 0.11(s)
DONE.

Result Parameter description is the same as above.

If you do not know the required parameters of a new algorithm, run the./output/algo/bfs_standalone --type odps/text/binary (select one) -h to view the parameters.

At this point, the process of bfs operation on the figure above by TuGraph is complete.