TuGraph DataX Instructions|V3.3.3|OceanBase TuGraph| docs|Distributed Database

This document mainly introduces the installation, compilation and usage examples of TuGraph DataX

Introduction

On the basis of Ali's open source DataX, TuGraph implements the support of writing plug-ins and jsonline data format, and other data sources can write data into TuGraph through DataX. DataX introduces reference [https://github.com/alibaba/DataX] (https://github.com/alibaba/DataX) Supported features include:

Import TuGraph from various heterogeneous data sources such as MySQL, SQL Server,Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, ODPS, Kafka and so on.
Import TuGraph to the corresponding target source (to be developed)

Compile and Install

git clone git@code.alipay.com:fma/DataX.git
mvn -U clean package assembly:assembly -Dmaven.test.skip=true

The compiled DataX file is in the target directory

Text data imported into TuGraph with DataX

Using the data from the lgraph_import section of the TuGraph manual as an example, we have three csv data files, as follows: actors.csv


nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang

movies.csv


tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3

roles.csv


nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660

Then create three DataX job profiles: job_actors.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["actors.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "actor",
                "type": "VERTEX",
                "properties": [
                  { "name": "aid", "type": "STRING" },
                  { "name": "name", "type": "STRING" }
                ],
                "primary": "aid"
              }
            ],
            "files": [
              {
                "label": "actor",
                "format": "JSON",
                "columns": ["aid", "name"]
              }
            ]
          }
        }
      }
    ]
  }
}

job_movies.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["movies.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              },
              {
                "index": 3,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}

job_roles.json

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["roles.csv"],
            "encoding": "UTF-8",
            "column": [
              {
                "index": 0,
                "type": "string"
              },
              {
                "index": 1,
                "type": "string"
              },
              {
                "index": 2,
                "type": "string"
              }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "play_in",
                "type": "EDGE",
                "properties": [{ "name": "role", "type": "STRING" }]
              }
            ],
            "files": [
              {
                "label": "play_in",
                "format": "JSON",
                "SRC_ID": "actor",
                "DST_ID": "movie",
                "columns": ["SRC_ID", "role", "DST_ID"]
              }
            ]
          }
        }
      }
    ]
  }
}

/lgraph_server -c lgraph_standalone.json -d 'run' 'Start TuGraph and run the following commands in sequence:

python3 datax/bin/datax.py  job_actors.json

python3 datax/bin/datax.py  job_movies.json

python3 datax/bin/datax.py  job_roles.json

MySQL's data imported into TuGraph with DataX

We create the following table of movies under 'test' database

CREATE TABLE `movies` (
  `mid`  varchar(200) NOT NULL,
  `name` varchar(100) NOT NULL,
  `year` int(11) NOT NULL,
  `rate` float(5,2) unsigned NOT NULL,
  PRIMARY KEY (`mid`)
);

Insert some data into the table

insert into
test.movies (mid, name, year, rate)
values
('tt0188766', 'King of Comedy', 1999, 7.3),
('tt0286112', 'Shaolin Soccer', 2001, 7.3),
('tt4701660', 'The Mermaid',   2016,  6.3);

Create a DataX job configuration file

job_mysql_to_tugraph.json

Configuring Field

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["mid", "name", "year", "rate"],
            "splitPk": "mid",
            "connection": [
              {
                "table": ["movies"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}

Write simple sql

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "querySql": [
                  "select mid, name, year, rate from test.movies where year > 2000;"
                ],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "host": "127.0.0.1",
            "port": 7071,
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "schema": [
              {
                "label": "movie",
                "type": "VERTEX",
                "properties": [
                  { "name": "mid", "type": "STRING" },
                  { "name": "name", "type": "STRING" },
                  { "name": "year", "type": "STRING" },
                  { "name": "rate", "type": "FLOAT", "optional": true }
                ],
                "primary": "mid"
              }
            ],
            "files": [
              {
                "label": "movie",
                "format": "JSON",
                "columns": ["mid", "name", "year", "rate"]
              }
            ]
          }
        }
      }
    ]
  }
}

./lgraph_server -c lgraph_standalone.json -d 'run' Start TuGraph and run the following command：

python3 datax/bin/datax.py  job_mysql_to_tugraph.json