This document describes how to compile and install TuGraph DataX and gives usage examples.
Introduction
TuGraph DataX is built on Alibaba's open-source DataX. It adds a TuGraph writer plugin and support for the jsonline data format, so that data from other sources can be written into TuGraph through DataX. For an introduction to DataX, see https://github.com/alibaba/DataX. Supported features include:
Importing data into TuGraph from various heterogeneous data sources such as MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, ODPS, Kafka and so on.
Exporting data from TuGraph to the corresponding target data source (to be developed).
Compile and Install
git clone git@code.alipay.com:fma/DataX.git
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
The compiled DataX files are placed in the target directory.
Importing text data into TuGraph with DataX
Using the data from the lgraph_import section of the TuGraph manual as an example, we have three CSV data files, as follows:
actors.csv
nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang
movies.csv
tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3
roles.csv
nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
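Each row of roles.csv links an actor ID (first column) to a movie ID (third column). Before importing, it can be useful to confirm that every such ID actually appears in actors.csv and movies.csv. The following is a minimal sketch of that check using the sample data above; the helper names read_rows and find_dangling_roles are hypothetical and are not part of DataX or TuGraph.

```python
import csv
import io

# Sample data copied from the three CSV files above.
ACTORS_CSV = """nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang
"""
MOVIES_CSV = """tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3
"""
ROLES_CSV = """nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
"""


def read_rows(text):
    """Parse comma-separated text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def find_dangling_roles(actors, movies, roles):
    """Return the role rows whose actor ID or movie ID has no matching row."""
    actor_ids = {row[0] for row in actors}
    movie_ids = {row[0] for row in movies}
    return [r for r in roles if r[0] not in actor_ids or r[2] not in movie_ids]


actors = read_rows(ACTORS_CSV)
movies = read_rows(MOVIES_CSV)
roles = read_rows(ROLES_CSV)
dangling = find_dangling_roles(actors, movies, roles)
print(f"{len(roles)} roles checked, {len(dangling)} dangling")
```

Note that the third row of roles.csv has an empty role name; the check above only looks at the two ID columns, so an empty role is not reported.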
Then create three DataX job configuration files:
job_actors.json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"path": ["actors.csv"],
"encoding": "UTF-8",
"column": [
{
"index": 0,
"type": "string"
},
{
"index": 1,
"type": "string"
}
],
"fieldDelimiter": ","
}
},
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "actor",
"type": "VERTEX",
"properties": [
{ "name": "aid", "type": "STRING" },
{ "name": "name", "type": "STRING" }
],
"primary": "aid"
}
],
"files": [
{
"label": "actor",
"format": "JSON",
"columns": ["aid", "name"]
}
]
}
}
}
]
}
}
job_movies.json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"path": ["movies.csv"],
"encoding": "UTF-8",
"column": [
{
"index": 0,
"type": "string"
},
{
"index": 1,
"type": "string"
},
{
"index": 2,
"type": "string"
},
{
"index": 3,
"type": "string"
}
],
"fieldDelimiter": ","
}
},
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
}
}
}
]
}
}
job_roles.json
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "txtfilereader",
"parameter": {
"path": ["roles.csv"],
"encoding": "UTF-8",
"column": [
{
"index": 0,
"type": "string"
},
{
"index": 1,
"type": "string"
},
{
"index": 2,
"type": "string"
}
],
"fieldDelimiter": ","
}
},
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "play_in",
"type": "EDGE",
"properties": [{ "name": "role", "type": "STRING" }]
}
],
"files": [
{
"label": "play_in",
"format": "JSON",
"SRC_ID": "actor",
"DST_ID": "movie",
"columns": ["SRC_ID", "role", "DST_ID"]
}
]
}
}
}
]
}
}
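The two vertex jobs above (job_actors.json and job_movies.json) share the same skeleton and differ only in the CSV path, the label, the property list and the primary key. As an illustration, the sketch below generates that skeleton from a few arguments. The function build_vertex_job is hypothetical; the field names and default values are copied from the job files above, and nothing here is an official DataX or TuGraph API.

```python
import json


def build_vertex_job(csv_path, label, properties, primary,
                     host="127.0.0.1", port=7071,
                     username="admin", password="73@TuGraph",
                     graph_name="default"):
    """Build a txtfilereader -> tugraphwriter job for one vertex label.

    `properties` lists the property definitions in CSV column order,
    for example {"name": "aid", "type": "STRING"}.
    """
    names = [p["name"] for p in properties]
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},
            "content": [{
                "reader": {
                    "name": "txtfilereader",
                    "parameter": {
                        "path": [csv_path],
                        "encoding": "UTF-8",
                        # One reader column per CSV field, all read as strings.
                        "column": [{"index": i, "type": "string"}
                                   for i in range(len(names))],
                        "fieldDelimiter": ",",
                    },
                },
                "writer": {
                    "name": "tugraphwriter",
                    "parameter": {
                        "host": host,
                        "port": port,
                        "username": username,
                        "password": password,
                        "graphName": graph_name,
                        "schema": [{
                            "label": label,
                            "type": "VERTEX",
                            "properties": properties,
                            "primary": primary,
                        }],
                        "files": [{
                            "label": label,
                            "format": "JSON",
                            "columns": names,
                        }],
                    },
                },
            }],
        }
    }


actor_job = build_vertex_job(
    "actors.csv", "actor",
    [{"name": "aid", "type": "STRING"}, {"name": "name", "type": "STRING"}],
    primary="aid",
)
movie_job = build_vertex_job(
    "movies.csv", "movie",
    [{"name": "mid", "type": "STRING"},
     {"name": "name", "type": "STRING"},
     {"name": "year", "type": "STRING"},
     {"name": "rate", "type": "FLOAT", "optional": True}],
    primary="mid",
)
print(json.dumps(actor_job, indent=2))
```

To save a generated job, the dictionary can be written with json.dump to a file such as job_actors.json. The edge job (job_roles.json) has a different shape in its schema and files entries, so this sketch does not cover it.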
Start TuGraph with ./lgraph_server -c lgraph_standalone.json -d 'run', then run the following commands in sequence:
python3 datax/bin/datax.py job_actors.json
python3 datax/bin/datax.py job_movies.json
python3 datax/bin/datax.py job_roles.json
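The three commands above can also be driven from a small script. Because the play_in edges refer to actor and movie vertices, it seems sensible to stop as soon as one job fails rather than continue with the next one. The sketch below is one way to do that; build_commands and run_in_sequence are hypothetical helper names, and the script path datax/bin/datax.py is taken from the commands above.

```python
import subprocess
import sys

# Job files in the order used above: vertices first, then edges.
JOBS = ["job_actors.json", "job_movies.json", "job_roles.json"]


def build_commands(jobs, datax_script="datax/bin/datax.py", python=sys.executable):
    """Return one argument list per job, in the given order."""
    return [[python, datax_script, job] for job in jobs]


def run_in_sequence(commands, runner=subprocess.run):
    """Run the commands in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"failed: {' '.join(cmd)} (exit code {result.returncode})")
            break
        completed.append(cmd)
    return completed


# Dry run: only print the commands that would be executed.
for cmd in build_commands(JOBS):
    print(" ".join(cmd))
```

Calling run_in_sequence(build_commands(JOBS)) executes the jobs for real. The runner parameter exists only so that the function can be exercised without DataX installed.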
Importing MySQL data into TuGraph with DataX
We create the following movies table in the test database:
CREATE TABLE `movies` (
`mid` varchar(200) NOT NULL,
`name` varchar(100) NOT NULL,
`year` int(11) NOT NULL,
`rate` float(5,2) unsigned NOT NULL,
PRIMARY KEY (`mid`)
);
Insert some data into the table:
insert into
test.movies (mid, name, year, rate)
values
('tt0188766', 'King of Comedy', 1999, 7.3),
('tt0286112', 'Shaolin Soccer', 2001, 7.3),
('tt4701660', 'The Mermaid', 2016, 6.3);
Create a DataX job configuration file named job_mysql_to_tugraph.json. The reader can be configured in either of two ways.
Configuring fields (table and column list)
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "root",
"column": ["mid", "name", "year", "rate"],
"splitPk": "mid",
"connection": [
{
"table": ["movies"],
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
}
]
}
},
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
}
}
}
]
}
}
Writing a simple SQL query (querySql)
{
"job": {
"setting": {
"speed": {
"channel": 1
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "root",
"password": "root",
"connection": [
{
"querySql": [
"select mid, name, year, rate from test.movies where year > 2000;"
],
"jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
}
]
}
},
"writer": {
"name": "tugraphwriter",
"parameter": {
"host": "127.0.0.1",
"port": 7071,
"username": "admin",
"password": "73@TuGraph",
"graphName": "default",
"schema": [
{
"label": "movie",
"type": "VERTEX",
"properties": [
{ "name": "mid", "type": "STRING" },
{ "name": "name", "type": "STRING" },
{ "name": "year", "type": "STRING" },
{ "name": "rate", "type": "FLOAT", "optional": true }
],
"primary": "mid"
}
],
"files": [
{
"label": "movie",
"format": "JSON",
"columns": ["mid", "name", "year", "rate"]
}
]
}
}
}
]
}
}
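The two configurations above differ only in the reader's parameter block: the first names a table, a column list and a splitPk, while the second supplies a querySql statement. The sketch below builds either variant from keyword arguments, which makes the difference explicit. The function mysql_reader_parameter is hypothetical, and treating the two styles as mutually exclusive is an assumption made for this illustration, not a rule taken from this document.

```python
def mysql_reader_parameter(jdbc_url, username, password, *,
                           table=None, columns=None, split_pk=None,
                           query_sql=None):
    """Build the "parameter" object of a mysqlreader in one of two styles.

    Pass table and columns (optionally split_pk) for the field style,
    or query_sql for the SQL style.
    """
    if (table is None) == (query_sql is None):
        raise ValueError("pass either table/columns or query_sql, but not both")

    parameter = {"username": username, "password": password}
    connection = {"jdbcUrl": [jdbc_url]}

    if query_sql is not None:
        # SQL style: the query selects and filters the rows itself.
        connection["querySql"] = [query_sql]
    else:
        # Field style: name the table and the columns to read.
        if not columns:
            raise ValueError("columns is required when table is given")
        parameter["column"] = list(columns)
        if split_pk:
            parameter["splitPk"] = split_pk
        connection["table"] = [table]

    parameter["connection"] = [connection]
    return parameter


URL = "jdbc:mysql://127.0.0.1:3306/test?useSSL=false"

by_fields = mysql_reader_parameter(
    URL, "root", "root",
    table="movies", columns=["mid", "name", "year", "rate"], split_pk="mid",
)
by_sql = mysql_reader_parameter(
    URL, "root", "root",
    query_sql="select mid, name, year, rate from test.movies where year > 2000;",
)
print(sorted(by_fields))
print(sorted(by_sql))
```

With the sample rows inserted earlier, the query in the SQL style keeps only the movies whose year is greater than 2000, so 'King of Comedy' (1999) is not selected.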
Start TuGraph with ./lgraph_server -c lgraph_standalone.json -d 'run', then run the following command:
python3 datax/bin/datax.py job_mysql_to_tugraph.json