Interactive Engine Performance

Here is a performance report from a run of the included benchmark program.

Experimental Setup

Hardware Configurations:

  • cluster: an 8-node cluster
  • memory: 755GB
  • network: 50Gbps
  • cpu: two 26-core Intel(R) Xeon(R) Platinum 8269CY CPUs at 2.50GHz

Datasets: We generate a large LDBC data set with scale factor 30 using the LDBC SNB Data Generator. The generated data set, which contains 89 million vertices and 541 million edges, is used in the following experiments.

Queries: For comparison, we consider graph queries from the Social Network Benchmark defined by LDBC, which models industrial use cases on a social network akin to Facebook. We choose 10 out of the 14 complex read queries (denoted CR-1...14) from LDBC's Interactive Workload. (The remaining queries are either too simple, such as point-lookup queries, or rely on user-defined logic (such as CR-4, 10, 13, 14), which is not supported by other popular TinkerPop-based systems.)
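To give a flavor of what these complex reads look like on a TinkerPop-based engine, here is a minimal sketch that submits a CR-1-like traversal through the Gremlin Java driver. It is not one of the benchmark's actual query templates: the endpoint, traversal shape, and property names are assumptions based on the LDBC SNB schema, while the person id and first name are taken from the sample report further below.

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

public class ComplexReadSketch {
    public static void main(String[] args) {
        // Assumed endpoint; replace with the Gremlin endpoint exposed by the engine.
        Cluster cluster = Cluster.build("localhost").port(8182).create();
        Client client = cluster.connect();
        // CR-1-like traversal (sketch): persons named "John" within three "knows" hops
        // of a start person, returning a few profile properties.
        String cr1Like = "g.V().has('person', 'id', 17592186223433L)"
                + ".repeat(both('knows').simplePath()).emit().times(3).dedup()"
                + ".has('firstName', 'John').limit(20)"
                + ".valueMap('id', 'firstName', 'lastName')";
        client.submit(cr1Like).all().join()
                .forEach(r -> System.out.println(r.getObject()));
        cluster.close();
    }
}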

Single-node performance

JanusGraph cannot process queries in parallel, so we run GraphScope on a single machine for a fair comparison.

[Figure compare-JanusGraph: single-machine query latency of GraphScope vs. JanusGraph]

As shown above, GraphScope achieves orders-of-magnitude better performance than JanusGraph in most cases, and JanusGraph fails to answer several queries (CR-3, 5, 9) because they time out.

Scalability

We further study the scalability of GraphScope by adding more computing nodes.

[Figures scale-out-large and scale-out-small: query latency of large and small queries as computing nodes are added]

As shown in the figures above, we analyze the results for two groups of queries, split by their runtime: large queries (CR-3, 5, 6, 9) and small queries (CR-1, 2, 7, 8, 11, 12).

  • For large queries, which traverse a large amount of data, performance scales well, with up to a 5x gain from 1 node to 8 nodes.

  • For small queries, which only touch a small sub-graph and are not computation-intensive, we do not expect performance to improve much by adding more machines; even so, CR-1, 2, and 12 still run consistently faster.

As we can see, GraphScope scales large, complex queries almost linearly while keeping the performance of small queries stable.

Benchmark Tool Usage

This directory contains a tool that can be used to reproduce this benchmark. It acts as multiple clients that send queries to the Gremlin server through the Gremlin endpoint exposed by the engine, and it reports the performance numbers (e.g., latency, throughput, query results). The benchmark program sends mixed queries to the server by reading query templates from queries and filling in their parameters from substitution_parameters. The program iterates over all enabled queries and their parameters in a round-robin fashion, as sketched below.
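As a rough illustration of that loop (not the benchmark's actual source, which lives under src), the following sketch round-robins over two hypothetical query templates and fills in their parameters before submitting them through the Gremlin Java driver. The endpoint, file names, and the $placeholder substitution syntax are assumptions; the parameter values are taken from the sample report further below.

import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint; in the real tool this comes from interactive-benchmark.properties.
        Cluster cluster = Cluster.build("localhost").port(8182).create();
        Client client = cluster.connect();

        // One template and one parameter set per enabled query (hypothetical file names).
        List<String> templates = List.of(
                Files.readString(Paths.get("queries/interactive-complex-1.gremlin")),
                Files.readString(Paths.get("queries/interactive-complex-2.gremlin")));
        List<Map<String, String>> params = List.of(
                Map.of("personId", "17592186223433", "firstName", "John"),
                Map.of("personId", "28587302394490", "maxDate", "20121128080000000"));

        int total = 10; // total number of queries to send
        for (int i = 0; i < total; i++) {
            int idx = i % templates.size(); // round-robin over the enabled queries
            String query = templates.get(idx);
            for (Map.Entry<String, String> e : params.get(idx).entrySet()) {
                query = query.replace("$" + e.getKey(), e.getValue()); // assumed placeholder syntax
            }
            long start = System.currentTimeMillis();
            int count = client.submit(query).all().join().size();
            System.out.printf("QueryName[query-%d], ResultCount[%d], ExecuteTimeMS[ %d ]%n",
                    idx + 1, count, System.currentTimeMillis() - start);
        }
        cluster.close();
    }
}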

Repository contents

- config                                
    - interactive-benchmark.properties  // configurations for running the benchmark
- data
    - substitution_parameters           // query parameter files used to fill the query templates
- queries                               // query templates including LDBC queries, K-hop queries and user-defined queries
- shell
    - benchmark.sh                      // script for running the benchmark
- src                                   // source code of the benchmark program

Note: the queries here with the prefix interactive-complex are implementations of the official LDBC interactive complex reads, and the corresponding parameters (for scale factor 30) are generated by the official LDBC tools.

Building

Build the benchmark program using Maven:

mvn clean package

All binaries and queries are packed into target/benchmark-0.0.1-SNAPSHOT-dist.tar.gz, and you can deploy the package to any machine that can connect to the Gremlin endpoint.

Running the benchmark

tar -xvf maxgraph-benchmark-0.0.1-SNAPSHOT-dist.tar.gz
cd maxgraph-benchmark-0.0.1-SNAPSHOT
vim conf/interactive-benchmark.properties # specify the gremlin endpoint of your server and modify running configurations
./shell/benchmark.sh                      # run the benchmark program

The benchmark reports numbers as follows:

QueryName[LDBC_QUERY_1], Parameter[{firstName=John, personId=17592186223433}], ResultCount[87], ExecuteTimeMS[ 1266 ].
QueryName[LDBC_QUERY_12], Parameter[{tagClassName=Judge, personId=19791209469071}], ResultCount[0], ExecuteTimeMS[ 259 ].
QueryName[LDBC_QUERY_11], Parameter[{workFromYear=2001, personId=32985348901156, countryName=Bolivia}], ResultCount[0], ExecuteTimeMS[ 60 ].
QueryName[LDBC_QUERY_9], Parameter[{personId=10995116420051, maxDate=20121128080000000}], ResultCount[20], ExecuteTimeMS[ 55755 ].
QueryName[LDBC_QUERY_8], Parameter[{personId=67523}], ResultCount[20], ExecuteTimeMS[ 148 ].
QueryName[LDBC_QUERY_7], Parameter[{personId=26388279199350}], ResultCount[0], ExecuteTimeMS[ 10 ].
QueryName[LDBC_QUERY_6], Parameter[{personId=26388279148519, tagName=Vallabhbhai_Patel}], ResultCount[0], ExecuteTimeMS[ 12837 ].
QueryName[LDBC_QUERY_5], Parameter[{minDate=20120814080000000, personId=2199023436754}], ResultCount[0], ExecuteTimeMS[ 11268 ].
QueryName[LDBC_QUERY_3], Parameter[{durationDays=30, endDate=20110701080000000, countryXName=Mongolia, countryYName=Namibia, personId=8796093204429, startDate=20110601080000000}], ResultCount[20], ExecuteTimeMS[ 21474 ].
QueryName[LDBC_QUERY_2], Parameter[{personId=28587302394490, maxDate=20121128080000000}], ResultCount[20], ExecuteTimeMS[ 331 ].
query count: 10; execute time(ms): ...; qps: ...
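When you later accumulate statistics across runs (step 5 of the reproduction guide below), a small parser over these lines is often enough. The following is one possible sketch, assuming the report was captured to a file named benchmark.log; the qps it derives is only a rough single-client estimate, and the tool itself already prints the totals on its final line.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReportAggregator {
    public static void main(String[] args) throws Exception {
        // Matches lines such as: QueryName[LDBC_QUERY_1], ..., ExecuteTimeMS[ 1266 ].
        Pattern line = Pattern.compile("QueryName\\[(.+?)\\].*ExecuteTimeMS\\[\\s*(\\d+)\\s*\\]");
        long totalMs = 0;
        int queries = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader("benchmark.log"))) { // assumed file name
            String l;
            while ((l = reader.readLine()) != null) {
                Matcher m = line.matcher(l);
                if (m.find()) {
                    totalMs += Long.parseLong(m.group(2));
                    queries++;
                }
            }
        }
        if (queries > 0) {
            System.out.printf("query count: %d; execute time(ms): %d; qps: %.2f%n",
                    queries, totalMs, queries * 1000.0 / totalMs);
        }
    }
}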

Reproduce this Performance Report

  1. Generate the LDBC data using the official tool, with the scale factor set to 30 (i.e., change ldbc.snb.datagen.generator.scaleFactor:snb.interactive.1 to snb.interactive.30);

After generation, you may need to adjust the data sets with the following two steps:

  • enter the social_network directory and copy all the csv files from dynamic and static into one directory;
  • convert the date format to yyyymmddhhmmssmmm (e.g., 2012-11-28T08:00:00.000+0000 becomes 20121128080000000) like this:
 sed -i "s#|\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\)T\([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\)\.\([0-9][0-9][0-9]\)+0000#|\1\2\3\4\5\6\7#g" *.csv
  2. Load the LDBC data into the GraphScope system (refer to the loading doc and the LDBC loading scripts);

  3. Specify the benchmark configuration (gremlin endpoint, enabled queries, ...);

  4. Copy the generated substitution parameters to the directory ./data/substitution_parameters;

  5. Run the benchmark and accumulate the statistics (see the section Benchmark Tool Usage).