OLAP Performance Test Method

TPC-H introduction

The TPC Benchmark-H (TPC-H) is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance.
In the TPC-H benchmark, 22 complex SQL queries are performed on 8 tables. Most queries contain joins on several tables, subqueries, and GROUP BY clauses. For more information, visit https://www.tpc.org/tpch/.
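For reference, Q1 (the pricing summary report) is representative of the workload: it scans and aggregates most of the lineitem table. The exact query text used in this test is generated by qgen in a later step.

select
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
from lineitem
where l_shipdate <= date '1998-12-01' - interval '90' day
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;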

Test environment introduction

  • OS: CentOS 7.9
  • CPU: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 16 cores, 64 threads
  • Memory: 125 GB
  • Deployment mode of StoneDB: standalone
bin/mysqld  Ver 5.6.24-StoneDB for Linux on x86_64 (build-)
The build information is as follows:
Repository address: https://github.com/stoneatom/stonedb.git:stonedb-5.6
Branch name: stonedb-5.6
Last commit ID: 90583b2
Last commit time: Date: Wed Jul 6 23:31:30 2022 +0800
Build time: Date: Thu Jul 7 05:39:39 UTC 2022

Test scheme

1. Set up the test environment

For information about how to set up the test environment, see Quick Deployment.

2. Compile and deploy TPC-H

  1. Download the TPC-H installation package and upload it to the test machine.

For example, upload it to the /data folder.

unzip tpc-h-tool.zip
mv TPC-H_Tools_v3.0.0/ tpc-h/
cd /data/tpc-h/dbgen/
# Install GCC and MAKE.
yum install gcc make -y
  2. Modify the makefile as shown in the following code.
cp makefile.suite makefile
vim makefile
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= MYSQL
MACHINE = LINUX
WORKLOAD = TPCH

This modification is mandatory because TPC-H does not natively support MySQL.
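If you prefer not to edit the file interactively, the same changes can be applied with sed. The following is a sketch that assumes the variable lines exist in makefile.suite in the default layout shown above:

# Apply the makefile changes non-interactively.
sed -i -e 's/^CC *=.*/CC      = gcc/' \
       -e 's/^DATABASE *=.*/DATABASE = MYSQL/' \
       -e 's/^MACHINE *=.*/MACHINE = LINUX/' \
       -e 's/^WORKLOAD *=.*/WORKLOAD = TPCH/' makefile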

  3. Modify file tpcd.h to add the database type MySQL to TPC-H.

vim tpcd.h

#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif

# Run the "make" command.
make
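After make completes, the dbgen (data generator) and qgen (query generator) binaries should be present in the current directory. A quick check:

ls -l dbgen qgen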

3. Use TPC-H to generate 100 GB test data

For example, run the following command to use TPC-H to generate .tbl data files.

./dbgen -s 100

# Copy the data files to the "stonedb" folder.
mkdir /data/tpc-h/stonedb/
mv *.tbl /data/tpc-h/stonedb/

-s n in the command specifies the scale factor, which approximately equals the size of the generated data in GB.
After the test files are generated, you can run the head command to check that the fields in each row of the .tbl data files are separated by vertical bars (|).
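For example, the following check is a sketch; the exact row content may differ:

head -n 1 /data/tpc-h/stonedb/nation.tbl
# Example output:
# 0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|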

info

If this is not the first time you use TPC-H to generate data in this environment, we recommend that you first run make clean and then make, and then run the command with the -f option specified to overwrite the previously generated data.
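For example, assuming you are still in the dbgen directory:

# Rebuild, then regenerate the data, overwriting existing .tbl files.
make clean && make
./dbgen -s 100 -f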

4. Modify the dss.ddl and dss.ri files

The dss.ddl file contains the statements for creating tables. The dss.ri file contains the statements for creating indexes and foreign keys.

Because the syntax for creating table schemas varies with the storage engine, the statements for creating table schemas and indexes must be modified. The following code provides an example.

To modify the statements, copy the file that stores schemas and indexes to the stonedb folder.

[root@htap2 dbgen]# cp dss.ddl /data/tpc-h/stonedb/dss_stonedb.ddl			# Schemas used for StoneDB tables
-- sccsid:     @(#)dss.ddl	2.1.8.1
create table nation ( n_nationkey integer not null,
n_name char(25) not null,
n_regionkey integer not null,
n_comment varchar(152),primary key (n_nationkey))engine=StoneDB;

create table region ( r_regionkey integer not null,
r_name char(25) not null,
r_comment varchar(152),primary key (r_regionkey))engine=StoneDB;

create table part ( p_partkey integer not null,
p_name varchar(55) not null,
p_mfgr char(25) not null,
p_brand char(10) not null,
p_type varchar(25) not null,
p_size integer not null,
p_container char(10) not null,
p_retailprice decimal(15,2) not null,
p_comment varchar(23) not null,primary key (p_partkey) )engine=StoneDB;

create table supplier ( s_suppkey integer not null,
s_name char(25) not null,
s_address varchar(40) not null,
s_nationkey integer not null,
s_phone char(15) not null,
s_acctbal decimal(15,2) not null,
s_comment varchar(101) not null,primary key (s_suppkey))engine=StoneDB;

create table partsupp ( ps_partkey integer not null,
ps_suppkey integer not null,
ps_availqty integer not null,
ps_supplycost decimal(15,2) not null,
ps_comment varchar(199) not null,primary key (ps_partkey,ps_suppkey) )engine=StoneDB;

create table customer ( c_custkey integer not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey integer not null,
c_phone char(15) not null,
c_acctbal decimal(15,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null,primary key (c_custkey))engine=StoneDB;

create table orders ( o_orderkey integer not null,
o_custkey integer not null,
o_orderstatus char(1) not null,
o_totalprice decimal(15,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority integer not null,
o_comment varchar(79) not null,primary key (o_orderkey))engine=StoneDB;

create table lineitem ( l_orderkey integer not null,
l_partkey integer not null,
l_suppkey integer not null,
l_linenumber integer not null,
l_quantity decimal(15,2) not null,
l_extendedprice decimal(15,2) not null,
l_discount decimal(15,2) not null,
l_tax decimal(15,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null,primary key (l_orderkey,l_linenumber))engine=StoneDB;


5. Import table schemas and data

  1. Import table schemas.
mysql> create database tpch;
mysql> use tpch;
mysql> source /data/tpc-h/stonedb/dss_stonedb.ddl # Modify the path to the actual location of the DDL file.
  2. Import data.

You can directly import tables part, region, nation, customer, and supplier. For tables lineitem, orders, and partsupp, we recommend that you use a script such as split_file2db.sh to split them before the import.

# Import data to StoneDB.
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpch -e "load data local infile '/data/tpc-h/stonedb/part.tbl' into table part fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpch -e "load data local infile '/data/tpc-h/stonedb/region.tbl' into table region fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpch -e "load data local infile '/data/tpc-h/stonedb/nation.tbl' into table nation fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpch -e "load data local infile '/data/tpc-h/stonedb/customer.tbl' into table customer fields terminated by '|';"
mysql -uroot -pxxxx -hxx.xx.xx.xx -P3306 --local-infile -Dtpch -e "load data local infile '/data/tpc-h/stonedb/supplier.tbl' into table supplier fields terminated by '|';"
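If the LOAD DATA commands fail because local data loading is disabled on the server, you may need to enable it first (this requires sufficient privileges):

mysql> SET GLOBAL local_infile = 1;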

The following is an example of such a script:

#! /bin/bash

shopt -s expand_aliases
source ~/.bash_profile

# Path that stores the .tbl files.
sql_path=/data/tpc-h/stonedb/
# Files to split.
split_tb=(lineitem orders partsupp)
# split_tb=(customer nation supplier part region)

# The interval (number of rows) for splitting.
line=1000000

# Database configuration.
db_host=192.168.30.102
db_port=3306
db_user=ztpch
db_pwd=******
db=ztpch

# Split each large .tbl file and import the chunks one by one.
function split_file()
{
    for tb_name in ${split_tb[@]}
    do
        echo "$tb_name"
        # Obtain the number of rows in the file before it is split.
        totalline=$(cat $sql_path/$tb_name.tbl | wc -l)
        a=`expr $totalline / $line`
        b=`expr $totalline % $line`
        if [[ $b -eq 0 ]]; then
            filenum=$a
        else
            filenum=`expr $a + 1`
        fi
        echo "File $tb_name has $totalline rows of data and needs to be split into $filenum files."

        # Split the file.
        i=1
        while (( i<=$filenum ))
        do
            echo "File to split: $tb_name.tbl.$i"
            # The split range must fall within [min, max] of the original file.
            p=`expr $i - 1`
            min=`expr $p \* $line + 1`
            max=`expr $i \* $line`
            sed -n "$min,$max"p $sql_path/$tb_name.tbl > $sql_path/$tb_name.tbl.$i
            filename=$sql_path/$tb_name.tbl.$i
            echo "$tb_name.tbl.$i is split. File name: $filename"
            # Import data to StoneDB.
            mysql -u$db_user -p$db_pwd -h$db_host -P$db_port --local-infile -D$db -e "load data local infile '$filename' into table $tb_name fields terminated by '|';" > /dev/null 2>&1
            i=`expr $i + 1`
        done
    done
}

split_file
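Assuming the script is saved as split_file2db.sh, run it as follows:

chmod +x split_file2db.sh
./split_file2db.sh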

6. Generate test SQL statements

Copy the qgen and dists.dss files to the queries directory, and then copy that directory to /data/tpc-h/stonedb/.

cp /data/tpc-h/dbgen/qgen /data/tpc-h/dbgen/queries
cp /data/tpc-h/dbgen/dists.dss /data/tpc-h/dbgen/queries
# Copy the directory to path "/data/tpc-h/stonedb".
cp -a /data/tpc-h/dbgen/queries /data/tpc-h/stonedb/queries

Create the following tpch_querys.sh script in the queries directory:

#!/usr/bin/bash
# Run "chmod +x tpch_querys.sh" first.
# Usage: ./tpch_querys.sh <prefix>
db_type=$1
for i in {1..22}
do
    ./qgen -d $i -s 100 > $db_type"$i".sql
done

# Execute the script to generate statements.
./tpch_querys.sh query
mv query*.sql /data/tpc-h/stonedb/queries
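As a quick sanity check, 22 query files should now be present:

ls /data/tpc-h/stonedb/queries/query*.sql | wc -l
# Expected output: 22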

7. Modify the SQL statements and start the test

Test script:

#!/bin/bash

# StoneDB connection configuration.
host=192.168.30.120
port=3306
user=root
password=********
database=ztpch


# The absolute path. The following is for reference only.
banchdir=/data/tpc-h/stonedb/queries
db_type=stonedb # Options: clickhouse, stonedb, mysql
resfile=$banchdir/"TPCH_${db_type}_`date "+%Y%m%d%H%M%S"`"


echo "start test run at "`date "+%Y-%m-%d %H:%M:%S"`|tee -a ${resfile}.out
echo "Path to log: ${resfile}"
for (( i=1; i<=22; i=i+1 ))
do
    queryfile=${db_type}${i}".sql"

    echo "run query ${i}"|tee -a ${resfile}.out
    echo " $database $banchdir/query$i.sql " #|tee -a ${resfile}.out
    start_time=`date "+%s.%N"`
    # ClickHouse:
    #clickhouse-client --port $port --user $user --password $password --host $host -d $database < $banchdir/query$i.sql |tee -a ${resfile}.out
    # StoneDB and MySQL:
    mysql -u$user -p$password -h$host -P$port $database -e "source $banchdir/query$i.sql" 2>&1 |tee -a ${resfile}.out

    end_time=`date "+%s.%N"`
    start_s=${start_time%.*}
    start_nanos=${start_time#*.}
    end_s=${end_time%.*}
    end_nanos=${end_time#*.}
    # Borrow one second if the end time has the smaller nanosecond part.
    if [ "$end_nanos" -lt "$start_nanos" ];then
        end_s=$(( 10#$end_s - 1 ))
        end_nanos=$(( 10#$end_nanos + 10 ** 9 ))
    fi
    time=$(( 10#$end_s - 10#$start_s )).`printf "%03d\n" $(( (10#$end_nanos - 10#$start_nanos)/10**6 ))`
    echo ${queryfile} "the $i run cost "${time}" second start at "`date -d @$start_time "+%Y-%m-%d %H:%M:%S"`" stop at "`date -d @$end_time "+%Y-%m-%d %H:%M:%S"` >> ${resfile}.time
    # systemctl stop clickhouse-server
done
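Assuming the script is saved as, for example, tpch_test.sh (an illustrative name), a full 100 GB run can take a long time, so it may be preferable to run it in the background:

chmod +x tpch_test.sh
nohup ./tpch_test.sh > tpch_test.log 2>&1 &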

Statements before the modification

q1.sql to q22.sql (the 22 query files generated in the previous step)

Statements after the modification

To ensure the repeatability of this test, we recommend that you use the modified statements provided in the following package.

q.zip

8. Execute the TPC-H script to obtain the test result

The .out file stores the test results. The .time file records the execution time of each query.

ll /data/tpc-h/stonedb/queries/stonedb
-rw-r--r-- 1 root root 15019 Jun 1 00:55 TPCH_stone_20220531233024.out
-rw-r--r-- 1 root root 2179 Jun 1 00:57 TPCH_stone_20220531233024.time
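The field positions in the .time file follow the echo format in the test script, so a per-query summary can be extracted, for example, with awk:

# Print each query file name and its elapsed time in seconds.
awk '{print $1, $6}' TPCH_stone_20220531233024.time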