
[CALCITE-4034] Implement a MySQL InnoDB adapter #1996


Closed
wants to merge 16 commits

Conversation

neoremind
Contributor

@neoremind commented May 31, 2020

https://issues.apache.org/jira/browse/CALCITE-4034

Calcite’s InnoDB adapter allows you to query data directly from InnoDB data files, also known as .ibd files. It leverages innodb-java-reader. This adapter is different from the JDBC adapter, which maps a schema in a JDBC data source and requires a running MySQL server to serve responses. Given the .ibd files and the corresponding DDL, the InnoDB adapter can work like a simple "MySQL server": it accepts SQL queries and compiles them against the InnoDB file-access APIs provided by innodb-java-reader, exploiting projection, filtering and sorting directly in the InnoDB data files where possible.

What’s more, because it has the DDL, the adapter is "index aware": it leverages rules to choose the right index to scan, for example using the primary key or a secondary key to look up data, and then tries to push some conditions down into the storage engine. The adapter also leaves the option of providing a hint, so that the user can tell the optimizer to force the use of one specific index.

The InnoDB adapter can:

  1. Perform a full table scan.
  2. Scan by index, such as the primary key or a secondary key, on a single column or on composite keys.
  3. Push down filtering conditions to the storage engine, supporting both point queries and range queries.
  4. Push down projection.
  5. Push down ordering if it matches the natural collation of an index.
  6. Support almost all the commonly used data types.
  7. Work as a simple MySQL server that reads .ibd data files directly through the file system; under some conditions, this can serve as a tool to query or dump table data without going through the MySQL process.

A basic example of a model file is given below; this schema reads from a MySQL "scott" database:

{
  "version": "1.0",
  "defaultSchema": "scott",
  "schemas": [
    {
      "name": "scott",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.innodb.InnodbSchemaFactory",
      "operand": {
        "sqlFilePath": [ "/path/scott.sql" ],
        "ibdDataFileBasePath": "/usr/local/mysql/data/scott"
      }
    }
  ]
}

sqlFilePath is a list of DDL files; you can generate the table definitions by executing mysqldump -d -u<username> -p<password> -h <hostname> <dbname> on the command line.

The file content of /path/scott.sql is given below:

CREATE TABLE `DEPT`(
`DEPTNO` TINYINT NOT NULL,
`DNAME` VARCHAR(50) NOT NULL,
`LOC` VARCHAR(20),
UNIQUE KEY `DEPT_PK` (`DEPTNO`)
)ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `EMP`(
`EMPNO` INT(11) NOT NULL,
`ENAME` VARCHAR(100) NOT NULL,
`JOB` VARCHAR(15) NOT NULL,
`AGE` SMALLINT,
`MGR` BIGINT,
`HIREDATE` DATE,
`SAL` DECIMAL(8,2) NOT NULL,
`COMM` DECIMAL(6,2),
`DEPTNO` TINYINT,
`EMAIL` VARCHAR(100) DEFAULT NULL,
`CREATE_DATETIME` DATETIME,
`CREATE_TIME` TIME,
`UPSERT_TIME` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`EMPNO`),
KEY `ENAME_KEY` (`ENAME`),
KEY `HIREDATE_KEY` (`HIREDATE`),
KEY `CREATE_DATETIME_JOB_KEY` (`CREATE_DATETIME`, `JOB`),
KEY `CREATE_TIME_KEY` (`CREATE_TIME`),
KEY `UPSERT_TIME_KEY` (`UPSERT_TIME`),
KEY `DEPTNO_JOB_KEY` (`DEPTNO`, `JOB`),
KEY `DEPTNO_SAL_COMM_KEY` (`DEPTNO`, `SAL`, `COMM`),
KEY `DEPTNO_MGR_KEY` (`DEPTNO`, `MGR`),
KEY `AGE_KEY` (`AGE`)
)ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

ibdDataFileBasePath is the parent directory of the .ibd files.

Assuming the model file is stored as model.json, you can connect to the InnoDB data files and run queries via sqlline as follows:

sqlline> !connect jdbc:calcite:model=model.json admin admin
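
Besides sqlline, the same model file can be used from Java code through Calcite's standard JDBC driver. The following is a minimal sketch; the class name, the model file path and the query are placeholders for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class InnodbAdapterJdbcExample {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    info.setProperty("model", "model.json"); // path to the model file shown above
    try (Connection connection =
             DriverManager.getConnection("jdbc:calcite:", info);
         Statement statement = connection.createStatement();
         ResultSet resultSet =
             statement.executeQuery("select empno, ename from \"EMP\"")) {
      while (resultSet.next()) {
        // Print EMPNO and ENAME for each row returned by the adapter.
        System.out.println(resultSet.getInt(1) + " " + resultSet.getString(2));
      }
    }
  }
}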

We can query all employees by writing standard SQL:

sqlline> select empno,ename,job,age,mgr from "EMP";
+-------+--------+-----------+-----+------+
| EMPNO | ENAME  |    JOB    | AGE | MGR  |
+-------+--------+-----------+-----+------+
| 7369  | SMITH  | CLERK     | 30  | 7902 |
| 7499  | ALLEN  | SALESMAN  | 24  | 7698 |
| 7521  | WARD   | SALESMAN  | 41  | 7698 |
| 7566  | JONES  | MANAGER   | 28  | 7839 |
| 7654  | MARTIN | SALESMAN  | 27  | 7698 |
| 7698  | BLAKE  | MANAGER   | 38  | 7839 |
| 7782  | CLARK  | MANAGER   | 32  | 7839 |
| 7788  | SCOTT  | ANALYST   | 45  | 7566 |
| 7839  | KING   | PRESIDENT | 22  | null |
| 7844  | TURNER | SALESMAN  | 54  | 7698 |
| 7876  | ADAMS  | CLERK     | 35  | 7788 |
| 7900  | JAMES  | CLERK     | 40  | 7698 |
| 7902  | FORD   | ANALYST   | 28  | 7566 |
| 7934  | MILLER | CLERK     | 32  | 7782 |
+-------+--------+-----------+-----+------+

While executing this query, the InnoDB adapter scans the InnoDB data file EMP.ibd using the primary key, also known as the clustered B+ tree index in MySQL, and is able to
push down the projection to the underlying storage engine. Projection pushdown reduces the amount of data fetched from the storage engine.

We can look up one employee by filtering. The InnoDB adapter discovers all the indexes from the DDL file provided in model.json.

sqlline> select empno,ename,job,age,mgr from "EMP" where empno = 7782;
+-------+-------+---------+-----+------+
| EMPNO | ENAME |   JOB   | AGE | MGR  |
+-------+-------+---------+-----+------+
| 7782  | CLARK | MANAGER | 32  | 7839 |
+-------+-------+---------+-----+------+

The InnoDB adapter recognizes that empno is the primary key and performs a point lookup using the clustered index instead of a full table scan.

We can do a range query on the primary key as well.

sqlline> select empno,ename,job,age,mgr from "EMP" where empno > 7782 and empno < 7900;

Note that a query over a reasonably bounded range like this is usually efficient in MySQL with the InnoDB storage engine, because in the clustered B+ tree index, records that are close in the index are close in the data file as well, which is good for scanning.

We can look up an employee by a secondary key as well. In the following example, the filtering condition is on ename, a VARCHAR field.

sqlline> select empno,ename,job,age,mgr from "EMP" where ename = 'smith';
+-------+-------+-------+-----+------+
| EMPNO | ENAME |  JOB  | AGE | MGR  |
+-------+-------+-------+-----+------+
| 7369  | SMITH | CLERK | 30  | 7902 |
+-------+-------+-------+-----+------+

The InnoDB adapter works on almost all the commonly used data types in MySQL; for more information on supported data types, please refer to innodb-java-reader.

We can query by a composite key, for example using the secondary index DEPTNO_MGR_KEY.

sqlline> select empno,ename,job,age,mgr from "EMP" where deptno = 20 and mgr = 7566;
+-------+-------+---------+-----+------+
| EMPNO | ENAME |   JOB   | AGE | MGR  |
+-------+-------+---------+-----+------+
| 7788  | SCOTT | ANALYST | 45  | 7566 |
| 7902  | FORD  | ANALYST | 28  | 7566 |
+-------+-------+---------+-----+------+

The InnoDB adapter leverages the matching index DEPTNO_MGR_KEY to push down the filtering condition deptno = 20 and mgr = 7566.

In some cases, only part of the conditions can be pushed down because of a limitation in the underlying storage engine API, leaving the remaining conditions to be evaluated in the rest of the plan. For the SQL below, only deptno = 20 is pushed down.

select empno,ename,job,age,mgr from "EMP" where deptno = 20 and upsert_time > '2018-01-01 00:00:00';

innodb-java-reader only supports range queries with a lower and an upper bound on an index, not full Index Condition Pushdown (ICP). The storage engine returns a range of rows, and Calcite evaluates the rest of the WHERE condition on the rows fetched.

For the SQL below, multiple indexes satisfy the left-prefix rule: the candidates are DEPTNO_JOB_KEY, DEPTNO_SAL_COMM_KEY and DEPTNO_MGR_KEY. The InnoDB adapter chooses one of them according to the ordinal defined in the DDL; only the deptno = 20 condition is pushed down, and the rest of the WHERE condition is handled by Calcite's built-in execution engine.

sqlline> select empno,deptno,sal from "EMP" where deptno = 20 and sal > 2000;
+-------+--------+---------+
| EMPNO | DEPTNO |   SAL   |
+-------+--------+---------+
| 7788  | 20     | 3000.00 |
| 7902  | 20     | 3000.00 |
| 7566  | 20     | 2975.00 |
+-------+--------+---------+

Accessing rows through a secondary key in InnoDB requires scanning the secondary index and then looking the records up in the clustered index; for a "big" scan, this introduces many random I/O operations, so performance is usually not good. Note that the query above can be more performant by using the DEPTNO_SAL_COMM_KEY index, because a covering index does not need to look back into the clustered index. We can force the use of the DEPTNO_SAL_COMM_KEY index with a hint, as below.

sqlline> select empno,deptno,sal from "EMP"/*+ index(DEPTNO_SAL_COMM_KEY) */ where deptno = 20 and sal > 2000;

Hints are configured in SqlToRelConverter; to enable them, you should register the index HintStrategy on TableScan in SqlToRelConverter.ConfigBuilder. An index hint takes effect on the base TableScan relational node; if there are conditions matching the index, the index condition can be pushed down as well. For the SQL below, none of the indexes can be used for the condition, but by leveraging a covering index the performance is better than a full table scan, so we can force a scan of the DEPTNO_MGR_KEY secondary index. A registration sketch follows the example query.

sqlline> select empno,mgr from "EMP"/*+ index(DEPTNO_MGR_KEY) */ where mgr = 7839;
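
The snippet below is a rough sketch of that hint registration. It assumes Calcite's hint API (HintStrategyTable, HintPredicates) as it exists around release 1.25/1.26; the exact SqlToRelConverter config entry point (ConfigBuilder versus the immutable Config) varies between versions, so treat the method names as assumptions rather than the adapter's actual wiring.

import org.apache.calcite.rel.hint.HintPredicates;
import org.apache.calcite.rel.hint.HintStrategyTable;
import org.apache.calcite.sql2rel.SqlToRelConverter;

public class IndexHintConfig {
  /** Builds a SqlToRelConverter config that recognizes the "index" hint on TableScan. */
  public static SqlToRelConverter.Config create() {
    // Register the "index" hint so that /*+ index(...) */ is propagated to TableScan.
    HintStrategyTable hintStrategies = HintStrategyTable.builder()
        .hintStrategy("index", HintPredicates.TABLE_SCAN)
        .build();
    // On older releases this would be SqlToRelConverter.configBuilder()
    //     .withHintStrategyTable(hintStrategies).build();
    return SqlToRelConverter.config()
        .withHintStrategyTable(hintStrategies);
  }
}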

Ordering can be pushed down if it matches the natural collation of the index used.

sqlline> select deptno,ename,hiredate from "EMP" where hiredate < '2020-01-01' order by hiredate desc;
+--------+--------+------------+
| DEPTNO | ENAME  |  HIREDATE  |
+--------+--------+------------+
| 20     | ADAMS  | 1987-05-23 |
| 20     | SCOTT  | 1987-04-19 |
| 10     | MILLER | 1982-01-23 |
| 20     | FORD   | 1981-12-03 |
| 30     | JAMES  | 1981-12-03 |
| 10     | KING   | 1981-11-17 |
| 30     | MARTIN | 1981-09-28 |
| 30     | TURNER | 1981-09-08 |
| 10     | CLARK  | 1981-06-09 |
| 30     | WARD   | 1981-02-22 |
| 30     | ALLEN  | 1981-02-20 |
| 20     | JONES  | 1981-02-04 |
| 30     | BLAKE  | 1981-01-05 |
| 20     | SMITH  | 1980-12-17 |
+--------+--------+------------+

Limitations

innodb-java-reader has some prerequisites for .ibd files; please refer to its Prerequisites section.

You can think of the adapter as a simple MySQL server that can query and dump data without going through the MySQL process, under some conditions. If pages have not been flushed from the InnoDB buffer pool to disk, the result may be inconsistent (the LSN in the .ibd file might be smaller than that of the in-memory pages). InnoDB uses write-ahead logging for performance, so there is no command to flush all dirty pages; only internal mechanisms, such as the page cleaner thread and adaptive flushing, decide when and where to persist pages to disk.

Currently the InnoDB adapter is not aware of the row count and cardinality of a .ibd data file, so it relies only on simple rules to perform optimization. Once the underlying storage engine can provide such metrics and metadata, they can be integrated into Calcite's cost-based optimization in the future.

@XuQianJin-Stars
Contributor

Hi @neoremind, thanks for your work. I will take the time to review this PR.

@neoremind
Contributor Author

@XuQianJin-Stars many thanks 😃

@neoremind
Contributor Author

@XuQianJin-Stars I have rebased onto master, done some refinement and updated innodb_adapter.md. For how the adapter works, please refer to innodb_adapter.md; I think that doc explains most of the design and use cases.

@XuQianJin-Stars
Contributor


Hi @neoremind, thanks for adding the documentation. The whole PR looks good; I need to take a moment to look at it as a whole.

@neoremind
Contributor Author

@XuQianJin-Stars No hurry, take your time, thanks very much!

@neoremind force-pushed the innodb branch 2 times, most recently from 84e64ca to 1fbe785 on June 27, 2020 09:35
@zinking
Contributor

zinking commented Jun 29, 2020

LGTM. How about adding a test for a query that isn't within MySQL's SQL syntax support but is supported through this adapter?

@neoremind
Contributor Author

@zinking Thanks for reviewing! Could you give me some testing SQL examples and maybe explain the meaning behind this?

@XuQianJin-Stars
Contributor

Hi @neoremind,
Have you considered that row_format has some other values:
default, fixed, dynamic, compressed, extra, compact

@neoremind
Contributor Author

In MySQL 5.6, COMPACT is the default row format. Since MySQL 5.7 (including 8.0), DYNAMIC is the default row format. These two are the most popular row formats.

COMPRESSED is not supported yet. Users who care about storage size rather than CPU load might choose this format, but IMHO most MySQL users do not specify a row format when creating a table.

The FIXED row format is rarely used. Refer to https://dev.mysql.com/doc/refman/5.7/en/create-table.html:

ROW_FORMAT=FIXED is not supported. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is disabled, InnoDB issues a warning and assumes ROW_FORMAT=DYNAMIC. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is enabled, which is the default, InnoDB returns an error.

The REDUNDANT row format is a very old format from before MySQL 5.1.

There is no row format named "extra". Valid row formats are {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}.

To conclude, the adapter supports the COMPACT and DYNAMIC formats, which are the most commonly used nowadays. I can add an explanation in the Limitations section.

Contributor

@XuQianJin-Stars left a comment


Well, I suggest adding the currently supported formats to the document.

/** Scanning table fully with secondary key. */
SK_FULL_SCAN(5);

private int priority;
Contributor


private int priority -> private final int priority ?

@XuQianJin-Stars
Contributor

Hi @neoremind, what is the production usage scenario of this MySQL InnoDB Java Reader?

@XuQianJin-Stars
Contributor

Hi @neoremind, sorry I haven't finished the review yet; I will continue to take the time to complete it. This PR looks pretty good overall.

@neoremind
Contributor Author

@XuQianJin-Stars I have addressed the comments above.

For the question: What is the production usage scenario of this MySQL InnoDB Java Reader?
Calcite's InnoDB adapter allows users to query InnoDB data based on .ibd files directly, as illustrated below. innodb-java-reader provides some basic storage APIs, for example queryByPrimaryKey, queryAll and queryBySk. With this adapter, users can query through SQL, and the internal embedded execution engine works well in most cases.

                      SQL Query 
                       |     |
                      /       \
                ------         ------
               |                     | 
               v                     v
      MySQL Server                    
+-----------------------+   +----------------------+        
| SQL/Parser/Optimizer..|   |Calcite innodb adapter|   
+-----------------------+   +----------------------+
+-----------------------+    +--------------------+         
| InnoDB Storage Engine |    | innodb-java-reader |
+-----------------------+    +--------------------+

---------------------File System--------------------

	+------------+      +-----+
	| .ibd files | ...  |     |    InnoDB Data files
	+------------+      +-----+                         
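
For reference, the sketch below shows how innodb-java-reader's storage API can be used on its own, without Calcite. The class names and the open()/queryAll() calls follow the innodb-java-reader README as far as I recall and may differ between versions, so treat them as assumptions; the DDL and the file path are placeholders.

import java.util.Arrays;
import java.util.List;

import com.alibaba.innodb.java.reader.TableReader;
import com.alibaba.innodb.java.reader.TableReaderImpl;
import com.alibaba.innodb.java.reader.page.index.GenericRecord;

public class IbdFileScanExample {
  public static void main(String[] args) throws Exception {
    // DDL of the table stored in the .ibd file (placeholder).
    String createTableSql = "CREATE TABLE `EMP`(`EMPNO` INT(11) NOT NULL, "
        + "`ENAME` VARCHAR(100) NOT NULL, PRIMARY KEY (`EMPNO`)) ENGINE=InnoDB";
    String ibdFilePath = "/usr/local/mysql/data/scott/EMP.ibd";

    // queryAll() scans the clustered index; queryByPrimaryKey and queryBySk
    // are the point/range lookup counterparts mentioned above.
    try (TableReader reader = new TableReaderImpl(ibdFilePath, createTableSql)) {
      reader.open();
      List<GenericRecord> records = reader.queryAll();
      for (GenericRecord record : records) {
        System.out.println(Arrays.toString(record.getValues()));
      }
    }
  }
}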

The bypass querying capability can benefit the following scenarios:

  1. Query while offloading the mysql-server process, or even without mysql-server running. Back up MySQL without disturbing the mysql-server process; this is a good alternative for dumping data besides mysqldump and the mysql -e command.
  2. InnoDB data files can be distributed to other storage. Based on this adapter, users can transform the data however they want, for example transforming it to a columnar format (ORC, Parquet) for OLAP engines, or dumping it to heterogeneous NewSQL systems, so that data lake analytics can leverage it. Although real time is hard to achieve because we only get a snapshot of the table, this is still useful in a Lambda architecture.

@neoremind force-pushed the innodb branch 2 times, most recently from fbfa506 to b5e1622 on July 13, 2020 08:20
@neoremind force-pushed the innodb branch 2 times, most recently from 2d24440 to 08f42f7 on July 21, 2020 02:54
@neoremind force-pushed the innodb branch 2 times, most recently from 1ca1bde to ea2e0fc on July 27, 2020 08:49
@neoremind
Contributor Author

I have refactored some of the code to use the new API (in 1.25.0) to create and parameterize the InnoDB planner rules. Please refer to https://issues.apache.org/jira/browse/CALCITE-3923.

@neoremind
Contributor Author

@XuQianJin-Stars I have addressed the comments from Julian (discussion in JIRA) and made the latest code compatible with 1.25.0; the binary files are not a concern anymore. Is there any other work to be done for this PR? I'd very much like to push the work forward. Many thanks!

julianhyde and others added 12 commits September 28, 2020 08:34
Add Holder.accept
A better implementation of Sarg is possible. The current
implementation can only handle Sargs that result in an AND,
e.g. 'x >= 10 AND x <= 20'. But we ought to handle Sargs that
can result in an OR of ANDs. E.g. the SQL
  x BETWEEN 10 AND 20 OR c > 30
becomes a single RexCall
   SEARCH(x, Sarg([10, 20], (30, +inf)))
and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.
(It is not a good practice to use Optional for fields or parameters.)
@neoremind force-pushed the innodb branch 2 times, most recently from 1b233f9 to 43b8513 on October 10, 2020 08:29
julianhyde pushed a commit to julianhyde/calcite that referenced this pull request Oct 11, 2020
InnoDB is a storage engine for MySQL, but it can also be used
as a standalone file format. This adapter adds a SQL interface
to InnoDB that uses Calcite rather than MySQL.

This adapter handles Sarg by expanding to an OR of ranges. A
better implementation of Sarg is probably possible. The
current implementation can only handle Sargs that result in
an AND, e.g. 'x >= 10 AND x <= 20'. But we ought to handle
Sargs that can result in an OR of ANDs. E.g. the SQL
  x BETWEEN 10 AND 20 OR c > 30
becomes a single RexCall
   SEARCH(x, Sarg([10, 20], (30, +inf)))
and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.

Tweaks (Julian Hyde):
* Add Holder.accept
* Make IndexCondition immutable
* Move computation out of InnodbFilter's constructor

Close apache#1996
MalteBellmann pushed a commit to caespes/calcite that referenced this pull request Feb 21, 2021
XuQianJin-Stars pushed a commit to XuQianJin-Stars/calcite that referenced this pull request Jul 14, 2021