[CALCITE-4034] Implement a MySQL InnoDB adapter #1996
Hi @neoremind, thanks for your work. I will take the time to review this PR.
@XuQianJin-Stars many thanks 😃
@XuQianJin-Stars I have rebased master, did some refinement and updated
Hi @neoremind, thanks for adding the documentation description. The whole PR looks good; I need to take a moment to look at it as a whole.
@XuQianJin-Stars No hurry, take your time, thanks very much!
84e64ca to 1fbe785
LGTM. How about adding a test for a query that is not within MySQL's SQL syntax support but is supported through this adapter?
@zinking Thanks for reviewing! Could you give me some example SQL for testing and maybe explain the meaning behind this?
hi @neoremind |
In MySQL 5.6, `COMPACT` is the default row format. Since MySQL 5.7 (including 8.0), `DYNAMIC` is the default row format. These two are the most popular row formats.

`COMPRESSED` is not supported yet. Users who care about storage size rather than CPU load might choose this format. But IMHO, most MySQL users do not specify a row format when creating a table.

The `FIXED` row format is rarely used. Refer to https://dev.mysql.com/doc/refman/5.7/en/create-table.html

> ROW_FORMAT=FIXED is not supported. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is disabled, InnoDB issues a warning and assumes ROW_FORMAT=DYNAMIC. If ROW_FORMAT=FIXED is specified while innodb_strict_mode is enabled, which is the default, InnoDB returns an error.

The `REDUNDANT` row format is a very old format from before MySQL 5.1.

As for `extra`, there is no such row format. Valid row formats are {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}.

To conclude, the adapter supports the `COMPACT` and `DYNAMIC` formats, which are the most commonly used nowadays. I can add explanations in the `Limitation` section.
Well, I suggest adding the currently supported formats to the document.
    /** Scanning table fully with secondary key. */
    SK_FULL_SCAN(5);

    private int priority;
private int priority -> private final int priority ?
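A minimal sketch of what the suggestion amounts to, assuming the field belongs to an enum of scan types. Only `SK_FULL_SCAN(5)` and the `priority` field come from the diff above; the enum name `ScanTypeSketch`, the second constant, and the accessor are hypothetical and are not taken from the PR.

```java
/** Hypothetical enum illustrating the review suggestion; not the adapter's actual class. */
enum ScanTypeSketch {
  /** Hypothetical constant, added only so the enum has more than one value. */
  PK_FULL_SCAN(4),
  /** Scanning table fully with secondary key (taken from the diff above). */
  SK_FULL_SCAN(5);

  // Enum constants are shared singletons, so their state is best kept immutable.
  private final int priority;

  ScanTypeSketch(int priority) {
    this.priority = priority;
  }

  /** Hypothetical accessor for the priority value. */
  int priority() {
    return priority;
  }
}
```

Marking the field `final` lets the compiler reject any later reassignment, which matters here because every caller shares the same enum instances.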
hi @neoremind What is the production usage scenario of this MySQL InnoDB Java Reader?
hi @neoremind Sorry I haven't finished the review yet, I will continue to take the time to complete. This PR looks pretty good overall.
@XuQianJin-Stars I have addressed the comments above. For the question: What is the production usage scenario of this MySQL InnoDB Java Reader?
The by-pass querying capability can benefit the following scenarios:
fbfa506 to b5e1622
2d24440 to 08f42f7
1ca1bde to ea2e0fc
I have refactored some of the code to use the new API (in 1.25.0) to create and parameterize the InnoDB planner rules. Please refer to https://issues.apache.org/jira/browse/CALCITE-3923.
@XuQianJin-Stars I have addressed the comments from Julian (discussion in JIRA) and made the latest code compatible with 1.25.0; the binary files are not a concern anymore. Is there any other work to be done for this PR? I'd very much like to push the work forward. Many thanks!
…r how planner rules are parameterized).
A better implementation of Sarg is possible. The current implementation can only handle Sargs that result in an AND (e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR x > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.
…able them before merge to master.
(It is not a good practice to use Optional for fields or parameters.)
1b233f9 to 43b8513
InnoDB is a storage engine for MySQL, but it can also be used as a standalone file format. This adapter adds a SQL interface to InnoDB that uses Calcite rather than MySQL.

This adapter handles Sarg by expanding to an OR of ranges. A better implementation of Sarg is probably possible. The current implementation can only handle Sargs that result in an AND (e.g. x >= 10 AND x <= 20). But we ought to handle Sargs that can result in an OR of ANDs. E.g. the SQL x BETWEEN 10 AND 20 OR x > 30 becomes a single RexCall SEARCH(x, Sarg([10, 20], (30, +inf))) and results in an OR of ANDs, '(x >= 10 AND x <= 20) OR (x > 30)'.

Tweaks (Julian Hyde):
* Add Holder.accept
* Make IndexCondition immutable
* Move computation out of InnodbFilter's constructor

Close apache#1996
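To illustrate the OR-of-ANDs expansion that the commit message describes, here is a minimal, self-contained sketch. It does not use Calcite's `Sarg` or `RexCall` classes; `SargExpansionSketch`, `Range`, and `expand` are hypothetical names, and a Sarg is modelled simply as a list of integer ranges over one column, each with at least one bound.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Sarg: a list of ranges over a single column. */
final class SargExpansionSketch {

  /** One range; a null bound means "unbounded on that side". */
  static final class Range {
    final Integer lower;
    final boolean lowerInclusive;
    final Integer upper;
    final boolean upperInclusive;

    Range(Integer lower, boolean lowerInclusive, Integer upper, boolean upperInclusive) {
      if (lower == null && upper == null) {
        // Simplification of this sketch: fully unbounded ranges are not modelled.
        throw new IllegalArgumentException("range needs at least one bound");
      }
      this.lower = lower;
      this.lowerInclusive = lowerInclusive;
      this.upper = upper;
      this.upperInclusive = upperInclusive;
    }
  }

  /** Expands the ranges into an OR of ANDs over the given column name. */
  static String expand(String column, List<Range> ranges) {
    List<String> disjuncts = new ArrayList<>();
    for (Range r : ranges) {
      List<String> conjuncts = new ArrayList<>();
      if (r.lower != null) {
        conjuncts.add(column + (r.lowerInclusive ? " >= " : " > ") + r.lower);
      }
      if (r.upper != null) {
        conjuncts.add(column + (r.upperInclusive ? " <= " : " < ") + r.upper);
      }
      // Each range becomes one parenthesized conjunction.
      disjuncts.add("(" + String.join(" AND ", conjuncts) + ")");
    }
    return String.join(" OR ", disjuncts);
  }

  public static void main(String[] args) {
    List<Range> sarg = new ArrayList<>();
    sarg.add(new Range(10, true, 20, true));      // [10, 20]
    sarg.add(new Range(30, false, null, false));  // (30, +inf)
    System.out.println(expand("x", sarg));
  }
}
```

For the two ranges `[10, 20]` and `(30, +inf)`, `expand` yields the same predicate text quoted in the commit message.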
https://issues.apache.org/jira/browse/CALCITE-4034
Calcite’s InnoDB adapter allows you to query data directly from InnoDB data files, also known as .ibd files. It leverages innodb-java-reader. This adapter is different from the JDBC adapter, which maps a schema in a JDBC data source and requires a MySQL server to serve responses. With .ibd files and the corresponding DDLs, the InnoDB adapter is able to work like a simple "MySQL server": it accepts SQL queries and attempts to compile each query against the InnoDB file-access APIs provided by innodb-java-reader. It will exploit projecting, filtering and sorting directly in the InnoDB data file where possible.
What’s more, with DDLs, the adapter is "index aware": it leverages rules to choose the right index to scan (for example, using the primary key or secondary keys to look up data), then tries to push down some conditions into the storage engine. The adapter also accepts hints, so that the user can tell the optimizer to force the use of one specific index.
The InnoDB adapter can,
A basic example of a model file is given below; this schema reads from a MySQL "scott" database:

`sqlFilePath` is a list of DDL files. You can generate table definitions by executing `mysqldump -d -u<username> -p<password> -h <hostname> <dbname>` on the command line.

The file content of `/path/scott.sql` is given below:

`ibdDataFileBasePath` is the parent file path of the `.ibd` files.

Assuming the model file is stored as `model.json`, you can connect to the InnoDB data files and run queries via sqlline as follows:

We can query all employees by writing standard SQL:
While executing this query, the InnoDB adapter scans the InnoDB data file `EMP.ibd` using the primary key, also known as the clustering B+ tree index in MySQL, and is able to push down projection to the underlying storage engine. Projection can reduce the size of data fetched from the storage engine.
We can look up one employee by filtering. The InnoDB adapter retrieves all indexes through the DDL file provided in `model.json`.

The InnoDB adapter is able to recognize that `empno` is the primary key and do a point lookup by using the clustering index instead of a full table scan.

We can do a range query on the primary key as well.

Note that such a query over a reasonably sized range is usually efficient in MySQL with the InnoDB storage engine, because in a clustering B+ tree index, records that are close in the index are also close in the data file, which is good for scanning.
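The difference between a point lookup and a range scan on the primary key can be sketched with a sorted map standing in for the clustering B+ tree. This is only an illustration of the access pattern, not the innodb-java-reader API; `ClusteredIndexSketch` and its methods are hypothetical, and the sample rows are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a clustering index keyed by empno. */
final class ClusteredIndexSketch {
  // Rows are kept in primary-key order, as in a clustering B+ tree.
  private final NavigableMap<Integer, String> rowsByPrimaryKey = new TreeMap<>();

  void insert(int empno, String ename) {
    rowsByPrimaryKey.put(empno, ename);
  }

  /** Point lookup: a single descent of the tree, no scan. */
  String pointLookup(int empno) {
    return rowsByPrimaryKey.get(empno);
  }

  /** Range scan: seek to the lower bound, then read neighbouring rows in key order. */
  List<String> rangeScan(int lowerInclusive, int upperExclusive) {
    return new ArrayList<>(
        rowsByPrimaryKey.subMap(lowerInclusive, true, upperExclusive, false).values());
  }

  public static void main(String[] args) {
    ClusteredIndexSketch emp = new ClusteredIndexSketch();
    emp.insert(7369, "SMITH");
    emp.insert(7499, "ALLEN");
    emp.insert(7566, "JONES");
    emp.insert(7788, "SCOTT");
    System.out.println(emp.pointLookup(7566));
    System.out.println(emp.rangeScan(7400, 7600));
  }
}
```

Because the rows returned by `rangeScan` are adjacent in key order, a real clustering index can serve them from neighbouring pages, which is why such scans tend to be cheap.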
We can look up an employee by secondary key. For example, the filtering condition will be on a `VARCHAR` field `ename`.

The InnoDB adapter works well on almost all the commonly used data types in MySQL. For more information on supported data types, please refer to innodb-java-reader.
We can query by composite key. For example, given the secondary index `DEPTNO_MGR_KEY`.

The InnoDB adapter will leverage the matched key `DEPTNO_MGR_KEY` to push down the filtering condition `deptno = 20 and mgr = 7566`.

In some cases, only part of the conditions can be pushed down since there is a limitation in the underlying storage engine API, leaving unpushed remainder conditions in the rest of the plan. Given the below SQL, only `deptno = 20` is pushed down.

`innodb-java-reader` only supports range queries with lower and upper bounds using an index, not full `Index Condition Pushdown (ICP)`. The storage engine returns a range of rows and Calcite evaluates the rest of the `WHERE` condition on the rows fetched.

For the below SQL, there are multiple indexes satisfying the left-prefix index rule. The possible indexes are `DEPTNO_JOB_KEY`, `DEPTNO_SAL_COMM_KEY` and `DEPTNO_MGR_KEY`. The InnoDB adapter will choose one of them according to the ordinal defined in the DDL. Only the `deptno = 20` condition is pushed down, leaving the rest of the `WHERE` condition to be handled by Calcite's built-in execution engine.

Accessing rows through a secondary key requires scanning the secondary index and then retrieving the records from the clustering index in InnoDB. For a "big" scan, that would introduce many random I/O operations, so performance is usually not good enough. Note that the query above can be more performant by using the `DEPTNO_SAL_COMM_KEY` index, because a covering index does not need to go back to the clustering index. We can force the use of the `DEPTNO_SAL_COMM_KEY` index with a hint as below.

Hints can be configured in `SqlToRelConverter`. To enable hints, you should register the `index` HintStrategy on `TableScan` in `SqlToRelConverter.ConfigBuilder`. An index hint takes effect on the base `TableScan` relational node; if there are conditions matching the index, the index condition can be pushed down as well. For the below SQL, although none of the indexes can be used for filtering, a covering index still performs better than a full table scan, so we can force the use of `DEPTNO_MGR_KEY` to scan the secondary index.

Ordering can be pushed down if it matches the natural collation of the index used.
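The index choice described above (left-prefix matching, ties broken by DDL ordinal, an optional hint that forces one index, and unpushed conditions left as a remainder) can be sketched as follows. This is a simplified model under stated assumptions, not the adapter's actual planner rule: the class and method names are hypothetical, only equality conditions are modelled, preferring the longest matched prefix is an assumption of this sketch, and the column lists of the three indexes are guessed from their names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of index selection; not the adapter's real rule. */
final class IndexChoiceSketch {

  static final class Index {
    final String name;
    final List<String> columns;

    Index(String name, String... columns) {
      this.name = name;
      this.columns = Arrays.asList(columns);
    }

    /** Number of leading index columns that have an equality condition. */
    int matchedPrefix(Set<String> equalityColumns) {
      int n = 0;
      for (String column : columns) {
        if (!equalityColumns.contains(column)) {
          break;
        }
        n++;
      }
      return n;
    }
  }

  /** Returns the hinted index if present; otherwise the longest left-prefix match,
   * with the earliest DDL ordinal winning ties. Returns null if nothing matches. */
  static Index choose(List<Index> indexesInDdlOrder, Set<String> equalityColumns,
      String hintedIndexName) {
    Index best = null;
    int bestPrefix = 0;
    for (Index index : indexesInDdlOrder) {
      if (index.name.equals(hintedIndexName)) {
        return index; // a hint overrides rule-based selection
      }
      int prefix = index.matchedPrefix(equalityColumns);
      if (prefix > bestPrefix) { // strict '>' keeps the earlier index on ties
        best = index;
        bestPrefix = prefix;
      }
    }
    return best;
  }

  /** Columns whose conditions are not pushed down and stay in the plan as a filter. */
  static List<String> remainder(Index chosen, Set<String> equalityColumns) {
    List<String> rest = new ArrayList<>(equalityColumns);
    if (chosen != null) {
      rest.removeAll(chosen.columns.subList(0, chosen.matchedPrefix(equalityColumns)));
    }
    return rest;
  }

  public static void main(String[] args) {
    List<Index> indexes = Arrays.asList(
        new Index("DEPTNO_JOB_KEY", "deptno", "job"),
        new Index("DEPTNO_SAL_COMM_KEY", "deptno", "sal", "comm"),
        new Index("DEPTNO_MGR_KEY", "deptno", "mgr"));
    Set<String> conditions = new LinkedHashSet<>(Arrays.asList("deptno", "mgr"));
    Index chosen = choose(indexes, conditions, null);
    System.out.println(chosen.name + ", remainder " + remainder(chosen, conditions));
    System.out.println(choose(indexes, conditions, "DEPTNO_SAL_COMM_KEY").name);
  }
}
```

With conditions on `deptno` and `mgr` and no hint, the sketch picks `DEPTNO_MGR_KEY` and leaves no remainder; with a condition on `deptno` alone, all three indexes tie and the first one in DDL order is used.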
Limitations
`innodb-java-reader` has some prerequisites for `.ibd` files; please refer to Prerequisites.

You can think of the adapter as a simple MySQL server, with the ability to query and dump data by offloading from the MySQL process under some conditions. If pages are not flushed from the InnoDB Buffer Pool to disk, then the result may be inconsistent (the LSN in the `.ibd` file might be smaller than that of the in-memory pages). InnoDB leverages a write-ahead log for performance, so there is no command available to flush all dirty pages. Only internal mechanisms manage when and where to persist pages to disk, like the Page Cleaner thread, adaptive flushing, etc.

Currently the InnoDB adapter is not aware of the row count and cardinality of a `.ibd` data file, so it relies only on simple rules to perform optimization. Once the underlying storage engine can provide such metrics and metadata, this can be integrated into Calcite by leveraging cost-based optimization in the future.