Re-implement compression dictionaries in 8.0

Description

High Level Description

5.7 stores compression dictionaries as InnoDB dictionary tables. In 8.0, we can no longer have dictionary tables in InnoDB. The aim of this task is to provide support for storing compression dictionaries in 8.0 and to migrate the compression dictionary data from 5.7 to 8.0 server tables (upgrade).

High Level Specification

The main objective of this task is to port the dictionary-based compressed columns feature to 8.0. Key objectives are:

  1. To have a fully functional implementation of the feature

  2. Port the SQL syntax from 5.7

  3. New APIs to interact with InnoDB (required for upgrade only)

  4. Port the dictionary tables (i.e. InnoDB SYS_ZIP_* tables to generic SYSTEM tables in 8.0)

  5. Port the I_S views from 5.7

  6. Changes required to handle DDLs with compressed columns

  7. Handle the upgrade of tables with compressed columns from 5.7 to 8.0

Key points identified from the design (w.r.t. the new DD only) in 8.0:

  1. Persistent storage for compression dictionaries

  2. Data dictionary changes

  3. TABLE, TABLE_SHARE & handler/DD API changes

  4. Field class changes

  5. DDL Changes

  6. SELECT and DML changes

  7. Concurrency

  8. I_S views

  9. Upgrade from 5.7 to 8.0 DD tables

Persistent storage for compression dictionaries

Create equivalents of the SYS_ZIP_DICT & SYS_ZIP_DICT_COLS tables in the DD. They shouldn't be core DD tables, but SYSTEM tables (like mysql.func, mysql.plugin, etc.).

 

During bootstrap, create the tables below in the mysql database.

For datadirs created by Oracle MySQL, how will Percona Server create these two system tables? (TBD)

  • One idea could be to modify the startup process to check for these tables and create them if they don't exist.

 

CREATE TABLE compression_dictionary(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(64) NOT NULL,
  version INT UNSIGNED NOT NULL,
  data BLOB NOT NULL,
  KEY name_idx(name)
) ENGINE=InnoDB CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
  STATS_PERSISTENT=0 TABLESPACE=mysql;

 

CREATE TABLE compression_dictionary_cols(
  table_id INT UNSIGNED NOT NULL,
  column_id INT UNSIGNED NOT NULL,
  dict_id INT UNSIGNED NOT NULL,
  PRIMARY KEY(table_id, column_id),
  CONSTRAINT fk_dict FOREIGN KEY(dict_id)
    REFERENCES compression_dictionary(id)
    ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
  STATS_PERSISTENT=0 TABLESPACE=mysql;

The tables in 8.0 will not belong to the InnoDB SE (i.e. they are not DDSE_PRIVATE or DDSE_PROTECTED tables), which allows other engines to utilize them. The FK relationship between the two tables simplifies handling the deletion of a dictionary entry that still has references from existing tables (the delete is simply rejected).

Data dictionary changes:

  1. "COLUMN_FORMAT" is stored by the DD as dd::Column::options (column_format=X). The enum type will be extended with COMPRESSED.

  2. The dict_string="name" will be stored in dd::Column::options.
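A minimal sketch of how these options could be written and read through the standard dd::Properties interface; the key name "zip_dict_name" is an assumption (the design above only fixes dict_string="name"):

// Sketch: storing/reading the dictionary name in dd::Column::options.
// The option key "zip_dict_name" is assumed, not final.
#include "sql/dd/properties.h"
#include "sql/dd/types/column.h"

void set_dict_option(dd::Column *col, const dd::String_type &dict_name) {
  col->options().set("zip_dict_name", dict_name);  // persisted with the DD trx
}

bool get_dict_option(const dd::Column &col, dd::String_type *dict_name) {
  if (!col.options().exists("zip_dict_name")) return false;
  return !col.options().get("zip_dict_name", dict_name);  // get() is true on error
}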

TABLE, TABLE_SHARE & handler API changes

When a table is opened, the TABLE and TABLE_SHARE objects are created for it. When creating a column object from dd::Column, the extra options have to be parsed to find the dict_string.

To find the dict_data for a given dict_string (see the sketch after this list):

 

  1. Handle column creation to parse the extra options/se_private_data

  2. Find the dict_string

  3. Use handler APIs to get the compression dictionary data for the given dict_string

  4. Take an MDL on the compression_dictionary table

  5. Create a search KEY and use the handler index_read() API to get the dict_data

  6. Store the dict_string data in the TABLE_SHARE/TABLE*
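A minimal sketch of steps 3-5, using the standard handler interface; the helper name fetch_dict_data and the field/index positions (which follow the compression_dictionary definition above) are assumptions:

// Sketch only: fetch the dictionary data by name from
// mysql.compression_dictionary, to be cached in the TABLE_SHARE afterwards.
// Assumes 'tbl' is an opened TABLE for compression_dictionary and that a
// transaction-scoped MDL on it is already held (see CONCURRENCY below).
#include <cstring>
#include "sql/handler.h"
#include "sql/key.h"    // key_copy()
#include "sql/table.h"

static bool fetch_dict_data(TABLE *tbl, const char *dict_name,
                            String *dict_data) {
  uchar key_buf[MAX_KEY_LENGTH];

  // Build the search key on the name_idx secondary index (index no. 1).
  tbl->field[1]->store(dict_name, strlen(dict_name), system_charset_info);
  key_copy(key_buf, tbl->record[0], tbl->key_info + 1,
           tbl->key_info[1].key_length);

  if (tbl->file->ha_index_init(1, true)) return true;
  int err = tbl->file->ha_index_read_map(tbl->record[0], key_buf,
                                         HA_WHOLE_KEY, HA_READ_KEY_EXACT);
  if (err == 0) tbl->field[3]->val_str(dict_data);  // the 'data' BLOB column
  tbl->file->ha_index_end();
  return err != 0;
}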

Field class changes

Once the TABLE/TABLE_SHARE* has the information, it should be easy to transfer it to the Field class. (The Field class should be extended with new members, just like in 5.7; see the sketch below.)
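A sketch of the possible Field extension, mirroring the 5.7 members; the member names are assumptions, and only the new pieces of the class are shown:

// Sketch: new members on the server Field class (names assumed, modeled on
// the 5.7 implementation); shown in isolation, not the full class.
class Field {
  // ... existing Field members ...
 public:
  // True when the column was declared with a compression dictionary.
  bool has_associated_compression_dictionary() const {
    return zip_dict_name.str != nullptr;
  }
  LEX_CSTRING zip_dict_name{nullptr, 0};  // parsed from dd::Column::options
  LEX_CSTRING zip_dict_data{nullptr, 0};  // loaded from compression_dictionary
};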

DDL Changes

CREATE DICTIONARY name WITH STRING (see the sketch after this list):

  1. Open the SYSTEM table compression_dictionary

  2. Take the appropriate MDL lock (see the CONCURRENCY section below)

  3. Prepare the row to be inserted

  4. Use the table->file->write_row() interface to write to the table
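A minimal sketch of steps 3-4; the field positions follow the compression_dictionary definition above, 'name'/'data' stand for the strings from the parsed statement, and the external 8.0 handler entry point is ha_write_row():

// Sketch: insert the new dictionary row; 'tbl' is the opened
// compression_dictionary TABLE.
restore_record(tbl, s->default_values);  // start from column defaults
tbl->field[1]->store(name.str, name.length, system_charset_info);  // name
tbl->field[2]->store(1ULL, true);                                  // version
tbl->field[3]->store(data.str, data.length, &my_charset_bin);      // data
int err = tbl->file->ha_write_row(tbl->record[0]);  // id is auto-generated
if (err) tbl->file->print_error(err, MYF(0));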

CREATE TABLE WITH COLUMN REFERRING DICTIONARY NAME

  1. In-memory dd::Column should have the fields zip_dict_name and zip_dict_data (where? as part of options, or elsewhere?)

  2. On-disk dd::Column will have zip_dict_name, zip_dict_id. (no need for storing zip_dict_data, this can be loaded from compression_dictionary table using dict_name/dict_id)

  3. Field structure should also have fields zip_dict_name, zip_dict_data

  4. From SQL we get only the name; open the compression_dictionary table to search for the name (when would this be done? when the TABLE_SHARE is created?) -> mostly at ha_create_table()

  5. This would use the same DD trx

  6. It will inherit the same trx isolation level (RC?)

  7. Open compression_dictionary table

  8. Prepare the search row

  9. Use handler->index_read() APIs

  10. Parse the return value & result

  11. If the row exists, we need to prevent it from disappearing (via DROP COMPRESSION_DICTIONARY). Place an MDL lock on the dictionary table; see the CONCURRENCY section below

  12. After (or before?) table creation is successful, and before the trx commits, insert rows into compression_dictionary_cols

  13. Open the compression_dictionary_cols table (with TL_CONCURRENT_INSERT?)

  14. For each column with a dictionary entry (see the sketch after this list):

  15. Prepare the row to be inserted with the dictionary id (fetched above), the column_id of the current column, and the table_id of the current table

  16. Use the handler->write_row() API

  17. When the DD transaction commits, the dd::Column contents are persisted to disk.
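A minimal sketch of steps 14-16; 'compressed_columns' and its entry type are hypothetical stand-ins for however the DDL layer tracks the columns that reference dictionaries:

// Sketch: register each compressed column in compression_dictionary_cols.
// 'cols_tbl' is the opened compression_dictionary_cols TABLE;
// 'compressed_columns' is a hypothetical list of (column_id, dict_id) pairs.
for (const auto &col : compressed_columns) {
  restore_record(cols_tbl, s->default_values);
  cols_tbl->field[0]->store(new_table_id, true);  // table_id of the new table
  cols_tbl->field[1]->store(col.column_id, true);
  cols_tbl->field[2]->store(col.dict_id, true);   // fetched in steps 7-10
  if (int err = cols_tbl->file->ha_write_row(cols_tbl->record[0])) {
    cols_tbl->file->print_error(err, MYF(0));
    return true;  // abort the DDL; DD trx rollback undoes earlier inserts
  }
}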

DROP TABLE:

  • At ha_delete_table() or earlier (wherever the dd::Table is available)

  • Open the compression_dictionary_cols table and acquire a shared MDL (transaction scope)

  • Do an index_read() with the search key (WHERE table_id = dd::Table::id)

  • For each matching row, call the handler->delete_row() interface (see the sketch below)

  • This is done as part of the DD trx, which either commits or rolls back (releasing the MDL)
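A minimal sketch of the scan-and-delete loop on the (table_id, column_id) primary key; the helper name and field/index positions follow the definitions above and are assumptions:

// Sketch: delete all compression_dictionary_cols rows for a dropped table.
// Searches the PK (index 0) on its first key part (table_id) only.
static bool delete_dict_cols_for_table(TABLE *tbl, ulonglong table_id) {
  uchar key_buf[MAX_KEY_LENGTH];
  tbl->field[0]->store(table_id, true);  // WHERE table_id = dd::Table::id
  key_copy(key_buf, tbl->record[0], tbl->key_info,
           tbl->key_info->key_part[0].store_length);

  if (tbl->file->ha_index_init(0, true)) return true;
  int err = tbl->file->ha_index_read_map(tbl->record[0], key_buf,
                                         make_prev_keypart_map(1),
                                         HA_READ_KEY_EXACT);
  while (err == 0) {
    if ((err = tbl->file->ha_delete_row(tbl->record[0])) != 0) break;
    err = tbl->file->ha_index_next_same(
        tbl->record[0], key_buf, tbl->key_info->key_part[0].store_length);
  }
  tbl->file->ha_index_end();
  // No matching rows is not an error.
  return err != 0 && err != HA_ERR_KEY_NOT_FOUND && err != HA_ERR_END_OF_FILE;
}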

DROP DICTIONARY:

Acquire MDL exclusive on compression_dictionary and then on compression_dictionary_cols.

  1. For the given name, check whether an entry exists in the compression_dictionary table

  2. If it exists, just try to delete it (the FK relationship will prevent the delete from the parent when an entry in the child exists; see the sketch below)
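A minimal sketch of the delete in step 2, relying on the FK to perform the reference check; the exact error mapping is an assumption:

// Sketch: let the FK reject deletion of a dictionary that is still in use.
int err = tbl->file->ha_delete_row(tbl->record[0]);
if (err == HA_ERR_ROW_IS_REFERENCED) {
  my_error(ER_ROW_IS_REFERENCED, MYF(0));  // entry still referenced by a table
  return true;
}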

ALTER TABLE:

When should ALTER check whether the COMPRESSION DICTIONARY name is valid?

  • 5.7 verified it when creating the table share (get_table_share() -> open_table_def() -> open_binary_frm() -> ha_innobase::update_field_defs_with_zip_dict_info())

  • The dictionary entry's validity should be checked at a similar stage

ALTER TABLE t1 DROP COLUMN on a column that has a COMPRESSION DICTIONARY

  • A new table is created and the old copy of the table is dropped

  • As part of dropping the old copy, the references in compression_dictionary_cols should be removed

  • This should be done at the place where the server drops the old copy of the table

  • MDL_SHARED should be acquired for deleting the rows in compression_dictionary_cols

ALTER TABLE t1 ADD COLUMN with a COMPRESSION DICTIONARY

  • ha_create_table() changes are required

  • As part of creating the table with the additional column, the server calls ha_create_table() with the new column and its COMPRESSION DICTIONARY

  • The MDL acquisition and the search for the compression dictionary entry should be done (where?)

  • This shouldn't impact INSTANT ADD COLUMN

SELECT and DML Changes:

How will DMLs access the dictionary data for a column?

Previously (in 5.7), update_field_defs_with_zip_dict_info() was used by the server to get the data from InnoDB and fill the Field class structure. Then, when InnoDB built its template from the Field class, the zip_dict_data was stored as part of the InnoDB template (part of prebuilt).

In 8.0, the server will directly open the compression_dictionary table for a given column. The column will have the dict_id & dict_name. Using the dict_id/dict_name, it will search the compression_dictionary table (see above for the steps) and fill the Field class zip_dict members. InnoDB will build the template from the Field class in build_template_field(), the same as in 5.7 (see the sketch below).
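A minimal sketch of the copy in build_template_field(); the templ member name zip_dict_data is assumed, modeled on the 5.7 Percona patch, and the Field members come from the sketch in the Field class section above:

// Sketch: while building the InnoDB row template, propagate the dictionary
// data from the server Field into the per-column template entry.
static void build_template_field_zip_dict(mysql_row_templ_t *templ,
                                          const Field *field) {
  if (field->has_associated_compression_dictionary()) {
    templ->zip_dict_data = field->zip_dict_data;  // {ptr, length} pair
  } else {
    templ->zip_dict_data = {nullptr, 0};
  }
}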

CONCURRENCY:

MDL:

The MDL used should be of TRANSACTION scope, i.e. released on transaction commit or rollback. Reads will use MDL_SHARED_READ on the compression_dictionary and compression_dictionary_cols tables. Writes will use MDL_SHARED_NO_READ_WRITE. A sketch of the acquisition follows.

(There is scope for improvement here, which can be handled later.)
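A minimal sketch using the server's standard MDL API; only the table name and lock choices are specific to this design:

// Sketch: transaction-scoped MDL on the compression_dictionary table,
// released automatically when the transaction commits or rolls back.
#include "sql/mdl.h"

MDL_request request;
MDL_REQUEST_INIT(&request, MDL_key::TABLE, "mysql", "compression_dictionary",
                 MDL_SHARED_READ,  // MDL_SHARED_NO_READ_WRITE for writes
                 MDL_TRANSACTION);
if (thd->mdl_context.acquire_lock(&request,
                                  thd->variables.lock_wait_timeout))
  return true;  // lock wait timeout or deadlock, error already reported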

 

compression_dictionary table:

READ:

  • By CREATE TABLE, DROP TABLE, ALTER TABLE t1 ADD COLUMN, SELECT, and DMLs

  • The above SQL statements should place an MDL_SHARED_READ on the compression_dictionary table.

  • (Also take MDL_SHARED_READ on compression_dictionary_cols)

  • They shouldn’t block each other

INSERT:

  • CREATE DICTIONARY

  • Should acquire MDL_SHARED_NO_READ_WRITE

 

DELETE:

  • DROP DICTIONARY

  • Should acquire MDL_SHARED_NO_READ_WRITE

compression_dictionary_cols table:

READ:

  • DROP DICTIONARY acquires MDL_SHARED_READ

INSERT:

  • CREATE TABLE, ALTER TABLE t1 ADD COLUMN

  • Acquire MDL_SHARED_NO_READ_WRITE on compression_dictionary_cols

DELETE:

  • DROP TABLE, ALTER TABLE t1 DROP COLUMN

  • Acquire MDL_SHARED_NO_READ_WRITE on compression_dictionary_cols

I_S views

Since these tables are of the SYSTEM type, direct SELECT access can be provided. Things to take care of are: 1) locking or non-locking reads, 2) the MDL type (shared), 3) disallowing write access, 4) disallowing DDLs.

If adding the above constraints is difficult, then we disallow all direct access to the COMPRESSION DICTIONARY tables and add I_S views instead.

Upgrade

Once the DD API access to the new dictionary tables is working, during upgrade:

  1. Create the two dictionary tables

  2. Create a new handler API that provides access to SYS_ZIP_DICT

  3. The server should get each dictionary row and store it in the compression_dictionary table (see the sketch below)

  4. As part of dd_upgrade_table(), we have both the dd::Table and the dict_table_t*. The server would mark the corresponding dd::Column as compressed. For such columns, using the dict_table_t, find whether an entry exists in SYS_ZIP_DICT_COLS. Get the dict_id, and using the dict_id get the dict_name.

  5. Set dd::Column::options (dict_name="blah")

  6. Drop the INNODB_SYS_ZIP_* tables at the end of a successful upgrade
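A minimal sketch of step 3; innodb_iterate_sys_zip_dict() and ZipDictEntry are hypothetical names for the new InnoDB handler API from step 2, and the field positions follow the compression_dictionary definition above:

// Sketch: copy each 5.7 SYS_ZIP_DICT row into mysql.compression_dictionary.
// 'dict_tbl' is the opened compression_dictionary TABLE.
for (const ZipDictEntry &entry : innodb_iterate_sys_zip_dict()) {
  restore_record(dict_tbl, s->default_values);
  dict_tbl->field[1]->store(entry.name.data(), entry.name.size(),
                            system_charset_info);
  dict_tbl->field[2]->store(1ULL, true);  // initial version
  dict_tbl->field[3]->store(entry.data.data(), entry.data.size(),
                            &my_charset_bin);
  if (int err = dict_tbl->file->ha_write_row(dict_tbl->record[0])) {
    dict_tbl->file->print_error(err, MYF(0));
    return true;  // abort the upgrade
  }
}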
