Increase in WHERE clauses causes query planner problems

Description

I am having a weird issue where adding more WHERE clauses to a query switches to a full table/index scan, but doesn't provide details in EXPLAIN. This increases the query execution time 1000x from ~40ms to ~40s. In this specific scenario, it happens when I go from 36 to 37 clauses.

 

The query looks like this

SELECT pinterest_id,domain_id,url_id FROM linkmetrics_pinterest
{{ WHERE (}}
{{  (domain_id=1 AND url_id=5 AND retrieved_at>=1549324800) OR }}
  (domain_id=2 AND url_id=6 AND retrieved_at>=1549324800) OR
  ...
{{ );}}

 

against this table

{{CREATE TABLE `linkmetrics_pinterest` (}}
`pinterest_id` bigint(20) NOT NULL,
`domain_id` bigint(20) NOT NULL,
`url_id` bigint(20) NOT NULL,
`retrieved_at` bigint(20) NOT NULL,
`shared` int(11) DEFAULT NULL,
PRIMARY KEY (`pinterest_id`),
KEY `linkmetrics_pinterest__urls__retrieved_at__idx` (`domain_id`,`url_id`,`retrieved_at`) COMMENT 'rev:default'
{{ ) ENGINE=ROCKSDB DEFAULT CHARSET=utf8}}

  

EXPLAIN is identical, except for the huge jump in rows scanned:

 

With 36 WHERE clauses

id: 1
select_type: SIMPLE
table: linkmetrics_pinterest
partitions: NULL
type: range
{{ possible_keys: linkmetrics_pinterest__urls__retrieved_at__idx}}
key: linkmetrics_pinterest__urls__retrieved_at__idx
key_len: 24
ref: NULL
rows: 36
filtered: 100.00
Extra: Using where; Using index

 

With 37 WHERE clauses

id: 1
select_type: SIMPLE
table: linkmetrics_pinterest
partitions: NULL
type: range
{{ possible_keys: linkmetrics_pinterest__urls__retrieved_at__idx}}
key: linkmetrics_pinterest__urls__retrieved_at__idx
key_len: 24
ref: NULL
rows: 8408659
filtered: 100.00
Extra: Using where; Using index

 

I tried running SHOW WARNINGS after the EXPLAIN to get extended output, but all that did was prefix my original query with: | Note | 1003 | /* select#1 */ 

I also tried running the same query with and without FORCE INDEX, and that didn't make any difference in the query plan. I also executed the same logical query by executing each where clause independently and doing a UNION ALL, which stayed in the ~40ms execution range, so there isn't another bottleneck slowing down the slightly larger query.

I'm happy to provide any further output that might help to make this easier to debug/reproduce.

Environment

Running on Ubuntu 18.04 inside of Docker

Attachments

1

is duplicated by

Smart Checklist

Activity

Show:

Derek Perkins March 12, 2019 at 6:44 PM

I created a more succinct issue here: https://jira.percona.com/browse/PS-5449

Derek Perkins February 26, 2019 at 7:25 AM

For completeness, I also ran this on the Percona distribution server based on CentOS (percona/percona-server:8.0.13), which took over a minute to execute, so it isn't related to the debian/ubuntu distribution.

Derek Perkins February 26, 2019 at 6:28 AM

I just ran the same two queries on Percona 5.7.25 with the same my.cnf settings, and it performed about the same as MariaDB. This is a severe performance regression in 8.0. I ran it using the percona/percona-server:5.7.25 image.

Derek Perkins February 25, 2019 at 11:59 PM

There is definitely a problem with the Percona query planner in this instance. The extra where clause is not matching a large amount of extra data.

 

I have gone ahead and run the same queries in MariaDB 10.3.13 with MyRocks. For the slow query, Percona is scanning 8.7M rows in 43 seconds, and MariaDB is scanning 122 rows in 8 ms. I originally thought that this bug might have been an upstream MyRocks bug, but this test shows that MyRocks is handling the larger query without slowdown. I'm not sure exactly how/where responsibility is bridged between the two, but this should help isolate the problem.

 

 

 

 

Lalit Choudhary February 25, 2019 at 11:31 AM

loaded given data and tested with provided rocksdb configuration in my.cnf

1st Query:
> 36 where clauses query takes
76 rows in set (43.40 sec)

2nd Query:
36 where clauses query takes
25 rows in set (0.01 sec)

3rd query:

19 rows in set (0.01 sec)

Here All 3 queries are different and provide different row count as output.  And about a number of where clause change result changes, because it's row scan count is changing.

  Again it's not just about 1 where extra condition adding/removing that 1 condition can have a large amount of data matching to that condition which results into increase in query execution time.

So it's not a bug.

Test:

 

 

 

Duplicate

Details

Assignee

Reporter

Components

Affects versions

Priority

Smart Checklist

Created February 12, 2019 at 7:08 PM
Updated March 6, 2024 at 12:21 PM
Resolved August 16, 2019 at 4:28 PM