pmm-dump export for QAN data has scalability issues.

Description

Hi,

For some reason, pmm-dump export always generates the same number of tsv files, no matter which time filters are used. This is most likely linked to how the data is gathered from ClickHouse itself, but it is a major scalability issue that can render the tool unusable (in one extreme customer case it ran for 16 hours without producing any data).

I haven't checked the code, but this is clear from the following tests. All arguments are the same except for the start/end times, and in every case the tool processes the same number of chunks (and generates the same number of tsv files under ch/).

First we try with a 30-minute interval, then 10 minutes, and finally 1 minute.
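The original runs and their output are not reproduced here, but the invocations were roughly of the following shape. This is a hypothetical reconstruction: the PMM address, credentials and timestamps are placeholders, and the exact flag spellings (apart from --chunk-rows, which is mentioned later in this report) should be double-checked against pmm-dump export --help.

    # Hypothetical reconstruction of the three test runs; only the time window changes.
    # 30-minute window
    pmm-dump export --pmm-url="https://admin:<password>@<pmm-server>" --dump-qan \
        --start-ts="2023-08-23T10:00:00Z" --end-ts="2023-08-23T10:30:00Z"

    # 10-minute window
    pmm-dump export --pmm-url="https://admin:<password>@<pmm-server>" --dump-qan \
        --start-ts="2023-08-23T10:00:00Z" --end-ts="2023-08-23T10:10:00Z"

    # 1-minute window
    pmm-dump export --pmm-url="https://admin:<password>@<pmm-server>" --dump-qan \
        --start-ts="2023-08-23T10:00:00Z" --end-ts="2023-08-23T10:01:00Z"

With the default settings, all three runs process the same number of chunks and produce the same number of tsv files under ch/.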

 

In these runs, the running times are nothing to worry about, but what happens if we increase the chunk size for the first run?

It now takes a fraction of a second instead of around 10 seconds, and only processes one chunk.
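For clarity, the re-run meant here is the same 30-minute export with only the chunk size increased; the exact value used in the test is not recorded in this ticket, so the number below is just an illustration.

    # Same 30-minute window, but with a chunk size much larger than the default of 1000 rows.
    # The value 100000 is illustrative, not the one from the original test.
    pmm-dump export --pmm-url="https://admin:<password>@<pmm-server>" --dump-qan \
        --start-ts="2023-08-23T10:00:00Z" --end-ts="2023-08-23T10:30:00Z" \
        --chunk-rows=100000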

So, the improvement here is either to re-check how we fetch this data from ClickHouse, or to raise the default for --chunk-rows to something much larger than 1000.

Ideally, we also shouldn't be generating a bunch of zero-sized tsv files, but since I haven't checked the code in depth, I'm not sure why it happens (or whether it could be useful/needed).

For instance, one of the runs above generated 584 zero-byte tsv files and none with actual data.

In one customer case it was even more extreme: 1 million zero-byte tsv files and zero files with actual data.
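For anyone who wants to check this on their own dumps, a quick way to count empty versus non-empty QAN chunks is to extract the archive and look at the file sizes under ch/. The archive name and exact layout below are assumptions; adjust them to the actual dump file produced by pmm-dump export.

    # Extract the dump archive (name is a placeholder) and count the QAN chunk files.
    mkdir -p dump && tar -xzf pmm-dump-<timestamp>.tar.gz -C dump
    find dump -type f -path '*/ch/*.tsv' -empty | wc -l     # zero-byte chunks
    find dump -type f -path '*/ch/*.tsv' ! -empty | wc -l   # chunks with actual data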

Let me know if I can provide any other input. 

 

Thanks!

Environment

None

AFFECTED CS IDs

CS0038700

Details

Created August 24, 2023 at 2:12 AM
Updated March 7, 2024 at 9:35 AM
Resolved November 1, 2023 at 1:40 PM