Currently xbstream requires chunks to arrive in order (offset 0 -> last chunk), followed by an EOF chunk that tells us to close the file.
We could get rid of this limitation by:
- Reading the Payload offset field and, when writing the data back to disk, using seek to position the file at that offset.
- Adjusting the EOF package to also contain the last offset of the file. When reading this package, we can update the in-memory structure of the file with the expected last offset. When writing payload or sparse map chunk types, we can check the file's last offset, and if payload offset + payload size matches that offset, we close the file.
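The two steps above can be sketched as follows (a minimal illustration with hypothetical `FileSink`/`apply_*` names, not the actual xbstream structures; the close check is simplified to tracking the highest written offset):

```python
class FileSink:
    """Per-file state while chunks may arrive out of order."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.last_offset = None  # learned from the (extended) EOF chunk
        self.max_end = 0         # highest payload offset + size seen so far

    def apply_payload(self, offset, data):
        # Seek to the chunk's Payload offset instead of assuming
        # sequential arrival, then write the data in place.
        self.fileobj.seek(offset)
        self.fileobj.write(data)
        self.max_end = max(self.max_end, offset + len(data))
        self._maybe_close()

    def apply_eof(self, last_offset):
        # Proposed change: the EOF chunk carries the file's last offset,
        # so record it and close if all data has already arrived.
        self.last_offset = last_offset
        self._maybe_close()

    def _maybe_close(self):
        # Close when payload offset + payload size reaches the expected
        # last offset. (Tracking only the maximum end is a simplification:
        # it does not prove every hole was filled, which sparse files
        # legitimately leave anyway.)
        if self.last_offset is not None and self.max_end == self.last_offset:
            self.fileobj.close()
```

With this bookkeeping the in-order case still closes when EOF arrives, while a late final chunk closes the file as soon as both pieces of information have been seen.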
In most cases, we will close the file when the EOF package arrives.
This will require a new version of the xbstream protocol; we can do this by bumping the magic header to XBSTCK02.
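Version detection could then key off the magic bytes, e.g. (a sketch; XBSTCK01 is the existing magic, XBSTCK02 is only the proposed value):

```python
MAGIC_V1 = b"XBSTCK01"  # current xbstream magic
MAGIC_V2 = b"XBSTCK02"  # proposed magic for the seek-capable format


def detect_version(stream):
    """Read the 8-byte magic and return the protocol version."""
    magic = stream.read(8)
    if magic == MAGIC_V1:
        return 1
    if magic == MAGIC_V2:
        return 2
    raise ValueError("not an xbstream archive: %r" % magic)
```

A reader that sees version 1 would keep the current in-order behavior; version 2 would enable the seek-based path.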
Nowadays random writes are not as expensive as they were on spinning disks, so this might allow us to further parallelize xbcloud/xbstream streaming.