Switching the default implementation of disk buffers to disk_v2
After much testing, we’re promoting disk_v2
to stable to bring performance and efficiency benefits to everyone using disk buffers.
Back in February, we announced the beta release of our new, reworked disk buffer implementation –
the so-called disk_v2
buffer – as part of Vector 0.20.0. Today, we’re excited to announce that
disk_v2
is now considered stable and is being promoted to the default implementation for disk
buffers.
Wait a minute, what’s disk_v2
? I use a disk
buffer
As part of reworking the implementation of disk buffers, we needed to write the new implementation alongside the existing disk buffers. This was necessary so that we could build confidence in the new implementation before making it the default, as well to provide us time to write the necessary documentation and migration procedures and so on.
We called the new implementation disk_v2
to distinguish it from disk
. You were able to specify
this in your buffer configuration to opt in to using them when in beta. Now that we’re comfortable
marking the new implementation as stable, we’ve changed their name so that disk_v2
is now disk
,
and what used to be disk
is now disk_v1
.
Do I have to do anything to migrate? What happens to my data?
While there were many reasons to write a new implementation of disk buffers – fewer code
dependencies, more consistent performance, better guarantees around data durability – we’ve tried
to keep the user experience foremost in our minds: switching to using disk_v2
should be as
painless as possible.
When running Vector 0.22.0, if we detect a disk buffer that was created with the old disk_v1
implementation, Vector will seamlessly migrate it to the new disk_v2
format and use the new format
going forward.
This does come with a few caveats:
- Vector needs free space to write into the new buffer as it’s migrating the old buffer over
- Vector will delete the old buffer once all records have been migrated
While we do try to maximize buffer compaction during the migration – basically, delete old data as it’s migrated – the process is eventually consistent, and migration won’t always stay at or below the configured maximum buffer size. With this in mind, it is best to plan for having free space beyond the configured maximum buffer size limit – in practice, 10 to 15% extra is sufficient – to allow the migration to complete successfully.
Additionally, as the migration process is destructive – the old buffer is migrated and then removed
– you may wish to make a copy of the buffer data directories (located under the data_dir
path
specified in your configuration) before running Vector 0.22.0. This will allow you to roll back to
Vector 0.21 (or earlier) if necessary.
Let us know what you think!
We’re still just as excited about the performance improvements to disk buffers, and have exciting plans for extending buffering capabilities as a whole. If you have any feedback for us, whether it’s related to the new disk buffers or anything else, let us know on Discord or on Twitter.