The Vector team is pleased to announce version 0.22.0!
Be sure to check out the upgrade guide for breaking changes in this release.
Important: as part of this release, we have promoted the new implementation of disk buffers (buffer.type = "disk_v2") to the default implementation (buffer.type = "disk"). Any existing disk buffers (disk_v1 or disk) will be automatically migrated. We have rigorously tested this migration, but recommend making a backup of the disk buffers (in the configured data_dir, typically /var/lib/vector) so that you can roll back if necessary. Please see the release highlight for additional updates about this migration.
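As a rough illustration of the recommended backup, the following Python sketch copies a data directory into a timestamped backup folder. It assumes Vector has been stopped first so the buffer files are not changing; the `backup_data_dir` helper and the backup location are hypothetical and not part of Vector.

```python
import shutil
import time
from pathlib import Path


def backup_data_dir(data_dir: str, backup_root: str) -> Path:
    """Copy data_dir into a new timestamped directory under backup_root."""
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"data_dir not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree raises if dest already exists, so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

With the typical path from the note above, this might be called as `backup_data_dir("/var/lib/vector", "/var/backups")`, where the second argument is an assumed destination. Rolling back would mean restoring that copy and reinstalling the previous Vector version.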
In addition to the new features, enhancements, and fixes listed below, this release adds:
lua transform with the much more performant remap transform.
kafka rather than being limited to the gRPC vector source and sink.
gcp_pubsub) source to consume events from GCP PubSub.
A websocket sink was added to send events to a remote websocket listener.
We also made additional performance improvements in this release, increasing the average throughput by up to 50% for common topologies (see our soak test framework).
experiment | Δ mean | Δ mean % | confidence |
---|---|---|---|
splunk_transforms_splunk3 | 5.98MiB | 58.22 | 100.00% |
datadog_agent_remap_blackhole | 20.1MiB | 43.44 | 100.00% |
splunk_hec_route_s3 | 5.28MiB | 35.34 | 100.00% |
syslog_regex_logs2metric_ddmetrics | 1.84MiB | 15.62 | 100.00% |
syslog_log2metric_splunk_hec_metrics | 2.52MiB | 15.59 | 100.00% |
datadog_agent_remap_datadog_logs | 9.6MiB | 15.02 | 100.00% |
http_to_http_json | 2.78MiB | 13.19 | 100.00% |
syslog_humio_logs | 1.91MiB | 12.23 | 100.00% |
syslog_splunk_hec_logs | 1.85MiB | 12.11 | 100.00% |
syslog_loki | 1.42MiB | 9.53 | 100.00% |
The journald source deadlocks almost immediately (#12966). Fixed in v0.22.1.
The kubernetes_logs source does not work with k3s/k3d (#12989). Fixed in v0.22.1.
compression or concurrency options due to a deserialization failure (#12919). Fixed in v0.22.1.
vector validate no longer creates the socket (#13018). This causes the default systemd unit file to fail to start Vector since it runs vector validate before starting Vector. Fixed in v0.22.1.
endpoint including the full path of the request. For the aws_s3 sink this caused cardinality issues since the AWS S3 key is included in the URL. Fixed in v0.22.3.
The gcp_pubsub source would log errors due to attempting to fetch too quickly when it has no acknowledgements to pass along. Fixed in v0.22.3.
decoding.codec) receives invalid data. Fixed in v0.23.1.
The journald
source now processes data more efficiently by continuing to read new data while waiting for previously read data to be processed by Vector.
Vector’s configuration interpolation of environment variables has been enhanced to allow both setting default values and returning an error message if an expected environment variable is unset or empty. The syntax matches bash interpolation syntax:
${VARIABLE:-default} evaluates to default if VARIABLE is unset or empty in the environment.
${VARIABLE-default} evaluates to default only if VARIABLE is unset in the environment.
${VARIABLE:?err} exits with an error message containing err if VARIABLE is unset or empty in the environment.
${VARIABLE?err} exits with an error message containing err if VARIABLE is unset in the environment.
The kubernetes_logs
source now tags emitted internal metrics with pod_namespace.
The datadog_metrics sink now supports sending aggregated summary metrics (typically scraped from a Prometheus exporter) to Datadog. Previously these metrics were dropped at the sink.
vector user to the systemd-journal-remote group to be able to consume journald events from a remote system. This matches the Debian package.
The kubernetes_logs
source now allows configuration of extra_namespace_label_selector which, if set, Vector will use to select the pods whose logs it captures, based on labels attached to the pod's namespace. This is similar to the extra_label_selector option, which applies to pod labels.
Thanks to @anapsix for contributing this change!
The kubernetes_logs
source now reads events in order whenever a pod log file rotates. Previously, Vector could start reading the new file before it finished processing the previous one, resulting in out-of-order logs.
Thanks to @sillent for contributing this change!
The parse_json
function now takes an optional max_depth parameter to control how far it will recurse when deserializing the event. Once the depth limit is hit, the remainder of the fields is left as raw JSON in the deserialized event.
For example:
parse_json!("{\"1\": {\"2\": {\"3\": {\"4\": {\"5\": {\"6\": \"finish\"}}}}}}", max_depth: 5)
Yields:
{ "1": { "2": { "3": { "4": { "5": "{\"6\": \"finish\"}" } } } } }
The default remains no max depth limit.
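The depth-limiting behavior described above can be pictured with a short Python sketch. It illustrates the resulting shape only and is not Vector's implementation: the helper names `limit_depth` and `parse_json_limited` are hypothetical, and the sketch assumes that the top-level value counts as depth 1 and that arrays count as a level in the same way objects do.

```python
import json
from typing import Any


def limit_depth(value: Any, max_depth: int, depth: int = 1) -> Any:
    """Re-encode any object or array nested deeper than max_depth as a JSON string."""
    if isinstance(value, (dict, list)):
        if depth > max_depth:
            # Past the limit: keep the remainder as raw JSON text.
            return json.dumps(value)
        if isinstance(value, dict):
            return {k: limit_depth(v, max_depth, depth + 1) for k, v in value.items()}
        return [limit_depth(v, max_depth, depth + 1) for v in value]
    return value  # scalars are returned unchanged (an assumption of this sketch)


def parse_json_limited(text: str, max_depth: int) -> Any:
    """Parse text as JSON, then apply the depth limit to the parsed value."""
    return limit_depth(json.loads(text), max_depth)
```

Calling `parse_json_limited` on the six-level document from the example with `max_depth=5` leaves the value under key `"5"` as the string `{"6": "finish"}`, matching the output shown above.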
Thanks to @nabokihms for contributing this change!
A component_received_events_count histogram metric was added to record the sizes of event batches passed around in Vector’s internal topology. Note that this is different from sink-level batching. It is mostly useful for debugging performance issues in Vector caused by low internal batching.
The http
source now allows configuration of the HTTP method that requests are expected to use via the new method option. Previously it only allowed POST requests.
Thanks to @r3b-fish for contributing this change!
VRL now includes two new functions for encrypting and decrypting field values: encrypt and decrypt. A random_bytes function was added to make it easy to generate initialization vectors for the encrypt function. See the highlight for more details about this new functionality.
The socket and syslog sources now allow configuration of the permissions to use when creating a unix socket via socket_file_mode when mode = "unix" is used.
Thanks to @Sh4d1 for contributing this change!
{{ some_variable }}
syntax. We will be expanding support for templating over time. This does mean that any strings that already had {{ }} in them now need to be escaped. See the upgrade guide for details.
native
and native_json. This makes it easier to send events between Vector instances on transports like kafka. It also makes it possible to send metrics to Vector from an external process (such as when using the exec source) without needing to use the lua transform to convert logs to metrics. Previously, these generic sources (like exec or http) could only receive logs. See the release highlight for more about this new feature and how to use it.
A gcp_pubsub
source was added for consuming events from GCP PubSub.
A websocket sink was added for sending events to a remote websocket listener.
Thanks to @zshell31 for contributing this change!
An is_json
function was added to VRL. This allows more efficient checking of whether the incoming value is JSON compared to parsing it with parse_json and checking whether there was an error.
Thanks to @nabokihms for contributing this change!
The splunk_hec
source now correctly handles negative acknowledgements from sinks. Previously it would mark the request including the rejected events as delivered. In Splunk’s acknowledgement protocol, this meant returning true for the ackID for the request; it now correctly returns false, indicating that the request is not acknowledged.
The gcp_stackdriver_metrics sink now requires configuration of labels at the top-level to match the gcp_stackdriver_logs sink. Previously these were nested under .labels.
See the upgrade guide for more details.
The new_relic sink health check now considers any 200-level response a success. It used to require a 200, which did not match what New Relic actually returns: 202.
--config-dir), Vector now ignores subdirectories starting with a dot (.).
The aws_s3
sink now only sets the x-amz-tagging header if tags are being applied. Specifying an empty value was incompatible with Ceph.
. and parse_xml was corrected to be a map of any field/value rather than specifically an empty map. This could cause later false positives with type issues during VRL compilation.
VRL now correctly updates the type definition of variables defined in one scope that are mutated in another.
For example:
foo = 1
{ foo = "bar" }
upcase(foo)
This would previously fail to compile because VRL thought foo was an integer when, in fact, it had been reassigned to a string.
The internal_metrics source now correctly tags emitted metrics with host and pid when host_key and pid_key are configured, respectively, on the internal_metrics source.
The socket
source now discards UDP frames greater than the configured max_length (when mode = "udp"). Previously these were truncated rather than discarded, which did not match the behavior when mode = "tcp". All socket source modes now consistently drop messages greater than max_length.
The internal_logs
source occasionally missed some events generated early in Vector’s start-up, before the component was initialized. This was remedied so that the internal_logs source more reliably captures start-up events.
The parse_ruby_hash VRL function can now parse hashes that contain a symbol as the value, such as { "key" => :foo }.
The log
function in VRL no longer wraps logged string values in quotes. This was causing double quoting for sink encodings like json.
Thanks to @nabokihms for contributing this change!
The aws_s3
source now handles S3 object keys that contain spaces. Previously Vector would encounter a 404 when querying for objects due to not decoding spaces correctly from the SQS object notification.
The http config provider now correctly repolls when an error is encountered.
Thanks to @jorgebay for contributing this change!
logfmt
sink codec and as the encode_logfmt function now correctly wrap values that contain quotes (") in quotes and escape the inner quote.
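The quoting rule described here can be sketched in a few lines of Python. This is an illustration under assumptions, not Vector's encoder: the `logfmt_value` and `logfmt_line` helpers are hypothetical, and the exact set of characters that triggers quoting (space, `=`, `"`, or an empty value) is a guess made for the example.

```python
def logfmt_value(value: str) -> str:
    """Quote a value if needed, escaping backslashes and inner double quotes."""
    needs_quotes = value == "" or any(ch in value for ch in ' ="')
    if not needs_quotes:
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def logfmt_line(fields: dict) -> str:
    """Join key=value pairs with spaces, quoting values where required."""
    return " ".join(f"{key}={logfmt_value(str(val))}" for key, val in fields.items())
```

For instance, a `msg` field holding `say "hi"` would be written as `msg="say \"hi\""`, so a logfmt parser can tell the inner quotes apart from the delimiters.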
Thanks to @jalaziz for contributing this change!
v2 implementation. This means if you set type = "disk" you will get the new buffer implementation. In a future release, we will remove the legacy disk buffers. To continue using the v1 disk buffers, for now, set type = "disk_v1".