Vector v0.23.0 release notes

The Vector team is pleased to announce version 0.23.0!

Be sure to check out the upgrade guide for breaking changes in this release.

In addition to the new features, enhancements, and fixes listed below, this release adds:

  • Support for loading secrets from an external process. See the release highlight for details.
  • Support for new encoding options on all sinks that support codecs, mirroring the decoding options available on sources. This allows more codecs (like json and logfmt) and framings (like newline-delimited and length-delimited) to be used on more sinks. See the release highlight for details.

Upgrading Vector
When upgrading, we recommend stepping through minor versions as these can each contain breaking changes while Vector is pre-1.0. These breaking changes are noted in their respective upgrade guides.

Known issues

  • Vector shuts down when a configured source codec (decoding.codec) receives invalid data. Fixed in v0.23.1.
  • The elasticsearch sink doesn’t evaluate templated configuration options like the index configuration before applying the only_fields and except_fields options, causing templates to fail to be evaluated if they used a field that was dropped. Fixed in v0.23.1.
  • The datadog_traces sink APM stats calculation does not aggregate the stats in the way the Datadog APM backend expects, causing incorrect individual span metrics to be shown in the Datadog UI. Fixed in v0.25.2.

Changelog

16 enhancements

  • VRL’s is_json function now takes a variant argument making it easier to assert the type of the JSON value; for example that the text is a JSON object. Thanks to nabokihms for contributing this change!
  • The kubernetes_logs source now annotates logs with node labels. This requires updating to version >= 0.11.0 of the Helm chart or adding the node resource to the allowed actions for the Vector pod. See the upgrade guide for more details. Thanks to nabokihms for contributing this change!
  • VRL’s parse_nginx_log function now parses out the upstream value if it exists in the log line. Thanks to nabokihms for contributing this change!
  • The geoip transform now includes the following additional fields:

    • country_name
    • region_code
    • region_name
    • metro_code

    This brings the transform up to parity with the fields enriched by Logstash’s geoip filter (though the field names are not the same).

  • The datadog_metrics sink now achieves a better compression ratio when compression is enabled, because it sorts the metrics before compressing and transmitting them.
  • The azure_blob sink now supports loading credentials from environment variables and via the managed identity service. To use this, set the new storage_account parameter. Thanks to yvespp for contributing this change!
  • The prometheus_scrape source now sets the Accept header to text/plain when requesting metrics. This improves compatibility with Prometheus exporters like Keycloak, which require this header.
  • The datadog_agent source is now able to accept traces from newer Datadog Agents (version >= 7.33).
  • VRL’s parse_nginx_log function now correctly parses:

    • rate limit errors
    • log entries including the user field

    Thanks to nabokihms for contributing this change!
  • VRL’s diagnostic error messages have had a number of improvements in this release which should help users more quickly identify errors in their VRL code.

    See the originating RFC for full details about the improvements that were made.

  • VRL’s log function now rate limits each log call in a single remap transform independently. Previously, all calls were mistakenly rate limited together.
  • The splunk_hec_logs sink now has a new option to configure Vector to not set a timestamp on the event when sending it to Splunk: suppress_timestamp. This allows Splunk to set the timestamp during ingestion.
  • The following sinks now allow configuration of end-to-end acknowledgements (via acknowledgements):

    • console
    • nats
    • websocket
  • The pulsar sink now allows OAuth2 authentication via the new auth.oauth2 configuration option. Thanks to fantapsody for contributing this change!
  • The performance of the gcp_pubsub source has been improved by automatically scaling up the number of consumers within Vector to a maximum of the newly added max_concurrency option (defaults to 10).
  • All GCP components now correctly apply the auth.api_key option. This was previously only supported by the gcp_pubsub source and sink and ignored by all other GCP components.
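
As an illustration of the is_json variant check mentioned in the enhancements above, here is a minimal Python sketch of the idea. It is not VRL's implementation: the helper name, the variant names (object, array, string, number, bool, null), and the mapping to Python types are assumptions made for this example.

```python
import json

# Hypothetical mapping from a variant name to a check on the parsed value.
_VARIANT_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it from "number".
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "bool": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def is_json(text, variant=None):
    """Return True if text parses as JSON and, optionally, matches variant."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):
        # Not parseable as JSON at all.
        return False
    if variant is None:
        return True
    return _VARIANT_CHECKS[variant](value)
```

With a helper like this, is_json('{"a": 1}', "object") holds while is_json('[1, 2]', "object") does not, which is the kind of assertion the variant argument makes easier.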

5 new features

  • Support was added for new encoding options on all sinks that support codecs, mirroring the decoding options available on sources. This allows more codecs (like json and logfmt) and framings (like newline-delimited and octet-framing) to be used on more sinks. See the release highlight for details.
  • Vector now has a mechanism for loading secrets into configuration by executing an external program. See the release highlight for details.
  • The aws_cloudwatch_logs sink now allows configuration of request headers via the headers option. This was primarily added to allow setting the x-amzn-logs-format header when sending Embedded Metric Format logs to AWS CloudWatch Logs. Thanks to hencrice for contributing this change!
  • TCP-based sources like the socket source can now be configured to annotate events with the TLS client certificate of the connection the events came from via setting the tls.peer_key configuration option. Thanks to JustinKnueppel for contributing this change!
  • The splunk_hec_logs sink can now be configured to send to the Splunk HEC raw endpoint via the added endpoint_target option. The default is still the event endpoint.
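
To illustrate how a codec and a framing combine, as in the new sink encoding options above, here is a rough Python sketch. The codec turns each event into bytes and the framing decides how those byte payloads are delimited on the wire. The function names are hypothetical, and the length-delimited layout shown (a 4-byte big-endian length prefix) is an assumption for illustration, not a statement about Vector's wire format.

```python
import json
import struct


def encode_json(event):
    """Codec step: serialize one event to compact JSON bytes."""
    return json.dumps(event, separators=(",", ":")).encode("utf-8")


def frame_newline_delimited(payloads):
    """Framing step: terminate every payload with a newline."""
    return b"".join(p + b"\n" for p in payloads)


def frame_length_delimited(payloads):
    """Framing step: prefix every payload with its length.

    Assumption: a 4-byte big-endian unsigned length prefix.
    """
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


events = [{"message": "a"}, {"message": "bc"}]
payloads = [encode_json(e) for e in events]

newline_stream = frame_newline_delimited(payloads)
length_stream = frame_length_delimited(payloads)
```

The same encoded payloads can be passed to either framing function, which is why the two settings can be chosen independently.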

21 bug fixes

  • The kubernetes_logs source no longer leaks resources (Tokio tasks) during configuration reload. Thanks to nabokihms for contributing this change!
  • The Vector systemd unit file installed by the Debian package no longer automatically starts Vector. This seems to better match user expectations, as the default configuration is only useful as an example, and it matches the behavior of the RPM. Thanks to akx for contributing this change!
  • The vector source now reports the correct number of bytes received in component_received_bytes_total.
  • The aws_ec2_metadata transform now has a lower default request timeout, 1 second rather than 60 seconds, to allow Vector to fail more quickly if the IMDSv2 is unavailable. This can be configured via the new refresh_timeout_secs option.
  • The tag_cardinality_limit transform now correctly deserializes the action option. Previously it would return an error when trying to configure this option.
  • This release fixes a number of situations where VRL didn’t calculate the correct type definition of values. In some cases this can cause VRL compilation errors when upgrading if the code relied on the previous behavior, for example due to type assertions that are no longer needed. The VRL diagnostic error messages should guide you towards resolving them.

    This affects the following:

    • the “merge” operator (| or |=) on objects that share keys with different types
    • if statements
    • nullability checking for most expressions (usually related to if statements)
    • expressions that contain the abort expression
    • the del function
    • closure arguments

    See the upgrade guide for more details.

  • When end-to-end acknowledgements are enabled, the following sources now correctly handle negative acknowledgements by halting processing:

    • kafka
    • journald
    • file

    Previously these sources would continue processing, potentially resulting in dropped data.

  • The datadog_traces sink now calculates statistics from incoming traces and forwards them to Datadog for use by the APM product.
  • The disk buffers no longer panic on recoverable errors when reading from the buffer.
  • The datadog_agent source now correctly parses the namespace of incoming metrics from the agent by looking for the first period (.). For example, a metric of system.bytes_read would have a namespace of system and a name of bytes_read. This fixes interoperability issues with the datadog_metrics sink, which is capable of adding a default namespace if the incoming metrics do not have one.
  • VRL’s parse_aws_cloudwatch_log_subscription_message type definition was corrected so the .events field is correctly identified as an array of objects rather than an object.
  • VRL’s parse_int function now correctly parses the string 0 as 0 without setting the base. Previously it would return an error. Thanks to shenxn for contributing this change!
  • Vector’s parsing of Syslog messages now preserves empty structured data elements rather than dropping them. This affects the syslog source, the syslog codec, and the parse_syslog VRL function.
  • The gcp_pubsub source now sends heartbeats to the server to avoid inactivity timeouts. The default for this is 75 seconds but can be configured via the added keepalive_secs parameter.
  • The gcp_pubsub source configuration options ending in _seconds were renamed to end in _secs to match other Vector configuration options that take a number of seconds. The original names are aliased but deprecated, so configuration should be updated to use the new names.
  • The datadog_logs sink now correctly retries requests due to aborted connections.
  • The pulsar sink now sends the event timestamp as the message timestamp, if the event has one. Thanks to fantapsody for contributing this change!
  • Vector no longer becomes unresponsive when receiving more than two SIGHUPs. Instead, it warns when the signal handler channel has overflowed. Thanks to wjordan for contributing this change!
  • Disk buffers now correctly apply the maximum size configured. Previously it could write an additional 128 MB. As a part of this, the minimum configurable disk buffer size is now 256 MB.
  • The supported dnstap proto definitions were brought up to date with upstream, adding support for the DoT, DoH, and DNSCrypt SocketProtocol values. Thanks to franklymrshankley for contributing this change!
  • The syslog source and VRL’s parse_syslog structured data fields were made consistent in their handling. See the upgrade guide for more details.
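
The namespace parsing described in the datadog_agent fix above amounts to splitting the metric name at its first period. A minimal Python sketch of that rule follows. The function name is hypothetical, and returning None as the namespace when the name contains no period is an assumption made for this example.

```python
def split_namespace(metric_name):
    """Split a metric name into (namespace, name) at the first period."""
    namespace, sep, name = metric_name.partition(".")
    if not sep:
        # Assumption: a name without a period has no namespace.
        return None, metric_name
    return namespace, name
```

For system.bytes_read this yields a namespace of system and a name of bytes_read, matching the example in the changelog entry. Only the first period is significant, so any later periods stay in the name.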

What’s next

OpenTelemetry support
We plan to focus on adding OpenTelemetry support to Vector in the form of an opentelemetry source and sink in Q3 (starting from the contribution from caibirdme)!
Improving Vector's delivery guarantees
Another focus for us in Q3 will be shoring up Vector’s delivery guarantees to eliminate the possibility of Vector unintentionally dropping data once it has accepted it (when end-to-end acknowledgements are enabled, which we also intend to make the default eventually!).

Download Version 0.23.0

  • macOS: tar.gz
  • Windows: zip
  • Windows (MSI): msi