0.23 Upgrade Guide
An upgrade guide that addresses breaking changes in 0.23.0
Vector’s 0.23.0 release includes breaking changes:
- The
.deb
package no longer enables and starts the Vector systemd service - VRL type definition updates
- “remove_empty” option dropped from VRL’s
parse_grok
andparse_groks
- VRL conditions are now checked for mutations at compile time
syslog
source and VRL’sparse_syslog
structured data fields made consistent- VRL VM beta runtime removed
gcp_pubsub
sink requires settingencoding
optionhumio_metrics
sink no longer hasencoding
option- New
framing
andencoding
options for sinks - Support for older OSes dropped
kubernetes_logs
source now requires rights to list and watch nodesdatadog_agent
source metrics now contain a namespace parsed from the event name
and deprecations:
- Shorthand values for
encoding
options deprecated - Sink encoding value
ndjson
is nowjson
encoding +newline_delimited
framing
We cover them below to help you upgrade quickly:
Upgrade guide
Breaking changes
The .deb
package no longer enables and starts the Vector systemd service
The official .deb
package
no longer automatically enables and starts the Vector systemd service.
This is in line with how the RPM package behaves.
To enable and start the service (after configuring it to your requirements),
you can use systemctl enable --now
:
systemctl enable --now vector
To just start the service without enabling it to run at system startup,
systemctl start vector
VRL type definition updates
There were many situations where VRL didn’t calculate the correct type definition. These are now fixed. In some cases this can cause compilation errors when upgrading if the code relied on the previous (incorrect) behavior.
This affects the following:
- the “merge” operator (
|
or|=
) on objects that share keys with different types - if statements
- nullability checking for most expressions (usually related to if statements)
- expressions that contain the
abort
expression - the
del
function - closure arguments
The best way to fix these issues is to let the compiler guide you through the problems, it will usually provide suggestions on how to fix the issue. Please give us feedback if you think any error diagnostics could be improved, we are continually trying to improve them.
The most common error you will probably see is the fallibility of a function changed because the type of one of the parameters changed.
For example, if you are trying to “split” a string, but the input could now be null, the error would look like this
error[E110]: invalid argument type
┌─ :1:7
│
1 │ split(msg, " ")
│ ^^^
│ │
│ this expression resolves to one of string or null
│ but the parameter "value" expects the exact type string
│
= try: ensuring an appropriate type at runtime
=
= msg = string!(msg)
= split(msg, " ")
=
= try: coercing to an appropriate type and specifying a default value as a fallback in case coercion fails
=
= msg = to_string(msg) ?? "default"
= split(msg, " ")
=
= see documentation about error handling at https://errors.vrl.dev/#handling
= learn more about error code 110 at https://errors.vrl.dev/110
= see language documentation at https://vrl.dev
= try your code in the VRL REPL, learn more at https://vrl.dev/examples
As suggested, you have a few options to solve errors like this.
- Abort if the arguments aren’t the right type by appending the function name with
!
, such asto_string!(msg)
- Force the type to be a string, using the
string
function. This function will error at runtime if the value isn’t the expected type. You can call it asstring!
to abort if it’s not the right type. - Provide a default value if the function fails using the “error coalescing” operator (
??
), such asto_string(msg) ?? "default"
- Handle the error manually by capturing both the return value and possible error, such as
result, err = to_string(msg)
“remove_empty” option dropped from VRL’s parse_grok
and parse_groks
The “remove_empty” argument has been dropped from both the parse_grok
and the
parse_groks
functions. Previously, these functions would return empty strings
for non-matching pattern names, but now they are not returned. To preserve the
old behavior, you can do something like the following to merge in empty strings
for each unmatched group:
parsed = parse_grok!(.message, "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}")
expected = { "timestamp": "", "level": "", "message": ""}
parsed = merge(expected, parsed)
VRL conditions are now checked for mutations at compile time
VRL conditions, for example those used in the filter
transform, are not supposed to mutate the event. Previously
the mutations would be silently ignored after a condition ran. Now the compiler has support for read-only values, and
will give a compile-time error if you try to mutate the event in a condition.
Example filter transform config
[transforms.filter]
type = "filter"
inputs = [ "input" ]
condition.type = "vrl"
condition.source = """
.foo = "bar"
true
"""
New error
error[E315]: mutation of read-only value
┌─ :1:1
│
1 │ .foo = "bar"
│ ^^^^^^ mutation of read-only value
│
= see language documentation at https://vrl.dev
syslog
source and VRL’s parse_syslog
structured data fields made consistent
Previously, the parse_syslog
VRL function and the syslog
source handled parsing the structured
data section of syslog messages differently:
- The
syslog
source inserted a field with the name of the structured data element, with the fields as keys in that map. It would create further nested maps if the structured data key names had.
s in them. - The
parse_syslog
function would instead prefix the structured data keys with the name of the structured data element they appeared in, but would insert this as a flat key/value structure rather than nesting (so that referencing keys would require quoting to escape the.
s).
With this release the behavior of both is now to parse the structured data section as a flat map of string key / string value, and insert it into the target under a field with the name of the structured data element.
That is:
<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message
Now returns (for both the syslog
source and the parse_syslog
function):
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473": {
"foo.baz": "bar"
},
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
Where previously VRL’s parse_syslog
function returned:
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473.foo.baz": "bar",
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
And the syslog
source returned:
{
"appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
"exampleSDID@32473": {
"foo": {
"baz": "bar"
}
},
"facility": "kern",
"hostname": "Gregorys-MacBook-Pro.local",
"message": "test message",
"procid": 79978,
"severity": "alert",
"timestamp": "2022-04-25T23:21:45.715740Z",
"version": 1
}
The previous parse_syslog
behavior can be achieved by running the result through the flatten
function like:
flatten(parse_syslog!(s'<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message'))
VRL VM beta runtime removed
The experimental VM runtime for VRL-based components has been removed. The
stable AST runtime remains in place, and is now nearly identical in performance
to the VM runtime. If you have runtime = "vm"
configured in your config, you
need to remove it to avoid Vector from erroring on startup.
gcp_pubsub
sink requires setting encoding
option
The gcp_pubsub
sink now supports a variety of codecs. To encode your logs as JSON before
publishing them to Cloud Pub/Sub, add the following encoding option
encoding.codec = "json"
to the config of your gcp_pubsub
sink.
humio_metrics
sink no longer has encoding
option
The humio_metrics
sink configuration no longer expects an encoding
option.
If you previously used the encoding option
encoding.codec = "json"
you need to remove the line from your humio_metrics
config. Metrics are now
always sent to Humio using the JSON format.
New framing
and encoding
options for sinks
We streamlined the encoding configuration for our sinks, enabling all applicable sinks to select
from a variety of codecs: json
, text
, raw
, logfmt
, avro
, native
or native_json
, e.g.
by setting
encoding.codec = "json"
in your sink configuration.
Additionally, some sinks now support configuring how encoded events should be separated within a
stream or batch: bytes
, character_delimited
, length_delimited
or newline_delimited
, e.g. by
setting
framing.method = "newline_delimited"
in your sink configuration.
The following sinks support setting an encoding codec: aws_cloudwatch_logs
,
aws_kinesis_firehose
, aws_kinesis_streams
, aws_s3
, aws_sqs
, azure_blob
, console
, file
,
gcp_cloud_storage
, gcp_pubsub
, http
, humio_logs
, kafka
, loki
, nats
, papertrail
,
pulsar
, redis
, socket
, splunk_hec_logs
and websocket
.
Additionally, the following sinks support setting a framing method: aws_s3
, azure_blob
,
console
, file
, gcp_cloud_storage
, http
and socket
.
Support for older OSes dropped
Due to changes to the tool we use for cross-compiling Vector,
support for operating systems with old versions of libc
and libstdc++
were dropped for the
x86-unknown_linux-gnu
target. Vector now requires that the host system has libc
>= 2.18 and
libstdc++
>= 3.4.21 with support for ABI version 1.3.8.
Known OSes that this affects:
- Amazon Linux 1
- Ubuntu 14.04
- CentOS 7
We will be looking at options to re-add support for these OSes in the future.
kubernetes_logs
source now requires rights to list and watch nodes
Logs from Kubernetes pods are now annotated with a node’s labels on which a pod is running.
- For official helm-chart users, upgrade the chart to the version >= 0.11.0 before upgrading the vector version in your cluster.
- For custom vector installations, modify the cluster role assigned to the vector service account to include nodes. The result should look like the following snippet:
# Permissions to use Kubernetes API.
# Requires that RBAC authorization is enabled.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: vector
rules:
- apiGroups:
- ""
resources:
- namespaces
- nodes
- pods
verbs:
- list
- watch
datadog_agent
source metrics now contain a namespace parsed from the event name
Incoming events from the datadog_agent
source contain a name that is .
(period) delimited where the first element is the namespace eg “system.fs.inodes.total”. Before this release, the metric event outputted by the datadog_agent
source contained an empty namespace, and the name contained the full unparsed name from the Datadog Agent. In this release, the namespace is parsed out of the name. Taking the prior example, the event name becomes “fs.inodes.total” and the namespace is “system”.
This was introduced in order to better handle metrics sent from the datadog_agent
source to the datadog_metrics
sink, where previously they would be lacking a namespace and so would have one added by the sink if default_namespace
was set.
The result is that configurations with VRL expressions that expect the namespace to be in the name will need to be adapted to either remove the namespace from the name, or join the namespace and the name, for example:
full_metric_name = """
join([.namespace, .name], ".")
"""
Deprecations
Shorthand values for encoding
options deprecated
We are deprecating setting encoding options by a shorthand string. E.g. when your sink encoding used
encoding = "json"
it should now be replaced by explicitly setting the codec
encoding.codec = "json"
Sink encoding value ndjson
is now json
encoding + newline_delimited
framing
The ndjson
encoding value will be phased out since the newline_delimited
behavior may be either set by default or
can be set explicitly via a dedicated framing
option.
This affects all sink configurations that previously used
encoding.codec = "ndjson"
The http
, aws_s3
, gcp_cloud_storage
and azure_blob
sinks should be configured to use a combination of json
encoding and newline_delimited
framing instead
framing.method = "newline_delimited"
encoding.codec = "json"
For all other sinks, simply set the codec to json
to maintain the current behavior
encoding.codec = "json"