Structuring, Shaping, and Transforming Data
Use Vector to parse, structure, shape, and transform observability data
Before you begin, this guide assumes the following:
- You understand the basic Vector concepts
- You understand how to set up a basic pipeline
Vector provides multiple transforms that you can use to modify your observability data as it passes through your Vector topology.
The transform that you will likely use most often is the remap
transform, which uses a single-purpose data transformation language called
Vector Remap Language (VRL for short) to define event
transformation logic. VRL has several features that should make it your first
choice for transforming data in Vector:
- It offers a wide range of observability-data-specific functions that map directly to observability use cases.
- It’s built for the very specific use case of working with Vector logs and metrics, which means that it has no extraneous functionality, its data model maps directly to Vector’s internal data model, and its performance is comparable to native Rust performance.
- The VRL compiler built into Vector performs several compile-time checks to ensure that your VRL code is sound, meaning no dead code, no unhandled errors, and no type mismatches.
In cases where VRL doesn’t fit your use case, Vector also offers a Lua runtime transform that offers a bit more flexibility than VRL but also comes with downsides (listed below) that should always be borne in mind.
Transforming data using VRL
Let’s jump straight into an example of using VRL to modify some data. We’ll create a simple topology consisting of three components:
- A demo_logs source produces random Syslog messages at a rate of 10 per second.
- A remap transform uses VRL to parse incoming Syslog lines into named fields (severity, timestamp, etc.).
- A console sink pipes the output of the topology to stdout, so that we can see the results on the command line.
This configuration defines that topology:
sources:
  logs:
    type: demo_logs
    format: syslog
    interval: 0.1
transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      # Parse Syslog input. The "!" means that the script should abort on error.
      . = parse_syslog!(.message)
sinks:
  out:
    type: console
    inputs:
      - modify
    encoding:
      codec: json
To start Vector using this topology:
vector --config /etc/vector/vector.yaml
You should see lines like this emitted via stdout (formatted for readability here):
{
  "appname": "authsvc",
  "facility": "daemon",
  "hostname": "acmecorp.biz",
  "message": "#hugops to everyone who has to deal with this",
  "msgid": "ID486",
  "procid": 5265,
  "severity": "notice",
  "timestamp": "2021-01-19T18:16:40.027Z"
}
So far, we’ve gotten Vector to parse the Syslog data but we’re not yet
modifying that data. So let’s update the source script of our remap transform
to make some ad hoc transformations:
transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      . = parse_syslog!(.message)
      # Convert the timestamp to a Unix timestamp, aborting on error
      .timestamp = to_unix_timestamp!(.timestamp)
      # Remove the "facility" and "procid" fields
      del(.facility)
      del(.procid)
      # Replace the "msgid" field with a unique ID
      .msgid = uuid_v4()
      # If the log message contains the phrase "Great Scott!", set the new field
      # "critical" to true, otherwise set it to false. If the "contains" function
      # errors, log the error (instead of aborting the script, as above).
      if (is_critical, err = contains(.message, "Great Scott!"); err != null) {
        log(err, level: "error")
      }
      .critical = is_critical
A few things to notice about this script:
- Any errors thrown by VRL functions must be handled. Were we to neglect to handle the potential error thrown by the parse_syslog function, for example, the VRL compiler would provide a very specific error and Vector wouldn’t start up.
- VRL has language constructs like variables, if statements, comments, and logging.
- The . acts as a sort of “container” for the event data. . by itself refers to the root event, while you can use paths like .foo, .foo[0], .foo.bar, .foo.bar[0], and so on to reference subfields, array indices, and more.
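Aborting with "!" is not the only way to satisfy the compiler. As a minimal sketch (not part of the topology above), the remap transform could instead capture the error in a variable and decide what to do with it. The field name parse_failed is hypothetical and chosen only for illustration:

```yaml
transforms:
  modify:
    type: remap
    inputs:
      - logs
    source: |
      # Capture the result and the error instead of aborting with "!"
      parsed, err = parse_syslog(.message)
      if err != null {
        # "parse_failed" is a hypothetical field name used for illustration
        .parse_failed = true
        log(err, level: "warn")
      } else {
        # Parsing succeeded, so replace the root event with the parsed fields
        . = parsed
      }
```

With this variant, events that fail to parse would keep their original fields and gain the marker field rather than causing the script to abort.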
If you stop and restart Vector, you should see log lines like this (again reformatted for readability):
{
  "appname": "authsvc",
  "hostname": "acmecorp.biz",
  "message": "Great Scott! We're never gonna reach 88 mph with the flux capacitor in its current state!",
  "msgid": "4e4437b6-13e8-43b3-b51e-c37bd46de490",
  "severity": "notice",
  "timestamp": 1611080200,
  "critical": true
}
And that’s it! We’ve successfully created a Vector topology that transforms every event that passes through it. If you’d like to know more about VRL, we recommend checking out the following documentation:
- A full listing of VRL functions
- VRL examples
- VRL expressions, which describes VRL’s syntax and type system in great detail
Lua runtime transform
If VRL doesn’t cover your use case (and that should happen rarely), Vector also
offers a lua runtime transform that you can use instead of VRL. It enables you
to run Lua code that you can include directly in your Vector configuration.
The lua transform provides maximal flexibility because it enables you to use
a full-fledged programming language right inside of Vector. But we recommend
using it only when truly necessary, for several reasons:
- The lua transform makes it all too easy to write scripts that are slow, error prone, and hard to read.
- It requires you to add a coding/testing/debugging workflow to using Vector, which is worth the effort if there’s no other way to satisfy your use case but best avoided if possible.
- It imposes a performance penalty vis-à-vis VRL.
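For comparison, here is a rough sketch of what a lua transform might look like in the same topology. It assumes version 2 of the transform's API, in which a process hook receives each event along with an emit function. The component name modify_lua and the field processed_by are hypothetical, so check the lua transform reference for the options your Vector version supports:

```yaml
transforms:
  modify_lua:
    type: lua
    version: "2"
    inputs:
      - logs
    hooks:
      process: |
        function (event, emit)
          -- "processed_by" is a hypothetical field name used for illustration
          event.log.processed_by = "lua"
          -- Pass the modified event on to downstream components
          emit(event)
        end
```

Even in this small case the script has to forward the event explicitly with emit, which is the kind of extra responsibility the list above warns about.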