Vector Remap Language (VRL)
A domain-specific language for modifying your observability data
Vector Remap Language (VRL) is an expression-oriented language designed for transforming observability data (logs and metrics) in a safe and performant manner. It features a simple syntax and a rich set of built-in functions tailored specifically to observability use cases.
You can use VRL in Vector via the remap
transform. For a more
in-depth picture, see the announcement blog post.
Quickstart
VRL programs act on a single observability event and can be used to:
Those programs are specified as part of your Vector configuration. Here’s an
example remap
transform that contains a VRL program in the source
field:
transforms:
modify:
type: remap
inputs:
- logs
source: |
del(.user_info)
.timestamp = now()
This program changes the contents of each event that passes through this
transform, deleting the user_info
field and adding a timestamp
to the event.
Example: parsing JSON
Let’s have a look at a more complex example. Imagine that you’re working with HTTP log events that look like this:
{
"message": "{\"status\":200,\"timestamp\":\"2021-03-01T19:19:24.646170Z\",\"message\":\"SUCCESS\",\"username\":\"ub40fan4life\"}"
}
Let’s assume you want to apply a set of changes to each event that arrives to your Remap transform in order to produce an event with the following fields:
message
(string)status
(int)timestamp
(int)timestamp_str
(timestamp)
The following VRL program demonstrates how to achieve the above:
# Parse the raw string into a JSON object, this way we can manipulate fields.
. = parse_json!(string!(.message))
# At this point `.` is the following:
#{
# "message": "SUCCESS",
# "status": 200,
# "timestamp": "2021-03-01T19:19:24.646170Z",
# "username": "ub40fan4life"
#}
# Attempt to parse the timestamp that was in the original message.
# Note that `.timestamp` can be `null` if it wasn't present.
parsed_timestamp, err = parse_timestamp(.timestamp, format: "%Y-%m-%dT%H:%M:%S.%fZ")
# Check if the conversion was successful. Note here that all errors must be handled, more on that later.
if err == null {
# Note that the `to_unix_timestamp` expects a `timestamp` argument.
# The following will compile because `parse_timestamp` returns a `timestamp`.
.timestamp = to_unix_timestamp(parsed_timestamp)
} else {
# Conversion failed, in this case use the current time.
.timestamp = to_unix_timestamp(now())
}
# Convert back to timestamp for this tutorial.
.timestamp_str = from_unix_timestamp!(.timestamp)
# Remove the `username` field from the final target.
del(.username)
# Convert the `message` to lowercase.
.message = downcase(string!(.message))
Finally, the resulting event:
{
"message": "success",
"status": 200,
"timestamp": 1614644364,
"timestamp_str": "2021-03-02T00:19:24Z"
}
Example: filtering events
The JSON parsing program in the example above modifies the contents of each
event. But you can also use VRL to specify conditions, which convert events into
a single Boolean expression. Here’s an example filter
transform that
filters out all messages for which the severity
field equals "info"
:
transforms:
filter_out_info:
type: filter
inputs:
- logs
condition: '.severity != "info"'
Conditions can also be more multifaceted. This condition would filter out all
events for which the severity
field is "info"
, the status_code
field is
greater than or equal to 400, and the host
field isn’t set:
condition = '.severity != "info" && .status_code < 400 && exists(.host)'
Reference
All language constructs are contained in the following reference pages. Use these references as you write your VRL programs:
Learn
VRL is designed to minimize the learning curve. These resources can help you get acquainted with Vector and VRL:
There is an online VRL playground, where you can experiment with VRL.
Some functions are currently unsupported on the playground. Functions that are currently not supported can be found with this issue filter
The goals of VRL
VRL is built by the Vector team and its development is guided by two core goals, safety and performance, without compromising on flexibility. This makes VRL ideal for critical, performance-sensitive infrastructure, like observability pipelines. To illustrate how we achieve these, below is a VRL feature matrix across these principles:
Feature | Safety | Performance |
---|---|---|
Compilation | ✅ | ✅ |
Ergonomic safety | ✅ | ✅ |
Fail safety | ✅ | |
Memory safety | ✅ | |
Vector and Rust native | ✅ | ✅ |
Statelessness | ✅ | ✅ |
Concepts
VRL has some core concepts that you should be aware of as you dive in.
Assertions
VRL offers two functions that you can use to assert that VRL values conform to your
expectations: assert
and
assert_eq
. assert
aborts the VRL program and logs an
error if the provided Boolean expression evaluates to false
, while
assert_eq
fails logs an error if the provided values aren’t equal. Both functions also
enable you to provide custom log messages to be emitted upon failure.
When running Vector, assertions can be useful in situations where you need to be notified when any observability event fails a condition. When writing unit tests, assertions can provide granular insight into which test conditions have failed and why.
Boolean expressions
true
or false
). Boolean expressions can be of any complexity and have several uses in Vector,
including specifying conditions for routing and
filtering events.Event
VRL programs operate on observability events. This VRL program, for example, adds a field to a log event:
.new_field = "new value"
The event at hand, represented by .
, is the entire context of the VRL program.
The event can be set to a value other than an object, for example . = 5
. If it is set to
an array, each element of that array is emitted as its own event from the remap
transform. For any elements that aren’t an object, or if
the top-level .
is set to a scalar value, that value is set as the message
key on the
emitted object.
This expression, for example…
. = ["hello", 1, true, { "foo": "bar" }]
…results in these four events being emitted:
{ "message": "hello" }
{ "message": 1 }
{ "message": true }
{ "foo": "bar" }
Characteristics
Paths
Path expressions enable you to access values inside the event:
.kubernetes.pod_id
Expressions
Function
Characteristics
Deprecation
VRL functions can be marked as “deprecated”. When a function is deprecated, a warning will be shown at runtime.
Suggestions on how to update the VRL program can usually be found in the actual warning and the function documentation.
Fallibility
Some VRL functions are fallible, meaning that they can error. Any potential errors thrown by fallible functions must be handled, a requirement enforced at compile time.
This feature of VRL programs, which we call fail safety, is a defining characteristic of VRL and a primary source of its safety guarantees.
Literal
Transforming metrics
Program
Features
Compilation
VRL programs are compiled to and run as native Rust code. This has several important implications:
- VRL programs are extremely fast and efficient, with performance characteristics very close to Rust itself
- VRL has no runtime and thus imposes no per-event foreign function interface (FFI) or data conversion costs
- VRL has no garbage collection, which means no GC pauses and no accumulated memory usage across events
Characteristics
Fail safety checks
parse_syslog
function, for example, the VRL compiler aborts and provides a helpful error
message. Fail safety means that you need to make explicit decisions about how to handle potentially
malformed data—a superior alternative to being surprised by such issues when Vector is already handling
your data in production.Type safety checks
parse_syslog
function,
which can only take a string. VRL essentially forces you to write programs around the assumption that
every incoming event could be malformed, which provides a strong bulwark against both human error and
also the many potential consequences of malformed data.Ergonomic safety
Characteristics
Internal logging limitation
I/O limitation
Lack of custom functions
Purpose built for observability
parse_syslog
and
parse_key_value
, for example, make otherwise complex tasks simple
and prevent the need for complex low-level constructs.Rate-limited logging
log
function implements rate limiting by default. This ensures
that VRL programs invoking the log
method don’t accidentally saturate I/O.Fail safety
High-quality error messages
VRL strives to provide high-quality, helpful error messages, streamlining the development and iteration workflow around VRL programs.
This VRL program, for example…
parse_json!(1)
…would result in this error:
error[E110]: invalid argument type
┌─ :2:13
│
2 │ parse_json!(1)
│ ^
│ │
│ this expression resolves to the exact type integer
│ but the parameter "value" expects the exact type string
│
= try: ensuring an appropriate type at runtime
=
= 1 = string!(1)
= parse_json!(1)
=
= try: coercing to an appropriate type and specifying a default value as a fallback in case coercion fails
=
= 1 = to_string(1) ?? "default"
= parse_json!(1)
=
= see documentation about error handling at https://errors.vrl.dev/#handling
= learn more about error code 110 at https://errors.vrl.dev/110
= see language documentation at https://vrl.dev
= try your code in the VRL REPL, learn more at https://vrl.dev/examples
Memory safety
Vector & Rust native
Characteristics
Lack of garbage collection
Stateless
Type safety
Characteristics
Progressive type safety
VRL’s type-safety is progressive, meaning it will implement type-safety for any value for which it knows the type. Because observability data can be quite unpredictable, it’s not always known which type a field might be, hence the progressive nature of VRL’s type-safety. As VRL scripts are evaluated, type information is built up and used at compile-time to enforce type-safety. Let’s look at an example:
.foo # any
.foo = downcase!(.foo) # string
.foo = upcase(.foo) # string
Breaking down the above:
- The
.foo
field starts off as anany
type (AKA unknown). - The call to the
downcase!
function requires error handling (!
) since VRL cannot guarantee that.foo
is a string (the only type supported bydowncase
). - Afterwards, assuming the
downcase
invocation is successful, VRL knows that.foo
is a string, sincedowncase
can only return strings. - Finally, the call to
upcase
does not require error handling (!
) since VRL knows that.foo
is a string, making theupcase
invocation infallible.
To avoid error handling for argument errors, you can specify the types of your fields at the top of your VRL script:
.foo = string!(.foo) # string
.foo = downcase(.foo) # string
This is generally good practice, and it provides the ability to opt-into type safety as you see fit, VRL scripts are written once and evaluated many times, therefore the tradeoff for type safety will ensure reliable production execution.
Principles
Performance
Safety
Other examples
Parse Syslog logs
{
"message": "\u003c102\u003e1 2020-12-22T15:22:31.111Z vector-user.biz su 2666 ID389 - Something went wrong"
}
. |= parse_syslog!(.message)
{
"log": {
"appname": "su",
"facility": "ntp",
"hostname": "vector-user.biz",
"message": "Something went wrong",
"msgid": "ID389",
"procid": 2666,
"severity": "info",
"timestamp": "2020-12-22T15:22:31.111Z",
"version": 1
}
}
Parse key/value (logfmt) logs
{
"message": "@timestamp=\"Sun Jan 10 16:47:39 EST 2021\" level=info msg=\"Stopping all fetchers\" tag#production=stopping_fetchers id=ConsumerFetcherManager-1382721708341 module=kafka.consumer.ConsumerFetcherManager"
}
. = parse_key_value!(.message)
{
"log": {
"@timestamp": "Sun Jan 10 16:47:39 EST 2021",
"id": "ConsumerFetcherManager-1382721708341",
"level": "info",
"module": "kafka.consumer.ConsumerFetcherManager",
"msg": "Stopping all fetchers",
"tag#production": "stopping_fetchers"
}
}
Parse custom logs
{
"message": "2021/01/20 06:39:15 +0000 [error] 17755#17755: *3569904 open() \"/usr/share/nginx/html/test.php\" failed (2: No such file or directory), client: xxx.xxx.xxx.xxx, server: localhost, request: \"GET /test.php HTTP/1.1\", host: \"yyy.yyy.yyy.yyy\""
}
. |= parse_regex!(.message, r'^(?P<timestamp>\d+/\d+/\d+ \d+:\d+:\d+ \+\d+) \[(?P<severity>\w+)\] (?P<pid>\d+)#(?P<tid>\d+):(?: \*(?P<connid>\d+))? (?P<message>.*)$')
# Coerce parsed fields
.timestamp = parse_timestamp(.timestamp, "%Y/%m/%d %H:%M:%S %z") ?? now()
.pid = to_int!(.pid)
.tid = to_int!(.tid)
# Extract structured data
message_parts = split(.message, ", ", limit: 2)
structured = parse_key_value(message_parts[1], key_value_delimiter: ":", field_delimiter: ",") ?? {}
.message = message_parts[0]
. = merge(., structured)
{
"log": {
"client": "xxx.xxx.xxx.xxx",
"connid": "3569904",
"host": "yyy.yyy.yyy.yyy",
"message": "open() \"/usr/share/nginx/html/test.php\" failed (2: No such file or directory)",
"pid": 17755,
"request": "GET /test.php HTTP/1.1",
"server": "localhost",
"severity": "error",
"tid": 17755,
"timestamp": "2021-01-20T06:39:15Z"
}
}
Multiple parsing strategies
{
"message": "\u003c102\u003e1 2020-12-22T15:22:31.111Z vector-user.biz su 2666 ID389 - Something went wrong"
}
structured =
parse_syslog(.message) ??
parse_common_log(.message) ??
parse_regex!(.message, r'^(?P<timestamp>\d+/\d+/\d+ \d+:\d+:\d+) \[(?P<severity>\w+)\] (?P<pid>\d+)#(?P<tid>\d+):(?: \*(?P<connid>\d+))? (?P<message>.*)$')
. = merge(., structured)
{
"log": {
"appname": "su",
"facility": "ntp",
"hostname": "vector-user.biz",
"message": "Something went wrong",
"msgid": "ID389",
"procid": 2666,
"severity": "info",
"timestamp": "2020-12-22T15:22:31.111Z",
"version": 1
}
}
Modify metric tags
{
"counter": {
"value": 102
},
"kind": "incremental",
"name": "user_login_total",
"tags": {
"email": "vic@vector.dev",
"host": "my.host.com",
"instance_id": "abcd1234"
}
}
.tags.environment = get_env_var!("ENV") # add
.tags.hostname = del(.tags.host) # rename
del(.tags.email)
{
"metric": {
"counter": {
"value": 102
},
"kind": "incremental",
"name": "user_login_total",
"tags": {
"environment": "production",
"hostname": "my.host.com",
"instance_id": "abcd1234"
}
}
}
Emitting multiple logs from JSON
{
"message": "[{\"message\": \"first_log\"}, {\"message\": \"second_log\"}]"
}
. = parse_json!(.message) # sets `.` to an array of objects
[
{
"log": {
"message": "first_log"
}
},
{
"log": {
"message": "second_log"
}
}
]
Emitting multiple non-object logs from JSON
{
"message": "[5, true, \"hello\"]"
}
. = parse_json!(.message) # sets `.` to an array
[
{
"log": {
"message": 5
}
},
{
"log": {
"message": true
}
},
{
"log": {
"message": "hello"
}
}
]
Invalid argument type
{
"not_a_string": 1
}
upcase(42)
error[E110]: invalid argument type
┌─ :1:8
│
1 │ upcase(42)
│ ^^
│ │
│ this expression resolves to the exact type integer
│ but the parameter "value" expects the exact type string
│
= try: ensuring an appropriate type at runtime
=
= 42 = string!(42)
= upcase(42)
=
= try: coercing to an appropriate type and specifying a default value as a fallback in case coercion fails
=
= 42 = to_string(42) ?? "default"
= upcase(42)
=
= see documentation about error handling at https://errors.vrl.dev/#handling
= learn more about error code 110 at https://errors.vrl.dev/110
= see language documentation at https://vrl.dev
= try your code in the VRL REPL, learn more at https://vrl.dev/examples
Unhandled fallible assignment
{
"message": "key1=value1 key2=value2"
}
structured = parse_key_value(.message)
error[E103]: unhandled fallible assignment
┌─ :1:14
│
1 │ structured = parse_key_value(.message)
│ ------------ ^^^^^^^^^^^^^^^^^^^^^^^^^
│ │ │
│ │ this expression is fallible because at least one argument's type cannot be verified to be valid
│ │ update the expression to be infallible by adding a `!`: `parse_key_value!(.message)`
│ │ `.message` argument type is `any` and this function expected a parameter `value` of type `string`
│ or change this to an infallible assignment:
│ structured, err = parse_key_value(.message)
│
= see documentation about error handling at https://errors.vrl.dev/#handling
= see functions characteristics documentation at https://vrl.dev/expressions/#function-call-characteristics
= learn more about error code 103 at https://errors.vrl.dev/103
= see language documentation at https://vrl.dev
= try your code in the VRL REPL, learn more at https://vrl.dev/examples