Legacy docs for Tenzir v5.x. For the latest Tenzir v6 series, visit docs.tenzir.com. Migrating from v5? Read the Tenzir v6 migration guide.

This guide shows you how to parse text streams into structured events. You’ll learn to split byte streams on newlines or custom delimiters, and parse line-based formats like JSON lines, CSV, TSV, key-value pairs, Syslog, and CEF.

Most examples use from_file with a parsing subpipeline to illustrate each technique. The network examples use from with a tcp:// URL in the same way.

Use read_lines to split a byte stream on newline characters. Given this input file:

app.log
2024-01-15 10:30:45 INFO Application started
2024-01-15 10:30:46 DEBUG Processing request

This pipeline produces one event per line:

from_file "app.log" {
  read_lines
}
{line: "2024-01-15 10:30:45 INFO Application started"}
{line: "2024-01-15 10:30:46 DEBUG Processing request"}

The same pattern works for network streams:

from "tcp://0.0.0.0:9000" {
  read_lines
}

Use read_delimited when records use separators other than newlines. Given this input file:

records.dat
first record|||second record|||third record

This pipeline splits on every occurrence of |||:

from_file "records.dat" {
  read_delimited "|||"
}
{data: "first record"}
{data: "second record"}
{data: "third record"}
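To make the splitting behavior concrete, here is a minimal Python sketch of delimiter-based record splitting over a chunked byte stream. It is an illustration of the idea only, not Tenzir's implementation; the function name `split_delimited` and the UTF-8 decoding step are assumptions made for this example.

```python
from typing import Iterable, Iterator


def split_delimited(chunks: Iterable[bytes], separator: bytes) -> Iterator[dict]:
    """Yield one {"data": ...} event per separator-delimited record."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # The last piece may end mid-record or mid-separator, so hold it back
        # until more bytes arrive.
        *complete, buffer = buffer.split(separator)
        for record in complete:
            yield {"data": record.decode("utf-8")}
    if buffer:
        # Flush trailing data that was never followed by a separator.
        yield {"data": buffer.decode("utf-8")}


if __name__ == "__main__":
    # The separator is deliberately split across two chunks.
    stream = [b"first record||", b"|second record|||third record"]
    for event in split_delimited(stream, b"|||"):
        print(event)
```

Holding back the last piece is what lets a separator straddle two chunks without producing a broken record. Splitting on newlines is the same technique with `b"\n"` as the separator.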

Some formats use blank lines to separate records, such as paragraphs or multi-line entries. Given this input file:

paragraphs.txt
First paragraph with
multiple lines.

Second paragraph here.

Third paragraph.

This pipeline splits on blank lines:

from_file "paragraphs.txt" {
  read_delimited "\n\n"
}
{data: "First paragraph with\nmultiple lines."}
{data: "Second paragraph here."}
{data: "Third paragraph."}

Some protocols use null bytes as record terminators:

from "tcp://0.0.0.0:9000" {
  read_delimited "\x00", binary=true
}

Add binary=true when the data is not valid UTF-8. Each record is then emitted as a blob instead of a string.

XML streams often contain multiple documents without a top-level wrapper. Set include_separator=true to keep the closing tag as part of each event:

from_file "windows_events.xml" {
  read_delimited "</Event>\n", include_separator=true
}
this = data.parse_winlog()

See Windows Event Logs for a complete example.
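The effect of keeping the separator can be sketched in a few lines of Python. This is an illustration of the idea under simplifying assumptions (the whole input is already in memory as a string), not how Tenzir implements the option; `split_keep_separator` is a hypothetical helper name.

```python
def split_keep_separator(text: str, separator: str) -> list[str]:
    """Split text on separator, re-attaching the separator to each record."""
    pieces = text.split(separator)
    # Every piece except the last was followed by a separator in the input.
    events = [piece + separator for piece in pieces[:-1]]
    if pieces[-1]:
        # Trailing data without a closing separator is kept as-is.
        events.append(pieces[-1])
    return events


if __name__ == "__main__":
    xml = "<Event>a</Event>\n<Event>b</Event>\n"
    for event in split_keep_separator(xml, "</Event>\n"):
        print(repr(event))
```

Each resulting string still ends with its closing tag, so every event is a complete XML document that a downstream parser can handle on its own.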

Several read_* operators parse line-based formats directly into structured events.

Given this input file with dotted keys:

conn.jsonl
{"ts": "2024-01-15T10:30:45Z", "id.orig_h": "192.168.1.100", "id.orig_p": 52311, "id.resp_h": "93.184.216.34", "id.resp_p": 443}
{"ts": "2024-01-15T10:30:46Z", "id.orig_h": "192.168.1.101", "id.orig_p": 52312, "id.resp_h": "93.184.216.34", "id.resp_p": 80}

Use read_ndjson with unflatten_separator to convert dotted keys into nested records:

from_file "conn.jsonl" {
  read_ndjson unflatten_separator="."
}
{ts: 2024-01-15T10:30:45Z, id: {orig_h: 192.168.1.100, orig_p: 52311, resp_h: 93.184.216.34, resp_p: 443}}
{ts: 2024-01-15T10:30:46Z, id: {orig_h: 192.168.1.101, orig_p: 52312, resp_h: 93.184.216.34, resp_p: 80}}
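The unflattening step can be pictured with a short Python sketch. It only illustrates how dotted keys map to nested records and is not Tenzir's implementation; the `unflatten` function is a hypothetical helper, and it does not handle conflicts such as a key `id` appearing next to `id.orig_h`.

```python
import json


def unflatten(record: dict, separator: str = ".") -> dict:
    """Turn keys such as "id.orig_h" into nested dictionaries."""
    nested: dict = {}
    for key, value in record.items():
        *parents, leaf = key.split(separator)
        target = nested
        for part in parents:
            # Create intermediate records on demand and descend into them.
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


if __name__ == "__main__":
    line = '{"ts": "2024-01-15T10:30:45Z", "id.orig_h": "192.168.1.100", "id.orig_p": 52311}'
    print(unflatten(json.loads(line)))
```

Unlike read_ndjson, this sketch leaves values as plain JSON types. It does not infer timestamps or IP addresses.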

For regular JSON arrays or objects, use read_json instead.

Given this input file:

users.csv
id,name,email,role
1,alice,alice@example.com,admin
2,bob,bob@example.com,user
3,carol,carol@example.com,user

Use read_csv to parse the file with automatic header detection:

from_file "users.csv" {
  read_csv
}
{id: 1, name: "alice", email: "alice@example.com", role: "admin"}
{id: 2, name: "bob", email: "bob@example.com", role: "user"}
{id: 3, name: "carol", email: "carol@example.com", role: "user"}
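For comparison, the same header-driven parsing can be sketched with Python's standard csv module. This is an illustration rather than Tenzir's parser: the `parse_csv` and `infer` helpers are hypothetical, the integer inference is deliberately naive, and the sketch assumes every row has as many fields as the header.

```python
import csv
import io


def infer(value: str):
    """Turn ASCII digit strings into integers; leave everything else as text."""
    return int(value) if value.isascii() and value.isdigit() else value


def parse_csv(text: str) -> list[dict]:
    """Parse CSV text, using the first line as the field names."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer(value) for key, value in row.items()} for row in reader]


if __name__ == "__main__":
    data = "id,name,email,role\n1,alice,alice@example.com,admin\n2,bob,bob@example.com,user\n"
    for event in parse_csv(data):
        print(event)
```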

For tab-separated or space-separated data, use read_tsv or read_ssv. For custom delimiters, use read_xsv.

Given this input file:

records.txt
name=alice age=30
name=bob age=25
name=carol age=35

Use read_kv to parse each line as key-value pairs:

from_file "records.txt" {
  read_kv
}
{name: "alice", age: 30}
{name: "bob", age: 25}
{name: "carol", age: 35}
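A rough Python sketch of the key-value split follows. It covers only the simple case shown above (space-separated fields, `=` between key and value, no quoting) and is not Tenzir's implementation; the function and parameter names are chosen for this example only.

```python
def parse_kv(line: str, field_split: str = " ", value_split: str = "=") -> dict:
    """Parse a line such as "name=alice age=30" into a dictionary."""
    event = {}
    for field in line.split(field_split):
        key, sep, value = field.partition(value_split)
        if not sep:
            continue  # skip tokens that contain no key-value separator
        # Naive inference: ASCII digit strings become integers.
        event[key] = int(value) if value.isascii() and value.isdigit() else value
    return event


if __name__ == "__main__":
    for line in ["name=alice age=30", "name=bob age=25"]:
        print(parse_kv(line))
```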

Given this Common Event Format (CEF) input:

events.cef
CEF:0|Security|IDS|1.0|100|Intrusion detected|7|src=192.168.1.100 dst=10.0.0.1 spt=54321 dpt=443
CEF:0|Security|IDS|1.0|101|Malware found|9|src=192.168.1.101 dst=10.0.0.2 spt=12345 dpt=80

Use read_cef to parse security events:

from_file "events.cef" {
  read_cef
}
{cef_version: 0, device_vendor: "Security", device_product: "IDS", device_version: "1.0", signature_id: "100", name: "Intrusion detected", severity: "7", extension: {src: 192.168.1.100, dst: 10.0.0.1, spt: 54321, dpt: 443}}
{cef_version: 0, device_vendor: "Security", device_product: "IDS", device_version: "1.0", signature_id: "101", name: "Malware found", severity: "9", extension: {src: 192.168.1.101, dst: 10.0.0.2, spt: 12345, dpt: 80}}
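The structure of a CEF line (seven pipe-separated header fields followed by a key-value extension) can be sketched in Python. This is a simplified illustration, not Tenzir's parser: it ignores CEF escaping rules, assumes extension values contain no spaces, and leaves all extension values as strings. The `parse_cef` name is hypothetical, and `str.removeprefix` requires Python 3.9 or later.

```python
def parse_cef(line: str) -> dict:
    """Split a CEF line into its header fields and extension pairs."""
    # Limit the split to 7 so that any "|" in the extension stays intact.
    prefix, vendor, product, version, signature_id, name, severity, extension = (
        line.split("|", 7)
    )
    pairs = (field.partition("=") for field in extension.split())
    return {
        "cef_version": int(prefix.removeprefix("CEF:")),
        "device_vendor": vendor,
        "device_product": product,
        "device_version": version,
        "signature_id": signature_id,
        "name": name,
        "severity": severity,
        "extension": {key: value for key, _, value in pairs},
    }


if __name__ == "__main__":
    line = "CEF:0|Security|IDS|1.0|100|Intrusion detected|7|src=192.168.1.100 dst=10.0.0.1 spt=54321 dpt=443"
    print(parse_cef(line))
```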

For IBM QRadar logs, use read_leef.

Given this input file:

syslog.txt
<14>Jan 15 10:30:45 myhost app[1234]: User logged in
<11>Jan 15 10:30:46 myhost app[1234]: Error occurred

Use read_syslog to parse each line:

from_file "syslog.txt" {
  read_syslog
}
{facility: 1, severity: 6, timestamp: "Jan 15 10:30:45", hostname: "myhost", app_name: "app", process_id: "1234", content: "User logged in"}
{facility: 1, severity: 3, timestamp: "Jan 15 10:30:46", hostname: "myhost", app_name: "app", process_id: "1234", content: "Error occurred"}
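The facility and severity values come from the number in angle brackets at the start of each line. Syslog encodes it as facility * 8 + severity. The following Python sketch decodes that field; `decode_priority` is a hypothetical helper for illustration and covers only the priority, not the rest of the message.

```python
import re

# The priority field is one to three digits in angle brackets at line start.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def decode_priority(line: str) -> tuple[int, int]:
    """Return (facility, severity) decoded from a syslog line's <PRI> field."""
    match = PRI_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not start with a <PRI> field")
    priority = int(match.group(1))
    # priority = facility * 8 + severity
    return priority // 8, priority % 8


if __name__ == "__main__":
    print(decode_priority("<14>Jan 15 10:30:45 myhost app[1234]: User logged in"))
    print(decode_priority("<11>Jan 15 10:30:46 myhost app[1234]: Error occurred"))
```

For the two sample lines, 14 decodes to facility 1 and severity 6, and 11 decodes to facility 1 and severity 3, matching the output above.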
