This guide shows you how to reassemble a single logical message that a source split across several transport messages. You’ll learn to frame each segment, gather the segments of one message with an outer window and an inner group, and concatenate them back into one record you can parse.

Many log sources cap the size of a single transport message. When an event exceeds that cap, the source splits it into several smaller messages that share an identifier and carry a segment counter. The receiver sees a burst of partial messages that mean nothing on their own—you have to put them back together before you can parse the event.

A segmented message typically frames each part with a shared id, the total number of segments, and this segment’s number, followed by the payload:

f47ac10b-58cc 3 0 first part of the body
f47ac10b-58cc 3 1 second part
f47ac10b-58cc 3 2 third part

Two properties make this awkward:

  • Segments arrive out of order. The network does not guarantee ordering, so segment 2 may reach you before segment 0, and segments of different messages may interleave. You can’t assume the first segment comes first.
  • The identifier is only unique for a while. A source reuses ids over time, so you need a time bound that tells two unrelated bursts apart.

Start by extracting the framing fields from every message. A parse_grok pattern pulls the shared id, the segment total, the segment number, and the payload that follows:

let $header = r"^(?<message_id>[\w-]+)\s+(?<segment_total>\d+)\s+(?<segment_number>\d+)\s+(?<payload>.*)$"
from {content: "f47ac10b-58cc 3 0 first part of the body"},
{content: "f47ac10b-58cc 3 1 second part"}
segment = content.parse_grok($header)
{
content: "f47ac10b-58cc 3 0 first part of the body",
segment: {
message_id: "f47ac10b-58cc",
segment_total: 3,
segment_number: 0,
payload: "first part of the body",
},
}
{
content: "f47ac10b-58cc 3 1 second part",
segment: {
message_id: "f47ac10b-58cc",
segment_total: 3,
segment_number: 1,
payload: "second part",
},
}

Adjust the pattern to match your source’s framing. Grok infers the counters as integers—so the segment number sorts numerically in the next step—and leaves the text id and payload as strings.
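
To make the framing step concrete outside the pipeline, here is a minimal Python sketch of the same extraction. It is an illustration only, not how parse_grok is implemented: the function name parse_segment is hypothetical, and Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re

# Same framing as the example lines: id, segment total, segment number, payload.
HEADER = re.compile(
    r"^(?P<message_id>[\w-]+)\s+(?P<segment_total>\d+)"
    r"\s+(?P<segment_number>\d+)\s+(?P<payload>.*)$"
)

def parse_segment(content):
    """Return the framing fields of one transport message, or None if it does not match."""
    match = HEADER.match(content)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the counters so they compare numerically (10 > 9), not as text.
    fields["segment_total"] = int(fields["segment_total"])
    fields["segment_number"] = int(fields["segment_number"])
    return fields

segment = parse_segment("f47ac10b-58cc 3 1 second part")
```

The integer conversion is the part that matters for reassembly: sorted as strings, segment "10" would land before segment "9".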

To reassemble a message, you need every segment that shares an id, but only the ones from the same burst. Combine two operators:

  • window bounds reassembly in event time, so an id reused later starts a fresh message instead of merging with the old one. Keeping it on the outside caps in-flight state at the few windows that are open at once.
  • group, inside the window, routes the segments that share an id through the same subpipeline so each message reassembles on its own.

Inside the group, sort the segments by number and collect their payloads with collect, which preserves the sorted order. Carry the message-level fields through with first:

from {message_id: "f47ac10b", segment_number: 2, ts: 2024-01-01T10:00:00, payload: "c=3"},
{message_id: "f47ac10b", segment_number: 0, ts: 2024-01-01T10:00:00, payload: "a=1"},
{message_id: "f47ac10b", segment_number: 1, ts: 2024-01-01T10:00:00, payload: "b=2"}
window size=30s, on=ts, idle_timeout=30s {
  group message_id {
    sort segment_number
    summarize \
      message_id=first(message_id),
      ts=first(ts),
      payloads=collect(payload)
  }
}
message = payloads.join(", ")
drop payloads
{
message_id: "f47ac10b",
ts: 2024-01-01T10:00:00Z,
message: "a=1, b=2, c=3",
}

The three out-of-order fragments become one record whose message field holds the complete body, ready to parse like any other event.
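
The window-then-group logic can also be modeled in a few lines of plain Python, which may help when reasoning about id reuse. This is a sketch under stated assumptions, not a description of how the window operator works internally: it assumes tumbling windows aligned to the Unix epoch, and the names reassemble and window_seconds are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

def reassemble(segments, window_seconds=30, separator=", "):
    """Bucket segments by (time window, message_id), sort by number, join payloads."""
    buckets = defaultdict(list)
    for seg in segments:
        # Assumption: tumbling windows aligned to the Unix epoch.
        window = int(seg["ts"].timestamp()) // window_seconds
        buckets[(window, seg["message_id"])].append(seg)
    messages = []
    for (_, message_id), parts in sorted(buckets.items(), key=lambda item: item[0]):
        # Restore segment order regardless of arrival order.
        parts.sort(key=lambda seg: seg["segment_number"])
        messages.append({
            "message_id": message_id,
            "ts": parts[0]["ts"],
            "message": separator.join(seg["payload"] for seg in parts),
        })
    return messages

t0 = datetime(2024, 1, 1, 10, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 11, 0, 0, tzinfo=timezone.utc)  # same id, reused an hour later
segments = [
    {"message_id": "f47ac10b", "segment_number": 2, "ts": t0, "payload": "c=3"},
    {"message_id": "f47ac10b", "segment_number": 0, "ts": t0, "payload": "a=1"},
    {"message_id": "f47ac10b", "segment_number": 1, "ts": t0, "payload": "b=2"},
    {"message_id": "f47ac10b", "segment_number": 0, "ts": t1, "payload": "x=9"},
]
messages = reassemble(segments)
```

Because the window is part of the bucket key, the segment that reuses the id an hour later forms its own message instead of being appended to the first one.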

The window size bounds two things at once: how far apart a message’s segments may arrive, and how long the pipeline waits before emitting a reassembled message on a live source. Many sources send a message’s segments back-to-back, so a window of 30s leaves a wide margin while keeping latency low. Raise it if segments are delayed in transit; lower it for faster output. Set idle_timeout to the same value so a message flushes shortly after its last segment instead of waiting for the full window.

Reassembly windows on the on field, so each segment needs a usable event-time value before this step. If your source already carries a parsed timestamp, point on at it. If the timestamp still needs parsing—or omits the year, as BSD syslog does—derive a full timestamp first. See Work with time for parsing and reconstructing time values.
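
As an illustration of what deriving a full timestamp can involve, the following Python sketch attaches a year to a BSD-syslog-style timestamp. It is a heuristic under stated assumptions, not the method from Work with time: the helper name with_year is hypothetical, it assumes English month abbreviations, and it treats any timestamp more than a day ahead of the reference time as belonging to the previous year.

```python
from datetime import datetime, timedelta

def with_year(text, now):
    """Parse a timestamp such as 'Jan  1 10:00:00' and attach the most plausible year."""
    for year in (now.year, now.year - 1):
        try:
            candidate = datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # for example 'Feb 29' in a non-leap year
        # Assumption: events are never more than one day in the future.
        if candidate <= now + timedelta(days=1):
            return candidate
    return None

now = datetime(2024, 1, 1, 0, 5, 0)
late_december = with_year("Dec 31 23:59:58", now)  # falls back to the previous year
```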

To exercise reassembly like a live feed, replay a fixture with delay, which sleeps between events in proportion to a timestamp field. Give each segment a send time so the segments arrive out of order and spread over real wall-clock time. Here two messages stream in over more than five seconds, yet a five-second window reassembles each one:

from {message_id: "f47ac10b", segment_number: 1, ts: 2024-01-01T10:00:00, send: 2024-01-01T10:00:00.0, payload: "world"},
{message_id: "f47ac10b", segment_number: 0, ts: 2024-01-01T10:00:00, send: 2024-01-01T10:00:01.0, payload: "hello"},
{message_id: "9c5b4d2e", segment_number: 0, ts: 2024-01-01T10:00:10, send: 2024-01-01T10:00:07.0, payload: "foo"},
{message_id: "9c5b4d2e", segment_number: 1, ts: 2024-01-01T10:00:10, send: 2024-01-01T10:00:08.0, payload: "bar"}
delay send
window size=5s, on=ts, idle_timeout=5s {
  group message_id {
    sort segment_number
    summarize message_id=first(message_id), payloads=collect(payload)
  }
}
message = payloads.join(" ")
drop payloads
{
message_id: "f47ac10b",
message: "hello world",
}
{
message_id: "9c5b4d2e",
message: "foo bar",
}

delay anchors the send times to the current wall clock, so both segments of the first message arrive within a second and the second message does not start arriving until seven seconds in. Each message still comes out whole and streams out as its window closes: idle_timeout flushes the first message about five seconds after its last segment, before the second even arrives. What matters is the spread of one message’s own segments, not how long the whole stream runs. Remove delay to replay the fixture instantly as a deterministic test.
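
The timing in this replay can be checked with a little arithmetic. The Python sketch below models only the idle timer, assuming it restarts whenever a segment arrives for a window and ignoring any earlier close triggered by event time; flush_times and the window labels are hypothetical names.

```python
def flush_times(arrivals, idle_timeout):
    """Map each window to the second at which its idle timer would fire.

    arrivals: list of (arrival_second, window_key) pairs in wall-clock seconds.
    """
    last_seen = {}
    for arrival, key in arrivals:
        # Assumption: every arrival for a window restarts that window's idle timer.
        last_seen[key] = max(arrival, last_seen.get(key, arrival))
    return {key: seen + idle_timeout for key, seen in last_seen.items()}

# Send offsets from the fixture: 0s and 1s for the first message, 7s and 8s for the second.
arrivals = [(0, "first"), (1, "first"), (7, "second"), (8, "second")]
times = flush_times(arrivals, idle_timeout=5)
```

Under these assumptions the first message flushes at second 6, one second before the second message starts arriving at second 7, and the second message flushes at second 13.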

The Cisco package applies exactly this pattern to Cisco ISE syslog, which splits long messages at the source. It ships the reassembly and parsing steps as the reusable operators cisco::ise::reassemble and cisco::ise::parse, so a pipeline only has to read the data and supply an event-time field before calling them.
