This guide shows you how to fetch data from HTTP APIs using the
from_http and
http operators. You’ll learn to make GET and POST
requests, handle authentication, and implement pagination for large result sets.
Choosing the Right Operator
Tenzir has two HTTP client operators that share nearly identical options:

- from_http is a source operator that starts a pipeline with an HTTP request. Use it for standalone API calls.
- http is a transformation operator that enriches events flowing through a pipeline with HTTP responses. Use it when you have existing data and want to make per-event API lookups.
Most examples in this guide use from_http. Unless noted otherwise, the same
options work with http as well.
Basic API Requests
Start with these fundamental patterns for making HTTP requests to APIs.
Simple GET Requests
To fetch data from an API endpoint, pass the URL as the first argument:

```tql
from_http "https://api.example.com/data"
```

The operator makes a GET request by default and emits the parsed response body as events.
Parsing the HTTP Response Body
The from_http and http operators determine how to parse the HTTP response body as follows:

- URL-based inference: The operators first check the URL’s file extension to infer both the format (JSON, CSV, Parquet, etc.) and the compression type (gzip, zstd, etc.). This works just like the generic from operator.
- Header-based inference: If the format cannot be determined from the URL, the operators fall back to the HTTP Content-Type and Content-Encoding response headers.
- Manual specification: You can always override automatic inference by providing a parsing pipeline.
Automatic Format and Compression Inference
When the URL contains a recognizable file extension, the operators automatically handle decompression and parsing:

```tql
from_http "https://example.org/data/events.csv.zst"
```

This infers zstd compression and CSV format from the file extension, then decompresses and parses the body accordingly.
For URLs without a clear extension, the operators use the HTTP headers:

```tql
from_http "https://example.org/download"
```

If the server responds with Content-Type: application/json and Content-Encoding: gzip, the operator decompresses the body and parses it as JSON.
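To make the lookup order concrete, here is a minimal Python model of the inference described above: URL extension first, response headers second. It is a conceptual sketch, not Tenzir's implementation; the extension and MIME-type tables are small hypothetical subsets, and the real operators recognize more formats.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Small, hypothetical lookup tables; the real operators know many more formats.
FORMAT_BY_EXTENSION = {".json": "json", ".csv": "csv", ".parquet": "parquet"}
COMPRESSION_BY_EXTENSION = {".gz": "gzip", ".zst": "zstd"}
FORMAT_BY_CONTENT_TYPE = {"application/json": "json", "text/csv": "csv"}
COMPRESSION_BY_ENCODING = {"gzip": "gzip", "zstd": "zstd"}


def infer_format(url, headers):
    """Return (format, compression), preferring the URL over response headers."""
    suffixes = PurePosixPath(urlparse(url).path).suffixes  # e.g. [".csv", ".zst"]
    compression = None
    if suffixes and suffixes[-1] in COMPRESSION_BY_EXTENSION:
        compression = COMPRESSION_BY_EXTENSION[suffixes.pop()]
    fmt = FORMAT_BY_EXTENSION.get(suffixes[-1]) if suffixes else None
    if fmt is not None:
        return fmt, compression
    # No usable extension: fall back to the (case-insensitive) response headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    encoding = lowered.get("content-encoding", "").strip().lower()
    return (
        FORMAT_BY_CONTENT_TYPE.get(content_type),
        COMPRESSION_BY_ENCODING.get(encoding, compression),
    )
```

For example, infer_format("https://example.org/data/events.csv.zst", {}) yields ("csv", "zstd") without looking at any header.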
Manual Format Specification
You can manually override the parser for the response body by specifying a
parsing pipeline, i.e., a pipeline that transforms bytes to events. For example,
if an API returns CSV data without a proper Content-Type, you can specify the
parsing pipeline as follows:

```tql
from_http "https://api.example.com/users" {
  read_csv
}
```

This parses the response from CSV into structured events that you can process further.
Similarly, if you need to handle specific compression and format combinations that aren’t automatically detected:

```tql
from_http "https://example.org/archive" {
  decompress_gzip
  read_json
}
```

This explicitly decompresses the body with gzip and then parses it as JSON, regardless of the URL or HTTP headers.
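Conceptually, a parsing pipeline is a chain of bytes-to-bytes stages (such as decompression) that ends in a bytes-to-events stage (a parser). The Python sketch below models that composition with hypothetical stand-in functions named after the TQL operators; it assumes newline-delimited JSON and does not reproduce the real operators' full behavior.

```python
import csv
import gzip
import io
import json


# Hypothetical stand-ins named after the TQL operators.
def decompress_gzip(data: bytes) -> bytes:
    return gzip.decompress(data)


def read_json(data: bytes) -> list:
    # Assumes newline-delimited JSON: one object per non-empty line.
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line.strip()]


def read_csv(data: bytes) -> list:
    # The header row supplies the field names of each event.
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def parsing_pipeline(*stages):
    """Compose stages left to right, like the statements inside { ... }."""
    def run(body: bytes):
        for stage in stages:
            body = stage(body)
        return body
    return run
```

The order matters in the same way it does in TQL: decompression must run before the parser sees the bytes.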
POST Requests with Data
Send data to APIs by setting the method parameter to "post" and providing
the request body in the body parameter:

```tql
from_http "https://api.example.com/users", method="post", body={"name": "John", "email": "john@example.com"}
```

With the http operator, you can also parameterize the entire HTTP
request with event fields by referencing field values for each parameter:

```tql
from {
  url: "https://api.example.com/users",
  method: "post",
  data: {
    name: "John",
    email: "john@example.com",
  },
}
http url, method=method, body=data
```

The operators automatically use the POST method when you specify a body.
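The same "a body implies POST" convention exists in other HTTP clients; for example, Python's urllib.request picks POST whenever a request carries data and GET otherwise. The sketch below builds, but does not send, such a request. The build_request helper and the JSON encoding of the body are assumptions for illustration, not a description of how Tenzir serializes body.

```python
import json
import urllib.request


def build_request(url, body=None, method=None):
    """Build (without sending) a request; a body implies POST unless method overrides it."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")  # assumption: JSON-encoded body
        headers["Content-Type"] = "application/json"
    # With method=None, urllib.request.Request.get_method() returns
    # "POST" when data is set and "GET" otherwise.
    return urllib.request.Request(
        url, data=data, headers=headers, method=method.upper() if method else None
    )
```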
Request Configuration
Configure requests with headers, authentication, and other options for different API requirements.
Adding Headers
Include custom headers by providing the headers parameter as a record
containing key-value pairs:

```tql
from_http "https://api.example.com/data", headers={
  "Authorization": "Bearer " + secret("YOUR_BEARER_TOKEN")
}
```

Headers help you authenticate with APIs and specify request formats. Use the
secret function to retrieve sensitive API tokens, as in the example above.
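For comparison outside Tenzir, the sketch below builds the same Authorization header in Python while reading the token from an environment variable at runtime, which plays roughly the role that secret plays in TQL. The authorized_request helper and the environment variable name are hypothetical.

```python
import os
import urllib.request


def authorized_request(url, token_variable="YOUR_BEARER_TOKEN"):
    """Build a request whose bearer token is read at runtime, never hard-coded."""
    token = os.environ[token_variable]  # raises KeyError if the variable is unset
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
```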
TLS and Security
Configure TLS by passing a record with certificate paths to the tls parameter:

```tql
from_http "https://secure-api.example.com/data", tls={
  certfile: "/path/to/client.crt",
  keyfile: "/path/to/client.key",
}
```

Use these options when APIs require client certificate authentication.
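The tls record corresponds closely to a standard TLS client configuration. As a rough Python analogue (an assumed equivalence, not Tenzir's code), the sketch below maps certfile and keyfile onto an ssl.SSLContext and also models the skip_peer_verification option. The make_tls_context helper is hypothetical.

```python
import ssl


def make_tls_context(certfile=None, keyfile=None, skip_peer_verification=False):
    """Map a tls={...}-style record onto a Python SSLContext."""
    ctx = ssl.create_default_context()  # verifies the server certificate by default
    if certfile:
        # Present a client certificate for mutual TLS.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    if skip_peer_verification:
        # Development only: accept self-signed or otherwise untrusted certificates.
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

Note that Python requires hostname checking to be turned off before verification can be disabled, which is why the two assignments appear in that order.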
To skip peer verification (e.g., for self-signed certificates in development):

```tql
from_http "https://dev-api.example.com/data", tls={skip_peer_verification: true}
```

Timeout and Retry Configuration
Configure timeouts and retry behavior by setting the connection_timeout,
max_retry_count, and retry_delay parameters:

```tql
from_http "https://api.example.com/data", connection_timeout=10s, max_retry_count=3, retry_delay=2s
```

These settings help handle network issues and API rate limiting gracefully.
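The retry parameters describe a bounded retry loop. The Python sketch below is one plausible reading of those semantics, assuming max_retry_count counts retries after the initial attempt (so max_retry_count=3 allows up to four attempts) and that retry_delay is a fixed pause between attempts. The fetch_with_retries helper is hypothetical.

```python
import time


def fetch_with_retries(fetch, max_retry_count=3, retry_delay=2.0):
    """Call fetch() and retry failed attempts up to max_retry_count times."""
    retries = 0
    while True:
        try:
            return fetch()
        except OSError:  # covers ConnectionError, TimeoutError, urllib's URLError
            if retries >= max_retry_count:
                raise  # out of retries: surface the last error
            retries += 1
            time.sleep(retry_delay)  # fixed pause before the next attempt
```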
Data Enrichment
Use HTTP requests to enrich existing data with information from external APIs.
Preserving Input Context
Keep the original event data while adding API responses by setting the
response_field parameter on the http operator to
control where the response is stored:

```tql
from {
  domain: "example.com",
  severity: "HIGH",
}
http f"https://threat-intel.example.com/lookup?domain={domain}", response_field=threat_data
```

This preserves your original fields and stores the API response in the threat_data field.
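In plain terms, response_field performs a non-destructive merge: the original fields stay as they are and the response lands under one new key. The Python sketch below models this with a hypothetical enrich helper and an injected lookup function in place of a real HTTP call. Percent-encoding the domain with quote is a precaution added in this sketch; this guide does not say whether the TQL f-string escapes values.

```python
from urllib.parse import quote


def enrich(event, lookup, response_field="threat_data"):
    """Return a copy of event with the lookup response stored under response_field."""
    # quote() percent-encodes characters that are unsafe in a query string.
    url = "https://threat-intel.example.com/lookup?domain=" + quote(event["domain"], safe="")
    response = lookup(url)  # stand-in for the HTTP request
    return {**event, response_field: response}  # original fields are untouched
```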
Accessing Response Metadata
With from_http, use the $response variable inside a parsing pipeline to
access HTTP status codes and headers:

```tql
from_http "https://api.example.com/status" {
  read_json
  status_code = $response.code
  server = $response.headers.Server
}
```

With the http operator, use the metadata_field parameter instead:

```tql
from {url: "https://api.example.com/status"}
http url, metadata_field=http_meta
where http_meta.code >= 200 and http_meta.code < 300
```

Pagination and Bulk Processing
Handle APIs that return large datasets across multiple pages.
Link Header Pagination
Many REST APIs (such as GitHub, GitLab, and Jira) include pagination URLs in the
HTTP Link response header following
RFC 8288. Use paginate="link"
to follow these automatically:
```tql
from_http "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10", paginate="link"
```

The operator parses the Link header, finds the rel=next relation, and
continues fetching pages until the response no longer includes a next link.
This works with any API that returns a header like:

```
Link: <https://api.example.com/items?page=2>; rel="next"
```

Relative URLs in the Link header are resolved against the request URL, so both
absolute and relative pagination links work correctly.
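The following Python sketch shows the Link header handling described above: find the link whose rel parameter contains next, then resolve its target against the request URL. It is a simplified parser (it does not handle commas or semicolons inside quoted parameter values) and an illustration only, not the operators' implementation. The next_link helper is hypothetical.

```python
import re
from urllib.parse import urljoin

# One link-value: "<target>" followed by ";"-separated parameters, up to the next ",".
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;[^;,]*)*)")


def next_link(link_header, request_url):
    """Return the absolute rel="next" URL from an RFC 8288 Link header, or None."""
    for target, params in LINK_VALUE.findall(link_header or ""):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types.
            if name.strip().lower() == "rel" and "next" in value.strip().strip('"').lower().split():
                # Relative targets are resolved against the URL that was requested.
                return urljoin(request_url, target)
    return None
```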
The same approach works with the http operator:
```tql
from {url: "https://api.github.com/repos/tenzir/tenzir/issues?per_page=10"}
http url, paginate="link"
```

Lambda-Based Pagination
The http operator additionally supports
lambda-based pagination for APIs with custom pagination schemes. Provide a
lambda function to the paginate parameter that extracts the next page URL from
the response:

```tql
from {query: "tenzir"}
http f"https://api.example.com/search?q={query}", paginate=(x => x.next_url if x.has_more)
```

The operator keeps making requests as long as the pagination lambda returns a URL. Once x.has_more is false, the lambda yields no URL and pagination stops.
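The pagination loop itself is simple: fetch a page, emit it, ask the lambda for the next URL, and stop when there is none. Here is a minimal Python model of that loop, where fetch is a hypothetical stand-in for the HTTP request and None plays the role of a missing next URL.

```python
def paginate(fetch, first_url, next_url):
    """Yield one response per page, asking next_url(response) where to go next."""
    url = first_url
    while url is not None:  # None plays the role of "no next URL"
        response = fetch(url)
        yield response
        url = next_url(response)
```

The TQL lambda x => x.next_url if x.has_more corresponds here to lambda x: x["next_url"] if x["has_more"] else None.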
You can also build pagination URLs dynamically:

```tql
let $base = "https://api.example.com/items"
from {category: "security"}
http f"{$base}?category={category}&page=1", paginate=(x => f"{$base}?category={category}&page={x.page + 1}" if x.page < x.total_pages)
```

Rate Limiting
Control request frequency by setting the paginate_delay parameter to add
delays between requests and the parallel parameter to limit concurrent
requests:

```tql
from {domain: "example.com"}
http f"https://api.example.com/scan?q={domain}", paginate=(x => x.next_url if x.has_next), paginate_delay=500ms, parallel=2
```

Here, paginate_delay=500ms adds a half-second delay between requests, and parallel=2 allows at most two requests in flight at a time.
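To see how the two settings interact, here is a rough Python model using a thread pool: max_workers bounds concurrency like parallel, and a pause after each request approximates paginate_delay. The fetch_all helper and its throttling strategy are assumptions for illustration; Tenzir's actual scheduling may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, parallel=2, delay=0.5):
    """Fetch every URL with at most `parallel` requests in flight.

    Each worker pauses `delay` seconds after a request before taking the next
    URL, a crude stand-in for paginate_delay.
    """
    def worker(url):
        result = fetch(url)
        time.sleep(delay)  # throttle this worker before its next request
        return result

    # max_workers caps how many fetch() calls can run at the same time.
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(worker, urls))  # results keep the input order
```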
Practical Examples
These examples demonstrate typical use cases for API integration in real-world scenarios.
API Monitoring
Monitor API health and response times:

```tql
from_http "https://api.example.com/health" {
  read_json
  date = $response.headers.Date.parse_time("%a, %d %b %Y %H:%M:%S %Z")
  latency = now() - date
}
```

The above example parses the Date header from the HTTP response via
parse_time into a timestamp and then
compares it to the current wallclock time using the
now function. Because the Date header has one-second resolution and reflects the server's clock, treat latency as a rough estimate rather than a precise response time.
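The same Date-header arithmetic in Python looks like this. The standard library's email.utils.parsedate_to_datetime parses the HTTP date format; the date_header_age helper and its injectable now argument (which makes the result deterministic) are hypothetical additions for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def date_header_age(date_header, now=None):
    """Seconds between the server's Date header and the local clock."""
    server_time = parsedate_to_datetime(date_header)  # e.g. "Tue, 15 Nov 1994 08:12:31 GMT"
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()
```

For a Date of "Tue, 15 Nov 1994 08:12:31 GMT" and a local clock reading 08:12:33 UTC the same day, the helper returns 2.0 seconds.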
Error Handling
Handle API errors and failures gracefully in your data pipelines.
Retry Configuration
Configure automatic retries by setting the max_retry_count parameter to
specify the number of retry attempts and retry_delay to control the time
between retries:

```tql
from_http "https://unreliable-api.example.com/data", max_retry_count=5, retry_delay=2s
```

Status Code Handling
Check HTTP status codes using the $response variable to handle different
response types:

```tql
from_http "https://api.example.com/data" {
  read_json
  where $response.code >= 200 and $response.code < 300
}
```

With the http operator, use metadata_field instead:

```tql
from {url: "https://api.example.com/data"}
http url, metadata_field=meta
where meta.code >= 200 and meta.code < 300
```

Best Practices
Follow these practices for reliable and efficient API integration:
- Use appropriate timeouts. Set a reasonable connection_timeout for your use case.
- Implement retry logic. Configure max_retry_count and retry_delay to handle transient failures.
- Respect rate limits. Use parallel and paginate_delay to control request rates.
- Handle errors gracefully. Use $response in from_http parsing pipelines or metadata_field with http to check status codes and implement fallback logic.
- Secure credentials. Access API keys and tokens via secrets rather than hard-coding them.
- Monitor API usage. Track response times and error rates to spot performance problems.
- Leverage automatic format inference. Use descriptive file extensions in URLs when possible to enable automatic format and compression detection.
- Prefer link pagination when available. Use paginate="link" for APIs that support RFC 8288 Link headers instead of writing custom lambda expressions.