Improved Delivery Coverage
TL;DR: Folder names stay the same and represent when the file was uploaded (UTC). Each folder should contain only signals that are new since the previous folder, with no duplicates. We now ship stragglers that were previously lost to processing delays. Every signal already has a unique signal_id, so customers who deduplicate on it will automatically get more data with no code changes.
What's changing
Better coverage, more signals.
Our pipeline now tracks each entity individually to ensure signals aren't lost between delivery cycles. Previously, processing delays could cause signals to fall through the cracks permanently. Now they ship in the next available delivery.
This was a silent data loss problem.
Signals that hit processing delays (LLM retries, late third-party data, API failures) would simply never appear. With this change, those signals land in the next batch. Since signal_id is already globally unique, customers who deduplicate on it benefit automatically.
We considered changing folder naming but decided against it. The current format works, and signal-level date fields (e.g. data.filing_date on SEC filings, data.posted_date on LinkedIn posts) handle time-series filtering better than any folder name could. See individual signal schema pages for the relevant date field per signal type.
How to use deliveries
- Pull the latest folder. Everything inside is new since the previous folder.
- Deduplicate on `signal_id`. Globally unique, never repeated across deliveries.
- For time-series filtering, use the signal's own date field, not the folder name.
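The steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the JSONL layout (one signal object per line, with a top-level `signal_id` key) is taken from this announcement, while the `dedupe_signals` helper name and the example folder path are hypothetical.

```python
import json

def dedupe_signals(jsonl_text, seen_ids):
    """Yield signals whose signal_id has not been seen in earlier deliveries.

    signal_id is globally unique and never repeated across deliveries,
    so a simple set membership check is sufficient.
    """
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        signal = json.loads(line)
        if signal["signal_id"] not in seen_ids:
            seen_ids.add(signal["signal_id"])
            yield signal

# Usage sketch (example folder name follows the YYYY-MM-DD-HH-MM-SS format):
#   from pathlib import Path
#   latest = Path("deliveries/2024-01-15-06-00-00/output.jsonl")
#   new_signals = list(dedupe_signals(latest.read_text(), seen_ids))
```

Persisting `seen_ids` between delivery cycles (a database table or key-value store) is what makes the dedup safe across restarts; the in-memory set here is only for illustration.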
What stays the same
- Delivery schedule, frequency, and buckets
- File format (`output.jsonl` + `output.parquet`)
- Authentication and service accounts
- Signal schema and field names
- Folder name format (`YYYY-MM-DD-HH-MM-SS`)
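Because the folder name format is unchanged, any existing logic that parses it keeps working. A minimal sketch, assuming (per the TL;DR) that folder names encode the UTC upload time; the helper name is hypothetical:

```python
from datetime import datetime, timezone

def parse_folder_timestamp(folder_name):
    """Parse a delivery folder name like '2024-01-15-06-00-00' as a UTC datetime."""
    naive = datetime.strptime(folder_name, "%Y-%m-%d-%H-%M-%S")
    return naive.replace(tzinfo=timezone.utc)
```

Note this timestamp is only useful for ordering deliveries and finding the latest folder; for filtering the signals themselves, use the signal-level date fields described above.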
