Manifest Files
Every time we deliver signal data to your bucket, we also write a manifest file to a dedicated manifest bucket. Manifests let you build event-driven pipelines: instead of polling for new data on a schedule, watch the manifest bucket and trigger processing when a new file appears.
Manifest Bucket
Manifests are not stored inside your data buckets. They live in a separate, dedicated bucket:
gs://autobound-manifests/
Your service account will be granted the `objectViewer` role on this bucket, alongside your data buckets.
File Naming
Each manifest file covers a single signal type for a single delivery date:
{signal_type}_{YYYY-MM-DD}.json
Examples:
| File | Signal | Delivery Date |
|---|---|---|
| news_2026-04-07.json | News | April 7, 2026 |
| 10k_2026-03-31.json | SEC 10-K | March 31, 2026 |
| glassdoor-company_2026-03-13.json | Glassdoor | March 13, 2026 |
| linkedin-post-contact_2026-03-30.json | LinkedIn Posts (Contact) | March 30, 2026 |
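The naming convention above can be parsed mechanically. A minimal sketch (the helper name is illustrative, not part of any Autobound SDK); note that signal types may themselves contain hyphens or underscores, so the date is anchored at the end of the name:

```python
import re
from datetime import date

# Manifest names follow {signal_type}_{YYYY-MM-DD}.json. Signal types may
# themselves contain hyphens, so anchor the date portion at the end.
MANIFEST_RE = re.compile(r"^(?P<signal>.+)_(?P<date>\d{4}-\d{2}-\d{2})\.json$")

def parse_manifest_name(name: str) -> tuple[str, date]:
    """Split a manifest filename into (signal_type, delivery_date)."""
    m = MANIFEST_RE.match(name)
    if m is None:
        raise ValueError(f"not a manifest filename: {name}")
    return m.group("signal"), date.fromisoformat(m.group("date"))
```

For example, `parse_manifest_name("glassdoor-company_2026-03-13.json")` yields the signal type `glassdoor-company` and the delivery date 2026-03-13.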
Schema
```json
{
  "signal_type": "news",
  "delivery_date": "2026-04-07",
  "status": "complete",
  "destination": "internal",
  "deliveries": [
    {
      "delivery_timestamp": "2026-04-07T00:00:00Z",
      "data_path": "gs://autobound-news-v3/2026-04-07-00-00-00/",
      "files": [
        {
          "file_name": "output.jsonl",
          "file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.jsonl",
          "format": ".jsonl",
          "size_bytes": 79510730,
          "record_count": 16766
        },
        {
          "file_name": "output.parquet",
          "file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.parquet",
          "format": ".parquet",
          "size_bytes": 39160557,
          "record_count": 16766
        }
      ],
      "record_count": 33532
    }
  ],
  "total_record_count": 33532,
  "total_file_count": 2,
  "pipeline_run_id": null,
  "created_at": "2026-04-07T14:36:33Z"
}
```

Field Reference
| Field | Type | Description |
|---|---|---|
| `signal_type` | string | The signal type identifier (matches the bucket name convention) |
| `delivery_date` | string | Date of the delivery (`YYYY-MM-DD`) |
| `status` | string | Delivery status. `complete` indicates a successful delivery. |
| `destination` | string | Target environment identifier (e.g. `internal` for GCS) |
| `deliveries` | array | One entry per delivery drop in this manifest |
| `deliveries[].delivery_timestamp` | string | ISO 8601 timestamp of the data drop |
| `deliveries[].data_path` | string | Full URI of the delivery folder containing the data files |
| `deliveries[].files` | array | List of files in the delivery |
| `deliveries[].files[].file_name` | string | Filename (e.g. `output.jsonl`) |
| `deliveries[].files[].file_path` | string | Full URI of the file |
| `deliveries[].files[].format` | string | File extension (`.jsonl` or `.parquet`) |
| `deliveries[].files[].size_bytes` | integer | File size in bytes |
| `deliveries[].files[].record_count` | integer | Number of records in this file |
| `deliveries[].record_count` | integer | Total records across all files in this delivery |
| `total_record_count` | integer | Total records across all deliveries in this manifest |
| `total_file_count` | integer | Total number of files across all deliveries |
| `pipeline_run_id` | string or null | Internal pipeline run identifier (may be null) |
| `created_at` | string | ISO 8601 timestamp when the manifest was generated |
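As a sketch of consuming this schema, the snippet below checks the delivery status and collects the JSONL file URIs. The inline `manifest_json` stands in for bytes you would download from the manifest bucket and is abbreviated to the fields used:

```python
import json

# Abbreviated manifest content; in practice, download the object from
# gs://autobound-manifests/ and pass its bytes to json.loads().
manifest_json = """{
  "signal_type": "news",
  "status": "complete",
  "deliveries": [
    {"data_path": "gs://autobound-news-v3/2026-04-07-00-00-00/",
     "files": [
       {"file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.jsonl",
        "format": ".jsonl", "record_count": 16766},
       {"file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.parquet",
        "format": ".parquet", "record_count": 16766}
     ]}
  ]
}"""

manifest = json.loads(manifest_json)
if manifest["status"] != "complete":
    raise RuntimeError("manifest does not describe a successful delivery")

# Keep only the JSONL files; both formats carry the same records.
jsonl_paths = [
    f["file_path"]
    for d in manifest["deliveries"]
    for f in d["files"]
    if f["format"] == ".jsonl"
]
```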
File Formats
Every delivery includes two files:
- `output.jsonl` - Newline-delimited JSON, one record per line. Best for streaming ingestion.
- `output.parquet` - Apache Parquet columnar format. Best for analytical queries and warehouse loads.
Both files contain identical records. Use whichever format fits your pipeline.
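For the JSONL format, records can be streamed one line at a time without loading the whole file into memory. A minimal sketch, where `fh` stands in for an open handle on a downloaded `output.jsonl` and the record fields are illustrative:

```python
import io
import json

# `fh` stands in for open("output.jsonl") on the downloaded file.
fh = io.StringIO('{"company": "Acme"}\n{"company": "Globex"}\n')

records = 0
for line in fh:
    record = json.loads(line)  # each line is one complete JSON object
    records += 1               # ...process the record here...
```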
Typical Usage
Event-Driven Ingestion (GCS Notifications)
Set up a GCS Pub/Sub notification on the manifest bucket to trigger your pipeline when new manifests arrive:
```shell
gsutil notification create -t YOUR_TOPIC -f json \
  -e OBJECT_FINALIZE gs://autobound-manifests/
```

Your subscriber receives an event for each new manifest. Parse the manifest JSON to get the exact `data_path` and `file_path` URIs, then pull the data files directly.
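A hedged sketch of the subscriber side: with `-f json`, the Pub/Sub message payload is the GCS object metadata, and the event type arrives as a message attribute. The function and variable names here are illustrative, not part of any fixed API:

```python
import json

def manifest_from_event(attributes: dict, payload: bytes):
    """Return the manifest object name for a new manifest, else None."""
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None                    # ignore deletes and metadata updates
    obj = json.loads(payload)          # GCS object metadata as JSON
    name = obj["name"]                 # e.g. "news_2026-04-07.json"
    return name if name.endswith(".json") else None
```

A downstream step would then fetch `gs://autobound-manifests/<name>`, parse it, and pull the data files listed under `deliveries[].files[].file_path`.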
Polling
If you prefer polling, list the manifest bucket filtered by signal type prefix:
```shell
gsutil ls gs://autobound-manifests/news_*.json
```

Compare against your last-processed date to find new deliveries.
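The comparison step can be sketched as follows, assuming you track the last delivery date you processed (the helper name is illustrative):

```python
from datetime import date

def new_manifests(names: list[str], last_processed: date) -> list[str]:
    """Keep only manifest names dated after the last processed delivery."""
    out = []
    for name in names:
        # Filename is {signal_type}_{YYYY-MM-DD}.json, so the delivery
        # date is the 10 characters immediately before ".json".
        d = date.fromisoformat(name[-15:-5])
        if d > last_processed:
            out.append(name)
    return sorted(out)
```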
Validating Downloads
Use the `size_bytes` and `record_count` fields to verify your download is complete:

- Download the file from `file_path`
- Compare the local file size against `size_bytes`
- Count records and compare against `record_count`
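The checks above can be sketched for a downloaded `.jsonl` file like this, where `size_bytes` and `record_count` come from the manifest entry for that file (the helper name is illustrative):

```python
import os

def validate_jsonl(path: str, size_bytes: int, record_count: int) -> None:
    """Raise ValueError if the downloaded file fails either check."""
    if os.path.getsize(path) != size_bytes:
        raise ValueError("size mismatch: partial or corrupted download")
    with open(path) as fh:
        n = sum(1 for line in fh if line.strip())  # one record per line
    if n != record_count:
        raise ValueError(f"record count mismatch: {n} != {record_count}")
```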
Delivery Cadence
Manifests follow the same delivery schedule as the underlying signal data. See Delivery URIs for the full cadence table.
- Weekly signals (News, SEC filings, Hiring, etc.) generate a manifest every week
- Bi-weekly signals (LinkedIn Posts Contact, Financials) generate every two weeks
- Monthly signals (Glassdoor, Reddit, GitHub, etc.) generate once per month
- Quarterly signals (Employee Growth) generate once per quarter
Manifests are generated after the data files are fully written. When you see a manifest, the corresponding data files are guaranteed to be complete and ready to read.