Manifest Files

Every time we deliver signal data to your bucket, we also write a manifest file to a dedicated manifest bucket. Manifests let you build event-driven pipelines: instead of polling for new data on a schedule, watch the manifest bucket and trigger processing when a new file appears.

Manifest Bucket

Manifests are not stored inside your data buckets. They live in a separate, dedicated bucket:

gs://autobound-manifests/

Your service account will be granted the Storage Object Viewer role (roles/storage.objectViewer) on this bucket as well as on your data buckets.

File Naming

Each manifest file covers a single signal type for a single delivery date:

{signal_type}_{YYYY-MM-DD}.json

Examples:

| File | Signal | Delivery Date |
| --- | --- | --- |
| news_2026-04-07.json | News | April 7, 2026 |
| 10k_2026-03-31.json | SEC 10-K | March 31, 2026 |
| glassdoor-company_2026-03-13.json | Glassdoor | March 13, 2026 |
| linkedin-post-contact_2026-03-30.json | LinkedIn Posts (Contact) | March 30, 2026 |
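
The naming pattern can be split back into its two parts. The following is a minimal Python sketch, assuming the date is always the final underscore-separated token before the .json extension; parse_manifest_name is a hypothetical helper for illustration, not part of any provided SDK.

```python
from datetime import date, datetime


def parse_manifest_name(file_name: str) -> tuple[str, date]:
    """Split '{signal_type}_{YYYY-MM-DD}.json' into (signal_type, delivery_date)."""
    if not file_name.endswith(".json"):
        raise ValueError(f"not a manifest file name: {file_name!r}")
    stem = file_name[: -len(".json")]
    # Split on the LAST underscore so signal types that contain
    # hyphens (e.g. 'linkedin-post-contact') stay intact.
    signal_type, sep, date_part = stem.rpartition("_")
    if not sep or not signal_type:
        raise ValueError(f"missing '_' separator in: {file_name!r}")
    # strptime raises ValueError if the date part is not a valid date.
    delivery_date = datetime.strptime(date_part, "%Y-%m-%d").date()
    return signal_type, delivery_date
```

For example, parse_manifest_name("news_2026-04-07.json") yields the signal type "news" and the date April 7, 2026.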

Schema

{
  "signal_type": "news",
  "delivery_date": "2026-04-07",
  "status": "complete",
  "destination": "internal",
  "deliveries": [
    {
      "delivery_timestamp": "2026-04-07T00:00:00Z",
      "data_path": "gs://autobound-news-v3/2026-04-07-00-00-00/",
      "files": [
        {
          "file_name": "output.jsonl",
          "file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.jsonl",
          "format": ".jsonl",
          "size_bytes": 79510730,
          "record_count": 16766
        },
        {
          "file_name": "output.parquet",
          "file_path": "gs://autobound-news-v3/2026-04-07-00-00-00/output.parquet",
          "format": ".parquet",
          "size_bytes": 39160557,
          "record_count": 16766
        }
      ],
      "record_count": 33532
    }
  ],
  "total_record_count": 33532,
  "total_file_count": 2,
  "pipeline_run_id": null,
  "created_at": "2026-04-07T14:36:33Z"
}

Field Reference

| Field | Type | Description |
| --- | --- | --- |
| signal_type | string | The signal type identifier (matches the bucket name convention) |
| delivery_date | string | Date of the delivery (YYYY-MM-DD) |
| status | string | Delivery status. complete indicates a successful delivery. |
| destination | string | Target environment identifier (e.g. internal for GCS) |
| deliveries | array | One entry per delivery drop in this manifest |
| deliveries[].delivery_timestamp | string | ISO 8601 timestamp of the data drop |
| deliveries[].data_path | string | Full URI to the delivery folder containing the data files |
| deliveries[].files | array | List of files in the delivery |
| deliveries[].files[].file_name | string | Filename (e.g. output.jsonl) |
| deliveries[].files[].file_path | string | Full URI to the file |
| deliveries[].files[].format | string | File extension (.jsonl or .parquet) |
| deliveries[].files[].size_bytes | integer | File size in bytes |
| deliveries[].files[].record_count | integer | Number of records in this file |
| deliveries[].record_count | integer | Total records across all files in this delivery |
| total_record_count | integer | Total records across all deliveries in this manifest |
| total_file_count | integer | Total number of files across all deliveries |
| pipeline_run_id | string or null | Internal pipeline run identifier (may be null) |
| created_at | string | ISO 8601 timestamp when the manifest was generated |
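
The count fields are related to one another, so a parsed manifest can be sanity-checked before you act on it. This is a sketch only, assuming the totals are plain sums as the descriptions above suggest; check_manifest_totals is a hypothetical helper name.

```python
def check_manifest_totals(manifest: dict) -> list[str]:
    """Return a list of inconsistencies; an empty list means the totals agree."""
    problems = []
    deliveries = manifest.get("deliveries", [])

    # Each delivery's record_count should equal the sum over its files.
    for i, delivery in enumerate(deliveries):
        file_sum = sum(f["record_count"] for f in delivery.get("files", []))
        if delivery["record_count"] != file_sum:
            problems.append(
                f"deliveries[{i}].record_count is {delivery['record_count']}, "
                f"but its files sum to {file_sum}"
            )

    # Top-level totals should equal the sums over all deliveries.
    record_total = sum(d["record_count"] for d in deliveries)
    if manifest["total_record_count"] != record_total:
        problems.append(
            f"total_record_count is {manifest['total_record_count']}, "
            f"but deliveries sum to {record_total}"
        )
    file_total = sum(len(d.get("files", [])) for d in deliveries)
    if manifest["total_file_count"] != file_total:
        problems.append(
            f"total_file_count is {manifest['total_file_count']}, "
            f"but deliveries list {file_total} files"
        )
    return problems
```

Run against the schema example above, this returns an empty list: the two files carry 16766 records each, the delivery reports 33532, and the top-level totals report 33532 records and 2 files.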

File Formats

Every delivery includes two files:

  • output.jsonl - Newline-delimited JSON, one record per line. Best for streaming ingestion.
  • output.parquet - Apache Parquet columnar format. Best for analytical queries and warehouse loads.

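Both files contain identical records, so use whichever format fits your pipeline. Because the manifest's record_count totals sum the per-file counts, they count each record once per format: in the schema example above, 16766 records per file produce a delivery record_count and total_record_count of 33532.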
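
Since each delivery lists both formats, a consumer typically picks one and ignores the other. The sketch below selects file URIs by the format field of a parsed manifest. It assumes that any status other than complete means the data should not be read yet; only complete is documented here, so that handling is an assumption, and file_paths_for_format is a hypothetical helper name.

```python
def file_paths_for_format(manifest: dict, fmt: str) -> list[str]:
    """Return the file_path of every file whose format equals fmt (e.g. '.parquet')."""
    # Assumption: treat anything other than 'complete' as not ready to read.
    if manifest.get("status") != "complete":
        return []
    return [
        f["file_path"]
        for delivery in manifest.get("deliveries", [])
        for f in delivery.get("files", [])
        if f.get("format") == fmt
    ]
```

Note that fmt must include the leading dot, matching the values in the format field (.jsonl or .parquet).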

Typical Usage

Event-Driven Ingestion (GCS Notifications)

Set up a GCS Pub/Sub notification on the manifest bucket to trigger your pipeline when new manifests arrive:

gsutil notification create -t YOUR_TOPIC -f json \
  -e OBJECT_FINALIZE gs://autobound-manifests/

Your subscriber receives an event for each new manifest. Parse the manifest JSON to get the exact data_path and file_path URIs, then pull the data files directly.
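
On the subscriber side, the first step is turning a notification into a manifest URI. Cloud Storage Pub/Sub notifications carry eventType, bucketId, and objectId as message attributes, so this can be done without decoding the payload. The following is a minimal sketch of that step only; manifest_uri_from_event is a hypothetical helper, and downloading and parsing the manifest is left to your own storage client.

```python
from typing import Mapping, Optional

MANIFEST_BUCKET = "autobound-manifests"


def manifest_uri_from_event(attributes: Mapping[str, str]) -> Optional[str]:
    """Return the gs:// URI of a newly written manifest, or None to ignore the event."""
    # Only react to object creation (or overwrite) events.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None
    # Ignore events from other buckets that share the same topic.
    if attributes.get("bucketId") != MANIFEST_BUCKET:
        return None
    object_id = attributes.get("objectId", "")
    if not object_id.endswith(".json"):
        return None
    return f"gs://{MANIFEST_BUCKET}/{object_id}"
```

In a Pub/Sub subscriber callback you would pass the received message's attributes mapping to this function and acknowledge the message once the manifest has been processed.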

Polling

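If you prefer polling, list the manifest bucket filtered by signal type prefix. Quote the URI so that your shell passes the wildcard through to gsutil instead of trying to expand it locally:

gsutil ls "gs://autobound-manifests/news_*.json"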

Compare against your last-processed date to find new deliveries.
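
The comparison step can be sketched as a small filter over the listed URIs. This assumes you keep the last-processed delivery date yourself (for example in a database or state file) and that the input is the list of object URIs returned by the listing; new_manifests is a hypothetical helper name.

```python
from datetime import date


def new_manifests(uris: list[str], signal_type: str, last_processed: date) -> list[tuple[date, str]]:
    """Return (delivery_date, uri) pairs newer than last_processed, oldest first."""
    prefix = f"{signal_type}_"
    found = []
    for uri in uris:
        name = uri.rsplit("/", 1)[-1]
        if not (name.startswith(prefix) and name.endswith(".json")):
            continue
        date_part = name[len(prefix): -len(".json")]
        try:
            delivered = date.fromisoformat(date_part)
        except ValueError:
            # Not '{signal_type}_{YYYY-MM-DD}.json' for this signal type; skip it.
            continue
        if delivered > last_processed:
            found.append((delivered, uri))
    return sorted(found)
```

Processing the results oldest first and advancing the stored date after each successful load keeps the pipeline resumable if a run fails partway through.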

Validating Downloads

Use the size_bytes and record_count fields to verify your download is complete:

  1. Download the file from file_path
  2. Compare local file size against size_bytes
  3. Count records and compare against record_count
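
For the JSONL file, the steps above can be sketched with the standard library alone. This assumes the file has already been downloaded to a local path and that each non-blank line is one record; validate_jsonl_download is a hypothetical helper name. Counting rows in the Parquet file needs a Parquet reader and is not shown here.

```python
import os


def validate_jsonl_download(local_path: str, size_bytes: int, record_count: int) -> bool:
    """Check a downloaded .jsonl file against the manifest's size_bytes and record_count."""
    # Step 2: compare the local file size against size_bytes.
    if os.path.getsize(local_path) != size_bytes:
        return False
    # Step 3: count records (one per non-blank line) without loading the file into memory.
    with open(local_path, "rb") as fh:
        lines = sum(1 for line in fh if line.strip())
    return lines == record_count
```

Checking the size first is cheap and catches truncated downloads before the slower line count runs.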

Delivery Cadence

Manifests follow the same delivery schedule as the underlying signal data. See Delivery URIs for the full cadence table.

  • Weekly signals (News, SEC filings, Hiring, etc.) generate a manifest every week
  • Bi-weekly signals (LinkedIn Posts Contact, Financials) generate every two weeks
  • Monthly signals (Glassdoor, Reddit, GitHub, etc.) generate once per month
  • Quarterly signals (Employee Growth) generate once per quarter

Manifests are generated after the data files are fully written. When you see a manifest, the corresponding data files are guaranteed to be complete and ready to read.