Delivery

File formats and delivery mechanisms for the Autobound Signal Database.

The Signal Database is delivered as structured flat files in your choice of format. This page covers file formats, delivery mechanisms, and refresh schedules.

File Formats

JSON (NDJSON)

Newline-delimited JSON (NDJSON) format—one signal record per line. Ideal for streaming ingestion and simple parsing.

File extension: .ndjson or .json

Example:

{"signal_id": "a1b2c3d4-5678-90ab-cdef-1234567890ab", "insight_subtype": "workExperienceJobChange", "entity_type": "contact", "entity_identifiers": {"contact_email": "[email protected]", "contact_linkedin_url": "https://linkedin.com/in/sarahchen", "company_domain": "newco.io"}, "detected_at": "2024-12-10T14:22:00Z", "source": "Work Experience", "variables": {"new_job_title": "VP of Sales", "new_job_company_name": "NewCo", "previous_company_name": "OldCorp"}}
{"signal_id": "b2c3d4e5-6789-01bc-def2-3456789012cd", "insight_subtype": "newsFunding", "entity_type": "company", "entity_identifiers": {"company_domain": "techstartup.com", "company_name": "TechStartup Inc"}, "detected_at": "2024-12-12T09:15:00Z", "source": "News", "variables": {"insightTitle": "TechStartup Raises $50M Series B"}}

Pros:

  • Human-readable
  • Easy to parse in any language
  • Streaming-friendly (process line by line)
  • Simple to debug and inspect

Recommended for: Real-time ingestion pipelines, smaller datasets, debugging
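
Because each line is a complete record, ingestion can be a simple line-by-line loop. A minimal Python sketch; the file name and the handle_signal() callback are illustrative placeholders, not part of the delivery:

import json

# Illustrative downstream handler: route on the subtype documented above.
def handle_signal(signal: dict) -> None:
    print(signal["insight_subtype"], signal["detected_at"])

# Stream an NDJSON delivery file: one complete signal record per line.
def ingest_ndjson(path: str) -> int:
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate trailing blank lines
                continue
            handle_signal(json.loads(line))
            count += 1
    return count

print(ingest_ndjson("news_2024-12-15.ndjson"), "records ingested")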

Parquet

Apache Parquet columnar format. Optimized for analytical queries and data warehouse ingestion.

File extension: .parquet

Schema:

signal_id: STRING
insight_subtype: STRING
entity_type: STRING
entity_identifiers: STRUCT<
  contact_email: STRING,
  contact_linkedin_url: STRING,
  company_domain: STRING,
  company_linkedin_url: STRING,
  company_name: STRING
>
detected_at: TIMESTAMP
source: STRING
variables: STRING (JSON-encoded)

Note: The variables field is stored as a JSON string within Parquet. This allows flexibility across 350+ signal subtypes while maintaining a consistent columnar schema.

Pros:

  • Highly compressed (typically 5-10x smaller than JSON)
  • Columnar format enables fast analytical queries
  • Native support in Spark, Snowflake, BigQuery, Databricks, etc.
  • Efficient for large-scale batch processing

Recommended for: Data warehouses, analytical workloads, large datasets
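
Loading a Parquet delivery and decoding the JSON-encoded variables column might look like the following. A minimal sketch, assuming pandas with pyarrow (or fastparquet) installed; the file name is illustrative:

import json
import pandas as pd

# Read one Parquet delivery file into a DataFrame.
df = pd.read_parquet("job_changes_2024-12-15.parquet")

# `variables` is stored as a JSON string, so decode it per row before use.
df["variables"] = df["variables"].apply(json.loads)

# Example: pull a subtype-specific field out of the decoded variables.
job_changes = df[df["insight_subtype"] == "workExperienceJobChange"]
print(job_changes["variables"].apply(lambda v: v.get("new_job_title")).head())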

Delivery Mechanisms

Push to Your S3 Bucket

We deliver files directly to your AWS S3 bucket on a scheduled basis.

Setup:

  1. Create a dedicated S3 bucket (or prefix within an existing bucket)
  2. Provide us with a cross-account IAM role ARN with s3:PutObject permissions (a boto3 sketch follows this list)
  3. We deliver files on your chosen schedule
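
For step 2, one way to provision the role is with boto3, as sketched below. This is a minimal sketch, not a prescribed setup: the account ID, bucket name, and role name are placeholders; use the values agreed during onboarding.

import json
import boto3

AUTOBOUND_ACCOUNT_ID = "111111111111"  # placeholder; provided during onboarding
BUCKET = "your-bucket"                 # placeholder

iam = boto3.client("iam")

# Trust policy: allow the delivery account to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{AUTOBOUND_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: write access limited to the delivery prefix.
write_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/autobound-signals/*",
    }],
}

role = iam.create_role(
    RoleName="autobound-signal-delivery",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="autobound-signal-delivery",
    PolicyName="autobound-s3-put",
    PolicyDocument=json.dumps(write_policy),
)
print(role["Role"]["Arn"])  # share this ARN with Autobound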

File structure:

Files are organized by insight type, with separate folders for each signal category you're subscribed to. This structure ensures consistent schemas within each folder and simplifies ingestion pipelines.

s3://your-bucket/autobound-signals/
├── news/
│   ├── 2024-12-15/
│   │   └── news_2024-12-15.parquet
│   └── 2024-12-16/
│       └── news_2024-12-16.parquet
├── job_changes/
│   ├── 2024-12-15/
│   │   └── job_changes_2024-12-15.parquet
│   └── ...
├── hiring_trends/
│   └── ...
├── linkedin_activity/
│   └── ...
├── technographics/
│   └── ...
├── financials/
│   └── ...
├── website_intelligence/
│   └── ...
├── sec_filings/
│   └── ...
└── full/
    ├── news_full_2024-12-01.parquet
    ├── job_changes_full_2024-12-01.parquet
    └── ...

Why organize by insight type? Each signal category has a distinct schema with different variables fields. Separating by type allows you to define strongly-typed tables for each category and avoid parsing heterogeneous data within a single file.
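
For example, each locally synced folder can be loaded into its own table. A minimal pandas sketch, assuming files have been copied under ./signals/ and pyarrow is installed:

import json
from pathlib import Path

import pandas as pd

BASE = Path("signals")

# One DataFrame per insight type: schemas are consistent within a folder
# but differ across folders, so keep them in separate tables.
tables = {}
for type_dir in BASE.iterdir():
    if not type_dir.is_dir():
        continue
    files = sorted(type_dir.rglob("*.parquet"))
    if not files:
        continue
    df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
    df["variables"] = df["variables"].apply(json.loads)
    tables[type_dir.name] = df  # e.g. tables["job_changes"]

for name, df in tables.items():
    print(name, len(df), "records")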

Pull from Autobound S3

Access files from our S3 bucket using temporary credentials.

Setup:

  1. We provision read-only credentials for your account
  2. You pull files on your own schedule
  3. Files are retained for 30 days

Access pattern:

# Example: List available insight type folders
aws s3 ls s3://autobound-signals-{your-org-id}/

# Example: List available dates for job changes
aws s3 ls s3://autobound-signals-{your-org-id}/job_changes/

# Example: Download today's news signals
aws s3 cp s3://autobound-signals-{your-org-id}/news/2024-12-15/ ./signals/news/ --recursive

# Example: Download all signal types for a specific date
for type in news job_changes hiring_trends technographics; do
  aws s3 cp "s3://autobound-signals-{your-org-id}/${type}/2024-12-15/" "./signals/${type}/" --recursive
done
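
The same pull in Python, as a minimal boto3 sketch. The org ID is a placeholder mirroring the bucket naming above; credentials come from your standard AWS configuration (environment variables, profile, etc.):

from pathlib import Path

import boto3

ORG_ID = "your-org-id"  # placeholder
BUCKET = f"autobound-signals-{ORG_ID}"

s3 = boto3.client("s3")

# Download every file for one insight type and date into a local folder.
def pull(insight_type: str, date: str, dest: str = "signals") -> None:
    prefix = f"{insight_type}/{date}/"
    out_dir = Path(dest) / insight_type
    out_dir.mkdir(parents=True, exist_ok=True)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = out_dir / Path(obj["Key"]).name
            s3.download_file(BUCKET, obj["Key"], str(target))

pull("news", "2024-12-15")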

Refresh Cadence

Signal refresh frequency varies by category. Higher-velocity signals (news, job changes) refresh more frequently than stable signals (tech stack, business model).

Signal Category         Refresh Frequency   Notes
News & Events           Daily               Funding, M&A, product launches
Job Changes             Weekly              New roles, promotions, departures
Hiring Trends           Weekly              Open positions, hiring velocity
LinkedIn Activity       Weekly              Posts, engagement
Financial               Weekly              Earnings, revenue trends
Technographics          Weekly              Tech stack changes
Website Intelligence    Daily               Pricing changes, messaging shifts
10-K / SEC Filings      Quarterly           Annual reports, earnings transcripts

Incremental vs. Full Loads

Incremental (Default):

  • Daily/weekly files contain only new or updated signals since last delivery
  • Use detected_at timestamp to identify new records (see the merge sketch below)
  • Smaller file sizes, faster processing

Full Snapshot (On Request):

  • Complete database dump
  • Useful for initial load or periodic reconciliation
  • Delivered monthly or on-demand
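
One way to apply incremental files idempotently is to keep the newest record per signal_id, ordered by detected_at. A minimal pandas sketch; file paths are illustrative:

import pandas as pd

# Existing snapshot plus a new incremental delivery (illustrative paths).
existing = pd.read_parquet("warehouse/job_changes.parquet")
incremental = pd.read_parquet("job_changes_2024-12-15.parquet")

# detected_at is ISO 8601 UTC, so lexicographic sort matches time order.
merged = (
    pd.concat([existing, incremental], ignore_index=True)
    .sort_values("detected_at")
    .drop_duplicates(subset="signal_id", keep="last")
)
merged.to_parquet("warehouse/job_changes.parquet")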

File Naming Convention

Files follow a consistent naming pattern based on insight type:

{insight_type}_{date}.{format}

Full snapshots add a full marker: {insight_type}_full_{date}.{format}

Examples:

  • news_2024-12-15.ndjson — News signals (funding, M&A, launches) from Dec 15
  • job_changes_2024-12-15.parquet — Job change signals from Dec 15
  • technographics_2024-12-15.parquet — Tech stack signals from Dec 15
  • news_full_2024-12-01.parquet — Full news snapshot from Dec 1
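
A minimal sketch of parsing this convention for routing files, covering both incremental and full-snapshot names. The regex is an assumption based on the examples above:

import re

# Matches e.g. "news_2024-12-15.ndjson" and "news_full_2024-12-01.parquet".
NAME_RE = re.compile(
    r"^(?P<insight_type>[a-z_]+?)(?P<full>_full)?_(?P<date>\d{4}-\d{2}-\d{2})"
    r"\.(?P<format>ndjson|json|parquet)$"
)

for name in ["news_2024-12-15.ndjson", "news_full_2024-12-01.parquet"]:
    m = NAME_RE.match(name)
    print(m.group("insight_type"), bool(m.group("full")), m.group("date"))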

Available insight types:

Folder Name             Contains
news                    Funding, M&A, product launches, company news
job_changes             New roles, promotions, departures
hiring_trends           Open positions, hiring velocity
linkedin_activity       Posts, engagement, comments
technographics          Tech stack additions and removals
financials              Earnings, revenue trends, financial metrics
website_intelligence    Pricing changes, messaging shifts, web updates
sec_filings             10-K, 10-Q, earnings transcripts

Data Volume Estimates

Typical delivery sizes per insight type, per file (varies by coverage and refresh cadence):

Insight Type              JSON (uncompressed)   JSON (gzipped)   Parquet
News (daily)              ~150 MB               ~15 MB           ~10 MB
Job Changes (weekly)      ~300 MB               ~30 MB           ~18 MB
Hiring Trends (weekly)    ~200 MB               ~20 MB           ~12 MB
LinkedIn Activity         ~250 MB               ~25 MB           ~15 MB
Technographics            ~100 MB               ~10 MB           ~6 MB
Website Intelligence      ~80 MB                ~8 MB            ~5 MB
Full snapshot (all)       ~50 GB                ~5 GB            ~3 GB

Delivery SLA

  • Daily signals: Delivered by 6:00 AM UTC
  • Weekly signals: Delivered Sunday by 6:00 AM UTC
  • Availability: 99.9% uptime SLA
  • Support: Issues resolved within 4 business hours

Getting Started

  1. Select insight types: Choose which signal categories you want to subscribe to (news, job changes, technographics, etc.)
  2. Choose your format: JSON for simplicity, Parquet for analytics
  3. Set up delivery: Provide S3 bucket details or request pull access
  4. Configure filtering: Optionally filter by geography or company size
  5. Test with sample: We provide sample datasets for each insight type for integration testing

Contact [email protected] to configure your delivery.