Delivery
File formats and delivery mechanisms for the Autobound Signal Database.
The Signal Database is delivered as structured flat files in your choice of format. This page covers file formats, delivery mechanisms, and refresh schedules.
File Formats
JSON (NDJSON)
Newline-delimited JSON (NDJSON) format—one signal record per line. Ideal for streaming ingestion and simple parsing.
File extension: .ndjson or .json
Example:
{"signal_id": "a1b2c3d4-5678-90ab-cdef-1234567890ab", "insight_subtype": "workExperienceJobChange", "entity_type": "contact", "entity_identifiers": {"contact_email": "[email protected]", "contact_linkedin_url": "https://linkedin.com/in/sarahchen", "company_domain": "newco.io"}, "detected_at": "2024-12-10T14:22:00Z", "source": "Work Experience", "variables": {"new_job_title": "VP of Sales", "new_job_company_name": "NewCo", "previous_company_name": "OldCorp"}}
{"signal_id": "b2c3d4e5-6789-01bc-def2-3456789012cd", "insight_subtype": "newsFunding", "entity_type": "company", "entity_identifiers": {"company_domain": "techstartup.com", "company_name": "TechStartup Inc"}, "detected_at": "2024-12-12T09:15:00Z", "source": "News", "variables": {"insightTitle": "TechStartup Raises $50M Series B"}}
Pros:
- Human-readable
- Easy to parse in any language
- Streaming-friendly (process line by line)
- Simple to debug and inspect
Recommended for: Real-time ingestion pipelines, smaller datasets, debugging
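A minimal reader sketch for the NDJSON format (the file path and the aggregation are illustrative, not part of the delivery contract):

```python
import json
from collections import Counter

def read_signals(path):
    """Stream signal records from an NDJSON file, one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate trailing blank lines
                yield json.loads(line)

def subtype_counts(path):
    """Example aggregation: count records per insight_subtype."""
    return Counter(rec["insight_subtype"] for rec in read_signals(path))
```

Because each record is a complete JSON document on its own line, the file can be processed with constant memory regardless of size.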
Parquet
Apache Parquet columnar format. Optimized for analytical queries and data warehouse ingestion.
File extension: .parquet
Schema:
signal_id: STRING
insight_subtype: STRING
entity_type: STRING
entity_identifiers: STRUCT<
contact_email: STRING,
contact_linkedin_url: STRING,
company_domain: STRING,
company_linkedin_url: STRING,
company_name: STRING
>
detected_at: TIMESTAMP
source: STRING
variables: STRING (JSON-encoded)
Note: The `variables` field is stored as a JSON string within Parquet. This allows flexibility across 350+ signal subtypes while maintaining a consistent columnar schema.
Pros:
- Highly compressed (typically 5-10x smaller than JSON)
- Columnar format enables fast analytical queries
- Native support in Spark, Snowflake, BigQuery, Databricks, etc.
- Efficient for large-scale batch processing
Recommended for: Data warehouses, analytical workloads, large datasets
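In practice you would read the file with a Parquet library such as pyarrow or pandas; the sketch below assumes a row has already been loaded as a Python dict and shows only the decode step for the JSON-encoded `variables` column described in the schema above:

```python
import json

def decode_variables(row):
    """Parse the JSON-encoded `variables` string of a signal row into a dict.

    Assumes the row was read from Parquet into a plain dict; reading the
    file itself is left to a Parquet library (pyarrow, pandas, etc.).
    """
    out = dict(row)
    out["variables"] = json.loads(row["variables"])
    return out

# A row as it might come back from a Parquet reader (values from the
# NDJSON example above):
row = {
    "signal_id": "b2c3d4e5-6789-01bc-def2-3456789012cd",
    "insight_subtype": "newsFunding",
    "variables": '{"insightTitle": "TechStartup Raises $50M Series B"}',
}
decoded = decode_variables(row)
```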
Delivery Mechanisms
Push to Your S3 Bucket
We deliver files directly to your AWS S3 bucket on a scheduled basis.
Setup:
- Create a dedicated S3 bucket (or prefix within an existing bucket)
- Provide us with a cross-account IAM role ARN with `s3:PutObject` permissions
- We deliver files on your chosen schedule
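A permissions policy for that cross-account role might look like the following sketch (the bucket name and prefix are placeholders; confirm the exact principal and any additional required actions with Autobound during onboarding):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAutoboundSignalDelivery",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::your-bucket/autobound-signals/*"
    }
  ]
}
```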
File structure:
Files are organized by insight type, with separate folders for each signal category you're subscribed to. This structure ensures consistent schemas within each folder and simplifies ingestion pipelines.
s3://your-bucket/autobound-signals/
├── news/
│ ├── 2024-12-15/
│ │ └── news_2024-12-15.parquet
│ └── 2024-12-16/
│ └── news_2024-12-16.parquet
├── job_changes/
│ ├── 2024-12-15/
│ │ └── job_changes_2024-12-15.parquet
│ └── ...
├── hiring_trends/
│ └── ...
├── linkedin_activity/
│ └── ...
├── technographics/
│ └── ...
├── financials/
│ └── ...
├── website_intelligence/
│ └── ...
├── sec_filings/
│ └── ...
└── full/
├── news_full_2024-12-01.parquet
├── job_changes_full_2024-12-01.parquet
└── ...
Why organize by insight type? Each signal category has a distinct schema with different `variables` fields. Separating by type allows you to define strongly-typed tables for each category and avoid parsing heterogeneous data within a single file.
Pull from Autobound S3
Access files from our S3 bucket using temporary credentials.
Setup:
- We provision read-only credentials for your account
- You pull files on your own schedule
- Files are retained for 30 days
Access pattern:
# Example: List available insight type folders
aws s3 ls s3://autobound-signals-{your-org-id}/
# Example: List available dates for job changes
aws s3 ls s3://autobound-signals-{your-org-id}/job_changes/
# Example: Download today's news signals
aws s3 cp s3://autobound-signals-{your-org-id}/news/2024-12-15/ ./signals/news/ --recursive
# Example: Download all signal types for a specific date
for type in news job_changes hiring_trends technographics; do
aws s3 cp s3://autobound-signals-{your-org-id}/$type/2024-12-15/ ./signals/$type/ --recursive
done

Refresh Cadence
Signal refresh frequency varies by category. Higher-velocity signals (news, job changes) refresh more frequently than stable signals (tech stack, business model).
| Signal Category | Refresh Frequency | Notes |
|---|---|---|
| News & Events | Daily | Funding, M&A, product launches |
| Job Changes | Weekly | New roles, promotions, departures |
| Hiring Trends | Weekly | Open positions, hiring velocity |
| LinkedIn Activity | Weekly | Posts, engagement |
| Financial | Weekly | Earnings, revenue trends |
| Technographics | Weekly | Tech stack changes |
| Website Intelligence | Daily | Pricing changes, messaging shifts |
| 10-K / SEC Filings | Quarterly | Annual reports, earnings transcripts |
Incremental vs. Full Loads
Incremental (Default):
- Daily/weekly files contain only new or updated signals since last delivery
- Use the `detected_at` timestamp to identify new records
- Smaller file sizes, faster processing
Full Snapshot (On Request):
- Complete database dump
- Useful for initial load or periodic reconciliation
- Delivered monthly or on-demand
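A merge step for incremental loads can be sketched as an upsert keyed by `signal_id`, keeping the record with the latest `detected_at`. Because the timestamps are ISO-8601 UTC strings in a fixed format, plain string comparison orders them correctly (the record shapes below are simplified for illustration):

```python
def merge_incremental(existing, incoming):
    """Upsert incremental signal records into an existing collection.

    Records are keyed by signal_id; on conflict, the record with the
    later detected_at wins. ISO-8601 UTC timestamps in a fixed format
    compare correctly as strings.
    """
    store = {rec["signal_id"]: rec for rec in existing}
    for rec in incoming:
        current = store.get(rec["signal_id"])
        if current is None or rec["detected_at"] > current["detected_at"]:
            store[rec["signal_id"]] = rec
    return list(store.values())
```

A full snapshot can then be used to rebuild `existing` from scratch for periodic reconciliation.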
File Naming Convention
Files follow a consistent naming pattern based on insight type:
{insight_type}_{date}.{format}
Examples:
- `news_2024-12-15.ndjson` — News signals (funding, M&A, launches) from Dec 15
- `job_changes_2024-12-15.parquet` — Job change signals from Dec 15
- `technographics_2024-12-15.parquet` — Tech stack signals from Dec 15
- `news_full_2024-12-01.parquet` — Full news snapshot from Dec 1
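The naming pattern can be parsed mechanically; the regular expression below is an assumption inferred from the examples shown (it handles the optional `_full` snapshot marker), not an official specification:

```python
import re

# Matches e.g. "news_2024-12-15.ndjson" or "job_changes_full_2024-12-01.parquet"
FILENAME_RE = re.compile(
    r"^(?P<insight_type>[a-z_]+?)(?P<full>_full)?"
    r"_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<format>ndjson|json|parquet)$"
)

def parse_signal_filename(name):
    """Split a delivery filename into insight type, snapshot flag, date, format."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized signal filename: {name}")
    return {
        "insight_type": m.group("insight_type"),
        "is_full_snapshot": m.group("full") is not None,
        "date": m.group("date"),
        "format": m.group("format"),
    }
```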
Available insight types:
| Folder Name | Contains |
|---|---|
| `news` | Funding, M&A, product launches, company news |
| `job_changes` | New roles, promotions, departures |
| `hiring_trends` | Open positions, hiring velocity |
| `linkedin_activity` | Posts, engagement, comments |
| `technographics` | Tech stack additions and removals |
| `financials` | Earnings, revenue trends, financial metrics |
| `website_intelligence` | Pricing changes, messaging shifts, web updates |
| `sec_filings` | 10-K, 10-Q, earnings transcripts |
Data Volume Estimates
Typical daily delivery sizes per insight type (varies by coverage):
| Insight Type | JSON (uncompressed) | JSON (gzipped) | Parquet |
|---|---|---|---|
| News (daily) | ~150 MB | ~15 MB | ~10 MB |
| Job Changes (weekly) | ~300 MB | ~30 MB | ~18 MB |
| Hiring Trends (weekly) | ~200 MB | ~20 MB | ~12 MB |
| LinkedIn Activity | ~250 MB | ~25 MB | ~15 MB |
| Technographics | ~100 MB | ~10 MB | ~6 MB |
| Website Intelligence | ~80 MB | ~8 MB | ~5 MB |
| Full snapshot (all) | ~50 GB | ~5 GB | ~3 GB |
Delivery SLA
- Daily signals: Delivered by 6:00 AM UTC
- Weekly signals: Delivered Sunday by 6:00 AM UTC
- Availability: 99.9% uptime SLA
- Support: Issues resolved within 4 business hours
Getting Started
- Select insight types: Choose which signal categories you want to subscribe to (news, job changes, technographics, etc.)
- Choose your format: JSON for simplicity, Parquet for analytics
- Set up delivery: Provide S3 bucket details or request pull access
- Configure filtering: Optionally filter by geography or company size
- Test with sample: We provide sample datasets for each insight type for integration testing
Contact [email protected] to configure your delivery.
