Summary

Our S3 delivery pipeline experienced a 29-day sync outage from February 25 to March 26, 2026. During this period, signal data continued to be produced and delivered to GCS buckets normally, but was not propagated to S3 mirrors. This affected all S3 delivery customers.

All missing data has been fully backfilled as of April 16, 2026. We have also upgraded the sync architecture to prevent recurrence.


What happened

  • Feb 25: The GCS→S3 sync scheduler was inadvertently removed during infrastructure maintenance
  • Feb 25 – Mar 26: S3 mirrors stopped receiving updates. GCS (primary) was unaffected
  • Mar 26: Sync pipeline was restored. New data began flowing to S3 again
  • Apr 16: Full backfill of all 121 missing deliveries completed across 36 buckets (242 files)

Impacted buckets (36)

| Bucket | Missing deliveries |
| --- | --- |
| autobound-10k-v1 | 4 |
| autobound-10q-v1 | 4 |
| autobound-20f-v1 | 1 |
| autobound-20f-v2 | 1 |
| autobound-6k-v1 | 3 |
| autobound-6k-v2 | 1 |
| autobound-8k | 4 |
| autobound-conference-cfp | 9 |
| autobound-earnings-transcripts | 3 |
| autobound-earnings-transcripts-v2 | 1 |
| autobound-federal-contract-award | 8 |
| autobound-financials | 1 |
| autobound-github-v1 | 1 |
| autobound-glassdoor-company-v2 | 1 |
| autobound-hackernews | 12 |
| autobound-hiring-trends | 4 |
| autobound-hiring-velocity-v1 | 4 |
| autobound-linkedin-comments-contact-v1 | 1 |
| autobound-linkedin-post-company-v2 | 1 |
| autobound-linkedin-post-contact-v3 | 2 |
| autobound-news-v2 | 3 |
| autobound-news-v3 | 1 |
| autobound-patents | 1 |
| autobound-podcast-appearance | 17 |
| autobound-product-reviews-v1 | 1 |
| autobound-producthunt | 9 |
| autobound-reddit-company-v2 | 1 |
| autobound-sec-form-d-funding | 12 |
| autobound-seo-traffic | 1 |
| autobound-twitter-company-posts | 1 |
| autobound-twitter-contact-posts | 1 |
| autobound-website-intelligence-v1 | 1 |
| autobound-work-milestones | 3 |
| autobound-work-milestones-v2 | 1 |
| autobound-youtube-company | 1 |
| autobound-youtube-contact | 1 |

What we changed

  1. Full mirror architecture: S3 now mirrors every folder in every signal bucket, not just the current day. Any historical gap is automatically caught and filled on the next sync run.
  2. GCP-native scheduling: The sync trigger has been moved from an external scheduler to GCP Cloud Scheduler, eliminating the single point of failure that caused this outage.
  3. Increased capacity: Cloud Run job memory upgraded from 8GB to 16GB to handle large signal files (e.g., website-intelligence at 13.7GB).
  4. Backfill manifest: A backfill manifest (backfill-2026-04-16.json) has been uploaded to s3://autobound-s3-manifests/syncs/ documenting all recovered deliveries.

Action required

None. All missing data is now available in your S3 buckets in the same folder structure and file naming convention as regular deliveries. No changes to your ingestion pipeline are needed.

If you notice any remaining gaps, please reach out in your Slack Connect channel.
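
If you would like to check for gaps on your side, one option is to compare the delivery folder names in a GCS bucket against those in its S3 mirror. The following is a minimal sketch, assuming you have already listed the top-level folder names from each side with your own tooling. The function name and the sample folder names are hypothetical and only illustrate the comparison.

```python
def missing_in_mirror(gcs_folders, s3_folders):
    """Return folder names present in GCS but absent from the S3 mirror, sorted."""
    # Normalize trailing slashes so "2026-03-03-06-00-00/" and "2026-03-03-06-00-00" match.
    gcs = {name.rstrip("/") for name in gcs_folders}
    s3 = {name.rstrip("/") for name in s3_folders}
    return sorted(gcs - s3)


# Hypothetical listings, for illustration only.
gcs_listing = ["2026-02-24-06-00-00/", "2026-03-03-06-00-00/", "2026-03-10-06-00-00/"]
s3_listing = ["2026-02-24-06-00-00/", "2026-03-10-06-00-00/"]
print(missing_in_mirror(gcs_listing, s3_listing))
```

An empty result means every GCS folder you listed also exists in the S3 mirror.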

Product Reviews Schema Update — v2 Migration

We've updated the Product Reviews signal schema as part of the v2 migration. This update brings structural improvements and a significant increase in data coverage.

Schema Changes

New fields added:

| Field | Type | Description |
| --- | --- | --- |
| batch_id | string | Unique identifier for the delivery batch |
| signal_name | string | Standardized signal name |
| association | string | Entity association type |
| detected_at | datetime | Timestamp when the signal was detected |
| headline | string | Human-readable signal headline |

Fields removed:

| Field | Notes |
| --- | --- |
| insight | Replaced by headline, which provides a cleaner, more actionable summary |
| relevance_score | Deprecated in v2 schema |

Record Count Increase

The v2 migration includes a +20% increase in record count due to expanded data source coverage and improved entity matching.

Migration Notes

  • All v2 fields follow the standardized signal schema documented in Schema Reference
  • The headline field replaces insight with a more concise, actionable format
  • No breaking changes to existing integration patterns — new fields are additive (except for the two removed fields noted above)

We are introducing structured second company data to news signals that involve two parties. This applies to M&A, partnership, investment, integration, litigation, and talent movement events.

This is an upcoming change, targeted for early June 2026. We will communicate the exact rollout date in advance. No action is needed until then.

What is changing

News signals for certain event categories naturally involve two companies — an acquirer and an acquisition target, an investor and the company invested in, two partners, etc. Today, only the primary company is delivered as structured data. The second party is only available in the article text, meaning customers need to parse unstructured content to identify who was acquired, who received funding, or who the new partner was.

With this update, a new related_company object will be added to the signal record for all applicable event categories. It follows the same schema as the existing company field.

New field

| Field | Type | Description |
| --- | --- | --- |
| related_company.name | string | Name of the second company in the event |
| related_company.domain | string | Domain of the second company |

Which signal subtypes will include related_company

| Subtype | company (primary) | related_company (new) |
| --- | --- | --- |
| Acquisition | Acquirer | Company acquired |
| Merger | Company A | Merge partner |
| Sells Assets | Seller | Buyer |
| New Customer | Vendor | The new client |
| Files Lawsuit | Plaintiff | Defendant |
| Invests Into | Investor | Company invested in |
| Integration | Company A | Integration partner |
| Partnership | Company A | Partner |
| Competitor Identified | Company A | The competitor |
| Executive Departure | (person-level) | Company departed from |
| Executive Retirement | (person-level) | Company retired from |

For Executive Departure and Executive Retirement events, the source data only provides the company being left (not the person's new employer). Currently these signals have an empty company object. With this update, the company will be correctly populated.

Expected impact

Based on recent delivery data:

  • ~5,000 records per delivery (roughly 24% of all news signals) will include structured second-company data that is currently missing
  • ~1,000 Executive Departure/Retirement signals that currently have no company data at all will be populated
  • Customers will be able to filter, search, and cross-reference both companies in M&A, partnership, and investment signals without parsing article text

Schema example

{
  "signal_type": "news",
  "signal_subtype": "acquires",
  "company": {
    "name": "Acquirer Corp",
    "domain": "acquirer.com"
  },
  "related_company": {
    "name": "Target Inc",
    "domain": "targetinc.com"
  },
  "data": {
    "signal_name": "Acquisition",
    "summary": "Acquirer Corp has acquired Target Inc for $50M.",
    "category": "acquires"
  }
}

What stays the same

  • Signal types without a second party (e.g., Funding, IPO, Launches, Headcount changes) are unchanged. related_company will not be present on those signals.
  • Existing company field is not affected. Same schema, same data.
  • Delivery schedule and file format are unchanged.

What to update

Once this change goes live, if you ingest news signals, add handling for the optional related_company object. It will be present on the 11 subtypes listed above and absent on all others. No changes are needed for signals that do not include a second company.
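
Handling the optional object can be as small as a null-safe lookup. The following is a minimal sketch, assuming records are parsed from output.jsonl into dictionaries shaped like the schema example above. The helper name company_pair is hypothetical.

```python
import json


def company_pair(record):
    """Return (primary, related) company names; related is None when absent."""
    primary = (record.get("company") or {}).get("name")
    # related_company is optional: it is absent on subtypes without a second party.
    related = (record.get("related_company") or {}).get("name")
    return primary, related


line = json.dumps({
    "signal_type": "news",
    "signal_subtype": "acquires",
    "company": {"name": "Acquirer Corp", "domain": "acquirer.com"},
    "related_company": {"name": "Target Inc", "domain": "targetinc.com"},
})
print(company_pair(json.loads(line)))
print(company_pair({"signal_type": "news", "company": {"name": "Solo Co"}}))
```

Treating a missing key and a null value the same way keeps the reader working both before and after the rollout.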

TL;DR: Folder names stay the same and represent when the file was uploaded (UTC). Each folder contains only signals that are new since the previous folder. We now ship stragglers that were previously lost to processing delays. Every signal already has a unique signal_id, so customers who deduplicate on it will automatically get more data with no code changes.


What's changing

Better coverage, more signals.

Our pipeline now tracks each entity individually to ensure signals aren't lost between delivery cycles. Previously, processing delays could cause signals to fall through the cracks permanently. Now they ship in the next available delivery.

This was a silent data loss problem.

Signals that hit processing delays (LLM retries, late third-party data, API failures) would simply never appear. With this change, those signals land in the next batch. Since signal_id is already globally unique, customers who deduplicate on it benefit automatically.

We considered changing folder naming but decided against it. The current format works, and signal-level date fields (e.g. data.filing_date on SEC filings, data.posted_date on LinkedIn posts) handle time-series filtering better than any folder name could. See individual signal schema pages for the relevant date field per signal type.

How to use deliveries

  1. Pull the latest folder. Everything inside is new since the previous folder.
  2. Deduplicate on signal_id. Globally unique, never repeated across deliveries.
  3. For time-series filtering, use the signal's own date field, not the folder name.
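
Steps 2 and 3 above can be sketched as follows. This is an illustration only: it assumes each line of output.jsonl is one JSON record with a top-level signal_id, and that the signal's date field is an ISO 8601 string such as "2026-03-02". The date format is an assumption, so check the schema page for your signal type. The function names are hypothetical.

```python
import json


def new_signals(jsonl_lines, seen_ids):
    """Yield records whose signal_id has not been seen yet; updates seen_ids in place."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        signal_id = record["signal_id"]
        if signal_id in seen_ids:
            continue  # already ingested from an earlier delivery
        seen_ids.add(signal_id)
        yield record


def in_date_range(record, field, start, end):
    """True when record["data"][field] falls within [start, end].

    Assumes ISO 8601 date strings, which sort correctly as plain text.
    """
    value = (record.get("data") or {}).get(field)
    return value is not None and start <= value[:10] <= end


seen = set()
lines = [
    '{"signal_id": "a1", "data": {"filing_date": "2026-03-02"}}',
    '{"signal_id": "a1", "data": {"filing_date": "2026-03-02"}}',
    '{"signal_id": "b2", "data": {"filing_date": "2026-01-15"}}',
]
fresh = list(new_signals(lines, seen))
recent = [r for r in fresh if in_date_range(r, "filing_date", "2026-03-01", "2026-03-31")]
print(len(fresh), len(recent))
```

In practice the set of seen IDs would be persisted between runs, for example in your warehouse or a key-value store.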

What stays the same

  • Delivery schedule, frequency, and buckets
  • File format (output.jsonl + output.parquet)
  • Authentication and service accounts
  • Signal schema and field names
  • Folder name format (YYYY-MM-DD-HH-MM-SS)
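
Because the folder name format is unchanged, the latest folder can still be picked by parsing the name as a UTC timestamp. The following is a minimal sketch using only the Python standard library. The helper names and sample folder names are hypothetical.

```python
from datetime import datetime, timezone

# Matches the YYYY-MM-DD-HH-MM-SS folder name format.
FOLDER_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_folder(name):
    """Parse a delivery folder name into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(name.strip("/"), FOLDER_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def latest_folder(names):
    """Return the name of the most recently uploaded folder."""
    return max(names, key=parse_folder)


folders = ["2026-03-10-06-00-00/", "2026-03-24-06-30-00/", "2026-03-17-06-00-00/"]
print(latest_folder(folders))
```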

We've standardized schemas across all signal types in the Signal Database, improving field consistency and adding richer company and contact metadata. These changes align with AN-8873 and AN-8861.

What Changed

Schema Standardization (All Signals)

Company objects now include a consistent set of enrichment fields across all signal types:

  • company.employee_count_low / company.employee_count_high — employee count range
  • company.industries — industry classifications
  • company.linkedin_url — LinkedIn company page URL
  • company.description — company description

These fields were previously available on some signals but not others. They are now present across all signal types (nullable when data is unavailable).

Field Renames

Several field names have been standardized for consistency:

| Signal Type | Old Field | New Field |
| --- | --- | --- |
| Earnings Transcripts | company.financial_symbol | company.ticker |
| Twitter/X (Company) | company.company_size_low | company.employee_count_low |
| Twitter/X (Company) | company.company_size_high | company.employee_count_high |
| Work Milestones | contact.title | contact.job_title |
| LinkedIn Comments | contact.full_name | contact.name |
| Product Reviews (G2) | relevance_score (top-level) | data.relevance |
| Product Reviews (G2) | insight.headline (top-level) | data.headline |
| Employee Growth | data.relevance_score | data.relevance |
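
If you keep older files alongside new deliveries, one way to read both with the same code is to move old field names to their new locations at load time. The following is a minimal sketch of that idea, not part of our tooling. The dictionary keys used to label signal types (such as "earnings_transcripts") and the function names are hypothetical; the field paths come from the rename table above.

```python
import copy

_MISSING = object()  # sentinel so that explicit null values are preserved

# Old dotted path -> new dotted path, per signal type.
RENAMES = {
    "earnings_transcripts": {"company.financial_symbol": "company.ticker"},
    "twitter_company": {
        "company.company_size_low": "company.employee_count_low",
        "company.company_size_high": "company.employee_count_high",
    },
    "work_milestones": {"contact.title": "contact.job_title"},
    "linkedin_comments": {"contact.full_name": "contact.name"},
    "product_reviews": {
        "relevance_score": "data.relevance",
        "insight.headline": "data.headline",
    },
    "employee_growth": {"data.relevance_score": "data.relevance"},
}


def _pop_path(record, path):
    """Remove and return the value at a dotted path, or _MISSING if absent."""
    *parents, leaf = path.split(".")
    node = record
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return _MISSING
    return node.pop(leaf, _MISSING)


def _set_path(record, path, value):
    """Set a value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_renames(record, signal_type):
    """Return a copy of record with old field names moved to their new names."""
    result = copy.deepcopy(record)
    for old, new in RENAMES.get(signal_type, {}).items():
        value = _pop_path(result, old)
        if value is not _MISSING:
            _set_path(result, new, value)
    return result


old_record = {"company": {"financial_symbol": "ACME", "name": "Acme"}}
print(apply_renames(old_record, "earnings_transcripts"))
```

Records that already use the new names pass through unchanged, so the same function can be applied to every file regardless of its delivery date.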

Structural Changes (SEC 20-F & 6-K)

For 20-F and 6-K signals, signal_category and metrics have moved from top-level fields into the data object:

  • signal_category → data.signal_category
  • metrics.* → data.metrics.*
  • New field: data.llm_call_category — internal LLM classification category
  • New field: data.fiscal_year_end — fiscal year end period
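
A reader that works with both the old and new 20-F / 6-K layouts can check the data object first and fall back to the top level. The following is a minimal sketch under that assumption. The helper name sec_field is hypothetical.

```python
def sec_field(record, key):
    """Read signal_category or metrics from data, falling back to the top level."""
    data = record.get("data") or {}
    if key in data:
        return data[key]  # new layout: field lives inside data
    return record.get(key)  # old layout: field was top-level


new_style = {"data": {"signal_category": "guidance", "metrics": {"revenue": 1}}}
old_style = {"signal_category": "guidance", "metrics": {"revenue": 1}, "data": {}}
print(sec_field(new_style, "signal_category"), sec_field(old_style, "signal_category"))
```

The sample category and metric values are made up for illustration.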

Reddit Improvements (AN-8861)

Reddit signals now include several new fields:

  • signal_category — high-level category (e.g. buying-intent, product, sentiment, risk, competitive, market, pain-point) derived deterministically from signal_subtype
  • data.post_text — full text of the source Reddit post (required on every record)
  • data.topics_tags — keyword tags extracted from the discussion (now populated; was previously empty)
  • signal_name — human-readable signal label
  • batch_id — processing batch identifier
  • New subtype: brandReputation (category: sentiment)
  • Cleanup of fields that had a 0% fill rate: data.evidence_urls has been removed, and data.topics_tags is retained and now populated

Contact Enrichment (Twitter/X, Work Milestones)

Contact-level signals now include richer contact metadata:

  • contact.first_name, contact.last_name, contact.email — added to Work Milestones and Twitter/X (Contact)
  • contact.linkedin_url — added to Twitter/X (Contact) and YouTube (Contact)

Bucket Discovery

All service accounts can now list bucket names and folder timestamps across the entire project using gsutil ls -p autobound-signal-delivery — including buckets you are not licensed for. This lets you discover what signal types are available before requesting access. Reading file contents still requires objectViewer on the specific bucket.

Delivery Timeline

  • Already live (March 24 delivery): Reddit, SEC 20-F, SEC 6-K, Earnings Transcripts, News, Work Milestones
  • Next delivery cycle (April 2026): Product Reviews, Twitter/X, YouTube, LinkedIn Comments, Employee Growth, Website Intelligence, Patents

Schema Documentation

All signal schema pages have been updated. See the Signal Catalog for links to each signal's schema documentation.

Signals shipping with the new schema in April will include a notice at the top of their doc page until the first delivery lands.

All service accounts can now list signal bucket names in the delivery project.

gsutil ls -p autobound-signal-delivery

This returns the names of all available signal buckets. Object-level access still requires per-bucket permissions (unchanged). Buckets you aren't licensed for will return 403 on read.

This replaces the previous workflow where you had to check our Delivery docs to find bucket URIs.

Scope: Only bucket names are visible. No object contents, no IAM policies, and no other GCP resources.

All SEC filing signals, news, and hiring signals are now on weekly refresh — completing the migration from quarterly/monthly.

| Signal | Previous | Current |
| --- | --- | --- |
| 10-K, 10-Q, 8-K, 20-F, 6-K | Monthly | Weekly |
| Earnings Transcripts | Monthly | Weekly |
| News | Monthly | Weekly |
| Hiring Velocity | Monthly | Weekly |
| Hiring Trends | Monthly | Weekly |

New signal types (Podcast, Form D, HackerNews, ProductHunt, Conference, Federal Contracts) launch on a daily cadence.

See Signal Catalog for current refresh frequencies.

Based on partner feedback, we shipped several improvements to the hiring velocity and trends signals:

  • Time window alignment — Velocity and trends calculations now use consistent lookback periods, resolving discrepancies in company counts between datasets
  • 1:1 department mapping — Each open role maps to a single primary department, eliminating double-counting
  • Full distribution exposed — Removed the top-5 category cap. All departments, locations, seniority levels, and contract types are now returned, sorted by count descending
  • Interpretation guide — Published a Hiring Velocity Interpretation Guide covering how velocity, trends, and breakdown metrics are calculated

These changes are reflected in all deliveries starting March 10, 2026.

We launched 7 new signal types in Q1 2026, bringing the total to 32+ signal categories.

Company Signals

| Signal | Refresh | Description |
| --- | --- | --- |
| SEC Form D Funding | 2× Daily | Pre-announcement funding signals from SEC EDGAR Form D filings. Catches fundraising before press releases. |
| New Business Formations | Daily | Secretary of State filings from all 50 US states. ~10,000+ new registrations per day. |
| Federal Contract Awards | Daily | US government contract awards from USASpending.gov. Tech/services NAICS codes, >$100K threshold. |
| Conference & CFP Events | Daily | Upcoming tech conferences with CFP deadlines, sponsor tiers, and audience matching. |
| HackerNews Signals | Daily | Show HN launches, trending discussions, and company mentions with B2B relevance scoring. |
| ProductHunt Launches | Daily | New product launches with upvotes, maker profiles, and AI/B2B classification. |

Contact Signals

| Signal | Refresh | Description |
| --- | --- | --- |
| Podcast Appearances | Daily | Executive podcast guest appearances with episode topics, key insights, and outreach hooks. |

All new signals follow the standard schema pattern and are available via GCS delivery. Contact [email protected] to add these to your subscription.

We've migrated 15 signal categories to new GCS bucket URIs with corrected, standardized schemas. The new buckets use a -v1, -v2, or -v3 suffix.

What's Changing

| Signal Type | Old URI (Deprecated) | New URI |
| --- | --- | --- |
| SEC 10-K | gs://autobound-10k/ | gs://autobound-10k-v1/ |
| SEC 10-Q | gs://autobound-10q/ | gs://autobound-10q-v1/ |
| SEC 20-F | gs://autobound-20f/ | gs://autobound-20f-v1/ |
| SEC 6-K | gs://autobound-6k/ | gs://autobound-6k-v1/ |
| Employee Growth | gs://autobound-employee-growth/ | gs://autobound-employee-growth-v1/ |
| GitHub | gs://autobound-github/ | gs://autobound-github-v1/ |
| Glassdoor (Company) | gs://autobound-glassdoor-company/ | gs://autobound-glassdoor-company-v2/ |
| Hiring Velocity | gs://autobound-hiring-velocity/ | gs://autobound-hiring-velocity-v1/ |
| LinkedIn Comments (Contact) | gs://autobound-linkedin-comments-contact/ | gs://autobound-linkedin-comments-contact-v1/ |
| LinkedIn Post (Company) | gs://autobound-linkedin-post-company/ | gs://autobound-linkedin-post-company-v2/ |
| LinkedIn Post (Contact) | gs://autobound-linkedin-post-contact/ | gs://autobound-linkedin-post-contact-v3/ |
| News | gs://autobound-news/ | gs://autobound-news-v2/ |
| Product Reviews (G2) | gs://autobound-product-reviews/ | gs://autobound-product-reviews-v1/ |
| Reddit (Company) | gs://autobound-reddit-company/ | gs://autobound-reddit-company-v1/ |
| Website Intelligence | gs://autobound-website-intelligence/ | gs://autobound-website-intelligence-v1/ |

Why We Made This Change

As part of a broader delivery infrastructure cleanup, we've migrated these signal categories to new buckets with corrected and standardized schemas that align with our Signal Schema documentation. This ensures consistent field naming, data types, and structure across all signal categories.

📘 One-Time Migration: This cleanup is a one-time effort to standardize our delivery infrastructure. We do not anticipate additional bucket URI changes going forward. Once you've updated to the new URIs, your integration should remain stable.

Deprecation Timeline

| Date | Action |
| --- | --- |
| January 2026 | New versioned buckets are live and receiving data |
| January 2026 | Old buckets stop receiving new data |
| February 2026 | Old bucket URIs will be deprecated and access removed |

⚠️ Action Required: Update your data pipelines to use the new bucket URIs before February 2026.

Historical Data Note

Due to the new delivery mechanism, historical data in the new buckets may not extend the full 3-6 months initially. If you require historical backfill for specific signal categories, please contact us at [email protected].

Buckets Not Affected

The following buckets remain at their current URIs with no changes:

  • gs://autobound-8k/ — SEC 8-K current reports
  • gs://autobound-company-database/ — Company database
  • gs://autobound-contact-database/ — Contact database
  • gs://autobound-earnings-transcripts/ — Earnings call transcripts
  • gs://autobound-financials/ — Financial data
  • gs://autobound-hiring-trends/ — Hiring trends
  • gs://autobound-intent/ — Intent signals
  • gs://autobound-manifests/ — Data manifests
  • gs://autobound-patents/ — Patent filings
  • gs://autobound-seo-traffic/ — SEO & traffic signals
  • gs://autobound-tech-used/ — Technology stack
  • gs://autobound-twitter-company-posts/ — Twitter/X posts (company-level)
  • gs://autobound-work-milestones/ — Work milestones
  • gs://autobound-x-company/ — Twitter/X posts (company-level)
  • gs://autobound-x-contact/ — Twitter/X posts (contact-level)
  • gs://autobound-youtube-company/ — YouTube activity (company-level)
  • gs://autobound-youtube-contact/ — YouTube activity (contact-level)

Migration Checklist

Use this checklist to ensure a smooth migration:

  • Identify which of the 15 migrated signal types you currently use
  • Update bucket URIs in your data pipeline configuration
  • Test access to new buckets with your service account credentials
  • Verify data schema compatibility with your downstream systems
  • Update any monitoring or alerting that references old bucket names
  • Complete migration before February 2026 deprecation date
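
For the URI update step, a lookup table built from the migration table above may be easier to audit than editing paths by hand. The following is a minimal sketch of that approach. The helper name migrate_uri and the sample object paths are hypothetical.

```python
# Old bucket name -> new versioned bucket name, taken from the migration table.
BUCKET_MIGRATION = {
    "autobound-10k": "autobound-10k-v1",
    "autobound-10q": "autobound-10q-v1",
    "autobound-20f": "autobound-20f-v1",
    "autobound-6k": "autobound-6k-v1",
    "autobound-employee-growth": "autobound-employee-growth-v1",
    "autobound-github": "autobound-github-v1",
    "autobound-glassdoor-company": "autobound-glassdoor-company-v2",
    "autobound-hiring-velocity": "autobound-hiring-velocity-v1",
    "autobound-linkedin-comments-contact": "autobound-linkedin-comments-contact-v1",
    "autobound-linkedin-post-company": "autobound-linkedin-post-company-v2",
    "autobound-linkedin-post-contact": "autobound-linkedin-post-contact-v3",
    "autobound-news": "autobound-news-v2",
    "autobound-product-reviews": "autobound-product-reviews-v1",
    "autobound-reddit-company": "autobound-reddit-company-v1",
    "autobound-website-intelligence": "autobound-website-intelligence-v1",
}


def migrate_uri(uri):
    """Rewrite a gs:// URI that points at a deprecated bucket; leave others untouched."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        return uri
    # Split "bucket/rest/of/path" into the bucket name and everything after it.
    bucket, sep, rest = uri[len(prefix):].partition("/")
    new_bucket = BUCKET_MIGRATION.get(bucket, bucket)
    return f"{prefix}{new_bucket}{sep}{rest}"


print(migrate_uri("gs://autobound-news/2026-01-05-00-00-00/output.jsonl"))
print(migrate_uri("gs://autobound-8k/"))
```

Matching on the exact bucket name, rather than a string prefix, avoids rewriting unaffected buckets or URIs that already carry a version suffix.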

Questions?

If you have questions about this migration or need assistance updating your pipelines, contact us at [email protected].