Improved

March 2026: Schema Standardization Across All Signal Types

We've standardized schemas across all signal types in the Signal Database, improving field consistency and adding richer company and contact metadata. These changes align with AN-8873 and AN-8861.

What Changed

Schema Standardization (All Signals)

Company objects now include a consistent set of enrichment fields across all signal types:

  • company.employee_count_low / company.employee_count_high — employee count range
  • company.industries — industry classifications
  • company.linkedin_url — LinkedIn company page URL
  • company.description — company description

These fields were previously available on some signals but not others. They are now present across all signal types (nullable when data is unavailable).
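Because these fields can be null, consumers should avoid assuming a value is present. The following is a minimal sketch, assuming each signal record is parsed into a Python dict with a company object; the helper names company_enrichment and employee_range_label are hypothetical and not part of any delivered tooling.

```python
from typing import Any, Optional

# Enrichment fields listed in this changelog entry.
COMPANY_ENRICHMENT_FIELDS = (
    "employee_count_low",
    "employee_count_high",
    "industries",
    "linkedin_url",
    "description",
)


def company_enrichment(signal: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Return the enrichment fields, using None for anything null or absent."""
    company = signal.get("company") or {}
    return {name: company.get(name) for name in COMPANY_ENRICHMENT_FIELDS}


def employee_range_label(signal: dict[str, Any]) -> str:
    """Format the employee count range, falling back when either bound is null."""
    fields = company_enrichment(signal)
    low, high = fields["employee_count_low"], fields["employee_count_high"]
    if low is None or high is None:
        return "unknown"
    return f"{low}-{high}"
```

Reading through company_enrichment keeps the null handling in one place, so downstream code treats a null field and a missing field the same way.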

Field Renames

Several field names have been standardized for consistency:

Signal Type          | Old Field                    | New Field
Earnings Transcripts | company.financial_symbol     | company.ticker
Twitter/X (Company)  | company.company_size_low     | company.employee_count_low
Twitter/X (Company)  | company.company_size_high    | company.employee_count_high
Work Milestones      | contact.title                | contact.job_title
LinkedIn Comments    | contact.full_name            | contact.name
Product Reviews (G2) | relevance_score (top-level)  | data.relevance
Product Reviews (G2) | insight.headline (top-level) | data.headline
Employee Growth      | data.relevance_score         | data.relevance
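If you hold records delivered under the old field names, the renames listed above can be applied with a small mapping. This is a sketch under stated assumptions: records are parsed into nested Python dicts, the signal type keys reuse the display names from this changelog (your pipeline may identify signal types differently), and upgrade_record is a hypothetical helper.

```python
from copy import deepcopy
from typing import Any

# (old dotted path, new dotted path) pairs taken from the rename list above.
# The dictionary keys are display names used for illustration only.
RENAMES = {
    "Earnings Transcripts": [("company.financial_symbol", "company.ticker")],
    "Twitter/X (Company)": [
        ("company.company_size_low", "company.employee_count_low"),
        ("company.company_size_high", "company.employee_count_high"),
    ],
    "Work Milestones": [("contact.title", "contact.job_title")],
    "LinkedIn Comments": [("contact.full_name", "contact.name")],
    "Product Reviews (G2)": [
        ("relevance_score", "data.relevance"),
        ("insight.headline", "data.headline"),
    ],
    "Employee Growth": [("data.relevance_score", "data.relevance")],
}

_MISSING = object()  # sentinel so that a stored None is still moved


def _pop_path(record: dict[str, Any], path: str) -> Any:
    """Remove and return the value at a dotted path, or _MISSING if absent."""
    *parents, leaf = path.split(".")
    node: Any = record
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return _MISSING
    return node.pop(leaf, _MISSING)


def _set_path(record: dict[str, Any], path: str, value: Any) -> None:
    """Set the value at a dotted path, creating parent objects as needed."""
    *parents, leaf = path.split(".")
    node = record
    for key in parents:
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[leaf] = value


def upgrade_record(signal_type: str, record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record with old field names moved to the new ones."""
    upgraded = deepcopy(record)
    for old_path, new_path in RENAMES.get(signal_type, []):
        value = _pop_path(upgraded, old_path)
        if value is not _MISSING:
            _set_path(upgraded, new_path, value)
    return upgraded
```

The function copies the input rather than mutating it, and it leaves records that already use the new names unchanged. A parent object emptied by a move (for example insight) is left in place as an empty object.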

Structural Changes (SEC 20-F & 6-K)

For 20-F and 6-K signals, signal_category and metrics have moved from top-level fields into the data object:

  • signal_category → data.signal_category
  • metrics.* → data.metrics.*
  • New field: data.llm_call_category — internal LLM classification category
  • New field: data.fiscal_year_end — fiscal year end period
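During a transition period you may need to read both older 20-F and 6-K files and files in the new layout. The sketch below assumes records are parsed into Python dicts; filing_fields is a hypothetical helper that prefers the new location under data and falls back to the old top-level fields.

```python
from typing import Any, Optional


def filing_fields(record: dict[str, Any]) -> tuple[Optional[str], dict[str, Any]]:
    """Return (signal_category, metrics) from either the new or the old layout."""
    data = record.get("data") or {}
    # New layout: data.signal_category. Old layout: top-level signal_category.
    category = data.get("signal_category", record.get("signal_category"))
    # New layout: data.metrics.*. Old layout: top-level metrics.*.
    metrics = data.get("metrics", record.get("metrics")) or {}
    return category, metrics
```

Once all stored files use the new layout, the fallback to top-level fields can be dropped.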

Reddit Improvements (AN-8861)

Reddit signals now include several new fields:

  • signal_category — high-level category (e.g. buying-intent, product, sentiment, risk, competitive, market, pain-point) derived deterministically from signal_subtype
  • data.post_text — full text of the source Reddit post (required on every record)
  • data.topics_tags — keyword tags extracted from the discussion (now populated; was previously empty)
  • signal_name — human-readable signal label
  • batch_id — processing batch identifier
  • New subtype: brandReputation (category: sentiment)
  • Removed: data.evidence_urls, which had a 0% fill rate. data.topics_tags also had a 0% fill rate, but it has been kept and is now populated.
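The Reddit changes above can be turned into a simple sanity check on incoming records. This is a sketch only, assuming records are parsed into Python dicts, that signal_category, signal_name, batch_id and signal_subtype are top-level keys as listed, and that an empty post_text string counts as missing (the last point is an assumption, not something this changelog states). reddit_record_problems is a hypothetical helper.

```python
from typing import Any


def reddit_record_problems(record: dict[str, Any]) -> list[str]:
    """Return a list of mismatches against the Reddit changes in this entry."""
    problems: list[str] = []
    data = record.get("data") or {}

    # data.post_text is required on every record.
    if not data.get("post_text"):
        problems.append("data.post_text is missing or empty")

    # data.evidence_urls was removed.
    if "evidence_urls" in data:
        problems.append("data.evidence_urls should no longer be present")

    # New top-level fields.
    for field in ("signal_category", "signal_name", "batch_id"):
        if field not in record:
            problems.append(f"{field} is missing")

    # The only subtype-to-category pairing stated in this entry.
    if (
        record.get("signal_subtype") == "brandReputation"
        and record.get("signal_category") not in (None, "sentiment")
    ):
        problems.append("brandReputation should map to category sentiment")

    return problems
```

An empty list means the record is consistent with the changes described here; it is not a full schema validation.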

Contact Enrichment (Twitter/X, Work Milestones, YouTube)

Contact-level signals now include richer contact metadata:

  • contact.first_name, contact.last_name, contact.email — added to Work Milestones and Twitter/X (Contact)
  • contact.linkedin_url — added to Twitter/X (Contact) and YouTube (Contact)

Bucket Discovery

All service accounts can now list bucket names and folder timestamps across the entire project using gsutil ls -p autobound-signal-delivery — including buckets you are not licensed for. This lets you discover what signal types are available before requesting access. Reading file contents still requires objectViewer on the specific bucket.

Delivery Timeline

  • Already live (March 24 delivery): Reddit, SEC 20-F, SEC 6-K, Earnings Transcripts, News, Work Milestones
  • Next delivery cycle (April 2026): Product Reviews, Twitter/X, YouTube, LinkedIn Comments, Employee Growth, Website Intelligence, Patents

Schema Documentation

All signal schema pages have been updated. See the Signal Catalog for links to each signal's schema documentation.

Signals shipping with the new schema in April will include a notice at the top of their doc page until the first delivery lands.