GitHub Usage Tips

How to activate GitHub signals for your end users.

GitHub signals surface engineering investment patterns from public repository activity. Rather than static technographic data ("uses Python"), these signals show active development direction — what companies are building, which technologies they're adopting, and how fast those projects are growing.

The data is delivered as structured JSON in weekly flat files.
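
A minimal sketch of reading one weekly file follows. The source only states that delivery is structured JSON in flat files, so the JSON Lines layout (one signal object per line) and the file name in the usage comment are assumptions; adjust the parser if your files arrive as a single JSON array.

```python
import json
from typing import Iterable, Iterator


def iter_signals(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one signal dict per non-empty line.

    ASSUMPTION: the flat file is JSON Lines (one JSON object per line).
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical usage; the file name is illustrative only:
# with open("github_signals_weekly.jsonl", encoding="utf-8") as fh:
#     for signal in iter_signals(fh):
#         print(signal["signal_id"], signal["data"]["summary"])
```

Streaming line by line keeps memory flat even when a weekly file is large.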

Common Use Cases

  • Technographic enrichment — Add real-time tech stack data to a B2B database, showing not just what companies use but where they're actively investing
  • Product intelligence — Understand what products a company is building based on their public repositories and README content
  • Buyer persona enrichment — Infer technical buyer personas from the technologies and frameworks a company is adopting
  • Integration/partnership mapping — Identify which platforms and tools a company integrates with based on their SDK and plugin development

The example below walks through a common activation pattern with screenshots.


Example: Vector Database Prospecting

This example shows how a vector database company (Pinecone, Weaviate, etc.) would use GitHub signals to find qualified prospects.

The Filter Setup

  • Signal Type: AI/ML Investment
  • Confidence: High
  • Technologies: RAG

This returns companies actively building RAG (retrieval-augmented generation) pipelines — Notion, Linear, Intercom, Shopify, Retool, Zapier — with live GitHub repos proving real investment.
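
The same filter can be sketched in code against the signal fields shown in the sample below. The function name and the case-insensitive technology match are assumptions made for illustration, not part of the delivered data or any official client.

```python
def matches_filter(signal, subtype="githubAIMLInvestment",
                   confidence="high", technology="RAG"):
    """Return True if a signal passes the three filters from this example.

    ASSUMPTION: technologies are compared case-insensitively.
    """
    data = signal.get("data", {})
    mentioned = {t.lower() for t in data.get("technologies_mentioned", [])}
    return (
        signal.get("signal_subtype") == subtype
        and data.get("confidence") == confidence
        and technology.lower() in mentioned
    )


# Hypothetical usage over an iterable of signal dicts:
# prospects = [s["company"]["name"] for s in signals if matches_filter(s)]
```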

Sample Signal

{
  "signal_id": "b8c3d912-5f6e-4c7b-8d9e-2a3b4c5d6e7f",
  "signal_type": "github-initiative",
  "signal_subtype": "githubAIMLInvestment",
  "association": "company",
  "company": {
    "name": "Intercom",
    "domain": "intercom.com",
    "linkedin_url": "https://linkedin.com/company/intercom",
    "industries": ["Software", "Customer Support", "SaaS"]
  },
  "data": {
    "summary": "Building Fin AI agent with massive knowledge base RAG",
    "detail": "Intercom's Fin repositories show heavy investment in retrieval-augmented generation for their AI support agent. LangChain integration with custom embeddings pipeline.",
    "relevance": 0.87,
    "confidence": "high",
    "sentiment": "positive",
    "technologies_mentioned": ["Python", "LangChain", "RAG", "Embeddings", "OpenAI"],
    "referenced_repos": ["fin-ai-agent", "knowledge-embeddings"],
    "portfolio_metrics": {
      "repository_count": 24,
      "growth": {
        "stars_pct": { "30d": 0.71, "60d": 0.89, "180d": 1.45 },
        "forks_pct": { "30d": 0.52, "60d": 0.68, "180d": 1.12 }
      }
    },
    "top_repositories": [
      {
        "name": "fin-ai-agent",
        "full_name": "intercom/fin-ai-agent",
        "url": "https://github.com/intercom/fin-ai-agent",
        "description": "RAG-powered AI agent for customer support",
        "current": { "stars": 890, "forks": 124 },
        "growth_pct": { "stars": { "30d": 0.71 }, "forks": { "30d": 0.52 } }
      }
    ]
  },
  "detected_at": "2026-01-24T10:15:00Z",
  "batch_id": "gh-20260124-def456"
}

What Makes This Useful

The technologies_mentioned field provides stack-level specificity. The combination of RAG, LangChain, and Embeddings indicates that the company needs vector storage infrastructure, not just that it is "doing AI."

The growth_pct.stars.30d value on the top repository (0.71, or 71% star growth over 30 days in this case) shows the project has momentum, distinguishing active investment from abandoned experiments.
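
As a rough illustration, both pieces of information can be pulled out of a signal shaped like the sample above. The helper name is hypothetical; the field paths follow the sample JSON.

```python
def repo_momentum(signal):
    """Return (full_name, current stars, 30-day star growth) per top repository.

    Growth is None when the signal carries no 30-day star figure.
    """
    rows = []
    for repo in signal.get("data", {}).get("top_repositories", []):
        growth = repo.get("growth_pct", {}).get("stars", {}).get("30d")
        stars = repo.get("current", {}).get("stars")
        rows.append((repo.get("full_name"), stars, growth))
    return rows


# Hypothetical usage:
# for name, stars, growth in repo_momentum(signal):
#     label = f"{growth:.0%}" if growth is not None else "n/a"
#     print(name, stars, label)
```

Growth values are fractions, so 0.71 should be displayed as 71%.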


Key Fields

| Field | Description |
| --- | --- |
| data.summary | Signal headline, suitable for display |
| data.technologies_mentioned | Specific technologies: RAG, LangChain, Kubernetes, etc. |
| data.confidence | Signal quality (high, medium, low) |
| data.top_repositories[].current.stars | Project traction (current star count) |
| data.top_repositories[].growth_pct.stars.30d | 30-day star growth rate (0.71 = 71%) |
Filter Dimensions

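| Filter | Field | Values |
| --- | --- | --- |
| Signal type | signal_subtype | AI/ML Investment (githubAIMLInvestment), Infrastructure (githubInfraInvestment), Platform Ecosystem (githubPlatformEcosystem) |
| Confidence | data.confidence | high, medium, low |
| Technologies | data.technologies_mentioned | RAG, LangChain, Kubernetes, Go, TypeScript, etc. |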

Broader Coverage Patterns

The example above uses narrow filters (AI/ML Investment + RAG technology) which surface highly qualified but lower-volume signals. For broader coverage, consider these patterns:

By Signal Type

| Pattern | Filter | Coverage | Best For |
| --- | --- | --- | --- |
| All AI/ML activity | githubAIMLInvestment (no tech filter) | High | AI infrastructure, MLOps |
| Infrastructure buildout | githubInfraInvestment | High | DevOps, cloud tooling |
| Fast-growing projects | githubRapidGrowth | Medium | Trend spotting, early adopters |
| Platform builders | githubPlatformEcosystem | Medium | Developer tools, integrations |
| Major OSS presence | githubMajorOSSPlayer | Low | Enterprise deals, partnerships |
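
To see how much volume each subtype actually contributes in a given weekly file, a quick tally can help before choosing a pattern. This is a sketch; the function name is hypothetical, and the "unknown" bucket for signals without a subtype is an assumption.

```python
from collections import Counter


def count_by_subtype(signals):
    """Count signals per signal_subtype value."""
    return Counter(s.get("signal_subtype", "unknown") for s in signals)


# Hypothetical usage:
# for subtype, n in count_by_subtype(signals).most_common():
#     print(subtype, n)
```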

By Field Combinations

Technology-based targeting — Use data.technologies_mentioned to filter by stack:

  • Python, LangChain, OpenAI — AI/ML ecosystem
  • Kubernetes, Terraform, Docker — Infrastructure
  • TypeScript, React, Next.js — Frontend/fullstack
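
The stack groupings above can be expressed as a small lookup. This is a sketch only: the grouping names and members come from the list above, while the function name and the case-insensitive comparison are assumptions.

```python
STACKS = {
    "AI/ML ecosystem": ["Python", "LangChain", "OpenAI"],
    "Infrastructure": ["Kubernetes", "Terraform", "Docker"],
    "Frontend/fullstack": ["TypeScript", "React", "Next.js"],
}


def stack_overlap(signal):
    """Map each stack name to the technologies from it that the signal mentions."""
    data = signal.get("data", {})
    mentioned = {t.lower() for t in data.get("technologies_mentioned", [])}
    result = {}
    for name, techs in STACKS.items():
        hits = sorted(t for t in techs if t.lower() in mentioned)
        if hits:
            result[name] = hits
    return result
```

Returning the matched technologies, rather than a bare yes/no, lets a UI require more than one hit before labelling a company, which matters for broadly used languages such as Python.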

Growth-based targeting. Use data.portfolio_metrics.growth.stars_pct.30d to find momentum:

  • > 0.5 (50%+ growth) — Rapid adoption, likely funded/prioritized
  • > 0.2 (20%+ growth) — Active investment
  • Any positive — At least not abandoned
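
These thresholds translate directly into a tiering function. In the sketch below the tier labels and the function name are hypothetical; the cut-offs are the ones listed above, applied to the portfolio-level 30-day star growth.

```python
def growth_tier(signal):
    """Bucket a signal by portfolio 30-day star growth (a fraction, 0.5 = 50%)."""
    growth = (
        signal.get("data", {})
        .get("portfolio_metrics", {})
        .get("growth", {})
        .get("stars_pct", {})
        .get("30d")
    )
    if growth is None:
        return "unknown"            # no growth figure on the signal
    if growth > 0.5:
        return "rapid"              # 50%+ growth
    if growth > 0.2:
        return "active"             # 20%+ growth
    if growth > 0:
        return "positive"           # any positive growth
    return "flat_or_declining"
```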

Repository-based targeting. Use data.referenced_repos or data.top_repositories to find specific project types:

  • SDK/plugin repos indicate platform plays
  • Infrastructure repos (terraform-*, k8s-*) indicate ops investment
  • AI repos (llm-*, embeddings-*, rag-*) indicate AI investment
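
Repository names can be matched against these naming patterns with simple globbing. The sketch below treats the prefixes as patterns of the form terraform-*, k8s-*, llm-*, embeddings-*, and rag-*; the "*sdk*" and "*plugin*" patterns and the category labels are assumptions, since the text does not specify how SDK/plugin repos are named.

```python
from fnmatch import fnmatchcase

REPO_PATTERNS = {
    "platform": ["*sdk*", "*plugin*"],            # ASSUMPTION: naming heuristic
    "infrastructure": ["terraform-*", "k8s-*"],
    "ai": ["llm-*", "embeddings-*", "rag-*"],
}


def classify_repo(name):
    """Return the categories whose glob patterns match a repository name."""
    lowered = name.lower()
    return [
        label
        for label, patterns in REPO_PATTERNS.items()
        if any(fnmatchcase(lowered, p) for p in patterns)
    ]


# Hypothetical usage:
# for repo in signal["data"]["referenced_repos"]:
#     print(repo, classify_repo(repo))
```

Prefix patterns are strict: a name like knowledge-embeddings from the sample signal matches none of them, so name matching is best combined with data.technologies_mentioned rather than used alone.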

Coverage vs. Precision Tradeoffs

| Approach | Volume | Precision | Use When |
| --- | --- | --- | --- |
| Narrow (subtype + specific tech + high growth) | Low | High | Account-based targeting |
| Medium (subtype + confidence = high) | Medium | Medium | Default for most UIs |
| Broad (any subtype, growth > 0) | High | Lower | Technographic enrichment, market research |
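
The three approaches can be sketched as predicates over a signal. The function names are hypothetical, and the 0.5 default for "high growth" is an assumption borrowed from the growth-based thresholds earlier in this guide.

```python
def _growth_30d(signal):
    """Portfolio 30-day star growth as a fraction, or None if absent."""
    return (
        signal.get("data", {})
        .get("portfolio_metrics", {})
        .get("growth", {})
        .get("stars_pct", {})
        .get("30d")
    )


def narrow(signal, subtype, technology, min_growth=0.5):
    """Subtype + specific technology + high growth (threshold is an assumption)."""
    techs = signal.get("data", {}).get("technologies_mentioned", [])
    growth = _growth_30d(signal)
    return (
        signal.get("signal_subtype") == subtype
        and technology in techs
        and growth is not None
        and growth > min_growth
    )


def medium(signal, subtype):
    """Subtype + high confidence."""
    return (
        signal.get("signal_subtype") == subtype
        and signal.get("data", {}).get("confidence") == "high"
    )


def broad(signal):
    """Any subtype with positive growth."""
    growth = _growth_30d(signal)
    return growth is not None and growth > 0
```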

Questions?

Contact [email protected] for integration support.