KeyTalk

Certificate Intelligence Platform

Data Engineering AI Sales Intelligence Analytics
50+ CT log servers: the complete Certificate Transparency dataset, refreshed on hourly delta cycles
Milliseconds: query performance across tens of millions of certificate records, using ClickHouse columnar storage and materialised views
3 tiers: enrichment cost scales with prospect actionability, not raw dataset size

A public dataset with a commercial signal, and a tight window to act on it.

Certificate Transparency (CT) logs are public, append-only records of publicly trusted TLS certificates, spread across 50+ log servers. KeyTalk identified a commercial signal in that data: companies that renew certificates manually, year after year, with no automation signature, are warm prospects for certificate lifecycle management (CLM). The challenge was to build infrastructure that ingests the complete CT dataset continuously and at scale, enriches it with company intelligence, and surfaces the results in a sales interface before competitors found the same insight.

A proprietary sales intelligence platform, continuously updated.

The pipeline runs from raw CT log data to an enriched prospect dashboard. It ingests entries from 50+ active CT log servers on hourly delta cycles, extracts and deduplicates apex domains, applies multi-stage business indicator filtering, enriches high-priority domains with WHOIS registrant data, and surfaces them in an expiry-sorted sales portal with CSV export for CRM import.
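
As a rough illustration of the "extract and deduplicate apex domains" step, here is a minimal Python sketch. It is not the production code: the function names are hypothetical, and the three-entry suffix set is only a stand-in for the full Public Suffix List that a real pipeline would need.

```python
from typing import Iterable, List, Optional

# Assumption: a tiny stand-in for the Public Suffix List, for illustration only.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def apex_domain(hostname: str) -> Optional[str]:
    """Reduce a certificate hostname (CN or SAN entry) to its apex domain."""
    name = hostname.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # drop a wildcard prefix such as "*.example.com"
    labels = name.split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return None  # single-label or malformed names have no apex domain
    # Keep three labels when the last two form a known multi-label suffix.
    if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:]) if len(labels) >= 3 else None
    return ".".join(labels[-2:])


def dedupe_apex(hostnames: Iterable[str]) -> List[str]:
    """Return unique apex domains in first-seen order."""
    seen = {}
    for host in hostnames:
        apex = apex_domain(host)
        if apex is not None:
            seen.setdefault(apex, None)  # dict preserves insertion order
    return list(seen)
```

Under these assumptions, `www.example.com`, `api.example.com`, and `*.example.org` collapse to two prospects: `example.com` and `example.org`.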

The platform offers five user-facing experiences: a full intelligence dashboard, territory-scoped partner views, a domain health view for existing customers, an admin pipeline health console, and a public domain lookup with CLM conversion entry points.

Three decisions that shaped the architecture.

ClickHouse as the analytical core
The core query pattern (for example, show all companies in Germany with certificates expiring in the next 60 days) runs across tens of millions of rows. ClickHouse's columnar storage and a MergeTree sort key on (expiry_date, apex_domain) answer these queries in milliseconds. Materialised views pre-aggregate the most common sales query patterns.
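
To make the sort-key choice concrete, the sketch below shows what such a table and the "expiring in the next N days" query might look like, held as SQL strings in Python. Only the MergeTree engine and the (expiry_date, apex_domain) sort key come from the description above. The table name, the `country` and `issuer` columns, and the helper function are assumptions for illustration.

```python
import re

# Hypothetical schema: only the engine and the ORDER BY key are from the text.
CREATE_TABLE_SQL = """
CREATE TABLE certificates
(
    apex_domain String,
    country     LowCardinality(String),
    issuer      String,
    expiry_date Date
)
ENGINE = MergeTree
ORDER BY (expiry_date, apex_domain)
"""


def expiring_query(country: str, days: int) -> str:
    """Build the 'expiring within N days in a country' query as SQL text."""
    # Validate inputs strictly because they are interpolated into the SQL.
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError("country must be a two-letter uppercase code")
    if isinstance(days, bool) or not isinstance(days, int) or not 1 <= days <= 3650:
        raise ValueError("days must be an integer between 1 and 3650")
    return (
        "SELECT apex_domain, min(expiry_date) AS next_expiry\n"
        "FROM certificates\n"
        f"WHERE expiry_date BETWEEN today() AND addDays(today(), {days})\n"
        f"  AND country = '{country}'\n"
        "GROUP BY apex_domain\n"
        "ORDER BY next_expiry"
    )
```

Because `expiry_date` leads the sort key, a range predicate on it lets ClickHouse skip data outside the expiry window rather than scanning the full table, which is the behaviour the design relies on.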
Tiered WHOIS enrichment
Enriching every domain in the CT dataset on every cycle would cost far more than enriching selectively, so enrichment runs in three tiers: domains with certificates expiring within 90 days are enriched on every delta cycle; those expiring in 90 days to 12 months are enriched nightly; everything else is enriched on demand. This keeps WHOIS API costs proportional to actionable prospect volume, not raw dataset size.
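
The tier rule can be sketched as a small classification function. This is an illustrative version, not the shipped logic: it assumes "12 months" means 365 days, treats the 90-day boundary as inclusive, and sends already-expired certificates to the on-demand tier. The names are hypothetical.

```python
from datetime import date
from enum import Enum


class Tier(Enum):
    EVERY_DELTA = "every delta cycle"
    NIGHTLY = "nightly"
    ON_DEMAND = "on demand"


def enrichment_tier(expiry: date, today: date) -> Tier:
    """Pick a WHOIS enrichment schedule from the days left until expiry."""
    days_left = (expiry - today).days
    if 0 <= days_left <= 90:
        return Tier.EVERY_DELTA  # most actionable: refresh on every delta cycle
    if 90 < days_left <= 365:  # assumption: 12 months treated as 365 days
        return Tier.NIGHTLY
    # Assumption: expired or far-future certificates are enriched only on demand.
    return Tier.ON_DEMAND
```

For example, with `today = date(2024, 1, 1)`, a certificate expiring on `date(2024, 2, 1)` lands in `Tier.EVERY_DELTA`, while one expiring more than a year out is `Tier.ON_DEMAND`.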
React component library, not a standalone app
Rather than a standalone application that would need rebuilding when KeyTalk's product portal was ready, we built the intelligence views as a portable, themeable React component library. Shipped standalone for Phase 1, importable into KeyTalk's own product when ready. No rebuild required. Phase 2 schema is designed into the v1 API contract as additive fields requiring no breaking changes.
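
One way to read "additive fields requiring no breaking changes" is sketched below in Python. The real contract is not shown in this write-up, so every field name here is hypothetical. The idea is that later fields are optional with defaults, and parsers ignore keys they do not know.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Prospect:
    # v1 fields (hypothetical names).
    apex_domain: str
    expiry_date: str  # ISO 8601 date string
    # Hypothetical Phase 2 additions: optional, so v1 payloads still parse.
    renewal_pattern: Optional[str] = None
    automation_score: Optional[float] = None


def parse_prospect(payload: Mapping[str, Any]) -> Prospect:
    """Build a Prospect, ignoring keys this client version does not know."""
    known = {f.name for f in fields(Prospect)}
    return Prospect(**{k: v for k, v in payload.items() if k in known})
```

A v1 payload containing only `apex_domain` and `expiry_date` parses with the Phase 2 fields left as `None`, and a payload carrying extra unknown keys parses without error.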

Production-ready platform. Complete ownership from day one.

The platform shipped with complete infrastructure-as-code (IaC), runbooks, and architecture documentation. KeyTalk owns the environment, codebase, and data outright, with no dependency on s-team to operate it.

Turning a public dataset into a competitive advantage?

We built a sales intelligence platform on Certificate Transparency data at scale. Tell us your data problem.
