SemSearch eliminates manual reverse engineering. Function-level semantic search detects exact code reuse across obfuscated variants. Auto YARA generation, campaign attribution, 14-type IOC extraction — all self-hosted.
The security industry built detection around artifacts that are trivially changed. Threat actors know this.
Recompile the same source with a different compiler flag and the file hash changes completely. Logic, behavior, and threat stay identical.
Packers, encryption, and code mutation break static signatures. Threat actors cycle packing configurations to invalidate existing detections.
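To make the hash point concrete, here is a minimal Python sketch. The two "builds" are hypothetical stand-in byte strings that differ by a single bit, not real compiler output; the point is only that a tiny change in the file produces an unrelated SHA-256 digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical stand-ins for two builds of the same source: 1 KiB of
# placeholder bytes, identical except for a single flipped bit.
build_a = b"MZ" + bytes(1022)
patched = bytearray(build_a)
patched[512] ^= 0x01
build_b = bytes(patched)

digest_a = sha256_hex(build_a)
digest_b = sha256_hex(build_b)
changed = differing_hex_digits(digest_a, digest_b)

print(digest_a)
print(digest_b)
print(f"{changed} of 64 hex digits differ")
```

A hash-based blocklist built from the first digest would not recognize the second file, even though almost every byte is shared.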
A skilled analyst triages tens of samples per day. Modern threat operations produce thousands. The gap is structural.
Malware authors reuse loaders, C2 code, crypters. Without function-level comparison, this reuse is invisible.
Different analysts and tools disagree on labels. ML classifiers provide reproducible, confidence-scored classifications.
Connecting related samples across campaigns takes hours of manual effort most teams can't afford on every incident.
Knowledge from RE sessions rarely persists in queryable form. SemSearch stores and indexes everything automatically.
Every sample runs through a deterministic multi-stage pipeline. No manual steps. Results in under 7 minutes.
Upload any PE binary via REST API or dashboard. SemSearch validates the format, deduplicates against the corpus, queries 5 threat intel feeds, and assigns analysis priority automatically.
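The validation and deduplication steps of ingestion might look roughly like the Python sketch below. It is an illustration under stated assumptions, not SemSearch's implementation: `looks_like_pe`, `Corpus`, and the status strings are hypothetical names, the format check only inspects the DOS and PE signatures, and threat intel lookups and prioritization are left out.

```python
import hashlib
import struct


def looks_like_pe(data: bytes) -> bool:
    """Cheap PE check: 'MZ' DOS magic, then 'PE\\0\\0' at the e_lfanew offset."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew is a little-endian 32-bit offset stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


class Corpus:
    """Hypothetical in-memory corpus keyed by SHA-256 (an assumption)."""

    def __init__(self) -> None:
        self._seen = set()

    def ingest(self, data: bytes) -> str:
        if not looks_like_pe(data):
            return "rejected"
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return "duplicate"
        self._seen.add(digest)
        return "queued"


def make_minimal_pe_stub() -> bytes:
    """Build a 68-byte stub that passes the signature check (not a runnable PE)."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    return bytes(header) + b"PE\x00\x00"


corpus = Corpus()
stub = make_minimal_pe_stub()
first = corpus.ingest(stub)
second = corpus.ingest(stub)
third = corpus.ingest(b"plain text, not a binary")
print(first, second, third)
```

Deduplicating on the file hash before disassembly keeps repeat uploads from consuming analysis time; a production check would parse far more of the PE headers than this.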
The binary is disassembled and decomposed into individual functions. Each function gets a call graph, dependency tree, and decompiled representation ready for AI analysis.
Two AI models analyze every function in parallel. The structural model captures code shape and patterns. The semantic model understands code meaning. Together they catch obfuscated reuse, variant families, and cross-compiler matches.
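One way to picture how two models complement each other is a blended similarity score. The sketch below is purely illustrative: the page does not say how the structural and semantic results are combined, so the cosine measure, the weighted average, the `hybrid_score` name, and the toy two-dimensional embeddings are all assumptions.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(query: dict, candidate: dict, structural_weight: float = 0.5) -> float:
    """Weighted blend of structural and semantic similarity (assumed scheme)."""
    structural = cosine(query["structural"], candidate["structural"])
    semantic = cosine(query["semantic"], candidate["semantic"])
    return structural_weight * structural + (1 - structural_weight) * semantic


# Toy embeddings: the obfuscated variant changes shape but keeps its meaning.
query = {"structural": [1.0, 0.0], "semantic": [0.0, 1.0]}
obfuscated_variant = {"structural": [0.0, 1.0], "semantic": [0.0, 1.0]}
unrelated = {"structural": [0.0, 1.0], "semantic": [1.0, 0.0]}

variant_score = hybrid_score(query, obfuscated_variant)
unrelated_score = hybrid_score(query, unrelated)
print(variant_score, unrelated_score)
```

In this toy case a structure-only comparison would score the obfuscated variant the same as unrelated code, while the blended score still separates the two.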
Results are aggregated into structured intelligence: composite threat score, ML family classification with confidence, 14-type IOC extraction, auto-generated YARA rules, and function-level similar sample matches.
A complete analyst-ready report is generated with HTML and PDF export. Critical threats trigger real-time WebSocket alerts with configurable rules for threat score thresholds, family matches, and novel malware detection.
Every capability listed here is implemented and live. No roadmap items. No vaporware.
Decomposes every binary into individual functions. Finds code reuse across samples even when files are repackaged or obfuscated. Three search strategies: structural, semantic, hybrid.
Structural similarity model + deep semantic code model. Together they catch obfuscated code, code reuse across compilers, and variant families.
Every critical function gets a natural-language summary, capability tags, and individual risk score. Identifies the exact function responsible for malicious behavior.
ML classifiers trained on hundreds of features identify 12 malware families. Live A/B model comparison improves accuracy continuously.
Auto-pulls IPs, domains, URLs, registry keys, mutex names, file paths, hashes, crypto wallet addresses. Smart exclusions filter benign noise.
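As a rough illustration of pattern-based IOC extraction with an exclusion list, consider the Python sketch below. It covers only three of the indicator types, and the regular expressions, the `BENIGN` set, and the `extract_iocs` helper are hypothetical simplifications rather than SemSearch's actual rules.

```python
import re

# Simplified patterns for three indicator types (assumptions, not product rules).
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
    "registry_key": re.compile(r"\bHK(?:LM|CU|CR|U|CC)\\[^\s\"']+"),
}

# Hypothetical exclusion list for values that are almost always benign noise.
BENIGN = {"127.0.0.1", "0.0.0.0"}


def extract_iocs(text: str) -> dict:
    """Return sorted, de-duplicated indicators per type, minus exclusions."""
    found = {}
    for kind, pattern in IOC_PATTERNS.items():
        hits = sorted({m.group(0) for m in pattern.finditer(text)} - BENIGN)
        if hits:
            found[kind] = hits
    return found


# Stand-in for strings recovered from a sample (documentation-range addresses).
sample_strings = (
    "connect 203.0.113.7 fallback 127.0.0.1 "
    "beacon http://evil.example/gate.php "
    r"persist HKCU\Software\Microsoft\Windows\CurrentVersion\Run"
)

iocs = extract_iocs(sample_strings)
for kind, values in iocs.items():
    print(kind, values)
```

The loopback address is matched by the IPv4 pattern and then dropped by the exclusion set, which is the kind of filtering that keeps benign values out of the report.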
Auto-generates YARA rules from samples and function clusters. Full CRUD with enable/disable toggles. Export all rules as a single file.
WebSocket push the moment a critical threat lands. Three rule types: threat score threshold, malware family, and novel malware detection.
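The three rule types could be evaluated along the lines of the Python sketch below. The field names, default thresholds, and in particular the definition of "novel" (no family match and low similarity to everything in the corpus) are assumptions made for illustration; WebSocket delivery is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnalysisResult:
    sample_id: str
    threat_score: float       # composite score in [0, 1]
    family: Optional[str]     # None when no known family matched
    best_match: float         # highest similarity to any sample in the corpus


def triggered_rules(
    result: AnalysisResult,
    score_threshold: float = 0.9,
    watched_families: frozenset = frozenset({"CobaltStrike"}),
    novelty_ceiling: float = 0.3,
) -> List[str]:
    """Return the names of the alert rules this result would fire."""
    fired = []
    if result.threat_score >= score_threshold:
        fired.append("threat_score_threshold")
    if result.family in watched_families:
        fired.append("malware_family")
    # Assumed novelty test: unclassified and unlike anything seen before.
    if result.family is None and result.best_match < novelty_ceiling:
        fired.append("novel_malware")
    return fired


known = AnalysisResult("sample-a", 0.94, "CobaltStrike", 0.942)
unknown = AnalysisResult("sample-b", 0.55, None, 0.12)
quiet = AnalysisResult("sample-c", 0.20, "Adware", 0.80)

print(triggered_rules(known))
print(triggered_rules(unknown))
print(triggered_rules(quiet))
```

Keeping each rule type as an independent check means a single sample can fire several alerts at once, as the first example does.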
Samples sharing significant function-level code overlap are automatically grouped into campaigns. Tracks members, shared IOCs, and timeline.
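Grouping by shared functions can be sketched as a clustering problem. The Python example below assumes each sample is reduced to a set of function identifiers and links samples whose Jaccard overlap meets a threshold, merging linked samples with union-find. The identifiers, the 0.5 threshold, and the choice of Jaccard overlap are assumptions; the page does not state how overlap is measured.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared functions divided by all distinct functions across both samples."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_campaigns(samples: Dict[str, Set[str]], min_overlap: float = 0.5) -> List[List[str]]:
    """Link samples with enough overlap and return groups of two or more."""
    parent = {name: name for name in samples}

    def find(x: str) -> str:
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= min_overlap:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical function identifiers per sample.
samples = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f2", "f3", "f4"},
    "s3": {"f3", "f4", "f5"},
    "s4": {"f9"},
}

campaigns = group_campaigns(samples)
print(campaigns)
```

Here s1 and s3 share only one function directly, but both overlap enough with s2 to land in the same group, which is how gradual code drift across a campaign can still be tracked.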
Every sample produces a structured report. Threat score, family attribution, AI function summaries, IOCs, YARA rules, and similar samples.
| SHA-256 | Family | Match |
|---|---|---|
| 78de45f9a2c03b... | Cobalt Strike | 94.2% |
| 2c91f30bd507e1... | Cobalt Strike | 87.6% |
| 9a1d72fc4e83b0... | Cobalt Strike | 81.3% |
| 3e8a12bc7f04d9... | Beacon Loader | 67.8% |
// Generated by SemSearch · 2026-04-26
rule SemSearch_CobaltStrike_Loader {
    meta:
        family = "CobaltStrike"
        threat_score = "0.94"
        severity = "CRITICAL"
    strings:
        $fn_0 = { 55 48 89 E5 48 83 EC 28 }
        $api_0 = "VirtualAllocEx"
        $api_1 = "CreateRemoteThread"
    condition:
        uint16(0) == 0x5A4D and $fn_0 and all of ($api_*)
}
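A rule with the shape of the example above can be produced from a byte pattern and a list of API names with plain string templating. The Python sketch below is a hypothetical illustration (`render_yara_rule` is not a SemSearch function) and does not show how the byte pattern or API names are selected from a sample or cluster.

```python
from typing import Iterable


def render_yara_rule(
    name: str,
    family: str,
    threat_score: float,
    severity: str,
    function_bytes: bytes,
    api_names: Iterable[str],
) -> str:
    """Render YARA rule text from one function byte pattern and API strings."""
    hex_pattern = " ".join(f"{byte:02X}" for byte in function_bytes)
    lines = [
        f"rule {name} {{",
        "    meta:",
        f'        family = "{family}"',
        f'        threat_score = "{threat_score:.2f}"',
        f'        severity = "{severity}"',
        "    strings:",
        f"        $fn_0 = {{ {hex_pattern} }}",
    ]
    for index, api in enumerate(api_names):
        lines.append(f'        $api_{index} = "{api}"')
    lines += [
        "    condition:",
        # 0x5A4D is the 'MZ' magic, so the rule only applies to PE files.
        "        uint16(0) == 0x5A4D and $fn_0 and all of ($api_*)",
        "}",
    ]
    return "\n".join(lines)


rule_text = render_yara_rule(
    name="SemSearch_CobaltStrike_Loader",
    family="CobaltStrike",
    threat_score=0.94,
    severity="CRITICAL",
    function_bytes=bytes.fromhex("554889E54883EC28"),
    api_names=["VirtualAllocEx", "CreateRemoteThread"],
)
print(rule_text)
```

Generated text like this should still be compiled and tested against a benign baseline before it is enabled, since short byte patterns such as a common function prologue can match widely.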
In-depth tooling for every role on the security team.
Eliminate manual RE. Identify critical functions, spot code reuse, get AI summaries for every dangerous function.
Real-time alerts, threat scoring, and SIEM-ready integration — built for high-volume triage.
13 query operators for ad-hoc hunting across all ingested samples and functions.
Code reuse detection across the entire corpus. Campaign grouping, YARA from clusters, benign baseline filtering.
Your samples never leave your infrastructure. Docker Compose or Kubernetes. No cloud telemetry, no vendor lock-in.
Every binary decomposed into individual functions. Analysis, search, and attribution all happen at function granularity.
Structural model captures code shape. Semantic model captures code meaning. Together they catch what either alone misses.
100+ REST endpoints. JWT + API key dual auth. 4 roles. Per-key rate limits. Built to drop into existing pipelines.
Binary upload to full intelligence report in under 7 minutes.
Self-hosted. Automated. Function-level depth.
Static analysis is live. Dynamic sandbox analysis is next — connecting what the binary contains with what it does.