Semantic Malware Intelligence

From raw binary to
threat intelligence
in minutes.

SemSearch eliminates manual reverse engineering. Function-level semantic search detects exact code reuse across obfuscated variants. Auto YARA generation, campaign attribution, 14-type IOC extraction — all self-hosted.

Static Analysis: Live · Dynamic Analysis: Coming Soon
Pipeline: Binary Upload → Disassembly → Static Analysis → Semantic Analysis → Dynamic Analysis (soon) → Threat Report
15 Pipeline Stages · 2 AI Models · 14 IOC Types Extracted · 12 Malware Families · YARA Rules · <7 min Binary to Full Report

Manual reverse engineering
doesn't scale in 2026

The security industry built detection around artifacts that are trivially changed. Threat actors know this.

01
Hash-based detection fails after one recompile

Same source, different compiler flag: the hash changes completely. The logic, behavior, and threat stay identical.
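
To make the point concrete, here is a minimal sketch showing that a one-byte difference between two builds produces an unrelated SHA-256 digest. The byte strings are made-up stand-ins for two builds of the same program, not real samples.

```python
import hashlib

# Hypothetical stand-ins for two builds of the same source;
# they differ in a single byte, as a changed compiler flag might cause.
build_a = b"\x55\x48\x89\xe5" + b"payload logic" + b"\x00"
build_b = b"\x55\x48\x89\xe5" + b"payload logic" + b"\x01"

digest_a = hashlib.sha256(build_a).hexdigest()
digest_b = hashlib.sha256(build_b).hexdigest()

# Count how many of the 64 hex characters still agree position by position.
matching = sum(a == b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print(f"{matching}/64 hex characters match")
```

Any positions that still agree do so by chance, which is why a hash can only confirm an exact copy and says nothing about near-identical code.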

02
Signatures don't survive obfuscation

Packers, encryption, and code mutation break static signatures. Threat actors rotate packer configurations to invalidate existing detections.

03
Manual RE doesn't scale to modern volume

A skilled analyst triages tens of samples per day. Modern threat operations produce thousands. The gap is structural.

04
No systematic code reuse detection

Malware authors reuse loaders, C2 code, crypters. Without function-level comparison, this reuse is invisible.

05
Family attribution is inconsistent

Different analysts and tools disagree on labels. ML classifiers provide reproducible, confidence-scored classifications.

06
Campaign correlation requires manual work

Connecting related samples across campaigns takes hours of manual effort most teams can't afford on every incident.

07
Intelligence lives in analysts' heads

Knowledge from RE sessions rarely persists in queryable form. SemSearch stores and indexes everything automatically.

Dual AI Function Analysis
Every function runs through two models in parallel — structural analysis captures code shape, semantic analysis understands intent — then results converge for maximum detection accuracy.
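extern unsigned char buf[16], key[4];  /* globals, as a decompiler would show them */

void sub_401A20(void) {
  /* XOR the 16-byte buffer with a repeating 4-byte key */
  for (int i = 0; i < 16; i++)
    buf[i] ^= key[i % 4];
}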
Structural Model
Semantic Model
Repeating-Key XOR · Encryption
Function-Level Similarity Search
Search isn't file-hash matching. SemSearch compares individual functions across your entire corpus — catching reuse even across compiler changes, obfuscation, and repacking.
Example result: functions decrypt_payload, c2_beacon, and registry_persist searched against Emotet_2024.bin and QakBot_v3.dll. 3 Matches Found, with similarity scores of 94.2%, 87.6%, and 91.1%.
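
A minimal sketch of what function-level similarity search can look like, assuming each function has already been turned into a numeric vector. The vectors, the `sample:function` keys, and the `top_matches` helper are hypothetical; the page does not describe SemSearch's actual models or index.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_matches(query, corpus, k=3):
    """Rank corpus functions by similarity to the query vector."""
    scored = [(cosine(query, vec), name) for name, vec in corpus.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical function embeddings keyed by "sample:function".
corpus = {
    "sample_a:decrypt_payload": [0.9, 0.1, 0.3],
    "sample_b:c2_beacon": [0.1, 0.8, 0.2],
    "sample_c:decrypt_payload": [0.85, 0.15, 0.35],
}
query = [0.9, 0.1, 0.3]

for score, name in top_matches(query, corpus, k=2):
    print(f"{name}: {score:.3f}")
```

Because the comparison runs on per-function vectors rather than file bytes, two samples with different hashes can still surface as matches when one routine is shared.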
Automated 15-Stage Pipeline
Upload a binary, get a full intelligence report. 15 automated stages run in under 7 minutes — zero manual steps from submission to actionable output.
Example run: Binary Upload → Disassembly → AI Analysis → Intelligence
342 Functions · 14 IOCs · 3 YARA · 4:23 Minutes
Complete Threat Intelligence Report
Every sample produces a structured report: threat score, family classification, IOC breakdown, similar samples, YARA rules — ready for your SOC, SIEM, or threat feed.
Threat Score: 0.92 · Emotet · 97.3%
IOC Breakdown: IPs 4 · Domains 7 · URLs 3 · Registry 2
Linked Intelligence: 6 Similar · 3 YARA · 8 Alerts
Report Ready

Five steps from raw binary
to actionable intelligence

Every sample runs through a deterministic multi-stage pipeline. No manual steps. Results in under 7 minutes.

15 Stages · 14 IOC Types · Function-Level Similarity · Self-Hosted
Step 01
Binary Upload & Triage

Upload any PE binary via REST API or dashboard. SemSearch validates the format, deduplicates against the corpus, queries 5 threat intel feeds, and assigns analysis priority automatically.

PE / x86 / x64 · Deduplication · Priority Triage · 5 Intel Feeds
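
The format validation and deduplication parts of this step could be sketched as follows. This is an illustration under stated assumptions: the `triage` helper, its return labels, and the in-memory hash set are hypothetical, and the threat intel feed lookups and priority assignment are not modeled.

```python
import hashlib
import struct

def looks_like_pe(data: bytes) -> bool:
    """Check the DOS 'MZ' magic and the PE signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 32-bit little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

seen_hashes = set()  # hypothetical in-memory stand-in for the corpus index

def triage(data: bytes) -> str:
    """Return 'rejected', 'duplicate', or 'queued' for an uploaded blob."""
    if not looks_like_pe(data):
        return "rejected"
    digest = hashlib.sha256(data).hexdigest()
    if digest in seen_hashes:
        return "duplicate"
    seen_hashes.add(digest)
    return "queued"

# Build a tiny synthetic header: MZ magic, e_lfanew = 0x40, PE signature.
fake_pe = bytearray(0x44)
fake_pe[:2] = b"MZ"
struct.pack_into("<I", fake_pe, 0x3C, 0x40)
fake_pe[0x40:0x44] = b"PE\x00\x00"
fake_pe = bytes(fake_pe)

print(triage(fake_pe))          # first submission
print(triage(fake_pe))          # same bytes again
print(triage(b"not a binary"))  # fails the header check
```

Exact-hash deduplication only skips byte-identical resubmissions; recompiled or repacked variants still go through the full pipeline.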
Step 02
Function Extraction & Disassembly

The binary is disassembled and decomposed into individual functions. Each function gets a call graph, dependency tree, and decompiled representation ready for AI analysis.

Function Extraction · Call Graphs · Decompilation
Step 03
Dual AI Analysis

Two AI models analyze every function in parallel. The structural model captures code shape and patterns. The semantic model understands code meaning. Together they catch obfuscated reuse, variant families, and cross-compiler matches.

Structural Model · Semantic Model · ML Classification · Evasion Detection
Step 04
Intelligence Generation

Results are aggregated into structured intelligence: composite threat score, ML family classification with confidence, 14-type IOC extraction, auto-generated YARA rules, and function-level similar sample matches.

Threat Score · Family Classification · IOC Extraction · Auto YARA · Similarity Search
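
The page does not publish how the composite threat score is computed. As a rough sketch, assuming a weighted mean of per-signal scores plus an additive evasion boost, clamped to the 0.0 to 1.0 range, it might look like this. The signal names, weights, and severity cut-offs are all hypothetical, and the sketch is not meant to reproduce the scores shown in the sample report.

```python
def composite_threat_score(signals, weights, evasion_boost=0.0):
    """Weighted mean of per-signal scores plus a boost, clamped to [0.0, 1.0]."""
    total_weight = sum(weights[name] for name in signals)
    base = sum(signals[name] * weights[name] for name in signals) / total_weight
    return max(0.0, min(1.0, base + evasion_boost))

def severity(score):
    """Map a score to a label; the cut-offs here are illustrative only."""
    if score >= 0.9:
        return "CRITICAL"
    if score >= 0.7:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"

# Hypothetical signal values and weights.
signals = {"ai_risk": 0.8, "ml_confidence": 0.6, "similarity": 0.5}
weights = {"ai_risk": 2.0, "ml_confidence": 1.0, "similarity": 1.0}

score = composite_threat_score(signals, weights, evasion_boost=0.25)
print(round(score, 3), severity(score))
```

Clamping keeps the result inside the advertised 0.0 to 1.0 range even when a large boost is applied to an already high base score.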
Step 05
Analyst Report & Real-Time Alerts

A complete analyst-ready report is generated with HTML and PDF export. Critical threats trigger real-time WebSocket alerts with configurable rules for threat score thresholds, family matches, and novel malware detection.

PDF / HTML Export · WebSocket Alerts · Campaign Attribution · API Access
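
A minimal sketch of how the three alert rule types and a cooldown could be evaluated against a finished report. The `AlertRule` class, the report field names, and the thresholds are assumptions made for illustration; WebSocket delivery is left out.

```python
import time

class AlertRule:
    """One rule: a predicate over a report plus a cooldown in seconds."""

    def __init__(self, name, predicate, cooldown=60.0):
        self.name = name
        self.predicate = predicate
        self.cooldown = cooldown
        self._last_fired = None

    def check(self, report, now=None):
        """Return True if the rule matches and is not cooling down."""
        now = time.monotonic() if now is None else now
        if not self.predicate(report):
            return False
        if self._last_fired is not None and now - self._last_fired < self.cooldown:
            return False
        self._last_fired = now
        return True

# The three rule types named above, with hypothetical report field names.
rules = [
    AlertRule("score-threshold", lambda r: r["threat_score"] >= 0.9),
    AlertRule("family-match", lambda r: r["family"] in {"Emotet", "Cobalt Strike"}),
    AlertRule("novel-malware", lambda r: r["best_similarity"] < 0.3),
]

report = {"threat_score": 0.94, "family": "Cobalt Strike", "best_similarity": 0.88}
fired = [rule.name for rule in rules if rule.check(report, now=0.0)]
print(fired)
```

The cooldown suppresses repeat alerts from the same rule for a fixed window, so a burst of related samples does not flood the channel.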

Built to automate
the hard parts

Every capability listed below is implemented and live. No roadmap items. No vaporware.

Function-Level Similarity Search

Decomposes every binary into individual functions. Finds code reuse across samples even when files are repackaged or obfuscated. Three search strategies: structural, semantic, hybrid.

multi-model search · structural + semantic re-ranking
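
One way to picture the hybrid strategy is a two-pass search: shortlist candidates by structural score, then re-rank the shortlist by a blend of both scores. The sketch below assumes that design; the `hybrid_rerank` helper, the `alpha` weight, and the scores are hypothetical.

```python
def hybrid_rerank(candidates, alpha=0.5, shortlist=3):
    """Shortlist by structural score, then re-rank by a blended score.

    candidates: list of (name, structural_score, semantic_score) tuples.
    alpha: weight on the structural score; 1 - alpha goes to the semantic one.
    """
    by_structure = sorted(candidates, key=lambda c: c[1], reverse=True)[:shortlist]
    blended = [
        (alpha * structural + (1 - alpha) * semantic, name)
        for name, structural, semantic in by_structure
    ]
    return sorted(blended, reverse=True)

# Hypothetical per-function scores from the two models.
candidates = [
    ("func_a", 0.95, 0.40),
    ("func_b", 0.80, 0.90),
    ("func_c", 0.70, 0.75),
    ("func_d", 0.20, 0.99),
]

for score, name in hybrid_rerank(candidates):
    print(f"{name}: {score:.3f}")
```

In this arrangement a candidate with a very low structural score never reaches the re-ranking pass, which is a trade-off of any shortlist-then-rerank design.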
Dual AI Analysis

Structural similarity model + deep semantic code model. Together they catch obfuscated code, code reuse across compilers, and variant families.

structural model · semantic code model · quantized vectors
AI Function Enrichment

Every critical function gets a natural-language summary, capability tags, and individual risk score. Identifies the exact function responsible for malicious behavior.

AI analysis · capability tagging · per-function risk
Family Classification

ML classifiers trained on hundreds of features identify 12 malware families. Live A/B model comparison improves accuracy continuously.

ensemble classifiers · automated HPO · experiment tracking
IOC Extraction (14 Types)

Auto-pulls 14 IOC types, including IPs, domains, URLs, registry keys, mutex names, file paths, hashes, and crypto wallet addresses. Smart exclusions filter benign noise.

14 IOC types · multi-source extraction · deduplication
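
As an illustration of extraction with exclusions, the sketch below pulls two of the IOC types (IPv4 addresses and domains) from a string dump, drops non-routable addresses, and filters a small allow-list. The regular expressions, the TLD list, and `BENIGN_DOMAINS` are simplified assumptions, not SemSearch's extraction logic.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io)\b", re.IGNORECASE)

# Hypothetical exclusion list standing in for "smart exclusions".
BENIGN_DOMAINS = {"microsoft.com", "w3.org"}

def extract_iocs(text):
    """Return deduplicated public IPv4 addresses and non-benign domains."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 matches the regex but is not an address
        if addr.is_global:
            ips.add(candidate)
    domains = {d.lower() for d in DOMAIN_RE.findall(text)} - BENIGN_DOMAINS
    return {"ipv4": sorted(ips), "domains": sorted(domains)}

strings_dump = (
    "connect 185.220.101.42 fallback 192.168.1.10 bad 999.1.1.1 "
    "host update.microsoft-cdn.net ref microsoft.com"
)
print(extract_iocs(strings_dump))
```

Here the private address and the allow-listed domain are dropped, while the lookalike domain is kept because it is not an exact match for an excluded entry.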
YARA Rule Generation

Auto-generates YARA rules from samples and function clusters. Full CRUD with enable/disable toggles. Export all rules as a single file.

sample + cluster sources · bulk generation · export
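
Rule generation can be pictured as templating: take byte patterns and API names selected from a sample or cluster and render them as rule text. The sketch below shows only that rendering step, under the assumption that both lists are non-empty and contain no characters needing escapes; `build_yara_rule` and its arguments are hypothetical.

```python
def build_yara_rule(name, family, hex_patterns, api_names):
    """Render a YARA rule as text from byte patterns and API name strings."""
    lines = [
        f"rule {name} {{",
        "    meta:",
        f'        family = "{family}"',
        "",
        "    strings:",
    ]
    for i, pattern in enumerate(hex_patterns):
        lines.append(f"        $fn_{i} = {{ {pattern} }}")
    for i, api in enumerate(api_names):
        lines.append(f'        $api_{i} = "{api}"')
    lines += [
        "",
        "    condition:",
        "        uint16(0) == 0x5A4D and",  # file starts with the 'MZ' magic
        "        all of ($fn_*) and all of ($api_*)",
        "}",
    ]
    return "\n".join(lines)

rule_text = build_yara_rule(
    name="Example_Loader",
    family="ExampleFamily",
    hex_patterns=["55 48 89 E5 48 83 EC 28"],
    api_names=["VirtualAllocEx", "CreateRemoteThread"],
)
print(rule_text)
```

Choosing which patterns are distinctive enough to include is the hard part and is not addressed here; a generic function prologue alone would match many benign binaries.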
Real-Time Alerts

WebSocket push the moment a critical threat lands. Three rule types: threat score threshold, malware family, and novel malware detection.

WebSocket push · heartbeat · cooldown protection
Campaign Attribution

Samples sharing significant function-level code overlap are automatically grouped into campaigns. Tracks members, shared IOCs, and timeline.

automated grouping · shared IOCs · timeline tracking
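
A minimal sketch of overlap-based grouping, assuming each sample is reduced to a set of function fingerprints and that samples are linked when their Jaccard overlap meets a threshold. The fingerprints, the 0.5 threshold, and the `group_campaigns` helper are hypothetical; the page does not say what counts as "significant" overlap.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of function fingerprints two samples have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_campaigns(samples, threshold=0.5):
    """Union samples whose overlap meets the threshold; return the groups."""
    parent = {name: name for name in samples}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical samples mapped to sets of function fingerprints.
samples = {
    "s1": {"f1", "f2", "f3", "f4"},
    "s2": {"f1", "f2", "f3", "f9"},
    "s3": {"f2", "f3", "f9", "f10"},
    "s4": {"f20", "f21"},
}
print(group_campaigns(samples))
```

Grouping is transitive in this sketch: s1 and s3 fall below the threshold when compared directly, yet both link to s2 and so end up in the same group.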

What an analysis
looks like

Every sample produces a structured report. Threat score, family attribution, AI function summaries, IOCs, YARA rules, and similar samples.

sem-search.com/analysis/a3f4b2c1d8e9f047ac62b3...
Threat Score: 0.94 · CRITICAL · Evasion boost: +0.25
Signal Breakdown: AI Risk Score 0.96 · ML Confidence 0.91 · Similarity Score 0.88 · Family Confidence 0.87 · Evasion Level EXTREME
🛡
Cobalt Strike
ML confidence: 91.4% · 8/12 capabilities
process-injection shellcode-loader anti-debug VMProtect packed reflective-loading C2-beacon
FUN_00401a40 · ReflectiveLoader_Stage2
CRITICAL anti-debug
AI Analysis · confidence 0.96

Reflective PE loader. Resolves kernel32.dll exports via PEB walk, allocates executable memory, writes shellcode, invokes via CreateRemoteThread. IsDebuggerPresent check — consistent with Cobalt Strike 4.x beacon staging.

IPv4 Addresses
185.220.101.42 C2
194.165.16.77 C2
Domains
update.microsoft-cdn.net
telemetry.windowsupdate.org
SHA-256            Family          Match
78de45f9a2c03b...  Cobalt Strike   94.2%
2c91f30bd507e1...  Cobalt Strike   87.6%
9a1d72fc4e83b0...  Cobalt Strike   81.3%
3e8a12bc7f04d9...  Beacon Loader   67.8%
// Generated by SemSearch · 2026-04-26

rule SemSearch_CobaltStrike_Loader {
    meta:
        family       = "CobaltStrike"
        threat_score = "0.94"
        severity     = "CRITICAL"

    strings:
        $fn_0  = { 55 48 89 E5 48 83 EC 28 }
        $api_0 = "VirtualAllocEx"
        $api_1 = "CreateRemoteThread"

    condition:
        uint16(0) == 0x5A4D and
        $fn_0 and all of ($api_*)
}


Built by analysts.
For analysts.

Depth tooling for every role in the security team.

🔬
Malware Analysts

Eliminate manual RE. Identify critical functions, spot code reuse, get AI summaries for every dangerous function.

Full decompilation per function · Function call graph + dependency tree · Evasion detection across 10 categories · Cross-sample code reuse detection
🛡️
SOC Teams

Real-time alerts, threat scoring, and SIEM-ready integration — built for high-volume triage.

WebSocket push on critical detections · SIEM / SOAR integration (in development) · Configurable alert rules + cooldowns · 0.0–1.0 composite threat score
🎯
Threat Hunters

13 query operators for ad-hoc hunting across all ingested samples and functions.

13 query operators across all fields · 6 built-in hunt templates · Saved hunts with run-count tracking · Campaign attribution + shared IOCs
Security Researchers

Code reuse detection across the entire corpus. Campaign grouping, YARA from clusters, benign baseline filtering.

Function-level similarity, all samples · Automated function clustering · Auto-YARA from samples and clusters · Benign baseline filtering

What makes SemSearch
different

🔒
100% Self-Hosted

Your samples never leave your infrastructure. Docker Compose or Kubernetes. No cloud telemetry, no vendor lock-in.

🧬
Function-Level Depth

Every binary decomposed into individual functions. Analysis, search, and attribution all happen at function granularity.

🤖
Hybrid AI (2 Models)

Structural model captures code shape. Semantic model captures code meaning. Together they catch what either alone misses.

⚙️
API-First Architecture

100+ REST endpoints. JWT + API key dual auth. 4 roles. Per-key rate limits. Built to drop into existing pipelines.

Ready to cut malware
analysis time by 90%?

Binary upload to full intelligence report in under 7 minutes.
Self-hosted. Automated. Function-level depth.

Dynamic Analysis — In Active Development

Static analysis is live. Dynamic sandbox analysis is next — connecting what the binary contains with what it does.