KNOWLEDGE GRAPH · ENGINE ANATOMY

A deterministic representation of the company.

NODES

38,412

EDGES

218,490

ENTITY-CLASSES

QUERY-LATENCY

< 200ms

Not a generic probabilistic model, but a graph of entities and relations — built while the company works, cited on every answer, kept on the company's own servers.

This page explains Lemnia on two levels. The first is in plain language, readable by anyone without a technical background. The second, expandable on demand under each paragraph, holds the architectural detail for CTOs, DPOs, accountants and consultants.

§ 01 INTERACTIVE EXPLORER

DEMO DATA · CUSTOMER BIANCHI

Click a node to see its provenance citations. Drag to reposition, mouse-wheel to zoom. The dataset is illustrative — the real graph of a company is built from the data the company chooses to connect.

EXPLORER ARCHITECTURE

Rendered via @xyflow/react 12.10 with a precomputed static layout (mulberry32 deterministic seed for reproducibility across reloads). The example dataset is hand-annotated. In the real product the graph is populated by the ingestion engine and navigation consults the SQLite + sqlite-vec store directly over gRPC mTLS on the LAN.
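As an illustration of why a seeded generator makes the layout identical across reloads, here is a minimal Python port of the public mulberry32 algorithm. The `static_layout` helper, the seed value and the canvas size are assumptions made for this sketch, not the explorer's actual code.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned integer arithmetic


def mulberry32(seed: int):
    """Return a function that yields floats in [0, 1), mulberry32-style."""
    state = seed & MASK32

    def next_float() -> float:
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK32
        t = ((state ^ (state >> 15)) * (state | 1)) & MASK32
        t = ((t + (((t ^ (t >> 7)) * (t | 61)) & MASK32)) & MASK32) ^ t
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float


def static_layout(node_ids, seed=42, width=800, height=600):
    """Hypothetical helper: one (x, y) position per node, fixed by the seed."""
    rng = mulberry32(seed)
    return {nid: (round(rng() * width, 2), round(rng() * height, 2))
            for nid in node_ids}
```

Calling `static_layout(["bianchi", "order-1"])` twice returns the same coordinates, so positions can be precomputed once and shipped as static data.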

§ 02 PIPELINE · SIX STAGES

FROM CONNECTOR TO CITATION

STAGE 01

Connection

Lemnia reads the systems in your company (ERPs, e-commerce, mail, calendar, support) using read-only credentials. It never writes without explicit admin consent.

TECHNICAL TRACK

Signed connectors built on the published protocols of the source systems: OAuth2 for Microsoft 365 / Google Workspace; REST APIs for Zucchetti, TeamSystem, SAP B1 and the WhatsApp Business Cloud API; IMAP IDLE for mail. Each connector inherits the ACLs of the source system: Lemnia never sees data that the source user wouldn't see.

Credentials live in the OS keychain (Keychain Access on macOS, Credential Manager on Windows). Never persisted in clear text on the application disk. Refresh tokens rotated every 24h. CST.375

STAGE 02

Extraction

Scheduled or on-demand reads. Documents, ledger rows, messages, orders. Everything pseudonymised in transit, encrypted at rest.

TECHNICAL TRACK

Streaming ingestion for systems that support webhooks + IMAP IDLE; batch polling for the rest (configurable interval, default 5 min). Sub-5 min p95 from signal to visibility in the dossier. CST.269

At-rest encryption AES-256-GCM with per-tenant DEK, KEK held by the OS Keychain. Optional PII pseudonymisation during ingestion for categories configured by the DPO.
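The text does not say how pseudonyms are derived. One common approach, shown here purely as an assumption, is a keyed HMAC over the fields the DPO has configured, so that the same value always maps to the same pseudonym without being reversible without the key. The function name, the `psn_` prefix and the 16-character truncation are illustrative.

```python
import hashlib
import hmac


def pseudonymise(record: dict, categories: set, key: bytes) -> dict:
    """Replace the values of configured PII fields with keyed pseudonyms."""
    out = {}
    for field, value in record.items():
        if field in categories:
            digest = hmac.new(key, str(value).encode("utf-8"),
                              hashlib.sha256).hexdigest()
            out[field] = "psn_" + digest[:16]  # truncated for readability
        else:
            out[field] = value  # fields outside the configured set pass through
    return out
```

A stable pseudonym keeps joins across documents possible: two invoices carrying the same tax code still resolve to the same node.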

STAGE 03

Understanding

Local Italian-trained models identify entities (customers, suppliers, products, cases, documents) and the relations between them (invoice → order → customer). Edge cases are disambiguated with human assistance.

TECHNICAL TRACK

NER + RE trained on an Italian corpus (medical + business + tax). Backbone Qwen3.5-4B Q4 for intent comprehension, mDeBERTa-v3-base-italian-NLI for consistency check. CST.333

Disambiguation via Qwen3-Embedding-0.6B over local context, with HITL (human-in-the-loop) modal fallback for cases below confidence 0.7. Disambiguation decisions feed continuous tenant-local training data.
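A minimal sketch of the threshold logic, assuming that confidence is the cosine similarity between the mention's context embedding and each candidate entity's embedding. The vectors would come from the embedding model; here they are plain lists, and `disambiguate` is a hypothetical name.

```python
import math

HITL_THRESHOLD = 0.7  # confidence floor stated in the text


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(mention_vec, candidates: dict):
    """Return (best_entity_id, score, needs_human) for a mention embedding."""
    best_id, best_score = None, -1.0
    for entity_id in sorted(candidates):  # sorted: ties resolve deterministically
        score = cosine(mention_vec, candidates[entity_id])
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id, best_score, best_score < HITL_THRESHOLD
```

When `needs_human` is true the caller would open the HITL modal instead of linking the mention automatically.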

SOTA references: LightRAG (HKU EMNLP 2025) for dual entity+relation extraction on a local KG; HippoRAG 2 for ranking via personalized PageRank on the graph.

STAGE 04

Storage

Signed local graph. Every node, every edge, every property carries a provenance citation — which document, which line, which timestamp this fact came from.

TECHNICAL TRACK

Dual-node schema: Entity (customer, supplier, product, case) + Source (document, email, ledger row). Every SUPPORTS edge ties a fact to a source with char-level range (offset_start, offset_end) and BLAKE3 hash of the source content. CST.30
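The shape of a SUPPORTS edge can be sketched as follows. Two assumptions apply: Python's standard library has no BLAKE3, so BLAKE2b stands in for it here, and the names `SupportsEdge`, `make_edge` and `resolve` are illustrative rather than the real schema.

```python
import hashlib
from dataclasses import dataclass


def content_hash(text: str) -> str:
    # Stand-in: BLAKE2b from the standard library; the text specifies BLAKE3.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


@dataclass(frozen=True)
class SupportsEdge:
    entity_id: str
    source_id: str
    offset_start: int  # character offsets into the source text
    offset_end: int
    source_hash: str


def make_edge(entity_id, source_id, source_text, quoted) -> SupportsEdge:
    start = source_text.find(quoted)
    if start < 0:
        raise ValueError("quoted passage not found in source")
    return SupportsEdge(entity_id, source_id, start, start + len(quoted),
                        content_hash(source_text))


def resolve(edge: SupportsEdge, source_text: str):
    """Return the cited passage, or None if the source no longer matches."""
    if content_hash(source_text) != edge.source_hash:
        return None
    return source_text[edge.offset_start:edge.offset_end]
```

Storing the hash next to the offsets means a citation stops resolving as soon as the underlying document changes, rather than silently pointing at different text.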

Storage backend: SQLite with sqlite-vec 0.1.9 for embeddings, rusqlite for relational tables, RocksDB for the blob store of original documents. Append-only log with a BLAKE3 seal per transaction, exportable as proof for GDPR Art. 30 audit.
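One way to make an append-only log tamper-evident is to chain the seals, so that each transaction's seal also covers the previous one. The chaining itself is an assumption of this sketch (the text only states a seal per transaction), and BLAKE2b again stands in for BLAKE3, which is not in Python's standard library.

```python
import hashlib
import json

GENESIS = "0" * 64  # seal used before the first transaction (illustrative)


def seal(prev_seal: str, payload: dict) -> str:
    # Canonical JSON so that equal payloads always hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    data = prev_seal.encode("ascii") + body.encode("utf-8")
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def append(log: list, payload: dict) -> None:
    prev = log[-1]["seal"] if log else GENESIS
    log.append({"payload": payload, "seal": seal(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every seal in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if seal(prev, entry["payload"]) != entry["seal"]:
            return False
        prev = entry["seal"]
    return True
```

An auditor who holds only the last seal can check the whole exported log by re-running `verify`.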

Community detection: Leiden algorithm for cluster analysis on frequently co-occurring entities. Batched PageRank refresh, configurable by graph size. CST.129

STAGE 05

Retrieval

When queried, Lemnia walks the graph for at most 3 hops and 8 seconds. Retrieval is deterministic — no agent that self-organises, no unpredictable behaviour.

TECHNICAL TRACK

Hybrid BM25 + dense pipeline: query parsing → seed nodes via embedding similarity → multi-hop traversal with PPR weighting → cross-encoder reranking (mDeBERTa). Depth cap: 3 hops by default for business packs, 2 for medical. Time cap: 8 s total, with fallback to the top-3 evidence on timeout. CST.389
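The depth and time caps can be pictured as a breadth-first walk that stops expanding at the hop limit and returns early once the deadline passes. This is a sketch under assumptions: the adjacency-dict graph and the injectable `clock` parameter are illustrative, and ranking and reranking are left out.

```python
import time
from collections import deque


def bounded_traverse(graph, seeds, max_hops=3, time_cap_s=8.0,
                     clock=time.monotonic):
    """Breadth-first walk; returns ({node: hop_distance}, timed_out)."""
    deadline = clock() + time_cap_s
    hops = {seed: 0 for seed in seeds}
    queue = deque(sorted(seeds))
    while queue:
        if clock() >= deadline:
            return hops, True  # caller falls back to the best evidence so far
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # depth cap reached: do not expand further
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops, False
```

Neighbours are visited in sorted order so the same graph and the same seeds always produce the same walk, which is what makes the step reproducible.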

Multi-entity seed weighting: each seed weighted by (confidence, PageRank_prior, freshness). Query routes through a deterministic classifier (FetchTimeline | FetchMultiHopChain | FetchAggregate | FetchSummary). No agentic loop on the local model.
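How the three seed signals are combined is not specified. The sketch below assumes a simple product, normalised into a restart distribution, and feeds it to a textbook personalized PageRank computed by power iteration. Function names, the damping factor and the iteration count are illustrative.

```python
def personalization(seeds: dict) -> dict:
    """seeds: node -> (confidence, pagerank_prior, freshness), each in [0, 1]."""
    raw = {node: c * p * f for node, (c, p, f) in seeds.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("at least one seed needs a non-zero weight")
    return {node: weight / total for node, weight in raw.items()}


def personalized_pagerank(graph, restart, damping=0.85, iterations=50):
    """Power iteration; the walk restarts at the seeds instead of uniformly."""
    nodes = sorted(set(graph) | {m for outs in graph.values() for m in outs}
                   | set(restart))
    rank = {n: restart.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            outs = graph.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for seed, weight in restart.items():
                    nxt[seed] += damping * rank[n] * weight
        rank = nxt
    return rank
```

Nodes that many short paths from the seeds converge on end up with higher scores, which is the property an inter-hop ranker relies on.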

SOTA reference: PathRAG (AAAI 2026, arXiv 2502.14902) for relational path-pruning. Lemnia adopts the prune flow but keeps generation deterministic, bound by cite-or-refuse.

STAGE 06

Citation

The answer is generated with every sentence anchored to a source-document passage. If the source doesn't exist, Lemnia says so and refuses to invent.

TECHNICAL TRACK

5-step cite-or-refuse pipeline: (1) decomposition of the answer into atomic claims; (2) substring match against evidence set; (3) mDeBERTa-NLI for entailment verification; (4) KG-consistency check via traversal; (5) strip-and-replace for non-entailed claims. CST.82
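The control flow of the five steps can be sketched as below. Several things are assumptions: sentence splitting stands in for claim decomposition, the NLI model and the graph check are passed in as plain callables (`entails`, `kg_consistent`), substring match is treated as a fast path before entailment, and the refusal text is a placeholder.

```python
import re

REFUSAL = "[no source found]"  # placeholder wording


def split_claims(answer: str):
    # Naive split on sentence-final punctuation; abbreviations would need care.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def cite_or_refuse(answer, evidence, entails, kg_consistent):
    """evidence: source_id -> text. Returns one output line per claim."""
    out = []
    for claim in split_claims(answer):                        # step 1
        core = claim.rstrip(".!?")
        cited = None
        for source_id in sorted(evidence):
            text = evidence[source_id]
            if (core and core in text) or entails(claim, text):  # steps 2-3
                cited = source_id
                break
        if cited is not None and kg_consistent(claim):        # step 4
            out.append(f"{claim} [{cited}]")
        else:
            out.append(REFUSAL)                               # step 5
    return out
```

A claim survives only if some source supports it and the graph does not contradict it; everything else is replaced rather than left uncited.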

Per-pack hedging whitelist: verosimilmente, presumibilmente, si stima, pare — exempt from entailment verification but limited to paragraphs marked as hedged dossier. Strict sections (dunning, quotation, supplier-risk) admit no hedging.
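The whitelist rule reduces to a small predicate. The hedge terms and the strict section names come from the text; the function names and the `paragraph_is_hedged` flag are assumptions about how a paragraph might be marked.

```python
import re

HEDGES = ("verosimilmente", "presumibilmente", "si stima", "pare")
STRICT_SECTIONS = {"dunning", "quotation", "supplier-risk"}


def is_hedged(sentence: str) -> bool:
    lowered = sentence.lower()
    # Word boundaries keep "pare" from matching inside "appare".
    return any(re.search(r"\b" + re.escape(h) + r"\b", lowered) for h in HEDGES)


def skips_entailment(sentence: str, section: str,
                     paragraph_is_hedged: bool) -> bool:
    """True only for whitelisted hedges in a hedged paragraph, outside strict sections."""
    if section in STRICT_SECTIONS:
        return False
    return paragraph_is_hedged and is_hedged(sentence)
```

Everything for which this returns false still goes through entailment verification.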

BLAKE3-signed processing register for every answer. Exportable as proof for court or DPO. Regulatory anchor: Trib. Siracusa 338/2026 Art. 96 c.p.c.

§ 03 ITALIAN → GRAPH → CITATION

NO LANGUAGE TO LEARN

IT  what happened with customer Bianchi this year?
──────────────────────────────────────────────────────
DSL
    MATCH (c:Customer {name: "Bianchi"})
          -[r:GENERATED|RECEIVED|SENT]-(e)
    WHERE r.timestamp >= "2026-01-01"
    RETURN e ORDERED BY r.timestamp
    LIMIT 50
──────────────────────────────────────────────────────
CITED ANSWER
    «In 2026 customer Bianchi received 3 orders
     (12 Feb, 4 May, 8 Aug), all paid. Last
     invoice is from 12 Nov [F-2026-247].»

Lemnia doesn't require learning a new language. The user writes in Italian; Lemnia translates to a structured graph query, executes it, and returns a cited answer.

The model that performs the translation runs on the company's hardware. Nothing leaves the LAN at execution time.

FROM NATURAL LANGUAGE TO DSL

The IT→DSL parser is a deterministic classifier that maps the query to one of 4 retrieval forms (Timeline, MultiHopChain, Aggregate, Summary). Backbone Qwen3.5-4B Q4 for intent + slot filling, algorithmic validators for identifiers (codice fiscale, partita IVA, IBAN), HITL fallback on confidence < 0.7.
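As an example of what an algorithmic validator looks like, the standard IBAN check (ISO 13616, mod-97) fits in a few lines. This sketch checks only the checksum and the overall length range, not country-specific lengths; the function name is illustrative.

```python
def iban_is_valid(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end, map letters
    to numbers (A=10 ... Z=35), and require the remainder to be 1."""
    compact = iban.replace(" ", "").upper()
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1
```

Because the check is pure arithmetic, a slot filled with a mistyped IBAN can be rejected before the query reaches the graph, with no model involved.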

No execution of LLM-generated code. The intermediate DSL is just a typed AST that the graph engine executes. The LLM has no access to disk or network.
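To make "typed AST" concrete, here is a sketch of what the Timeline form of the Bianchi query could look like as a data structure, with a tiny in-memory executor. The field names and the flat edge dictionaries are assumptions for illustration; the real engine and schema are not shown on this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimelineQuery:
    entity_label: str
    entity_name: str
    relations: Tuple[str, ...]
    since: str        # ISO date, so string comparison orders correctly
    limit: int = 50


def run_timeline(query: TimelineQuery, edges):
    """edges: dicts with keys label, name, relation, timestamp, target."""
    hits = [e for e in edges
            if e["label"] == query.entity_label
            and e["name"] == query.entity_name
            and e["relation"] in query.relations
            and e["timestamp"] >= query.since]
    hits.sort(key=lambda e: e["timestamp"])
    return [e["target"] for e in hits[:query.limit]]
```

The model only fills the fields of `TimelineQuery`. It produces no executable text, so there is nothing to run beyond the fixed executor.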

§ 04 E-R SCHEMA · FOURTEEN ENTITIES

GENERIC ONTOLOGY · ADAPTABLE PER NICHE

Customer

Supplier

Product

Order

Invoice

DDT

Case

Document

Message

Ticket

Employee

Decision

Site

Fourteen base entity-classes. Each vertical (micro-enterprise, multi-channel, professional studio, SMB) inherits these classes and adds niche-specific ones. Canonical relations: ~28; every edge is annotated with provenance metadata and cardinality.

EXTENDING THE ONTOLOGY PER NICHE

The base schema lives in crates/lemnia-pack-business as Rust types. Every niche (T1-T4) can add classes via Cargo feature flags: e.g. T2 multi-channel adds SkuVariant, ExternalReview, Return; T3 professional studio adds Case, Filing, CalendarHearing.

The 28 canonical relations include GENERATED_BY, RECEIVES, SENDS, MENTIONS, REPLIES_TO, CONTAINS, INVOICED_FOR, LINKS_TO. Each relation carries confidence metadata (0-1), cardinality (1-1, 1-N, N-N) and a provenance citation pointing to the source that generated it.
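The per-relation metadata can be pictured as a small validated record. The real types are described as Rust; this Python sketch only mirrors the three attributes named above, and the class and field names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    ONE_TO_ONE = "1-1"
    ONE_TO_MANY = "1-N"
    MANY_TO_MANY = "N-N"


@dataclass(frozen=True)
class Relation:
    name: str              # e.g. "INVOICED_FOR"
    source: str
    target: str
    confidence: float      # 0-1
    cardinality: Cardinality
    provenance: str        # pointer to the source that generated the edge

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if not self.provenance:
            raise ValueError("every relation needs a provenance citation")
```

Rejecting an edge without provenance at construction time is what lets every later answer assume that a citation exists.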

§ 05 THREE DEPLOYMENT TOPOLOGIES

LOCAL-FIRST · ALWAYS

Solo

Lemnia desktop on the owner's laptop. Local Qwen3.5-4B model, ~5 GB footprint, zero outbound traffic at query time. Fit for micro-enterprises, shops, artisans.

TECHNICAL TRACK

Stack: Tauri V2 + Rust workspace, llama.cpp for Qwen3.5-4B Q4_K_XL inference (~2.8 GB), sqlite-vec for local embeddings. Single-user, single-tenant. Ingestion via local webhook or polling. CST.333

Studio

Mac mini or NUC running the headless Lemnia service on the LAN. All collaborators see the same dossier from their own client apps. Queries stay local. Fit for professional studios and multi-channel e-commerce.

TECHNICAL TRACK

T2 architecture: lemnia-server (gRPC mTLS binary) + desktop client tauri-business + mobile-business. Initial pairing via QR code, tenant-scoped certificates, RBAC roles inherited from the source system. Multi-seat (default 3-10 users). CST.403

Intra-LAN sync, never cloud. Pro mode (cloud-burst ingest) requires explicit per-batch consent and keeps a signed audit log.

PMI Sovereign

Dedicated GPU appliance at the company premises. The whole stack runs on company hardware. Unlimited users. Fit for SMBs €5-50M, 50-249 employees, regulated contexts.

TECHNICAL TRACK

T3 single-tenant on-prem: x86_64 server with NVIDIA RTX 6000 Ada or equivalent, vLLM 0.19+ for the local Qwen3.6-35B-A3B-FP8 model + DFlash drafter (2.5-2.9× speedup), Linux SEV-SNP for hardware attestation. CST.335

Compliance: AVR (Authorized Vendor Register) log, per-niche pre-signed DPIA, automatic export of GDPR Art. 30 artifacts. NIS2-ready: access log, patch management, separation of duties, BCP/DR exercise.

§ 06 TECHNICAL FOUNDATIONS

THE LITERATURE LEMNIA ABSORBS

Lemnia doesn't invent a new methodology. It integrates the best of published research on KG-RAG, deterministic retrieval and cite-or-refuse, optimising it for the Italian case (Italian-native, local-first, compliant).

BIBLIOGRAPHIC REFERENCES

LIGHTRAG · EMNLP 2025 · HKUDS
Dual entity-level + relation-level retrieval on a local KG. Lemnia adopts the dual split but replaces generation with 5-step cite-or-refuse.
HIPPORAG 2 · NEURIPS 2024 + ICML 2025 · OSU-NLP
Associative memory with personalized PageRank on the KG. Lemnia takes the PPR weighting as inter-hop ranker; drops the agentic side.
PATHRAG · AAAI 2026 · ARXIV 2502.14902
Flow-based relational pruning, 44% context reduction at equal accuracy. Lemnia adopts the prune but keeps generation deterministic, bound by cite-or-refuse.
MICROSOFT GRAPHRAG · 2024 · MSR
Hierarchical communities via Leiden + per-cluster summarisation. Lemnia adopts Leiden for cluster analysis but excludes cloud-side summary generation.

WHAT LEMNIA ADDS

Italian-native
NER, RE and parsing trained on Italian corpora (business + tax + medical). Never English-via-translation.
Mandatory citation
5-step pipeline: decomposition → substring match → NLI entailment → graph consistency → strip-and-replace. Per-pack hedging whitelist. Hallucinations never tolerated.
Local at query time
Optional cloud-burst only for heavy ingest and long generation. Query-time retrieval always on-prem.
Compliance built-in
BLAKE3-signed register per query. Per-niche pre-signed DPIA. Annex V 2026 hyper-amortisation eligible.

AI ACT ART. 50 · GDPR · NIS2 · GARANTE PROVV. 474/2025 · EU HOSTED


FOUNDER PROGRAM · LIMITED SEATS

Lemnia working on a real company's data.

A 30-minute demo, tailored to your industry. Lemnia composes a real customer's dossier, cites the sources line by line, and shows the signed register ready for the DPO.
