Tutorials & Use Cases·September 11, 2025·8 min read

Everything Is a Node: The Generic Entity Model

Matrix has no per-domain @Node classes. Organizations, agents, sessions, leads, memories — all of it is EntityType / EntityNode rows in one Neo4j graph.

By Matrix Team

Open the source tree of most platforms and you can read the domain off the package names: Organization.java, Agent.java, Conversation.java, Lead.java, each a hand-rolled persistence class with its own table, its own repository, its own migration when a column changes. Add a feature, add a class. Add a customer who needs one extra field, fork the schema.

Matrix doesn't have those classes. There is no @Node class Agent, no @Node class Lead, no @Node class Campaign. Organization, Team, User, Agent, Skill, Knowledge, Tool, McpServer, LlmProvider, Session, Message, Campaign, Lead, Memory, AuditEvent — every one of them is a row of the same two types: EntityType describes a shape, EntityNode is an instance of that shape. The whole domain is data in one Neo4j graph, and the code that reads and writes it doesn't know or care what domain it's modelling.

This post is the how and the why — the model, what it buys you (multi-tenant custom fields with zero forks), and the sharp edges you'll cut yourself on if you don't know they're there.

The two-type core

A PropertyDefinition is one field on a shape:

{
  "name": "requiredCallerFields",
  "type": "STRING",
  "required": false,
  "targetEntityType": null,
  "multi": false
}

type is one of STRING | NUMBER | BOOLEAN | DATE | ENTITY. When type is ENTITY, targetEntityType names the shape it points at and multi says whether it's one reference or many. An Agent's skills field, for example, is type: ENTITY, targetEntityType: "Skill", multi: true.

An EntityType is just a key plus a schemaJson holding a List<PropertyDefinition>. EntityType.ownerOrgId is null for platform globals and set to an org for per-org custom types — hold that thought, it's the whole trick behind custom fields.
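As a rough sketch of the two shapes described so far, the records below model a PropertyDefinition and an EntityType in plain Java. The field names and the five type values come from the text; the PropertyType enum name, the Long type for ownerOrgId, the isGlobal helper, and the constructor check are assumptions made for illustration, not Matrix's actual classes.

```java
import java.util.List;

// Hypothetical names: PropertyType, isGlobal() and the validation rule are
// illustrative assumptions, not Matrix's actual classes.
enum PropertyType { STRING, NUMBER, BOOLEAN, DATE, ENTITY }

record PropertyDefinition(String name, PropertyType type, boolean required,
                          String targetEntityType, boolean multi) {
    PropertyDefinition {
        // Assumed rule: an ENTITY field must name the shape it points at,
        // and a scalar field must not.
        boolean isRef = type == PropertyType.ENTITY;
        if (isRef != (targetEntityType != null)) {
            throw new IllegalArgumentException(
                "targetEntityType must be set if and only if type is ENTITY: " + name);
        }
    }
}

record EntityType(String key, Long ownerOrgId, List<PropertyDefinition> schema) {
    // ownerOrgId == null marks a platform-global type; otherwise it is an org overlay.
    boolean isGlobal() { return ownerOrgId == null; }
}

final class SchemaExamples {
    // The Agent.skills field from the text: many references to Skill nodes.
    static final PropertyDefinition SKILLS =
        new PropertyDefinition("skills", PropertyType.ENTITY, false, "Skill", true);

    static final EntityType GLOBAL_AGENT =
        new EntityType("Agent", null, List.of(SKILLS));
}
```

Keeping the shape as an immutable value makes it cheap to serialize into schemaJson and to compare a global type against an org overlay.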

An EntityNode is the instance. Conceptually:

EntityNode = id
           + entityType            // "Agent", "Lead", "Session", ...
           + orgId                 // the tenant marker on every row
           + List<EntityRelationship> fieldRefs   // the ENTITY-typed fields

Every node carries its entityType and its orgId. The orgId is not a nicety — it's the floor every query stands on, the same tenancy discipline covered in Multi-Tenancy Is Not a Feature You Bolt On. The EntityRelationship entries (@RelationshipProperties { field, target } on a :HAS_FIELD edge) are how ENTITY-typed fields are stored — an agent's skills becomes one :HAS_FIELD relationship per skill it points at.

Scalar fields, though, are not relationships. And they're not a JSON blob either.

Scalars are native Neo4j properties

This is the part that took a migration to get right. Early Matrix stuffed every scalar field into a single propertiesJson string on the node — easy to write, miserable to query. You can't index inside an opaque blob, so every "find the Lead with this phone number" turned into a full scan and a JSON parse.

V005__entity_properties_native.cypher fixed it: it converted that blob into per-field native Neo4j properties on every :Entity row. A Lead's phoneNumber is now a real node property — n.phoneNumber — not a substring of a blob. Which means V006__entity_property_indexes.cypher could then add composite indexes on the hot lookup keys: (entityType, userId), (entityType, phoneNumber), (entityType, callSid), (entityType, campaign), (entityType, key), and friends. Every property carrying a filter on a hot path gets to ride a real index.

So a single node mixes both storage strategies, on purpose:

Scalars (STRING / NUMBER / BOOLEAN / DATE) → native node properties, indexable.
ENTITY refs → :HAS_FIELD relationships, traversable.

EntityManager.splitProperties does the sorting at write time: scalars go into the SET n += $props map, ENTITY fields become relationships. When multi: true, it accepts a List and persists one :HAS_FIELD per element, then re-assembles them into a List<Long> on the way back out in toDto.
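The split can be pictured with a small, self-contained sketch. It is not the real EntityManager.splitProperties: the FieldDef, FieldRef, SplitResult and PropertySplitter names are hypothetical, and it assumes reference targets arrive as numeric node ids.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types and this method body are assumptions,
// not the real EntityManager.splitProperties.
record FieldDef(String name, boolean isEntityRef, boolean multi) {}

record FieldRef(String field, long target) {}

record SplitResult(Map<String, Object> scalars, List<FieldRef> refs) {}

final class PropertySplitter {
    static SplitResult split(List<FieldDef> schema, Map<String, Object> payload) {
        Map<String, Object> scalars = new LinkedHashMap<>();
        List<FieldRef> refs = new ArrayList<>();
        for (FieldDef def : schema) {
            Object value = payload.get(def.name());
            if (value == null) {
                continue; // field absent from this write
            }
            if (!def.isEntityRef()) {
                scalars.put(def.name(), value); // becomes a native node property
            } else if (def.multi()) {
                // multi: true -> one HAS_FIELD relationship per list element
                for (Object id : (List<?>) value) {
                    refs.add(new FieldRef(def.name(), ((Number) id).longValue()));
                }
            } else {
                refs.add(new FieldRef(def.name(), ((Number) value).longValue()));
            }
        }
        return new SplitResult(scalars, refs);
    }
}
```

Given a schema with a scalar name and a multi-valued skills reference, a payload of one name and two skill ids yields one scalar entry and two FieldRef entries, which is the shape the write path needs for the property map and the relationships respectively.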

One thing the property pipeline deliberately skips: the 768-d memory embedding. It lives as a native property on the node too, but it's written by raw Cypher in MemoryVectorStore and stripped from the user-facing property bag — it's conceptually separate from the schema. The full story is in Embed-on-Write, Recall-on-Read.

Why this is powerful: custom fields without a fork

Here's the payoff. Because a shape is data, not a class, you can give one tenant extra fields without touching the platform code or anyone else's data.

Matrix ships a global Lead type with the fields every CRM needs. A tenant who wants three extra fields creates a same-named, org-owned EntityType — ownerOrgId set to their org — whose schemaJson is the global shape plus their additions. EntityManager resolves the type org-first: when this tenant reads or writes a Lead, they get their overlay; everyone else gets the global default. No fork, no migration, no second deploy. The agent tools adapt too — update_lead reflects whatever fields the resolved shape declares. That's the foundation under Turn Matrix Into Your CRM.
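Org-first resolution is simple enough to sketch in memory. The TypeDef and TypeRegistry names below are hypothetical, and the real lookup reads EntityType rows from the graph rather than from hash maps; the sketch only shows the fallback order the paragraph above describes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory registry; the real lookup lives in EntityManager
// and reads EntityType rows from the graph.
record TypeDef(String key, Long ownerOrgId, List<String> fieldNames) {}

final class TypeRegistry {
    private final Map<String, TypeDef> globals = new HashMap<>();
    private final Map<String, TypeDef> overlays = new HashMap<>(); // keyed by "orgId:key"

    void register(TypeDef type) {
        if (type.ownerOrgId() == null) {
            globals.put(type.key(), type);
        } else {
            overlays.put(type.ownerOrgId() + ":" + type.key(), type);
        }
    }

    // Org-first: prefer the caller's overlay, fall back to the platform global.
    Optional<TypeDef> resolve(String key, long orgId) {
        TypeDef overlay = overlays.get(orgId + ":" + key);
        return Optional.ofNullable(overlay != null ? overlay : globals.get(key));
    }
}
```

With a global Lead and an overlay owned by org 7, resolving Lead for org 7 returns the overlay with its extra fields, while any other org gets the global shape.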

It's the same lever the whole platform pulls. A new persona is an Agent row. A new corpus is a Knowledge row. A new field on any existing type is a PropertyDefinition edit — not a new class, a new repository, and a Liquibase changeset. This is what we mean when we say the core stays domain-agnostic: your domain lives in data, never in src/main/java. It's layer 2 of the ten-layer stack, and every layer above it inherits the genericity for free.

The sharp edges

A generic store earns its power by moving complexity from the schema into a handful of write-path rules. Get these wrong and you'll lose data quietly.

updateEntity is REPLACE, not MERGE

EntityManager.updateEntity is a full replace. It removes every non-reserved native property that is absent from the incoming payload, then runs SET n += $props. It also drops the :HAS_FIELD relationships whose field name the update touches. So a "partial update" that sends only the two fields you changed will delete every other scalar property on the node. That's a footgun shaped exactly like a convenience method.

The fix is to reach for mergeEntity(id, partial) instead. It overlays via SET n += $partial and only touches the relationships you explicitly name — a true partial update. Every admin form and AgentsAdminController.PATCH go through mergeEntity for precisely this reason. Rule of thumb: if you have the whole entity, updateEntity is fine; if you have a patch, mergeEntity is the only safe choice.
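The difference is easiest to see on a toy model of one node's scalar properties. The sketch below is an assumption-laden illustration, not Matrix code: the NodeProps class, the contents of the reserved-key set, and both method bodies are made up to mirror the behaviour described above, and relationship handling is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A toy model of one node's scalar properties. The reserved-key set and both
// method bodies are assumptions that mirror the behaviour described above.
final class NodeProps {
    static final Set<String> RESERVED = Set.of("entityType", "orgId");

    // updateEntity semantics: drop every non-reserved property missing from
    // the payload, then overlay the payload.
    static Map<String, Object> replace(Map<String, Object> node, Map<String, Object> payload) {
        Map<String, Object> next = new HashMap<>(node);
        next.keySet().removeIf(k -> !RESERVED.contains(k) && !payload.containsKey(k));
        next.putAll(payload);
        return next;
    }

    // mergeEntity semantics: overlay only, untouched properties survive.
    static Map<String, Object> merge(Map<String, Object> node, Map<String, Object> partial) {
        Map<String, Object> next = new HashMap<>(node);
        next.putAll(partial);
        return next;
    }
}
```

Sending a patch that contains only status to replace leaves the node without its phoneNumber, while the same patch through merge keeps phoneNumber and changes only status.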

createEntity must be one raw-Cypher transaction

This one is a Neo4j isolation gotcha that cost real debugging time. The obvious way to create an entity is "save the node with Spring Data Neo4j, then run a raw Cypher SET n += $props to write the scalars." It silently loses every scalar.

Why: SDN runs its save in a managed transaction. A raw Driver session opens a separate transaction, and under Neo4j's read-committed isolation it can't see the node that the SDN transaction hasn't committed yet. The MATCH finds zero rows, so the SET is a no-op: no error is raised and nothing is written.

So createEntity bypasses SDN entirely and does the whole insert — node, scalars, and relationships — in one raw-Cypher executeWrite:

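// Create the node and write every scalar in the same statement
CREATE (n:Entity)
SET n = $scalarProps,
    n.entityType = $type,
    n.orgId = $orgId
WITH n
// One row per ENTITY reference; an empty $refs simply creates no relationships
UNWIND $refs AS ref
MATCH (t:Entity) WHERE id(t) = ref.target
CREATE (n)-[:HAS_FIELD {field: ref.field}]->(t)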

The lesson generalizes: if you ever add another write path that creates an :Entity, do the whole thing in one raw transaction. Don't split node creation from scalar/relationship writes across two transactions, or you'll re-discover read-committed the hard way.
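To make the single-transaction shape concrete, here is a sketch of assembling the parameters the CREATE query above expects, so that node, scalars and references travel together in one call. The CreateParams helper and its signature are hypothetical; only the parameter names (scalarProps, type, orgId, refs) and the field/target keys are taken from the query, and the orgId type is assumed to be a long.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the single parameter map for the one-shot
// CREATE query, so no second transaction is ever needed.
final class CreateParams {
    static Map<String, Object> build(String type, long orgId,
                                     Map<String, Object> scalarProps,
                                     Map<String, List<Long>> refsByField) {
        List<Map<String, Object>> refs = new ArrayList<>();
        refsByField.forEach((field, targets) -> {
            for (Long target : targets) {
                // One entry per future HAS_FIELD relationship.
                refs.add(Map.of("field", field, "target", target));
            }
        });
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("scalarProps", scalarProps);
        params.put("type", type);
        params.put("orgId", orgId);
        params.put("refs", refs);
        return params;
    }
}
```

The resulting map would be handed, together with the query text, to the one executeWrite call; because everything the insert needs is in that map, there is no reason for a second statement in a second transaction.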

Schema-driven means schema edits, not ad-hoc fields

Because shapes are validated data, you can't just start writing a new property and expect it to be queryable. Introducing a field means editing the corresponding EntityType schema in SystemSchemaSeeder (the seeder stays global-only — never let it MERGE-by-name over an org's custom overlay, or you'll clobber a tenant's extra fields). And if the new field gets filtered on a hot path, it also needs an index entry in V006 or a new migration. A new field is cheap, but it's a deliberate two-line change, not an accident.
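What "validated data" could look like in practice is sketched below. The source does not say whether Matrix rejects or silently ignores undeclared fields, so the rejection behaviour here is an assumption, as are the DeclaredField and SchemaValidator names and the error message wording.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical validator: rejects payload keys the resolved shape does not
// declare and reports required fields that are missing.
record DeclaredField(String name, boolean required) {}

final class SchemaValidator {
    static List<String> validate(List<DeclaredField> schema, Map<String, Object> payload) {
        Set<String> errors = new TreeSet<>();   // sorted for stable output
        Set<String> declared = new TreeSet<>();
        for (DeclaredField f : schema) {
            declared.add(f.name());
            if (f.required() && payload.get(f.name()) == null) {
                errors.add("missing required field: " + f.name());
            }
        }
        for (String key : payload.keySet()) {
            if (!declared.contains(key)) {
                errors.add("undeclared field: " + key);
            }
        }
        return List.copyOf(errors);
    }
}
```

A check like this is what turns an ad-hoc property into an explicit schema edit: the write fails until the field is declared on the shape.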

Neo4j Community has no composite NODE KEY

Worth knowing if you're extending the schema: NODE KEY constraints, composite or otherwise, are an Enterprise Edition feature, so Neo4j Community Edition can't enforce a composite key that way. Matrix enforces uniqueness with single-property UNIQUE constraints instead. Don't reach for a two-column key: model uniqueness on one property (a per-org key, for instance) and let the (entityType, …) composite indexes carry the query performance.

The takeaway

Modelling every domain concept as EntityType + EntityNode rows in one Neo4j graph is a trade: you give up compile-time domain classes and the IDE's autocomplete on lead.phoneNumber, and in return you get a platform where a new persona, a new corpus, or a per-tenant custom field is a data change — no fork, no migration, no redeploy. The complexity doesn't vanish; it concentrates into three write-path rules — mergeEntity for patches, one raw transaction for creates, schema-plus-index for new fields. Learn those three and the model is a superpower. Ignore them and it's a quiet data-loss machine.

If you're building anything multi-tenant where different customers need different shapes of the same thing, this is the pattern that lets you ship the extra field without shipping a fork.

Want to see it run? Spin up a workspace, create an Agent from a form, then POST /api/orgs/{slug}/agents to watch the same shape go in over the API — no class, no migration. Browse the schema in SystemSchemaSeeder and the write path in EntityManager to see exactly how the two types become your entire domain.

