Agents That Actually Remember You — Across Chat and Phone
The feature users feel: an agent that knows who you are whether you call or type. One memory pool per contact, joined across voice and chat.
Here is the moment that decides whether someone trusts your agent: they called yesterday and gave it their renewal date, and today they open the chat widget — and the agent asks for their renewal date again.
That one re-ask undoes everything. It tells the user, plainly, that "the agent" is a fiction — there's no it, just a stateless function answering one message at a time, on one channel at a time, with no idea who's on the other end.
Real AI agent memory isn't a vector database you bolt on. It's a much more specific promise: the same person is the same person, no matter how they reach you. If they call and then chat, it's one relationship. The agent should pick up where it left off. This post is about how Matrix keeps that promise, and why the hard part isn't storage but identity.
The hard part is identity, not storage
Everyone talks about embeddings and recall. Important, and we cover the mechanics in Embed-on-Write, Recall-on-Read. But recall is useless if you can't answer a prior question: whose memories are these?
A phone call arrives as a number on a SIP stream. A chat arrives as a logged-in session. These look like two completely different events from two completely different systems. The naive design stores call notes under a "phone call" record and chat history under a "conversation" record, and now the same human has two disconnected memories — one per channel — and neither agent ever finds the other.
Matrix refuses that split at the data model. Every interaction resolves to a single User(userType=CONTACT) — a channel-agnostic record for the human, not for the channel. Phone today, chat and WhatsApp next. The channel is an attribute of the interaction, not an identity of its own.
One resolver per channel, one identity at the end
Each channel has exactly one job: turn its raw identifier into the same contact record.
- Voice. The telephony bridge calls CallerResolver, which finds or creates a contact by the caller's phone number. The number is normalized to E.164, with Indian-format variants handled so that +91, 0-prefixed, and bare 10-digit forms all land on one row instead of three.
- Text chat. The JWT user on the request plays the exact same role. No phone number needed; the authenticated identity is the contact.
Different front doors, same person at the end of the hallway.
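To make the voice path concrete, here is a minimal Python sketch of the normalize-then-find-or-create step. It is not Matrix's CallerResolver: the function names, the in-memory `contacts` dict, and the contact id format are all hypothetical, and it only handles the three Indian formats named above.

```python
import re

def normalize_indian_phone(raw: str) -> str:
    """Return E.164 (+91 plus 10 digits) for +91, 0-prefixed, or bare 10-digit input."""
    digits = re.sub(r"\D", "", raw)  # drop '+', spaces, dashes, parentheses
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]          # strip the country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]          # strip the trunk prefix
    if len(digits) != 10:
        raise ValueError(f"unrecognized phone format: {raw!r}")
    return "+91" + digits

# Hypothetical stand-in for the contact table: E.164 number -> contact id.
contacts: dict[str, str] = {}

def find_or_create_contact(raw_number: str) -> str:
    """Look up the contact by normalized number, creating it on first sight."""
    key = normalize_indian_phone(raw_number)
    if key not in contacts:
        contacts[key] = f"contact-{len(contacts) + 1}"
    return contacts[key]
```

Because the lookup key is the normalized number rather than the raw caller ID, all three spellings of the same number resolve to one contact.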
Session.userId is the join
Matrix has one unified interaction entity: Session, discriminated by channel (TEXT_CHAT, VOICE_REALTIME, future WHATSAPP). A chat and a phone call aren't two concepts — they're one concept (a single interaction between an agent and a contact) recorded in one kind of row. The earlier design had separate Conversation and PhoneCall types; unifying them was deliberate, because a text chat and a phone call really are the same thing.
Every Session carries a userId — the resolved contact. The voice bridge stamps it right after CallerResolver hands back the contact; chat stamps it straight from the JWT user. Same field, same value, regardless of channel.
And that single shared field is the whole trick. Memory recall is scoped to (agentId, userId) — the agent plus the contact. It is not scoped to the channel. So a voice Session and a chat Session for the same person, talking to the same agent, draw from one memory pool:
recall(agentId, userId) # NOT recall(agentId, userId, channel)
↳ everything this agent has ever learned about this contact,
whether they called or typed
Call in the morning, chat in the afternoon, and the afternoon agent has the morning's memories. Not because we copied anything between channels — because there was only ever one pool, and both channels pointed at it.
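The scoping rule can be sketched in a few lines of Python. This is an illustration under assumptions, not Matrix's storage layer: the `MemoryPool` class and its in-memory dict are hypothetical, and vector recall is left out so the keying is easy to see.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    agent_id: str
    user_id: str   # the resolved contact
    channel: str   # "TEXT_CHAT" or "VOICE_REALTIME"

class MemoryPool:
    """Hypothetical store keyed by (agent_id, user_id); channel is never part of the key."""

    def __init__(self) -> None:
        self._facts: dict[tuple[str, str], list[str]] = defaultdict(list)

    def remember(self, session: Session, fact: str) -> None:
        self._facts[(session.agent_id, session.user_id)].append(fact)

    def recall(self, session: Session) -> list[str]:
        # .get avoids creating empty entries for contacts we have never seen
        return list(self._facts.get((session.agent_id, session.user_id), []))
```

A fact written during a VOICE_REALTIME session is returned for a TEXT_CHAT session with the same agent and contact, because the channel never enters the lookup.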
The profile block that wins
Storing memory is half the job. The other half is getting the model to actually use it instead of re-asking. That fight is harder than it sounds, because the model is also reading a persona prompt that may run thousands of characters and say things like "always confirm the customer's date of birth."
So Matrix doesn't quietly append a few facts and hope. MemoryContextRenderer injects a deliberately heavy-handed "## Who you are talking to" block into the system prompt — the same block for both text chat and Gemini Live voice. It's framed as an override on purpose. It has to win against a long persona prompt that's pulling the model toward interrogating someone it already knows.
The block looks roughly like this:
## Who you are talking to
You are speaking with Priya. You already know these things about her —
do NOT ask for them again:
- Plan: Premium (annual)
- Renewal date: 2026-08-14
- Preferred name: Priya
What you still need to learn (ask naturally if it comes up):
- Reason for considering cancellation
Recent conversations:
- [voice, 2 days ago] Called about a billing question; resolved.
Notice two things. The framing is blunt — "do NOT ask for them again" — because polite framing loses to a wall of persona text. And the same rendered block rides into voice and chat identically, so the agent's recall behaves the same way no matter how the contact reached it. Prompt parity across channels is an enforced invariant in Matrix, and memory rides on top of it.
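A rough Python sketch of how such a block could be rendered once and reused on both channels follows. The function names and the choice to append the block after the persona prompt are assumptions for illustration; the source only says the block is injected into the system prompt.

```python
def render_profile_block(name: str, known: dict[str, str],
                         missing: list[str], recent: list[str]) -> str:
    """Render the override block; sections with nothing to show are omitted."""
    lines = [
        "## Who you are talking to",
        f"You are speaking with {name}. You already know these things about this person;",
        "do NOT ask for them again:",
    ]
    lines += [f"- {label}: {value}" for label, value in known.items()]
    if missing:
        lines.append("What you still need to learn (ask naturally if it comes up):")
        lines += [f"- {item}" for item in missing]
    if recent:
        lines.append("Recent conversations:")
        lines += [f"- {entry}" for entry in recent]
    return "\n".join(lines)

def build_system_prompt(persona_prompt: str, profile_block: str) -> str:
    """Assumed placement: persona first, override block last. Used by both channels."""
    return f"{persona_prompt}\n\n{profile_block}"
```

Routing voice and chat through the same two functions is one simple way to get prompt parity: there is no channel argument for the rendering to branch on.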
"What you still need to learn"
That second list isn't decoration. Set an agent's requiredCallerFields — a CSV like name,plan,renewalDate for a retention rep, or dateOfBirth,birthTime,birthPlace for an astrologer — and MemoryContextRenderer emits a checklist stub for whatever is still missing. The agent is told exactly what gap to close, so it gathers the right information conversationally instead of either scripting an interrogation or forgetting to ask. Each Skill can contribute its own requiredContactFields, which union into the agent's list.
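The checklist logic amounts to a union followed by a set difference. The Python sketch below assumes skill-level fields arrive as CSV strings like the agent-level ones and that a blank or absent profile value counts as missing; both are assumptions, and the helper names are hypothetical.

```python
def parse_csv(fields: str) -> list[str]:
    """Split a CSV such as 'name, plan,renewalDate' into trimmed field names."""
    return [f.strip() for f in fields.split(",") if f.strip()]

def required_fields(agent_csv: str, skill_csvs: list[str]) -> list[str]:
    """Union the agent's fields with each skill's, keeping first-seen order."""
    seen: dict[str, None] = {}
    for csv in [agent_csv, *skill_csvs]:
        for name in parse_csv(csv):
            seen.setdefault(name, None)
    return list(seen)

def missing_fields(required: list[str], profile: dict) -> list[str]:
    """Fields the renderer would list under 'What you still need to learn'."""
    return [f for f in required if not str(profile.get(f) or "").strip()]
```

For a retention agent with `name,plan,renewalDate` and a skill adding `cancellationReason`, a contact whose profile holds only a name and a plan yields `renewalDate` and `cancellationReason` as the gaps to close.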
Five tools to write what it learns
The agent needs to save what it learns, on either channel, without you wiring anything. Matrix gives every agent five built-in memory tools for free:
- update_contact_profile: name, plan, preferences, any profile attribute
- set_contact_birth_place: resolves and stores a place
- set_contact_current_location: where they are now
- add_contact_note: a free-form durable note
- lookup_contact_details: read back what's already known
These are auto-composed onto the agent's tool surface alongside an ambient get_current_time. You don't attach them, declare them, or write a callback. (A skill can narrow which of the five it exposes via builtinTools, but the default is they're all there.)
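As a sketch of what auto-composition with optional narrowing might look like, consider the Python below. The tool names come from the list above, but `compose_tool_surface` is hypothetical, and treating get_current_time as always present regardless of builtinTools is an assumption.

```python
BUILTIN_MEMORY_TOOLS = (
    "update_contact_profile",
    "set_contact_birth_place",
    "set_contact_current_location",
    "add_contact_note",
    "lookup_contact_details",
)
AMBIENT_TOOLS = ("get_current_time",)

def compose_tool_surface(custom_tools, builtin_tools=None):
    """Return the agent's tool names: custom tools, memory tools, ambient tools.

    builtin_tools=None means no narrowing, so all five memory tools are included.
    """
    if builtin_tools is None:
        memory = list(BUILTIN_MEMORY_TOOLS)
    else:
        allowed = set(builtin_tools)
        memory = [t for t in BUILTIN_MEMORY_TOOLS if t in allowed]
    return list(custom_tools) + memory + list(AMBIENT_TOOLS)
```

With no narrowing, an agent that declares one custom tool ends up with seven; a skill that lists only `add_contact_note` and `lookup_contact_details` ends up with those two plus the ambient clock.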
The one subtlety worth knowing: the bound contact id is a per-turn ThreadLocal. When tools are composed off the request thread — as the voice path does — that binding is propagated explicitly so a memory write on a phone call lands on the right contact. You don't manage this; the runtime does. It's the kind of plumbing you'd otherwise spend a sprint getting subtly wrong.
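The propagation problem has a close Python analogue, sketched here with `contextvars` standing in for the ThreadLocal. Nothing in it is Matrix code: `bound_contact_id`, the toy `add_contact_note`, and `run_tool_off_thread` are hypothetical names used to show the pattern of snapshotting the caller's binding and running the tool inside that snapshot.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor

# Per-turn binding, playing the role the article assigns to a ThreadLocal.
bound_contact_id: contextvars.ContextVar = contextvars.ContextVar(
    "bound_contact_id", default=None
)

def add_contact_note(note: str) -> tuple[str, str]:
    """Toy memory tool: returns (contact_id, note) instead of writing to a store."""
    contact_id = bound_contact_id.get()
    if contact_id is None:
        raise RuntimeError("no contact bound for this turn")
    return (contact_id, note)

def run_tool_off_thread(executor: ThreadPoolExecutor, tool, *args) -> Future:
    """Snapshot the caller's context and run the tool inside it on a worker thread."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, tool, *args)
```

The caller sets `bound_contact_id` on the request thread, and `run_tool_off_thread` carries that binding to the worker explicitly, so the write lands on the intended contact rather than depending on whatever the worker thread happens to have bound.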
After the call: the post-interaction extractor
Tools capture facts the agent decides to write during a conversation. But a lot of signal only becomes clear in hindsight — the gist of why they called, the outcome, the mood. So Matrix runs a fire-and-forget post-call extractor (MemoryExtractorService) after the interaction ends. It distills the session with an LLM and writes the result back.
This is where a clean rule of thumb keeps the memory model from rotting:
- Session.summary = a digest of this session. The extractor folds a short summary onto the session row itself. It's the "recent conversations" line you saw in the profile block: MemoryContextRenderer surfaces recent sessions for (agent, userId) that have a non-blank summary, across channels.
- Memory rows = what survives across sessions. Durable, cross-session facts (the things a memory tool wrote, plus new facts the extractor pulled out) live as separate Memory rows, embedded for vector recall.
One is the transcript-level recap of a single interaction; the other is the long-term knowledge that should follow the contact into every future conversation on any channel. Keeping them distinct is what stops your "memory" from degrading into an ever-growing pile of redundant session digests. (The extractor uses the agent's configured provider; an agent with no provider simply skips extraction — by design.)
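The split can be sketched in Python as follows. The LLM call is out of scope here, so the extractor's output is assumed to be a dict with a `summary` string and a `facts` list; that shape, the class names, and the omission of embedding are all illustrative assumptions rather than MemoryExtractorService's actual contract.

```python
from dataclasses import dataclass, field

@dataclass
class SessionRow:
    agent_id: str
    user_id: str
    channel: str
    summary: str = ""  # per-session digest, filled in after the interaction

@dataclass
class MemoryStore:
    rows: list = field(default_factory=list)  # (agent_id, user_id, fact) tuples

    def add(self, agent_id: str, user_id: str, fact: str) -> None:
        self.rows.append((agent_id, user_id, fact))

def apply_extraction(session: SessionRow, store: MemoryStore, extraction: dict) -> None:
    """Digest goes onto the session row; durable facts become separate rows."""
    session.summary = extraction.get("summary", "").strip()
    for fact in extraction.get("facts", []):
        store.add(session.agent_id, session.user_id, fact)

def recent_conversations(sessions: list, agent_id: str, user_id: str) -> list[str]:
    """Lines for the profile block: any channel, but only non-blank summaries."""
    return [
        f"[{s.channel}] {s.summary}"
        for s in sessions
        if s.agent_id == agent_id and s.user_id == user_id and s.summary.strip()
    ]
```

The digest never enters the memory store and the facts never overwrite the summary, which is the separation the rule of thumb above describes.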
If you want to go deeper on the kinds of memory and how they're typed, see Working, Episodic, Semantic, Procedural.
What this feels like to the person on the other end
Strip away the implementation and here's the user-facing payoff:
1. They call a retention agent and explain they're on the Premium annual plan, renewing in August, frustrated about a billing line. The agent saves it as it goes.
2. The call ends. The extractor writes a digest onto the session and any durable facts as Memory rows.
3. Two days later they open the chat widget. Different channel, different front door.
4. The chat agent's prompt already contains the "Who you are talking to" block (their plan, their renewal date, the gist of the call) because chat and voice resolved to the same contact and share one memory pool.
5. The agent picks up the thread. It does not ask for the renewal date again.
That step 5 — the absence of a re-ask — is the entire feature. It's small, and it's the difference between something that feels like a tool and something that feels like it knows you.
Takeaway
Cross-channel memory isn't a storage problem, it's an identity problem. Matrix solves it by resolving every interaction — phone via CallerResolver, chat via the JWT user — to one channel-agnostic User(userType=CONTACT), joining voice and chat Session rows by userId, and scoping recall to (agentId, userId) so both channels share a single memory pool. A heavy-handed "Who you are talking to" block forces the model to use what it knows; requiredCallerFields tells it what's still missing; five built-in tools write new facts; and a post-call extractor splits the per-session digest (Session.summary) from the durable cross-session facts (Memory rows). The result is an agent that remembers you whether you call or type.
See it remember
Spin up a workspace, create an agent at /orgs/{slug}/admin/agents, set its requiredCallerFields, and talk to it twice — once on voice, once on chat. Watch the second conversation start where the first left off. The memory tools and the contact-learning block are already attached; you don't wire a thing. Then read Embed-on-Write, Recall-on-Read for the vector mechanics underneath it all.