The Tax MCP Memory



Introduction

In our previous article, Bridging the Tax Gap: Building an Open Source MCP for South African Tax, we created a constrained system prompt, stored in a text file, to regulate the responses the LLM produces.
This led us to a powerful method for automating complex tasks: storing prompts in a structured text file. By enumerating specific instructions, such as retrieving a list, filtering data, and executing an action, we can parse these prompts into a sequential workflow, allowing the LLM to execute a chain of operations autonomously until the high-level goal is achieved. We won't explore that here, but it will be a thrill to experiment with later.

This article explores the implementation of persistent memory for conversational AI. We will demonstrate how to store and retrieve historical user queries and model responses, ensuring the chatbot maintains context and provides a seamless, personalized user experience over time.

Memory Context

To build a truly intelligent system, we must implement a robust Conversation State Management strategy. By capturing historical interactions, the system can store discrete data points—such as a user’s age or annual income—in a Key-Value store for high-precision tasks like personal income tax calculation. For broader contextual recall, we utilize Retrieval-Augmented Generation (RAG) powered by Vector Embeddings. This combination allows the chatbot to cross-reference specific tax year details and business context from past conversations, drastically improving both personalization and mathematical accuracy.
Feature      | Key-Value Store (Structured)     | Vector RAG (Unstructured)
Example Data | annual_income: 75000             | "The user mentioned a side-hustle in 2023."
Best For     | Exact calculations and formulas. | Nuance, history, and "vibe."
Analogy      | A labeled filing cabinet.        | A giant library with a smart librarian.
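As a minimal sketch of the structured side, the key-value store can be a plain typed record that the chatbot fills in as the user reveals facts. The field names below (annualIncome, dependents, taxYear) are illustrative, not from the original code:

```typescript
// Hypothetical structured profile for exact tax calculations.
// Field names are illustrative assumptions, not the article's schema.
type TaxProfile = {
  annualIncome?: number;
  dependents?: number;
  taxYear?: string;
};

const profile: TaxProfile = {};

// Store a discrete fact the moment the user states it.
function setFact<K extends keyof TaxProfile>(key: K, value: TaxProfile[K]) {
  profile[key] = value;
}

setFact('annualIncome', 75000);
setFact('taxYear', '2023');
```

Because these values are exact, a tax formula can read them directly without any semantic search.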

1. Data Storage (The "Brain" on Disk)
The code uses a local JSON file to persist data:
Standard file I/O using Node.js promises.
It reads the array of memories into memory and writes it back when updated.
MemoryEntry type: each memory stores the original message, a summary, and an embedding (a list of numbers representing the "meaning" of the text).
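A MemoryEntry matching that description could look like this; the exact field names are an assumption, since the article's type definition isn't shown:

```typescript
// One persisted memory: the raw message, a short summary, and its embedding.
// Field names are assumptions based on the description above.
type MemoryEntry = {
  id: string;          // unique identifier
  message: string;     // the original user message
  summary: string;     // short human-readable summary
  embedding: number[]; // vector representing the text's "meaning"
  createdAt: string;   // ISO timestamp
};

const example: MemoryEntry = {
  id: '1',
  message: 'I have 3 dependents',
  summary: 'User has 3 dependents',
  embedding: [0.12, -0.48, 0.87], // toy 3-dimensional vector for illustration
  createdAt: new Date().toISOString(),
};
```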

2. Mathematical Comparison (Cosine Similarity)
This is the core of how AI "retrieval" works.
normalize: adjusts the embedding vector so its length is 1, making calculations consistent.
cosine: calculates the Cosine Similarity, which measures the angle between two vectors.
A score of 1.0 means the meanings are identical; a score of 0 means they are completely unrelated.
$\text{Cosine Similarity} = \frac{A \cdot B}{\|A\| \|B\|}$
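A minimal self-contained sketch of these two helpers, directly following the formula above (the article's full implementation appears in the code section below):

```typescript
// Scale a vector to unit length (the normalize step described above).
function normalize(vec: number[]): number[] {
  const len = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return len === 0 ? vec : vec.map((v) => v / len);
}

// Cosine Similarity = (A . B) / (||A|| ||B||)
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

console.log(normalize([3, 4]));      // [0.6, 0.8]
console.log(cosine([3, 4], [6, 8])); // 1 (same direction: identical "meaning")
console.log(cosine([1, 0], [0, 1])); // 0 (orthogonal: unrelated)
```

Note that once both vectors are normalized, the denominator is 1 and cosine similarity reduces to a plain dot product, which is why normalizing up front keeps later calculations cheap and consistent.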

3. Creating Embeddings (getEmbedding)
This function converts a string of text (prose) into a mathematical vector (numbers).
Dynamic AI import: it tries to use a professional model, such as text-embedding-3-large.
Deterministic fallback: if no AI model is connected, a clever "fallback" turns characters into numbers using a hash. This ensures the code doesn't crash even if the internet is down, though the "meaning" won't be as accurate.

4. The Public API (The Tools)
These are the functions your MCP server would actually call:
registerMemory: when the user says something important (e.g., "I have 3 dependents"), you call this to save it. It generates the embedding and adds it to the JSON file.
retrieveMemories: the "search" function. When the user asks a question, this tool turns the question into an embedding, compares it against all saved memories using the cosine function, and returns the top five most relevant memories.

The Code

1. The Storage Interface
This part handles the "physical memory": saving and loading the data to a file on your computer.
import path from 'node:path';
import { readFile, writeFile } from 'node:fs/promises';

const MEM_FILE = path.join(process.cwd(), 'memories.json');

async function loadMemories(): Promise<MemoryEntry[]> {
  try {
    const raw = await readFile(MEM_FILE, 'utf8');
    return JSON.parse(raw) as MemoryEntry[];
  } catch (e) {
    return []; // If the file doesn't exist, return an empty list
  }
}

async function saveMemories(memories: MemoryEntry[]) {
  await writeFile(MEM_FILE, JSON.stringify(memories, null, 2), 'utf8');
}
Purpose: It ensures that when you restart your AI server, it doesn't "forget" everything. It converts your TypeScript objects into a JSON file and back again.

2. The Semantic "Translator" (getEmbedding)
Computers cannot compare the "meaning" of words like "SARS" and "Taxation" directly. They need numbers. This function translates text into a Vector (an array of numbers).
async function getEmbedding(text: string): Promise<number[]> {
  // 1. Try to use a high-end AI model (Gemini/OpenAI) via the 'ai' package
  try { ... }

  // 2. The "fallback": if the AI is offline, use math to create a unique pattern
  const L = 128;
  const vec = new Array<number>(L).fill(0);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const idx = (code * (i + 1)) % L;
    vec[idx] = vec[idx] + (code % 97) / 97;
  }
  return normalize(vec);
}
Purpose: This turns a sentence into a "coordinate" in a multi-dimensional space. Similar topics will have coordinates that are physically close to each other.

3. The "Similarity" Engine (cosine)
Once we have coordinates (vectors), we need to know how close they are. This uses Cosine Similarity math.
function cosine(a: number[], b: number[]) {
  // Math that calculates the angle between two arrows (vectors).
  // If the angle is 0 degrees, the "meaning" is 100% the same.
  let dot = 0;
  let na = 0;
  let nb = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
Purpose: If a user asks about "VAT," this function scans all old memories and finds the one with the most similar mathematical pattern.

4. The Memory Manager (register and retrieve)
These are the high-level functions that orchestrate the whole process.
export async function registerMemory(message: string, summary: string) {
  const embedding = await getEmbedding(message); // Step 1: Translate to numbers
  const memories = await loadMemories();         // Step 2: Get existing database
  const entry: MemoryEntry = { ... };            // Step 3: Create new record
  memories.push(entry);                          // Step 4: Add to list
  await saveMemories(memories);                  // Step 5: Save to disk
}
export async function retrieveMemories(query: string, limit = 5) {
  const qEmb = await getEmbedding(query); // Step 1: Translate search query to numbers
  const memories = await loadMemories();  // Step 2: Load all past data
  const scored = memories.map((m) => ({
    score: cosine(qEmb, m.embedding),     // Step 3: Compare query to every memory
    memory: m,
  }));
  scored.sort((a, b) => b.score - a.score); // Step 4: Sort by "most relevant"
  return scored.slice(0, limit);            // Step 5: Return top matches
}
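Tying it all together, here is a hypothetical end-to-end run. This is an in-memory variant so the sketch is self-contained (the real code persists to memories.json and tries an AI model first); it uses only the deterministic hash fallback and an array instead of the file:

```typescript
// Self-contained, in-memory variant of the register/retrieve flow.
// Assumption: no AI model, so only the deterministic hash fallback is used.
type MemoryEntry = { message: string; summary: string; embedding: number[] };
const memories: MemoryEntry[] = [];

// Scale a vector to unit length.
function normalize(vec: number[]): number[] {
  const len = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0)) || 1;
  return vec.map((v) => v / len);
}

// Deterministic fallback: hash characters into a 128-dimensional vector.
function getEmbedding(text: string): number[] {
  const L = 128;
  const vec = new Array<number>(L).fill(0);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    vec[(code * (i + 1)) % L] += (code % 97) / 97;
  }
  return normalize(vec);
}

// Vectors are already unit length, so cosine similarity is a dot product.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

function registerMemory(message: string, summary: string) {
  memories.push({ message, summary, embedding: getEmbedding(message) });
}

function retrieveMemories(query: string, limit = 5) {
  const qEmb = getEmbedding(query);
  return memories
    .map((m) => ({ score: cosine(qEmb, m.embedding), memory: m }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

registerMemory('I have 3 dependents', 'User has 3 dependents');
registerMemory('My annual income is R750000', 'Income of R750k');
const hits = retrieveMemories('How many dependents do I have?', 1);
console.log(hits[0].memory.summary, hits[0].score.toFixed(3));
```

Keep in mind that with the hash fallback the scores only loosely track character overlap; with a real embedding model the same flow ranks memories by actual meaning.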

