
Memory and State Management in CrewAI

In the first two posts, we built agents with tools and task dependencies. But there’s a critical gap: agents have no memory between runs.

Run your crew today, and it forgets everything tomorrow. Ask it “what did you research yesterday?” and it stares back blankly. For simple one-shot tasks, that’s fine. But for real-world applications—iterative analysis, customer support, ongoing monitoring—agents need to remember.

The challenge isn’t just persistence. It’s what to remember, how long to remember it, and how much to feed back into the LLM without exploding your token costs.

Why Memory Matters

Consider a customer support agent. First interaction:

  1. Customer reports a bug in their integration
  2. Agent gathers logs, researches the issue, proposes a solution
  3. Interaction ends

Second interaction, three days later:

  1. Same customer returns: “The fix worked partially, but now X is happening”
  2. Without memory: Agent re-gathers logs, re-researches, and has no record of which fixes were already tried
  3. With memory: Agent recalls previous findings, understands the current issue is a follow-up, and focuses on different solutions

The second case is far more effective, and cheaper: less re-research and faster resolution.

Multi-turn workflows benefit similarly. An analysis crew that remembers previous findings, failed approaches, and discovered patterns becomes better over time.

Memory Architecture in CrewAI

CrewAI provides two layers:

1. Short-Term Memory (Within a Crew Run)

Context flows between tasks automatically via the context= parameter we covered before. Task B sees Task A’s output. Task C sees both. This is ephemeral—it dies when the crew finishes.

# Context flows within the run
task_a = Task(description="...", agent=researcher)
task_b = Task(description="...", agent=analyst, context=[task_a])
task_c = Task(description="...", agent=writer, context=[task_a, task_b])

crew = Crew(agents=[...], tasks=[task_a, task_b, task_c])
result = crew.kickoff()
# After crew.kickoff(), all context is lost

2. Long-Term Memory (Across Crew Runs)

This is the part we build by hand in this post. Recent CrewAI releases also ship an opt-in built-in memory system (enabled with memory=True on the Crew), but rolling your own keeps you in control of exactly what is stored, how long it lives, and how much of it reaches the prompt.

The approach is simple: load saved memories before a run, inject them into the agent's prompt, and write new findings back after the run.

Implementing Agent Memory

Each agent gets its own store: a list of timestamped entries persisted to disk.

Pattern 1: File-Based Memory Injected into the Backstory (Basic)

Nothing an agent learns during a run survives the process unless you save it. To make it persist, we wire the load and save steps manually:

from crewai import Agent, Task, Crew
import json
import os
from datetime import datetime

# Simple file-based memory
class FileMemory:
    def __init__(self, agent_name: str, storage_dir: str = "./agent_memory"):
        self.agent_name = agent_name
        self.storage_dir = storage_dir
        self.file_path = os.path.join(storage_dir, f"{agent_name}_memory.json")
        os.makedirs(storage_dir, exist_ok=True)
        self.memories = self._load()
    
    def _load(self) -> list:
        """Load memories from disk."""
        if os.path.exists(self.file_path):
            try:
                with open(self.file_path, "r") as f:
                    return json.load(f)
            except json.JSONDecodeError:
                return []
        return []
    
    def save(self):
        """Persist memories to disk."""
        with open(self.file_path, "w") as f:
            json.dump(self.memories, f, indent=2)
    
    def add(self, entry: dict):
        """Record a memory."""
        entry["timestamp"] = datetime.now().isoformat()
        self.memories.append(entry)
        self.save()
    
    def get_recent(self, limit: int = 10) -> str:
        """Fetch recent memories as formatted text for the agent."""
        recent = self.memories[-limit:]
        if not recent:
            return "No prior memories."
        return "\n".join([
            f"[{m['timestamp']}] {m['type']}: {m['content']}"
            for m in recent
        ])
    
    def clear(self):
        """Wipe all memories."""
        self.memories = []
        self.save()

# Instantiate per agent
researcher_memory = FileMemory("researcher")

# Create agent and manually inject memory retrieval into its backstory
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover detailed information about companies",
    backstory=f"""You have 10 years of experience researching markets. You excel at finding patterns.

Prior findings and research:
{researcher_memory.get_recent()}

Use these prior findings to inform your current analysis. If the current task is a follow-up 
to previous research, build on what you already know instead of starting from scratch.""",
    verbose=True,
    llm="openai/gpt-4o"
)

# After crew runs, save what was discovered
def save_crew_results(crew_result, agent_memory: FileMemory, result_type: str = "finding"):
    """Save a truncated copy of the crew's output to agent memory."""
    agent_memory.add({
        "type": result_type,
        # str() also handles result objects; slicing truncates, it does not summarize
        "content": str(crew_result)[:500]
    })

This is hacky but functional: we're embedding recent memories straight into the backstory. It works for small memory volumes but breaks at scale, because every injected entry is sent to the LLM on each call and token costs grow with it.
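
One way to keep the injected text bounded is to cap it by a character budget rather than an entry count. The sketch below assumes entries shaped like the ones FileMemory.add stores (timestamp, type, content). The names format_within_budget and max_chars are hypothetical, and characters are only a rough proxy for tokens.

```python
from typing import Dict, List


def format_within_budget(memories: List[Dict[str, str]], max_chars: int = 1500) -> str:
    """Format the newest memories first, stopping once the character budget is spent."""
    lines: List[str] = []
    used = 0
    # Walk from newest to oldest so recent entries win when space runs out.
    for m in reversed(memories):
        line = f"[{m['timestamp']}] {m['type']}: {m['content']}"
        cost = len(line) + 1  # +1 for the newline that joins lines
        if used + cost > max_chars:
            break
        lines.append(line)
        used += cost
    if not lines:
        return "No prior memories."
    # Restore chronological order for readability.
    return "\n".join(reversed(lines))
```

Something like format_within_budget(researcher_memory.memories) could stand in for get_recent() in the backstory. Note that the f-string is evaluated once, when the Agent is constructed, so the agent has to be rebuilt to see memories saved after that point.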

Pattern 2: Structured Memory with Semantic Filtering

Better approach: store memories, but only surface relevant ones to the agent.

import json
import os
from datetime import datetime
from typing import List, Optional

class SemanticMemory:
    """Structured memory with keyword tagging and tag-based search."""
    
    def __init__(self, agent_name: str, storage_dir: str = "./agent_memory"):
        self.agent_name = agent_name
        self.storage_dir = storage_dir
        self.file_path = os.path.join(storage_dir, f"{agent_name}_semantic.json")
        os.makedirs(storage_dir, exist_ok=True)
        self.memories: List[dict] = self._load()
    
    def _load(self) -> list:
        if os.path.exists(self.file_path):
            try:
                with open(self.file_path, "r") as f:
                    return json.load(f)
            except json.JSONDecodeError:
                return []
        return []
    
    def save(self):
        with open(self.file_path, "w") as f:
            json.dump(self.memories, f, indent=2)
    
    def add(self, content: str, tags: List[str], category: str = "general"):
        """Add memory with tags for later retrieval."""
        self.memories.append({
            "id": len(self.memories),
            "timestamp": datetime.now().isoformat(),
            "content": content,
            "tags": tags,
            "category": category
        })
        self.save()
    
    def search(self, query_tags: List[str], limit: int = 5) -> str:
        """Retrieve memories matching tags."""
        matches = []
        for mem in self.memories:
            if any(tag in mem["tags"] for tag in query_tags):
                matches.append(mem)
        
        matches.sort(key=lambda m: m["timestamp"], reverse=True)
        results = matches[:limit]
        
        if not results:
            return "No relevant prior findings."
        
        formatted = "\n".join([
            f"[{m['timestamp'][:10]}] {m['content']}"
            for m in results
        ])
        return f"Relevant prior findings:\n{formatted}"
    
    def get_context_for_task(self, task_description: str) -> str:
        """Smart retrieval: extract keywords from task and find related memories."""
        import re
        # In production, use semantic search (embedding similarity) instead of keywords
        # For now, simple keyword matching. Tokenize on alphanumerics so "Apple's"
        # yields "apple", and compare whole words so short words such as "a" or
        # "and" cannot match inside unrelated tags.
        keywords = set(re.findall(r"[a-z0-9]+", task_description.lower()))

        matching_tags = []
        for mem in self.memories:
            for tag in mem["tags"]:
                tag_words = set(re.findall(r"[a-z0-9]+", tag.lower()))
                if tag_words & keywords and tag not in matching_tags:
                    matching_tags.append(tag)

        if not matching_tags:
            return "No prior context available."

        return self.search(matching_tags, limit=5)

# Usage
analyst_memory = SemanticMemory("analyst")

# Save findings
analyst_memory.add(
    content="Apple's current P/E ratio is 28.5, revenue growth YoY is 12%",
    tags=["Apple", "valuation", "financial"],
    category="company_analysis"
)

analyst_memory.add(
    content="Microsoft dominates enterprise cloud, Azure revenue up 28% YoY",
    tags=["Microsoft", "cloud", "enterprise"],
    category="company_analysis"
)

# Retrieve contextually
task_description = "Analyze Apple's financial health and growth prospects"
context = analyst_memory.get_context_for_task(task_description)
print(context)
# Output: Relevant prior findings with Apple-tagged memories
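
The comment in get_context_for_task points at embedding similarity as the production approach. The sketch below shows only the ranking step, using word-count vectors as a stand-in for real embeddings so it runs without any external service. The helpers bow_vector, cosine, and rank_by_similarity are hypothetical; in practice the vectors would come from an embedding model.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def bow_vector(text: str) -> Counter:
    """Bag-of-words count vector: a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query: str, memories: List[Dict], limit: int = 3) -> List[Dict]:
    """Return up to `limit` memories with non-zero similarity, best match first."""
    q = bow_vector(query)
    scored = [(cosine(q, bow_vector(m["content"])), m) for m in memories]
    scored = [(s, m) for s, m in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:limit]]
```

Unlike tag matching, this ranks on the memory text itself, so it still works when nobody tagged an entry well. Swapping bow_vector for a real embedding call would leave the ranking logic unchanged.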

Pattern 3: Crew-Level Memory for Shared Context

Sometimes the entire crew needs shared memory—findings that benefit multiple agents.

import json
import os
from typing import Dict, Any

class CrewMemory:
    """Shared memory accessible to all agents in a crew."""
    
    def __init__(self, crew_name: str, storage_dir: str = "./crew_memory"):
        self.crew_name = crew_name
        self.storage_dir = storage_dir
        self.file_path = os.path.join(storage_dir, f"{crew_name}_crew.json")
        os.makedirs(storage_dir, exist_ok=True)
        self.state: Dict[str, Any] = self._load()
    
    def _load(self) -> dict:
        if os.path.exists(self.file_path):
            try:
                with open(self.file_path, "r") as f:
                    return json.load(f)
            except json.JSONDecodeError:
                return {}
        return {}
    
    def save(self):
        with open(self.file_path, "w") as f:
            json.dump(self.state, f, indent=2)
    
    def set(self, key: str, value: Any):
        """Store a value in crew memory."""
        self.state[key] = value
        self.save()
    
    def get(self, key: str, default: Any = None) -> Any:
        """Retrieve a value."""
        return self.state.get(key, default)
    
    def summary(self) -> str:
        """Format memory as text for agent context."""
        if not self.state:
            return "No shared findings yet."
        return "\n".join([
            f"• {k}: {v}"
            for k, v in self.state.items()
        ])

# Usage in a crew
crew_memory = CrewMemory("market_analysis")

# After crew runs, agents write findings
crew_memory.set("market_trend", "Tech sector is consolidating")
crew_memory.set("key_players", ["AAPL", "MSFT", "GOOG"])
crew_memory.set("next_focus", "AI-driven startups")

# Inject into next crew's agent backstories
next_researcher_backstory = f"""You are a market analyst.

Previous crew findings:
{crew_memory.summary()}

Build on these findings in your current analysis."""
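
Because every set call rewrites the whole JSON file, a crash mid-write can leave a truncated file, which _load then silently treats as empty. A common mitigation is to write to a temporary file and swap it into place. A minimal sketch, assuming a hypothetical helper atomic_write_json that save could call instead of writing directly:

```python
import json
import os
import tempfile
from typing import Any


def atomic_write_json(path: str, data: Any) -> None:
    """Write JSON to a temp file in the target's directory, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        # The rename is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This protects against partial writes only. Two processes updating the same crew memory can still overwrite each other's changes; that needs a lock or a real database.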

State Isolation and Multi-User Safety

If you’re running crews in a service (API, web app), isolate state per user or session. Shared memory becomes a nightmare.

class UserMemory:
    """Isolate memories per user."""
    
    def __init__(self, user_id: str, storage_dir: str = "./user_memories"):
        self.user_id = user_id
        self.storage_dir = storage_dir
        self.user_dir = os.path.join(storage_dir, user_id)
        os.makedirs(self.user_dir, exist_ok=True)
        self.memory = SemanticMemory(f"{user_id}_agent", self.user_dir)
    
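    @staticmethod
    def cleanup_old_users(storage_dir: str, days_inactive: int = 30):
        """Garbage collect old user data."""
        import shutil
        from datetime import timedelta

        if not os.path.exists(storage_dir):
            return

        cutoff = datetime.now() - timedelta(days=days_inactive)
        for user_dir in os.listdir(storage_dir):
            user_path = os.path.join(storage_dir, user_dir)
            if not os.path.isdir(user_path):
                continue  # Skip stray files; rmtree only works on directories
            # Rewriting a file does not update its directory's mtime, so take the
            # newest mtime among the directory and the files inside it.
            paths = [user_path] + [os.path.join(user_path, f) for f in os.listdir(user_path)]
            mtime = max(os.path.getmtime(p) for p in paths)
            if datetime.fromtimestamp(mtime) < cutoff:
                shutil.rmtree(user_path)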

# Service-level usage
def run_analysis_for_user(user_id: str, company: str) -> str:
    user_mem = UserMemory(user_id)
    
    # Crew sees user's prior research
    analyst = Agent(
        role="Analyst",
        goal="Analyze companies",
        backstory=f"Prior research:\n{user_mem.memory.get_context_for_task(company)}",
        llm="openai/gpt-4o"
    )
    
    # ... build the crew from `analyst` and its tasks (elided)
    result = crew.kickoff()
    user_mem.memory.add(f"Analyzed {company}", tags=[company])

    return str(result)

# Clean up inactive user data (schedule this to run weekly)
UserMemory.cleanup_old_users("./user_memories", days_inactive=90)
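
UserMemory joins user_id straight into a filesystem path, so an ID such as ../other_user could escape the storage directory. If IDs come from request data, one option is to map them to a safe directory name first. A sketch, with safe_user_dir as a hypothetical helper; it hashes every ID, which costs readable directory names but rules out path separators entirely:

```python
import hashlib
import os


def safe_user_dir(storage_dir: str, user_id: str) -> str:
    """Map an arbitrary user ID to a directory that stays inside storage_dir."""
    if not user_id:
        raise ValueError("user_id must be non-empty")
    # A hex digest contains only [0-9a-f], so it cannot hold '/' or '..'.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return os.path.join(storage_dir, digest[:32])
```

UserMemory could then compute self.user_dir with this helper instead of joining the raw ID.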

Common Pitfalls

1. Unbounded memory growth. Memories accumulate forever. Memory file becomes 50MB, loading takes seconds, token costs explode. Set a retention policy: keep last 100 entries, purge entries older than 6 months, or use a fixed-size ring buffer.

def prune_old_memories(self, max_entries: int = 100, days_old: int = 180):
    """Keep only recent memories."""
    from datetime import timedelta
    cutoff = datetime.now() - timedelta(days=days_old)
    # Drop entries older than the cutoff, then cap what remains at max_entries
    recent = [m for m in self.memories if datetime.fromisoformat(m["timestamp"]) > cutoff]
    self.memories = recent[-max_entries:]
    self.save()
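
The fixed-size ring buffer option maps directly onto collections.deque with maxlen, which discards the oldest entry automatically once the buffer is full. A minimal sketch, with RingMemory as a hypothetical in-memory class (persistence omitted for brevity):

```python
from collections import deque
from datetime import datetime
from typing import Deque, Dict, List


class RingMemory:
    """Fixed-size memory: appending past capacity evicts the oldest entry."""

    def __init__(self, max_entries: int = 100):
        self._entries: Deque[Dict[str, str]] = deque(maxlen=max_entries)

    def add(self, content: str) -> None:
        self._entries.append({
            "timestamp": datetime.now().isoformat(),
            "content": content,
        })

    def recent(self, limit: int = 10) -> List[str]:
        """Return the contents of the newest `limit` entries, oldest first."""
        return [e["content"] for e in list(self._entries)[-limit:]]

    def __len__(self) -> int:
        return len(self._entries)
```

With this shape no separate pruning pass is needed for the entry cap; an age-based cutoff would still have to be applied separately.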

2. Embedding memories directly in backstories. Works for small volumes, fails at scale. Use retrieval instead: only inject relevant memories, not all of them.

3. No deduplication. Agents save the same finding multiple times. Query memory before saving.

def add_if_new(self, content: str, tags: List[str]):
    """Only save if not a duplicate."""
    for mem in self.memories:
        if mem["content"][:50] == content[:50]:  # Simple duplicate check
            return  # Already have this
    self.add(content, tags)
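
Comparing the first 50 characters misses duplicates that differ only in case or spacing, and it treats different findings that share an opening as the same entry. One alternative is to compare a hash of the normalized full text. A sketch, with content_key and DedupIndex as hypothetical names:

```python
import hashlib
import re
from typing import Set


def content_key(content: str) -> str:
    """Hash of the content after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", content.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DedupIndex:
    """Tracks which contents have been seen, keyed by normalized hash."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def is_new(self, content: str) -> bool:
        """Return True the first time a piece of content is seen, False afterwards."""
        key = content_key(content)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

This only catches matches that are identical after normalization. Paraphrased duplicates would need similarity matching instead.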

4. Mixing agent and crew memory. Keep them separate. Agent memory is personal (what this agent learned). Crew memory is shared (what the team discovered).

5. Storing sensitive data. If agents can access tools that fetch customer data, sanitize memories before storing. Don’t persist PII or secrets.

def sanitize(content: str) -> str:
    """Remove PII before storing."""
    import re
    # Remove emails
    content = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[email]', content)
    # Remove phone numbers
    content = re.sub(r'\b\d{3}-\d{3}-\d{4}\b', '[phone]', content)
    return content

agent_memory.add(sanitize(findings), tags=["customer_analysis"])

What’s Next

Memory makes agents persistent and contextual. But persistence creates new problems: how do you debug what an agent remembered? How do you know if it’s using stale information?

The next post covers debugging multi-agent workflows—tracing decisions, inspecting memory reads/writes, and understanding why agents did what they did.

For now: choose the memory pattern that fits your scale. File-based works for single-user scripts. Semantic memory with pruning works for services. Add retrieval to only surface relevant memories, not all of them.


This is part 3 of the CrewAI series. Previous: Part 1: Getting Started, Part 2: Building Custom Tools