SessionSummarizer (sync) and AsyncSessionSummarizer (async) solve the context-window problem for long user sessions. Instead of letting hundreds of raw actions accumulate without bound, the summarizer condenses every N actions into a compact prose paragraph — keeping your RAG context small and retrieval meaningful.

How it works

actions 1-10  →  your LLM  →  "User explored the dashboard, filtered reports by date..."  →  on_summary
actions 11-20 →  your LLM  →  "User exported a CSV and opened billing settings..."        →  on_summary
actions 21-30 →  your LLM  →  "User invited a teammate and updated their profile..."      →  on_summary
Each group of N actions is summarised independently. The on_summary callback fires after each summarisation — wire it to your vector store, database, or RagPipeline.
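The mechanics can be sketched as a simple accumulate-and-flush loop. This is an illustration only, not the SDK's actual implementation — MiniSummarizer, its buffer layout, and the prompt string are all made up here:

```python
from collections import defaultdict

class MiniSummarizer:
    """Illustrative accumulate-and-flush loop; not the real SessionSummarizer."""

    def __init__(self, llm, threshold=10, on_summary=None):
        self.llm = llm
        self.threshold = threshold
        self.on_summary = on_summary
        self._pending = defaultdict(list)  # session_id -> accumulated actions

    def add(self, session_id, actions):
        buf = self._pending[session_id]
        buf.extend(actions)  # each action in a batch counts individually
        while len(buf) >= self.threshold:
            group = buf[: self.threshold]
            del buf[: self.threshold]
            summary = self.llm("Summarise these actions:\n" + "\n".join(group))
            if self.on_summary:
                self.on_summary(session_id, summary)
```

Note that a batch can push the buffer past the threshold; the loop flushes complete groups and carries the remainder forward to the next group.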

Sync usage

import openai
from autoplay_sdk import ConnectorClient
from autoplay_sdk.summarizer import SessionSummarizer

openai_client = openai.OpenAI()

def my_llm(prompt: str) -> str:
    return openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

def store_summary(session_id: str, summary: str) -> None:
    print(f"[{session_id}] {summary}")
    # save to DB, vector store, etc.

summarizer = SessionSummarizer(
    llm=my_llm,
    threshold=10,            # fire after 10 accumulated actions
    on_summary=store_summary,
)

ConnectorClient(url=URL, token=TOKEN) \
    .on_actions(summarizer.add) \
    .run()

Async usage

import openai
from autoplay_sdk import AsyncConnectorClient
from autoplay_sdk.summarizer import AsyncSessionSummarizer

async_openai = openai.AsyncOpenAI()

async def my_llm(prompt: str) -> str:
    r = await async_openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return r.choices[0].message.content

async def store_summary(session_id: str, summary: str) -> None:
    await db.upsert(session_id, summary)  # db: whatever async store you use

summarizer = AsyncSessionSummarizer(
    llm=my_llm,
    threshold=10,
    on_summary=store_summary,
)

async with AsyncConnectorClient(url=URL, token=TOKEN) as client:
    client.on_actions(summarizer.add)
    await client.run()

Custom prompt

By default the summarizer uses a built-in prompt optimised for RAG pipelines. You can override it with your own — just include {actions} as the placeholder:
MY_PROMPT = """
You are a support assistant. Summarise what this user was trying to do
so a support agent can quickly understand their context.

Actions:
{actions}

Summary:
"""

summarizer = SessionSummarizer(
    llm=my_llm,
    threshold=10,
    prompt=MY_PROMPT,
    on_summary=store_summary,
)
Default prompt:
You are summarising a user's in-app session for a RAG pipeline.
Your summary will be embedded and stored in a vector database to provide
context for a chatbot or AI assistant.

Write a concise 2-3 sentence summary of what the user did, focusing on:
- Which features or pages they visited
- What actions they performed
- Any clear intent or goal you can infer

Actions:
{actions}

Summary:

Get partial context

Read what’s been accumulated for a session before the threshold is reached:
# Returns a formatted action list string (not yet summarised)
context = summarizer.get_context("ps_abc123")
print(context)
# "1. Viewed Dashboard — https://app.example.com/dashboard
#  2. Clicked Export CSV button — ..."
Useful for including real-time context in a chatbot response before the full summary fires.
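For a rough sense of how that numbered output might be produced, here is a stand-in formatter. format_partial and the label/url action fields are assumptions for illustration, not the SDK's internals:

```python
def format_partial(actions):
    # Hypothetical: turn accumulated action dicts into the numbered
    # list string that get_context returns before a summary fires.
    return "\n".join(
        f"{i}. {a['label']} — {a['url']}" for i, a in enumerate(actions, start=1)
    )
```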

Composing with RagPipeline

The most common pattern — attach the summarizer to a RagPipeline so summaries are automatically embedded and upserted:
from autoplay_sdk.rag import RagPipeline
from autoplay_sdk.summarizer import SessionSummarizer

summarizer = SessionSummarizer(llm=my_llm, threshold=10)

pipeline = RagPipeline(
    embed=embed_fn,
    upsert=upsert_fn,
    summarizer=summarizer,
)

ConnectorClient(url=URL, token=TOKEN) \
    .on_actions(pipeline.on_actions) \
    .on_summary(pipeline.on_summary) \
    .run()
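Under the hood, the composition amounts to: summary in, vector out, upsert. A self-contained sketch — MiniPipeline and the embed/upsert stub signatures are illustrative, not the real RagPipeline API:

```python
class MiniPipeline:
    """Sketch of the embed-then-upsert step a pipeline performs per summary."""

    def __init__(self, embed, upsert):
        self.embed = embed    # text -> vector
        self.upsert = upsert  # (id, vector, text) -> None

    def on_summary(self, session_id, summary):
        vector = self.embed(summary)
        self.upsert(session_id, vector, summary)

stored = {}
pipeline = MiniPipeline(
    embed=lambda text: [float(len(text))],  # stub embedding
    upsert=lambda sid, vec, text: stored.update({sid: (vec, text)}),
)
pipeline.on_summary("ps_abc123", "User exported a CSV.")
```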

Constructor

SessionSummarizer(llm, threshold=10, prompt=None, on_summary=None)

llm (Callable[[str], str], required)
    Any synchronous LLM callable. Receives the formatted prompt and must return the summary as a plain string.

threshold (int, default 10)
    Number of individual actions (not batches) to accumulate before triggering summarisation. A batch of 3 actions counts as 3 toward the threshold.

prompt (str | None, default None)
    Custom prompt template. Must contain {actions} as a placeholder. Uses the built-in default if not provided.

on_summary (Callable[[str, str], None] | None, default None)
    Called after each summarisation with (session_id, summary_text). Can also be set after construction by assigning to summarizer.on_summary.

AsyncSessionSummarizer(llm, threshold=10, prompt=None, on_summary=None)

Same parameters, except that llm must be an async callable ((prompt: str) -> Awaitable[str]) and on_summary may be either sync or async.
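One plausible way to accept both kinds of on_summary callback is to call it, then await the result only when it is awaitable. A sketch under that assumption, not the SDK's code:

```python
import asyncio
import inspect

async def fire(on_summary, session_id, summary):
    # Call the callback; await only if it returned an awaitable,
    # so plain sync functions work unchanged.
    result = on_summary(session_id, summary)
    if inspect.isawaitable(result):
        await result

seen = []

async def async_cb(sid, text):
    seen.append(("async", sid))

def sync_cb(sid, text):
    seen.append(("sync", sid))

async def main():
    await fire(async_cb, "ps_1", "summary text")
    await fire(sync_cb, "ps_1", "summary text")

asyncio.run(main())
```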

API reference

Method / Property           Description
.add(payload)               Receive an actions batch — wire to on_actions
.get_context(session_id)    Return accumulated (not-yet-summarised) actions as text
.reset(session_id)          Clear a session's history without summarising
.active_sessions            List of session IDs with pending actions
.on_summary                 Assignable callback (session_id, summary) -> None