Building a Knowledge Mining Engine for customer support

One of the most interesting systems I've built at Kim.cc is what we call the Knowledge Mining Engine — a pipeline that turns a company's messy support history into structured, reusable knowledge.

The problem

Every e-commerce brand has thousands of resolved tickets. Buried in them is the "tribal knowledge" of how the team actually handles refunds, exchanges, lost parcels, and edge cases. New AI agents start from zero. We wanted them to start from everything the team already knows.

The shape of the pipeline

At a high level:

1. Ingest tickets from helpdesks (Gorgias, Zoho Desk, …)
2. Cluster them by intent
3. Summarise each cluster into a draft SOP
4. Review and publish, then feed SOPs to the agent runtime

mine.py

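from dataclasses import dataclass


@dataclass
class Ticket:
    id: str
    subject: str
    messages: list[str]
    resolution: str


def mine_sops(tickets: list[Ticket]) -> list["SOP"]:
    """Turn resolved tickets into one draft SOP per intent cluster."""
    # SOP, cluster_by_intent and summarise_resolution live elsewhere in the pipeline.
    clusters = cluster_by_intent(tickets)  # embeddings + HDBSCAN
    sops = []
    for intent, group in clusters.items():
        # One summarisation call per cluster, scoped to that intent's tickets.
        draft = summarise_resolution(intent, group)
        sops.append(SOP(intent=intent, steps=draft.steps))
    return sops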
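
To make the clustering step a little more concrete, here is a minimal sketch of the grouping logic only. It is not the actual cluster_by_intent: the embedding model and the HDBSCAN fit are passed in as callables (embed and cluster are hypothetical stand-ins), tickets are plain strings, and the "cluster-N" keys are placeholders for real intent names. The one thing it relies on from HDBSCAN is that noise points are labelled -1.

```python
from collections import defaultdict
from typing import Callable, Sequence

NOISE = -1  # HDBSCAN's label for points that belong to no cluster


def group_by_label(
    tickets: Sequence[str],
    embed: Callable[[Sequence[str]], list[list[float]]],
    cluster: Callable[[list[list[float]]], list[int]],
) -> dict[str, list[str]]:
    """Group ticket texts by cluster label, dropping noise points.

    `embed` and `cluster` are injected stand-ins (assumptions) for an
    embedding model and an HDBSCAN fit; neither is shown in the post.
    """
    vectors = embed(tickets)
    labels = cluster(vectors)
    groups: dict[str, list[str]] = defaultdict(list)
    for ticket, label in zip(tickets, labels):
        if label == NOISE:
            continue  # unclustered tickets get no draft SOP
        groups[f"cluster-{label}"].append(ticket)
    return dict(groups)


# Toy stand-ins so the sketch runs without any ML dependencies.
def toy_embed(texts):
    return [[float("refund" in t.lower()), float("parcel" in t.lower())] for t in texts]


def toy_cluster(vectors):
    # 0 = refund-like, 1 = parcel-like, -1 = neither (noise)
    return [0 if v[0] else 1 if v[1] else NOISE for v in vectors]


tickets = ["Refund for damaged mug", "Where is my parcel?", "Refund please", "Hello"]
groups = group_by_label(tickets, toy_embed, toy_cluster)
```

Dropping noise rather than forcing every ticket into a cluster is a design choice in this sketch: a one-off ticket probably should not become an SOP.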

Keeping the LLM honest

RAG is only as good as its retrieval. We attach citations to every generated step so a human reviewer can trace a claim back to the tickets it came from:

type SopStep = {
  text: string;
  citations: Array<{ ticketId: string; quote: string }>;
};
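
Citations like these can also be checked mechanically before a human sees them. The sketch below is an assumption about how such a check might look, not something described above: it mirrors SopStep as Python dataclasses and flags any citation whose quote does not appear in the cited ticket's text. The function name unsupported_citations and the ticket_text lookup are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    ticket_id: str
    quote: str


@dataclass
class SopStep:
    text: str
    citations: list[Citation]


def normalise(s: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(s.lower().split())


def unsupported_citations(step: SopStep, ticket_text: dict[str, str]) -> list[Citation]:
    """Return citations whose quote is empty or not found in the cited ticket."""
    bad = []
    for c in step.citations:
        quote = normalise(c.quote)
        source = normalise(ticket_text.get(c.ticket_id, ""))  # unknown ticket -> ""
        if not quote or quote not in source:
            bad.append(c)
    return bad


ticket_text = {"T-1": "We refunded the order in full after the customer sent a photo."}
step = SopStep(
    text="Ask for a photo, then refund in full.",
    citations=[
        Citation("T-1", "refunded the order in full"),  # supported
        Citation("T-1", "offered store credit"),        # not in the ticket
        Citation("T-9", "sent a photo"),                # unknown ticket id
    ],
)
flagged = unsupported_citations(step, ticket_text)
```

A substring match only shows that the quote exists in the ticket. Whether it actually supports the step is still the reviewer's call.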

Everything is observable through Langfuse, so we can see exactly which prompt, context, and model produced a given SOP — and catch regressions early.
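
As a rough illustration of why recording prompt, context, and model together helps with regressions, here is a standard-library sketch. It is not the Langfuse SDK, and every name in it (GenerationRecord, config_fingerprint, the model and template strings) is hypothetical. The idea is to fingerprint the parts that define how an SOP was generated, so a prompt or model change shows up as a new fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationRecord:
    intent: str
    model: str
    prompt_template: str
    context_ticket_ids: tuple[str, ...]


def config_fingerprint(record: GenerationRecord) -> str:
    """Hash only the model and prompt template, not the per-SOP context."""
    payload = json.dumps(
        {"model": record.model, "prompt_template": record.prompt_template},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Same model and template, different intents and context tickets.
a = GenerationRecord("refund", "model-a", "Summarise: {tickets}", ("T-1", "T-2"))
b = GenerationRecord("lost-parcel", "model-a", "Summarise: {tickets}", ("T-7",))
# Same intent and context as `a`, but the prompt template changed.
c = GenerationRecord("refund", "model-a", "Summarise briefly: {tickets}", ("T-1", "T-2"))

same_config = config_fingerprint(a) == config_fingerprint(b)
prompt_changed = config_fingerprint(a) != config_fingerprint(c)
```

If SOP quality drops, grouping records by fingerprint makes it easier to tell whether the cause was a prompt or model change or the tickets that were fed in.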

What I'd carry forward

Citations build trust. Reviewers approve far faster when they can verify.
Clustering before summarising keeps prompts focused and cheap.
Observability isn't optional once LLMs are in the critical path.
