Guides

Task-oriented walkthroughs for the things people actually do with tsumugi: building a collection, training a ranker, serving search, and keeping it fresh.

Each guide is built around a job rather than a flag: turning a crawl into shards, fitting a model over them, standing up a search endpoint, and bringing later crawls in without rebuilding. They assume you have worked through the quick start.

Building a collection
Turn a Parquet or JSONL crawl export into a directory of .tsumugi shards, and choose a shard size that fits your corpus.

Training a model
Fit a LambdaMART ranking model over a collection, and understand the bootstrap label that stands in until real relevance judgments exist.

Serving search
Stand up a search endpoint over a collection, query it, and understand the routing, the latency budget, and why the merged top-k is exact.

Keeping a collection fresh
Bring later crawls into a collection with add, and merge accumulated shards back down with compact, without rewriting what is already there.
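
As a preview of the exactness claim in the serving guide, here is a minimal sketch of the general argument, not tsumugi's implementation. It assumes every shard scores documents with the same model, so scores are comparable across shards, and that each shard returns its own top k. Any document in the global top k must also be in its own shard's top k, so merging the per-shard lists loses nothing. The names `Hit`, `shard_top_k`, and `merged_top_k` are hypothetical and exist only for this illustration.

```python
import heapq
from typing import Iterable

# Hypothetical hit shape for illustration: (score, document id).
Hit = tuple[float, str]


def shard_top_k(hits: Iterable[Hit], k: int) -> list[Hit]:
    """Return the k highest-scoring hits from a single shard."""
    return heapq.nlargest(k, hits)


def merged_top_k(per_shard: Iterable[list[Hit]], k: int) -> list[Hit]:
    """Merge per-shard top-k lists into a global top k."""
    return heapq.nlargest(k, (hit for shard in per_shard for hit in shard))


# Made-up scores across three shards.
shards = [
    [(0.91, "a"), (0.40, "b"), (0.35, "c")],
    [(0.88, "d"), (0.87, "e"), (0.10, "f")],
    [(0.50, "g")],
]
k = 2

# Each shard contributes only its local top k ...
merged = merged_top_k((shard_top_k(s, k) for s in shards), k)

# ... and the result matches a brute-force pass over every hit.
brute_force = heapq.nlargest(k, (hit for s in shards for hit in s))
assert merged == brute_force
```

The guarantee depends on each shard returning at least k hits (or all of its hits if it has fewer); a shard that returns fewer than k could drop a document that belongs in the global result.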