How To Set Your Chunking Strategy
By J-AI
TL;DR Table

| Documentation Type | Chunk Size (tokens) | Overlap (tokens) | Typical # Chunks per Doc | Notes |
|---|---|---|---|---|
| API Reference | 300–600 | ~50 | 1–10 | One chunk per endpoint or function; high-precision retrieval required. |
| How-To Guide / Tutorial | 800–1,200 | ~100–120 | 2–10 | Narrative flow; each chunk should capture one or more sequential steps. |
| Architecture / Concept Overview | 1,200–1,500 | ~100–150 | 1–6 | Explanatory content; preserve conceptual integrity of entire sections. |
| Onboarding / Getting Started Guide | 500–800 | ~50–80 | 1–3 | Short, concise docs; balance granularity with simplicity. |
| FAQ / Troubleshooting | 300–500 | 0–50 | 1–20 | One chunk per Q&A or issue; minimal to no overlap; aim for independent retrieval. |
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by feeding them relevant external documents (chunks) to ground their responses. Chunking – splitting long documents into smaller pieces – is crucial because LLMs have limited context windows; exceeding the limit causes truncation or failure to process the input. This post focuses on token-based fixed-size chunking using OpenAI’s text-embedding-3-large model for embeddings (truncated to 256 dimensions), applied to English-language SaaS product documentation.
Following OpenAI’s limits, we assume these constraints: each chunk ≤ 4,096 tokens, overlap between chunks ≤ 400 tokens, and a maximum of 20 chunks per document.
Within these bounds, we explore optimal chunk sizes and overlaps for different types of SaaS documentation – from API references to onboarding guides – and how documentation structure impacts RAG effectiveness.
Why Chunking Matters: In RAG, how you chunk affects retrieval accuracy. If chunks are too large, they may contain mixed topics and fewer can fit into the prompt. If too small, context may be fragmented across chunks. The goal is to find a sweet spot where each chunk is semantically coherent and self-contained, yet preserves enough context. We will first classify common SaaS documentation structures, then recommend chunking parameters (chunk size, overlap, expected number of chunks) for each, using OpenAI’s token-based approach.
OpenAI’s Token-Based Chunking Approach
OpenAI’s system for file search and retrieval provides built-in chunking with configurable parameters. By default, documents added to an OpenAI vector store are automatically tokenized and split into overlapping chunks, controlled by max_chunk_size_tokens (default 800 tokens) and chunk_overlap_tokens (default 400 tokens).
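These parameters can be overridden per file. As a minimal sketch – assuming the OpenAI Python SDK’s vector store API, whose exact path varies by SDK version (older releases expose it under client.beta.vector_stores) – attaching a file with a custom static chunking strategy looks like:

```python
from openai import OpenAI

client = OpenAI()

# Attach an uploaded file to a vector store with explicit chunking
# parameters instead of the 800/400 defaults. The "static" strategy and
# its two fields follow OpenAI's documented chunking options.
client.vector_stores.files.create(
    vector_store_id="vs_...",  # your vector store ID
    file_id="file-...",        # a previously uploaded file
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 600,  # e.g., for API reference docs
            "chunk_overlap_tokens": 50,    # ~10% of chunk size
        },
    },
)
```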
While one could push chunk sizes up to the 4,096-token limit, in practice it’s usually beneficial to use smaller chunks. Smaller chunks provide finer granularity – the vector database can retrieve a more pinpointed piece of information relevant to a query. Very large chunks may dilute the relevance of embeddings, whereas chunks that are too tiny might lack sufficient meaning on their own. A commonly recommended strategy is to pick chunk sizes such that 3–5 chunks can collectively cover an answer within the LLM’s context window. In general, overlaps of around 10–20% of the chunk size are a good starting point, balancing continuity against duplication. We will apply these principles in recommending chunk sizes for each documentation type.
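To make these principles concrete, here is a minimal fixed-size chunker sketch using tiktoken; cl100k_base is the encoding tiktoken maps to OpenAI’s text-embedding-3 family:

```python
import tiktoken

def chunk_by_tokens(text: str, chunk_size: int = 800, overlap: int = 400) -> list[str]:
    """Split text into fixed-size token windows that advance by
    (chunk_size - overlap) tokens, so consecutive chunks share
    `overlap` tokens of context."""
    assert 0 <= overlap < chunk_size
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```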
Typical SaaS Documentation Structures
SaaS product documentation often falls into several well-defined categories, each with a distinct purpose and writing style. Before diving into chunking strategy, it’s important to recognize these documentation types and their structures:
Onboarding Guides (User Manuals/Getting Started): Introductory guides aimed at helping new users or customers get started quickly with the product. These often walk through initial setup, basic concepts, and first-use scenarios, with an emphasis on reducing friction during onboarding. Such guides are typically written in a step-by-step or tutorial style, covering core features at a high level to ensure the user can achieve a “quick win” early. For instance, the Predictable Dialogs documentation includes an Introduction and Getting Started section for new users. These guides might include screenshots or code snippets (if for developers), but generally focus on basic usage rather than deep dives.
How-To Guides (Tutorials and Use-Case Guides): Detailed, task-oriented articles that teach users how to accomplish specific goals or use particular features of the product. The content is sequential and often interdependent (each step builds on the previous), meaning context continuity is important. (Example: Notion’s documentation provides both video and written tutorials for specific tasks, ensuring users can follow along with a practical example.) Predictable Dialogs’ docs also have a “How-to Guides” section, indicating common tasks or feature implementations are covered there.
API Reference Documentation: If the SaaS product exposes an API or developer SDK, the docs will include a comprehensive API reference. API documentation is usually highly structured and factual, often with repetitive structure for each endpoint (endpoint name, description, parameters, response, examples). The structure often parallels the API’s endpoints – for instance, grouping by resource (Users, Orders, etc.) with sub-pages.
Conceptual or Architecture Overviews: Many SaaS companies provide documentation pages that explain the architecture, design principles, or conceptual underpinnings of the product. Unlike tutorials, they are not step-by-step, but rather explanatory.
FAQs and Troubleshooting Guides: Many SaaS documentation portals include a FAQ section or troubleshooting manual for common issues and their resolutions. FAQs are often short-form and segmented by question, whereas troubleshooting guides might be longer articles grouped by category of issue. These documents might not be narrative, but rather a list of independent entries.
Other Documentation Types: Depending on the SaaS, there may be additional doc types (e.g. Release Notes/Changelog, Compliance Guides, Integration Guides for third-party platforms, SDK documentation, etc.), but the ones above cover the most typical structure for user-facing product docs. It’s common for documentation sites to be organized into main categories like “Get Started”, “How-To Guides”, “Reference (API)”, “Concepts/Architecture”, and “FAQ/Troubleshooting”. For instance, the Predictable Dialogs docs menu includes Introduction, Getting Started, How-to Guides, Reference, FAQs, etc., reflecting this kind of structure.
Understanding these documentation structures is important because each type has different content length, style, and user intent. Thus, the optimal chunking strategy can vary by doc type. Next, we detail recommended chunk sizes and overlaps for each category, and why those choices support effective retrieval in a RAG context.
Recommended Chunk Sizes and Overlaps by Documentation Type
Selecting chunk size and overlap is about balancing context coherence with retrieval granularity. (Note: 1000 tokens roughly equates to ~750 words of English text.)
1. API Reference Documentation
Structure & Content: API reference docs are highly structured and factual, with a repeated block per endpoint (name, description, parameters, responses, examples). Users typically search these docs for specific keywords (e.g., an error code or parameter name).
Recommended Chunk Size: 300–600 tokens per chunk, corresponding roughly to a single endpoint’s documentation or a sub-section of an endpoint. Since each endpoint’s info is usually self-contained, we want chunks fine-grained enough that a query about a specific endpoint maps to that endpoint’s chunk and not unrelated content. Smaller chunks (a few hundred tokens) are justified because precise information retrieval is the priority for API docs. In fact, research suggests that for exact fact lookup (like retrieving an API field description), chunk sizes in the range of 256–512 tokens can improve search accuracy. If an API page lists multiple endpoints, the chunking algorithm should ideally split at logical boundaries, but with fixed-size token chunking, it may cut mid-page. Keeping chunks ~500 tokens makes it more likely each chunk contains one endpoint’s info.
Overlap: 50 tokens of overlap between consecutive chunks – about 10–15% of chunk size here, enough to carry over, say, a function signature or the last sentence of the previous chunk. For API docs, large overlaps aren’t usually necessary because each section is quite independent. Around 10% overlap has been cited as a reasonable starting point, which in this case is ~50 tokens.
Expected # of Chunks: If each API reference page covers one endpoint or a small set of endpoints, it might be 1–3 chunks per page. For example, a single endpoint documentation (~300 words plus examples) might fit in one chunk. A longer reference page covering multiple methods could end up with, say, 5–10 chunks (if the page is ~2500 tokens total). It’s uncommon for a single API reference page to exceed the 20-chunk limit unless it’s an entire API reference in one page (which most providers avoid). If an API spec were extremely large, one should consider splitting it into multiple documents by section to stay under ~20 chunks each.
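To sanity-check these counts: a window of S tokens advancing by S − O tokens (O = overlap) covers a document of N tokens in roughly ⌈(N − O) / (S − O)⌉ chunks. A quick helper using the same sliding-window arithmetic as the chunker sketch above:

```python
import math

def estimated_chunks(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Approximate chunk count for a window of `chunk_size` tokens
    advancing by (chunk_size - overlap) tokens per step."""
    if doc_tokens <= chunk_size:
        return 1
    return math.ceil((doc_tokens - overlap) / (chunk_size - overlap))

# A ~2,500-token reference page at 500-token chunks with 50-token overlap:
print(estimated_chunks(2500, 500, 50))  # -> 6, within the 5-10 range above
```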
Rationale: The goal is that each chunk corresponds to a specific callable unit (endpoint or function) or a coherent piece of the reference. That way, when a developer asks something like “What’s the parameter for updating a customer’s email?”, the retrieval is likely to return the chunk containing the “Update Customer” API endpoint details specifically, rather than a giant chunk with multiple endpoints. Fine-grained chunks also reduce the chance of irrelevant text (from other endpoints) being included in the answer. Our recommendation aligns with the notion that dense technical content benefits from smaller, focused chunks. We also ensure that even if multiple chunks are returned (e.g., if a query is broad), their total tokens can comfortably fit in the LLM’s context. For instance, GPT-3.5 could handle 3–5 chunks of ~500 tokens each (~1500–2500 tokens total) within its 4096-token window, and GPT-4 could handle more.
2. How-To Guides and Tutorials
Structure & Content: How-to guides are narrative, describing procedures or workflows. They typically include several steps or sections (with headings like Step 1, Step 2, or descriptive subheaders) and may contain explanatory text, screenshots, or code snippets. The writing is sequential, often assuming the reader will follow from start to finish. Context from earlier steps is often needed to understand later steps (e.g., you set up something in step 1 that is used in step 3). These guides can range from a few hundred words to several thousand, depending on complexity.
Recommended Chunk Size: 800–1,200 tokens per chunk. This larger chunk size (compared to API refs) is recommended because we want to preserve more of the narrative context within a single chunk. Overly small chunks could break the flow mid-step or mid-explanation, harming coherence. A chunk on the order of ~1000 tokens (approximately 750 words) might encompass, for example, 1–2 major steps of a tutorial or one section of a how-to guide. This is still within the model context limits, but large enough to hold a meaningful portion of the tutorial. Broader context is beneficial for generative tasks like summarizing or explaining a multi-step process, and while retrieval here is used for Q&A, users’ questions about a tutorial may require understanding the interplay of several steps. So, we lean towards moderately large chunks. Moreover, advanced models with bigger windows can handle these easily – for instance, GPT-4 (8k) can take a few chunks of ~1000 tokens each without issue, and GPT-4 32k could take even more. In fact, using larger chunks (1,500+ tokens) can minimize the number of retrievals needed, as long as the chunk remains on a single subtopic. We’ve capped our suggestion at ~1200 tokens so that even GPT-3.5 (4k) could accept top-3 chunks (~3600 tokens total) if needed.
Overlap: 100–120 tokens overlap between chunks (around 10% of chunk size). With narrative guides, it’s important to have some overlap so that if a user’s query is about something straddling two sections (say, the end of one step and the beginning of the next), the relevant details are not isolated in separate chunks. An overlap of ~100 tokens (approximately a paragraph of text) can ensure continuity – for example, the end of “Step 2” will also appear at the start of the chunk that contains “Step 3”. This helps the retriever and LLM see the transition and maintain context. We stay in the 10% range to avoid too much redundancy. Given OpenAI’s system permits up to 400 token overlap, our suggestion is safely below that and in line with general guidance (10–20% overlap).
Expected # of Chunks: For a typical how-to article of, say, 1,500–3,000 tokens (2–4 pages of text with images in a web doc), expect 2–5 chunks. Short tutorials (one-page quick guides) might need only 1 or 2 chunks. Longer, in-depth tutorials (multi-part guides on one page) could creep up to ~8–10 chunks if, for example, the guide is ~8,000 tokens (which is quite long). If a tutorial page would exceed 20 chunks at ~1,000 tokens each (i.e., more than ~20k tokens of text), that’s a sign the documentation should be split into multi-part guides or separate pages per major section, both for human readability and RAG utility. In practice, most SaaS how-to guides wouldn’t hit that extreme on a single page; they’d break it into subsections or pages.
Rationale: The chosen chunk size aims to capture a complete thought or step grouping in one chunk. If a user asks a question about a procedure (“How do I configure X in step 3 of the guide?”), the system should retrieve the chunk covering step 3 in full, not pieces of it split across chunks. By using ~1000-token chunks, we increase the chance that each chunk corresponds to at least one entire step or a coherent section. This size is still well below the 4096-token limit, leaving room for multiple chunks to be combined if needed. Overlap ensures no step is cut off awkwardly, preserving context continuity – important because tutorials have a narrative logic. This approach echoes the principle: for tasks needing broader context (like understanding a whole procedure), larger chunks (up to a couple thousand tokens) are beneficial, whereas for pinpoint facts we use smaller chunks. We strike a balance to support both good retrieval and completeness of information in each chunk.
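Applying the chunk_by_tokens sketch from earlier with tutorial-sized parameters (the file name is hypothetical) shows how the tail of each chunk reappears at the head of the next, so a step that straddles a boundary is still retrievable in one piece:

```python
guide_text = open("sso_setup_guide.md").read()  # hypothetical how-to guide

chunks = chunk_by_tokens(guide_text, chunk_size=1000, overlap=100)
for i, chunk in enumerate(chunks[:-1]):
    # peek at each chunk's tail; its final `overlap` tokens also open chunk i + 1
    print(f"chunk {i} ends: ...{chunk[-80:]!r}")
```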
3. Architecture and Concept Overview Documents
Structure & Content: Architecture or concept docs provide high-level explanations of systems or features. They often have several thematic sections (e.g., “System Components”, “Data Flow”, “Security Architecture”, etc.). The text is expository and not necessarily linear like a tutorial; readers might jump to the section they’re interested in. These documents can vary in length: some are short blog-post-like overviews (500–1000 words), while others can be extensive technical papers (multiple thousands of words). They may include diagrams or tables but largely rely on descriptive text. Because these docs aim to convey a big-picture understanding, the contextual coherence across sections is important. Reading one section without another might give partial understanding; however, each section might also be somewhat self-contained on a subtopic.
Recommended Chunk Size: 1,200–1,500 tokens per chunk (and potentially up to ~2,000 for very large overviews). We recommend relatively larger chunks here so that each chunk can encompass an entire section or subtopic of the document, capturing the breadth of explanation. For example, if an architecture doc has a section "Component X and its Role", it is best to keep that whole section in one chunk if possible, rather than splitting it, so that all context about Component X is together for retrieval. Large language models (especially GPT-4) can handle these larger chunks easily, and for models with big windows (like GPT-4 32k), using chunks on the order of 1,500+ tokens is advisable to reduce the number of retrieval calls. Even for GPT-3.5 (4k context), two such chunks (~1,500 tokens each, ~3,000 total) can be processed simultaneously without overflow, which is often enough to cover answers for conceptual questions. The embedding model handles these sizes comfortably as well: its input limit is independent of the chat model’s context window, and OpenAI’s embedding models (text-embedding-3-large, like ada-002 before it) accept up to ~8,191 input tokens, so a 1,500-token chunk is well within bounds.
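As a sketch, embedding one such chunk with this post’s assumed configuration – text-embedding-3-large shortened to 256 dimensions via the dimensions parameter, which the text-embedding-3 family supports:

```python
from openai import OpenAI

client = OpenAI()

chunk_text = "System Components: ..."  # one ~1,200-1,500 token chunk

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=chunk_text,
    dimensions=256,  # shorten the native 3,072-dim vector to 256 dims
)
vector = resp.data[0].embedding  # list of 256 floats
```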
Overlap: 100–150 tokens overlap. This overlap (~10% of chunk) is to ensure continuity between sections, but we might not need very large overlaps if sections are fairly distinct topics. A moderate overlap will include the end of one section in the chunk for the next section, which is useful if a concept is explained at the boundary of two sections. For instance, if Section A references something that Section B continues discussing, an overlap helps the chunk covering Section B still have that reference context. We avoid going too high (no 300-400 overlap here) to minimize redundancy – concept documents often already revisit key points, and too much overlap could just duplicate paragraphs across chunks. About a paragraph’s worth of overlap (100 tokens or so) should suffice to connect the narrative between chunks.
Expected # of Chunks: Many architecture overview pages might be 1–3 chunks in total, since some are short enough to fit entirely in one chunk. If an overview is say ~2000 tokens, that could be just 2 chunks of ~1000 each (or 1 chunk of 2000 if one chose to not split further). If it’s a long form doc, e.g. 6000 tokens (around 4500 words), splitting into ~1500-token chunks yields ~4 chunks. It would be unusual for a single conceptual article to exceed 20 chunks (that would imply an extremely large doc ~ >20k tokens of text, basically a small book or extensive whitepaper). If faced with such a large doc, it might be better treated as multiple docs or chapters for RAG purposes. Generally, expect only a few chunks per concept/architecture doc.
Rationale: We treat these documents almost like chapters in a textbook – each chunk should ideally correspond to one chapter/section. The user’s query might be, for example, “How does the system ensure security in its architecture?” – ideally, the chunk covering the “Security” section of the architecture doc would be retrieved. If chunks were too small, say 500 tokens, the information might be scattered and require retrieving many pieces (risking missing some or overflowing the context with too many chunks). Larger chunks ensure the answer likely resides in one or two chunks at most. We recommend pushing toward the higher end of chunk size for these (though still within OpenAI’s 4096 token limit). This approach leverages the LLM’s ability to handle larger contexts: for broad explanatory content, larger chunks (e.g. 1500+ tokens) help preserve semantic integrity so the model sees the full explanation in one go. It’s also more cost-effective to embed a few large chunks than many small ones for a big document, although quality is the primary concern here. Overlap is kept moderate just to ensure no key sentence at a section boundary is lost; since these sections are often conceptually distinct, heavy overlap isn’t needed.
4. Onboarding Guides (Getting Started Manuals)
Structure & Content: Onboarding guides or “Getting Started” manuals are short guides intended for new users to quickly learn the basics of using the product. They usually cover installation/setup and a simple walkthrough of core features. The tone is introductory and concise. These guides are often relatively short compared to tutorials – because the idea is not to overwhelm new users but give them a quick ramp-up. They might combine text and a few images or simple examples. In some cases, this could even be a single-page checklist or a few pages covering different aspects of first use.
Recommended Chunk Size: 500–800 tokens per chunk. We choose a chunk size somewhat in between the API docs and the full how-to guides. Onboarding docs are often not very long; many can be under 1500 tokens total (a couple of chunks). We still want each chunk to be coherent and possibly cover one section of onboarding (e.g., “Account Setup” could be one chunk, “First Project” another chunk). ~600 tokens (around 450 words) might capture a small section or a couple of paragraphs of instructions. Because these guides typically emphasize brevity and clarity, keeping chunks on the smaller side ensures we aren’t mixing too many topics per chunk. It also aligns with precise Q&A – a new user might ask a pointed question like “Where do I find the API key in setup?” and we’d want the chunk that covers the setup steps to be retrieved without including unrelated fluff. That said, onboarding docs can have a slight narrative (step 1, step 2), so we don’t go as low as an API chunk; we allow a few hundred tokens so the chunk carries context of a couple of steps if needed.
Overlap: 50–80 tokens overlap. Similar to the reasoning in tutorials, a bit of overlap helps if the getting-started instructions are sequential. However, since the content is usually concise and broken into clearly delineated steps or sections, a smaller overlap (~50 tokens, maybe up to one short paragraph) is generally sufficient. For example, the end of the “Installation” section might overlap into the chunk that begins “Configuration” to remind the model what was just set up. Given the short nature of these docs, overlap mainly guards against any loss of context at boundaries. We keep it ~10% or less of chunk size here.
Expected # of Chunks: Likely 1–3 chunks for a typical onboarding guide. Many SaaS quickstart pages would fit in one chunk (if under ~500 tokens of actual text). If there are multiple sections (installation, configuration, first use), maybe two or three chunks. It would be unusual for a “getting started” tutorial to be very long – if it is, it might be broken into a small series. With our chunk size range, it’s easy to stay under 20 chunks; you’d need an extremely verbose onboarding doc for that to be an issue (which would defeat the purpose of it being a quickstart).
Rationale: New user guides emphasize clarity and brevity, so our chunking strategy ensures that each chunk is tightly focused on a small part of the process. By keeping chunks around 500–800 tokens, we increase the likelihood that a direct question (“How do I sign up?” or “What’s the next step after creating an account?”) will retrieve a small, highly relevant chunk (the one covering account creation, for instance). In retrieval, smaller chunks mean less chance of including irrelevant text that could confuse the answer. Also, the questions new users ask are often fairly targeted, so high granularity is beneficial. On the other hand, we don’t go too small (like one sentence per chunk) because some minimal context (the surrounding few sentences) helps the LLM formulate a useful answer. Our overlap choice again avoids breaking any mini-narratives. This strategy aligns with general advice: for straightforward informational tasks (finding a specific piece of info), smaller chunks of ~512 tokens work well; onboarding guides typically fall into that pattern, being more about facts and direct instructions than lengthy discourse.
5. FAQs and Troubleshooting Articles
(We include this type because it’s common in SaaS docs and illustrates a distinct chunking strategy.)
Structure & Content: FAQ pages are usually formatted as question-answer pairs, each relatively short (a few sentences to a paragraph answer). Troubleshooting guides might list issues and their solutions, or a step-by-step diagnostic procedure for common problems. The structure is often a list or a set of independent sections for each question or issue. Users typically search these by keywords related to their problem or question. They are less likely to read an entire FAQ sequentially; instead they jump to the relevant Q/A.
Recommended Chunk Size: 300–500 tokens per chunk, aligned roughly with one Q&A pair or one troubleshooting issue+solution. Ideally, we want to keep each Q&A as a single chunk if possible, so that the question and its answer are embedded together. If the FAQ answers are extremely brief (one or two sentences), one might even group a couple of Q&As in a chunk, but generally it’s safer to isolate them to maximize precision. For longer troubleshooting articles that have sections, chunk similarly to a how-to (maybe slightly larger chunks if each issue’s solution is a few paragraphs). But typically, each item in an FAQ is short. Using ~400 token chunks ensures we can capture the entire context of one question’s answer fully.
Overlap: Minimal overlap (e.g., 0 to 50 tokens). In an FAQ list, entries are independent. We don’t really need overlap between chunks since question 1 has nothing to do with question 2 in most cases. In fact, to avoid blending content, it might be best to cut cleanly at the end of each Q&A. We might include a tiny overlap just to include the question text at the top of the answer’s chunk if the splitting algorithm didn’t align perfectly, but in principle, one can set overlap = 0 for clearly separated sections. (OpenAI’s guideline of non-negative overlap allows 0 overlap.) For troubleshooting guides that are narrative (like a procedure to diagnose issues), treat overlap as in a tutorial (~10%), but for distinct Q&As, no overlap is fine.
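One way to get those clean boundaries is to split on the question headings themselves rather than at fixed token offsets. A minimal sketch, assuming the FAQ page uses a markdown heading per question (an assumption about your source format):

```python
import re

def split_faq(markdown_text: str) -> list[str]:
    """One chunk per Q&A: split immediately before each level-3 heading,
    keeping the question heading attached to its answer. No overlap."""
    parts = re.split(r"(?m)^(?=### )", markdown_text)
    return [p.strip() for p in parts if p.strip()]
```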
Expected # of Chunks: If each question is one chunk and there are, say, 10 FAQs, that’s 10 chunks. Many FAQ pages will easily stay under 20 chunks. If a knowledge-base article had dozens of questions, it might be better split into multiple pages by topic. Troubleshooting guides might be longer, but typically each issue remains a manageable section.
Rationale: The main aim here is high precision retrieval. When a user asks a question that matches a known FAQ, we want the exact Q&A to be retrieved. That is best achieved by keeping that Q&A in one chunk and not polluted by other content. Small chunks (similar reasoning as API ref) provide that granularity. We avoid overlap to not confuse the boundaries between answers. The result is that the embedding for that chunk is very specifically about the question and answer, leading to a strong match when the user’s query is similar. This strategy mirrors the idea of indexing sentence or paragraph-level chunks for Q&A tasks, which improves relevance at the cost of possibly needing to fetch multiple chunks if the answer is long – but in FAQ, answers are short anyway. It’s worth noting that OpenAI’s retrieval system itself might rank chunks by relevance, so separate chunks per question ensures irrelevant ones aren’t pulled in just because they were contiguous in a larger chunk.
Summary of Chunking Strategies by Documentation Type
The table below summarizes recommended fixed-size chunking parameters for different documentation types, along with real-world examples of each type from SaaS companies. These strategies assume OpenAI’s tokenization (using tools like tiktoken to count tokens) and adhere to the model limits (≤4,096-token chunks, ≤400-token overlaps, ≤20 chunks per doc). Actual chunk counts will vary by document length, but typical ranges are given.
| Documentation Type (Example) | Chunk Size (tokens) | Overlap (tokens) | Typical # Chunks (per document) |
|---|---|---|---|
| API Reference – e.g. Stripe or Twilio API docs (endpoint reference page) | 300–600 tokens (~250–450 words) | ~50 tokens (~10%) | 1–3 chunks for a single endpoint page; up to ~10 if the page contains multiple endpoints. |
| How-To Guide / Tutorial – e.g. “How to implement SSO” guide, Notion or AWS how-to tutorials | 800–1,200 tokens (~600–900 words) | ~100 tokens (~10%) | 2–5 chunks for a typical guide; possibly up to ~8–10 for very long tutorials. |
| Architecture/Concept Overview – e.g. system architecture explainer, conceptual whitepaper | 1,200–1,500 tokens (~900–1,100 words) | ~150 tokens (~10%) | 1–3 chunks for most overviews; ~4–6 chunks for extensive technical papers. |
| Onboarding / Getting Started – e.g. quickstart page, initial setup guide (Predictable Dialogs “Getting Started”) | 500–800 tokens (~375–600 words) | ~50–80 tokens (~10%) | 1–3 chunks (usually short docs); rarely more than 5 chunks. |
| FAQ / Troubleshooting – e.g. FAQ page with Q&A pairs, common issues (Palo Alto Networks troubleshooting guide) | 300–500 tokens per Q&A (one short question + answer) | 0–50 tokens (minimal; often none) | 1 chunk per Q&A item; e.g. ~10 chunks for 10 FAQs (independent). |
Table: Recommended fixed-size chunking parameters for various SaaS documentation types and structures. The examples illustrate typical use cases from real SaaS docs (Stripe, Notion, AWS, Predictable Dialogs, Palo Alto, etc.). Chunk sizes are given in tokens, with approximate word counts in parentheses, and overlaps in tokens. The typical number of chunks assumes a single document page; entire documentation sections will of course consist of many such chunks across pages.
Improving Documentation Structure for RAG (Optional Recommendations)
Beyond choosing chunk sizes, SaaS providers can optimize how they author and structure documentation to make it more RAG-friendly. Here are some recommendations to improve documentation for better chunking and retrieval outcomes:
Write Self-Contained Sections: Structure documentation pages so that each section (under a heading) encapsulates a single concept or task. This way, even when chunked, each chunk maps to a coherent topic. As Pinecone’s guidance suggests, if a piece of text makes sense on its own to a human, it will be meaningful to the model too. For example, avoid having one paragraph cover two unrelated topics; instead, use separate paragraphs or sections.
Limit Document Length: Avoid extremely long single pages that cover too many topics. It’s often better to break documentation into multiple pages or sub-pages by topic. This not only helps readers but also ensures that when chunking, you won’t end up with dozens of chunks for one page. Remember that by default only ~20 chunks from a document can be retrieved – if you have a 30-chunk document, parts of it might never be considered. So, if a guide is very large, consider splitting it (e.g., a multi-part tutorial series rather than one monster article).
Use Descriptive Headings and Metadata: Clear section headings (e.g., “Step 3: Configure API Keys” or “Troubleshooting: Database Connection”) can be leveraged in or alongside chunks. Even if the chunk splitter doesn’t explicitly use headings, those headings will appear in the text, improving keyword hits. You can also attach metadata to chunks (like the section title or category) in some advanced retrieval systems to filter or boost relevant chunks. Well-structured docs with obvious cues (titles, keywords) help the retriever match queries to the right chunk.
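For example, a chunk record that carries its section title and doc type as metadata might look like the following sketch (the attributes shape is illustrative; stores differ in how they accept per-chunk metadata, and all names and URLs here are hypothetical):

```python
chunks_with_titles = [
    ("Step 2: Create an Account", "...step 2 text..."),
    ("Step 3: Configure API Keys", "...step 3 text..."),
]  # hypothetical chunker output paired with its section headings

records = [
    {
        "text": text,
        "attributes": {
            "doc_type": "how-to",
            "section_title": title,
            "source_url": "https://docs.example.com/sso-setup",  # hypothetical
        },
    }
    for title, text in chunks_with_titles
]
```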
Avoid Cross-Dependencies: Try not to require the reader (or the AI) to have information from an entirely different page to understand the current one. While some cross-referencing is inevitable, each page should have enough context to stand alone for a given topic. In RAG, the user’s query triggers retrieval from possibly multiple documents, but there’s no guarantee chunks from two separate pages will both be pulled in unless the query strongly suggests both. By making pages more self-contained, you increase the likelihood that one of its chunks can fully answer a relevant question by itself. If context from elsewhere is needed, consider adding a brief recap or link (which the RAG system could follow if allowed).
Use Consistent Formatting for Repeated Structures: If your documentation has many similarly structured items (like many API endpoints or many FAQs), consistency helps the chunking algorithm and embeddings. For example, start each API endpoint description with a standard template or each FAQ answer by restating the question. This uniformity means each chunk of that type has a predictable format, which can improve how the embedding model represents them and how the search finds them. (However, ensure each chunk still contains unique descriptive keywords to differentiate them.)
Consider Summaries for Long Sections: If a documentation section is long or complex, consider providing a summary at the end or beginning. In a chunking scenario, a summary might either end up in the same chunk or its own chunk, but either way it provides a concise statement of key points that might match user queries. Summaries can be especially useful if a query is broad, as the summary chunk might be retrieved to give a high-level answer.
Test and Iterate with Real Queries: Improving docs for RAG is an ongoing process. Use a sample of user queries (from your support tickets or forum questions) to see how well the retrieval picks up answers from your documentation chunks. If certain queries fail, examine the chunks – maybe the docs need to explicitly mention some terms or the chunking needs adjustment (e.g., maybe an overlap was too small to capture context). Adjust documentation wording or structure where needed to fill those gaps. Essentially, treat RAG retrieval relevance as another quality metric for your docs.
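A rough spot-check sketch for that last recommendation – here search is a stand-in for whatever retrieval call your stack exposes, assumed to return (chunk_text, source) pairs, and the test cases are hypothetical:

```python
test_cases = [
    ("How do I rotate my API key?", "docs/api-keys.md"),
    ("Why is my webhook not firing?", "docs/troubleshooting.md"),
]  # hypothetical queries paired with the page that should answer them

def recall_at_k(search, k: int = 5) -> float:
    """Fraction of test queries whose expected source page appears in
    the top-k retrieved chunks."""
    hits = 0
    for query, expected_source in test_cases:
        results = search(query, top_k=k)  # -> list of (chunk_text, source)
        if any(source == expected_source for _, source in results):
            hits += 1
    return hits / len(test_cases)
```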
By following these practices, SaaS companies can make their documentation more “AI-ready”. Well-structured, well-chunked documentation not only benefits an AI chatbot’s ability to find correct answers but also improves human readability – a double win. The underlying theme is clarity and modularity: write docs in clear modules that can be taken in isolation or in sequence as needed. This aligns perfectly with the chunking strategies we’ve outlined, ensuring that each chunk delivers a focused piece of knowledge to both end-users and AI systems.
Conclusion
Chunking is a pivotal component of building effective RAG systems on top of documentation. Using OpenAI’s token-based chunking with appropriate chunk sizes and overlaps ensures that the text-embedding-3-large model can accurately represent each piece of documentation in the vector space, and that GPT-based assistants can retrieve and use these pieces to answer user questions. We examined various types of SaaS documentation – onboarding guides, how-to tutorials, API references, architecture overviews, and FAQs – and found that each benefits from a tailored chunking approach.
In summary, short, discrete content (like API entries or FAQs) favors smaller chunks for precise retrieval, whereas longer, narrative or conceptual content (tutorials, architecture explainers) can use larger chunks to preserve context. Overlaps of around 10% are generally effective to maintain continuity without introducing too much redundancy. Always keep in mind the limits: chunks must remain within the model’s token capacity (4096 in our focus scenario) and it’s wise not to approach that hard limit except for truly self-contained large sections, because you may want to retrieve multiple chunks concurrently.
By aligning chunking strategy with documentation structure, we ensure that each chunk is a meaningful, standalone unit of information – whether it’s a single Q&A or a full explanation of a concept. This alignment is evidenced by how real SaaS docs are organized: for example, Stripe’s API docs are naturally segmented in a way that fits chunking well, and Notion’s task-based tutorials group content in user-friendly modules. Our table of recommended strategies provides a quick reference for chunking new documents based on their type. Implementers should treat these as starting guidelines and then iterate – measure retrieval performance, adjust chunk size or overlap if needed, perhaps using the model to test different configurations (some practitioners even use automated tools to find the “sweet spot” for chunk size per corpus).
Ultimately, combining good chunking practices with well-structured documentation yields a robust knowledge base for RAG-driven search and chatbots. It means users (or AI) can ask a question and get an accurate, context-rich answer sourced from the docs, enhancing the support experience. As OpenAI’s own documentation pipeline illustrates, the interplay of chunk size, overlap, and document organization directly influences relevance. With the strategies outlined here, one can confidently prepare SaaS documentation to work harmoniously with state-of-the-art LLMs and embedding models, unlocking more reliable and efficient knowledge retrieval.