Answers Generation¶
Answers are direct, concise responses generated by SearchAI using advanced AI models. They are designed to provide users with immediate, relevant information based on their queries.
Components of an Answer¶
- Answer Text: The generated response addressing the user's question.
- Snippet Reference: A link to the source as a citation for further reading.
Answer Configuration¶
This section is used to configure the type of answers presented to users. Two answer generation techniques are supported:
- Extractive Answers: The topmost chunk retrieved in response to the user query is presented directly to the user as the answer. Extractive answers are exact content retrieved from the chunks, with no change to the text. Provide the following configuration for extractive answers.
- Response Length: The expected length of the answer, in tokens.
- Generative Answers: The top chunks retrieved in response to the user query are sent to the configured LLM, which generates a paraphrased answer from the chunk content. To use generative answers, integrate an LLM and enable Answer Generation in the Generative AI Tools configuration.
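A minimal sketch of the difference between the two techniques, assuming the chunks are already ranked by relevance; the `llm_paraphrase` callable stands in for whatever LLM is configured and is not an actual SearchAI API.

```python
# Hedged sketch: extractive answers return the top chunk verbatim;
# generative answers paraphrase the top chunks through the configured LLM.
def extractive_answer(ranked_chunks: list[str]) -> str:
    return ranked_chunks[0]                    # exact chunk text, unchanged

def generative_answer(ranked_chunks: list[str], query: str, llm_paraphrase) -> str:
    context = "\n\n".join(ranked_chunks[:5])   # top chunks sent as LLM context
    return llm_paraphrase(context, query)      # paraphrased answer from the LLM
```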
Chunk Configurations
- Token budget for Chunks: This parameter specifies the total number of tokens that can be included in the chunks sent to the LLM for processing, allowing users to fully utilize the LLM’s context-handling capabilities. The default value is 20,000, and the maximum value is 1,000,000.
The context size of an LLM refers to the maximum number of tokens the model can process in a single interaction. This includes:
- Tokens for the prompt, instructions, and context information sent to the LLM.
- Tokens for the output, i.e., the response generated by the LLM.
To determine the maximum value of this parameter, subtract the tokens used for the prompt and the output from the LLM’s maximum context size. For instance, with a 4,096-token context, if the prompt uses 500 tokens and the response uses 500 tokens, the remaining 3,096 tokens can be used for sending chunks; this is the maximum value this parameter can take. If each chunk is 500 tokens, the top 6 chunks are sent to the LLM as context. If only a limited number of chunks should be sent, say the top 3, set this field to 1,500.
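The arithmetic above can be sketched as follows; the context size, prompt, and response figures are the illustrative values from the example, not product defaults.

```python
# Illustrative token-budget arithmetic using the assumed values from the example above.
context_size = 4096      # maximum context size of the LLM, in tokens
prompt_tokens = 500      # tokens used by the prompt and instructions
response_tokens = 500    # tokens reserved for the generated answer
chunk_size = 500         # tokens per retrieved chunk

# Maximum value the "Token budget for Chunks" field can take.
max_chunk_budget = context_size - prompt_tokens - response_tokens   # 3096

# Number of whole chunks that fit within that budget.
chunks_that_fit = max_chunk_budget // chunk_size                    # 6

# To send only the top 3 chunks instead, set the field to 3 * 500.
budget_for_top_3 = 3 * chunk_size                                   # 1500
```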
- Enable Document-level Processing: Use this configuration to enable sending full documents to the LLM instead of just chunks. This is particularly useful when relevant information is distributed across multiple chunks, and sending only a few may result in incomplete or suboptimal answers. When this option is enabled, SearchAI identifies and sends complete documents associated with the most relevant chunks, up to a defined token budget, to the LLM. This ensures richer context and improved response accuracy for complex queries.
- Token budget for Documents: Specifies the maximum number of tokens that can be used when sending document content to the LLM. If the content of a document exceeds the defined limit, only the portion that fits within the token budget will be sent. This setting ensures that LLM context limits are respected while maximizing the amount of useful information provided.
- Valid Range: 1,000 to 100,000 tokens
- Default Value: 50,000 tokens
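A minimal sketch of how a document might be trimmed to this token budget before being sent to the LLM; the whitespace tokenization below is an assumption for illustration, not the product’s actual tokenizer.

```python
# Hedged sketch: trim a document to the configured token budget before sending it to the LLM.
# Splitting on whitespace is a stand-in; a real system would use the model's tokenizer.
def trim_to_budget(document_text: str, token_budget: int = 50_000) -> str:
    tokens = document_text.split()           # assumed: approximate tokens by words
    if len(tokens) <= token_budget:
        return document_text                 # the whole document fits within the budget
    return " ".join(tokens[:token_budget])   # only the portion that fits is sent
```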
- Chunk Order: This configuration sets the order of the qualified chunks sent to the LLM in the prompt. The order of data chunks can affect the context and, thereby, the results of a user query. The decision to use a specific chunk order should align with the goals of the task and the nature of the data being processed; a sketch of both orderings follows this list.
- Most to Least Relevant: Chunks are added in descending order of relevance, i.e., from the highest relevance to the lowest, followed by the query. For instance, if the top five chunks are sent to the LLM, the most relevant chunk is added first and the least relevant chunk is added at the end.
- Least to Most Relevant: Chunks are added in ascending order of relevance. The least relevant chunk is added first and the most relevant chunk is at the end, followed by the query.
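A minimal sketch of how the two orderings could be assembled into the prompt context, assuming the chunks arrive already ranked by relevance; this is illustrative, not the product’s internal logic.

```python
# Hedged sketch: arrange ranked chunks before appending the user query.
# `chunks` is assumed to be sorted most-relevant first, as returned by retrieval.
def build_context(chunks: list[str], query: str, order: str = "most_to_least") -> str:
    if order == "most_to_least":
        ordered = chunks                     # most relevant first, query at the end
    else:  # "least_to_most"
        ordered = list(reversed(chunks))     # least relevant first, most relevant last
    return "\n\n".join(ordered + [query])    # the query always follows the chunks
```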
Feedback Configurations: Enables the feedback mechanism in answers. When enabled, the web SDK automatically includes feedback options for users. The feedback shared by users is presented as part of the Answer Insights analytics.
Select Generative Model: Select the model for generating answers. If multiple models are configured, all of them are listed.
Answer Prompt: Select the prompt to be sent to the model to generate answers.
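As an illustration, an answer prompt typically interpolates the retrieved chunks and the user query; the template and placeholder names below are assumptions, not the product’s required syntax.

```python
# Illustrative answer-prompt template (placeholder names are assumptions).
ANSWER_PROMPT = (
    "Answer the user's question using only the context below, "
    "and cite the chunk you relied on.\n\n"
    "Context:\n{chunks}\n\n"
    "Question: {query}\n"
    "Answer:"
)
```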
Temperature: This parameter controls the randomness of the output. It affects how deterministic or creative the responses are, and adjusting the temperature can significantly change the generated answers. The lower the temperature, the less random the output.
Response Length: The expected length of the answer, in tokens.
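As a rough sketch, these two settings correspond to the sampling temperature and the maximum output tokens of a typical LLM call; the client and its `complete` method below are hypothetical, not a SearchAI or vendor API.

```python
# Hypothetical LLM call showing how the two settings are typically applied.
def generate_answer(llm_client, prompt: str, temperature: float = 0.2,
                    response_length: int = 300) -> str:
    return llm_client.complete(
        prompt=prompt,
        temperature=temperature,     # lower values give more deterministic answers
        max_tokens=response_length,  # caps the length of the generated answer in tokens
    )
```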