
Ingest Data API

This API allows you to ingest and index data into the SearchAI application. You can directly ingest structured data as chunk fields using the API or ingest an uploaded document.

Ingesting Documents

  • To ingest content from a file, use the Upload File API to upload your file to the application.
  • After uploading, include the fileId from the Upload File API response in the Ingest API to process the file content.
  • Currently, only PDF, docx, ppt, and txt files are supported. If any other file type is sent for ingestion, the API returns an error.
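
Because unsupported file types are rejected at ingestion, it can help to check the file extension before uploading. The following is a minimal client-side sketch only: the name is_supported_file and the mapping of the listed types to these exact extensions are assumptions made for illustration, and the API may apply its own checks.

```python
from pathlib import PurePath

# File types the documentation lists as supported. Mapping them to these
# exact extensions is an assumption of this sketch.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".ppt", ".txt"}


def is_supported_file(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    # Compare case-insensitively so "Report.PDF" is treated like "report.pdf".
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A caller could skip the upload step entirely when this returns False, rather than waiting for the ingestion error.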

Ingesting Structured Data

  • To ingest structured data, add the content to the body of the request object in the API. Ensure that the data corresponds to the Chunk Fields listed in the table below.
  • File Structure: The JSON file must adhere to a specific structure for SearchAI to interpret the data correctly:
    • The file name is used as the recordTitle.
    • The JSON file should consist of an array of objects, where each object represents a chunk of data.
    • The fields in each chunk must correspond to the chunk fields listed in the table below.

Supported Chunk Fields

| Field name | Description | Mandatory |
|---|---|---|
| chunkText | The content used to render the final answer to the user for extractive answers; for generative answers, it is sent to the LLM. | Yes |
| recordUrl | The URL used to generate user references. References explain where the content was originally sourced from. | Yes |
| sourceACL | The list of user identities that have access to the information stored in this chunk. | No |
| sourceUrl | The URL of the primary source. For example, for content from the Kore website, if recordUrl: www.kore.ai/products, set sourceUrl: www.kore.ai. If this is empty, it is set to the same value as the recordUrl. | No |
| chunkMeta | Any metadata associated with the chunk. This can be used to further process the content in the application, generate embeddings, etc. | No |
| chunkTitle | The title used to render the final answer to the user for extractive answers; for generative answers, it is sent to the LLM. | No |
| cfa1 | Custom field of type array, available for users to use according to their requirements. | No |
| cfa2 | Custom field of type array, available for users to use according to their requirements. | No |
| cfa3 | Custom field of type array, available for users to use according to their requirements. | No |
| cfa4 | Custom field of type array, available for users to use according to their requirements. | No |
| cfa5 | Custom field of type array, available for users to use according to their requirements. | No |
| cfs1 | Custom field of type string, available for users to use according to their requirements. | No |
| cfs2 | Custom field of type string, available for users to use according to their requirements. | No |
| cfs3 | Custom field of type string, available for users to use according to their requirements. | No |
| cfs4 | Custom field of type string, available for users to use according to their requirements. | No |
| cfs5 | Custom field of type string, available for users to use according to their requirements. | No |
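
The field rules above can be checked on the client before a request is sent. The sketch below is illustrative rather than a description of the API's own validation: validate_chunk is a hypothetical helper, and treating unknown field names as problems is an assumption, since the source does not say how the API handles them.

```python
# Field names taken from the Supported Chunk Fields table.
MANDATORY_FIELDS = {"chunkText", "recordUrl"}
OPTIONAL_FIELDS = {"sourceACL", "sourceUrl", "chunkMeta", "chunkTitle"}
ARRAY_CUSTOM_FIELDS = {f"cfa{i}" for i in range(1, 6)}
STRING_CUSTOM_FIELDS = {f"cfs{i}" for i in range(1, 6)}
KNOWN_FIELDS = (
    MANDATORY_FIELDS | OPTIONAL_FIELDS | ARRAY_CUSTOM_FIELDS | STRING_CUSTOM_FIELDS
)


def validate_chunk(chunk: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Mandatory fields must be present and non-empty.
    for name in sorted(MANDATORY_FIELDS):
        if not chunk.get(name):
            problems.append(f"missing mandatory field: {name}")
    # Flag names that are not in the table (assumption: these are mistakes).
    for name in sorted(set(chunk) - KNOWN_FIELDS):
        problems.append(f"unknown field: {name}")
    # cfa1..cfa5 are documented as arrays, cfs1..cfs5 as strings.
    for name in sorted(ARRAY_CUSTOM_FIELDS & set(chunk)):
        if not isinstance(chunk[name], list):
            problems.append(f"{name} must be an array")
    for name in sorted(STRING_CUSTOM_FIELDS & set(chunk)):
        if not isinstance(chunk[name], str):
            problems.append(f"{name} must be a string")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in a chunk at once.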

API Specifications

Method POST
Endpoint https://{{host}}/api/public/bot/:botId/ingest-data
Content Type application/json
Authorization auth: {{JWT}}

See How to generate the JWT Token.

API Scope
  • Ingest data

Query Parameters

| PARAMETER | REQUIRED | DESCRIPTION |
|---|---|---|
| host | Required | The environment URL. For example, https://platform.kore.ai |
| Bot ID | Required | Unique identifier of your application. The Bot ID corresponds to the App ID of your application. To view your App ID, go to Dev Tools under App Settings; the App ID is shown under API Scopes. |
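
As a rough illustration of how the two parameters combine into the endpoint, the sketch below builds the request URL. The helper name build_ingest_url is hypothetical, and accepting the host with or without a scheme is a convenience of this sketch, not documented API behavior.

```python
from urllib.parse import quote


def build_ingest_url(host: str, bot_id: str) -> str:
    """Fill the host and Bot ID into the ingest-data endpoint path."""
    host = host.strip().rstrip("/")
    # The example host already includes "https://"; add it if it is missing.
    if "://" not in host:
        host = "https://" + host
    # Percent-encode the Bot ID so it is safe as a single path segment.
    return f"{host}/api/public/bot/{quote(bot_id, safe='')}/ingest-data"
```

For example, build_ingest_url("https://platform.kore.ai", "st-123") yields "https://platform.kore.ai/api/public/bot/st-123/ingest-data", where "st-123" is a placeholder Bot ID.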

Request Parameters

| PARAMETER | REQUIRED | DESCRIPTION |
|---|---|---|
| sourceName | Yes | Name of the source. If no source with the given name exists, a new source is created automatically. |
| sourceType | Yes | Takes one of two values. "json": upload structured data as chunk fields sent in the request object; if a file ID is also present, it is ignored. "file": upload documents by file ID; only the file ID is considered, and any chunk payload present is ignored. |
| documents | Yes | Depending on the value of sourceType, this field carries either the chunk fields in JSON format or a reference to the uploaded file. |

For ingesting chunks directly, use the following format.

{
    "sourceName": "Abc",
    "sourceType": "json",
    "documents": [
        {
            "title": "Cybersecurity",
            "chunks": [
                {
                    "chunkText": "Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. With the rise of cyber threats like ransomware and data breaches, cybersecurity has become a critical concern for businesses and governments worldwide.",
                    "recordUrl": "https://www.cybersafe.com/",
                    "chunkTitle": "The Importance of Cybersecurity",
                    "chunkMeta": {
                        "Role": "Dev"
                    }
                }
            ]
        }
    ]
}

Note that the fields inside each object of the chunks array must correspond to the chunk fields. To view the chunk fields, refer to the Chunk Browser.
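
The same request body can be assembled programmatically, which avoids hand-written JSON mistakes such as unbalanced brackets. This is a minimal sketch: build_json_payload is a hypothetical helper, and the chunk text is shortened from the example above.

```python
import json


def build_json_payload(source_name, title, chunks):
    """Assemble the request body for sourceType "json"."""
    return {
        "sourceName": source_name,
        "sourceType": "json",
        # One document holding the given chunks under a single title.
        "documents": [{"title": title, "chunks": list(chunks)}],
    }


payload = build_json_payload(
    "Abc",
    "Cybersecurity",
    [
        {
            "chunkText": "Cybersecurity is the practice of protecting "
                         "systems, networks, and programs from digital attacks.",
            "recordUrl": "https://www.cybersafe.com/",
            "chunkTitle": "The Importance of Cybersecurity",
            "chunkMeta": {"Role": "Dev"},
        }
    ],
)

# Serialize to the JSON text that goes into the request body.
body = json.dumps(payload, indent=4)
```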

For ingesting content from a file, pass the following information in this field.

{
    "sourceName": "Abc",
    "sourceType": "file",
    "documents": [
        {
            "fileId": "f12455"
        }
    ]
}

Here, fileId is the unique identifier of the uploaded file.

Use the Upload File API to upload the file to the application. That API returns the fileId in its response, which you then pass to the Ingest API to ingest and index the content of the file.
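
A minimal sketch of the second step, assuming the fileId has already been obtained from the Upload File API response. The helper name build_file_payload is hypothetical, and the upload call itself is not shown because its request format is not described here.

```python
def build_file_payload(source_name, file_id):
    """Assemble the request body for sourceType "file"."""
    if not file_id:
        # Fail early instead of sending a request that cannot succeed.
        raise ValueError("file_id must be a non-empty string")
    return {
        "sourceName": source_name,
        "sourceType": "file",
        "documents": [{"fileId": file_id}],
    }
```

With the example values above, build_file_payload("Abc", "f12455") produces the same structure as the sample payload.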

Sample Request

curl --location 'https://{{your-instance}}/api/public/bot/st-44xxxxxxxxxxxxxxxd8f39e4/ingest-data' \
--header 'auth: eyJhbGciOxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxYzkwLTMxYTAtNWJlZS1iZWI5LTJmMGZhZTg4NWMzZCJ9.lLIkckd3mQuP-glk9YVj-wXYE-8wGRlTaTHmZshaGdE' \
--header 'Content-Type: application/json' \
--data '{
    "sourceName": "JsonSource",
    "sourceType": "json",
    "documents": [
        {
            "title": "Cybersecurity",
            "chunks": [
                {
                    "chunkText": "Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. With the rise of cyber threats like ransomware and data breaches, cybersecurity has become a critical concern for businesses and governments worldwide.",
                    "recordUrl": "https://www.cybersafe.com/",
                    "chunkTitle": "The Importance of Cybersecurity",
                    "chunkMeta": {
                        "Role": "Dev"
                    }
                    }
                }
            ]
        }
    ]
}'
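
For callers who prefer Python over curl, a roughly equivalent request can be built with the standard library. This is a sketch under stated assumptions: make_ingest_request is a hypothetical helper, and the Bot ID "st-xxxx" and the token "<your-JWT>" are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request


def make_ingest_request(url, jwt, payload):
    """Build (but do not send) a POST request for the ingest-data endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Same two headers as the curl sample: the JWT and the content type.
        headers={"auth": jwt, "Content-Type": "application/json"},
        method="POST",
    )


req = make_ingest_request(
    "https://platform.kore.ai/api/public/bot/st-xxxx/ingest-data",
    "<your-JWT>",
    {
        "sourceName": "JsonSource",
        "sourceType": "json",
        "documents": [
            {
                "title": "Cybersecurity",
                "chunks": [
                    {
                        "chunkText": "Cybersecurity is the practice of "
                                     "protecting systems from digital attacks.",
                        "recordUrl": "https://www.cybersafe.com/",
                    }
                ],
            }
        ],
    },
)
```

Passing req to urllib.request.urlopen sends it; that call raises urllib.error.HTTPError for 4xx and 5xx responses. The shape of the response body is not described in this section, so the sketch does not parse it.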