Multilingual Support in SearchAI

SearchAI offers multilingual capabilities to enhance accessibility and deliver a seamless experience for users interacting in different languages. This feature enables users to engage with the platform in their preferred language, resulting in more intuitive and personalized interactions.

Core Capabilities

SearchAI's multilingual support enables you to:

Add and manage content in multiple languages.
Submit queries and receive responses in supported languages.
Get search results and answers in the same language as the query.

Key Highlights:

More than 100 languages supported for indexing, querying, and answer generation.
Works with any language supported by your chosen LLM and vector generation model, using the Text Extraction strategy and Vector Retrieval method.
No additional configuration required.

Commonly Supported Languages

SearchAI supports the languages commonly handled by advanced LLMs and embedding models like BGE-M3. The following are among the most widely supported and globally significant languages in terms of usage and application support:

English	Spanish	Arabic	French	Chinese
Hindi	Portuguese	Russian	Bengali	Urdu
Japanese	German	Turkish	Korean	Italian
Vietnamese	Persian	Swahili	Thai	Malay
Afrikaans	Albanian	Amharic	Armenian	Assamese
Asturian	Avaric	Azerbaijani	Bashkir	Basque
Bavarian	Belarusian	Bihari	Bishnupriya	Bosnian
Breton	Bulgarian	Burmese	Cantonese	Catalan
Central Bikol	Central Kurdish	Chavacano	Chechen	Cebuano
Chuvash	Cornish	Corsican	Croatian	Danish
Dhivehi	Doteli	Dutch	Egyptian Arabic	Emilian-Romagnol
Erzya	Esperanto	Estonian	Fiji Hindi	Finnish
Galician	Georgian	Goan Konkani	Greek	Gujarati
Haitian Creole	Hebrew	Hill Mari	Hungarian	Ido
Icelandic	Ilocano	Indonesian	Interlingua	Interlingue
Irish	Javanese	Kannada	Karachay-Balkar	Kazakh
Khmer	Komi	Kurdish	Kyrgyz	Lao
Latin	Latvian	Lezghian	Limburgish	Lithuanian
Lojban	Lombard	Low German	Lower Sorbian	Luxembourgish
Macedonian	Maithili	Malagasy	Malayalam	Maltese
Manx	Marathi	Mazanderani	Meadow Mari	Mingrelian

Refer to the official documentation of your LLM or vector generation model for a comprehensive list of supported languages.

Language-Specific Configuration

Core multilingual support is broad, but some SearchAI modules are language-sensitive and need different strategies or models depending on the language. The sections below give support details for the most widely used languages across these key components:

Extraction Strategies
Vector Configuration Models
Retrieval Strategies
Answer Generation Models

Use this guidance to ensure your multilingual setup is aligned with the most effective techniques for each language or model.

Language-Specific Extraction Capabilities

The following table outlines the supported content extraction methods for widely used languages, enabling you to select the most effective approach for processing multilingual content.

Extraction Strategy	Language Support
Text Extraction	All languages listed above
Layout Aware Extraction	English, Ukrainian
Image based Document Extraction	English, Spanish, Italian, German, French
Advanced HTML Extraction	English, Ukrainian, German
Markdown Extraction	English, Ukrainian, Spanish, Russian, German, Hungarian, Chinese
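
As a rough illustration of how the table above could be consulted programmatically, the sketch below encodes it as a lookup and lists the extraction strategies available for a given language. The names `EXTRACTION_SUPPORT` and `strategies_for` are hypothetical and are not part of any SearchAI API; the sketch also assumes the language passed in is one of the supported languages listed earlier.

```python
# Illustrative lookup only; these names are assumptions, not a SearchAI API.
# The language sets mirror the table above. Text Extraction is handled
# separately because it covers every supported language.
EXTRACTION_SUPPORT = {
    "Layout Aware Extraction": {"English", "Ukrainian"},
    "Image based Document Extraction": {
        "English", "Spanish", "Italian", "German", "French",
    },
    "Advanced HTML Extraction": {"English", "Ukrainian", "German"},
    "Markdown Extraction": {
        "English", "Ukrainian", "Spanish", "Russian",
        "German", "Hungarian", "Chinese",
    },
}


def strategies_for(language: str) -> list[str]:
    """Return the extraction strategies the table lists for a language."""
    # Text Extraction is always included (assumes a supported language).
    matches = ["Text Extraction"]
    for strategy, languages in EXTRACTION_SUPPORT.items():
        if language in languages:
            matches.append(strategy)
    return matches


print(strategies_for("German"))
print(strategies_for("Hindi"))
```

For a language that appears only in the Text Extraction row, such as Hindi, the function returns just that one strategy.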

Language-Specific Vector Generation Support

Vector generation model support varies by language. Use the following models for optimal performance:

English: MPNet, E5, BGE-M3, LaBSE
Non-English Languages: BGE-M3 and LaBSE

Note: BGE-M3 supports a wide range of languages. Its training data includes many commonly spoken languages; however, performance may be lower for low-resource or underrepresented languages.
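
The model guidance above can be expressed as a small selection rule. The following is a minimal sketch, assuming the content languages are known in advance; `candidate_models` is a hypothetical helper, not a SearchAI function, and it treats any non-English language in the mix as requiring a multilingual model.

```python
# Hypothetical helper for illustration; model names come from the list above.
ENGLISH_MODELS = ["MPNet", "E5", "BGE-M3", "LaBSE"]
MULTILINGUAL_MODELS = ["BGE-M3", "LaBSE"]


def candidate_models(languages: list[str]) -> list[str]:
    """Return the vector generation models suited to all given languages."""
    if not languages:
        raise ValueError("at least one language is required")
    # English-only content can use any listed model; any other language
    # narrows the choice to the multilingual models.
    english_only = all(lang.strip().lower() == "english" for lang in languages)
    return list(ENGLISH_MODELS if english_only else MULTILINGUAL_MODELS)


print(candidate_models(["English"]))
print(candidate_models(["English", "Japanese"]))
```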

Language-Specific Retrieval Strategy Support

English: Vector Retrieval and Hybrid Retrieval
Non-English: Vector Retrieval
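
One way to guard against an unsupported combination is a simple check before a retrieval strategy is chosen. This is only a sketch: `is_supported` and the two constants are assumed names for illustration, and the rule simply restates the two lines above.

```python
# Hypothetical check; strategy names come from the list above,
# everything else is an assumption made for illustration.
SUPPORTED_RETRIEVAL = {
    "english": ("Vector Retrieval", "Hybrid Retrieval"),
}
FALLBACK_RETRIEVAL = ("Vector Retrieval",)  # applies to non-English languages


def is_supported(language: str, strategy: str) -> bool:
    """Return True if the strategy is listed for the given language."""
    key = language.strip().lower()
    return strategy in SUPPORTED_RETRIEVAL.get(key, FALLBACK_RETRIEVAL)


print(is_supported("English", "Hybrid Retrieval"))
print(is_supported("French", "Hybrid Retrieval"))
```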

Supported Answer Generation Models

Answer generation quality depends on the language capabilities of the underlying LLM. Please refer to the official list of supported languages from the LLM provider.

Recommendations

To optimize multilingual performance:

Choose the right LLM - Select models with strong support for your target languages. Refer to the official list of languages supported by the LLM.
Customize prompts - Create language-specific prompts to improve answer quality and relevance.
Test performance - Evaluate different LLMs for your specific use case in your target language.
Monitor quality - Regularly assess answer quality across languages and adjust configurations as needed.
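
The recommendation to customize prompts could be approached by keeping one template per language and falling back to a default when no template exists. The sketch below is an assumption-laden example: the `PROMPTS` table, its wording, the language codes, and `build_prompt` are all hypothetical and do not reflect how SearchAI stores or applies prompts.

```python
# Hypothetical per-language prompt templates; the wording is illustrative.
PROMPTS = {
    "en": (
        "Answer the question using only the context below.\n\n"
        "Context:\n{context}\n\nQuestion: {question}"
    ),
    "es": (
        "Responde a la pregunta usando solo el contexto siguiente.\n\n"
        "Contexto:\n{context}\n\nPregunta: {question}"
    ),
}
DEFAULT_LANGUAGE = "en"  # assumed fallback when no template exists


def build_prompt(language_code: str, context: str, question: str) -> str:
    """Fill the template for a language, falling back to the default."""
    template = PROMPTS.get(language_code, PROMPTS[DEFAULT_LANGUAGE])
    return template.format(context=context, question=question)


print(build_prompt("es", "SearchAI admite más de 100 idiomas.", "¿Cuántos idiomas?"))
```

Falling back to a default template keeps queries in languages without a dedicated prompt working, while making it easy to add templates as answer quality is evaluated per language.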
