
Multilingual Support in SearchAI

SearchAI offers multilingual capabilities so that users can work with the platform in their preferred language. With this feature, users can:

  • Add and manage content in multiple languages.
  • Submit queries and receive responses in supported languages.
  • Get search results and answers in the same language as the query.

Note: SearchAI supports 100+ languages, enabling global accessibility. It can work with any language supported by the underlying LLM and vector generation model, provided you use the Text Extraction Strategy and Vector Retrieval method.

Configuration Guidance by Language

Certain modules within SearchAI are language-sensitive and may require different strategies or models depending on the language used. The sections below provide current support details for some of the most widely used languages across key components:

  • Extraction Strategies
  • Vector Configuration Models
  • Retrieval Strategies
  • Answer Generation Models

Use this guidance to ensure your multilingual setup is aligned with the most effective techniques for each language or model.

The table below outlines the supported content extraction methods for several widely used languages, helping you choose the most effective approach for processing multilingual content.

Language-Specific Extraction Capabilities

Language | Text Extraction | Layout Aware Extraction | Image Extraction | Advanced HTML Extraction | Markdown Extraction
English
Ukrainian
Japanese
Spanish
Russian
Afrikaans
Albanian
Amharic
Arabic
Armenian
Assamese
Azerbaijani
Basque
Belarusian
Bengali
Bosnian
Bulgarian
Burmese
Catalan
Cebuano
Chinese
Corsican
Croatian
Czech
Danish
Dutch
Esperanto
Estonian
Finnish
French
Frisian
Galician
Georgian
German
Greek
Gujarati
Haitian Creole
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Irish
Italian
Javanese
Kannada
Kazakh
Khmer
Korean
Kurdish
Kyrgyz
Latin
Latvian
Lithuanian
Luxembourgish
Macedonian
Malagasy
Malay
Malayalam
Maltese
Marathi
Mongolian
Nepali
Norwegian
Odia (Oriya)
Persian (Farsi)
Polish
Portuguese
Punjabi
Romanian
Scots Gaelic
Serbian
Sinhala
Slovak
Slovenian
Somali
Sundanese
Swahili
Swedish
Tagalog
Tajik
Tamil
Telugu
Thai
Tibetan
Turkish
Turkmen
Uyghur
Urdu
Uzbek
Vietnamese
Welsh
Yiddish
Yoruba

Language-Specific Vector Generation Support

Vector generation model support varies by language. Use the following models for optimal performance:

  • English: MPNet, E5, BGE-M3, LaBSE
  • Non-English Languages: BGE-M3 and LaBSE (recommended)

Note: BGE-M3 supports a wide range of languages. Its training data includes many commonly spoken languages; however, performance may be lower for low-resource or underrepresented languages.
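
The model guidance above can be captured in a small lookup, for instance when a setup script needs to decide which models to offer for a given content set. The following is a minimal sketch and not part of SearchAI: the function name `candidate_vector_models` and the use of language codes such as `en` or `ja` are assumptions made for illustration. Only the model names come from the list above.

```python
# Illustrative only: this helper is hypothetical and not a SearchAI API.
# The model names mirror the guidance in this section.
ENGLISH_MODELS = ["MPNet", "E5", "BGE-M3", "LaBSE"]
MULTILINGUAL_MODELS = ["BGE-M3", "LaBSE"]


def candidate_vector_models(language_codes):
    """Return the models listed for a content set in the given languages."""
    # Reduce tags such as "en-US" to their primary subtag ("en").
    primary = {code.strip().lower().split("-")[0] for code in language_codes}
    if not primary:
        raise ValueError("at least one language code is required")
    # Only English-only content can use the English-specific models.
    if primary == {"en"}:
        return list(ENGLISH_MODELS)
    # Any non-English language in the mix calls for a multilingual model.
    return list(MULTILINGUAL_MODELS)


print(candidate_vector_models(["en-US"]))
print(candidate_vector_models(["en", "ja"]))
```

Treating mixed English and non-English content as multilingual is a design choice in this sketch: a single index generally uses one vector model, so the model has to cover every language in the content.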

Language-Specific Retrieval Strategy Support

  • English: Vector Retrieval and Hybrid Retrieval
  • Non-English: Vector Retrieval
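
A configuration check along these lines can catch an unsupported combination before content is indexed. This is a sketch under stated assumptions: `supported_retrieval_strategies` and `validate_retrieval` are hypothetical names, and identifying languages by code is an assumed convention. The strategy names are taken from the list above.

```python
# Hypothetical helpers for illustration; not part of SearchAI.
VECTOR = "Vector Retrieval"
HYBRID = "Hybrid Retrieval"


def supported_retrieval_strategies(language_code):
    """Return the retrieval strategies listed for a language."""
    # Reduce tags such as "en-GB" to their primary subtag ("en").
    primary = language_code.strip().lower().split("-")[0]
    return [VECTOR, HYBRID] if primary == "en" else [VECTOR]


def validate_retrieval(language_code, strategy):
    """Return the strategy if it is listed for the language, else raise."""
    allowed = supported_retrieval_strategies(language_code)
    if strategy not in allowed:
        raise ValueError(
            f"{strategy!r} is not listed for language {language_code!r}; "
            f"listed strategies: {', '.join(allowed)}"
        )
    return strategy


print(supported_retrieval_strategies("en"))
print(supported_retrieval_strategies("ja"))
```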

Supported Answer Generation Models

Answer generation quality depends on the language capabilities of the underlying LLM. While many languages are technically supported, performance may vary.

Recommendation

Always refer to the LLM-supported languages list. If the LLM supports your language but response quality is inconsistent, try defining custom prompts to improve accuracy and relevance. You may also consider evaluating other LLMs that better support your target language.
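
As one illustration of a custom prompt, the sketch below builds an instruction that pins the response language and restricts the answer to the retrieved content. The function name, the prompt wording, and the way context is passed in are all assumptions. SearchAI's actual prompt format and placeholders may differ, so adapt the text to the prompt fields your configuration exposes.

```python
# Hypothetical prompt builder for illustration; not a SearchAI API.
def build_answer_prompt(question, context_chunks, language_name):
    """Assemble a prompt that asks the LLM to answer in one language."""
    if not context_chunks:
        raise ValueError("context_chunks must not be empty")
    # Number the chunks so the model can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f"Write the answer in {language_name}, even if the context is in "
        "another language.\n"
        f"If the context does not contain the answer, say so in {language_name}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


prompt = build_answer_prompt(
    "What are the opening hours?",
    ["The store is open from 9:00 to 17:00 on weekdays."],
    "Japanese",
)
print(prompt)
```

Stating the target language explicitly, rather than relying on the model to infer it from the query, is one way to reduce inconsistent responses when the source content and the query are in different languages.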