Is Gemini Becoming the Go-To AI for Indian Languages?

Credit: Wikimedia Commons

As an 'Army Brat', I got to live across various parts of India, including the north, south, and west. This resulted in a fairly neutral accent while speaking English. However, lately, I have noticed my Punjabi origins increasingly influencing the way I say certain phrases and words in English.

Just one reminder that India is likely the most complex market in the world for multi-language content.

Read this story in Hindi, translated by Gemini.

India today represents the largest untapped linguistic market in the world. Internet penetration has reached an estimated 886 million active users in 2024, with rural users (~488 million) driving growth. This makes India a natural testing ground for global tech players. Companies recognise that the next wave of digital growth is not coming from English-speaking users but from regional language speakers who are now coming online in millions every month.

With LLMs fast emerging as a major channel for search, PRmoment spoke to a range of PR professionals to find out how they are using AI to generate multilingual content, why regional content matters, which LLMs are rising to the challenge of offering quality translations, and where they fall short.

Why does regional content in Indian languages matter?

Speaking at our recently concluded webinar on Mastering Earned Media for Large Language Models (LLMs), Debanjan Chakraborty, VP and India digital advisory, Edelman, said that LLMs are currently "disproportionately trained on English and Western-centric data." Hindi, Tamil, and other regional languages are "underrepresented," leading to "uneven brand visibility."

This means brands might dominate English queries but "suddenly may disappear in answers where there are vernacular or local LLM outputs." OpenAI and Indian initiatives (like Bhashini under the National Language Translation Mission, BharatGPT, and Sarvam AI) are working to address this.

Binesh Kutty, senior director, Live! Newsroom, Burson India, says, "To their credit, many leading models now perform reasonably well in widely spoken Indian languages like Hindi, Marathi and Tamil. However, they continue to underperform in less-represented languages such as Odia, Punjabi and others."

Nakul Kundra, co-founder, Devnagri AI, says, "While global players are making an effort, their models are still largely trained on Western datasets.

The need is not just to translate text but to embed cultural context into AI models. That level of granularity is critical in India’s fragmented media landscape. In short, global players are paying attention because they see the size of the opportunity, but it will take Indian innovators to truly solve the problem in depth."

Yasin Hamidani, director, Media Care Brand Solutions, says, "In India, a single campaign often needs adaptation across 8–10 languages, each with unique tonality, cultural codes, and platform consumption habits. On YouTube or ShareChat, vernacular video thrives; on Instagram, Hinglish dominates; while Twitter/X leans towards English. Managing consistency while staying locally relevant is the biggest challenge. Additionally, ensuring accurate translations, regional influencer collaborations, and optimising content placements by language all add to the operational complexity."

Binesh explains that the implications are significant for communication professionals: "They must constantly fine-tune content placement strategies while staying alert to any false narratives and misinformation gaining visibility. This requires communications agencies to have access to solutions that can identify both supply gaps and demand patterns in content across multiple languages and formats. Today, there are emerging solutions that can analyse why a piece of content might perform well in English but fail in Hindi or Tamil and, more importantly, provide actionable recommendations for localisation and adaptation.

Some platforms have started to address these challenges. Features like auto-dubbing are now being offered to help users consume content in their preferred language. But these tools often remain underutilised partly because the onus remains on users to activate and use them."

Prerna Dalakoti, account director, PR Professionals Group, explains, "Global tech companies have understood that if they want to succeed in India, they cannot ignore regional languages. India’s internet user base is increasingly driven by non-English speakers, and engagement is far higher when content is available in users’ native languages. This has made multilingual content creation more mainstream, expanding reach to previously untapped audiences and opening up new opportunities for deeper engagement."

Which LLM is best for translating English into Hindi and other Indian languages?

Last month, Google’s AI app Gemini crossed 450 million monthly active users globally. The company said in July this year that its decision to offer free access to students contributed to a 50% increase in monthly active users over the last quarter.

Gemini also actively supports responses in nine Indian languages: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, and Urdu. This is now expanding to 12 languages with support from developers in India.

What about ChatGPT? ChatGPT is the most popular generative AI app, with 40.52 million monthly downloads worldwide. India has the second-largest number of ChatGPT users in the world, after the US.

As per some user reviews, ChatGPT can understand and generate text in Hindi, Tamil, Telugu, Bengali, and select others, but fluency, comprehension, and cultural nuance can be inconsistent compared to Gemini.

Can global LLMs correctly translate Indian languages?

Nakul Kundra, co-founder, Devnagri AI, points out, "Global LLMs represent a huge technical advance, but they were never designed for India’s linguistic complexity. They perform well on broad patterns but struggle with code-mixing (Hinglish/Tanglish), romanised (using Roman script for writing Indian languages) inputs, dialectal forms, and locally specific named entities - all of which change meaning in subtle ways.

For India, localisation isn’t about converting English to Hindi or Tamil but about capturing context, intent, and cultural nuance. A financial assistant trained on Western datasets won’t understand Indian banking terms like ‘chit funds’ or ‘gold loans’. A health chatbot can’t afford to misinterpret a colloquial term for a symptom. These aren’t small misses; they can have reputational, financial, or even human consequences. Academic evaluations and field experience show clear accuracy gaps for low-resource Indic languages and for code-mixed text; the result is not just awkward translations but mistakes that can be reputational or operationally harmful.

Second, discovery: recommendation and rankings behave differently for vernacular content; SEO and metadata strategies must be language-aware. 

Third, safety and measurement: moderation models trained on English often misclassify sentiment and intent in regional languages, increasing false positives and negatives. Practically, this means content teams must localise not just words but storytelling, A/B test by language cohort, and invest in language-native moderation and measurement. The companies that crack this will not just reach more Indians, they will win deeper trust and cultural relevance."

Are there enough regional language datasets for India?

Arati Mukerji, founder, Commarati, shares, "Global LLMs are great for translating news or summarising simple content, but when it comes to the specificities of each culture, they face issues. For instance, ‘festival’ may be translated as ‘tyohar’; however, it may miss the context of whether it is Diwali, Pongal, Eid, as well as the unique metaphors and rituals. The same is the case for regional greetings, humour, which may not reflect the lived experiences.

Some challenges faced by global LLMs emanate from the fact that datasets or digital representations of the regional languages are not very large, and existing data often reflects the urban language style but may not necessarily be everyday speech."

Additionally, Arati explains, "Hashtags, algorithms, etc, are still mostly English-language dominated on most platforms, posing a real challenge for discovery of content."

India's race to build local language LLMs

India currently has around 28 Large Language Models (LLMs) with some capability in Indian languages.

Several indigenous and international LLMs trained on Indian language data have emerged from both research labs and startups, including:

  • Sarvam-1 and Sarvam-2b: Trained specifically for 10 Indian languages: Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
  • MuRIL: Developed by Google Research India, pre-trained on 17 Indian languages and their transliterated counterparts, including Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Assamese, Malayalam, Punjabi, Odia, Urdu, and others.
  • Navarasa 2.0: Supports 15 Indian languages plus English, including Hindi, Telugu, Tamil, Marathi, Urdu, Konkani, Assamese, Nepali, Sindhi, Malayalam, Kannada, Punjabi, Odia, Gujarati, and Bengali.
  • Krutrim: Can process all 22 constitutionally recognised Indian languages and generate outputs in 10, including Marathi, Hindi, and Kannada.
  • BharatGPT: Supports over 12 languages, designed to develop multilingual assistants.
  • OpenHathi: Specialised in Hindi and English.
  • AryaBhatta-GemmaGenZ-Vikas-Merged: Supports nine Indian languages.
