{"id":116152,"date":"2024-08-17T15:03:12","date_gmt":"2024-08-17T15:03:12","guid":{"rendered":"https:\/\/news.talkwithrattan.com\/index.php\/2024\/08\/17\/heres-a-big-leap-into-use-of-genai-in-indian-languages-times-of-india\/"},"modified":"2024-08-17T15:03:12","modified_gmt":"2024-08-17T15:03:12","slug":"heres-a-big-leap-into-use-of-genai-in-indian-languages-times-of-india","status":"publish","type":"post","link":"https:\/\/news.talkwithrattan.com\/index.php\/2024\/08\/17\/heres-a-big-leap-into-use-of-genai-in-indian-languages-times-of-india\/","title":{"rendered":"Here\u2019s a big leap into use of GenAI in Indian languages &#8211; Times of India"},"content":{"rendered":"<div style=\"text-align:center\"><img loading=\"lazy\" decoding=\"async\" width=\"600\" height=\"446\" src=\"https:\/\/i0.wp.com\/static.toiimg.com\/thumb\/imgsize-23456,msid-112593198,width-600,resizemode-4\/112593198.jpg?resize=600,446&amp;ssl=1\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"Here\u2019s a big leap into use of GenAI in Indian languages &#8211; Times of India\" title=\"Here\u2019s a big leap into use of GenAI in Indian languages &#8211; Times of India\" \/><\/div><p> <br \/>\n<\/p>\n<div>In a modest office in Bengaluru, a soft-spoken entrepreneur is charting an ambitious course to make India a powerhouse in <!-- -->generative artificial intelligence<!-- --> tailored for the nation\u2019s linguistic diversity.<br \/>Vivek Raghavan, the co-founder of <!-- -->Sarvam AI<!-- -->, believes the key to unlocking AI\u2019s potential in India lies in developing models that can understand and communicate in the country\u2019s many regional languages through <!-- -->voice interfaces<!-- -->.<br \/>\u201cIndians will interact with generative AI through voice in their own language,\u201d Raghavan tells us.<br \/>At the heart of Sarvam\u2019s approach is the idea that while massive <!-- -->language models<!-- --> like GPT-4o and Gemini 1.5 offer impressive 
capabilities, much of what people need can be achieved with far smaller, more efficient models fine-tuned for specific tasks and linguistic contexts.<br \/>\u201cIf I want to do something that\u2019s relevant millions of times a day, I can\u2019t use those large models. It\u2019s too expensive and not accurate enough,\u201d Raghavan explains. \u201cFor a use case like customer support for a telecom company, I want a smaller, purpose-built model that outperforms bigger models on that task.\u201d<\/p>\n<div data-pos=\"0\" class=\"id-r-component QbQNS undefined  &#10;        \">\n<div><\/div>\n<\/div>\n<p>To this end, Sarvam just announced Sarvam 2B, an open-source 2-billion-parameter model trained from scratch on trillions of tokens of Indian language data, including synthetically generated text. At just a fraction of the size of models like GPT-4, and at a fraction of their cost, Sarvam 2B<!-- --> promises to deliver superior performance on Indian language tasks like translation, transliteration, and summarisation. And it does so for 10 <!-- -->Indian languages<!-- -->.<br \/>The company also unveiled \u201cSarvam Agents\u201d \u2014 multilingual, voice-enabled AI assistants that can perform actions like booking tickets or scheduling meetings through telephony, WhatsApp or in-app interfaces. The cost? As low as 1 rupee per minute.<br \/>In a demo we saw, a voice AI agent deployed on the phone line of a healthcare customer starts by saying: \u201cNamaste, Sarvam Saathi tak pahunchne ke liye, dhanyavad. Aap ki kya madad kar sakti hoon?\u201d (Thank you for reaching out to Sarvam. How can I help you?) Then starts a seamless conversation in Hinglish with a user who has a dental issue. The bot was able to understand even uniquely Indian utterances. There was no latency. If the user interrupted the bot, the bot handled that beautifully. 
It understood all queries, and it eventually booked an appointment for the user with a doctor for the preferred date.<br \/><span class=\"strong\" data-ua-type=\"1\" onclick=\"stpPgtnAndPrvntDefault(event)\">Unconventional beginnings<\/span><br \/>Raghavan\u2019s path to founding Sarvam is unconventional. For 15 years, he worked as a volunteer on India\u2019s massive Aadhaar digital identity project. This experience, he says, gave him the drive to leverage technology for societal impact. \u201cI see a future where every child can get a quality education (via AI), which was not possible before this,\u201d he says, echoing a point made by Indian-American entrepreneur &amp; venture capitalist to TOI earlier this week.<br \/>He bumped into the Indian language AI problem over a decade ago when the Supreme Court sought a way to translate judgments into regional languages. This led him to advise the government\u2019s Bhashini initiative \u2013 India\u2019s AI-led language translation platform, launched as part of the Digital India vision.<br \/>The decision to finally form a for-profit startup, rather than continue in the public or non-profit sector, was driven by the need for speed and scale. \u201cWe need to move faster,\u201d Raghavan explains. \u201cThis is a space where globally, things are moving very fast.\u201d<br \/>Sarvam\u2019s approach reflects Raghavan\u2019s belief in \u201csovereign AI\u201d \u2014 models tailored for Indian contexts that can be deployed on-premises by enterprises concerned about data privacy. It\u2019s also about giving Indian researchers the tools to push the boundaries of language AI.<br \/>The company is open-sourcing the audio language model that\u2019s built on top of Meta\u2019s open-source Llama model. 
\u201cWe want the Indian AI ecosystem to make progress,\u201d Raghavan says.<br \/><span class=\"strong\" data-ua-type=\"1\" onclick=\"stpPgtnAndPrvntDefault(event)\">Fundamental innovations<\/span><br \/>Under the hood, Sarvam has pioneered techniques to reduce the \u201ctokenizer tax\u201d that makes representing Indian language text inefficient in standard models. In AI and ML parlance, a token can represent an entire word or just a single character. Indian languages routinely fall prey to the negative effects of the second category, because the number of tokens it usually takes to represent text in an Indian language is far higher than for a language like English. That is why methods to reduce the tokenizer tax of using an Indian language were important, says Raghavan. Fewer tokens mean a smaller, more efficient model.<br \/>The company also embraced synthetic data generation as a way to augment limited real-world datasets for Indian languages. \u201cWe\u2019ve built models to generate data and we\u2019re using that data to train models,\u201d Raghavan says. Sarvam\u2019s 2B model was trained on a cluster provided by Indian company Yotta.<br \/>Looking ahead, Raghavan sees opportunities to apply generative AI to domains rich in Indian knowledge like Ayurveda, where models could synthesise information from ancient texts into a coherent, referenceable corpus.<\/div>\n<p><a href=\"https:\/\/timesofindia.indiatimes.com\/technology\/times-techies\/heres-a-big-leap-into-use-of-genai-in-indian-languages\/articleshow\/112593092.cms\">Source link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a modest office in Bengaluru, a soft-spoken entrepreneur is charting an ambitious course to make India a powerhouse in generative artificial intelligence tailored for the nation\u2019s linguistic diversity. Vivek Raghavan, the co-founder of Sarvam AI, believes the key to unlocking AI\u2019s potential in India lies in developing models that can understand and communicate in the 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":116153,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"tdm_status":"","tdm_grid_status":"","fifu_image_url":"https:\/\/static.toiimg.com\/thumb\/imgsize-23456,msid-112593198,width-600,resizemode-4\/112593198.jpg","fifu_image_alt":"","footnotes":""},"categories":[604],"tags":[995,9809,15468,671,273,1025,81276,60295,29439,20276,95103,46211,272,95102],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/116152"}],"collection":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/comments?post=116152"}],"version-history":[{"count":1,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/116152\/revisions"}],"predecessor-version":[{"id":116154,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/posts\/116152\/revisions\/116154"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/media\/116153"}],"wp:attachment":[{"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/media?parent=116152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/categories?post=116152"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/news.talkwithrattan.com\/index.php\/wp-json\/wp\/v2\/tags?post=116152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}