Singapore develops regional AI model for Southeast Asia trained on 11 languages

By Crypto Gloom On Feb 17, 2024

While ChatGPT and Bard continue to gain traction, a group of researchers in Singapore is developing a large language model (LLM) trained primarily on Southeast Asian data.

The artificial intelligence (AI) model, called SEA-LION, is designed as an alternative to mainstream LLMs, tailored to Southeast Asia. The generative AI model was trained on data in 11 languages used in the region, including Indonesian, Vietnamese and Thai, with special attention paid to local culture and traditions.

The project, funded primarily by the Singapore government, aims to boost AI adoption among businesses and individual users in the region. Earlier attempts to use OpenAI's ChatGPT in the region produced mixed results because of mismatches between the model's training languages and local dialects.

“We are not trying to compete with the big LLMs,” said Leslie Teo, senior director of AI products at AI Singapore. “We are working to complement this so it represents us better.”

Mainstream LLMs are trained predominantly on English text, and despite the language's reach, nearly half of the world's population cannot use generative AI chatbots to their full potential. To address this gap, governments are building custom chatbots that complement existing services by training on datasets in local languages.

“Regional LLMs are also necessary because they support technological independence,” said Nuurrianti Jalli, assistant professor at Oklahoma State University. “Less reliance on Western LLMs can provide better privacy protections for local residents and better serve national or local interests.”

SEA-LION is expected to have an immediate impact in Southeast Asia, especially among local companies transitioning to AI. Paul Condylis, vice president of data science at Indonesian startup Tokopedia, notes that the model will be an essential tool for connecting with customers and for improving and personalizing their experiences.

Southeast Asia has built a strong reputation for adopting new technologies at a pace on par with North America and Europe. Alongside AI, the region is embracing blockchain technology, with applications in finance, logistics, tourism, gaming and entertainment.

Disadvantages of a Regional LLM

Although regional LLMs have been praised for their localization, experts have raised concerns about bias and censorship in their use. There are also fears that local AI systems may fail to incorporate enough information about the wider world and could therefore present a ‘revisionist view of history’.

“Models may fail to surface important social and political issues such as human rights violations, corruption, and valid criticism of political power,” Jalli said.

Others have pointed out that authoritarian governments can use local LLMs to suppress dissent and oppress minorities. To ensure that LLMs reflect people's cultural differences and remain neutral in their outputs, experts are pushing for high-quality training data that is free from bias and anti-democratic tendencies.

For artificial intelligence (AI) to operate lawfully and overcome growing challenges, it must integrate enterprise blockchain systems that ensure data input quality and ownership, keeping data safe while guaranteeing its immutability. Check out CoinGeek's coverage of this emerging technology to learn more about why enterprise blockchain will become the backbone of AI.

