Author Topic: Serving a greater purpose with AI

Md. Abdur Rahim

Serving a greater purpose with AI
« on: August 16, 2023, 11:57:07 AM »


Assistant Professor Soujanya Poria has been fascinated with programming, algorithms and systems in computer science since he was young. What drew his interest was how computer programming can lead to innovative creations such as games, websites and graphics, and how a few lines of code have shaped, and will continue to shape, the world today.

“In 2009, artificial intelligence (AI) was nothing as spectacular as it is now. But the things computer programs could already create then, even small things, were amazing,” said Asst Prof Poria, a researcher at the Information Systems Technology and Design (ISTD) Pillar at the Singapore University of Technology and Design (SUTD).

In 2022, IEEE Intelligent Systems named him a rising AI scientist on its AI’s 10 to Watch list, a nod to his ongoing exemplary research on large language models (LLMs), affective computing, multimodal machine learning (ML), and context-dependent sentiment analysis. As he continues to push boundaries, he credits much of his success to his older brother, who is his mentor and also a computer scientist.

“It was my brother who advised me to look into AI and natural language processing. He has a vision of how the future will be in 20 years, the kind of outlook that not many will have. I owe most of my success to my brother. He’s played a significant role in my life,” Asst Prof Poria shared.

 

Striving for a positive impact

Through his work, Asst Prof Poria is committed to one goal: to develop technologically advanced tools for the benefit of society.

“We should think about the bigger aspect of the research, like how AI should not take up the jobs of humans but assist humans in improving productivity. How can I employ AI to improve the lives of the poor? Thinking of the societal benefits will keep us motivated, otherwise we will just be making devices without any particular ambition or meaning,” he expressed.

For instance, in his research on multimodal ML, he demonstrates how improving such AI systems can benefit the healthcare sector.

“Imagine having an AI agent present in a physical conversation between a healthcare professional and a patient. The AI agent could assist the healthcare professional by constantly monitoring the patient’s facial expression and providing cues on whether the patient is happy or unhappy,” he explained. The idea is that the healthcare professional can use this information as a prompt to act accordingly, for example by being more empathetic or attentive.

Multimodal ML refers to the ability of AI models to process and interpret data from several modalities, such as text, audio and video. It is an emerging field that is gaining traction because training AI models to recognise and better understand different kinds of input can improve their ability to make predictions and decisions.

In Asst Prof Poria’s example, the AI agent would be able to monitor, record and analyse the conversation, then produce a report that gives the healthcare professional insights into the patient’s mental state. The healthcare professional can then act on this information or refer to it in future conversations. Asst Prof Poria is partnering with companies to develop similar models and tools for them.
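As a rough illustration of the late-fusion idea behind such a system, the sketch below combines a facial-expression feature vector and a speech feature vector into a single sentiment cue. It is a minimal, hypothetical PyTorch example, not Asst Prof Poria’s actual model; all layer sizes, feature dimensions and the two cue classes are assumptions made for clarity.

import torch
import torch.nn as nn

class LateFusionSentiment(nn.Module):
    """Toy late-fusion classifier: one encoder per modality, fused by concatenation."""
    def __init__(self, visual_dim=512, audio_dim=128, hidden=256, num_classes=2):
        super().__init__()
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        # Classify the fused representation into coarse cues, e.g. "comfortable" vs "distressed".
        self.classifier = nn.Linear(hidden * 2, num_classes)

    def forward(self, visual_feats, audio_feats):
        fused = torch.cat([self.visual_enc(visual_feats), self.audio_enc(audio_feats)], dim=-1)
        return self.classifier(fused)

if __name__ == "__main__":
    model = LateFusionSentiment()
    visual = torch.randn(1, 512)   # stand-in for a face-expression embedding
    audio = torch.randn(1, 128)    # stand-in for a speech/prosody feature vector
    cue = model(visual, audio).softmax(dim=-1)
    print("P(comfortable), P(distressed):", cue.tolist())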

Additionally, in a recent research paper, he unveiled a novel model for automatic text-to-audio (TTA) generation. Built on a latent diffusion model and named after the Latin American dance, TANGO uses an instruction-tuned LLM, unlike other TTA generation models, to encode textual descriptions and convert them into sound.

Thanks to this change, TANGO outperformed AudioLDM, an advanced TTA generation model in the field, in generating audio across various sound types despite being trained on a smaller dataset. Within a few weeks of its release, TANGO was downloaded thousands of times and featured in YouTube videos.
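For readers who want to try a text-to-audio model of this kind, the snippet below is a minimal usage sketch. It assumes the Python interface published in TANGO’s open-source repository; the tango package, the Tango class, its generate() method, the "declare-lab/tango" checkpoint name and the 16 kHz output rate are all taken on that assumption rather than confirmed by this article.

import soundfile as sf            # for writing the generated waveform to disk
from tango import Tango           # assumed package/class name from the public repo

model = Tango("declare-lab/tango")   # assumed pretrained checkpoint identifier
prompt = "Gentle rain falling on a tin roof with distant thunder"
audio = model.generate(prompt)       # text description in, audio waveform out
sf.write("rain_on_tin_roof.wav", audio, samplerate=16000)  # assumed 16 kHz sample rate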

TTA generation excites Asst Prof Poria because of its potential use cases. For example, early childhood educators could adopt TTA generation models to help children understand complex sounds better, much as visual aids are used to improve learning. In the workplace, TTA generation models could create meditative sounds to reduce employee stress.

“Our system can create different, complex and interesting sounds and I’ve been talking to companies about the possible use cases for it,” remarked Asst Prof Poria.

 

The future lies in open source AI

The advent of generative AI has brought about fundamental shifts in AI research and opened up new possibilities. Asst Prof Poria is personally looking forward to the potential of AI for creating long-form videos. However, he emphasised the need for open science.

“Models like ChatGPT and GPT-4 are closed source—you don’t know what’s going on inside, how much data these models are trained on, or how private the information is. I am part of the open source society, and I am proud of that. We need smaller, open source models that can achieve a similar performance but with more transparency and hopefully more interpretability. In the near future, I believe more contributions will come from smaller models,” Asst Prof Poria said.

“Smaller, open models can also help us move towards more sustainable AI. For large models, you need to train them for many days, probably up to a year to achieve the kind of performance we’ve seen. The carbon footprint they create and the power they use are definitely not sustainable and not good for the environment,” he added.

Asst Prof Poria believes that the open-source approach can help more institutions and researchers join the AI discourse, particularly those from countries that lack the means to build or invest in the necessary infrastructure. Improving their access to the technology can enable more AI research and development, a win-win scenario for the AI landscape.

 Source: the American Association for the Advancement of Science (AAAS)
Original Content: https://www.eurekalert.org/news-releases/998602