News Release

Paper on Self-Supervised Learning and Data Augmentation Technologies for AI Speech Recognition Accepted by INTERSPEECH 2024

Development of Efficient and Effective Training Methods and Verification of Enhancements to AI Speech Recognition Performance

August 29, 2024

TOKYO, August 29, 2024 – Ricoh today announced that its paper on Self-Supervised Learning and Data Augmentation Technologies for Artificial Intelligence (AI) Speech Recognition will be presented at INTERSPEECH 2024, the international spoken language processing conference. This is the first time a Ricoh paper has been accepted by INTERSPEECH.

The paper introduces an efficient and effective method for training AI speech recognition models using only speech data, without transcripts.

Traditionally, supervised learning methods for AI speech recognition require speech data paired with corresponding transcripts to teach the AI the relationship between the speech and the text. However, this approach demands a large volume of transcribed speech data, which is costly to acquire. Moreover, the quality of audio recorded in real-world environments varies with factors such as application and location, so models need greater tolerance to acoustic noise to be usable across different settings. Ricoh's newly developed self-supervised learning method, combined with data augmentation techniques that strengthen resistance to acoustic noise, achieves more accurate speech recognition at a lower cost than traditional methods.
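For readers unfamiliar with noise-based data augmentation, the following is a minimal, generic sketch and not Ricoh's method, which this release does not detail. It assumes one common approach: adding a noise recording to clean speech at a chosen signal-to-noise ratio (SNR). The function names `mix_at_snr` and `measured_snr_db`, the synthetic tone, and the white noise are all hypothetical stand-ins for illustration.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB)."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat or trim the noise so it covers the whole utterance.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (noise * repeats)[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must have non-zero power")

    # SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Stand-in "speech": one second of a 220 Hz tone sampled at 16 kHz (assumed data).
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
# Stand-in "noise": Gaussian white noise, shorter than the utterance.
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
print(round(measured_snr_db(speech, noisy), 2))
```

In a training pipeline, the SNR would typically be drawn at random for each utterance so the model hears the same speech under many noise conditions.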

AI speech recognition is a technology that recognizes and analyzes spoken words, voices, and conversations, converting them into text data for output. Today, this technology is widely used in business for tasks such as displaying subtitles during meetings, creating meeting minutes, and generating reports. AI speech recognition allows faster transcription and data entry than manual typing, making it an extremely effective tool for improving operational efficiency. Ricoh's advanced AI speech recognition technology offers highly accurate recognition of casual conversations, as well as of voices recorded at a distance from the microphone, even in the presence of noise or reverberation, making it ideal for business environments where accuracy is critical.

This capability is also integrated into Ricoh's AI Agent under development, which quickly recognizes and analyzes speech, dynamically generates follow-up questions, and engages in ongoing dialogues to anticipate customer needs and provide precise recommendations. Additionally, by combining AI speech recognition with other AI technologies, Ricoh delivers digital services that help customers effectively utilize speech and conversational data in their workplaces, transforming the data into new value and addressing their management challenges.

INTERSPEECH is the world's largest and most comprehensive conference on the science and technology of spoken language processing and is organized by the International Speech Communication Association (ISCA). Accepted papers will be presented at the 25th INTERSPEECH Conference, to be held on Kos Island, Greece, from September 1 to 5, 2024.

As its mid-term vision, Ricoh aims to provide consistent services globally as a workplace services provider in the changing workplace. Ricoh empowers its customers' creativity by using AI and data to improve efficiency in the workplace, meaning any location or space where people work, including those beyond traditional offices, while fostering collaboration and innovation. Ricoh promotes AI research and development in spoken language and imaging fields to bolster the digital services it provides to those sectors. Ricoh will support the digital transformations of its customers through AI service offerings tailored for use in each business and industry.


About Ricoh

Ricoh is a leading provider of integrated digital services and print and imaging solutions designed to support the digital transformation of workplaces and workspaces and to optimize business performance.

Headquartered in Tokyo, Ricoh's global operation reaches customers in approximately 200 countries and regions, supported by cultivated knowledge, technologies, and organizational capabilities nurtured over its 85-year history. In the financial year ended March 2024, Ricoh Group had worldwide sales of 2,348 billion yen (approx. 15.5 billion USD).

It is Ricoh's mission and vision to empower individuals to find Fulfillment through Work by understanding and transforming how people work so we can unleash their potential and creativity to realize a sustainable future.

For further information, please visit www.ricoh.com

###

© 2024 RICOH COMPANY, LTD. All rights reserved. All referenced product names are the trademarks of their respective companies.