Xiaomi has unveiled its new open-source intelligent audio model, MiDashengLM-7B, marking a major leap forward in its efforts to strengthen the technical backbone of its platforms, including smart home devices and electric vehicles. The model builds upon Xiaomi’s foundational audio system, Xiaomi Dasheng.
Advanced Architecture and Unified Sound Understanding
According to Xiaomi's post on Chinese social media platform Weibo, MiDashengLM-7B represents a significant advancement in audio comprehension technologies. It utilizes a cutting-edge architecture that integrates the Xiaomi Dasheng platform as an audio encoder and the Qwen2.5-Omni-7B model as a decoder, creating a seamless system capable of understanding speech, environmental sounds, and music in a unified manner.
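The encoder-decoder pairing described above can be sketched as a simple pipeline. This is an illustrative toy only: the class names, embedding shapes, and pooling logic below are hypothetical stand-ins, not the actual Dasheng or Qwen2.5-Omni-7B implementations.

```python
import numpy as np

class ToyAudioEncoder:
    """Hypothetical stand-in for the Dasheng encoder: waveform -> frame embeddings."""
    def __init__(self, frame_rate_hz=5, dim=16):
        self.frame_rate_hz = frame_rate_hz
        self.dim = dim

    def encode(self, waveform, sample_rate=16000):
        duration_s = len(waveform) / sample_rate
        n_frames = max(1, int(duration_s * self.frame_rate_hz))
        # Pretend embedding: mean-pool the waveform into n_frames vectors.
        chunks = np.array_split(waveform, n_frames)
        return np.stack([np.full(self.dim, c.mean()) for c in chunks])

class ToyTextDecoder:
    """Hypothetical stand-in for the LLM decoder: frame embeddings -> text."""
    def describe(self, frames):
        return f"audio clip summarized from {len(frames)} encoder frames"

def caption(waveform, encoder, decoder):
    # The decoder consumes the encoder's frame embeddings, so one pipeline
    # can handle speech, environmental sound, and music uniformly.
    return decoder.describe(encoder.encode(waveform))

# 30 s of audio at 16 kHz -> 150 frames at the encoder's 5 Hz output rate.
audio = np.zeros(30 * 16000)
print(caption(audio, ToyAudioEncoder(), ToyTextDecoder()))
```

The point of the shape is that a single audio encoder feeds one language-model decoder, rather than routing speech, sound, and music through separate specialist models.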
Innovative Training for Deeper Acoustic Insight
The model employs novel training strategies that redefine audio scene interpretation, enabling it to capture deeper auditory meaning, including speaker emotion, spatial reverberation, and other nuanced features that traditional audio transcription models often miss.
High Benchmark Performance
MiDashengLM-7B has demonstrated superior performance across 22 public evaluation datasets covering a wide range of tasks, including audio captioning, comprehension, audio-based Q&A, and speech recognition.
Its first-token response time in single-pass inference is just a quarter of that of leading models. Moreover, under the same GPU memory constraints it processes 20 times more audio samples in parallel, giving Xiaomi a clear efficiency edge.
Precision Audio Processing
The model outperforms notable systems such as Whisper and Kimi-Audio on the X-ARES benchmark, especially in non-speech tasks. Dasheng is also used for audio generation tasks such as noise reduction and audio enhancement.
Notably, Xiaomi's Dasheng-Denoiser has already been presented at major international conferences such as Interspeech 2025, showcasing its ability to turn noisy speech into clean audio using targeted encoding and advanced audio restoration networks.
Efficient Resource Utilization
In terms of computational efficiency, MiDashengLM shows impressive inference throughput. For instance, it can batch 512 audio samples of 30 seconds each within an 80 GB GPU memory budget, while competing models fail beyond batch sizes of 16.
Part of this efficiency comes from reducing the audio encoder's output frame rate from 25 Hz to 5 Hz, which cuts the required computation by up to 80%.
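The 80% figure follows directly from the frame-rate ratio: 5 Hz is one fifth of 25 Hz, so only 20% of the frames remain. The short calculation below assumes, as a simplification, that downstream compute scales linearly with frame count.

```python
# Frame-rate reduction reported for the encoder: 25 Hz -> 5 Hz.
old_rate_hz, new_rate_hz = 25, 5
clip_seconds = 30  # the 30-second clip length used in the throughput figures

old_frames = old_rate_hz * clip_seconds  # 750 frames per 30 s clip
new_frames = new_rate_hz * clip_seconds  # 150 frames per 30 s clip

# Fraction of computation saved, assuming cost is linear in frame count.
reduction = 1 - new_frames / old_frames

print(f"{old_frames} -> {new_frames} frames ({reduction:.0%} fewer)")
```

Fewer encoder output frames also means shorter sequences for the decoder to attend over, which is what makes the large batch sizes above feasible.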
Fully Open Dataset
The model was trained exclusively on publicly available data: 1.1 million hours spanning speech recognition, environmental sound understanding, music analysis, non-verbal behavior, and audio-based interactive tasks.
Redefining Audio Data Processing
One of MiDashengLM's key breakthroughs is its departure from training on traditional ASR (Automatic Speech Recognition) transcripts. Instead, it uses comprehensive descriptive alignment, pairing audio with captions that cover all types of sound content, including speech, ambient sounds, and music.
This shift preserves information that conventional ASR targets discard, in some cases up to 90% of the audio content.
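The contrast between the two training targets can be made concrete with a small sketch. The field names, example texts, and cue list below are purely illustrative assumptions, not the model's actual data schema.

```python
# Hypothetical training pair for one audio clip (illustrative names/values).
clip = {"audio": "street_interview.wav", "duration_s": 12.0}

# A conventional ASR target keeps only the spoken words:
asr_target = "the market opens at nine"

# A descriptive caption target also preserves non-speech content:
caption_target = (
    "A man speaks calmly near a busy street; traffic hum and a distant siren "
    "are audible, with slight reverberation suggesting an open outdoor space."
)

def mentions_nonspeech(text):
    # Crude illustrative check: does the target mention any non-speech cues?
    cues = ["traffic", "siren", "reverb", "music", "applause"]
    return any(cue in text.lower() for cue in cues)

print(mentions_nonspeech(asr_target))      # ambient sound is discarded
print(mentions_nonspeech(caption_target))  # ambient sound is retained
```

A model trained only on the ASR-style target never sees the ambient information, which is the "up to 90%" of audio content the article says conventional pipelines throw away.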
Real-World Applications and Offline Capabilities
MiDashengLM has broad applications, such as providing custom feedback during voice training or language learning, offering real-time insights while driving, or serving as an intelligent assistant that can answer questions about environmental sounds.
Xiaomi also plans to expand the model to support offline operation on edge devices, along with enhanced voice editing features based on natural language commands.
Transparency and Open Collaboration
In a move toward full transparency, Xiaomi revealed all dataset details, including distribution ratios from 77 sources, and the entire training process—from the encoder’s initial pretraining to the final fine-tuning.
The model is released under the Apache 2.0 license, allowing full freedom for commercial or academic use. Xiaomi has invited the developer community to contribute via GitHub, reinforcing its philosophy of openness, transparency, and collaborative innovation.