How Voice Responses and Audio Messages Work on aichat.md – Extended Details

One of the things that sets aichat.md apart from other platforms is its native support for voice messages, both incoming (when customers send you an audio message) and outgoing (when the AI assistant replies by voice). Everything is based on integration with two internationally recognized technologies: Whisper (OpenAI) for speech recognition and ElevenLabs for voice generation.

Receiving the Audio Message

The customer sends an audio message through connected channels.

  • Compatible Platforms

  • Automatic Detection

Interpretation with Whisper (OpenAI)

Instant Transcription

The audio message is sent to Whisper, which 'listens' and transforms everything into text. For example, if someone speaks in Romanian, the transcription will be in Romanian. If they speak in Russian, the transcription will be in Russian, and so on.

Multi-Language

Whisper automatically detects the language used.

Text Understanding

The AI assistant treats the transcribed text like any other written message, so it understands exactly what the customer is asking.

No Configuration

There is no need for manual configuration of languages or transcription settings.
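As a rough illustration of the transcription step described above, here is a minimal Python sketch. aichat.md does not publish its internals, so this is only an assumption about how such a call could look: it takes a client shaped like the one from the OpenAI Python SDK and calls its audio transcription endpoint with the hosted `whisper-1` model. The function name `transcribe_audio` and the file name are hypothetical. No language is passed in, which mirrors the automatic language detection mentioned above.

```python
from typing import Any


def transcribe_audio(client: Any, audio_path: str) -> str:
    """Return the transcript of an audio file.

    `client` is assumed to behave like `openai.OpenAI()`. No language is
    passed, so the speech model detects it on its own.
    """
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",  # OpenAI's hosted Whisper model
            file=audio_file,
        )
    return result.text


# Usage (requires the `openai` package and an API key in the environment):
#   from openai import OpenAI
#   text = transcribe_audio(OpenAI(), "voice_message.mp3")
```

Passing the client in as a parameter keeps the function easy to try out with a stand-in object before any real API key is involved.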


Processing and the Assistant's Decision

After transcription, the AI assistant analyzes the text and decides how to respond.

Generating the Voice Response with ElevenLabs

This is the step in which the assistant's text response is turned into a voice message.

Text-to-Speech

The AI assistant formulates the response in text, then sends it to ElevenLabs.

Create Audio File

ElevenLabs converts the text into a voice message, using the selected voice.

Return to Customer

The final voice message is sent to the customer through Facebook/Instagram/Telegram.

Custom Voices

You can choose between male and female voices, with a warm or a professional tone.
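The text-to-speech step can be sketched in Python as follows. This is an illustration under stated assumptions, not aichat.md's actual implementation: it builds a request against the public ElevenLabs text-to-speech REST endpoint, with the voice chosen through the `voice_id` in the URL. The helper names `build_tts_request` and `synthesize` are hypothetical, and the `eleven_multilingual_v2` model ID is an assumed choice for multilingual replies.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for one voice."""
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed multilingual model
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    """Send the request and return the audio bytes from the response body."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and payload without spending API credits.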

Custom Voice for Each Contact or Scenario

A special advantage on aichat.md is that you can customize the voice used depending on the type of customer, language, or situation.

New Customer

Friendly, slightly enthusiastic voice.

Business Partner

More serious, neutral voice.

Different Language

The voice adapts to the customer's language (e.g., French with a specific accent, a warmer voice for Spanish).

Specific Situation

Dynamic tone for promotion, calm tone for responding to problems.
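One way to picture this kind of voice selection is a small rule table, sketched below in Python. Everything here is hypothetical: the voice IDs, the labels such as `new_customer` or `complaint`, and the `pick_voice` function are placeholders for illustration, and the way aichat.md stores these settings is not described in this page.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder voice ID used when no rule matches.
DEFAULT_VOICE = "voice-neutral"


@dataclass(frozen=True)
class VoiceRule:
    """One selection rule; a field left as None matches any value."""
    voice_id: str
    customer_type: Optional[str] = None
    language: Optional[str] = None
    situation: Optional[str] = None


# Rules are checked top to bottom, so the order encodes priority:
# situation first, then language, then customer type.
RULES = [
    VoiceRule("voice-calm", situation="complaint"),
    VoiceRule("voice-dynamic", situation="promotion"),
    VoiceRule("voice-french", language="fr"),
    VoiceRule("voice-spanish-warm", language="es"),
    VoiceRule("voice-serious", customer_type="business_partner"),
    VoiceRule("voice-friendly", customer_type="new_customer"),
]


def pick_voice(customer_type: str, language: str, situation: str = "general") -> str:
    """Return the voice ID of the first rule that matches the context."""
    context = {
        "customer_type": customer_type,
        "language": language,
        "situation": situation,
    }
    for rule in RULES:
        if all(getattr(rule, key) in (None, value) for key, value in context.items()):
            return rule.voice_id
    return DEFAULT_VOICE
```

With these example rules, a new customer writing in Romanian gets the friendly voice, while the same customer reporting a problem gets the calm one, because situation rules come first in the list.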

Brief Recap Scenario

Customer

Sends a 15-second audio message on Facebook in Romanian, saying: 'Hi, I'm interested in whether you do deliveries on Saturdays. Thank you!'

AI Assistant (Whisper)

Sends the file to Whisper, gets the text.

AI Assistant (Response)

Checks the instructions. Formulates the response: 'Hello! Sure, we also deliver on Saturdays, at no extra cost.' Sends the text to ElevenLabs, receives the audio file.

AI Assistant (Sending)

Returns the voice message in the Facebook conversation, also in Romanian. The customer hears a clear, warm, and personalized response.
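The recap above can be condensed into a short Python sketch of the whole flow. It is a simplified illustration, not the platform's code: each stage (transcription, reply generation, speech synthesis, delivery) is passed in as a plain function, and the stand-ins used in the demo are hypothetical placeholders for Whisper, the assistant, ElevenLabs, and the messaging channel.

```python
from typing import Callable


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    send_audio: Callable[[bytes], None],
) -> str:
    """Run one voice message through the four steps and return the reply text."""
    text = transcribe(audio)              # speech -> text
    reply_text = generate_reply(text)     # assistant decides what to answer
    reply_audio = synthesize(reply_text)  # text -> speech
    send_audio(reply_audio)               # deliver in the same conversation
    return reply_text


# Demo with stand-ins instead of real services.
sent = []
reply = handle_voice_message(
    b"fake-audio",
    transcribe=lambda audio: "Do you do deliveries on Saturdays?",
    generate_reply=lambda text: "Hello! Sure, we also deliver on Saturdays, at no extra cost.",
    synthesize=lambda text: text.encode("utf-8"),
    send_audio=sent.append,
)
```

Keeping each stage behind a simple function signature means any one of them can be swapped or tested on its own.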

Why Voice Responses Matter for Your Business

By integrating Whisper (for speech recognition) and ElevenLabs (for voice generation), aichat.md turns any received audio message into a two-way voice conversation: it transcribes what the customer says in the language they speak, and it responds with a personalized, natural-sounding voice message.
You can even set different voices for each language or type of customer, creating a highly personalized experience without hassle.

  • Increased Conversions: People respond well to more human contact, such as voice messages.

  • Accessibility and Speed: Many customers prefer to speak, not type.

  • Professional Note: The assistant responds impeccably in multiple languages (not only in writing, but also in audio).

  • Unique Experience: Different voices for each language or customer type make every conversation feel personalized.