Voice interfaces and assistants

1) What a VUI is and when it's needed

Voice user interface (VUI): interaction through speech, such as assistants in an app or browser, smart speakers, IVR/telephony, and voice control in cars and on TVs.
Suitable for: hands-busy scenarios (driving, cooking), quick commands ("turn on...", "call..."), accessibility, and navigating complex menus.
Not suitable for: precise visual selection (catalogs, tables) or lengthy entry of structured data without a screen.

2) Dialogue model: intents, entities and context

Intent: what the user wants, e.g. 'create_payment', 'check_balance'.
Slots/entities: the parameters of the intent: amount, currency, recipient, date.
Context/dialogue state: what is already known, what still needs clarifying, where the dialogue branches.
Confirmation rules: what must be confirmed explicitly (money, personal data).

Example intent schema (pseudo-JSON):
json
{
  "intent": "MakeDeposit",
  "slots": {
    "amount": {"type": "number", "required": true, "confirm": "sensitive"},
    "currency": {"type": "currency", "required": true, "default": "UAH"},
    "method": {"type": "payment_method", "required": false}
  }
}
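
A minimal sketch of how a dialogue manager might read a schema like the one above to decide its next move: apply defaults, ask for the first missing required slot, then require confirmation when a sensitive slot is filled. The function name `next_action`, the `confirmed` flag, and the return shape are assumptions made for illustration, not part of any particular framework.

```python
from typing import Any

# Schema mirroring the MakeDeposit example above.
MAKE_DEPOSIT: dict[str, Any] = {
    "intent": "MakeDeposit",
    "slots": {
        "amount": {"type": "number", "required": True, "confirm": "sensitive"},
        "currency": {"type": "currency", "required": True, "default": "UAH"},
        "method": {"type": "payment_method", "required": False},
    },
}


def next_action(
    schema: dict[str, Any], filled: dict[str, Any], confirmed: bool = False
) -> tuple[str, Any]:
    """Return ("ask", slot_name), ("confirm", values) or ("execute", values)."""
    values = dict(filled)
    # Apply schema defaults for slots the user did not mention.
    for name, spec in schema["slots"].items():
        if name not in values and "default" in spec:
            values[name] = spec["default"]
    # Ask one question per missing required slot, in schema order.
    for name, spec in schema["slots"].items():
        if spec.get("required") and name not in values:
            return ("ask", name)
    # Any filled slot marked "sensitive" forces an explicit confirmation.
    sensitive = any(
        spec.get("confirm") == "sensitive"
        for name, spec in schema["slots"].items()
        if name in values
    )
    if sensitive and not confirmed:
        return ("confirm", values)
    return ("execute", values)
```

With an empty slot set the sketch asks for `amount`; once the amount is known it moves to confirmation, and only executes after `confirmed=True` is passed.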

3) Patterns of dialogue

1. One-shot command: "Top up my account by 500 hryvnias with Apple Pay." → confirmation → action.
2. Clarifying dialogue: "Who should I transfer to?" → "How much?" → confirmation.
3. Step-by-step wizard: complex scenarios with data validation and the option to step back.
4. Intent recognition + NLU paraphrasing: support for varied phrasings of the same request.
5. Quick help: "What are the withdrawal limits?" - short answer + "Show on screen."

4) Wording: voice and tone

Brand voice: confident, calm, friendly; no diminutives or jokes in critical steps (payments, security).
Maximum length of an assistant utterance: 1-2 sentences; split longer answers and offer "Continue?"
Questions must be specific: "How much would you like to deposit?" instead of "What do we do next?"
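
The 1-2 sentence limit can be enforced mechanically. The sketch below splits a long reply into chunks of at most two sentences and appends "Continue?" to every chunk except the last. The function name `chunk_reply` is hypothetical, and the sentence splitter is deliberately naive (it splits on `.`, `!`, `?` followed by whitespace, so abbreviations such as "e.g." would be split incorrectly).

```python
import re


def chunk_reply(text: str, max_sentences: int = 2) -> list[str]:
    """Split a reply into utterances of at most `max_sentences` sentences."""
    # Naive sentence split: punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]
    # Every chunk but the last ends with an offer to go on.
    return [chunk + " Continue?" for chunk in chunks[:-1]] + chunks[-1:]
```

The assistant would speak one chunk per turn and move to the next only after the user agrees to continue.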

5) Confirmations, safety and ethics

Strict confirmation of sensitive actions: read back the key parameters ("Deposit 500 hryvnias with card ...4581?").
Double confirmation for irreversible operations.
Never read out full personal data.
Cancel/undo option: "Cancel", "Stop", "Undo the last step".
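
These rules can be sketched as three small helpers: one that voices only the last four digits of a card, one that decides how many confirmations an action needs, and one that builds the read-back prompt. All names (`mask_card`, `confirmations_required`, `confirmation_prompt`) and the exact wording of the prompt are assumptions for illustration.

```python
def mask_card(pan: str) -> str:
    """Voice only the last four digits of a card number."""
    digits = "".join(ch for ch in pan if ch.isdigit())
    return f"card ending in {digits[-4:]}"


def confirmations_required(sensitive: bool, irreversible: bool) -> int:
    """Irreversible actions need two confirmations, sensitive ones need one."""
    if irreversible:
        return 2
    return 1 if sensitive else 0


def confirmation_prompt(amount: int, currency_name: str, pan: str) -> str:
    """Read back the key parameters without voicing the full card number."""
    return (
        f"Deposit {amount} {currency_name} with the {mask_card(pan)}? "
        'Say "confirm" or "cancel".'
    )
```

Keeping masking in one helper makes it harder for a new prompt template to voice a full card number by accident.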

6) Errors and misunderstandings

Failure types and responses:
  • ASR error (did not hear): "I didn't catch the amount. Please repeat it."
  • NLU failure (did not understand): "I didn't understand the request. I can top up your account or show your balance. Which would you like?"
  • Missing data/restriction: "This method is not available in your region. Would you like to hear other options?"
  • Network/service: "The payment service is unreachable right now. Do you want to try again in a minute?"

Rule: at most 2 attempts per question, then offer an alternative (screen/human operator).
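
A minimal sketch of the two-attempt rule as a per-slot failure counter. It assumes one reading of the rule: the first failure triggers a re-prompt, and the second failure on the same slot triggers escalation to the screen or an operator. The class name `RepromptPolicy` and the string results are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepromptPolicy:
    """Counts failed attempts per slot and escalates after the limit."""

    max_attempts: int = 2
    failures: dict[str, int] = field(default_factory=dict)

    def on_failure(self, slot: str) -> str:
        """Record an ASR/NLU failure and return "reprompt" or "escalate"."""
        self.failures[slot] = self.failures.get(slot, 0) + 1
        if self.failures[slot] >= self.max_attempts:
            return "escalate"  # hand off to the screen or an operator
        return "reprompt"

    def on_success(self, slot: str) -> None:
        """Reset the counter once the slot is filled."""
        self.failures.pop(slot, None)
```

Counting per slot rather than per session means that a noisy moment while capturing the amount does not use up the attempts available for the payment method.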

7) Speed and barge-in (interrupting)

TTFB latency (time to the first byte of the response): target under 300-500 ms; if it takes longer, play a short "hmm" filler or an earcon.
Barge-in: the user can interrupt the assistant at any time; handle the interruption gracefully.
Streaming the answer: start speaking before the full text is ready, but never cut off mid-phrase.
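
One way to stream without cutting a phrase in half is to buffer generated text and release it to TTS only at sentence boundaries. The sketch below does that, and adds a simple check for when to play an earcon. Both function names and the 500 ms default budget (the upper end of the target above) are assumptions for illustration.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def speakable_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield text for TTS only once a full sentence has been buffered."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the stream ends.
        yield buffer.strip()


def should_play_earcon(elapsed_ms: float, budget_ms: float = 500) -> bool:
    """Play a filler sound when the first audio is later than the budget."""
    return elapsed_ms > budget_ms
```

On barge-in, the caller can simply stop consuming the generator, which drops any text that has not been spoken yet.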

8) TTS/ASR and SSML: how to sound human

Pronunciation of numbers/currencies/dates: use local formats (for Ukrainian, "p'yatsot hryven" for 500 UAH, "15 lystopada" for November 15).
Pauses and stress: SSML '<break time="300ms"/>', '<emphasis level="moderate">'.
Reading abbreviations/codes: '<say-as interpret-as="characters">IBAN</say-as>'.
Rate and timbre: no faster than 0.9× the base rate, so that speech stays intelligible.

SSML example:
xml
<speak>
  Top up by <say-as interpret-as="cardinal">500</say-as>
  <sub alias="hryvnia">UAH</sub>?
  <break time="300ms"/>
  Please confirm.
</speak>
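
When SSML like this is assembled from runtime values, it helps to escape the dynamic parts and check that the result is well-formed XML before sending it to the TTS engine. The sketch below uses only the Python standard library. The function name `confirm_deposit_ssml` and its parameters are hypothetical, and a well-formedness check does not guarantee that a given TTS engine supports every tag.

```python
from xml.etree import ElementTree
from xml.sax.saxutils import escape, quoteattr


def confirm_deposit_ssml(
    amount: int, currency_code: str, currency_spoken: str, pause_ms: int = 300
) -> str:
    """Build the confirmation prompt as an SSML string."""
    ssml = (
        "<speak>"
        f'Top up by <say-as interpret-as="cardinal">{amount}</say-as> '
        # quoteattr() adds the surrounding quotes and escapes the value.
        f"<sub alias={quoteattr(currency_spoken)}>{escape(currency_code)}</sub>?"
        f'<break time="{pause_ms}ms"/>'
        "Please confirm."
        "</speak>"
    )
    # Raises xml.etree.ElementTree.ParseError if the markup is malformed.
    ElementTree.fromstring(ssml)
    return ssml
```

For example, `confirm_deposit_ssml(500, "UAH", "hryvnia")` reproduces the prompt above as a single string.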

9) Multimodality: voice + screen

Visual cues: confirmation card, list of methods, progress indicator.
Hand-off to the screen: "I've sent the options to your screen. Please select a method."
State synchronization: a task started by voice can be finished on screen, and vice versa.

10) Multilingualism and localization

Auto-detect the language from the session/settings, not from a single phrase.
Glossary of terms: shared terminology across RU/UA/TR/EN.
Regional formats for numbers/currencies/dates; pronunciation of names and place names.
Switching within the dialogue: "Switch to Ukrainian" is an explicit command.
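
A sketch of session-level language selection: a single utterance detected in another language does not change the session language, but a consistent run of them does, and an explicit command switches immediately. The class name `SessionLanguage`, the window of 5 utterances, and the threshold of 3 votes are illustrative assumptions; the per-utterance detector itself is outside the sketch.

```python
from collections import Counter, deque


class SessionLanguage:
    """Tracks the dialogue language across a session."""

    def __init__(self, initial: str, window: int = 5, min_votes: int = 3) -> None:
        self.current = initial
        self.min_votes = min_votes
        self.recent: deque[str] = deque(maxlen=window)

    def observe(self, detected: str) -> str:
        """Feed one per-utterance detection; return the session language."""
        self.recent.append(detected)
        lang, votes = Counter(self.recent).most_common(1)[0]
        if lang != self.current and votes >= self.min_votes:
            self.current = lang
        return self.current

    def switch(self, lang: str) -> str:
        """Explicit user command: switch at once and forget old votes."""
        self.current = lang
        self.recent.clear()
        return self.current
```

This keeps a borrowed word or a misdetected short phrase from flipping the whole dialogue into another language.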

11) Accessibility (A11y) in voice

Action confirmations are clear and short.
Repeat on demand: "Repeat" re-reads the last utterance.
Volume/speed: "Speak slower/quieter/louder."
For hearing-impaired users: subtitles/transcript on screen, vibration signals.
For users with speech impairments: alternative input methods (buttons, presets).

12) Confidentiality, logging and compliance

Wake word and recording indicator: an explicit "listening" state.
Local processing where possible; otherwise, data minimization.
Masking of sensitive fragments in logs (PAN, IBAN, address) and automatic redaction of audio.
Retention periods and the right to deletion on request; a "do not save history" setting.
Age restrictions/parental controls (children's voices/commands).
Transparency: "I am recording this command to improve recognition. You can turn this off in settings."
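
Masking transcripts before they reach the logs can be sketched with two regular expressions. The patterns below are rough assumptions, not validated detectors: the PAN pattern matches any run of 13-19 digits with optional spaces or hyphens (so it can also catch other long numbers), and the IBAN pattern only checks the general shape, not the checksum. Addresses are not covered. The function name `mask_transcript` is hypothetical.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Two letters, two check digits, then 11-30 alphanumerics.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def _mask_pan(match: re.Match) -> str:
    digits = re.sub(r"\D", "", match.group())
    return f"[PAN ****{digits[-4:]}]"


def mask_transcript(text: str) -> str:
    """Replace card numbers and IBANs in a transcript before logging."""
    # IBANs first, so their digit runs are not mistaken for card numbers.
    text = IBAN_RE.sub("[IBAN]", text)
    return PAN_RE.sub(_mask_pan, text)
```

Short numbers such as amounts pass through unchanged, so the log remains useful for debugging the dialogue.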

13) Assistant persona

Name/persona: a short biography, area of competence, what it can and cannot do.
Tone by situation: normal (friendly), critical (neutral), educational (supportive).
Boundaries: "I don't give financial advice, but I can show you help."

14) VUI Quality Metrics

Intent recognition rate.
Slot fill rate and average turns to fill.
ASR WER/CER (word/character error rate).
Task success/completion rate and time to complete.
Escalation rate (to operator/screen).
Barge-in usage and p95 latency.
User satisfaction/CSAT after the scenario.
Abandonment by step.
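
Two of these metrics are easy to compute by hand. The sketch below calculates WER as the word-level Levenshtein distance divided by the reference length, and p95 latency with the nearest-rank method. The function names are illustrative, and production tooling usually also normalizes case and punctuation before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # Word-level edit distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def p95(latencies_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integers
    return ordered[rank - 1]
```

CER follows the same recipe with characters in place of words.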

15) Voice testing and QA

Test phrase sets: synonyms, colloquial forms, accents, speech errors.
Ambient noise: street/car/kitchen, different microphones.
Dialogue replay: replayable scripts, a golden set for regression.
Wizard-of-Oz testing in the early stages.
Legal scenarios: how the assistant responds to potentially dangerous requests.
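
A golden-set regression check can be as small as a list of (phrase, expected intent) pairs replayed against the classifier after every change. In the sketch below, `toy_classifier` is a keyword stand-in for the real NLU, and the phrases and the intent name `CheckBalance` are hypothetical examples.

```python
from typing import Callable

# (phrase, expected intent) pairs; a real set would be far larger.
GOLDEN_SET = [
    ("top up my account by 500 hryvnias", "MakeDeposit"),
    ("deposit two hundred", "MakeDeposit"),
    ("what is my balance", "CheckBalance"),
    ("show balance", "CheckBalance"),
]


def toy_classifier(phrase: str) -> str:
    """Keyword stand-in for the real NLU model."""
    words = phrase.lower().split()
    if "balance" in words:
        return "CheckBalance"
    if "deposit" in words or "top" in words:
        return "MakeDeposit"
    return "Unknown"


def run_regression(
    classify: Callable[[str], str], golden: list[tuple[str, str]]
) -> list[tuple[str, str, str]]:
    """Return (phrase, expected, actual) for every mismatch."""
    failures = []
    for phrase, expected in golden:
        actual = classify(phrase)
        if actual != expected:
            failures.append((phrase, expected, actual))
    return failures
```

An empty result means the change did not break any phrase in the golden set; a non-empty one lists exactly which phrasings regressed.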

16) Product integration (iGaming cases)

Balance/deposit/withdrawal: "What is my balance?", "Deposit 200 UAH...", "Withdrawal status."
Bonuses/missions: "What bonuses are available?", "Activate weekly cashback."
Responsible gaming: "Set a deposit limit of 1000 UAH per week."
System status: "Is any maintenance in progress right now?"

17) Anti-patterns

Long assistant monologues with no way to interrupt.
Implicit confirmation of monetary transactions.
A bare "I didn't understand" with no suggested options.
Excessive sounds/jingles that get in the way of comprehension.
Trying to solve by voice a task that needs detailed visual selection.

18) Prompt and response templates

Slot clarification (amount):
  • Assistant: "How much would you like to deposit?"
  • User: "Five hundred."
  • Assistant: "Deposit 500 hryvnias? Please confirm."
Confirmation of a sensitive action:
  • "Confirm a deposit of 500 hryvnias with card ...4581. Say "confirm" or "cancel"."
Misunderstanding + guiding hint:
  • "I didn't catch the payment method. I can offer Apple Pay, card, or crypto wallet. Which would you like?"
Escalation to screen:
  • "I've sent the available methods to your screen. Select one and say "done" to continue."

19) Examples of SSML patterns

Numbers/currency and pause:
xml
<speak>
  Your current balance is
  <say-as interpret-as="cardinal">1250</say-as>
  <sub alias="hryvnia">UAH</sub>.
  <break time="250ms"/>
  Shall we continue?
</speak>

Emphasis on an important word:
xml
<speak>
  <emphasis level="moderate">Caution</emphasis>: verification is required for withdrawal.
</speak>

Pronunciation of an abbreviation:
xml
<speak>
  Top up via <say-as interpret-as="characters">IBAN</say-as>?
</speak>

20) Checklists

Pre-Release Dialogue/Content

  • Each intent has a list of synonyms/phrase variants.
  • One clear question per required slot.
  • Sensitive actions require explicit confirmation.
  • There is a quick alternative via screen/operator.
  • Utterances are ≤ 2 sentences; longer ones offer "Continue?".

Engineering and quality

  • Barge-in is supported, and the dialogue resumes after an interruption.
  • p95 latency is within target; earcons play on delay.
  • SSML is configured: pauses, numbers, stress.
  • Logs are anonymized/masked; history management is available.
  • Multilingual support and local formats are tested.

A11y and safety

  • "Repeat/Speak slower/Louder" works.
  • Full personal/payment data is never read out.
  • Actions can be cancelled/rolled back by voice.
  • Age and regional restrictions are tested.

21) Dialog specification framework (template)

Purpose of the scenario: (for example, "Deposit in ≤ 90 seconds")
Intents and synonyms: a list of example phrases.
Slots: `amount` (req, confirm), `currency` (default=UAH), `method` (enum).
Confirmation rules: which values/thresholds to read back.
Error variants: ASR, NLU, service unavailable - texts + branches.
Multimodal outputs: which cards/screens we show.
Logs and privacy: what we mask and how, retention TTL.

Final cheat sheet

Start with intents/slots/confirmation rules, then write the texts.
Speak briefly; let users interrupt and cancel.
Configure SSML, local formats, and tone by context.
Keep privacy and logging under control.
Measure intent/slot/ASR metrics, task success, and latency.
Always offer a screen alternative and a path to a human.
