Tõlk.fm documentation

Help Center

In-depth guides for event organisers — from setup and audio to billing, privacy, and troubleshooting. For quick answers, see the FAQ.

General support: hello@tolk.fm

Billing questions: billing@tolk.fm

Privacy & legal: privacy@tolk.fm

First steps

Getting started

Everything you need to run your first event — account creation, event setup, and distributing your QR code.

Create an account

Go to tolk.fm and click Sign up.
Enter your email address. You will receive a magic link — no password required.
Click the link in the email to confirm your address and land in the dashboard.
Optional: add your organisation name in Settings → Profile so it appears in event pages and billing documents.

You must be at least 18 years old to create an account or purchase services.

Create your first event

In the dashboard click New event.
Enter an event title and choose an event mode: Live translation, Scheduled playback, or Hybrid. Live translation is the right choice for most first events.
Select your source language — the language the speaker will use.
Choose up to 8 target languages. Each active language channel is billed independently.
Click Create event. You are taken to the event control room.
From the control room, copy the join link or download the QR code to share with your audience.

You can edit most settings — title, languages, engine, speaker mode — after creation and even while the event is running.

Run a test event

Always rehearse before your real event. Test runs consume credits at the same rate as production events, but are essential for catching audio and connectivity issues.

Create a test event with the same language channels you plan to use.
Set up the same microphone, mixer, or audio interface you will use at the venue.
Start the event and speak naturally for a few minutes while a colleague listens on the join page.
Check latency (a few seconds is normal), translation accuracy, and audio quality across all target languages.
Switch between OpenAI and Gemini to compare which engine suits your content.
End the event when done. The test cost will be deducted from your wallet.

Microphone placement and room acoustics strongly affect translation quality. Test in the actual venue whenever possible.

Create a custom join URL

A custom join slug replaces the default code-based URL with a human-readable path:

tolk.fm/my-conference-2026

Open the event from the dashboard and go to Event settings.
Find the Custom join URL field and enter your preferred slug. Slugs must be unique — you will see an error if the slug is already taken.
Save. The new URL becomes active immediately and the old code-based URL continues to work.

Slugs can contain letters, numbers, and hyphens. Spaces and special characters are not allowed.
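The slug rules above can be checked before you submit the form. This is a minimal sketch that assumes only the rules stated here; the function name is hypothetical, "letters" is assumed to mean ASCII letters, and Tõlk.fm may apply further restrictions (length limits, reserved words) that are not documented on this page.

```python
import re

# Assumption: ASCII letters, digits, and hyphens only, at least one character.
_SLUG_PATTERN = re.compile(r"[A-Za-z0-9-]+")

def is_valid_slug(slug: str) -> bool:
    """Return True if the slug uses only letters, numbers, and hyphens."""
    return _SLUG_PATTERN.fullmatch(slug) is not None

print(is_valid_slug("my-conference-2026"))  # True
print(is_valid_slug("my conference"))       # False
```

A passing check does not guarantee the slug is available; uniqueness is only known once you save.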

Event setup

Event modes

Tõlk.fm supports three event modes and three audience capture modes. Learn which combination fits your event.

Live translation mode

In live translation mode, source audio is captured from a microphone, mixer, or audio interface and streamed to the AI translation engine in real time. Translated audio is broadcast to listeners within a few seconds.

Best for: conferences, keynotes, panels, lectures, and any event where content is not known in advance.
Latency: a few seconds is normal. Let your audience know in advance.
Languages: up to 8 target languages simultaneously. Each runs as a separate AI translation session.
Engine: choose OpenAI (13 languages, synthetic voice) or Gemini (81 languages, preserves speaker voice). You can switch live.

Scheduled playback mode

In scheduled playback mode, you upload a pre-recorded audio or video file before the event. Tõlk.fm prepares translated audio tracks for all target languages. At the scheduled start time, each listener hears the translated version of the recording in perfect sync with everyone else on the same channel.

Best for: pre-recorded keynotes, video screenings, or presentations where content is finalised in advance.
No latency: prepared translations are synced exactly to the recording timeline.
Upload: upload your source file from the event control room. Translation preparation begins automatically.
Start time: set the broadcast start time. Listeners who join early will wait on a countdown screen.
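One way to picture the start-time behaviour is as a position on the recording timeline derived from the clock. This is a minimal sketch, not the platform's implementation: the function name is hypothetical, and the idea that a late joiner begins partway through the recording is an assumption inferred from the sync description above.

```python
from datetime import datetime, timedelta, timezone

def playback_state(start: datetime, now: datetime) -> tuple[str, float]:
    """Return ("countdown", seconds_until_start) or ("playing", seconds_into_recording)."""
    offset = (now - start).total_seconds()
    if offset < 0:
        # Listener joined early: show the countdown screen.
        return ("countdown", -offset)
    # Assumption: everyone on the channel shares the same timeline position.
    return ("playing", offset)

start = datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc)
print(playback_state(start, start - timedelta(minutes=2)))   # ('countdown', 120.0)
print(playback_state(start, start + timedelta(seconds=90)))  # ('playing', 90.0)
```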

Hybrid mode

Hybrid mode combines prepared translated tracks for pre-recorded segments with live AI translation for live segments such as Q&A, panel discussions, or speaker remarks.

Example: a pre-recorded keynote video plays with prepared translations, then a live Q&A session switches to real-time AI translation with open-mic audience participation.

Hybrid mode requires planning the handoff points between prepared and live segments. Test the transition with your team before the event.

Open floor and audience mic

Open floor is an audience capture mode that lets listeners request the microphone from their phone. One person speaks at a time; the organiser or speaker approves the request.

Best for: guided tours, workshops, Q&A sessions, and any event where the audience needs to speak.
How it works: a listener taps Request mic on the join page. The organiser sees the request in the control room and taps Approve. The listener's voice is then translated and broadcast like any other speaker.
One at a time: only one person holds the floor. The previous speaker is automatically released when a new request is approved.
Translation cost: open-mic speech adds to the per-channel-minute cost in the same way as main-speaker audio.
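The request-and-approve flow above behaves like a small state machine with one floor holder. The sketch below is purely illustrative; the class and method names are hypothetical and do not correspond to any Tõlk.fm API.

```python
from collections import deque
from typing import Optional

class OpenFloor:
    """Hypothetical model: pending mic requests and a single floor holder."""

    def __init__(self) -> None:
        self.pending = deque()
        self.speaker: Optional[str] = None

    def request_mic(self, listener: str) -> None:
        # A listener taps "Request mic"; duplicate requests are ignored.
        if listener != self.speaker and listener not in self.pending:
            self.pending.append(listener)

    def approve(self, listener: str) -> Optional[str]:
        """Give the floor to `listener`; return the speaker who was released, if any."""
        if listener not in self.pending:
            raise ValueError(f"{listener} has no pending request")
        self.pending.remove(listener)
        released, self.speaker = self.speaker, listener
        return released

floor = OpenFloor()
floor.request_mic("ana")
floor.request_mic("ben")
floor.approve("ana")
print(floor.approve("ben"))  # ana
print(floor.speaker)         # ben
```

Approving "ben" releases "ana" automatically, which mirrors the one-at-a-time rule.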

Speaker monitor mode

Speaker monitor mode gives the event speaker or moderator a dedicated interface showing live translations as they are generated. It also provides controls for the open-mic queue.

In event settings, enable Speaker monitor.
A separate join code is generated for the speaker monitor page. Share this with your speaker or moderator.
The speaker opens the monitor page on their device. They see captions and translations in real time.
Optionally, exclude one language from the speaker monitor to avoid hearing a translated version of their own voice (echo avoidance).

Add or remove languages during an event

You can change the active language channels without stopping the event.

Add a language: in the runtime control room, click Add channel and select the target language. The new channel starts within a few seconds.
Remove a language: click Stop next to an active channel. Billing for that channel stops immediately.
Switch engine: you can switch between OpenAI and Gemini for the entire event from the runtime controls. Channels restart automatically with the new engine.

Audio

Audio & sound setup

Good audio in means good translation out. How to set up your microphone, mixer, and venue connection.

Microphone and audio input setup

Tõlk.fm captures audio from whichever input device your browser has access to. For best results:

Use a directional microphone pointed at the speaker's mouth rather than an omnidirectional room microphone. This reduces background noise and reverberation.
Set gain correctly: aim for a signal that peaks around −12 dBFS. Clipping and a very low signal both hurt translation accuracy.
USB audio interfaces (e.g. Focusrite Scarlett, RODE AI-1) give you more control over gain and connect reliably to laptops and tablets.
Bluetooth microphones introduce additional latency and are not recommended. Use wired USB or 3.5 mm TRS connections where possible.
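If you record a short test clip, you can check the peak level against the gain target above. This sketch assumes the samples are already available as floats normalised to the range -1.0 to 1.0; the function name is hypothetical, and this is not how Tõlk.fm itself measures levels.

```python
import math

def peak_dbfs(samples: list[float]) -> float:
    """Peak level in dBFS for samples normalised to the range -1.0..1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    # 0 dBFS is full scale, so a peak of 0.25 sits roughly 12 dB below it.
    return 20 * math.log10(peak)

print(round(peak_dbfs([0.1, -0.25, 0.2]), 1))  # -12.0
```

A reading near 0 dBFS means the signal is close to clipping; a reading far below the target suggests the gain is too low.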

Using a mixer or PA direct feed

Connecting a direct feed from the venue mixing desk produces the cleanest possible audio and is strongly recommended for large venues.

Ask the venue sound engineer for an auxiliary send or direct output from the mixing desk.
Connect this output to your laptop or tablet via a USB audio interface or a 3.5 mm TRS adapter.
In the Tõlk.fm control room, select the correct audio input device before starting the event.
Ask the sound engineer to set the send level at a consistent volume — avoid riding gain during the event, which causes translation quality to vary.

A PA feed from a live desk typically sounds better than a lavalier microphone placed near the event speaker, especially in large or reverberant spaces.

Understanding latency

Live AI translation introduces a delay of a few seconds between the speaker and the listener. This is inherent to the technology: the model must receive a phrase before it can translate it.

Latency	What it means
1–4 seconds	Normal. Expected for real-time AI translation.
5–8 seconds	Acceptable. Check network conditions if this is consistent.
10–15 seconds	High. Investigate network, provider load, or audio issues.
15+ seconds	Problematic. Restart the channel and check connectivity.

Tell your audience at the start of the event that translation trails the speaker slightly. This sets expectations and prevents confusion.
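The bands in the latency table can be turned into a simple check for a measured delay. This is a minimal sketch with a hypothetical function name. The table leaves gaps between bands (for example, 9 seconds), so the sketch assumes such values fall into the more severe band, and it treats exactly 15 seconds as "high".

```python
def classify_latency(seconds: float) -> str:
    """Map a measured delay to the bands in the latency table."""
    # Assumption: values between two listed bands go to the more severe one.
    if seconds <= 4:
        return "normal"
    if seconds <= 8:
        return "acceptable"
    if seconds <= 15:
        return "high"
    return "problematic"

for s in (3, 6, 12, 20):
    print(s, classify_latency(s))
```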

Listener headphone guidance

Headphones are strongly recommended for every listener. Without them:

The translated audio from a listener's device can be picked up by the event microphone, creating a feedback loop that degrades translation quality for everyone in the room.
Ambient translated audio from multiple devices creates distracting noise for other attendees.

Practical options: ask attendees to bring earphones; provide disposable earbuds at the registration desk; or use an induction loop system for accessibility-compliant deployments.

Network and bandwidth requirements

Tõlk.fm requires a stable, low-latency internet connection. Bandwidth requirements are modest, but connection stability matters more than raw speed.

Connection	Requirement
Organiser device (audio capture)	Stable uplink ≥1 Mbps, latency <100 ms preferred. Wired Ethernet strongly recommended for the organiser side.
Listener devices (audio playback)	Any Wi-Fi or 4G/5G mobile connection. Minimum ~200 kbps per channel.
Venue Wi-Fi	Separate SSID for the event from the public guest network. Ensure sufficient access points for the expected audience density.

Avoid relying on shared public Wi-Fi for the organiser's audio capture device. If the venue connection drops, translation stops for all listeners.
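For venue planning, the per-listener figure in the table gives a rough lower bound on the aggregate downlink if every listener is on venue Wi-Fi. This is a back-of-the-envelope sketch with a hypothetical function name; it assumes one channel per listener and adds no headroom, so real planning should allow extra capacity.

```python
LISTENER_KBPS = 200  # approximate minimum per listener channel, from the table above

def venue_downlink_mbps(listeners: int) -> float:
    """Rough minimum aggregate downlink for all listeners, in Mbps."""
    return listeners * LISTENER_KBPS / 1000

print(venue_downlink_mbps(300))  # 60.0
```

Listeners on 4G/5G mobile data do not count against the venue connection.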

Languages

Languages & translation engines

OpenAI and Gemini power Tõlk.fm. Choose the right engine for your language coverage and voice preferences.

Supported languages

The available target languages depend on the translation engine chosen for the event. You can run up to 8 target languages simultaneously per event.

OpenAI realtime translate — 13 target languages

Language	Native name
English	English
Spanish	Español
Portuguese	Português
French	Français
Japanese	日本語
Russian	Русский
Chinese	中文
German	Deutsch
Korean	한국어
Hindi	हिन्दी
Indonesian	Bahasa Indonesia
Vietnamese	Tiếng Việt
Italian	Italiano

Google Gemini live translate — 81 target languages (includes all OpenAI languages)

Language	Native name
Afrikaans	Afrikaans
Akan	Akan
Albanian	Shqip
Amharic	አማርኛ
Arabic	العربية
Armenian	Հայերեն
Azerbaijani	Azərbaycan dili
Basque	Euskara
Belarusian	Беларуская
Bengali	বাংলা
Bulgarian	Български
Burmese (Myanmar)	မြန်မာစာ
Catalan	Català
Chinese	中文
Chinese (Simplified)	简体中文
Chinese (Traditional)	繁體中文
Croatian	Hrvatski
Czech	Čeština
Danish	Dansk
Dutch	Nederlands
English	English
Estonian	Eesti
Filipino	Filipino
Finnish	Suomi
French	Français
Galician	Galego
Georgian	ქართული
German	Deutsch
Greek	Ελληνικά
Gujarati	ગુજરાતી
Hausa	Hausa
Hebrew	עברית
Hindi	हिन्दी
Hungarian	Magyar
Icelandic	Íslenska
Indonesian	Bahasa Indonesia
Italian	Italiano
Japanese	日本語
Javanese	Basa Jawa
Kannada	ಕನ್ನಡ
Kazakh	Қазақ тілі
Khmer	ខ្មែរ
Kinyarwanda	Ikinyarwanda
Korean	한국어
Lao	ລາວ
Latvian	Latviešu
Lithuanian	Lietuvių
Macedonian	Македонски
Malay	Bahasa Melayu
Malayalam	മലയാളം
Marathi	मराठी
Mongolian	Монгол
Nepali	नेपाली
Norwegian	Norsk
Norwegian (Bokmål)	Norsk bokmål
Persian	فارسی
Polish	Polski
Portuguese	Português
Portuguese (Brazil)	Português (Brasil)
Portuguese (Portugal)	Português (Portugal)
Punjabi	ਪੰਜਾਬੀ
Romanian	Română
Russian	Русский
Serbian	Српски
Sindhi	سنڌي
Sinhala	සිංහල
Slovak	Slovenčina
Slovenian	Slovenščina
Spanish	Español
Sundanese	Basa Sunda
Swahili	Kiswahili
Swedish	Svenska
Tamil	தமிழ்
Telugu	తెలుగు
Thai	ไทย
Turkish	Türkçe
Ukrainian	Українська
Urdu	اردو
Uzbek	Oʻzbekcha
Vietnamese	Tiếng Việt
Zulu	isiZulu

If a target language you need is not on the OpenAI list, switch to Gemini in event settings to access the full catalogue of 81 languages.
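The engine choice described above can be expressed as a small lookup. This is a minimal sketch with hypothetical names. It hard-codes the 13 OpenAI languages listed on this page and assumes that any other requested language is in the Gemini catalogue, which a real check would verify against the full list.

```python
OPENAI_LANGUAGES = {
    "English", "Spanish", "Portuguese", "French", "Japanese", "Russian",
    "Chinese", "German", "Korean", "Hindi", "Indonesian", "Vietnamese", "Italian",
}
MAX_TARGETS = 8  # target languages per event

def engines_for(targets: list[str]) -> list[str]:
    """Return the engines that cover every requested target language."""
    if not targets or len(targets) > MAX_TARGETS:
        raise ValueError("choose between 1 and 8 target languages")
    if set(targets) <= OPENAI_LANGUAGES:
        return ["OpenAI", "Gemini"]
    # Assumption: anything outside the OpenAI list is available on Gemini.
    return ["Gemini"]

print(engines_for(["Spanish", "German"]))    # ['OpenAI', 'Gemini']
print(engines_for(["Spanish", "Estonian"]))  # ['Gemini']
```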

OpenAI vs Gemini — which should I choose?

Feature	OpenAI	Gemini
Languages	13	81
Voice style	Synthetic voice (selectable), adjustable speed	Preserves speaker's original voice and intonation
Transcription	Source-language captions	Target-language captions
Latency	Typically low	Slightly higher but comparable
Best for	Common languages, clean synthetic output, voice customisation	Rare or regional languages, natural voice fidelity

You can switch engines live during an event. We recommend testing both before your event to see which produces better results for your speaker and content type.

Switching translation engines during an event

You can change the translation engine without ending the event. Active channels will restart with the new engine — listeners will experience a brief pause of a few seconds.

In the event control room, open Runtime settings.
Select the new engine (OpenAI or Gemini) from the provider dropdown.
Confirm. All active channels will stop and restart using the new engine.
Listeners are automatically reconnected to the restarted channels.

If one engine has a provider outage or unusually high latency, switching to the other is a fast recovery option.

Voice customisation (OpenAI engine)

When using the OpenAI engine, you can select from several synthetic voices and adjust the output speed.

Voice selection: choose from available voice presets in event settings. Preview each voice before the event using the voice preview feature.
Speed control: set playback speed from 0.25× (very slow) to 1.5× (fast). A slower speed gives listeners more time to follow along; a faster speed reduces the perceived lag between speaker and translation.
Applies globally: the voice and speed settings apply to all language channels running on the OpenAI engine.

The Gemini engine does not have a synthetic voice selector — it preserves the speaker's original voice characteristics instead.

Translation accuracy and limitations

AI translation quality depends on audio clarity, speaking pace, accents, technical vocabulary, and background noise. For most conference and presentation contexts, the output is clear and understandable. Factors that reduce accuracy:

Multiple speakers talking simultaneously
Heavy accent or dialect not well-represented in the model's training data
Fast speech with little pausing
Technical, medical, or legal jargon
Background music, applause, or crowd noise in the audio feed

Do not use Tõlk.fm as the sole source of translation for medical, legal, financial, safety, emergency, or other high-stakes decisions. Always provide human interpreter alternatives where accuracy is critical or legally required.

Billing

Billing & credits

Tõlk.fm uses a prepaid USD wallet. Top up before events, monitor usage live, and review charges in the Billing page.

How billing works

Tõlk.fm charges $0.09 per minute per active translation channel. A channel is one target language running during a live event. Charges are calculated at the end of each event.

Example	Calculation	Cost
30-min talk, 1 language	30 × 1 × $0.09 = $2.70	$15.00 (minimum applies)
60-min talk, 2 languages	60 × 2 × $0.09 = $10.80	$15.00 (minimum applies)
90-min talk, 3 languages	90 × 3 × $0.09	$24.30
Half-day (4 hrs), 4 languages	240 × 4 × $0.09	$86.40
Full day (8 hrs), 5 languages	480 × 5 × $0.09	$216.00

There is a $15.00 minimum charge per event, regardless of actual usage. If an event is very short or uses few channels, the minimum applies.

Top up your wallet

Go to the Billing page in the dashboard and choose a plan or enter a custom amount. Payments are processed by Stripe. Funds appear in your wallet immediately after payment.

Plan	You pay	Wallet receives	Bonus
Starter	$12	$12	—
Pro	$108	$120	+$12
Scale	$480	$600	+$120
Custom	Any amount ≥$5	Same amount	—

The Pro and Scale plans include a volume bonus. For frequent events, these plans reduce your effective cost per channel-minute.
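To see what the volume bonus means in practice, divide what you pay by what the wallet receives and apply that ratio to the $0.09 rate. The sketch below uses only the figures from the plan table; the function and variable names are hypothetical.

```python
from decimal import Decimal

RATE = Decimal("0.09")  # USD per channel-minute
PLANS = {  # plan: (you pay, wallet receives), from the table above
    "Starter": (Decimal("12"), Decimal("12")),
    "Pro": (Decimal("108"), Decimal("120")),
    "Scale": (Decimal("480"), Decimal("600")),
}

def effective_rate(paid: Decimal, credited: Decimal) -> Decimal:
    """Real cost per channel-minute once the bonus credit is counted."""
    return (RATE * paid / credited).quantize(Decimal("0.001"))

for name, (paid, credited) in PLANS.items():
    print(name, effective_rate(paid, credited))
# Starter 0.090
# Pro 0.081
# Scale 0.072
```

The $15.00 per-event minimum is still deducted from the wallet at face value, so the saving applies to wallet credit overall rather than changing the minimum itself.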

Estimate your event cost

Use this formula to estimate spend before an event:

Cost = max( duration_minutes × languages × $0.09, $15.00 )

Top your wallet up by at least the estimated cost before starting. The event will stop translating if your wallet balance reaches zero mid-event.

For a 2-hour event in 3 languages: 120 × 3 × $0.09 = $32.40
For a full-day summit (7 hours) in 6 languages: 420 × 6 × $0.09 = $226.80
For a 20-minute demo in 1 language: minimum applies = $15.00
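The formula and examples above translate directly into a small helper. This sketch assumes every language channel runs for the full event duration; the function name is hypothetical, and `Decimal` is used to keep the currency arithmetic exact.

```python
from decimal import Decimal

RATE = Decimal("0.09")      # USD per minute per active channel
MINIMUM = Decimal("15.00")  # minimum charge per event

def estimate_cost(duration_minutes: int, languages: int) -> Decimal:
    """Apply Cost = max(duration_minutes × languages × $0.09, $15.00)."""
    usage = RATE * duration_minutes * languages
    return max(usage, MINIMUM).quantize(Decimal("0.01"))

print(estimate_cost(120, 3))  # 32.40
print(estimate_cost(420, 6))  # 226.80
print(estimate_cost(20, 1))   # 15.00
```

If you add or remove channels mid-event, estimate each channel's minutes separately and sum them before applying the minimum.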

View transaction history

The Billing page shows:

Your current wallet balance
All top-up transactions with date, amount paid, and amount credited
Per-event charges with event name, date, duration, languages, and cost
Bonus credits received from Pro and Scale plan top-ups

To download a receipt for any top-up, click the transaction row and use the Stripe receipt link.

Refunds and billing disputes

Refunds are provided where required by law or approved by us on a case-by-case basis. Because a portion of costs is passed directly to AI providers (OpenAI, Google), fees already consumed by those services may be non-refundable.

EU consumers retain a 14-day withdrawal right for distance contracts, though this may not apply once digital services have been fully performed.
If you believe you were charged incorrectly, email billing@tolk.fm with your event ID and a description of the issue.
Disputes related to Stripe card charges should be raised with Stripe first, as they handle payment processing.

Privacy

Privacy & security

How Tõlk.fm handles event audio, organiser data, listener data, and GDPR obligations.

How long is event audio retained?

Data	Default retention
Live audio streams	Processed transiently for real-time translation. Not stored by default.
Uploaded media (scheduled playback)	Retained until you delete it or close your workspace.
Generated translated audio (prepared)	Retained until you delete it or close your workspace.
Captions and transcripts	Retained if the feature is enabled; otherwise not stored.
Event metadata (title, languages, timestamps)	Retained for billing, troubleshooting, and event history.
Usage logs and billing records	Retained for the statutory accounting period required under Estonian and applicable law.

If you need data deleted sooner, contact privacy@tolk.fm.

Data shared with OpenAI and Google

Audio and metadata are sent to the AI provider you select for translation. We use business-tier APIs with data-processing agreements — your data is not used to train their models.

Provider	What is shared	Data-processing posture
OpenAI	Audio stream, language settings, session metadata, generated translations.	Processed under OpenAI API data processing terms. Prompts and outputs are not used to train OpenAI models when using the API.
Google Gemini	Audio stream, language settings, session metadata, generated translations.	Paid Gemini API terms state prompts and responses are not used to improve Google products. Processed under Google's data processing addendum.

For full details, see the Privacy Policy — AI providers section.

What data do we collect from listeners?

Listeners join without creating an account or logging in. The data collected is minimal and limited to what is needed to provide the stream.

Join code and event identifier
Selected language channel
Playback session status (connected, disconnected)
Device and browser type (user agent)
IP address and approximate location from network data
Connection logs for troubleshooting

Listener data is not sold or shared with third parties beyond the infrastructure providers (Supabase, Vercel) that host the service.

Organiser GDPR obligations

When you use Tõlk.fm at your event, you are responsible for the lawful processing of your attendees' personal data. Key obligations:

Inform attendees that audio is streamed to AI providers for real-time translation — verbally at the start of the event and in writing in the event programme or venue signage.
Obtain required consents or provide a legitimate-interest assessment for recording or streaming audience audio, particularly for open-mic sessions.
Do not capture children's voices without documented parental consent and local safeguarding compliance.
If your event is in a jurisdiction that requires advance disclosure of AI-generated content, ensure notices are in place before the event starts.
Provide human interpreter alternatives where required by local accessibility, employment, or venue regulations.

Local laws vary significantly. Consult legal counsel for events in regulated industries or jurisdictions with strict recording or AI-disclosure laws.

Support

Troubleshooting

Diagnose and fix the most common issues with live translation, audio, listener access, and billing.

Translation stopped or froze mid-event

Check your internet connection and confirm the organiser device is still online.
Check your wallet balance on the Billing page. Translation stops if the balance reaches zero.
In the control room, look at the event error log at the bottom of the runtime view for any error messages.
Try stopping and restarting the affected language channel.
If the issue persists, switch to the other translation engine (OpenAI ↔ Gemini) — provider-side outages are rare but do occur.
If nothing resolves it, email hello@tolk.fm with the event ID from the dashboard URL.

Choppy or cutting-out audio for listeners

Common causes and fixes:

Cause	Fix
Weak listener Wi-Fi or mobile data	Ask listeners to move closer to a Wi-Fi access point or switch to mobile data.
Organiser device on weak Wi-Fi	Switch the organiser device to wired Ethernet or a dedicated mobile hotspot.
Congested venue Wi-Fi	Use a separate SSID for the event, or ask the venue IT team to allocate a dedicated channel.
Too many devices per access point	Coordinate with the venue to ensure adequate Wi-Fi coverage density for the expected audience size.

Translation latency is very high

A few seconds of latency is normal. Latency above 10–15 seconds warrants investigation.

Network latency: check the organiser device's connection speed and ping. A round-trip time above 150 ms to the media worker increases translation latency.
AI provider load: rare, but if a provider is under heavy load globally, translation latency increases. Try switching engines.
Audio buffer issues: if the audio feed has long silences or is very slow, the model may buffer more aggressively. Ensure a consistent, well-paced audio input.

If latency is consistently above 10 seconds and switching engines does not help, contact support with the event ID.

Microphone not detected in browser

The browser requires explicit permission to access the microphone. If you accidentally denied it, reset it as follows:

Chrome: click the lock icon in the address bar → Site settings → Microphone → Allow → reload the page.
Firefox: click the lock icon → clear permissions → reload and re-grant when prompted.
Safari: open Settings → Websites → Microphone → find tolk.fm → set to Allow.
iOS Safari: open Settings → Safari → Camera & Microphone Access → enable for tolk.fm.

If using an external USB audio interface, make sure the correct device is selected in the audio input dropdown in the event control room, not the built-in microphone.

Event won't start — low wallet balance

Tõlk.fm requires a minimum wallet balance of $15.00 before a live event can be started. This covers the per-event minimum charge.

Go to the Billing page.
Top up your wallet with at least $15, or more if the expected event cost is higher.
Return to the event control room and try starting again.

Use the cost estimator to add enough funds to cover the full expected duration. The event will stop translating mid-session if the balance runs out.
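To work out how much to add before retrying, compare your balance with the larger of the expected event cost and the $15.00 minimum. This is a minimal sketch with a hypothetical function name; it assumes all channels run for the whole event.

```python
from decimal import Decimal

RATE = Decimal("0.09")      # USD per minute per active channel
MINIMUM = Decimal("15.00")  # balance required to start, and minimum charge

def top_up_needed(balance: Decimal, duration_minutes: int, languages: int) -> Decimal:
    """Amount to add so the wallet covers the estimated event cost."""
    estimate = max(RATE * duration_minutes * languages, MINIMUM)
    return max(estimate - balance, Decimal("0")).quantize(Decimal("0.01"))

print(top_up_needed(Decimal("12.00"), 120, 3))  # 20.40
print(top_up_needed(Decimal("50.00"), 20, 1))   # 0.00
```

Custom top-ups start at $5, so a shortfall smaller than that still means adding at least $5.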

Find your event ID for support requests

Every event has a unique ID visible in the URL of the event control room:

tolk.fm/events/[event-id]

Include this ID in any support email so the team can locate your event quickly in server logs and billing records.
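If you are copying the ID out of a long URL, it is the path segment right after /events/. The sketch below assumes exactly that URL shape; the function name is hypothetical and "abc123" is a made-up placeholder, not a real ID format.

```python
from urllib.parse import urlparse

def event_id_from_url(url: str) -> str:
    """Extract the event ID from a URL of the form tolk.fm/events/[event-id]."""
    # urlparse only separates host and path reliably when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "events":
        raise ValueError("not an event control-room URL")
    return parts[1]

print(event_id_from_url("https://tolk.fm/events/abc123"))  # abc123
```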
