For event agencies, marketing/HR and the public sector

How to successfully subtitle videos and livestreams

Michael Westphal

Videos can be subtitled manually, which is laborious, or automatically. Automated subtitling uses artificial intelligence (AI) to convert speech into text. Depending on the system, AI is up to 70 times faster than a human. However, whereas a human transcriber corrects errors line by line while writing, AI-generated text may need a separate manual correction pass afterwards.
 
How automatic subtitling works
 
Automatic subtitling relies on a process called speech-to-text. In simplified terms, it works like this: the spoken audio is analyzed and a kind of "audio footprint" is created for it. Each such pattern is assigned to exactly one word, and the many stored patterns together make up the system's vocabulary. During recognition, the system checks which of the audio patterns it hears match those in its vocabulary. It also takes into account that certain words tend to be accompanied by certain other words: an article is often followed by a noun, whereas an article directly followed by another article is almost certainly wrong. To achieve optimal results, these systems must be trained, which is done by feeding them corrected material.
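The interplay of pattern matching and word-order logic described above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the candidate words, the acoustic match scores, and the word-pair probabilities are invented, and real speech-to-text systems use far larger statistical or neural models. The sketch only shows why taking word pairs into account can rescue an ambiguous audio pattern.

```python
# Toy illustration only: all words, scores, and probabilities are invented.

# For each spoken segment: hypothetical match scores against the vocabulary.
ACOUSTIC_CANDIDATES = [
    {"the": 0.9, "a": 0.1},
    {"the": 0.55, "tree": 0.45},  # the audio pattern is ambiguous here
    {"falls": 0.8, "false": 0.2},
]

# Hypothetical probabilities that the second word follows the first one.
# "<s>" marks the start of the sentence.
WORD_PAIRS = {
    ("<s>", "the"): 0.6, ("<s>", "a"): 0.4,
    ("the", "tree"): 0.5, ("a", "tree"): 0.5,
    ("the", "the"): 0.001, ("a", "the"): 0.001,  # article after article
    ("tree", "falls"): 0.4, ("tree", "false"): 0.01,
    ("the", "falls"): 0.05, ("the", "false"): 0.05,
}
DEFAULT_PAIR_PROBABILITY = 0.001  # used for pairs missing from WORD_PAIRS


def decode_acoustic_only(candidates):
    """Pick the best-matching word per segment, ignoring the neighbours."""
    return [max(segment, key=segment.get) for segment in candidates]


def decode_with_word_pairs(candidates):
    """Pick the word sequence with the best combined score (Viterbi search)."""
    # best[word] = (score, path) of the best sequence ending in that word
    best = {"<s>": (1.0, [])}
    for segment in candidates:
        next_best = {}
        for word, acoustic_score in segment.items():
            options = []
            for previous, (score, path) in best.items():
                pair = WORD_PAIRS.get((previous, word), DEFAULT_PAIR_PROBABILITY)
                options.append((score * acoustic_score * pair, path + [word]))
            next_best[word] = max(options, key=lambda option: option[0])
        best = next_best
    return max(best.values(), key=lambda option: option[0])[1]


print(decode_acoustic_only(ACOUSTIC_CANDIDATES))    # ['the', 'the', 'falls']
print(decode_with_word_pairs(ACOUSTIC_CANDIDATES))  # ['the', 'tree', 'falls']
```

Matching on the audio pattern alone produces "the the falls", because the second segment sounds slightly more like "the". Once the very low probability of an article following an article is factored in, the sequence "the tree falls" wins.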
 
The working steps
 
1. Upload the video, choose the language, and create the subtitles.
2. If necessary, edit the subtitle text in an online editor.
3. Embed the video with subtitles so that they are displayed immediately.
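What comes out of these steps is, in the end, a list of timed text segments. As a minimal sketch, assuming the recognition step delivers (start, end, text) tuples in seconds, the following code writes them in the WebVTT format that browsers accept for video subtitles. The function names and the sample segments are hypothetical and do not refer to any particular subtitling product.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to a WebVTT timestamp such as 00:01:02.500."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments) -> str:
    """Build WebVTT file content from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


# Hypothetical output of the recognition and editing steps.
segments = [
    (0.0, 2.5, "Welcome to the livestream."),
    (2.5, 5.0, "Subtitles help viewers follow along."),
]

with open("subtitles.vtt", "w", encoding="utf-8") as handle:
    handle.write(to_webvtt(segments))
```

In an HTML page, such a file can then be attached to a video with a track element of kind "subtitles"; marking that track as default is one way to have the subtitles displayed immediately.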
 
Target groups for subtitling
 
People with disabilities
Accessibility requirements entitle viewers with disabilities to subtitles. Public bodies are even obliged to offer their content in an accessible manner so that people with disabilities are not deprived of information. In public service media, much of the content is now subtitled.
 
Mobile users
Mobile users often watch on the go with the sound turned off. In that situation, subtitles also help hearing viewers follow the content.
 
Limitations of automatic subtitling
 
Music
Background noise interferes with speech recognition. With music in the background, recognition becomes almost impossible: the recognition rate of the speech-to-text conversion can drop to zero, and the quality of the subtitles suffers significantly.
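The article does not say how the recognition rate is measured. One common way to quantify it is the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognized text into a manually corrected reference, divided by the number of words in the reference. The sketch below is a minimal implementation under that assumption, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j]: edits needed to turn the first j recognized words
    # into the reference words processed so far (none yet).
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # only deletions when nothing was recognized
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# Clean audio: the recognized text matches the corrected reference.
print(word_error_rate("the tree falls", "the tree falls"))  # 0.0
# Loud background music: nothing usable was recognized.
print(word_error_rate("the tree falls", ""))                # 1.0
```

A WER of 0.0 means every word was recognized correctly. A WER of 1.0 corresponds to the case described above, in which the recognition rate has effectively dropped to zero.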
 
Dialects
For speech recognition, a dialect has to be regarded as a language in its own right, and even within a dialect there are numerous variants that make recognition difficult. Because creating language models is very time-consuming, the industry has concentrated on the world's most important languages; after all, there are several thousand languages worldwide. This is also one reason why English, understood by approx. 900 million people, is recognized better than German, understood by approx. 130 million.
 