There is no easier or faster way to create more accurate automated subtitles than contextual machine-translated subtitles. In this article, we'll explore what contextual machine-translated subtitles are and how they can benefit media companies and content creators. We'll also walk through the process of creating contextually accurate automated subtitles and review the accuracy of the AI translation provided by SyncWords.
Accuracy problems of machine-translated subtitles
There is no doubt that machine translation is rapidly becoming more accurate thanks to deep learning, but it is still not perfect. The accuracy of machine-translated subtitles usually varies from 50% to 80%, depending on the accuracy of the input transcript. Accuracy also suffers because subtitles are typically translated segment by segment from the source-language timed caption file, and the translated segments must then be aligned to the timeline. With contextual machine translation (CMT), we address these problems with an all-in-one solution.
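The article does not say how the 50% to 80% figure is measured. One common convention for transcripts and subtitles is word-level accuracy, defined as 1 - WER (word error rate) against a human reference. The sketch below assumes that convention; the function names are illustrative, and real evaluations usually also normalize case and punctuation before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy as 1 - WER, clamped at 0 (WER can exceed 1)."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_accuracy("the cat sat on the mat", "the cat sit on mat"))
```

Under this convention, the 99.99% target mentioned later would mean roughly one word error per 10,000 reference words.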
What are contextual machine-translated subtitles?
Contextual machine-translated subtitles are created using machine learning, natural language processing (NLP), and alignment algorithms, which allows them to be translated from one language to another more accurately than with typical automated machine translation solutions. Instead of translating sentence by sentence, SyncWords takes into account the context of the whole media file, which produces more accurate, higher-quality subtitles than traditional sentence-by-sentence methods.
How does it work?
NOTE: Input quality determines output quality. To get the most accurate translation, make sure the audio is clearly audible and the source-language transcript or caption file is correct.
Step 1: Create a project
Press “Create Project” on your dashboard, then press “Order Translations”.
You need an accurate transcript or caption file in the source language of the media. If you don't have a transcript, our professional transcriptionists can create one, or you can use automatic speech recognition (ASR). Note that ASR may produce a lower-quality transcript if speakers have heavy accents, the subject matter is highly technical, or the audio quality is poor.
Step 2: Select translation when creating the project, or run a translation in an existing project.
Once the media file and transcript are in place, our system translates the transcript using the best available neural engine for that specific language pair. When the translation is ready, our proprietary alignment automation segments the text and applies timing based on the source speech and the subtitle properties you set for the file.
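SyncWords' alignment automation is proprietary and not described here, but the general idea of "segment the text, then apply timing" can be illustrated with a simple sketch. It assumes two common subtitle properties (a maximum of 42 characters per line and 2 lines per cue, chosen here as typical defaults) and spreads a known speech interval across the resulting cues in proportion to their character counts. `Cue`, `wrap_lines`, and `segment_and_time` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # one or more subtitle lines joined with "\n"


def wrap_lines(text: str, max_chars: int) -> list[str]:
    """Greedily pack words into lines of at most max_chars characters.
    A single word longer than max_chars is kept whole on its own line."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def segment_and_time(text: str, start: float, end: float,
                     max_chars: int = 42, max_lines: int = 2) -> list[Cue]:
    """Split translated text into cues of at most max_lines lines and divide
    the speech interval [start, end] among them by character count."""
    lines = wrap_lines(text, max_chars)
    groups = [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
    weights = [sum(len(line) for line in group) for group in groups]
    total = sum(weights)
    cues: list[Cue] = []
    elapsed = 0
    for group, weight in zip(groups, weights):
        # Cumulative timing keeps cues contiguous: each cue starts where the last ended.
        cue_start = start + (end - start) * elapsed / total
        elapsed += weight
        cue_end = start + (end - start) * elapsed / total
        cues.append(Cue(cue_start, cue_end, "\n".join(group)))
    return cues


text = "Capturing the context is much easier when you translate the whole document"
for cue in segment_and_time(text, 0.0, 6.0, max_chars=20):
    print(f"{cue.start:.2f}-{cue.end:.2f}: {cue.text!r}")
```

Character-proportional timing is only a rough stand-in for real speech timing; a production aligner would also use word-level timestamps from the audio.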
You can now edit your subtitles or export them in any format.
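To show what an export involves, here is a minimal sketch that writes cues in SubRip (SRT) format: a numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the subtitle text, and a blank line between blocks. The cue representation as `(start_seconds, end_seconds, text)` tuples and the function names are assumptions for illustration, not part of the SyncWords product.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3723.5 -> '01:02:03,500'."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cue blocks.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```

WebVTT is similar but starts with a `WEBVTT` header line and uses a period instead of a comma before the milliseconds.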
The SyncWords API is available by request only. To get access, contact us.
How do you improve the accuracy of foreign subtitles using contextual machine learning?
Classic machine translation often mistranslates words or phrases and may fail to capture the context of the whole document, because it translates sentence by sentence or segment by segment.
Capturing the context is much easier when you translate the whole document, but this creates another problem: the translation must be segmented correctly, and each segment must be assigned the corresponding start and end anchors in the media timeline. This is where our outstanding alignment algorithm does its magic.
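The article does not describe how SyncWords' alignment algorithm works. To make the problem concrete, here is a deliberately naive, length-proportional baseline: it takes timed source segments and a single whole-document translation, then cuts the translated words at the points whose cumulative character share best matches each source segment's share. `proportional_realign` is a hypothetical name, and this heuristic is an assumption for illustration. It can split a phrase in the wrong place, which is exactly why real aligners also draw on speech timing and sentence structure.

```python
from bisect import bisect_left


def proportional_realign(segments: list[tuple[float, float, str]],
                         translation: str) -> list[tuple[float, float, str]]:
    """Map a whole-document translation back onto timed source segments.

    segments: (start, end, source_text) in timeline order.
    Returns (start, end, translated_chunk) reusing each segment's timing.
    """
    words = translation.split()
    # prefix[k] = non-space characters in the first k translated words
    prefix = [0]
    for word in words:
        prefix.append(prefix[-1] + len(word))
    src_lengths = [len(text.replace(" ", "")) for _, _, text in segments]
    src_total = sum(src_lengths) or 1
    cuts, cumulative = [0], 0
    for length in src_lengths[:-1]:
        cumulative += length
        # Character position in the translation that matches this segment boundary.
        target = prefix[-1] * cumulative / src_total
        k = bisect_left(prefix, target)  # first word boundary at or after target
        if k > 0 and (k == len(prefix) or target - prefix[k - 1] <= prefix[k] - target):
            k -= 1                       # the previous word boundary is closer
        cuts.append(max(k, cuts[-1]))    # keep boundaries in timeline order
    cuts.append(len(words))
    return [(start, end, " ".join(words[cuts[i]:cuts[i + 1]]))
            for i, (start, end, _) in enumerate(segments)]


source = [(0.0, 2.0, "Hello everyone."), (2.0, 4.0, "How are you?")]
print(proportional_realign(source, "Hola a todos. ¿Cómo estás?"))
```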
Although contextual machine-translated subtitles are generally more accurate than those produced by traditional methods, errors can still occur. To reach 99.99% accuracy, have the subtitles proofread by a native speaker of the target language before they are published, so that any remaining errors are caught and corrected before they reach a wider audience.
Can I use translation glossaries along with CMT?
Yes, although glossaries may not currently be available for some language pairs. We’re working on extending glossary support to more language pairs.
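The article does not explain how glossaries are applied inside CMT. Whether or not a glossary is supported for your language pair, a simple post-translation check can flag terms that were not translated as agreed. The sketch below is a hypothetical QA step (the function name and the glossary format, a plain mapping from source term to required target term, are assumptions). It uses whole-word, case-insensitive matching, so terms that begin or end with punctuation would need a different pattern.

```python
import re


def glossary_violations(source: str, translation: str,
                        glossary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_term, target_term) pairs where the source term appears
    in the source text but the required target term is missing from the translation."""
    violations = []
    for src_term, tgt_term in glossary.items():
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
        in_target = re.search(rf"\b{re.escape(tgt_term)}\b", translation, re.IGNORECASE)
        if in_source and not in_target:
            violations.append((src_term, tgt_term))
    return violations


glossary = {"caption file": "archivo de subtítulos", "dashboard": "panel"}
print(glossary_violations(
    "Open the dashboard and upload the caption file.",
    "Abra el tablero y suba el archivo de subtítulos.",
    glossary,
))
```

Here the check flags "dashboard", because the translation used "tablero" instead of the glossary term "panel".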