Transcription format: Our guide

Using a suitable transcription format for your videos and audio will save you time and money.

This choice will allow your content to be completely readable, and you can easily export and share it with audiences and work or study teams.

That is why we invite you to keep reading to learn what a transcription format is, why it matters, the suggested document format for transcripts and the most commonly used file formats.

Why is it important to use the proper transcription format?

The transcription format you choose to use is essential because it ensures more readable and understandable documents:

  • Reflects professionalism

Everything you do and how you do it reflects you as a person, professional or student. The transcription format you use for podcasts, interviews, memos, and more shows your appreciation for the content.

  • Adds to your brand

A consistent look and feel is part of your identity; it shows that you do everything in the best possible way and with style.

  • Facilitates collaborative work

Interdependence is a reality; we always share information no matter where we are. This will be much easier if you use the right transcription format.

How to format a transcript?

Transcription converts a video or audio recording into text and has powerful benefits. Depending on the type of transcription, it can include all content and sound effects (verbatim transcription) or just the dialogue.

In general, transcripts are edited, which means omitting irrelevant information or modifying specific parts to make the content more readable.

Once the audio or video has been converted into text and the transcript has been edited, it is time to format the content. The goal is a final draft that is legible and that suits the intended use and the objectives you want to achieve.

The following are the most frequently used document and file formats.

Document formats

In the transcription document, it is recommended that the following elements be taken into consideration:

  • Font type and size: if transcribed in Word, use Times New Roman or Calibri, size 11 or 12 points.
  • Paragraph length and headings: always add a heading and subheadings to the transcript; if the content is long, divide it into paragraphs of 400 to 500 characters each and use indentation.
  • Voice tags: identify each speaker by name or function; if you do not have that information (or prefer anonymity), use generic tags such as "Speaker 1" or "S1". Write the tags in bold; if you are part of an organization, use its style.
  • Inaudible tags: insert a tag where the speaker's words cannot be heard, and a crosstalk tag where two speakers talk simultaneously. Example: [inaudible 03:19] and [crosstalk 03:30].
  • Timestamps: mark the point in the audio or video at which dialogue or a specific event occurs, for example the 03:19 in [inaudible 03:19].
  • Spelling: if writing in American English, use the American spelling, and if writing in British English, use the UK spelling.
  • Sounds: identify background sounds and non-speech elements in square brackets, for example, [laughter].
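
As an illustration of the tags and timestamps listed above, here is a minimal Python sketch. The function names are hypothetical, and the MM:SS layout (switching to HH:MM:SS after one hour) is an assumption based on the examples in this guide, not a fixed standard.

```python
def format_timestamp(seconds: int) -> str:
    """Render a position in seconds as MM:SS, or HH:MM:SS past one hour."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def tag(label: str, seconds: int) -> str:
    """Build a bracketed tag such as [inaudible 03:19]."""
    return f"[{label} {format_timestamp(seconds)}]"


def speaker_line(speaker: str, text: str) -> str:
    """Prefix a line of dialogue with its voice tag."""
    return f"{speaker}: {text}"


print(speaker_line("Speaker 1", "Thanks for joining " + tag("inaudible", 199)))
```

Generating tags from one helper keeps them consistent across a long transcript, which is the main point of choosing a format in the first place.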

File formats

We will review the most commonly used transcription file formats and their main features.

Plain text transcription formats

Among these formats are a series of simple versions of your audio or video, with no or very little structure, the best known of which are:

  • Plain text files (.txt)

TXT stands for text and is the most basic transcription file type; it is popular because of its compatibility with all word processing and editing applications. TXT files can store notes, source code, configuration data, or plain text data.

  • Microsoft Word Document (MS Word .docx)

Transcripts made in this file format are easy to format as you like and edit, and most people can open them using Word or other compatible software.

  • PDF

PDF files work well because they are widely compatible and can be opened on all devices. However, be aware that they are difficult files to manipulate or edit.

  • RTF

It stands for Rich Text Format, another file format compatible with word processing applications.

Timestamped transcription formats

These formats allow transcriptions to include time markers, so it is possible to specify what was said at each moment of the audio or video. Timestamps can be expressed in different units, including seconds, minutes or hours.

The following are the most commonly used file formats.

  • Microsoft Word Document (.docx)

You can also use Microsoft Word to produce your timestamped transcripts. As mentioned, it is an extremely easy file type to use and edit.


  • SMPTE

In addition to the timestamps, SMPTE files include the frame rate and frame-accurate labelling. The timestamps in this file format are displayed as hour:minute:second:frame.

The frame rate refers to how many frames appear per second, and the frame label refers to the frame number within that second.

The advantage of using SMPTE is how easily you can synchronize your transcript with the video and audio content.
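
To make the hour:minute:second:frame layout concrete, the following Python sketch converts between a running frame count and a timecode string. The function names and the default of 25 frames per second are assumptions for illustration; the sketch covers only whole-number frame rates and does not handle drop-frame counting.

```python
def frames_to_timecode(total_frames: int, fps: int = 25) -> str:
    """Convert a frame count to HH:MM:SS:FF (non-drop-frame, integer fps)."""
    frames = total_frames % fps          # frame number within the second
    total_seconds = total_frames // fps  # whole seconds elapsed
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode: str, fps: int = 25) -> int:
    """Convert HH:MM:SS:FF back to a frame count (inverse of the above)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return (hours * 3600 + minutes * 60 + seconds) * fps + frames


print(frames_to_timecode(90137))  # 01:00:05:12 at 25 fps
```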

  • SRT

SRT (SubRip Subtitle) is another transcript file format. It includes timestamps for the beginning and end of each "subtitle", meaning each fragment of written text that should appear on the screen during a specific interval.

SRT files are a widespread format for video transcripts. Note that when using them, you will not be able to customize the color or format of the subtitles.
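
As a rough sketch of what an SRT file contains, the Python below writes numbered cues with start and end timestamps in the HH:MM:SS,mmm layout that SRT uses. The function names and the (start, end, text) tuple layout are hypothetical choices for this example.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates one cue from the next.
    return "\n".join(blocks)


print(to_srt([(0.0, 2.0, "Hello"), (2.0, 4.5, "Welcome to the show")]))
```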

  • VTT

Also known as WebVTT, it is widely used for video subtitles because it includes metadata about your video. Additionally, you can choose how to format the subtitles by choosing font and text color.
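
A WebVTT file looks much like an SRT file, with a WEBVTT header line and a dot instead of a comma before the milliseconds. The sketch below, with a hypothetical function name, converts simple SRT text on that assumption; it does not add styling or handle formatting tags inside the cue text.

```python
import re

# Matches an SRT timestamp such as 00:00:02,000 and captures both halves.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT text."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten; cue text is left untouched.
            line = SRT_TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


print(srt_to_vtt("1\n00:00:00,000 --> 00:00:02,000\nHello, world\n"))
```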

  • HTML

This format is handy for uploading the transcript to a website because it is essentially an HTML version of your content. This means it is optimized for easy opening and accessing on the web. You can also find HTML transcripts perfectly adapted to the screen reader format.

  • TTML (or .xml)

It is a standard timestamped format for subtitles. It is supported in many web applications, such as YouTube, and some streaming services, including Netflix.

  • JS and JSON

These formats store timestamps at an even finer level of detail, are widely used data formats and are compatible with most applications. JS and JSON are excellent choices if you want versatility and the ability to use your transcript in many different applications.

Each word is assigned a timestamp, which makes it possible to align video, audio and text precisely. However, this type of file format is not as suitable for downloading.
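
To show what word-level timestamps can look like, here is a small Python sketch. The field names ("word", "start", "end") and the helper function are assumptions for illustration; each transcription tool defines its own JSON layout.

```python
import json

# Hypothetical word-level transcript: one entry per word, times in seconds.
words = [
    {"word": "Hello", "start": 0.00, "end": 0.42},
    {"word": "everyone", "start": 0.42, "end": 1.10},
]
payload = json.dumps({"words": words}, indent=2)


def word_at(transcript: dict, position: float):
    """Return the word being spoken at a playback position, or None."""
    for entry in transcript["words"]:
        if entry["start"] <= position < entry["end"]:
            return entry["word"]
    return None


print(word_at(json.loads(payload), 0.5))  # everyone
```

A lookup like this is what lets a player highlight the current word while the audio or video plays.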

  • CSV

With CSV, you format your transcript in a table, divide the information into separate columns, and present the transcript line by line.

Few transcription tools export CSV directly, so you may need to convert the transcript from a plain text format.
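
One possible way to do that conversion is sketched below in Python. It assumes each line of the plain text transcript looks like "[00:05] Speaker 1: text"; that layout, the column names and the function name are assumptions for this example, and lines that do not match are skipped.

```python
import csv
import io
import re

# Expected line layout (an assumption): [MM:SS] Speaker: text
LINE = re.compile(r"^\[(\d{2}:\d{2})\]\s*([^:]+):\s*(.*)$")


def transcript_to_csv(text: str) -> str:
    """Turn '[MM:SS] Speaker: text' lines into CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "speaker", "text"])
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            writer.writerow(match.groups())
    return buffer.getvalue()


sample = "[00:05] Speaker 1: Hello, everyone.\n[00:09] Speaker 2: Thanks for having me."
print(transcript_to_csv(sample))
```

The csv module adds quotes around any field that contains a comma, so the dialogue stays in a single column.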


We have presented you with our transcription format guide. Choosing the most suitable one depends on how and where you want to use the information. Ideally, you should have transcription software that offers a wide range of formats.

At ScriptMe, we offer a quality transcription service that is easy to use, fast and affordable. Regarding file formats, we suggest exporting and sharing your transcripts in Avid, Adobe, Resolve, Office Word, SRT, VTT and EBU-STL for subtitles.

Emil Nikkhah
Hi, I'm Emil Nikkhah.
For over 25 years, I have successfully built several television post-production companies.
Frustrated by the high cost and slowness of manual transcription and subtitling in this industry, we decided to create ScriptMe, a powerful software for automatic transcription and subtitling.