WebVTT Files for Video Captions or Descriptions

Last Updated October 2025

WebVTT (Web Video Text Tracks) is a file format for displaying timed text, such as subtitles, captions and audio descriptions, for video content on the internet. A WebVTT file can be attached to a video in the HTML code or added to a Vimeo or YouTube video. This can be confusing, so please contact the College of Fine Arts web team at cofawebmaster@austin.utexas.edu if you have any questions.

On this page:

- File Kind & Audience
- Create Your File
- Components of the File
- Understanding Time Codes
- Validate Your File
- Examples
    - File for an Ambient Video Description
    - File for Video with Music
    - File for Video with Text Built-in as Overlay
    - File with Multiple Cues
    - File with Optional Header
    - File with Advanced Time Code Settings
- Styles & Questions
- Learn More

File Kind & Audience

The file type is WebVTT, but there are different kinds of WebVTT files for different scenarios. A single video can have more than one kind of WebVTT file.

The “Captions” kind is the most common. Captions are presented to visitors as text they can read. They benefit audience members who cannot hear the video, for example when it is muted or the visitor is deaf. They also benefit audience members who want the text translated into another language, and the Captions kind will also be read out loud for audience members who are unable to see the video.

Captions are usually timed to the dialog and sounds in the video. In addition to sounds, they can include other cues about what is happening, such as who is speaking, on-screen actions, background noises, whether a speaker or a sound is coming from off-screen, and whether text is overlaying the video. If there is any text in the video, it needs to be in the Captions so that it can be translated, or heard by someone who cannot see the video. The convention is to put brackets around any additional information that is not being said by the person on screen.
Videos with sound must also have a way for the sound to be turned on and off. If the video has any sound, even if it is just music, then a Captions or Subtitles track must be included to tell the visitor about the sound. A visitor will know there is an on/off button for sound, so they need to know what it does. At the very least, you must tell them there is music. If possible, it is best to share what you know about the music and why it is included.

The “Descriptions” kind of WebVTT file summarizes what is happening visually in the video. Descriptions are most common in videos that have no sound. They are heard by audience members who are unable to see the video, for example because they do not have a screen or they are blind. A Descriptions track is not needed if the Captions track already conveys enough.

The “Subtitles” kind is very similar to Captions, but usually includes only dialog and is primarily for translation. We prefer Captions.

“Chapters” can be used in some interfaces to navigate the video.

“Metadata” is used by programs and is not visible to the user.

The track kind is identified in the track tag of the video source code, for example:

    <video controls src="my.mp4">
      <track default kind="captions" src="my.vtt" />
    </video>

Create Your File

Writing the captions and descriptions is the hard part. Follow these simple steps to create the file. The file you create must be plain text with .vtt as the extension.

- Create the WebVTT file in your favorite text editor (for example, TextEdit on Mac).
- If you use TextEdit, make sure the document is plain text: go to Format -> Make Plain Text.
- To get started, copy one of the examples below into your file and update the time codes and text to be appropriate for your video.
- When you are finished, save your file with the extension .txt.
- Then change the file’s extension from .txt to .vtt.

Components of the File

Required first line of a WebVTT file

A basic WebVTT file needs to start with the string WEBVTT at the top of your document.
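As a rough illustration of the required first line, the check can be sketched in a few lines of Python. This is only a sketch: the function name has_webvtt_signature is made up for this page and is not part of any WebVTT tool. It assumes the file has already been read into a string, and it allows the first line to be WEBVTT alone or WEBVTT followed by a space or tab and further text.

```python
def has_webvtt_signature(text: str) -> bool:
    """Return True if the first line of the text is a WEBVTT signature line.

    Hypothetical helper for illustration only, not an official validator.
    """
    # Some editors add an invisible byte order mark at the start; ignore it.
    if text.startswith("\ufeff"):
        text = text[1:]
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    # Accept "WEBVTT" alone, or "WEBVTT" followed by a space or tab.
    return first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t"))


print(has_webvtt_signature("WEBVTT\n\n1\n00:00.000 --> 00:05.000\n[Music plays]"))
print(has_webvtt_signature("1\n00:00.000 --> 00:05.000\n[Music plays]"))
```

The first call prints True and the second prints False, because the second text starts directly with a cue instead of the WEBVTT line.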
Cues

Each WebVTT file contains one or more blocks called cues. A basic cue contains four things:

- A cue identifier to help you organize your captions, such as sequential numbers or letters.
- A time code for where in your video’s timeline your text is to be displayed.
- The actual text that is displayed on the screen. See Tips for Writing Captions & Audio Description.
- A blank line between each cue block.

NOTE that you can have multiple text cues in a single time range.

Optional header

To help you visually organize multiple WebVTT files, an optional header can be added to the right of the initial WEBVTT string, on the same line. The only character restriction is that your header cannot contain the following string of characters, which is reserved for cues only: -->

So, if you are going to use a header, it is best to start it with a single dash, as demonstrated in the example with the optional header below.

Understanding Time Codes

Time codes are written as hours, minutes, seconds and milliseconds, with the milliseconds given as three digits: 00:00:00.000 (hh:mm:ss.fff). Hours are optional, so times are frequently written as 00:00.000 (mm:ss.fff).

The first time code, presented before -->, represents when the text should appear on the screen. The second time code, presented after -->, represents when the text should disappear. Be sure to provide ample time for visitors to read each cue. Time codes for cues can overlap.

Validate Your File

You can check your file at no cost with this validator: https://tools.igem.org/wiki/vtt-validator

Examples

File for an Ambient Video Description

Many of our websites have a home page video that is ambient and meant to convey a concept or feeling rather than be informative. Don’t share unimportant details. In this case you can use a Descriptions kind of track to describe the concepts that the video is meant to share. It’s great when you can share more. You can include more than one time cue for various concepts shared in the video.
Hopefully you will have more emotive things to share about your video than the example below. Still, consider the surrounding content and do not repeat other text on the page. It’s okay to keep it simple and have only one time cue for the duration of your video.

    WEBVTT

    1
    00:00:00.000 --> 01:15.000
    Several clips from recent productions, events and courses. A celebration of the vibrant community of our Department and the work that we do.

File for Video with Music

If your video has music, then you must include a Captions kind of track, even if there is no other sound in the video. You can include additional cues for other types of information; see the example for a File with Multiple Cues below. You can also include a Descriptions kind of track. If appropriate, have one time cue describing the music for the duration of your video.

    WEBVTT

    1
    00:00:00.000 --> 00:30.000
    [Digital music created by students in the AET 339 Video Game Audio course plays for the duration of the video.]

File for Video with Text Built-in as Overlay

If your video has text built in and overlaying the video, include that text in a Captions kind of track. You can include additional cues for other types of information; see the example for a File with Multiple Cues below. You can also include a Descriptions kind of track.

    WEBVTT

    1
    00:00:30.000 --> 01:00.000
    [Text over the video says, Welcome to the College of Fine Arts!]

File with Multiple Cues

    WEBVTT

    1
    00:00:00.000 --> 00:15.000
    [Opens with music from Symphony No. 2 for Wind Ensemble]

    2
    00:00.000 --> 00:15.000
    [Dancers on stage to music]

    3
    00:30.000 --> 00:45.000
    [Text over the video says, Welcome to the College of Fine Arts!]

    4
    00:45.000 --> 01:00.000
    Dean Doty [walks on screen]: Welcome Students!
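The timing lines in a file like the one above can be sanity-checked with a short script before you upload it. The following Python is a minimal sketch, not a full WebVTT validator: the helper names timecode_to_seconds and cue_timings are made up for this page, and the pattern covers only the mm:ss.fff and hh:mm:ss.fff forms described under Understanding Time Codes.

```python
import re

# Simplified pattern for mm:ss.fff or hh:mm:ss.fff (an assumption for this
# sketch, not the complete WebVTT grammar).
TIMECODE = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def timecode_to_seconds(timecode: str) -> float:
    """Convert a time code such as 01:15.000 into seconds (hypothetical helper)."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"not a valid time code: {timecode!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def cue_timings(vtt_text: str) -> list[tuple[float, float]]:
    """Return (start, end) in seconds for every line that contains -->."""
    timings = []
    for line in vtt_text.splitlines():
        if "-->" not in line:
            continue
        start_text, _, rest = line.partition("-->")
        parts = rest.split()
        if not parts:
            raise ValueError(f"missing end time: {line!r}")
        start = timecode_to_seconds(start_text)
        end = timecode_to_seconds(parts[0])  # anything after the end time is ignored
        if end <= start:
            raise ValueError(f"cue ends before it starts: {line!r}")
        timings.append((start, end))
    return timings


example = """WEBVTT

1
00:00:00.000 --> 00:15.000
[Opens with music from Symphony No. 2 for Wind Ensemble]

4
00:45.000 --> 01:00.000
Dean Doty [walks on screen]: Welcome Students!
"""

print(cue_timings(example))  # [(0.0, 15.0), (45.0, 60.0)]
```

A time code that does not fit either form, or a cue whose end time is not later than its start time, raises a ValueError that names the offending line.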
File with Optional Header

    WEBVTT -En Vogue, My Lovin' (You're Never Gonna Get It), Funky Divas, 1992

    1
    00:01.000 --> 00:04.000
    Now you promise me the moon and stars

    2
    00:05.000 --> 00:09.000
    Save your breath, you won't get very far (Oooh, bop..)

File with Advanced Time Code Settings

Using the following method, you can be very specific about where in your video’s timeline each piece of text appears. Each time code placed in angle brackets inside the cue text marks the moment the words that follow it are meant to appear.

    WEBVTT

    00:00.000 --> 00:07.000
    This <00:01.000>text <00:02.000>will <00:03.000>appear <00:04.000>over <00:05.000>6 <00:06.000>seconds.

Styles & Questions

The style of the text on the screen can be set in the cue block, but most styles will be determined by the website style guides and the CSS for your website. If you would like to change the styles or have questions about getting started with WebVTT files, please contact your College of Fine Arts Web Team at cofawebmaster@austin.utexas.edu.

Learn More

- Find our tips for how to write subtitles, captions and audio description.
- Learn more about the WebVTT API from MDN, an open-source, collaborative project owned by Mozilla Corporation and developed by Mozilla, in partnership with a global community of volunteers and partners.
- Visit the World Wide Web Consortium (W3C) WebVTT or Web Video Text Track Draft Community Group Report, from March 2025.