College of Fine Arts Web instruction

WebVTT Files for Video Captions or Descriptions

Last Updated October 2025

WebVTT (Web Video Text Tracks) is a file format for displaying timed text, such as subtitles, captions and audio description, for video content on the internet. A WebVTT file can be attached to a video in the HTML code or added to a Vimeo or YouTube video.

This can be very confusing, so please contact the College of Fine Arts web team at cofawebmaster@austin.utexas.edu if you have any questions.

On this page:

  • File Kind & Audience
  • Create Your File
    • Components of the File
    • Understanding Time Codes
  • Validate Your File
  • Examples
    • File for an Ambient Video Description
    • File for Video with Music
    • File for Video with Text Built-in as Overlay
    • File with Multiple Cues
    • File with Optional Header
    • File with Advanced Time Code Settings
  • Styles & Questions
  • Learn More

File Kind & Audience

The file type is WebVTT but there are different kinds of WebVTT files for different scenarios. A single video can have more than one kind of WebVTT file.

The “Captions” kind is the most common. Captions are presented to visitors as text they can read. They benefit audience members who cannot hear the video, for example when it is muted or when the visitor is deaf. They also benefit audience members who want the text translated into another language, and the Captions kind will also be read out loud for audience members who are unable to see the video.

Captions are usually timed to the dialog and sounds in the video. Beyond sounds, they can include other cues about what is happening: who is speaking, on-screen actions, background noises, whether a speaker or a sound is coming from off-screen, and whether text is overlaying the video. Any text that appears in the video needs to be in the captions so that it can be translated or heard by someone who cannot see the video.

The convention is to use brackets around the additional information that is not being said by the person on screen.

Videos with sound must also have a way for the sound to be turned on and off. If the video has any sound, even if it is just music, then a Captions or Subtitles track must be included to tell the visitor about the sound. A visitor will see that there is an on/off button, so they need to know what it does. At the very least, you must tell them there is music. When possible, say what you know about the music and why it is included.

The “Descriptions” kind of WebVTT file summarizes what is happening visually in the video. Descriptions are most common in videos that have no sound. They are heard by audience members who are unable to see the video, for example because they do not have a screen or because they are blind.

A Descriptions track is not needed if the Captions track already covers what is happening in the video.

“Subtitles” kind is very similar to Captions, but usually just includes dialog and is primarily for translation. We prefer Captions.

“Chapters” can be used in some interfaces to navigate the video.

“Metadata” is used by programs and is not visible to the user.

The track kind is identified in the track tag of the video source code, for example:

<video controls src="my.mp4">
  <track default kind="captions" src="my.vtt" />
</video>
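
Because a single video can carry more than one kind of track, it can help to see the markup for two tracks side by side. The following is a minimal sketch in Python that prints such markup. The function name video_markup and the file names my-captions.vtt and my-descriptions.vtt are hypothetical, chosen only for illustration, and the sketch assumes that only the first track should carry the default attribute.

```python
from html import escape

# The five track kinds described in this section.
ALLOWED_KINDS = {"subtitles", "captions", "descriptions", "chapters", "metadata"}


def video_markup(video_src, tracks):
    """Return HTML for a video with one <track> tag per (kind, src) pair."""
    lines = [f'<video controls src="{escape(video_src)}">']
    for index, (kind, src) in enumerate(tracks):
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"Unknown track kind: {kind!r}")
        # Assumption: only the first track listed is marked as the default.
        default = "default " if index == 0 else ""
        lines.append(f'  <track {default}kind="{kind}" src="{escape(src)}" />')
    lines.append("</video>")
    return "\n".join(lines)


# Hypothetical file names, for illustration only.
print(video_markup("my.mp4", [
    ("captions", "my-captions.vtt"),
    ("descriptions", "my-descriptions.vtt"),
]))
```

The printed markup has the same shape as the example above, with a second track tag for the Descriptions file.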

Create Your File

Writing the captions and descriptions is the hard part. Creating the file itself takes only a few simple steps.

The file you create must be plain text with .vtt as the extension.

  1. Create the WebVTT file in your favorite text editor (for example TextEdit on Mac).
  2. If you are using TextEdit, change the file to plain text before you save it: go to Format -> Make Plain Text.
  3. To get started, you can copy one of the examples below into your file and update the time codes and text to be appropriate for your video.
  4. When you are finished, save your file with the extension .txt.
  5. Then change the file’s extension from .txt to .vtt.
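
If you are comfortable running a small script, the save and rename steps can also be done in one go. The following is a minimal sketch, not a required workflow. The function name save_vtt is hypothetical, and the sketch assumes the file should be saved as UTF-8 plain text, which is the encoding WebVTT files use.

```python
from pathlib import Path


def save_vtt(text, destination):
    """Save caption text as a plain-text UTF-8 file with the .vtt extension."""
    cleaned = text.strip()
    if not cleaned.startswith("WEBVTT"):
        raise ValueError("The first line of the file must start with WEBVTT")
    path = Path(destination)
    if path.suffix.lower() != ".vtt":
        # For example, my.txt becomes my.vtt.
        path = path.with_suffix(".vtt")
    path.write_text(cleaned + "\n", encoding="utf-8")
    return path
```

For example, save_vtt(caption_text, "my.txt") would write the text to my.vtt in the current folder, where caption_text is a string holding the contents of your file.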

Components of the File

Required first line of a WebVTT file

Every WebVTT file must start with the string WEBVTT on the first line of the document.

Cues

Each WebVTT file contains one or more blocks called cues. A basic cue contains four things:

  1. A cue identifier to help you organize your captions, such as sequential numbers or letters. The identifier is optional.
  2. A time code line for where in your video’s timeline your text is to be displayed.
  3. The actual text that is displayed on the screen. See Tips for Writing Captions & Audio Description.
  4. A blank line separating the cue from the next cue.

NOTE that you can have multiple text cues in a single time range.
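
To make the layout of a cue concrete, here is a minimal sketch that assembles a file from a list of cues. The function names format_cue and build_vtt are hypothetical helpers for illustration. The sketch assumes each cue is given as an identifier, a start time code, an end time code and the text to display.

```python
def format_cue(identifier, start, end, text):
    """Return one cue block: identifier line, time code line, then the text."""
    if "-->" in text or "\n\n" in text:
        # The arrow is reserved for the time code line, and a blank line
        # would end the cue early.
        raise ValueError("Cue text cannot contain --> or a blank line")
    return f"{identifier}\n{start} --> {end}\n{text}"


def build_vtt(cues):
    """Join the required WEBVTT line and the cues, separated by blank lines."""
    blocks = ["WEBVTT"] + [format_cue(*cue) for cue in cues]
    return "\n\n".join(blocks) + "\n"


print(build_vtt([
    ("1", "00:00.000", "00:15.000", "[Dancers on stage to music]"),
    ("2", "00:45.000", "01:00.000", "Dean Doty [walks on screen]: Welcome Students!"),
]))
```

Joining the blocks with two newline characters is what produces the completely empty line between cues.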

Optional header

To help you visually organize multiple WebVTT files, an optional header can be added to the right of the initial WEBVTT string, on the same line and separated from it by a space. The only character restriction is that your header cannot contain the following string of characters, which is reserved for cues only:

-->

If you are going to use a header, a simple convention is to start it with a single dash, as demonstrated in the example with the optional header below.
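
The header rule can be expressed as a small check. This is a minimal sketch; the function name first_line is hypothetical, and the example header text is made up for illustration.

```python
def first_line(header=""):
    """Return the first line of a WebVTT file, with an optional header."""
    header = header.strip()
    if not header:
        return "WEBVTT"
    if "-->" in header or "\n" in header or "\r" in header:
        # The arrow is reserved for cue time code lines, and the header
        # has to stay on the same line as WEBVTT.
        raise ValueError("Header cannot contain --> or a line break")
    return f"WEBVTT {header}"


print(first_line("- Home page video captions"))
```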

Understanding Time Codes

Time codes are written as hours, minutes, seconds and milliseconds, with the milliseconds always given as three digits: 00:00:00.000 (hh:mm:ss.fff). Hours are optional, so times are frequently written as 00:00.000 (mm:ss.fff).

The first time code, presented before -->, represents when the text should appear on the screen. The second time code, presented after -->, represents when the text should disappear. Be sure to provide ample time for visitors to read each cue. Time codes for cues can overlap.
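
To see how the two time code forms relate, the following minimal sketch converts a time code to seconds and reads the start and end times from a time code line. The function names timecode_to_seconds and parse_timing_line are hypothetical. The sketch assumes hours, when present, are written with at least two digits.

```python
import re

# mm:ss.fff with an optional leading hh: part.
TIMECODE = re.compile(r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})")


def timecode_to_seconds(timecode):
    """Convert 'hh:mm:ss.fff' or 'mm:ss.fff' to a number of seconds."""
    match = TIMECODE.fullmatch(timecode.strip())
    if match is None:
        raise ValueError(f"Not a valid time code: {timecode!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_timing_line(line):
    """Return (start, end) in seconds for a line such as '00:00.000 --> 00:15.000'."""
    start_text, arrow, end_text = line.partition("-->")
    if not arrow or not end_text.strip():
        raise ValueError(f"Not a time code line: {line!r}")
    start = timecode_to_seconds(start_text)
    # Only the first item after the arrow is the end time.
    end = timecode_to_seconds(end_text.split()[0])
    if end <= start:
        raise ValueError("The end time must be after the start time")
    return start, end


print(parse_timing_line("00:00:00.000 --> 01:15.000"))
```

A start time written with hours and an end time written without them, as in the line above, describe the same timeline; here the cue runs from 0 to 75 seconds.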

Validate Your File

You can check your file at no cost with this online validator: https://tools.igem.org/wiki/vtt-validator
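
Before using the validator, a few of the most common mistakes can be caught with a quick local check. The following is a minimal sketch and not a substitute for a full validator. The function name check_vtt is hypothetical, and it only checks three things: the first line, lines that look blank but contain spaces, and the form and order of the time codes.

```python
import re

STAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
TIMING = re.compile(rf"({STAMP})[ \t]+-->[ \t]+({STAMP})(?:[ \t].*)?")


def to_ms(timecode):
    """Convert '(hh:)mm:ss.fff' to whole milliseconds."""
    numbers = [int(part) for part in timecode.replace(".", ":").split(":")]
    if len(numbers) == 3:
        numbers.insert(0, 0)  # hours were left out
    hours, minutes, seconds, millis = numbers
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def check_vtt(text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    lines = text.lstrip("\ufeff").splitlines()
    first = lines[0] if lines else ""
    if first != "WEBVTT" and not first.startswith(("WEBVTT ", "WEBVTT\t")):
        problems.append("Line 1: file must start with WEBVTT")
    for number, line in enumerate(lines[1:], start=2):
        if line and not line.strip():
            problems.append(f"Line {number}: looks blank but contains spaces or tabs")
        elif "-->" in line:
            match = TIMING.fullmatch(line.strip())
            if match is None:
                problems.append(f"Line {number}: time codes are not written as mm:ss.fff --> mm:ss.fff")
            elif to_ms(match.group(2)) <= to_ms(match.group(1)):
                problems.append(f"Line {number}: end time must be after start time")
    return problems
```

Calling check_vtt on the text of your file returns an empty list when none of these three problems is present. The check for lines that only look blank matters because the separator between cues has to be a completely empty line.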

Examples

File for an Ambient Video Description

Many of our websites have a home page video that is ambient and meant to convey a concept or feeling rather than to inform. In this case you can use a Descriptions kind of track to describe the concepts that the video is meant to share. Don’t share unimportant details. You can include more than one time cue for the various concepts shared in the video, but it’s also okay to keep it simple and have only one time cue for the duration of your video. Consider the surrounding content and do not be redundant with other text on the page. Hopefully, you will have more emotive things to share about your video than the example below.

WEBVTT

1
00:00:00.000 --> 01:15.000
Several clips from recent productions, events and courses. A celebration of the vibrant community of our Department and the work that we do.

File for Video with Music

If your video has music, then you must include a Captions kind of track, even if there is no other sound in the video. If appropriate, have one time cue describing the music for the duration of your video. You can include additional cues for other types of information (see the example for a File with Multiple Cues below), and you can also include a Descriptions kind of track.

WEBVTT

1
00:00:00.000 --> 00:30.000
[Digital music created by students in the AET 339 Video Game Audio course plays for the duration of the video.]

File for Video with Text Built-in as Overlay

If your video has text built in and overlaying the video, include that text in a Captions kind of track. You can include additional cues for other types of information (see the example for a File with Multiple Cues below), and you can also include a Descriptions kind of track.

WEBVTT

1
00:00:30.000 --> 01:00.000
[Text over the video says, Welcome to the College of Fine Arts!]

File with Multiple Cues

WEBVTT

1
00:00:00.000 --> 00:15.000
[Opens with music from Symphony No. 2 for Wind Ensemble]

2
00:00.000 --> 00:15.000
[Dancers on stage to music]

3
00:30.000 --> 00:45.000
[Text over the video says, Welcome to the College of Fine Arts!]

4
00:45.000 --> 01:00.000
Dean Doty [walks on screen]: Welcome Students!

File with Optional Header

WEBVTT -En Vogue, My Lovin' (You're Never Gonna Get It), Funky Divas, 1992

1
00:01.000 --> 00:04.000
Now you promise me the moon and stars

2
00:05.000 --> 00:09.000
Save your breath, you won't get very far (Oooh, bop..)

File with Advanced Time Code Settings

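The following method lets you be very specific about where in your video’s timeline particular words appear. A time code in angle brackets inside the cue text marks when the text that follows it becomes active. Each of these time codes must fall between the cue’s start and end times.

WEBVTT

00:00.000 --> 00:07.000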
This <00:01.000>text <00:02.000>will <00:03.000>appear <00:04.000>over <00:05.000>6 <00:06.000>seconds.
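
Typing the time codes inside the text by hand is error-prone, so here is a minimal sketch that generates a line like the one above. The function names format_timecode and timed_words are hypothetical. The sketch assumes each word after the first should become active a fixed number of milliseconds after the previous one.

```python
def format_timecode(total_ms):
    """Format whole milliseconds as mm:ss.fff, adding hh: only when needed."""
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    prefix = f"{hours:02d}:" if hours else ""
    return f"{prefix}{minutes:02d}:{seconds:02d}.{millis:03d}"


def timed_words(text, start_ms, step_ms):
    """Put a time code in angle brackets before every word after the first."""
    words = text.split()
    if not words:
        return ""
    pieces = [words[0]]
    for index, word in enumerate(words[1:], start=1):
        pieces.append(f"<{format_timecode(start_ms + index * step_ms)}>{word}")
    return " ".join(pieces)


print(timed_words("This text will appear over 6 seconds.", 0, 1000))
```

With a start of 0 and a step of 1000 milliseconds, this prints the same cue text as the example above. The cue’s own end time still has to be later than the last time code inside the text.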

Styles & Questions

The style of the text on the screen can be set in the cue block, but most styles will be determined by your website’s style guide and CSS. If you would like to change the styles or have questions about getting started with WebVTT files, please contact your College of Fine Arts Web Team at cofawebmaster@austin.utexas.edu.

Learn More

Find our tips for how to write subtitles, captions and audio description. Learn more about the WebVTT API from MDN, an open-source, collaborative project owned by Mozilla Corporation and developed by Mozilla, in partnership with a global community of volunteers and partners. Visit the World Wide Web Consortium (W3C) WebVTT or Web Video Text Track Draft Community Group Report, from March 2025.


© The University of Texas at Austin 2025