Accessible Video and Audio
Multimedia (audio and video content) represents some of the most compelling material in the online-education revolution. Multimedia content allows students to read, listen, watch, and study course materials at a pace and time that accommodate their schedules and preferred modes of learning.
Our goal in this section is not to evaluate the merits of multimedia in online pedagogy, however, but rather to make content designers and instructors aware of some of the potential barriers and emphasize the potential benefits of multimedia in online education for students with disabilities.
While multimedia may benefit the learning experience for all students, including those with disabilities, a lot of the multimedia that we have seen in online education is simply inaccessible. Accessibility considerations need to be well integrated into the design and implementation of multimedia content in order to be effective for students with disabilities. Moreover, we believe you will find that the extra effort not only satisfies the necessary educational mission of universal usability but also enriches and improves the educational experience of students who have no identified disabilities.
There are four significant accessibility issues commonly associated with multimedia delivery:
- Inadequate delivery formats
- Lack of text transcription for audio
- Lack of synchronized captioning for video
- Lack of audio description for video
We address all of these issues in the advice sections, below.
General Multimedia Requirements
Ensure that all major audio and video controls can be operated with the keyboard alone and can be accessed and understood by a person using a screen reader.
Discussion
A basic goal of web-delivered multimedia is universal playability. You want your video and audio to be easily viewed and heard, and you do not want students to have to concern themselves with installing special video or audio players. This narrows your choices for video and audio formats somewhat. For example, there is no native Windows Media Player for Linux-based machines. Linux users can typically find alternatives, and most Linux users are used to such workarounds, but the burden is exacerbated for users with disabilities, no matter what browser or operating system they happen to be using. This suggests that we should take a least-common-denominator approach in attempting to provide accessible audio and video. We should target media formats that are most likely to play equally well and be equally accessible, regardless of browser or operating system.
So, what to choose? Ultimately, the choice will likely be a function of the skills of the IT personnel in your institution, the capabilities of your learning management system and/or guidelines for content, and your own skills and level of comfort as content designers and instructors. Given these constraints, we still believe it is possible to make recommendations: For audio, MP3 format is supported most widely. For video, we recommend Flash. This is not to say that audio formats such as WAV or OGG are inferior, or that MOV or WMV video formats are not as capable as Flash. However, World Wide Web usage statistics and player capabilities argue in favor of Flash and MP3.
If you need video that can be downloaded and viewed, Flash is probably not a very good choice. But for web-delivered video, it is by far the most widely supported video format and, more important to our concerns, it can be made entirely accessible to keyboard users, screen reader users, and viewers who are deaf or hard of hearing.
For audio, if you are going to allow for download, there is little argument that MP3 is the best choice. It is supported by virtually all playback software and hardware. If you are going to be embedding the player within the web page, then a Flash control that plays MP3 is a good choice.
For captioning, Flash also provides the best experience for web page-embedded players. The text of the captions can be kept editable and separate from the video to facilitate video search. And there are ready-made players that provide good keyboard and screen reader accessibility while delivering highly legible synchronized captions. For particular recommendations, see the Resources sections under Audio and Video below.
One problem with Flash-based content in web pages, however, is keyboard and screen reader accessibility. We discuss this also in the section on non-HTML dynamic content. Here, suffice it to say that not all web browsers are able to activate Flash in a web page using the keyboard alone. And screen reader users report that Flash controls are often difficult to use. You should always test your Flash-based media players in multiple browser/operating-system combinations. And below we try to recommend some good options.
One final note before giving particular recommendations. We have had to be highly selective in our advice in this section. We believe we offer intelligent and appropriate choices. But multimedia content is far from a discrete area of concern. It is vast and opinions are rife on which are the best technologies and means of implementation.
We hope the sections on this page will provide guidance and a sense of what accommodations need to be in place for accessibility. However, in no way do we consider this survey of technologies and methodologies comprehensive.
Resources
- DO-IT Video Search — An example of video search capability from DO-IT Washington
- Adobe Flash Player Statistics — 99.0% of Internet-enabled desktops support Flash
Text Transcriptions of Audio
Provide full and accurate text transcriptions of all audio-only content. Either the transcription can be provided on the same page as the media player or a link can be provided. Links should be contextual; that is, "link to transcript" is not as good as "transcript of July 4, 2009 Chemistry 101."
Discussion
Obviously, audio content lacking easily accessible transcription has no value for deaf students. But by providing transcripts for your audio clips, you not only make your course accessible to the deaf student, you also make it easier for non-native speakers and students with cognitive disabilities who benefit from multiple modes of presentation. In addition, you make the content of your audio searchable. Raw audio is difficult to search accurately, and searching raw audio is expensive and compute-resource intensive—text search, on the other hand, is a piece of cake.
If you export your non-streaming audio as MP3, then you can simply link to the files and let the student use her own device (an iPod or similar player, or a software player). And the Flash-based players we recommend accept MP3 format audio, if you want to embed your podcasts for web-only playback.
Finally, be aware that acquiring an accurate transcript is either time-consuming and painstaking or expensive, and often both. You must either pay someone a minimum of $1 a minute (typically significantly more than this for high quality and quick turnaround) or dedicate yourself to providing transcripts. Transcriptions can stretch the labor and monetary resources of programs or even institutions. They are a necessity of access, however. Because of this, the move to multimedia needs to be carefully considered, preferably in open and widespread conversation among administration, instructors, designers, and developers.
Resources
- Audacity — a free audio editor and recorder for Windows, Mac, and Linux
- SoundForge — a high-quality, commercial product, with good screen reader accessibility for the end-user
- NCAM's CCforFlash Player — a Flash component that can be used to play back audio in web pages
- JW FLV audio player — in-page Flash-based playback
- CastingWords — a high-quality, online transcription service with relatively low costs
Video Captioning
All video must have time-synchronized captioning that is either enabled by default or easily turned on, that has good background-foreground contrast, and that is properly "chunked" for easy reading. Both video embedded in web pages and downloaded video that plays in a stand-alone player must be navigable by keyboard and screen reader.
Discussion
Let's discuss the players first. Most of the video on the web is delivered via Adobe Flash-based in-page video players. YouTube and Hulu are prominent examples of Flash-based players. Flash has excellent compression and can deliver high-fidelity audio and high-resolution video without taxing bandwidth. In addition, Flash is installed in most browsers, and, as of this writing, release of Flash for mobile devices is imminent. Thus it makes sense to recommend Flash for playback of video in web pages.
The main concern with Flash from our perspective, however, is accessibility. Flash is fully accessible to the keyboard only in Internet Explorer on Windows, and screen readers can struggle with Flash: it can require switching screen reader "modes," and the labeling of interface elements in Flash applications, including Flash video players, is hit or miss. The way to get around keyboard and screen reader problems is through a JavaScript API (Application Programming Interface) that "talks" to the video player, while providing regular HTML controls for keyboard and screen reader users. YouTube, for example, has an excellent JavaScript API, as does the very widely used JW FLV/Longtail Video Player. In our resources for this section, we link to these APIs and off-the-shelf implementations of these players that leverage the APIs to achieve accessibility.
Other media formats for captioned video have merits. RealPlayer, Windows Media Player, QuickTime, and iTunes all have caption capability. The combination of iTunes and the M4V format is a very compelling solution given the ubiquity of the iPod, iPod Touch, and iPhone. iTunes and the videos that get synced with iPod-family devices are capable of displaying "subtitles." Originally devised for subtitling in a language other than the primary content language, the subtitle track gives you the ability to have elegant closed captions (captions that can be displayed or not, depending on user preference) in iTunes on your Mac or PC or on your iPod-family device. The process for adding subtitles to iTunes-based M4V video is covered in some depth on the Apple and NCAM web sites (see the Resources in this section).
So, as an instructor or course designer, you can be confident that there are readily available means for delivering Flash-based, caption-capable video and other formats in your online courses. Of course, first, you must create the captions!
As discussed in the section on audio, acquiring an accurate transcription can be costly and time intensive. Nonetheless, it is the first step in captioning web video. Without an accurate transcript, captioned video can be frustrating and misleading. We do not want clearly spoken audio to conflict with the caption track.
Beyond the transcript, the basic steps in captioning web video are:
- "Chunking" the transcript
- Adding in audio cues/sound effects and speaker change indications
- Synchronizing the captions with the video to create a timed-text file
- Associating the timed-text with the video for web playback
"Chunking" involves breaking the transcript into one-to-three-line segments of between 32 and 42 characters per line. A chunk appears in the video alone for the time of the spoken instance it transcribes. This method of display, in which a caption chunk appears and then is replaced with the next, is called "pop-on" captioning and is the display method seen all over the web. There are conventions for chunking transcripts but no set standards. The conventions dictate that chunks be as semantically complete as possible. That is, we want each chunk that appears on screen in a single instance to make sense, to be relatively understandable out of context. And you want each line in the chunks to be easy to read. Try not to split infinitives and dangle participle phrases. Overall, follow you intuition on where to break the lines and how to break chunks. For easiest readability, constrain lines to no more than 42 characters.
Once the transcript has been chunked, you will want to add in indications of speaker changes and audio cues or sound effects. The conventions for indicating speaker changes vary somewhat and you can use your judgment—or you may be using a tool, such as MAGpie, which formats the speaker changes for you. Typically we see either the name or title of the speaker surrounded by parentheses or separated from the speech by a colon: "(Mr Smith)" or "Mr Smith:" Either strategy is acceptable.
An audio cue or sound effect is a textual description of an audio event. For example, in video of a track race, the starting gun firing might get the audio cue/sound effect "bang," or in an interlude one might encounter "music plays." Audio cues/sound effects can be rendered surrounded either by parentheses or square brackets: "(wind rustling dry leaves)" or "[wind rustling dry leaves]." Typically audio cues/sound effects appear alone on a line of captioning and, where appropriate, comprise the entire caption chunk.
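If you standardize on one formatting convention, a pair of small helper functions can keep speaker indications and audio cues consistent across a course. The Python sketch below is only an illustration: the function names speaker_label and audio_cue and the style keywords are assumptions we made up for the example.

```python
def speaker_label(name, style="colon"):
    """Format a speaker-change indication in one of the two common styles."""
    if style == "colon":
        return f"{name}:"
    if style == "parens":
        return f"({name})"
    raise ValueError(f"unknown speaker style: {style!r}")


def audio_cue(description, style="square"):
    """Format an audio cue or sound effect in brackets or parentheses."""
    if style == "square":
        return f"[{description}]"
    if style == "parens":
        return f"({description})"
    raise ValueError(f"unknown cue style: {style!r}")


# A short caption sequence. The audio cue stands alone as its own chunk,
# and the speaker label leads the first line of the spoken chunk.
captions = [
    [audio_cue("wind rustling dry leaves")],
    [f"{speaker_label('Mr Smith')} Welcome to class."],
]

if __name__ == "__main__":
    for chunk in captions:
        print("\n".join(chunk))
```

Whichever styles you choose, choose them once and apply them everywhere.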
The Resources contain links to two prominent guidelines that give advice on presentation of sound effects/audio cues and speaker indications. We recommend studying them. We also advise standardizing on formatting guidelines at your own institution.
Once the captions are chunked and formatted with cues and speaker indications, you need to synchronize them with the audio track in your video. Depending on your video source and target format, you will want to pick the appropriate software to facilitate synchronization. The process typically involves launching the software and importing the movie and text captions. You then play back the movie and tap a hot key to mark each chunk of captions with a time in the movie. Completing the synchronization process will typically take you as long as, or slightly longer than, the run time of your movie. For many applications, a tool like the free MAGpie for Windows or the excellent MovCaptioner for Macintosh will allow you to fine-tune text-time alignments. We will not cover any single tool in detail. There is adequate information available on the web. See the Resources for some helpful links.
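To make the idea of hot-key synchronization concrete, here is a minimal Python sketch of the bookkeeping involved. It assumes one recorded tap time (in seconds) per caption chunk and lets each chunk stay on screen until the next tap, in pop-on fashion. The function name align_chunks and this end-time rule are our assumptions; they do not describe how MAGpie, MovCaptioner, or any other tool works internally.

```python
def align_chunks(chunks, tap_times, movie_length):
    """Pair each caption chunk with (begin, end) times in seconds.

    tap_times[i] is when the hot key was pressed for chunks[i].
    Each chunk ends when the next one begins; the last chunk ends
    at movie_length. (Hypothetical helper for illustration only.)
    """
    if len(chunks) != len(tap_times):
        raise ValueError("need exactly one tap time per chunk")
    if any(b <= a for a, b in zip(tap_times, tap_times[1:])):
        raise ValueError("tap times must be strictly increasing")
    if tap_times and movie_length <= tap_times[-1]:
        raise ValueError("movie_length must be later than the last tap")
    ends = list(tap_times[1:]) + [movie_length]
    return [(begin, end, text)
            for text, begin, end in zip(chunks, tap_times, ends)]


if __name__ == "__main__":
    cues = align_chunks(
        ["Welcome to Chemistry 101.", "[music plays]"],
        [1.0, 3.5],
        movie_length=6.0,
    )
    for begin, end, text in cues:
        print(f"{begin:6.2f} -> {end:6.2f}  {text}")
```

In practice you would then adjust individual times by hand wherever a caption lingers through a long pause.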
Once synchronization is complete you will use the software to export your alignments. Tools such as MAGpie will export timed-text in a variety of formats. Choose the format appropriate to your targeted player. For example, for the JW FLV player, choose DFXP. Again, detailed coverage of techniques is beyond the scope of this document, but you will find helpful links in the Resources.
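For readers curious about what an exported timed-text file contains, the Python sketch below builds a bare-bones DFXP-style document from a list of timed captions using only the standard library. It is an illustration under stated assumptions, not the output of MAGpie or the exact input any player requires: the function names are ours, we use the namespace of the W3C Timed Text Markup Language 1.0 Recommendation (earlier DFXP drafts used a different namespace URI), and we omit styling and layout entirely. Check your player's documentation for the precise dialect it accepts.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C TTML 1.0 Recommendation. Assumption: your player
# may expect the namespace of an earlier DFXP draft instead.
TT_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def format_clock(seconds):
    """Convert seconds to an hh:mm:ss.mmm clock-time string."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_timed_text(cues, lang="en"):
    """Serialize cues to a minimal timed-text XML string.

    cues: iterable of (begin_seconds, end_seconds, lines), where
    lines is the list of text lines in one caption chunk.
    """
    ET.register_namespace("", TT_NS)  # emit TT_NS as the default namespace
    tt = ET.Element(f"{{{TT_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    for begin, end, lines in cues:
        p = ET.SubElement(div, f"{{{TT_NS}}}p", {
            "begin": format_clock(begin),
            "end": format_clock(end),
        })
        p.text = lines[0]
        # Separate the remaining lines of the chunk with <br/> elements.
        for line in lines[1:]:
            br = ET.SubElement(p, f"{{{TT_NS}}}br")
            br.tail = line
    return ET.tostring(tt, encoding="unicode")


if __name__ == "__main__":
    print(to_timed_text([
        (1.0, 3.5, ["Welcome to", "Chemistry 101."]),
        (3.5, 6.0, ["[music plays]"]),
    ]))
```

Because the captions live in a separate text file, they remain editable and searchable independently of the video.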
Resources
- Creating Captions for Rich Media — a guide from NCAM Rich Media
- NCAM caption style guide — conventions of captioning
- Accessible Digital Media — design guidelines for electronic publications, multimedia and the web
- DCMP Captioning Key for Educational Media — a guide for vendors performing captioning
- JW FLV/Longtail Video Player and its JavaScript API — fundamentals of player scripting and examples and source code
- NCAM's CCforFlash player — a Flash component and caption-capable Flash video player
- YouTube JavaScript API — reference information for the YouTube JavaScript player API
- NCAM MAGpie — a free application for creating captions and audio descriptions for rich media
- MAGpie instructions from NCAM — help in using the MAGpie caption authoring tool
- WebAIM captioning tutorial — an overview on captioning and instructions on how to create captions for different media types
- MovCaptioner — Mac-only software to add captions to QuickTime movies. Also exports DFXP and other common formats.
- OSU WAC's JW FLV Player Controls — accessible controls for the JW FLV Player
- OSU WAC's YouTube captioning guide and JavaScript player controls — a relatively pain-free way to have accessible, captioned video in your web pages
- Accessify's Easy YouTube Caption Creator — a simple way to create a .sub caption text file for YouTube videos
- YouTubeCC — a simple way to add captions to YouTube videos
- CaptionTube — a utility for adding closed captions to YouTube videos
Audio Description
Provide audio description of important visual content. Users should be able to turn audio description on and off easily.
Discussion
An audio description is an audio-only track that runs synchronously with the main video's audio and describes key visual content. Audio descriptions provide necessary context for understanding what is going on in the video for students who cannot see the video, and they provide extra context to aid comprehension for students with cognitive disabilities.
There is no question that there is very little audio-described video on the web, whether in or outside of educational institutions. Instructors and content designers see creation of it as an undue burden. However, if your goal is universal design that provides access to everyone, regardless of ability, then audio description is a necessity.
You should audio describe:
- Opening titles and on-screen text (when it is not spoken)
- Crucial scene changes, on-screen events, actions, and gestures
- Character appearance and observable emotional states, when essential to comprehension of the video
It is probably easiest to scrub through the video and record the audio description as separate clips. The video players we recommend in the video section require you to have a single audio description track that runs the length of the video. Any good audio recording software will allow you to insert silence of arbitrary length into a clip and merge clips together. So it does not take much effort to produce a video-length audio description track.
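The padding-and-merging step can also be scripted rather than done by hand in an audio editor. The Python sketch below is one possible approach under several assumptions of ours: every clip is uncompressed, mono, 16-bit PCM at the same sample rate, and you know the time at which each clip should start. The function names are hypothetical. It places each clip at its start time in a silent track that runs the full length of the video.

```python
import wave

SAMPLE_WIDTH = 2  # bytes per sample; assumes 16-bit mono PCM throughout


def build_description_track(clips, video_seconds, sample_rate=44100):
    """Merge description clips into one video-length PCM track.

    clips: list of (start_seconds, pcm_bytes) pairs.
    Gaps between clips are filled with silence.
    """
    total_bytes = round(video_seconds * sample_rate) * SAMPLE_WIDTH
    # Zero bytes are silence in signed 16-bit PCM.
    track = bytearray(total_bytes)
    cursor = 0  # byte position where the previous clip ended
    for start, pcm in sorted(clips, key=lambda clip: clip[0]):
        offset = round(start * sample_rate) * SAMPLE_WIDTH
        if offset < cursor:
            raise ValueError("clip starts before the previous one ends")
        end = offset + len(pcm)
        if end > total_bytes:
            raise ValueError("clip runs past the end of the video")
        track[offset:end] = pcm
        cursor = end
    return bytes(track)


def write_wav(path, pcm, sample_rate=44100):
    """Save the merged track as a mono 16-bit WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(sample_rate)
        out.writeframes(pcm)
```

The resulting WAV file can be converted to MP3 or whatever format your player expects for its audio description track.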
Resources
- “Meet the Mentor” — a video profiling a blind IT professional. The video, produced at Ohio State University, is both captioned and audio described.
- Equal Access: Universal Design of Computer Labs — an example of captioning and audio description from DO-IT at the University of Washington
- DCMP Description Key for Educational Media — a guide for audio description
- Standard techniques in audio description — a guide from Joe Clark
- SoundForge — Windows audio editing software
- GarageBand — Mac audio editing software
- Audacity — audio editing software for all platforms including Linux