Streaming Media for Web Based Training

Chad Childers, Ford Motor Company, USA
Frank Rizzo, Ford Motor Company, USA
Linda Bangert, Internet Education Group, USA

Abstract: Streaming audio became available on the Web in 1995, but with the development of the Synchronized Multimedia Integration Language (SMIL)[1], the technology has reached a new level of maturity. SMIL is based on the XML standard, and allows audio, video, images, and text to be integrated. The implications for Web based pedagogy are tremendous. We now have the opportunity to do training on the desktop in a feature-rich open environment. We will give a basic background in streaming media technology, discuss the standards, the current state of the art, our experience with RealNetworks [2], how to take advantage of it in an intranet environment, and touch on future developments, including the integration of testing engines.

Taking Web Based Training into the 21st Century

In the online world of 1999, what you can do is both empowered and constrained by the technology. A good understanding of the limits of the viewing software, end user hardware, and intervening network allows the instructional designer to make the best possible use of what is available - to push the limits while balancing speed and usability. We are now entering a new world where those limits are being pushed back rapidly, and anything is possible. It is still very important to understand the limits, for now, but it is equally important to break down our preconceived notions of what is possible. For a moment, imagine anything is possible!

The new SMIL (Synchronized Multimedia Integration Language) standard allows multimedia content, including text, pictures, sound, and video to be synchronized for a coherent learning experience. Control of all these media is contained in a simple text file (although the format is quite complex). Tools to simplify creation and editing are rapidly being developed. SMIL can greatly reduce the bandwidth required while delivering an experience similar to watching a fully interactive television channel.

Definitions, History, and Current Status

Streaming media is network-based data that can be presented to the user before the whole data file has finished transferring. If you see a picture begin to appear on your screen before the transfer completes (e.g., an interlaced PNG or a progressive JPEG file), or hear an audio file start playing as soon as you click it (e.g., a RealAudio file), that is an example of streaming media.

The primary advantage of streaming is that large audio and video files can be played as they arrive on the computer rather than having to wait for the file transfer to complete. For training in particular, this means that the user interface is much more responsive.

Data can be streamed in a variety of ways:
Delivery modes: pre-recorded / on-demand; real-time / on-demand (pulled); real-time / live (pushed)
Servers: web server; RealMedia™ server

If served via the web server, when a user clicks on a web link to a RealAudio sound file, the data is delivered over the Web's HTTP protocol, and only one kind of data can be presented at a time. If served via the RealMedia server, the data is delivered back to the web browser via a streaming protocol such as RTSP (Real Time Streaming Protocol), with the media data typically carried over UDP. Pre-recorded content could include online training materials, and real-time content could include classes and meetings.

SMIL is important for several reasons. First, it integrates various kinds of media. Second, it is an open standard that can be leveraged across all platforms. Finally, it is derived from the W3C Extensible Markup Language (XML)[3] standard; this is important because anything can be defined in XML, and it can be extended on the fly simply by defining new tags.

A SMIL player can act like any other Web browser plug-in, and display SMIL content over an HTTP connection. However, it can also subscribe to a host group and view an IP multicast, or negotiate the control connection and open a unicast RTSP connection to stream the data.

The TCP/IP protocol suite upon which the Internet is based is reliable over a wide variety of physical networks because lost packets can be retransmitted by TCP, the Transmission Control Protocol [4]. When delivering streaming data over a high-bandwidth corporate or university campus network, however, the high reliability of TCP is not required, and retransmissions can slow down performance and take up too much network bandwidth. RTSP-based streaming is designed to degrade gracefully even if a few packets are lost, and therefore delivers the data faster with lower overhead. Early papers on real-time video concluded that the Web was not suitable for high-bandwidth media because of the inherent delays [5]. The new protocols help deal with these problems.

With IP multicast, a streaming server can send a single stream across the network and allow multiple computers to receive it. The new protocols can allow some packets to expire without retransmission, and new routers can allow the data to go across the network to multiple destinations with only one destination address, the multicast host group address, in the header.
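The saving that multicast offers on the server's outbound link can be estimated with a rough sketch like the one below. The function name and the viewer and bit-rate figures are illustrative assumptions, and the model ignores protocol overhead and any retransmissions.

```python
def server_output_kbps(viewers, stream_kbps, multicast=False):
    """Estimate the outbound bandwidth a streaming server needs.

    Unicast sends one copy of the stream per viewer; multicast sends a
    single copy addressed to the host group, whatever the audience size.
    """
    if viewers <= 0:
        return 0
    return stream_kbps if multicast else viewers * stream_kbps


# Hypothetical example: 100 viewers watching a 150 kbps stream.
unicast_load = server_output_kbps(100, 150)
multicast_load = server_output_kbps(100, 150, multicast=True)
```

Numbers like these are a useful starting point for the conversation with your networking group.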

If you want to take advantage of these capabilities, first decide on your needs, then take the time to understand the options. You can then talk with your networking group for help in configuring routers, or to help you decide what technology best suits your needs.

SMIL, the best direction for Web Based Training

Just as web based training (WBT) has some important gains over traditional classroom training, the use of SMIL allows the Instructional Designer (ISD) to take the training experience one step further. SMIL builds on the existing base of XML standards, tools and experience. It allows for very easy indexing and editing because the control files are all plain text with tags, similar to HTML. It can be used inline within an HTML page, and allows simple extensibility for other applications (such as testing engines) within a well-defined framework. Open standards tend to be simple and durable.

Poorly designed WBT can be a waste of time and money as well as an ineffectual tool for training. Many of the streaming videos foisted upon unsuspecting viewers as WBT show a video of a slide presentation with an audio narration, or even worse, a subject matter expert as a "talking head" giving a lecture. To take advantage of SMIL, build a presentation including the audio track of the lecture combined with streaming text of the speaker's notes, streaming JPEGs of any presentation slides, and perhaps short video segments of any animated processes required to illustrate the topic. Besides being more informative and eminently more useful to the end user, the SMIL presentation offers tremendous performance gains over pure video. The network bandwidth required for audio playback with text and JPEGs can be less than half that of a single video stream. Because the JPEGs take up less bandwidth than the video, their quality can be much higher, and small details on the presentation slides, like text, will actually be readable by the end user.

The audio and text tracks could be localized and presented to the user in the language of their choice without having to re-author the entire presentation in a single monolithic chunk. SMIL gives you the ability to split these data types into separately maintained files and maintain full control over how and when they are displayed to the user. One of SMIL's strongest abilities is controlling when something happens on-screen. The content author can precisely control all events, effects, and transitions.
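The combination of narration, speaker's notes, and timed slides described above can be sketched as a small Python script that writes a SMIL 1.0 file. This is only an illustration: the server name, file names, file extensions, region sizes, and slide timings are hypothetical, and a real presentation would need the layout and timing tuned by hand.

```python
import xml.etree.ElementTree as ET


def build_lecture_smil(slides, language="en"):
    """Return SMIL 1.0 text for narration + notes + timed slides.

    slides: list of (image_url, seconds_on_screen) pairs (hypothetical).
    language: suffix used to pick the localized audio and text files.
    """
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    # Assumed window size and regions; adjust to your own design.
    ET.SubElement(layout, "root-layout", width="600", height="300")
    ET.SubElement(layout, "region", id="slides",
                  left="0", top="0", width="400", height="300")
    ET.SubElement(layout, "region", id="notes",
                  left="400", top="0", width="200", height="300")

    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # children of <par> play together
    base = "rtsp://server/MountPoint/lecture/"  # placeholder location
    ET.SubElement(par, "audio", src=base + "narration_%s.rm" % language)
    ET.SubElement(par, "textstream",
                  src=base + "notes_%s.rt" % language, region="notes")
    seq = ET.SubElement(par, "seq")  # children of <seq> play in order
    for url, seconds in slides:
        ET.SubElement(seq, "img", src=url, region="slides",
                      dur="%ds" % seconds)
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    print(build_lecture_smil([("slide1.jpg", 45), ("slide2.jpg", 60)]))
```

Because the localized tracks live in separate files, switching the `language` argument changes only two `src` attributes; the slides and timing are untouched.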

Applications that support open standards tend to be available at low or no cost for academic uses. Many of the instructional design tools available today use proprietary code and custom designed Java plug-ins to display the coursework. Some software companies charge far too much for their WBT solutions and may have simply repackaged their custom computer-based training software engines into a complicated browser plug-in. Using this type of approach to WBT can lead to a variety of problems. Trying to deploy the specialized plug-ins to your client base and dealing with unforeseen incompatibilities caused by these plug-ins can quickly become a maintenance nightmare. Better to rely upon a solution based on open standards where the browser plug-ins are freely available and tested by the Internet at large.

There are still some issues to be resolved with SMIL. Drag-and-drop tools to automatically generate SMIL code are still in their first generation or in beta testing. Complex SMIL presentations still require hand-coding or, at the very least, some hand tweaking to perfect and debug. At the time this paper was authored, writing SMIL presentations complex enough to be called WBT required knowledge and experience beyond the capabilities of the typical instructional designer. For now, the use of an experienced web site programmer or a staff member who can be dedicated to learning the technical aspects of SMIL is recommended.

From an authoring standpoint, the emerging collaboration and streaming technologies in Microsoft Office 2000 appear to be simple to use and well integrated into the traditional Office suite of tools. It remains to be seen if the Microsoft products continue to have the server scalability and quality issues which plagued earlier streaming technology releases.

When choosing a streaming technology for WBT, many factors must be weighed and evaluated. There are a variety of technical factors, such as existing network infrastructure between you and your audience, server platform and availability, and client software maintenance. These will be discussed in "The Real Nitty-Gritty" section below. Other factors, which can sometimes be more important to overcome than the technical issues, include personal experience and comfort level with the technology, institutional politics, and any existing corporate relationships.

First and foremost, you need to be comfortable and familiar with the technology you implement. If your instructional designers are all familiar with a specific tool set and very comfortable with the processes and procedures surrounding your existing traditional or CBT training methods, there will probably be substantial resistance to change. Overcoming any internal training paradigm "inertia" will definitely be an obstacle. Often, training the trainers is the hardest job of all. Convincing management that a new method of training is needed, and that it may cost some money to overcome the technical issues, can sometimes be an insurmountable hurdle. Presenting the idea to management requires careful analysis of the actual costs involved. Another potential issue when choosing a streaming technology within an organization is any pre-existing corporate relationship. If your organization has, for example, a strong relationship with Apple Computer, then trying to justify a WBT solution utilizing SMIL instead of QuickTime for streaming media may require good analysis and strong justifications.

The Real Nitty-Gritty

Building a WBT module using SMIL follows the same process as building traditional WBT, with a few specialized requirements. The first phase of any project is the conceptual brainstorming and storyboarding. This is best done on paper for speed and easy reference. The first step is to define the objective of the WBT module. Decide what information is to be conveyed to the viewer and be specific. Define the project scope, setting definite boundaries encompassing just enough detail to properly cover your objectives. Keep the focus tight and stick to your stated objectives. Next, set requirements for the user experience, thinking about not just what the user will be learning, but how you want them to learn. The idea is to lead them through your learning materials in an organized and straightforward manner. This will help you design the navigational methods used within the WBT module. Try to design the framework first. Don't worry about the graphics yet; work on the layout first. Designing a common look-and-feel that can be reused across modules will help reduce development time on subsequent projects and lend a consistency to your training. Consistent look-and-feel across WBT modules gives the viewer a higher comfort level, knowing that even though the subject matter may be new, the process of learning throughout your modules is familiar.

From your storyboard, build a timeline and organize the presentation of your learning materials. Plan the layout of the learning materials and the navigational items. Decide when a particular item needs to be displayed. This will help you determine the load order of your media assets and help identify any constraints imposed on that load order by your target bandwidth. With SMIL, different items within your presentation can be specified to load serially or in parallel. It's usually a good idea to make sure that your navigation buttons and other graphics and text load before the user gets to view the video animation on the first page of your presentation.
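The serial and parallel grouping mentioned above corresponds to SMIL's seq and par elements. A minimal Python sketch of how a storyboard timeline built from those two groupings resolves into start times follows; the clip names and durations are hypothetical, and the model assumes a parallel group lasts as long as its longest child.

```python
def duration(node):
    """Running time of a clip, a 'seq' group, or a 'par' group.

    A clip is ("clip", name, seconds); a group is (kind, [children]).
    """
    kind = node[0]
    if kind == "clip":
        return node[2]
    child_durations = [duration(child) for child in node[1]]
    if kind == "seq":   # children play one after another
        return sum(child_durations)
    if kind == "par":   # children play together; longest one wins
        return max(child_durations, default=0)
    raise ValueError("unknown node kind: %r" % (kind,))


def schedule(node, start=0, out=None):
    """Return {clip_name: start_time_in_seconds} for every clip."""
    if out is None:
        out = {}
    kind = node[0]
    if kind == "clip":
        out[node[1]] = start
    elif kind == "seq":
        t = start
        for child in node[1]:
            schedule(child, t, out)
            t += duration(child)
    elif kind == "par":
        for child in node[1]:
            schedule(child, start, out)
    else:
        raise ValueError("unknown node kind: %r" % (kind,))
    return out


# Hypothetical first page: navigation and title first, then the video.
presentation = ("seq", [
    ("par", [("clip", "nav_buttons", 2), ("clip", "title_text", 2)]),
    ("par", [("clip", "intro_video", 30), ("clip", "narration", 30)]),
])
timeline = schedule(presentation)
```

Walking the storyboard this way before writing any SMIL makes it easier to spot items that would start before the assets they depend on.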

After the layout and storyboard have been finalized, the next two steps for building your presentation are the design of the interface and the content design of the learning materials. These steps frequently happen in parallel. While one group of graphic artists works on the backgrounds, buttons, graphics, and other window dressing, the instructional designers work with the media production staff to plan and create the learning materials.

Interface Design

When designing the interface for SMIL presentations, one must consider how the presentation will be displayed. SMIL can be embedded into a web page or displayed stand-alone in the RealPlayer®. The choice can be simple depending on the level of integration desired, use of any courseware testing engines, and finally personal preference. Either way, standard web design rules definitely apply. To be effective, the interface must be clean and uncluttered. The design should encourage the user to explore while intuitively leading them safely through the learning materials in the appropriate order to effectively teach them what they need to know. The design must support the learning materials. Layout of the presentation should lead the user to focus on the learning materials. With time-based control over all display elements, SMIL provides the ultimate in design flexibility.

As with any web based design project, the module must be designed with the lowest common denominator client system in mind. If your audience is within your organization's intranet and there are hardware and software standards in place to ensure that all of your users have at least a certain minimum configuration, then it is relatively simple to plan your WBT module to fit within those requirements. Typically, a SMIL presentation should be designed to fit within a 640x480 VGA screen. Remember that the actual usable space within a browser is smaller than the full screen resolution. For a 640x480 VGA display with the web browser window maximized, with default menu settings, the usable screen real estate is approximately 600x300 pixels. When adding a 320x240-pixel video, not much room is left vertically for titles and text.
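The screen budget above is simple arithmetic, sketched here in Python. The function name is an illustrative assumption; the 600x300 and 320x240 figures are the ones quoted in the text.

```python
def remaining_space(usable_w, usable_h, video_w, video_h):
    """Pixels left beside and above/below a video in the usable area."""
    if video_w > usable_w or video_h > usable_h:
        raise ValueError("video does not fit in the usable area")
    return {
        "horizontal": usable_w - video_w,  # room beside the video
        "vertical": usable_h - video_h,    # room for titles and text
    }


# Usable browser area on a 640x480 display, with a 320x240 video.
space = remaining_space(600, 300, 320, 240)
```

With only 60 pixels left vertically, titles and text usually fit better beside the video than above or below it.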

Learning Materials Content Design

While the graphic artists are busy with the interface, the source materials for the video, audio, and other media clips must be recorded and encoded. Once the project storyboard and layout are finalized, it's time to build the actual "meat" of the presentation: plan and conduct the source material recording sessions. Successfully planning and producing the actual presentation material is simplest if you, the designer, have control over the material being presented. More often than not, the audio and video have to be recorded onto cassette or videotape and then digitized and encoded for use on the Web.

When dealing with video as a streamed medium, many factors influence the final stream quality and playback rates. The well-known rule of "garbage in, garbage out" applies to streaming video. The higher the quality of the recording used as source material, the smaller and faster the streaming video file will be. The differences in signal-to-noise ratios and overall resolution between VHS, S-VHS, 8mm, Hi8, Mini-DV, Betacam SP, and DVCPRO video formats (listed in increasing order of quality) directly influence the playback frame rate and encoded file size of the streaming video file. The better the format you can afford to record in, the cleaner and better your video will stream to your clients.

For important high-bandwidth content, the use of a professional video production staff equipped with proper lighting and recording equipment will always yield a higher quality recording than a consumer-quality video camera. This by no means should be interpreted to mean that low cost, consumer-quality equipment is incapable of producing satisfactory results. However, to provide Internet-based video streams larger than a postage stamp at acceptable quality when network bandwidth is at a minimum, starting with premium quality video recordings is essential. The objective here is to plan the multimedia source materials appropriately taking into consideration the time, resources, and funding required to realistically achieve your design goals.

Building the RealText™, RealPix™, and SMIL files

After the actual content files have been created, the SMIL presentation files need to be created. This is quite similar to creating HTML pages, except SMIL is time sensitive and requires specific timing for each event and transition, and the files need to live on the streaming server, not the web server. For instructions on how to code SMIL files, the SMIL technical documentation can be found at the World Wide Web Consortium Architecture for Synchronized Multimedia [1].

Technological Issues

This section addresses issues surrounding the production of streaming media for intranet use where high-bandwidth network connections are available. Although SMIL presentations can be adapted to incorporate different sized videos for either high or low bandwidth use, that is outside the scope of this paper.

Once the instructional designer has obtained the source materials for the videos, they need to be digitized. The format into which the video is digitized will affect the final encoded output. Always digitize video uncompressed at 30 frames per second, and audio in stereo at a 16-bit, 44.1-kHz sampling rate. Give the streaming format encoder software the best quality input so it has all of the data it needs to provide the highest-quality output. The more data the encoder has to work with, the fewer assumptions the compression routines need to make. This will result in smoother, cleaner, and smaller encoded video output.
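Uncompressed capture at these settings produces a lot of data, so it is worth estimating disk space before a digitizing session. The sketch below is a rough estimate only: it assumes the CD-standard 44,100 Hz sample rate for audio, and a 320x240 capture size with 24-bit color for video, neither of which is prescribed above.

```python
def audio_bytes_per_second(sample_rate_hz=44100, bits_per_sample=16,
                           channels=2):
    """Raw PCM audio data rate (defaults: 16-bit stereo at 44.1 kHz)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


def video_bytes_per_second(width, height, fps=30, bits_per_pixel=24):
    """Raw video data rate; frame size and color depth are assumptions."""
    return width * height * (bits_per_pixel // 8) * fps


def capture_megabytes(seconds, width=320, height=240):
    """Approximate disk space (in millions of bytes) for one capture."""
    total = seconds * (audio_bytes_per_second()
                       + video_bytes_per_second(width, height))
    return total / 1e6


one_minute_mb = capture_megabytes(60)
```

Under these assumptions a single minute of source material needs several hundred megabytes, which is why captures are usually trimmed and encoded promptly.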

Digital editing of video and audio sources before encoding is usually required. Certain optimizations, such as video cropping and audio normalization, can be made to provide optimal output upon playback. Applications for video and audio editing include Adobe Premiere® and Sonic Foundry's Sound Forge®.

Encoding content to RealNetworks formats

Audio and video can be encoded into the RealNetworks® RealMedia® format using a variety of third party applications. The easiest encoder to use is the RealProducer® Plus G2 from RealNetworks. It has many different stream options. The RealMedia G2 SureStream™ format option allows multiple streams at different bandwidths to be encoded into the same file. This allows the RealPlayer® and RealServer® to better negotiate how much data to send to the player based on network performance. For example, a video may be encoded for 28.8K modem, 56K modem, 64K Single ISDN, 128K Dual ISDN, 220K xDSL and Cable Modem, and 150K Corporate LAN data rates, all within a single file. Depending on the available network bandwidth, the player will switch between these different encoded formats dynamically as the user watches the video. This feature provides much better playback than older streaming technologies, which only adapt to changing network conditions by dropping frames or "fuzzing-out" the video into large indistinguishable blocks. Depending on the resolution and frame rate of your video source files, it may not make sense to encode at the higher bandwidth settings, such as 220K and 150K, and at the low bandwidth settings, 28.8K, 56K, and single ISDN, in the same file. The lower settings may not have enough available bandwidth to stream the file.
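The idea of switching among several encodings in one file can be illustrated with a short Python sketch. This is not the RealNetworks negotiation algorithm, which is not described here; it is a simplified model that treats the nominal connection rates listed above as stream bit rates and picks the highest one that fits the measured bandwidth.

```python
# Nominal rates (kbps) taken from the list above; real encoded stream
# rates are typically somewhat lower than the connection rate.
ENCODED_KBPS = [28.8, 56, 64, 128, 150, 220]


def pick_stream(available_kbps, encoded_kbps=ENCODED_KBPS):
    """Return the highest encoded rate that fits, or None if none does."""
    fitting = [rate for rate in encoded_kbps if rate <= available_kbps]
    return max(fitting) if fitting else None


def playback_plan(bandwidth_samples_kbps):
    """Map a series of bandwidth measurements to the chosen streams."""
    return [pick_stream(sample) for sample in bandwidth_samples_kbps]


# Hypothetical measurements taken while a user watches the video.
plan = playback_plan([200, 140, 60, 20, 300])
```

A `None` in the plan marks a moment when even the lowest encoding would not fit, which is the situation the last sentence of the paragraph above warns about.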

Streamed animations can be produced using Macromedia's Flash® technology. Flash is in widespread use for non-streamed web based animations. The same animations can be included in your SMIL presentation after a simple encoding procedure into the RealFlash™ format. Flash animations are thus no longer limited to the realm of the static web page and can be unleashed into the dynamic environment of a SMIL presentation. The RealNetworks site has some good examples of RealFlash™ SMIL presentations.

Bandwidth between you and your target audience is the limiting factor on SMIL design. Designing SMIL presentations includes tradeoffs for each data stream sent to the player. The designer must balance data stream buffering times versus compression and the number of streams being loaded simultaneously. These calculations are also affected by the resolution of the data to be streamed. Resizing a video originally intended to stream at 320x240 pixels down to 160x120 will reduce your bandwidth requirements by a factor of four (assuming constant compression rates). The RealNetworks SMIL kit has exhaustive information on this topic.
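The factor-of-four figure follows from the pixel counts, as the Python sketch below shows. The 200 kbps starting rate is a hypothetical example, and the model assumes, as the text does, that compression stays constant so bandwidth scales with the number of pixels per frame.

```python
def scaled_bitrate(kbps, old_size, new_size):
    """Scale a bit rate in proportion to the pixels per frame.

    old_size and new_size are (width, height) tuples in pixels.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    return kbps * (new_w * new_h) / (old_w * old_h)


# 320x240 has 76,800 pixels per frame; 160x120 has 19,200.
smaller = scaled_bitrate(200, (320, 240), (160, 120))
```

Halving both dimensions quarters the pixel count, so the hypothetical 200 kbps stream drops to about 50 kbps.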

Uploading files to the stream server

As the streaming media files are created, they need to be stored on a separate server running the RealNetworks RealServer® G2 server software. The content creator will need to place the files in a subdirectory off the mount point for the server, and will need to know the address and port number of the server, as well as whether the Ramgen file system, for sending temporary small files, is in use. The files can then be linked to from any web page. Links can be of the format http://server/ramgen/MountPoint/virtual_directory/filename, and once within SMIL, individual components can be specified in a very similar format, rtsp://server/MountPoint/virtual_directory/filename. [6]
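The two link formats can be generated consistently with a pair of small helper functions, sketched below in Python. The function names, the optional port handling, and the example server, directory, and file names are illustrative assumptions; check the actual mount point and ports with your server administrator.

```python
def _host(server, port=None):
    """Return 'server' or 'server:port' when a port is given."""
    return "%s:%d" % (server, port) if port is not None else server


def ramgen_url(server, mount_point, virtual_directory, filename,
               port=None):
    """HTTP link for use on a web page, following the format above."""
    return "http://%s/ramgen/%s/%s/%s" % (
        _host(server, port), mount_point, virtual_directory, filename)


def rtsp_url(server, mount_point, virtual_directory, filename,
             port=None):
    """RTSP link for use inside a SMIL file, following the format above."""
    return "rtsp://%s/%s/%s/%s" % (
        _host(server, port), mount_point, virtual_directory, filename)


# Hypothetical names for a training module.
page_link = ramgen_url("server", "MountPoint", "safety101", "intro.smi")
clip_link = rtsp_url("server", "MountPoint", "safety101", "intro.rm")
```

Generating both links from the same arguments keeps the web page and the SMIL file pointing at the same directory when content moves.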

Hardware requirements

Hardware requirements vary widely depending on your application. Four sets of hardware requirements are involved: network infrastructure, web server, stream server, and client browser/player. The web server requirements and configuration are outside the scope of this document. Many of the issues discussed here are particularly important to corporate implementers who have controlled environments into which they wish to introduce streaming technologies. The only successful way to implement streaming media in a corporate environment is to work with the network and computer infrastructure organizations within your company to understand and proactively adapt to the additional requirements imposed by the technology.

Network infrastructure: Neither Internet nor intranet bandwidth demands should be underestimated. Careful analysis of current network loads and capacities can help determine how much streaming traffic can be handled before network upgrades are required to provide adequate quality of service to all users. Network upgrades are expensive and time consuming. Always consult with your network operations staff before implementing any streaming technologies on a widespread basis across your network.

Client browser/player: The RealNetworks RealPlayer G2 currently runs on PC-compatibles running Windows 95, 98, or NT 4.0, and on Power Macintosh. Performance will vary depending on CPU speed and available memory. For fast, responsive control and playback, we recommend a minimum of a 166-MHz Pentium with 32 MB of RAM. Slower machines will provide sub-optimal playback. PCs also need to be MPC-2 compliant and have appropriate sound cards, drivers, and headphones or speakers installed. For corporate intranet sites, overcoming the current installed base of PCs that are not multimedia-equipped can be a significant challenge.

Stream server: The RealNetworks RealServer G2 products run on a variety of UNIX platforms as well as Microsoft Windows NT. Hardware requirements vary depending on the expected number of users. Consult the RealNetworks website for details.


SMIL Authoring
· RealNetworks RealProducer® Pro G2
· Digital Renaissance TAG Author® 2.0
· Veon Interactive V-Active® for RealSystem G2

Audio/Video Editing
· Sonic Foundry's Sound Forge® 4.5
· Adobe Premiere®

Streaming Media Encoding
· RealNetworks RealProducer® Plus G2

Future Directions

Most currently available web based testing software requires custom format files and special software, and is not standards-based. New products such as TopClass [7] use plain text and HTML, and are much better suited to integration within a SMIL framework. Testing is the next step.

The evolution of the tools currently available will no doubt give rise to a suite of powerful and easy-to-use tools for creating SMIL presentations. Ongoing development of SMIL, with ratification via the W3C, will assure interoperability.

Better integration and tools will allow the potential we see to be realized. Right now, streaming media is at the level of maturity the web was in 1994. The standards are there, and the tools are coming. You can go out and use the technology now. Let us know what you do with it!


[1] W3C Recommendation: Synchronized Multimedia Integration Language (SMIL) at
[2] RealNetworks HTML+TIME at
[3] W3C Recommendation: Extensible Markup Language (XML) at
[4] Comer, Douglas E. The Internet Book: Everything You Need to Know About Computer Networking and How the Internet Works. Prentice Hall, August 1997
[5] "Real-Time Video and Audio in the World Wide Web" by Chen, Tan, Campbell, and Li. Proceedings of the Fourth International World Wide Web Conference, December 1995.
[6] RealServer Administration Guide at
[7] WBT Systems TopClass Overview at