Abstract: Streaming audio became available on the Web in 1995, but with the development of the Synchronized Multimedia Integration Language (SMIL)[1], the technology has reached a new level of maturity. SMIL is based on the XML standard, and allows audio, video, images, and text to be integrated. The implications for Web based pedagogy are tremendous. We now have the opportunity to do training on the desktop in a feature-rich open environment. We will give a basic background in streaming media technology, discuss the standards, the current state of the art, our experience with RealNetworks [2], how to take advantage of it in an intranet environment, and touch on future developments, including the integration of testing engines.
In the online world of 1999, what you can do is both
empowered and constrained by the technology. A good understanding
of the limits of the viewing software, end user hardware, and
intervening network allows the instructional designer to make
the best possible use of what is available - to push the limits
while balancing speed and usability. We are now entering a new
world where those limits are being pushed back rapidly, and anything
is possible. It is still very important to understand the limits,
for now, but it is equally important to break down our preconceived
notions of what is possible. For a moment, imagine anything is
possible!
The new SMIL (Synchronized Multimedia Integration Language) standard allows multimedia content, including text, pictures, sound, and video to be synchronized for a coherent learning experience. Control of all these media is contained in a simple text file (although the format is quite complex). Tools to simplify creation and editing are rapidly being developed. SMIL can greatly reduce the bandwidth required while delivering an experience similar to watching a fully interactive television channel.
Streaming media is defined as network-based data
that can be presented to the user before the whole data file
has finished transferring. If you see a picture begin to appear
on your screen before the transfer completes (e.g., a PNG or some
JPG files), or hear an audio file start playing as soon as you
click it (e.g., a RealAudio file), that is an example of streaming
media.
The primary advantage of streaming is that large
audio and video files can be played as they arrive on the computer
rather than having to wait for the file transfer to complete.
For training in particular, this means that the user interface
is much more responsive.
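The difference between streaming and a plain download can be sketched in a few lines of Python. This is only an illustration of the definition above: the in-memory clip stands in for a network connection, and the `play` function is a hypothetical placeholder for a real decoder.

```python
import io
from typing import BinaryIO, Iterator


def stream_chunks(source: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield data as it arrives instead of waiting for the whole file."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def play(source: BinaryIO, chunk_size: int = 4096) -> int:
    """Stand-in for a player: handles each chunk as soon as it is read.

    Returns the number of chunks presented. A real player would decode
    and render each chunk at the marked line.
    """
    presented = 0
    for _chunk in stream_chunks(source, chunk_size):
        presented += 1  # playback of this chunk can start immediately
    return presented


if __name__ == "__main__":
    clip = io.BytesIO(b"\x00" * 10_000)  # hypothetical 10,000-byte clip
    print(play(clip))
```

The first chunk is available to the player after a single read, long before the rest of the clip has been transferred.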
Data can be streamed in a variety of ways:
| | Pre-recorded / on-demand | Real-time / on-demand (pulled) | Real-time / live (pushed) |
| Web server | | | |
| RealMedia server | | | |
If served via the web server, when a user clicks on a web link
to a RealAudio sound file, the data is delivered over the web's
HTTP protocol, and only one kind of data can be presented. If served
via the RealMedia server, the data is delivered back to the web
browser via a streaming protocol like RTSP (Real Time Streaming
Protocol), typically over UDP. Pre-recorded content could include online training
materials, and real-time content could include classes and meetings.
SMIL is important for several reasons. First, it integrates various
kinds of media. Second, it is an open standard that can be leveraged
across all platforms. Finally, it is derived from the W3C Extensible
Markup Language (XML)[3] standard; this is important because anything
can be defined in XML, and it can be extended on the fly simply
by defining new tags.
A SMIL player can act like any other Web browser plug-in, and
display SMIL content over an HTTP connection. However, it can
also subscribe to a host group and view an IP multicast, or negotiate
the control connection and open a unicast RTSP connection to stream
the data.
The TCP/IP protocol upon which the Internet is based is reliable
over a wide variety of physical networks because the packets can
be retransmitted by TCP, the Transmission Control Protocol. [4]
When delivering streaming data over a high bandwidth corporate
or university campus network, however, the high reliability of
TCP is not required, and retransmissions can slow down performance
and take up too much network bandwidth. RTSP is designed to degrade
gracefully even if a few packets get lost, and therefore delivers
the data faster with lower overhead. Early papers on Real-Time
Video concluded that the Web was not suitable for high bandwidth
media, because of the inherent delays. [5] The new protocols help
deal with these problems.
With IP multicast, a streaming server can send a single stream
across the network, allowing multiple computers to receive it.
The new protocols can allow some packets to expire without retransmission,
and new routers can allow the data to go across the network to
multiple destinations with only one destination address, the multicast
host group address, in the header.
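On the receiving side, subscribing to a host group looks roughly like the following Python sketch, which uses the standard socket interface. The group address and port in the usage comment are hypothetical, and a real player would hide all of this from the user.

```python
import ipaddress
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the request used to join an IPv4 multicast host group."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    # Group address followed by the local interface (0.0.0.0 = any).
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket subscribed to the given host group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group)
    )
    return sock


# Hypothetical usage (requires a multicast-enabled network):
#   receiver = open_multicast_receiver("239.1.2.3", 5004)
#   packet, sender = receiver.recvfrom(2048)
```

Every receiver that joins the same group gets the same packets, so the server sends each packet only once no matter how many clients are watching.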
If you want to take advantage of these capabilities, first decide
on your needs, then take the time to understand the options. You
can then talk with your networking group for help in configuring
routers, or to help you decide what technology best suits your
needs.
Just as web based training (WBT) has some important
gains over traditional classroom training, the use of SMIL allows
the Instructional Designer (ISD) to take the training experience
one step further. SMIL builds on the existing base of XML standards,
tools and experience. It allows for very easy indexing and editing
because the control files are all plain text with tags, similar
to HTML. It can be used inline within an HTML page, and allows
simple extensibility for other applications (such as testing engines)
within a well-defined framework. Open standards tend to be simple
and durable.
Poorly designed WBT can be a waste of time and money
as well as an ineffectual tool for training. Many of the streaming
videos foisted upon unsuspecting viewers as WBT show a video of
a slide presentation with an audio narration, or even worse a
subject matter expert as a "talking head" giving a lecture.
To take advantage of SMIL, build a presentation including the
audio track of the lecture combined with streaming text of the
speaker's notes, streaming JPEGs of any presentation slides, and
perhaps short video segments of any animated processes required
to illustrate the topic. Besides being more informative and eminently
more useful to the end-user, the SMIL presentation offers tremendous
performance gains over the pure video. The network
bandwidth required for audio playback with text and JPEGs
can be less than half that of a single video stream. Because
the JPEGs will take up less bandwidth than the video,
the quality of the JPEGs can be much higher and small details
on the presentation slides, like text, will actually be readable
by the end-user. The audio and text tracks could be localized
and presented to the user in the language of their choice without
having to re-author the entire presentation in a single monolithic
chunk. SMIL gives you the ability to split these data types into
separately maintained files and maintain full control over how
and when they are displayed to the user. One of SMIL's strongest
abilities is controlling when something happens on-screen. The
content author can precisely control all events, effects and transitions.
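A presentation of this kind can be sketched by generating the SMIL file from Python. The sketch below uses only SMIL 1.0 elements and attributes (`par`, `seq`, `switch`, `dur`, `system-language`); all file names, region names, and region sizes are hypothetical and would be replaced by your own.

```python
import xml.etree.ElementTree as ET


def build_lecture(narration: dict, slides: list) -> ET.Element:
    """Build a SMIL 1.0 tree: narration, notes, and timed slides.

    narration maps a language code to an audio file (hypothetical names).
    slides is a list of (image file, seconds on screen) pairs.
    """
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="600", height="300")
    ET.SubElement(layout, "region", id="slides",
                  left="0", top="0", width="400", height="300")
    ET.SubElement(layout, "region", id="notes",
                  left="400", top="0", width="200", height="300")

    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")  # children of <par> play together

    # <switch>: the player picks the first child matching the user's language.
    switch = ET.SubElement(par, "switch")
    for language, src in narration.items():
        ET.SubElement(switch, "audio", {"src": src, "system-language": language})

    ET.SubElement(par, "textstream", src="notes.rt", region="notes")

    seq = ET.SubElement(par, "seq")  # children of <seq> play one after another
    for src, seconds in slides:
        ET.SubElement(seq, "img", src=src, region="slides", dur=f"{seconds}s")
    return smil


if __name__ == "__main__":
    tree = build_lecture(
        {"en": "lecture_en.rm", "fr": "lecture_fr.rm"},
        [("slide1.jpg", 30), ("slide2.jpg", 45)],
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Because each medium lives in its own file, a translated narration track or a corrected slide can be swapped in without touching the rest of the presentation.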
Applications that support open standards tend to
be available at low or no cost for academic uses. Many of the
instructional design tools available today use proprietary code
and custom designed Java plug-ins to display the coursework.
Some software companies charge far too much for their WBT solutions
and may have simply repackaged their custom computer-based training
software engines into a complicated browser plug-in. Using this
type of approach to WBT can lead to a variety of problems. Trying
to deploy the specialized plug-ins to your client base and dealing
with unforeseen incompatibilities caused by these plug-ins can
quickly become a maintenance nightmare. Better to rely upon a
solution based on open standards where the browser plug-ins are
freely available and tested by the Internet at large.
There are still some issues to be resolved with SMIL.
Drag-and-drop tools to automatically generate SMIL code are still
in their first generation or in beta testing. Complex SMIL presentations
still require hand-coding or, at the very least, some hand tweaking
to perfect and debug. At the time this paper was authored, writing
SMIL presentations complex enough to be called WBT required knowledge
and experience beyond the capabilities of the typical instructional designer.
For now, the use of an experienced web site programmer or a staff
member who can be dedicated to learning the technical aspects
of SMIL is recommended.
From an authoring standpoint, the emerging collaboration
and streaming technologies in Microsoft Office 2000 appear to
be simple to use and well integrated into the traditional Office
suite of tools. It remains to be seen if the Microsoft products
continue to have the server scalability and quality issues which
plagued earlier streaming technology releases.
When choosing a streaming technology for WBT many
factors must be weighed and evaluated. There are a variety of
technical factors such as existing network infrastructure between
you and your audience, server platform and availability, and client
software maintenance. These will be discussed in "The Real Nitty-Gritty"
section below. Other factors, which can sometimes be more important
to overcome than the technical issues, include personal experience
and comfort level with the technology, institutional politics,
and any existing corporate relationships. First and foremost,
you need to be comfortable and familiar with the technology you
implement. If your instructional designers are all familiar with
a specific tool set and very comfortable with the processes and
procedures surrounding your existing traditional or CBT training
methods, there will probably be a substantial resistance to change.
Overcoming any internal training paradigm "inertia"
will definitely be a challenge. Often, training the trainers
is the hardest job of all. Convincing management that a new
method of training is needed, and that it may cost some money to
overcome the technical issues, can sometimes be an insurmountable
hurdle. Presenting the idea to management requires careful analysis
of the actual costs involved. Another potential issue when choosing
a streaming technology within an organization can be factoring
in any pre-existing corporate relationships. If your organization
has, for example, a strong relationship with Apple Computer,
then trying to justify a WBT solution utilizing SMIL instead of
QuickTime for streaming media may require good analysis and strong
justifications.
From your storyboard, build a timeline and organize
the presentation of your learning materials. Plan the layout
of the learning materials and the navigational items. Decide
when a particular item needs to be displayed. This will help you
determine load orders of your media assets and help identify any
constraints imposed on your load order by your target bandwidth.
With SMIL, different items within your presentation can be specified
to load serially or in parallel. It's usually a good idea to
make sure that your navigation buttons and other graphics and
text load before the user gets to view the video animation on
the first page of your presentation.
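To see how the target bandwidth constrains load order, it helps to estimate when each asset finishes loading if the items are fetched one after another. The Python sketch below does this arithmetic; the asset names, sizes, and the 150 kbps target are hypothetical, and it assumes the full target bandwidth is available to each item in turn.

```python
def serial_load_schedule(assets, bandwidth_kbps):
    """Return (name, seconds elapsed when the asset has finished loading).

    assets is a list of (name, size in kilobytes) pairs loaded serially.
    bandwidth_kbps is the target bandwidth in kilobits per second.
    """
    schedule = []
    elapsed = 0.0
    for name, size_kilobytes in assets:
        elapsed += size_kilobytes * 8 / bandwidth_kbps  # kilobytes -> kilobits
        schedule.append((name, round(elapsed, 2)))
    return schedule


if __name__ == "__main__":
    interface_assets = [
        ("nav_buttons.gif", 12),
        ("background.jpg", 30),
        ("intro_text.rt", 2),
    ]
    for name, ready_at in serial_load_schedule(interface_assets, 150):
        print(f"{name}: ready after {ready_at} s")
```

With these figures the interface is complete after roughly 2.35 seconds, which is the earliest point at which the video should be scheduled to begin.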
After the layout and storyboard have been finalized,
the next two steps for building your presentation are the design
of the interface and the content design of the learning materials.
These steps frequently happen in parallel. While one group of
graphic artists works on the backgrounds, buttons, graphics and
other window dressing, the instructional designers work with the
media production staff to plan and create the learning materials.
When designing the interface for SMIL presentations, one must
consider how the presentation will be displayed. SMIL can be
embedded into a web page or displayed stand-alone in the RealPlayer®.
The choice can be simple depending on the level of integration
desired, use of any courseware testing engines, and finally personal
preference. Either way, standard web design rules definitely
apply. To be effective, the interface must be clean and uncluttered.
The design should encourage the user to explore while intuitively
leading them safely through the learning materials in the appropriate
order to effectively teach them what they need to know. The design
must support the learning materials. Layout of the presentation
should lead the user to focus on the learning materials. With
time-based control over all display elements, SMIL provides the
ultimate in design flexibility.
As with any web based design project, the module must be designed
with the lowest common denominator client system in mind. If
your audience is within your organization's intranet and there
are hardware and software standards in place to assure that all
of your users have at least a certain minimum configuration then
it is relatively simple to plan your WBT module to fit within
those requirements. Typically, a SMIL presentation should be designed
to fit within a 640x480 VGA screen. Remember that the actual
usable space within a browser is smaller than the full screen
resolution. For a 640x480 VGA display with the web browser window
maximized, with default menu settings, the usable screen real
estate is approximately 600x300 pixels. When adding a 320x240-pixel
video, not much room is left vertically for titles and text.
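The screen budget is easy to check before any graphics are drawn. The following Python sketch simply subtracts the video size from the usable area quoted above, assuming the video is placed in the top-left corner.

```python
def remaining_space(usable, video):
    """Return (width, height) left beside and below a video placed top-left.

    usable and video are (width, height) pairs in pixels.
    """
    usable_w, usable_h = usable
    video_w, video_h = video
    if video_w > usable_w or video_h > usable_h:
        raise ValueError("video does not fit in the usable area")
    return usable_w - video_w, usable_h - video_h


if __name__ == "__main__":
    beside, below = remaining_space((600, 300), (320, 240))
    print(f"{beside} pixels beside the video, {below} pixels below it")
```

With a 600x300 usable area and a 320x240 video, 280 pixels remain to the side but only 60 pixels remain vertically, which is why titles and text usually go beside the video rather than above or below it.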
While the graphic artists are busy with the interface, the source
materials for the video, audio, and other media clips must be
recorded and encoded. Plan and conduct source material recording
sessions. Once the project storyboard and layout are finalized,
it's time to build the actual "meat" of the presentation.
Successfully planning and producing the actual presentation material
is simple if you, the designer, have control over the material
being presented. More often than not, the audio and video have
to be recorded onto cassette or videotape and then digitized and
encoded for use on the web.
When dealing with video as a streamed medium, many factors influence
the final stream quality and playback rates. The well-known rule
of "garbage in, garbage out" applies to streaming
video. The higher the quality of the recording used as source
material, the smaller and faster the streaming video file will
be. The differences in signal-to-noise ratios and overall resolution
between VHS, S-VHS, 8mm, Hi8, Mini-DV, Betacam SP, and DVCPRO
video formats (these are listed in increasing degree of quality)
directly influence the playback frame rate and encoded file size
of the streaming video file. The better the format you can afford
to record in, the cleaner and better your video will stream to
your clients.
For important high-bandwidth content, the use of a professional
video production staff equipped with proper lighting and recording
equipment will always yield a higher quality recording than a
consumer-quality video camera. This by no means should be interpreted
to mean that low cost, consumer-quality equipment is incapable
of producing satisfactory results. However, to provide Internet-based
video streams larger than a postage stamp at acceptable quality
when network bandwidth is at a minimum, starting with premium
quality video recordings is essential. The objective here is to
plan the multimedia source materials appropriately taking into
consideration the time, resources, and funding required to realistically
achieve your design goals.
After the actual content files have been created, the SMIL presentation
files need to be created. This is quite similar to creating HTML
pages, except SMIL is time sensitive and requires specific timing
for each event and transition, and the files need to live on the
streaming server, not the web server. For instructions on how
to code SMIL files, the SMIL technical documentation can be found
at the World Wide Web Consortium Architecture for Synchronized
Multimedia [1].
Technological Issues
The topics covered within this section will address issues surrounding
manufacture of streaming media for intranet use where high-bandwidth
network connections are available. Although SMIL presentations
can be adapted to incorporate different sized videos for either
high or low bandwidth use, that is outside the scope of this paper.
Once the instructional designer has obtained the source materials
for the videos, they need to be digitized. The format into which
the video is digitized will affect the final encoded output. Always
digitize video uncompressed at 30 frames per second, and audio in stereo
at 16-bit, 44 kHz sampling rates. Let the streaming format encoder
software have the best quality input so it has all of the
data it needs to provide the highest-quality output. The more
data the encoder has to work with, the fewer assumptions the compression
routines need to make. This will result in smoother, cleaner,
and smaller encoded video output.
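Uncompressed capture consumes disk space quickly, so it is worth estimating the raw data rate before a digitizing session. The Python sketch below assumes a 320x240 capture at 24 bits per pixel and the CD-standard 44,100 Hz audio sampling rate; both are assumptions for illustration, so substitute the settings of your own capture hardware.

```python
def raw_video_bytes_per_second(width, height, fps=30, bytes_per_pixel=3):
    """Data rate of uncompressed video (3 bytes per pixel = 24-bit color)."""
    return width * height * bytes_per_pixel * fps


def raw_audio_bytes_per_second(sample_rate=44_100, bits=16, channels=2):
    """Data rate of uncompressed PCM audio (2 channels = stereo)."""
    return sample_rate * (bits // 8) * channels


if __name__ == "__main__":
    video = raw_video_bytes_per_second(320, 240)
    audio = raw_audio_bytes_per_second()
    per_minute = (video + audio) * 60
    print(f"video: {video:,} bytes/s, audio: {audio:,} bytes/s")
    print(f"one minute of source material: {per_minute:,} bytes")
```

Under these assumptions a single minute of source material occupies about 425 MB, so plan disk space for the digitizing workstation accordingly.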
Digital editing of video and audio sources before encoding is
usually required. Certain optimizations such as video cropping
and audio normalization can be made to provide optimal output
upon playback. Applications for video and audio editing include
Adobe Premiere® and Sonic Foundry's Sound Forge®.
Audio and video can be encoded into the RealNetworks® RealMedia®
format using a variety of third party applications. The easiest
encoder to use is the RealProducer® Plus G2 from RealNetworks.
It has many different stream options. The RealMedia G2 SureStream
format option allows multiple streams at different bandwidths
to be encoded into the same file. This allows the RealPlayer®
and RealServer® to better negotiate how much data to send
the player based on network performance. For example, a video
may be encoded for 28.8K modem, 56K modem, 64K Single ISDN, 128K
Dual ISDN, 220K xDSL and Cable Modem, and 150K Corporate LAN data
rates all within a single file. Depending on the available network
bandwidth, the player will switch between these different encoded
formats dynamically as the user watches the video. This feature
provides much better playback than older streaming technologies,
which only adapt to changing network conditions by dropping frames
or "fuzzing-out" the video into large indistinguishable
blocks. Depending on the resolution and frame rate of your video
source files, it may not make sense to encode both the higher bandwidth
settings, such as 220K and 150K, and the low bandwidth settings,
28.8K, 56K, and single ISDN, in the same file. The lower settings
may not have enough available bandwidth to stream the file.
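The idea of switching between streams encoded in one file can be illustrated with a simple selection rule: choose the highest encoded rate that fits the bandwidth currently available. The Python sketch below is only an illustration of that idea, not the actual negotiation used by RealPlayer and RealServer, and it treats the nominal connection speeds listed above as if they were the encoded stream rates.

```python
# Nominal target rates from the example above, in kilobits per second.
ENCODED_KBPS = (28.8, 56, 64, 128, 150, 220)


def pick_stream(available_kbps, encoded_kbps=ENCODED_KBPS):
    """Return the highest encoded rate that fits the available bandwidth.

    Falls back to the lowest encoded rate when nothing fits.
    """
    fitting = [rate for rate in encoded_kbps if rate <= available_kbps]
    return max(fitting) if fitting else min(encoded_kbps)


if __name__ == "__main__":
    for measured in (500, 100, 10):
        print(f"{measured} kbps available -> {pick_stream(measured)} kbps stream")
```

Re-running the selection as the measured bandwidth changes gives the step-up and step-down behavior described above, instead of dropped frames at a fixed rate.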
Streamed animations can be produced using Macromedia's Flash®
technology. Flash is in widespread use for non-streamed web based
animations. The same animations can be included into your SMIL
presentation after a simple encoding procedure into the RealFlash
format. Now the use for Flash animations is no longer limited
to the realm of the static web page and can be unleashed into
the dynamic environment of a SMIL presentation. The RealNetworks
site has some good examples of RealFlash SMIL presentations.
Bandwidth between you and your target audience is the limiting
factor on SMIL design. Designing SMIL presentations includes tradeoffs
for each data stream sent to the player. The designer must balance
data stream buffering times versus compression and the number
of streams being loaded simultaneously. These calculations are
also affected by the resolution of the data to be streamed. Resizing
a video originally intended to stream at 320x240 pixels down to
160x120 will reduce your bandwidth requirements by a factor of
four (assuming constant compression rates). The RealNetworks SMIL
kit has exhaustive information on this topic.
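The factor of four follows directly from the pixel counts, as this short Python sketch shows. Like the text, it assumes that the compression rate stays constant, so bandwidth scales with the number of pixels per frame.

```python
def bandwidth_ratio(original, resized):
    """Ratio of pixels per frame between two (width, height) sizes."""
    original_w, original_h = original
    resized_w, resized_h = resized
    return (original_w * original_h) / (resized_w * resized_h)


if __name__ == "__main__":
    print(bandwidth_ratio((320, 240), (160, 120)))
```

Halving both dimensions quarters the pixel count: 320 x 240 = 76,800 pixels per frame against 160 x 120 = 19,200.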
As the streaming media files are created, they need to be stored
on a separate server running the RealNetworks RealServer®
G2 server software. The content creator will need to place the
files in a subdirectory off the mount point for the server, and
will need the address and port number of the server, as well as
whether the Ramgen file system, for sending temporary small files,
is in use. The files can then be linked to from any web page.
Links can be of the format http://server/ramgen/MountPoint/virtual_directory/filename,
and once within SMIL, individual components can be specified in
a very similar format, rtsp://server/MountPoint/virtual_directory/filename.
[6]
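When many files are being published, it can help to generate both link formats from the same parts. The Python sketch below follows the two patterns given above; the server name, mount point, directory, and file names in the usage example are hypothetical.

```python
def ramgen_link(server, mount_point, virtual_directory, filename):
    """HTTP link for use on a web page, routed through Ramgen."""
    return f"http://{server}/ramgen/{mount_point}/{virtual_directory}/{filename}"


def rtsp_link(server, mount_point, virtual_directory, filename):
    """RTSP link for referencing an individual component inside SMIL."""
    return f"rtsp://{server}/{mount_point}/{virtual_directory}/{filename}"


if __name__ == "__main__":
    parts = ("media.example.edu", "training", "module1")
    print(ramgen_link(*parts, "intro.smi"))
    print(rtsp_link(*parts, "lecture.rm"))
```

If the server listens on a non-default port, the port number would be appended to the server name in the usual `host:port` form.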
Hardware requirements vary widely depending on your application.
Four sets of hardware requirements are involved: network infrastructure,
web server, stream server, and client browser/player. The web
server requirements and configuration are outside the scope of
this document. Many of the issues discussed here are particularly
important to corporate implementers who have controlled environments
into which they wish to introduce streaming technologies. The
only successful way to implement streaming media in a corporate
environment is to work with the network and computer infrastructure
organizations within your company to understand and proactively
adapt to the additional requirements imposed by the technology.
Network infrastructure: Both Internet and intranet bandwidth demands
should not be underestimated. Careful analysis of current network
loads and capacities can help determine how much streaming traffic
can be handled before network upgrades are required to provide
adequate quality of service to all users. Network upgrades are
expensive and time consuming. Always consult with your network
operations staff before implementing any streaming technologies
on a widespread basis across your network.
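A rough first estimate of how much streaming traffic a network segment can absorb is shown in the Python sketch below. All of the figures are hypothetical: a 10 Mbps segment, 4 Mbps of existing load, 150 kbps streams, and 20 percent of capacity held in reserve. Your network operations staff will have measured values and their own planning rules.

```python
def max_new_streams(link_kbps, current_load_kbps, stream_kbps, reserve_percent=20):
    """Estimate how many unicast streams fit in the remaining capacity.

    reserve_percent of the link is kept free for bursts and other traffic.
    """
    usable_kbps = link_kbps * (100 - reserve_percent) // 100
    headroom_kbps = usable_kbps - current_load_kbps
    return max(0, headroom_kbps // stream_kbps)


if __name__ == "__main__":
    print(max_new_streams(10_000, 4_000, 150))
```

Under these assumptions the segment can carry 26 simultaneous streams before an upgrade is needed; a segment that is already near capacity yields zero.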
Client browser/player: The RealNetworks RealPlayer G2 currently
runs on PC-compatibles running Windows 95, 98, or NT 4.0, and on Power
Macintosh. Performance will vary depending on CPU speed and available
memory. For fast, responsive control and playback, we recommend
a minimum of a 166 MHz Pentium with 32 MB of RAM. Slower machines
will provide sub-optimal playback. PCs also need to be MPC-2 compliant
and have appropriate sound cards, drivers, and headphones or speakers
installed. For corporate intranet sites, overcoming the current
installed base of PCs not equipped for multimedia can be a significant
challenge.
Stream server: The RealNetworks RealServer G2 products run on
a variety of UNIX platforms as well as Microsoft Windows NT. Hardware
requirements vary depending on expected number of users. Consult
the RealNetworks website for details.
· Digital Renaissance TAG Author® 2.0
· Veon Interactive V-Active® for RealSystem G2
· Adobe Premiere®
Most currently available web-based testing software requires custom
format files and special software, and is not standards-based. New
products such as TopClass [7] use plain text and HTML, and are
much better suited to integration within a SMIL framework. Testing
is the next step.
The evolution of the tools currently available will no doubt give
rise to a suite of powerful and easy-to-use tools for creating
SMIL presentations. Ongoing development of SMIL with ratification
via the W3C will assure interoperability.
Better integration and tools will allow the potential we see to
be realized. Right now, streaming media is at the level of maturity
the web had reached in 1994. The standards are there, and the tools are
coming. You can go out and use the technology now. Let us know
what you do with it!
The Real Nitty-Gritty
Building a WBT module using SMIL follows the same
process as building traditional WBT with a few specialized requirements.
The first phase of any project is the conceptual brainstorming
and storyboarding. This is best done on paper for speed and easy
reference. The first step is to define the objective of the WBT
module. Decide what information is to be conveyed to the viewer
and be specific. Define the project scope, setting definite boundaries
encompassing just enough detail to properly cover your objectives.
Keep the focus tight and stick to your stated objectives. Next,
set requirements for the user experience, thinking about not just
what the user will be learning, but how you want them to learn.
The idea is to lead them through your learning materials in an
organized and straightforward manner. This will help you design
the navigational methods used within the WBT module. Try to design
the framework first. Don't worry about the graphics yet; work
on the layout first. Designing a common look-and-feel that can
be reused across modules will help reduce development time on
subsequent projects and lend a consistency to your training.
Consistent look-and-feel across WBT modules gives the viewer
a higher comfort level knowing that even though the subject matter
may be new, the process of learning throughout your modules is
familiar.
Interface Design
Learning Materials Content Design
Building the RealText, RealPix, and SMIL files
Encoding content to RealNetworks formats
Uploading files to stream server
Hardware requirements
Tools
SMIL Authoring · RealNetworks RealProducer® Pro G2
Audio/Video Editing · Sonic Foundry's Sound Forge® 4.5
Streaming Media Encoding · RealNetworks RealProducer® Plus G2
Future Directions
[1] W3C Recommendation: Synchronized Multimedia Integration
Language (SMIL) at http://www.w3.org/AudioVideo/
[2] RealNetworks HTML+TIME at http://www.real.com/
[3] W3C Recommendation: Extensible Markup Language
(XML) at http://www.w3.org/XML/
[4] Comer, Douglas E. The Internet Book: Everything
You Need to Know About Computer Networking and How the
Internet Works. Prentice Hall, August 1997.
[5] Chen, Tan, Campbell, and Li. "Real-Time Video and Audio
in the World Wide Web." Proceedings of
the Fourth International World Wide Web Conference, December 1995.
[6] RealServer Administration Guide at http://service.real.com/help/library/servers.html
[7] WBT Systems TopClass Overview at http://www.wbtsystems.com/