CURRENT MEETING REPORT

Reported by Steve Casner/USC-ISI

Minutes of the Audio/Video Transport Working Group (AVT)

The AVT Working Group met for only one session at this meeting, since the draft specification for the Real-time Transport Protocol (RTP) is nearly complete for submission as an RFC. The emphasis of this session was on implementation experience, with the focus shifting to companion specifications for profiles and encodings.

Status of Draft RTP Specification

This group did not meet in Amsterdam, but there has been substantial progress on the RTP specification via e-mail and a teleconference, and a new draft, draft-ietf-avt-rtp-04.txt (and .ps), has been installed. The specification has been submitted to the Area Director with a request for ``IESG Last Call,'' and is under review by the Directorate.

Steve Casner gave a brief description of the most recent change to the specification: the addition of the APP option. This option allows experimental application-specific options to be defined without official registration, while avoiding conflicts with other option definitions. See the draft RTP specification for details.

A brief description was also given of a proposal from Andrew Cherenson to add an option, not in the main RTP specification but in the audio/video profile, to indicate the mode or state of a participant. The proposed set of states was: active, video frozen (still image), private (listening but not sending), and hold (not listening and not sending).

A good fraction of the attendees at this meeting had read the RTP specification. Comments were solicited both on the specification and on the two options just described, but none were offered. However, behind the scenes, some objections have been raised to the classification of RTP as a Proposed Standard and to certain details of the specification. These issues will be discussed further on the mailing list.
Implementation Experience

Ron Frederick from Xerox PARC gave a presentation on his experience implementing RTP in the network video (nv) program. He reported that overall, the implementation went very cleanly, and that the combination of the sequence number, timestamp, and sync bit worked well together. He found the option format easy to generate and parse, but cautioned that the parser must watch out for an illegal option length of zero, or a length greater than the packet length. (The example option parsing code in the appendix to the specification includes these checks.)

The one nuisance Ron found was that the program needs to know whether an SSRC option is present, to fully identify the sender, before it can act upon the other options. This requires either parsing the options twice, or storing the information while parsing and then acting upon it at the end. To reduce this nuisance, it was proposed that the specification be modified to require that if an SSRC option is present, it must immediately follow the fixed header. Since this is the logical place for translators to insert the SSRC option, and since there can be only one, this restriction should cause no difficulties.

David Kristol from AT&T described his work (just beginning) on a quality-of-service monitor for RTP. It would create a map of the MBONE and display on the map a measure of the reception quality at each receiver, using data obtained from the reception reports multicast by the receivers. This would allow a visual determination of bottleneck points. One observation was that the measure of video delay is affected by the use of the same timestamp on all packets of a video frame, even though the packets are not transmitted at the same time. A solution is to measure delay only on the first packet of a frame. This illustrates that reception quality measurement may depend upon the medium.
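The defensive checks Ron described can be sketched as follows. The option layout assumed here (a one-byte option type followed by a one-byte length giving the option's total size in bytes) is an illustration only; the draft's actual option encoding differs in detail, and this is not the appendix code from the specification.

```python
def parse_options(packet: bytes, offset: int = 0):
    """Return a list of (opt_type, payload) pairs from an options region.

    Hypothetical layout: each option is a one-byte type, a one-byte total
    length in bytes, then payload. The two sanity checks are the ones Ron
    Frederick warned about: a length of zero (which would loop forever)
    and a length running past the end of the packet.
    """
    options = []
    while offset < len(packet):
        if offset + 2 > len(packet):
            raise ValueError("truncated option header")
        opt_type = packet[offset]
        length = packet[offset + 1]
        if length == 0:
            raise ValueError("illegal option length zero")
        if offset + length > len(packet):
            raise ValueError("option length exceeds packet length")
        options.append((opt_type, packet[offset + 2:offset + length]))
        offset += length
    return options
```

A parser built this way also makes the proposed SSRC rule easy to enforce: if an SSRC option must immediately follow the fixed header, it can only ever be the first entry in the returned list, so no second parsing pass is needed.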
Dave also implemented a vat/RTP translator to allow participation in vat audio sessions from inside the AT&T firewall. This turned out to be very simple, the only problem being translation of vat's beginning-of-talkspurt flag into RTP's end-of-talkspurt flag. For now, he is simply copying the bit and ignoring the distinction.

Encoding Specifications

Frank Kastenholz from FTP Software asked for the addition to the audio/video profile of an 8-bit linear encoding (``L8'') and a format code for L8 encoding at 11.025 kHz. This matches the capability of common audio hardware on PC and Mac platforms. It is possible to convert in software to 8-bit mu-law at 8 kHz, but this increases the minimum processing power required to participate. This request was generally agreed upon, and Frank was asked to provide the details to go into the profile. Henning Schulzrinne cautioned that adding a new ``standard'' encoding places a burden on all implementations to include at least a decoder for it.

Bill Fenner from NRL and Ron Frederick gave presentations on carrying JPEG video over RTP, and on the issues to be addressed in an encoding specification. Although the JPEG specification includes a variety of formats, Ron recommended that we stick with 4:2:2 video format, square pixels (as produced by most of the chips, even though CCIR 601 specifies rectangular pixels), a 16x8 block as the minimum coded unit, and progressive scan. Ron also recommended that we use the Q factors defined for C-JPEG and D-JPEG by the Free JPEG Group and use the standard Huffman coding table, though these could be overridden by custom table definitions.

Bill has designed an encoding for JPEG over RTP and implemented it using the Parallax JPEG hardware. He points out that JPEG frames are large, so they are likely to require segmentation and reassembly.
Losing one packet out of a frame will result in loss of the whole frame, because the Huffman reset mechanism that is part of the standard does not provide enough sequence space for packet-size losses. He also observed that the Q factor does not provide much usable quality range (the picture gets a lot uglier without the frame rate increasing as much as one would expect).

The encoding Bill defined uses the same RTP timestamp on all packets of a frame, and the RTP sync bit indicates the last packet of the frame, as usual. In addition, he has defined a small header to go at the beginning of the data in the first packet of a frame. The presence of this header is indicated by the first two bytes being one of the application-specific codes (0xFF 0xE1) provided in the JPEG specification and guaranteed not to appear in the data. This code is followed by two bytes that encode the Q factor, Huffman table index, and some size information. Special values of these indices can be used to indicate that custom quantization and/or Huffman tables will follow. The mechanisms for requesting and/or periodically retransmitting custom tables are still to be decided and tested. There were no major objections to this design, other than the suggestion that explicit image width and height fields be included. Bill agreed to produce a first draft specification for JPEG over RTP with assistance from Ron and Fengmin Gong from MCNC.

Video Decoder API

In Columbus we had a good discussion on the feasibility of creating a common interface for software video decoders, so that each packet video program can incorporate decoders for many or all of the other programs' native formats to enable interoperation.
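Bill's small in-band header for the first packet of a JPEG frame (the 0xFF 0xE1 application-specific code followed by two bytes of Q factor, Huffman table index, and size information) might be sketched as below. How the fields actually share those two bytes was not fixed at the meeting; the split used here (one byte of Q factor, then four bits each of table index and size code) is purely an assumption for illustration.

```python
# JPEG application-specific marker APP1 (0xFF 0xE1), guaranteed by the
# JPEG specification not to appear in entropy-coded data.
APP1_MARKER = b"\xff\xe1"

def pack_header(q_factor: int, huff_index: int, size_code: int) -> bytes:
    """Build the hypothetical 4-byte in-band header: marker, Q factor byte,
    then table index and size code packed four bits each (assumed split)."""
    assert 0 <= q_factor <= 255 and 0 <= huff_index <= 15 and 0 <= size_code <= 15
    return APP1_MARKER + bytes([q_factor, (huff_index << 4) | size_code])

def unpack_header(data: bytes):
    """Return (q_factor, huff_index, size_code) if the header is present,
    or None if the packet begins directly with compressed data."""
    if data[:2] != APP1_MARKER:
        return None
    return data[2], data[3] >> 4, data[3] & 0x0F
```

Special index values reserved to mean "custom quantization/Huffman tables follow" would be tested for in `unpack_header` before handing the rest of the packet to the JPEG decoder.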
At this meeting, Ron Frederick gave an update on the decoder API in the nv program, in which decoding and rendering of the image data are decoupled: nv does all the network I/O, RTP processing, and X window system interaction; the image decode routines just convert each packet of compressed bits into uncompressed YUV pixels for a portion of the image. A callback routine is provided to render a rectangular portion of the image after decoding. Ron identified several open issues that have arisen:

   o Is YUV a good choice for color decoding? It allows easy rendering into monochrome or color images, but requires extra processing for encodings that would more naturally use RGB or dithered data. The difficulty is that the number of variations in the rendering code is already large, to handle variations in pixel depth and ordering. It may not be worthwhile to double or triple this to render from additional input formats.

   o It is desirable to enable the use of hardware encoders and/or decoders for increased performance, but what additional hooks are required to fit them into the model? Some answers may come from exploring the options for the SunVideo board's Cell-B encoder and for JPEG video using the Parallax board, as Bill Fenner has done.

   o Should the common code handle resequencing of packets? Previously, nv ignored packet sequencing because packets of the nv encoding can be processed out of order. Now, nv processes the sequence numbers to accumulate packet loss information, and could do the resequencing. However, Ron feels that this function should be left to the decode routines, because the requirements may not be the same for all encodings, unless we can define, as part of the profile, an extra level of framing for all the encodings to use.

Other APIs may also be needed. Henning suggested that video encoding routines should also be sharable, to reduce the effort of writing them. Since nv already separates the frame grab from the encoding, an interface could be explored there.
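The decode/render decoupling Ron described might look roughly like the sketch below. The names and the geometry stub are illustrative, not nv's actual interface: the application owns network I/O and display, a decoder turns each packet into YUV pixels for part of the image, and a callback reports which rectangle is ready to render.

```python
from typing import Callable

# Callback signature assumed for illustration: x, y, width, height of the
# rectangle of the image that the decoder just updated.
RenderCallback = Callable[[int, int, int, int], None]

class Decoder:
    """Sketch of a per-encoding decode routine in the decoupled model."""

    def __init__(self, width: int, height: int, render: RenderCallback):
        self.width, self.height = width, height
        self.render = render
        # One byte each of Y, U, V per pixel, filled in as packets arrive.
        self.yuv = bytearray(width * height * 3)

    def decode_packet(self, payload: bytes) -> None:
        """Convert one packet of compressed bits into YUV pixels for a
        portion of the image, then invoke the render callback for it."""
        x, y, w, h = self._decompress_into(self.yuv, payload)
        self.render(x, y, w, h)

    def _decompress_into(self, yuv: bytearray, payload: bytes):
        # Placeholder for a real entropy decoder; here we pretend each
        # packet updates the top row of blocks and report that rectangle.
        return 0, 0, self.width, 8
```

Under this split, the open questions above become concrete API decisions: whether `yuv` could instead be RGB or dithered data, whether `_decompress_into` may be backed by hardware, and whether the caller or the decoder is responsible for resequencing the packets it feeds in.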
Abel Weinrib also pointed out that we need APIs at a higher layer, that of whole media agents to be controlled by different session managers.

Report from IMA Network Focus Group

At the end of the session, we got a report from Thomas Maslen of Sun on the recent first meeting of the IMA Network Focus Group, and on the potential interaction with the activities of the IETF AVT and MMUSIC Working Groups. The Interactive Multimedia Association (IMA) is an industry group chartered to develop standards to support multimedia applications. In particular, the Multimedia System Services (MSS) proposal defines an object-oriented architecture for the infrastructure to support multimedia applications. In a way, the MSS work fits between the AVT and MMUSIC areas.

The MSS proposal does not specify media transport mechanisms or protocols. The Network Focus Group is to address the requirements for network transport in the MSS, and to define network transport interfaces, target environments, and protocol profiles to support those requirements. The group will work with other standards groups, including the IETF, to incorporate existing protocols and to cooperate on the definition of new ones where needed. At first look, it appears that RTP may be suitable as one of the protocols to be used for transport of real-time media.

Similarly, MSS provides infrastructure for multimedia applications such as teleconferencing, but does not include the applications themselves. Abel pointed out that it does not include higher-level objects like people in its model, nor does it include policies. Therefore, MMUSIC sits above MSS, and the session management mechanisms to be developed in that working group might be used for communication among a set of applications implemented using MSS.

Future Working Group Activity

The session closed with a discussion of future working group activity. As work on the RTP specification is completed, the group's emphasis will shift to profile and encoding specifications.
From the point of view of our Area Director, Allison Mankin, it is appropriate for the group to continue work as needed, or to go on hiatus but keep the mailing list (rem-conf) active. Meetings at future IETFs may then be called to address new questions, such as the interface between network real-time services and RTP, or, when appropriate, to advance any of the specifications through the standards process.

Attendees

Andy Adams            ala@merit.edu
Stephen Batsell       batsell@itd.nrl.navy.mil
Tom Benkart           teb@acc.com
Richard Binder        rbinder@cnri.reston.va.us
Ronald Broersma       ron@nosc.mil
Stephen Casner        casner@isi.edu
Ping Chen             ping@ping2.aux.apple.com
Chuck de Sostoa       chuckd@cup.hp.com
Stephen Deering       deering@parc.xerox.com
David Dubois          dad@pacersoft.com
Ed Ellesson           ellesson@vnet.ibm.com
Julio Escobar         jescobar@bbn.com
Roger Fajman          raf@cu.nih.gov
William Fenner        fenner@cmf.nrl.navy.mil
James Fielding        jamesf@arl.army.mil
Robert Fink           rlfink@lbl.gov
Ron Frederick         frederick@parc.xerox.com
Mark Garrett          mwg@faline.bellcore.com
Atanu Ghosh           atanu@cs.ucl.ac.uk
Shawn Gillam          shawn@timonware.com
Robert Gilligan       Bob.Gilligan@Eng.Sun.Com
Fengmin Gong          gong@concert.net
Darren Griffiths      dag@ossi.com
Regina Hain           rrosales@bbn.com
Shai Herzog           herzog@catarina.usc.edu
Phil Irey             pirey@relay.nswc.navy.mil
Rick Jones            raj@cup.hp.com
Frank Kastenholz      kasten@ftp.com
David Kaufman         dek@magna.telco.com
Byonghak Kim          bhkim@cosmos.kaist.ac.kr
Charley Kline         cvk@uiuc.edu
Michael Kornegay      mlk@bir.com
David Kristol         dmk@allegra.att.com
Allison Mankin        mankin@cmf.nrl.navy.mil
David Marlow          dmarlow@relay.nswc.navy.mil
Jim Martin            jim@noc.rutgers.edu
Thomas Maslen         maslen@eng.sun.com
Marjo Mercado         marjo@cup.hp.com
Greg Minshall         minshall@wc.novell.com
Dan Nordell
Marsha Perrott        perrott@prep.net
J. Mark Pullen        mpullen@cs.gmu.edu
Jim Rees              Jim.Rees@umich.edu
Eve Schooler          schooler@isi.edu
Henning Schulzrinne   hgs@research.att.com
Michael Speer         michael.speer@sun.com
John Stewart          jstewart@cnri.reston.va.us
Matsuaki Terada       tera@sdl.hitachi.co.jp
Chuck Warlick         chuck.warlick@pscni.nasa.gov
Abel Weinrib          abel@bellcore.com
Jean Yao              yao@cup.hp.com
Shinichi Yoshida      yoshida@sumitomo.com