CURRENT_MEETING_REPORT_

Reported by Phill Gross/CNRI, Steve Hunter/LBL,
Bernhard Stockman/Nordunet and Dale Johnson/Merit

OPSTAT Minutes

The inaugural meeting of the Opstats Working Group was convened by
Bernhard Stockman and Phill Gross. The primary purpose of the meeting
was to decide how the NOCs could most effectively share their
operational statistics.

Phill presented a model of data sharing (see below).

   _______________              ______________
   |     New     |              |    Old     |
   | Collection  |              | Collection |
   |    Tool     |              |    Tool    |
   |_____________|              |____________|
           \                           |
            \                  ________|________
             \                 | Pre-Processor |
              \                |_______________|
               \                      /
                \                    /
                 \                  /
                      __________
                     /  Common  \
                    / Statistics \
                    \  Database  /
                     \__________/
                     /          \
                    /            \
                   /              \
                  /                \
                 /           __________________
                /            | Post-Processor |
               /             |________________|
              /                      |
   ________________           _______|________
   |     New      |           |     Old      |
   | Presentation |           | Presentation |
   |     Tool     |           |     Tool     |
   |______________|           |______________|

This model was based on previous work in the NJM working group and on
work by Bernhard at the Nordic Engineering Technical Forum (NETF). The
goal is to define, implement, and make available in the public domain
the tools required for the model.

Issues

   o Legal, ethical and political concerns of data sharing. People are
     concerned about showing data that may make one of the networks
     look bad.

   o Ensure integrity, conformity and confidentiality of the shared
     data. To be useful, the same data must be collected from all of
     the involved sites, and it must be collected at the same interval.
     To prevent vendors from getting unfair performance information,
     certain data must not be made available.

   o Access control methods. Both of the above make this an obvious
     requirement.

Mailing list

Chris Myers will set up the WG mailing list. Listserv commands (e.g.,
help, add) can be sent to the list server.

List of Desired Operational Statistics

The group brainstormed a list of desired operational statistics.
We began by laying out categories of important operational statistics:

   o UTILIZATION (throughput)

      - Traffic totals/period
      - Traffic peaks/period
      - Protocol usage/period

   o PERFORMANCE (delays, congestion)

      - Ping statistics
      - TCP RTT estimate

   o AVAILABILITY (long-term accessibility)

      - Line availability (percentage line uptime)
      - Route availability
      - Service availability

   o STABILITY (short-term accessibility)

      - Number of line status transitions per time unit
      - ICMP behaviour
      - Route stability (compare to work done at Merit)

         * Total number of route changes per time unit
         * Total number of routes per interface and box (dumping the
           route table is hard with SNMP's powerful GETNEXT operator;
           maybe add to MIB)
         * Next hop count
         * Changes in traffic pattern

Both Availability and Stability would need asynchronous mechanisms
(traps, etc.) to be defined.

The next step was to define specific objects from the above
categories. It was recognized that not all of this information might be
easy to obtain. Therefore, a "degree of difficulty" was assigned to
each desired statistic. The list of desired operational statistics is
below, where the "degree of difficulty" is noted as:

   1. ( E) Easy. Variables already in the standard MIB, thus easy to
      retrieve.

   2. (HP) Hard. Variables that need high-resolution polling, which is
      hard due to the resulting network load.

   3. (HM) Hard. Variables sometimes in a private enterprise MIB, thus
      possibly hard to retrieve.

   4. ( I) Impossible. Variables not in the MIB at all, thus impossible
      to retrieve using SNMP.

Some variables could be proposed for future inclusion in the MIB, but
some variables cannot be retrieved by SNMP due to limitations in the
SNMP specification.
For each interface:

   Packets in                              (E)
      (for each protocol)                  (I)
   Packets out                             (E)
      (for each protocol)                  (I)
   Octets in                               (E)
   Octets out                              (E)
   Aggregate errors in                     (HM)
   Aggregate errors out                    (HM)
   Congestion events in                    (HM)
   Congestion events out                   (HM)
   Seconds of missing statistics           (HP)
   Interface resets                        (HM)
   % interface unavailable                 (HP)
   Routing changes                         (HM)
   Interface route hop count               (HP)
   A distribution of queue length          (HP)
   Inter-packet arrival time               (I)
   Packet size distribution                (I)
   Line status                             (E)

For the node:

   Packets forwarded (for each protocol)
      IP                                   (E)
      DECnet                               (HM)
      OSI                                  (I)
   Packet size distribution                (HP)
   IP packets dropped for queue overflow   (I)
   sysUpTime                               (E)

Therefore, the following metrics were chosen as desirable and
reasonable:

For each interface:

   o Octets in
   o Octets out
   o Unicast packets in
   o Unicast packets out
   o Nonunicast packets in
   o Nonunicast packets out
   o In discards
   o Out discards
   o Line status
   o Number of routes in table(s) (if we can get it into the MIB)
   o Number of route changes (if we can get it into the MIB)

For the node:

   o IP forwards
   o IP discards
   o sysUpTime

Polling frequency

After much discussion, it was decided that all participating NOCs
should poll at fifteen-minute intervals, or at some shorter interval of
which fifteen minutes is an integer multiple. Some wanted a five-minute
interval, but that requires too much disk space and CPU time unless it
can be shown to be clearly superior.

An alternative suggestion was to poll frequently, say every five
minutes, but store only the high, low, and average values once per
hour. This may also be researched.

Common Data Storage Format (CDSF)

It was proposed that the data be stored as a flat file with the
following format:

   o Header Record: This will be a table of tag identifiers. A tag will
     be defined which uniquely identifies each data value as to its
     source node and data type.
   o Data Records:

        Timestamp [TAB] Delta Time [TAB] Tag [TAB] Object Value

     Where:

      - Timestamp    - yyyymmddhhmmssxxx and xxxx is the offset from GMT
      - Delta Time   - time, in seconds, since last poll
      - Tag          - unique identifier defined above (ASCII string)
      - Object Value - change in SNMP counter or current status

Data Presentation

We will take this issue up in more detail at the next meeting.

It was suggested that we study the network status reports in the next
Topology Engineering Working Group (TEWG) meeting to get ideas about
display format. Phill Gross will ask the presenters at the next TEWG to
give thought to how they would like to see operational data presented.

Data Collection Tools

We will take this issue up in more detail at the next meeting. We need
to consider the following in more detail:

   o SNMP-based tools
   o NNstat
   o Ad hoc scripts and methods (many folks have ad hoc methods in use)
   o Performance and benchmarking tools and methods

Other notes:

Related work is being done by the following IETF WGs: Remote LAN, BMWG,
NJM, TEWG. The following European groups are also doing work in this
area: RIPE and NETF. MERIT has been working quite a bit on this for the
last four months.

A good reference for data display formats is ``The Visual Display of
Quantitative Information'' by Edward R. Tufte, published by Graphics
Press, Box 430, Cheshire, CT 06410, c1983.

Attendees

   Guy Almes
   Anne Ambler
   Karl Auerbach
   Scott Bradner
   Randy Butler
   Tom Easterday
   Fred Engel
   Mike Erlinger
   Vince Fuller                 vaf@Standford.EDU
   Phillip Gross
   Jack Hahn
   Susan Hares
   Eugene Hastings
   Steven Hunter
   Dale Johnson
   Ken Jones                    uunet!konkord!ksj
   Dan Jordt
   Christopher Kolb
   Walter Lazear
   Daniel Long
   E. Paul Love
   Marilyn Martin
   Matt Mathis
   Milo Medin
   Cyndi Mills
   Lynn Monsanto
   Donald Morris
   Chris Myers
   David O'Leary
   Mark Oros
   Robert Reschly
   Bill Rust
   Timothy Salo
   Jonathan Saperia
   Ken Schroder
   Michael Schwartz
   Marc Sheldon
   Bernhard Stockman
   Roxanne Streeter
   Joanie Thompson
   Kannan Varadhan
   Steven Waldbusser
   Carol Ward
   Dan Wintringham
   C. Philip Wood
   Jessica (Jie Yun) Yu
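
Illustrative sketch (not part of the meeting's decisions)

As an aid to reading the Common Data Storage Format proposal above, the
following is a minimal sketch of how one TAB-separated data record
might be parsed and written, and how the suggested high/low/average
summary might be computed. It is an assumption-laden illustration, not
an agreed implementation: the names DataRecord, parse_data_record,
format_data_record and summarize are hypothetical, the timestamp is
kept as an opaque string because its exact layout is not settled in the
minutes, and the object value is assumed to be an integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    """One CDSF-style data record (hypothetical type, for illustration)."""
    timestamp: str      # kept opaque; the minutes do not fix its exact layout
    delta_seconds: int  # time, in seconds, since the last poll
    tag: str            # identifier defined in the header record
    value: int          # assumed integer: counter change or current status

    @property
    def rate_per_second(self) -> float:
        """Average rate over the poll interval (meaningful for counters)."""
        return self.value / self.delta_seconds


def parse_data_record(line: str) -> DataRecord:
    """Split one TAB-separated line into its four fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 TAB-separated fields, got {len(fields)}")
    timestamp, delta, tag, value = fields
    delta_seconds = int(delta)
    if delta_seconds <= 0:
        raise ValueError("delta time must be a positive number of seconds")
    return DataRecord(timestamp, delta_seconds, tag, int(value))


def format_data_record(record: DataRecord) -> str:
    """Write a record back out in the same TAB-separated layout."""
    return "\t".join(
        [record.timestamp, str(record.delta_seconds), record.tag, str(record.value)]
    )


def summarize(samples):
    """Return (high, low, average) for a non-empty sequence of samples."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples), min(samples), sum(samples) / len(samples)
```

For example, a made-up line such as
"19910101000000+00<TAB>900<TAB>nodeA.if1.octets-in<TAB>1350000" would
parse into a record whose rate_per_second is the counter change divided
by the 900-second poll interval. The tag and timestamp values here are
invented for the example only.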