3. Base IETF Service Assurance YANG Module
3.1. Concepts
The "ietf-service-assurance" YANG module assumes a set of subservices, to be assured independently. A subservice is a feature or a subpart of the network system that a given service instance depends on. Examples of subservices include:¶
- device: whether a device is healthy, and if not, what are the symptoms. Potential symptoms are "CPU overloaded", "Out of RAM", or "Out of TCAM".¶
- ip-connectivity: given two IP addresses bound to two devices, what is the quality of the IP connectivity between them. Potential symptoms are "No route available" or "ECMP Imbalance".¶
The first example is a subservice representing a subpart of the network system, while the second is a subservice representing a feature of the network. In both cases, these subservices might depend on other subservices; for instance, the connectivity might depend on a subservice representing the routing system and on a subservice representing ECMP.¶
The two subservices presented above need different sets of parameters to fully characterize one of their instances. An instance of the device subservice is fully characterized by a single parameter that identifies the device to monitor. For the ip-connectivity subservice, at least the device and IP address for both ends of the link are needed to fully characterize an instance. Therefore, the "ietf-service-assurance" module is intended to be augmented for each type of subservice, so that the needed parameters are modelled in the augmenting module.¶
The only "built-in" type available represents service instances. A service instance is represented as a subservice instance of type "service". The parameters required to fully identify a service instance are the type of the service and the name of the service instance.¶
The dependencies are modelled as an adjacency list, in the sense that each subservice contains a list of pointers to its dependencies. That list can be empty if the subservice instance does not have any dependencies.¶
By specifying service instances and their dependencies in terms of subservices, one defines a global assurance graph. That assurance graph is the result of merging all the individual assurance graphs for the assured service instances. Each subservice instance is expected to appear only once in the global assurance graph even if several service instances depend on it. For example, an instance of the device subservice is a dependency of every service instance that relies on the corresponding device. The assurance graph of a specific service instance is the subgraph obtained by traversing the global assurance graph through the dependencies starting from the specific service instance.¶
An assurance agent configured with such a graph is expected to produce, for each configured subservice, a health status indicating how healthy the subservice is and, when the subservice is not healthy, a list of symptoms explaining why the subservice is not healthy.¶
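As an informal illustration of the adjacency-list representation and of the traversal described above, the following Python sketch extracts the assurance graph of one service instance from a global assurance graph. It is not part of the YANG module; the dictionary layout, the function name "assurance_graph", and all subservice names are assumptions made for this example only.

```python
from collections import deque

# Global assurance graph as an adjacency list: each subservice, keyed by
# (type, id), maps to the list of subservices it depends on.
# All types and ids below are hypothetical.
GLOBAL_GRAPH = {
    ("service", "l2vpn/customer-a"): [("ip-connectivity", "pe1-pe2"),
                                      ("device", "pe1")],
    ("service", "l2vpn/customer-b"): [("device", "pe1"), ("device", "pe3")],
    ("ip-connectivity", "pe1-pe2"): [("device", "pe1"), ("device", "pe2")],
    ("device", "pe1"): [],  # shared by both services, appears only once
    ("device", "pe2"): [],
    ("device", "pe3"): [],
}


def assurance_graph(graph, root):
    """Return the subgraph reachable from root by following dependencies."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph[node]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return {node: deps for node, deps in graph.items() if node in seen}


sub = assurance_graph(GLOBAL_GRAPH, ("service", "l2vpn/customer-a"))
for node in sub:
    print(node)
```

In this sketch, the device "pe1" is stored once in the global graph although two service instances depend on it, and the device "pe3" is absent from the assurance graph of the first service instance.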
3.2. Tree View
The following tree diagram [RFC8340] provides an overview of the "ietf-service-assurance" module.¶
module: ietf-service-assurance
  +--ro assurance-graph-last-change    yang:date-and-time
  +--rw subservices
  |  +--rw subservice* [type id]
  |     +--rw type                      identityref
  |     +--rw id                        string
  |     +--ro last-change?              yang:date-and-time
  |     +--ro label?                    string
  |     +--rw maintenance-contact?      string
  |     +--rw (parameter)
  |     |  +--:(service-instance-parameter)
  |     |     +--rw service-instance-parameter
  |     |        +--rw service          string
  |     |        +--rw instance-name    string
  |     +--ro health-score?             union
  |     +--ro symptoms-history-start?   yang:date-and-time
  |     +--ro symptoms
  |     |  +--ro symptom* [start-date-time agent-id symptom-id]
  |     |     +--ro symptom-id
  |     |     |       -> /agents/symptoms-description/symptom-id
  |     |     +--ro agent-id               -> /agents/agent-id
  |     |     +--ro health-score-weight?   uint8
  |     |     +--ro start-date-time        yang:date-and-time
  |     |     +--ro stop-date-time?        yang:date-and-time
  |     +--rw dependencies
  |        +--rw dependency* [type id]
  |           +--rw type
  |           |       -> /subservices/subservice/type
  |           +--rw id                 leafref
  |           +--rw dependency-type?   identityref
  +--ro agents* [agent-id]
  |  +--ro agent-id                string
  |  +--ro symptoms-description* [symptom-id]
  |     +--ro symptom-id     string
  |     +--ro description    string
  +--ro assured-services* [service]
     +--ro service      leafref
     +--ro instances* [instance-name]
        +--ro instance-name    leafref
        +--ro subservices* [type id]
           +--ro type    -> /subservices/subservice/type
           +--ro id      leafref
The date of last change "assurance-graph-last-change" is read only. It must be updated each time the graph structure is changed by the addition or deletion of subservices or dependencies, or by the modification of their configurable attributes. Such modifications correspond to a structural change in the graph. The date of last change is useful for a client to quickly check if there is a need to update the graph structure. A change in the health-score or symptoms associated with a service or subservice does not change the structure of the graph and thus has no effect on the date of last change.¶
The "subservice" list contains all the subservice instances currently configured on the server. A subservice declaration MUST provide:¶
- A subservice type ("type"): reference to an identity that inherits from "subservice-base", which is the base identity for any subservice type.¶
- An id ("id"): string uniquely identifying the subservice among those with the same type.¶
The type and id uniquely identify a given subservice.¶
The "last-change" indicates when this particular subservice was modified for the last time.¶
The "label" is a human-readable description of the subservice.¶
The presence of the "maintenance-contact" field inhibits the emission of symptoms for that subservice and for the subservices that depend on it. See Section 3.6 of [I-D.ietf-opsawg-service-assurance-architecture] for a more detailed discussion.¶
The "parameter" choice is intended to be augmented in order to describe parameters that are specific to the current subservice type. This base module defines only the subservice type representing service instances. Service instances MUST be modeled as a particular type of subservice with two parameters, "service" and "instance-name". The "service" parameter is the name of the service defined in the network orchestrator, for instance "point-to-point-l2vpn". The "instance-name" parameter is the name assigned to the particular instance to be assured, for instance the name of the customer using that instance.¶
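For illustration only, the following Python sketch builds a configuration payload declaring one service instance as a subservice and prints it as JSON. It assumes the JSON encoding of YANG data defined in RFC 7951; the "id" value and the parameter values are hypothetical, and optional nodes are omitted.

```python
import json

# Hypothetical configuration payload; every value below is made up.
subservices_config = {
    "ietf-service-assurance:subservices": {
        "subservice": [
            {
                # Keys of the "subservice" list.
                "type": "ietf-service-assurance:service-instance-type",
                "id": "l2vpn-customer-a",
                # The only case of the "parameter" choice defined by the
                # base module; choice and case names do not appear in
                # the encoded data.
                "service-instance-parameter": {
                    "service": "point-to-point-l2vpn",
                    "instance-name": "customer-a",
                },
            }
        ]
    }
}

print(json.dumps(subservices_config, indent=2))
```

Read-only nodes such as "label", "health-score" and "symptoms" are not part of such a configuration payload, since they are reported by the server.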
The "health-score" contains a value normally between 0 and 100 indicating how healthy the subservice is. The special value -1 can be used to specify that no value could be computed for that health-score, for instance if some metric needed for that computation could not be collected.¶
The "symptoms-history-start" is the cutoff date for reporting symptoms. Symptoms that were terminated before that date are not reported anymore in the model.¶
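The effect of the cutoff date can be sketched as follows in Python. This is only an illustration of the rule stated above, not a mandated garbage collection procedure; the function name "prune_symptoms", the dictionary layout, and the sample symptoms are assumptions.

```python
from datetime import datetime, timezone


def prune_symptoms(symptoms, history_start):
    """Keep symptoms that are still active (no stop date) or that
    stopped at or after history_start."""
    return [
        s for s in symptoms
        if s["stop-date-time"] is None
        or s["stop-date-time"] >= history_start
    ]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical symptoms for one subservice.
symptoms = [
    {"symptom-id": "cpu-overloaded",
     "start-date-time": utc(2022, 7, 1), "stop-date-time": utc(2022, 7, 2)},
    {"symptom-id": "out-of-ram",
     "start-date-time": utc(2022, 7, 20), "stop-date-time": utc(2022, 8, 5)},
    {"symptom-id": "out-of-tcam",
     "start-date-time": utc(2022, 7, 25), "stop-date-time": None},
]

kept = prune_symptoms(symptoms, history_start=utc(2022, 8, 1))
print([s["symptom-id"] for s in kept])
```

Only the symptom that terminated before the cutoff is dropped; the symptom without a "stop-date-time" is still active and is therefore kept.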
The status of each subservice contains a list of symptoms. Each symptom is specified by¶
- an identifier "symptom-id" which identifies the symptom locally to an agent,¶
- an agent identifier "agent-id" which identifies the agent raising the symptom,¶
- a "health-score-weight" specifying the impact to the health score incurred by this symptom,¶
- a "start-date-time" indicating when the symptom became active and¶
- a "stop-date-time" indicating when the symptom stopped being active; that field is absent if the symptom is still active.¶
In order for the pair "agent-id" and "symptom-id" to uniquely identify a symptom, the following is necessary:¶
- The "agent-id" MUST be unique among all agents of the system.¶
- The "symptom-id" MUST be unique among all symptoms raised by the agent.¶
Note that "agent-id" and "symptom-id" are leafrefs pointing to the objects defined later in the document. While the combination of "symptom-id" and "agent-id" is sufficient as a unique key for the list, the additional "start-date-time" key helps with sorting and retrieving relevant symptoms.¶
The "dependency" list contains the dependencies for the current subservice. Each of them is specified by a leafref to both "type" and "id" of the target dependencies. A dependency has a type indicated in the "dependency-type" field. Two types are specified in the model:¶
- Impacting: such a dependency indicates an impact on the health of the dependent,¶
- Informational: such a dependency might explain why the dependent has issues but does not impact its health.¶
To illustrate the difference between "impacting" and "informational", consider the interface subservice, representing a network interface. If the device to which the network interface belongs goes down, the network interface will transition to a "down" state as well. Therefore, the dependency of the interface subservice towards the device subservice is "impacting". On the other hand, a dependency towards the ecmp-load subservice, which checks that the load distribution across ECMP paths remains stable over time, is only "informational". Indeed, services might be perfectly healthy even if the load distribution across ECMP paths changed. However, such an instability might be a relevant symptom for diagnosing the root cause of a problem.¶
The list "agents" at the top level contains the list of symptoms per agent. As stated above, the key of the list is the "agent-id", which MUST be unique among agents of a given assurance system. For each agent, the list "symptoms-description" maps a "symptom-id" to its "description". The "symptom-id" MUST be unique among the symptoms raised by the agent.¶
The list "assured-services" presents the subservices indexed by assured service instances. For each service type, identified by the "service" leaf, all instances of that service are listed in the "instances" list. For each instance, identified by the "instance-name" leaf, the "subservices" list contains all subservices that are part of the assurance graph for that specific instance. These nested lists provide a query optimization to get the list of subservices in that assurance graph in a single query, instead of recursively querying the dependencies of each subservice, starting from the node representing the service instance.¶
The relation between the health score ("health-score") and the health-score-weight of the currently active symptoms is not explicitly defined in this document. The only requirement is that a health score that is strictly smaller than 100 (the maximal value) must be explained by at least one symptom. One way to enforce that requirement is to first detect symptoms and then compute the health score based on the health-score-weight of the detected symptoms. As an example, such a computation could sum the health-score-weight of the active symptoms, subtract that value from 100, and set the result to 0 if it is negative. The exact relation between health-score and health-score-weight is left to the implementor of an agent [I-D.ietf-opsawg-service-assurance-architecture].¶
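The example computation mentioned above can be sketched in Python as follows. This is one possible computation, not a normative one; the function name "health_score" and the sample weights are assumptions.

```python
def health_score(active_weights):
    """Sum the health-score-weight of the active symptoms, subtract the
    sum from 100, and clamp the result to 0 if it is negative."""
    return max(0, 100 - sum(active_weights))


print(health_score([]))        # no active symptom
print(health_score([30, 20]))  # two active symptoms
print(health_score([80, 40]))  # weights exceeding 100 in total
```

With this computation, a health score strictly smaller than 100 can only result from at least one active symptom with a weight larger than 0, which satisfies the requirement stated above.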
Keeping the history of the graph structure is out of scope for this YANG module. Only the current version of the assurance graph can be fetched. In order to keep the history of the graph structure, some time-series database (TSDB) or similar storage must be used.¶
3.3. YANG Module
<CODE BEGINS> file "ietf-service-assurance@2022-04-07.yang"¶
module ietf-service-assurance {
  yang-version 1.1;
  namespace "urn:ietf:params:xml:ns:yang:ietf-service-assurance";
  prefix sain;

  import ietf-yang-types {
    prefix yang;
    reference
      "RFC 6991: Common YANG Data Types";
  }

  organization
    "IETF OPSAWG Working Group";
  contact
    "WG Web:   <https://datatracker.ietf.org/wg/opsawg/>
     WG List:  <mailto:opsawg@ietf.org>
     Author:   Benoit Claise  <mailto:benoit.claise@huawei.com>
     Author:   Jean Quilbeuf  <mailto:jean.quilbeu@huawei.com>";
  description
    "This module defines objects for assuring services based on their
     decomposition into so-called subservices, according to the SAIN
     (Service Assurance for Intent-based Networking) architecture.

     The subservices hierarchically organised by dependencies
     constitute an assurance graph. This module should be supported
     by an assurance agent, able to interact with the devices in
     order to produce a health status and symptoms for each
     subservice in the assurance graph.

     This module is intended for the following use cases:
     * Assurance graph configuration:
       - subservices: configure a set of subservices to assure, by
         specifying their types and parameters.
       - dependencies: configure the dependencies between the
         subservices, along with their type.
     * Assurance telemetry: export the health status of the
       subservices, along with the observed symptoms.

     Copyright (c) 2022 IETF Trust and the persons identified as
     authors of the code.  All rights reserved.

     Redistribution and use in source and binary forms, with or
     without modification, is permitted pursuant to, and subject
     to the license terms contained in, the Revised BSD License
     set forth in Section 4.c of the IETF Trust's Legal Provisions
     Relating to IETF Documents
     (https://trustee.ietf.org/license-info).

     This version of this YANG module is part of RFC XXXX; see the
     RFC itself for full legal notices.";

  revision 2022-08-10 {
    description
      "Initial version.";
    reference
      "RFC xxxx: YANG Modules for Service Assurance";
  }

  identity subservice-base {
    description
      "Base identity for subservice types.";
  }

  identity service-instance-type {
    base subservice-base;
    description
      "Identity representing a service instance.";
  }

  identity dependency-type {
    description
      "Base identity for representing dependency types.";
  }

  identity informational {
    base dependency-type;
    description
      "Indicates that symptoms of the dependency might be of interest
       for the dependent, but the status of the dependency should not
       have any impact on the dependent.";
  }

  identity impacting {
    base dependency-type;
    description
      "Indicates that the status of the dependency directly impacts
       the status of the dependent.";
  }

  grouping symptom {
    description
      "A grouping for the symptoms for a specific subservice.";
    leaf symptom-id {
      type leafref {
        path "/agents/symptoms-description/symptom-id";
      }
      description
        "Identifier of the symptom, to be interpreted according to
         the agent identified by the agent-id.";
    }
    leaf agent-id {
      type leafref {
        path "/agents/agent-id";
      }
      description
        "Identifier of the agent raising the current symptom.";
    }
    leaf health-score-weight {
      type uint8 {
        range "0 .. 100";
      }
      description
        "The weight to the health score incurred by this symptom. The
         higher the value, the more of an impact this symptom has. If
         a subservice health score is not 100, there must be at least
         one symptom with a health score weight larger than 0.";
    }
    leaf start-date-time {
      type yang:date-and-time;
      description
        "Date and time at which the symptom was detected.";
    }
    leaf stop-date-time {
      type yang:date-and-time;
      description
        "Date and time at which the symptom stopped being detected.
         Must be after the start-date-time.";
    }
  }

  grouping subservice-reference {
    description
      "Reference to a specific subservice, identified by its type
       and identifier.";
    leaf type {
      type leafref {
        path "/subservices/subservice/type";
      }
      description
        "The type of the subservice to refer to (e.g., device).";
    }
    leaf id {
      type leafref {
        path "/subservices/subservice[type=current()/../type]/id";
      }
      description
        "The identifier of the subservice to refer to.";
    }
  }

  grouping subservice-dependency {
    description
      "Represents a dependency to another subservice.";
    uses subservice-reference;
    leaf dependency-type {
      type identityref {
        base dependency-type;
      }
      description
        "Represents the type of dependency (e.g., informational,
         impacting).";
    }
  }

  leaf assurance-graph-last-change {
    type yang:date-and-time;
    config false;
    mandatory true;
    description
      "Date and time at which the assurance graph last changed after
       the changes (dependencies and/or maintenance windows
       parameters) are applied to the subservice(s). This date and
       time must be more recent than or equal to the most recent
       last-change value of any changed subservice.";
  }

  container subservices {
    description
      "Root container for the subservices.";
    list subservice {
      key "type id";
      description
        "List of configured subservices.";
      leaf type {
        type identityref {
          base subservice-base;
        }
        description
          "Type of the subservice, for instance, device or
           interface.";
      }
      leaf id {
        type string;
        description
          "Identifier of the subservice instance. Must be unique
           among subservices of the same type.";
      }
      leaf last-change {
        type yang:date-and-time;
        config false;
        description
          "Date and time at which the structure for this subservice
           instance last changed, i.e., dependencies and/or
           maintenance windows parameters.";
      }
      leaf label {
        type string;
        config false;
        description
          "Label of the subservice, i.e., text describing what the
           subservice is to be displayed on a human interface.

           It is not intended for random end users but for
           network/system/software engineers that are able to
           interpret it. Therefore, no mechanism for language
           tagging is needed.";
      }
      leaf maintenance-contact {
        type string;
        description
          "A string used to model an administratively assigned name
           of the resource that is performing maintenance.

           The presence of this field indicates that the current
           subservice is under maintenance.

           It is suggested that this name contain one or more of the
           following: IP address, management station name,
           network manager's name, location, or phone number.

           In some cases the agent itself will be the owner of an
           entry. In these cases, this string shall be set to a
           string starting with 'monitor'.";
      }
      choice parameter {
        mandatory true;
        description
          "Specify the required parameters per subservice type. Each
           module augmenting this module with a new subservice type,
           that is a new identity based on subservice-base should
           augment this choice as well, by adding a container
           available only if the current subservice type is the
           newly added identity.";
        container service-instance-parameter {
          when "derived-from-or-self(../type, "
             + "'sain:service-instance-type')";
          description
            "Specify the parameters of a service instance.";
          leaf service {
            type string;
            mandatory true;
            description
              "Name of the service.";
          }
          leaf instance-name {
            type string;
            mandatory true;
            description
              "Name of the instance for that service.";
          }
        }
        // Other modules can augment their own cases into here
      }
      leaf health-score {
        type union {
          type uint8 {
            range "0 .. 100";
          }
          type enumeration {
            enum missing {
              value -1;
              description
                "Explicitly represent the fact that the health score
                 is missing. This could be used when metrics crucial
                 to establish the health score are not collected
                 anymore.";
            }
          }
        }
        config false;
        description
          "Score value of the subservice health. A value of 100 means
           that subservice is healthy. A value of 0 means that the
           subservice is broken. A value between 0 and 100 means that
           the subservice is degraded.";
      }
      leaf symptoms-history-start {
        type yang:date-and-time;
        config false;
        description
          "Date and time at which the symptoms history starts for
           this subservice instance, either because the subservice
           instance started at that date and time or because the
           symptoms before that were removed due to a garbage
           collection process.";
      }
      container symptoms {
        config false;
        description
          "Symptoms for the subservice.";
        list symptom {
          key "start-date-time agent-id symptom-id";
          description
            "List of symptoms of the subservice. While the
             start-date-time key is not necessary per se, this would
             get the entries sorted by start-date-time for easy
             consumption.";
          uses symptom;
        }
      }
      container dependencies {
        description
          "Indicates the set of dependencies of the current
           subservice, along with their types.";
        list dependency {
          key "type id";
          description
            "List of dependencies of the subservice.";
          uses subservice-dependency;
        }
      }
    }
  }

  list agents {
    key "agent-id";
    config false;
    description
      "Contains symptoms of each agent involved in computing the
       health status of the current graph. This list acts as a
       glossary for understanding the symptom ids returned by each
       agent.";
    leaf agent-id {
      type string;
      description
        "Id of the agent for which we are defining the symptoms.
         This identifier must be unique among all agents.";
    }
    list symptoms-description {
      key "symptom-id";
      description
        "List of symptoms raised by the current agent, identified by
         their symptom-id.";
      leaf symptom-id {
        type string;
        description
          "Id of the symptom for the current agent. The agent must
           guarantee the uniqueness of this identifier.";
      }
      leaf description {
        type string;
        mandatory true;
        description
          "Description of the symptom, i.e., text describing what the
           symptom is, to be computer-consumable and be displayed on
           a human interface.

           It is not intended for random end users but for
           network/system/software engineers that are able to
           interpret it. Therefore, no mechanism for language
           tagging is needed.";
      }
    }
  }

  list assured-services {
    key "service";
    config false;
    description
      "Types of service that are currently part of the assurance
       graph. The list must contain an entry for every service type
       that is currently present in the assurance graph. This list
       presents an alternate access to the graph stored in
       /subservices that optimizes querying the assurance graph of
       a specific service instance.";
    leaf service {
      type leafref {
        path "/subservices/subservice/service-instance-parameter/"
           + "service";
      }
      description
        "Name of the service type.";
    }
    list instances {
      key "instance-name";
      description
        "Instances of the parent service type. The list must contain
         an entry for every instance of the parent service.";
      leaf instance-name {
        type leafref {
          path "/subservices/subservice/service-instance-parameter/"
             + "instance-name";
        }
        description
          "Name of the service instance. The leafref must point to a
           service-instance-parameter whose service leaf matches the
           parent service.";
      }
      list subservices {
        key "type id";
        description
          "Subservices that appear in the assurance graph of the
           current service instance. The list must contain the
           subservice corresponding to the service instance, that is
           the subservice that matches the service and instance-name
           keys. For every subservice in the list, all subservices
           listed as dependencies must also appear in the list.";
        uses subservice-reference;
      }
    }
  }
}
<CODE ENDS>¶
3.4. Rejecting Circular Dependencies
The statuses of services and subservices depend on the statuses of their dependencies, and thus circular dependencies between them prevent the computation of statuses. The SAIN architecture document [I-D.ietf-opsawg-service-assurance-architecture] discusses in Section 3.1.1 how such dependencies appear and how they could be removed. The responsibility of avoiding such dependencies falls to the SAIN orchestrator. However, we specify in this section the expected behavior when a server supporting the ietf-service-assurance module receives a data instance containing circular dependencies.¶
Enforcing the absence of circular dependencies as a YANG constraint amounts to implementing a graph traversal algorithm with XPath and checking that the current node is not reachable from its dependencies. Even with such a constraint, there is no guarantee that merging two graphs without dependency loops will result in a graph without dependency loops. Indeed, Section 3.1.1 of [I-D.ietf-opsawg-service-assurance-architecture] presents an example where merging two graphs without dependency loops results in a graph with a dependency loop.¶
Therefore, a server implementing the ietf-service-assurance module MUST check that there is no dependency loop whenever the graph is modified. A modification creating a dependency loop MUST be rejected.¶
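As a non-normative illustration, the check could be implemented with a depth-first traversal, as in the following Python sketch. The adjacency-list representation, the function names "has_cycle" and "add_dependency", and the node names are assumptions made for this example; a server is free to use any other loop detection technique.

```python
def has_cycle(graph):
    """Detect a dependency loop with a three-color depth-first search.

    graph maps each subservice to the list of its dependencies.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, in progress, done
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:  # back edge: dep is an ancestor of node
                return True
            if state == WHITE and visit(dep):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


def add_dependency(graph, dependent, dependency):
    """Return a new graph with the extra dependency, or raise
    ValueError if the modification would create a dependency loop."""
    candidate = {node: list(deps) for node, deps in graph.items()}
    candidate.setdefault(dependent, []).append(dependency)
    candidate.setdefault(dependency, [])
    if has_cycle(candidate):
        raise ValueError("modification rejected: dependency loop")
    return candidate


# Hypothetical graph: a depends on b, b depends on c.
graph = {"a": ["b"], "b": ["c"], "c": []}
graph = add_dependency(graph, "a", "c")      # accepted, no loop
try:
    add_dependency(graph, "c", "a")          # would close the loop
except ValueError as err:
    print(err)
```

The check runs on a copy of the graph, so a rejected modification leaves the stored graph unchanged. The recursive traversal is kept for brevity; an implementation handling very deep graphs would likely use an explicit stack instead.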