Sunday, August 9, 2009

Element Management Systems Architecture

Every telecommunications equipment vendor has to supply some level of manageability for its equipment. The management software supplied with a network element can be proprietary software that works with the vendor's own element management system, or it can be a standards-based management agent that interoperates both with the vendor's element management system and with third-party network management systems. In this article I cover various aspects to consider when architecting a management solution for network elements, especially those based on the ATCA platform.

What is element management?

Managing a network element involves the following:

  1. Configuration of the network element. This can be initial provisioning (which remains static as long as the network element is operational) or run-time configuration (which is dynamically added, modified or deleted at run time).

  2. Collection of fault notifications from the network element.

  3. Collection of statistics and call counters from the network element.
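
The three management functions above can be captured as a small interface that a GUI back end or an SNMP agent could sit on top of. The following Python sketch is illustrative only: the class names (`ManagedElement`, `InMemoryElement`, `Alarm`) and the rule that provisioning keys are read-only at run time are assumptions, not part of any standard.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alarm:
    """A fault notification raised by the network element."""
    source: str
    severity: str  # e.g. "critical", "major", "minor"
    text: str


class ManagedElement(ABC):
    """The three management functions: configuration, faults, statistics."""

    @abstractmethod
    def configure(self, key: str, value: str) -> None:
        """Add or modify a run-time configuration item."""

    @abstractmethod
    def unconfigure(self, key: str) -> None:
        """Delete a run-time configuration item."""

    @abstractmethod
    def drain_alarms(self) -> List[Alarm]:
        """Return and clear the fault notifications collected so far."""

    @abstractmethod
    def read_counters(self) -> Dict[str, int]:
        """Return a snapshot of statistics and call counters."""


@dataclass
class InMemoryElement(ManagedElement):
    """Hypothetical element that keeps everything in memory."""
    provisioning: Dict[str, str]  # initial provisioning, static while running
    runtime: Dict[str, str] = field(default_factory=dict)
    alarms: List[Alarm] = field(default_factory=list)
    counters: Dict[str, int] = field(default_factory=dict)

    def configure(self, key, value):
        # Assumption: initial provisioning cannot be changed at run time.
        if key in self.provisioning:
            raise PermissionError(f"{key} is initial provisioning, not run-time config")
        self.runtime[key] = value

    def unconfigure(self, key):
        self.runtime.pop(key, None)

    def drain_alarms(self):
        pending, self.alarms = self.alarms, []
        return pending

    def read_counters(self):
        return dict(self.counters)
```

Keeping static provisioning and run-time configuration in separate stores makes the distinction in item 1 explicit, so a management front end can refuse run-time changes to provisioned values.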

What is an element management system?

An element management system provides a mechanism, in the form of a GUI or command-line utilities, to manage the network element. The GUI or command-line utilities have fields or commands to configure the device and to view alarms and statistics. Typical element management architectures are:

  • The GUI is web based. The network element runs an HTTP server with CGI scripts, which call back-end APIs in the network element to configure the device and collect alarms. The advantage is that the GUI can be made highly interactive if AJAX is employed, and since most browsers now support JavaScript, portability is not an issue. The disadvantage is that no standard management protocol is used. If a customer wants to integrate a third-party network management system (explained later) with the network element, this is not possible unless the third-party NMS knows how to invoke the CGI scripts on the network element. Figure 1 provides a conceptual view of this architecture.

Figure 1

  • The GUI is a standalone Java application that uses SNMP PDUs to communicate with the network element. The network element runs an SNMP agent, which interprets the PDUs and provides configuration and fault management facilities. The advantage of this solution is that it is standards based: any third-party network management system can be integrated with the network element, and the SNMP MIB serves as the contract between the NMS and the network element. The disadvantage is that Java-based GUIs require a separate installation and are generally less interactive than an AJAX-based web interface. Figure 2 provides a conceptual view of this architecture.
Figure 2

  • The GUI is web based, and the network element runs CGI scripts that communicate with the local SNMP agent using SNMP. This approach combines the best of the two architectures above. Figure 3 provides a conceptual view of this architecture.
Figure 3
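
To make the third architecture concrete, here is a minimal Python sketch of the web tier, written as a WSGI application rather than a CGI script. It does not call back-end APIs directly; it translates form fields into a single SET for the local agent, so the GUI goes through the same MIB contract a third-party NMS would. `LocalAgentClient` is a hypothetical stand-in for a real SNMP manager session, and the field-to-OID table is made up; it sits under enterprise number 32473, which RFC 5612 reserves for documentation.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from GUI form fields to MIB object instances.
FIELD_TO_OID = {
    "peer_timeout_ms": "1.3.6.1.4.1.32473.1.1.0",
    "max_sessions": "1.3.6.1.4.1.32473.1.2.0",
}


class LocalAgentClient:
    """Stand-in for an SNMP manager session to the agent on localhost.

    A real deployment would send an SNMP SET PDU to the local agent; this
    stub only records the varbinds so the sketch runs without an SNMP stack.
    """

    def __init__(self):
        self.mib = {}

    def set(self, varbinds):
        self.mib.update(varbinds)
        return "noError"  # pretend the agent accepted every varbind


def make_app(agent):
    def app(environ, start_response):
        form = parse_qs(environ.get("QUERY_STRING", ""))
        unknown = [f for f in form if f not in FIELD_TO_OID]
        if unknown:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [f"unknown field(s): {', '.join(unknown)}".encode()]
        # Translate the whole form into one SET request so it is applied together.
        varbinds = {FIELD_TO_OID[f]: values[0] for f, values in form.items()}
        status = agent.set(varbinds)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [status.encode()]

    return app
```

In a real element, `LocalAgentClient.set` would be replaced by an SNMP SET sent to the agent on localhost, and the agent's error-status would be mapped onto the HTTP response.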

What is a Network Management System (NMS)?

Network management systems are standards-based management systems that can control and monitor a variety of network elements within a network. Because a network can comprise multiple devices from multiple vendors, the network management system must speak the same language to all the devices it manages, which makes a standard management protocol mandatory. SNMP is used as that standard protocol to configure and monitor the various network elements.

Steps to make a network element provide a standards-compliant management interface:

The following are the steps involved in making a device standards compliant for integration with any NMS.

  1. Define an information model for managing the device. The information model should not depend on how the device is going to be managed from an EMS or NMS; that is, the model should capture only what should be managed, not how it should be managed. Adding attributes or fields to the model purely for the convenience of the NMS or EMS breaks this goal. If the model takes this path, it becomes extremely difficult to integrate the network element's instrumentation of the model (the SNMP agent part) with the NMS (the SNMP manager part).

  2. While defining the tables in the information model, normalize them using database normalization techniques. This calls for expertise in database design.

  3. Capture the information model as an SNMP MIB.

  4. Implement the instrumentation for the MIB as an SNMP agent on the network element. Instrumentation means, for example, that when a SET request arrives for a particular attribute, the agent propagates the new value to the module in the network element that handles that attribute.
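
Step 4 can be sketched as a registry on the agent side: each module in the network element registers a validator and an apply function for the MIB subtree it owns, and the agent routes every varbind of a SET to its owner. The Python sketch below validates all varbinds before applying any, so a bad value leaves every module untouched. The class and method names, and the OIDs (under the RFC 5612 documentation enterprise number 32473), are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Validator = Callable[[str], bool]
Applier = Callable[[str], None]


class SetError(Exception):
    """A varbind was rejected; `status` names the SNMPv2 error-status to return."""

    def __init__(self, oid: str, status: str):
        super().__init__(f"{oid}: {status}")
        self.oid, self.status = oid, status


class Instrumentation:
    """Routes each SET varbind to the module that owns that part of the MIB."""

    def __init__(self):
        self._owners: Dict[str, Tuple[Validator, Applier]] = {}

    def register(self, subtree: str, validate: Validator, apply: Applier) -> None:
        self._owners[subtree] = (validate, apply)

    def _owner(self, oid: str) -> Tuple[Validator, Applier]:
        # Longest matching subtree wins, so modules can own nested branches.
        matches = [s for s in self._owners if oid == s or oid.startswith(s + ".")]
        if not matches:
            raise SetError(oid, "noCreation")
        return self._owners[max(matches, key=len)]

    def handle_set(self, varbinds: List[Tuple[str, str]]) -> None:
        # Phase 1: validate every varbind before touching any module.
        plan = []
        for oid, value in varbinds:
            validate, apply = self._owner(oid)
            if not validate(value):
                raise SetError(oid, "wrongValue")
            plan.append((apply, value))
        # Phase 2: propagate each value to its owning module.
        for apply, value in plan:
            apply(value)
```

For example, a hypothetical RADIUS module could register the `1.3.6.1.4.1.32473.1.1` subtree and an SS7 module the `1.3.6.1.4.1.32473.1.2` subtree. Validating before applying covers only part of atomicity: if an apply step itself fails part-way through, a production agent would also need to undo the values already applied.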

How do SNMP-based network management systems function?

The function of an NMS can be split into the following:

  1. Setting the configuration on a network element (using SNMP SET). These operations should follow the ACID properties (Atomicity, Consistency, Isolation and Durability). To ensure atomicity, configuration set requests from the NMS are typically blocking calls.

  2. Querying the existing configuration from a network element. This happens when a new network element is added to the NMS, or when the NMS loses connectivity with the network element for some time and then regains it. The query can be for a single attribute or for the complete database. Though many NMSs implement both kinds of query as blocking calls, it makes sense to implement bulk queries as asynchronous operations. This requires the SNMP manager module in the NMS to buffer the requests it has sent so that it can match each response to its request.

  3. Listening for asynchronous events from the network element. These events include alarms and statistics.
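
The buffering described in item 2 amounts to a table of outstanding requests keyed by the SNMP request-id, which every SNMP PDU carries. Here is a minimal Python sketch, assuming the transport and PDU encoding are handled elsewhere; `PendingRequests`, the window size and the timeout are illustrative choices, not a standard API.

```python
import itertools
import time


class PendingRequests:
    """Manager-side window of outstanding requests, keyed by SNMP request-id."""

    def __init__(self, timeout_s=2.0, max_outstanding=16):
        self._ids = itertools.count(1)
        self._pending = {}  # request-id -> (oids, sent_at, callback)
        self.timeout_s = timeout_s
        self.max_outstanding = max_outstanding

    def send(self, oids, callback, now=None):
        if len(self._pending) >= self.max_outstanding:
            raise RuntimeError("window full; wait for responses")
        req_id = next(self._ids)
        sent_at = time.monotonic() if now is None else now
        self._pending[req_id] = (oids, sent_at, callback)
        return req_id  # the caller puts this in the PDU and does not block

    def on_response(self, req_id, varbinds):
        entry = self._pending.pop(req_id, None)
        if entry is None:
            return False  # late or duplicate response: already expired or handled
        entry[2](varbinds)
        return True

    def expire(self, now=None):
        # Drop requests that have waited longer than the timeout.
        now = time.monotonic() if now is None else now
        stale = [r for r, (_, sent, _) in self._pending.items() if now - sent > self.timeout_s]
        for r in stale:
            del self._pending[r]
        return stale
```

Because responses are matched by request-id rather than by arrival order, the manager can keep several bulk queries in flight at once; a response that arrives after its entry has expired is simply dropped.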

Issues with synchronous SNMP towards network elements with a multi-process architecture

Most NMSs use synchronous (blocking) SNMP calls towards the SNMP agent to query and apply configuration. This works well for network elements where the entire application, along with the management agent (SNMP agent), is compiled as a single image and loaded, as is the case with proprietary hardware-based solutions.

However, next-generation network elements (PDG, AAA servers, e-PDG, MME) running in a standards-based ATCA chassis environment are architected as multiple UNIX processes for high availability, with each process catering to a separate function. For example, in an AAA server one process could handle the RADIUS / DIAMETER interface towards the PDG or other NAS, while another process interfaces with the traditional HLR through an SS7 interface. This SS7 process could run on a separate blade that has an SS7 card. In such an architecture, synchronous SNMP adds a lot of delay. Consider the example given in the figure below:


Figure 4


As shown in the figure above, in multi-cluster HA architectures some of the configuration data needs to be set in every cluster. If the SNMP agent blocks waiting for a response from every cluster, it adds significant delay. Some of the clusters may be busy with data plane or control plane processing and may process the configuration message only after some delay. If the NMS that issued the configuration request is set to time out after, say, 2 seconds, it may time out while the request is still being processed at the network element. This leads to a momentary inconsistency between the configuration data held in the NMS and the configuration actually applied in the hardware.
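
The timing problem can be reproduced with a small asyncio simulation in Python. The cluster delays are hypothetical and scaled down (a 0.3 s NMS timeout instead of 2 s) so the sketch runs quickly; `asyncio.shield` models the fact that the NMS giving up does not stop the network element from finishing the configuration.

```python
import asyncio


async def configure_cluster(name, busy_delay_s, applied):
    # A cluster busy with data/control-plane work handles the
    # configuration message only after busy_delay_s seconds.
    await asyncio.sleep(busy_delay_s)
    applied.append(name)


async def agent_blocking_set(cluster_delays, applied, parallel=False):
    # The agent sends the SET response only after every cluster has applied it.
    coros = [configure_cluster(n, d, applied) for n, d in cluster_delays.items()]
    if parallel:
        await asyncio.gather(*coros)
    else:
        for coro in coros:
            await coro
    return "noError"


async def nms_set(cluster_delays, nms_timeout_s, parallel=False):
    applied = []
    task = asyncio.create_task(agent_blocking_set(cluster_delays, applied, parallel))
    try:
        # shield(): the NMS timing out does not cancel the work on the element.
        nms_view = await asyncio.wait_for(asyncio.shield(task), nms_timeout_s)
    except asyncio.TimeoutError:
        nms_view = "timeout"
    await task  # the element finishes the configuration regardless
    return nms_view, applied


if __name__ == "__main__":
    delays = {"cluster-a": 0.1, "cluster-b": 0.1, "cluster-c": 0.2}
    print(asyncio.run(nms_set(delays, nms_timeout_s=0.3)))
    print(asyncio.run(nms_set(delays, nms_timeout_s=0.3, parallel=True)))
```

With the three clusters configured one after another (0.1 + 0.1 + 0.2 s), the NMS reports a timeout at 0.3 s even though all three clusters end up configured, which is the inconsistency described above. Fanning out to the clusters in parallel helps only until a single busy cluster exceeds the timeout on its own.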

There are two ways to address this problem:

  1. Make the SNMP request and response asynchronous. The NMS never blocks after sending an SNMP SET request; it receives the SET response later. It therefore has to maintain a window of sent requests so that it can match each response to its request.

  2. Use synchronous SNMP from the NMS to the SNMP agent, but have the agent return immediately after verifying the validity of the message. The agent then processes the configuration request in a separate thread, and once it has configured the data in all the clusters it sends a success or failure notification back to the NMS as an unsolicited message (TRAP or INFORM REQUEST PDU). In this approach the NMS need not maintain any window of requests. The NMS should commit the configured value into its database only after it receives the success notification from the agent. The notification carries the OID of the attribute that was configured, the value set, and its status.
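
Option 2 can be sketched in Python with a worker thread on the agent and a notification callback towards the NMS. In this sketch the `notify` callable stands in for sending a TRAP or INFORM REQUEST PDU, the per-cluster apply functions are hypothetical, and the validity check is deliberately trivial.

```python
import queue
import threading


class AsyncAckAgent:
    """Accepts a SET at once, then applies it to every cluster in the background."""

    def __init__(self, clusters, notify):
        self.clusters = clusters  # cluster name -> apply(oid, value) -> bool (hypothetical)
        self.notify = notify      # stands in for sending a TRAP / INFORM to the NMS
        self.work = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def handle_set(self, oid, value):
        # Only a cheap validity check happens before the SET response is sent.
        if not isinstance(value, str) or not value:
            return "wrongValue"
        self.work.put((oid, value))
        return "noError"  # "accepted", not yet "applied"

    def _worker(self):
        while True:
            oid, value = self.work.get()
            # Apply to every cluster (a list, so one failure does not skip the rest).
            results = [apply(oid, value) for apply in self.clusters.values()]
            status = "success" if all(results) else "failure"
            self.notify({"oid": oid, "value": value, "status": status})
            self.work.task_done()


class NmsDatabase:
    """NMS side: commits a value only when the success notification arrives."""

    def __init__(self):
        self.committed = {}

    def on_notification(self, note):
        # The notification carries OID, value and status, so no request window is needed.
        if note["status"] == "success":
            self.committed[note["oid"]] = note["value"]
```

Here a `noError` SET response only means the request was accepted; the NMS database changes only when the notification reports success.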

Conclusion:

In summary, to make a network element provide a standard management interface and be interoperable with any NMS the following need to be done:

  1. The SNMP MIB should model only what should be managed, not how it should be managed. The model should be well normalized. Make sure that the MIB defines events for notifying the success or failure of configuration requests.

  2. Make the mode of operation of the SNMP agent (synchronous / asynchronous) configurable, or a compile-time option. If the agent operates in asynchronous mode, it can use the asynchronous events to signal the success or failure of an operation to the NMS.
