Sky is the limit: July 2008

The market for Commercial Off The Shelf (COTS) middleware for high availability and management of telecommunication applications is gaining momentum, along with modular compute platforms like ATCA and carrier grade operating systems like Carrier Grade Linux (CGL). Developers in the telecom industry, whose expertise lies mainly in developing protocol stacks and telecommunications applications, are often clueless about how to evaluate this commercial off the shelf middleware. "Managers" :-) believe that a "Carrier Grade HA middleware" is a magic box which, when integrated with their telecom application, makes it five nines available overnight.

But the fact is that COTS middleware is just a platform on which one has to integrate one's application. The onus of designing how the application needs to be deployed to achieve high availability still lies with the application developer. These COTS frameworks only provide mechanisms to incorporate those designs. Think of the COTS middleware as an "Operating System". An operating system like Windows or Linux is, as such, of no use to the user. It's the applications built on top of it that the user is using. Similarly, the COTS HA middleware is just a platform.
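
To put "five nines" into numbers: 99.999% availability leaves a downtime budget of only a few minutes per year. Here is a minimal sketch of that arithmetic, assuming a 365-day year (the function name is my own, purely for illustration):

```python
# Yearly downtime budget for a given availability target.
# Assumption: a 365-day year. "Five nines" means 99.999% availability.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```

Five nines works out to roughly 5.26 minutes a year, and that budget has to cover every switchover, upgrade and unplanned outage. No middleware delivers that without a deployment design behind it.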

As a developer who has worked on the development of a carrier grade, SA Forum compliant middleware and has also architected and designed some proprietary HA and management frameworks, I would like to offer my own list of questions for evaluating any commercial off the shelf middleware.

1. Does the COTS middleware provide a tool to model the application in terms of services, their redundancy policy, etc. (preferably in an SA Forum compliant way)? What redundancy models does it support? 1:1? N+K? N-Way active load balanced?

2. Has any legacy network element been integrated with the middleware? What redundancy model was followed for it? What was the cycle time to turn the legacy software into an HA product using your middleware?

3. What operating systems and hardware platforms are supported?

4. What is the measured switchover time from the active machine to the standby machine once a failure is detected by the HA middleware? How is the switchover indicated to the standby application? Does the middleware provide a configurable interface (preferably data driven / script based) for defining the actions to take on receiving a switchover indication?

5. What replication mechanism does the HA middleware follow? Message based redundancy? Checkpointing? Distributed shared memory? Explain its performance in terms of the number of messages replicated / checkpointed per second, with an average message size of 64 bytes.

6. Does the middleware provide a management framework? If so, what interface does the framework provide? What database is used for storing managed objects? Is it a proprietary database or a commercial one? What model does the database follow, i.e. is it relational or hierarchical object oriented? How are the managed objects addressed in the database? Also, does the management framework provide an IDE to model the management information, i.e. a management information modeling tool?

7. How does one integrate an existing application MIB with the management middleware? If the internal managed object database follows a hierarchical object oriented model, how do you map SNMP OIDs (which follow a relational database model) to the managed object instances in the hierarchical database?

8. How are the events on hardware (boards, hot-swap events) mapped to state transitions on the related managed objects?

9. How can one define cross consistency check rules between MO attributes in the information model for management? Does the middleware provide any tool to define it? Also does the management framework handle the backend consistency check logic once the rule is defined through the tool without the need for writing any code?

10. Does the HA middleware provide location transparent communication between the service units defined? If service unit A needs to send a message to service unit B, will it be possible to address service unit B without knowing its physical whereabouts? What transport protocol is used? Is the location transparent mechanism a sort of "naming service" over TCP / UDP transport, or is location transparency supported at the transport protocol itself (TIPC)? If it is supported through a "naming service", what messaging latency does the naming service layer add on top of the transport?

11. Is it possible to model fault recovery policies in the HA middleware? For example if one wants to associate the continuous receipt of critical alarms from say an SS7 card to a service failover, is it possible to do it?

12. How are faults (alarms) from various service units (a service unit could be software, or hardware proxied by software) modeled, and how are they associated with the MO classes in the model? At run time, how is an alarm instance associated with an MO instance?

13. Does the management framework support the delivery of attribute value change notifications (AVCN), state change notifications (SCN), object create notifications (OCN) and object delete notifications (ODN) on the managed objects to the northbound management stations?

14. How is the mapping from alarms to MO state transitions modeled?

15. Is it possible to define northbound interface level access permissions for each attribute while modeling? For example, I would like to model certain attributes as RO to the SNMP interface but RW to the CLI interface. Is this possible?

16. Is the management service realized as a separate process? If so, how are queries to "run time" object attributes that are owned by the call processing services handled? For example, if one defines the current active calls table as an MO class in the model (which is read only to the operator), the MO class can have thousands of dynamically changing instances at run time. If the operator at some point does a "getbulk" on this table, does the management service issue a query to the call processing service (involving an IPC)? How does the call processing application perform while this happens?

17. What are the types of searches supported on the MO tree? Depth first? Breadth first?

18. Is it possible to configure delivery of critical alarms to email addresses and SMS numbers in the management framework? This is not a critical requirement for a management framework used in the network element. Many NMSs (Network Management Systems) that collect management data from the cluster of network elements they manage provide this facility. But having this feature in the network element's own management framework is on my wish list.

19. Does the alarm management service persist the current active alarm list? Is the alarm reporting compliant with the ITU-T X.733 specification?

20. Has the managed object database replication in a 1:1 configuration been measured for performance? How many bytes can be replicated per second? Is the replication done synchronously or asynchronously?

21. Does the management database provide transaction semantics? That is, if a configuration change needs to be applied to more than one node, it should happen in an "all or none" fashion.

22. Does the middleware provide mechanisms to define software upgrade and firmware upgrade policies and a smooth integration path with the management model? What is the HA state transition followed while upgrading? How does it ensure minimum downtime upgrade?
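
To make question 11 concrete, this is the kind of fault recovery policy I have in mind: escalate to a service failover when critical alarms keep arriving from one source. The sketch below is only an illustration under my own assumptions. The class name, the threshold and the sliding time window are hypothetical and are not taken from any particular middleware.

```python
from collections import deque


class AlarmEscalationPolicy:
    """Hypothetical policy: escalate to failover when `threshold` critical
    alarms arrive from one source within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = {}  # source name -> deque of alarm arrival times

    def on_critical_alarm(self, source: str, now: float) -> bool:
        """Record one critical alarm; return True if failover should be triggered."""
        times = self._timestamps.setdefault(source, deque())
        times.append(now)
        # Drop alarms that have fallen out of the sliding window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) >= self.threshold


if __name__ == "__main__":
    # Illustrative values: 3 critical alarms within 10 seconds.
    policy = AlarmEscalationPolicy(threshold=3, window_seconds=10.0)
    for t in (0.0, 4.0, 8.0):
        if policy.on_critical_alarm("ss7-card-1", now=t):
            print(f"t={t}: request service failover")
```

A middleware that answers question 11 well would let me express the threshold, the window and the resulting action as model data instead of code like this.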

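The "all or none" behaviour asked for in question 21 can be sketched in a few lines. This is a toy, in-memory illustration with hypothetical names (`Node`, `apply_all_or_none`). It uses a simple snapshot-and-rollback approach and is not a real distributed commit protocol, which would also have to cope with node crashes and lost messages.

```python
class Node:
    """Hypothetical in-memory stand-in for one node's configuration store."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.config = {}

    def apply(self, changes: dict) -> None:
        if not self.healthy:
            raise RuntimeError(f"{self.name} rejected the change")
        self.config.update(changes)


def apply_all_or_none(nodes: list, changes: dict) -> bool:
    """Apply `changes` to every node, or restore every node's old config."""
    snapshots = [(node, dict(node.config)) for node in nodes]
    try:
        for node in nodes:
            node.apply(changes)
    except RuntimeError:
        # Undo: put back the snapshot taken before the transaction started.
        for node, old_config in snapshots:
            node.config = old_config
        return False
    return True
```

If one node rejects the change, every node ends up with the configuration it had before, which is the behaviour I would expect the management database to provide without my having to write it.
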
Uuuuh, quite a mouthful of evaluation criteria. If any COTS middleware supports at least 60% of this, let me know. I am interested in evaluating it. I am most concerned about ease of integration, manageability and the performance of the message replication mechanism.

Tuesday, July 29, 2008

Evaluating COTS Telecommunication Middleware
