Pershing has been around since the 1930s and runs one of the biggest financial clearing businesses in the U.S., but the company is little known to many people because it does so much of its work behind the scenes. Sold by Credit Suisse Group to Bank of New York for US$2.1 billion earlier this year, Pershing relies heavily on its network to conduct much of its business, which involves supplying about 250 applications and services to more than 1,300 customers. Ramaswamy Nagappan, Pershing's managing director of e-services, spoke recently about the challenges of running such a huge online applications system.
Tell us about your network.
Our infrastructure is very widespread, from mainframes to Unix to NT (servers - of which there are about 500). We're distributed across all the major vendors (on the software side, including Oracle and Sybase). The internal network is all Ethernet, and we have a Gigabit backbone. From a customer point of view, we have ATM, frame relay, T-1 and point-to-point links, and we run Internet-based VPNs. We also have dark fiber and OC-3 (155M bit/sec) links. We have three data centers.
What do you do day to day?
I am responsible for developing applications and distributing them. These include brokerage trading, account information, market data, news and other research information and content. The modes of access can be a standard Web browser, telephone (voice recognition), wireless or a native Windows application. Knowing the end-user experience is key to delivering applications over the network. The applications need to be instrumented and integrated well into our enterprise monitoring and operations framework. That way we are alerted to problems before the customer calls to report them.
So what's your secret for keeping these applications up and running?
The primary thing we do is maintain dual data centers (while Pershing has three data centers, two are redundant), and both are hot sites. As for the applications, all functions run at both locations, which amounts to a standing disaster-recovery arrangement. Even our development staff is in multiple locations.
Our software architecture and hardware infrastructure play a key role here. The software architecture incorporates features that support easy distribution of request processing to redundant software and hardware components; the hardware and network infrastructure in turn supports this distribution and replication strategy across and between data centers. It is still critical that applications are designed according to their usage patterns and that key architectural support is there to meet our performance requirements, for example caching and load balancing.
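The dual-hot-site distribution described above can be sketched in a few lines. This is an illustrative toy, not Pershing's actual implementation: the replica names, the health flag, and the in-process cache are all assumptions standing in for real data-center failover and caching machinery.

```python
import random

class Replica:
    """One copy of a service, running in one data center (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        return f"{self.name} processed {request}"

class DualSiteRouter:
    """Route each request to a healthy site, consulting a cache first."""
    def __init__(self, site_a, site_b):
        self.replicas = [site_a, site_b]
        self.cache = {}  # request -> cached response

    def route(self, request):
        if request in self.cache:          # serve repeated requests from cache
            return self.cache[request]
        live = [r for r in self.replicas if r.healthy]
        if not live:
            raise RuntimeError("no healthy data center")
        response = random.choice(live).handle(request)
        self.cache[request] = response
        return response

router = DualSiteRouter(Replica("dc-east"), Replica("dc-west"))
print(router.route("get-account-balance"))
router.replicas[0].healthy = False         # simulate losing one site
print(router.route("get-positions"))       # still served by the other site
```

The point of the sketch is the shape of the design: because every function runs at both sites, failover is just a routing decision rather than a recovery procedure.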
Do you throw bandwidth at your applications or lean more toward quality-of-service techniques?
We maintain a safe, conservative margin of free capacity, which is monitored continuously, and we increase bandwidth to preserve that margin. We use network QoS technology at the physical switching level to prioritize online processing traffic over automated data movements such as FTP. We do not provide QoS at the application level at this time, although our architecture does give us the ability to apply routing constraints to software messages.
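The prioritization idea - online traffic drains ahead of bulk transfers such as FTP - can be modeled with a simple priority queue. The traffic classes and priority values below are invented for illustration; real switch-level QoS works on frames and queues in hardware, not Python objects.

```python
import heapq

PRIORITY = {"online": 0, "bulk": 1}   # lower number = drained first

class QosQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0   # tie-breaker keeps FIFO order within a class

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._heap, (PRIORITY[traffic_class], self._seq, packet))
        self._seq += 1

    def drain(self):
        while self._heap:
            yield heapq.heappop(self._heap)[2]

q = QosQueue()
q.enqueue("bulk", "ftp-chunk-1")
q.enqueue("online", "trade-order")
q.enqueue("bulk", "ftp-chunk-2")
print(list(q.drain()))   # → ['trade-order', 'ftp-chunk-1', 'ftp-chunk-2']
```

The interactive trade order jumps ahead of FTP chunks that were queued earlier, which is exactly the behavior switch-level QoS buys you.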
How closely does the application development staff work with the network group?
When we started years ago, the team was smaller and the culture brewed in such a way that the application development team always worked closely with the network people. In terms of our e-commerce delivery, if you look at external monitoring companies such as Keynote and Gomez and their measurements, we are at the top in performance. The only reason that happened is that the developers work very closely with the network group. We have gone to the extent of building routing agents that know which applications and which machines are running faster, and that feed that intelligence to the Cisco distributor device so it knows where to load balance. Even now we are working with the network team to figure out (QoS)-based routing. Both groups try to leverage existing infrastructure to get economies of scale. In some cases, the network infrastructure must be modified to allow for new functionality or connectivity; we now offer streaming content over our network, for example.
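The routing-agent idea above - measure which machines are responding faster and skew traffic toward them - reduces to turning latencies into load-balancing weights. This is a hedged sketch: the machine names and the inverse-latency weighting are assumptions, and pushing the resulting weights into a Cisco device would go through its own management interface, which is omitted here.

```python
def balance_weights(latencies_ms):
    """Map machine -> observed latency into machine -> normalized weight.

    Faster machines (lower latency) receive proportionally more traffic.
    """
    inverse = {m: 1.0 / t for m, t in latencies_ms.items() if t > 0}
    total = sum(inverse.values())
    return {m: w / total for m, w in inverse.items()}

weights = balance_weights({"app-1": 20.0, "app-2": 40.0, "app-3": 40.0})
# app-1 responds twice as fast, so it is assigned half the traffic
print(weights)
```

A real agent would feed a moving average of recent measurements rather than single samples, so one slow response doesn't starve a machine.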
What's your Web services strategy?
Web services form the essence of our customer integration strategy. We are migrating existing proprietary interfaces to open standards such as (Simple Object Access Protocol) and HTTP. This includes a diverse range of integration requirements, from document delivery to request-response interactions and alert notifications.
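For readers unfamiliar with the open standards mentioned, a SOAP request is just structured XML carried over HTTP. The sketch below builds a minimal SOAP envelope; the operation name and payload are hypothetical, not an actual Pershing interface.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_envelope(body_tag, payload):
    """Wrap a payload in a minimal SOAP 1.1 envelope (illustrative only)."""
    ET.register_namespace("soap", SOAP_NS)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    req = ET.SubElement(body, body_tag)   # hypothetical operation element
    req.text = payload
    return ET.tostring(env, encoding="unicode")

# Hypothetical request a customer system might POST over HTTP:
print(soap_envelope("GetAccountBalanceRequest", "ACCT-123"))
```

Because the envelope is plain XML over HTTP, any customer platform that speaks those standards can integrate without the proprietary client libraries the older interfaces required.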
What effect will Web services have on applications development?
Within our architecture, increased testing is the primary impact introduced by Web services. Our service-based architecture has implicit support for Web services, allowing services to be created that can be accessed via a number of protocols without substantial additional development effort. This will let us reuse many of these services in different product flavors. Also, our customers will be able to integrate them into their proprietary products.
How do you prioritize security concerns?
Security is very high among our customers' concerns. Before they use our service, they actually have their security teams come and audit us to make sure we are on par. We keep up with industry standards at all security levels, and we regularly bring in independent certifiers. We have a dedicated team covering everything from intrusion detection up through application security.
What's your take on utility computing?
The vendors are a little bit late when it comes to integrating applications with infrastructure and management. We end up writing all the application instrumentation ourselves and linking it to tools such as OpenView, Smarts and Netcool. Multiple data center implementations, for example, require a lot of customized work. If you have the same application and you decide to run it in two different data centers and load balance between them, there is nothing out of the box to do that. You need to understand the application, then the network and the database - that is a lot of customization just to make sure your application is redundant 24-7. The vendors are on the right track with the idea, and currently they are saying they will come with a team of people and do it for you. But that's an expensive and long process.
What sorts of tools do you use to stay on top of application performance?
We use a suite of tools: Micromuse's Netcool, HP OpenView and technology from (Systems Management Arts). We also have homegrown agents that collect and feed data these tools don't capture automatically. We have built a lot of automation into our tools. We use SMARTS' automation for correlation, and we have an internally developed automation application that brings in various events and puts things together. Keynote is also part of our automation because it gives us agents sitting on 50 different (points of presence) all over the world. Our own instrumentation only works within our data centers; Keynote goes outside of that. The POPs measure the responsiveness of our site from various locations and feed that data back to Netcool. If any particular peering point is giving us a poor response, our operations staff proactively calls those peering partners. It could be in California between MCI and Sprint, so we work with those vendors to figure out how we can reroute and get better performance.
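The last step of that pipeline - scanning per-POP response times and flagging the peering points worth a phone call - is easy to sketch. The POP names, sample values, and threshold below are invented for illustration; the real data would arrive from Keynote agents via Netcool.

```python
from statistics import mean

def slow_pops(samples_by_pop, threshold_sec=2.0):
    """Return POPs whose average response time exceeds the threshold."""
    return sorted(
        pop for pop, samples in samples_by_pop.items()
        if mean(samples) > threshold_sec
    )

readings = {
    "sjc-mci":    [2.8, 3.1, 2.9],   # hypothetical slow peering point
    "nyc-sprint": [0.6, 0.7, 0.5],
    "lon-bt":     [0.9, 1.1, 1.0],
}
print(slow_pops(readings))   # → ['sjc-mci']
```

Averaging over several samples before flagging keeps a single transient spike from paging the operations staff.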
What about your job or what IT issue keeps you up at night?
Everyone is asking us to be more efficient and to run the infrastructure at a lower cost. Technical problems are easy to solve; I don't think keeping a system redundant or making sure it works is the challenge. The biggest challenge is how to give the same level of service at a reduced cost - how to stay current and roll out new technology while somehow keeping costs low.