Monday, March 28, 2016

How Big Banks Thread The Software Performance Needle

Timothy Prickett Morgan

While parallel programming on distributed systems is difficult, making applications scale across multiple machines – or hybrid compute elements that mix CPUs with FPGAs, GPUs, DSPs, or other motors – linked by a network is not the only problem that coders have to deal with. Inside each machine, the number of cores and threads has ballooned in the past decade, and each socket is as complex in its own right as a symmetric multiprocessing system was two decades ago.

With so many cores and usually multiple threads per core to execute software, getting the performance out of software can be a tricky business. At the world’s hyperscalers, financial services behemoths, HPC centers, and database and middleware providers, the smartest programmers in the world are often off in a corner, with pencil and paper, mapping out the dependencies in the hairball of code they and their peers have created to find out the affinities between threads within that application. Having sorted out these dependencies, they engage in the unnatural act of pinning software processes or threads to specific cores in a physical system to optimize their performance.
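As a rough illustration of what pinning means at the operating system level, here is a minimal sketch in Python, assuming a Linux machine where the standard library exposes `os.sched_setaffinity`. The helper name `pin_to_cores` is hypothetical, and this shows only the basic mechanism, not the tooling any of the banks mentioned here actually use.

```python
import os


def pin_to_cores(cores):
    """Restrict the caller to the given core IDs (hypothetical helper).

    Returns the resulting affinity set, or None on platforms that do not
    expose the Linux affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: no affinity API in the os module

    # Only ask for cores the scheduler already allows us to run on.
    allowed = os.sched_getaffinity(0)
    target = set(cores) & allowed
    if not target:
        raise ValueError("none of the requested cores are available")

    # pid 0 means "the caller".
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        first_core = min(os.sched_getaffinity(0))
        print("now pinned to:", pin_to_cores([first_core]))
```

The hard part described in this article is not the call itself but deciding which thread belongs on which core, which is where the weeks of manual analysis go.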

Pinning threads is a bit like doing air traffic control in your head, and Leonardo Martins had such an onerous task a few years back. Martins got his start in the IT sector two decades ago as an engineer at middleware software makers Talarian and TIBCO before moving to Lehman Brothers to introduce Monte Carlo simulation systems for risk management to the bank. In 2004, he moved to Barclays Capital to introduce its first Linux-based systems as its senior middleware program manager and architect, and in 2010, he was the low latency senior architect at HSBC. While at HSBC, Martins was one of the wizards that would map out the applications and figure out how to pin their threads to specific cores in a system to maximize performance – a process that might take anywhere from two to eight weeks.

This is not a big deal, right? Wrong. At the major financial institutions, the trading applications are updated at least monthly and sometimes as much as 200 times a year, so having the tuning process take weeks to months means code is never as optimized as it needs to be for a competitive edge. Martins looked around for a tool that would automate this thread pinning, and when he could not find one he found a few peers and set out to create one.

Martins founded Pontus Networks back in 2010 as a consultancy specializing in the tuning of latency sensitive applications, and was joined by Martin Raumann, an FPGA designer and specialist in low latency, high frequency trading hardware, and Deepak Aggarwal, another C, C++, C#, and Java programmer with deep expertise in distributed systems who built front office and back office systems for equities, foreign exchange, and fixed income asset trading at Barclays Capital, Credit Suisse, Citigroup, ABN, and Standard Chartered. They started work on the Pontus Vision Thread Manager and filed their first patents relating to the automated thread pinning in August 2014. The alpha version of Thread Manager debuted quietly at the end of November last year with its first customer, and the product is now available and has been acquired by three customers – all of whom are in the financial services sector. It is a fair guess that these companies are probably the ones where the founders of Pontus Networks used to work and do such painstaking thread pinning work, but that is just a guess.

Several other HPC-related users in government and university labs as well as a few Formula One racing teams are kicking the tires to see how Thread Manager might remove the human bottleneck and help get tuned software into production faster. (In this latter case, Thread Manager is expected to help boost the performance of the mechanical engineering design and simulation programs as well as some of the post-processing that is done on designs to test them.) The company is also getting ready to do some performance tests on Hadoop clusters, and thinks that performance boosts on HDFS storage will be similar to what it has seen on Extract-Transform-Load (ETL) applications that front end data warehouses. (Informatica is working with Pontus Networks on these tests.)

And as you have learned to expect from reading The Next Platform, none of these organizations looking for a bleeding edge advantage are willing to go on the record with their experiences just yet – and they may never do it because of that advantage. But we can tell you anecdotally what is going on and give you the results of some synthetic benchmarks to get you started.

Thread Manager is new enough that Pontus Networks is not precisely sure how different kinds of applications will make use of the automatic thread pinning capabilities, and Robin Harker, business development director at the company, tells The Next Platform that the company is just now getting some benchmarks under its belt to prove what Thread Manager can do.

The first and most important thing is that Thread Manager is a dynamic tool, working behind the scenes as software is running and changing, rather than a static, human-based optimization process that has to be invoked every time the code (or the hardware for that matter) changes. The dynamism is important in another way.

“If you look at an Oracle Exadata, where the company owns the whole box, they pin processes, not threads, which is a bit coarser grain control,” explains Harker. “So Oracle is probably pretty well optimized to run on a single box, and even across a RAC cluster for that matter. However, if you want to add a web application server to the same box, you are adding a different application that is going to have an effect on the Oracle system. But Thread Manager doesn’t care because all it sees is threads that talk to each other, and we don’t care if they come from Oracle or Tomcat or Linux or whatever.”
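To make the idea of "threads that talk to each other" concrete, here is a toy sketch of one way such grouping could work: a greedy pass that merges the most talkative thread pairs into groups small enough to share a socket. This is an assumption for illustration only and is not the method Pontus Networks has patented; the function name, the traffic counts, and the thread names are all hypothetical.

```python
def group_threads(traffic, group_size):
    """Greedily cluster threads that exchange the most messages.

    traffic: dict mapping (thread_a, thread_b) -> observed message count.
    group_size: maximum threads per group (e.g. cores sharing one socket).
    Returns a sorted list of sorted thread-name lists.
    """
    # Start with every thread alone in its own group.
    group_of = {}
    for a, b in traffic:
        group_of.setdefault(a, {a})
        group_of.setdefault(b, {b})

    # Visit the chattiest pairs first and merge their groups when they fit.
    for (a, b), _count in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga is not gb and len(ga) + len(gb) <= group_size:
            merged = ga | gb
            for thread in merged:
                group_of[thread] = merged

    # Collapse the per-thread map into the distinct groups.
    distinct = {id(g): g for g in group_of.values()}
    return sorted(sorted(g) for g in distinct.values())


if __name__ == "__main__":
    # Hypothetical traffic between threads of two co-located applications.
    observed = {
        ("oracle-1", "oracle-2"): 900,
        ("tomcat-1", "tomcat-2"): 700,
        ("oracle-2", "tomcat-1"): 5,
    }
    print(group_threads(observed, group_size=2))
    # [['oracle-1', 'oracle-2'], ['tomcat-1', 'tomcat-2']]
```

Note that the sketch never looks at which application a thread belongs to, only at who talks to whom, which is the property Harker is describing. A real tool would also have to measure the traffic at run time and account for cache and memory topology.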

So the thinking at Pontus Networks is that, as more and more cores and threads get stuffed into single machines because we cannot really increase clock speed anymore to goose performance, companies will want to run multiple applications on each machine (even if the machines are clustered) and will have an even more complex thread pinning nightmare to deal with. Hence, the automation.

Source
