5 Things to Consider When Deploying Linux in an HPEC System

6 November 2013

1. What do you really need from your operating system?

Specifically, what are your timing needs? Are you interested only in throughput, or must your system respond to events with a 100% guarantee of completing within a critical window? Real-time requirements are usually divided into two types: “hard” and “soft.” In a hard real-time system, missing even one deadline means the application has failed. The consequences can be as dramatic as the failure of a flight control system, with a resulting threat to life and limb. In a soft real-time system, a missed deadline degrades the usefulness of the result, but the application continues to run. Take, for example, a radar system in which the results of one scan are compromised but subsequent scans are not. In the right circumstances, this may be tolerable.
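
To make the hard/soft distinction concrete, here is a minimal sketch that checks a run of job latencies against a single deadline. The names `evaluate_deadlines` and `DeadlineReport` and the latency figures are hypothetical. The same single late scan fails a hard real-time run but only degrades a soft one.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DeadlineReport:
    misses: int   # jobs that completed after the deadline
    total: int    # jobs examined
    failed: bool  # True if the application must be treated as failed


def evaluate_deadlines(latencies_us: Sequence[float], deadline_us: float,
                       hard: bool) -> DeadlineReport:
    """Check job completion latencies against a single deadline.

    Hard real time: a single miss means the application has failed.
    Soft real time: misses degrade those results, but the run continues.
    """
    misses = sum(1 for t in latencies_us if t > deadline_us)
    return DeadlineReport(misses=misses, total=len(latencies_us),
                          failed=hard and misses > 0)


# Hypothetical scan-processing latencies in microseconds; one scan ran late.
scans = [410.0, 395.5, 402.1, 1210.0, 399.8]
print(evaluate_deadlines(scans, deadline_us=500.0, hard=True))
# DeadlineReport(misses=1, total=5, failed=True)
print(evaluate_deadlines(scans, deadline_us=500.0, hard=False))
# DeadlineReport(misses=1, total=5, failed=False)
```

A real system would act on a miss as it happens rather than after the fact. The sketch only illustrates how the same timing data leads to different verdicts under the two definitions.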

2. Not all Linux builds are created equal.

A standard Linux kernel contains many unbounded latency paths that can occasionally make a task take much longer than usual to execute. The PREEMPT_RT patch set, available since the 2.6 kernel series, removes many of these paths and makes response times far more predictable and bounded. However, occasional situations can still cause response times to increase. If you need to eliminate this possibility, some commercial Linux builds are available. For instance, Red Hat supports its MRG distribution, which has all the real-time kernel patches applied, and RedHawk Linux has its own micro-kernel, which can be tuned for hard real-time performance.
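
A quick way to confirm which kind of kernel a target is actually running is sketched below. It relies on two common indicators, which are heuristics rather than a guaranteed interface. First, RT-patched kernels have typically exposed `/sys/kernel/realtime` containing `1`. Second, their `uname -v` string usually includes `PREEMPT_RT`, while older patch sets printed `PREEMPT RT`. The function names are hypothetical.

```python
import os
from pathlib import Path


def version_says_rt(version: str) -> bool:
    """Heuristic: does a `uname -v` string advertise a PREEMPT_RT build?"""
    # Recent RT kernels print "PREEMPT_RT"; older RT patch sets printed "PREEMPT RT".
    return "PREEMPT_RT" in version or "PREEMPT RT" in version


def running_rt_kernel() -> bool:
    """Best-effort detection of a PREEMPT_RT kernel on the current host."""
    flag = Path("/sys/kernel/realtime")  # assumed present only on RT kernels
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        # File absent or unreadable: fall back to the kernel version string.
        return version_says_rt(os.uname().version)


if __name__ == "__main__":
    print("uname -v:", os.uname().version)
    print("PREEMPT_RT kernel detected:", running_rt_kernel())
```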

3. You get what you pay for.

There are many, many options when it comes to choosing a Linux distribution. I’ve counted more than 200 without even trying. The majority are free, work well and have community support. However, if you are developing a critical application, you should consider one of the commercial distributions that come with dedicated support. You should also take note of any implications for your intellectual property when you use GPL-licensed code.

4. Linux is not your only option, but …

You can certainly choose a traditional hard real-time operating system like VxWorks. The drawback is that not all peripheral devices may be supported, and some software stacks might not be available. For instance, GE supports the OpenFabrics Enterprise Distribution (OFED) on many of our boards to enable RDMA protocols over Ethernet or InfiniBand. However, OFED is currently available only for Linux and Windows. We can support VxWorks on our boards, and many of our customers use it successfully. In HPEC applications, though, this means giving up RDMA in favor of a traditional TCP/IP stack, which reduces throughput and increases processor load.

You can also consider using a hypervisor to virtualize the platform and host one or more guest operating systems, but the same issues of driver and stack support may come into play.

5. Take care how you run.

Task priority can have a large effect on performance. Running at high priority can reduce jitter by ensuring that the time-critical task preempts less important work. It also helps to lock the task to a specific core and tell the operating system not to use that core for system purposes. Consider disabling Hyper-Threading: hardware threads that share a core contend for its resources, which can increase jitter. Finally, review your system's power-management policy. Allowing cores to enter idle states or vary their clock frequency can affect performance, because waking a core or changing its clock takes time.
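
Some of these settings can be applied from within the process itself. This sketch uses Python's `os` module, which wraps the Linux `sched_setaffinity` and `sched_setscheduler` system calls. The helper names, the choice of core and the priority value 80 are illustrative assumptions. Requesting `SCHED_FIFO` normally requires root or `CAP_SYS_NICE`, so the helper reports a refusal instead of raising. Keeping the OS off the chosen core (for example, with the `isolcpus=` kernel boot parameter) and disabling Hyper-Threading are done outside the process, typically at boot or in firmware.

```python
import os
from pathlib import Path
from typing import Optional, Set


def pin_to_core(core: int) -> Set[int]:
    """Restrict the calling process to one CPU core; return the resulting mask."""
    os.sched_setaffinity(0, {core})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def try_fifo_priority(priority: int) -> bool:
    """Switch to SCHED_FIFO at `priority` (clamped to the valid range).

    Returns False instead of raising when the request is refused, which is
    the usual outcome without root or CAP_SYS_NICE.
    """
    lo = os.sched_get_priority_min(os.SCHED_FIFO)
    hi = os.sched_get_priority_max(os.SCHED_FIFO)
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO,
                              os.sched_param(max(lo, min(hi, priority))))
        return True
    except OSError:
        return False


def cpufreq_governor(core: int) -> Optional[str]:
    """Return the core's cpufreq governor (e.g. "performance"), or None if not exposed."""
    path = Path(f"/sys/devices/system/cpu/cpu{core}/cpufreq/scaling_governor")
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    core = max(os.sched_getaffinity(0))  # illustrative choice: highest allowed core
    print("affinity:", pin_to_core(core))
    print("SCHED_FIFO granted:", try_fifo_priority(80))  # 80 is an arbitrary example
    print("cpufreq governor:", cpufreq_governor(core))
```

The governor check is read-only on purpose. Changing the power-management policy is usually a system-wide decision rather than one an application should make for itself.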

GE Intelligent Platforms recently investigated interrupt response times and message-passing latencies using Open MPI on three different Linux kernels: a “standard” server-grade kernel, a server-grade kernel with the real-time preemption patches applied, and a proprietary real-time kernel. The results are presented in a white paper.
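
Measurements of this kind usually focus on the tail of the latency distribution rather than the average. The sketch below is a rough user-space analogue of the `cyclictest` tool from the rt-tests suite. It sleeps until successive absolute deadlines, records how late each wake-up is, and summarizes the samples. It is an illustrative assumption, not a description of how the study above was run. Because Python's interpreter overhead is included in the figures, they are only useful for comparing kernels or settings on the same machine. The function names are hypothetical.

```python
import math
import time
from typing import Dict, List, Sequence


def measure_wakeup_latency(period_us: int, iterations: int) -> List[float]:
    """Sleep to successive absolute deadlines; return each wake-up's lateness in microseconds."""
    period_ns = period_us * 1_000
    lateness_us: List[float] = []
    deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        lateness_us.append((time.monotonic_ns() - deadline) / 1_000)
        deadline += period_ns  # absolute schedule: a late wake-up does not shift later deadlines
    return lateness_us


def percentile(sorted_samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile of already-sorted samples, 0 < p <= 100."""
    rank = math.ceil(p * len(sorted_samples) / 100)
    return sorted_samples[max(rank, 1) - 1]


def summarize_latencies(samples_us: Sequence[float]) -> Dict[str, float]:
    """Summarize samples; for real-time work the tail and maximum matter most."""
    s = sorted(samples_us)
    return {
        "min": s[0],
        "p50": percentile(s, 50),
        "p99": percentile(s, 99),
        "max": s[-1],
        "jitter": s[-1] - s[0],  # spread between best and worst case
    }


if __name__ == "__main__":
    # 200 wake-ups at a 1 ms period (illustrative values; roughly 0.2 s of runtime).
    print(summarize_latencies(measure_wakeup_latency(period_us=1000, iterations=200)))
```

Running the same script under a standard kernel and under a PREEMPT_RT kernel, with and without the priority and affinity settings from the previous section, is one simple way to see how much each change narrows the gap between the typical and worst-case wake-up.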