Existing Applications: Pitfalls to Avoid
When you have an existing application and you want to port it to a supercomputer in the hopes of getting your results more quickly, there are several potential pitfalls that you need to be aware of. The goal of this page is to list some of them and explain how to avoid them.
Assuming that the software runs under Linux
The first pitfall is assuming that your application will be able to run on a supercomputer at all. It often can, but you need to verify this for your own case first. The principal roadblock an application might face is the operating system. All of the Calcul Québec supercomputers use a variant of the Linux operating system. If your application only runs under Windows, it will be much harder to get it to work on our servers. Sometimes it's possible, particularly if the program can run without a graphical interface, but you should expect to overcome several obstacles before the software runs. It may be wiser to find another program that achieves much the same result but is better adapted to Calcul Québec's infrastructure. We recommend that you familiarize yourself with Linux before you begin using the Calcul Québec servers.
Assuming that the software uses the filesystem correctly
One of the main differences between a supercomputer and an ordinary desktop computer is its parallel filesystem. From a technical perspective, parallel filesystems offer a much higher bandwidth than conventional hard drives, but their latency is generally much higher as well. What this means for you as a user is that small I/O operations, like reading or writing just a few bytes, will perform much worse on a parallel filesystem, whereas very large I/O operations involving hundreds of megabytes will perform much better. What's more, because the filesystem is a shared resource, an application that hasn't been designed to use such a filesystem won't just find its own performance degraded but will also negatively affect the jobs of other users on that server.
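As a rough illustration of the difference between many small writes and a few large ones, here is a minimal Python sketch. The function names and the 16 MiB buffer size are assumptions chosen for the example, not recommendations specific to any Calcul Québec filesystem; both functions produce the same file, and only the number of write requests reaching the filesystem differs.

```python
def write_unbuffered(path, records):
    """Pattern to avoid: every small record becomes its own write request."""
    # buffering=0 turns off Python's internal buffer (binary mode only),
    # so each f.write() below goes straight to the filesystem.
    with open(path, "wb", buffering=0) as f:
        for record in records:
            f.write(record)


def write_batched(path, records, buffer_size=16 * 1024 * 1024):
    """Preferred pattern: collect records in memory, write in large chunks."""
    # buffer_size is an illustrative value; the file object only issues a
    # write request once the buffer fills up or the file is closed.
    with open(path, "wb", buffering=buffer_size) as f:
        for record in records:
            f.write(record)
```

Calling `write_batched("output.bin", records)` with a list of small `bytes` objects sends them to disk in a few large requests, which is the access pattern a parallel filesystem handles well.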
The same observation applies to the number and size of files. To get good performance from a parallel filesystem, it is much better to have a handful of very large files, even several gigabytes in size, than thousands (or millions) of files of just a few kilobytes.
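One way to reduce the file count, sketched below with Python's standard tarfile module, is to bundle a directory of small files into a single archive and read individual members from it when needed. The helper names and the directory layout are hypothetical; whether an archive suits your workflow depends on how the application reads its data.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir, archive_path):
    """Bundle everything under src_dir into one uncompressed tar archive.

    The archive should be written outside src_dir so it does not try to
    include itself.
    """
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w") as tar:
        # arcname keeps member paths relative, e.g. "results/part_1.txt".
        tar.add(str(src), arcname=src.name)
    return archive_path


def read_member(archive_path, member_name):
    """Return the bytes of one file stored inside the archive."""
    with tarfile.open(str(archive_path), "r") as tar:
        handle = tar.extractfile(member_name)
        if handle is None:
            raise ValueError(member_name + " is not a regular file")
        return handle.read()
```

For example, `pack_directory("results", "results.tar")` replaces the many files under a hypothetical `results` directory with one file on the shared filesystem, and `read_member("results.tar", "results/part_1.txt")` retrieves a single member without unpacking the rest.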
Assuming that the software will run faster on a supercomputer
In general terms, a typical application won't run faster simply because more processors are available. At best, it will run as quickly as it does with a single processor, and it may well run more slowly, wasting resources. In order to run more quickly and exploit the additional processors, an application needs to be designed to run in parallel, which will normally require modifying its source code to a greater or lesser degree. The best that can be done with a serial application is to run several instances of the same program simultaneously, each with different parameter values. You can accomplish this either by using an appropriate set of commands in your job submission script or by using a process manager like GNU Parallel.
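The idea of running several independent instances of a serial program at once can be sketched in Python as follows. This is only an illustration under stated assumptions: `make_command` builds a stand-in command (a Python one-liner that squares its argument) in place of your real program, and all function names are hypothetical. In practice `max_workers` would be set to the number of cores allocated to the job.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def make_command(value):
    """Stand-in for a real serial program: prints the square of its argument."""
    script = "import sys; print(float(sys.argv[1]) ** 2)"
    return [sys.executable, "-c", script, str(value)]


def run_instance(command):
    """Run one serial instance; return its exit code and captured output."""
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout.strip()


def run_sweep(commands, max_workers):
    """Run the given commands, at most max_workers at a time."""
    # The threads only wait on child processes; each child is a separate
    # operating-system process, so the instances run side by side.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input commands.
        return list(pool.map(run_instance, commands))
```

A parameter sweep would then look like `run_sweep([make_command(v) for v in (1, 2, 3)], max_workers=3)`, with each tuple in the returned list holding the exit code and output of one instance.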
Assuming that parallel software runs faster with more processors
In almost every case, running an application in parallel implies a certain amount of overhead associated with inter-process communication and synchronization, regardless of the parallelization technique employed (MPI, OpenMP, pthreads...). The typical pattern is that, for a relatively small number of processors, the application runs more quickly as processors are added, but beyond a certain point, which depends on both the application and the supercomputer, there are no further gains in performance. Moreover, it's common for applications to run more slowly as more processors are added beyond this point, because of the growing communication and synchronization costs, and eventually the application may stop running altogether. There's no way to predict the number of processors at which this point will be reached, as it depends on the algorithms used, how they're implemented in code, the input parameters and model being used, the server's networking hardware, processor speed, filesystem, memory bandwidth... The only way to determine the optimal number of processors for your code and model on a particular machine is to carry out a set of empirical tests: use a fixed input model, gradually increase the number of processors, and measure the resulting acceleration on the machine that you will be using for your research.
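Once the timings from such a test are in hand, the acceleration can be summarized with a short script. The sketch below assumes the timings are wall-clock seconds measured on the same input at each processor count; the function names and the 75% efficiency threshold are illustrative assumptions, not a Calcul Québec rule.

```python
def scaling_table(timings):
    """Compute (processors, speedup, efficiency) rows from measured timings.

    timings maps a processor count to the wall-clock time in seconds for
    the same input. The smallest processor count is used as the baseline.
    """
    base_procs = min(timings)
    base_time = timings[base_procs]
    rows = []
    for procs in sorted(timings):
        speedup = base_time / timings[procs]
        # Efficiency is the speedup divided by the relative increase in
        # processors: 1.0 is ideal scaling, lower values mean overhead.
        efficiency = speedup * base_procs / procs
        rows.append((procs, speedup, efficiency))
    return rows


def best_processor_count(timings, min_efficiency=0.75):
    """Largest processor count whose efficiency meets the threshold.

    min_efficiency is an assumed cut-off; it should not exceed 1.0.
    """
    acceptable = [procs for procs, _, eff in scaling_table(timings)
                  if eff >= min_efficiency]
    return max(acceptable)
```

The baseline row always has a speedup and efficiency of 1.0. Rows where the speedup stops growing, or falls, mark the processor counts beyond which adding resources no longer pays off for that code and input.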