3.1.2 Stack alignment on x86

On the Pentium and subsequent x86 processors, there is a substantial performance penalty if double-precision variables are not stored at 8-byte-aligned addresses; a slowdown by a factor of two or more is not unusual. Unfortunately, the Intel ABI does not guarantee that the stack (where local variables and subroutine arguments live) is 8-byte aligned.
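To see whether a given stack frame has this problem, you can inspect the low bits of a local variable's address. The following is a minimal sketch assuming only C99 `uintptr_t`; the helper names `misalignment` and `local_double_misalignment` are hypothetical and not part of FFTW.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes by which addr lies past the previous multiple of n. */
static size_t misalignment(uintptr_t addr, size_t n)
{
    return (size_t) (addr % n);
}

/* Reports where a local double landed relative to an 8-byte boundary.
   A nonzero result means the double is not 8-byte aligned, which is the
   slow case described above. volatile keeps x in memory rather than in
   a register, so its address is meaningful. */
static size_t local_double_misalignment(void)
{
    volatile double x = 0.0;
    return misalignment((uintptr_t) &x, 8);
}
```

On a stack that is only 4-byte aligned, `local_double_misalignment()` may return 4 depending on the call path; on a properly aligned stack it returns 0.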

Recent versions of gcc (as well as most other compilers, we are told, such as Intel's, Metrowerks', and Microsoft's) are able to keep the stack 8-byte aligned; gcc does this by default (see -mpreferred-stack-boundary in the gcc documentation). If you are not certain whether your compiler maintains stack alignment by default, check its documentation to make sure.
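One rough way to check empirically is to descend through many call frames and count how many of them place a local double off an 8-byte boundary. This is a hypothetical self-test, not an FFTW facility. With gcc, the argument of -mpreferred-stack-boundary is a power of two, so for example -mpreferred-stack-boundary=3 requests 2^3 = 8-byte alignment; compiling this sketch with different settings on 32-bit x86 is one way to observe the effect.

```c
#include <stdint.h>

/* Hypothetical self-check: recurses through depth + 1 frames and counts
   how many of them stored a local double at an address that is not a
   multiple of 8. A compiler that maintains stack alignment should report
   0, provided the stack was aligned when the check started. */
static int count_misaligned_frames(int depth)
{
    volatile double x = (double) depth;   /* forced into this frame's memory */
    int bad = ((uintptr_t) &x % 8) != 0;

    if (depth <= 0)
        return bad;
    return bad + count_misaligned_frames(depth - 1);
}
```

A result of 0 suggests, but does not prove, that the compiler keeps frames aligned; a nonzero result is a clear sign that it does not, or that the stack started off misaligned.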

Unfortunately, gcc only preserves the stack alignment; it does not establish it. As a result, if the stack starts off misaligned, it will stay misaligned, with a disastrous effect on double-precision performance. To prevent this, FFTW includes hacks to align its own stack if necessary, so it should perform well even if you call it from a program with a misaligned stack. Currently, our hacks support gcc and the Intel C compiler; if you use another compiler, you are on your own.

Fortunately, recent versions of glibc (on GNU/Linux) provide a properly aligned starting stack, but this was not the case with a number of older versions, and we are not certain of the situation on other operating systems. Hopefully, this will become less of a concern as time goes by.
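Because gcc only preserves alignment, it is enough to realign the stack once, at a library's entry points; every frame below is then kept aligned. One way to do this with gcc on 32-bit x86 is the `force_align_arg_pointer` function attribute (the -mstackrealign option applies the same idea to every function). This is a sketch of that general technique, not necessarily how FFTW's own hacks work; the macro and function names are hypothetical.

```c
/* With gcc on 32-bit x86, ask the compiler to realign the stack on entry
   to the marked function. On other targets (for example x86-64, whose ABI
   already requires an aligned stack) the macro expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#  define ALIGN_STACK_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#  define ALIGN_STACK_ON_ENTRY
#endif

/* Inner worker: assumes its caller has already left the stack aligned. */
static double sum_of_squares(const double *v, int n)
{
    double acc = 0.0;   /* local double that benefits from an aligned frame */
    for (int i = 0; i < n; ++i)
        acc += v[i] * v[i];
    return acc;
}

/* Hypothetical public entry point: realigns the stack once, so that every
   frame below it starts from an aligned base even if the caller's stack
   was misaligned. */
ALIGN_STACK_ON_ENTRY
double public_sum_of_squares(const double *v, int n)
{
    return sum_of_squares(v, n);
}
```

The realignment costs a few instructions per call, which is why it belongs on public entry points rather than on every internal routine.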