c - Why is floor() so slow?


I recently wrote some code (ISO/ANSI C) and was astonished by its poor performance. Long story short, the culprit turned out to be the floor() function. Not only was it slow, it also did not vectorize (with the Intel compiler, aka ICL).

Here are some benchmarks for performing floor on all cells of a 2D matrix:

  VC: 0.10 ICL: 0.20  

Compare that to a simple cast:

  VC: 0.04 ICL: 0.04  

How can floor() be that much slower than a simple cast?! It does essentially the same thing (apart from negative numbers). Second question: does anyone know of a super-fast floor() implementation?

PS: This is the loop that I was benchmarking:

    #include <math.h>   /* floor(), used by the commented-out variant */
    #include <stddef.h> /* NULL */

    void Floor(float *matA, int *intA, const int height, const int width, const int width_aligned)
    {
        float *rowA = NULL;
        int *intRowA = NULL;
        int row, col;

        for (row = 0; row < height; ++row) {
            /* rows are padded to width_aligned elements */
            rowA = matA + row * width_aligned;
            intRowA = intA + row * width_aligned;
    #pragma ivdep
            for (col = 0; col < width; ++col) {
                /* intRowA[col] = floor(rowA[col]); */
                intRowA[col] = (int)(rowA[col]);
            }
        }
    }

A couple of things make floor slower than a cast and prevent vectorization.

The most important one:

floor can modify global state. If you pass a value that is too huge to be represented as an integer in float format, the errno variable gets set to EDOM. NaNs get special handling as well. All this behavior exists for applications that want to detect the overflow case and handle the situation somehow (don't ask me how).

Detecting these problematic conditions is not simple and makes up more than 90% of the execution time of floor. The actual rounding is cheap and could be inlined or vectorized. It is also a lot of code, so inlining the whole floor function would make your program run slower.

Some compilers have special flags that allow the compiler to optimize away some of the rarely used C standard rules. For example, GCC can be told that you are not interested in errno at all. To do so, pass -fno-math-errno or -ffast-math. ICC and VC may have similar compiler flags.

BTW: you can roll your own floor function using simple casts. You just have to handle the negative and positive cases differently. That may be a lot faster if you don't need the special handling of overflows and NaNs.
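One possible shape for such a cast-based floor is sketched below. The names `fast_floor` and `fast_floor_row` are invented for this example, and the code assumes every input is finite and fits in the range of int; NaNs and out-of-range values are deliberately not handled.

```c
/* Hypothetical helper: floor via a truncating cast.
 * Assumes x is finite and within the range of int (no NaN or overflow
 * handling, and errno is never touched). */
static inline int fast_floor(float x)
{
    int i = (int)x;             /* the cast truncates toward zero */
    return i - (x < (float)i);  /* negative non-integers were rounded up, so step down by one */
}

/* Hypothetical helper: apply fast_floor to n contiguous elements. */
static void fast_floor_row(const float *src, int *dst, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = fast_floor(src[i]);
}
```

For positive values and exact integers the comparison is false and the cast result is returned unchanged; for a value such as -2.7f the cast gives -2, the comparison is true, and the result becomes -3. The correction is a plain comparison rather than a library call, which leaves the compiler free to inline it.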

