@wx257osn2
Contributor

  • Add ADA_AVX2 macro in include/ada/common_defs.h enabled if defined(__AVX2__)
  • Implement has_tabs_or_newline using AVX2
@lemire
Member

lemire commented Jun 4, 2023

I built this PR while passing -mavx2 to the compiler. Note that, right now, this optimization is not compiled in otherwise: AVX2 support is a compile-time setting, and __AVX2__ is not defined by default.

I use GCC 11 on an Ice Lake processor.

Before this PR:

BasicBench_AdaURL_aggregator_href 23545729 ns 23504034 ns 30 GHz=3.18576 cycle/byte=8.54734 cycles/url=742.415 instructions/byte=25.7505 instructions/cycle=3.0127 instructions/ns=9.59773 instructions/url=2.23667k ns/url=233.042 speed=369.643M/s time/byte=2.70532ns time/url=234.982ns url/s=4.25565M/s
BasicBench_AdaURL_aggregator_href 23525308 ns 23484552 ns 30 GHz=3.18608 cycle/byte=8.54269 cycles/url=742.011 instructions/byte=25.7505 instructions/cycle=3.01433 instructions/ns=9.60391 instructions/url=2.23667k ns/url=232.892 speed=369.949M/s time/byte=2.70307ns time/url=234.787ns url/s=4.25918M/s
BasicBench_AdaURL_aggregator_href 23436257 ns 23393300 ns 30 GHz=3.18577 cycle/byte=8.5561 cycles/url=743.176 instructions/byte=25.7519 instructions/cycle=3.00977 instructions/ns=9.58844 instructions/url=2.23679k ns/url=233.28 speed=371.392M/s time/byte=2.69257ns time/url=233.875ns url/s=4.2758M/s

After this PR:

BasicBench_AdaURL_aggregator_href 23419502 ns 23375945 ns 30 GHz=3.18859 cycle/byte=8.75773 cycles/url=760.689 instructions/byte=25.7505 instructions/cycle=2.94032 instructions/ns=9.37548 instructions/url=2.23667k ns/url=238.566 speed=371.668M/s time/byte=2.69057ns time/url=233.701ns url/s=4.27897M/s
BasicBench_AdaURL_aggregator_href 23561403 ns 23517519 ns 30 GHz=3.18576 cycle/byte=8.56581 cycles/url=744.02 instructions/byte=25.7519 instructions/cycle=3.00636 instructions/ns=9.57755 instructions/url=2.23679k ns/url=233.545 speed=369.431M/s time/byte=2.70687ns time/url=235.116ns url/s=4.25321M/s
BasicBench_AdaURL_aggregator_href 23543378 ns 23502914 ns 30 GHz=3.18865 cycle/byte=8.56245 cycles/url=743.728 instructions/byte=25.7519 instructions/cycle=3.00754 instructions/ns=9.58998 instructions/url=2.23679k ns/url=233.242 speed=369.66M/s time/byte=2.70519ns time/url=234.97ns url/s=4.25586M/s

So I am not seeing any robust difference. It is possible that I made a mistake, but we need quantified benefits one way or another.
