
I wish to have a template method, which takes in data and processes it with a lambda function, whatever way the method itself wants to do that. However, I want the lambda function to get inlined so that the compiled assembly output won't end up having a "call" assembly instruction. Is this possible?

If it's not possible with lambdas, is there some other way to do that? Somehow using templates to pass a function as a template type or something?

I'm using C++17.

Below is an example of what I'm trying to achieve:

```cpp
#include <cstdint>
#include <functional>

template <typename T>
static inline void Process(const T* p_source1, const T* p_source2, T* p_destination, const int count, std::function<T (T, T)> processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}

void Process_Add(const uint8_t* p_source1, const uint8_t* p_source2, uint8_t* p_destination, const int count)
{
    // How to make something like this lambda inline?
    auto lambda = [] (uint8_t a, uint8_t b) { return a + b; };
    Process<uint8_t>(p_source1, p_source2, p_destination, count, lambda);
}
```
  • Yes, it's possible. You have better chances of achieving that if the lambda is passed as a parameter of templated type: typename U, U &&processor. This plus enabling optimization should do it. Commented Jan 14, 2024 at 11:28
  • @HolyBlackCat I'm not sure how to do that in practice. Commented Jan 14, 2024 at 11:39
  • Just do what I said, add a second template parameter (say, typename U), and replace std::function<T (T, T)> processor with U &&processor. Commented Jan 14, 2024 at 11:50
  • @HolyBlackCat ok, that indeed seems to work. Thank you! Commented Jan 14, 2024 at 11:56
  • @HolyBlackCat the code seems to compile to identical assembly code with and without the &&. Why is the && required in that parameter? Commented Jan 14, 2024 at 12:00
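The change suggested in the comments can be sketched in C++17 as follows (a sketch, not the commenter's exact code). On the last question: for a captureless lambda like this one, taking the callable by value (U processor) or by forwarding reference (U &&processor) typically yields identical code once the call is inlined, which matches what was observed. The && mainly matters for callables that are expensive to copy, or that carry state the caller wants to see modified. The static_cast in the lambda is an addition that makes the modulo-256 wraparound explicit.

```cpp
#include <cassert>
#include <cstdint>

// The callable's type is a second template parameter U, so no std::function
// (and no type erasure) sits between the loop and the lambda.
template <typename T, typename U>
inline void Process(const T* p_source1, const T* p_source2, T* p_destination,
                    const int count, U&& processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}

void Process_Add(const std::uint8_t* p_source1, const std::uint8_t* p_source2,
                 std::uint8_t* p_destination, const int count)
{
    // U is deduced from the lambda (here as a reference to its closure type),
    // so the compiler knows exactly which call operator runs in the loop.
    auto lambda = [](std::uint8_t a, std::uint8_t b) {
        return static_cast<std::uint8_t>(a + b);  // wraps modulo 256
    };
    Process<std::uint8_t>(p_source1, p_source2, p_destination, count, lambda);
}
```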

1 Answer


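Yes, it's possible, but std::function makes it very unlikely. std::function erases the type of the callable it stores, so each call goes through an indirect call whose target the compiler generally cannot see, and this mechanism is complex enough that it usually isn't inlined, even in simple cases. See Understanding the overhead from std::function and capturing synchronous lambdas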
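To make the indirect-call point concrete, here is a heavily simplified, hypothetical model of type erasure; the names ErasedBinaryFn and ProcessErased are invented for illustration. It is not how libstdc++, libc++ or MSVC actually implement std::function (a real std::function owns a copy of the callable and may heap-allocate), but it shows the relevant part: the loop only sees the wrapper type, and each call goes through a stored function pointer whose target the optimizer would first have to prove before it could inline anything.

```cpp
#include <cassert>

// Hypothetical, simplified model of a type-erased binary callable.
// Non-owning: the wrapped callable must outlive the wrapper.
template <typename R, typename A, typename B>
class ErasedBinaryFn {
public:
    ErasedBinaryFn() = default;  // empty wrapper, like a default std::function

    template <typename F>
    explicit ErasedBinaryFn(const F& f)
        : object_(&f),
          invoke_([](const void* obj, A a, B b) -> R {
              // Recovers the concrete type that was "erased" at construction.
              return (*static_cast<const F*>(obj))(a, b);
          }) {}

    R operator()(A a, B b) const {
        // A real std::function throws std::bad_function_call here instead.
        assert(invoke_ != nullptr && "calling an empty wrapper");
        // The callee is whatever invoke_ points to at run time: an indirect call.
        return invoke_(object_, a, b);
    }

private:
    const void* object_ = nullptr;
    R (*invoke_)(const void*, A, B) = nullptr;
};

// The loop never sees the lambda's type, only ErasedBinaryFn<T, T, T>.
template <typename T>
void ProcessErased(const T* p_source1, const T* p_source2, T* p_destination,
                   const int count, ErasedBinaryFn<T, T, T> processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}
```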

Here's the typical way of making inlining more likely:

```cpp
#include <concepts>     // std::invocable, std::convertible_to (C++20)
#include <type_traits>  // std::invoke_result_t

template <typename T, typename F>
    requires (std::invocable<F, const T&, const T&> // optional: C++20 constraint
              && std::convertible_to<std::invoke_result_t<F, const T&, const T&>, T>)
inline void Process(const T* p_source1, const T* p_source2, T* p_destination,
                    const int count, F processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}
```

Each lambda expression has a unique closure type, and F is deduced as that type, so processor(...) invokes a call operator that is known at compile time. With optimizations enabled, this makes inlining quite likely, as long as the lambda body is relatively short.
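A small sketch of the closure-type point, using the answer's Process without the optional C++20 constraint so it also compiles as C++17. The names add, add_again and Process_Sub are hypothetical. Two textually identical lambda expressions still have different types, so each call site instantiates its own Process specialization whose loop calls one specific, statically known operator().

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The answer's Process, minus the optional C++20 constraint.
template <typename T, typename F>
inline void Process(const T* p_source1, const T* p_source2, T* p_destination,
                    const int count, F processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}

// Two textually identical lambda expressions still have distinct closure types.
const auto add = [](std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
};
const auto add_again = [](std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
};
static_assert(!std::is_same_v<decltype(add), decltype(add_again)>,
              "each lambda expression has its own closure type");

// F is deduced as this lambda's closure type, so this call instantiates a
// Process specialization whose loop body calls a known operator().
void Process_Sub(const std::uint8_t* p_source1, const std::uint8_t* p_source2,
                 std::uint8_t* p_destination, const int count)
{
    Process(p_source1, p_source2, p_destination, count,
            [](std::uint8_t a, std::uint8_t b) {
                return static_cast<std::uint8_t>(a - b);  // wraps modulo 256
            });
}
```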

Further notes

You could imitate the C++20 constraint with std::enable_if_t, or you could just leave the function unconstrained.
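Since the question targets C++17, here is one possible enable_if_t version, a sketch rather than the answer author's code. It uses std::is_invocable_r_v, which is true when the callable can be invoked with the given arguments and its result implicitly converts to T; that is close to, though not exactly identical with, the invocable plus convertible_to pair in the requires-clause. F& is checked instead of F because processor is called as an lvalue inside the loop.

```cpp
#include <cassert>
#include <type_traits>

// C++17 stand-in for the requires-clause: this overload only participates in
// overload resolution if an lvalue F can be called with (const T&, const T&)
// and the result implicitly converts to T.
template <typename T, typename F,
          std::enable_if_t<std::is_invocable_r_v<T, F&, const T&, const T&>, int> = 0>
inline void Process(const T* p_source1, const T* p_source2, T* p_destination,
                    const int count, F processor)
{
    for (int i = 0; i < count; i++)
        p_destination[i] = processor(p_source1[i], p_source2[i]);
}
```

With a callable of the wrong shape (for example, a one-argument lambda), the template is removed from overload resolution, so the compiler reports "no matching function" at the call site rather than an error inside the loop.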

Using static and inline in combination is basically pointless. For functions, static means internal linkage, so every cpp file that uses the template gets its own separate copy; that's likely not your intent, assuming this template is used in more than one cpp file. See Should one never use static inline function?
