2

The documentation for _alloca() says here:

The _alloca routine returns a void pointer to the allocated space, which is guaranteed to be suitably aligned for storage of any type of object.

However, here it says:

_alloca is required to be 16-byte aligned and additionally required to use a frame pointer.

So it seems that in the first reference they forgot about 32-byte aligned AVX/AVX2 types like __m256d.

Another thing that confuses me is that the first page says _alloca() is deprecated, yet the replacement it suggests is a function that may allocate memory from the heap rather than the stack (which is unacceptable in my multi-threaded application).

So can someone tell me whether there is a modern way (perhaps in a new C/C++ standard) to allocate aligned memory on the stack?

Clarification 1: Please don't provide solutions that require the array size to be a compile-time constant. My function allocates a variable number of array items depending on a run-time parameter value.

11
  • First, decide if you are asking about C or C++, though _alloca is not part of either of them. Commented Oct 22, 2017 at 20:42
  • alloca aligns allocations to 16 bytes. If you need a different alignment, allocate more memory and align it yourself. Commented Oct 22, 2017 at 20:44
  • Will std::aligned_storage work for your needs? You can specify the alignment as the second template parameter and it comes from the stack given the example implementation which uses alignas. en.cppreference.com/w/cpp/types/aligned_storage Commented Oct 22, 2017 at 20:45
  • What is alignof(__m256d), for the benefit of people who don't have your platform extensions? Commented Oct 22, 2017 at 20:53
  • @KerrekSB, it was in the question: 32 bytes. Commented Oct 22, 2017 at 20:54

4 Answers

7

Overallocate with _alloca(), then hand-align. Like this:

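    #include <malloc.h>     // _alloca
    #include <stdint.h>     // uintptr_t
    #include <immintrin.h>  // __m256d

    const size_t align = 32;
    void *p = _alloca(n + align - 1);  // n is the requested size in bytes
    __m256d *pm = (__m256d *)((((uintptr_t)p + align - 1) / align) * align);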

Replace const with #define, if necessary.
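The rounding step above can be isolated into a small helper. This is only a sketch: the names align_up and padded_size are hypothetical (they do not come from the answer), and it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round an address up to the next multiple of `align`.
// Assumes `align` is a nonzero power of two.
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
}

// Hypothetical helper: number of bytes to request so that an aligned block
// of `n` bytes always fits, whatever address the allocator returns.
inline std::size_t padded_size(std::size_t n, std::size_t align) {
    return n + align - 1;
}
```

The _alloca() call itself has to stay in the function that uses the memory, because the block is released when the function that called _alloca() returns. Only the size and address arithmetic can be moved into helpers like these.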


4 Comments

Better: (__m256d *)(((UINT_PTR)p + (align - 1)) & ~(align - 1))
Either works :) The point is to overallocate for the worst possible case, (alignment - 1) extra bytes, and then round up.
You actually only need to overallocate by 16 bytes and then check whether p % 32 != 0. If it is nonzero, add 16 and you are done. The address is required to be 16-byte aligned by the spec.
Is that really what the spec says? I thought it was a GCC implementation quirk. Cite maybe?
2

_alloca() is certainly not a standard or portable way of handling alignment on the stack. Luckily in C++11 we got alignas and std::aligned_storage. Neither of these forces you to put anything on the heap, so they should work for your use case. For example, to align an array of structs to a 32 byte boundary:

    #include <type_traits>

    struct bar { int member; /*...*/ };

    void fun() {
        std::aligned_storage<sizeof(bar), 32>::type array[16];
        auto bar_array = reinterpret_cast<bar*>(array);
    }

Or if you just want to align a single variable on the stack to a boundary:

    void bun() {
        alignas(32) bar b;
    }

You can also use the alignof operator to get the alignment requirements for a given type.
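As a rough check of the alignas approach, the sketch below declares over-aligned locals and verifies their addresses at run time. The helper names is_aligned and aligned_locals_ok are hypothetical, and the sketch assumes the compiler supports 32-byte alignment for automatic variables (mainstream x86-64 compilers do, but extended alignment is implementation-defined). It uses an alignas byte array rather than std::aligned_storage, which is deprecated as of C++23.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct bar { int member; };

// Hypothetical helper: true if `p` sits on an `align`-byte boundary.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Declares two 32-byte-aligned locals and reports whether both really are
// aligned. Only the start of `raw` is 32-byte aligned, not every element.
inline bool aligned_locals_ok() {
    alignas(32) bar single{};
    alignas(32) unsigned char raw[16 * sizeof(bar)];
    assert(alignof(bar) <= 32);
    return is_aligned(&single, 32) && is_aligned(raw, 32);
}
```

Both buffers have a compile-time size, so this does not by itself cover the run-time-sized case from the question.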

Comments

1

C++11 introduced the alignof operator:

An alignof expression yields the alignment requirement of its operand type.

You can use it as follows:

    #include <iostream>

    struct s {};
    typedef s __attribute__ ((aligned (64))) aligned_s;  // GCC/Clang extension

    std::cout << alignof(aligned_s); // Outputs: 64

Note: If your type's alignment is bigger than its size, the compiler won't let you declare arrays of that type (see more here):

error: alignment of array elements is greater than element size

But if your type's alignment is smaller than its size, you can safely allocate arrays:

    aligned_s arr[32];

-- OR --

    constexpr size_t arr_size = 32;
    aligned_s arr[arr_size];

Compilers that support VLAs will allow them for the newly defined type as well.
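For comparison, here is a sketch that uses standard alignas on the struct itself instead of the GCC attribute on a typedef. The type padded_s and the function array_bytes are hypothetical names for illustration. With alignas on the struct definition, sizeof is padded up to a multiple of the alignment, so the array restriction mentioned above does not apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type: alignas raises the alignment to 64, and the compiler
// pads sizeof up to a multiple of it.
struct alignas(64) padded_s { int value; };

static_assert(alignof(padded_s) == 64, "alignment is 64");
static_assert(sizeof(padded_s) == 64, "size is padded up to the alignment");

// Declares a fixed-size array on the stack and returns its size in bytes.
inline std::size_t array_bytes() {
    padded_s arr[32];
    // Every element is 64-byte aligned because sizeof is a multiple of 64.
    assert(reinterpret_cast<std::uintptr_t>(&arr[1]) % 64 == 0);
    return sizeof(arr);
}
```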

3 Comments

Does this approach allow non-constant array size? Array size changes at runtime between calls of the function where I need _alloca().
cl (MSVC) does not support non-constant array sizes.
@SergeRogatch, Dynamic Arrays (a.k.a VLAs) were considered as part of the standard but didn't make it. Though, G++ (4.6.3) and Clang (900.0.38) allow it.
0

The "modern" way is:

Don't make variable-length allocation on the stack.

In the context of your question - wanting to allocate on the heap but refraining from doing so - I'm assuming you may be allocating more than some small compile-time constant amount of memory. In that case, you're simply going to smash your stack with that alloca() call. Instead, use a thread-safe memory allocator. I'm sure there are libraries for this on GitHub (and at worst you could protect allocation calls with a global mutex, although that's slow if you need lots of them).

On the other hand, if you do know in advance what's the cap on the allocation size - just pre-allocate that much memory in thread-local storage; or use a fixed-size local array (which will get allocated on the stack).
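The thread-local option could look roughly like the sketch below. The cap kMaxDoubles and the function thread_scratch are hypothetical, and the cap value is an assumption chosen only for illustration; the real bound depends on the application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cap for illustration only.
constexpr std::size_t kMaxDoubles = 4096;

// Returns a 32-byte-aligned, per-thread scratch array that can hold `count`
// doubles, or nullptr when `count` exceeds the cap. No heap allocation and
// no locking: each thread gets its own buffer.
inline double* thread_scratch(std::size_t count) {
    alignas(32) static thread_local double buffer[kMaxDoubles];
    return count <= kMaxDoubles ? buffer : nullptr;
}
```

Every call on the same thread returns the same buffer, so a caller must finish with the memory before anything else on that thread calls thread_scratch() again. That makes it unsuitable for recursive or nested use without further bookkeeping.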

2 Comments

"Don't make variable-length allocation on the stack": why is that?
Variable-length stack allocation is actually very efficient for relatively small blocks, especially if we do it in user mode in our own exe file (so we know the stack size exactly and can set it at build time). We can usually freely allocate hundreds of thousands of bytes on the stack. A separate point: the first time we allocate several pages (4 KB each) or more, it will be slow compared to heap allocation (unless the guard page has been moved down beforehand). And in the case of stack overflow, the behavior is defined. (All of this applies to Windows.)
