fix: literal arrays are memcpy-ed instead of stored, non constant arrays are unrolled#1633
fix: literal arrays are memcpy-ed instead of stored, non constant arrays are unrolled#1633
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the compiler’s handling of array initializers by emitting memcpy from a materialized global for constant aggregate literals, and introducing a lowering pass that rewrites non-constant array literals into indexed assignments / FOR loops (including external POU constructor wiring behavior).
Changes:
- Materialize constant array/struct literals as anonymous
.const_initglobals and initialize viallvm.memcpyto avoid large inline aggregatestores. - Add
array_loweringpass to lower non-constant array literal initializers into element assignments / loops, integrated into the driver pipeline. - Add extensive lit + IR + Rust snapshot tests covering constant vs non-constant arrays, multiplied segments, REF/ADR cases, and external ctor scenarios.
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/lit/single/init/array_variable_elements.st | New lit test for non-constant variable element array lowering behavior. |
| tests/lit/single/init/array_ref_elements.st | New lit test for arrays initialized with REF() calls. |
| tests/lit/single/init/array_multiplied_variable_override.st | New lit test for overriding type-initialized arrays with multiplied literals. |
| tests/lit/single/init/array_multiplied_type_init.st | New lit test for type-level multiplied array initializer. |
| tests/lit/single/init/array_multiplied_struct_elements.st | New lit test for multiplied array init with struct literals. |
| tests/lit/single/init/array_multiplied_partial_init.st | New lit test for partial multiplied init and defaulting remainder. |
| tests/lit/single/init/array_multiplied_nonzero_offset.st | New lit test for non-zero-based array bounds with multiplied init. |
| tests/lit/single/init/array_multiplied_mixed_init.st | New lit test for mixed multiplied segments + explicit tail values. |
| tests/lit/single/init/array_multiplied_large.st | New lit test stressing large constant array init. |
| tests/lit/single/init/array_mixed_ref_segments.st | New lit test for mixed multiplied segments with REF() calls. |
| tests/lit/single/init/array_external_type_init.st | New lit test for type-level init correctness in external/global blocks. |
| tests/lit/multi/extern_fb_array_ctor_wiring/run.test | New split-compilation lit harness for external FB ctor + const array init. |
| tests/lit/multi/extern_fb_array_ctor_wiring/main.st | Consumer side of split-compilation test verifying ctor wiring. |
| tests/lit/multi/extern_fb_array_ctor_wiring/lit.local.cfg | Custom lit config to compile/link C FB body + ctor-only object. |
| tests/lit/multi/extern_fb_array_ctor_wiring/header.pli | External FB declaration with constant array initializer. |
| tests/lit/multi/extern_fb_array_ctor_wiring/fb_body.c | C implementation of external FB body for split-compilation scenario. |
| tests/lit/ir_tests/external_fb_array_no_init_globals.st | IR test: external FB without external-ctor generation should not emit init globals. |
| tests/lit/ir_tests/external_fb_array_const_init.st | IR test: with external ctor generation, should emit .const_init + memcpy in ctor. |
| src/tests/adr/arrays_adr.rs | Update expected IR to use .const_init + llvm.memcpy instead of inline aggregate stores. |
| src/codegen/tests/snapshots/rusty__codegen__tests__function_tests__function_call_with_array_access.snap | Snapshot update for memcpy-based constant aggregate initialization. |
| src/codegen/tests/snapshots/rusty__codegen__tests__code_gen_tests__array_of_struct_initialization_in_body.snap | Snapshot update for struct array init via .const_init + memcpy. |
| src/codegen/tests/oop_tests/super_tests.rs | Snapshot/expectation updates due to memcpy-based init for aggregates. |
| src/codegen/tests/initialization_test/complex_initializers.rs | Snapshot update for aggregate string-array init using .const_init + memcpy. |
| src/codegen/tests/fnptr.rs | Snapshot update for aggregate return/init path using .const_init + memcpy. |
| src/codegen/tests/debug_tests.rs | Debug IR snapshot updates for aggregate init now using memcpy. |
| src/codegen/generators/pou_generator.rs | Skip generating __init globals for external/included POUs. |
| src/codegen/generators/llvm.rs | Add helper to materialize aggregate constants as private unnamed-addr globals. |
| src/codegen/generators/expression_generator.rs | Use memcpy from materialized globals for aggregate literals to reduce IR size. |
| compiler/plc_lowering/src/tests/array_lowering_tests.rs | New unit tests validating array lowering decisions (unroll vs loop, stripping init). |
| compiler/plc_lowering/src/tests.rs | Register new lowering test module. |
| compiler/plc_lowering/src/lib.rs | Export new array_lowering module. |
| compiler/plc_lowering/src/array_lowering.rs | New lowering pass implementation for non-constant array literal initializers. |
| compiler/plc_driver/src/pipelines/participant.rs | Add pipeline participant integrating array lowering before annotation. |
| compiler/plc_driver/src/pipelines.rs | Wire the new ArrayLowerer into the default mutable pipeline participants. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| for block in &mut pou.variable_blocks { | ||
| for var in &mut block.variables { | ||
| if has_array_literal_initializer(var) && is_lowered_type(var, lowered_type_names) { | ||
| var.initializer = None; | ||
| } |
| // For aggregate constant values (array/struct literals), avoid emitting a giant | ||
| // inline `store` instruction. Instead, materialize the constant as an anonymous | ||
| // global and memcpy from it — this is O(1) in IR size regardless of array length. | ||
| if (expression.is_array_value() || expression.is_struct_value()) && left_type.is_aggregate() { |
mhasel left a comment
There was a problem hiding this comment.
I like it! One bug, the rest of the comments are minor.
| .get_type() | ||
| .size_of() | ||
| .ok_or_else(|| Diagnostic::codegen_error("Cannot determine type size", right_statement))?; | ||
| self.llvm.builder.build_memcpy(target, 1, global_ptr, 1, size)?; |
There was a problem hiding this comment.
instead of hardcoding the alignment to 1 here, can we compute the preferred llvm alignment and pass that here instead?
| fn store_aggregate_via_memcpy( | ||
| &self, | ||
| target: PointerValue<'ink>, | ||
| _target_type: &DataTypeInformation, |
There was a problem hiding this comment.
Dead parameter - ties into my comment about computing alignments
| // Strip from user-type declarations (e.g. `TYPE tarr : ARRAY[...] := [...]; END_TYPE`). | ||
| for udt in &mut unit.user_types { | ||
| if udt.initializer.as_ref().is_some_and(is_literal_array_node) { | ||
| let type_name = udt.data_type.get_name(); |
There was a problem hiding this comment.
This lowers variables by implementation.type_name, but strip_initializers looks them up by pou.name. For function block constructors these can differ, causing initializers to not be stripped when they should be.
I have added a test to showcase this: fb_member_array_non_const_initializer_is_stripped
| global.set_unnamed_addr(true); | ||
| global.set_linkage(Linkage::Private); | ||
| // Prevent Module::drop from disposing the module we don't own. | ||
| std::mem::forget(module); |
There was a problem hiding this comment.
While creating a new module that thinks it owns the pointer and forgetting it later looks to be fine now, there is a risk if inkwell ever adds panic!s or Results to the called functions.
A panic would result in the rust unwinder dropping both the original module and the copy here, resulting in a double free.
A result with an early return would call drop on the local value, invalidating the original module's llvm pointer in the process. I think it is worth to add this to the SAFETY comment above
There was a problem hiding this comment.
yeah this is the one thing i don't like about this change, i'm thinking of moving this into the llvm wrapper and implementing it through C api calls there and getting a reference to the module. Will look into it
| array_info: &ArrayInfo, | ||
| id_provider: &mut IdProvider, | ||
| ) -> LoweredResult { | ||
| let use_for_loop = count >= FOR_LOOP_THRESHOLD; |
There was a problem hiding this comment.
The threshold is checked per MultipliedStatement segment, not against the total element count. An initializer split into multiple sub-threshold segments - e.g. [10(v), 10(v), 10(v), 10(v)] totalling 40 elements - is fully unrolled into 40 individual assignments instead of emitting a FOR loop.
This is not blocking for me, as this is an edge case that is unlikely to happen in real code. I've added a test (multi_segment_above_threshold_is_unrolled_per_segment) to document this behaviour anyway.
Literal array values are now memcpy-ed
Arrays with non-constant values are converted into single assignments (or for loop assignments for multipication assignments)
For Reviewers: I mostly vibed this (with some steering/guidance), pay extra attention to the review