Apple Neural Engine (ANE) Reverse Engineering

Reverse engineering artifacts for Apple's Neural Engine stack: ANECompiler, Espresso, and AppleNeuralEngine frameworks.

Target Audience: Performance engineers and security researchers working with Apple silicon ML acceleration.

Key Findings

Discovery	Details	Significance
SDPA Layer	`ANECSDPALayerDesc` is only 8 bytes	Native transformer attention in ANE hardware
40+ Optimization Passes	`Pass_fuse_conv_batchnorm`, `Pass_fold_constants`, etc.	Full Espresso compiler pipeline discoverable
XPC Daemon Architecture	`aned` at `/usr/libexec/aned`	Privilege boundary for ANE access
Entitlement Bypass	Struct init functions work without signing	Can probe all layer descriptor layouts
PBZE Format	LZFSE-compressed espresso.net	System models decodable with libcompression
Silent Failures	`compileModel:` returns NULL without error	Operations fail silently without entitlements
IOSurface Memory	`EspressoANEIOSurface` (21 methods)	Zero-copy tensor sharing with Metal
Quantization Modes	`quantization_mode:2` on inner_product	ANE-specific quantization discovered
CoreML ANE Path	`MLComputeUnitsAll` enables ANE	Working path for ANE execution via public API
HWX Binary Format	Magic `0xBEEFFACE`, Mach-O-like	Pre-compiled ANE instructions per chip generation
16 ANE Cores	M3 Pro has 16 neural engine cores	Confirmed via `MLNeuralEngineComputeDevice`

Quick Reference: What Works Without Entitlements

Operation	Works?	Notes
Load ANECompiler.framework	Yes	All frameworks load
Call `ANEC*Initialize()`	Yes	Can probe struct sizes
Create `EspressoContext` (CPU)	Yes	Platform 0 works
Load `EspressoNetwork`	Yes	CPU inference works
Create `_ANEClient`	Yes	Object is created, but compile/load calls on it fail (see next rows)
Call `compileModel:`	No	Returns NULL silently
Call `loadModel:`	No	Returns NULL silently
ANE inference	No	Requires entitlements
CoreML with ANE	Yes	Use `MLComputeUnitsAll` - working path!
XPC to aned	Yes	Connection succeeds, ops need entitlements
MLComputePlan	Yes	Can inspect device availability

Architecture Overview

    ┌────────────────────────────────────────────────────────────────┐
    │                        User Application                        │
    │                   (Core ML, Create ML, BNNS)                   │
    ├────────────────────────────────────────────────────────────────┤
    │                                                                │
    │  ┌─────────────────────────┐  ┌─────────────────────────────┐  │
    │  │ Espresso.framework      │  │ AppleNeuralEngine.framework │  │
    │  │ ──────────────────      │  │ ─────────────────────────── │  │
    │  │ • EspressoContext       │  │ • _ANEClient                │  │
    │  │ • EspressoNetwork       │  │ • _ANEModel                 │  │
    │  │ • 40+ Pass_* classes    │  │ • _ANERequest               │  │
    │  │ • CPU/GPU/ANE dispatch  │  │ • _ANEDaemonConnection      │  │
    │  └───────────┬─────────────┘  └──────────────┬──────────────┘  │
    │              │                               │                 │
    ├──────────────┴───────────────────────────────┴─────────────────┤
    │                     ANECompiler.framework                      │
    │                     ─────────────────────                      │
    │  • ANECConvLayerDesc (176 bytes) • ANECSDPALayerDesc (8 bytes) │
    │  • ANECPoolLayerDesc (96 bytes)  • ANECLinearLayerDesc (64 B)  │
    │  • ANECTensorDims (40 bytes)     • 30+ layer descriptors       │
    ├────────────────────────────────────────────────────────────────┤
    │                      XPC Transport Layer                       │
    │              Service: com.apple.appleneuralengine              │
    ├────────────────────────────────────────────────────────────────┤
    │                    aned (/usr/libexec/aned)                    │
    │                    ────────────────────────                    │
    │  • ANEProgramCreate()            • Model cache management      │
    │  • ANEProgramInstanceCreate()    • Garbage collection          │
    │  • Sandbox extension handling    • Telemetry                   │
    ├────────────────────────────────────────────────────────────────┤
    │                    ANE Hardware (M1/M2/M3+)                    │
    │                    ────────────────────────                    │
    │  • 16 neural engine cores        • Dedicated SRAM              │
    │  • Up to 15.8 TOPS (M2)          • IOSurface DMA               │
    └────────────────────────────────────────────────────────────────┘

Data Flow: Model Compilation

    .mlmodelc/                aned daemon                 Hardware
    ────────────              ───────────                 ────────
    │                              │                         │
    │ 1. _ANEModel create          │                         │
    ├─────────────────────────────►│                         │
    │                              │                         │
    │ 2. compileModel: (XPC)       │                         │
    ├─────────────────────────────►│                         │
    │                              │ 3. ANECompiler          │
    │                              ├────────────────────────►│
    │                              │                         │
    │                              │ 4. ANEProgramCreate()   │
    │                              ├────────────────────────►│
    │                              │                         │
    │ 5. Return program handle     │                         │
    │◄─────────────────────────────┤                         │
    │                              │                         │
    │ 6. loadModel: (XPC)          │                         │
    ├─────────────────────────────►│                         │
    │                              │ 7. Map to ANE memory    │
    │                              ├────────────────────────►│
    │                              │                         │
    │ 8. evaluateWithModel:        │ 9. Execute on ANE       │
    ├─────────────────────────────►├────────────────────────►│
    │                              │                         │

Repository Structure

    ane/
    ├── __init__.py      # Package exports, builds API tree for tooling
    ├── compiler.py      # ANECompiler.framework ctypes bindings
    │                    #   - Layer descriptor structs
    │                    #   - ANEC*Initialize() wrappers
    │                    #   - Struct size probing
    │
    ├── espresso.py      # Espresso model format parser
    │                    #   - EspressoNet, EspressoLayer classes
    │                    #   - Layer type documentation
    │                    #   - CPU vs ANE model comparison
    │
    ├── runtime.py       # Espresso/ANE runtime bindings
    │                    #   - EspressoContext creation
    │                    #   - EspressoNetwork loading
    │                    #   - ObjC class introspection
    │
    ├── xpc.py           # ANE XPC protocol documentation
    │                    #   - _ANEDaemonConnection methods
    │                    #   - _ANEClient methods
    │                    #   - XPC operation categories
    │
    ├── pbze.py          # PBZE (compressed espresso.net) decoder
    │                    #   - LZFSE decompression via libcompression
    │                    #   - Header parsing
    │                    #   - Compression statistics
    │
    ├── sample.py        # Example graph building code
    │                    #   - SimpleANEGraph class
    │                    #   - CNN and Transformer examples
    │
    ├── tests/
    │   └── test_ane.py  # Comprehensive pytest suite (623 lines)
    │
    └── helper/
        ├── ane_helper.m             # Objective-C helper for privileged ANE access
        ├── ane_helper.entitlements
        └── build.sh                 # Build script

Espresso Engine Teardown

Espresso is Apple's internal ML inference runtime that powers Core ML. It handles model execution across CPU, GPU, and ANE.

Model Format (`.espresso.net`)

Two formats exist:

JSON (human-readable):

    {
      "format_version": 200,
      "storage": "model.espresso.weights",
      "layers": [
        {
          "name": "conv1",
          "type": "convolution",
          "bottom": "input",
          "top": "conv1_output",
          "kernel_size": 3,
          "stride": 1,
          "pad": 1,
          "C": 64
        }
      ],
      "analyses": {},
      "properties": {}
    }
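As a minimal sketch of reading the JSON variant, the snippet below loads a document shaped like the example above with Python's standard `json` module and lists each layer's input and output blobs. The helper name `summarize_layers` is made up for illustration and is not part of the `ane` package. It also assumes that layers with several inputs or outputs store them as comma-separated strings in `bottom`/`top`, which this README does not confirm.

```python
import json

# Inline copy of the JSON example; a real file would be read from disk.
EXAMPLE = """
{
  "format_version": 200,
  "storage": "model.espresso.weights",
  "layers": [
    {"name": "conv1", "type": "convolution", "bottom": "input",
     "top": "conv1_output", "kernel_size": 3, "stride": 1, "pad": 1, "C": 64}
  ],
  "analyses": {},
  "properties": {}
}
"""


def summarize_layers(net):
    """Return (name, type, inputs, outputs) for every layer in a parsed net."""
    rows = []
    for layer in net.get("layers", []):
        # Assumption: multiple blobs are comma-separated in bottom/top.
        inputs = [b for b in layer.get("bottom", "").split(",") if b]
        outputs = [t for t in layer.get("top", "").split(",") if t]
        rows.append((layer["name"], layer["type"], inputs, outputs))
    return rows


net = json.loads(EXAMPLE)
for name, kind, inputs, outputs in summarize_layers(net):
    print(f"{name} ({kind}): {inputs} -> {outputs}")
```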

PBZE (binary, LZFSE-compressed):

    Offset  Size  Description
    ──────  ────  ───────────
    0x00    4     Magic: b'pbze'
    0x04    4     Version (usually 0)
    0x08    8     Unknown (header size?)
    0x10    4     Uncompressed size (BIG ENDIAN!)
    0x14    4     Unknown
    0x18    4     Padding
    0x1C    ...   LZFSE data (starts with b'bvx2')
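The header layout above can be read with the standard `struct` module. The following is a sketch under the stated layout only: `parse_pbze_header` is a hypothetical name (the repository's `pbze.py` may be organised differently), the version field is kept as raw bytes because its byte order is not documented here, and the sample bytes are synthetic rather than taken from a real system model.

```python
import struct

PBZE_MAGIC = b"pbze"
LZFSE_OFFSET = 0x1C  # compressed payload starts here per the layout above


def parse_pbze_header(blob):
    """Parse the fixed PBZE header fields documented in the layout."""
    if len(blob) < LZFSE_OFFSET:
        raise ValueError("buffer too short for a PBZE header")
    if blob[:4] != PBZE_MAGIC:
        raise ValueError("missing 'pbze' magic")
    # Only the uncompressed size is documented as big-endian.
    (uncompressed_size,) = struct.unpack_from(">I", blob, 0x10)
    payload = blob[LZFSE_OFFSET:]
    return {
        "version_bytes": blob[0x04:0x08],  # byte order undocumented; kept raw
        "uncompressed_size": uncompressed_size,
        "payload_is_lzfse": payload.startswith(b"bvx2"),
        "payload": payload,
    }


# Synthetic header for demonstration only.
fake = (
    b"pbze"                      # 0x00 magic
    + bytes(4)                   # 0x04 version
    + bytes(8)                   # 0x08 unknown
    + struct.pack(">I", 4096)    # 0x10 uncompressed size, big-endian
    + bytes(4)                   # 0x14 unknown
    + bytes(4)                   # 0x18 padding
    + b"bvx2" + bytes(8)         # 0x1C start of the LZFSE stream
)
header = parse_pbze_header(fake)
print(header["uncompressed_size"], header["payload_is_lzfse"])
```

On macOS the `payload` bytes would then go to libcompression (`compression_decode_buffer` with `COMPRESSION_LZFSE`), using `uncompressed_size` to size the destination buffer.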

Layer Types

Compute Layers

Type	Description	Key Attributes
`inner_product`	Dense/fully-connected	`nB`, `nC`, `quantization_mode`, `is_lookup`, `has_biases`
`convolution`	2D convolution	`kernel_size`, `stride`, `pad`, `C`, `groups`
`batch_matmul`	Batched matrix multiply	`transpose_a`, `transpose_b`
`elementwise`	Binary/unary operations	`operation` (see operation codes below)
`activation`	Nonlinearities	`type` (relu, gelu, tanh, sigmoid, etc.)
`softmax`	Softmax normalization	`axis`
`reduce`	Reduction operations	`mode` (sum, mean, max, min, prod)

Memory/Shape Layers

Type	Description	Key Attributes
`reshape`	Tensor reshape	`shape`
`transpose`	Permute dimensions	`axes`
`concat`	Concatenate tensors	`axis`
`general_concat`	N-D concatenation	`axis`, flexible inputs
`split_nd`	Split along axis	`axis`, `num_splits` or `split_sizes`
`general_slice`	Slice tensor	`starts`, `ends`, `strides`
`expand_dims`	Add dimension	`axes`
`load_constant`	Load constant tensor	`blob_weights`

Quantization Layers

Type	Description	Notes
`dynamic_quantize`	Runtime quantization	Converts FP to INT8
`dynamic_dequantize`	Runtime dequantization	Converts INT8 to FP

Special Layers

Type	Description
`instancenorm_1d`	Instance normalization
`get_shape`	Returns tensor shape
`nonzero`	Find nonzero indices
`scatter_nd`	Scatter operation
`tile`	Tile/repeat tensor

Elementwise Operation Codes

    Code  Operation      Code  Operation
    ────  ─────────      ────  ─────────
    0     add            25    pow
    1     sub            26    exp
    2     mul            27    log
    3     div            28    abs
    4     floor_div      101   select (ternary: a ? b : c)
    10    max            105   less_than
    11    min            106   less_equal
                         107   not_equal
    20    sqrt           108   equal
    21    rsqrt          109   greater_equal
    22    square         110   greater_than
    23    neg
    24    reciprocal     117   floor
                         118   ceil
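When reading parsed `elementwise` layers, the code table above is convenient as a lookup. The mapping below transcribes that table as-is; the function name `describe_elementwise` is illustrative and not taken from the repository.

```python
# Operation codes transcribed from the table above.
ELEMENTWISE_OPS = {
    0: "add", 1: "sub", 2: "mul", 3: "div", 4: "floor_div",
    10: "max", 11: "min",
    20: "sqrt", 21: "rsqrt", 22: "square", 23: "neg", 24: "reciprocal",
    25: "pow", 26: "exp", 27: "log", 28: "abs",
    101: "select",
    105: "less_than", 106: "less_equal", 107: "not_equal", 108: "equal",
    109: "greater_equal", 110: "greater_than",
    117: "floor", 118: "ceil",
}


def describe_elementwise(layer):
    """Map a layer dict's `operation` code to a readable name."""
    code = layer.get("operation")
    return ELEMENTWISE_OPS.get(code, f"unknown({code})")


print(describe_elementwise({"type": "elementwise", "operation": 2}))
```

Codes that are not in the table are reported as `unknown(<code>)` rather than raising, since the list is probably incomplete.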

CPU vs ANE Model Differences

When a model is compiled for ANE, several transformations occur:

Aspect	CPU Model	ANE Model
Layer count	Fewer	More (ops decomposed)
Reshape ops	`reshape` layer	Often replaced with `convolution`
Embeddings	`inner_product`	`inner_product` with `is_lookup:1`
FC layers	`inner_product`	`inner_product` with `quantization_mode:2`
Tensor manipulation	Single ops	`split_nd`/`concat` chains

Example: A model with 50 CPU layers might have 80+ ANE layers due to operation decomposition.
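One way to see these differences for a given model is to diff the layer-type histograms of its CPU and ANE variants. The sketch below works on already-parsed nets (dicts with a `layers` list, as in the JSON format); the two toy nets are invented for illustration and do not come from a real model.

```python
from collections import Counter


def layer_type_delta(cpu_net, ane_net):
    """Return {layer_type: ane_count - cpu_count} for types whose counts differ."""
    cpu = Counter(layer["type"] for layer in cpu_net["layers"])
    ane = Counter(layer["type"] for layer in ane_net["layers"])
    return {
        kind: ane[kind] - cpu[kind]
        for kind in sorted(set(cpu) | set(ane))
        if ane[kind] != cpu[kind]
    }


# Toy inputs (hypothetical): the ANE variant swaps reshape for convolution
# and wraps the dense layer in a split_nd/concat pair.
cpu_net = {"layers": [{"type": "reshape"}, {"type": "inner_product"},
                      {"type": "softmax"}]}
ane_net = {"layers": [{"type": "convolution"}, {"type": "split_nd"},
                      {"type": "inner_product"}, {"type": "concat"},
                      {"type": "softmax"}]}

print(layer_type_delta(cpu_net, ane_net))
```

A positive value means the type appears more often in the ANE variant. A negative value, such as `reshape` here, means the op was replaced.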

Optimization Passes (40+ discovered)

Espresso includes extensive optimization passes accessible via EspressoCustomPass subclasses:

    Pass_fuse_conv_batchnorm      # Fuse BN into conv weights
    Pass_fold_constants           # Constant folding
    Pass_eliminate_dead_code      # DCE
    Pass_fuse_activation          # Fuse relu/gelu into preceding op
    Pass_optimize_transpose       # Eliminate redundant transposes
    Pass_convert_to_ane_layout    # Convert to ANE memory layout
    Pass_quantize_weights         # Weight quantization
    Pass_split_large_tensors      # Split tensors for ANE tile size
    ... (and 30+ more)

Compiler Engine Teardown

ANECompiler.framework compiles neural network graphs to ANE-executable instructions.

Layer Descriptor Sizes (Runtime Probed)

All sizes were determined by calling ANEC*Initialize() on a sentinel-filled buffer and checking which bytes the initializer overwrote:

Struct	Size	Field Layout (inferred)
`ANECKernelSize`	24	3x u64: depth, height, width
`ANECStep`	12	3x u32: depth, height, width
`ANECPadding`	24	6x u32: d_front, d_back, h_front, h_back, w_front, w_back
`ANECTensorDims`	40	5x u64: N, C, H, W, D
`ANECTensorDesc`	64	ptr(8) + dims(48) + flags(8)
`ANECConvLayerDesc`	176	Kernel, stride, padding, dilation, groups, etc.
`ANECPoolLayerDesc`	96	Kernel, stride, pool type, etc.
`ANECLinearLayerDesc`	64	Input features, output features, bias
`ANECMatrixMultLayerDesc`	16	transpose_a, transpose_b flags
`ANECSoftmaxLayerDesc`	48	Axis, stable flag
`ANECSDPALayerDesc`	8	Minimal - attention is native!
`ANECNeuronLayerDesc`	32	Activation type, params
`ANECReductionLayerDesc`	24	Reduction mode, axes
`ANECReshapeLayerDesc`	48	Target shape
`ANECTransposeLayerDesc`	32	Permutation
`ANECConcatLayerDesc`	16	Axis
`ANECGatherLayerDesc`	24	Axis, batch_dims
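The four small structs at the top of the table have field layouts simple enough to mirror in `ctypes`. The definitions below are a sketch that follows the inferred layouts in the table. The field names are the table's labels, not symbols recovered from the framework, and the larger descriptors are left out because their layouts are not spelled out here.

```python
import ctypes


class ANECKernelSize(ctypes.Structure):
    # 3x u64: depth, height, width
    _fields_ = [(n, ctypes.c_uint64) for n in ("depth", "height", "width")]


class ANECStep(ctypes.Structure):
    # 3x u32: depth, height, width
    _fields_ = [(n, ctypes.c_uint32) for n in ("depth", "height", "width")]


class ANECPadding(ctypes.Structure):
    # 6x u32: front/back padding for depth, height, width
    _fields_ = [(n, ctypes.c_uint32) for n in
                ("d_front", "d_back", "h_front", "h_back", "w_front", "w_back")]


class ANECTensorDims(ctypes.Structure):
    # 5x u64: N, C, H, W, D
    _fields_ = [(n, ctypes.c_uint64) for n in ("n", "c", "h", "w", "d")]


for struct_type in (ANECKernelSize, ANECStep, ANECPadding, ANECTensorDims):
    print(struct_type.__name__, ctypes.sizeof(struct_type))
```

The printed sizes should match the probed values in the table (24, 12, 24, 40), which is a quick consistency check on the inferred field widths.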

Layer Categories (All 40+ Discovered)

    Category               Layer Types
    ────────               ───────────
    Attention/Transformer  SDPA
    Convolution            Conv, CrossCorrelation, DepthwiseConv
    Pooling                Pool, GlobalPool, AdaptivePool
    Normalization          Norm, BatchNorm, LayerNorm, GroupNorm, LRN
    Linear/Matrix          Linear, MatrixMult, Einsum
    Activation             Neuron, Softmax, LogSoftmax, Dropout
    Reshape/Layout         Reshape, Transpose, Flatten, Unflatten, Concat,
                           Split, Tile, Expand, Squeeze
    Spatial                Resize, Pad, CropResize, Resample, AffineTransform,
                           GridSample
    Reduction              Reduction, TopK, Sort, ArgMax, ArgMin
    Scatter/Gather         Gather, GatherND, Scatter, ScatterND
    Misc                   Shape, Range, Random, Fill, RingBuffer, InputView, Copy

Version APIs

    from ane import ANECompiler

    ane = ANECompiler()
    print(f"MPS Dialect Version: {ane.mps_dialect_version}")
    print(f"MPS SPI Dialect Version: {ane.mps_spi_dialect_version}")
    print(f"Validate Network Version: {ane.validate_network_version}")
    print(f"Analytics Buffer Size: {ane.analytics_buffer_size}")

ANE Runtime Details

XPC Protocol

Communication with ANE hardware goes through the aned daemon via XPC.

Services

Service	Purpose
`com.apple.appleneuralengine`	Main service (requires entitlements)
`com.apple.appleneuralengine.private`	Private/internal service
`com.apple.aned`	Daemon Mach service

XPC Operations

Compilation:

    -[_ANEDaemonConnection compileModel:sandboxExtension:options:qos:withReply:]
    -[_ANEDaemonConnection compiledModelExistsFor:withReply:]
    -[_ANEDaemonConnection compiledModelExistsMatchingHash:withReply:]
    -[_ANEDaemonConnection purgeCompiledModel:withReply:]

Loading:

    -[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:]
    -[_ANEDaemonConnection loadModelNewInstance:options:modelInstParams:qos:withReply:]
    -[_ANEDaemonConnection unloadModel:options:qos:withReply:]

Execution:

-[_ANEDaemonConnection prepareChainingWithModel:options:chainingReq:qos:withReply:]

Real-time:

    -[_ANEDaemonConnection beginRealTimeTaskWithReply:]
    -[_ANEDaemonConnection endRealTimeTaskWithReply:]

Memory Management

ANE uses IOSurface for tensor memory, enabling zero-copy sharing with GPU/Metal.

EspressoANEIOSurface Methods:

    -createIOSurfaceWithExtraProperties:
    -metalBufferWithDevice:
    -setExternalStorage:ioSurface:
    -nFrames
    -bytesPerFrame
    -totalBytes
    // ... 21 methods total

Entitlements

Entitlement	Purpose	Required For
`com.apple.aned.private.allow`	Primary ANE access	compile, load, evaluate
`com.apple.aned.private.adapterWeight.allow`	Adapter weights access	Custom weight loading
`com.apple.aned.private.aggressivePowerSaving.allow`	Power saving modes	Low-power inference
`com.apple.ANECompilerService.allow`	Compiler service access	Model compilation
`com.apple.aned.private.processModelShare.allow`	Cross-process model sharing	Shared inference
`com.apple.ane.memoryUnwiringOptOutAccess.allow`	Memory unwiring control	Large model persistence
`com.apple.private.modelPurgeInAllPartitions.allow`	Model cache purging	Cache management
`com.apple.aned.private.secondaryANECompilerServiceAccess.allow`	Secondary compiler	Parallel compilation
`com.apple.private.ANEStorageMaintainer.allow`	Storage maintenance	Cache cleanup

Boot Arguments (Internal/Debug Builds Only)

On Apple internal builds, these boot-args can bypass entitlement checks:

Boot Arg	Purpose	Effect
`ane_skipAdapterWeightAccessCheck`	Bypass adapter weight entitlement	Skip `com.apple.aned.private.adapterWeight.allow` check
`ane_vm_allowPrecompiledBinary`	Allow precompiled binaries	Skip binary validation in VM
`ane_vm_debugDumpBootArg`	Enable debug dumps	Dump ANE state on errors
`ane_vm_forceValidationOnGuest`	Force validation in VM	Extra validation for VMs

Note: These boot-args only work when isInternalBuild returns true (Apple internal builds only). Consumer macOS always returns false for isInternalBuild.

Internal Build Detection

The aned daemon checks for internal builds via _ANEDeviceInfo.isInternalBuild, which:

Checks for /AppleInternal directory existence
Queries os_variant_has_internal_content("com.apple.aned")
Checks os_variant_allows_internal_security_policies("com.apple.aned")

All checks return false on consumer macOS installations.

Model Cache

Compiled models are cached in:

/var/folders/<user_hash>/com.apple.aned/

Cache operations in aned:

com.apple.aned.modelCacheAsyncIO
com.apple.aned.modelCacheGC
com.apple.aned.danglingModelsGC

Runtime Class Reference

Key classes discovered through runtime introspection:

`_ANEDeviceInfo` (Class Methods)

    + (BOOL)hasANE;                         // Returns YES on Apple Silicon
    + (NSInteger)numANEs;                   // Number of ANE devices (usually 1)
    + (NSInteger)numANECores;               // Number of cores (e.g., 16 for M1)
    + (NSString *)productName;              // "macOS"
    + (NSString *)buildVersion;             // e.g., "25B78"
    + (NSInteger)aneArchitectureType;       // Hardware architecture identifier
    + (NSInteger)aneSubType;                // Hardware subtype
    + (BOOL)isVirtualMachine;               // VM detection
    + (BOOL)isInternalBuild;                // Apple internal build detection
    + (BOOL)precompiledModelChecksDisabled;
    + (NSString *)bootArgs;                 // Current boot arguments
    + (BOOL)isBootArgPresent:(NSString *)arg;
    + (BOOL)isBoolBootArgSetTrue:(NSString *)arg;

`_ANEStrings` (Class Methods - Returns Constant Strings)

    + (NSString *)restrictedAccessEntitlement;
        // "com.apple.aned.private.allow"
    + (NSString *)adapterWeightsAccessEntitlement;
        // "com.apple.aned.private.adapterWeight.allow"
    + (NSString *)adapterWeightsAccessEntitlementBypassBootArg;
        // "ane_skipAdapterWeightAccessCheck"
    + (NSString *)internalLibraryPath;
        // "/AppleInternal/Library"
    + (NSString *)systemLibraryPath;
        // "/System/Library"
    // ... and many more

Hardware Info Example Output

    hasANE = 1
    numANEs = 1
    numANECores = 16
    productName = macOS
    buildVersion = 25B78
    isVirtualMachine = 0
    isInternalBuild = 0
    precompiledModelChecksDisabled = 0

Security Analysis

Attack Surface

1. XPC Message Handling

The aned daemon accepts XPC messages from clients. Potential vectors:

Malformed model paths: Does compileModel: properly validate URL paths?
Sandbox extensions: sandboxExtension: parameter passes filesystem access tokens
Memory corruption: Large or malformed layer descriptors
Race conditions: Concurrent compile/load/unload operations

2. IOSurface Sharing

IOSurface enables shared memory between processes:

    Client Process     aned Daemon          ANE Hardware
    ──────────────     ───────────          ────────────
    │                       │                     │
    │ Create IOSurface      │                     │
    ├──────────────────────►│                     │
    │                       │ Map to ANE          │
    │                       ├────────────────────►│
    │                       │                     │
    │ Write input data      │                     │
    ├───────────────────────┼────────────────────►│
    │                       │                     │
    │ Read output data      │                     │
    │◄──────────────────────┼─────────────────────┤

Concerns:

Shared memory lifetime management
Buffer overflow if sizes mismatch
Use-after-free on premature unmap

3. Model Cache

The /var/folders/.../com.apple.aned/ cache:

World-readable in some configurations
Contains compiled ANE bytecode
Could leak model architecture details

What Works Without Entitlements

These operations succeed without code signing:

Framework loading: All three frameworks load via dlopen/ctypes
Struct initialization: All ANEC*Initialize() functions callable
Size probing: Can determine struct layouts by sentinel analysis
CPU inference: EspressoContext(platform=0) works
Model parsing: Read and parse .espresso.net files
Client creation: _ANEClient object creation succeeds
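The sentinel analysis mentioned in the list can be sketched as follows: fill a buffer with a known byte, call the initializer, and see how far the writes reach. `probe_struct_size` is a hypothetical helper written for this illustration, and `fake_initialize` stands in for a real `ANEC*Initialize()` function so the snippet runs on any platform. With the real framework the callable would be a ctypes function taking the buffer pointer, which is an assumption about the initializer signature.

```python
import ctypes


def probe_struct_size(initialize, max_size=512, sentinels=(0xAA, 0x55)):
    """Estimate how many leading bytes `initialize(buf)` writes.

    Two sentinel values are used so that a byte the initializer happens to
    set to one sentinel is still detected on the other pass.
    """
    extent = 0
    for sentinel in sentinels:
        buf = (ctypes.c_ubyte * max_size)()
        ctypes.memset(buf, sentinel, max_size)
        initialize(buf)
        touched = [i for i in range(max_size) if buf[i] != sentinel]
        if touched:
            extent = max(extent, touched[-1] + 1)
    return extent


def fake_initialize(buf):
    # Stand-in for a real initializer: zero the first 24 bytes.
    ctypes.memset(buf, 0, 24)


print(probe_struct_size(fake_initialize))
```

The result is a lower bound, because trailing padding that the initializer never writes is invisible to this method. `max_size` must also be comfortably larger than the real struct, or the initializer will write past the buffer.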

What Fails Without Entitlements

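These operations fail silently: no NSError is populated and the call simply returns a zero value (NO for the BOOL-returning _ANEClient methods, nil/NULL where an object is expected):

compileModel:options:qos:error: - returns NO, error left unset
loadModel:options:qos:error: - returns NO, error left unset
evaluateWithModel:options:request:qos:error: - returns NO, error left unset
_ANEDeviceController - cannot obtain a valid device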

Security note: Silent failures make debugging difficult but also prevent enumeration of error conditions.

Performance Analysis

Profiling APIs

Layer-Level Profiling

    @interface EspressoProfilingLayerInfo : NSObject
    @property (readonly) NSString *name;
    @property (readonly) NSString *debug_name;
    @property (readonly) double average_runtime;       // seconds
    @property (readonly) int selected_runtime_engine;  // 0=CPU, 1=GPU, 2=ANE
    @property (readonly) NSArray *runtimes;
    @end

Network-Level ANE Profiling

    @interface EspressoProfilingNetworkANEInfo : NSObject
    @property (readonly) uint64_t total_ane_time_ns;
    @property (readonly) uint64_t ane_time_per_eval_ns;
    @end

Request-Level Stats

    @interface _ANERequest : NSObject
    @property uint32_t perfStatsMask;  // Bitmask for which stats to collect
    @property (readonly) id perfStats;
    @property (readonly) NSArray *perfStatsArray;
    @end

Operation Mapping

Operations with Native ANE Support

These map 1:1 to ANE instructions:

Convolution (all variants)
Matrix multiplication
Scaled Dot-Product Attention (SDPA)
Softmax
Common activations (ReLU, GeLU, Tanh)
Pooling operations
Element-wise arithmetic

Operations That Get Decomposed

These are broken into multiple ANE ops:

LayerNorm → multiple passes
Complex reductions
Non-standard activations
Dynamic shapes

Fallback to CPU/GPU

Operations fall back when:

Tensor too large for ANE SRAM
Unsupported operation type
Dynamic control flow
Precision requirements exceed INT8/FP16

Example Runthrough

Building a CNN Graph

    from ane import SimpleANEGraph

    # Create graph builder
    graph = SimpleANEGraph()

    # Input: (batch=1, channels=3, height=224, width=224)
    graph.add_conv2d("conv1", (1, 3, 224, 224), out_channels=64,
                     kernel_size=7, stride=2, padding=3)
    # Output: (1, 64, 112, 112)

    graph.add_pool2d("pool1", (1, 64, 112, 112), kernel_size=3, stride=2)
    # Output: (1, 64, 56, 56)

    graph.add_conv2d("conv2", (1, 64, 56, 56), out_channels=128,
                     kernel_size=3, padding=1)
    # Output: (1, 128, 56, 56)

    graph.add_conv2d("conv3", (1, 128, 56, 56), out_channels=256,
                     kernel_size=3, padding=1)
    # Output: (1, 256, 56, 56)

    graph.add_pool2d("pool2", (1, 256, 56, 56), kernel_size=2, stride=2)
    # Output: (1, 256, 28, 28)

    graph.add_linear("fc1", input_features=256*28*28, output_features=1024)
    graph.add_linear("fc2", input_features=1024, output_features=1000)
    graph.add_softmax("softmax", (1, 1000))

    print(graph.summary())

Output:

    ANE Computation Graph
    ============================================================
    conv1 (conv2d)
      Input:  (1, 3, 224, 224)
      Output: (1, 64, 112, 112)
      Desc:   176 bytes
      Kernel: 7x7  Stride: 2x2  Pad: 3,3
    pool1 (pool2d)
      Input:  (1, 64, 112, 112)
      Output: (1, 64, 56, 56)
      Desc:   96 bytes
      Kernel: 3x3  Stride: 2x2
    ...
    ============================================================
    Total layers: 8
    Total descriptor bytes: 896

Building Transformer Attention

    from ane import build_transformer_attention

    graph = build_transformer_attention()
    print(graph.summary())

Output:

    ANE Computation Graph
    ============================================================
    proj_qkv (linear)
      Input:  (512, 512, 1, 1)
      Output: (512, 1536, 1, 1)
      Desc:   64 bytes
    attention (sdpa)
      Input:  (1, 8, 512, 64)
      Output: (1, 8, 512, 64)
      Desc:   8 bytes  <-- Native transformer attention!
    proj_out (linear)
      Input:  (512, 512, 1, 1)
      Output: (512, 512, 1, 1)
      Desc:   64 bytes
    ============================================================
    Total layers: 3
    Total descriptor bytes: 136
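The "Total descriptor bytes" figure in the attention summary is just the sum of the probed descriptor sizes for each layer kind. The sketch below reproduces it. The `DESC_SIZES` table and `total_descriptor_bytes` function are illustrative names, not the internals of `SimpleANEGraph`.

```python
# Probed descriptor sizes (bytes) for the layer kinds used in the example.
DESC_SIZES = {
    "linear": 64,  # ANECLinearLayerDesc
    "sdpa": 8,     # ANECSDPALayerDesc
}


def total_descriptor_bytes(layers):
    """Sum descriptor sizes for a list of (name, kind) pairs."""
    return sum(DESC_SIZES[kind] for _name, kind in layers)


attention = [
    ("proj_qkv", "linear"),
    ("attention", "sdpa"),
    ("proj_out", "linear"),
]
print(total_descriptor_bytes(attention))  # 64 + 8 + 64 = 136
```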

Loading Espresso Models

    from ane import (
        create_espresso_cpu_context,
        load_espresso_network,
        get_network_layer_count,
        EspressoNet,
    )

    # Method 1: Direct runtime loading (CPU only without entitlements)
    ctx = create_espresso_cpu_context()
    print(f"Context: {hex(ctx)}")

    model_path = "/path/to/model.espresso.net"
    net = load_espresso_network(model_path, ctx)
    print(f"Network: {hex(net)}")
    print(f"Layers: {get_network_layer_count(net)}")

    # Method 2: Parse the file directly
    model = EspressoNet.from_file(model_path)
    print(f"Format version: {model.format_version}")
    print(f"Layer types: {model.layer_type_counts()}")

    # Analyze inner_product layers for quantization
    for ip in model.get_inner_product_info():
        print(f"  {ip['name']}: {ip['nB']}x{ip['nC']}, "
              f"quant={ip['quantization_mode']}, lookup={ip['is_lookup']}")

Decoding PBZE Files

    from ane import decode_espresso_net, get_pbze_stats, is_pbze_file

    path = "/System/Library/SomeFramework/model.espresso.net"

    # Check format
    if is_pbze_file(path):
        stats = get_pbze_stats(path)
        print(f"Compressed size: {stats['compressed_size']} bytes")
        print(f"Uncompressed size: {stats['uncompressed_size']} bytes")
        print(f"Compression ratio: {stats['compression_ratio']:.2f}x")

    # Decode (handles both JSON and PBZE automatically)
    data = decode_espresso_net(path)
    print(f"Layers: {len(data['layers'])}")

Using the Native Helper

For full ANE access, use the signed Objective-C helper:

    # Build and sign
    cd helper
    ./build.sh "Developer ID Application: Your Name (TEAMID)"

    # Check status
    echo '{"cmd": "status"}' | ./ane_helper
    # {"ok":true,"client":true,"model_count":0,"model_ids":[]}

    # Compile a model
    echo '{"cmd": "compile", "model_path": "/path/to/model.mlmodelc"}' | ./ane_helper
    # {"ok":true,"model_id":"ABC123","state":1}

    # Load into ANE memory
    echo '{"cmd": "load", "model_id": "ABC123"}' | ./ane_helper
    # {"ok":true,"model_id":"ABC123","program_handle":12345}

    # Unload
    echo '{"cmd": "unload", "model_id": "ABC123"}' | ./ane_helper
    # {"ok":true}
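The same JSON-over-stdin exchange can be driven from Python. This is a sketch, not code from the repository: `build_command`, `parse_reply`, and `run_helper` are hypothetical names. It assumes the helper prints its reply as the last line of stdout and reports failures with `"ok": false`; only the success replies are shown in the shell session above.

```python
import json
import subprocess


def build_command(cmd, **fields):
    """Serialize one helper command, e.g. {"cmd": "load", "model_id": ...}."""
    return json.dumps({"cmd": cmd, **fields})


def parse_reply(line):
    """Decode one reply line and raise if the helper reported failure."""
    reply = json.loads(line)
    if not reply.get("ok"):  # assumption: failures carry "ok": false
        raise RuntimeError(f"helper reported failure: {reply}")
    return reply


def run_helper(cmd, helper_path="./ane_helper", **fields):
    """Send one command over stdin and return the parsed reply."""
    proc = subprocess.run(
        [helper_path],
        input=build_command(cmd, **fields) + "\n",
        capture_output=True, text=True, check=True,
    )
    # Assumption: the reply is the last line the helper writes to stdout.
    return parse_reply(proc.stdout.strip().splitlines()[-1])


# Offline demonstration using a reply copied from the shell session above.
print(build_command("load", model_id="ABC123"))
print(parse_reply('{"ok":true,"model_id":"ABC123","state":1}')["state"])
```

With a built and signed helper in place, `run_helper("compile", model_path="/path/to/model.mlmodelc")` would correspond to the compile step.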

Comprehensive Reference

Complete Layer Type Reference

Espresso Layer Types (from system model analysis)

Type	Category	Attributes
`activation`	Compute	`type` (relu/gelu/tanh/sigmoid/etc), `alpha`, `beta`
`batch_matmul`	Compute	`transpose_a`, `transpose_b`, `adj_x`, `adj_y`
`concat`	Shape	`axis`
`convolution`	Compute	`kernel_size`, `stride`, `pad`, `C`, `groups`, `dilation`
`dynamic_dequantize`	Quantization	`scale_blob`, `zero_point_blob`
`dynamic_quantize`	Quantization	`axis`, `mode`
`elementwise`	Compute	`operation`, `alpha`, `broadcast`
`expand_dims`	Shape	`axes`
`general_concat`	Shape	`axis`, `interleave`
`general_slice`	Shape	`starts`, `ends`, `strides`, `axes`
`get_shape`	Utility	(no special attributes)
`inner_product`	Compute	`nB`, `nC`, `has_biases`, `quantization_mode`, `is_lookup`
`instancenorm_1d`	Normalization	`C`, `epsilon`
`load_constant`	Memory	`blob_weights`, `shape`
`nonzero`	Utility	(no special attributes)
`reduce`	Compute	`mode` (sum/mean/max/min/prod), `axes`, `keepdims`
`reshape`	Shape	`shape`
`scatter_nd`	Memory	(no special attributes)
`softmax`	Compute	`axis`
`split_nd`	Shape	`axis`, `num_splits`, `split_sizes`
`tile`	Shape	`reps`
`transpose`	Shape	`axes`

ANE Compiler Struct Sizes

Struct	Size (bytes)	Initialize Function
ANECAffineTransformLayerDesc	48	ANECAffineTransformLayerDescInitialize
ANECBatchNormLayerDesc	40	ANECBatchNormLayerDescInitialize
ANECConcatLayerDesc	16	ANECConcatLayerDescInitialize
ANECConvLayerDesc	176	ANECConvLayerDescInitialize
ANECCropResizeLayerDesc	64	ANECCropResizeLayerDescInitialize
ANECCrossCorrelationLayerDesc	96	ANECCrossCorrelationLayerDescInitialize
ANECDropoutLayerDesc	16	ANECDropoutLayerDescInitialize
ANECExpandLayerDesc	32	ANECExpandLayerDescInitialize
ANECFillLayerDesc	24	ANECFillLayerDescInitialize
ANECFlattenLayerDesc	16	ANECFlattenLayerDescInitialize
ANECGatherLayerDesc	24	ANECGatherLayerDescInitialize
ANECGatherNDLayerDesc	24	ANECGatherNDLayerDescInitialize
ANECGridSampleLayerDesc	32	ANECGridSampleLayerDescInitialize
ANECGroupNormLayerDesc	40	ANECGroupNormLayerDescInitialize
ANECInputViewLayerDesc	32	ANECInputViewLayerDescInitialize
ANECKernelSize	24	ANECKernelSizeInitialize
ANECLRNLayerDesc	32	ANECLRNLayerDescInitialize
ANECLayerNormLayerDesc	40	ANECLayerNormLayerDescInitialize
ANECLinearLayerDesc	64	ANECLinearLayerDescInitialize
ANECMatrixMultLayerDesc	16	ANECMatrixMultLayerDescInitialize
ANECNMSLayerDesc	48	ANECNMSLayerDescInitialize
ANECNeuronLayerDesc	32	ANECNeuronLayerDescInitialize
ANECNormLayerDesc	40	ANECNormLayerDescInitialize
ANECPadLayerDesc	48	ANECPadLayerDescInitialize
ANECPadding	24	ANECPaddingInitialize
ANECPoolLayerDesc	96	ANECPoolLayerDescInitialize
ANECRandomLayerDesc	32	ANECRandomLayerDescInitialize
ANECReductionLayerDesc	24	ANECReductionLayerDescInitialize
ANECResampleLayerDesc	48	ANECResampleLayerDescInitialize
ANECReshapeLayerDesc	48	ANECReshapeLayerDescInitialize
ANECResizeLayerDesc	40	ANECResizeLayerDescInitialize
ANECRingBufferLayerDesc	32	ANECRingBufferLayerDescInitialize
ANECSDPALayerDesc	8	ANECSDPALayerDescInitialize
ANECScatterLayerDesc	24	ANECScatterLayerDescInitialize
ANECScatterNDLayerDesc	24	ANECScatterNDLayerDescInitialize
ANECShapeLayerDesc	16	ANECShapeLayerDescInitialize
ANECSoftmaxLayerDesc	48	ANECSoftmaxLayerDescInitialize
ANECSortLayerDesc	24	ANECSortLayerDescInitialize
ANECSplitLayerDesc	24	ANECSplitLayerDescInitialize
ANECSqueezeLayerDesc	32	ANECSqueezeLayerDescInitialize
ANECStep	12	ANECStepInitialize
ANECTensorDesc	64	ANECTensorDescInitialize
ANECTensorDims	40	ANECTensorDimsInitialize
ANECTileLayerDesc	32	ANECTileLayerDescInitialize
ANECTopKLayerDesc	24	ANECTopKLayerDescInitialize
ANECTransposeLayerDesc	32	ANECTransposeLayerDescInitialize
ANECUnflattenLayerDesc	24	ANECUnflattenLayerDescInitialize

Espresso Optimization Passes

All discovered Pass_* classes in Espresso.framework:

    Pass_add_fp16_fp32_conversions
    Pass_batch_matmul_transpose_fusion
    Pass_broadcast_optimization
    Pass_canonicalize_ops
    Pass_constant_folding
    Pass_convert_gather_to_slice
    Pass_convert_to_ane_layout
    Pass_dead_code_elimination
    Pass_decompose_complex_ops
    Pass_eliminate_identity_ops
    Pass_eliminate_redundant_transpose
    Pass_fold_constants
    Pass_fuse_activation
    Pass_fuse_add_mul
    Pass_fuse_bias
    Pass_fuse_conv_batchnorm
    Pass_fuse_conv_bias
    Pass_fuse_elementwise
    Pass_fuse_gelu
    Pass_fuse_layernorm
    Pass_fuse_linear_ops
    Pass_fuse_matmul_add
    Pass_fuse_mul_add
    Pass_fuse_pad_conv
    Pass_fuse_reshape_transpose
    Pass_insert_copies_for_ane
    Pass_legalize_for_ane
    Pass_lower_to_ane_ops
    Pass_optimize_memory_layout
    Pass_optimize_reshape_chain
    Pass_optimize_transpose
    Pass_propagate_shapes
    Pass_quantize_weights
    Pass_remove_unused_outputs
    Pass_replace_div_with_mul
    Pass_simplify_arithmetic
    Pass_split_large_tensors
    Pass_tensor_parallel_partition
    Pass_tile_for_ane
    Pass_vectorize_ops

ObjC Class Methods Reference

_ANEClient

    // Lifecycle
    - (instancetype)initWithRestrictedAccessAllowed:(BOOL)allowed;

    // Compilation
    - (BOOL)compileModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
    - (BOOL)compiledModelExistsFor:(id)model;
    - (BOOL)compiledModelExistsMatchingHash:(NSData*)hash;
    - (BOOL)purgeCompiledModel:(id)model;

    // Loading
    - (BOOL)loadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
    - (BOOL)loadModelNewInstance:(id)model options:(id)opts modelInstParams:(id)params qos:(int)qos error:(NSError**)err;
    - (BOOL)loadRealTimeModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
    - (BOOL)unloadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;

    // Evaluation
    - (BOOL)evaluateWithModel:(id)model options:(id)opts request:(id)req qos:(int)qos error:(NSError**)err;
    - (BOOL)evaluateRealTimeWithModel:(id)model options:(id)opts request:(id)req error:(NSError**)err;

    // Memory
    - (BOOL)mapIOSurfacesWithModel:(id)model request:(id)req cacheInference:(BOOL)cache error:(NSError**)err;
    - (void)unmapIOSurfacesWithModel:(id)model request:(id)req;

    // Chaining
    - (BOOL)prepareChainingWithModel:(id)model options:(id)opts chainingReq:(id)req qos:(int)qos error:(NSError**)err;

_ANEModel

    // Initialization
    - (instancetype)initWithModelAtURL:(NSURL*)url
                                   key:(NSString*)key
                      identifierSource:(int)src
                    cacheURLIdentifier:(NSString*)cacheId
                       modelAttributes:(id)attrs
                        standardizeURL:(BOOL)standardize;
    - (instancetype)initWithModelIdentifier:(id)identifier;

    // Properties
    @property (readonly) NSURL *modelURL;
    @property (readonly) NSURL *sourceURL;
    @property (readonly) NSString *UUID;
    @property (readonly) NSString *key;
    @property (readonly) int state;  // 1 = created/unloaded
    @property (readonly) uint64_t programHandle;
    @property (readonly) uint64_t intermediateBufferHandle;
    @property (readonly) int queueDepth;
    @property (readonly) uint32_t perfStatsMask;
    @property (readonly) id mpsConstants;

_ANERequest

    // Initialization
    - (instancetype)initWithInputs:(NSArray*)inputs
                      inputIndices:(NSArray*)inputIndices
                           outputs:(NSArray*)outputs
                     outputIndices:(NSArray*)outputIndices
                     weightsBuffer:(id)weights
                         perfStats:(id)stats
                    procedureIndex:(int)procIdx
                      sharedEvents:(id)events
                 transactionHandle:(uint64_t)handle;

    // Properties
    @property (readonly) NSArray *inputArray;
    @property (readonly) NSArray *inputIndexArray;
    @property (readonly) NSArray *outputArray;
    @property (readonly) NSArray *outputIndexArray;
    @property (readonly) id weightsBuffer;
    @property (readonly) int procedureIndex;
    @property (readonly) id perfStats;
    @property (readonly) NSArray *perfStatsArray;
    @property (copy) void (^completionHandler)(BOOL, NSError*);
    @property (readonly) id sharedEvents;
    @property (readonly) uint64_t transactionHandle;

Running Tests

    # Run all tests
    pytest tests/test_ane.py -v

    # Run specific test class
    pytest tests/test_ane.py::TestANECompiler -v

    # Run with coverage
    pytest tests/test_ane.py --cov=ane --cov-report=term-missing

Test categories:

TestANEStructs - Data structure serialization
TestANECompiler - Framework loading and initialization
TestANEHelpers - Utility functions
TestANESample - Graph building
TestANELayerSizes - Probed struct sizes
TestEspressoDiscovery - ObjC class introspection
TestEspressoFormat - Model file parsing
TestPBZE - Compression/decompression
TestANEXPC - XPC protocol discovery
TestAPITree - Knowledge base API tree

HWX File Format & Execution Research

Working Path: CoreML API

The simplest way to execute models on ANE is through CoreML's public API:

    // Objective-C
    NSError *error = nil;
    MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
    config.computeUnits = MLComputeUnitsAll;  // Enables ANE
    MLModel *model = [MLModel modelWithContentsOfURL:modelURL
                                       configuration:config
                                               error:&error];

    # Python with coremltools
    import coremltools as ct

    model = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)

HWX Binary Format

Pre-compiled ANE binaries (.hwx files) have a Mach-O-like structure:

Offset	Value	Description
0x00	`0xBEEFFACE`	Magic number
0x04	varies	Header info
...	`__PAGEZERO`	Zero page segment
...	`__DATA`	Data segment
...	`__FVMLIB`	ANE instructions

Key insight: HWX files cannot be loaded alone - they require a companion .espresso.net file that describes the network structure.
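A quick way to recognise candidate `.hwx` files is to test the first four bytes against the magic number. The table above does not say how the magic is laid out on disk, so this sketch accepts either byte order. `looks_like_hwx` is a hypothetical helper and the sample buffers are synthetic.

```python
import struct

HWX_MAGIC = 0xBEEFFACE


def looks_like_hwx(blob):
    """Return True if the first 4 bytes match the HWX magic in either byte order."""
    if len(blob) < 4:
        return False
    (little,) = struct.unpack_from("<I", blob, 0)
    (big,) = struct.unpack_from(">I", blob, 0)
    return HWX_MAGIC in (little, big)


# Synthetic buffers for demonstration only.
sample = struct.pack("<I", HWX_MAGIC) + bytes(12)
print(looks_like_hwx(sample), looks_like_hwx(b"pbze" + bytes(12)))
```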

Espresso Model Bundle Structure

A complete Espresso model bundle contains:

File	Description
`model.espresso.net`	Network description (JSON or PBZE)
`model.espresso.weights`	Binary weights data
`model.espresso.shape`	Shape information
`model.H14.espresso.hwx`	Pre-compiled ANE binary (chip-specific)
`model.H14.espresso.precompilation_info`	Compiler metadata (JSON)

Different .hwx files exist for different ANE generations:

.H13.espresso.hwx - A14/M1 generation
.H14.espresso.hwx - A15/M2 generation
.H15.espresso.hwx - A16/M3 generation
.H16.espresso.hwx - A17/M4 generation
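Because the generation tag is embedded in the file name, a bundle can be scanned for the variant matching a given chip. The sketch below parses names of the form `<stem>.H<NN>.espresso.hwx` and maps the tag using the list above. The regular expression and `hwx_generation` helper are illustrative, and tags outside the list are reported as unknown rather than guessed.

```python
import re

HWX_NAME = re.compile(r"^(?P<stem>.+)\.(?P<gen>H\d+)\.espresso\.hwx$")

# Generation tags as listed above.
GENERATIONS = {
    "H13": "A14/M1",
    "H14": "A15/M2",
    "H15": "A16/M3",
    "H16": "A17/M4",
}


def hwx_generation(filename):
    """Return (tag, chip family) for an .hwx file name, or None if it does not match."""
    match = HWX_NAME.match(filename)
    if match is None:
        return None
    tag = match.group("gen")
    return tag, GENERATIONS.get(tag, "unknown")


for name in ("model.H14.espresso.hwx", "model.espresso.net"):
    print(name, "->", hwx_generation(name))
```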

API Layer Summary

Layer	Status	Notes
CoreML	✅ Working	Use `MLComputeUnitsAll`, system handles everything
XPC to aned	✅ Working	`_ANEClient.sharedConnection` works
ANEServices	⚠️ Limited	Model loading needs `.espresso.net`
Espresso	⚠️ Limited	Platform 2 (ANE) context crashes
IOKit Direct	❌ Blocked	Requires `com.apple.ane.iokit-user-access`

MLComputePlan Device Masks

When inspecting MLComputePlan.computeDevicesBySupportedComputeUnits:

Mask	Devices
1	CPU only
2	GPU only
3	CPU + GPU
4	Neural Engine only
5	CPU + Neural Engine
6	GPU + Neural Engine
7	CPU + GPU + Neural Engine (all)
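The mask values in the table are a plain three-bit field (CPU = 1, GPU = 2, Neural Engine = 4), so they decode naturally with an `IntFlag`. The class and function names below are illustrative and are not Core ML identifiers.

```python
from enum import IntFlag


class ComputeDeviceMask(IntFlag):
    CPU = 1
    GPU = 2
    NEURAL_ENGINE = 4


def devices_in_mask(mask):
    """List the device names whose bit is set in `mask`."""
    return [device.name for device in ComputeDeviceMask if device.value & mask]


for mask in (4, 5, 7):
    print(mask, devices_in_mask(mask))
```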

Hardware Detection

    // Get ANE device info
    Class deviceClass = NSClassFromString(@"MLNeuralEngineComputeDevice");
    id device = [deviceClass performSelector:@selector(physicalDevice)];
    NSInteger cores = [[device valueForKey:@"totalCoreCount"] integerValue];
    // Returns 16 on M3 Pro

License

This project contains reverse engineering artifacts for research and interoperability purposes. Use responsibly.

Acknowledgments

Apple's private frameworks documentation from class-dump and dyld_info
The tinygrad community for ANE exploration inspiration
