Reverse engineering artifacts for Apple's Neural Engine stack: ANECompiler, Espresso, and AppleNeuralEngine frameworks.
Target Audience: Performance engineers and security researchers working with Apple silicon ML acceleration.
- Key Findings
- Architecture Overview
- Repository Structure
- Espresso Engine Teardown
- Compiler Engine Teardown
- ANE Runtime Details
- Security Analysis
- Performance Analysis
- Example Runthrough
- Comprehensive Reference
| Discovery | Details | Significance |
|---|---|---|
| SDPA Layer | ANECSDPALayerDesc is only 8 bytes | Native transformer attention in ANE hardware |
| 40+ Optimization Passes | Pass_fuse_conv_batchnorm, Pass_fold_constants, etc. | Full Espresso compiler pipeline discoverable |
| XPC Daemon Architecture | aned at /usr/libexec/aned | Privilege boundary for ANE access |
| Entitlement Bypass | Struct init functions work without signing | Can probe all layer descriptor layouts |
| PBZE Format | LZFSE-compressed espresso.net | System models decodable with libcompression |
| Silent Failures | compileModel: returns NULL without error | Operations fail silently without entitlements |
| IOSurface Memory | EspressoANEIOSurface (21 methods) | Zero-copy tensor sharing with Metal |
| Quantization Modes | quantization_mode:2 on inner_product | ANE-specific quantization discovered |
| CoreML ANE Path | MLComputeUnitsAll enables ANE | Working path for ANE execution via public API |
| HWX Binary Format | Magic 0xBEEFFACE, Mach-O-like | Pre-compiled ANE instructions per chip generation |
| 16 ANE Cores | M3 Pro has 16 neural engine cores | Confirmed via MLNeuralEngineComputeDevice |
| Operation | Works? | Notes |
|---|---|---|
| Load ANECompiler.framework | Yes | All frameworks load |
| Call ANEC*Initialize() | Yes | Can probe struct sizes |
| Create EspressoContext (CPU) | Yes | Platform 0 works |
| Load EspressoNetwork | Yes | CPU inference works |
| Create _ANEClient | Yes | Object created but... |
| Call compileModel: | No | Returns NULL silently |
| Call loadModel: | No | Returns NULL silently |
| ANE inference | No | Requires entitlements |
| CoreML with ANE | Yes | Use MLComputeUnitsAll - working path! |
| XPC to aned | Yes | Connection succeeds, ops need entitlements |
| MLComputePlan | Yes | Can inspect device availability |
```
┌─────────────────────────────────────────────────────────────────┐
│                        User Application                         │
│                   (Core ML, Create ML, BNNS)                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────┐   ┌─────────────────────────────┐  │
│  │   Espresso.framework    │   │ AppleNeuralEngine.framework │  │
│  │   ─────────────────     │   │ ─────────────────────────── │  │
│  │ • EspressoContext       │   │ • _ANEClient                │  │
│  │ • EspressoNetwork       │   │ • _ANEModel                 │  │
│  │ • 40+ Pass_* classes    │   │ • _ANERequest               │  │
│  │ • CPU/GPU/ANE dispatch  │   │ • _ANEDaemonConnection      │  │
│  └───────────┬─────────────┘   └──────────────┬──────────────┘  │
│              │                                │                 │
├──────────────┴────────────────────────────────┴─────────────────┤
│                      ANECompiler.framework                      │
│                      ─────────────────────                      │
│ • ANECConvLayerDesc (176 bytes)  • ANECSDPALayerDesc (8 bytes)  │
│ • ANECPoolLayerDesc (96 bytes)   • ANECLinearLayerDesc (64 B)   │
│ • ANECTensorDims (40 bytes)      • 30+ layer descriptors        │
├─────────────────────────────────────────────────────────────────┤
│                       XPC Transport Layer                       │
│              Service: com.apple.appleneuralengine               │
├─────────────────────────────────────────────────────────────────┤
│                    aned (/usr/libexec/aned)                     │
│                    ────────────────────────                     │
│ • ANEProgramCreate()             • Model cache management       │
│ • ANEProgramInstanceCreate()     • Garbage collection           │
│ • Sandbox extension handling     • Telemetry                    │
├─────────────────────────────────────────────────────────────────┤
│                    ANE Hardware (M1/M2/M3+)                     │
│                    ────────────────────────                     │
│ • 16 neural engine cores         • Dedicated SRAM               │
│ • Up to 15.8 TOPS (M2)           • IOSurface DMA                │
└─────────────────────────────────────────────────────────────────┘
```

```
.mlmodelc/                     aned daemon                 Hardware
────────────                   ───────────                 ────────
    │                               │                          │
    │  1. _ANEModel create          │                          │
    ├──────────────────────────────►│                          │
    │                               │                          │
    │  2. compileModel: (XPC)       │                          │
    ├──────────────────────────────►│                          │
    │                               │  3. ANECompiler          │
    │                               ├─────────────────────────►│
    │                               │                          │
    │                               │  4. ANEProgramCreate()   │
    │                               ├─────────────────────────►│
    │                               │                          │
    │  5. Return program handle     │                          │
    │◄──────────────────────────────┤                          │
    │                               │                          │
    │  6. loadModel: (XPC)          │                          │
    ├──────────────────────────────►│                          │
    │                               │  7. Map to ANE memory    │
    │                               ├─────────────────────────►│
    │                               │                          │
    │  8. evaluateWithModel:        │  9. Execute on ANE       │
    ├──────────────────────────────►├─────────────────────────►│
    │                               │                          │
```

```
ane/
├── __init__.py          # Package exports, builds API tree for tooling
├── compiler.py          # ANECompiler.framework ctypes bindings
│                        #   - Layer descriptor structs
│                        #   - ANEC*Initialize() wrappers
│                        #   - Struct size probing
│
├── espresso.py          # Espresso model format parser
│                        #   - EspressoNet, EspressoLayer classes
│                        #   - Layer type documentation
│                        #   - CPU vs ANE model comparison
│
├── runtime.py           # Espresso/ANE runtime bindings
│                        #   - EspressoContext creation
│                        #   - EspressoNetwork loading
│                        #   - ObjC class introspection
│
├── xpc.py               # ANE XPC protocol documentation
│                        #   - _ANEDaemonConnection methods
│                        #   - _ANEClient methods
│                        #   - XPC operation categories
│
├── pbze.py              # PBZE (compressed espresso.net) decoder
│                        #   - LZFSE decompression via libcompression
│                        #   - Header parsing
│                        #   - Compression statistics
│
├── sample.py            # Example graph building code
│                        #   - SimpleANEGraph class
│                        #   - CNN and Transformer examples
│
├── tests/
│   └── test_ane.py      # Comprehensive pytest suite (623 lines)
│
└── helper/
    ├── ane_helper.m     # Objective-C helper for privileged ANE access
    ├── ane_helper.entitlements
    └── build.sh         # Build script
```

Espresso is Apple's internal ML inference runtime that powers Core ML. It handles model execution across CPU, GPU, and ANE.
Two formats exist:
- JSON (human-readable):

```json
{
  "format_version": 200,
  "storage": "model.espresso.weights",
  "layers": [
    {
      "name": "conv1",
      "type": "convolution",
      "bottom": "input",
      "top": "conv1_output",
      "kernel_size": 3,
      "stride": 1,
      "pad": 1,
      "C": 64
    }
  ],
  "analyses": {},
  "properties": {}
}
```

- PBZE (binary, LZFSE-compressed):

```
Offset  Size  Description
──────  ────  ───────────
0x00    4     Magic: b'pbze'
0x04    4     Version (usually 0)
0x08    8     Unknown (header size?)
0x10    4     Uncompressed size (BIG ENDIAN!)
0x14    4     Unknown
0x18    4     Padding
0x1C    ...   LZFSE data (starts with b'bvx2')
```

| Type | Description | Key Attributes |
|---|---|---|
| inner_product | Dense/fully-connected | nB, nC, quantization_mode, is_lookup, has_biases |
| convolution | 2D convolution | kernel_size, stride, pad, C, groups |
| batch_matmul | Batched matrix multiply | transpose_a, transpose_b |
| elementwise | Binary/unary operations | operation (see operation codes below) |
| activation | Nonlinearities | type (relu, gelu, tanh, sigmoid, etc.) |
| softmax | Softmax normalization | axis |
| reduce | Reduction operations | mode (sum, mean, max, min, prod) |
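The PBZE header layout listed earlier in this section can be read with nothing more than `struct`. The sketch below is illustrative and is not the repository's `pbze.py`: the names `PBZEHeader` and `parse_pbze_header` are hypothetical, the version field is kept as raw bytes because its byte order is not documented, and the LZFSE payload is only located, not decompressed.

```python
import struct
from typing import NamedTuple

PBZE_MAGIC = b"pbze"          # bytes 0x00-0x03
LZFSE_BLOCK_MAGIC = b"bvx2"   # expected first bytes of the payload
PAYLOAD_OFFSET = 0x1C         # payload starts right after the 28-byte header


class PBZEHeader(NamedTuple):
    version_raw: bytes        # 4 raw bytes at 0x04 (byte order not documented)
    uncompressed_size: int    # big-endian u32 at 0x10
    payload_offset: int       # where the LZFSE stream begins
    payload_is_lzfse: bool    # True if the payload starts with b'bvx2'


def parse_pbze_header(blob: bytes) -> PBZEHeader:
    """Parse the fixed-size PBZE header; raise ValueError if it does not match."""
    if len(blob) < PAYLOAD_OFFSET:
        raise ValueError("buffer is shorter than the 0x1C-byte PBZE header")
    if blob[:4] != PBZE_MAGIC:
        raise ValueError("missing b'pbze' magic")
    # The uncompressed size is the one field documented as big-endian.
    (uncompressed_size,) = struct.unpack_from(">I", blob, 0x10)
    payload = blob[PAYLOAD_OFFSET:PAYLOAD_OFFSET + 4]
    return PBZEHeader(
        version_raw=blob[4:8],
        uncompressed_size=uncompressed_size,
        payload_offset=PAYLOAD_OFFSET,
        payload_is_lzfse=(payload == LZFSE_BLOCK_MAGIC),
    )


if __name__ == "__main__":
    # Synthetic header for demonstration only; not taken from a real model.
    demo = (b"pbze" + bytes(4) + bytes(8) + struct.pack(">I", 1234)
            + bytes(4) + bytes(4) + b"bvx2" + b"...")
    print(parse_pbze_header(demo))
```

The bytes from `payload_offset` onward would then be handed to an LZFSE decoder such as the one in libcompression, with `uncompressed_size` as the output buffer size.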
| Type | Description | Key Attributes |
|---|---|---|
| reshape | Tensor reshape | shape |
| transpose | Permute dimensions | axes |
| concat | Concatenate tensors | axis |
| general_concat | N-D concatenation | axis, flexible inputs |
| split_nd | Split along axis | axis, num_splits or split_sizes |
| general_slice | Slice tensor | starts, ends, strides |
| expand_dims | Add dimension | axes |
| load_constant | Load constant tensor | blob_weights |
| Type | Description | Notes |
|---|---|---|
| dynamic_quantize | Runtime quantization | Converts FP to INT8 |
| dynamic_dequantize | Runtime dequantization | Converts INT8 to FP |
| Type | Description |
|---|---|
| instancenorm_1d | Instance normalization |
| get_shape | Returns tensor shape |
| nonzero | Find nonzero indices |
| scatter_nd | Scatter operation |
| tile | Tile/repeat tensor |
```
Code  Operation       Code  Operation
────  ─────────       ────  ─────────
0     add             25    pow
1     sub             26    exp
2     mul             27    log
3     div             28    abs
4     floor_div       101   select (ternary: a ? b : c)
10    max             105   less_than
11    min             106   less_equal
                      107   not_equal
20    sqrt            108   equal
21    rsqrt           109   greater_equal
22    square          110   greater_than
23    neg
24    reciprocal      117   floor
                      118   ceil
```

When a model is compiled for ANE, several transformations occur:
| Aspect | CPU Model | ANE Model |
|---|---|---|
| Layer count | Fewer | More (ops decomposed) |
| Reshape ops | reshape layer | Often replaced with convolution |
| Embeddings | inner_product | inner_product with is_lookup:1 |
| FC layers | inner_product | inner_product with quantization_mode:2 |
| Tensor manipulation | Single ops | split_nd/concat chains |
Example: A model with 50 CPU layers might have 80+ ANE layers due to operation decomposition.
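The differences in the table can be checked mechanically once both variants of a model have been decoded to the JSON form. The following is a minimal sketch, assuming each network is a plain `dict` with a `layers` list whose entries carry `name`, `type`, and (for `inner_product`) the optional `quantization_mode` and `is_lookup` keys shown earlier; `summarize_ane_rewrites` is a hypothetical helper, not part of the `ane` package.

```python
from collections import Counter
from typing import Any, Dict


def layer_type_counts(net: Dict[str, Any]) -> Counter:
    """Count layers per 'type' in a decoded espresso.net dictionary."""
    return Counter(layer.get("type", "unknown") for layer in net.get("layers", []))


def summarize_ane_rewrites(cpu_net: Dict[str, Any], ane_net: Dict[str, Any]) -> Dict[str, Any]:
    """Report how the ANE variant differs from the CPU variant of one model."""
    cpu_counts = layer_type_counts(cpu_net)
    ane_counts = layer_type_counts(ane_net)
    all_types = sorted(set(cpu_counts) | set(ane_counts))
    inner_products = [
        layer for layer in ane_net.get("layers", [])
        if layer.get("type") == "inner_product"
    ]
    return {
        "cpu_layers": sum(cpu_counts.values()),
        "ane_layers": sum(ane_counts.values()),
        # Positive delta: the ANE model has more layers of this type.
        "type_delta": {
            t: ane_counts[t] - cpu_counts[t]
            for t in all_types
            if ane_counts[t] != cpu_counts[t]
        },
        "quantized_inner_products": [
            l.get("name") for l in inner_products if l.get("quantization_mode") == 2
        ],
        "lookup_inner_products": [
            l.get("name") for l in inner_products if l.get("is_lookup") == 1
        ],
    }


if __name__ == "__main__":
    # Tiny hand-written networks for illustration; not real system models.
    cpu = {"layers": [{"name": "embed", "type": "inner_product"},
                      {"name": "r0", "type": "reshape"}]}
    ane = {"layers": [{"name": "embed", "type": "inner_product", "is_lookup": 1},
                      {"name": "fc", "type": "inner_product", "quantization_mode": 2},
                      {"name": "r0", "type": "convolution"}]}
    print(summarize_ane_rewrites(cpu, ane))
```

A negative `reshape` delta paired with a positive `convolution` delta is the pattern the table describes as reshape ops being replaced with convolutions.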
Espresso includes extensive optimization passes accessible via EspressoCustomPass subclasses:
```
Pass_fuse_conv_batchnorm      # Fuse BN into conv weights
Pass_fold_constants           # Constant folding
Pass_eliminate_dead_code      # DCE
Pass_fuse_activation          # Fuse relu/gelu into preceding op
Pass_optimize_transpose       # Eliminate redundant transposes
Pass_convert_to_ane_layout    # Convert to ANE memory layout
Pass_quantize_weights         # Weight quantization
Pass_split_large_tensors      # Split tensors for ANE tile size
... (and 30+ more)
```

ANECompiler.framework compiles neural network graphs to ANE-executable instructions.
All sizes determined by calling ANEC*Initialize() with a sentinel-filled buffer:
| Struct | Size | Field Layout (inferred) |
|---|---|---|
| ANECKernelSize | 24 | 3x u64: depth, height, width |
| ANECStep | 12 | 3x u32: depth, height, width |
| ANECPadding | 24 | 6x u32: d_front, d_back, h_front, h_back, w_front, w_back |
| ANECTensorDims | 40 | 5x u64: N, C, H, W, D |
| ANECTensorDesc | 64 | ptr(8) + dims(48) + flags(8) |
| ANECConvLayerDesc | 176 | Kernel, stride, padding, dilation, groups, etc. |
| ANECPoolLayerDesc | 96 | Kernel, stride, pool type, etc. |
| ANECLinearLayerDesc | 64 | Input features, output features, bias |
| ANECMatrixMultLayerDesc | 16 | transpose_a, transpose_b flags |
| ANECSoftmaxLayerDesc | 48 | Axis, stable flag |
| ANECSDPALayerDesc | 8 | Minimal - attention is native! |
| ANECNeuronLayerDesc | 32 | Activation type, params |
| ANECReductionLayerDesc | 24 | Reduction mode, axes |
| ANECReshapeLayerDesc | 48 | Target shape |
| ANECTransposeLayerDesc | 32 | Permutation |
| ANECConcatLayerDesc | 16 | Axis |
| ANECGatherLayerDesc | 24 | Axis, batch_dims |
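The sentinel technique behind these sizes can be sketched independently of the framework. The example below is an assumption-laden illustration rather than the code in `compiler.py`: `probe_written_extent` is a hypothetical name, and the initializer is modeled as any callable that receives a ctypes buffer. Two different sentinel values are used so that a byte the initializer happens to set to one sentinel is still detected with the other.

```python
import ctypes
from typing import Callable, Sequence


def probe_written_extent(
    initialize: Callable[[ctypes.Array], None],
    capacity: int = 512,
    sentinels: Sequence[int] = (0xAA, 0x55),
) -> int:
    """Return the number of leading bytes an initializer touches.

    The buffer is filled with a sentinel byte, the initializer is called, and
    the highest offset that no longer holds the sentinel marks the extent.
    The result is a lower bound: trailing bytes the initializer never writes
    (for example tail padding) cannot be observed this way.
    """
    extent = 0
    for sentinel in sentinels:
        buf = (ctypes.c_ubyte * capacity)(*([sentinel] * capacity))
        initialize(buf)
        raw = bytes(buf)
        for offset in range(capacity - 1, -1, -1):
            if raw[offset] != sentinel:
                extent = max(extent, offset + 1)
                break
    return extent


def fake_tensor_dims_initialize(buf: ctypes.Array) -> None:
    """Stand-in for a real initializer: zeroes the first 40 bytes."""
    ctypes.memset(buf, 0, 40)


if __name__ == "__main__":
    print(probe_written_extent(fake_tensor_dims_initialize))
    # Against the real framework one would pass something like
    #   lambda buf: lib.ANECTensorDimsInitialize(buf)
    # where `lib` is a ctypes.CDLL handle. The single-pointer signature is an
    # assumption here and should be confirmed before calling into the library.
```

The buffer capacity should be comfortably larger than the biggest expected struct, since an initializer that writes past the end of an undersized buffer would corrupt memory rather than report its size.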
```
Category               Layer Types
────────               ───────────
Attention/Transformer  SDPA
Convolution            Conv, CrossCorrelation, DepthwiseConv
Pooling                Pool, GlobalPool, AdaptivePool
Normalization          Norm, BatchNorm, LayerNorm, GroupNorm, LRN
Linear/Matrix          Linear, MatrixMult, Einsum
Activation             Neuron, Softmax, LogSoftmax, Dropout
Reshape/Layout         Reshape, Transpose, Flatten, Unflatten, Concat,
                       Split, Tile, Expand, Squeeze
Spatial                Resize, Pad, CropResize, Resample,
                       AffineTransform, GridSample
Reduction              Reduction, TopK, Sort, ArgMax, ArgMin
Scatter/Gather         Gather, GatherND, Scatter, ScatterND
Misc                   Shape, Range, Random, Fill, RingBuffer,
                       InputView, Copy
```

```python
from ane import ANECompiler

ane = ANECompiler()
print(f"MPS Dialect Version: {ane.mps_dialect_version}")
print(f"MPS SPI Dialect Version: {ane.mps_spi_dialect_version}")
print(f"Validate Network Version: {ane.validate_network_version}")
print(f"Analytics Buffer Size: {ane.analytics_buffer_size}")
```

Communication with ANE hardware goes through the aned daemon via XPC.
| Service | Purpose |
|---|---|
| com.apple.appleneuralengine | Main service (requires entitlements) |
| com.apple.appleneuralengine.private | Private/internal service |
| com.apple.aned | Daemon Mach service |
Compilation:
```objc
-[_ANEDaemonConnection compileModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection compiledModelExistsFor:withReply:]
-[_ANEDaemonConnection compiledModelExistsMatchingHash:withReply:]
-[_ANEDaemonConnection purgeCompiledModel:withReply:]
```

Loading:

```objc
-[_ANEDaemonConnection loadModel:sandboxExtension:options:qos:withReply:]
-[_ANEDaemonConnection loadModelNewInstance:options:modelInstParams:qos:withReply:]
-[_ANEDaemonConnection unloadModel:options:qos:withReply:]
```

Execution:

```objc
-[_ANEDaemonConnection prepareChainingWithModel:options:chainingReq:qos:withReply:]
```

Real-time:

```objc
-[_ANEDaemonConnection beginRealTimeTaskWithReply:]
-[_ANEDaemonConnection endRealTimeTaskWithReply:]
```

ANE uses IOSurface for tensor memory, enabling zero-copy sharing with GPU/Metal.

EspressoANEIOSurface Methods:

```objc
-createIOSurfaceWithExtraProperties:
-metalBufferWithDevice:
-setExternalStorage:ioSurface:
-nFrames
-bytesPerFrame
-totalBytes
// ... 21 methods total
```

| Entitlement | Purpose | Required For |
|---|---|---|
| com.apple.aned.private.allow | Primary ANE access | compile, load, evaluate |
| com.apple.aned.private.adapterWeight.allow | Adapter weights access | Custom weight loading |
| com.apple.aned.private.aggressivePowerSaving.allow | Power saving modes | Low-power inference |
| com.apple.ANECompilerService.allow | Compiler service access | Model compilation |
| com.apple.aned.private.processModelShare.allow | Cross-process model sharing | Shared inference |
| com.apple.ane.memoryUnwiringOptOutAccess.allow | Memory unwiring control | Large model persistence |
| com.apple.private.modelPurgeInAllPartitions.allow | Model cache purging | Cache management |
| com.apple.aned.private.secondaryANECompilerServiceAccess.allow | Secondary compiler | Parallel compilation |
| com.apple.private.ANEStorageMaintainer.allow | Storage maintenance | Cache cleanup |
On Apple internal builds, these boot-args can bypass entitlement checks:
| Boot Arg | Purpose | Effect |
|---|---|---|
| ane_skipAdapterWeightAccessCheck | Bypass adapter weight entitlement | Skip com.apple.aned.private.adapterWeight.allow check |
| ane_vm_allowPrecompiledBinary | Allow precompiled binaries | Skip binary validation in VM |
| ane_vm_debugDumpBootArg | Enable debug dumps | Dump ANE state on errors |
| ane_vm_forceValidationOnGuest | Force validation in VM | Extra validation for VMs |
Note: These boot-args only work when isInternalBuild returns true (Apple internal builds only). Consumer macOS always returns false for isInternalBuild.
The aned daemon checks for internal builds via _ANEDeviceInfo.isInternalBuild, which:
- Checks for `/AppleInternal` directory existence
- Queries `os_variant_has_internal_content("com.apple.aned")`
- Checks `os_variant_allows_internal_security_policies("com.apple.aned")`
All checks return false on consumer macOS installations.
Compiled models are cached in:
```
/var/folders/<user_hash>/com.apple.aned/
```

Cache operations in aned:

- `com.apple.aned.modelCacheAsyncIO`
- `com.apple.aned.modelCacheGC`
- `com.apple.aned.danglingModelsGC`
Key classes discovered through runtime introspection:
```objc
+ (BOOL)hasANE;                        // Returns YES on Apple Silicon
+ (NSInteger)numANEs;                  // Number of ANE devices (usually 1)
+ (NSInteger)numANECores;              // Number of cores (e.g., 16 for M1)
+ (NSString *)productName;             // "macOS"
+ (NSString *)buildVersion;            // e.g., "25B78"
+ (NSInteger)aneArchitectureType;      // Hardware architecture identifier
+ (NSInteger)aneSubType;               // Hardware subtype
+ (BOOL)isVirtualMachine;              // VM detection
+ (BOOL)isInternalBuild;               // Apple internal build detection
+ (BOOL)precompiledModelChecksDisabled;
+ (NSString *)bootArgs;                // Current boot arguments
+ (BOOL)isBootArgPresent:(NSString *)arg;
+ (BOOL)isBoolBootArgSetTrue:(NSString *)arg;
```

```objc
+ (NSString *)restrictedAccessEntitlement;                   // "com.apple.aned.private.allow"
+ (NSString *)adapterWeightsAccessEntitlement;               // "com.apple.aned.private.adapterWeight.allow"
+ (NSString *)adapterWeightsAccessEntitlementBypassBootArg;  // "ane_skipAdapterWeightAccessCheck"
+ (NSString *)internalLibraryPath;                           // "/AppleInternal/Library"
+ (NSString *)systemLibraryPath;                             // "/System/Library"
// ... and many more
```

```
hasANE = 1
numANEs = 1
numANECores = 16
productName = macOS
buildVersion = 25B78
isVirtualMachine = 0
isInternalBuild = 0
precompiledModelChecksDisabled = 0
```

The aned daemon accepts XPC messages from clients. Potential vectors:
- Malformed model paths: Does `compileModel:` properly validate URL paths?
- Sandbox extensions: The `sandboxExtension:` parameter passes filesystem access tokens
- Memory corruption: Large or malformed layer descriptors
- Race conditions: Concurrent compile/load/unload operations
IOSurface enables shared memory between processes:
```
Client Process           aned Daemon          ANE Hardware
──────────────           ───────────          ────────────
      │                       │                     │
      │  Create IOSurface     │                     │
      ├──────────────────────►│                     │
      │                       │  Map to ANE         │
      │                       ├────────────────────►│
      │                       │                     │
      │  Write input data     │                     │
      ├───────────────────────┼────────────────────►│
      │                       │                     │
      │  Read output data     │                     │
      │◄──────────────────────┼─────────────────────┤
```

Concerns:
- Shared memory lifetime management
- Buffer overflow if sizes mismatch
- Use-after-free on premature unmap
The /var/folders/.../com.apple.aned/ cache:
- World-readable in some configurations
- Contains compiled ANE bytecode
- Could leak model architecture details
These operations succeed without code signing:
- Framework loading: All three frameworks load via dlopen/ctypes
- Struct initialization: All `ANEC*Initialize()` functions callable
- Size probing: Can determine struct layouts by sentinel analysis
- CPU inference: `EspressoContext(platform=0)` works
- Model parsing: Read and parse `.espresso.net` files
- Client creation: `_ANEClient` object creation succeeds
These operations fail silently (no error, just NULL return):
- `compileModel:options:qos:error:` - returns nil
- `loadModel:options:qos:error:` - returns nil
- `evaluateWithModel:options:request:qos:error:` - returns nil
- `_ANEDeviceController` - can't access valid device
Security note: Silent failures make debugging difficult but also prevent enumeration of error conditions.
```objc
@interface EspressoProfilingLayerInfo : NSObject
@property (readonly) NSString *name;
@property (readonly) NSString *debug_name;
@property (readonly) double average_runtime;        // seconds
@property (readonly) int selected_runtime_engine;   // 0=CPU, 1=GPU, 2=ANE
@property (readonly) NSArray *runtimes;
@end
```

```objc
@interface EspressoProfilingNetworkANEInfo : NSObject
@property (readonly) uint64_t total_ane_time_ns;
@property (readonly) uint64_t ane_time_per_eval_ns;
@end
```

```objc
@interface _ANERequest : NSObject
@property uint32_t perfStatsMask;   // Bitmask for which stats to collect
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@end
```

These map 1:1 to ANE instructions:
- Convolution (all variants)
- Matrix multiplication
- Scaled Dot-Product Attention (SDPA)
- Softmax
- Common activations (ReLU, GeLU, Tanh)
- Pooling operations
- Element-wise arithmetic
These are broken into multiple ANE ops:
- LayerNorm → multiple passes
- Complex reductions
- Non-standard activations
- Dynamic shapes
Operations fall back when:
- Tensor too large for ANE SRAM
- Unsupported operation type
- Dynamic control flow
- Precision requirements exceed INT8/FP16
```python
from ane import SimpleANEGraph

# Create graph builder
graph = SimpleANEGraph()

# Input: (batch=1, channels=3, height=224, width=224)
graph.add_conv2d("conv1", (1, 3, 224, 224), out_channels=64, kernel_size=7, stride=2, padding=3)
# Output: (1, 64, 112, 112)

graph.add_pool2d("pool1", (1, 64, 112, 112), kernel_size=3, stride=2)
# Output: (1, 64, 56, 56)

graph.add_conv2d("conv2", (1, 64, 56, 56), out_channels=128, kernel_size=3, padding=1)
# Output: (1, 128, 56, 56)

graph.add_conv2d("conv3", (1, 128, 56, 56), out_channels=256, kernel_size=3, padding=1)
# Output: (1, 256, 56, 56)

graph.add_pool2d("pool2", (1, 256, 56, 56), kernel_size=2, stride=2)
# Output: (1, 256, 28, 28)

graph.add_linear("fc1", input_features=256*28*28, output_features=1024)
graph.add_linear("fc2", input_features=1024, output_features=1000)
graph.add_softmax("softmax", (1, 1000))

print(graph.summary())
```

Output:

```
ANE Computation Graph
============================================================

conv1 (conv2d)
  Input:  (1, 3, 224, 224)
  Output: (1, 64, 112, 112)
  Desc:   176 bytes
  Kernel: 7x7
  Stride: 2x2
  Pad:    3,3

pool1 (pool2d)
  Input:  (1, 64, 112, 112)
  Output: (1, 64, 56, 56)
  Desc:   96 bytes
  Kernel: 3x3
  Stride: 2x2

...

============================================================
Total layers: 8
Total descriptor bytes: 680
```

```python
from ane import build_transformer_attention

graph = build_transformer_attention()
print(graph.summary())
```

Output:

```
ANE Computation Graph
============================================================

proj_qkv (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 1536, 1, 1)
  Desc:   64 bytes

attention (sdpa)
  Input:  (1, 8, 512, 64)
  Output: (1, 8, 512, 64)
  Desc:   8 bytes  <-- Native transformer attention!

proj_out (linear)
  Input:  (512, 512, 1, 1)
  Output: (512, 512, 1, 1)
  Desc:   64 bytes

============================================================
Total layers: 3
Total descriptor bytes: 136
```

```python
from ane import (
    create_espresso_cpu_context,
    load_espresso_network,
    get_network_layer_count,
    EspressoNet,
)

# Method 1: Direct runtime loading (CPU only without entitlements)
ctx = create_espresso_cpu_context()
print(f"Context: {hex(ctx)}")

model_path = "/path/to/model.espresso.net"
net = load_espresso_network(model_path, ctx)
print(f"Network: {hex(net)}")
print(f"Layers: {get_network_layer_count(net)}")

# Method 2: Parse the file directly
model = EspressoNet.from_file(model_path)
print(f"Format version: {model.format_version}")
print(f"Layer types: {model.layer_type_counts()}")

# Analyze inner_product layers for quantization
for ip in model.get_inner_product_info():
    print(f"  {ip['name']}: {ip['nB']}x{ip['nC']}, "
          f"quant={ip['quantization_mode']}, lookup={ip['is_lookup']}")
```

```python
from ane import decode_espresso_net, get_pbze_stats, is_pbze_file

path = "/System/Library/SomeFramework/model.espresso.net"

# Check format
if is_pbze_file(path):
    stats = get_pbze_stats(path)
    print(f"Compressed size: {stats['compressed_size']} bytes")
    print(f"Uncompressed size: {stats['uncompressed_size']} bytes")
    print(f"Compression ratio: {stats['compression_ratio']:.2f}x")

# Decode (handles both JSON and PBZE automatically)
data = decode_espresso_net(path)
print(f"Layers: {len(data['layers'])}")
```

For full ANE access, use the signed Objective-C helper:
```bash
# Build and sign
cd helper
./build.sh "Developer ID Application: Your Name (TEAMID)"

# Check status
echo '{"cmd": "status"}' | ./ane_helper
# {"ok":true,"client":true,"model_count":0,"model_ids":[]}

# Compile a model
echo '{"cmd": "compile", "model_path": "/path/to/model.mlmodelc"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","state":1}

# Load into ANE memory
echo '{"cmd": "load", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true,"model_id":"ABC123","program_handle":12345}

# Unload
echo '{"cmd": "unload", "model_id": "ABC123"}' | ./ane_helper
# {"ok":true}
```

| Type | Category | Attributes |
|---|---|---|
| activation | Compute | type (relu/gelu/tanh/sigmoid/etc), alpha, beta |
| batch_matmul | Compute | transpose_a, transpose_b, adj_x, adj_y |
| concat | Shape | axis |
| convolution | Compute | kernel_size, stride, pad, C, groups, dilation |
| dynamic_dequantize | Quantization | scale_blob, zero_point_blob |
| dynamic_quantize | Quantization | axis, mode |
| elementwise | Compute | operation, alpha, broadcast |
| expand_dims | Shape | axes |
| general_concat | Shape | axis, interleave |
| general_slice | Shape | starts, ends, strides, axes |
| get_shape | Utility | (no special attributes) |
| inner_product | Compute | nB, nC, has_biases, quantization_mode, is_lookup |
| instancenorm_1d | Normalization | C, epsilon |
| load_constant | Memory | blob_weights, shape |
| nonzero | Utility | (no special attributes) |
| reduce | Compute | mode (sum/mean/max/min/prod), axes, keepdims |
| reshape | Shape | shape |
| scatter_nd | Memory | (no special attributes) |
| softmax | Compute | axis |
| split_nd | Shape | axis, num_splits, split_sizes |
| tile | Shape | reps |
| transpose | Shape | axes |
| Struct | Size (bytes) | Initialize Function |
|---|---|---|
| ANECAffineTransformLayerDesc | 48 | ANECAffineTransformLayerDescInitialize |
| ANECBatchNormLayerDesc | 40 | ANECBatchNormLayerDescInitialize |
| ANECConcatLayerDesc | 16 | ANECConcatLayerDescInitialize |
| ANECConvLayerDesc | 176 | ANECConvLayerDescInitialize |
| ANECCropResizeLayerDesc | 64 | ANECCropResizeLayerDescInitialize |
| ANECCrossCorrelationLayerDesc | 96 | ANECCrossCorrelationLayerDescInitialize |
| ANECDropoutLayerDesc | 16 | ANECDropoutLayerDescInitialize |
| ANECExpandLayerDesc | 32 | ANECExpandLayerDescInitialize |
| ANECFillLayerDesc | 24 | ANECFillLayerDescInitialize |
| ANECFlattenLayerDesc | 16 | ANECFlattenLayerDescInitialize |
| ANECGatherLayerDesc | 24 | ANECGatherLayerDescInitialize |
| ANECGatherNDLayerDesc | 24 | ANECGatherNDLayerDescInitialize |
| ANECGridSampleLayerDesc | 32 | ANECGridSampleLayerDescInitialize |
| ANECGroupNormLayerDesc | 40 | ANECGroupNormLayerDescInitialize |
| ANECInputViewLayerDesc | 32 | ANECInputViewLayerDescInitialize |
| ANECKernelSize | 24 | ANECKernelSizeInitialize |
| ANECLRNLayerDesc | 32 | ANECLRNLayerDescInitialize |
| ANECLayerNormLayerDesc | 40 | ANECLayerNormLayerDescInitialize |
| ANECLinearLayerDesc | 64 | ANECLinearLayerDescInitialize |
| ANECMatrixMultLayerDesc | 16 | ANECMatrixMultLayerDescInitialize |
| ANECNMSLayerDesc | 48 | ANECNMSLayerDescInitialize |
| ANECNeuronLayerDesc | 32 | ANECNeuronLayerDescInitialize |
| ANECNormLayerDesc | 40 | ANECNormLayerDescInitialize |
| ANECPadLayerDesc | 48 | ANECPadLayerDescInitialize |
| ANECPadding | 24 | ANECPaddingInitialize |
| ANECPoolLayerDesc | 96 | ANECPoolLayerDescInitialize |
| ANECRandomLayerDesc | 32 | ANECRandomLayerDescInitialize |
| ANECReductionLayerDesc | 24 | ANECReductionLayerDescInitialize |
| ANECResampleLayerDesc | 48 | ANECResampleLayerDescInitialize |
| ANECReshapeLayerDesc | 48 | ANECReshapeLayerDescInitialize |
| ANECResizeLayerDesc | 40 | ANECResizeLayerDescInitialize |
| ANECRingBufferLayerDesc | 32 | ANECRingBufferLayerDescInitialize |
| ANECSDPALayerDesc | 8 | ANECSDPALayerDescInitialize |
| ANECScatterLayerDesc | 24 | ANECScatterLayerDescInitialize |
| ANECScatterNDLayerDesc | 24 | ANECScatterNDLayerDescInitialize |
| ANECShapeLayerDesc | 16 | ANECShapeLayerDescInitialize |
| ANECSoftmaxLayerDesc | 48 | ANECSoftmaxLayerDescInitialize |
| ANECSortLayerDesc | 24 | ANECSortLayerDescInitialize |
| ANECSplitLayerDesc | 24 | ANECSplitLayerDescInitialize |
| ANECSqueezeLayerDesc | 32 | ANECSqueezeLayerDescInitialize |
| ANECStep | 12 | ANECStepInitialize |
| ANECTensorDesc | 64 | ANECTensorDescInitialize |
| ANECTensorDims | 40 | ANECTensorDimsInitialize |
| ANECTileLayerDesc | 32 | ANECTileLayerDescInitialize |
| ANECTopKLayerDesc | 24 | ANECTopKLayerDescInitialize |
| ANECTransposeLayerDesc | 32 | ANECTransposeLayerDescInitialize |
| ANECUnflattenLayerDesc | 24 | ANECUnflattenLayerDescInitialize |
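For the four simplest structs, the field layouts inferred earlier (3x u64, 3x u32, 6x u32, 5x u64) can be mirrored in `ctypes` and cross-checked against the probed sizes in this table. This is a sketch under that assumption only: the field names and order are inferred rather than confirmed, and `check_layouts` is a hypothetical helper.

```python
import ctypes
from typing import Dict


class ANECKernelSize(ctypes.Structure):
    # Inferred layout: 3x u64
    _fields_ = [("depth", ctypes.c_uint64),
                ("height", ctypes.c_uint64),
                ("width", ctypes.c_uint64)]


class ANECStep(ctypes.Structure):
    # Inferred layout: 3x u32
    _fields_ = [("depth", ctypes.c_uint32),
                ("height", ctypes.c_uint32),
                ("width", ctypes.c_uint32)]


class ANECPadding(ctypes.Structure):
    # Inferred layout: 6x u32
    _fields_ = [("d_front", ctypes.c_uint32), ("d_back", ctypes.c_uint32),
                ("h_front", ctypes.c_uint32), ("h_back", ctypes.c_uint32),
                ("w_front", ctypes.c_uint32), ("w_back", ctypes.c_uint32)]


class ANECTensorDims(ctypes.Structure):
    # Inferred layout: 5x u64
    _fields_ = [("N", ctypes.c_uint64), ("C", ctypes.c_uint64),
                ("H", ctypes.c_uint64), ("W", ctypes.c_uint64),
                ("D", ctypes.c_uint64)]


# Probed sizes in bytes, copied from the reference table.
PROBED_SIZES = {
    ANECKernelSize: 24,
    ANECStep: 12,
    ANECPadding: 24,
    ANECTensorDims: 40,
}


def check_layouts() -> Dict[str, bool]:
    """Compare each ctypes mirror's size with the probed size."""
    return {cls.__name__: ctypes.sizeof(cls) == size
            for cls, size in PROBED_SIZES.items()}


if __name__ == "__main__":
    for name, ok in check_layouts().items():
        print(f"{name}: {'matches' if ok else 'MISMATCH'}")
```

A size match is necessary but not sufficient: it confirms that the inferred field widths add up, not that the field order or meaning is right.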
All discovered Pass_* classes in Espresso.framework:
```
Pass_add_fp16_fp32_conversions
Pass_batch_matmul_transpose_fusion
Pass_broadcast_optimization
Pass_canonicalize_ops
Pass_constant_folding
Pass_convert_gather_to_slice
Pass_convert_to_ane_layout
Pass_dead_code_elimination
Pass_decompose_complex_ops
Pass_eliminate_identity_ops
Pass_eliminate_redundant_transpose
Pass_fold_constants
Pass_fuse_activation
Pass_fuse_add_mul
Pass_fuse_bias
Pass_fuse_conv_batchnorm
Pass_fuse_conv_bias
Pass_fuse_elementwise
Pass_fuse_gelu
Pass_fuse_layernorm
Pass_fuse_linear_ops
Pass_fuse_matmul_add
Pass_fuse_mul_add
Pass_fuse_pad_conv
Pass_fuse_reshape_transpose
Pass_insert_copies_for_ane
Pass_legalize_for_ane
Pass_lower_to_ane_ops
Pass_optimize_memory_layout
Pass_optimize_reshape_chain
Pass_optimize_transpose
Pass_propagate_shapes
Pass_quantize_weights
Pass_remove_unused_outputs
Pass_replace_div_with_mul
Pass_simplify_arithmetic
Pass_split_large_tensors
Pass_tensor_parallel_partition
Pass_tile_for_ane
Pass_vectorize_ops
```

```objc
// Lifecycle
- (instancetype)initWithRestrictedAccessAllowed:(BOOL)allowed;

// Compilation
- (BOOL)compileModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)compiledModelExistsFor:(id)model;
- (BOOL)compiledModelExistsMatchingHash:(NSData*)hash;
- (BOOL)purgeCompiledModel:(id)model;

// Loading
- (BOOL)loadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)loadModelNewInstance:(id)model options:(id)opts
             modelInstParams:(id)params qos:(int)qos error:(NSError**)err;
- (BOOL)loadRealTimeModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;
- (BOOL)unloadModel:(id)model options:(id)opts qos:(int)qos error:(NSError**)err;

// Evaluation
- (BOOL)evaluateWithModel:(id)model options:(id)opts request:(id)req
                      qos:(int)qos error:(NSError**)err;
- (BOOL)evaluateRealTimeWithModel:(id)model options:(id)opts
                          request:(id)req error:(NSError**)err;

// Memory
- (BOOL)mapIOSurfacesWithModel:(id)model request:(id)req
                cacheInference:(BOOL)cache error:(NSError**)err;
- (void)unmapIOSurfacesWithModel:(id)model request:(id)req;

// Chaining
- (BOOL)prepareChainingWithModel:(id)model options:(id)opts
                     chainingReq:(id)req qos:(int)qos error:(NSError**)err;
```

```objc
// Initialization
- (instancetype)initWithModelAtURL:(NSURL*)url
                               key:(NSString*)key
                  identifierSource:(int)src
                cacheURLIdentifier:(NSString*)cacheId
                   modelAttributes:(id)attrs
                    standardizeURL:(BOOL)standardize;
- (instancetype)initWithModelIdentifier:(id)identifier;

// Properties
@property (readonly) NSURL *modelURL;
@property (readonly) NSURL *sourceURL;
@property (readonly) NSString *UUID;
@property (readonly) NSString *key;
@property (readonly) int state;  // 1 = created/unloaded
@property (readonly) uint64_t programHandle;
@property (readonly) uint64_t intermediateBufferHandle;
@property (readonly) int queueDepth;
@property (readonly) uint32_t perfStatsMask;
@property (readonly) id mpsConstants;
```

```objc
// Initialization
- (instancetype)initWithInputs:(NSArray*)inputs
                  inputIndices:(NSArray*)inputIndices
                       outputs:(NSArray*)outputs
                 outputIndices:(NSArray*)outputIndices
                 weightsBuffer:(id)weights
                     perfStats:(id)stats
                procedureIndex:(int)procIdx
                  sharedEvents:(id)events
             transactionHandle:(uint64_t)handle;

// Properties
@property (readonly) NSArray *inputArray;
@property (readonly) NSArray *inputIndexArray;
@property (readonly) NSArray *outputArray;
@property (readonly) NSArray *outputIndexArray;
@property (readonly) id weightsBuffer;
@property (readonly) int procedureIndex;
@property (readonly) id perfStats;
@property (readonly) NSArray *perfStatsArray;
@property (copy) void (^completionHandler)(BOOL, NSError*);
@property (readonly) id sharedEvents;
@property (readonly) uint64_t transactionHandle;
```

```bash
# Run all tests
pytest tests/test_ane.py -v

# Run specific test class
pytest tests/test_ane.py::TestANECompiler -v

# Run with coverage
pytest tests/test_ane.py --cov=ane --cov-report=term-missing
```

Test categories:

- `TestANEStructs` - Data structure serialization
- `TestANECompiler` - Framework loading and initialization
- `TestANEHelpers` - Utility functions
- `TestANESample` - Graph building
- `TestANELayerSizes` - Probed struct sizes
- `TestEspressoDiscovery` - ObjC class introspection
- `TestEspressoFormat` - Model file parsing
- `TestPBZE` - Compression/decompression
- `TestANEXPC` - XPC protocol discovery
- `TestAPITree` - Knowledge base API tree
The simplest way to execute models on ANE is through CoreML's public API:
```objc
// Objective-C
NSError *error = nil;
MLModelConfiguration *config = [[MLModelConfiguration alloc] init];
config.computeUnits = MLComputeUnitsAll;  // Enables ANE
MLModel *model = [MLModel modelWithContentsOfURL:modelURL
                                   configuration:config
                                           error:&error];
```

```python
# Python with coremltools
import coremltools as ct

model = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)
```

Pre-compiled ANE binaries (.hwx files) have a Mach-O-like structure:
| Offset | Value | Description |
|---|---|---|
| 0x00 | 0xBEEFFACE | Magic number |
| 0x04 | varies | Header info |
| ... | __PAGEZERO | Zero page segment |
| ... | __DATA | Data segment |
| ... | __FVMLIB | ANE instructions |
Key insight: HWX files cannot be loaded alone - they require a companion .espresso.net file that describes the network structure.
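A quick way to triage candidate files is to test the first four bytes against the magic value from the table. The sketch below assumes nothing beyond that value: because the on-disk byte order is not stated here, it accepts either order, and `looks_like_hwx` is a hypothetical helper name.

```python
import struct

HWX_MAGIC = 0xBEEFFACE  # magic number at offset 0x00


def looks_like_hwx(blob: bytes) -> bool:
    """Return True if the buffer starts with the HWX magic in either byte order."""
    if len(blob) < 4:
        return False
    (little,) = struct.unpack_from("<I", blob, 0)
    (big,) = struct.unpack_from(">I", blob, 0)
    # Byte order on disk is an open question here, so both are accepted.
    return HWX_MAGIC in (little, big)


if __name__ == "__main__":
    # Synthetic buffers for demonstration only.
    print(looks_like_hwx(struct.pack("<I", HWX_MAGIC) + bytes(12)))
    print(looks_like_hwx(b"pbze" + bytes(12)))
```

A positive result only identifies the container; as noted above, the file is still unusable without its companion `.espresso.net`.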
A complete Espresso model bundle contains:
| File | Description |
|---|---|
| model.espresso.net | Network description (JSON or PBZE) |
| model.espresso.weights | Binary weights data |
| model.espresso.shape | Shape information |
| model.H14.espresso.hwx | Pre-compiled ANE binary (chip-specific) |
| model.H14.espresso.precompilation_info | Compiler metadata (JSON) |
Different .hwx files exist for different ANE generations:
- `.H13.espresso.hwx` - A14/M1 generation
- `.H14.espresso.hwx` - A15/M2 generation
- `.H15.espresso.hwx` - A16/M3 generation
- `.H16.espresso.hwx` - A17/M4 generation
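The chip tag can be pulled out of a bundle's file names to decide which precompiled binary applies. This is a small sketch: the tag-to-generation mapping is copied from the list above, while the regular expression and the `hwx_generation` helper are illustrative assumptions about the naming scheme.

```python
import re
from typing import Optional, Tuple

# Matches names such as "model.H14.espresso.hwx" and captures the "H14" tag.
HWX_NAME = re.compile(r"\.(H\d+)\.espresso\.hwx$")

# Mapping copied from the list above.
ANE_GENERATIONS = {
    "H13": "A14/M1",
    "H14": "A15/M2",
    "H15": "A16/M3",
    "H16": "A17/M4",
}


def hwx_generation(filename: str) -> Optional[Tuple[str, str]]:
    """Return (tag, generation) for an .hwx file name, or None if it is not one."""
    match = HWX_NAME.search(filename)
    if match is None:
        return None
    tag = match.group(1)
    return tag, ANE_GENERATIONS.get(tag, "unknown")


if __name__ == "__main__":
    for name in ("model.H14.espresso.hwx", "model.espresso.net"):
        print(name, "->", hwx_generation(name))
```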
| Layer | Status | Notes |
|---|---|---|
| CoreML | ✅ Working | Use MLComputeUnitsAll, system handles everything |
| XPC to aned | ✅ Working | _ANEClient.sharedConnection works |
| ANEServices | | Model loading needs .espresso.net |
| Espresso | | Platform 2 (ANE) context crashes |
| IOKit Direct | ❌ Blocked | Requires com.apple.ane.iokit-user-access |
When inspecting MLComputePlan.computeDevicesBySupportedComputeUnits:
| Mask | Devices |
|---|---|
| 1 | CPU only |
| 2 | GPU only |
| 3 | CPU + GPU |
| 4 | Neural Engine only |
| 5 | CPU + Neural Engine |
| 6 | GPU + Neural Engine |
| 7 | CPU + GPU + Neural Engine (all) |
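The mask values in this table are a three-bit field (CPU = 1, GPU = 2, Neural Engine = 4), so any combination decodes with a flag enum. The sketch below is illustrative; `ComputeDevice` and `describe_mask` are hypothetical names and not Core ML API.

```python
from enum import IntFlag
from typing import List


class ComputeDevice(IntFlag):
    CPU = 1
    GPU = 2
    NEURAL_ENGINE = 4


def describe_mask(mask: int) -> List[str]:
    """List the device names whose bit is set in the mask."""
    return [device.name for device in ComputeDevice if mask & device]


if __name__ == "__main__":
    for mask in range(1, 8):
        print(mask, " + ".join(describe_mask(mask)))
```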
```objc
// Get ANE device info
Class deviceClass = NSClassFromString(@"MLNeuralEngineComputeDevice");
id device = [deviceClass performSelector:@selector(physicalDevice)];
NSInteger cores = [[device valueForKey:@"totalCoreCount"] integerValue];
// Returns 16 on M3 Pro
```

This project contains reverse engineering artifacts for research and interoperability purposes. Use responsibly.
- Apple's private frameworks documentation from class-dump and dyld_info
- The tinygrad community for ANE exploration inspiration