This repository contains direct datapath implementations for Amazon's Elastic Fabric Adapter (EFA), enabling high-performance network operations with minimal CPU overhead.
Elastic Fabric Adapter (EFA) is Amazon's custom network interface designed for machine learning (ML) training, inference, and High Performance Computing (HPC) workloads on AWS. EFA provides:
- High-bandwidth networking: Up to 400 Gbps of network performance on the latest instances
- Low-latency communication: Optimized for distributed ML training and inference
- Kernel-bypass networking: Direct hardware access for improved performance
- AWS integration: Native support in the AWS Nitro System architecture
- ML framework integration: Optimized paths for PyTorch, TensorFlow, and other ML frameworks
EFA uses Scalable Reliable Datagram (SRD) as its primary transport protocol (see the queue-pair creation sketch after this list). SRD provides:
- Reliable delivery: Guaranteed packet delivery with hardware-level acknowledgments
- Multi-path load balancing: Efficiently distributes traffic across multiple network paths
- Fast failure recovery: Quickly recovers from packet drops or link failures
- High-throughput optimization: Designed for bandwidth-intensive workloads
- Hardware-accelerated congestion control: Built-in flow control mechanisms
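As a concrete illustration, the sketch below creates an SRD queue pair through rdma-core's EFA direct-verbs extension (`infiniband/efadv.h`). The device choice, queue sizes, and error handling are simplified assumptions for illustration; consult the rdma-core efadv man pages for the exact contract.

```c
/* Minimal sketch: creating an SRD queue pair via rdma-core's EFA
 * direct-verbs extension. Assumes the first RDMA device is the EFA
 * and trims error handling for brevity. */
#include <stdio.h>
#include <infiniband/verbs.h>
#include <infiniband/efadv.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0)
        return 1;

    struct ibv_context *ctx = ibv_open_device(devs[0]); /* assumed EFA device */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    struct ibv_cq *cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = { .max_send_wr = 256, .max_recv_wr = 256,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_DRIVER,   /* driver-specific QP type */
    };

    /* Ask the EFA provider for an SRD QP rather than a plain UD QP. */
    struct ibv_qp *qp = efadv_create_driver_qp(pd, &attr,
                                               EFADV_QP_DRIVER_TYPE_SRD);
    printf("SRD QP: %s\n", qp ? "created" : "failed");

    if (qp) ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```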
Kernel Driver
- Full kernel-space implementation
- Standard verbs interface (see the device-enumeration sketch below)
- Complete feature set with all EFA capabilities
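Because the kernel driver exposes the device through the standard verbs interface, ordinary libibverbs calls can discover and query an EFA device. The sketch below is a minimal illustration; the `efa` name-prefix check is an assumption about how EFA devices are typically named.

```c
/* Minimal sketch: enumerating RDMA devices through standard verbs and
 * querying any device whose name starts with "efa". */
#include <stdio.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);

    for (int i = 0; i < num; i++) {
        const char *name = ibv_get_device_name(devs[i]);
        if (strncmp(name, "efa", 3) != 0)
            continue;

        struct ibv_context *ctx = ibv_open_device(devs[i]);
        struct ibv_device_attr attr;
        if (ctx && ibv_query_device(ctx, &attr) == 0)
            printf("%s: max_qp=%d max_cqe=%d\n",
                   name, attr.max_qp, attr.max_cqe);
        if (ctx)
            ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}
```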
Userspace Libraries
- libfabric provider: Standard OpenFabrics Interfaces (OFI) implementation (see the `fi_getinfo` sketch after this list)
- libibverbs provider: RDMA verbs compatibility layer
- MPI libraries: Direct integration with popular MPI implementations
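The sketch below shows how an application might request the `efa` provider through libfabric's standard OFI discovery call. The capability flags and API version are illustrative assumptions; real applications request exactly the capabilities they need.

```c
/* Minimal sketch: selecting the libfabric "efa" provider via fi_getinfo. */
#include <stdio.h>
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_errno.h>

int main(void)
{
    struct fi_info *hints = fi_allocinfo();
    struct fi_info *info = NULL;

    hints->fabric_attr->prov_name = strdup("efa"); /* ask for the EFA provider */
    hints->ep_attr->type = FI_EP_RDM;              /* reliable datagram endpoint */
    hints->caps = FI_MSG | FI_RMA;                 /* illustrative capability set */

    int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, hints, &info);
    if (ret) {
        fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
        fi_freeinfo(hints);
        return 1;
    }

    for (struct fi_info *cur = info; cur; cur = cur->next)
        printf("provider=%s fabric=%s domain=%s\n",
               cur->fabric_attr->prov_name,
               cur->fabric_attr->name,
               cur->domain_attr->name);

    fi_freeinfo(info);
    fi_freeinfo(hints);
    return 0;
}
```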
This repository focuses on direct datapath implementations that bypass traditional software stacks:
- CUDA Datapath: GPU-native EFA operations for CUDA applications (see the memory-registration sketch after this list)
  - Direct posting of work requests from GPU kernels
  - GPU-side completion polling
  - No CPU involvement in data path operations
  - Optimized for GPU-to-GPU communication over EFA
- CPU Direct Path: Userspace CPU implementation with direct hardware access
- Additional accelerators: Support for other compute accelerators
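As a rough illustration of what the direct datapath builds on, the sketch below registers a CUDA device buffer with the EFA verbs provider so that RDMA operations can target GPU memory directly. Whether a plain `ibv_reg_mr()` call accepts a `cudaMalloc`'d pointer depends on GPUDirect RDMA / dma-buf support in the installed driver stack; treat this as an assumption for illustration, not as this repository's datapath API.

```c
/* Minimal sketch: registering GPU memory with the EFA verbs provider.
 * Assumes the first RDMA device is the EFA and that the driver stack
 * supports registering device pointers (GPUDirect RDMA / dma-buf). */
#include <stdio.h>
#include <cuda_runtime.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0)
        return 1;

    struct ibv_context *ctx = ibv_open_device(devs[0]); /* assumed EFA device */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    void *gpu_buf = NULL;
    size_t len = 1 << 20;              /* 1 MiB device buffer */
    cudaMalloc(&gpu_buf, len);

    /* Register the device pointer for local and remote access. */
    struct ibv_mr *mr = ibv_reg_mr(pd, gpu_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    printf("GPU MR: %s (lkey=0x%x)\n",
           mr ? "registered" : "failed", mr ? mr->lkey : 0);

    if (mr) ibv_dereg_mr(mr);
    cudaFree(gpu_buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```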
- Distributed ML training: Large-scale model training across multiple GPUs and nodes
- ML inference: High-throughput inference serving with minimal latency
- GPU-to-GPU communication: Direct GPU communication for parameter synchronization
- Model parallelism: Efficient distribution of large models across multiple devices
- GPU-accelerated simulations: Direct GPU-to-GPU communication
- Scientific computing: Large-scale parallel computations
- Computational fluid dynamics: High-bandwidth data exchange between compute nodes
- Real-time analytics: Low-latency data processing pipelines
- Financial modeling: High-frequency trading and risk calculations
- Media processing: Real-time video/audio processing workflows
Each implementation directory contains its own detailed documentation:
- CUDA Implementation: Complete guide for GPU-based EFA operations
- Additional implementations will be documented as they are added
- EFA-enabled EC2 instances
- EFA kernel driver installed and configured
- libibverbs (rdma-core) with the EFA verbs provider
- Implementation-specific requirements (see individual directories)
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.