
Conversation

@cheneeheng

I tested this setup with CUDA 11.7 + cuDNN 8.5 on a GTX 1660 Ti. It runs OpenPose for human pose extraction normally, without the huge GPU memory usage issue. GPU memory usage matches the CUDA 10.2 + cuDNN 7 setup, and inference is roughly 1 fps faster.

Hope this helps someone who badly needs to use CUDA 11.

Changelog:

  • added cudnn-frontend submodule.
  • updated CMake with the new flag and the new 3rdparty repository cudnn_frontend.
  • changed caffe submodule repo target.
    -- added the USE_CUDNN_FRONTEND option (-DUSE_CUDNN_FRONTEND). When enabled, the cudnn_frontend API is used instead of the current cuDNN 8 algorithm wrapper cudnnGetConvolutionForwardAlgorithm_v7; see the first sketch after this list.
    -- added cudnn_v8_utils.hpp + cudnn_v8_utils.cpp files for the cudnn_frontend API. It currently only supports the forward pass.
    -- fixed warnings.
    -- reduced GPU memory usage by setting CUDNN_STREAMS_PER_GROUP=1
    -- added a compute capability check in tensor creation to enable tensor core usage on Ampere cards; see the second sketch below.
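
For context, here is a rough sketch of the legacy cuDNN 8 selection path that USE_CUDNN_FRONTEND bypasses (presumably enabled at configure time with something like `cmake -DUSE_CUDNN_FRONTEND=ON ..`). Descriptor setup and error checking are omitted, and the helper name `pick_forward_algo` is made up for illustration; this is not the code from the Caffe fork or from cudnn_v8_utils.cpp.

```cpp
#include <cudnn.h>
#include <vector>

// Legacy cuDNN 8 path: ask the heuristic for a ranked list of forward
// algorithms, take the fastest one, then size its workspace. The
// cudnn_frontend path replaces this with an operation graph plus
// engine heuristics.
cudnnConvolutionFwdAlgo_t pick_forward_algo(cudnnHandle_t handle,
                                            cudnnTensorDescriptor_t x_desc,
                                            cudnnFilterDescriptor_t w_desc,
                                            cudnnConvolutionDescriptor_t conv_desc,
                                            cudnnTensorDescriptor_t y_desc,
                                            size_t* workspace_bytes) {
  int requested = CUDNN_CONVOLUTION_FWD_ALGO_COUNT;
  int returned = 0;
  std::vector<cudnnConvolutionFwdAlgoPerf_t> perf(requested);

  // Results come back sorted by expected execution time.
  cudnnGetConvolutionForwardAlgorithm_v7(handle, x_desc, w_desc, conv_desc,
                                         y_desc, requested, &returned,
                                         perf.data());
  cudnnConvolutionFwdAlgo_t algo = perf[0].algo;

  // Workspace needed by the chosen algorithm.
  cudnnGetConvolutionForwardWorkspaceSize(handle, x_desc, w_desc, conv_desc,
                                          y_desc, algo, workspace_bytes);
  return algo;
}
```

On cuDNN 8 this heuristic can favour algorithms with very large workspaces, which is presumably the memory blow-up that the frontend's engine-heuristics path avoids here.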
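The Ampere check itself only needs the device properties. The helper below is a hypothetical sketch of such a gate (the actual check lives in the tensor-creation code added in cudnn_v8_utils); compute capability 8.x is Ampere, while the GTX 1660 Ti is Turing at 7.5.

```cpp
#include <cuda_runtime.h>

// Hypothetical helper: report whether the current device is Ampere (SM 8.x)
// or newer, so tensor creation only opts into tensor-core friendly settings
// where they are supported.
inline bool device_is_ampere_or_newer() {
  int device = 0;
  cudaGetDevice(&device);

  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device);

  return prop.major >= 8;  // 8.x = Ampere; the GTX 1660 Ti reports 7.5
}
```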