
I used a temporary tensor to store data in my custom GPU-based op. For debugging purposes, I want to print the data of this tensor with a traditional printf in C++. How can I pull this GPU-based tensor to the CPU and then print its contents? Thank you very much.

1 Answer


If by temporary you mean allocate_temp instead of allocate_output, there is no way of fetching the data on the Python side.

During debugging I usually return the tensor itself as an output, so that a simple sess.run fetches the result. Otherwise, the only way to display the data is a traditional printf inside C++. If your tensor is an output of your custom operation, a tf.Print eases further debugging.

Example:

    Tensor temp_tensor;
    OP_REQUIRES_OK(ctx, ctx->allocate_temp(DT_FLOAT, some.shape(), &temp_tensor));

    // Copy the device data back to host memory. cudaMemcpy on the default
    // stream blocks until the copy has finished; if your op's kernels run
    // on a non-default stream, call cudaDeviceSynchronize() first.
    float* host_memory = new float[some.NumElements()];
    cudaMemcpy(host_memory, temp_tensor.flat<float>().data(),
               some.NumElements() * sizeof(float), cudaMemcpyDeviceToHost);
    std::cout << host_memory[0] << std::endl;
    std::cout << host_memory[1] << std::endl;
    std::cout << host_memory[2] << std::endl;
    delete[] host_memory;

2 Comments

Thanks for your kind help. Yes, I want to output the data with a traditional printf inside C++. Since it is a GPU-based tensor, how do I pull the data from GPU to CPU?
cudaMemcpy, or an if (threadIdx.x == 0) printf(...) guard, like in every CUDA implementation.
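The in-kernel alternative mentioned in that comment could look roughly like this (a minimal sketch; the kernel name and launch configuration are made up for illustration, and it needs a CUDA-capable device). Device-side printf is supported since compute capability 2.0, but output is only flushed at certain points, e.g. after a synchronization:

```cuda
#include <cstdio>

// Hypothetical debug kernel: prints the first few elements of a device
// buffer from a single thread so the output is not interleaved.
__global__ void debug_print_kernel(const float* data, int n) {
  if (threadIdx.x == 0 && blockIdx.x == 0) {
    for (int i = 0; i < n && i < 3; ++i) {
      printf("data[%d] = %f\n", i, data[i]);
    }
  }
}

// Usage, e.g. inside the op's Compute method (names as in the answer above):
//   debug_print_kernel<<<1, 1>>>(temp_tensor.flat<float>().data(),
//                                some.NumElements());
//   cudaDeviceSynchronize();  // flushes the device-side printf buffer
```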
