I know very little about inline assembly. The code (see here for details) is as follows:
```cpp
JNIEXPORT void JNICALL
Java_com_xingin_xarengine_RGBAToGrayRenderer_nCopy(JNIEnv *env, jclass clazz, jobject dstBuf, jobject srcBuf, jint sz) {
    if (sz & 63) {
        sz = (sz & -64) + 64;
    }
    auto dst = (uint8_t volatile*)env->GetDirectBufferAddress(dstBuf);
    auto src = (uint8_t volatile*)env->GetDirectBufferAddress(srcBuf);
    asm volatile (
        "NEONCopyPLD: \n"
        " VLDM %[src]!,{d0-d7} \n"
        " VSTM %[dst]!,{d0-d7} \n"
        " SUBS %[sz],%[sz],#0x40 \n"
        " BGT NEONCopyPLD \n"
        : [dst]"+r"(dst), [src]"+r"(src), [sz]"+r"(sz)
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "cc", "memory");
    LOGD("Use neon registers for memory copy");
}
```

It's basically used to copy memory through NEON registers. However, the compiler complains when I build my application:
```
Build command failed.
Error while executing process /Users/user/Library/Android/sdk/cmake/3.10.2.4988404/bin/ninja with arguments {-C /Users/user/Projects/XarEngine/android/arview/.cxx/Release/5s3f6f2r/arm64-v8a XarEngine}
ninja: Entering directory `/Users/user/Projects/XarEngine/android/arview/.cxx/Release/5s3f6f2r/arm64-v8a'
[1/2] Building CXX object CMakeFiles/XarEngine.dir/XarEngine/details.cpp.o
FAILED: CMakeFiles/XarEngine.dir/XarEngine/details.cpp.o
/Users/user/Library/Android/sdk/ndk/21.1.6352462/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ --target=aarch64-none-linux-android21 --gcc-toolchain=/Users/user/Library/Android/sdk/ndk/21.1.6352462/toolchains/llvm/prebuilt/darwin-x86_64 --sysroot=/Users/user/Library/Android/sdk/ndk/21.1.6352462/toolchains/llvm/prebuilt/darwin-x86_64/sysroot -DXarEngine_EXPORTS -D__GIT_TAG__=\"1.3.3-7-g59b0706\" -I../../../../../../components/PlaneTracker/include -I../../../../../../thirdparty/rapidjson -I../../../../../../thirdparty/filament/include -I../../../../../../thirdparty/opencv_4.5.3/include -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -s -O2 -O2 -DNDEBUG -fPIC -MD -MT CMakeFiles/XarEngine.dir/XarEngine/details.cpp.o -MF CMakeFiles/XarEngine.dir/XarEngine/details.cpp.o.d -o CMakeFiles/XarEngine.dir/XarEngine/details.cpp.o -c ../../../../../../XarEngine/details.cpp
clang++: warning: argument unused during compilation: '-s' [-Wunused-command-line-argument]
../../../../../../XarEngine/details.cpp:175:48: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
: [dst]"+r"(dst), [src]"+r"(src), [sz]"+r"(sz) : : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "cc", "memory");
                                           ^
../../../../../../XarEngine/details.cpp:173:12: note: use constraint modifier "w"
" SUBS %[sz],%[sz],#0x40 \n"
       ^~~~~
       %w[sz]
../../../../../../XarEngine/details.cpp:175:48: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
: [dst]"+r"(dst), [src]"+r"(src), [sz]"+r"(sz) : : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "cc", "memory");
                                           ^
../../../../../../XarEngine/details.cpp:173:18: note: use constraint modifier "w"
" SUBS %[sz],%[sz],#0x40 \n"
             ^~~~~
             %w[sz]
../../../../../../XarEngine/details.cpp:171:6: error: vector register expected
" VLDM %[src]!,{d0-d7} \n"
     ^
<inline asm>:2:12: note: instantiated into assembly here
 VLDM x0!,{d0-d7}
           ^
../../../../../../XarEngine/details.cpp:172:6: error: vector register expected
" VSTM %[dst]!,{d0-d7} \n"
     ^
<inline asm>:3:13: note: instantiated into assembly here
 VSTM x21!,{d0-d7}
            ^
2 warnings and 2 errors generated.
ninja: build stopped: subcommand failed.
```

Can anyone help me make sense of these messages?
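The two -Wasm-operand-widths warnings come from `sz` being a 32-bit `jint` bound with "+r": on AArch64 that constraint selects a 64-bit x register, so clang suggests the `%w[sz]` modifier. Another option is to widen the count to `size_t` before it reaches the asm. The sketch below does the original rounding step that way; the helper name `round_up_64` is hypothetical. Note that rounding up means the copy loop reads and writes up to 63 bytes past `sz`, so both buffers must be allocated with that padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round a non-negative byte count up to the next
// multiple of 64 (one block of eight 64-bit d registers) and return it
// as size_t, which on AArch64 matches the 64-bit x register that "+r"
// selects, so no %w modifier is needed.
static std::size_t round_up_64(std::int32_t sz) {
    assert(sz >= 0);
    // Same result as the original `if (sz & 63) sz = (sz & -64) + 64;`
    // for non-negative sz.
    return (static_cast<std::size_t>(sz) + 63) & ~static_cast<std::size_t>(63);
}
```

Counts that are already multiples of 64 (including 0) come back unchanged; anything else is bumped to the next block boundary.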
UPDATE
Is this related to the compiler? I'm using clang, while the inline assembly above is supposed to be GCC-compatible.
…jint to be an integer type that could use "+r".

…asm statement. But hopefully you won't need to mess with asm directly; just read it while you tweak the C++ source to get good performance.

dstBuf is a mapped DMA buffer from GPU memory and is not cached on the CPU. Calling C++ memcpy directly may be very slow on some GPUs (e.g. Mali), so we use NEON registers to work around it.

…vldm like that.

…#ifdef __aarch64__ to make sure you use the right inline asm. (What predefined macro can I use to detect the target architecture in Clang? / Get architecture type (ABI) to C preprocessor for Android NDK)
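The errors themselves come from the target rather than from clang versus GCC: the log shows `--target=aarch64-none-linux-android21` (the arm64-v8a ABI), and VLDM/VSTM with a d-register list are 32-bit ARM (armeabi-v7a) NEON instructions that do not exist in AArch64, which is why the assembler reports "vector register expected". Following the `#ifdef __aarch64__` suggestion, here is a minimal sketch that uses LD1/ST1 on four 128-bit v registers for AArch64, keeps the original VLDM/VSTM loop for 32-bit ARM, and falls back to memcpy elsewhere. The helper name `neon_copy` is hypothetical, and whether it actually beats memcpy on uncached Mali DMA memory is an assumption that would need measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies n bytes from src to dst in 64-byte blocks.
// Assumes n is a multiple of 64 and both buffers hold at least n bytes.
static void neon_copy(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    assert(n % 64 == 0);
    if (n == 0) return;  // the asm loops below always run at least once
#if defined(__aarch64__)
    // AArch64: no VLDM/VSTM; load/store four 16-byte v registers
    // (64 bytes) with post-increment addressing.
    asm volatile(
        "1:                                                    \n"
        "  ld1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%[src]], #64 \n"
        "  st1 {v0.16b, v1.16b, v2.16b, v3.16b}, [%[dst]], #64 \n"
        "  subs %[n], %[n], #64                                \n"
        "  b.hi 1b                                             \n"  // loop while bytes remain (unsigned)
        : [dst] "+r"(dst), [src] "+r"(src), [n] "+r"(n)
        :
        : "v0", "v1", "v2", "v3", "cc", "memory");
#elif defined(__arm__) && defined(__ARM_NEON)
    // 32-bit ARM (armeabi-v7a): the original VLDM/VSTM loop.
    asm volatile(
        "1:                       \n"
        "  vldm %[src]!, {d0-d7}  \n"
        "  vstm %[dst]!, {d0-d7}  \n"
        "  subs %[n], %[n], #0x40 \n"
        "  bhi 1b                 \n"  // loop while bytes remain (unsigned)
        : [dst] "+r"(dst), [src] "+r"(src), [n] "+r"(n)
        :
        : "d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "cc", "memory");
#else
    // Other targets (e.g. an x86 emulator build): plain memcpy.
    std::memcpy(dst, src, n);
#endif
}
```

In the JNI function this could be called with the two `GetDirectBufferAddress` results cast to `uint8_t*` / `const uint8_t*` and the padded byte count. The loops use a local numeric label (`1:` / `1b`) instead of a named one like `NEONCopyPLD`, because a named label causes a duplicate-symbol error if the compiler ever emits the asm statement more than once, for example after inlining.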