"CATARACT; #大腿骨~2010"
I need to extract 大腿骨 from this string in R using gsub. Each character is actually a Unicode numeric character reference that starts with &#, followed by a five-digit number, and ends with ;.
I know how to remove these Unicode references with the following:
gsub("&#[0-9]+;","","CATARACT; #大腿骨~2010")
But how can I keep only these Unicode references using gsub?
Edit 01
My desired output is 大腿骨.
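For illustration, a minimal sketch of keeping only the references with gsub. It assumes the raw string stores the characters as decimal references (the values &#22823;, &#33151; and &#39592; are assumed here from the code points of 大, 腿 and 骨) and that the references form one contiguous run. The idea is to capture that run and replace the whole string with the captured group:

```r
# Assumption: the raw string holds decimal character references;
# the three values are the code points of the characters in the desired output.
x <- "CATARACT; #&#22823;&#33151;&#39592;~2010"

# Lazily skip the prefix, capture the run of consecutive references,
# then consume the rest of the string. perl = TRUE is needed for the
# lazy quantifier and the non-capturing group.
refs <- gsub("^.*?((?:&#[0-9]+;)+).*$", "\\1", x, perl = TRUE)
refs
```

If the string contains no reference at all, the pattern does not match and gsub returns the input unchanged, so that case would need a separate check.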
Edit 02
Thanks for the answer, but what if the pattern is not always like that? I need to extract the Unicode references no matter where they appear in the string:
"CATARACT; #大腿骨~2010;CATARACT; #夨膀骩~2010"