We call these stories "Spikes" and allocate up to three days to each. Once the time is up, work on the spike stops and the team discusses the findings. Based on that discussion, the team then decides on the next steps.
Typically we use spikes for "tricky tasks", as you call them: new technologies, new designs, challenging bugs, performance issues, trials of new tools, user interface prototypes, and so on. This list is far from complete.
We create a spike when we are unsure about the content or size of a particular item. In other words, the item comes with a lot of uncertainty, and we use the spike to reduce or eliminate it.
So in that sense I think you use them the "proper" way. You just use a different label for this type of task than we do.