My starting point here wouldn't be generic emulation approaches, as these themselves produce approximations to the function being emulated. (If an approximation is good enough, then why not use the exact reciprocal square root result as an approximation to the approximate function?)
Instead, here you know the instruction returns an approximation to a known mathematical calculation.
Approximate reciprocal square-root operations were developed some time ago. Initial versions were based on a 'magic' first step followed by one or more Newton-Raphson iterations.
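To make the 'magic step plus Newton-Raphson' idea concrete, here is a minimal C sketch of the widely circulated software version. The function name `rsqrt_magic` and the `iterations` parameter are my own, and the constant `0x5f3759df` is just the commonly quoted one. I'm not claiming this is what any particular hardware does; it is only one candidate you could test against.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: 'magic' initial guess plus Newton-Raphson refinement.
   The constant is the commonly quoted one; real hardware may use a lookup
   table or a different constant entirely. Assumes x is a positive, normal
   IEEE-754 single-precision value. */
static float rsqrt_magic(float x, int iterations)
{
    uint32_t bits;
    float y;

    /* Reinterpret the float as an integer without breaking aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Magic first step: a cheap initial estimate of 1/sqrt(x). */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* Each Newton-Raphson step refines y towards 1/sqrt(x):
       y_next = y * (3/2 - (x/2) * y * y). */
    for (int i = 0; i < iterations; ++i)
        y = y * (1.5f - 0.5f * x * y * y);

    return y;
}
```

Calling `rsqrt_magic(4.0f, 1)` gives a value close to, but not exactly, 0.5, which is the kind of small discrepancy you would be trying to reproduce bit for bit.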
Since then similar instructions have appeared in several hardware architectures (Intel and AMD x86-64, ARM NEON, etc.). The exact implementation of these is not generally documented, presumably because you seldom need to reproduce bit-accurate versions of approximate calculations. (The SX-Aurora document you linked to says that its implementation is system dependent. This, in theory, could mean that different versions or steppings of the same hardware could produce different results.)
There is, however, at least one documented hardware implementation: Intel provided C reference implementations of the AVX-512 versions of their approximation instructions in 2015. These can be found here.
The first thing I'd try would be to implement a couple of the obvious or published implementations and see if they match the results you get from the hardware.
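A rough sketch of how that comparison could be set up in C is below. Everything here is hypothetical scaffolding: `count_mismatches`, `candidate_nr0` and `candidate_nr1` are names I made up, and you would have to supply your own wrapper around the real hardware instruction as the `reference` argument. The idea is simply to walk a range of input bit patterns and compare the outputs bit for bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef float (*rsqrt_fn)(float);

static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Two example candidates: the commonly quoted 'magic' estimate with zero
   and with one Newton-Raphson step. Swap in whichever published variants
   you want to try. */
static float candidate_nr0(float x)
{
    return bits_to_float(0x5f3759dfu - (float_to_bits(x) >> 1));
}

static float candidate_nr1(float x)
{
    float y = candidate_nr0(x);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Count the inputs whose bit patterns lie in [first_bits, last_bits] for
   which candidate and reference differ in at least one output bit.
   If first_bad is non-NULL, it receives the first mismatching input. */
static size_t count_mismatches(rsqrt_fn candidate, rsqrt_fn reference,
                               uint32_t first_bits, uint32_t last_bits,
                               uint32_t *first_bad)
{
    size_t mismatches = 0;

    /* 64-bit loop counter so that last_bits == UINT32_MAX cannot wrap. */
    for (uint64_t b = first_bits; b <= last_bits; ++b) {
        float x = bits_to_float((uint32_t)b);
        if (float_to_bits(candidate(x)) != float_to_bits(reference(x))) {
            if (mismatches == 0 && first_bad != NULL)
                *first_bad = (uint32_t)b;
            ++mismatches;
        }
    }
    return mismatches;
}
```

With a wrapper for the real instruction passed as `reference`, a result of zero over all positive normal floats (bit patterns `0x00800000` to `0x7f7fffff`) would strongly suggest that candidate matches the hardware; a non-zero count plus the first failing input gives you something concrete to investigate.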