Ian Cook

My starting point here wouldn't be to look at generic emulation approaches, as these themselves only produce approximations to the function being emulated. (If an approximation is good enough, then why not use the exact reciprocal square root result as an approximation to the approximate function?)
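One way to judge whether the exact result is "good enough" is to count how many representable floats lie between it and the value the hardware returns. The following is a minimal sketch, assuming 32-bit IEEE 754 `float` and finite inputs; the helper names `ordered_bits` and `ulp_distance` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto a signed integer scale that is
   monotonic in the float's value (finite values only). */
static int64_t ordered_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                 /* reinterpret without aliasing issues */
    return (u & 0x80000000u) ? -(int64_t)(u & 0x7fffffffu)
                             : (int64_t)u;
}

/* Number of representable float steps between a and b. */
static uint64_t ulp_distance(float a, float b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return (uint64_t)(d < 0 ? -d : d);
}
```

A distance of 0 means the two values are bit-identical (apart from +0 versus -0); anything larger tells you how far a substitute is from the hardware result.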

Instead, here you know the instruction returns an approximation to a known mathematical calculation.

Approximate reciprocal square-root operations were developed some time ago. Initial versions were based on a 'magic' first step followed by one or more Newton-Raphson iterations.
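For reference, the well-known software form of that idea looks roughly like the sketch below. It uses the widely published constant `0x5f3759df` and assumes 32-bit IEEE 754 `float` with positive, finite, normal inputs. The function name `rsqrt_magic` is my own, and I am not claiming any particular hardware works this way.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) using the 'magic' integer estimate followed by
   the requested number of Newton-Raphson refinement steps. */
static float rsqrt_magic(float x, int iterations)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);        /* reinterpret the float as an integer */
    bits = 0x5f3759dfu - (bits >> 1);      /* 'magic' first step: initial estimate */
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < iterations; ++i) {
        /* Newton-Raphson step for f(y) = 1/(y*y) - x */
        y = y * (1.5f - 0.5f * x * y * y);
    }
    return y;
}
```

Varying `iterations` (and the constant) gives you a small family of candidate implementations to compare against the hardware.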

Since then, similar instructions have appeared in several hardware architectures (Intel and AMD x86-64, ARM NEON, etc.). The exact implementation of these is not generally documented, presumably because you seldom need to reproduce bit-accurate versions of approximate calculations. (The SX-Aurora document you linked to says that its implementation is system dependent. This, in theory, could mean that different versions or steppings of the same hardware could produce different results.)

There is, however, at least one documented hardware implementation: in 2015 Intel provided C reference implementations of the AVX-512 versions of their approximation instructions. These can be found here.

The first thing I'd try would be to implement a couple of the obvious or published implementations and see if they match the results you get from the hardware.
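A comparison harness for this can be very small. The sketch below assumes you have already recorded pairs of inputs and outputs from the real hardware into two arrays; `count_mismatches`, `rsqrt_candidate` and `candidate_magic_nr1` are hypothetical names, and the candidate shown is just the classic magic-constant version with one Newton-Raphson step, standing in for whichever implementations you want to test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef float (*rsqrt_candidate)(float);

static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Example candidate (an assumption, not the hardware algorithm):
   magic initial estimate plus one Newton-Raphson step. */
static float candidate_magic_nr1(float x)
{
    uint32_t bits = 0x5f3759dfu - (float_bits(x) >> 1);
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Count inputs for which the candidate's result differs in any bit
   from the value observed on the hardware. */
static size_t count_mismatches(rsqrt_candidate candidate,
                               const float *inputs,
                               const float *observed,
                               size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (float_bits(candidate(inputs[i])) != float_bits(observed[i])) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A candidate that returns zero mismatches over a large and varied input set (including denormals, zeros, infinities and NaNs, if you care about those) is a plausible match; a single mismatch rules it out as a bit-accurate model.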
