Added reference to article
Dunk

I don't have a definitive answer because it would take a bit of analysis, and it also depends on the CPU, the operand width in bits, etc. But here is a ballpark number.

FPGAs have been built that will do a divide operation in 1 clock cycle. A general-purpose CPU/ALU can take 20 to 100 clock cycles. DSP processors will likely take fewer clock cycles. So, counting clock cycles only, we can say an FPGA is up to roughly 100x faster.

That's just the divide operation (which gives you the modulo also).
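To illustrate why the remainder comes for free, here is a minimal sketch of unsigned shift-and-subtract (restoring) division, a classic hardware-style algorithm. It is only an illustration: I'm not claiming any particular CPU or FPGA divider works exactly this way, and the function name and the `width` parameter are my own choices. A sequential implementation would spend one clock per loop iteration, while an FPGA can unroll or pipeline the loop in logic.

```python
from typing import Tuple


def restoring_divide(dividend: int, divisor: int, width: int = 32) -> Tuple[int, int]:
    """Unsigned shift-and-subtract division.

    Returns (quotient, remainder). The function name and the `width`
    parameter are illustrative assumptions, not taken from any real core.
    """
    if divisor <= 0:
        raise ValueError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend must fit in `width` unsigned bits")

    quotient = 0
    remainder = 0
    # One iteration per dividend bit, most significant bit first.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    # Whatever is left in the partial remainder is the modulo.
    return quotient, remainder
```

When the loop finishes, the partial remainder register already holds the modulo, so no extra work is needed to get it.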

You would still have to account for memory transfers. Would a dedicated FPGA need to do this? A general-purpose CPU would definitely need to do it. The FPGA would be all the faster if it could eliminate most of its memory transfers. At this point a 1000x improvement seems realistic.

However, I didn't read the algorithm, but I'm guessing there are some table updates/lookups. These would probably require memory transfers to/from the FPGA, which would slow the FPGA processing down to something closer to the CPU's. Even if the FPGA has dedicated extra-fast memory, this would still cause that 1000x guesstimate to drop somewhat.

Anyways, my estimate would be a 100x-1000x speed improvement with an FPGA in the best of circumstances.
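To get a feel for how much the memory transfers can eat into the guesstimate, here is a rough Amdahl's-law-style sketch. The model and its parameter names are my assumptions for illustration, not measurements: `compute_speedup` is how much faster the FPGA does the arithmetic, and `transfer_fraction` is the share of the original CPU run time spent on memory transfers that the FPGA does not speed up.

```python
def estimated_speedup(compute_speedup: float, transfer_fraction: float) -> float:
    """Overall speedup when only the compute part gets faster.

    Hypothetical back-of-the-envelope model (Amdahl's law):
    `transfer_fraction` of the original run time is unaccelerated memory
    traffic, the rest is compute that runs `compute_speedup` times faster.
    """
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    if not 0.0 <= transfer_fraction <= 1.0:
        raise ValueError("transfer_fraction must be between 0 and 1")

    compute_fraction = 1.0 - transfer_fraction
    # New run time, relative to an original run time of 1.0.
    new_time = transfer_fraction + compute_fraction / compute_speedup
    return 1.0 / new_time
```

Under this model, `estimated_speedup(1000, 0.0)` gives the full 1000x, while `estimated_speedup(1000, 0.01)` gives only about 91x. So even a small amount of transfer time that can't be eliminated pulls the number down quickly, which is why I give a range rather than a single figure.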

UPDATE

I came across this article http://research.microsoft.com/apps/pubs/default.aspx?id=70636 (a little dated, from 2008, but probably still applicable). It is called "Where's the Beef? Why FPGAs are so fast". They did various timing comparisons, including 128-bit AES encryption. For the AES encryption they were able to get about a 4000x speedup going from the simplistic software implementation to a highly optimized FPGA one. That said, I think there were also software optimizations that could have been made that would reduce that 4000x number.

I don't know whether state-of-the-art FPGAs/custom hardware have improved dramatically relative to today's state-of-the-art CPUs, so I don't know how applicable this info still is. But I still think 100x to 1000x is a good estimate, since I would assume the software would be optimized.
