0

This question is kind of long-detailed and took a lot of my time; hope you read it till end; thanks for your patience

I've find out that size of data in some programming languages like python isn't really matter, and they can store very big values like huge numbers; and then increase them to something even bigger.
I work with c and I know in that language, variables must have a particular type and size. To store huge integers (which is very bigger than the largest type in c), I know 2 different ways:

First method: Using char pointer and store each digit at particular index

  • First value of pointer can tell us the sign.
  • Pointer can stop at particular value (like \0 in strings but now we use $).

to store 123456789 pointer would be like this:

++++++++++++++++++++++++++++++++++++++++++++ | + | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9| $ | ++++++++++++++++++++++++++++++++++++++++++++ ^ ^ | | sign finisher 

Second method: Using unsigned int pointer and store numbers like computers store binaries (I mean store them based on size of int)

  • First value of pointer can tell us the sign
  • Second value of pointer can tell us the length

to store 123456789 pointer would be like this:

// 123456789 MOD 2^32 = 123456789 +++++++++++++++++++++++++++++++++++ | + | 1 |123456789| +++++++++++++++++++++++++++++++++++ ^ ^ ^ | | | sign length value 

and to store something bigger like 9998877665544332211 the pointer would be as in below:

// 9998877665544332211 / 2^32 = 2328045122 // 9998877665544332211 MOD 2^32 = 2942002099 +++++++++++++++++++++++++++++++++++++++++++++++ | + | 2 |2942002099|2328045122| +++++++++++++++++++++++++++++++++++++++++++++++ ^ ^ ^ ^ | | | | sign length +---------+ | values 

For simplification I don't use only first value of pointer to keep both sign and size.

issues:

  • If I use First method, it will waste a lot of memory (e.g. for 18-digit First method needs 20 bytes and Second method needs 16 bytes; the difference will be greater in bigger values).

  • Second method needs more calculation if we want print it as decimal.

Questions:
Which method you suggest me to use?
Which method is more similar to that one languages like python use?
Is there any better method?

PS: Yeah, using struct can make it easier, but that's not my point; I just wanna know storing each digit separately is better than storing number based on int's size or not. Please don't refer me to external libraries like GMP. I wanna know what such libraries internally do.

11
  • 1
    Obviously if you are going to do arithmetic, storing one digit per element is going to be a lot less efficient. You can see that from the trivial example 1234 * 5678 when storing 1 digit per element will require 16 multiplications and more additions with the basic 'longhand' method, but a larger radix will only require 1 operation. Commented Aug 5, 2022 at 18:20
  • 1
    There are quite a few 'big number' libraries. However, asking about them is officially off-topic on SO. GCC uses GNU MultiPrecision (GMP), and MPC and MPFR, for example. You can perform searches for similar names to find your own options. Commented Aug 5, 2022 at 18:30
  • 1
    How long is a piece of string? It depends on how much and what computation you need to do. On the one hand a power-of-2 radix can ease the computation, but a compromise is to use 10^9 radix (with 32-bit elements) making the conversion to and from a decimal string quite easy. Commented Aug 5, 2022 at 18:31
  • I wanna know what such libraries internally do. Set off and write one! Commented Aug 5, 2022 at 18:35
  • Storing as individual decimal digits is pretty much the worst of both worlds. Yes, it's easy to input and print out as decimal, but it's hugely inefficient in both space and time. If you store an array of bigger integers, as you've shown, the question is, what's the "base"? You showed doing things mod 2^32, which makes for maximally efficient storage and computation, but difficulty converting to/from base 10. But you could also store things mod 10^9, which is just about as efficient in space and speed, and also easy to convert to/from base 10. Commented Aug 5, 2022 at 18:37

2 Answers 2

3

Neither is ideal. For both storage and speed the best way (in a generic context) is having your number binary encoded in 2s complement in an array of bits grouped by the largest data size of the underlying architecture (e.g. unsigned long long for x64 or unsigned int for x86).

more calculation if we want print it as decimal.

Unless you have a very specific use case, most of the time is spent in calculations between bignums. To and from decimal conversion is usually done only on IO which is slow anyway so it doesn't matter that much. Slower conversion to and from decimal is an acceptable tradeoff in favor of bignum calculation performance.

I definitely don't recommend building your own bignum library except for academic purposes (which can be an excellent exercise if you are into this). Beyond difficulty of implementation, making it fast requires deep knowledge and vast experience in system architectures, operating systems and compilers. Just use an existing proven library like bignum, tiny-bignum, GMP etc.

wanna know what such libraries internally do.

Most of them are open source so ... go look around!

Sign up to request clarification or add additional context in comments.

2 Comments

Absolutely right on the difficulties — but I would add that it is an excellent academic exercise! (I've got a homebrew bignum library I play with, and while it's definitely a gratuitous wheel reinvention and will never come anywhere close to the performance of GMP, it's tons of fun and I've learned a lot from writing it. 121883461705416278634058248038432770474079746486049)
@SteveSummit absolutely 👍
2

What you're looking for is arbitrary precision arithmetic or "bignum". It rapidly gets complicated to do it efficiently and is way too much to explain in an answer here.

Is there any better method?

Yes, use an existing library.

Which method is more similar to that one languages like python use?

libmpedc is what Python uses. Ruby will use GMP if available.

As an example, this is the decimal struct from libmpedc.

#include <mpdecimal.h> typedef struct mpd_t { uint8_t flags; // [memory flags] | [specials] | sign mpd_ssize_t exp; // exponent mpd_ssize_t digits; // number of decimal digits mpd_ssize_t len; // number of words in use mpd_ssize_t alloc; // number of allocated words mpd_uint_t *data; // words } mpd_t; 

It's using unsigned integers, but also a lot of other tricks.

I would suggest poking around in their source code to learn more. Particularly mpd_set_string (creating a decimal from a string) and the basic arithmetic functions.

1 Comment

That's what Python's decimal module uses. Python's native integers and rationals use an idiosyncratic bignum implementation based on 30- or 15-bit binary units (depending on the platform). There's also a Python binding for GMP.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.