Decide a C integer literal

Question

Objective

Given an ASCII string, decide whether it is a valid C integer literal.

C integer literal

A C integer literal consists of:

One of:
- 0 followed by zero or more octal digits (0–7)
- A nonzero decimal digit followed by zero or more decimal digits (0–9)
- 0X or 0x, followed by one or more hexadecimal digits (0–9, A–F, and a–f)
optionally followed by one of:
- One of U or u, which are the "unsigned" suffixes
- One of L, l, LL, or ll, which are the "long" and "long long" suffixes
- Any combination of the above, in any order.

Note that there can be arbitrarily many digits, even though C doesn't support arbitrary-length integers. Likewise, a literal whose value would overflow the type named by its suffix (long, for example) is still considered a valid literal.

Also note that there must not be a leading plus or minus sign, for it is not considered to be a part of the literal.
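The grammar above can be checked without any regex. Here is a minimal Python sketch of one way to do it; it is not the challenge's reference solution, and the name is_c_int_literal is made up for illustration.

```python
OCT = set("01234567")
DEC = set("0123456789")
HEX = set("0123456789abcdefABCDEF")


def is_c_int_literal(s: str) -> bool:
    """Hypothetical helper: True if s follows the literal grammar above."""
    # Pick the digit set from the prefix.
    if s[:2] in ("0x", "0X"):
        digits, i, need = HEX, 2, 1   # at least one hex digit is required
    elif s[:1] == "0":
        digits, i, need = OCT, 1, 0   # "0" alone is a valid octal literal
    elif s[:1] in DEC:
        digits, i, need = DEC, 1, 0   # the first digit is 1-9 here
    else:
        return False                  # empty, signed, or not numeric at all

    start = i
    while i < len(s) and s[i] in digits:
        i += 1
    if i - start < need:
        return False

    # Suffix: at most one unsigned part and one long part, in either order.
    suffix = s[i:]
    seen_u = seen_l = False
    j = 0
    while j < len(suffix):
        c = suffix[j]
        if c in "uU" and not seen_u:
            seen_u = True
            j += 1
        elif c in "lL" and not seen_l:
            seen_l = True
            # "ll"/"LL" must repeat the same case; "Ll" and "lL" are invalid.
            j += 2 if suffix[j + 1:j + 2] == c else 1
        else:
            return False
    return True
```

The suffix loop is the only subtle part: a second l is consumed together with the first only when it has the same case, so 42Ll falls through to the else branch on its second character.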

Rules

Whether leading or trailing whitespace is accepted is implementation-defined.
Behavior on non-ASCII input is a don't-care.

Examples

Truthy

0
007
42u
42lu
42UL
19827489765981697847893769837689346573uLL (Digits can be arbitrarily many even if it wouldn't fit the unsigned long long type)
0x8f6aa032838467beee3939428l (The same applies to the long type)
0XCa0 (You can mix cases)

Falsy

08 (Non-octal digit)
0x (A digit must follow X or x)
-42 (A leading sign isn't a part of the literal)
42Ll (Only LL or ll is valid for the long long type)
42LLLL (Redundant type specifier)
42Uu (Redundant type specifier)
42Ulu (Redundant type specifier)
42lul (Redundant type specifier)
42H (Invalid type specifier)
0b1110010000100100001 (Valid C++, but not valid C)
Hello
Empty string

Ungolfed solution

Haskell

Doesn't recognize leading or trailing whitespaces.

Returns () on success. Monadic failure otherwise.

import Text.ParserCombinators.ReadP

decideCIntegerLit :: ReadP ()
decideCIntegerLit = do
    choice [
        do
            '0' <- get
            munch (flip elem "01234567"),
        do
            satisfy (flip elem "123456789")
            munch (flip elem "0123456789"),
        do
            '0' <- get
            satisfy (flip elem "Xx")
            munch1 (flip elem "0123456789ABCDEFabcdef")
        ]
    let unsigned = satisfy (flip elem "Uu")
    let long = string "l" +++ string "L" +++ string "ll" +++ string "LL"
    (unsigned >> long >> return ()) +++ (optional long >> optional unsigned)
    eof

Suggested falsey test cases: 1L1L, 0xabucdlu (or any other test case with an l/L/u somewhere in the middle, making it invalid).
– Kevin Cruijssen, Commented Oct 14, 2020 at 8:33
Suggested test-case: 2-1 (starts with a digit and is a valid C constant-expression, but not a bare integer literal). So for example feeding a=2-1; or a[2-1]; to a C compiler wouldn't reject it. (Working on a bash answer that uses cc -c after testing the first digit, trying to let a compiler do the heavy lifting.)
– Peter Cordes, Commented Oct 14, 2020 at 12:56
Suggested test case: 0o765. This is a valid octal literal in many languages that might try to get away with a built-in "eval" / "read-int" sort of approach, but it's not valid C.
– lynn, Commented Oct 14, 2020 at 16:21
This challenge looks outdated so quickly. Modern C supports 0b and 0o literal prefixes now.
– Explorer09, Commented Sep 5 at 8:29

Neil · Accepted Answer · 2020-10-14 10:31:47Z

Retina 0.8.2, 60 59 bytes

i`^(0[0-7]*|0x[\da-f]+|[1-9]\d*)(u)?(l)?(?-i:\3?)(?(2)|u?)$

Try it online! Link includes test cases. Edit: Saved 1 byte thanks to @FryAmTheEggMan. Explanation:

i`

Match case-insensitively.

^(0[0-7]*|0x[\da-f]+|[1-9]\d*)

Start with either octal, hex or decimal.

(u)?

Optional unsigned specifier.

(l)?

Optional length specifier.

(?-i:\3?)

Optionally repeat the length specifier case sensitively.

(?(2)|u?)$

If no unsigned specifier has matched yet, allow an optional one here, before the end of the literal.
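For readers without Retina at hand, the same pattern can be tried in Python, whose re module also supports scoped flags and conditional groups. This is a rough port, not part of the answer; the names C_INT and is_c_int_literal are illustrative, and re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# Port of the Retina pattern; group numbers match the explanation above.
C_INT = re.compile(
    r"(0[0-7]*|0x[\da-f]+|[1-9]\d*)"  # group 1: octal, hex, or decimal digits
    r"(u)?"                            # group 2: optional unsigned specifier
    r"(l)?"                            # group 3: optional length specifier
    r"(?-i:\3?)"                       # repeat group 3 case-sensitively (ll or LL)
    r"(?(2)|u?)",                      # if group 2 did not match, allow u here
    re.IGNORECASE | re.ASCII,
)


def is_c_int_literal(s: str) -> bool:
    """Hypothetical wrapper: True if the whole string matches the pattern."""
    return C_INT.fullmatch(s) is not None
```

The backreference inside (?-i:...) is what rejects 42Ll: the second l has to equal the captured one exactly, so only ll and LL get through.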

You can use \d in the hex character class, too.
– FryAmTheEggman, Commented Oct 14, 2020 at 0:22

skytomo · Accepted Answer · 2020-10-16 23:22:27Z

C# (.NET Core), 197 191 bytes

@nwellnhof shaved 6 bytes:

using c=System.Console;class P{static void Main(){c.WriteLine(System.Text.RegularExpressions.Regex.IsMatch(c.ReadLine(),@"^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$"));}}

Original:

using c=System.Console;using System.Text.RegularExpressions;class P{static void Main(){c.WriteLine(Regex.IsMatch(c.ReadLine(),@"^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$"));}}

Try it online!

Since Regex is used only once, you can write System.Text.RegularExpressions.Regex and remove the using statement, saving 6 bytes.
– nwellnhof, Commented Oct 16, 2020 at 13:38

Xcali · Accepted Answer · 2020-10-14 15:40:34Z


Perl 5 `-p`, 65 61 bytes

@NahuelFouilleul shaved 4 bytes

$_=/^(0[0-7]*|0x\p{Hex}+|[1-9]\d*)(u?l?l?|l?l?u?)$/i*!/lL|Ll/

Try it online!

edited Oct 14, 2020 at 15:40

answered Oct 14, 2020 at 5:01


could save 2 bytes using l?l? instead of l{0,2}
– Nahuel Fouilleul, Commented Oct 14, 2020 at 9:24
61 bytes
– Nahuel Fouilleul, Commented Oct 14, 2020 at 9:50

Kevin Cruijssen · Accepted Answer · 2020-10-15 07:10:40Z

Java 8 / Scala polyglot, 89 79 bytes

s->s.matches("(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\\d*|0x[\\da-f]+)(u?l?l?|l?l?u?)")

-10 bytes thanks to @NahuelFouilleul

Try it online in Java 8.
Try it online in Scala (except with => instead of -> - thanks to @TomerShetah).

Explanation:

s->                // Method with String parameter and boolean return-type
  s.matches(       //  Check whether the input-string matches the regex
    "(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\\d*|0x[\\da-f]+)(u?l?l?|l?l?u?)")

Regex explanation:

In Java, the String#matches method implicitly adds a leading and trailing ^...$ to match the entire string, so the regex is:

^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$

 (?!         )     # The string should NOT match:
^   .*             #  Any amount of leading characters
      (     )      #  Followed by:
       Ll          #   "Ll"
         |lL       #   Or "lL"
                   # (Since the `?!` is a negative lookahead, it acts loose from the
                   #  rest of the regex below)
 (?i)              # Using case-insensitivity,
^    (             # the string should start with:
       0           #   A 0
        [0-7]*     #   Followed by zero or more digits in the range [0,7]
      |            #  OR:
       [1-9]       #   A digit in the range [1,9]
            \d*    #   Followed by zero or more digits
      |            #  OR:
       0x          #   A "0x"
         [    ]+   #   Followed by one or more of:
          \d       #    Digits
            a-f    #    Or letters in the range ['a','f']
     )(            # And with nothing in between,
              )$   # the string should end with:
        u?         #   An optional "u"
          l?l?     #   Followed by no, one, or two "l"
       |           #  OR:
        l?l?       #   No, one, or two "l"
            u?     #   Followed by an optional "u"
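The lookahead-then-ignore-case idea carries over to other regex engines with small changes. As a sketch in Python (not part of the answer): Python 3.11 and later reject a global (?i) in the middle of a pattern, so the case-insensitive part is wrapped in a scoped (?i:...) group instead. The names C_INT and is_c_int_literal are illustrative.

```python
import re

C_INT = re.compile(
    r"(?!.*(?:Ll|lL))"                        # no mixed-case "Ll"/"lL" anywhere
    r"(?i:"                                   # the rest ignores case
    r"(?:0[0-7]*|[1-9][0-9]*|0x[0-9a-f]+)"    # octal, decimal, or hex digits
    r"(?:u?l?l?|l?l?u?)"                      # u before or after up to two l's
    r")"
)


def is_c_int_literal(s: str) -> bool:
    """Hypothetical wrapper; fullmatch plays the role of String#matches."""
    return C_INT.fullmatch(s) is not None
```

Because the lookahead runs before case-insensitivity is switched on, it still sees the original casing, which is exactly what lets the suffix part stay as short as u?l?l?|l?l?u?.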

@NahuelFouilleul Ah, smart way to use the case-insensitivity after we've checked the Ll/lL. Didn't even know that was possible. Thanks!
– Kevin Cruijssen, Commented Oct 14, 2020 at 10:20
@TomerShetah Thanks for mentioning. I've added it as a polyglot. :)
– Kevin Cruijssen, Commented Oct 14, 2020 at 11:02
It's also a Java/Kotlin polyglot, since Kotlin also uses a -> and Scala uses =>
– user, Commented Oct 14, 2020 at 12:38

hyperneutrino · Accepted Answer · 2020-10-14 19:55:39Z

Python 3, 103 bytes

import re;re.compile("^(0[0-7]*|[1-9]\d*|0[xX][\dA-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$").match

Try it online!

just a basic regex, probably very suboptimal

returns a match object for truthy and None for falsy; input may not contain surrounding whitespace

-3 bytes thanks to Digital Trauma (on my Retina answer)
-1 byte thanks to FryAmTheEggman (on my Retina answer)
-3 bytes thanks to pxeger

This is why regexes are so fun.
– Dannyu NDos, Commented Oct 14, 2020 at 0:02
103 bytes
– pxeger, Commented Oct 14, 2020 at 19:25
@pxeger Oh cool, thanks!
– hyperneutrino ♦, Commented Oct 14, 2020 at 19:55

hyperneutrino · Accepted Answer · 2020-10-14 00:20:42Z


Retina 0.8.2, 73 bytes

^(0[0-7]*|[1-9]\d*|0[xX][\dA-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$

Try it online!

Just the same regex I used. First time using Retina, I'm sure this can be optimized with some Retina golf things!

-3 bytes thanks to Digital Trauma
-1 byte thanks to FryAmTheEggman

edited Oct 14, 2020 at 0:20

answered Oct 14, 2020 at 0:04


Also never used Retina, but 55 bytes?
– caird coinheringaahing ♦, Commented Oct 14, 2020 at 0:07
@cairdcoinheringaahing I thought of that; unfortunately, no. but thanks for trying :P
– hyperneutrino ♦, Commented Oct 14, 2020 at 0:07

a stone arachnid · Accepted Answer · 2025-09-05 17:24:05Z

C (clang), 207 200 bytes

u;l;o;f(char*s){l=*s-48?12:(32|*++s)=='x'?s++,0:14;if(l<14&&!*s)return 0;s+=strspn(s,"ABCDEFabcdef9876543210"+l);for(u=l=0;o=*s&~32;s++)if(o==76&&!l++)s[1]==*s&&s++;else if(u++||o-85)break;return!*s;}

Try it online!

A C solution for a C challenge.

Takes input as a pointer to a string, and returns 1/0 for truthy/falsy.

Explained:

#include <string.h>

int u, l, o;
int f(char*s){
    /* assign offset for strspn */
    l = (*s-48)                 /* if the first character is not '0': */
        ? 12                    /* use decimal "9876543210" */
        : (32|*++s) == 'x'      /* else if the first two characters are "0x"/"0X": */
            ? s++, 0            /* use all characters (l = 0) */
            : 14;               /* else use octal */

    /* handle the case when the string is empty.
     * This is also the case when a "0" is input, because of the ++s above
     * If this is the case, we are using the octal offset, so check that as well */
    if(l < 14 && !*s) return 0;

    /* skip past all the digits that we consider valid */
    s += strspn(s, "ABCDEFabcdef9876543210" + l);

    /* handle the U and L part */
    /* u and l are used to count if we have a u or an l yet */
    /* o contains the current character, uppercased */
    for(u=l=0; o=*s & ~32; s++)
        if(o == 76 && !l++)     /* if it's an L, and we don't already have an L: */
            s[1] == *s && s++;  /* if the next character is exactly the same (either ll or LL),
                                   then advance the string so it doesn't get counted as 2 l's */
        else if(o-85||u)        /* else, if we're not a U, or we already have a U, then it's a fail */
            break;
        else u++;               /* otherwise it's a U */

    /* if, after removing the digits and counting the L's and U's, the string is empty, *s will be 0 */
    /* if it failed, then *s isn't zero */
    return !*s;
}

190 bytes
– jdt, Commented Sep 5 at 18:13
185 bytes
– ceilingcat, Commented Sep 7 at 22:11

Toby Speight · Accepted Answer · 2025-09-09 12:42:57Z

C, 121 bytes

u;f(char*s){char*p;strtol(s,&p,0);p+=u=(*p|32)==117;p+=*p==p[1];p+=(*p|32)==108;p+=!u*(*p|32)==117;return(*s^48)<10*!*p;}

Since input is encoded in ASCII, we use numeric codepoints so that this works regardless of compilation character set.

How it works:

We use library function strtol() to set p to end of numeric part, then advance it past u and l suffixes and check that it's the end of string. Also test that the string begins with a digit (since strtol() will consume leading whitespace and sign).

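#include <ctype.h>
#include <stdlib.h>

int f(const char *s)
{
    const char *p;
    (void)strtol(s, (char **)&p, 0);   /* p now points just past the numeric part */
    int u = 0;
    if (tolower(*p) == 'u') ++u,++p;
    if (*p == *(p+1)) ++p;             /* same-case pair such as ll or LL */
    if (tolower(*p) == 'l') ++p;
    if (!u && tolower(*p) == 'u') ++p;
    return *p == '\0' && *s >= '0' && *s <= '9';
}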

Try it online!

It looks like using wcstol instead of wcstoul works for the given examples.
– jdt, Commented Sep 5 at 18:48
@jdt, that's not portable (compilation error when I tried it in GCC), so probably needs entering as a different language.
– Toby Speight, Commented Sep 6 at 6:50

Neil · Accepted Answer · 2020-10-14 11:15:25Z

Charcoal, 76 bytes

≔⊟Φ³¬⌕↧θ…0xιη≔✂↧θη⁻ＬθＬ⊟Φ⪪”{“↧←；⭆δa”¶⁼ι↧…⮌θＬι¹ζ›∧⁺Ｌζ¬⊖η⬤ζ№Ｅ∨×⁸ηχ⍘λφι∨№θLl№θlL

Try it online! Link is to verbose version of code. Explanation:

≔⊟Φ³¬⌕↧θ…0xιη

Find the length of the longest prefix of 0x in the lowercased input.

≔✂↧θη⁻ＬθＬ⊟Φ⪪”{“↧←；⭆δa”¶⁼ι↧…⮌θＬι¹ζ

Slice off the prefix and also check for a lowercase suffix of ull, ul, llu or lu, and if so then slice that off as well.

›...∨№θLl№θlL

The original input must not contain Ll or lL.

∧⁺Ｌζ¬⊖η

The sliced string must not be empty unless the prefix was 0.

⬤ζ№Ｅ∨×⁸ηχ⍘λφι

Convert the prefix length to 10, 8 or 16 appropriately, then take that many base 62 digits and check that all of the remaining lowercased characters are one of those digits.
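For anyone who doesn't read Charcoal, the steps above can be approximated in Python. This is only a sketch of the same idea, not a translation of the answer: the SUFFIXES table lists every valid lowercase suffix, which is an assumption (the explanation names only some of them), and is_c_int_literal is a made-up name.

```python
import string

# Assumed table of lowercase suffixes, longest first; "" always matches.
SUFFIXES = ["ull", "llu", "ul", "lu", "ll", "u", "l", ""]
BASE_FOR_PREFIX = {0: 10, 1: 8, 2: 16}   # prefix length of "0x" -> base


def is_c_int_literal(s: str) -> bool:
    """Hypothetical helper following the prefix-length approach above."""
    low = s.lower()
    # Length of the longest prefix of "0x" that starts the input: 0, 1, or 2.
    plen = max(n for n in range(3) if low.startswith("0x"[:n]))
    # Slice off that prefix and the longest matching suffix.
    suffix = next(t for t in SUFFIXES if low.endswith(t))
    body = low[plen:len(low) - len(suffix)]
    # The sliced string must not be empty unless the prefix was a bare "0".
    if not body and plen != 1:
        return False
    # Take the first 10, 8, or 16 characters of 0-9a-z as the allowed digits.
    digits = (string.digits + string.ascii_lowercase)[:BASE_FOR_PREFIX[plen]]
    if not all(c in digits for c in body):
        return False
    # The original input must not contain Ll or lL.
    return "Ll" not in s and "lL" not in s
```

Lowercasing first keeps the digit and suffix checks simple; the mixed-case check at the end is then the only place where the original casing matters.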

Kevin Cruijssen · Accepted Answer · 2020-10-14 14:21:09Z

05AB1E, 63 61 62 bytes

„Uuõª„LLæDl«âDí«JéRʒÅ¿}нõ.;Ðć_ilDć'xQiA6£мÐþQë\7ÝKõQë\þQ}sõÊ*

This isn't too easy without regexes.. :/ Can definitely be golfed a bit more, though.

+1 byte as bug-fix for inputs like "u", "l", "LL", etc. (thanks for noticing @Neil)

Try it online or verify all test cases.

Explanation:

„Uu      # Push string "Uu"
õª       # Convert it to a list of characters, and append an empty string:
         #  ["U","u",""]
„LL      # Push string "LL"
æ        # Take its powerset: ["","L","L","LL"]
Dl       # Create a lowercase copy: ["","l","l","ll"]
«        # Merge the lists together: ["","L","L","LL","","l","l","ll"]
â        # Create all possible pairs of these two lists
Dí       # Create a copy with each pair reversed
«        # Merge the list of pairs together
J        # Join each pair together to a single string
éR       # Sort it by length in descending order

We now have the list:

["llu","LLu","llU","LLU","ull","uLL","Ull","ULL","ll","LL","lu","lu","Lu","Lu","lU","lU","LU","LU","ll","LL","ul","ul","uL","uL","Ul","Ul","UL","UL","l","l","L","L","u","u","U","U","l","l","L","L","u","u","U","U","","","",""]

ʒ        # Filter this list by:
 Å¿      #  Where the (implicit) input ends with this string
}н       # After the filter: only leave the first (longest) one
õ.;      # And remove the first occurrence of this in the (implicit) input
ÐD       # Triplicate + duplicate (so there are 4 copies on the stack now)
ć        # Extract head; pop and push remainder-string and first character
         # separated to the stack
_i       # If this first character is a 0:
 l       #  Convert the remainder-string to lowercase
 D       #  Duplicate it †¹
 ć       #  Extract head again
 'xQi   '#  If it's equal to "x":
  A      #   Push the lowercase alphabet
  6£     #   Only leave the first 6 characters: "abcdef"
  м      #   Remove all those characters from the string
  Ð      #   Triplicate it †²
  þ      #   Only keep all digits in the copy
  Q      #   And check that the two are still the same
         #   (thus it's a non-negative integer without decimal .0s)
 ë       #  Else:
  \      #   Discard the remainder-string
  7Ý     #   Push list [0,1,2,3,4,5,6,7]
  K      #   Remove all those digits
  õQ     #   Check what remains is an empty string
ë        # Else:
 \       #  Discard the remainder-string
 þ       #  Only keep all digits
 Q       #  And check that the two are still the same
         #  (thus it's a non-negative integer without decimal .0s)
}s       # After the if-else: Swap the two values on the stack
         # (this will get the remaining copy of †² for "0x" cases,
         #  or the remaining copy of †¹ for other cases)
õÊ       # Check that this is NOT an empty string
*        # And check that both are truthy
         # (after which the result is output implicitly)
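The overall shape of this approach (enumerate every suffix, strip the longest one, then check the digits by prefix) is easier to see in a conventional language. Below is a loose Python sketch of that shape, not a translation of the 05AB1E program; U_PARTS, L_PARTS, SUFFIXES and is_c_int_literal are illustrative names.

```python
from itertools import product

U_PARTS = ["U", "u", ""]
L_PARTS = ["", "L", "LL", "l", "ll"]

# Every valid suffix: a u-part and an l-part joined in either order,
# de-duplicated and sorted longest first.
SUFFIXES = sorted(
    {a + b for u, l in product(U_PARTS, L_PARTS) for a, b in ((u, l), (l, u))},
    key=len,
    reverse=True,
)


def is_c_int_literal(s: str) -> bool:
    """Hypothetical helper following the strip-the-suffix approach above."""
    # Strip the longest suffix the input ends with ("" always matches).
    suffix = next(t for t in SUFFIXES if s.endswith(t))
    body = s[:len(s) - len(suffix)]
    if body[:1] != "0":
        # Decimal: non-empty and digits only (the first one cannot be 0 here).
        return body != "" and all(c in "0123456789" for c in body)
    rest = body[1:].lower()
    if rest[:1] == "x":
        hex_digits = rest[1:]
        return hex_digits != "" and all(c in "0123456789abcdef" for c in hex_digits)
    # Octal: zero or more digits 0-7 after the leading 0.
    return all(c in "01234567" for c in rest)
```

Since Ll and lL never appear in the generated list, mixed-case suffixes need no separate check: only the trailing l gets stripped and the leftover L then fails the digit test.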

This incorrectly outputs 1 for u...
– Neil, Commented Oct 14, 2020 at 14:15
@Neil Thanks for noticing. Fixed at the cost of 1 byte.
– Kevin Cruijssen, Commented Oct 14, 2020 at 14:21

Noodle9 · Accepted Answer · 2020-10-14 15:31:31Z

AWK, 86 bytes

{print/^(0[0-7]*|[1-9][0-9]*|0[xX][0-9A-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$/}

Try it online!

Simply prints truthy or falsey depending on whether or not the input line matches the regex. Doesn't accept leading or trailing whitespaces.

pxeger · Accepted Answer · 2020-10-14 20:08:01Z


Elixir, 74 bytes

&(&1=~~r/^(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)?$/i&&!(&1=~~r/Ll/))

Try it online!


Arnauld · Accepted Answer · 2021-03-19 14:02:05Z

JavaScript (ES6), 77 76 bytes

Saved 1 byte thanks to @l4m2

s=>/^(0x[\da-f]+|0[0-7]*|[1-9]\d*)(u?l?l?|l?l?u?)$/i.test(s)>/Ll|lL/.test(s)

Try it online!

How?

The first regex is case-insensitive. The only invalid patterns that cannot be filtered out that way are "Ll" and "lL". So we use a 2nd case-sensitive regex to take care of them.
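The same two-pass check is easy to reproduce outside JavaScript. A small Python sketch of the idea follows (the names SHAPE, MIXED and is_c_int_literal are illustrative and not from the answer); where the JavaScript compares the two booleans with >, this version simply combines them with "and".

```python
import re

# Pass 1: overall shape, ignoring case.
SHAPE = re.compile(
    r"(0x[0-9a-f]+|0[0-7]*|[1-9][0-9]*)(u?l?l?|l?l?u?)",
    re.IGNORECASE | re.ASCII,
)
# Pass 2: case-sensitive check for the only patterns pass 1 cannot reject.
MIXED = re.compile(r"Ll|lL")


def is_c_int_literal(s: str) -> bool:
    """Hypothetical helper: shape matches and no mixed-case long-long suffix."""
    return SHAPE.fullmatch(s) is not None and MIXED.search(s) is None
```

In the JavaScript, a>b on two booleans is true only when a is true and b is false, which is the same condition as the "and ... is None" here.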

&! => >....
– l4m2, Commented Mar 19, 2021 at 13:30

Adamátor · Accepted Answer · 2025-09-04 19:55:01Z

Janet, 112 bytes

|(peg/match~(cmt(*(+(*(+"0X""0x"):h+)(*"0"(any(range"07"))):d+)(+"U""u"'0)(+"LL""ll""L""l"0)(+"U""u"'0)-1),or)$)

Janet’s PEGs are a tiiny bit more verbose than regexes :P

I’m using quite a dirty trick to prevent things like 42Ulu from matching. The pattern (+"U""u"'0) matches an optional U or u, but captures an empty string if the U/u is not present. The entire pattern is then wrapped in a cmt with the or function to check if something was captured.

lynn · Accepted Answer · 2020-10-14 12:17:42Z

Haskell, 169 bytes

import Data.Char
s!p=s>""&&dropWhile p s`elem`do u<-["","u","U"];l<-"":words"L l LL ll";[u++l,l++u]
f('0':x:s)|elem x"xX"=s!isHexDigit|1<2=(x:s)!isOctDigit
f s=s!isDigit

Try it online!
