26
\$\begingroup\$

Objective

Given an ASCII string, decide whether it is a valid C integer literal.

C integer literal

A C integer literal consists of:

  • One of:

    • 0 followed by zero or more octal digits (0-7)

    • A nonzero decimal digit followed by zero or more decimal digits (0-9)

    • 0X or 0x, followed by one or more hexadecimal digits (0-9, A-F, and a-f)

  • optionally followed by one of:

    • One of U or u, which are the "unsigned" suffixes

    • One of L, l, LL, or ll, which are the "long" and "long long" suffixes

    • One unsigned suffix combined with one long suffix, in either order.

Note that there can be arbitrarily many digits, even though C doesn't support arbitrary-length integers. Likewise, a literal with an l (or similar) suffix is still considered valid even if its value would overflow the long type (or the corresponding type).

Also note that there must not be a leading plus or minus sign, for it is not considered to be a part of the literal.
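
The grammar above maps almost directly onto a regular expression. A minimal Python sketch, assuming surrounding whitespace is rejected; the name `is_c_integer_literal` is illustrative only and not part of the challenge:

```python
import re

# Body: octal (leading 0), decimal (nonzero first digit), or hex (0x/0X + digits).
_BODY = r"(?:0[0-7]*|[1-9][0-9]*|0[xX][0-9A-Fa-f]+)"
# Long suffix: "Ll" and "lL" are deliberately absent.
_LONG = r"(?:LL|ll|L|l)"
# At most one unsigned suffix and at most one long suffix, in either order.
_SUFFIX = rf"(?:[uU]{_LONG}?|{_LONG}[uU]?)?"
_LITERAL = re.compile(_BODY + _SUFFIX)


def is_c_integer_literal(s: str) -> bool:
    """Return True if s is a C integer literal per the grammar above."""
    # fullmatch anchors at both ends, so signs and stray characters are rejected.
    return _LITERAL.fullmatch(s) is not None
```

Spelling out the four long suffixes keeps the pattern case-sensitive, so no separate check for mixed-case `Ll` is needed.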

Rules

  • Whether to accept leading or trailing whitespace is implementation-defined.

  • Behavior on non-ASCII strings is a don't-care.

Examples

Truthy

  • 0

  • 007

  • 42u

  • 42lu

  • 42UL

  • 19827489765981697847893769837689346573uLL (There can be arbitrarily many digits, even if the value wouldn't fit the unsigned long long type)

  • 0x8f6aa032838467beee3939428l (The same applies to the long type)

  • 0XCa0 (You can mix cases)

Falsy

  • 08 (Non-octal digit)

  • 0x (A digit must follow X or x)

  • -42 (A leading sign isn't part of the literal)

  • 42Ll (Only LL or ll is valid for the long long type)

  • 42LLLL (Redundant type specifier)

  • 42Uu (Redundant type specifier)

  • 42Ulu (Redundant type specifier)

  • 42lul (Redundant type specifier)

  • 42H (Invalid type specifier)

  • 0b1110010000100100001 (Valid C++, but not valid C)

  • Hello

  • Empty string

Ungolfed solution

Haskell

Doesn't recognize leading or trailing whitespace.

Returns () on success. Monadic failure otherwise.

import Text.ParserCombinators.ReadP

decideCIntegerLit :: ReadP ()
decideCIntegerLit = do
    choice [
        do
            '0' <- get
            munch (flip elem "01234567"),
        do
            satisfy (flip elem "123456789")
            munch (flip elem "0123456789"),
        do
            '0' <- get
            satisfy (flip elem "Xx")
            munch1 (flip elem "0123456789ABCDEFabcdef")
        ]
    let unsigned = satisfy (flip elem "Uu")
    let long = string "l" +++ string "L" +++ string "ll" +++ string "LL"
    (unsigned >> long >> return ()) +++ (optional long >> optional unsigned)
    eof
\$\endgroup\$
10
  • 1
    \$\begingroup\$ Suggested falsey test cases: 1L1L, 0xabucdlu (or any other test case with an l/L/u somewhere in the middle, making it invalid). \$\endgroup\$ Commented Oct 14, 2020 at 8:33
  • 2
    \$\begingroup\$ Suggested test case for floating point values \$\endgroup\$ Commented Oct 14, 2020 at 8:46
  • 1
    \$\begingroup\$ Suggested test-case: 2-1 (starts with a digit and is a valid C constant-expression, but not a bare integer literal). So for example feeding a=2-1; or a[2-1]; to a C compiler wouldn't reject it. (Working on a bash answer that uses cc -c after testing the first digit, trying to let a compiler do the heavy lifting.) \$\endgroup\$ Commented Oct 14, 2020 at 12:56
  • 3
    \$\begingroup\$ Suggested test case: 0o765. This is a valid octal literal in many languages that might try to get away with a built-in "eval" / "read-int" sort of approach, but it's not valid C. \$\endgroup\$ Commented Oct 14, 2020 at 16:21
  • 2
    \$\begingroup\$ This challenge looks outdated so quickly. Modern C supports 0b and 0o literal prefixes now. \$\endgroup\$ Commented Sep 5 at 8:29

15 Answers

9
\$\begingroup\$

Retina 0.8.2, 60 59 bytes

i`^(0[0-7]*|0x[\da-f]+|[1-9]\d*)(u)?(l)?(?-i:\3?)(?(2)|u?)$ 

Try it online! Link includes test cases. Edit: Saved 1 byte thanks to @FryAmTheEggMan. Explanation:

i` 

Match case-insensitively.

^(0[0-7]*|0x[\da-f]+|[1-9]\d*) 

Start with either octal, hex or decimal.

(u)? 

Optional unsigned specifier.

(l)? 

Optional length specifier.

(?-i:\3?) 

Optionally repeat the length specifier case sensitively.

(?(2)|u?)$ 

If there was no unsigned specifier yet, allow one more optional unsigned specifier before the end of the literal.
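
The same pattern can be tried outside Retina. The following is a hedged Python port, not the answer's own code: Python's `re` also supports the scoped flag group `(?-i:...)` and the conditional `(?(2)...|...)`. The name `matches_retina_style` is made up for this sketch, and `fullmatch` replaces the `^...$` anchors.

```python
import re

# Group 1 is the number, group 2 the first optional "u", group 3 the first "l".
# (?-i:\3?) optionally repeats group 3 with case-sensitivity restored, so
# "Ll" and "lL" fail. (?(2)|u?) permits a trailing "u" only if group 2 is unset.
RETINA_STYLE = re.compile(
    r"(0[0-7]*|0x[\da-f]+|[1-9]\d*)(u)?(l)?(?-i:\3?)(?(2)|u?)",
    re.IGNORECASE | re.ASCII,
)


def matches_retina_style(s: str) -> bool:
    return RETINA_STYLE.fullmatch(s) is not None
```

With `42lu`, group 2 is unset, so the conditional takes its second branch and consumes the `u`. With `42ulu`, group 2 is set, the conditional matches nothing, and the leftover `u` makes the match fail.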

\$\endgroup\$
1
  • 3
    \$\begingroup\$ You can use \d in the hex character class, too. \$\endgroup\$ Commented Oct 14, 2020 at 0:22
6
\$\begingroup\$

C# (.NET Core), 197 191 bytes

@nwellnhof shaved 6 bytes:

using c=System.Console;class P{static void Main(){c.WriteLine(System.Text.RegularExpressions.Regex.IsMatch(c.ReadLine(),@"^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$"));}} 

Original:

using c=System.Console;using System.Text.RegularExpressions;class P{static void Main(){c.WriteLine(Regex.IsMatch(c.ReadLine(),@"^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$"));}} 

Try it online!

\$\endgroup\$
3
  • 1
    \$\begingroup\$ Nice first answer, welcome to the site! \$\endgroup\$ Commented Oct 14, 2020 at 13:45
  • 1
    \$\begingroup\$ Since Regex is used only once, you can write System.Text.RegularExpressions.Regex and remove the using statement, saving 6 bytes. \$\endgroup\$ Commented Oct 16, 2020 at 13:38
  • \$\begingroup\$ @nwellnhof Thanks for noticing! \$\endgroup\$ Commented Oct 16, 2020 at 23:24
5
\$\begingroup\$

Perl 5 -p, 65 61 bytes

@NahuelFouilleul shaved 4 bytes

$_=/^(0[0-7]*|0x\p{Hex}+|[1-9]\d*)(u?l?l?|l?l?u?)$/i*!/lL|Ll/ 

Try it online!

\$\endgroup\$
2
  • \$\begingroup\$ could save 2 bytes using l?l? instead of l{0,2} \$\endgroup\$ Commented Oct 14, 2020 at 9:24
  • 1
    \$\begingroup\$ 61 bytes \$\endgroup\$ Commented Oct 14, 2020 at 9:50
5
\$\begingroup\$

Java 8 / Scala polyglot, 89 79 bytes

s->s.matches("(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\\d*|0x[\\da-f]+)(u?l?l?|l?l?u?)") 

-10 bytes thanks to @NahuelFouilleul

Try it online in Java 8.
Try it online in Scala (except with => instead of -> - thanks to @TomerShetah).

Explanation:

s->                // Method with String parameter and boolean return-type
  s.matches(       //  Check whether the input-string matches the regex
    "(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\\d*|0x[\\da-f]+)(u?l?l?|l?l?u?)")

Regex explanation:

In Java, the String#matches method implicitly adds a leading and trailing ^...$ to match the entire string, so the regex is:

^(?!.*(Ll|lL))(?i)(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)$ 
 (?!         )                                                  # The string should NOT match:
^   .*                                                          #   Any amount of leading characters
      (     )                                                   #   Followed by:
       Ll                                                       #    "Ll"
         |lL                                                    #    Or "lL"
                                                                # (Since the `?!` is a negative lookahead, it acts loose from the
                                                                #  rest of the regex below)
              (?i)                                              # Using case-insensitivity,
^                 (                                             # the string should start with:
                   0                                            #   A 0
                    [0-7]*                                      #   Followed by zero or more digits in the range [0,7]
                          |                                     #  OR:
                           [1-9]                                #   A digit in the range [1,9]
                                \d*                             #   Followed by zero or more digits
                                   |                            #  OR:
                                    0x                          #   A "0x"
                                      [     ]+                  #   Followed by one or more of:
                                       \d                       #    Digits
                                         a-f                    #    Or letters in the range ['a','f']
                                              )(                # And with nothing in between,
                                                             )$ # the string should end with:
                                                u?              #   An optional "u"
                                                  l?l?          #   Followed by no, one, or two "l"
                                                      |         #  OR:
                                                       l?l?     #   No, one, or two "l"
                                                           u?   #   Followed by an optional "u"
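
For comparison, here is a rough Python rendering of the same lookahead idea. It is a sketch, not part of the answer. Recent Python versions require a global inline flag such as `(?i)` to appear at the very start of a pattern, so the scoped form `(?i:...)` is used to keep the lookahead case-sensitive. The name `is_literal_lookahead` is hypothetical.

```python
import re

LOOKAHEAD_STYLE = re.compile(
    # Case-sensitive negative lookahead: no "Ll" or "lL" anywhere in the input.
    r"(?!.*(?:Ll|lL))"
    # Case-insensitive part: number body, then suffixes in either order.
    r"(?i:(?:0[0-7]*|[1-9]\d*|0x[\da-f]+)(?:u?l?l?|l?l?u?))",
    re.ASCII,
)


def is_literal_lookahead(s: str) -> bool:
    # fullmatch plays the role of the implicit ^...$ of Java's String#matches.
    return LOOKAHEAD_STYLE.fullmatch(s) is not None
```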
\$\endgroup\$
7
  • 2
    \$\begingroup\$ 79 bytes \$\endgroup\$ Commented Oct 14, 2020 at 10:03
  • \$\begingroup\$ @NahuelFouilleul Ah, smart way to use the case-insensitivity after we've checked the Ll/lL. Didn't even know that was possible. Thanks! \$\endgroup\$ Commented Oct 14, 2020 at 10:20
  • 2
    \$\begingroup\$ The same works for Scala: Try it online! \$\endgroup\$ Commented Oct 14, 2020 at 10:53
  • \$\begingroup\$ @TomerShetah Thanks for mentioning. I've added it as a polyglot. :) \$\endgroup\$ Commented Oct 14, 2020 at 11:02
  • \$\begingroup\$ It's also a Java/Kotlin polyglot, since Kotlin also uses a -> and Scala uses => \$\endgroup\$ Commented Oct 14, 2020 at 12:38
4
\$\begingroup\$

Python 3, 103 bytes

import re;re.compile("^(0[0-7]*|[1-9]\d*|0[xX][\dA-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$").match 

Try it online!

just a basic regex, probably very suboptimal

returns a match object for truthy and None for falsy; input may not contain surrounding whitespace

-3 bytes thanks to Digital Trauma (on my Retina answer)
-1 byte thanks to FryAmTheEggman (on my Retina answer)
-3 bytes thanks to pxeger

\$\endgroup\$
3
  • 2
    \$\begingroup\$ This is why regexes are so fun. \$\endgroup\$ Commented Oct 14, 2020 at 0:02
  • \$\begingroup\$ 103 bytes \$\endgroup\$ Commented Oct 14, 2020 at 19:25
  • \$\begingroup\$ @pxeger Oh cool, thanks! \$\endgroup\$ Commented Oct 14, 2020 at 19:55
3
\$\begingroup\$

Retina 0.8.2, 73 bytes

^(0[0-7]*|[1-9]\d*|0[xX][\dA-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$ 

Try it online!

Just the same regex I used in my Python answer. First time using Retina, I'm sure this can be optimized with some Retina golf things!

-3 bytes thanks to Digital Trauma
-1 byte thanks to FryAmTheEggman

\$\endgroup\$
2
  • \$\begingroup\$ Also never used Retina, but 55 bytes? \$\endgroup\$ Commented Oct 14, 2020 at 0:07
  • \$\begingroup\$ @cairdcoinheringaahing I thought of that; unfortunately, no. but thanks for trying :P \$\endgroup\$ Commented Oct 14, 2020 at 0:07
3
\$\begingroup\$

C (clang), 207 200 bytes

u;l;o;f(char*s){l=*s-48?12:(32|*++s)=='x'?s++,0:14;if(l<14&&!*s)return 0;s+=strspn(s,"ABCDEFabcdef9876543210"+l);for(u=l=0;o=*s&~32;s++)if(o==76&&!l++)s[1]==*s&&s++;else if(u++||o-85)break;return!*s;} 

Try it online!

A C solution for a C challenge.

Takes input as a pointer to a string, and returns 1/0 for truthy/falsy.

Explained:

#include <string.h>

int u, l, o;
int f(char *s) {
    /* assign offset for strspn */
    l = (*s - 48)                /* if the first character is not '0': */
        ? 12                     /* use decimal "9876543210" */
        : (32 | *++s) == 'x'     /* else if the first two characters are "0x"/"0X": */
            ? s++, 0             /* use all characters (l = 0) */
            : 14;                /* else use octal */

    /* handle the case when the string is empty.
     * This is also the case when a "0" is input, because of the ++s above
     * If this is the case, we are using the octal offset, so check that as well */
    if (l < 14 && !*s)
        return 0;

    /* skip past all the digits that we consider valid */
    s += strspn(s, "ABCDEFabcdef9876543210" + l);

    /* handle the U and L part */
    /* u and l are used to count if we have a u or an l yet */
    /* o contains the current character, uppercased */
    for (u = l = 0; o = *s & ~32; s++)
        if (o == 76 && !l++)     /* if it's an L, and we don't already have an L: */
            s[1] == *s && s++;   /* if the next character is exactly the same (either ll or LL),
                                    then advance the string so it doesn't get counted as 2 l's */
        else if (o - 85 || u)    /* else, if we're not a U, or we already have a U, then it's a fail */
            break;
        else
            u++;                 /* otherwise it's a U */

    /* on success, after skipping the digits and counting the L's and U's,
     * the string is exhausted, so *s will be 0 */
    /* if it failed, then *s isn't zero */
    return !*s;
}
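
The offset trick above (one digit string sliced at 0, 12, or 14) may be easier to follow in a higher-level sketch. The Python below ports the idea rather than the golfed code. The name `is_literal_scan` is hypothetical, and the explicit "at least one digit" check is this sketch's own addition.

```python
DIGITS = "ABCDEFabcdef9876543210"  # hex at offset 0, decimal at 12, octal at 14


def is_literal_scan(s: str) -> bool:
    if not s.startswith("0"):
        offset, i = 12, 0            # decimal: allowed digits are "9876543210"
    elif s[1:2] in ("x", "X"):
        offset, i = 0, 2             # hex: skip "0x"/"0X", allow every digit
    else:
        offset, i = 14, 1            # octal: the leading "0" is already consumed
    allowed = DIGITS[offset:]

    start = i
    while i < len(s) and s[i] in allowed:
        i += 1                       # counterpart of strspn()
    if offset != 14 and i == start:
        return False                 # decimal and hex need at least one digit

    seen_u = seen_l = False
    while i < len(s):
        c = s[i]
        if c in "lL" and not seen_l:
            seen_l = True
            if s[i + 1:i + 2] == c:  # "ll" or "LL": same case only
                i += 1
        elif c in "uU" and not seen_u:
            seen_u = True
        else:
            return False             # repeated or unknown suffix character
        i += 1
    return True
```

Because a decimal literal cannot start with `0`, requiring one character from `"9876543210"` at position 0 is enough to enforce the nonzero first digit.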
\$\endgroup\$
2
  • \$\begingroup\$ 190 bytes \$\endgroup\$ Commented Sep 5 at 18:13
  • \$\begingroup\$ 185 bytes \$\endgroup\$ Commented Sep 7 at 22:11
3
\$\begingroup\$

C, 121 bytes

u;f(char*s){char*p;strtol(s,&p,0);p+=u=(*p|32)==117;p+=*p==p[1];p+=(*p|32)==108;p+=!u*(*p|32)==117;return(*s^48)<10*!*p;}

Since input is encoded in ASCII, we use numeric codepoints so that this works regardless of compilation character set.

How it works:

We use library function strtol() to set p to end of numeric part, then advance it past u and l suffixes and check that it's the end of string. Also test that the string begins with a digit (since strtol() will consume leading whitespace and sign).

#include <ctype.h>
#include <stdlib.h>

int f(const char *s)
{
    char *p;
    (void)strtol(s, &p, 0);      /* p now points just past the numeric part */
    int u = 0;
    if (tolower(*p) == 'u') ++u, ++p;
    if (*p == *(p+1)) ++p;
    if (tolower(*p) == 'l') ++p;
    if (!u && tolower(*p) == 'u') ++p;
    return *p == '\0' && *s >= '0' && *s <= '9';
}

Try it online!

\$\endgroup\$
6
  • \$\begingroup\$ 111 bytes \$\endgroup\$ Commented Sep 5 at 18:43
  • \$\begingroup\$ It looks like using wcstol instead of wcstoul works for the given examples. \$\endgroup\$ Commented Sep 5 at 18:48
  • \$\begingroup\$ @jdt, that's not portable (compilation error when I tried it in GCC), so probably needs entering as a different language. \$\endgroup\$ Commented Sep 6 at 6:50
  • \$\begingroup\$ 116 if you want to stick to gcc. \$\endgroup\$ Commented Sep 9 at 12:31
  • \$\begingroup\$ 113 \$\endgroup\$ Commented Sep 10 at 12:46
2
\$\begingroup\$

Charcoal, 76 bytes

≔⊟Φ³¬⌕↧θ…0xιη≔✂↧θη⁻LθL⊟Φ⪪”{“↧←;⭆δa”¶⁼ι↧…⮌θLι¹ζ›∧⁺Lζ¬⊖η⬤ζ№E∨×⁸ηχ⍘λφι∨№θLl№θlL 

Try it online! Link is to verbose version of code. Explanation:

≔⊟Φ³¬⌕↧θ…0xιη 

Find the length of the longest prefix of 0x in the lowercased input.

≔✂↧θη⁻LθL⊟Φ⪪”{“↧←;⭆δa”¶⁼ι↧…⮌θLι¹ζ 

Slice off the prefix and also check for a lowercase suffix of ull, ul, llu or lu, and if so then slice that off as well.

›...∨№θLl№θlL 

The original input must not contain Ll or lL.

∧⁺Lζ¬⊖η 

The sliced string must not be empty unless the prefix was 0.

⬤ζ№E∨×⁸ηχ⍘λφι 

Convert the prefix length to 10, 8 or 16 appropriately, then take that many base 62 digits and check that all of the remaining lowercased characters are one of those digits.
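
The base selection in the last step is a small piece of arithmetic: a matched prefix length of 0, 1 or 2 maps to base 10, 8 or 16. A minimal Python sketch of only that step, with the hypothetical name `digit_alphabet` and a plain hexadecimal digit string standing in for Charcoal's digit table:

```python
def digit_alphabet(lowered: str) -> str:
    """Return the digits allowed for a lowercased literal, chosen by its prefix."""
    # h is how much of "0x" the input starts with: 0, 1 or 2 characters.
    h = 2 if lowered.startswith("0x") else 1 if lowered.startswith("0") else 0
    # 8 * h gives 0, 8 or 16; "or 10" replaces the 0 case with base 10.
    base = 8 * h or 10
    return "0123456789abcdef"[:base]
```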

\$\endgroup\$
2
\$\begingroup\$

05AB1E, 63 61 62 bytes

„Uuõª„LLæDl«âDí«JéRʒÅ¿}нõ.;Ðć_ilDć'xQiA6£мÐþQë\7ÝKõQë\þQ}sõÊ* 

This isn't too easy without regexes.. :/ Can definitely be golfed a bit more, though.

+1 byte as bug-fix for inputs like "u", "l", "LL", etc. (thanks for noticing @Neil)

Try it online or verify all test cases.

Explanation:

„Uu      # Push string "Uu"
õª       # Convert it to a list of characters, and append an empty string:
         #  ["U","u",""]
„LL      # Push string "LL"
æ        # Take its powerset: ["","L","L","LL"]
Dl       # Create a lowercase copy: ["","l","l","ll"]
«        # Merge the lists together: ["","L","L","LL","","l","l","ll"]
â        # Create all possible pairs of these two lists
Dí       # Create a copy with each pair reversed
«        # Merge the list of pairs together
J        # Join each pair together to a single string
éR       # Sort it by length in descending order

We now have the list:

["llu","LLu","llU","LLU","ull","uLL","Ull","ULL","ll","LL","lu","lu","Lu","Lu","lU","lU","LU","LU","ll","LL","ul","ul","uL","uL","Ul","Ul","UL","UL","l","l","L","L","u","u","U","U","l","l","L","L","u","u","U","U","","","",""] 
ʒ        # Filter this list by:
 Å¿      #  Where the (implicit) input ends with this string
}н       # After the filter: only leave the first (longest) one
õ.;      # And remove the first occurrence of this in the (implicit) input
ÐD       # Triplicate + duplicate (so there are 4 copies on the stack now)
ć        # Extract head; pop and push remainder-string and first character
         # separated to the stack
_i       # If this first character is a 0:
 l       #  Convert the remainder-string to lowercase
 D       #  Duplicate it †¹
 ć       #  Extract head again
 'xQi   '#  If it's equal to "x":
  A      #   Push the lowercase alphabet
  6£     #   Only leave the first 6 characters: "abcdef"
  м      #   Remove all those characters from the string
  Ð      #   Triplicate it †²
  þ      #   Only keep all digits in the copy
  Q      #   And check that the two are still the same
         #   (thus it's a non-negative integer without decimal .0s)
 ë       #  Else:
  \      #   Discard the remainder-string
  7Ý     #   Push list [0,1,2,3,4,5,6,7]
  K      #   Remove all those digits
  õQ     #   Check what remains is an empty string
ë        # Else:
 \       #  Discard the remainder-string
 þ       #  Only keep all digits
 Q       #  And check that the two are still the same
         #  (thus it's a non-negative integer without decimal .0s)
}s       # After the if-else: Swap the two values on the stack
         # (this will get the remaining copy of †² for "0x" cases,
         #  or the remaining copy of †¹ for other cases)
õÊ       # Check that this is NOT an empty string
*        # And check that both are truthy
         # (after which the result is output implicitly)
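
The regex-free strategy above (enumerate every valid suffix, strip the longest one the input ends with, then validate the digits) can be sketched in Python as follows. This illustrates the approach rather than translating the 05AB1E program. `split_suffix` and `is_literal_by_suffix` are hypothetical names, and a set is used to drop the duplicate entries that the listed suffix table contains.

```python
from itertools import product

UNSIGNED = ["", "U", "u"]
LONG = ["", "L", "LL", "l", "ll"]

# Every valid suffix: one unsigned part and one long part, in either order,
# longest first so that the longest matching suffix is found first.
SUFFIXES = sorted(
    {a + b for u, l in product(UNSIGNED, LONG) for a, b in ((u, l), (l, u))},
    key=len,
    reverse=True,
)


def split_suffix(s: str):
    """Split s into (body, suffix) using the longest valid suffix it ends with."""
    for suffix in SUFFIXES:
        if s.endswith(suffix):       # "" always matches, so this always returns
            return s[: len(s) - len(suffix)], suffix


def is_literal_by_suffix(s: str) -> bool:
    body = split_suffix(s)[0].lower()
    if body.startswith("0x"):
        return len(body) > 2 and all(c in "0123456789abcdef" for c in body[2:])
    if body.startswith("0"):
        return all(c in "01234567" for c in body)
    return body != "" and all(c in "0123456789" for c in body)
```

Stripping greedily is safe here because neither `l` nor `u` is a digit in any of the three bases.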
\$\endgroup\$
2
  • 1
    \$\begingroup\$ This incorrectly outputs 1 for u... \$\endgroup\$ Commented Oct 14, 2020 at 14:15
  • \$\begingroup\$ @Neil Thanks for noticing. Fixed at the cost of 1 byte. \$\endgroup\$ Commented Oct 14, 2020 at 14:21
2
\$\begingroup\$

AWK, 86 bytes

{print/^(0[0-7]*|[1-9][0-9]*|0[xX][0-9A-Fa-f]+)([uU](L|l|LL|ll)?|(L|l|LL|ll)[uU]?)?$/} 

Try it online!

Simply prints truthy or falsey depending on whether or not the input line matches the regex. Doesn't accept leading or trailing whitespace.

\$\endgroup\$
2
\$\begingroup\$

Elixir, 74 bytes

&(&1=~~r/^(0[0-7]*|[1-9]\d*|0x[\da-f]+)(u?l?l?|l?l?u?)?$/i&&!(&1=~~r/Ll/)) 

Try it online!

\$\endgroup\$
2
\$\begingroup\$

JavaScript (ES6), 77 76 bytes

Saved 1 byte thanks to @l4m2

s=>/^(0x[\da-f]+|0[0-7]*|[1-9]\d*)(u?l?l?|l?l?u?)$/i.test(s)>/Ll|lL/.test(s) 

Try it online!

How?

The first regex is case-insensitive. The only invalid patterns that cannot be filtered out that way are "Ll" and "lL". So we use a 2nd case-sensitive regex to take care of them.
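
The two-regex idea carries over to other languages. A small Python sketch under the same assumptions (the names are illustrative): Python booleans order like 0 and 1, so `a > b` is true exactly when `a` holds and `b` does not, mirroring the `>` in the JavaScript answer.

```python
import re

# Loose, case-insensitive check: also lets "Ll" and "lL" through.
LOOSE = re.compile(
    r"(0x[\da-f]+|0[0-7]*|[1-9]\d*)(u?l?l?|l?l?u?)",
    re.IGNORECASE | re.ASCII,
)
# Case-sensitive check for the only patterns the loose regex wrongly accepts.
MIXED_LL = re.compile(r"Ll|lL")


def is_literal_two_pass(s: str) -> bool:
    loose_ok = LOOSE.fullmatch(s) is not None
    has_mixed = MIXED_LL.search(s) is not None
    return loose_ok > has_mixed      # True only if loose_ok and not has_mixed
```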

\$\endgroup\$
1
  • 1
    \$\begingroup\$ &! => >.... \$\endgroup\$ Commented Mar 19, 2021 at 13:30
2
\$\begingroup\$

Janet, 112 bytes

|(peg/match~(cmt(*(+(*(+"0X""0x"):h+)(*"0"(any(range"07"))):d+)(+"U""u"'0)(+"LL""ll""L""l"0)(+"U""u"'0)-1),or)$) 

Janet’s PEGs are a tiiny bit more verbose than regexes :P

I’m using quite a dirty trick to prevent things like 42Ulu from matching. The pattern (+"U""u"'0) matches an optional U or u, but captures an empty string if the U/u is not present. The entire pattern is then wrapped in a cmt with the or function to check if something was captured.

\$\endgroup\$
1
\$\begingroup\$

Haskell, 169 bytes

import Data.Char
s!p=s>""&&dropWhile p s`elem`do u<-["","u","U"];l<-"":words"L l LL ll";[u++l,l++u]
f('0':x:s)|elem x"xX"=s!isHexDigit|1<2=(x:s)!isOctDigit
f s=s!isDigit

Try it online!

\$\endgroup\$
