3
$\begingroup$

I have a text file containing many, many lines of text like test in the following:

test = "word 123 456 7890.000 0.12000"; 

I would like to extract all of the "string representations of integers." However, I need to be clear about what I would like. In test above, I would like the output to be:

{"123", "456"} 

since I am only interested in actual, isolated (delimited by spaces) string representations of integers. Yes, 7890 is an integer, but in test above, it is not isolated, so I do not want my function to return it (since 7890.000 is a decimal).

In the case of test, I could use this:

StringCases[test, Repeated[DigitCharacter, 3]][[1 ;; 2]] 

which returns

{"123", "456"} 

However, this is not general, because my string may contain more than two string representations of integers. So I would like my function to also take this input:

test = "word 123 456 123 7890.000 0.12000"; 

and return:

{"123", "456", "123"} 

I have thought about using StringSplit followed by ToExpression and IntegerQ, but this seems like it would be very (unnecessarily?) complicated. Perhaps Mathematica has something better built in that I can use?

Do you have any advice? Thanks!

$\endgroup$

5 Answers

7
$\begingroup$

I quite like Mathematica's StringExpression. A bit longer than regular expressions but easier to read.

StringCases[test, " " ~~ d : DigitCharacter.. ~~ " " -> d] 

{"123", "456", "123"}

(Of course, this requires two spaces between consecutive integers, because each match consumes the space on either side of the digits.)
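If the integers are separated by single spaces, one possible workaround is to allow overlapping matches, so that the space closing one match can also open the next. A minimal sketch, assuming single-space separators and the same test string as in the question:

```mathematica
test = "word 123 456 123 7890.000 0.12000";

(* Overlaps -> True lets the trailing space of one match also serve as
   the leading space of the next match *)
StringCases[test, " " ~~ d : DigitCharacter .. ~~ " " -> d, Overlaps -> True]
```

Like the pattern above, this still needs a space on both sides, so an integer at the very start or end of the string would not be picked up.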

$\endgroup$
1
  • 1
    $\begingroup$ You don't actually need two spaces since you can set the option Overlaps->True. $\endgroup$ Commented Aug 1, 2012 at 18:56
7
$\begingroup$

Regular Expressions are your friend:

StringTrim[StringCases["word 123 456 123 7890.000 0.12000", RegularExpression[" \\d+ "]]] 

This returns

{"123", "456", "123"} 
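Because the pattern above consumes the spaces around each number, adjacent integers separated by a single space compete for the same space. A possible variant, offered as a sketch rather than as this answer's own method, uses lookaround assertions (part of the PCRE-style syntax that RegularExpression accepts) to check the neighbouring characters without consuming them, which also makes StringTrim unnecessary:

```mathematica
test = "word 123 456 123 7890.000 0.12000";

(* (?<!\S) : the digits are not preceded by a non-whitespace character
   (?!\S)  : the digits are not followed by a non-whitespace character *)
StringCases[test, RegularExpression["(?<!\\S)\\d+(?!\\S)"]]
```

Since the assertions are phrased as "no non-whitespace neighbour", this form should also accept an integer at the very start or end of the string.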
$\endgroup$
6
$\begingroup$

You can do it with Select and StringCases with NumberString as:

Select[StringCases[test, NumberString], IntegerQ@ToExpression@# &] (* {"123", "456", "123"} *) 

Add Rationalize to the test if you also want integer-valued numbers whose head is Real, such as 7890.000.
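As an illustration of that remark, here is a sketch with Rationalize inserted into the test. With the sample string it should keep "7890.000" in addition to the three integers, since that value is integer-valued even though its head is Real:

```mathematica
test = "word 123 456 123 7890.000 0.12000";

(* Rationalize turns 7890. into the exact integer 7890, so IntegerQ accepts it;
   0.12 becomes the fraction 3/25 and is still rejected *)
Select[StringCases[test, NumberString], IntegerQ@Rationalize@ToExpression@# &]
```

Note that this is the opposite of what the question asks for, so it is only useful when decimals with a zero fractional part should count as integers.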

$\endgroup$
1
$\begingroup$
test = "word 123 456 123 7890.000 0.12000"; 

Using Position and Extract

Extract[#, Position[#, _?(IntegerQ @ ToExpression @ # &)]] & @ StringSplit[test] 

{"123", "456", "123"}

$\endgroup$
1
$\begingroup$
test = "word 123 456 123 7890.000 0.12000";

StringSplit[test] // Select[IntegerQ@*ToExpression]

SemanticImportString[test] // Normal // First // Select[IntegerQ] // Map[ToString]

With[{t = TextWords[test]}, t // StringContainsQ[Except[DigitCharacter]] // Pick[t, #, False] &]

{"123", "456", "123"}

$\endgroup$
2
  • $\begingroup$ +1 - You don't need the Whitespace (1st answer) and the 2nd answer needs a ToString /@ ... $\endgroup$ Commented Apr 6, 2024 at 6:55
  • 1
    $\begingroup$ Thanks eldo, I have updated. $\endgroup$ Commented Apr 6, 2024 at 6:59
