
I'm using componentsSeparatedByString on a String to split a long string into a String array, using the comma (,) as the separator. The problem is that one of the components is the comma character itself. For example, the string is "a,b,c,,,1,2,3". After calling componentsSeparatedByString, the array is ["a", "b", "c", "", "", "1", "2", "3"], but I need it to be ["a", "b", "c", ",", "1", "2", "3"]. Luckily, I can modify the string, but I really don't want to change all of the commas to a different character. Is there a way I can 'escape' the comma I need as a component so that componentsSeparatedByString won't split on that middle one?

I tried replacing it with \u{002C}, but Swift was smarter than that: it still interprets that as a comma, so it splits on it.
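The reason the \u{002C} attempt fails: in a Swift string literal, `\u{002C}` is a Unicode scalar escape that the compiler resolves to the ordinary comma character, so at run time there is nothing left to distinguish it from any other comma. A quick check, assuming a current Swift toolchain with Foundation, where componentsSeparatedByString is spelled `components(separatedBy:)`:

```swift
import Foundation

// The escape is resolved by the compiler: both literals hold the same Character.
let escaped = "\u{002C}"
let plain = ","
print(escaped == plain) // true

// So the "escaped" comma still acts as a separator.
let input = "a,b,c,\u{002C},1,2,3"
let parts = input.components(separatedBy: ",")
print(parts) // ["a", "b", "c", "", "", "1", "2", "3"]
```

Any real escape has to be something the splitting code itself understands, not something the compiler strips out of the literal.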

  • "a,b,c,,,1,2,3" should parse to ["a", "b", "c", "", "", "1", "2", "3"]. That just makes sense. Perhaps you should use a proper CSV format and a proper CSV parser library? That means quoting any item that contains the delimiter character: a,b,c,",",1,2,3. A library like this one could handle it: github.com/naoty/SwiftCSV Commented Jul 4, 2015 at 19:05
  • You will have a problem: how will your algorithm handle "a,,,,b"? How can it know whether the result is ["a", ",", "", "b"] or ["a", "", ",", "b"]? Commented Jul 4, 2015 at 19:10
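The quoting idea from the first comment can be sketched without a library. `splitQuoted` below is a hypothetical helper, not the SwiftCSV API and not a full CSV parser: it splits on commas that fall outside double quotes and drops the quote characters, but it deliberately ignores doubled-quote escapes ("") and multi-line records.

```swift
// Hypothetical helper: splits `line` on `separator`, except inside double quotes.
// Simplified sketch: no support for escaped quotes ("") or multi-line records.
func splitQuoted(_ line: String, separator: Character = ",") -> [String] {
    var fields: [String] = []
    var current = ""
    var insideQuotes = false
    for ch in line {
        if ch == "\"" {
            // Toggle quoted mode; the quote characters themselves are dropped.
            insideQuotes.toggle()
        } else if ch == separator && !insideQuotes {
            // An unquoted separator ends the current field (empty fields are kept).
            fields.append(current)
            current = ""
        } else {
            current.append(ch)
        }
    }
    fields.append(current)
    return fields
}

let fields = splitQuoted("a,b,c,\",\",1,2,3")
print(fields) // ["a", "b", "c", ",", "1", "2", "3"]
```

Because the comma that belongs to the data is marked explicitly, this also avoids the "a,,,,b" ambiguity raised in the second comment.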

1 Answer


If you use the split method from the standard library instead of componentsSeparatedByString, you can do it in a fairly hacky way with a stateful closure: remember whether the last character was a comma treated as a separator, and don't split on two commas in a row:

```swift
let s = "a,b,c,,,1,2,3"
var lastWasComma = false
let array = split(s.characters) { (c: Character) -> Bool in
    if c == "," {
        lastWasComma = !lastWasComma
    } else {
        lastWasComma = false
    }
    return lastWasComma
}.map(String.init)
debugPrint(array)
// prints ["a", "b", "c", ",", "1", "2", "3"]
```

(This is for Swift 2.0. If you're on Swift 1.2, drop the .characters and the .map from the end, since strings are directly sliceable before 2.0.)
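In current Swift (5.x), `String.characters` and the global `split` function are gone, and `String` is itself a collection of `Character`s, so the same trick can be written with the `split(whereSeparator:)` method. This is a sketch that keeps the answer's logic unchanged. It assumes, as the original does, that `split` calls the closure exactly once per character, left to right. The standard library behaves that way in practice, but the documentation does not promise it.

```swift
let s = "a,b,c,,,1,2,3"

// Stateful predicate: a comma is a separator unless the previous character
// was a comma that was itself treated as a separator.
// Assumption: split visits each character once, in order.
var lastWasComma = false
let array = s.split(whereSeparator: { (c: Character) -> Bool in
    if c == "," {
        lastWasComma = !lastWasComma
    } else {
        lastWasComma = false
    }
    return lastWasComma
}).map(String.init) // split returns [Substring]; convert each one to String

debugPrint(array) // ["a", "b", "c", ",", "1", "2", "3"]
```

The flag lives outside the closure, so reset it before splitting another string. Run on "a,,,,b", the same closure produces ["a", ",", ",b"]. This is a concrete case of the ambiguity raised in the comments.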


5 Comments

This is clever, and I like clever. I wonder how its performance compares to componentsSeparatedByString.
As I mentioned in my last comment, there is a problem with strings such as "a,,,,b": how can you know whether the result is ["a", ",", "", "b"] or ["a", "", ",", "b"]?
The split will probably be more efficient. It creates subslices of the original string that reference the original memory, whereas componentsSeparatedByString creates a new string object for each component. The only risk is a kind of memory leak: if you keep even one of those subslices alive, it keeps the entire original string in memory.
@Zaphod since that’s fundamentally ambiguous, I think you have to make an assumption such as “empty strings aren’t allowed” or “only single-character components”. Otherwise, you'd need an escaping mechanism rather than a simple “three in a row” rule.
This also works: lastWasComma = c == "," && !lastWasComma; return lastWasComma
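The escaping mechanism mentioned in the comments could look like the sketch below. `splitEscaped` is a hypothetical helper, and the backslash convention is an assumption, not something from the question or answer. A backslash makes the next character literal, so `\,` is a comma that belongs to the data. Unlike the "three in a row" rule, this resolves the "a,,,,b" ambiguity, because the writer says which comma is data.

```swift
// Hypothetical helper: a backslash escapes the next character, so "\," is a
// literal comma and "\\" is a literal backslash. Unescaped commas separate fields.
func splitEscaped(_ line: String, separator: Character = ",", escape: Character = "\\") -> [String] {
    var fields: [String] = []
    var current = ""
    var escaping = false
    for ch in line {
        if escaping {
            current.append(ch)   // take the escaped character literally
            escaping = false
        } else if ch == escape {
            escaping = true      // drop the backslash; the next character is literal
        } else if ch == separator {
            fields.append(current)
            current = ""
        } else {
            current.append(ch)
        }
    }
    fields.append(current)       // a trailing lone backslash is silently dropped
    return fields
}

print(splitEscaped(#"a,b,c,\,,1,2,3"#)) // ["a", "b", "c", ",", "1", "2", "3"]
print(splitEscaped(#"a,\,,,b"#))        // ["a", ",", "", "b"]
print(splitEscaped(#"a,,\,,b"#))        // ["a", "", ",", "b"]
```

The `#"..."#` raw string literals (Swift 5+) keep the backslashes in the input exactly as written.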
