14

I have the string "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography". I want to split it using the regular expression [0-9][0-9][0-9][A-Z][A-Z][A-Z] so that the function returns this array:

Array = ["323 ECO Economics Course ", "451 ENG English Course", "789 Mathematical Topography"] 

How would I go about doing this in Swift?

Edit: My question is different from the one linked to. I realize that you can split a string in Swift using myString.components(separatedBy: "splitting string"). The issue is that that question doesn't address how to make the splitting string a regular expression. I tried mystring.components(separatedBy: "[0-9][0-9][0-9][A-Z][A-Z][A-Z]", options: .regularExpression), but that didn't work.

How can I make the separatedBy: portion a regular expression?

2
  • 1
    Perhaps you are looking at this wrong. Instead of trying to find a fancy way to "split" a string using a regex, why not simply use the NSRegularExpression class and its matches function to get all of the matches of your regex? Commented Feb 27, 2017 at 1:55
  • The answer already done below is a great answer, however, after reading your question, I thought you might find this useful. This is a Regex class written in Swift that can be dropped into your project. I've used it in multiple projects with great ease and success. gist.github.com/ningsuhen/dc6e589be7f5a41e7794 Commented Feb 27, 2017 at 2:06

4 Answers 4

12

You can use the regex "\\b[0-9]{1,}[a-zA-Z ]{1,}" together with the extension from this answer, which gets all ranges of a string using a literal, case-insensitive, or regular-expression search:

import Foundation

extension StringProtocol {
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        // Search the remainder of the string until no further match is found
        while startIndex < endIndex,
            let range = self[startIndex...].range(of: string, options: options) {
            result.append(range)
            // After an empty match, advance by one character to avoid looping forever
            startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}

let inputString = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"
let courses = inputString.ranges(of: "\\b[0-9]{1,}[a-zA-Z ]{1,}", options: .regularExpression).map {
    inputString[$0].trimmingCharacters(in: .whitespaces)
}
print(courses)   // ["323 ECO Economics Course", "451 ENG English Course", "789 Mathematical Topography"]

2 Comments

If your course codes always have 3 digits and the text that follows them has at least 3 characters, you can use the regex "\\b[0-9]{3}[a-zA-Z ]{3,}"
This is a nice clean solution. I like how you build an array of ranges and then use map to extract the substrings from the original string. Very elegant use of functional programming. (Voted)
6

When this answer was written, Swift had no native regular expressions (a native Regex type arrived later, in Swift 5.7), but Foundation provides NSRegularExpression.

import Foundation

let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"
let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// NSRegularExpression works with Objective-C NSString, which is UTF-16 encoded
let matches = regex.matches(in: toSearch, range: NSRange(location: 0, length: toSearch.utf16.count))

// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
    // convert the match's NSRange (UTF-16 offsets) into String indices
    let start = Range(current.range, in: toSearch)!.lowerBound
    // if there's a next match, use its starting location as the end of this piece;
    // otherwise, go to the end of the searched string
    let end = next.flatMap { Range($0.range, in: toSearch) }?.lowerBound ?? toSearch.endIndex
    return String(toSearch[start..<end])
}

dump(results)

Running this will output

▿ 3 elements
  - "323 ECO Economics Course "
  - "451 ENG English Course "
  - "789 MAT Mathematical Topography"

1 Comment

+1 for supplying the UTF-16 length that NSString expects. This worked for me where plain count truncated the result by the difference between the UTF-16 length and the grapheme count (the two are often the same in some languages).
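
The truncation described in this comment can be reproduced with a string that contains emoji. This is a minimal sketch, not code from the answer: the sample string and all variable names are assumptions made up for illustration.

```swift
import Foundation

// Hypothetical sample: each 👍 is one Character but two UTF-16 code units.
let text = "👍👍👍 323 ECO"
let characterCount = text.count      // number of Characters (grapheme clusters)
let utf16Count = text.utf16.count    // number of UTF-16 code units, as NSString counts

let regex = try! NSRegularExpression(pattern: "[0-9]{3} [A-Z]{3}")

// A range built from `count` stops short, so "ECO" falls outside the search range.
let shortRange = NSRange(location: 0, length: characterCount)
let matchesShort = regex.matches(in: text, range: shortRange)

// A range built from `utf16.count` covers the whole string.
let fullRange = NSRange(location: 0, length: utf16Count)
let matchesFull = regex.matches(in: text, range: fullRange)

print(characterCount, utf16Count)            // 11 14
print(matchesShort.count, matchesFull.count) // 0 1
```

With plain ASCII input both counts are equal, which is why the problem tends to go unnoticed until the text contains emoji or other characters outside the Basic Multilingual Plane.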
2

I needed something like this that works more like JS's String.prototype.split(pat: RegExp) or Rust's String.splitn(pat: Pattern<'a>), but with a regex. I ended up with this:

import Foundation

extension NSRegularExpression {
    convenience init(_ pattern: String) {...}

    /// An array of substrings of the given string, separated by this regular expression, restricted to returning at most n items.
    /// If n substrings are returned, the last substring (the nth substring) will contain the remainder of the string.
    /// - Parameter str: String to be matched
    /// - Parameter n: If `n` is specified and n != -1, it will be split into n elements else split into all occurrences of this pattern
    func splitn(_ str: String, _ n: Int = -1) -> [String] {
        // NSRange counts UTF-16 code units, so measure the string with utf16.count
        let length = str.utf16.count
        let range = NSRange(location: 0, length: length)
        let matches = self.matches(in: str, range: range)
        var result = [String]()
        if (n != -1 && n < 2) || matches.isEmpty { return [str] }

        // Text before the first match (an empty string if the match is at the very start)
        if let first = matches.first?.range {
            if first.location == 0 { result.append("") }
            if first.location != 0 {
                let _range = NSRange(location: 0, length: first.location)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }

        // Text between each pair of consecutive matches
        for (cur, next) in zip(matches, matches[1...]) {
            let loc = cur.range.location + cur.range.length
            if n != -1 && result.count + 1 == n {
                // Limit reached: the last element takes the remainder of the string
                let _range = NSRange(location: loc, length: length - loc)
                result.append(String(str[Range(_range, in: str)!]))
                return result
            }
            let len = next.range.location - loc
            let _range = NSRange(location: loc, length: len)
            result.append(String(str[Range(_range, in: str)!]))
        }

        // Text after the last match (an empty string if the match ends the string)
        if let last = matches.last?.range, !(n != -1 && result.count >= n) {
            let lastIndex = last.length + last.location
            if lastIndex == length { result.append("") }
            if lastIndex < length {
                let _range = NSRange(location: lastIndex, length: length - lastIndex)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }
        return result
    }
}

It passes the following tests:

func testRegexSplit() {
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love"), ["My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love . "), ["My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love"), ["", "My", "Love"])
    XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love . "), ["", "My", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX"), ["", "My", "", "Love", ""])
}

func testRegexSplitWithN() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 1), ["xXMyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", -1), ["", "My", "", "Love", ""])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 2), ["", "MyxXxXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 3), ["", "My", "xXLovexX"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 4), ["", "My", "", "LovexX"])
}

func testNoMatches() {
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 1), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove"), ["MyLove"])
    XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 3), ["MyLove"])
}

2 Comments

I found this crashed for me if I provided a string that had 0 matches to the pattern. I ended up using this instead: gist.github.com/hcrub/218e1d25f1659d00b7f77aebfcebf15a
@Patrick I have fixed this and also added test cases for it
0

Even though the question is a bit older: this variant uses RegexComponent, available from iOS 16 and macOS 13, which allows something like "a b c".split(regex: /\s+/):

public extension StringProtocol {
    func split<R>(
        regex: R,
        maxSplits: Int = Int.max,
        omittingEmptySubsequences: Bool = true
    ) -> [Substring] where R: RegexComponent, SubSequence == Substring {
        guard maxSplits > 0 else { return [ self[startIndex...] ] }
        var startIndex = self.startIndex
        var result = [Substring]()
        for m in matches(of: regex) {
            let substring = self[startIndex..<m.range.lowerBound]
            if !omittingEmptySubsequences || !substring.isEmpty {
                result.append(substring)
            }
            startIndex = m.range.upperBound
            if result.count >= maxSplits { break }
        }
        if startIndex < endIndex {
            result.append(self[startIndex...])
        }
        return result
    }
}
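
For the string in the question, the course number has to stay attached to its piece, so the separator itself must not swallow the digits. One way to get that is a lookahead. The sketch below is an assumption-laden illustration rather than part of the answer: it needs Swift 5.7 or later (and an iOS 16 / macOS 13 or later deployment target on Apple platforms), it uses the standard library's own split(separator:) overload that accepts a regex, and the pattern and variable names are made up for this example.

```swift
// Requires Swift 5.7+; the #/.../# form of the regex literal is used so that
// no extra compiler flag for bare-slash literals is needed.
let input = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"

// Illustrative pattern (an assumption, not taken from the answers):
// whitespace that is immediately followed by a three-digit number.
// The lookahead (?=...) is not consumed, so the number stays with its piece.
let separator = #/\s+(?=[0-9]{3}\b)/#

// split(separator:) returns [Substring]; convert each piece to String.
let courses = input.split(separator: separator).map { String($0) }
print(courses)
// ["323 ECO Economics Course", "451 ENG English Course", "789 Mathematical Topography"]
```

Because the whitespace is part of the separator, the pieces come back already trimmed, and the last course is kept even though it has no three-letter code.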
