
I'm quite new to Swift and native programming. For a small personal project I fetch the full HTML of a Twitter search, and I'm trying to extract just the text of the first tweet. I've got to the point where I can get the first tweet, including all the tags in it, but I'm a bit clueless about how to filter out just the text and remove the HTML elements.

For example, it's pretty easy to take a single tweet and filter out the possible <a href=""> and <span> etc. But as soon as I change the tweet or the search, such a specific filter no longer works. What I'm really looking for is a way to remove everything in a string that starts with < and ends with >, so I can filter out all the stuff I don't need. I'm using string.componentsSeparatedByString() to grab the one tweet I need out of all the HTML, but I can't use that method to filter everything else out of my string.

Please bear with me since I'm quite new at this. I'm aware that I may not be doing this right at all and that there may be a much easier way to pull a single tweet than all this hassle. If so, please let me know as well.

2 Answers


You can create a function to do it for you as follows:

    func html2String(html: String) -> String {
        return NSAttributedString(
            data: html.dataUsingEncoding(NSUTF8StringEncoding)!,
            options: [
                NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
            ],
            documentAttributes: nil,
            error: nil)!.string
    }

or as an extension:

    extension String {
        var html2String: String {
            return NSAttributedString(
                data: dataUsingEncoding(NSUTF8StringEncoding)!,
                options: [
                    NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                    NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
                ],
                documentAttributes: nil,
                error: nil)!.string
        }
        var html2NSAttributedString: NSAttributedString {
            return NSAttributedString(
                data: dataUsingEncoding(NSUTF8StringEncoding)!,
                options: [
                    NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                    NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
                ],
                documentAttributes: nil,
                error: nil)!
        }
    }

Or, if you prefer, as an NSData extension:

    extension NSData {
        var htmlString: String {
            return NSAttributedString(
                data: self,
                options: [
                    NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                    NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
                ],
                documentAttributes: nil,
                error: nil)!.string
        }
    }

or as a function that takes NSData:

    func html2String(html: NSData) -> String {
        return NSAttributedString(
            data: html,
            options: [
                NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType,
                NSCharacterEncodingDocumentAttribute: NSUTF8StringEncoding
            ],
            documentAttributes: nil,
            error: nil)!.string
    }

Usage:

    "<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>".html2String
    // "Testing\n Hello World !!!"

    let result = html2String("<div>Testing<br></div><a href=\"http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573\"><span>&nbsp;Hello World !!!</span>")
    // "Testing\n Hello World !!!"

    // Let's load the HTML of this page as NSData and print it as a String
    import UIKit

    class ViewController: UIViewController {
        let questionLink = "http://stackoverflow.com/questions/27661722/removing-everything-between-a-certain-set-of-characters-with-swift/27662573#27662573"

        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view, typically from a nib.
            if let questionUrl = NSURL(string: questionLink) {
                println("LOADING URL")
                if let myHtmlDataFromUrl = NSData(contentsOfURL: questionUrl) {
                    println(myHtmlDataFromUrl.htmlString)
                }
            }
        }

        override func didReceiveMemoryWarning() {
            super.didReceiveMemoryWarning()
            // Dispose of any resources that can be recreated.
        }
    }

1 Comment

Okay, so is there a way to keep the majority of the tags and just take out a certain tag, such as <span></span>?
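One possible way to do that, as a rough sketch rather than anything from the answer above: a regular expression that matches only the opening and closing tags of one element and leaves everything else alone. The helper name `removingTag(_:from:)` is made up for illustration, it uses current Swift syntax, and it assumes reasonably well-formed markup (it does not handle self-closing forms like `<span/>` or a `>` inside an attribute value).

```swift
import Foundation

// Hypothetical helper (not part of the answer above): removes the opening and
// closing tags of one element, e.g. "span", and keeps the text between them.
func removingTag(_ tag: String, from html: String) -> String {
    // Escape the tag name so it is treated literally inside the pattern.
    let name = NSRegularExpression.escapedPattern(for: tag)
    // Matches <span>, <span class="x"> and </span>, but not <spanner>.
    let pattern = "</?\(name)(\\s[^>]*)?>"
    return html.replacingOccurrences(
        of: pattern,
        with: "",
        options: [.regularExpression, .caseInsensitive]
    )
}

let html = "<div>Testing<br></div><span class=\"x\">Hello World</span>"
print(removingTag("span", from: html))
// <div>Testing<br></div>Hello World
```

The `<div>` and `<br>` tags survive because the pattern only ever names the one tag passed in.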

Quite a lot has changed in Swift over the last few years, so I wanted to post a version of Leo Dabus' answer updated to current Swift syntax.

    extension String {
        func removeHTMLEncoding() throws -> String? {
            guard let data = self.data(using: .utf8) else {
                return nil
            }
            let attr = try NSAttributedString(
                data: data,
                options: [
                    .documentType: NSAttributedString.DocumentType.html,
                    .characterEncoding: NSNumber(value: String.Encoding.utf8.rawValue)
                ],
                documentAttributes: nil
            )
            return attr.string
        }
    }

Kinda annoying that you still need to convert the string encoding value to an NSNumber - NSAttributedString is pretty out of date!
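For the literal request in the question (remove everything that starts with < and ends with >), a lighter alternative might be a plain regular-expression replace. This is only a sketch under a few assumptions: the property name `strippingTags` is made up, the input has no `>` inside attribute values, and, unlike the NSAttributedString route, entities such as `&nbsp;` are left as they are.

```swift
import Foundation

extension String {
    // Hypothetical name; strips anything that looks like <...> and leaves
    // entities such as &nbsp; untouched.
    var strippingTags: String {
        return replacingOccurrences(of: "<[^>]+>", with: "", options: .regularExpression)
    }
}

let tweet = "<div>Testing<br></div><a href=\"http://example.com\"><span>&nbsp;Hello World !!!</span></a>"
print(tweet.strippingTags)
// Testing&nbsp;Hello World !!!
```

It avoids the HTML importer entirely, at the cost of not turning `<br>` into a newline or decoding entities.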
