I’m using Rails 4.2.7. I’m currently using the following logic to parse a doc with Nokogiri:
content.xpath("//pre[@class='text-results']").xpath('text()').to_s In my HTML document, this content appears within my “text-results” block:
<pre class="text-results"><html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta name=Title content="<p><a href=http://mychiptime"> <meta name=Keywords content=""> <meta http-equiv=Content-Type content="text/html; charset=macintosh”>… I include this section because my parsing dies with the following error:
Error during processing: unknown encoding name - macintosh /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node.rb:627:in `find' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node.rb:627:in `serialize' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node.rb:786:in `to_format' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node.rb:642:in `to_html' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node.rb:512:in `to_s' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:187:in `block in each' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:186:in `upto' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:186:in `each' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:218:in `map' /Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:218:in `to_s' /Users/davea/Documents/workspace/myproject/app/services/onlinerr_service.rb:8:in `pre_process_data' /Users/davea/Documents/workspace/myproject/app/services/abstract_import_service.rb:77:in `process_my_object_data' /Users/davea/Documents/workspace/myproject/app/services/onlinerr_my_object_finder_service.rb:82:in `process_my_object_link' /Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:29:in `block in process_data' /Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:28:in `each' /Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:28:in `process_data' /Users/davea/Documents/workspace/myproject/app/services/run_crawlers_service.rb:18:in `block in run_all_crawlers' /Users/davea/.rvm/gems/ruby-2.3.0/gems/activerecord-4.2.7.1/lib/active_record/relation/delegation.rb:46:in `each' /Users/davea/.rvm/gems/ruby-2.3.0/gems/activerecord-4.2.7.1/lib/active_record/relation/delegation.rb:46:in `each' /Users/davea/Documents/workspace/myproject/app/services/run_crawlers_service.rb:5:in `run_all_crawlers' Is there any way to make Nokogiri ignore this unknown encoding? I’m trying to get the content inside the <pre> tag as text, so I don’t need it parsed further.
I'm on Mac El Capitan. Per the comment, here's my locale settings:
davea$ locale LANG="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_CTYPE="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_ALL=
localetoutf-8, see: perlgeek.de/en/article/set-up-a-clean-utf8-environmentexport LC_ALL="en_US.UTF-8"before you runrails sand see if you have same problem