我使用的是Ruby 2.3的Rails 4.2.7。我有这条线
words = line.split(/s+/)
在下面产生错误
Error during processing: (ArgumentError) invalid byte sequence in UTF-8
/Users/davea/Documents/workspace/myproject/app/services/text_table_to_my_object_time_converter_service.rb:92:in `split'
/Users/davea/Documents/workspace/myproject/app/services/text_table_to_my_object_time_converter_service.rb:92:in `get_headers'
/Users/davea/Documents/workspace/myproject/app/helpers/webpage_helper.rb:216:in `block in get_my_object_times_from_table_data'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:187:in `block in each'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:186:in `upto'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/nokogiri-1.6.8.1/lib/nokogiri/xml/node_set.rb:186:in `each'
/Users/davea/Documents/workspace/myproject/app/helpers/webpage_helper.rb:209:in `each_with_index'
/Users/davea/Documents/workspace/myproject/app/helpers/webpage_helper.rb:209:in `get_my_object_times_from_table_data'
/Users/davea/Documents/workspace/myproject/app/services/single_table_data_service.rb:31:in `process_page_data'
/Users/davea/Documents/workspace/myproject/app/services/abstract_import_service.rb:84:in `process_my_object_data'
/Users/davea/Documents/workspace/myproject/app/services/running_in_the_usa_my_object_finder_service.rb:183:in `process_my_object_link'
/Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:29:in `block in process_data'
/Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:28:in `each'
/Users/davea/Documents/workspace/myproject/app/services/abstract_my_object_finder_service.rb:28:in `process_data'
/Users/davea/Documents/workspace/myproject/app/services/run_crawlers_service.rb:18:in `block in run_all_crawlers'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/activerecord-4.2.7.1/lib/active_record/relation/delegation.rb:46:in `each'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/activerecord-4.2.7.1/lib/active_record/relation/delegation.rb:46:in `each'
/Users/davea/Documents/workspace/myproject/app/services/run_crawlers_service.rb:5:in `run_all_crawlers'
(irb):2:in `irb_binding'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/workspace.rb:87:in `eval'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/workspace.rb:87:in `evaluate'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/context.rb:380:in `evaluate'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:489:in `block (2 levels) in eval_input'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:623:in `signal_status'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:486:in `block in eval_input'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:246:in `block (2 levels) in each_top_level_statement'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:232:in `loop'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:232:in `block in each_top_level_statement'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:231:in `catch'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb/ruby-lex.rb:231:in `each_top_level_statement'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:485:in `eval_input'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:395:in `block in start'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:394:in `catch'
/Users/davea/.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/irb.rb:394:in `start'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/railties-4.2.7.1/lib/rails/commands/console.rb:110:in `start'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/railties-4.2.7.1/lib/rails/commands/console.rb:9:in `start'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:68:in `console'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/railties-4.2.7.1/lib/rails/commands/commands_tasks.rb:39:in `run_command!'
/Users/davea/.rvm/gems/ruby-2.3.0/gems/railties-4.2.7.1/lib/rails/commands.rb:17:in `<top (required)>'
该线包含一个重音" A",我认为这会导致问题。是否有一种方法可以自动检测编码,然后再应用拆分。数据源自网络,但我想尽可能地保留角色与在线显示的角色。
如果您不知道编码是什么,则很难保持您的重音字符。这个:
words = line.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?").split(/s+/)
将用?
替换它们,并防止您的分裂丢弃错误。
您是否知道编码可能是什么?iso-8859-1
也许?