
I'm trying to extract data from MongoDB into Elasticsearch. getMongodoc = coll.find().limit(10) finds the first 10 entries in Mongo.

As you can see, result = ec.mongoConn should get its result from the method mongoConn() in class MongoConncetor. When I use p hsh inside the loop (to check that the output is correct), it prints all 10 entries, while p result = ec.mongoConn prints #<Enumerator: #<Mongo::Cursor:0x70284070232580 @view=#<Mongo::Collection::View:0x70284066032180 namespace='mydatabase.mycollection' @filter={} @options={"limit"=>10}>>:each>

When I changed p hsh to return hsh, p result = ec.mongoConn prints a correct result, but only the first entry, not all 10. It seems the value of hsh is not passed to result = ec.mongoConn correctly. Can anyone tell me what I am doing wrong? Is it something in the way I call the method?

    class MongoConncetor
      def mongoConn()
        BSON::OrderedHash.new
        client = Mongo::Client.new([ 'xx.xx.xx.xx:27017' ], :database => 'mydatabase')
        coll = client[:mycollection]
        getMongodoc = coll.find().limit(10)
        getMongodoc.each do |document|
          hsh = symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
          return hsh
          # p hsh
        end
      end

    class ElasticConnector < MongoConncetor
      include Elasticsearch::API

      CONNECTION = ::Faraday::Connection.new url: 'http://localhost:9200'

      def perform_request(method, path, params, body)
        puts "--> #{method.upcase} #{path} #{params} #{body}"
        CONNECTION.run_request \
          method.downcase.to_sym,
          path,
          (( body ? MultiJson.dump(body) : nil)),
          {'Content-Type' => 'application/json'}
      end

      ec = ElasticConnector.new
      p result = ec.mongoConn

      client = ElasticConnector.new
      client.bulk index: 'myindex', type:'test' , body: result
    end

1 Answer


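You are calling return inside a loop (each). return exits the whole method on the first iteration, so only the first document comes back. Without it, the method returns whatever each itself returns, which is the Enumerator you saw printed. Try something like: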

    getMongodoc.map do |document|
      symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
    end

Notes:

  • In Ruby you usually don't need the return keyword, because a method returns the value of its last evaluated expression automatically. return is mostly used to exit early and prevent the remaining code from being executed.
  • In Ruby, snake_case is used for variable and method names (as opposed to CamelCase or camelCase).
  • map enumerates a collection (by calling the block for every item in the collection) and returns a new array of the same size containing the return values from the block.
  • You don't need empty parens () on method definitions.
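To make the difference concrete, here is a minimal sketch that runs without MongoDB. It assumes a plain array of hashes (DOCS, made-up sample data) in place of the cursor, and a hypothetical deep_symbolize helper in place of symbolize_keys, which the question does not show.

```ruby
# Stand-in for the documents a Mongo cursor would yield (made-up sample data).
DOCS = [
  { "_id" => 1, "response" => { "version" => "1.1", "statusCode" => 302 } },
  { "_id" => 2, "response" => { "version" => "1.1", "statusCode" => 200 } },
  { "_id" => 3, "response" => { "version" => "1.0", "statusCode" => 404 } }
].freeze

# Hypothetical helper: recursively convert string keys to symbols.
def deep_symbolize(value)
  case value
  when Hash  then value.each_with_object({}) { |(k, v), out| out[k.to_sym] = deep_symbolize(v) }
  when Array then value.map { |item| deep_symbolize(item) }
  else value
  end
end

# `return` inside the block exits the whole method on the first iteration.
def first_only(docs)
  docs.each do |document|
    return deep_symbolize(document).reject { |key, _| key == :_id }
  end
end

# Without `return`, the block values are discarded and the method returns
# whatever `each` returns (for an Array, the array itself).
def each_result(docs)
  docs.each do |document|
    deep_symbolize(document).reject { |key, _| key == :_id }
  end
end

# `map` collects the block's value for every document into a new array.
def all_docs(docs)
  docs.map do |document|
    deep_symbolize(document).reject { |key, _| key == :_id }
  end
end

p first_only(DOCS)               # a single hash: only the first document
p each_result(DOCS).equal?(DOCS) # true: the original collection, untouched
p all_docs(DOCS).size            # 3: one converted hash per document
```

The reject block names both key and value explicitly, which avoids relying on how a one-argument block is filled in when filtering a Hash.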

UPDATE:

  • The data structure returned by MongoDB is a Hash (BSON is a special kind of serialization). A Hash is a collection of keys ("_id", "response") that point to values. The difference you point out in your comment is the class of the hash keys: string vs. symbol.
  • In your case a document in Mongo is represented as a Hash, one hash per document.
  • If you want to return multiple documents, then an array is required. More specifically, an array of hashes: [{}, {}, ...]
  • If your target (ES) accepts only one hash at a time, then you will need to loop over the results from Mongo and add them one by one:

    list_of_results = get_mongo_data
    list_of_results.each do |result|
      add_result_to_es(result)
    end
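Since the question already calls client.bulk, another option is to send the documents in batches rather than one by one. The sketch below only builds the request bodies and does not talk to Elasticsearch. It assumes that bulk accepts an array of action hashes in the combined form { index: { _index: ..., data: ... } }, as I understand the elasticsearch-ruby documentation to describe; bulk_actions is a hypothetical helper and the batch size of 2 is arbitrary.

```ruby
# Hypothetical helper: wrap each document hash in an "index" action.
# Assumes the bulk API accepts the combined { index: { _index:, data: } } form.
def bulk_actions(docs, index:)
  docs.map { |doc| { index: { _index: index, data: doc } } }
end

# Made-up sample data standing in for the array returned by `map`.
docs = [
  { response: { version: "1.1", statusCode: 302 } },
  { response: { version: "1.1", statusCode: 200 } },
  { response: { version: "1.0", statusCode: 404 } }
]

# Send the documents in fixed-size batches instead of one request per document.
docs.each_slice(2) do |batch|
  body = bulk_actions(batch, index: 'myindex')
  # client.bulk(body: body)  # assumed call; `client` would be the ElasticConnector instance
  puts "would send #{body.size} actions"
end
```

With tens of thousands of documents, a larger slice size keeps the number of requests small while bounding the size of each request body.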


8 Comments

Thanks, it did solve the return value problem, but when I use map, it returns an array instead of a hash. The result looks like this: [{:response=>{:version=>"1.1", :statusCode=>302,.......] , which could not be inserted into Elasticsearch. The right one should look like this: {:response=>{:version=>"1.1", :statusCode=>302,....... without the [ ]. How can I fix this?
Well, it seems you want an array because you have multiple results? So you can either: use the first/last/... of the list of results (try .first, .last or [some_index]); loop over the result array and insert each item into ES; or find a way to add multiple results to ES at once.
Actually I need a hash, maybe a hash map, because body: in the ES Ruby API only takes a hash as a parameter. The result I got is an array that has all 10 entries, so right now I think I need to find a way to make the method mongoConn return a hash. I have 60000 entries, so it may not be a good idea to loop through each of them, right?
Either you have one entry (a hash, which has various key/value pairs) or you have multiple entries (an array). I think there is some confusion as to what data you read from Mongo and add to ES. Perhaps you can elaborate on what you want to achieve?
The data from Mongo is something like this: {"_id"=>BSON::ObjectId('567ccbd747824a621d8b4567'), "response"=>{"version"=>"1.1", "statusCode"=>302,..........} which is BSON format. The data ES accepts is like this: {:response=>{:version=>"1.1", :statusCode=>302,...} . What I did in the code was to delete "_id" and translate BSON to a hash so that it could be added to ES. At the end of the code, in `body: result`, the body parameter only takes a hash, while the solution you gave returns an array: [{:response=>{:version=>"1.1", :statusCode=>302,.......} ]