I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.

The raw structure itself is JSON, but the problem is that some of the top-level keys in the structure are variable rather than static.

To provide a somewhat redacted example, my JSON looks like this:

"stat": { "state": "valid", "duration": 5, }, "12345-abc": { "content_length": 5, "version": 2 } "54321-xyz": { "content_length": 2, "version", 1 } 

The first block is easy: Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I were to dump a lot of that data into an index it would work as expected. The problem is that the next two blocks are essentially the same thing, but the raw JSON is formatted in such a way that a unique element has crept into the key names, so Elasticsearch maps each block separately by default, generating a mapping that looks like this:

"stat": { "properties": { "state": { "type": "string" }, "duration": { "type": "double" } } }, "12345-abc": { "properties": { "content_length": { "type": "double" }, "version": { "type": "double" } } }, "54321-xyz": { "properties": { "content_length": { "type": "double" }, "version": { "type": "double" } } } 

I'd like to be able to index all of the "content_length" data together, but it's getting split across separate fields, and with some of the variable names being used, I wind up with really long field names in Kibana that are next to useless.

Is it possible to provide a generic tag for the structure? Or is this more easily addressed at the JSON generation phase, with our developers hard-coding a generic structure name and adding an identifier field?

Any insight / help greatly appreciated.

Thanks!

  • I don't understand what the desired behavior is. For the sample you provided, what mapping do you expect ES to create on its own? Commented Jul 19, 2016 at 21:10
  • I wish to be able to aggregate all the datasets for the same structure, but the raw JSON gives the structures variable names. This also means that if the JSON had some curious problem, like a numerical value wrapped in quotes, I wouldn't be able to overwrite the mapping without knowing ahead of time what the structures are going to look like, which doesn't seem practical. With that said, I'm increasingly thinking this is due to the raw JSON structure and that's where I'll need to make changes, as per ajaegle's comment below. Commented Jul 20, 2016 at 13:38

1 Answer

If keys like 12345-abc are generated and potentially unbounded, it will be hard (if not impossible) to run useful queries or aggregations. It's not clear exactly what your use case for analyzing the data is, but you should probably have a look at nested objects (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html) and generate your input JSON according to what you want to query for. You will likely get better aggregation results if you put these additional objects into an array, with a dedicated field containing what is currently your key:

{
    "stat": ...,
    "things": [
        {
            "thingkey": "12345-abc",
            "content_length": 5,
            "version": 2
        },
        ...
    ]
}
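
If changing the JSON at the source is not an option, the same reshaping could be done just before indexing. Here is a minimal sketch in Python, assuming every top-level key other than "stat" is one of the dynamically named objects and that each of those values is itself an object. The function name `reshape` and the field names `things` / `thingkey` are only illustrative.

```python
import json


def reshape(doc, static_keys=("stat",), list_field="things", key_field="thingkey"):
    """Move dynamically named top-level objects into a single list.

    Assumption: every top-level key not listed in static_keys holds a
    dict describing one entity, and the key itself is that entity's id.
    """
    # Keep the statically named parts of the document untouched.
    reshaped = {k: v for k, v in doc.items() if k in static_keys}
    # Turn each dynamic key into a field value inside its own object.
    reshaped[list_field] = [
        {key_field: k, **v}
        for k, v in doc.items()
        if k not in static_keys
    ]
    return reshaped


raw = {
    "stat": {"state": "valid", "duration": 5},
    "12345-abc": {"content_length": 5, "version": 2},
    "54321-xyz": {"content_length": 2, "version": 1},
}

print(json.dumps(reshape(raw), indent=2))
```

With this shape, "content_length" always lives under the same field name, so a single mapping covers every entity regardless of its identifier.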

2 Comments

The structures represent nodes in a distributed system, so aggregation is key, and as we agree, will be broken as is. I've looked at nested structures quite a bit online but all the documentation points to structures which are statically named, like the "stat" example. I think I need to have a discussion with the guys generating the json to see what options we have. If the "things" were given a generic name as per your example I could still isolate specific entities in Kibana with filters and term restrictions. Thanks for confirming my suspicion.
I've spent further time playing with things, and breaking values out of the keys in the structure massively simplified things, as per this suggestion. The next problem was that Kibana doesn't seem to support nested queries once I had this data properly in Elasticsearch... sigh. Thanks for the quick turnaround ajaegle
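
For anyone who ends up querying Elasticsearch directly for the nested data, here is a rough sketch of what a nested mapping and a nested aggregation body could look like, written as Python dicts. It assumes the "things" / "thingkey" field names from the answer, integer values, and a recent Elasticsearch version with typeless mappings and the "keyword" type. Older versions need a type name level in the mapping and use "string" with "index": "not_analyzed" instead, so treat the details as assumptions to check against your version.

```python
import json

# Hypothetical index mapping: "things" is declared as nested so each
# array element is indexed as its own hidden sub-document.
mapping = {
    "mappings": {
        "properties": {
            "things": {
                "type": "nested",
                "properties": {
                    "thingkey": {"type": "keyword"},
                    "content_length": {"type": "long"},
                    "version": {"type": "long"},
                },
            }
        }
    }
}

# Search body: step into the nested path, then average content_length
# across every entity, whatever its thingkey is.
aggregation = {
    "size": 0,
    "aggs": {
        "all_things": {
            "nested": {"path": "things"},
            "aggs": {
                "avg_content_length": {
                    "avg": {"field": "things.content_length"}
                }
            },
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(aggregation, indent=2))
```

The first body would be sent when creating the index and the second as a search request; how you send them (curl, a client library) is up to you.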
