10

I have a Splunk query something like:

index=myIndex* source="source/path/of/logs/*.log" "Elephant" 

This brings up about 2,000 results, which are JSON responses from one of my APIs that include the word "Elephant". This is roughly what I want. However, some of these results have duplicate carId fields, and I only want Splunk to show me the unique search results.

The results in Splunk look something like this:

MyApiRequests {"carId":3454353435,"make":"toyota","year":"2015","model":"camry","value":25000.00} 

Now, I just want to filter on the carIds that are unique. I don't want duplicates. Thus, I would expect the original 2,000 results to decrease quite a bit.

Can anyone help me formulate my Splunk query to achieve this?

2 Answers

12

stats will be your friend here.

Consider the following:

index=myIndex* source="source/path/of/logs/*.log" "Elephant" carId=* | stats values(*) as * by carId 

5 Comments

Interesting. When I try this, I get 0 results back.
This answer and @Mads Hansen's presume the carId field is already extracted. If it isn't, neither query will work. The fields can be extracted automatically by specifying either INDEXED_EXTRACTIONS=json or KV_MODE=json in props.conf. Otherwise, you can use the spath command in a query. Either way, the JSON must be in the correct format. For improper JSON, you can use rex to extract fields.
@RichG - ennth indicated the field seems to be "available" already
Yes, if you do "fields carId" or "carId=*" as the post stated, it will automatically extract the field "carId" with those values. You can see it in the left sidebar of Splunk, where it is listed as an extracted field. For some reason, I can only get this to work with results in my _raw area that are in the key=value format. The only thing I can't figure out now is that stats values() never returns unique values for me, despite everyone saying it returns only unique values.
@ennth - are you sure you have the spelling on the field name correct?
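The extraction options mentioned in the comments above can be sketched as a query. This is only a sketch, assuming each raw event is a literal prefix (MyApiRequests) followed by a single JSON object, as in the sample event in the question. The field name json_payload is hypothetical and introduced here purely for illustration. The rex step captures the JSON portion of _raw, spath extracts the JSON keys (including carId) from it, and stats then groups by carId:

```spl
index=myIndex* source="source/path/of/logs/*.log" "Elephant"
| rex field=_raw "(?<json_payload>\{.*\})"
| spath input=json_payload
| stats values(*) as * by carId
```

If the events turn out to be pure JSON with no prefix, the rex step is probably unnecessary and spath on its own may be enough.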
9

You could use dedup:

index=myIndex* source="source/path/of/logs/*.log" "Elephant" | dedup carId 

2 Comments

Okay, I tried piping the results (there were 2,000) into dedup and I got 0 events back... I expected to get a filtered list of the results. I assumed that even with, say, 5 duplicates, something would still have been returned to me... Is this how dedup works?
You can use dedup, but you generally shouldn't. It's a very inefficient operation in Splunk.
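One possible reason for getting 0 events back from dedup is that carId is not an extracted field at search time: by default, dedup drops events in which the field is missing (the keepempty=true option retains them). A quick way to check, as a sketch that uses hasCarId as a hypothetical field name, is to count how many events actually carry the field:

```spl
index=myIndex* source="source/path/of/logs/*.log" "Elephant"
| eval hasCarId=if(isnotnull(carId), "yes", "no")
| stats count by hasCarId
```

If every event falls under "no", the field needs to be extracted (for example with spath or rex) before dedup or stats can group on it.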
