I'm trying to figure out how to model data in Riak. Let's say you are building something like a CMS with two features, news and products. You need to be able to store this information for multiple clients X and Y. How would you typically structure this?

  1. One bucket per client, with two keys: news and products. Store multiple objects under each key, then use map/reduce to order them.

  2. Store both the news and the products in the same bucket, but with a new autogenerated key for each news item and product item. That is, one bucket for X and one for Y.

  3. One bucket per client/feature combination, that is, the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce on the whole bucket to return the results in order.

Which would be the best way to handle this problem?

2 Answers

I'd create 2 buckets: news and products. Then I'd prefix keys in each bucket with client names. I'd probably also include dates in news keys for easy date ranging.

    news/acme_2011-02-23_01
    news/acme_2011-02-23_02
    news/bigcorp_2011-02-21_01

And optionally prefix product names with category names

    products/acme_blacksmithing_anvil
    products/bigcorp_databases_oracle
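
To make the key scheme concrete, here is a minimal sketch of how such keys could be built. The helper names `news_key` and `product_key` are made up for illustration, and the sketch assumes client and category names never contain an underscore, since that character is used as the separator.

```python
from datetime import date


def news_key(client: str, day: date, seq: int) -> str:
    """Build a news key such as 'acme_2011-02-23_01' (hypothetical helper)."""
    # ISO dates (YYYY-MM-DD) sort lexicographically in date order,
    # which is what makes date ranging on the key possible.
    return f"{client}_{day.isoformat()}_{seq:02d}"


def product_key(client: str, category: str, name: str) -> str:
    """Build a product key such as 'acme_blacksmithing_anvil' (hypothetical helper)."""
    return f"{client}_{category}_{name}"


print(news_key("acme", date(2011, 2, 23), 1))
print(product_key("bigcorp", "databases", "oracle"))
```

If a client name could contain an underscore, a different separator would be needed, otherwise splitting the key back into its parts becomes ambiguous.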

Then in your map/reduce you could use key filtering:

    // BigCorp News items
    {
      "inputs":{
        "bucket":"news",
        "key_filters":[["starts_with", "bigcorp"]]
      }
      // ... rest of mapreduce job
    }

    // Acme Blacksmithing items
    {
      "inputs":{
        "bucket":"products",
        "key_filters":[["starts_with", "acme_blacksmithing"]]
      }
      // ... rest of mapreduce job
    }

    // News for all clients from Feb 12th to 19th
    {
      "inputs":{
        "bucket":"news",
        "key_filters":[["tokenize", "_", 2], ["between", "2011-02-12", "2011-02-19"]]
      }
      // ... rest of mapreduce job
    }
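
To see which keys those filters would select, here is a small pure-Python emulation of the three filters used above. It is not Riak's implementation, only a sketch of the semantics as I understand them: `tokenize` is treated as 1-based and `between` as inclusive. The function names `matches` and `select` are made up for this example.

```python
def matches(key, filters):
    """Apply a chain of key filters to one key and report whether it passes."""
    value = key
    for name, *args in filters:
        if name == "tokenize":
            # Transform step: split on the separator and keep the n-th token (1-based).
            separator, n = args
            parts = value.split(separator)
            if n > len(parts):
                return False
            value = parts[n - 1]
        elif name == "starts_with":
            # Predicate step: ends the chain with a yes/no answer.
            return value.startswith(args[0])
        elif name == "between":
            # Predicate step: inclusive range check on the (transformed) value.
            low, high = args
            return low <= value <= high
        else:
            raise ValueError(f"filter not covered by this sketch: {name}")
    return True


def select(keys, filters):
    """Return the keys that pass the filter chain, in their original order."""
    return [key for key in keys if matches(key, filters)]


news_keys = ["acme_2011-02-23_01", "acme_2011-02-23_02", "bigcorp_2011-02-21_01"]

print(select(news_keys, [["starts_with", "bigcorp"]]))
print(select(news_keys, [["tokenize", "_", 2], ["between", "2011-02-12", "2011-02-19"]]))
```

With the three sample news keys, the first filter keeps only the bigcorp key, and the date range keeps nothing because no sample item falls between Feb 12th and 19th.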

A more efficient approach than key filtering (as Kev Burns recommends) is to model this scenario with Secondary Indexes (2i) or Riak Search.

Take a look at my answers to "Which clustered NoSQL DB for a Message Storing purpose?" and "Links in Riak: what can they do/not do, compared to graph databases?" for a discussion of similar cases.

You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.

1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and ease of querying. If you're always going to fetch both news and products for a company in a single query, you might as well store them in a single bucket. But I recommend using 2 separate ones, since they are easier to keep track of, especially if you'll have something like separate tabs or pages for "Company X - Products" and "Company X - News". If you need to combine them into a single feed, you would make 2 queries (one for news and one for products) and merge the results in the client code (by date or whatever).

2) If each news or product item belongs to exactly one company, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.

3) If the relationship is many-to-many (a news or product item can belong to several companies, for example a news item about a joint venture between 2 companies), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket and, for each company mentioned in a news story, insert a Mention object with its own unique key and a secondary index on company_key; its value would contain a type ('news' or 'product') and an item_key (the news key or product key). Extracting relationships into separate Riak objects like this lets you do a lot of interesting things: tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
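
The shape of options 2 and 3 can be sketched without a running cluster. The `Bucket` class below is an in-memory stand-in written for this example, not the Riak client API; the index name `company_key_bin` follows the 2i convention of suffixing string indexes with `_bin`, and the keys `n1`, `m1`, and so on are made up.

```python
from collections import defaultdict


class Bucket:
    """In-memory stand-in for a Riak bucket with secondary indexes (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        # index name -> index value -> set of object keys
        self.indexes = defaultdict(lambda: defaultdict(set))

    def store(self, key, value, indexes=()):
        # Note: re-storing a key does not clear its old index entries in this sketch.
        self.objects[key] = value
        for index_name, index_value in indexes:
            self.indexes[index_name][index_value].add(key)

    def get_index(self, index_name, index_value):
        """Exact-match 2i-style lookup: keys whose index equals the given value."""
        return sorted(self.indexes[index_name][index_value])


# Option 2: each item belongs to exactly one company, so index the item itself.
news = Bucket("news")
news.store("n1", {"title": "Acme opens forge"}, [("company_key_bin", "acme")])
news.store("n2", {"title": "Acme hires smiths"}, [("company_key_bin", "acme")])
news.store("n3", {"title": "Acme and BigCorp joint venture"})

# Option 3: many-to-many, so each (company, item) pair becomes a Mention object.
mentions = Bucket("mentions")
mentions.store("m1", {"type": "news", "item_key": "n3"}, [("company_key_bin", "acme")])
mentions.store("m2", {"type": "news", "item_key": "n3"}, [("company_key_bin", "bigcorp")])


def mentioned_items(company_key, item_type):
    """Follow Mention objects for a company to the keys of the items they point at."""
    found = []
    for mention_key in mentions.get_index("company_key_bin", company_key):
        mention = mentions.objects[mention_key]
        if mention["type"] == item_type:
            found.append(mention["item_key"])
    return found


print(news.get_index("company_key_bin", "acme"))
print(mentioned_items("bigcorp", "news"))
```

The extra hop through the mentions bucket costs one more lookup per query, but the joint-venture story `n3` is stored once and is still reachable from both companies.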

1 Comment

This is a much better answer and is also up to date.
