
Let's assume a data model in which a user has blog posts. Each post has a unique title and many attributes.

I have a Column Family "posts" in which each row is like this:

posts = { "yesterday" : { date : 03-04-2012, userID : abfe222234, tags : "beatles,paul" } }

I want to index the posts by user, so I have another regular column family:

user_posts = { abfe222234 : { yesterday : null, ... } }

This model comes after a lot of research about secondary indexing in Cassandra, during which I found these slides: http://www.slideshare.net/edanuff/indexing-in-cassandra and understood that Super Column Families are used less and less.

My question:

If I want all the details of a user's posts, I have to read the DB twice: once to get all the post IDs, and once to fetch the details of the posts with those IDs.
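To make the two reads concrete, here is a minimal sketch that models both column families as plain Python dicts. It is not a Cassandra client; the second user ID, the extra post titles, and the function name posts_for_user are made up for illustration.

```python
# In-memory stand-ins for the two column families (plain dicts, not a real client).
# Only the "yesterday" row comes from the question; the other rows are hypothetical.
posts = {
    "yesterday": {"date": "03-04-2012", "userID": "abfe222234", "tags": "beatles,paul"},
    "let it be": {"date": "05-04-2012", "userID": "abfe222234", "tags": "beatles"},
    "imagine": {"date": "06-04-2012", "userID": "ccdd111199", "tags": "lennon"},
}
user_posts = {
    "abfe222234": {"yesterday": None, "let it be": None},
    "ccdd111199": {"imagine": None},
}


def posts_for_user(user_id):
    """Return {title: attributes} for one user, using the index row."""
    # Read 1: the index row; its column names are the post titles (row keys of "posts").
    titles = list(user_posts.get(user_id, {}))
    # Read 2: fetch the full post rows for those keys.
    return {title: posts[title] for title in titles if title in posts}


details = posts_for_user("abfe222234")
```

In a real cluster each of the two steps would be a separate request to the server, which is exactly the double read being asked about.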

What am I missing?

Thanks, Issahar.

edit:

The other option is to make "user_posts" a Super CF that contains all the data that is inside "posts".

pros: you fetch all the data in a single read.

cons: 1. You duplicate all of your data. 2. You can't search by a single attribute of a post.
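A rough sketch of this denormalized layout, again with plain Python dicts rather than a real Super CF. The second post and the function names are hypothetical, and the tag search only illustrates the second con: without a separate index, finding posts by an attribute means scanning every user's row.

```python
# Denormalized layout: each user row embeds the full attributes of every post.
# Plain dicts stand in for a Super CF; the "let it be" post is hypothetical.
user_posts_wide = {
    "abfe222234": {
        "yesterday": {"date": "03-04-2012", "tags": "beatles,paul"},
        "let it be": {"date": "05-04-2012", "tags": "beatles"},
    },
}


def posts_for_user_one_read(user_id):
    """Single read: the user's row already holds every post's details."""
    return user_posts_wide.get(user_id, {})


def find_by_tag(tag):
    """No index on attributes, so every row and every post has to be scanned."""
    return [
        (user_id, title)
        for user_id, row in user_posts_wide.items()
        for title, attrs in row.items()
        if tag in attrs["tags"].split(",")
    ]
```

If "posts" is kept alongside this layout, every write to a post has to update both places, which is the duplication cost listed above.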

What do you say?

1 Answer

Looks pretty straightforward to me: you do indeed need to perform two database reads to get the data in this case. For what it's worth, most relational databases also need to perform two logical reads, unless the data the user is interested in is fully contained in the index. The only difference is that in a relational DB there is only one network round trip.


5 Comments

And what if there are hundreds of posts? How do you fetch them? Build a very, very long CQL query with "KEY in ('a', 'b', ...)"? It doesn't seem right!
Slowly, I'd imagine. Seriously, using a predicate seems the logical approach. See prettyprint.me/2010/01/20/… for example, specifically "When reading or writing data it’s possible to read/write a set of columns for one specific key (row) atomically. This set of columns may either be a specified by the list column names, or by a slice predicate, assuming the columns are sorted in some way (that’s a configuration parameter)"
But they're not sorted at all. You have posts of user A, then posts of user B and then again posts of user A. BTW, I speak Hebrew, so thanks for the pointer... :)
That's convenient! I'd draw your attention to this particular phrase in my comment: This set of columns may either be a specified by the list column names. I'd imagine that you'd need to pack up all the column names that you got from the index, and ship them back up to the server to serve as a filter.
Thanks for the help. Do you think that maybe I should use a different model, like a Super CF, as I wrote in the edit to my post?
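One way to picture the "pack up the names from the index and ship them back" idea from the comments, when there are hundreds of posts, is to send the keys in fixed-size batches instead of one very long list. The sketch below is only an illustration under assumptions: multiget is a hypothetical stand-in for whatever multi-key read the client offers, the batch size of 100 is arbitrary, and the 250 generated post keys are made-up data.

```python
from itertools import islice

# Hypothetical data: 250 posts, all belonging to the same user.
posts = {"post-%03d" % i: {"userID": "abfe222234"} for i in range(250)}
round_trips = 0


def multiget(keys):
    """Hypothetical stand-in for a multi-key read: one round trip per call."""
    global round_trips
    round_trips += 1
    return {key: posts[key] for key in keys if key in posts}


def chunks(keys, size):
    """Yield consecutive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_all(keys, batch_size=100):
    """Fetch every key, issuing one multi-key read per batch."""
    result = {}
    for batch in chunks(keys, batch_size):
        result.update(multiget(batch))
    return result


index_row = list(posts)  # the column names read from the user's index row
fetched = fetch_all(index_row, batch_size=100)
```

The number of round trips then grows with the number of posts divided by the batch size, not with the number of posts itself.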
