Secondary Index in Cassandra will lead to two DB reads

Question

Let's assume a data model in which a user has blog posts. Each post has a unique title and many attributes.

I have a Column Family "posts" in which each row is like this:

posts = { "yesterday" : { date : 03-04-2012, userID : abfe222234, tags : "beatles,paul" } }

I want to index the posts by user, so I have another regular column family:

user_posts = { abfe222234 : { yesterday : null .... } }

This model comes after a lot of research about secondary indexing in Cassandra, during which I came across these slides: http://www.slideshare.net/edanuff/indexing-in-cassandra and understood that Super Column Families are used less and less.

My question:

If I want all the details of a user's posts, I have to read the DB twice: once to get all the post IDs, and once to fetch the details of the posts with those IDs.
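To make the two reads concrete, here is a minimal sketch in which plain Python dictionaries stand in for the two column families. It is not a Cassandra client, and the extra posts ("let it be", "imagine") and the second user ID are made-up sample data. The point it illustrates is that the second read can be one multi-key lookup rather than one request per post.

```python
# Plain dicts stand in for the two column families from the question.
# Only "yesterday" comes from the question; the other rows are sample data.
posts = {
    "yesterday": {"date": "03-04-2012", "userID": "abfe222234", "tags": "beatles,paul"},
    "let it be": {"date": "05-04-2012", "userID": "abfe222234", "tags": "beatles"},
    "imagine": {"date": "06-04-2012", "userID": "ffff000011", "tags": "lennon"},
}

# Index row per user: column names are post titles, column values are empty.
user_posts = {
    "abfe222234": {"yesterday": None, "let it be": None},
    "ffff000011": {"imagine": None},
}


def get_posts_for_user(user_id):
    # Read 1: one row from the index column family yields the post keys.
    post_keys = list(user_posts.get(user_id, {}))
    # Read 2: a single lookup that carries all the keys at once
    # (the analogue of a multi-key get, not one request per post).
    return {key: posts[key] for key in post_keys if key in posts}


result = get_posts_for_user("abfe222234")
print(sorted(result))
```

Both steps stay at one request each no matter how many posts the user has; only the size of the second request grows.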

What am I missing?

Thanks, Issahar.

edit:

The other option is to make "user_posts" a Super CF and have it contain all the data that is inside "posts".

pros: you'll have to fetch all the data only once.

cons: 1. You'll duplicate all of your data. 2. You can't search by one attribute of a post.
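A rough sketch of this trade-off, again with plain dictionaries standing in for column families (this models the idea only, not Super CF behavior in Cassandra; `save_post` is a hypothetical helper): reads for a user become a single lookup, but every write has to update two copies.

```python
# Hypothetical denormalized layout: each user row holds full copies of the posts.
posts = {}
user_posts = {}


def save_post(title, attributes):
    # Every write touches both places; skipping one lets the copies drift apart.
    posts[title] = dict(attributes)
    user_posts.setdefault(attributes["userID"], {})[title] = dict(attributes)


def get_posts_for_user(user_id):
    # One read: the user row already contains every attribute of every post.
    return user_posts.get(user_id, {})


save_post("yesterday", {"date": "03-04-2012", "userID": "abfe222234", "tags": "beatles,paul"})
details = get_posts_for_user("abfe222234")
print(details["yesterday"]["tags"])
```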

What do you say?

Chris Shain · Accepted Answer · 2012-03-04 16:16:04Z


Looks pretty straightforward to me: you do indeed need to perform two database reads to get the data in this case. For what it's worth, most relational databases also need to perform two logical reads, unless the data that the user is interested in is fully contained in the index. The only difference is that in a relational DB there is only one network round trip.


5 Comments

Issahar Weiss Over a year ago

And what if there are hundreds of posts? How do you fetch them? Build a very, very long CQL query with "KEY in ('a', 'b', ...)"? It doesn't seem right!

Chris Shain Over a year ago

Slowly, I'd imagine. Seriously, using a predicate seems the logical approach. See prettyprint.me/2010/01/20/… for example, specifically "When reading or writing data it’s possible to read/write a set of columns for one specific key (row) atomically. This set of columns may either be a specified by the list column names, or by a slice predicate, assuming the columns are sorted in some way (that’s a configuration parameter)"

Issahar Weiss Over a year ago

But they're not sorted at all. You have posts of user A, then posts of user B and then again posts of user A. BTW, I speak Hebrew, so thanks for the pointer... :)

Chris Shain Over a year ago

That's convenient! I'd draw your attention to this particular phrase in my comment: "This set of columns may either be a specified by the list column names". I'd imagine that you'd need to pack up all the names that you got from the index, and ship them back up to the server to serve as a filter.
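One way to read "pack up the names and ship them back" when there are hundreds of keys is to send them in bounded batches instead of one very long list. The sketch below uses a plain dictionary as a stand-in for the "posts" column family; `chunked`, `fetch_posts`, the `post-N` keys, and the batch size of 100 are all illustrative assumptions, not values recommended by Cassandra.

```python
def chunked(keys, size):
    # Yield consecutive slices of at most `size` keys.
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def fetch_posts(posts_cf, keys, batch_size=100):
    found = {}
    for batch in chunked(keys, batch_size):
        # Each batch models one request that carries an explicit key list.
        found.update({key: posts_cf[key] for key in batch if key in posts_cf})
    return found


# Sample data: 250 stored posts, of which the index returned every second key.
posts_cf = {f"post-{i}": {"userID": "abfe222234"} for i in range(250)}
keys_from_index = [f"post-{i}" for i in range(0, 250, 2)]

batches = list(chunked(keys_from_index, 100))
fetched = fetch_posts(posts_cf, keys_from_index)
print(len(batches), len(fetched))
```

The keys do not need to be contiguous or sorted for this to work, since each request names its keys explicitly.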

Issahar Weiss Over a year ago

Thanks for the help. Do you think I should use a different model, such as a Super CF, as I wrote in my edit to the post?
