I'm toying with two schemas and I can't decide which is more scalable. The schema is for a Q&A site, and it's built in MySQL. People post questions and answers, and like/dislike/favourite them. A question can have many answers, likes and dislikes, and an answer can have many likes and dislikes too.
To display a question to a user, both schemas require the same number of joins, but the joins are structured differently:
Schema 1
questions(id, title, body, userId)
questionLikes(id, questionId, userId)
questionDislikes(id, questionId, userId)
questionComments(id, questionId, body, userId)
answers(id, questionId, body, userId)
answerLikes(id, answerId, userId)
answerDislikes(id, answerId, userId)
answerComments(id, answerId, userId, body)
favourites(id, questionId, userId)
This is more normalized and easier to develop for, but is it scalable? There seems to be a lot of repeated information. The join sequence to read a question for a user (we want to include their like/dislike activity) is:
select question
join answers
join questionLikes
join questionDislikes
join questionComments
join favourites
join answers to answerLikes
join answers to answerDislikes
join answers to answerComments
(multiply the answer joins by the number of answers)
Schema 2
posts(id, postTypeId, userId, title, body)
postTypeId(id, postType)
comments(id, postId, userId)
votes(id, voteTypeId, userId)
voteTypeId(id, voteType)
This is less normalized and more compact. It seems like it would scale better, but it is a pain in the neck with self joins and other development issues (conditional validation). The join sequence to read a question is:
select the question and its answers in the same read, using where @id for the question and @questionId for the answers; for each row, join the following:
join votes as likes on voteType 1
join votes as dislikes on voteType 2
join comments
join favourites
(multiply the joins by the number of rows)
So which will scale better? I know I can add some additional fields to store counts so that no joins are necessary. But both require the same number of joins and I can't make up my mind.
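To make the Schema 2 read a bit more concrete, here is a rough sketch of what I have in mind. It is only an illustration under several assumptions: it uses Python's built-in sqlite3 in place of MySQL so it can be run anywhere, it adds a `questionId` column on posts and a `postId` column on votes (neither is in the column lists above, but the read needs some way to link rows), the vote type values 1 = like and 2 = dislike are made up, and comments and favourites are left out for brevity. Instead of joining votes once per vote type, it joins votes once and counts each type with conditional aggregation.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a cut-down Schema 2 and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (
            id         INTEGER PRIMARY KEY,
            postTypeId INTEGER NOT NULL,  -- assumed: 1 = question, 2 = answer
            questionId INTEGER,           -- assumed column: parent question of an answer
            userId     INTEGER NOT NULL,
            title      TEXT,
            body       TEXT
        );
        CREATE TABLE votes (
            id         INTEGER PRIMARY KEY,
            postId     INTEGER NOT NULL,  -- assumed column linking a vote to its post
            voteTypeId INTEGER NOT NULL,  -- assumed: 1 = like, 2 = dislike
            userId     INTEGER NOT NULL
        );
        INSERT INTO posts (id, postTypeId, questionId, userId, title, body) VALUES
            (1, 1, NULL, 10, 'Which schema?', 'question body'),
            (2, 2, 1,    20, NULL, 'first answer'),
            (3, 2, 1,    30, NULL, 'second answer'),
            (4, 1, NULL, 40, 'Unrelated', 'another question');
        INSERT INTO votes (postId, voteTypeId, userId) VALUES
            (1, 1, 20), (1, 1, 30), (1, 2, 40),
            (2, 1, 10),
            (4, 1, 20);
    """)
    return conn


# One read returns the question and its answers. Votes are joined once and
# split into likes/dislikes with CASE expressions, plus a flag telling
# whether the viewing user liked each post.
READ_QUESTION = """
SELECT p.id,
       p.postTypeId,
       SUM(CASE WHEN v.voteTypeId = 1 THEN 1 ELSE 0 END) AS likes,
       SUM(CASE WHEN v.voteTypeId = 2 THEN 1 ELSE 0 END) AS dislikes,
       MAX(CASE WHEN v.voteTypeId = 1 AND v.userId = :viewer
                THEN 1 ELSE 0 END) AS viewerLiked
FROM posts p
LEFT JOIN votes v ON v.postId = p.id
WHERE p.id = :qid OR p.questionId = :qid
GROUP BY p.id, p.postTypeId
ORDER BY p.postTypeId, p.id
"""


def read_question(conn, question_id, viewer_id):
    """Return (id, postTypeId, likes, dislikes, viewerLiked) for a question and its answers."""
    params = {"qid": question_id, "viewer": viewer_id}
    return conn.execute(READ_QUESTION, params).fetchall()


if __name__ == "__main__":
    for row in read_question(build_db(), 1, 20):
        print(row)
    # (1, 1, 2, 1, 1)
    # (2, 2, 1, 0, 0)
    # (3, 2, 0, 0, 0)
```

The same CASE-based counting should carry over to MySQL, though I have only written it against SQLite here, and it says nothing about how either schema behaves at real data volumes.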