Should I perform some minor denormalization to save myself several queries in the future?

Question

I'm in charge of designing the entire backend for the REST API of an application that works more or less like an online browser game (think OGAME, Travian, and the like). In this game, players are able to loot procedurally generated equipment, like in Borderlands. This equipment can have any number of bonuses to all the different stats present in the game; for example, a gun could potentially grant you more Health or Intelligence.

Right now, they have given me a database schema they were using as a Proof of Concept. Said DB schema is in dire need of a redesign, and since we are in such early stages of development, I might as well go all in with it.

My problem is as follows: the server application generates random equipment on demand, and the generated equipment is stored in the database. Depending on the type of the generated piece of equipment, it may appear either in the Weapons table (1:n with the player, using a third table [?] for the relation), the Armor table (1:n with the player, same as with the Weapons table), the Ammo table (1:1 with the player, since Ammo types are hardcoded) and the Consumables table (1:1 with the player, same as with the Ammo table). Armor and Weapon tables are similar, but not equal, so merging them in one table isn't possible unless we use a solution similar to what I am about to ask.

One of the use cases is, obviously, listing the player's current Inventory. As is, this would take three different queries (I'm not expecting a huge concurrent user volume, but I'd rather be safe than sorry). In addition, this design has some limitations:

A scenario in which we would need to run several versions of the game (v1, v2, v3, etc) simultaneously is likely, and these versions may be forwards-compatible. Using our current schema, this would mean creating a new table or even database for all the new versions.
As is, the schema does not allow for n:m relationships between Players and objects. In case two users were to roll identical pieces of equipment (which is likely, given the algorithms in place), the row would be duplicated. The changes required to make this work are relatively minor.
Consumables and Ammo are hardcoded into the application.
This design does not allow for misc. items, like quest items and the like.

I have thought of a new "denormalized" schema in which all objects are stored in a Warehouse table. In this Warehouse table, we would have a UUID for the object and/or maybe a hash field and, since we are using PostgreSQL, a JSONB field with the information about the object. Said information will be mostly read-only, requiring an update only in some rare cases; in addition, searches using data present in these fields will not only be rare, but only performed on small subsets of data (finding all the Weapons owned by a player, for example, would filter first by Player). This denormalization would not create redundant data in any case, just data coupling. Furthermore, the JSON documents should never be bigger than 2 KB, and they would never have nested objects, just one-dimensional arrays at worst.

I've asked some colleagues and they said it didn't sound like a good idea, but couldn't pinpoint exactly why. As far as I know, this isn't any worse than using MongoDB; in fact, if I were to start using MongoDB, I would have to denormalize much harder than I am doing right now.

Is this schema that bad of an idea? Should I keep the current design? What are the reasons against coupling different types of JSON schemas in the same column, other than not being able to perform data validation DB-side (which we should be doing on the server, anyway)?

EDIT: Adding some more context to this question.

Right now, there is a working client for the game. Since it's still in an internal testing stage, the client directly performs the queries against the database.
I was hired to program the server and design the REST API for the release version of the game. I've been given pretty much free rein designing the server architecture.
There is another programmer in the project, but he is working with the client. Communication between server and client will be exclusively done via the REST API.
We will be using Node.js and PostgreSQL for the server because it's what I'm more comfortable with. I can't foresee whether we will be changing the system in the future.
We are using an in-house ORM. It wouldn't be a problem to modify the ORM in case we found a limitation.
Data from/to the API will be sent and received in JSON format.
A very hastily made estimate of the maximum concurrent number of players, based on wild speculation using a single sample of a similar platform, is 2000.
I could ask the other programmer to modify the way his app works (like adding a cache and the like), but I'd rather not bother him that much right now. Consider that the app sees the server as a black box.
The proof of concept DB schema is using TPC right now. I had considered TPH (and that's more or less why I thought using JSONB fields would be cleaner; at best I would save a bit of disk space, but the DB still couldn't guarantee the given object is a valid object). Didn't consider TPT but that still doesn't reduce the number of queries, which is why I wanted to use the JSONB schema.
I also thought about emulating a TPH with a view over the Armors and Weapons table, but we still have the issue of not being flexible enough, and we still don't contemplate the Misc. items table.
I can't tell much more about it without saying too much about the project, but trust me when I say a scenario in which we would have to run several game versions at once is likely. Export scripts could be used.
A use case in which I would need to filter by any of the JSONB's fields is unlikely. Even then, PostgreSQL would allow it with more than acceptable performance, according to most benchmarks.
A use case in which I would need to update a single member of a JSONB field is unlikely. Even then, PostgreSQL also allows it.
A use case in which I would need to update the whole JSONB field is slightly more likely, but then again, most objects are read-only.
The given JSON objects would be very simple and relatively small. They would be, at worst, just a key-value pair with arrays filled with primitives.
Weapons and armor, in the proof of concept, are in an n:1 relationship with the player. In other words, both the instance and the weapon blueprint are the same object. This makes sense, but so would treating weapon blueprints/archetypes as their own thing and then instancing them in the player's inventory using another table.
If this were a personal project I wouldn't be asking this question: I would just pick the JSONB option and gladly shoot myself in the foot with it, if only for the learning experience. But this is a project some other people will be using in production, so I'd rather check twice and thrice to commit to a solution.

Flater · Accepted Answer · 2020-04-10 12:48:55Z

To summarize

Without trying to cause any offence, the issue that needs the most addressing here is your attitude towards development. I'm not accusing you of being malevolent or ill-intentioned, but there are some massive red flags going up in regards to how you're approaching the development of your game and how to find appropriate solutions.

My answer will delve into both the technical possibilities and the way to approach their development, because without addressing the developer attitude, the same technical issues are going to keep popping up as you (unintentionally) cause yourself more issues down the line.

Jumping the gun

Right now, they have given me a database schema they were using as a Proof of Concept.

Said DB schema is in dire need of a redesign,

This is perfectly fine - a proof of concept obviously isn't a finished or well-honed product. There is no reasonable expectation that a POC, or the version after a POC, or even a few versions after the POC, is a finished product.

and since we are in such early stages of development, I might as well go all in with it.

Woah there.

I would pull the brakes here. Going all in sounds much too involved for something that is currently a POC. Your question goes on to state several requirements as being given facts, but if you're dealing with a POC, you're nowhere near a state where you have a closed (and immutable) list of requirements nor their implementation.

Improving things is good, but trying to decide and settle the specific implementation of everything today is going to bite you tomorrow.

Do me a favor and re-read your question and notice that you speak in certainties when talking about why the POC is not correct, but then you consistently speak in uncertainties ("I expect", "is likely", ...) whenever you're addressing your proposed fixes. This is a massive red flag to me.

My general feedback for you is to hold your horses, don't paint yourself into corners, and work on concrete tasks one at a time. Rome wasn't built in a day, and even more importantly for you: Rome's street plan wasn't designed in a day either.
You cannot and should not be making definitive decisions like these, unless you are a solo developer who has sole executive power over the development, design and ownership of the game. I surmise that this is not the case for you.

so merging them in one table isn't possible unless we use a solution similar to what I am about to ask

I want to address two issues here. One is of a technical nature (there is another solution) which I will address in a later point, but the more important one here is that your statement here is a massive red flag for developer arrogance.
You've just asserted that your solution is the only solution - before you've even managed to make your solution work.

There is a massive difference between "I don't see another solution" and "there is no other solution", and the inability to distinguish between these two is one of the biggest flaws a developer can exhibit, regardless of seniority or skill level.

One of the most common consequences of this approach is that it generates XY problems, which is a scenario where someone assumes that their attempt at solving a problem is the only/best possible solution to the problem. This leads to a lot of time wasted on researching the wrong solution, and struggling to get the right answer when asking for help, as an XY question is based on a premise that others don't understand or disagree with.

I suggest looking into what an XY problem is, and how to avoid it. It will save you a lot of time in the long run.

As is, the schema does not allow for n:m relationships between Players and objects. In case two users were to roll identical pieces of equipment (which is likely, given the algorithms in place), the row would be duplicated. The changes required to make this work are relatively minor.

You're using "weapon" ambiguously. Is a weapon a physical object (instance), or is it a predefined reusable type that can apply to multiple instances?

Wanting to implement an n-m relationship implies that if two weapons have the exact same stats, you want them to refer to the same weapon row. This seems like a very bad idea.

If two people in my company are called John Smith with the same birthday (let's assume this is all the data I store for people), should I have their other records refer to the same data row? No. These two people having the same data is coincidence and trying to manage duplicates is going to give you more issues than it's going to solve.

This is exactly why an object's equality check does not default to the combined equality checks of all of the object's properties. The identity of an object is defined by more than just the value of its properties. Two objects with the same values in their properties are still two distinct objects!
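The same distinction can be shown in a few lines of TypeScript, since the question mentions a Node.js server. This is only a sketch: the `Weapon` shape and the `sameStats` helper are hypothetical names made up for illustration, not part of the question's schema.

```typescript
// Hypothetical weapon shape, for illustration only.
interface Weapon {
  id: number; // surrogate key: this is the weapon's identity
  name: string;
  attackDamage: number;
}

// Compares the rolled stats only, deliberately ignoring identity.
function sameStats(a: Weapon, b: Weapon): boolean {
  return a.name === b.name && a.attackDamage === b.attackDamage;
}

const first: Weapon = { id: 1, name: "Pistol", attackDamage: 15 };
const second: Weapon = { id: 2, name: "Pistol", attackDamage: 15 };

// The two rolls happen to coincide...
const statsMatch = sameStats(first, second);
// ...but they are still two distinct objects.
const sameObject = first === second;

// Upgrading one player's pistol must not affect the other player's pistol.
first.attackDamage += 5;
const stillMatch = sameStats(first, second);
```

If both players pointed at one shared row, the upgrade on the last lines would silently change the other player's weapon as well.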

Your weapons are in the same situation: each weapon is individually generated, it's an individual entity. There may be other weapons like it, but that's coincidence and there's no functional benefit to knowing that these weapons are the same.

Stick with 1-n person/weapon, and generate your weapons individually. The n-m management is going to do your head in.

A scenario in which we would need to run several versions of the game (v1, v2, v3, etc) simultaneously is likely, and these versions may be forwards-compatible.

"is likely" is not the same as "is a requirement". This sounds like an educated guess without an explicit requirement backing it.

Developers often have to fill in some blanks in the requirements, but they should do so sparingly. If you decide your versioning scheme based on what you think might be relevant - without actual information on what is required/relevant, you're going to have a bad time.

Don't make any concrete decisions until you have actual concrete information on the versioning requirements.

Speaking as both a gamer and software developer with some PBBG development experience, it's highly unlikely that you're going to want to have players with different game versions playing the same game in a multiplayer setting where they interact with each other. That seems to be nothing more than asking for trouble.

Odds are that having separate databases per version (with an upgrade migration script) is going to suffice. But don't take my word for it. Wait for concrete requirements.

Is this schema that bad of an idea? [..] What are the reasons against coupling different types of JSON schemas in the same column

Yes. The JSON conversion cost (performance + development effort) is going to negatively impact you in the long run. You didn't specify your programming language, but there are issues in either case:

Statically typed languages are going to need a lot of handholding to deserialize the data, resorting to reflection and a lot of "interesting" code to keep it all logically consistent.
Dynamically typed languages are going to become a mire of bugs as they perform no sanity checking for the properties you're trying to access and whether they exist on the current object or not.
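To give an idea of the handholding involved, here is a minimal TypeScript sketch of the check that every read of a mixed JSON column would need before any property is touched. The document shapes (`kind`, `attackDamage`, and so on) and the `parseItemDoc` function are assumptions made up for this example; the question does not specify its JSON layout.

```typescript
// Hypothetical document shapes stored side by side in one JSONB column.
type WeaponDoc = { kind: "weapon"; name: string; attackDamage: number };
type ArmorDoc = { kind: "armor"; name: string; meleeDefense: number; rangedDefense: number };
type ItemDoc = WeaponDoc | ArmorDoc;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validates an untyped JSON value; returns null when it matches no known shape.
function parseItemDoc(raw: unknown): ItemDoc | null {
  if (!isRecord(raw)) return null;
  const { kind, name, attackDamage, meleeDefense, rangedDefense } = raw;
  if (typeof name !== "string") return null;

  if (kind === "weapon" && typeof attackDamage === "number") {
    return { kind: "weapon", name, attackDamage };
  }
  if (kind === "armor" && typeof meleeDefense === "number" && typeof rangedDefense === "number") {
    return { kind: "armor", name, meleeDefense, rangedDefense };
  }
  return null;
}

// A well-formed weapon document parses; one missing its stat is rejected.
const pistol = parseItemDoc(JSON.parse('{"kind":"weapon","name":"Pistol","attackDamage":15}'));
const broken = parseItemDoc(JSON.parse('{"kind":"weapon","name":"Pistol"}'));
```

Each new item type, and each new version of an existing type, adds another branch to this function, and nothing in the database stops a document that fails it from being stored.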

Should I keep the current design?

Not particularly. But I wouldn't throw it out in its entirety. At face value, it's closer to a solution than your suggested schema. I suggest you start from the existing design, and whenever you encounter a concrete issue with it, improve it based on your concrete requirements.

... other than not being able to perform data validation DB-side (which we should be doing on the server, anyway)?

I have no idea where the assertion that the backend should filter data instead of the database comes from. You call it "validation" but the entire question is based on the querying and retrieving of filtered datasets. If we're talking about actual validation of input data now, that's a whole different question.

Filtering should favor the DB-side. Mostly for performance reasons, but also because it's the database's responsibility, and your business logic should be doing things other than data juggling (that's what you have the database for, after all).

A proposed improvement

One possible design you have missed is that you can implement a form of inheritance in your table structure, where multiple concrete types share a reusable base type. In your codebase, this is a simple inheritance architecture, e.g.:

    public class Item
    {
        public int Id { get; set; }
        public int PlayerId { get; set; }
        public string Name { get; set; }
    }

    public class Weapon : Item
    {
        public int AttackDamage { get; set; }
    }

    public class Armor : Item
    {
        public int MeleeDefense { get; set; }
        public int RangedDefense { get; set; }
    }

Your inventory can thus be modeled as a List<Item>, which can contain any kind of item that derives from the base class (weapons, armor, ammo, quest items, ...)

To store this data in your database, there are three possible approaches:

Table per hierarchy (TPH) is the simplest but also not the cleanest or most efficient. It means you just mash all columns into a single table:

    Table: ITEMS

    ID | PlayerId | Name   | AttackDamage | MeleeDefense | RangedDefense
    ====================================================================
    1  | 1        | Pistol | 15           | NULL         | NULL
    2  | 1        | Helmet | NULL         | 20           | 10

Table per concrete type (TPC) makes separate tables. The properties of the base class are part of each separate table.

    Table: WEAPONS

    ID | PlayerId | Name   | AttackDamage
    =====================================
    1  | 1        | Pistol | 15

    Table: ARMOR

    ID | PlayerId | Name   | MeleeDefense | RangedDefense
    =====================================================
    1  | 1        | Helmet | 20           | 10

Table per type (TPT) is like TPC, but it instead abstracts the base class properties into a table of its own. The interesting thing here is that every concrete PK is a reference to the PK of its related base item, i.e. weapons and armor never have overlapping ID values since they exist in the same base items table:

    Table: ITEMS

    ID | PlayerId | Name
    ======================
    1  | 1        | Pistol
    2  | 1        | Helmet

    Table: WEAPONS

    ID | AttackDamage
    =================
    1  | 15

    Table: ARMOR

    ID | MeleeDefense | RangedDefense
    =================================
    2  | 20           | 10

Note that the table examples here are simplified for the sake of example. You're presumably going to rely on an ORM, and the ORM will do most of the concrete management for you.

TPH is the only approach that avoids multiple queries, but it comes with drawbacks of its own: data wastage (a lot of NULL columns), versioning issues (supporting multiple derived classes across multiple versions can be impossible, depending on the version differences), and the cost of having to convert each data row manually.
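As a rough illustration of that manual conversion cost, the sketch below maps rows of the simplified ITEMS table into typed objects in TypeScript. Everything here is an assumption for the sake of example: the camelCase column names, the `WeaponItem`/`ArmorItem` types, and the `fromRow` function. ORMs such as Entity Framework add a discriminator column to TPH tables; because the simplified table above shows none, this sketch infers the type from which nullable columns are populated.

```typescript
// One row of the single TPH table; the subtype columns are nullable.
interface ItemRow {
  id: number;
  playerId: number;
  name: string;
  attackDamage: number | null;
  meleeDefense: number | null;
  rangedDefense: number | null;
}

type WeaponItem = { type: "weapon"; id: number; playerId: number; name: string; attackDamage: number };
type ArmorItem = {
  type: "armor"; id: number; playerId: number; name: string;
  meleeDefense: number; rangedDefense: number;
};

// Infers the concrete type from the populated columns (no discriminator column assumed).
function fromRow(row: ItemRow): WeaponItem | ArmorItem {
  const base = { id: row.id, playerId: row.playerId, name: row.name };
  if (row.attackDamage !== null) {
    return { type: "weapon", ...base, attackDamage: row.attackDamage };
  }
  if (row.meleeDefense !== null && row.rangedDefense !== null) {
    return { type: "armor", ...base, meleeDefense: row.meleeDefense, rangedDefense: row.rangedDefense };
  }
  throw new Error(`Row ${row.id} matches no known item type`);
}

// The two rows from the simplified ITEMS table.
const rows: ItemRow[] = [
  { id: 1, playerId: 1, name: "Pistol", attackDamage: 15, meleeDefense: null, rangedDefense: null },
  { id: 2, playerId: 1, name: "Helmet", attackDamage: null, meleeDefense: 20, rangedDefense: 10 },
];
const inventory = rows.map(fromRow);
```

Every added item type means another nullable column group and another branch in the mapping, which is where the versioning trouble mentioned above tends to show up.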

TPC/TPT are much easier in terms of data management, though they come at the cost of having to fire more than one query to retrieve a list of items. That's not inherently a problem unless you're expecting really high volumes, at which point you could still rely on client side caching to lower the load on your database.

You're going to have to make a decision here which is more important to you: squeezing query performance (TPH) at the cost of development effort and DB size; or minimizing development effort (TPC/TPT) at the cost of running multiple queries (one per concrete type) when you want to fetch a combined list. I favor the latter, but the decision is contextual.
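For the TPC/TPT side of this trade-off, the extra queries do not have to run one after another. The TypeScript sketch below issues one query per concrete type concurrently and merges the results into a single list. The `fetchWeapons` and `fetchArmor` functions are hypothetical stand-ins that return in-memory data; a real version would call whatever ORM or driver the project uses.

```typescript
interface InventoryItem { id: number; playerId: number; name: string }
interface WeaponEntry extends InventoryItem { attackDamage: number }
interface ArmorEntry extends InventoryItem { meleeDefense: number; rangedDefense: number }

// Hypothetical stand-in for a query against the WEAPONS table.
async function fetchWeapons(playerId: number): Promise<WeaponEntry[]> {
  const table: WeaponEntry[] = [{ id: 1, playerId: 1, name: "Pistol", attackDamage: 15 }];
  return table.filter((w) => w.playerId === playerId);
}

// Hypothetical stand-in for a query against the ARMOR table.
// Under TPC the ID may collide with a weapon's ID, as in the tables above.
async function fetchArmor(playerId: number): Promise<ArmorEntry[]> {
  const table: ArmorEntry[] = [
    { id: 1, playerId: 1, name: "Helmet", meleeDefense: 20, rangedDefense: 10 },
  ];
  return table.filter((a) => a.playerId === playerId);
}

// One query per concrete type, issued concurrently and merged into one list.
async function fetchInventory(playerId: number): Promise<InventoryItem[]> {
  const [weapons, armor] = await Promise.all([fetchWeapons(playerId), fetchArmor(playerId)]);
  return [...weapons, ...armor];
}
```

From the API's point of view the inventory is still one call; the number of queries behind it is an implementation detail that can be revisited once real load figures exist.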

Using Entity Framework as an example, ORMs mostly hide the database side of things from you, allowing you to focus on simpler data operations:

    // Combined retrieval
    List<Item> playerInventory = db.Set<Item>().Where(i => i.PlayerId == playerId).ToList();

    // Automatic type filtering
    List<Weapon> playerWeapons = db.Set<Weapon>().Where(i => i.PlayerId == playerId).ToList();
    List<Armor> playerArmor = db.Set<Armor>().Where(i => i.PlayerId == playerId).ToList();

    // EF figures out the table structure for you
    db.Set<Item>().Add(new Weapon() { Name = "Pistol", AttackDamage = 12, PlayerId = playerId });
    db.Set<Item>().Add(new Armor() { Name = "Helmet", MeleeDefense = 15, RangedDefense = 5, PlayerId = playerId });

I've added some context to the question.

– neirenoir, Commented Apr 10, 2020 at 21:43
