You need a container to hold a reference to each object in the current tile/block.
If your map is made of tiles
Each tile has a height (as in games like Roller Coaster Tycoon or Transport Tycoon). You keep one container for each position (x, y) of the map. When an object moves, it is removed from the container at (x, y) and added to the container at (x', y'). Each container must be kept sorted by the altitude of the objects it holds.
Then you iterate over the tiles. For each tile, render the tile first, then the objects referenced in its container, from the lowest altitude to the highest. This is why the container needs to be sorted by altitude: higher objects are drawn over lower ones.
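The per-tile containers described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `GameObject`, `TileMap`, `add`, `move` and `render_order` are made up for the example, tiles are visited row by row (the right order depends on your projection), and "rendering" is represented by a list of draw commands rather than real drawing calls.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity, so remove() targets this exact object
class GameObject:
    name: str
    x: int
    y: int
    altitude: float


@dataclass
class TileMap:
    width: int
    height: int
    # One list per (x, y) tile, kept sorted by altitude (lowest first).
    containers: dict = field(default_factory=dict)

    def add(self, obj):
        bucket = self.containers.setdefault((obj.x, obj.y), [])
        # Find the insertion index that keeps the bucket sorted by altitude.
        altitudes = [o.altitude for o in bucket]
        bucket.insert(bisect.bisect_right(altitudes, obj.altitude), obj)

    def move(self, obj, new_x, new_y):
        # Remove from the old tile's container, then add to the new one.
        self.containers[(obj.x, obj.y)].remove(obj)
        obj.x, obj.y = new_x, new_y
        self.add(obj)

    def render_order(self):
        # Assumed traversal: row by row. Each tile comes first,
        # followed by its objects from lowest to highest altitude.
        order = []
        for y in range(self.height):
            for x in range(self.width):
                order.append(("tile", x, y))
                for obj in self.containers.get((x, y), []):
                    order.append((obj.name, x, y))
        return order
```

If an object changes altitude without changing tile, it has to be re-inserted as well (for example by calling `move` with its current coordinates), otherwise the container is no longer sorted.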
OpenRCT2 uses a similar approach, but optimized to use a compact pool for the object references.
If your map is made of blocks
Blocks don't have a height, at least not in the way tiles do in Roller Coaster Tycoon or Transport Tycoon. Instead, they are like the blocks in Minecraft, which means the map can have "holes" in it.
The approach is similar, but there's a container for each block. That is, there's a container for each position (x, y, z) of the map. In most cases, if an object is at (x, y, z), there's no block there to render, so for each position you render either a block or the objects, not both. Some cases are special. I see there are some bushes in your game, so maybe a character can walk into a bush. In that case you render the block first, then the character using a sprite whose bottom part is not drawn, to give the illusion that the lower part of the character is hidden inside the bush.
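As a rough sketch of the per-position rule above: a position normally yields either a block or its objects, and when both are present (a character standing inside a block it can walk into) the block is drawn first and the object is flagged so that the bottom of its sprite is skipped. The function `render_cell`, the dictionaries `blocks` and `containers`, and the boolean "clip bottom" flag are all assumptions made for this example, not part of any particular engine.

```python
def render_cell(pos, blocks, containers):
    """Return the draw commands for one (x, y, z) position.

    blocks:     dict mapping (x, y, z) -> block id; missing means a hole
    containers: dict mapping (x, y, z) -> list of objects at that position
    """
    commands = []
    block = blocks.get(pos)
    if block is not None:
        # The block is always drawn before anything standing inside it.
        commands.append(("block", block))
    for obj in containers.get(pos, []):
        # Third field: draw the sprite without its bottom part when the
        # object shares the position with a block (assumed special case).
        commands.append(("object", obj, block is not None))
    return commands
```

Positions that hold neither a block nor objects produce no commands, which is how the holes in the map are handled here.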
Can be applied to both approaches
You will probably also want to sort the references by their (x, y) position inside the tile/block. Objects can sit at different positions within the same tile/block, and that changes the order in which they must be rendered.
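One way to express this is a composite sort key: altitude first, then the position inside the tile/block. The sketch below assumes that objects with a larger local y, and then a larger local x, are closer to the camera and should be drawn later. That tie-break depends on your projection, so treat it as a placeholder. The names `Placed`, `draw_key` and `sorted_for_drawing` are invented for the example.

```python
from typing import NamedTuple


class Placed(NamedTuple):
    name: str
    altitude: float
    local_x: float  # position inside the tile/block, e.g. in 0.0 .. 1.0
    local_y: float


def draw_key(obj):
    # Altitude decides first; the in-tile position only breaks ties.
    # Assumption: larger local_y, then larger local_x, is drawn later.
    return (obj.altitude, obj.local_y, obj.local_x)


def sorted_for_drawing(objects):
    return sorted(objects, key=draw_key)
```

The same key can be used when inserting into an already sorted container instead of re-sorting the whole container every frame.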