Sorry for necro'ing this thread nearly a decade later, but in case someone else comes across this like me:
The structure for a graphical adventure game is pretty much the one you would naturally come up with (naming taken mostly from the TADS standard library):
You have a Room, which is a single screen. There is the concept of the current Room in which the player may be, and each Room can contain Objects, and have a background image.
You have Objects, which are things visible on screen at a certain position and size. This is the base class for NPCs, doors, items to be picked up, containers holding other Objects, etc. There is also one Object representing the player. The player is special in that it is drawn in front of or behind the other Objects depending on its vertical position: lower on screen usually means closer to the viewer, so it is drawn in front, and higher means it is drawn behind. An Object has a current image, and probably multiple images that it can cycle through (for an animation), and also frame sets that can be swapped out, so you can do things like put a player on a "climbing" set of frames, or "walking down" or "walking up" or "picking something up" etc.
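Here's a rough sketch of such an Object in Python (my pick of language, nothing above depends on it). All names (`GameObject`, `frame_sets`, `draw_order`) are made up for illustration. Two assumptions on my part: images are just file-name strings, and draw order is decided by sorting every object by its bottom edge, rather than treating only the player specially.

```python
from dataclasses import dataclass, field


@dataclass
class GameObject:
    """Something visible on screen. Hypothetical names throughout."""
    name: str
    x: int
    y: int       # top edge in screen pixels; y grows downward
    width: int
    height: int
    # Maps a frame-set name such as "walking_up" to its list of image names.
    frame_sets: dict[str, list[str]] = field(default_factory=dict)
    current_set: str = "idle"   # assumes an "idle" set exists if images are used
    frame_index: int = 0

    def use_frame_set(self, set_name: str) -> None:
        """Swap to another frame set and restart its animation."""
        if set_name not in self.frame_sets:
            raise KeyError(f"{self.name} has no frame set {set_name!r}")
        self.current_set = set_name
        self.frame_index = 0

    def advance_frame(self) -> None:
        """Step to the next image, wrapping around at the end of the set."""
        frames = self.frame_sets[self.current_set]
        self.frame_index = (self.frame_index + 1) % len(frames)

    @property
    def current_image(self) -> str:
        return self.frame_sets[self.current_set][self.frame_index]

    @property
    def bottom(self) -> int:
        return self.y + self.height


def draw_order(objects: list[GameObject]) -> list[GameObject]:
    """Back-to-front order: the lower an object's bottom edge, the later it is drawn."""
    return sorted(objects, key=lambda obj: obj.bottom)
```

Drawing the objects in the order `draw_order` returns means a player standing lower on screen than a table gets painted over it, and one standing higher gets painted under it.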
You have Items, which are Objects that the player can pick up. They're like any other Object, but often have an extra image attached to them that will be shown in your inventory to represent them. They have methods that check whether the preconditions for picking them up or dropping them again are met.
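The precondition methods could look something like this. It's only a sketch: `can_take`, `can_drop`, `try_take`, and the `world` dictionary of game flags are all names I invented for the example.

```python
class Item:
    """An Object the player can pick up. Hypothetical names throughout."""

    def __init__(self, name: str, inventory_image: str):
        self.name = name
        self.inventory_image = inventory_image  # extra image shown in the inventory

    def can_take(self, world: dict) -> bool:
        """Precondition for picking the item up; subclasses override this."""
        return True

    def can_drop(self, world: dict) -> bool:
        """Precondition for dropping the item again; subclasses override this."""
        return True


class HotCoal(Item):
    def can_take(self, world: dict) -> bool:
        # Example rule: only takeable while the player wears gloves.
        return world.get("wearing_gloves", False)


def try_take(item: Item, inventory: list, world: dict) -> bool:
    """Move the item into the inventory only if its precondition holds."""
    if not item.can_take(world):
        return False
    inventory.append(item)
    return True
```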
You have NPCs which are Objects you can have conversations with. For this, you need a conversation tree GUI, and a ConversationNode with a list of Phrases a character can say. Each Phrase has a reference to another ConversationNode that will be shown after it has been said. Each ConversationNode also has a bit of text (usually the NPC's answer to your question) that will be shown above the conversation choices.
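The data side of the conversation tree is small. A minimal sketch, leaving out the GUI; the field names and the `choose` helper are mine, and I'm assuming that a Phrase with no follow-up node ends the conversation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Phrase:
    """One line the player can pick from the conversation choices."""
    text: str
    # Node shown after this phrase has been said; None ends the talk (assumption).
    next_node: Optional["ConversationNode"] = None


@dataclass
class ConversationNode:
    """The NPC's text plus the choices offered underneath it."""
    text: str
    phrases: list[Phrase] = field(default_factory=list)


def choose(node: ConversationNode, index: int) -> Optional[ConversationNode]:
    """Return the node that follows the chosen phrase."""
    return node.phrases[index].next_node
```

A tiny tree would then be built bottom-up, e.g. a `farewell` node with no phrases, an `about_key` node whose only phrase points to `farewell`, and a `greeting` node whose phrases point to either of them.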
You have Containers, which are Objects that can hold other Objects. Some containers are like locked bookshelves that only let you look at the items in them, and only after the container itself has been examined; others are like chests, where you can take things out or put things in.
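One way to cover both kinds of container with a single class is a flag. This is just a sketch with invented names (`can_transfer`, `visible_contents`), and for simplicity it hides the contents of every container until it has been examined, which is my assumption, not a rule.

```python
class Container:
    """An Object holding other things. Hypothetical names throughout."""

    def __init__(self, name: str, contents: list, can_transfer: bool):
        self.name = name
        self.contents = list(contents)
        self.can_transfer = can_transfer  # False: look-only, like the locked bookshelf
        self.examined = False

    def examine(self) -> None:
        """Examining the container reveals what is inside."""
        self.examined = True

    def visible_contents(self) -> list:
        """Contents the player knows about; empty until examined."""
        return list(self.contents) if self.examined else []

    def take(self, item):
        """Remove an item, if this container allows it."""
        if not self.can_transfer:
            raise PermissionError(f"You can't take things out of the {self.name}.")
        self.contents.remove(item)
        return item

    def put(self, item) -> None:
        """Add an item, if this container allows it."""
        if not self.can_transfer:
            raise PermissionError(f"You can't put things into the {self.name}.")
        self.contents.append(item)
```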
You have Doors, which are Objects that you can click on to go to a different Room. Some doors can be locked, meaning you need a special Item in your inventory, like a key, to be able to walk through them. Doors are often part of the background, so they can also just be invisible rectangles over which the mouse pointer changes to a "walk" arrow; they don't need to have a current image or a frame set.
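A Door as an invisible hot spot could be sketched like this. The names are hypothetical, rooms and items are identified by plain strings to keep it short, and the inventory is just a set of item names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Door:
    """A clickable rectangle leading to another room. Hypothetical names."""
    target_room: str
    rect: tuple[int, int, int, int]   # x, y, width, height of the hot spot
    key_item: Optional[str] = None    # None means the door is not locked

    def contains(self, px: int, py: int) -> bool:
        """True if the mouse position lies inside the hot spot."""
        x, y, w, h = self.rect
        return x <= px < x + w and y <= py < y + h

    def can_pass(self, inventory: set[str]) -> bool:
        """Unlocked doors always let you through; locked ones need the key."""
        return self.key_item is None or self.key_item in inventory


def click(door: Door, px: int, py: int, current_room: str, inventory: set[str]) -> str:
    """Return the room the player is in after clicking at (px, py)."""
    if door.contains(px, py) and door.can_pass(inventory):
        return door.target_room
    return current_room
```

The same `contains` check can drive the mouse pointer change to the "walk" arrow.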
The structure for a text adventure would be roughly similar, except that instead of images, each Object has description messages and a list of adjectives and nouns that the player can use to refer to it.
Note that you can also combine these classes, e.g. you could have a vendor be both an NPC and a Container. Buying from or selling to them then just means taking stuff out of them or putting stuff into them, except that the vendor also checks that you have enough money and deducts the cost, or pays you money for sold items.
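In a language with multiple inheritance, the vendor idea could be sketched as below. These are stripped-down stand-in classes with invented names, items are plain strings, and I'm assuming the vendor buys back at the same price it sells for.

```python
class GameObject:
    def __init__(self, name: str, **kwargs):
        super().__init__(**kwargs)
        self.name = name


class Container(GameObject):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.contents: list[str] = []


class NPC(GameObject):
    def __init__(self, greeting: str = "", **kwargs):
        super().__init__(**kwargs)
        self.greeting = greeting


class Player(Container):
    def __init__(self, money: int, **kwargs):
        super().__init__(**kwargs)
        self.money = money


class Vendor(NPC, Container):
    """Both someone to talk to and something that holds stock."""

    def __init__(self, prices: dict[str, int], **kwargs):
        super().__init__(**kwargs)
        self.prices = prices

    def sell_to(self, player: Player, item: str) -> bool:
        """Move an item from the vendor to the player if the player can pay."""
        price = self.prices.get(item)
        if price is None or item not in self.contents or player.money < price:
            return False
        player.money -= price
        self.contents.remove(item)
        player.contents.append(item)
        return True

    def buy_from(self, player: Player, item: str) -> bool:
        """Move an item from the player to the vendor and pay the player."""
        price = self.prices.get(item)
        if price is None or item not in player.contents:
            return False
        player.contents.remove(item)
        self.contents.append(item)
        player.money += price
        return True
```

If your language has no multiple inheritance, giving the vendor NPC a Container as a member works just as well.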
Also, nested rooms (like a car inside a room) could be implemented as a Door leading to another Room. You don't actually need a Room that is contained in another Room. Though if you did put a Room in a Room, that would make it easier to show what is outside the car, as you could just ask the Room containing the car-Room to provide an image to show through the windshield. But then your Room would also have to be a Door so you could enter it.
Your inventory would probably just be a Container, or a Container inside the player.
To save game state, you probably want each object to produce JSON (or something similar) that describes any changes to its state, so you'd need to remember its original state, and if the current state differs, note that in the JSON. Then saving game state would just be asking each object in every room to produce that JSON, and saving every JSON dictionary that is not empty. Containers would also report any non-empty JSON from their contained items. Each JSON dictionary would include the object's own unique ID (so you know which object it is for) and, if it changed, a reference to the containing object (like its unique ID number).
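A minimal sketch of that diff-based saving, with invented names (`SavableObject`, `changes`, `save_game`). To keep it short I simplified one thing: instead of containers reporting their contents recursively, the save function is handed a flat list of all objects, and each object stores the ID of its container as part of its state.

```python
import copy
import json
from typing import Any, Optional


class SavableObject:
    """Tracks its own state and remembers the original for diffing."""

    def __init__(self, uid: int, container_id: Optional[int], **state: Any):
        self.uid = uid
        self.state = {"container": container_id, **state}
        self._original = copy.deepcopy(self.state)  # snapshot of the starting state

    def changes(self) -> dict:
        """Return {} if nothing changed, else the object's ID plus the changed keys."""
        diff = {
            key: value
            for key, value in self.state.items()
            if key not in self._original or self._original[key] != value
        }
        return {"id": self.uid, "changes": diff} if diff else {}


def save_game(objects: list[SavableObject]) -> str:
    """Collect every non-empty change record and serialize the lot as JSON."""
    records = []
    for obj in objects:
        record = obj.changes()
        if record:
            records.append(record)
    return json.dumps(records)
```

Loading is then the reverse: create the game in its original state, look up each object by its ID, and apply the recorded changes on top.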