Timeline for How to rebalance data across nodes?

Current License: CC BY-SA 4.0

15 events

when	what		by	license	comment
Aug 7 at 0:09	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Apr 9 at 0:05	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Dec 9, 2024 at 23:02	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Aug 11, 2024 at 22:04	history	bumped	CommunityBot		This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
Jul 13, 2024 at 21:22	history	protected	gnat
Jul 13, 2024 at 19:50	comment	added	amon		If this is a cloud-first design then storage will be virtualized anyway. You may be able to share storage, or reassign volumes to another compute node with little or no downtime. Actually moving data between nodes would make scaling down more difficult. However, your main problem isn't storing or moving data, but designing a distributed system. Your problems are shared with other databases that support sharding and/or replicas. Solutions will depend on your CAP priorities and desired consistency model. Aside from "B doesn't have the data yet", also consider "B has crashed/frozen".
Jul 13, 2024 at 14:30	comment	added	poundifdef		The goal is for the system to automatically be able to scale up and down depending on load. I assume that rebalancing data is necessary here to be able to scale in without waiting for a single node to have 0 messages.
Jul 13, 2024 at 14:28	comment	added	poundifdef		I am implementing my own queue from scratch, with the semantics of SQS.
Jul 13, 2024 at 4:29	comment	added	amon		In Kafka, changing the number of partitions only affects future messages. Existing messages remain in their partition and aren't moved around. However, brokers (nodes) in a cluster can fetch data from each other. So a client only has to connect to any node in the cluster, not to the node that happens to store the data. Perhaps you could implement a similar mechanism to internally route message reads to the correct node. But Kafka also makes some decisions (messages are immutable, clients pull messages) that make this feasible.
Jul 13, 2024 at 1:18	comment	added	Greg Burghardt		And by "implementing a message queue" do you mean you are choosing from an existing solution, or are you literally writing the code which will serve as a message queue?
Jul 13, 2024 at 1:16	comment	added	Greg Burghardt		Which kind of message queue are you using? For example, RabbitMQ allows for clustering. Is that similar to what you need?
Jul 12, 2024 at 21:38	answer	added	anon_user123456		timeline score: 0
Jul 12, 2024 at 21:31	history	edited	poundifdef		edited tags
S Jul 12, 2024 at 21:21	review	First questions			Jul 14, 2024 at 22:51
S Jul 12, 2024 at 21:21	history	asked	poundifdef	CC BY-SA 4.0