Timeline for "How should we design an IoT platform that handles dynamic device schemas and time-series ingestion at scale (100K writes/min)?"
Current License: CC BY-SA 4.0
9 events
| when | what | by | license | comment |
|---|---|---|---|---|
| Aug 5 at 18:22 | comment added | Steve | | Now the OP says he's working for the manufacturer, so there may be flexibility with the firmware. But then it no longer involves just 4 web+mobile developers: it's them plus a hardware development team, plus an overarching technical manager and decision-maker. And if 100k devices take OTA firmware updates, then it's a cybersecurity team too. Finally, the biggest flow of ones and zeros is into the budget for this project! (2/2) |
| Aug 5 at 18:21 | comment added | Steve | | @ArseniMourzenko, that may solve the problem of clients making too many requests at once, but it doesn't solve the problem of the server handling too few requests at once and failing to keep the intended pace. Also, my experience across a number of different kinds of hardware is that the behaviour and protocol of an individual device is often fixed, that there may be a diversity of brands or versions in a population of installed devices, and that the behaviour of at least some of these is quite often flawed in some way. There are also hardware-level failures that cause deranged behaviour. (1/2) |
| Aug 5 at 15:20 | comment added | Arseni Mourzenko | | Wouldn't limits at the level of the reverse proxy solve that? What I mean is that, indeed, a back-pressure mechanism may not be straightforward to set up, and it does require the side that sends data to behave responsibly. On the other hand, reverse-proxy limits are easy to configure, and would definitely protect the system against an irresponsible (I didn't say malicious!) client. That is, if you, as a client, pile up messages when the API is down and rush it as soon as it's up, you'll receive lots of HTTP 503. Continue this way, and you'll get HTTP 429. [A rate-limiting and back-off sketch follows the timeline.] |
| Aug 5 at 13:16 | comment added | Steve | | @ArseniMourzenko, I think with this it's a case of "fools rush in where angels fear to tread". It's not the laws of physics that stop these applications; it's the misapprehension of the overall complexity involved and of the resources and expertise available for a solution. |
| Aug 5 at 8:05 | comment added | Ewan | | I agree with Steve: 100k per minute can be handled, but you know some customer is going to say "ok, now do my 1 million smart lightbulbs, oh, and I need per-second accuracy" and it's going to fall over. You need some sort of local, or near-local, collection and batching. [A batching sketch follows the timeline.] |
| Aug 5 at 6:21 | comment added | Arseni Mourzenko | | Those are indeed well-known problems. Look up the term “back pressure” for one way to solve them. In short, the side that sends data has a feedback mechanism that tells it whether the other side is ready to continue receiving. You will get information loss (quite expected anyway during downtime), but at least you won't crush the system that receives the data. |
| Aug 4 at 23:08 | comment added | Steve | | @ArseniMourzenko, that's just not my experience with these things. The assumption that 100,000 requests per minute will arrive steadily, evenly distributed over the entire minute, is not justifiable, and a request that takes 10 ms under ideal conditions might be blocked for just a second: you now have a backlog of over 1,600 connections waiting, on a system reckoned to handle just 17 at once. A server or system that has just an hour of downtime could, on resuming, face 100,000 concurrent connections, each maybe pushing 60 times the normal payload. [The load arithmetic is sketched after the timeline.] |
| Aug 4 at 21:54 | comment added | Arseni Mourzenko | | “concurrent connections to a single central data store”: it doesn't need to be a single one, as sharding is a perfectly viable option here. Also, SQL Server's limit on concurrent connections, for example, is 32,767, far from the “few dozen” connections you mentioned. Finally, the author never said there are 100,000 connections at the same time. It may be that there are 100,000 requests per minute (although it's not clear from the original question). If every request takes 10 ms to process, that's about 17 concurrent connections. |
| Aug 4 at 21:37 | answered | Steve | CC BY-SA 4.0 | |
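
To make the arithmetic traded in the comments concrete, here is a back-of-the-envelope sketch in Python. It uses only the figures mentioned in the thread (100,000 requests per minute, 10 ms of service time, a one-second stall, an hour of downtime across 100k devices); the numbers are illustrative, not measurements.

```python
# Back-of-the-envelope load arithmetic (Little's law: concurrency = arrival rate * service time).

REQUESTS_PER_MINUTE = 100_000
SERVICE_TIME_S = 0.010                      # 10 ms per request under ideal conditions

arrival_rate = REQUESTS_PER_MINUTE / 60     # ~1,667 requests per second
steady_concurrency = arrival_rate * SERVICE_TIME_S   # ~17 requests in flight

# If processing stalls for one second, arrivals keep coming:
backlog_after_1s_stall = arrival_rate * 1.0          # ~1,667 requests queued

# After an hour of downtime, devices that buffered locally may all reconnect
# at once, each carrying roughly 60 minutes' worth of readings:
devices = 100_000
payload_multiplier = 60

print(f"steady-state concurrency: ~{steady_concurrency:.0f}")
print(f"backlog after a 1 s stall: ~{backlog_after_1s_stall:.0f}")
print(f"reconnect storm: {devices:,} clients, each ~{payload_multiplier}x the normal payload")
```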
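The reverse-proxy limits and client-side discipline discussed above can be illustrated with a minimal, self-contained Python sketch. In practice the limit would live in the proxy itself (answering HTTP 429/503), not in application code; the class, thresholds, and `send` callback here are hypothetical.

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: shed excess load instead of queueing it."""
    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s              # sustained requests per second allowed
        self.capacity = burst               # size of burst tolerated
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                        # caller answers HTTP 429 (or 503 with Retry-After)

def send_with_backoff(send, payload, max_attempts: int = 5) -> bool:
    """A well-behaved client backs off when refused, instead of rushing the API
    the moment it comes back up."""
    delay = 1.0
    for _ in range(max_attempts):
        if send(payload):                   # send() returns False on HTTP 429/503
            return True
        time.sleep(delay)
        delay = min(delay * 2.0, 60.0)      # exponential backoff, capped at a minute
    return False                            # give up: drop or re-buffer locally
```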
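Ewan's suggestion of local or near-local collection and batching could look roughly like the sketch below: an edge gateway buffers individual readings and forwards them in bulk when either a size or an age threshold is reached. The class name, thresholds, and `upload` callback are illustrative assumptions, not part of any answer in the thread.

```python
import time

class EdgeBatcher:
    """Accumulate readings at (or near) the device and flush them in bulk."""
    def __init__(self, upload, max_items: int = 500, max_age_s: float = 10.0):
        self.upload = upload                # callable that ships a list of readings upstream
        self.max_items = max_items
        self.max_age_s = max_age_s
        self.buffer = []
        self.opened = time.monotonic()

    def add(self, reading) -> None:
        self.buffer.append(reading)
        full = len(self.buffer) >= self.max_items
        stale = time.monotonic() - self.opened >= self.max_age_s
        if full or stale:
            self.upload(self.buffer)        # one bulk write instead of hundreds of small ones
            self.buffer = []
            self.opened = time.monotonic()
```

Turning many small writes into fewer bulk writes is also what makes a 100K-writes-per-minute target easier to hold at the ingestion and storage layers.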