Architecture for streaming and storing realtime data

Question

I have realtime tick data coming from Bloomberg B-PIPE for thousands of stocks and equity options. I have multiple multithreaded Docker instances, each processing a subset of tickers. The processed data gets sent to a channel using Redis pub/sub. I also use Redis to store the tick data before publishing it. I have a script that takes a snapshot of that data every minute and stores it in PostgreSQL for later use if needed.

I have another microservice that listens to that redis channel and does pnl calculations for each portfolio that contains the ticker with the update.

I’m thinking of switching from Redis pub/sub to something else. The problem I’m facing with Redis pub/sub now is that if I have multiple instances listening to the same channel, I could be processing the same ticker info multiple times. Is there a way around this, or is it time to use Kafka, which will be a pain in the ass to set up? And if using Kafka, would it make more sense to have a channel/topic for each individual ticker?

I would love to stick with Redis to store the intraday tick data. I’m using Redis Enterprise on Azure and love the time series package. It’s very easy to fetch the data in my frontend system and do analytics. I have a low-tier plan and I’m not sure it’s feasible to upgrade to a higher one, as it gets drastically more expensive. Right now the server keeps slowing down every now and then, and even crashes around market close with just 10k ops/sec. Are there any cheaper alternatives? I’m expecting the strategy to keep growing in size, and I will be processing many more tickers.

Would appreciate any help from those of you who have experience building similar systems.

Steinwolfe · Accepted Answer · 2024-12-08 04:51:35Z

I have built, and am currently building, something like this, but it's not very clear what exactly you want to solve here.

From what I understand, you want a scalable solution that does better than your current Redis setup while avoiding Kafka.

I would suggest using NATS with JetStream rather than Redis, or, if you can, AWS SQS and SNS. That would solve a lot of the scaling issues.
Use Aerospike, ClickHouse or TimescaleDB for time series analytics and other storage needs.
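
To illustrate why a queue-style system helps with the duplicate-processing problem: with plain pub/sub, every subscriber receives every message, whereas with competing consumers on a shared queue (the delivery model that JetStream consumers and SQS are generally used for) each message is handed to only one worker. The following is a minimal stdlib-only sketch of that difference, not NATS or SQS client code; the function names and the sample ticks are made up for illustration.

```python
import queue
import threading
from collections import Counter


def fan_out(ticks, n_subscribers):
    """Pub/sub semantics: every subscriber sees every message."""
    processed = Counter()
    for _ in range(n_subscribers):
        for tick in ticks:
            processed[tick] += 1  # each instance processes the same tick again
    return processed


def competing_consumers(ticks, n_workers):
    """Queue semantics: each message is taken by exactly one worker."""
    q = queue.Queue()
    for tick in ticks:
        q.put(tick)

    processed = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            try:
                tick = q.get_nowait()  # removes the message from the shared queue
            except queue.Empty:
                return
            with lock:
                processed[tick] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


# Hypothetical ticks: (ticker, sequence number)
ticks = [("AAPL", 1), ("MSFT", 1), ("AAPL", 2), ("SPY", 1)]

pubsub_counts = fan_out(ticks, n_subscribers=3)
queue_counts = competing_consumers(ticks, n_workers=3)

print("pub/sub:", dict(pubsub_counts))
print("queue:  ", dict(queue_counts))
```

In the pub/sub case each tick is counted once per subscriber; in the queue case each tick is counted once in total, regardless of how many workers are running. Real brokers typically offer at-least-once delivery, so the PnL handler should still tolerate an occasional redelivery.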

Let me know if you need more information on this setup, or you are looking for something else.
