Architecture for EOD (end of day) stock exchange prices

Question

I need to work out the architecture for a NASDAQ frontend charting application (a desktop app in .NET). Note that this is NOT for real-time quotes.

NASDAQ provides an API that gives historical pricing, limited to one year's data, which is fine for our purposes.

First I use that API to get the data (it comes in csv files). I store those files in an S3 bucket.

Then I use AWS Glue (a cloud ETL tool) to move the data into ~~Redshift~~ a cloud DB (see EDIT below).

Meanwhile I set up a Lambda function that runs at the end of each day (say, at 01:00 AM) to get the price for each ticker for the day that just ended, and that function adds it to the historical data in ~~Redshift~~ the cloud DB (see EDIT below).

Finally I create a separate Lambda function that the desktop app can call for any ticker symbol, and that function queries the database to return all the data up to yesterday (which is now in the DB).

Questions re this backend architecture

Is this a good use case for Lambda and the AWS API Gateway? And does the way I have laid it out make sense?
Or would it make more sense to use Python FastAPI on an EC2 instance to query Redshift and provide a GET endpoint for that? (Although I'd be concerned about how to scale that to support more clients.)

Questions re frontend app

A key question on the frontend is should the Desktop app store locally the data for each ticker the user requests (and subsequently only download the days it does not have)?

For example:

1. The user downloads Apple prices on Monday.
2. They close the app.
3. They open the app again on Thursday and ask to see Apple again.

So should the desktop have stored the previous data locally, or just make a new request to get it all again?

Getting it all again seems wasteful. Would it make more sense for the desktop app to check whether it has Apple data locally up to Monday, and then only request the Tue/Wed EOD prices?
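A minimal sketch of that check, in Python for brevity even though the app itself is .NET. The function name `next_from_date` and the one-year default window are assumptions for illustration, not part of any existing API:

```python
from datetime import date, timedelta
from typing import Optional


def next_from_date(last_cached: Optional[date], today: date,
                   history_days: int = 365) -> Optional[date]:
    """Return the first date to request, or None if the local cache is current."""
    # The newest EOD price the backend can hold is yesterday's.
    yesterday = today - timedelta(days=1)
    if last_cached is None:
        # Nothing cached for this ticker: ask for the whole history window.
        return today - timedelta(days=history_days)
    if last_cached >= yesterday:
        # Cache already covers everything the backend has.
        return None
    # Otherwise ask only for the days after the last cached one.
    return last_cached + timedelta(days=1)
```

With data cached up to Monday 2023-10-09 and the app reopened on Thursday 2023-10-12, this returns 2023-10-10, so only the missing days are requested. Weekends and market holidays simply yield no rows for those dates.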

EDIT

Given the comment about the need to use Redshift, I did some further research and now I know that AWS Glue can directly query CSV files in S3 using AWS Athena.

So I guess Redshift is not required; the Lambda function that API Gateway calls can just run a query using Glue (I think, right?).

EDIT 2

However, I have just learned that AWS Athena is not an option, as its queries are queued rather than run on demand, so it could take several minutes to respond.

So I'm still under the impression that we need to first store the data in an actual database for API Gateway to be able to query it quickly. Maybe Amazon Aurora?

Why on Earth do you need Redshift for storing 3300 (# of companies on the NASDAQ) * 365 days = about 1M data points?
– Philip Kendall, Commented Oct 11, 2023 at 13:44
@PhilipKendall This is an MVP. Eventually there will be 10+ years of data. And then later we will be adding each day's data so it is constantly up-to-date. I left that out of the question since it seemed unimportant to the essential architecture. But for the MVP where do you suggest we store the data if not Redshift?
– rmcsharry, Commented Oct 12, 2023 at 8:55
@rmcsharry: You're using a container ship to do your groceries, to say the very least. Redshift works in the petabyte scale of data. Even if you store 1GB of data per company, per day, you'd need over a year to get to your first petabyte. You seem to be storing one price (EOD) per day per company. If I were to say that's 1kb of data I'd still be very generous. With 1kb of data per company per day, it would take you about a million years to get to your first petabyte. A single Redshift cluster can handle 16 petabytes. We're beyond overkill.
– Flater, Commented Oct 13, 2023 at 11:50
@Flater Thank you for that, but perhaps you missed my edit where I've concluded we don't need Redshift. But it seems I cannot just query the data in S3, I need to move it into a database, but which database?
– rmcsharry, Commented Oct 13, 2023 at 12:16
Why vote this down? I'm asking about an architecture that I am not experienced with in order to elicit expert knowledge in an answer. How does this question NOT meet the rules of this site?
– rmcsharry, Commented Oct 13, 2023 at 12:17

nickw · Accepted Answer · 2023-10-11 14:54:09Z

Lambda and API Gateway seem like a good fit for this, provided you don't have several other endpoints to support for accessing that same data. If you do, then FastAPI may be the way to go.

As for storing data locally in the app, that is fine, but it means your API will have to support some sort of from_date parameter. It might be worth limiting the number of tickers stored locally to a hundred or so, just to keep the app's disk usage down.
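To make the from_date idea concrete, here is a rough sketch of a Lambda handler behind an API Gateway proxy integration. The `ticker` query parameter, the response shape, and the in-memory `PRICES` table (with made-up prices standing in for the real database) are all assumptions for illustration:

```python
import json
from datetime import date

# Hypothetical stand-in for the database: ticker -> (ISO date, close).
# The prices are invented placeholder values, not real quotes.
PRICES = {
    "AAPL": [("2023-10-09", 100.0), ("2023-10-10", 101.0), ("2023-10-11", 102.0)],
}


def _response(status, payload):
    # Shape expected by an API Gateway Lambda proxy integration.
    return {"statusCode": status, "body": json.dumps(payload)}


def handler(event, context):
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    ticker = (params.get("ticker") or "").upper()
    if ticker not in PRICES:
        return _response(404, {"error": "unknown ticker"})

    from_date = None
    if params.get("from_date"):
        try:
            from_date = date.fromisoformat(params["from_date"])
        except ValueError:
            return _response(400, {"error": "from_date must be YYYY-MM-DD"})

    # Without from_date, return everything; otherwise only rows on or after it.
    rows = [
        {"date": d, "close": c}
        for d, c in PRICES[ticker]
        if from_date is None or date.fromisoformat(d) >= from_date
    ]
    return _response(200, {"ticker": ticker, "prices": rows})
```

In a real deployment the list comprehension would become a `WHERE trade_date >= ...` clause against the database, so only the missing rows ever leave it.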

Thanks for the answer, which I voted up. But it seems that using AWS Glue is not the answer here. I have found the solution and will post the answer.
– rmcsharry, Commented Oct 14, 2023 at 11:13

rmcsharry · Accepted Answer · 2023-10-14 11:15:19Z

The answer here is not to use AWS Glue, although that is a great ETL tool.

The solution is much simpler and it's basically to write a Lambda function that imports the CSV files from S3 into an Amazon Aurora Postgres database (or similar, like RDS).

AWS even has a step-by-step guide of how to do that here
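As a rough sketch of the import step (not the code from the AWS guide), the function below parses one CSV file and upserts its rows. To keep it runnable without AWS it uses the standard-library `sqlite3` module as a stand-in for Aurora Postgres. The table name `eod_prices` and the CSV columns `ticker`, `date` and `close` are assumptions about the file layout:

```python
import csv
import io
import sqlite3

# Assumed schema: one closing price per ticker per trading day.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS eod_prices (
    ticker     TEXT NOT NULL,
    trade_date TEXT NOT NULL,
    close      REAL NOT NULL,
    PRIMARY KEY (ticker, trade_date)
)
"""

# Upsert, so re-importing the same file does not create duplicate rows.
UPSERT = """
INSERT INTO eod_prices (ticker, trade_date, close)
VALUES (?, ?, ?)
ON CONFLICT (ticker, trade_date) DO UPDATE SET close = excluded.close
"""


def load_csv(conn, csv_text):
    """Parse CSV text with ticker,date,close columns and upsert every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["ticker"], r["date"], float(r["close"])) for r in reader]
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, rows)
    return len(rows)
```

In the Lambda itself, `csv_text` would come from the S3 object (for example via boto3's `get_object`), and with a Postgres driver such as psycopg2 the placeholders would be `%s` rather than `?`. The same function can serve both the initial bulk load and the nightly run, since the upsert makes repeated imports harmless.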
