8

I'm looking for a way to output a selected related record for each record in a table in MySQL. I'll explain further...

I have 2 tables currencies and exchange_rates. The tables are joined by a currency_code field and each currency record has multiple related exchange rate records, each exchange rate record represents a different day. So there is a 1:many relationship between currencies and exchange_rates.

I want to retrieve a full record from the exchange_rates table for each currency but with the ability to define specific criteria as to which related record to select. Not just the most recent exchange_rate for each currency but maybe the most recent exchange_rates record for each currency that has the field criteria_x=NULL.

It's a shame that you can't use LIMIT within a derived table otherwise something like this would be a neat and readable solution...

SELECT `currencies`.`currency_code`, `currencies`.`country`, `exchange_rates`.`id`, FROM_UNIXTIME(`exchange_rates`.`datestamp`), `rate` FROM `currencies` INNER JOIN ( SELECT `id`, `currency_code`, `invoice_id`, `datestamp`, `rate` FROM `exchange_rates` WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' ORDER BY `datestamp` DESC LIMIT 0, 1 ) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code` ORDER BY `currencies`.`country` 

The LIMIT clause is applied to the parent query not the derived table.

This is the only way I've found to do this...

SELECT `currencies`.`currency_code`, `currencies`.`country`, FROM_UNIXTIME( SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 1), '-', -1)) AS `datestamp`, SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 2), '-', -1) AS `id`, SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 3), '-', -1) AS `rate` FROM `currencies` INNER JOIN ( SELECT `currency_code`, MAX(CONCAT_WS('-', `datestamp`, `id`, `rate`)) AS `concat` FROM `exchange_rates` WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' GROUP BY `exchange_rates`.`currency_code` ) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code` ORDER BY `currencies`.`country` 

So concatenating a bunch of fields together and running a MAX() on it to get my sort order within the group, then parsing those fields out in the parent query with SUBSTRING_INDEX(). The problem is that this method only works when I can use a MIN() or MAX() on the concatenated field. It wouldn't be ideal if I wanted to sort a string or sort by multiple criteria but limit to a single record.

Also it causes me physical pain to have to resort to horrible string manipulation to get the data I want from a relational database — there has to be a better way!

Anyone got any suggestions of a better method?

1
  • kindly provide your table structure and some data so that we can see what we can do to solve it Commented May 14, 2012 at 14:02

4 Answers 4

3
+50

There are a few general issues to discuss (briefly) before trying to provide an answer.

Your first query is:

SELECT `currencies`.`currency_code`, `currencies`.`country`, `exchange_rates`.`id`, FROM_UNIXTIME(`exchange_rates`.`datestamp`), `rate` FROM `currencies` INNER JOIN ( SELECT `id`, `currency_code`, `invoice_id`, `datestamp`, `rate` FROM `exchange_rates` WHERE `criteria_x`=NULL AND `criteria_y` LIKE 'A' ORDER BY `datestamp` DESC LIMIT 0, 1 ) AS `exchange_rates` ON `currencies`.`currency_code`=`exchange_rates`.`currency_code` ORDER BY `currencies`.`country` 
  1. I don't think you need to use as many back-quotes as you are using. They aren't exactly wrong, but I'm not going to type them in my answer.
  2. The SQL standard does not sanction the criteria_x = NULL notation; that should be written as criteria_x IS NULL. MySQL may allow it; as long as you are aware that it is non-standard, it is OK for you to use.
  3. The criterion LIKE 'A' is not sensible if it contains no metacharacters (% or _ in standard SQL). You'd be better off with simple equality: = 'A'.

Your question says:

I want to retrieve a full record from the exchange_rates table for each currency but with the ability to define specific criteria as to which related record to select. Not just the most recent exchange rate for each currency, but maybe the most recent exchange rate for each currency that has the field criteria_x IS NULL.

So, you want to select the most recent exchange rate record for each currency that meets the required other criteria. We can assume that there is a unique constraint on the combination of currency_code and datestamp in the exchange rate table; this means that there will always be at most one matching row. You've not specified what should be shown if there is no matching row; an inner join will simply not list that currency, of course.

With SQL queries, I usually build and test the overall query in steps, adding extra material to the previously developed queries that are known to work and produce the right output. If it is simple and/or I've collected too much hubris, I'll try a complex query first, but when (nemesis) it doesn't work, then I go back to the build and test process. Think of it as Test Driven (Query) Development.

Stage 1: Exchange rate records that match specified criteria

SELECT id, currency_code, invoice_id, datestamp, rate FROM exchange_rates WHERE criteria_x IS NULL AND criteria_y = 'A' ORDER BY currency_code, datestamp DESC 

Stage 2: Most recent exchange rate time for each currency that matches specified criteria

SELECT currency_code, MAX(datestamp) FROM exchange_rates WHERE criteria_x IS NULL AND criteria_y = 'A' GROUP BY currency_code 

Stage 3: Exchange rate record for most recent exchange rate time for each currency that matches specified criteria

SELECT x.id, x.currency_code, x.invoice_id, x.datestamp, x.rate FROM exchange_rates AS x JOIN (SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates WHERE criteria_x IS NULL AND criteria_y = 'A' GROUP BY currency_code ) AS m ON x.currency_code = m.currency_code AND x.datestamp = m.datestamp 

Stage 4: Currency information and exchange rate record for most recent exchange rate time for each currency that matches specified criteria

This requires the joining the currencies table with the output of the previous query:

SELECT c.currency_code, c.country, r.id, FROM_UNIXTIME(r.datestamp), r.rate FROM currencies AS c JOIN (SELECT x.id, x.currency_code, x.invoice_id, x.datestamp, x.rate FROM exchange_rates AS x JOIN (SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates WHERE criteria_x IS NULL AND criteria_y = 'A' GROUP BY currency_code ) AS m ON x.currency_code = m.currency_code AND x.datestamp = m.datestamp ) AS r ON c.currency_code = r.currency_code ORDER BY c.country 

Except that Oracle only allows ') r' instead of ') AS r' for table aliases and the use of FROM_UNIXTIME(), I believe that should work correctly with the current version of almost any SQL DBMS you care to mention.

Since the invoice ID is not returned in the final query, we can remove that from the select-list of the middle query. A good optimizer might do that automatically.

If you want to see the currency information even if there is no exchange rate that matches the criteria, then you need to change the JOIN in the outermost query to a LEFT JOIN (aka LEFT OUTER JOIN). If you only want to see a subset of the currencies, you can apply that filter at either the last (outermost) query stage, or (if the filter is based on information available in the exchange rate table, such as the currency code) at either the innermost sub-query (most efficient) or the middle sub-query (not so efficient unless the optimizer realizes it can push the filter down to the innermost sub-query).

Correctness is usually the primary criterion; performance is a secondary criterion. However, performance was mentioned in the question. The first rule is to measure the 'simple' query shown here. Only if that proves too slow do you need to worry further. When you do need to worry, you examine the query plan to see if there is, for example, a crucial index missing. Only if the query still isn't fast enough do you start trying to resort to other tricks. Those tricks tend to be very specific to a particular DBMS. For example, there might be optimizer hints that you can use to make the DBMS process the query differently.

Sign up to request clarification or add additional context in comments.

2 Comments

A fantastic answer, works perfectly and really well explained. I knew there had to be a decent way without concatenating/exploding a string of data. Regarding the backticks, yeah it's a MySQL habit and a bad habit. I could omit most of them but you just never know what will happen with an upgrade at a site that will break the query if a new keyword is introduced. Also the IS NULL/=NULL, yep I realise that... I just forgot to change that when pasting my demo query in the answer. THANKS!!!
Thanks. For discussion of back-ticks etc, see also What does the SQL Standard say about the usage of `?.
2

If I've understood your problem correctly, all you need to do is self-join exchange_rates to select the rate of interest:

SELECT currencies.currency_code, currencies.country, exchange_rates.id, FROM_UNIXTIME(exchange_rates.datestamp), exchange_rates.rate FROM currencies JOIN ( SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates WHERE criteria_x IS NULL AND criteria_y LIKE 'A' GROUP BY currency_code ) AS exchange_wantd USING (currency_code) JOIN exchange_rates USING (currency_code, datestamp) ORDER BY currencies.country 

Comments

1

Try this query. It is expected to work fine but if you provide some data i will be able to do it properly

SELECT `currencies`.`currency_code` as `CurrencyCode`, `currencies`.`country`, FROM_UNIXTIME( SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 1), '-', -1)) AS `datestamp`, SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 2), '-', -1) AS `id`, SUBSTRING_INDEX( SUBSTRING_INDEX(`exchange_rates`.`concat`, '-', 3), '-', -1) AS `rate`, (SELECT MAX(CONCAT_WS('-', `datestamp`, `id`, `rate`)) AS `concat` FROM `exchange_rates` WHERE `criteria_x`= NULL AND `criteria_y` LIKE 'A' GROUP BY `exchange_rates`.`currency_code` HAVING `exchange_rates`.`currency_code` =`CurrencyCode` ) as `Concat` FROM `currencies` ORDER BY `currencies`.`country` 

2 Comments

Thanks but that looks similar to what I've got above, just with a sub-select instead of the inner join on derived table. My 2nd one above works fine but I'm trying to find a way without concatenating fields then extrapolating a string.
you might not have noticed here i have selected CurrencyCode in the outer query and then used it in the sub query in having clause to restrict search
1

If I understand you correctly the answer is quite simple if you don't force the database to think like a human. Like Jonathan Leffler I understand your intention to "select the most recent exchange rate record for each currency that meets the required other criteria".

"The most recent" is of course the problem, as that information is not explicitly stored in the DB, so we determine this information first.

SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code 

We are going to reuse this so we give the result a name

(SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code) AS dates_we_want 

The domain holding all the information we could possibly want is a record for each possible combination of dates_we_want, the currencies and exchange_rates tables

(SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code) AS dates_we_want, currencies AS c, exchange_rates AS er 

Selection of records we want:

  • matching currency_codes

    dates_we_want.currency_code=er.currency_core AND dates_we_want.currency_code=c.currency_core

  • most recent rates

    dates_we_want.datestamp=er.datestamp

Projection into the result. You

want to retrieve a full record from the exchange_rates table

simply translates to

er.* 

Put everything together in a SELECT-statement and give you the change to put your arbitrary constraints:

SELECT er.* FROM (SELECT currency_code, MAX(datestamp) AS datestamp FROM exchange_rates GROUP BY currency_code ) AS dates_we_want, currencies AS c, exchange_rates AS er WHERE dates_we_want.currency_code=er.currency_core AND dates_we_want.currency_code=c.currency_core AND dates_we_want.datestamp=er.datestamp AND `criteria_x`=NULL AND `criteria_y` LIKE 'A' 

2 Comments

You have to push down the criteria_x and criteria_y conditions into the sub-query. What you're asking is a subtly different question. You are asking for the rows where the most recent row happens to meet the sub-criteria, rather than the most recent row that does meet the sub-criteria. I recommend avoiding the horizontal scroll-bar when possible. Personally, I don't think the keyword AND needs to be on a line on its own. I also strongly recommend using the explicit JOIN notation instead of the comma separated list of table expressions with joins in the WHERE clause.
I think you are arguing to solve the wrong problem. We do not want to relay the question word by word to the DBMS, we want to "retrieve a full record from the exchange_rates table for each currency [satisfying a set of constraints]". And my query does precisely that; on purpose, not accidently or by chance. That's the point of my very first sentece: "don't force the database to think like a human". Because it won't and DBMS are so good at combining, slicing, selecting and interpreting queries before execution, that it virtually doesn't pay off to think about intermediary results.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.