1

SITUATION

I am working on a travel engine site and am writing a complex query to match visitors' search queries with their bookings based on IP Address, Destination and Date so I can work out the conversion ratio later.

PROBLEM

There needs to be multiple conversion ratios based on a parameter (In this case, the utm_source which I extract from a RequestUrl stored in the searches table). The problem is some users make multiple searches from different locations.. sometimes we get the utm_source in the request and sometimes not... and of course we need to match to only 1 booking. See screenshot of query result below to better understand:

enter image description here

See the 3rd and 4th rows have the same booking ID, etc.. but different values for the Value column. I need to select only 1 of these, but not both. Basically, if there is more than 1, I need to choose the 1 that is not "N/A".

MY QUERY:

SELECT DISTINCT "B"."Id" AS "BookingId", "PQ"."IPAddress", "PQ"."To", "PQ"."SearchDate", "PQ"."Value" FROM ( SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value" FROM dbo."PackageQueries" WHERE "SiteId" = '<The ID>' AND "CreatedAt" >= '<Start Date>' AND "CreatedAt" < '<End Date>' ) AS "PQ" INNER JOIN dbo."Bookings" AS "B" ON "PQ"."IPAddress" = "B"."IPAddress" AND "B"."To" = "PQ"."To" AND "B"."BookingDate"::date = "PQ"."SearchDate" WHERE "B"."SiteId" = '<The ID>' AND "B"."BookingStatus" = 2 AND "B"."BookingDate" >= '<Start Date>' AND "B"."BookingDate" < '<End Date>' ORDER BY "B"."Id", "PQ"."IPAddress", "PQ"."To"; 
4
  • 1
    stackoverflow.com/questions/tagged/… Commented Feb 9, 2017 at 6:54
  • @a_horse_with_no_name, Thank you for the link.. and not so much for the downvote. :-D . This is a slightly more complex situation than those. For one, I can't just order by some readily available integer or date/time value and as such I don't think it was worthy of a down-vote.. but so be it. I did find a solution and I will be posting my own answer in a moment... Commented Feb 9, 2017 at 8:48
  • I didn't downvote Commented Feb 9, 2017 at 9:14
  • @a_horse_with_no_name, My apologies.. I assumed incorrectly. Commented Feb 9, 2017 at 9:47

1 Answer 1

1

I found a solution and based it on what I found here: Return rows that are max of one column in Postgresql and here: Postgres CASE in ORDER BY using an alias

My solution is as follows:

SELECT "BookingId", "IPAddress", "To", "SearchDate", "Value" FROM ( SELECT DISTINCT "B"."Id" AS "BookingId", "PQ"."IPAddress", "PQ"."To", "PQ"."SearchDate", "PQ"."Value", RANK() OVER ( PARTITION BY "B"."Id" ORDER BY CASE WHEN "PQ"."Value" = 'N/A' THEN 1 ELSE 0 END ) AS "RowNumber" FROM ( SELECT DISTINCT "IPAddress", "To", "CreatedAt"::date AS "SearchDate", COALESCE(SUBSTRING("RequestUrl", 'utm_source=([^&]*)'), 'N/A') AS "Value" FROM dbo."PackageQueries" WHERE "SiteId" = '<Site ID>' AND "CreatedAt" >= '<Start Date>' AND "CreatedAt" < '<End Date>' ) AS "PQ" INNER JOIN dbo."Bookings" AS "B" ON "PQ"."IPAddress" = "B"."IPAddress" AND "B"."To" = "PQ"."To" AND "B"."BookingDate"::date = "PQ"."SearchDate" WHERE "B"."SiteId" = '<Site ID>' AND "B"."BookingStatus" = 2 AND "B"."BookingDate" >= '<Start Date>' AND "B"."BookingDate" < '<End Date>' ) T WHERE "RowNumber" = 1 ORDER BY "BookingId", "IPAddress", "To"; 

Somewhat long-winded, but it does the trick nicely. I hope it helps others.

EDIT

This wasn't the end of the story: there were still some cases in which I was getting more than 1 value. The answer was to modify the CASE statement to generate a unique number for each text value. The solution can be found here: PostgreSQL - Assign integer value to string in case statement

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.