I have 2 tables:
Types Data +----+----------+ +-------+-------+ | id | name | | id | type | +----+----------+ +-------+-------+ | 1 | name1 | | 1 | 1 | | 2 | name2 | | 2 | 5 | | 3 | name3 | | 3 | 7 | | 4 | name4 | | 4 | 4 | | 5 | name5 | | 5 | 2 | | 6 | name6 | | 6 | 6 | | 7 | name7 | | 7 | 3 | | .. | .. | | 8 | 5 | +----+----------+ | 9 | 5 | | 10 | 4 | | 11 | 1 | | 12 | 2 | | 13 | 6 | | 14 | 5 | | 15 | 2 | | ... | ... | | 1...? | 1...? | +-------+-------+ Data table is very large, it contains millions of rows I need to select 1000 rows, but the result has to be from whole table, so every nth row select. I'v done this using answer from How to select every nth row in mySQL starting at n but, I need add some more logic to it, I need a select query that would select every nth row of all the types. I guess this sound complicated so I'll try to describe what I would like to achieve:
Lets say there are 7 Types and Data table has 7M rows 0.5M rows for types 1,2,3, 1.5M rows for types 4,5,6,7 (just be clear intervals may now be equal for all the types).
I need 1000 records that contains equal amounts of types so if I 7 types each type can occur in result set ROUND(1000/7) which would be equal to 142 records per type so I need to select 142 per type from Data table;
For types 1,2,3 which contains 0.5M rows that would be ROUND(0.5M / 142) which equals every nth 3521 row; For types 4,5,6,7 which contains 1.5M rows that would be ROUND(1.5M / 142) which equals every nth 10563 row;
So result would look something like this:
Result +-------+------+ | id | type | +-------+------+ | 1 | 1 | | 3522 | 1 | | 7043 | 1 | | .. | .. | | .. | 2 | | .. | 2 | | .. | .. | | .. | 3 | | .. | 3 | | .. | .. | | .. | 4 | | .. | 4 | | .. | .. | | .. | 5 | | .. | 5 | | .. | .. | | .. | 6 | | .. | 6 | | .. | .. | | .. | 7 | | .. | 7 | | .. | .. | +-------+------+ I could do this simply in any programming language with multiple queries that return each type's count from Data table, then after doing the maths selecting only single type at the time.
But I would like to do this purely in MySQL, using as less queries as possible.
EDIT
I'll try to explain in more detail what I wan't to achieve with real example.
I have table with 1437823 rows. Table schema looks like this:
+---------+----------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+----------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | type | int(11) | NO | | NULL | | | counter | int(11) | NO | | NULL | | | time | datetime | NO | | NULL | | +---------+----------+------+-----+---------+----------------+ That table type statistics is:
+------+-----------+ | Type | Row Count | +------+-----------+ | 1 | 135160 | | 2 | 291416 | | 3 | 149863 | | 4 | 296293 | | 5 | 273459 | | 6 | 275929 | | 7 | 15703 | +------+-----------+ (P.S. Types count can change in time.)
Let's say I need to select sample data from time interval, In first version of question I omitted time because I thought of it as insignificant but now I think it might have some significance when ordering to improve performance.
So anyway I need to select approximately 1000 rows sample in which there's equal chunk of data for each type, so the statistic of end result would look like this: I am selecting 1000 rows with 7 types so ROUND(1000 / 7) = 143 rows per type;
+------+-----------+ | Type | Row Count | +------+-----------+ | 1 | 143 | | 2 | 143 | | 3 | 143 | | 4 | 143 | | 5 | 143 | | 6 | 143 | | 7 | 143 | +------+-----------+ So now I need to select 143 rows for each type in equal gaps in time interval. So for single type it would look something like this:
SET @start_date := '2014-04-06 22:20:21'; SET @end_date := '2015-02-20 16:20:58'; SET @nth := ROUND( (SELECT COUNT(*) FROM data WHERE type = 1 AND time BETWEEN @start_date AND @end_date) / ROUND(1000 / (SELECT COUNT(*) FROM types)) ); SELECT r.* FROM (SELECT * FROM data WHERE type = 1 AND time BETWEEN @start_date AND @end_date) r CROSS JOIN ( SELECT @i := 0 ) s HAVING ( @i := @i + 1) MOD @nth = 1 Statistics:
+------+-----------+ | Type | Row Count | +------+-----------+ | 1 | 144 | +------+-----------+ This query would give me needed results with tolerable performance, but I would need a query for each type which would decrease performance and would require later to concatenate results into single data set since that's what I need for further processing, so I would like to do it in single query or at least get single result set.
P.S. I can tolerate row count deviation in result set as long as type chunks are equal.