
Commit a006e88

Author: Varun Gupta
MDEV-11563: GROUP_CONCAT(DISTINCT ...) may produce a non-distinct list
Backported from MySQL Bug #25331425: DISTINCT CLAUSE DOES NOT WORK IN GROUP_CONCAT

Issue:
------
The problem occurs when:
1) GROUP_CONCAT (DISTINCT ....) is used in the query.
2) The data size is greater than the value of the system variable tmp_table_size.

The result would contain values that are non-unique.

Root cause:
-----------
An in-memory structure is used to filter out non-unique values. When the data size exceeds tmp_table_size, the overflow is written to disk as a separate file. The expectation here is that when all such files are merged, the full set of unique values can be obtained.

But the Item_func_group_concat::add function is in a bit of a hurry. Even as it is adding values to the tree, it wants to decide if a value is unique and write it to the result buffer. This works fine if the configured maximum size is greater than the size of the data. But when tmp_table_size is set to a low value, the tree is smaller than the data, which requires the creation of multiple copies on disk.

Item_func_group_concat currently has no mechanism to merge all the copies on disk and then generate the result. This results in duplicate values.

Solution:
---------
In case of the DISTINCT clause, don't write to the result buffer immediately. Do the merge and only then put the unique values in the result buffer. This has to be done in Item_func_group_concat::val_str.

Note regarding result file changes:
-----------------------------------
Earlier, when a unique value was seen in Item_func_group_concat::add, it was dumped to the output immediately, so the result was in the order stored in the storage engine (SE). With this fix, we wait until all the data is read, and only then is the final set of unique values written to the output buffer. So the data appears in sorted order.

This only fixes the cases when we have DISTINCT without an ORDER BY clause in GROUP_CONCAT.
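The root cause described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the server code: `BoundedUniqueFilter`, `concat_eager`, `concat_deferred` and `join` are hypothetical names, `capacity` stands in for the memory limit derived from tmp_table_size, and spilling to disk is modelled by moving the in-memory set into a list of runs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the in-memory duplicate filter. It holds at most
// `capacity` values; when full, the current set is "spilled" to a list of
// runs, mimicking the overflow files written when data exceeds the limit.
struct BoundedUniqueFilter {
  std::size_t capacity;
  std::set<std::string> in_memory;
  std::vector<std::set<std::string>> spilled_runs;

  explicit BoundedUniqueFilter(std::size_t cap) : capacity(cap) {
    assert(capacity > 0);
  }

  // Returns true if the value is new with respect to the current in-memory
  // set only. Values already spilled are invisible here: this is the bug.
  bool add(const std::string &value) {
    if (in_memory.size() == capacity) {
      spilled_runs.push_back(std::move(in_memory));
      in_memory.clear();
    }
    return in_memory.insert(value).second;
  }

  // Merges the in-memory set with every spilled run; only this view is
  // guaranteed to be duplicate-free. The result is sorted.
  std::set<std::string> merged() const {
    std::set<std::string> all(in_memory);
    for (const auto &run : spilled_runs)
      all.insert(run.begin(), run.end());
    return all;
  }
};

// Joins values with a comma separator.
template <typename Range>
std::string join(const Range &values) {
  std::string out;
  bool first = true;
  for (const auto &v : values) {
    if (!first) out += ',';
    out += v;
    first = false;
  }
  return out;
}

// Old behaviour (sketch): emit a value as soon as add() calls it unique.
std::string concat_eager(const std::vector<std::string> &rows,
                         std::size_t capacity) {
  BoundedUniqueFilter filter(capacity);
  std::vector<std::string> emitted;
  for (const auto &row : rows)
    if (filter.add(row)) emitted.push_back(row);
  return join(emitted);
}

// Fixed behaviour (sketch): collect everything, merge, then emit.
std::string concat_deferred(const std::vector<std::string> &rows,
                            std::size_t capacity) {
  BoundedUniqueFilter filter(capacity);
  for (const auto &row : rows) filter.add(row);
  return join(filter.merged());
}
```

In this model, calling `concat_eager` on the rows bb, ccc, a, bb, ccc with a capacity of 2 emits duplicates, while `concat_deferred` on the same input yields each value once, in sorted order. That sorted order is the same effect the note on result file changes describes.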
1 parent fd1755e commit a006e88

4 files changed, +65 -38 lines changed

mysql-test/main/func_gconcat.result

Lines changed: 31 additions & 22 deletions
@@ -363,8 +363,8 @@ bb,ccc,a,bb,ccc
 BB,CCC,A,BB,CCC
 select group_concat(distinct b) from t1 group by a;
 group_concat(distinct b)
-bb,ccc,a
-BB,CCC,A
+a,bb,ccc
+A,BB,CCC
 select group_concat(b order by b) from t1 group by a;
 group_concat(b order by b)
 a,bb,bb,ccc,ccc
@@ -383,11 +383,11 @@ Warning 1260 Row 2 was cut by GROUP_CONCAT()
 Warning 1260 Row 4 was cut by GROUP_CONCAT()
 select group_concat(distinct b) from t1 group by a;
 group_concat(distinct b)
-bb,c
-BB,C
+a,bb
+A,BB
 Warnings:
-Warning 1260 Row 2 was cut by GROUP_CONCAT()
-Warning 1260 Row 4 was cut by GROUP_CONCAT()
+Warning 1260 Row 3 was cut by GROUP_CONCAT()
+Warning 1260 Row 6 was cut by GROUP_CONCAT()
 select group_concat(b order by b) from t1 group by a;
 group_concat(b order by b)
 a,bb
@@ -413,8 +413,8 @@ bb,ccc,a,bb,ccc,1111111111111111111111111111111111111111111111111111111111111111
413413
BB,CCC,A,BB,CCC,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
414414
select group_concat(distinct b) from t1 group by a;
415415
group_concat(distinct b)
416-
bb,ccc,a,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
417-
BB,CCC,A,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
416+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,a,bb,ccc
417+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,A,BB,CCC
418418
select group_concat(b order by b) from t1 group by a;
419419
group_concat(b order by b)
420420
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,a,bb,bb,ccc,ccc
@@ -433,11 +433,11 @@ Warning 1260 Row 7 was cut by GROUP_CONCAT()
433433
Warning 1260 Row 14 was cut by GROUP_CONCAT()
434434
select group_concat(distinct b) from t1 group by a;
435435
group_concat(distinct b)
436-
bb,ccc,a,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
437-
BB,CCC,A,1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111112,00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
436+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
437+
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
438438
Warnings:
439-
Warning 1260 Row 5 was cut by GROUP_CONCAT()
440-
Warning 1260 Row 10 was cut by GROUP_CONCAT()
439+
Warning 1260 Row 2 was cut by GROUP_CONCAT()
440+
Warning 1260 Row 4 was cut by GROUP_CONCAT()
441441
select group_concat(b order by b) from t1 group by a;
442442
group_concat(b order by b)
443443
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001,11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
@@ -520,9 +520,9 @@ a group_concat(b)
 NULL 3,4,2,1,2,7,3,3
 select a, group_concat(distinct b) from t1 group by a with rollup;
 a group_concat(distinct b)
-1 3,4,2,1
-2 7,3
-NULL 3,4,2,1,7
+1 1,2,3,4
+2 3,7
+NULL 1,2,3,4,7
 select a, group_concat(b order by b) from t1 group by a with rollup;
 a group_concat(b order by b)
 1 1,2,2,3,4
@@ -745,10 +745,10 @@ CREATE TABLE t1(a TEXT, b CHAR(20));
 INSERT INTO t1 VALUES ("one.1","one.1"),("two.2","two.2"),("one.3","one.3");
 SELECT GROUP_CONCAT(DISTINCT UCASE(a)) FROM t1;
 GROUP_CONCAT(DISTINCT UCASE(a))
-ONE.1,TWO.2,ONE.3
+ONE.1,ONE.3,TWO.2
 SELECT GROUP_CONCAT(DISTINCT UCASE(b)) FROM t1;
 GROUP_CONCAT(DISTINCT UCASE(b))
-ONE.1,TWO.2,ONE.3
+ONE.1,ONE.3,TWO.2
 DROP TABLE t1;
 CREATE TABLE t1( a VARCHAR( 10 ), b INT );
 INSERT INTO t1 VALUES ( repeat( 'a', 10 ), 1),
@@ -847,7 +847,7 @@ create table t1(a bit(2) not null);
 insert into t1 values (1), (0), (0), (3), (1);
 select group_concat(distinct a) from t1;
 group_concat(distinct a)
-1,0,3
+0,1,3
 select group_concat(distinct a order by a) from t1;
 group_concat(distinct a order by a)
 0,1,3
@@ -860,13 +860,13 @@ insert into t1 values (1, 'a', 0), (0, 'b', 1), (0, 'c', 0), (3, 'd', 1),
 (1, 'e', 1), (3, 'f', 1), (0, 'g', 1);
 select group_concat(distinct a, c) from t1;
 group_concat(distinct a, c)
-10,01,00,31,11
+00,01,10,11,31
 select group_concat(distinct a, c order by a) from t1;
 group_concat(distinct a, c order by a)
 00,01,11,10,31
 select group_concat(distinct a, c) from t1;
 group_concat(distinct a, c)
-10,01,00,31,11
+00,01,10,11,31
 select group_concat(distinct a, c order by a, c) from t1;
 group_concat(distinct a, c order by a, c)
 00,01,10,11,31
@@ -1333,8 +1333,8 @@ select grp,group_concat(c limit 5.5...' at line 1
 select grp,group_concat(distinct c limit 1,10 ) from t1 group by grp;
 grp group_concat(distinct c limit 1,10 )
 1 c
-2 b
-3 C,D
+2 c
+3 D,E
 select grp,group_concat(c order by a) from t1 group by grp;
 grp group_concat(c order by a)
 1 b,c
@@ -1370,6 +1370,15 @@ grp group_concat(c order by c desc limit 2)
 1 c,b
 2 c,b
 3 E,E
+#
+# Empty results for group concat as offset is greater than the rows
+# for a group
+#
+select grp,group_concat(distinct c limit 10,1 ) from t1 group by grp;
+grp group_concat(distinct c limit 10,1 )
+1
+2
+3
 drop table t1;
 create table t2 (a int, b varchar(10));
 insert into t2 values(1,'a'),(1,'b'),(NULL,'c'),(2,'x'),(2,'y');

mysql-test/main/func_gconcat.test

Lines changed: 7 additions & 0 deletions
@@ -986,6 +986,13 @@ select grp,group_concat(c order by c limit 2) from t1 group by grp;
 select grp,group_concat(c order by c desc) from t1 group by grp;
 select grp,group_concat(c order by c desc limit 2) from t1 group by grp;
 
+--echo #
+--echo # Empty results for group concat as offset is greater than the rows
+--echo # for a group
+--echo #
+
+select grp,group_concat(distinct c limit 10,1 ) from t1 group by grp;
+
 drop table t1;
 
 create table t2 (a int, b varchar(10));

sql/item_sum.cc

Lines changed: 25 additions & 15 deletions
@@ -3632,23 +3632,26 @@ int dump_leaf_key(void* key_arg, element_count count __attribute__((unused)),
   ulonglong *offset_limit= &item->copy_offset_limit;
   ulonglong *row_limit = &item->copy_row_limit;
   if (item->limit_clause && !(*row_limit))
+  {
+    item->result_finalized= true;
     return 1;
-
-  if (item->no_appended)
-    item->no_appended= FALSE;
-  else
-    result->append(*item->separator);
+  }
 
   tmp.length(0);
 
   if (item->limit_clause && (*offset_limit))
   {
     item->row_count++;
-    item->no_appended= TRUE;
     (*offset_limit)--;
     return 0;
   }
 
+  if (!item->result_finalized)
+    item->result_finalized= true;
+  else
+    result->append(*item->separator);
+
+
   for (; arg < arg_end; arg++)
   {
     String *res;
@@ -3904,7 +3907,7 @@ void Item_func_group_concat::clear()
   result.copy();
   null_value= TRUE;
   warning_for_row= FALSE;
-  no_appended= TRUE;
+  result_finalized= false;
   if (offset_limit)
     copy_offset_limit= offset_limit->val_int();
   if (row_limit)
@@ -4040,12 +4043,10 @@ bool Item_func_group_concat::add(bool exclude_nulls)
     tree_len+= row_str_len;
   }
   /*
-    If the row is not a duplicate (el->count == 1)
-    we can dump the row here in case of GROUP_CONCAT(DISTINCT...)
-    instead of doing tree traverse later.
+    In case of GROUP_CONCAT with DISTINCT or ORDER BY (or both) don't dump the
+    row to the output buffer here. That will be done in val_str.
   */
-  if (row_eligible && !warning_for_row &&
-      (!tree || (el->count == 1 && distinct && !arg_count_order)))
+  if (row_eligible && !warning_for_row && (!tree && !distinct))
     dump_leaf_key(table->record[0] + table->s->null_bytes, 1, this);
 
   return 0;
@@ -4278,9 +4279,18 @@ String* Item_func_group_concat::val_str(String* str)
   DBUG_ASSERT(fixed == 1);
   if (null_value)
     return 0;
-  if (no_appended && tree)
-    /* Tree is used for sorting as in ORDER BY */
-    tree_walk(tree, &dump_leaf_key, this, left_root_right);
+
+  if (!result_finalized) // Result yet to be written.
+  {
+    if (tree != NULL) // order by
+      tree_walk(tree, &dump_leaf_key, this, left_root_right);
+    else if (distinct) // distinct (and no order by).
+      unique_filter->walk(table, &dump_leaf_key, this);
+    else if (row_limit && copy_row_limit == (ulonglong)row_limit->val_int())
+      return &result;
+    else
+      DBUG_ASSERT(false); // Can't happen
+  }
 
   if (table && table->blob_storage &&
       table->blob_storage->is_truncated_value())
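
The control flow that the reworked dump_leaf_key follows for LIMIT handling and separators can be approximated in isolation. The following is a rough sketch, not the server implementation: `ConcatState`, `dump_value` and `concat_distinct_limit` are hypothetical names, a `std::set` stands in for the merged unique filter, and the point at which the row limit is decremented is an assumption, since that part of the function is outside the hunks shown here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified state mirroring the fields the diff touches.
struct ConcatState {
  std::string result;
  std::string separator = ",";
  bool limit_clause = false;
  unsigned long long offset_limit = 0;  // rows still to skip
  unsigned long long row_limit = 0;     // rows still allowed in the result
  bool result_finalized = false;        // set once output has started or ended
};

// Returns 1 to stop the walk and 0 to continue, following the convention
// used by dump_leaf_key in the diff.
int dump_value(ConcatState &st, const std::string &value) {
  // Row limit exhausted: nothing more may be appended.
  if (st.limit_clause && st.row_limit == 0) {
    st.result_finalized = true;
    return 1;
  }
  // Still inside the offset: skip the row without touching the result.
  if (st.limit_clause && st.offset_limit > 0) {
    --st.offset_limit;
    return 0;
  }
  // The first value actually written gets no separator in front of it.
  if (!st.result_finalized)
    st.result_finalized = true;
  else
    st.result += st.separator;
  st.result += value;
  if (st.limit_clause) {
    assert(st.row_limit > 0);  // guaranteed by the first check above
    --st.row_limit;            // assumption: decremented after appending
  }
  return 0;
}

// Models GROUP_CONCAT(DISTINCT c LIMIT offset,count) for a single group.
std::string concat_distinct_limit(const std::vector<std::string> &rows,
                                  unsigned long long offset,
                                  unsigned long long count) {
  std::set<std::string> unique(rows.begin(), rows.end());  // sorted, deduped
  ConcatState st;
  st.limit_clause = true;
  st.offset_limit = offset;
  st.row_limit = count;
  for (const auto &value : unique)
    if (dump_value(st, value)) break;
  return st.result;
}
```

Because the separator decision is made only after the offset check, skipped rows never contribute a stray separator. In this model, an offset larger than the number of distinct values leaves the result empty, which is the situation the new `limit 10,1` test case exercises.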

sql/item_sum.h

Lines changed: 2 additions & 1 deletion
@@ -1887,7 +1887,8 @@ class Item_func_group_concat : public Item_sum
   bool warning_for_row;
   bool always_null;
   bool force_copy_fields;
-  bool no_appended;
+  /** True if entire result of GROUP_CONCAT has been written to output buffer. */
+  bool result_finalized;
   /** Limits the rows in the result */
   Item *row_limit;
   /** Skips a particular number of rows in from the result*/
