Go: What is the fastest/cleanest way to remove multiple entries from a slice?

Question

How would you implement the deleteRecords function in the code below:

Example: type Record struct { id int name string } type RecordList []*Record func deleteRecords( l *RecordList, ids []int ) { // Assume the RecordList can contain several 100 entries. // and the number of the of the records to be removed is about 10. // What is the fastest and cleanest ways to remove the records that match // the id specified in the records list. }

rog · Accepted Answer · 2011-02-28 08:25:22Z

I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:

func deleteRecords(data []*Record, ids []int) []*Record { w := 0 // write index loop: for _, x := range data { for _, id := range ids { if id == x.id { continue loop } } data[w] = x w++ } return data[:w] }

You didn't say whether it's important to preserve the order of records in the list. If you don't then this function is faster than the above and still fairly clean.

func reorder(data []*Record, ids []int) []*Record { n := len(data) i := 0 loop: for i < n { r := data[i] for _, id := range ids { if id == r.id { data[i] = data[n-1] n-- continue loop } } i++ } return data[0:n] }

As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.

If you wish to preserve original contents of the slice, something like this is more appropriate:

func deletePreserve(data []*Record, ids []int) []*Record { wdata := make([]*Record, len(data)) w := 0 loop: for _, x := range data { for _, id := range ids { if id == x.id { continue loop } } wdata[w] = x w++ } return wdata[0:w] }

I have done some of my own benchmarking and am confirming that your methods are very fast. I did not find much of a speedup on the reorder function over your first function. It seems that function calls are still slow (at least on windows 8g. (maybe if the compiler starts inlining this will change.)
That's good, I just wonder why Go team don't provide a safe method to remove single/multiple entries from a slice. It's obviously a usual method.
If you need to do this often, consider using the doubly-linked list provided by container/list.

Kyle C · Accepted Answer · 2011-02-16 22:41:10Z

For a personal project, I did something like this:

func filter(sl []int, fn func(int) bool) []int { result := make([]int, 0, len(sl)) last := 0 for i, v := range sl { if fn(v) { result = append(result, sl[last:i]...) last = i + 1 } } return append(result, sl[last:]...) }

It doesn't mutate the original, but should be relatively efficient. It's probably better to just do:

func filter(sl []int, fn func(int) bool) (result []int) { for _, v := range sl { if !fn(v) { result = append(result, v) } } return }

Simpler and cleaner. If you want to do it in-place, you probably want something like:

func filter(sl []int, fn func(int) bool) []int { outi := 0 res := sl for _, v := range sl { if !fn(v) { res[outi] = v outi++ } } return res[0:outi] }

You can optimize this to use copy to copy ranges of elements, but that's twice the code and probably not worth it.

So, in this specific case, I'd probably do something like:

func deleteRecords(l []*Record, ids []int) []*Record { outi := 0 L: for _, v := range l { for _, id := range ids { if v.id == id { continue L } } l[outi] = v outi++ } return l[0:outi] }

(Note: untested.)

No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.

peterSO · Accepted Answer · 2011-02-16 22:04:57Z

For the case you described, where len(ids) is approximately 10 and len(*l) is in the several hundreds, this should be relatively fast, since it minimizes memory allocations by updating in place.

package main import ( "fmt" "strconv" ) type Record struct { id int name string } type RecordList []*Record func deleteRecords(l *RecordList, ids []int) { rl := *l for i := 0; i < len(rl); i++ { rid := rl[i].id for j := 0; j < len(ids); j++ { if rid == ids[j] { copy(rl[i:len(*l)-1], rl[i+1:]) rl[len(rl)-1] = nil rl = rl[:len(rl)-1] break } } } *l = rl } func main() { l := make(RecordList, 777) for i := range l { l[i] = &Record{int(i), "name #" + strconv.Itoa(i)} } ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)} fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1]) deleteRecords(&l, ids) fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1]) }

Output:

[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776} [0 1 2 4 8 776 777] 772 777 {1 name #1} {3 name #3} {775 name #775}

Ye Lin Aung · Accepted Answer · 2015-04-26 04:05:08Z

Instead of repeatedly searching ids, you could use a map. This code preallocates the full size of the map, and then just moves array elements in place. There are no other allocations.

func deleteRecords(l *RecordList, ids []int) { m := make(map[int]bool, len(ids)) for _, id := range ids { m[id] = true } s, x := *l, 0 for _, r := range s { if !m[r.id] { s[x] = r x++ } } *l = s[0:x] }

Matt K · Accepted Answer · 2011-02-16 21:37:36Z

1

Use the vector package's Delete method as a guide, or just use a Vector instead of a slice.

edited Feb 16, 2011 at 21:37

answered Feb 16, 2011 at 21:23

Matt K

14k3 gold badges36 silver badges54 bronze badges

3 Comments

Matt K Over a year ago

Hang on, I realized that I misunderstood the question after seeing your response.

Jeroen Dirks Over a year ago

Interesting. I will wait and see what other solutions show up here and then do some benchmarking to find out if there is a big difference in the solutions.

Matt K Over a year ago

I'll bet it won't matter much since you're only copying pointers instead of the whole struct.

Matt K · Accepted Answer · 2011-02-16 21:12:43Z

0

Here is one option but I would hope there are cleaner/faster more functional looking ones:

func deleteRecords( l *RecordList, ids []int ) *RecordList { var newList RecordList for _, rec := range l { toRemove := false for _, id := range ids { if rec.id == id { toRemove = true } if !toRemove { newList = append(newList, rec) } } return newList }

edited Feb 16, 2011 at 21:12

Matt K

14k3 gold badges36 silver badges54 bronze badges

answered Feb 16, 2011 at 20:09

Jeroen Dirks

7,93913 gold badges57 silver badges71 bronze badges

6 Comments

Jesse Over a year ago

append() could allocate on every iteration of that loop.

Jeroen Dirks Over a year ago

I am assuming append will double capacity if it needs to re-allocate. I could not find it in the documentation though...

Matt K Over a year ago

Why not create newList with make([]RecordList, len(*l))?

peterSO Over a year ago

While append () could allocate on every iteration of the loop, it doesn't. The current implementation over-allocates in the runtime appendslice1 function in file src/pkg/runtime/slice.c.

newacct Over a year ago

return value does not match type

|

snakeru · Accepted Answer · 2011-02-21 15:57:52Z

With large enough l and ids it will be more effective to Sort() both lists first and then do a single loop over them instead of two nested loops

Collectives™ on Stack Overflow

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

7 Answers 7

3 Comments

Comments

Comments

Comments

3 Comments

6 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

3 Comments

Comments

Comments

Comments

3 Comments

6 Comments

Comments

Related