It seems to me that the process you're currently following is this, and it fails with your out-of-memory error:

  1. Create several data files
  2. Concatenate them together
  3. Sort the result, discarding duplicate records (rows)

I think you should be able to perform the following process instead (a sketch follows the list):

  1. Create several data files
  2. Sort each one independently, discarding its duplicates (sort -u)
  3. Merge the resulting set of sorted data files, discarding duplicates (sort -m -u)
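
A minimal sketch of steps 2 and 3, assuming three hypothetical data files named part1.txt, part2.txt and part3.txt (substitute your real filenames):

```sh
# Step 2: sort each file independently, dropping its duplicate rows.
for f in part1.txt part2.txt part3.txt; do
    sort -u "$f" -o "$f.sorted"
done

# Step 3: merge the already-sorted files, again dropping duplicates.
# sort -m only merges, so it needs far less memory than a full sort.
sort -m -u part1.txt.sorted part2.txt.sorted part3.txt.sorted > merged.txt
```

Because `sort -m` assumes its inputs are already sorted, the merge step streams through the files rather than loading the combined data set, which is what should avoid the out-of-memory failure.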
