I am using DSpace version 6.3 here. I discovered that I have created duplicate entries when performing a batch import. Using the SQL query from this answer, I managed to list all the duplicates in a given field. For this example, I am using the dc.subject field (metadata_field_id=57) to list items (dspace_object_id) that have duplicate values in dc.subject field.
Below is the query that I used:
SELECT metadata_value_id, dspace_object_id, text_value FROM (SELECT *, COUNT(*) OVER (PARTITION BY dspace_object_id, text_value) AS cnt FROM metadatavalue where metadata_field_id=57) e WHERE cnt > 1 Below is the sample list generated from that query:
| metadata_value_id | dspace_object_id | text_value |
|---|---|---|
| 503018 | 13f07109-7797-4d5b-a8bd-1f9e91a2433d | pompanos |
| 503021 | 13f07109-7797-4d5b-a8bd-1f9e91a2433d | pompanos |
| 503217 | 233d1e67-b698-4e90-8e70-a175776b2d80 | pests |
| 503219 | 233d1e67-b698-4e90-8e70-a175776b2d80 | pests |
| 83574 | 47753988-fc2a-4416-b20d-acbff6e256de | Penaeus monodon |
| 10800 | 47753988-fc2a-4416-b20d-acbff6e256de | Penaeus monodon |
| 531520 | 50965923-bc65-4fdf-af61-2c8debdfe057 | Penaeus monodon |
| 531521 | 50965923-bc65-4fdf-af61-2c8debdfe057 | Penaeus monodon |
| 483882 | 57d0544c-1825-431a-acf9-eb835c24920b | development |
| 483879 | 57d0544c-1825-431a-acf9-eb835c24920b | development |
Based on the table above, you can see that there are 2 occurrences (others have more than 2) of a text_value in the same item (dspace_object_id).
My question is how can I delete the duplicate but retain the first occurrence? In the table above, rows with the following metadata_value_id should be deleted:
503021 503219 83574 <-- This should be deleted instead of 10800 because 10800 was put in (inserted) first. 531521 483882 <-- Should be deleted instead of 483879 because of the same reason above. I'm hoping to do this via SQL query alone because what I have done before is import this list to a Google spreadsheet, remove the duplicates there using a Remove Duplicates add-on, download it as a CSV, and then use that CSV to update the metadatavalue table.