I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:
    import pandas as pd

    df = pd.DataFrame([["a","g","n1","y1"],
                       ["a","g","n2","y2"],
                       ["b","h","n1","y3"],
                       ["b","h","n2","y4"]],
                      columns=["meta1", "meta2", "name", "data"])

    >>> df
    meta1 meta2 name data
    a     g     n1   y1
    a     g     n2   y2
    b     h     n1   y3
    b     h     n2   y4

where the name column holds the names of the new columns I would like, and the data column holds the respective values.
I would like to produce a dataframe of the form:
    df = pd.DataFrame([["a","g","y1","y2"],
                       ["b","h","y3","y4"]],
                      columns=["meta1", "meta2", "n1", "n2"])

    >>> df
    meta1 meta2 n1 n2
    a     g     y1 y2
    b     h     y3 y4

The meta columns stand in for 15+ other columns that contain most of the data, and I don't think they are particularly well suited to indexing. The idea is that I currently have a lot of repeated/redundant data stored in the meta columns, and I would like to produce the more compact dataframe shown above.
I have found some similar Qs but can't pinpoint what sort of operations I need to do: pivot, re-index, stack or unstack, etc.?
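For reference, here is a minimal sketch of the kind of reshape I have in mind, using set_index followed by unstack. It assumes that every combination of the meta columns and name occurs only once (otherwise unstack raises an error about duplicate entries), and meta_cols is a hypothetical list that would hold all 15+ meta column names in the real data. The meta columns are only used as an index temporarily and are turned back into ordinary columns at the end.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"],
     ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"],
     ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# Hypothetical list of the repeated metadata columns (15+ in the real data).
meta_cols = ["meta1", "meta2"]

wide = (
    df.set_index(meta_cols + ["name"])["data"]  # temporary MultiIndex
      .unstack("name")                          # one column per distinct name
      .reset_index()                            # meta columns become regular columns again
)
wide.columns.name = None  # drop the leftover "name" label on the column axis

print(wide)
```

The original row index is discarded along the way, which is fine for my purposes.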
PS - the original index values are unimportant for my purposes.
Any help would be much appreciated.
Question I think is related:
I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes.