I'm new to PySpark and I want to do some column transforms.
My dataframe:
import pandas as pd
df = pd.DataFrame([[10, 8, 9],
                   [3, 5, 4],
                   [1, 3, 9],
                   [1, 5, 3],
                   [2, 8, 10],
                   [8, 7, 9]], columns=list('ABC'))

df:

    A  B   C
0  10  8   9
1   3  5   4
2   1  3   9
3   1  5   3
4   2  8  10
5   8  7   9

In df, each row is a triangle, and the columns 'A', 'B', 'C' hold the vertex indices of that triangle.
I want to get the dataframe of all the triangles' edges.
Under conditions:
- For each edge, always put the lesser vertex index first.
- Remove duplicate edges.
- Edge [8, 9] and edge [9, 8] are the same edge; only [8, 9] remains (always lesser vertex index first).
My desired dataframe edge_df:
 1   3
 1   5
 1   9
 2   8
 2  10
 3   4
 3   5
 3   9
 4   5
 7   8
 7   9
 8   9
 8  10
 9  10

What I tried: concatenate the column pairs 'AB', 'AC', 'BA', 'BC', 'CA', 'CB', call distinct(), then drop() the rows that have the lesser vertex index in the right column.
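For reference, here is a minimal sketch of that idea in plain pandas/NumPy (the column names 'u' and 'v' are my own choice, not from the question): instead of generating all six orderings and filtering, sort each edge pair so the lesser vertex comes first, stack the three edges of every triangle, and drop duplicates.

```python
import numpy as np
import pandas as pd

# Triangle data from the question: each row is a triangle,
# columns 'A', 'B', 'C' are its vertex indices.
df = pd.DataFrame([[10, 8, 9], [3, 5, 4], [1, 3, 9],
                   [1, 5, 3], [2, 8, 10], [8, 7, 9]], columns=list('ABC'))

# Stack the three edges of every triangle (AB, AC, BC) into one (n*3, 2) array.
edges = np.vstack([df[['A', 'B']].to_numpy(),
                   df[['A', 'C']].to_numpy(),
                   df[['B', 'C']].to_numpy()])

# Put the lesser vertex index first in each row, then drop duplicate edges.
edges.sort(axis=1)
edge_df = (pd.DataFrame(edges, columns=['u', 'v'])
             .drop_duplicates()
             .sort_values(['u', 'v'])
             .reset_index(drop=True))
print(edge_df)  # 14 unique edges, lesser vertex first
```

The same shape should carry over to PySpark: sorting each pair up front (e.g. with least/greatest on the two columns) avoids building the reversed 'BA', 'CA', 'CB' pairs at all.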
Is there a more effective way?