0

I have a list of maps like this:

[ %{"000000000 000000000000 00000000 ": %{}}, %{AM01: %{"C4" => "11111111", "C5" => "1"}}, %{AM04: %{"C2" => "22222222", "C6" => "2"}} ] 

How can I reduce this list of maps into a single map like the one below?

%{ "000000000 000000000000 00000000 ": %{}, AM01: %{"C4" => "11111111", "C5" => "1"}, AM04: %{"C2" => "22222222", "C6" => "2"} } 

The code that generates this list of maps is this:

    for segment <- Enum.filter(String.split(message, ["\x02", "\x1d", "\x1e", "\x03"]), fn x -> x != "" end) do
      [head | tail] = Enum.filter(String.split(segment, "\x1c"), fn x -> x != "" end)
      %{String.to_atom(head) => Map.new(tail, &String.split_at(&1, 2))}
    end
  • Are your messages fixed width and exactly the same structure? If so, post one. Commented Jul 12, 2019 at 19:14
  • Yes. This message follows the NCPDP standard. AM01 and AM04 are segments. The map contains the information for that segment, and the first two letters of each "\x1c"-delimited group identify the type of information. The only wrinkle is that the first segment is a header, which does not contain a map, but I did not find another way to handle it. Commented Jul 12, 2019 at 19:54
  • Can you post a link to the standard? Commented Jul 12, 2019 at 20:16
  • The documentation is ridiculously large, but I can post it. In an ideal scenario the final result for the message would be: %{ HEADER: "000000000 000000000000 00000000 ", AM01: %{"C4" => "11111111", "C5" => "1"}, AM04: %{"C2" => "22222222", "C6" => "2"} } Commented Jul 12, 2019 at 20:55
  • Can you just describe the format, e.g. first two bytes is length header, next 4 bytes is... Commented Jul 12, 2019 at 20:59

5 Answers

2

Using String.split/2 for the task like this is an extremely ineffective, inelegant and non-erlangish approach. Erlang (and hence Elixir), being children of the telecom industry, are exceptionally good at solving exactly this kind of task.

Such tasks are best solved by parsing the data recursively and pattern-matching on the markers.

Since you did not post an example of real input, I cannot come up with a working example, but the approach would look like this:

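    defmodule Parse do
      @input "\x1cHHheader\x1cAAaa segment\x1cBBbbsegment"

      # End of input: store the last segment, if any marker was seen at all.
      def parse("", {{nil, _txt}, map}), do: map
      def parse("", {{typ, txt}, map}), do: Map.put(map, typ, txt)
      # First marker: there is no previous segment to store yet.
      def parse(<<"\x1c", type::binary-size(2), rest::binary>>, {{nil, _txt}, map}),
        do: parse(rest, {{type, ""}, map})
      # Later markers: store the finished segment and start a new one.
      def parse(<<"\x1c", type::binary-size(2), rest::binary>>, {{typ, txt}, map}),
        do: parse(rest, {{type, ""}, Map.put(map, typ, txt)})
      # Any other byte belongs to the current segment's text.
      def parse(<<c::binary-size(1), rest::binary>>, {{typ, txt}, map}),
        do: parse(rest, {{typ, txt <> c}, map})

      def test(input \\ @input), do: parse(input, {{nil, ""}, %{}})
    end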

And use it like:

    Parse.test
    #⇒ %{"AA" => "aa segment", "BB" => "bbsegment", "HH" => "header"}

Of course, real code would be more complicated, since you would need to pattern-match in many different clauses, but I bet the idea is clear.

NB I did not test this code, but it should work out of the box.

Note that this approach has another advantage over String.split/2: it can work with infinite streams.
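To make the streaming point concrete, here is a minimal sketch of the same head-matching idea applied to input that arrives in chunks. The module name `ChunkedParse`, its functions `new/0`, `feed/2` and `finish/1`, and the sample chunks are hypothetical and not part of the answer. It assumes the same toy format as above: a "\x1c" marker followed by a two-byte type and then free text.

```elixir
defmodule ChunkedParse do
  # State: {leftover, {current_type, current_text}, map}.
  # `leftover` holds a marker whose two-byte type has not fully arrived yet.
  def new, do: {"", {nil, ""}, %{}}

  # Feed one chunk; returns the updated state.
  def feed({leftover, current, map}, chunk), do: step(leftover <> chunk, current, map)

  # No more input: store the segment in progress (a truncated trailing marker is ignored).
  def finish({_leftover, current, map}), do: flush(current, map)

  # A marker with a complete type: store the finished segment, start a new one.
  defp step(<<"\x1c", type::binary-size(2), rest::binary>>, current, map),
    do: step(rest, {type, ""}, flush(current, map))

  # A marker whose type is still incomplete: keep it for the next chunk.
  defp step(<<"\x1c", _::binary>> = partial, current, map), do: {partial, current, map}

  # Any other byte belongs to the current segment's text.
  defp step(<<c::binary-size(1), rest::binary>>, {typ, txt}, map),
    do: step(rest, {typ, txt <> c}, map)

  defp step("", current, map), do: {"", current, map}

  defp flush({nil, _txt}, map), do: map
  defp flush({typ, txt}, map), do: Map.put(map, typ, txt)
end

# The chunks deliberately cut a marker and a type in half.
chunks = ["\x1cHHhea", "der\x1c", "AAaa segment\x1cB", "Bbbsegment"]

result =
  chunks
  |> Enum.reduce(ChunkedParse.new(), &ChunkedParse.feed(&2, &1))
  |> ChunkedParse.finish()
```

Because `Enum.reduce/3` accepts any enumerable, `chunks` could just as well be a lazy stream. Only the current segment and at most three bytes of leftover are held between chunks.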


1

in an ideal scenario the final result of the message would be:

%{ HEADER: "000000000 000000000000 00000000 ", AM01: %{"C4" => "11111111", "C5" => "1"}, AM04: %{"C2" => "22222222", "C6" => "2"} } 

Here you go:

    message = "\x02\x1d0000 0000 \x1dAM01\x1cC41111\x1c\x1c\x1cC51\x1eAM04\x1cC22222\x1cC62\x1e\x03"

    [header | segments] = String.split(message, ["\x02", "\x1d", "\x1e", "\x03"], trim: true)

    for segment <- segments, into: %{HEADER: header} do
      [head | tail] = String.split(segment, "\x1c", trim: true)
      {String.to_atom(head), Map.new(tail, &String.split_at(&1, 2))}
    end

output:

%{ AM01: %{"C4" => "1111", "C5" => "1"}, AM04: %{"C2" => "2222", "C6" => "2"}, HEADER: "0000 0000 " } 

By the way, that Map.new() bit is a neat trick.
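For readers unfamiliar with that idiom, a small sketch of why it works: `String.split_at/2` returns a two-element tuple, and `Map.new/2` treats each tuple returned by its transform function as a key/value pair. The variable names `fields` and `segment_map` are only illustrative, and the sample values are taken from the message above.

```elixir
# Fields of one segment, as they look after splitting on "\x1c".
fields = ["C41111", "C51"]

# split_at/2 cuts after the first two characters and returns {key, value}.
{"C4", "1111"} = String.split_at("C41111", 2)

# Map.new/2 applies the function to every field and collects the pairs.
segment_map = Map.new(fields, &String.split_at(&1, 2))
# segment_map is %{"C4" => "1111", "C5" => "1"}
```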


1

You can do it directly like this:

    message = "\x02\x1d0000 0000 \x1dAM01\x1cC41111\x1c\x1c\x1cC51\x1eAM04\x1cC22222\x1cC62\x1e\x03"

    for segment <- String.split(message, ["\x02", "\x1d", "\x1e", "\x03"], trim: true), into: %{} do
      [head | tail] = String.split(segment, "\x1c", trim: true)
      {String.to_atom(head), Map.new(tail, &String.split_at(&1, 2))}
    end

output:

%{ "0000 0000 ": %{}, AM01: %{"C4" => "1111", "C5" => "1"}, AM04: %{"C2" => "2222", "C6" => "2"} } 


0
    for map <- maps, into: %{} do
      [key] = Map.keys(map)
      {key, map[key]}
    end
    # => %{ "000000000 000000000000 00000000 ": %{}, AM01: %{"C4" => "11111111", "C5" => "1"}, AM04: %{"C2" => "22222222", "C6" => "2"} }
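As a possible alternative, the same flattening can be sketched as a fold with `Map.merge/2`. This is not part of the answer above. It assumes `maps` is bound to the list from the question, and unlike the comprehension it does not require every map to hold exactly one key. If two maps shared a key, the later one would win.

```elixir
maps = [
  %{"000000000 000000000000 00000000 ": %{}},
  %{AM01: %{"C4" => "11111111", "C5" => "1"}},
  %{AM04: %{"C2" => "22222222", "C6" => "2"}}
]

# Merge each map into the accumulator, starting from an empty map.
merged = Enum.reduce(maps, %{}, fn map, acc -> Map.merge(acc, map) end)
```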


0

Using String.split/2 for the task like this is an extremely ineffective, inelegant and non-erlangish approach.

Elixir's String.split() calls Erlang's binary:split():

    -module(my).
    -compile([export_all]).

    go() ->
        Input = <<"\x1cHHheader\x1cAAaa segment\x1cBBbbsegment">>,
        binary:split(Input, <<"\x1c">>, [global, trim_all]).

In the shell:

    7> c(my).
    my.erl:2: Warning: export_all flag enabled - all functions will be exported
    {ok,my}
    8> my:go().
    [<<"HHheader">>,<<"AAaa segment">>,<<"BBbbsegment">>]

I would deem binary:split() not only effective: a brief one-liner is also more elegant than a fairly confusing multi-clause function definition, easier to maintain, and most definitely good Erlang.
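For comparison, here is a rough Elixir sketch of the same call, followed by one extra step that turns the parts into a map. The variable names `input`, `parts` and `result` are illustrative, and the last step assumes every part starts with a two-character type, as in the sample input.

```elixir
input = "\x1cHHheader\x1cAAaa segment\x1cBBbbsegment"

# The same call as in the Erlang module above, made from Elixir.
parts = :binary.split(input, "\x1c", [:global, :trim_all])
# parts is ["HHheader", "AAaa segment", "BBbbsegment"]

# Cut each part into its two-character type and the remaining text.
result = Map.new(parts, &String.split_at(&1, 2))
# result is %{"AA" => "aa segment", "BB" => "bbsegment", "HH" => "header"}
```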

2 Comments

Besides being very error-prone (it can silently produce false negatives instead of blowing up), the approach with String.split/2 requires two passes over the input and creates a redundant intermediate list for no reason. I am not sure what exactly confuses you about multi-clause functions, but pattern-matching on the head of the string also supports stream processing, meaning the whole string does not have to be loaded into memory. In some cases that would rule out String.split/2 entirely.
It might not be clear from an example with a single control sequence, but when there are dozens, possibly dependent on previous ones, adding a new control sequence to a multi-clause function is as simple as adding a new clause. With String.split/2, the whole code has to be refactored.
