I am having trouble defining a split condition for a string. The condition is used to map the string to a dictionary.

In a previous post I had a similar issue. The solution given there may work for that specific case, but it is not a robust solution.

It would not work for a text like this:

da:,en:H a full-bodied, vinous wine, which attracts wine connoisseurs with its well-balanced and lively bouquet. It combines crisp with a rich taste and long-lasting finish. Wines from one single vintage form the basis of this exceptional wine. som – sparkling since 1856,fr:,nl:,ru: 

The text under a language tag contains commas, so splitting on commas first and then on : no longer works.

Any suggestions for a more robust solution? My intention is to map the string so that I can retrieve the text for a given language tag.

  • I am not sure I understand why you have a dictionary with all the language codes. How would you use that to split? How would I pass a list of strings as the split condition? Commented Jun 27, 2018 at 6:58
  • No, it will be a two-letter acronym. Commented Jun 27, 2018 at 7:28
  • You need to store the language acronyms because that is what you are looking for. At first your string was a simple string with nothing special. Now it contains commas, and that is an issue for the old solution. How long until it contains ":"? Will your string never contain something like "en:from:UK with the reference Fen:13245", and do you have to handle that? The issue is that you provide only a one-line example with no clear edge-case scenario, so I gave you a solution off the top of my head. Commented Jun 27, 2018 at 7:35
  • You were right. It just caused an error due to a : in the middle of the string. Commented Jun 27, 2018 at 12:00
  • There is no bulletproof method; that is why we need to define what the edge cases are and which ones we will handle, especially for free-form strings like this. I would consider using JSON, CSV, or anything else that can serialize the data into a bulletproof format. Commented Jun 27, 2018 at 12:14

1 Answer

I suggest using regular expressions, provided that

  1. The language is a two-letter lowercase word followed by a colon: ru:, en:
  2. A comma , is the separator: en: bla-bla-bla,ru: bla-bla-bla

You can write:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    ...

    string source = @"da:,en: H a full - bodied, vinous wine, which attracts wine connoisseurs with its well-balanced and lively bouquet.It combines crisp with a rich taste and long-lasting finish.Wines from one single vintage form the basis of this exceptional wine. som – sparkling since 1856,fr:,nl:,ru:";

    // Each match is a two-letter tag plus the text up to the next ",xx:" tag or the end of the string
    Dictionary<string, string> result = Regex
      .Matches(source, @"(?<lang>[a-z]{2}:)(?<value>.*?)(?=\,[a-z]{2}:|$)")
      .OfType<Match>()
      .ToDictionary(match => match.Groups["lang"].Value.TrimEnd(':'),
                    match => match.Groups["value"].Value);

    Console.WriteLine(string.Join(Environment.NewLine, result
      .Select(pair => $"language: {pair.Key}; text: {pair.Value}")));

Outcome:

    language: da; text:
    language: en; text: H a full - bodied, vinous wine, which attracts wine connoisseurs with its well-balanced and lively bouquet.It combines crisp with a rich taste and long-lasting finish.Wines from one single vintage form the basis of this exceptional wine. som – sparkling since 1856
    language: fr; text:
    language: nl; text:
    language: ru; text:
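
If the set of language codes is known in advance, a stricter variant of the same idea may be worth considering. The sketch below is not part of the answer above: the class name `LanguageText`, the method `Parse`, and the `KnownCodes` list are hypothetical names chosen for illustration. It assumes that a tag only counts when it appears at the very start of the string or directly after a comma, and that each code occurs at most once (a repeated code would make `ToDictionary` throw an `ArgumentException`). Text that itself contains a comma immediately followed by a known code and a colon would still be split incorrectly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LanguageText
{
    // Hypothetical whitelist: only these codes are treated as language tags.
    private static readonly string[] KnownCodes = { "da", "en", "fr", "nl", "ru" };

    public static Dictionary<string, string> Parse(string source)
    {
        string codes = string.Join("|", KnownCodes);

        // A tag must sit at the start of the string or right after a comma.
        // The value runs lazily up to the next ",<known code>:" or the end.
        string pattern = $@"(?:^|,)(?<lang>{codes}):(?<value>.*?)(?=,(?:{codes}):|$)";

        return Regex.Matches(source, pattern, RegexOptions.Singleline)
            .Cast<Match>()
            .ToDictionary(
                m => m.Groups["lang"].Value,          // key without the colon, e.g. "en"
                m => m.Groups["value"].Value.Trim()); // text without surrounding whitespace
    }
}
```

A lookup by language tag could then be written as `LanguageText.Parse(source).TryGetValue("en", out string text)`. Because a tag has to follow a comma or the start of the string, an input such as `en:from:UK with the reference Fen:13245` stays a single `en` entry.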

2 Comments

Is the key here en: or en?
@famle: the key is en.
