I am writing a regex to find the rows of a text file that contain Unicode characters outside a given set of blocks:

    !Regex.IsMatch(colCount.line, @"^[\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]+$")

Below is the full code I have written:

    var _fileName = @"C:\text.txt";
    BadLinesLst = File
        .ReadLines(_fileName, Encoding.UTF8)
        .Select((line, index) =>
        {
            var count = line.Count(c => Delimiter == c) + 1;
            if (NumberOfColumns < 0)
                NumberOfColumns = count;
            return new { line = line, count = count, index = index };
        })
        .Where(colCount => colCount.count != NumberOfColumns ||
               (Regex.IsMatch(colCount.line, @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]")))
        .Select(colCount => colCount.line).ToList();

The file contains the following rows:

264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000
6420œ50-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000
4804¥75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000

If a row of the file contains any character outside BasicLatin, LatinExtended-A, or LatinExtended-B, I need to get that row. The regex above is not working properly: it also returns rows that contain only LatinExtended-A or LatinExtended-B characters.
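
One way to narrow this down is to print the code point of every character the regex rejects. The sketch below is only a diagnostic aid, not a fix: it reuses the character class from the question on the three sample rows, and the names BlockCheck and FindOutside are made up for illustration. Note that œ is U+0153 (Latin Extended-A), whereas ¥ is U+00A5, which belongs to the Latin-1 Supplement block and is therefore outside the three listed blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original code.
public static class BlockCheck
{
    // Same character class as in the question: matches one character
    // that is outside BasicLatin, LatinExtended-A and LatinExtended-B.
    static readonly Regex OutsideBlocks = new Regex(
        @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]");

    // Returns the code points (formatted as U+XXXX) of all rejected characters.
    public static List<string> FindOutside(string row)
    {
        var found = new List<string>();
        foreach (Match m in OutsideBlocks.Matches(row))
        {
            found.Add("U+" + ((int)m.Value[0]).ToString("X4"));
        }
        return found;
    }

    public static void Main()
    {
        string[] rows =
        {
            "264162-03,66,JITK,2007,12,874.000 ,0.000 ,0.000",
            "6420œ50-00,67,JITK,2007,12,2292.000 ,0.000 ,0.000",
            "4804¥75-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000"
        };

        foreach (var row in rows)
        {
            var found = FindOutside(row);
            string status = found.Count == 0
                ? "ok"
                : "flagged (" + string.Join(", ", found) + ")";
            Console.WriteLine(status + ": " + row);
        }
    }
}
```

If a row is flagged with a code point you did not expect, that usually points at the data or at how the file was decoded rather than at the regex itself.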

What are NumberOfColumns and Delimiter?

I tested with the lines "480Œ475-00,67,JITK,2007,12,1810.000 ,0.000 ,0.000", "фыв,ыыыы" and "aaa", and got the result: "фыв,ыыыы" and "aaa". Isn't that expected?

… true to the StreamReader; if not, you should always be aware that your default code page will be used with Encoding.Default.
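
The encoding remark can be illustrated with a small sketch. It assumes, purely for the sake of the example, that the file was saved in a single-byte code page such as Windows-1252 (where œ is the byte 0x9C) and is then decoded as UTF-8, as the question's code does. The class and method names are hypothetical, and the byte sequence is made up to mimic the start of the second sample row.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical demo; assumes the file is NOT actually UTF-8.
public static class EncodingCheck
{
    // Same character class as in the question.
    public static bool HasOutsideChar(string s)
    {
        return Regex.IsMatch(
            s, @"[^\p{IsBasicLatin}\p{IsLatinExtended-A}\p{IsLatinExtended-B}]");
    }

    // Decodes raw bytes the same way File.ReadLines(path, Encoding.UTF8) would.
    public static string DecodeAsUtf8(byte[] raw)
    {
        return Encoding.UTF8.GetString(raw);
    }

    public static void Main()
    {
        // "6420", then 0x9C (œ in Windows-1252), then "50".
        byte[] raw = { 0x36, 0x34, 0x32, 0x30, 0x9C, 0x35, 0x30 };

        string decoded = DecodeAsUtf8(raw);

        // A lone 0x9C byte is not valid UTF-8, so the decoder substitutes
        // U+FFFD (the replacement character), which is outside all three blocks.
        Console.WriteLine("contains U+FFFD: " + (decoded.IndexOf('\uFFFD') >= 0));
        Console.WriteLine("flagged by regex: " + HasOutsideChar(decoded));
    }
}
```

If this is what is happening, the row is flagged because of the replacement character and not because of œ, so checking the file's real encoding would be the first thing to try.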