complex SQL string parsing

Question

I have the following text field in a SQL Server table:

1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0

1. I would like to retrieve only the part before the exclamation mark (!). So for 1!1 I only want 1, for 3!0 I only want 3, for 23!0 I only want 23.
2. I would also like to retrieve only the part after the exclamation mark (!). So for 1!1 I only want 1, for 3!0 I only want 0, for 23!0 I only want 0.

The results of both point 1 and point 2 should be inserted into separate columns of a SQL Server table.

You shouldn't be storing delimited values in a single column in the first place. – user330315, Commented Jan 10, 2013 at 15:14
Is that entire string a single record, or is 1!1 a record, 3!0 another record, and so on? – EmmyS, Commented Jan 10, 2013 at 15:15
I have a question: Do people also use the wrong end of the hammer to hit the nails and wonder why it is inefficient? Or is it just the DB topic that brings out this phenomenon? – ppeterka, Commented Jan 10, 2013 at 15:17

BStateham · Accepted Answer · 2013-01-10 15:46:48Z

I LOVE SQL Server's XML capabilities. It is a great way to parse data. Try this one out:

    --Load the original string
    DECLARE @string nvarchar(max) = '1!2,3!4,5!6,7!8,9!10';

    --Turn it into XML
    SET @string = REPLACE(@string,',','</SecondNumber></Pair><Pair><FirstNumber>') + '</SecondNumber></Pair>';
    SET @string = '<Pair><FirstNumber>' + REPLACE(@string,'!','</FirstNumber><SecondNumber>');

    --Show the new version of the string
    SELECT @string AS XmlIfiedString;

    --Load it into an XML variable
    DECLARE @xml XML = @string;

    --Now, First and Second Number from each pair...
    SELECT
      Pairs.Pair.value('FirstNumber[1]','nvarchar(1024)') AS FirstNumber,
      Pairs.Pair.value('SecondNumber[1]','nvarchar(1024)') AS SecondNumber
    FROM @xml.nodes('//*:Pair') Pairs(Pair);

The above query turned the string into XML like this:

<Pair><FirstNumber>1</FirstNumber><SecondNumber>2</SecondNumber></Pair> ...

Then parsed it to return a result like:

    FirstNumber | SecondNumber
    ----------- | ------------
    1           | 2
    3           | 4
    5           | 6
    7           | 8
    9           | 10

MarkD · Accepted Answer · 2013-01-10 15:36:41Z

I completely agree with the guys complaining about this sort of data. The fact, however, is that we often don't have any control over the format of our sources.

Here's my approach...

First you need a tokeniser. This one is very efficient (probably the fastest non-CLR). Found at http://www.sqlservercentral.com/articles/Tally+Table/72993/

    CREATE FUNCTION [dbo].[DelimitedSplit8K]
    --===== Define I/O parameters
            (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
         -- enough to cover VARCHAR(8000)
      WITH E1(N) AS (
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                    ),                          --10E+1 or 10 rows
           E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
           E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
     cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                         -- for both a performance gain and prevention of accidental "overruns"
                     SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                    ),
    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                     SELECT 1 UNION ALL
                     SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                    ),
    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                     SELECT s.N1,
                            ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                       FROM cteStart s
                    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
     SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
            Item       = SUBSTRING(@pString, l.N1, l.L1)
       FROM cteLen l
    ;
    GO

Then you consume it like so...

    DECLARE @Wtf VARCHAR(1000) = '1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0'

    SELECT LEFT(Item, CHARINDEX('!', Item)-1)
          ,RIGHT(Item, CHARINDEX('!', REVERSE(Item))-1)
    FROM [dbo].[DelimitedSplit8K](@Wtf, ',')
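Since the question asks for the two parts to end up in separate columns of a table, the same split can feed an INSERT. This is only a minimal sketch: the table variables @Source and @Target and all of their column names are hypothetical, invented for illustration, and it assumes DelimitedSplit8K has already been created as shown above.

```sql
-- Hypothetical source and target tables (names are assumptions, not from the question).
DECLARE @Source TABLE (Id INT, PairList VARCHAR(8000));
DECLARE @Target TABLE (SourceId INT, BeforeBang VARCHAR(8000), AfterBang VARCHAR(8000));

INSERT INTO @Source (Id, PairList)
VALUES (1, '1!1,3!0,23!0'),
       (2, '2!1,7!2');

-- Split each row's list on ',' and then split each item on '!'.
-- NULLIF turns a missing '!' into NULL, so LEFT/SUBSTRING yield NULL
-- for a malformed item instead of raising an invalid-length error.
INSERT INTO @Target (SourceId, BeforeBang, AfterBang)
SELECT s.Id
      ,LEFT(d.Item, NULLIF(CHARINDEX('!', d.Item), 0) - 1)
      ,SUBSTRING(d.Item, NULLIF(CHARINDEX('!', d.Item), 0) + 1, 8000)
FROM @Source AS s
CROSS APPLY [dbo].[DelimitedSplit8K](s.PairList, ',') AS d;

SELECT SourceId, BeforeBang, AfterBang FROM @Target;
```

CROSS APPLY runs the splitter once per source row, so each parsed pair keeps the Id of the row it came from.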

The function posted and the parsing logic can, of course, be integrated into a single function.
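One way this might look is a thin inline wrapper around the splitter rather than a full merge of the two. The name SplitPairs8K and its output column names are assumptions made up for this sketch, and it relies on DelimitedSplit8K already existing.

```sql
-- Hypothetical wrapper; the name SplitPairs8K and its column names are assumptions.
-- It relies on dbo.DelimitedSplit8K (defined above) already existing.
CREATE FUNCTION [dbo].[SplitPairs8K]
        (@pString VARCHAR(8000))
RETURNS TABLE AS
 RETURN
 SELECT s.ItemNumber
       -- Everything before the '!' (NULL if the item has no '!')
       ,FirstPart  = LEFT(s.Item, NULLIF(CHARINDEX('!', s.Item), 0) - 1)
       -- Everything after the '!' (NULL if the item has no '!')
       ,SecondPart = SUBSTRING(s.Item, NULLIF(CHARINDEX('!', s.Item), 0) + 1, 8000)
   FROM [dbo].[DelimitedSplit8K](@pString, ',') s
;
GO

-- Usage
SELECT FirstPart, SecondPart
FROM [dbo].[SplitPairs8K]('1!1,3!0,23!0');
```

Keeping it as an inline table-valued function lets the optimizer expand it into the calling query, in the same way as the splitter it wraps.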

EricZ · Accepted Answer · 2013-01-10 15:42:34Z

I agree that normalizing the data is the best way. However, here is an XML solution to parse the data:

    DECLARE @str VARCHAR(1000) = '1!1,3!0,23!0,288!0,340!0,521!0,24!0,38!0,26!0,27!0,281!0,19!0,470!0,568!0,601!0,2!1,251!0,7!2,140!0,285!0,11!2,33!0'
           ,@xml XML
    SET @xml = CAST('<row><col>' + REPLACE(REPLACE(@str,'!','</col><col>'),',','</col></row><row><col>') + '</col></row>' AS XML)

    SELECT line.col.value('col[1]', 'varchar(1000)') AS col1
          ,line.col.value('col[2]', 'varchar(1000)') AS col2
    FROM @xml.nodes('/row') AS line(col)
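The same idea can be applied per row when the string lives in a table column, with the result inserted into a second table. This is a rough sketch only: the table variables @Source and @Target and their columns are hypothetical, and it assumes the values never contain XML special characters such as &, < or >, which would make the CAST fail.

```sql
-- Hypothetical source and target tables (names are assumptions, not from the question).
DECLARE @Source TABLE (Id INT, PairList VARCHAR(1000));
DECLARE @Target TABLE (SourceId INT, col1 VARCHAR(1000), col2 VARCHAR(1000));

INSERT INTO @Source (Id, PairList)
VALUES (1, '1!1,3!0,23!0'),
       (2, '2!1,7!2');

-- Build one XML document per source row, then shred each <row> into two columns.
INSERT INTO @Target (SourceId, col1, col2)
SELECT x.Id
      ,line.col.value('col[1]', 'varchar(1000)')
      ,line.col.value('col[2]', 'varchar(1000)')
FROM (
      SELECT s.Id
            ,CAST('<row><col>'
                  + REPLACE(REPLACE(s.PairList,'!','</col><col>'),',','</col></row><row><col>')
                  + '</col></row>' AS XML) AS PairXml
      FROM @Source AS s
     ) AS x
CROSS APPLY x.PairXml.nodes('/row') AS line(col);

SELECT SourceId, col1, col2 FROM @Target;
```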
