I am trying to extract invoice which is surrounded between a boundary keyword in groovy, in the below example boudary key word is a92720f5836d4daaa4251e805cba228b and I tried extracting the invoice between the boundary and elimated the Content-Type line
String BOUNDARY = "boundary" def file = '''MIME-Version: 1.0 Date: Wed, 17 May 2017 20:59:57 +2 Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="SHA256"; boundary="a92720f5836d4daaa4251e805cba228b" --a92720f5836d4daaa4251e805cba228b Content-Type: text/plain; charset=us-ascii LEDES98BI V2[] INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY[] 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|[] 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|2|F|2.00|0|700|19990115|L510||A102|22547|Research attorney's fees, Trial pleading|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|112.00|VAT|112.00|[] 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|3|F|0.200|0|40|19990116|L510||A107|45875|Telephone conference with John Doe|24-6437381|200|Beaster, John|ASSOC|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Beaster|John|O|||||||||||||||.16|6.40|VAT|6.40|[] --a92720f5836d4daaa4251e805cba228b''' def boundaryline = file.split( '\n' ).find{it.contains( 'boundary' ) } def boundary = boundaryline.substring(boundaryline.indexOf(BOUNDARY) + BOUNDARY.length()+1).replaceAll('"','') def invoice = file.split("--"+boundary)[1] // find inovoice between boundary string def lines = invoice.trim().split('\\[]') def headerLine = lines[0].trim().split('\n') //eleminating content type from header line def header = headerLine[headerLine.length-1] lines[0] = header //assigning header to first index println lines I am getting the expected output as below
[LEDES98BI V2, INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|2|F|2.00|0|700|19990115|L510||A102|22547|Research attorney's fees, Trial pleading|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|112.00|VAT|112.00|, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|3|F|0.200|0|40|19990116|L510||A107|45875|Telephone conference with John Doe|24-6437381|200|Beaster, John|ASSOC|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Beaster|John|O|||||||||||||||.16|6.40|VAT|6.40|] My code has so many String manipulations, can it be optimized and refactored to a better version?