
I am trying to extract an invoice that is enclosed between boundary markers in Groovy. In the example below, the boundary keyword is `a92720f5836d4daaa4251e805cba228b`. I extract the invoice between the boundaries and eliminate the `Content-Type` line:

```groovy
String BOUNDARY = "boundary"
def file = '''MIME-Version: 1.0
Date: Wed, 17 May 2017 20:59:57 +2
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="SHA256"; boundary="a92720f5836d4daaa4251e805cba228b"

--a92720f5836d4daaa4251e805cba228b
Content-Type: text/plain; charset=us-ascii

LEDES98BI V2[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|2|F|2.00|0|700|19990115|L510||A102|22547|Research attorney's fees, Trial pleading|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|112.00|VAT|112.00|[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|3|F|0.200|0|40|19990116|L510||A107|45875|Telephone conference with John Doe|24-6437381|200|Beaster, John|ASSOC|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Beaster|John|O|||||||||||||||.16|6.40|VAT|6.40|[]
--a92720f5836d4daaa4251e805cba228b'''

def boundaryline = file.split('\n').find { it.contains('boundary') }
def boundary = boundaryline.substring(boundaryline.indexOf(BOUNDARY) + BOUNDARY.length() + 1).replaceAll('"', '')
def invoice = file.split("--" + boundary)[1]   // find invoice between boundary strings
def lines = invoice.trim().split('\\[]')
def headerLine = lines[0].trim().split('\n')    // eliminate Content-Type from the header line
def header = headerLine[headerLine.length - 1]
lines[0] = header                               // assign header to first index
println lines
```

I get the expected output:

```
[LEDES98BI V2, INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|2|F|2.00|0|700|19990115|L510||A102|22547|Research attorney's fees, Trial pleading|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|112.00|VAT|112.00|, 19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|3|F|0.200|0|40|19990116|L510||A107|45875|Telephone conference with John Doe|24-6437381|200|Beaster, John|ASSOC|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Beaster|John|O|||||||||||||||.16|6.40|VAT|6.40|]
```

My code does a lot of string manipulation. Can it be optimized and refactored into a better version?

1 Answer

Optimization for performance is not the main concern I have with this code. Rather, the problem is that extracting an attachment with ad hoc string manipulation is a fragile hack. This is a common task and a solved problem, for which you should not reinvent the wheel — poorly. A library — namely JavaMail — would take into account the relevant standards and could do the job properly, even if the input varies a bit.

```groovy
import java.io.ByteArrayInputStream
import javax.mail.Multipart
import javax.mail.internet.MimeMessage

def file = '''MIME-Version: 1.0
Date: Wed, 17 May 2017 20:59:57 +2
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="SHA256"; boundary="a92720f5836d4daaa4251e805cba228b"

--a92720f5836d4daaa4251e805cba228b
Content-Type: text/plain; charset=us-ascii

LEDES98BI V2[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID|LAW_FIRM_MATTER_ID|INVOICE_TOTAL|BILLING_START_DATE|BILLING_END_DATE|INVOICE_DESCRIPTION|LINE_ITEM_NUMBER|EXP/FEE/INV_ADJ_TYPE|LINE_ITEM_NUMBER_OF_UNITS|LINE_ITEM_ADJUSTMENT_AMOUNT|LINE_ITEM_TOTAL|LINE_ITEM_DATE|LINE_ITEM_TASK_CODE|LINE_ITEM_EXPENSE_CODE|LINE_ITEM_ACTIVITY_CODE|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION|LAW_FIRM_ID|LINE_ITEM_UNIT_COST|TIMEKEEPER_NAME|TIMEKEEPER_CLASSIFICATION|CLIENT_MATTER_ID|PO_NUMBER|CLIENT_TAX_ID|MATTER_NAME|INVOICE_TAX_TOTAL|INVOICE_NET_TOTAL|INVOICE_CURRENCY|TIMEKEEPER_LAST_NAME|TIMEKEEPER_FIRST_NAME|ACCOUNT_TYPE|LAW_FIRM_NAME|LAW_FIRM_ADDRESS_1|LAW_FIRM_ADDRESS_2|LAW_FIRM_CITY|LAW_FIRM_STATEorREGION|LAW_FIRM_POSTCODE|LAW_FIRM_COUNTRY|CLIENT_NAME|CLIENT_ADDRESS_1|CLIENT_ADDRESS_2|CLIENT_CITY|CLIENT_STATEorREGION|CLIENT_POSTCODE|CLIENT_COUNTRY|LINE_ITEM_TAX_RATE|LINE_ITEM_TAX_TOTAL|LINE_ITEM_TAX_TYPE|INVOICE_REPORTED_TAX_TOTAL|INVOICE_TAX_CURRENCY[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|1|F|2.00|-70|630|19990115|L510||A102|22547|Research Attorney's fees, Set off claim|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|22240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|100.80|VAT|100.80|[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|2|F|2.00|0|700|19990115|L510||A102|22547|Research attorney's fees, Trial pleading|24-6437381|350|Arnsley, Robert|PARTNR|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Arnsley|Robert|O|||||||||||||||.16|112.00|VAT|112.00|[]
19990225|96542|00711|0528|1684.45|19990101|19990131|For services rendered|3|F|0.200|0|40|19990116|L510||A107|45875|Telephone conference with John Doe|24-6437381|200|Beaster, John|ASSOC|423-987|77654|76-1235|Merten Merger|694.20|2240.25|GBP|Beaster|John|O|||||||||||||||.16|6.40|VAT|6.40|[]
--a92720f5836d4daaa4251e805cba228b
'''

// Parse the raw text as a MIME message; the Session argument is null here
def inputStream = new ByteArrayInputStream(file.getBytes("ASCII"))
def msg = new MimeMessage(null, inputStream)

if (msg.contentType.startsWith("multipart")) {
    Multipart mp = (Multipart)msg.content
    // The first body part is the text/plain invoice, already without its own headers
    println mp.getBodyPart(0).content
}
```

In addition, the intent of this solution is a lot more obvious than with your slicing and dicing.
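To make the "fragile hack" point concrete, here is a small Groovy sketch that reruns the question's boundary lookup, wrapped in a hypothetical `askerBoundary` closure and fed made-up one-line headers, on two spellings of the same header. RFC 2045 states that MIME parameter names are not case-sensitive, so `Boundary=` is as legitimate as `boundary=`, but the `contains('boundary')` check only recognizes the lowercase form:

```groovy
// Hypothetical reproduction of the question's boundary lookup, wrapped in a
// closure so it can be run against different header spellings.
def askerBoundary = { String msg ->
    String BOUNDARY = "boundary"
    def boundaryline = msg.split('\n').find { it.contains('boundary') }
    boundaryline.substring(boundaryline.indexOf(BOUNDARY) + BOUNDARY.length() + 1).replaceAll('"', '')
}

// Made-up headers that differ only in the case of the parameter name.
def lower = 'Content-Type: multipart/signed; boundary="abc123"\n'
def upper = 'Content-Type: multipart/signed; Boundary="abc123"\n'

println askerBoundary(lower)   // abc123

// No line contains the lowercase word, so find() returns null and
// the substring() call throws.
def failedOnUppercase = false
try {
    askerBoundary(upper)
} catch (NullPointerException e) {
    failedOnUppercase = true
}
println "Uppercase parameter name broke the lookup: $failedOnUppercase"   // true
```

Variations like this are exactly what a standards-based MIME parser is meant to absorb in one place, rather than each one needing its own string-handling patch.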

(Note that `file` needs to end with a newline, which should naturally be the case in actual usage.)
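The JavaMail snippet prints the text part as one string, whereas the question wanted a list of records. A minimal Groovy sketch of that last step splits on the `[]` that ends each LEDES line, just as the question's `split('\\[]')` did. It assumes `content` holds the string returned by `mp.getBodyPart(0).content`; here it is replaced by a shortened, hypothetical stand-in so the example runs without JavaMail:

```groovy
// Hypothetical, shortened stand-in for the String returned by
// mp.getBodyPart(0).content in the JavaMail snippet.
def content = '''LEDES98BI V2[]
INVOICE_DATE|INVOICE_NUMBER|CLIENT_ID[]
19990225|96542|00711[]
19990225|96542|00711[]
'''

// Each LEDES line ends with "[]". Split on that literal (the "[" must be
// escaped in the regex), trim the line breaks left between records, and
// drop the empty remainder after the last terminator.
def records = content.split(/\[]/)*.trim().findAll { it }

records.each { println it }
```

Because the MIME parser has already removed the part's `Content-Type` header, the first record is `LEDES98BI V2` without the header-trimming step from the question.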

  • No signature of method: `javax.mail.internet.MimeMessage.getBodyPart()` is applicable for argument types: (java.lang.Integer) values: [0]. I am getting this exception with mail-1.4.jar. – Commented May 24, 2017 at 2:09
  • Corrected in Rev 2. – Commented May 24, 2017 at 3:10
  • Thanks, it works in Groovy, but I am getting a compilation error on the line `mp.getBodyPart(0).getContent()` in Java. Can you please provide a Java version of `mp.getBodyPart(0).content` too? – Commented May 24, 2017 at 3:31
