Description
Hi,
I think there is a slight issue in getMessagePositions() in /fortuna/mstor/data/MboxFile.java.
An Mbox file is split into manageable chunks of DEFAULT_BUFFER_SIZE. Each chunk is searched for the pattern that marks the beginning of an email: (\\A|\\n{2}|(\\r\\n){2})^From .*$.
When creating the chunks, the last seven characters (FROM__PREFIX.length() + 2) from each chunk are added to the beginning of the next chunk. This is to avoid losing a From_ match due to splitting through the pattern when creating the chunks. Seven characters cater for any split within \n\nFrom_.
However, when \r\n\r\nFrom_ is split between the 'm' and the space, the last seven characters of the first chunk are \n\r\nFrom, so the next chunk begins with \n\r\nFrom_. The pattern (\\A|\\n{2}|(\\r\\n){2})^From .*$ doesn't match this, and the beginning of the email is skipped. Maybe adding \n\r\nFrom_ to the pattern would fix it? This would also cater for the case where an Mbox file contains mixed newline styles (and possibly \n\r\nFrom_ at the beginning of an email).
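Here is a minimal sketch that reproduces the problem outside of mstor. It is not the actual MboxFile code: the class name, the secondChunk() helper and the PROPOSED pattern are made up for illustration, and I am assuming that the pattern is compiled with Pattern.MULTILINE and that FROM__PREFIX is "From " (five characters, which gives the seven-character overlap).

```java
import java.util.regex.Pattern;

public class FromLineSplitDemo {
    // Pattern as quoted in this report. MULTILINE is an assumption, needed so
    // that ^ and $ match at line boundaries inside a chunk.
    static final Pattern CURRENT =
            Pattern.compile("(\\A|\\n{2}|(\\r\\n){2})^From .*$", Pattern.MULTILINE);

    // Hypothetical fix: additionally accept the leftover \n\r\n prefix.
    static final Pattern PROPOSED =
            Pattern.compile("(\\A|\\n{2}|(\\r\\n){2}|\\n\\r\\n)^From .*$", Pattern.MULTILINE);

    // Assumed to mirror FROM__PREFIX.length() + 2, i.e. 7.
    static final int OVERLAP = "From ".length() + 2;

    // Simulates the chunking described above: the data is cut at splitAt and
    // the last OVERLAP characters of the first chunk are prepended to the
    // second chunk.
    static String secondChunk(String data, int splitAt) {
        String first = data.substring(0, splitAt);
        String carry = first.substring(Math.max(0, first.length() - OVERLAP));
        return carry + data.substring(splitAt);
    }

    public static void main(String[] args) {
        String mbox = "body of previous message\r\n\r\n"
                + "From alice@example.com Mon Jan  1 00:00:00 2024\r\n"
                + "Subject: hi\r\n";

        // Cut between the 'm' of "From" and the following space.
        int splitAt = mbox.indexOf("From ") + 4;
        String chunk = secondChunk(mbox, splitAt);

        // The chunk now starts with \n\r\nFrom_.
        System.out.println(CURRENT.matcher(chunk).find());  // prints false
        System.out.println(PROPOSED.matcher(chunk).find()); // prints true
    }
}
```

The first chunk ends in "From" without the trailing space, so it cannot match there either, which means the message start is missed by both chunks with the current pattern.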