
Sometimes the beginning of an email is not recognised in getMessagePositions() in /fortuna/mstor/data/MboxFile.java. #27

@jfarwer


Hi,

I think there is a slight issue in getMessagePositions() in /fortuna/mstor/data/MboxFile.java.

An Mbox file is split into manageable chunks of DEFAULT_BUFFER_SIZE. For each chunk a search is done for the pattern that marks the beginning of an email: (\\A|\\n{2}|(\\r\\n){2})^From .*$

When creating the chunks, the last seven characters (FROM__PREFIX.length() + 2) from each chunk are added to the beginning of the next chunk. This is to avoid losing a From_ match due to splitting through the pattern when creating the chunks. Seven characters cater for any split within \n\nFrom_.
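
To make the overlap concrete, here is a minimal sketch of the idea. It is a simplified model, not the actual mstor code: the class name, the `window` helper and the sample data are made up for illustration, and it works on a String rather than on file buffers. It checks that a seven-character carry-over keeps \n\nFrom_ intact wherever the chunk boundary falls inside it.

```java
class ChunkOverlapSketch {
    // Mirrors the FROM__PREFIX.length() + 2 mentioned above ("From " is 5 characters).
    static final String FROM_PREFIX = "From ";
    static final int CARRY = FROM_PREFIX.length() + 2; // 7

    // Hypothetical helper: the text that gets searched for the chunk starting at
    // `split`, i.e. the last CARRY characters of the previous chunk plus the chunk itself.
    static String window(String data, int split, int chunkSize) {
        int start = Math.max(0, split - CARRY);
        int end = Math.min(data.length(), split + chunkSize);
        return data.substring(start, end);
    }

    public static void main(String[] args) {
        String data = "body of first mail\n\n"
                + "From alice@example.org Mon Jan  1 00:00:00 2024\nbody";
        int delim = data.indexOf("\n\nFrom ");
        // Try every chunk boundary that falls inside the 7-character delimiter.
        for (int split = delim + 1; split < delim + CARRY; split++) {
            String w = window(data, split, 32);
            System.out.println("split at " + split + " -> delimiter intact: "
                    + w.contains("\n\nFrom "));
        }
    }
}
```

The same arithmetic shows why the CRLF form is not fully covered: \r\n\r\nFrom_ is nine characters long, so a seven-character carry-over can lose its first one or two characters.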

However, when \r\n\r\nFrom_ is split between the 'm' and the blank, the seven characters carried over are \n\r\nFrom, so the next chunk starts with \n\r\nFrom_. The pattern (\\A|\\n{2}|(\\r\\n){2})^From .*$ doesn't match this and the beginning of the email is skipped. Maybe adding \n\r\nFrom_ to the pattern would fix it? This would also cater for the case where an Mbox file contains mixed newline styles (and possibly \n\r\nFrom_ at the beginning of an email).
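
The behaviour can be reproduced in isolation with a small sketch. It rests on some assumptions: the pattern is compiled with Pattern.MULTILINE (so that ^ and $ match at line boundaries), the class name, the `found` helper and the sample From_ line are invented for the example, and PROPOSED is only one possible way to add the extra alternative, not a tested fix for mstor.

```java
import java.util.regex.Pattern;

class FromLinePatternSketch {
    // The pattern as quoted in this report (MULTILINE is an assumption).
    static final Pattern REPORTED = Pattern.compile(
            "(\\A|\\n{2}|(\\r\\n){2})^From .*$", Pattern.MULTILINE);

    // Hypothetical variant with an extra \n\r\n alternative.
    static final Pattern PROPOSED = Pattern.compile(
            "(\\A|\\n{2}|(\\r\\n){2}|\\n\\r\\n)^From .*$", Pattern.MULTILINE);

    // True if the pattern matches anywhere in the given text.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String fromLine = "From alice@example.org Mon Jan  1 00:00:00 2024\r\nSubject: hi\r\n";
        String intact = "\r\n\r\n" + fromLine;   // delimiter fully inside one chunk
        String truncated = "\n\r\n" + fromLine;  // leading \r lost at the chunk boundary

        System.out.println("reported, intact:    " + found(REPORTED, intact));
        System.out.println("reported, truncated: " + found(REPORTED, truncated));
        System.out.println("proposed, truncated: " + found(PROPOSED, truncated));
    }
}
```

An alternative that avoids touching the pattern would be to carry over more than seven characters, enough to cover all of \r\n\r\nFrom_, but that would not help with files that mix newline styles.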
