The implementation language is still undecided. Documents to be parsed are likely to be a few hundred lines long, so searching them may become processor-intensive, which makes Go a good candidate. However, Python offers an array of libraries which could be helpful.
## File formats
Diagrams are received in DOCX format, but can easily be converted to ODT, DOC, or PDF, which provides flexibility in the languages and libraries used in the implementation.
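
As a rough illustration of how little is needed to read the DOCX format directly, the sketch below pulls paragraph text out of a DOCX file using only the Python standard library. A DOCX file is a ZIP archive whose main body lives in `word/document.xml`. Python is used here purely for illustration, since the language is undecided, and the function name `docx_paragraphs` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the text of each non-empty paragraph in a DOCX file.

    Hypothetical helper: raises zipfile.BadZipFile if the data is not a
    ZIP archive and KeyError if word/document.xml is missing, so the
    caller can handle a malformed attachment.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

This only recovers plain text, and table cells come back as ordinary paragraphs. If the diagrams rely on table structure, a dedicated library may be a better fit.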
The aim of diagram-parser is to simplify the addition of PIS codes that are not yet in the OwlBoard data source. The planned implementation is as follows:
- diagram-parser is subscribed to an email inbox (IMAP/POP3)
The current process for adding new codes relies on my being told about them face to face, or finding them myself, and then manually adding them to the data source.
## Points to Remember
- Emails received should be verified.
- The subject field should contain a pre-authorised key; any emails not matching the key should be discarded.
- Attachment formats may vary slightly.
- The format of the attachment should be checked and any errors handled gracefully.
- Issues opened should contain the missing PIS code in their title. This application should check for any open issues containing the missing code to avoid duplicates.
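
The email verification and duplicate-issue points above could be sketched roughly as follows. This is a minimal sketch, not a settled design. It assumes that the whole subject line (after trimming whitespace) must equal the key, and that open issue titles have already been fetched from the issue tracker as a list of strings. The function names are hypothetical, and Python is used only for illustration.

```python
import hmac
import re


def subject_is_authorised(subject: str, expected_key: str) -> bool:
    """Return True if the trimmed subject equals the pre-authorised key.

    Assumption: the subject consists of the key alone. compare_digest is
    used so the comparison time does not depend on where the strings differ.
    """
    return hmac.compare_digest(
        subject.strip().encode("utf-8"),
        expected_key.encode("utf-8"),
    )


def has_open_issue_for(code: str, open_issue_titles: list[str]) -> bool:
    """Return True if any open issue title mentions the code as a whole token.

    Matching on token boundaries avoids treating a shorter code as a
    duplicate of a longer one that merely starts with the same characters.
    """
    pattern = re.compile(rf"(?<!\w){re.escape(code)}(?!\w)", re.IGNORECASE)
    return any(pattern.search(title) for title in open_issue_titles)
```

With these helpers, an email would be discarded when `subject_is_authorised` returns `False`, and a new issue would only be opened when `has_open_issue_for` returns `False` for the missing code.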