# diagram-parser

This is an experimental project and is not yet used as part of the OwlBoard stack.

## Language

The implementation language is so far undecided. Documents for parsing are likely to be a few hundred lines long, so searching may become processor-intensive, which makes Go a good candidate; however, Python offers an array of libraries which could be helpful.

## File formats

Diagrams are received in DOCX format but can easily be converted to ODT, DOC, or PDF, which provides flexibility in the languages and libraries used in the implementation.
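
As a rough illustration of how little is needed to get at the text of a DOCX file, the sketch below reads paragraphs using only the Python standard library. It relies on the fact that a DOCX file is a ZIP archive whose main body is stored in `word/document.xml`. The function name `docx_paragraphs` is hypothetical, and Python is used purely for illustration since the implementation language is still undecided.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a DOCX file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in the main body of a DOCX file.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs
```

Text in headers, footers, and other parts of the archive is not read, so this may need extending depending on where the schedule cards place their data.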

## Aims

The aim of diagram-parser is to simplify the addition of PIS codes that are not yet in the OwlBoard data source. The planned implementation is as follows:

- diagram-parser is subscribed to an email inbox (IMAP/POP3)
- Formatted train-crew schedule cards are sent to the inbox and loaded by diagram-parser
- The list of existing PIS codes is loaded and a list of non-existent codes is compiled (0000-9999)
- If a code is found both in the diagram and on the list of non-existent codes, a Gitea issue is opened providing details of the code.
- Once the program has run and extracted only the relevant details, the email is deleted and the file is closed and not stored.
- The eventual aim is to avoid any manual searching of the files.

The current process of adding new codes involves being made aware of them face to face, or finding them myself, and then manually adding them to the data source.
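
The code comparison in the planned steps could look something like the following minimal sketch. It assumes that PIS codes appear in the extracted text as standalone four-digit numbers, which has not been confirmed against real schedule cards; the names `missing_codes` and `codes_to_report` are hypothetical.

```python
import re

# Assumption: a PIS code appears in the text as a standalone four-digit number.
CODE_PATTERN = re.compile(r"(?<!\d)\d{4}(?!\d)")


def missing_codes(existing):
    """Return every code in 0000-9999 that is not in `existing`."""
    all_codes = {f"{n:04d}" for n in range(10000)}
    return all_codes - set(existing)


def codes_to_report(diagram_text, existing):
    """Return, sorted, the codes found in the diagram that are not yet known."""
    found = set(CODE_PATTERN.findall(diagram_text))
    return sorted(found & missing_codes(existing))
```

Any other four-digit number in the text, such as a time written as `0915`, would also match, so a real implementation would likely need to use the layout of the card to narrow the search.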

## Points to Remember

- Emails received should be verified.
  - Use a pre-authorised key in the subject field; any emails not matching the key should be discarded.
- Attachment formats may vary slightly.
  - The format of the attachment should be checked and any errors handled gracefully.
- Avoid duplicate issues.
  - Issues opened should contain the missing PIS code in their title; this application should check for any open issues containing the missing code to avoid duplicates.
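
The three points above might be approached along the lines of this sketch, which uses Python's standard `email` package. Everything specific in it is an assumption: `SUBJECT_KEY` and `ALLOWED_SUFFIXES` are placeholder values, "matching the key" is interpreted as the subject being exactly equal to the key, and the list of open issue titles is assumed to have been fetched from Gitea elsewhere. All function names are hypothetical.

```python
import re
from email import message_from_bytes, policy

# Hypothetical placeholders: the real key and accepted formats are not decided.
SUBJECT_KEY = "example-key"
ALLOWED_SUFFIXES = (".docx",)


def parse_message(raw):
    """Parse raw message bytes into an EmailMessage."""
    return message_from_bytes(raw, policy=policy.default)


def is_authorised(message, key=SUBJECT_KEY):
    """Assumption: the subject must equal the key (ignoring outer whitespace)."""
    subject = str(message["Subject"] or "")
    return subject.strip() == key


def usable_attachments(message, suffixes=ALLOWED_SUFFIXES):
    """Yield (filename, payload bytes) for attachments with an accepted suffix."""
    for part in message.iter_attachments():
        name = part.get_filename() or ""
        if name.lower().endswith(suffixes):
            yield name, part.get_payload(decode=True)


def already_reported(code, open_issue_titles):
    """Return True if any open issue title already contains the code."""
    # Reject matches embedded in a longer number, e.g. 1234 inside 12345.
    pattern = re.compile(rf"(?<!\d){re.escape(code)}(?!\d)")
    return any(pattern.search(title) for title in open_issue_titles)
```

Checking the filename suffix only filters out obviously wrong attachments; a file that passes can still be malformed, so the parsing step itself should also catch and report errors rather than crash.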

## Main external dependencies (Expected)

- imaplib
- email
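
As a sketch of how these two modules might work together, the functions below fetch unseen messages from an already-open IMAP connection and later remove them. This covers the IMAP option only, and it assumes the caller has created an `imaplib.IMAP4_SSL` (or `IMAP4`) connection, logged in, and selected a mailbox; the host, credentials, and the function names `fetch_unseen` and `discard` are hypothetical and not part of the project.

```python
import email
import imaplib  # the caller creates the connection, e.g. imaplib.IMAP4_SSL(host)
from email import policy


def fetch_unseen(conn):
    """Return a list of (message number, parsed message) for unseen mail.

    `conn` is an imaplib connection that has logged in and selected a mailbox.
    """
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        return []
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue
        # The first item of the response is an (envelope, raw bytes) tuple.
        raw = parts[0][1]
        messages.append((num, email.message_from_bytes(raw, policy=policy.default)))
    return messages


def discard(conn, numbers):
    """Flag the given messages as deleted, then expunge once at the end."""
    for num in numbers:
        conn.store(num, "+FLAGS", "\\Deleted")
    conn.expunge()
```

Expunging once after all messages are flagged, rather than after each one, avoids the message sequence numbers shifting while they are still being used.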