diff --git a/README.md b/README.md
index e42ec93..fd63790 100644
--- a/README.md
+++ b/README.md
@@ -2,16 +2,24 @@
 
 This is an experimental project and is not yet used as part of the OwlBoard stack.
 
+## Language
+
+It is so far undecided what language will be used. Documents for parsing are likely to be a few hundred lines long so searching may become processor intensive meaning Go may be a good candidate, however Python offers an array of libraries which could be helpful.
+
+## File formats
+
+Diagrams are received in DOCX format, however can easily be converted to ODT, DOC, or PDF which provides flexibility in the languages and the libraries used in the implementation.
+
 ## Aims
 
 The aim of diagram-parser is to simplify the addition of PIS codes that are not yet in the OwlBoard data source. The planned implementation is as follows:
 
  - diagram-parser is subscribed to an email inbox (IMAP/POP3)
- - Formatted train-crew schedule cards are sent to the inbox (DOCX - Maybe PDF alternatively - format) and loaded by diagram-parser
+ - Formatted train-crew schedule cards are sent to the inbox and loaded by diagram-parser
  - List of existing PIS codes is loaded and a list of non-existent codes is compiled (0000-9999)
  - If a code is found both in the diagram and on the list of non-existent codes, a Gitea issue is opened providing details of the code.
  - Once the program has run and extracted only the relavent details, the email is deleted and the file is closed and not stored.
- - The evantual aim is to avoid any manual searching of the DOCX files.
+ - The eventual aim is to avoid any manual searching of the files.
 
 The current process of adding new codes involves being made aware of them face to face, or finding them myself and manually finding and adding them to the data source.
 
@@ -23,4 +31,5 @@ The current process of adding new codes involves being made aware of them face t
  - The format of the attachment should be checked and any errors handled gracefully.
 
 ## Main external dependencies (Expected)
- - mailbox (https://pypi.org/project/mailbox/)
+ - imaplib
+ - email
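
Note on the planned implementation: the steps listed under "Aims" could be sketched roughly as below, assuming Python is chosen and using only the standard-library `email` package named in the dependency change. The function names (`missing_codes`, `codes_in_text`, `codes_to_report`, `attachments`) are hypothetical, and the assumption that a PIS code appears in the extracted diagram text as a standalone four-digit number is a guess; real schedule cards may well need a stricter pattern, since times and other four-digit values would also match. Fetching mail with `imaplib`, converting DOCX to text, and opening the Gitea issue are left out.

```python
import re
from email import message_from_bytes, policy


def missing_codes(existing):
    """Return every four-digit code (0000-9999) that is not in `existing`."""
    return {f"{n:04d}" for n in range(10000)} - set(existing)


def codes_in_text(text):
    """Return all standalone four-digit numbers found in the text.

    Assumption: PIS codes appear as isolated four-digit tokens.
    """
    return set(re.findall(r"(?<!\d)\d{4}(?!\d)", text))


def codes_to_report(text, existing):
    """Codes that appear in the diagram text but not in the data source."""
    return codes_in_text(text) & missing_codes(existing)


def attachments(raw_email):
    """Yield (filename, payload bytes) for each attachment in a raw email."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    for part in msg.iter_attachments():
        # decode=True undoes the transfer encoding (e.g. base64)
        yield part.get_filename(), part.get_payload(decode=True)
```

Keeping the set arithmetic separate from the mail handling means the code-matching step can be tested without a mailbox, and the file is only ever held in memory, which fits the aim of not storing it.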