Update 'README.md'

Fred Boniface 2023-08-05 15:59:27 +01:00
parent e1cdb6fb17
commit 242c77fae3
1 changed file with 12 additions and 3 deletions


@@ -2,16 +2,24 @@
This is an experimental project and is not yet used as part of the OwlBoard stack.
## Language
It is so far undecided which language will be used. The documents to be parsed are likely to be a few hundred lines long, so searching them may become processor-intensive, which makes Go a good candidate. However, Python offers an array of libraries that could be helpful.
## File formats
Diagrams are received in DOCX format but can easily be converted to ODT, DOC or PDF, which allows flexibility in the choice of language and libraries for the implementation.
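The language has not been chosen yet. The sketch below assumes Python and uses only its standard library (`zipfile`, `xml.etree.ElementTree`) rather than a third-party DOCX library. It relies on the fact that a DOCX file is a ZIP archive whose body text is stored in `word/document.xml`. The function name `docx_paragraphs` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of every paragraph in a DOCX file (hypothetical helper).

    `source` may be a path or a binary file-like object. Paragraphs inside
    table cells are included because they also live in word/document.xml.
    Headers, footers and footnotes are stored in other parts and are ignored.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # Word splits a paragraph into runs at arbitrary points (formatting,
        # spell-check), so join the text of all runs before searching it.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return paragraphs
```

Keeping to the standard library avoids an extra dependency. A library such as python-docx could replace this helper if richer access to tables or formatting turns out to be needed.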
## Aims
The aim of diagram-parser is to simplify the addition of PIS codes that are not yet in the OwlBoard data source. The planned implementation is as follows:
- diagram-parser is subscribed to an email inbox (IMAP/POP3)
- Formatted train-crew schedule cards are sent to the inbox and loaded by diagram-parser
- The list of existing PIS codes is loaded, and a list of the codes in the range 0000-9999 that do not yet exist is compiled
- If a code is found both in the diagram and on the list of non-existent codes, a Gitea issue is opened providing details of the code.
- Once the program has run and extracted only the relevant details, the email is deleted and the file is closed without being stored.
- The eventual aim is to avoid any manual searching of the files.
The current process of adding new codes relies on either being told about them face to face or finding them myself, and then manually adding them to the data source.
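The code-matching and issue-opening steps above could look something like the following sketch. It assumes Python and assumes that PIS codes appear in the extracted diagram text as standalone four-digit numbers. Every name is hypothetical: `find_missing_codes`, `build_issue_request`, and the owner, repository and token values. The pattern would also match four-digit times such as 0930, so a real parser would probably need to use the layout of the schedule card to tell codes apart. The request targets Gitea's REST endpoint `POST /api/v1/repos/{owner}/{repo}/issues` but is only built here, not sent.

```python
import json
import re
import urllib.request

# Assumption: a PIS code appears in the diagram as a standalone four-digit number.
PIS_CODE = re.compile(r"\b\d{4}\b")


def find_missing_codes(diagram_text, existing_codes):
    """Return codes found in the diagram that are not in the data source (hypothetical)."""
    # Every possible code 0000-9999, minus those already in the data source
    all_codes = {f"{n:04d}" for n in range(10000)}
    non_existent = all_codes - set(existing_codes)
    found = set(PIS_CODE.findall(diagram_text))
    return sorted(found & non_existent)


def build_issue_request(base_url, owner, repo, token, code, details):
    """Build (but do not send) a Gitea API request opening an issue for one code."""
    payload = {
        "title": f"Missing PIS code {code}",
        "body": f"Code {code} was found in a diagram but not in the data source.\n\n{details}",
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Building the full 0000-9999 complement mirrors the steps listed above. For four-digit strings it gives the same result as `found - set(existing_codes)`.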
@@ -23,4 +31,5 @@ The current process of adding new codes involves being made aware of them face t
- The format of the attachment should be checked and any errors handled gracefully.
## Main external dependencies (Expected)
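The format check could be sketched as below, assuming Python. It identifies the formats mentioned in this README by their leading bytes: `%PDF-` for PDF, the OLE2 header used by legacy `.doc` files, and the ZIP header shared by DOCX and ODT. It then looks inside ZIP archives to tell DOCX from ODT. `detect_format` and `UnsupportedAttachment` are hypothetical names. Raising a single exception type gives the caller one place to handle bad attachments gracefully, for example by logging them and moving on to the next email.

```python
import io
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc (also .xls, .ppt, ...)
ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"


class UnsupportedAttachment(ValueError):
    """Raised when an attachment is not in a format diagram-parser can read."""


def detect_format(data: bytes) -> str:
    """Return "pdf", "doc", "docx" or "odt", or raise UnsupportedAttachment."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "doc"
    if data.startswith(b"PK\x03\x04"):  # ZIP container: DOCX or ODT
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
                if "word/document.xml" in names:
                    return "docx"
                if "mimetype" in names and archive.read("mimetype").strip() == ODT_MIMETYPE:
                    return "odt"
        except zipfile.BadZipFile:
            pass  # truncated or corrupt archive: fall through to the error below
    raise UnsupportedAttachment("attachment is not a readable DOCX, ODT, DOC or PDF file")
```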
- imaplib
- email
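Assuming Python, the two standard-library modules listed above might fit together as in this sketch. `imaplib` fetches each message from the inbox and `email` extracts its attachments. The message is then flagged `\Deleted` and expunged, in line with the aim of not storing diagrams. The function names and the `handle_attachment` callback are hypothetical, and error handling is omitted.

```python
import email
import imaplib
from email import policy


def extract_attachments(raw_message: bytes):
    """Return a list of (filename, content bytes) for each attachment."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    # iter_attachments() skips the text body and yields only attachment parts
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]


def process_inbox(host, user, password, handle_attachment, mailbox="INBOX"):
    """Pass every attachment to handle_attachment, then delete the emails."""
    with imaplib.IMAP4_SSL(host) as imap:  # logs out when the block exits
        imap.login(user, password)
        imap.select(mailbox)
        _, data = imap.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            _, msg_data = imap.uid("FETCH", uid, "(RFC822)")
            for filename, content in extract_attachments(msg_data[0][1]):
                handle_attachment(filename, content)
            # Flag the message for deletion once its attachments are processed
            imap.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
        imap.expunge()  # permanently remove the flagged messages
```

Keeping `extract_attachments` separate from the IMAP session lets the parsing be tested on locally built messages without a mail server.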