Introduction to Quaero Archive

Welcome to the Quaero Archive documentation. Quaero Archive is a powerful, highly customizable document archiving system. This documentation is a work in progress.

This document explains how Quaero Archive organizes your documents and their information fields, how those fields can be generated, and how documents are transferred to the archive.

How Quaero Archive organizes your documents

Quaero Archive strives to impose as little structure as possible on your data. All structure comes from how you configure the Archive and how you search for your documents.

The document

The basic unit of data in Quaero Archive is the document. What exactly one document represents is up to your needs. It could be a customer invoice, a monthly report, and so on.

Every document in Quaero Archive consists of just two computer files: a document file and a file of information fields.

Document file

The document file is the actual file you need to archive. It could be a TIFF from a scanner, a PJL print job, a PDF of a sales report, a text file of a monthly report, or a file in any other format supported by Quaero Archive.

Information fields

Information fields are what allow Quaero Archive to index, find and retrieve documents. They are also called metadata, or YAML, because YAML is their native format. Each information field has a name and a value. A field's value is any string or number. The name is a short identifier made up of letters, digits and the underscore (_); it may not contain accented letters. Field names are not displayed to users; descriptive French and English names are shown instead.
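It can help to check field names against the naming rule before building a YAML file. The sketch below is a minimal validator. It assumes the rule is exactly "ASCII letters, digits and underscore". It does not check any other restriction Quaero Archive may impose, such as maximum length or reserved names. The function name is hypothetical.

```python
import re

# ASCII-only character class: accented letters such as "é" do not match,
# which enforces the "no accents" rule. "+" also rejects the empty name.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_field_name(name: str) -> bool:
    """Return True if name uses only ASCII letters, digits and underscores."""
    return FIELD_NAME.fullmatch(name) is not None


for candidate in ["invoice_no", "NUM", "période", "client-id"]:
    print(candidate, is_valid_field_name(candidate))
```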

Universal fields

Every document in Quaero Archive has the following fields. Many of them are generated by Quaero Archive when the document is archived.

NUM – a unique document number or identifier;
type – the short identifier of the document type;
date – the date the document was created;
time – the time at which the document was created;
pages – number of pages in the document;
format – format of the document.

NUM

The document number must be unique to one document. It would be tempting to use something like the invoice number as document number. If you do, please make sure you never reuse an invoice number. For example, if your invoice number looks like “MT78123” it's highly likely that it will wrap around from “MT99999” to “MT00001” at some point.

Quaero Archive also uses a DID (document identifier) internally, which is a unique number used for efficient indexing.
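One way to guard against the wrap-around problem is to build the document number from more than the invoice number. The example document in this guide does this by combining the calendar year with the invoice number. You should also refuse any number that has already been issued. The sketch below illustrates both ideas.

The `used` set stands in for whatever record of issued numbers you keep. Both function names are hypothetical. Note that a year prefix only helps if invoice numbers never wrap within a single year.

```python
def make_document_number(year: int, invoice_no: int) -> str:
    """Build a document number such as "2019-812391" from a year and an invoice number."""
    return f"{year}-{invoice_no}"


def register(num: str, used: set[str]) -> None:
    """Record num as issued, refusing any number that is already in use."""
    if num in used:
        raise ValueError(f"document number {num!r} is already in use")
    used.add(num)


used: set[str] = set()
register(make_document_number(2019, 812391), used)
try:
    register(make_document_number(2019, 812391), used)  # same invoice again
except ValueError as err:
    print(err)
```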

type

This field holds the short tag of the document's type. See the Document type section for more details.

Quaero Archive also uses a TID (type identifier) internally, which is a numeric representation of a document type's short tag.

date + time

The date and time of a document are those at which the document was created, which may differ from the date it was archived. If you do not include the date and time in your information fields, Quaero Archive uses the archival date. The date is always formatted as “YYYY-MM-DD” and the time as “HH:MM:SS”.
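The date and time rules above can be sketched in a few lines. A supplied value must match “YYYY-MM-DD” or “HH:MM:SS” exactly, and a missing value falls back to the archival date and time. Two points are assumptions: this sketch defaults `date` and `time` independently, and the helper names are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # YYYY-MM-DD
TIME_FORMAT = "%H:%M:%S"  # HH:MM:SS


def _check(value: str, fmt: str) -> None:
    """Raise ValueError unless value is exactly in the given format."""
    parsed = datetime.strptime(value, fmt)
    # strptime alone accepts "2019-4-22"; the round trip rejects it.
    if parsed.strftime(fmt) != value:
        raise ValueError(f"{value!r} does not match {fmt!r}")


def fill_date_time(fields: dict, archived_at: datetime) -> dict:
    """Return a copy of fields with date and time validated or defaulted."""
    result = dict(fields)
    for name, fmt in (("date", DATE_FORMAT), ("time", TIME_FORMAT)):
        if name in result:
            _check(str(result[name]), fmt)
        else:
            result[name] = archived_at.strftime(fmt)  # archival date/time
    return result


print(fill_date_time({"date": "2019-04-22"}, datetime(2019, 5, 1, 9, 30, 0)))
# {'date': '2019-04-22', 'time': '09:30:00'}
```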

Quaero Archive also uses MJD (modified Julian date) internally to speed up certain operations.
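This guide does not describe how Quaero Archive computes or stores MJD. The sketch below only shows the standard definition: MJD = JD − 2400000.5, with day 0 starting on 1858-11-17. Once a date is turned into a plain integer day count, date-range searches become integer comparisons, which is presumably why it speeds up certain operations.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0 begins at midnight on this day


def modified_julian_day(d: date) -> int:
    """Return the integer Modified Julian Day number of a calendar date."""
    return (d - MJD_EPOCH).days


print(modified_julian_day(date(2019, 4, 22)))  # 58595
```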

pages

Pages are counted by Quaero Archive when a document is archived.

format

Quaero Archive relies on a document's format to process, index and display it, much as Windows relies on file extensions.

When a document is archived, Quaero Archive infers its format from the extension of the document file. Using the wrong extension will cause Quaero Archive to process, index or display the document incorrectly. Documents of a given document type do not all have to be in the same file format.

Document formats are short names, like “txt”, “tif” or “docx”. Document formats are unambiguous; for instance, “tif” is never “tiff” nor “TIFF”. If you upload a document file with a .tiff extension, it will be renamed to .tif.
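The sketch below shows how a format could be inferred from a file extension and normalized to a short canonical name. Several details are assumptions:

- Only the .tiff to .tif rename is documented. The alias table is illustrative.
- Lowercasing the extension is an assumption.
- The function names are hypothetical.
- The sketch does not check the result against the list of formats your installation actually supports.

```python
from pathlib import Path

# Extension aliases mapped to canonical short names. Only "tiff" -> "tif"
# comes from this guide; any other entries would be hypothetical.
FORMAT_ALIASES = {"tiff": "tif"}


def infer_format(filename: str) -> str:
    """Infer a document format from the file name's extension."""
    suffix = Path(filename).suffix  # e.g. ".TIFF"; "" if there is none
    if not suffix:
        raise ValueError(f"{filename!r} has no extension; cannot infer format")
    ext = suffix[1:].lower()  # assumption: formats are lowercase
    return FORMAT_ALIASES.get(ext, ext)


def canonical_name(filename: str) -> str:
    """Return the file name with its extension replaced by the canonical format."""
    return str(Path(filename).with_suffix("." + infer_format(filename)))


print(infer_format("2019-812391.ps"))  # ps
print(canonical_name("scan.tiff"))     # scan.tif
```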

Some document formats are too complex to be displayed easily, for instance all documents produced by Microsoft Office. When such documents are archived, Quaero Archive creates an intermediate PDF.

Custom fields

Quaero Archive allows you to define your own information fields. For example, you could add a “client” field for your invoices, so you could easily search on client information. Or you could add a “financialperiod” field for your business reports, to easily select the financial period a report covers.

Example document

Suppose you want to add an invoice to Quaero Archive. You will need to create two computer files: the invoice in its final form and a YAML file containing the information fields. The invoice is produced by a legacy system as a PostScript file. We will use the calendar year and the invoice number as our unique document number. The two files are named:

2019-812391.ps
2019-812391.yml

The file 2019-812391.yml will contain the following:

type: 01-invoice
invoice_no: 812391
date: 2019-04-22
client: 98135 JPD INC
end: of-file

The “01-invoice” type is a document type previously created. The “invoice_no” and “client” fields are custom fields created for the document type. The “end” field is how Quaero Archive makes sure the entire YAML file was transmitted.
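The sketch below writes an information file like the one in the example. It always ends the file with the `end: of-file` line, and it checks on the receiving side whether that marker arrived. Several details are assumptions:

- Values are written unquoted, so they must not contain YAML special characters such as ": " or "#". For arbitrary values, use a real YAML library.
- Treating the marker as the last non-blank line is an assumption.
- The function names are hypothetical.

```python
import tempfile
from pathlib import Path

END_MARKER = "end: of-file"


def write_info_file(path: Path, fields: dict) -> None:
    """Write information fields as "name: value" lines, ending with the marker."""
    lines = [f"{name}: {value}" for name, value in fields.items() if name != "end"]
    lines.append(END_MARKER)  # always last, so truncation is detectable
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def is_complete(path: Path) -> bool:
    """Return True if the last non-blank line is the end-of-file marker."""
    text = path.read_text(encoding="utf-8")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return bool(lines) and lines[-1] == END_MARKER


with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "2019-812391.yml"
    write_info_file(info, {
        "type": "01-invoice",
        "invoice_no": 812391,
        "date": "2019-04-22",
        "client": "98135 JPD INC",
    })
    print(info.read_text(encoding="utf-8"))
    print(is_complete(info))  # True
```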

Document type

The document type is essentially the only way Quaero Archive groups documents.

A document type can be as specific or as general as you wish. For example, you could have a single invoice document type for the entire company, or you could choose to have an invoice type for each business group or branch office.

Each document type has a short tag, or identifier, used in the YAML file, and descriptive names in English and French, which are displayed to users. The descriptive names can be changed at any time. The short tag should only be changed after all documents of that type have been deleted. In practice, avoid this: create a new document type instead and put any new documents into it.

Generating the information fields

The application that produced the document file usually already holds the data for its information fields. The ideal scenario is to fetch the information fields from the application's database and build the YAML file from them.
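The sketch below shows this scenario using an in-memory SQLite database. The `invoices` table and its columns are hypothetical stand-ins for your application's own schema. The field values are those of the example invoice.

```python
import sqlite3

# Hypothetical schema standing in for the application's own database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_no INTEGER, invoice_date TEXT, client TEXT)")
conn.execute("INSERT INTO invoices VALUES (812391, '2019-04-22', '98135 JPD INC')")


def fields_for_invoice(conn: sqlite3.Connection, invoice_no: int) -> dict:
    """Build the information fields of one invoice from the application database."""
    row = conn.execute(
        "SELECT invoice_no, invoice_date, client FROM invoices WHERE invoice_no = ?",
        (invoice_no,),
    ).fetchone()
    if row is None:
        raise KeyError(f"invoice {invoice_no} not found")
    number, invoice_date, client = row
    return {"type": "01-invoice", "invoice_no": number,
            "date": invoice_date, "client": client}


fields = fields_for_invoice(conn, 812391)
# Simple "name: value" lines; assumes values need no YAML quoting.
yaml_text = "".join(f"{k}: {v}\n" for k, v in fields.items()) + "end: of-file\n"
print(yaml_text)
```

Using a parameterized query (`?`) rather than string formatting keeps the lookup safe if invoice numbers ever come from untrusted input.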

Document classification

If your document file is a formatted text report, you can configure a document classification filter to fetch the information fields from rows and columns of the document file.
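This guide does not describe the configuration format of Quaero Archive's classification filters. The sketch below only illustrates the underlying idea: read each field from a fixed line and column range of a text report. The report layout and the rule table are hypothetical.

```python
# Hypothetical rules: field name -> (line index, start column, end column),
# 0-based with the end column exclusive, as in Python slicing.
RULES = {
    "invoice_no": (0, 10, 16),
    "client": (1, 10, 40),
}

report = """\
INVOICE   812391
CLIENT    98135 JPD INC
"""


def extract_fields(text: str, rules: dict) -> dict:
    """Extract trimmed field values from fixed rows and columns of a report."""
    lines = text.splitlines()
    fields = {}
    for name, (row, start, end) in rules.items():
        # A missing line raises IndexError; short lines just yield shorter slices.
        fields[name] = lines[row][start:end].strip()
    return fields


print(extract_fields(report, RULES))
# {'invoice_no': '812391', 'client': '98135 JPD INC'}
```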

Custom code

In some cases, we can write a custom program to fetch the information fields from inside a document file.

Manual data entry

This is the least desirable scenario: someone has to sit down and type in the information fields. It is necessary when the document is produced by an application or source outside your company. For example, your HR department could scan doctors' notes and need the employee's name and ID in the information fields. Or your suppliers could send you invoices as PDFs, and your accounting department wants the supplier name, supplier ID and invoice total as information fields. Quaero Archive has a highly customizable tool for setting this up and guiding a user through the process.

Transferring a document to the archive

There are many ways to transfer documents to Quaero Archive: FTP, SMB, REST, IPP and lp. Please discuss the exact configuration used by your installation with your integrator.

REST

REST uses the HTTP protocol to communicate with Quaero Archive in a robust and flexible manner. It uses HTTP methods to access and manipulate information in Quaero Archive, including adding new documents. Please read the REST documentation for more details.

Example REST access:

Endpoint: https://quaero.local.lan/dw/v1/

TokenID: 736cdd83c8c19830beda7b271c8ed202a61d2ae5a257fca1fc7bf577a8fdd096

Secret: 4e985de377092890b3d31da69fe807ee8e8fe90e4cc5619d0b0e5090ae5ce965

ftp

FTP is an old, insecure but robust protocol for transferring files. With this method, you simply upload your document file and its YAML file to Quaero Archive, and your document will be processed and added to the archive. Your integrator will give you an IP address or host name, a username and password, and the directory in which you must place your files.

Example ftp access:

Host: quaero.local.lan

Path: /quaero/incoming

Username: dw

Password: dw-dw-2015
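The sketch below uploads a document pair using Python's standard `ftplib`, with the example host, credentials and path shown above. Files are sent in binary mode (`STOR`) so that PostScript, TIFF and PDF files arrive byte for byte. Two details are assumptions: the YAML file is uploaded after the document file, and the function names are hypothetical. Check the expected upload order with your integrator. As noted above, FTP sends the password in clear text.

```python
import ftplib
from pathlib import Path


def upload_document(ftp, doc_path: Path, yml_path: Path, remote_dir: str) -> None:
    """Upload a document file, then its YAML file, into remote_dir."""
    ftp.cwd(remote_dir)
    for path in (doc_path, yml_path):  # assumption: YAML file last
        with path.open("rb") as fh:
            ftp.storbinary(f"STOR {path.name}", fh)


def upload_to_archive(doc_path: Path, yml_path: Path) -> None:
    """Connect with the example credentials above (yours will differ) and upload."""
    with ftplib.FTP("quaero.local.lan", "dw", "dw-dw-2015") as ftp:
        upload_document(ftp, doc_path, yml_path, "/quaero/incoming")
```

Passing the connection into `upload_document` keeps the transfer logic testable without a live server.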

SMB

SMB, also known as CIFS (Samba is a widely used implementation), lets Quaero Archive export one or more Windows shares. With this method, simply connect to the share with the username and password your integrator supplied and copy the document file and information file over.

Example SMB share:

Path: \\quaero.local.lan\queue\incoming

Username: dw

Password: dw-dw-2015
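Once the share is reachable, copying over SMB is an ordinary file copy. On Windows, Python's file functions accept UNC paths such as the example share above. The sketch assumes the session is already authenticated, for example after `net use \\quaero.local.lan\queue /user:dw`. Copying the YAML file after the document file is an assumption, and the function name is hypothetical.

```python
import shutil
from pathlib import Path

# Example share from above; on Windows this is a UNC path.
SHARE = Path(r"\\quaero.local.lan\queue\incoming")


def copy_to_share(doc_path: Path, yml_path: Path, share: Path = SHARE) -> None:
    """Copy the document file first, then its information file, into the share."""
    for path in (doc_path, yml_path):  # assumption: YAML file last
        shutil.copy2(path, share / path.name)  # copy2 also keeps timestamps
```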

ipp and lp

Quaero Archive may also act as a print queue to receive document files directly. Because these protocols cannot transfer two files at once, you will need to hide the information fields inside your document file. This can be done with PJL comments or image metadata. If this is not feasible, Quaero Archive can do further processing. For instance, automatic filtering can be set up to find the information fields in a text document, or a barcode could be used to match a scanned signed invoice with the original invoice. As a last resort, manual data entry can be used to create the information fields.

Example ipp queue:

ipp://quaero.local.lan/printers/to-dw

Example lp queue:

Host: quaero.local.lan

Queue: to-dw
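One way to hide the information fields in a print job is to use PJL comments. The sketch below wraps a PostScript job in PJL: the Universal Exit Language sequence, an `@PJL COMMENT` line per field, and `@PJL ENTER LANGUAGE = POSTSCRIPT`. The "name: value" layout inside the comments is hypothetical, as is the function name. Your integrator defines what Quaero Archive actually looks for.

```python
UEL = b"\x1b%-12345X"  # PJL Universal Exit Language sequence


def wrap_with_pjl_fields(postscript: bytes, fields: dict) -> bytes:
    """Wrap a PostScript job in PJL, carrying each field in an @PJL COMMENT line.

    PJL is ASCII, so non-ASCII values raise UnicodeEncodeError here.
    """
    header = [b"@PJL"]
    for name, value in fields.items():
        header.append(f"@PJL COMMENT {name}: {value}".encode("ascii"))
    header.append(b"@PJL ENTER LANGUAGE = POSTSCRIPT")
    return UEL + b"\r\n".join(header) + b"\r\n" + postscript + UEL


job = wrap_with_pjl_fields(b"%!PS\n", {"type": "01-invoice", "invoice_no": 812391})
```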