Scanning documents with sequential numbering to single- or multi-page image PDF files
Scanning to PDF files of large volumes of documents from A8 (53*74 mm) up to A3 format with an automatic document scanner, and thereafter up to A1+ with an overhead scanner! The finished product consists of page images stored in PDF format. They are numbered with sequential numbers in scanning order, 0001.pdf, 0002.pdf and so on, and the numbering is applied either per page image or per multi-page PDF document. We normally scan in greyscale at 300 dpi, but we can also scan in black and white, greyscale or colour up to 600 dpi.
Document separation is carried out using a dedicated separator sheet with a barcode intended for this purpose.
PDF files are normally scanned in the strict PDF archiving format PDF/A-1b (ISO 19005-1:2005) (applies to image PDF).
Scanning to multi-page PDF files with document separation is well suited, for example, to due diligence, audit documentation and similar, expense reports and various types of case files. If the material is difficult to scan due to stapled-on receipts and the like, we can use our overhead scanner instead of an ordinary document scanner.
Scanning documents to PDF with an OCR-interpreted text layer for searching
After we have scanned the documents to image PDF files, we OCR-interpret them with the best software for mass processing!
Scanning a book, booklet or catalogue to PDF with an OCR-interpreted text layer for searching
We "sacrifice" the book/booklet (or use an overhead scanner), we measure the page size and scan at exactly the right size, we OCR-interpret with the best software; either (1) batch OCR without correcting uncertain characters, or (2) with correction of uncertain characters and words. Finally we crop the pages electronically by a couple of mm for a neat appearance (not permanent, it can be removed). We can also go through page by page and manually straighten certain pages and lines of text, and tidy up the margins, in dedicated software for jobs that require extra high visual quality!
For catalogues in small print we can first scan at high resolution and OCR-interpret, and then downsample the resolution and compress for a smaller size suitable for the internet.
Document scanning with automatic indexing
An extended variant of the document scanning above is to OCR-interpret only something, or a few individual fields on the images, perhaps from a cover sheet. This could, for example, be a personal identity number, company registration number or another unique identifier. We can use this interpreted information to name the PDFs automatically. They might be named after a personal identity number, company registration number, report number or contract number, for example 'personal-identity-number'.PDF, and so on. With the images named in this way, it becomes easy to find them in your folder on the computer. It also becomes easy to link to the documents from a database.
This automatic indexing is normally done with flexible data capture technology. Please read more about this technology in the "Scanning of forms" section.
Document scanning with manual indexing
In cases where it is not possible to automatically capture a field for indexing, we can register one or a few pieces of data from a page manually.
Renaming of documents
After we have captured or registered data, we can use this data to name the PDF files with new names.