
Scan & OCR

DocView Web includes built-in support for scanning physical documents directly from your browser and extracting text using Optical Character Recognition (OCR). This turns paper documents into searchable, indexed digital files.

Web Scanning

Web scanning uses the TWAIN standard to capture pages from a connected scanner directly into DocView Web:

  1. Navigate to the Scanner page (accessible from the Upload area or a dedicated menu item).
  2. Select your scanner from the scanner dropdown (the page detects all TWAIN-compatible scanners connected to your computer).
  3. Configure scan settings: resolution (DPI), colour mode (colour, greyscale, black & white), page size, and duplex (double-sided) if supported.
  4. Click Scan to capture the document.
  5. Preview the scanned image. You can rotate, crop, or rescan individual pages.
  6. Click Upload to send the scanned document into DocView Web for indexing.
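
The settings chosen in step 3 can be pictured as a small validated record. The following is a minimal sketch only: the class name `ScanSettings`, the field names, and the allowed values are hypothetical and are not taken from DocView Web itself.

```python
from dataclasses import dataclass

# Hypothetical option lists; the real Scanner page may offer different values.
COLOUR_MODES = {"colour", "greyscale", "black_and_white"}
PAGE_SIZES = {"A4", "A5", "Letter", "Legal"}


@dataclass(frozen=True)
class ScanSettings:
    """Illustrative container for the scan settings described in step 3."""
    dpi: int = 300
    colour_mode: str = "greyscale"
    page_size: str = "A4"
    duplex: bool = False

    def __post_init__(self):
        # Reject values a scanner is unlikely to accept (assumed limits).
        if not 75 <= self.dpi <= 1200:
            raise ValueError(f"dpi out of range: {self.dpi}")
        if self.colour_mode not in COLOUR_MODES:
            raise ValueError(f"unknown colour mode: {self.colour_mode}")
        if self.page_size not in PAGE_SIZES:
            raise ValueError(f"unknown page size: {self.page_size}")


# Example: a double-sided, black-and-white scan at 300 DPI.
settings = ScanSettings(dpi=300, colour_mode="black_and_white", duplex=True)
print(settings)
```

Validating the settings before the scan starts means a bad value is caught up front, not after a page has already been captured.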

OCR Processing

After scanning or uploading an image-based document (scanned PDF, TIFF, JPEG, or PNG), OCR can be applied to extract its text. Two OCR engines are available:

  • Tesseract OCR – An open-source OCR engine that runs locally on the server. Suitable for clear, well-scanned documents in common languages.
  • Azure AI Form Recognizer – A cloud-based AI service that provides higher accuracy, especially for complex layouts, handwriting, and multi-language documents.
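
For readers who want to try Tesseract on a sample image outside DocView Web, the sketch below builds and runs the standard `tesseract` command line. It assumes Tesseract 4 or later is installed and on the PATH. The helper names `build_tesseract_command` and `run_ocr` are illustrative, and this is not necessarily how DocView Web invokes the engine on the server.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image: Path, output_base: Path,
                            lang: str = "eng", dpi: int = 300) -> list[str]:
    """Return argv for: tesseract <image> <output_base> -l <lang> --dpi <dpi>."""
    return ["tesseract", str(image), str(output_base),
            "-l", lang, "--dpi", str(dpi)]


def run_ocr(image: Path, output_base: Path, lang: str = "eng") -> str:
    """Run Tesseract (if installed) and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends ".txt" to the output base for plain-text output.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


# Show the command without running it.
print(build_tesseract_command(Path("invoice.png"), Path("invoice")))
```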

Extracted text is stored as part of the document’s metadata, making it searchable via the global search and advanced search features.
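
To illustrate why stored OCR text makes a scanned document searchable, here is a minimal sketch of an inverted index that maps words to document IDs. The `TextIndex` class and the document IDs are hypothetical. DocView Web's actual search implementation is not described in this guide and may work differently.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TextIndex:
    """Toy inverted index: word -> set of document IDs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id: str, ocr_text: str) -> None:
        for word in tokenize(ocr_text):
            self._postings[word].add(doc_id)

    def search(self, query: str) -> set[str]:
        words = tokenize(query)
        if not words:
            return set()
        # .get avoids creating empty entries for unknown words.
        matches = [self._postings.get(word, set()) for word in words]
        return set.intersection(*matches)


index = TextIndex()
index.add("doc-1", "Invoice 2041 from Acme Ltd")
index.add("doc-2", "Delivery note for Acme Ltd")
print(sorted(index.search("acme invoice")))  # ['doc-1']
print(sorted(index.search("acme")))          # ['doc-1', 'doc-2']
```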

Scanner Requirements

Requirement     | Details
Scanner type    | Any TWAIN-compatible document scanner.
Browser         | Google Chrome or Microsoft Edge recommended for best compatibility.
Scanner driver  | The TWAIN driver for your scanner must be installed on your local machine.
Local service   | A lightweight local scanning service may need to be installed to bridge the browser and the scanner hardware.

OCR Tips

  • Scan at 300 DPI or higher for best OCR accuracy.
  • Use black-and-white mode for text-heavy documents to reduce file size and improve recognition.
  • Ensure documents are placed straight on the scanner glass – skewed pages reduce OCR accuracy.
  • For handwritten content, use Azure AI Form Recognizer for significantly better results than standard Tesseract OCR.
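
Two of the tips above can be expressed as a simple pre-scan check. This is an illustrative sketch only: the function name `ocr_preflight` and the engine labels `"tesseract"` and `"azure"` are assumptions, not DocView Web settings.

```python
def ocr_preflight(dpi: int, handwritten: bool, engine: str) -> list[str]:
    """Return warnings for choices that are likely to reduce OCR accuracy."""
    warnings = []
    if dpi < 300:
        warnings.append("Scan at 300 DPI or higher for best OCR accuracy.")
    if handwritten and engine == "tesseract":
        warnings.append(
            "Handwritten content: Azure AI Form Recognizer is likely "
            "to give better results than Tesseract."
        )
    return warnings


# A 200 DPI scan of a handwritten form with Tesseract triggers both warnings.
for message in ocr_preflight(dpi=200, handwritten=True, engine="tesseract"):
    print(message)
```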

TIP

Combine scanning with DocView Capture profiles for automated classification, indexing, and routing of scanned documents.

Guide Created by DevSoftUK Limited