Quantxt Theia FAQs

Theia is cloud-based software that detects and extracts data from documents.
Theia can detect and extract phrases, dates, and numbers embedded in forms, tables or anywhere in a document and deliver them in a machine-readable format.
You can automate data entry tasks with Theia and reduce associated labor costs. It is fast and accurate, and it can work 24/7.
  • Import data from documents into databases, applications, or Excel workbooks
  • Scale document scrubbing and data entry teams
  • Enable full-text search in static content files
  • Automate document processing workflows
Theia does not require any manual text selection or grid-based annotation of the input documents.
All you need to do is provide a list of the properties you want extracted from the input documents.
Theia is layout agnostic: it detects and extracts data regardless of where the data appears on the page.
Here are just a few examples of documents you can accurately process with Theia:
  • Contracts
  • Appraisal reports
  • Insurance claims and policy forms
  • Fund prospectus reports
  • Court records
  • Loan documents
  • Resumes and CVs
  • Product fact sheets
First, you need to create a data capture model for your documents. Creating a model takes as little as 5 minutes. All you need to do is upload one sample document and type in the list of properties you'd like to extract.
You may extend your model by adding Regex rules to increase extraction accuracy.
Once a model is configured, you can simply upload your files and export the results in XLS, CSV or JSON.
You can also contact our support team to help with configuring models.
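To give a sense of what a Regex rule can capture, here is a minimal sketch using Python's standard `re` module. It only illustrates the general idea of matching a dollar amount in extracted text. The pattern, the `extract_amount` helper, and the sample sentence are hypothetical, and the syntax Theia accepts for its own Regex rules may differ.

```python
import re

# Hypothetical pattern for a dollar amount such as "$1,250,000.00" or "$1250.00".
# This is an illustration only, not Theia's own rule syntax.
AMOUNT_PATTERN = re.compile(r"\$\s?((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?)")


def extract_amount(text):
    """Return the first dollar amount found in text as a float, or None."""
    match = AMOUNT_PATTERN.search(text)
    if match is None:
        return None
    # Remove thousands separators before converting to a number.
    return float(match.group(1).replace(",", ""))


sample = "Total appraised value: $1,250,000.00 as of 03/15/2021"
print(extract_amount(sample))
```

A rule like this narrows a property to values of a known shape, which is the usual reason to add Regex to an extraction model.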
Extraction accuracy depends heavily on the quality of the documents. You can achieve very high accuracy on good-quality documents with a clean background, while accuracy will most likely be much lower on blurry documents.
Supported formats are PDF, PNG, JPG, GIF, TSV, CSV, XLS, XLSX, XLSB, XLSM and plain-text content such as TXT and HTML files.
Pricing is based on the number of Units processed, at 10¢ per Unit for standard accounts. Volume discounts are available, with rates as low as 1¢ per Unit.
Units are defined as follows:
  • For PDF, each Unit is equivalent to one page.
  • Each single-page image such as JPG, PNG and GIF is considered one Unit.
  • For CSV, TSV, XLS, XLSX, XLSB and XLSM, each Unit is equivalent to 200 non-empty cells.
  • For TXT and HTML, each Unit is equivalent to 3,000 characters, excluding spaces and newlines.
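The Unit definitions above can be turned into a rough cost estimate. The sketch below is not an official calculator. It assumes that partial Units are rounded up and that carriage returns count as newlines, neither of which is stated here, and all function names are hypothetical.

```python
import math

STANDARD_RATE_CENTS = 10  # 10 cents per Unit for standard accounts


def units_for_pdf(pages):
    """One Unit per PDF page."""
    return pages


def units_for_images(image_count):
    """One Unit per single-page image (JPG, PNG, GIF)."""
    return image_count


def units_for_spreadsheet(non_empty_cells):
    """One Unit per 200 non-empty cells (assumption: partial Units round up)."""
    return math.ceil(non_empty_cells / 200)


def units_for_text(text):
    """One Unit per 3,000 characters, not counting spaces and newlines
    (assumption: partial Units round up, carriage returns count as newlines)."""
    counted = sum(1 for ch in text if ch not in " \n\r")
    return math.ceil(counted / 3000)


def cost_in_cents(units, rate_cents=STANDARD_RATE_CENTS):
    """Total cost in whole cents, avoiding floating-point rounding."""
    return units * rate_cents


# Example: a 12-page PDF plus a spreadsheet with 450 non-empty cells.
total_units = units_for_pdf(12) + units_for_spreadsheet(450)
print(f"{total_units} Units, ${cost_in_cents(total_units) / 100:.2f}")
# prints "15 Units, $1.50"
```

Passing `rate_cents=1` to `cost_in_cents` gives the estimate at the lowest volume rate mentioned above.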
Yes! You can try Theia for 30 days with 50 free Units.