Start by accessing the “Downloads” section of this tutorial to retrieve the source code and example images.įrom there, take a look at the directory structure: |- scan_receipt.py We first need to review our project directory structure. Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab’s ecosystem right in your web browser! No installation required.Īnd best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux! Project Structure Then join PyImageSearch University today!
If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide - it will have you up and running in a matter of minutes. Luckily, OpenCV is pip-installable: $ pip install opencv-contrib-python To follow this guide, you need to have the OpenCV library installed on your system.
Most importantly, I’ll show you which Tesseract PSM to use when building a receipt scanner, such that you can easily detect and extract each item and price from the receipt.įinally, we’ll wrap up this tutorial with a discussion of our results. We’ll then review our receipt scanner implementation line-by-line. In the first part of this tutorial, we will review our directory structure for our receipt scanner project. OCR’ing Receipts with OpenCV and Tesseract
This tutorial’s receipt scanner project serves as a starting point for building a full-fledged receipt scanner application. But until then, receipt scanners can save us a bunch of time and avoid the frustration of manually cataloging purchases. Perhaps in the future, it will become less tedious to track and report our expenses. It’s hard to believe that purchases are still tracked via a tiny, fragile piece of paper in this day and age! If you’re a business owner (like me) and need to report your expenses to your accountant, or if your job requires that you meticulously track your expenses for reimbursement, then you know how frustrating, tedious, and annoying it is to track your receipts. Looking for the source code to this post? Jump Right To The Downloads Section Automatically OCR’ing Receipts and Scansįrom there, we will use Tesseract to OCR the receipt itself and parse out each item, line-by-line, including both the item description and price.