In Part I of this three-part series, I walked you through the process of using Power Automate to build a Flow for collecting signatures with Adobe Sign. Now we’ll look at how to extract information entered into a document. If you use a solution like Adobe Sign, you will be able to collect values from the form fields; however, if your process includes handwritten claims, you may want a process that automates some or all of the data collection.
We will use AI Builder for this example. If you’re not familiar with AI Builder and would like more information, have a look at my previous post: Overview of AI Builder’s Form Processing Model.
Things to Know About a Trained Model
Before jumping right in, I want to highlight a few limitations. AI Builder does a very good job of identifying the text you’d like to capture, especially when the text is typed. In the example below, you can see the ‘Name of Insured’ field is correctly identified with a 100% Confidence Score. When the information is handwritten, several things can lower the Confidence Score. A few examples include:
- Sloppy handwriting
- Writing beyond the boundaries (not staying within the lines)
- Areas that accept more than one piece of information
AI Builder is also designed to read only text. Later, I’ll show an example of what happens when you try to detect whether a checkbox was checked. For detailed, up-to-date information about the requirements and limitations, visit the Microsoft Docs page: Form processing model requirements and limitations. For now, here are the requirements as of this writing.
Form processing works on input documents that meet the following requirements:
- JPG, PNG, or PDF format (text or scanned). Text-embedded PDFs are better, because there won’t be any errors in character extraction and location.
- If PDFs are password-locked, you must remove the lock before submitting them.
- The combined file size of the documents used for training per collection must not exceed 50 MB. PDF documents shouldn’t be longer than 500 pages.
- Image dimensions must be between 50 × 50 and 10,000 × 10,000 pixels.
- PDF file dimensions cannot exceed 17 × 17 inches, corresponding to Legal or A3 paper sizes and smaller.
- Scans of paper documents should be high-quality.
- Must use the Latin alphabet (English characters).
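Most of the file-level requirements above can be checked before a document ever reaches the model. The Python sketch below is one hypothetical way to run that pre-flight check. It assumes you have already obtained each file’s size, dimensions, page count, and lock status by some other means (the `DocumentInfo` class and its fields are illustrative, not part of AI Builder), and it interprets 50 MB as 50 × 1024 × 1024 bytes. Scan quality and alphabet can’t be verified this way, so they are left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath

# Limits taken from the requirements list; 50 MB is assumed to mean MiB.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}
MAX_COLLECTION_BYTES = 50 * 1024 * 1024
MIN_IMAGE_PX, MAX_IMAGE_PX = 50, 10_000
MAX_PDF_PAGES = 500
MAX_PDF_INCHES = 17.0


@dataclass
class DocumentInfo:
    """Hypothetical metadata gathered about a file before submission."""
    name: str
    size_bytes: int
    width: float   # pixels for images, inches for PDFs
    height: float  # pixels for images, inches for PDFs
    pages: int = 1
    password_locked: bool = False


def check_document(doc: DocumentInfo) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    ext = PurePath(doc.name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return [f"unsupported format: {ext or '(none)'}"]

    problems = []
    if ext == ".pdf":
        if doc.password_locked:
            problems.append("PDF is password-locked; remove the lock first")
        if doc.pages > MAX_PDF_PAGES:
            problems.append(f"PDF has {doc.pages} pages (max {MAX_PDF_PAGES})")
        if doc.width > MAX_PDF_INCHES or doc.height > MAX_PDF_INCHES:
            problems.append(f"PDF page exceeds {MAX_PDF_INCHES:g} x {MAX_PDF_INCHES:g} inches")
    else:
        # Images: each side must fall within the allowed pixel range.
        for side in (doc.width, doc.height):
            if not MIN_IMAGE_PX <= side <= MAX_IMAGE_PX:
                problems.append(f"image side of {side:g}px is outside {MIN_IMAGE_PX}-{MAX_IMAGE_PX}px")
    return problems


def check_training_collection(docs: list[DocumentInfo]) -> list[str]:
    """Check every file, plus the combined-size limit for one training collection."""
    problems = [f"{d.name}: {p}" for d in docs for p in check_document(d)]
    total = sum(d.size_bytes for d in docs)
    if total > MAX_COLLECTION_BYTES:
        problems.append(f"collection is {total} bytes (max {MAX_COLLECTION_BYTES})")
    return problems
```

For example, a two-page, 8.5 × 11 inch PDF passes `check_document` with an empty list, while three 20 MB files together fail the combined-size check for a training collection.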
Data Collection and Notifications
We have a trained model ready and are expecting to receive handwritten claims. For the purposes of this post, we will start the Flow when a claim is uploaded to SharePoint. However, you can use a variety of triggers to start the Flow. Examples include receiving an email with attachments or a specific subject line, or using a webhook to subscribe to an event raised when a form is submitted.
In this Flow, we collect information about the file, run the file against our model to extract the key data, and then update the file’s properties in SharePoint with the captured values: namely, the Policy Number, Name of Insured, and Description of Incident. In a typical scenario, you would likely also post this information to another database.
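To make the field-to-column step concrete, here is a minimal Python sketch of the mapping. It assumes the model’s output has been reduced to a dictionary keyed by field label, where each entry holds a `value` and a `confidence` (a hypothetical shape, not AI Builder’s exact output schema), and that the SharePoint library uses the hypothetical column internal names shown. In the Flow itself, the SharePoint step that updates the file’s properties does this work; the sketch only shows which value lands in which column.

```python
from __future__ import annotations

# Hypothetical mapping from the model's field labels to SharePoint column internal names.
FIELD_TO_COLUMN = {
    "Policy Number": "PolicyNumber",
    "Name of Insured": "NameOfInsured",
    "Description of Incident": "DescriptionOfIncident",
}


def build_item_update(extracted: dict[str, dict]) -> dict[str, str | None]:
    """Translate extracted fields into column/value pairs for the library item.

    A field the model did not return is set to None so that it shows up as
    visibly empty rather than being silently skipped.
    """
    update: dict[str, str | None] = {}
    for label, column in FIELD_TO_COLUMN.items():
        field = extracted.get(label)
        update[column] = field["value"] if field else None
    return update
```

Mapping every expected column, even when a value is missing, keeps the library’s metadata consistent and makes gaps easy to spot when someone reviews the claim.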
Earlier, I mentioned that handwritten forms sometimes receive a lower Confidence Score. For this reason, there are two extra steps in this example. First, we check whether any of the three captured fields has a Confidence Score lower than 70%. If any do, an email notifies the recipient that a claim may need to be reviewed for accuracy. Otherwise, the Flow simply completes.
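The review condition can be sketched in Python as follows. This assumes the same hypothetical dictionary shape as above and confidence scores on a 0 to 1 scale, so the 70% threshold becomes 0.70. Treating a field that is missing from the output as needing review is an additional assumption, not something the Flow described here necessarily does.

```python
from __future__ import annotations

REVIEW_THRESHOLD = 0.70  # 70%, assuming confidence scores on a 0-1 scale
CAPTURED_FIELDS = ("Policy Number", "Name of Insured", "Description of Incident")


def fields_needing_review(extracted: dict[str, dict],
                          threshold: float = REVIEW_THRESHOLD) -> list[str]:
    """Return the captured fields whose confidence is strictly below the threshold.

    Assumption: a field missing from the output is also flagged, since a
    missing value is at least as suspect as a low-confidence one.
    """
    flagged = []
    for label in CAPTURED_FIELDS:
        field = extracted.get(label)
        if field is None or field["confidence"] < threshold:
            flagged.append(label)
    return flagged


def needs_review(extracted: dict[str, dict]) -> bool:
    """True means the Flow should send the review email; False means it just completes."""
    return bool(fields_needing_review(extracted))
```

Because the check is strictly "lower than", a field scored at exactly 70% would not trigger the email, while a score like 37% would. Returning the list of flagged fields, rather than only a yes/no answer, also lets the notification email say which fields to check.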
Processing Handwritten Forms
Now that the process is ready, we can start accepting forms. In this example, you see an image of a form that I filled out by hand. The pages are folded at the corners, I used mixed case on some words, I crossed out sections, and my monitor cable appears at the top of the image. Not exactly a perfect picture.
When we run the form against the model, you can see that it identified where the values it expected to capture are located. Two things I want to highlight are the confidence score and the checkboxes. Beginning with the ‘Description of Incident’ field, you can see that it has a low confidence score of 37%. There could be several reasons for this. For starters, the boundary I drew around the text during training could have been better. My handwriting could have been an issue, although you can see that the text itself was captured perfectly. It may also be that I simply didn’t train the model well enough: I used only five nearly identical files, all typed and none containing handwritten text. That means this was the very first handwritten content the model had seen.
I also want to point out the checkbox fields. Although I trained the model to capture the checkboxes (and only the checkboxes), it failed to properly outline the first two checkboxes below the Description of Incident, while it correctly outlined the next two. Either way, the result isn’t usable.
As noted in the requirements and limitations documentation, checkboxes are not supported. The same currently applies to complex tables (such as nested tables), radio buttons, signatures, and fillable PDF files. Ideally, you would use Adobe Sign or DocuSign to collect that type of information. If your process is manual, you can still automate the collection of key data, but you may need someone to verify that the collected data is correct and to capture any unsupported fields by hand.
Continuing with our example: once the form is processed, you can see that the PDF I created from my phone’s camera using Office Lens was uploaded to the library, and the information I wanted to capture was correctly collected and entered into SharePoint alongside my file.
You can also see that an email was sent to inform me that the processed claim contains information I should review.
We began this series by showing you how to integrate Power Automate with Adobe Sign to automate the collection of signatures. This post walked through using Microsoft’s AI Builder to extract data from handwritten forms, along with some limitations and things to look for when building this type of solution. The next post will cover a scenario where a legacy system is used to record the information captured from the claim form. That post will highlight using Robotic Process Automation to automate data entry when no web service or database connection is available, which is typically the case with older desktop applications. As always, if you’d like more information about AI or automating business processes, Anexinet is here to help. Please feel free to reach out to us with any questions.
SharePoint/Office 365 Architect