{"id":15309,"date":"2021-09-23T12:36:17","date_gmt":"2021-09-23T16:36:17","guid":{"rendered":"http:\/\/www.iri.com\/blog\/?p=15309"},"modified":"2024-09-23T13:46:28","modified_gmt":"2024-09-23T17:46:28","slug":"preprocessing-images-for-ocr-darkshield","status":"publish","type":"post","link":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/","title":{"rendered":"Preprocessing Images to Improve OCR &#038; DarkShield Results"},"content":{"rendered":"<p>Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize text in standalone or embedded images during PII searching and masking operations.<\/p>\n<p>OCR does have its limits however; for accurate results, it requires the image to be vertically aligned, sized properly, and as clear as possible. Not every image meets those requirements!<\/p>\n<p>We must therefore find and use methods to adjust these images to meet our needs through preprocessing. 
This article discusses a few preprocessing techniques, and how they can improve the quality of OCR output in a DarkShield data masking context.<\/p>\n<h6><strong>Pre-Process Method 1: Image Scaling<\/strong><\/h6>\n<p>Image scaling is the easiest method to understand and consists of enlarging or shrinking an image to the desired size.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-15314 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-1.jpg\" alt=\"\" width=\"500\" height=\"500\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-1.jpg 800w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-1-150x150.jpg 150w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-1-300x300.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-1-768x768.jpg 768w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><span style=\"font-weight: 400;\">(fig. 1) <\/span><i><span style=\"font-weight: 400;\">Original image of a medical scan.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">Figure 1 above shows an X-ray image that is 800&#215;800 pixels and Figure 2 below shows the Tesseract OCR output of the image. 
Tesseract missed the age and the date of birth:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15315 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-2.png\" alt=\"\" width=\"561\" height=\"215\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-2.png 561w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-2-300x115.png 300w\" sizes=\"(max-width: 561px) 100vw, 561px\" \/><span style=\"font-weight: 400;\">(fig. 2) <\/span><i><span style=\"font-weight: 400;\">Tesseract OCR text detected from original image.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">This can be resolved by increasing the size of the image, which in turn will enlarge the text enough for Tesseract to recognize it.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15313 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-3.png\" alt=\"\" width=\"600\" height=\"376\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-3.png 662w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-3-300x188.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><span style=\"font-weight: 400;\">(fig. 3) <\/span><i><span style=\"font-weight: 400;\">Code for scaling an image by a factor of 2 and saving the new image.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">By importing the Pillow library, we can increase the size of the image while also maintaining the same aspect ratio. 
Maintaining this ratio is important because if we were to set the image to a specific size of X by Y pixels, we could potentially warp the image and do more harm than good.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The code from figure 3 doubles the size of our X-ray image from 800&#215;800 pixels to 1600&#215;1600 pixels. Feeding the resized image into Tesseract results in more data being captured as seen in figure 4:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15317 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-4.png\" alt=\"\" width=\"610\" height=\"225\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-4.png 610w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-4-300x111.png 300w\" sizes=\"(max-width: 610px) 100vw, 610px\" \/><span style=\"font-weight: 400;\">(fig 4) <\/span><i><span style=\"font-weight: 400;\">Date of birth was able to be detected correctly by the Tesseract OCR engine after scaling the (800&#215;800) image by a factor of 2x.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">It is important to note however that increasing the image size too much has diminishing returns, as seen in figure 5 where the X-ray image was increased by a multiple of 3x.<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15318 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-5.png\" alt=\"\" width=\"609\" height=\"196\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-5.png 609w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-5-300x97.png 300w\" sizes=\"(max-width: 609px) 100vw, 609px\" \/><span 
style=\"font-weight: 400;\">(fig 5) <\/span><i><span style=\"font-weight: 400;\">Increasing the scale to 3x for a total resolution of (2400&#215;2400) pixels actually decreased the OCR accuracy.<\/span><\/i><\/p>\n<p><span style=\"font-weight: 400;\">The amount by which an image must be increased or decreased is highly dependent on the size of the text displayed in the image.<\/span><\/p>\n<h6><b>Pre-Process Method 2: Binarization<\/b><\/h6>\n<p><span style=\"font-weight: 400;\">Binarization is the method of converting a color image into an image that consists of only black and white pixels; i.e., where the black pixel value = 0 and the white pixel value = 255.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This can be done by applying a threshold: if the pixel value is greater than the threshold, it is considered a white pixel, else a black pixel. A fixed threshold is commonly set to 127 (about half the pixel range between 0 and 255), whereas an <\/span><i><span style=\"font-weight: 400;\">Otsu <\/span><\/i><span style=\"font-weight: 400;\">threshold (named after Nobuyuki Otsu) is computed automatically from the histogram of the image.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The Tesseract OCR engine applies Otsu binarization internally when processing images. Because of this, pre-processing with Otsu binarization is unnecessary. 
Moreover, binarization does not always produce desirable results, as seen here:<\/span><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15316 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-14.png\" alt=\"\" width=\"651\" height=\"422\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-14.png 700w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-14-300x195.png 300w\" sizes=\"(max-width: 651px) 100vw, 651px\" \/>(fig 14)<\/p>\n<p>For cases like this where the page background is of uneven darkness, using a different binarization algorithm such as <i>adaptive<\/i> thresholding may improve OCR results. In adaptive binarization, the threshold is not the same throughout the image; the threshold for each pixel is based on a small region around it instead of a global threshold for the image.<\/p>\n<p>Binarization can also be used in other ways, including preparing an image to be deskewed. Binarization helps separate the text from the background, so that the angle of that text can be detected more easily.<\/p>\n<h6><b>Pre-Process Method 3: Deskewing<\/b><b><br \/>\n<\/b><\/h6>\n<p>Running physical documents through a scanner can result in a digital image that is slightly tilted, or skewed. To get the most accurate OCR output, we want to square up, or vertically align, the input image as much as possible.<\/p>\n<p>To address the problem, we must calculate the angle by which the text is skewed, then counter-rotate that image. 
To do this, first we apply gray-scaling, blurring, and binarization to the image.<\/p>\n<p style=\"text-align: center;\">\u00a0<img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15337 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-6-7.png\" alt=\"\" width=\"637\" height=\"172\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-6-7.png 637w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-6-7-300x81.png 300w\" sizes=\"(max-width: 637px) 100vw, 637px\" \/>(fig 6,7)<\/p>\n<p>The original image above has a white background and black text, but after being prepped the image background is inverted and slightly blurred (temporarily to calculate where the text is).<\/p>\n<p>Next, to find the location of the text we dilate the white pixels to merge the text into blocks so we can find the largest block and apply a bounding box around it.<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15320 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-8.png\" alt=\"\" width=\"650\" height=\"139\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-8.png 949w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-8-300x64.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-8-768x164.png 768w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/>(fig 8) <i>Otsu binarization applied to help detect where the text is located.<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15321 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-9.jpg\" alt=\"\" width=\"499\" height=\"408\" 
srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-9.jpg 714w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-9-300x245.jpg 300w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/>(fig 9)<\/p>\n<p>There are many methods to calculate the skew angle of the text, but the easiest is to take the largest text block and use its box angle. The final step is to just determine the angle of the box and then correct for it as seen in the following figures:<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15322 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-10-1024x274.png\" alt=\"\" width=\"751\" height=\"201\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-10-1024x274.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-10-300x80.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-10-768x206.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-10.png 1061w\" sizes=\"(max-width: 751px) 100vw, 751px\" \/>(fig10) <i>Determine the angle of the skew.<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15319 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-11-1024x208.png\" alt=\"\" width=\"748\" height=\"152\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-11-1024x208.png 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-11-300x61.png 300w, 
https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-11-768x156.png 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-11.png 1194w\" sizes=\"(max-width: 748px) 100vw, 748px\" \/>(fig 11) <i>Rotate the image by the skew angle that was detected<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15323 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-12.png\" alt=\"\" width=\"500\" height=\"409\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-12.png 714w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-12-300x245.png 300w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/>(fig 12)<\/p>\n<p>As you can see in Figure 12, the image is now de-skewed and we can now feed it through Tesseract to perform OCR:<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-15324 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-13.png\" alt=\"\" width=\"592\" height=\"267\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-13.png 592w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-fig-13-300x135.png 300w\" sizes=\"(max-width: 592px) 100vw, 592px\" \/>(fig 13) <i>Tesseract OCR accuracy was improved after correcting the skew in the image.<\/i><\/p>\n<p>Figure 13 shows that the skewed image did not produce any text in Tesseract, but the \u201cfixed\u201d version of the image captured it all perfectly. 
It should be noted that for this image, Tesseract can detect text with 100% accuracy as long as the skew angle is within +\/- 2 degrees.<\/p>\n<p>Automation of this process is possible if the angle is tested on each image. If the angle of the text in the image is not a multiple of 90 degrees (that is, dividing it by 90 leaves a remainder), then the image is skewed. If the skew angle is greater than or equal to a specified threshold for triggering deskewing, then deskewing is performed.<\/p>\n<p>If you would like to learn more about how the deskewing process works, click <a href=\"https:\/\/becominghuman.ai\/how-to-automatically-deskew-straighten-a-text-image-using-opencv-a0c30aed83df\">here<\/a> to find the original source code and the breakdown.<\/p>\n<h6><b>Using Image Preprocessing Pipelines with the DarkShield API<\/b><\/h6>\n<p><a href=\"https:\/\/github.com\/TeamIRI\/darkshield-api-demos\/tree\/master\/image-preprocessing\">This demo<\/a> on GitHub demonstrates how to integrate the preprocessing methods discussed in this article with the <a href=\"https:\/\/www.iri.com\/blog\/data-protection\/darkshield-files-rpc-api\/\">DarkShield API for (image) files<\/a>.<\/p>\n<p>The demo program accepts either a single image or a folder of images to preprocess. Each image is first preprocessed, sent to the DarkShield-Files API for searching and masking, then post-processed to restore it to its original form.<\/p>\n<p>After post-processing, the image still retains any black boxes that were placed by the DarkShield-Files API to mask sensitive data. Note that if adaptive binarization is among the preprocessing methods applied, it is not possible to restore the image to its original coloring.<\/p>\n<p>The images are saved in a directory named <i>masked<\/i> that will be automatically created if it does not exist when running the demo. 
Additional arguments may be specified to the program that determine if a pipeline should be used.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15327 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-diagram.png\" alt=\"\" width=\"601\" height=\"440\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-diagram.png 983w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-diagram-300x220.png 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-diagram-768x563.png 768w\" sizes=\"(max-width: 601px) 100vw, 601px\" \/><\/p>\n<p>The default is to only rescale images if they are smaller than 1600 pixels in width, and only deskew images if the skew angle is greater than 5 degrees. These specific values can be modified by specifying command line arguments to the program <i>main.py<\/i>. The width in pixels can be specified with a <i>-w<\/i> flag, and the angle in degrees can be specified with a <i>-a<\/i> flag.<\/p>\n<p>The <i>-b <\/i>flag specifies whether adaptive thresholding should be used among the pipelines. Even if <i>-b <\/i>\u00a0is specified as <i>true<\/i>, a calculation will still be performed to determine if the image has uneven brightness.<\/p>\n<p>If the image does not have uneven brightness, this preprocessing method will not be applied as it would have no benefit to OCR accuracy. The <i>-t<\/i> argument specifies an integer between 0 and 100 that should be used for adaptive thresholding (the default value is 25).<\/p>\n<p>The demo includes a folder with a few example pictures. 
One picture is skewed, one has uneven lighting, and one is in need of resizing to improve the text that is recognized by OCR.<\/p>\n<p>The results of running the three images in the folder through the demo program are shown below:<\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15329 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\" alt=\"\" width=\"499\" height=\"408\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png 714w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result-300x245.png 300w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/><i>Final result of a skewed image that was preprocessed, sent to the DarkShield API, and post-processed.<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15331 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-xray.jpg\" alt=\"\" width=\"500\" height=\"500\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-xray.jpg 800w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-xray-150x150.jpg 150w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-xray-300x300.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-xray-768x768.jpg 768w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><i>Final result of an image in need of scaling that was sent through a pipeline program that pre-processes, sends to the DarkShield-Files API, then post processes. 
The preprocessing helped DarkShield correctly find (match, and mask) the date of birth.<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15333 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-original-1024x681.jpg\" alt=\"\" width=\"650\" height=\"432\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-original-1024x681.jpg 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-original-300x200.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-original-768x511.jpg 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-original.jpg 1110w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><i>Original image with uneven lighting.<\/i><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15332 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-masked-1024x681.jpg\" alt=\"\" width=\"650\" height=\"432\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-masked-1024x681.jpg 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-masked-300x200.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-masked-768x511.jpg 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-masked.jpg 1110w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><i>Result of sending original image to DarkShield API without any preprocessing.<\/i><\/p>\n<p>Without preprocessing, many of the entries specified as sensitive are found and redacted. 
However, there are several entries that should have matched that did not due to inaccuracies in text detection by the OCR engine. These include <i>Davi<\/i> in the Surname, <i>Longwood University, Farmville, VA, <\/i>and <i>Marta<\/i> near the bottom of the image.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-15330 aligncenter\" src=\"\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-processed-1024x681.jpg\" alt=\"\" width=\"650\" height=\"432\" srcset=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-processed-1024x681.jpg 1024w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-processed-300x200.jpg 300w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-processed-768x511.jpg 768w, https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-visa-processed.jpg 1110w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/p>\n<p>However, when the image is preprocessed with adaptive thresholding binarization (and using the same search matchers and Tesseract OCR model), these entries are now matched correctly and redacted. Unlike other preprocessing methods, adaptive thresholding binarization cannot be reverted with post processing.<\/p>\n<p>Any of these image preprocessing methods can be modified or expanded upon to meet the specific needs of different images. Due to the diverse content that can be in an image, there is not a single preprocessing pipeline that can be the most effective for every possible image.<\/p>\n<p>DarkShield allows for versatility in how image preprocessing can be applied prior to sending images to its API. 
Any programming language capable of interacting with the DarkShield API through HTTP requests can be used.<\/p>\n<p>In a future DarkShield API release, it is possible that additional image preprocessing methods will be baked into the API itself, eliminating the need to incorporate them into \u2018glue code\u2019. For example, image rotation and deskewing are already automatically handled by the DarkShield API.<\/p>\n<h6><b>Alternative Solutions for OCR Inaccuracy<\/b><\/h6>\n<p>Also consider that even in ideal conditions, OCR is not guaranteed to be 100 percent accurate. Because of this, it may be worth erring on the side of caution when setting up search matchers.<\/p>\n<p>Fuzzy set search matchers can be set up so that entries are still matched even if OCR misses, adds, or misinterprets a specified number of letters. Regular expression patterns can be modified to adjust for potential errors in the text identified by OCR.<\/p>\n<p>On the very cautious side, the (.*) pattern can be specified to match all text identified by OCR, but that may redact parts of the image that should not be redacted. Generally, most OCR inaccuracy lies in detecting <i>what<\/i> the letters of the text are, not <i>where<\/i> the text is.<\/p>\n<h6><b>Other Solutions Being Considered for Development<\/b><\/h6>\n<p>Finally, search matching methods such as fuzzy regular expression matching and set elimination are being considered as additional options. These are specifically designed to address the issue of some letters of words being incorrectly identified by OCR.<\/p>\n<p>In fuzzy regular expression matching, a threshold percentage can be specified. In typical regular expressions, the result is binary: there is either a match or not a match. 
With fuzzy regular expressions, if an 80% threshold is specified, and a six-letter word is off by one letter from matching the regular expression, the word will still be matched (as 83% is &gt; the 80% threshold).<\/p>\n<p>Set elimination is the inverse of the existing set matcher search method. In set elimination, a text file containing a list of entries separated by newlines can be given, and text that does <i>not<\/i> match any of these words will be matched.<\/p>\n<p>For example, a set containing all English (or other language) words found in a dictionary can be given. In this case, all entities, SSN, or other numbers will be matched, but any words that would be found in a dictionary of the language will not be matched. This search method errs on the side of caution, making it a good fit for the imperfections of OCR.<\/p>\n<p>If you have any questions or feedback around this article, please contact <a href=\"mailto:darkshield@iri.com\">darkshield@iri.com<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize text in standalone or embedded images during PII searching and masking operations. 
OCR does have its limits however; for accurate results, it requires the image to be vertically aligned, sized properly, and<\/p>\n<div><a class=\"btn-filled btn\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\" title=\"Preprocessing Images to Improve OCR &#038; DarkShield Results\">Read More<\/a><\/div>\n","protected":false},"author":158,"featured_media":15329,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[8],"tags":[1626,1494,1493,1625,1388,1624,1306],"class_list":["post-15309","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-protection","tag-binarization","tag-darkshield-api","tag-image-masking","tag-image-preprocessing","tag-iri-darkshield","tag-ocr","tag-pii-masking"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.4 (Yoast SEO v23.4) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Preprocessing Images to Improve OCR &amp; DarkShield Results - IRI<\/title>\n<meta name=\"description\" content=\"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. 
OCR is used by IRI DarkShield software to recognize\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Preprocessing Images to Improve OCR &amp; DarkShield Results\" \/>\n<meta property=\"og:description\" content=\"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\" \/>\n<meta property=\"og:site_name\" content=\"IRI\" \/>\n<meta property=\"article:published_time\" content=\"2021-09-23T16:36:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-09-23T17:46:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\" \/>\n\t<meta property=\"og:image:width\" content=\"714\" \/>\n\t<meta property=\"og:image:height\" content=\"584\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Seth Lefferts and Devon Kozenieski\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Seth Lefferts and Devon Kozenieski\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\"},\"author\":{\"name\":\"Seth Lefferts and Devon Kozenieski\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de\"},\"headline\":\"Preprocessing Images to Improve OCR &#038; DarkShield Results\",\"datePublished\":\"2021-09-23T16:36:17+00:00\",\"dateModified\":\"2024-09-23T17:46:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\"},\"wordCount\":2002,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\",\"keywords\":[\"binarization\",\"Darkshield API\",\"image masking\",\"image preprocessing\",\"IRI DarkShield\",\"OCR\",\"pii masking\"],\"articleSection\":[\"Data Masking\/Protection\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\",\"url\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\",\"name\":\"Preprocessing Images to Improve OCR & DarkShield Results - 
IRI\",\"isPartOf\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\",\"datePublished\":\"2021-09-23T16:36:17+00:00\",\"dateModified\":\"2024-09-23T17:46:28+00:00\",\"description\":\"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize\",\"breadcrumb\":{\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png\",\"width\":714,\"height\":584},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.iri.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Preprocessing Images to Improve OCR &#038; DarkShield 
Results\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.iri.com\/blog\/#website\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"name\":\"IRI\",\"description\":\"Total Data Management Blog\",\"publisher\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.iri.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.iri.com\/blog\/#organization\",\"name\":\"IRI\",\"url\":\"https:\/\/www.iri.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"contentUrl\":\"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png\",\"width\":750,\"height\":206,\"caption\":\"IRI\"},\"image\":{\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/\"}},[{\"@type\":[\"Person\"],\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de\",\"name\":\"Seth Lefferts\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"inLanguage\":\"en_US\",\"url\":\"\",\"caption\":\"Seth Lefferts\"}},{\"@type\":[\"Person\"],\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de\",\"name\":\"Devon Kozenieski\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/\",\"inLanguage\":\"en_US\",\"url\":\"\",\"caption\":\"Devon Kozenieski\"}}]]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Preprocessing Images to Improve OCR & DarkShield Results - IRI","description":"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/","og_locale":"en_US","og_type":"article","og_title":"Preprocessing Images to Improve OCR & DarkShield Results","og_description":"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize","og_url":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/","og_site_name":"IRI","article_published_time":"2021-09-23T16:36:17+00:00","article_modified_time":"2024-09-23T17:46:28+00:00","og_image":[{"width":714,"height":584,"url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","type":"image\/png"}],"author":"Seth Lefferts and Devon Kozenieski","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Seth Lefferts and Devon Kozenieski","Est. 
reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#article","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/"},"author":{"name":"Seth Lefferts and Devon Kozenieski","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de"},"headline":"Preprocessing Images to Improve OCR &#038; DarkShield Results","datePublished":"2021-09-23T16:36:17+00:00","dateModified":"2024-09-23T17:46:28+00:00","mainEntityOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/"},"wordCount":2002,"commentCount":0,"publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","keywords":["binarization","Darkshield API","image masking","image preprocessing","IRI DarkShield","OCR","pii masking"],"articleSection":["Data Masking\/Protection"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/","url":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/","name":"Preprocessing Images to Improve OCR & DarkShield Results - 
IRI","isPartOf":{"@id":"https:\/\/www.iri.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage"},"image":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage"},"thumbnailUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","datePublished":"2021-09-23T16:36:17+00:00","dateModified":"2024-09-23T17:46:28+00:00","description":"Optical Character Recognition (OCR) software is technology that recognizes text within a digital image. OCR is used by IRI DarkShield software to recognize","breadcrumb":{"@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#primaryimage","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","width":714,"height":584},{"@type":"BreadcrumbList","@id":"https:\/\/www.iri.com\/blog\/data-protection\/preprocessing-images-for-ocr-darkshield\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.iri.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Preprocessing Images to Improve OCR &#038; DarkShield Results"}]},{"@type":"WebSite","@id":"https:\/\/www.iri.com\/blog\/#website","url":"https:\/\/www.iri.com\/blog\/","name":"IRI","description":"Total Data Management 
Blog","publisher":{"@id":"https:\/\/www.iri.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.iri.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.iri.com\/blog\/#organization","name":"IRI","url":"https:\/\/www.iri.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","contentUrl":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2019\/02\/iri-logo-total-data-management-small-1.png","width":750,"height":206,"caption":"IRI"},"image":{"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/logo\/image\/"}},[{"@type":["Person"],"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de","name":"Seth Lefferts","image":{"@type":"ImageObject","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","inLanguage":"en_US","url":"","caption":"Seth Lefferts"}},{"@type":["Person"],"@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/8f4848015a21aa0cc36cf6d36800c2de","name":"Devon Kozenieski","image":{"@type":"ImageObject","@id":"https:\/\/www.iri.com\/blog\/#\/schema\/person\/image\/","inLanguage":"en_US","url":"","caption":"Devon 
Kozenieski"}}]]}},"jetpack_featured_media_url":"https:\/\/www.iri.com\/blog\/wp-content\/uploads\/2021\/12\/darkshield-ocr-image-preprocessing-final-skewed-result.png","_links":{"self":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15309"}],"collection":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/users\/158"}],"replies":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/comments?post=15309"}],"version-history":[{"count":7,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15309\/revisions"}],"predecessor-version":[{"id":15338,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/posts\/15309\/revisions\/15338"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media\/15329"}],"wp:attachment":[{"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/media?parent=15309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/categories?post=15309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.iri.com\/blog\/wp-json\/wp\/v2\/tags?post=15309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}