The Visual NLP Blog

Accurate Table Extraction from Documents & Images with Spark OCR

by Mykola Melnyk

Extracting data formatted as a table (tabular data) is a common task — whether you’re analyzing financial statements, academic research papers, or clinical trial documentation. Table-based information varies heavily in...

Extract Tabular Data from PDF in Spark OCR

by Mykola Melnyk

Introduction to Table Extraction: The amount of data collected is increasing every day, with many applications, tools, and online platforms booming in the current digital age. To make sense of,...

Signature Detection in Spark OCR

by Mykola Melnyk

How to detect signatures in image-based documents: In document comprehension pipelines in the healthcare and financial domains, we sometimes need to detect the signature on a document or...

Text Detection in Spark OCR

by Mykola Melnyk

Motivation: Spark OCR already contains an ImageToText transformer for recognising text in images. It works well for documents in general, but needs custom preprocessing to recognise text contained on...

Table Detection and Extraction in Spark OCR

by Mykola Melnyk

Converting tables in scanned documents & images into structured data. Motivation: Extracting data formatted as a table is a common task, whether you’re analyzing financial statements, academic research papers,...


