
    Keyword-based search for text, images, and PDFs in the NLP Lab 

    Ph.D. in Computer Science – Head of Product

    Searching for specific information within long text documents, PDFs, or images lets users locate what they need without manually scrolling through the entire document, which saves time. PDFs in particular often contain large amounts of information and can be difficult to navigate, so search functionality helps users find what they are looking for. If the goal is to extract data from PDFs, the Visual NLP tool is suitable.

    Task Search by Text, Label, and Choice

    NLP Lab offers advanced search features that help users find tasks based on their text or on the annotations defined so far. The currently supported search queries are:

    • text: patient -> returns all tasks which contain the string “patient”;
    • label: ABC -> returns all tasks that have at least one completion containing a chunk with label ABC;
    • label: ABC=DEF -> returns all tasks that have at least one completion containing the text DEF labeled as ABC;
    • choice: Sport -> returns all tasks that have at least one completion which classified the task as Sport;
    • choice: Sport, Politics -> returns all tasks that have at least one completion containing both choices, Sport and Politics.

    Search is case-insensitive, so the queries label: ABC=DEF, label: Abc=Def, and label: abc=def are considered equivalent.
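
    The query forms above can be illustrated with a small Python sketch. This is not NLP Lab's implementation: the names SearchQuery, parse_query, and matches, as well as the task dictionary layout (a "text" field and a list of "completions" with "labels" and "choices"), are hypothetical and chosen only for illustration. The sketch also assumes that label: ABC=DEF requires the labeled chunk text to equal DEF exactly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchQuery:
    kind: str                         # "text", "label", or "choice"
    values: List[str]                 # search string, label name, or choice names
    chunk_text: Optional[str] = None  # only set for "label: ABC=DEF"


def parse_query(query: str) -> SearchQuery:
    """Parse 'text: ...', 'label: ...' or 'choice: ...' (case-insensitive)."""
    kind, sep, rest = query.partition(":")
    kind, rest = kind.strip().lower(), rest.strip()
    if not sep or kind not in {"text", "label", "choice"} or not rest:
        raise ValueError(f"unsupported query: {query!r}")
    if kind == "label":
        name, eq, chunk = rest.partition("=")
        return SearchQuery("label", [name.strip().lower()],
                           chunk.strip().lower() if eq else None)
    if kind == "choice":
        return SearchQuery("choice",
                           [c.strip().lower() for c in rest.split(",") if c.strip()])
    return SearchQuery("text", [rest.lower()])


def matches(task: dict, q: SearchQuery) -> bool:
    """Check a task (hypothetical dict layout) against a parsed query."""
    if q.kind == "text":
        return q.values[0] in task["text"].lower()
    for completion in task.get("completions", []):
        if q.kind == "label":
            for label, chunk in completion.get("labels", []):
                if label.lower() == q.values[0] and (
                    q.chunk_text is None or chunk.lower() == q.chunk_text
                ):
                    return True
        else:
            # Every requested choice must be present in the same completion.
            chosen = {c.lower() for c in completion.get("choices", [])}
            if set(q.values) <= chosen:
                return True
    return False


# Hypothetical task used only to exercise the sketch.
task = {
    "text": "The patient was discharged.",
    "completions": [{"labels": [("ABC", "DEF")], "choices": ["Sport", "Politics"]}],
}
print(matches(task, parse_query("label: abc=def")))
print(matches(task, parse_query("choice: Sport, Politics")))
```

    Lower-casing both the query and the task data at comparison time is one simple way to obtain the case-insensitive behavior described above.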

    Keyword-based Search at Task Level

    NLP Lab supports task-level keyword-based searches. The keyword-based search feature works for text and Visual NER projects alike.

    • The search covers all pages of a paginated task.
    • Users can navigate between search results, even when a result is located on another page.

    Important

    In the NLP Annotation Lab, the search feature was implemented through an HTML tag added to the Visual NER project configuration. In the NLP Lab, now that task-level search is available, that search tag should be removed from existing Visual NER projects.

    Config to be removed from all existing Visual NER projects:

    <Search name="search" toName="image" placeholder="Search"/>
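
    For projects where the configuration is available as an XML string, the removal could be scripted roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name remove_search_tag and the surrounding sample configuration (the View, Image, Labels, and Label elements) are assumptions made for illustration, and only the Search tag itself is taken from this post. In practice the tag can simply be deleted by hand in the project configuration editor.

```python
import xml.etree.ElementTree as ET


def remove_search_tag(config_xml: str) -> str:
    """Return the configuration with every <Search .../> element removed."""
    root = ET.fromstring(config_xml)
    # Snapshot the elements first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "Search":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical Visual NER configuration; only the <Search> line comes from the post.
sample_config = """
<View>
  <Image name="image" value="$image"/>
  <Labels name="label" toName="image">
    <Label value="ABC"/>
  </Labels>
  <Search name="search" toName="image" placeholder="Search"/>
</View>
"""

cleaned = remove_search_tag(sample_config)
print(cleaned)
```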

    Keyword-based search in text tasks.

    Keyword-based search in PDF/image tasks.

    Chunk-based Search in Visual NER Tasks

    In previous versions, users could only run token-based searches at page level. The search feature did not support searching a collection of tokens as a single chunk. With this release, users can find a chunk of tokens in the Visual NER task.
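
    The difference between token-based and chunk-based search can be sketched in a few lines of Python. This illustrates the general idea only, not how NLP Lab implements it: the function find_chunk and the sample token list are hypothetical, and the sketch assumes that a page is available as a flat list of OCR tokens and that matching is case-insensitive.

```python
from typing import List


def find_chunk(tokens: List[str], chunk: str) -> List[int]:
    """Return start indices where the words of `chunk` occur as consecutive tokens."""
    words = chunk.lower().split()
    lowered = [t.lower() for t in tokens]
    n = len(words)
    if n == 0:
        return []
    # Slide a window of n tokens over the page and compare it to the query words.
    return [i for i in range(len(lowered) - n + 1) if lowered[i:i + n] == words]


# Hypothetical OCR tokens for one page.
page_tokens = ["Patient", "has", "chest", "pain", "and", "mild", "chest", "pain"]
print(find_chunk(page_tokens, "chest pain"))  # [2, 6]
```

    A token-based search would look for "chest" and "pain" independently, whereas the chunk-based version only reports positions where the two tokens appear next to each other in order.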

    Getting Started is Easy

    The NLP Lab is a free tool that can be deployed in a couple of clicks on the AWS and Azure Marketplaces, or installed on-premise with a one-line Kubernetes script. Get started here: https://nlp.johnsnowlabs.com/docs/en/alab/install

    Our additional expert:
    Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then changed her interest to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has a rich experience working with different annotation tools and leading document classification and NER extraction projects in verticals such as Finance, Investment, Banking, and Healthcare.
