The Spark NLP Blog

Scaling Up Text Analysis: Best Practices with Spark NLP n-gram Generation

by David Cecchini

Spark NLP offers a powerful Python library for scalable text analysis tasks, and its NGramGenerator annotator simplifies n-gram generation. By following best practices, including setting up Spark NLP, loading and...

Text cleaning: removing stopwords from text with Spark NLP

by Gursev Pirge

Stopword removal in natural language processing (NLP) is the process of eliminating words that occur frequently in a language but carry little or no meaning. Removing stopwords is useful...

Boost Your NLP Results with Spark NLP Stemming and Lemmatizing Techniques

by David Cecchini

Stemming and lemmatization are vital techniques in NLP for transforming words into their base or root forms. Spark NLP provides powerful capabilities for stemming and lemmatization, enabling researchers and practitioners...

Text Cleaning: Standard Text Normalization with Spark NLP

by Gursev Pirge

The Normalizer annotator in Spark NLP performs text normalization on data. It is used to remove all dirty characters from text following a regex pattern, convert text to lowercase, remove...

Unlock the Power of BERT-based Models for Advanced Text Classification in Python

by Gursev Pirge

Sequence classification with transformers refers to the task of predicting a label or category for a sequence of data, which can include text as well as other types of data...
