Metadata-Version: 2.1
Name: snowballstemmer
Version: 2.2.0
Summary: This package provides 29 stemmers for 28 languages generated from Snowball algorithms.
Home-page: https://github.com/snowballstem/snowball
Author: Snowball Developers
Author-email: snowball-discuss@lists.tartarus.org
License: BSD-3-Clause
Description: Snowball stemming library collection for Python
        ===============================================
        
        Python 3 (>= 3.3) is supported.  We no longer actively support Python 2 as
        the Python developers stopped supporting it at the start of 2020.  Snowball
        2.1.0 was the last release to officially support Python 2.
        
        What is Stemming?
        -----------------
        
        Stemming maps different forms of the same word to a common "stem" - for
        example, the English stemmer maps *connection*, *connections*, *connective*,
        *connected*, and *connecting* to *connect*.  So a search for *connected*
        would also find documents which only have the other forms.
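        
        As a minimal sketch of that conflation (the variable names here are
        illustrative, not part of the library), the five forms above can be passed to
        the English stemmer:
        
        .. code-block:: python
        
           import snowballstemmer
        
           # The word forms listed in the paragraph above.
           forms = ["connection", "connections", "connective", "connected", "connecting"]
        
           stemmer = snowballstemmer.stemmer("english")
           stems = stemmer.stemWords(forms)
        
           # Every form should map to the same stem, so the set has one element.
           print(set(stems))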
        
        This stem form is often a word itself, but this is not always the case as this
        is not a requirement for text search systems, which are the intended field of
        use.  We also aim to conflate words with the same meaning, rather than all
        words with a common linguistic root (so *awe* and *awful* don't have the same
        stem), and over-stemming is more problematic than under-stemming so we tend not
        to stem in cases that are hard to resolve.  If you want to always reduce words
        to a root form and/or get a root form which is itself a word then Snowball's
        stemming algorithms likely aren't the right answer.
        
        How to use the library
        ----------------------
        
        The ``snowballstemmer`` module has two functions.
        
        The ``snowballstemmer.algorithms`` function returns a list of available
        algorithm names.
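        
        A short sketch of how the returned list might be used, for instance to check
        that a language is available before asking for its stemmer (``wanted`` and
        ``available`` are hypothetical names chosen for this example):
        
        .. code-block:: python
        
           import snowballstemmer
        
           names = snowballstemmer.algorithms()
           print(sorted(names))
        
           # Hypothetical check: only proceed if the algorithm we want is listed.
           wanted = "english"
           available = wanted in names
           print(wanted, "available:", available)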
        
        The ``snowballstemmer.stemmer`` function takes an algorithm name and returns a
        ``Stemmer`` object.
        
        ``Stemmer`` objects have a ``Stemmer.stemWord(word)`` method, which stems a
        single word, and a ``Stemmer.stemWords(words)`` method, which stems a list of
        words.
        
        .. code-block:: python
        
           import snowballstemmer
        
           stemmer = snowballstemmer.stemmer('english')
           print(stemmer.stemWords("We are the world".split()))
        
        Automatic Acceleration
        ----------------------
        
        `PyStemmer <https://pypi.org/project/PyStemmer/>`_ is a wrapper module for
        Snowball's ``libstemmer_c`` and should provide results 100% compatible with
        **snowballstemmer**.
        
        **PyStemmer** is faster because it wraps generated C versions of the stemmers;
        **snowballstemmer** uses generated Python code and is slower but offers a pure
        Python solution.
        
        If PyStemmer is installed, ``snowballstemmer.stemmer`` returns a ``PyStemmer``
        ``Stemmer`` object which provides the same ``Stemmer.stemWord()`` and
        ``Stemmer.stemWords()`` methods.
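        
        One way to see which implementation is in use is to look at the module that
        defines the returned object's class.  This is only a sketch: it assumes that
        PyStemmer is imported under the module name ``Stemmer``, and the names
        ``has_pystemmer`` and ``backend`` are made up for the example.
        
        .. code-block:: python
        
           import importlib.util
        
           import snowballstemmer
        
           # Assumption: PyStemmer installs a top-level module called "Stemmer".
           has_pystemmer = importlib.util.find_spec("Stemmer") is not None
        
           stemmer = snowballstemmer.stemmer("english")
           backend = type(stemmer).__module__
        
           print("PyStemmer importable:", has_pystemmer)
           print("stemmer class defined in:", backend)
        
           # Either implementation offers the same method.
           print(stemmer.stemWord("connected"))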
        
        Benchmark
        ~~~~~~~~~
        
        This is a crude benchmark which measures the time for running each stemmer on
        every word in its sample vocabulary (10,787,583 words over 26 languages).  It's
        not a realistic test of normal use as a real application would do much more
        than just stemming.  It's also skewed towards the stemmers which do more work
        per word and towards those with larger sample vocabularies.
        
        * Python 2.7 + **snowballstemmer** : 13m00s (15.0 * PyStemmer)
        * Python 3.7 + **snowballstemmer** : 12m19s (14.2 * PyStemmer)
        * PyPy 7.1.1 (Python 2.7.13) + **snowballstemmer** : 2m14s (2.6 * PyStemmer)
        * PyPy 7.1.1 (Python 3.6.1) + **snowballstemmer** : 1m46s (2.0 * PyStemmer)
        * Python 2.7 + **PyStemmer** : 52s
        
        For reference, the equivalent test for C runs in 9 seconds.
        
        These results are for Snowball 2.0.0.  They're likely to evolve over time as
        the code Snowball generates for both Python and C continues to improve (for
        a much older test over a different set of stemmers using Python 2.7,
        **snowballstemmer** was 30 times slower than **PyStemmer**, or 9 times slower
        with **PyPy**).
        
        The message to take away is that if you're stemming a lot of words you should
        either install **PyStemmer** (which **snowballstemmer** will then automatically
        use for you as described above) or use PyPy.
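        
        To get a rough feel for the cost on your own data, a timing loop along these
        lines can be run under each interpreter, or with and without PyStemmer
        installed.  It is a sketch rather than the benchmark used above: the word list
        and repeat count are arbitrary choices for the example.
        
        .. code-block:: python
        
           import time
        
           import snowballstemmer
        
           # Arbitrary sample input; substitute words from your own corpus.
           sample = ["connection", "connections", "connective", "connected", "connecting"]
           words = sample * 2000
        
           stemmer = snowballstemmer.stemmer("english")
        
           # Time a single batch call over the whole list.
           start = time.perf_counter()
           stems = stemmer.stemWords(words)
           elapsed = time.perf_counter() - start
        
           print("stemmed {} words in {:.3f}s".format(len(words), elapsed))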
        
        The TestApp example
        -------------------
        
        The ``testapp.py`` example program allows you to run any of the stemmers
        on a sample vocabulary.
        
        Usage::
        
           testapp.py <algorithm> "sentences ... "
        
        .. code-block:: bash
        
           $ python testapp.py English "sentences... "
        
Keywords: stemmer
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: Arabic
Classifier: Natural Language :: Basque
Classifier: Natural Language :: Catalan
Classifier: Natural Language :: Danish
Classifier: Natural Language :: Dutch
Classifier: Natural Language :: English
Classifier: Natural Language :: Finnish
Classifier: Natural Language :: French
Classifier: Natural Language :: German
Classifier: Natural Language :: Greek
Classifier: Natural Language :: Hindi
Classifier: Natural Language :: Hungarian
Classifier: Natural Language :: Indonesian
Classifier: Natural Language :: Irish
Classifier: Natural Language :: Italian
Classifier: Natural Language :: Lithuanian
Classifier: Natural Language :: Nepali
Classifier: Natural Language :: Norwegian
Classifier: Natural Language :: Portuguese
Classifier: Natural Language :: Romanian
Classifier: Natural Language :: Russian
Classifier: Natural Language :: Serbian
Classifier: Natural Language :: Spanish
Classifier: Natural Language :: Swedish
Classifier: Natural Language :: Tamil
Classifier: Natural Language :: Turkish
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Text Processing :: Indexing
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/x-rst
