tokenizer_emoticons: A tokenizer for emoticons

Different functions to tokenize text.

# Code snippet
from mlxtend.text import tokenizer_emoticons  
from mlxtend.text import tokenizer_words_and_emoticons

Overview

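Different functions to tokenize text for natural language processing tasks, for example, building a bag-of-words model for text classification.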
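To make the bag-of-words use case concrete, here is a minimal sketch of how a tokenizer function could feed a count-based representation. The names `bag_of_words` and `whitespace_tokenizer` are hypothetical helpers introduced for illustration and are not part of mlxtend. The placeholder tokenizer simply lowercases and splits on whitespace.

```python
from collections import Counter


def bag_of_words(docs, tokenizer):
    """Return (vocabulary, count vectors) for a list of documents.

    `tokenizer` is any callable that maps a string to a list of tokens.
    """
    token_lists = [tokenizer(doc) for doc in docs]
    # Sorted vocabulary gives every token a stable column index.
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([counts[tok] for tok in vocabulary])
    return vocabulary, vectors


# Placeholder tokenizer (an assumption for this sketch, not mlxtend code):
# lowercase the text and split on whitespace.
def whitespace_tokenizer(text):
    return text.lower().split()


docs = ["Good movie :)", "Bad movie :("]
vocabulary, vectors = bag_of_words(docs, whitespace_tokenizer)
print(vocabulary)  # [':(', ':)', 'bad', 'good', 'movie']
print(vectors)     # [[0, 1, 0, 1, 1], [1, 0, 1, 0, 1]]
```

Because `bag_of_words` only expects a callable, a function such as `tokenizer_words_and_emoticons` could presumably be passed in place of the placeholder.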

Example 1 - Extracting emoticons

from mlxtend.text import tokenizer_emoticons

tokenizer_emoticons('</a>This :) is :( a test :-)!')

[':)', ':(', ':-)']
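The output above contains neither the `</a>` tag nor the trailing `!`. One way to get this behavior is to strip HTML-like tags first and then match emoticons with a regular expression. The sketch below is an approximation under that assumption, not necessarily how mlxtend implements it. The function name and both patterns are illustrative, and the emoticon pattern covers only a small set of western-style emoticons.

```python
import re

# Assumed patterns for this sketch (not taken from the mlxtend source):
TAG_RE = re.compile(r"<[^>]*>")             # HTML-like tags such as </a>
EMOTICON_RE = re.compile(r"[:;=]-?[()DP]")  # eyes, optional nose, mouth


def extract_emoticons_sketch(text):
    """Remove tags, then return all emoticon matches in order of appearance."""
    text = TAG_RE.sub("", text)
    return EMOTICON_RE.findall(text)


print(extract_emoticons_sketch("</a>This :) is :( a test :-)!"))
# [':)', ':(', ':-)']
```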

Example 2 - Extracting words and emoticons

from mlxtend.text import tokenizer_words_and_emoticons

tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')

['this', 'is', 'a', 'test', ':)', ':(', ':-)']
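In the result above, the words are lowercased and the emoticons come after all the words rather than in their original positions. The following sketch reproduces that ordering with plain regular expressions. It is a hypothetical re-implementation for illustration only (the function name and patterns are assumptions), and it may differ from mlxtend on edge cases.

```python
import re

EMOTICON_PATTERN = r"[:;=]-?[()DP]"  # assumed emoticon pattern for this sketch


def words_and_emoticons_sketch(text):
    """Return lowercase words followed by the emoticons found in the text."""
    text = re.sub(r"<[^>]*>", "", text)  # drop HTML-like tags
    emoticons = re.findall(EMOTICON_PATTERN, text)
    # Blank out emoticons first so letters such as the D in ':D'
    # are not picked up as separate words.
    without_emoticons = re.sub(EMOTICON_PATTERN, " ", text)
    words = re.findall(r"\w+", without_emoticons.lower())
    return words + emoticons


print(words_and_emoticons_sketch("</a>This :) is :( a test :-)!"))
# ['this', 'is', 'a', 'test', ':)', ':(', ':-)']
```

Appending the emoticons at the end keeps the function simple. For a bag-of-words model the token order does not matter, since only counts are used.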

API

tokenizer_emoticons(text)

Return emoticons from text

Examples

    >>> tokenizer_emoticons('</a>This :) is :( a test :-)!')
    [':)', ':(', ':-)']

    For usage examples, please see
    https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_emoticons/

tokenizer_words_and_emoticons(text)

Convert text to lowercase words and emoticons.

Examples

    >>> tokenizer_words_and_emoticons('</a>This :) is :( a test :-)!')
    ['this', 'is', 'a', 'test', ':)', ':(', ':-)']

    For more usage examples, please see
    https://rasbt.github.io/mlxtend/user_guide/text/tokenizer_words_and_emoticons/