Directly Extract Part Of Speech (POS) Information from MeCab

version 1.0.0 (20.8 KB) by Toru Ikegami
Wrapper functions that read the detailed part-of-speech (POS) classification directly from the output of the MeCab morphological analyzer.

38 Downloads

Updated 07 Jul 2020

The functions in this repository are wrappers around `tokenizedDocument` and `tokenDetails` that expose the part-of-speech (POS) output of the MeCab tokenizer directly.

MATLAB Text Analytics Toolbox uses MeCab as its Japanese morphological analyzer, but it consolidates MeCab's fine-grained POS output (69 kinds) into 15 commonly used POS categories. It seemed a waste not to be able to use MeCab's detailed POS information when selecting words by part of speech, so I wrote these simple functions.

tokenizedDocumentJP.m
The tokenizer. Use it the same way as the standard `tokenizedDocument`. Note that you cannot lemmatize the `tokenizedDocument` object it returns with the function `normalizeWords`, because the function replaces the lemma information of the `tokenizedDocument` with the detailed POS information that MeCab generates. In addition, if you set the `LemmaExtractor` property of the `mecabOptions` object passed to `tokenizedDocumentJP`, that setting is ignored. Both limitations arise because the function internally overrides the function used as `LemmaExtractor`.

tokenDetailsJP.m
Extracts the token details from a `tokenizedDocument` into a table. Use it the same way as the standard `tokenDetails`.
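
The description above can be sketched as a short side-by-side comparison. This is only a minimal illustration, assuming Text Analytics Toolbox is installed, that `tokenizedDocumentJP.m` and `tokenDetailsJP.m` are on the MATLAB path, and that both wrappers accept the same basic arguments as their standard counterparts, as the description states. The sample sentence is made up for the example, and the exact columns of the table returned by `tokenDetailsJP` are not documented here, so the result is simply displayed.

```matlab
% Sample Japanese sentence (hypothetical input for illustration)
str = "吾輩は猫である。";

% Standard toolbox path: POS tags are consolidated into the toolbox categories
docStd = tokenizedDocument(str);
tdStd  = tokenDetails(docStd);

% Wrappers from this submission, assumed to be called like the standard functions
docJP = tokenizedDocumentJP(str);
tdJP  = tokenDetailsJP(docJP);

% Show both tables to compare the POS information
disp(tdStd)
disp(tdJP)
```

Because `tokenizedDocumentJP` reuses the lemma slot for the detailed POS information, `normalizeWords` should not be applied to `docJP`. If lemmas are needed as well, tokenizing the same text a second time with the standard `tokenizedDocument` is one possible workaround.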

Cite As

Toru Ikegami (2021). Directly Extract Part Of Speech (POS) Information from MeCab (https://www.mathworks.com/matlabcentral/fileexchange/77870-directly-extract-part-of-speech-pos-information-from-mecab), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2020a
Compatible with any release
Platform Compatibility
Windows macOS Linux
