Trying Morphological Analysis in C# with Kuromoji

Hello, this is Manabe from beaglesoft.

I wanted to do some morphological analysis in C#, so I tried out Lucene.Net.Analysis.Kuromoji.

www.nuget.org

The last time I did morphological analysis in C#, I used NMeCab.

www.nuget.org

It served me very well, but unfortunately it does not support dotnet core. So I went looking for a dotnet port of Kuromoji, which I had used before in Java, and ended up at Lucene.Net.Analysis.Kuromoji.

Getting it running

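Using the library's test code as a reference, I put together something that runs. I experimented with a few approaches; here I load the dictionary file that ships with the samples and confirm that tokens come back according to its entries.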
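The post does not include the program itself, so the following is only a minimal sketch of what such a test might look like. It assumes the Lucene.Net.Analysis.Kuromoji 4.8 package, whose `JapaneseTokenizer` and `UserDictionary` constructors are written here from memory and should be checked against the installed version. The inline dictionary line stands in for the sample dictionary file and is a hypothetical entry chosen to match the tokens in the output.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Ja;
using Lucene.Net.Analysis.Ja.Dict;
using Lucene.Net.Analysis.Ja.TokenAttributes;
using Lucene.Net.Analysis.TokenAttributes;

class Program
{
    static void Main()
    {
        // Assumed user dictionary entry (hypothetical stand-in for the sample file):
        // surface form, space-separated segmentation, space-separated readings, part of speech.
        const string userDictText =
            "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,テスト名詞";
        const string text = "関西国際空港";

        // Build the user dictionary from a reader; a StreamReader over a file works the same way.
        UserDictionary userDict;
        using (var dictReader = new StringReader(userDictText))
        {
            userDict = new UserDictionary(dictReader);
        }

        Console.WriteLine($"Target string: {text}");

        // Assumed constructor: (input, userDictionary, discardPunctuation, mode).
        using (var tokenizer = new JapaneseTokenizer(
            new StringReader(text), userDict, true, JapaneseTokenizerMode.NORMAL))
        {
            // Attributes must be requested before the stream is consumed.
            var term = tokenizer.AddAttribute<ICharTermAttribute>();
            var offset = tokenizer.AddAttribute<IOffsetAttribute>();
            var pos = tokenizer.AddAttribute<IPartOfSpeechAttribute>();
            var reading = tokenizer.AddAttribute<IReadingAttribute>();

            tokenizer.Reset();
            while (tokenizer.IncrementToken())
            {
                Console.WriteLine("---");
                Console.WriteLine($"ICharTermAttribute=>{term}");
                Console.WriteLine($"IOffsetAttribute#StartOffset=>{offset.StartOffset}");
                Console.WriteLine($"IOffsetAttribute#EndOffset=>{offset.EndOffset}");
                Console.WriteLine($"IPartOfSpeechAttribute#GetPartOfSpeech=>{pos.GetPartOfSpeech()}");
                Console.WriteLine($"IReadingAttribute#GetReading=>{reading.GetReading()}");
                Console.WriteLine("---");
            }
            tokenizer.End();
        }
    }
}
```

This sketch prints only a subset of the attributes, so its output will be shorter than the run that follows, which comes from the author's own program.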

$ dotnet run

Target string: 関西国際空港
---
ICharTermAttribute=>関西
ITermToBytesRefAttribute#BytesRef=>[]
IOffsetAttribute#StartOffset=>0
IOffsetAttribute#EndOffset=>2
IPositionIncrementAttribute=>1
IPositionLengthAttribute=>1
IBaseFormAttribute#GetBaseForm=>
IPartOfSpeechAttribute#GetPartOfSpeech=>テスト名詞
IReadingAttribute#GetReading=>カンサイ
IReadingAttribute#GetPronunciation=>
IInflectionAttribute#GetInflectionForm=>
IInflectionAttribute#GetInflectionType=>
---
---
ICharTermAttribute=>国際
ITermToBytesRefAttribute#BytesRef=>[]
IOffsetAttribute#StartOffset=>2
IOffsetAttribute#EndOffset=>4
IPositionIncrementAttribute=>1
IPositionLengthAttribute=>1
IBaseFormAttribute#GetBaseForm=>
IPartOfSpeechAttribute#GetPartOfSpeech=>テスト名詞
IReadingAttribute#GetReading=>コクサイ
IReadingAttribute#GetPronunciation=>
IInflectionAttribute#GetInflectionForm=>
IInflectionAttribute#GetInflectionType=>
---
---
ICharTermAttribute=>空港
ITermToBytesRefAttribute#BytesRef=>[]
IOffsetAttribute#StartOffset=>4
IOffsetAttribute#EndOffset=>6
IPositionIncrementAttribute=>1
IPositionLengthAttribute=>1
IBaseFormAttribute#GetBaseForm=>
IPartOfSpeechAttribute#GetPartOfSpeech=>テスト名詞
IReadingAttribute#GetReading=>クウコウ
IReadingAttribute#GetPronunciation=>
IInflectionAttribute#GetInflectionForm=>
IInflectionAttribute#GetInflectionType=>
---

It is a little hard to read, but 関西国際空港 (Kansai International Airport) has been split into its individual words: 関西, 国際, and 空港.