Stanford University Releases StanfordNLP, with Support for Many Languages

News, 02-11

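Leiphone AI Technology Review note: Stanford University recently released StanfordNLP, an official Python library for NLP that supports many languages. Documentation and further details are available at https://stanfordnlp.github.io/stanfordnlp/. The material from the project's GitHub repository follows.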

Notes

If you use the neural pipeline in your research, you can cite the authors' CoNLL 2018 Shared Task system description paper:

@inproceedings{qi2018universal,
address = {Brussels, Belgium},
author = {Qi, Peng and Dozat, Timothy and Zhang, Yuhao and Manning, Christopher D.},
booktitle = {Proceedings of the {CoNLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
month = {October},
pages = {160--170},
publisher = {Association for Computational Linguistics},
title = {Universal Dependency Parsing from Scratch},
url = {https://nlp.stanford.edu/pubs/qi2018universal.pdf},
year = {2018}
}

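Note, however, that this release is not the same as Stanford's CoNLL 2018 Shared Task system. The tokenizer, the lemmatizer, the morphological features, and the multi-word term system are a cleaned-up version of the shared task code. In the competition itself, though, the team used Tim Dozat's Tensorflow version of the tagger and parser. For this release, that code has been approximately reproduced in PyTorch, with a few deviations from the original.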

Setup

StanfordNLP supports Python 3.6 and later. The recommended approach is to install StanfordNLP from PyPI. If pip is already installed, simply run:

pip install stanfordnlp

This should also resolve all of StanfordNLP's dependencies, for example PyTorch 1.0.0 or later.
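The two version requirements mentioned above (Python 3.6 or later, PyTorch 1.0.0 or later) can be checked before or after installing. The following is only a minimal sketch: the helper name `version_at_least` is hypothetical, and its simple parser assumes versions begin with dotted numeric components (for example `1.0.0` or `1.13.1+cu117`).

```python
import re
import sys


def version_at_least(version, minimum):
    """Hypothetical helper: compare the first three numeric components."""
    def parse(text):
        # Drop any local build suffix such as "+cu117", then keep leading numbers.
        numbers = [int(part) for part in re.findall(r"\d+", text.split("+")[0])][:3]
        # Pad short versions such as "1.0" so they compare equal to "1.0.0".
        numbers += [0] * (3 - len(numbers))
        return tuple(numbers)
    return parse(version) >= parse(minimum)


# The article states that Python 3.6 or later is required.
print("Python 3.6+:", sys.version_info >= (3, 6))

# PyTorch may not be installed yet; pip normally pulls it in as a dependency.
try:
    import torch
    print("PyTorch 1.0.0+:", version_at_least(torch.__version__, "1.0.0"))
except ImportError:
    print("PyTorch is not installed yet")
```

If either check reports False, upgrading the interpreter or PyTorch first is likely to save a failed installation.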

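Alternatively, you can install from the source code in the GitHub repository, which gives you more flexibility for developing on top of StanfordNLP and for training models.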

git clone git@github.com:stanfordnlp/stanfordnlp.git
cd stanfordnlp
pip install -e .

Running StanfordNLP

Getting Started with the Neural Pipeline

To run your first StanfordNLP pipeline, follow these steps in the Python interactive interpreter:

>>> import stanfordnlp
>>> stanfordnlp.download("en") # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

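The last command prints each word in the first sentence of the input string (or document, as StanfordNLP represents it), together with the index of the word that governs it in that sentence (its head) and the dependency relation between the two words. An index of 0 marks the root. The output should look like this: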

("Barack", "4", "nsubj:pass")
("Obama", "1", "flat")
("was", "4", "aux:pass")
("born", "0", "root")
("in", "6", "case")
("Hawaii", "4", "obl")
(".", "4", "punct")
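To make the output above easier to read, the head indices can be resolved back to words. This is a minimal sketch in plain Python that works on the tuples exactly as printed; it does not call StanfordNLP, and the helper name `resolve_heads` is hypothetical. It assumes the indices are 1-based positions within the sentence, with "0" marking the root, which is consistent with the output shown.

```python
# Dependency triples copied from the output above: (word, head index, relation).
deps = [
    ("Barack", "4", "nsubj:pass"),
    ("Obama", "1", "flat"),
    ("was", "4", "aux:pass"),
    ("born", "0", "root"),
    ("in", "6", "case"),
    ("Hawaii", "4", "obl"),
    (".", "4", "punct"),
]


def resolve_heads(triples):
    """Hypothetical helper: replace each head index with the head word."""
    words = [word for word, _, _ in triples]
    resolved = []
    for word, head, relation in triples:
        head_index = int(head)
        # Index 0 is the artificial root; other indices are 1-based positions.
        head_word = "ROOT" if head_index == 0 else words[head_index - 1]
        resolved.append((word, head_word, relation))
    return resolved


for word, head_word, relation in resolve_heads(deps):
    print(f"{word} --{relation}--> {head_word}")
```

Read this way, "Barack" is the passive subject of "born", "Obama" attaches to "Barack" as part of a flat name, and "in" marks the case of "Hawaii".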

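Accessing the Java Stanford CoreNLP Server

Besides the neural pipeline, this project also includes an official class for accessing the Java Stanford CoreNLP server from Python code.

There are a few initial setup steps: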