A Tsinghua Professor's List of Some Thirty Open-Source Algorithm Implementations and Toolkits

Knowledge, 03-27

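Liu Zhiyuan of Tsinghua University spent half a day compiling a list of the thirty or so algorithm implementations and toolkits that he and his students have open-sourced over the past few years. The list covers tasks including knowledge representation, relation extraction, sememe computation, semantic representation, network representation, and text processing, and almost everything is hosted on GitHub.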

Highlight Packages

THULAC: An Efficient Lexical Analyzer for Chinese.

[home]http://thulac.thunlp.org/

[Git C++]https://github.com/thunlp/thulac

[Git Java]https://github.com/thunlp/THULAC-Java

[Git Python]https://github.com/thunlp/THULAC-Python

THUCTC: An Efficient Chinese Text Classifier.

[home]http://thuctc.thunlp.org/

[Git Java]https://github.com/thunlp/THUCTC

THUOCL: Open Chinese Lexicon.

[home]http://thuocl.thunlp.org/

OpenKE: An Open-Source Package for Knowledge Embedding (KE).

[home]http://openke.thunlp.org/

[Git]https://github.com/thunlp/OpenKE

OpenNE: An Open-Source Package for Network Embedding (NE).

[Git]https://github.com/thunlp/OpenNE

Knowledge Graph and Relation Extraction

NRE: An Open-Source Package for Neural Relation Extraction.

[Git]https://github.com/thunlp/NRE

[TensorFlow Version]https://github.com/thunlp/TensorFlow-NRE

Neural relation extraction aims to extract relations from plain text with neural models, which are the state-of-the-art methods for relation extraction. In this package, we provide our implementations of CNN [Zeng et al., 2014] and PCNN [Zeng et al., 2015] and their extended versions with a sentence-level attention scheme [Lin et al., 2016].
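As a rough illustration of the sentence-level attention idea, the sketch below weights the sentence vectors collected for one entity pair by their similarity to a relation query vector and sums them into a single bag vector. It simplifies the scoring function to a plain dot product, and the function names and toy vectors are hypothetical. It is not taken from the NRE package.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_bag(sentence_vecs, relation_query):
    """Combine the sentence vectors of one entity pair into one bag vector.

    Assumption: the attention score is a plain dot product between each
    sentence vector and the relation query vector.
    """
    scores = [sum(x * q for x, q in zip(vec, relation_query))
              for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(relation_query)
    bag = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
           for d in range(dim)]
    return weights, bag

# Toy bag: three sentence encodings (standing in for CNN/PCNN outputs).
bag_sentences = [[1.0, 0.0], [0.8, 0.2], [-1.0, 0.5]]
query = [2.0, 0.0]
weights, bag_vec = attend_over_bag(bag_sentences, query)
```

Sentences that agree with the query receive larger weights, so a noisy sentence in the bag contributes less to the final representation.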

JointNRE: Joint Neural Relation Extraction with Text and KGs.

[Git]https://github.com/thunlp/JointNRE

This is the lab code of our AAAI 2018 paper "Neural Knowledge Acquisition via Mutual Attention between Knowledge Graph and Text".

PathNRE: Neural Relation Extraction with Relation Paths.

[Git]https://github.com/thunlp/PathNRE

This is the lab code of our EMNLP 2017 paper "Incorporating Relation Paths in Neural Relation Extraction".

Neural Entity Alignment.

[Git]https://github.com/thunlp/IEAJKE

This is the lab code of our IJCAI 2017 paper "Iterative Entity Alignment via Joint Knowledge Embeddings".

Neural Entity Typing.

[Git]https://github.com/thunlp/KNET

This is the lab code of our AAAI 2018 paper "Improving Neural Fine-Grained Entity Typing with Knowledge Attention".

Knowledge Representation Learning

OpenKE: An Open-Source Package for Knowledge Embedding (KE).

[Git]https://github.com/thunlp/OpenKE

KRLPapers: Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE).

[Git]https://github.com/thunlp/KRLPapers

TransX: An Efficient Implementation of TransE and its extended models for Knowledge Representation Learning.

[Git]https://github.com/thunlp/Fast-TransX

[TensorFlow Version]https://github.com/thunlp/TensorFlow-TransX

KB2E: A package of Knowledge Base to Embeddings.

[Git]https://github.com/thunlp/KB2E

The package contains state-of-the-art knowledge representation learning methods including TransE, TransH, TransR and PTransE.
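For readers new to these models, here is a minimal sketch of the TransE idea, in which a relation acts as a translation from the head entity to the tail entity, so that h + r should lie close to t. The embeddings, entity names, and helper functions below are hypothetical and hand-picked for illustration. This is not the KB2E code, and TransH, TransR and PTransE are not shown.

```python
import math

def transe_distance(head, relation, tail, norm=1):
    """Dissimilarity ||h + r - t||; a lower value means a more plausible triple."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))

def margin_loss(pos_dist, neg_dist, margin=1.0):
    """Margin-based ranking loss between a true triple and a corrupted one."""
    return max(0.0, margin + pos_dist - neg_dist)

# Hypothetical 2-d embeddings, chosen by hand rather than learned.
beijing = [1.0, 2.0]
capital_of = [0.5, -1.0]
china = [1.5, 1.0]
france = [-2.0, 3.0]

pos = transe_distance(beijing, capital_of, china)   # true triple
neg = transe_distance(beijing, capital_of, france)  # corrupted tail
loss = margin_loss(pos, neg)
```

During training, the loss is minimized over many true and corrupted triples, which pushes true triples to smaller distances than corrupted ones by at least the margin.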

KR-EAR: Knowledge Representation Learning with Entities, Attributes and Relations. [Git]

This is the lab code of our IJCAI 2016 paper "Knowledge Representation Learning with Entities, Attributes and Relations".

CKRL: Confidence-aware Knowledge Representation Learning.

[Git]https://github.com/thunlp/CKRL

This is the lab code of our AAAI 2018 paper "Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence". The method is expected to support robust knowledge representation learning with noisy triples.

IKRL: Image-embodied Knowledge Representation Learning.

[Git]https://github.com/thunlp/IKRL

This is the lab code of our IJCAI 2017 paper "Image-embodied Knowledge Representation Learning". The method is expected to support knowledge representation learning with entity images.

TKRL: Type-embodied Knowledge Representation Learning.

[Git]https://github.com/thunlp/TKRL

This is the lab code of our IJCAI 2016 paper "Representation Learning of Knowledge Graphs with Hierarchical Types". The method is expected to support knowledge representation learning with hierarchical types of entities.

DKRL: Description-embodied Knowledge Representation Learning.

[Git]https://github.com/thunlp/DKRL

This is the lab code of our AAAI 2016 paper "Representation Learning of Knowledge Graphs with Entity Descriptions". The method is expected to support knowledge representation learning with entity descriptions.

Network Representation Learning

OpenNE: An Open-Source Package for Network Embedding (NE).

[Git]https://github.com/thunlp/OpenNE

NRLPapers: Must-read papers on network representation learning (NRL) / network embedding (NE).

[Git]https://github.com/thunlp/NRLPapers

TransNet: Translation-Based Network Representation Learning.

[Git]https://github.com/thunlp/TransNet

This is the lab code of our IJCAI 2017 paper "TransNet: Translation-Based Network Representation Learning for Social Relation Extraction". The method is expected to model social networks by regarding relations as the translation between vertices.

NEU: Fast Network Embedding.

[Git]https://github.com/thunlp/NEU

This is the lab code of our IJCAI 2017 paper "Fast Network Embedding Enhancement via High Order Proximity Approximation". The method is expected to speed up network embedding with an approximate update algorithm.

CANE: Context-Aware Network Embedding.

[Git]https://github.com/thunlp/CANE

This is the lab code of our ACL 2017 paper "CANE: Context-Aware Network Embedding for Relation Modeling". The method is expected to support context-aware network representation learning and model asymmetric relations.

MMDW: Max-Margin DeepWalk.

[Git]https://github.com/thunlp/MMDW

This is the lab code of our IJCAI 2016 paper "Max-Margin DeepWalk: Discriminative Learning of Network Representation". The method is expected to support discriminative network representation learning with node labels.

TADW: Text-Associated DeepWalk.

[Git]https://github.com/thunlp/TADW

This is the lab code of our IJCAI 2015 paper "Network Representation Learning with Rich Text Information". The method is expected to support network representation learning with rich text information within each node. The code requires a 64-bit Linux machine with MATLAB installed.

Sememe-Driven NLP

SE-WRL: Improved Word Representation Learning with Sememes.

[Git]https://github.com/thunlp/SE-WRL

This is the lab code of our ACL 2017 paper "Improved Word Representation Learning with Sememes". Sememes are the minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. We propose an improved word representation learning method that uses the sememe knowledge annotated in HowNet.

Lexical Sememe Prediction.

[Git]https://github.com/thunlp/sememe_prediction

This is the lab code of our IJCAI 2017 paper "Lexical Sememe Prediction via Word Embeddings and Matrix Factorization".

Chinese LIWC Lexicon Expansion: Online Interpretable Word Embeddings.

[Git]https://github.com/thunlp/Auto_CLIWC

This is the lab code of our AAAI 2018 paper "Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention".

Language Representation Learning

CWE: Character Word Embeddings.

[Git]https://github.com/Leonard-Xu/CWE

This is the lab code of our IJCAI 2015 paper "Joint Learning of Character and Word Embeddings". This method is expected to learn Chinese word embeddings by taking the characters within words into consideration. The Chinese analogical reasoning dataset is available in the data folder.

CLWE: Cross-Lingual Word Embeddings.

[home]http://nlp.csai.tsinghua.edu.cn/~lzy/src/acl2015_bilingual.html

This is the lab code of our ACL 2015 short paper "Learning Cross-lingual Word Embeddings via Matrix Co-factorization". This method is expected to learn cross-lingual word embeddings with a matrix co-factorization framework.

OIWE: Online Interpretable Word Embeddings.

[Git]https://github.com/SkTim/OIWE

This is the lab code of our EMNLP 2015 short paper "Online Learning of Interpretable Word Embeddings". This method is expected to learn interpretable word embeddings based on the OIWE-IPG model proposed in our paper.

TWE: Topical Word Embeddings.

[Git]https://github.com/thunlp/topical_word_embeddings

This is the lab code of our AAAI 2015 paper "Topical Word Embeddings". The method is expected to perform representation learning of words with their topic assignments by latent topic models such as Latent Dirichlet Allocation.

General NLP

THUCKE: An Open-Source Package for Chinese Keyphrase Extraction.

[Git]https://github.com/thunlp/THUCKE

The package can efficiently extract Chinese keyphrases by translating from documents to keyphrases, learned by word alignment models (WAM) that we proposed in [EMNLP] [CoNLL].

TensorFlow-Summarization: An Open-Source Package for Neural Headline Generation.

[Git]https://github.com/thunlp/TensorFlow-Summarization

This is an implementation of a sequence-to-sequence model using a bidirectional GRU encoder and a GRU decoder. This project aims to help people start working on abstractive short text summarization immediately. Hopefully, it may also work on machine translation tasks.

THUNSC: An Open-Source Package for Neural Sentiment Classification.

[Git]https://github.com/thunlp/NSC

Neural sentiment classification aims to classify the sentiment of a document with neural models, which are the state-of-the-art methods for sentiment classification. In this package, we provide our implementations of NSC, NSC+LA and NSC+UPA [Chen et al., 2016], in which user and product information is considered via attention over different semantic levels.

THUTAG: An Open-Source Package for Keyphrase Extraction and Social Tag Suggestion.

[Git]https://github.com/thunlp/THUTag

The package contains several keyphrase extraction methods including TextRank, ExpandRank, Topical PageRank and WAM, and social tag suggestion methods including KNN, PMI, TagLDA, TAM and WTM. The package has supported one of the most popular microblog apps, Weibo Keywords, which has more than 3.5 million registered users.
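To give a feel for the simplest of these methods, the following is a minimal TextRank-style sketch: words that co-occur within a small window are linked in an undirected graph, and a PageRank-style iteration ranks them. The function name, parameters, and toy token list are hypothetical, and this is not the THUTag implementation, which is a separate codebase.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank on an undirected co-occurrence graph.

    `window` is the number of following tokens linked to each token.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own links.
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated
    return sorted(scores, key=scores.get, reverse=True)

# Toy input that is already tokenized; real use would segment and filter first.
tokens = ["network", "embedding", "learns", "network",
          "representation", "network", "embedding"]
ranked = textrank(tokens, window=1)
```

In practice the candidate words are usually restricted by part of speech, and adjacent top-ranked words are merged into multi-word keyphrases.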

PLDA+: An Open-Source Package for Parallel LDA.

[Git]https://code.google.com/archive/p/plda/

PLDA is a parallel C++ implementation of Latent Dirichlet Allocation (LDA). We present a highly optimized parallel implementation of the Gibbs sampling algorithm for the training/inference of LDA. The carefully designed architecture is expected to support extensions of this algorithm. PLDA+, an enhanced parallel implementation of LDA, can further improve the scalability of LDA by significantly reducing the unparallelizable communication bottleneck and achieving good load balancing.
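For orientation, the sketch below shows a serial, single-machine collapsed Gibbs sampler for LDA, the kind of sampling step that a parallel implementation distributes across workers. It is an illustrative assumption of how such a sampler can look, not PLDA's C++ code. The function name, hyperparameter defaults, and toy corpus are hypothetical, and the parallel partitioning and communication are not shown.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              sweeps=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]             # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                           # n(k)
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    topics = range(num_topics)
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus over a 4-word vocabulary (word ids 0..3).
docs = [[0, 1, 0, 1, 0], [2, 3, 2, 3], [0, 1, 1, 0], [3, 2, 2, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=4)
```

Every token update reads and writes the shared topic-word counts, which is why a parallel version has to synchronize those counts between workers.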

Original: http://nlp.csai.tsinghua.edu.cn/~lzy/codes.html
