
Installing Logstash 5.3.1, Importing Data into Elasticsearch 5.3.1, and Configuring Synonym Filtering

Before reading this article, please read the previous post: [Big Data] Elasticsearch 5.3.1 IK analysis and synonym/suggestion search setup. That post covered installing and configuring ES and Kibana 5.3.1, installing the IK analyzer, and setting up synonyms. This post records how to import MySQL data into Elasticsearch 5.3.1 with Logstash and configure IK analysis and synonyms. Once Logstash is configured with JDBC and an ES connection, running it creates the index and mapping and imports the data in one step. However, to configure the IK analyzer we need to modify how the index and mapping are created, as described in detail below.


1. Downloading and installing Logstash 5.3.1:

  • Download: https://www.elastic.co/cn/downloads/logstash
  • Extract: tar -zxf logstash-5.3.1.tar.gz
  • Start: bin/logstash -e "input { stdin { } } output { stdout {} }" (the -e flag runs an inline pipeline that reads from stdin and echoes to stdout). Output like the following indicates success:
Sending Logstash's logs to /home/rzxes/logstash-5.3.1/logs which is now configured via log4j2.properties
[2017-05-16T10:27:36,957][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/home/rzxes/logstash-5.3.1/data/queue"}
[2017-05-16T10:27:37,041][INFO ][logstash.agent ] No persistent UUID file found. Generating new UUID {:uuid=>"c987803c-9b18-4395-bbee-a83a90e6ea60", :path=>"/home/rzxes/logstash-5.3.1/data/uuid"}
[2017-05-16T10:27:37,581][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>125}
[2017-05-16T10:27:37,682][INFO ][logstash.pipeline ] Pipeline main started
The stdin plugin is now waiting for input:
[2017-05-16T10:27:37,886][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

2. Connecting Logstash 5.3.1 to MySQL as the data source, with ES as the output:

This version of Logstash already bundles the JDBC input plugin, so we only need to add a configuration file, e.g. test.conf, with the following content:

input {
    stdin {
    }
    jdbc {
        # database host, port, and database name
        jdbc_connection_string => "jdbc:mysql://IP:3306/dbname"
        # database user
        jdbc_user => "user"
        # database password
        jdbc_password => "pass"
        # path to the MySQL JDBC driver jar
        jdbc_driver_library => "/home/rzxes/logstash-5.3.1/mysql-connector-java-5.1.17.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "100000"
        # SQL statement file; you can also inline the SQL, e.g. statement => "select * from table1"
        statement_filepath => "/home/rzxes/logstash-5.3.1/test.sql"
        schedule => "* * * * *"
        type => "jdbc"
    }
}
output {
    stdout {
        codec => json_lines
    }
    elasticsearch {
        hosts => "192.168.230.150:9200"
        index => "test-1"          # index name
        document_type => "form"    # type name
        document_id => "%{id}"     # the id must be a unique, sequential column of the source table
    }
}

Create the SQL file referenced above; test.sql contains: select * from table1. Both test.conf and test.sql sit in the Logstash root directory. Run Logstash to import the data: bin/logstash -f test.conf. It starts as follows:

[Screenshot: Logstash startup and import output]

Wait for the import to finish, then open Es-head on port 9100:

[Screenshot: Es-head showing the imported index]

You can see that 11,597 records have been imported.
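As a quick check without Es-head, the _count API reports the number of documents in an index. This is a sketch that assumes the cluster and index name configured above are reachable:

```shell
# Count documents in the index created by Logstash
curl -XGET "http://192.168.230.150:9200/test-1/_count?pretty"
```

The response's "count" field should match the number of rows returned by test.sql.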

For more detailed configuration options, see the official documentation: plugins-inputs-jdbc-jdbc_driver_library


3. How does Logstash create the index and mapping and import the data?

Importing into ES requires an index and mapping to exist first, yet we never created them explicitly in Logstash; we only passed in the index and type parameters. Logstash creates them through an ES mapping template: the template file needs no explicit field definitions and generates the mapping dynamically from the input. At startup, Logstash already logged this template:

[2017-05-23T15:58:45,801][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#}
[2017-05-23T15:58:45,805][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2017-05-23T15:58:45,979][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}

  • To add IK analysis, just create a JSON file: vim /home/rzxes/logstash-5.3.1/template/logstash.json with the following content. The only change from the default template is the "analyzer": "ik_max_word" line in string_fields (note that JSON does not allow comments, so the file must contain only the JSON itself):

{
  "template": "*",
  "version": 50001,
  "settings": { "index.refresh_interval": "5s" },
  "mappings": {
    "_default_": {
      "_all": { "enabled": true, "norms": false },
      "dynamic_templates": [
        { "message_field": { "path_match": "message", "match_mapping_type": "string", "mapping": { "type": "text", "norms": false } } },
        { "string_fields": { "match": "*", "match_mapping_type": "string", "mapping": { "type": "text", "norms": false, "analyzer": "ik_max_word", "fields": { "keyword": { "type": "keyword" } } } } }
      ],
      "properties": {
        "@timestamp": { "type": "date", "include_in_all": false },
        "@version": { "type": "keyword", "include_in_all": false }
      }
    }
  }
}

  • To configure synonyms, you need a custom analyzer with a synonym token filter. Modify the logstash.json template as follows:

  • { "template" : "*", "version" : 50001, "settings" : { "index.refresh_interval" : "5s", #分詞,同義詞配置:自定義分詞器,過濾器,如不配同義詞則沒有index這一部分 "index": { "analysis": { "analyzer": { "by_smart": { "type": "custom", "tokenizer": "ik_smart", "filter": ["by_tfr","by_sfr"], "char_filter": ["by_cfr"] }, "by_max_word": { "type": "custom", "tokenizer": "ik_max_word", "filter": ["by_tfr","by_sfr"], "char_filter": ["by_cfr"] } }, "filter": { "by_tfr": { "type": "stop", "stopwords": [" "] }, "by_sfr": { "type": "synonym", "synonyms_path": "analysis/synonyms.txt" #同義詞路徑 } }, "char_filter": { "by_cfr": { "type": "mapping", "mappings": ["| => |"] } } } } # index --end-- }, "mappings" : { "_default_" : { "_all" : { "enabled" : true, "norms" : false }, "dynamic_templates" : [ { "message_field" : { "path_match" : "message", "match_mapping_type" : "string", "mapping" : { "type" : "text", "norms" : false }} }, { "string_fields" : { "match" : "*", "match_mapping_type" : "string", "mapping" : { "type" : "text", "norms" : false, #選擇分詞器:自定義分詞器,或者ik_mmax_word "analyzer" : "by_max_word", "fields" : { "keyword" : { "type" : "keyword" } } } } } ], "properties" : { "@timestamp" : { "type" : "date", "include_in_all" : false }, "@version" : { "type" : "keyword", "include_in_all" : false } } } } }

  • With the custom template file in place, configure template overwriting in test.conf so the template takes effect. The final test.conf looks like this:

input {
    stdin {
    }
    jdbc {
        # database host, port, and database name
        jdbc_connection_string => "jdbc:mysql://IP:3306/dbname"
        # database user
        jdbc_user => "user"
        # database password
        jdbc_password => "pass"
        # path to the MySQL JDBC driver jar
        jdbc_driver_library => "/home/rzxes/logstash-5.3.1/mysql-connector-java-5.1.17.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        jdbc_paging_enabled => "true"
        jdbc_page_size => "100000"
        # SQL statement file
        statement_filepath => "/home/rzxes/logstash-5.3.1/mytest.sql"
        schedule => "* * * * *"
        type => "jdbc"
    }
}
output {
    stdout {
        codec => json_lines
    }
    elasticsearch {
        hosts => "192.168.230.150:9200"
        index => "test-1"
        document_type => "form"
        document_id => "%{id}"  # the id must be a unique, sequential column of the source table
        template_overwrite => true
        template => "/home/rzxes/logstash-5.3.1/template/logstash.json"
    }
}

  • Delete the previously created index (otherwise the import would reuse the existing index and mapping), then restart Logstash.
  • Finally, searching for the keyword 番茄 (tomato) in Kibana now also matches 西紅柿 (another word for tomato), as shown in the screenshot.
  • At this point, the template rewrite for Logstash data import is complete.
  • An alternative way to configure IK analysis is a global template installed directly in ES, with no custom template file in Logstash:
curl -XPUT "http://192.168.230.150:9200/_template/rtf" -H "Content-Type: application/json" -d '
{
  "template": "*",
  "version": 50001,
  "settings": {
    "index.refresh_interval": "5s",
    "index": {
      "analysis": {
        "analyzer": {
          "by_smart": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          },
          "by_max_word": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["by_tfr", "by_sfr"],
            "char_filter": ["by_cfr"]
          }
        },
        "filter": {
          "by_tfr": { "type": "stop", "stopwords": [" "] },
          "by_sfr": { "type": "synonym", "synonyms_path": "analysis/synonyms.txt" }
        },
        "char_filter": {
          "by_cfr": { "type": "mapping", "mappings": ["| => |"] }
        }
      }
    }
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": true, "norms": false },
      "dynamic_templates": [
        { "message_field": { "path_match": "message", "match_mapping_type": "string", "mapping": { "type": "text", "norms": false } } },
        { "string_fields": { "match": "*", "match_mapping_type": "string", "mapping": { "type": "text", "norms": false, "analyzer": "by_max_word", "fields": { "keyword": { "type": "keyword" } } } } }
      ],
      "properties": {
        "@timestamp": { "type": "date", "include_in_all": false },
        "@version": { "type": "keyword", "include_in_all": false }
      }
    }
  }
}'

Note that the JSON body must be wrapped in single quotes; wrapping it in double quotes, as is sometimes shown, conflicts with the double quotes inside the JSON and breaks the shell command.

  • You can inspect the installed templates with curl: curl -XGET "http://192.168.230.150:9200/_template"
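To verify the synonym behavior from the command line instead of Kibana, a match query for 番茄 should also return documents that only contain 西紅柿. This is a sketch assuming the host and index from the setup above; the field name "name" here is a hypothetical example, so substitute a text field from your own table:

```shell
# Search the synonym-enabled index; documents containing 西紅柿
# should match a query for 番茄 if the synonym filter is active
curl -XGET "http://192.168.230.150:9200/test-1/_search?pretty" \
     -H "Content-Type: application/json" \
     -d '{ "query": { "match": { "name": "番茄" } } }'
```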



