A web-based heatmap visualization and analysis tool for high-dimensional biological data
Paper title: Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data
Authors: Nicolas F. Fernandez, Gregory W. Gundersen, Adeeb Rahman, Mark L. Grimes, Klarisa Rikova, Peter Hornbeck, Avi Ma'ayan
DOI: 10.1038/sdata.2017.151
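The diversity of high-throughput experimental methods in biomedical research is growing rapidly. Yet even as data acquisition accelerates, our ability to draw sound conclusions from those data has lagged behind. Data visualization is a primary step in the initial analysis of biological data, and common dimensionality reduction methods such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) are often used to map high-dimensional data into two or three dimensions for display. However, projecting from a high-dimensional space to a low-dimensional one comes at a high cost and often loses information. In contrast, clustergrams, or hierarchically clustered heatmaps, visualize the data directly without dimensionality reduction. Clustergrams are easy to interpret and are widely used in print publications to visualize biological data.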
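To make the clustergram idea concrete, the sketch below reorders the rows and columns of a small matrix by hierarchical clustering so that similar profiles sit next to each other, which is the step that precedes drawing the heatmap. It is a minimal illustration in plain Python, not Clustergrammer's implementation: the function names, the choice of Euclidean distance with average linkage, and the toy matrix are all assumptions made for this example.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(vectors):
    """Return a leaf order from average-linkage agglomerative clustering.

    Hypothetical helper for illustration; O(n^3) or worse, so only
    suitable for small inputs.
    """
    # Each cluster is a list of original indices in dendrogram leaf order.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                total = sum(
                    euclidean(vectors[p], vectors[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
        clusters[i] = merged
    return clusters[0] if clusters else []


def clustergram(matrix):
    """Cluster rows and columns, then return both orders and the reordered matrix."""
    row_order = cluster_order(matrix)
    columns = [list(col) for col in zip(*matrix)]
    col_order = cluster_order(columns)
    reordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return row_order, col_order, reordered


# Toy data (assumed): rows could be genes, columns could be samples.
data = [
    [1.0, 9.0, 1.2, 8.8],
    [8.0, 0.5, 8.3, 0.4],
    [1.1, 9.2, 0.9, 9.1],
    [7.9, 0.6, 8.1, 0.5],
]
rows, cols, ordered = clustergram(data)
for row in ordered:
    print(row)
```

Every original value survives in the reordered matrix; only the arrangement changes, which is why this kind of display avoids the information loss that comes with projecting to two or three dimensions.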
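Most tools for visualizing hierarchically clustered heatmaps produce static images. In the study "Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data", published in Scientific Data, Avi Ma'ayan and colleagues at the Icahn School of Medicine at Mount Sinai in the United States describe Clustergrammer in detail. Clustergrammer is a web-based visualization tool with many interactive features, including zooming, panning, filtering, reordering, sharing, performing enrichment analysis, and providing dynamic gene annotations. It generates shareable interactive visualizations either from a data table uploaded to its website or when embedded in Jupyter Notebooks. Developers can also use the Clustergrammer core libraries as a toolkit, embedding them in their own applications to produce visualizations.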
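The authors demonstrate Clustergrammer on several datasets: gene expression data from the Cancer Cell Line Encyclopedia (CCLE), original post-translational modification data collected from lung cancer cell lines by mass spectrometry, and original single-cell proteomics data from blood measured by mass cytometry (CyTOF). The results show that Clustergrammer can be used to analyze diverse biological data and to produce interactive, web-based visualizations.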
Figure 1. Feature comparison of interactive heatmap tools
Abstract: Most tools developed to visualize hierarchically clustered heatmaps generate static images. Clustergrammer is a web-based visualization tool with interactive features such as: zooming, panning, filtering, reordering, sharing, performing enrichment analysis, and providing dynamic gene annotations. Clustergrammer can be used to generate shareable interactive visualizations by uploading a data table to a web-site, or by embedding Clustergrammer in Jupyter Notebooks. The Clustergrammer core libraries can also be used as a toolkit by developers to generate visualizations within their own applications. Clustergrammer is demonstrated using gene expression data from the cancer cell line encyclopedia (CCLE), original post-translational modification data collected from lung cancer cells lines by a mass spectrometry approach, and original cytometry by time of flight (CyTOF) single-cell proteomics data from blood. Clustergrammer enables producing interactive web based visualizations for the analysis of diverse biological data.
About the journal: Scientific Data (https://www.nature.com/sdata/) is a peer-reviewed, open-access journal for descriptions of scientifically valuable datasets, and research that advances the sharing and reuse of scientific data. Scientific Data welcomes submissions from a broad range of research disciplines, including descriptions of big or small datasets, from major consortiums to single research groups. Scientific Data primarily publishes Data Descriptors, a new type of publication that focuses on helping others reuse data, and crediting those who share.
The 2017 journal metrics for Scientific Data are as follows:
• 2-year impact factor: 5.305
• 5-year impact factor: 5.862
• Immediacy index: 0.843
• Eigenfactor® score: 0.00855
• Article Influence Score: 2.597
• 2-year Median: 2