圖解機器學習：神經網路和 TensorFlow 的文本分類

知識 08-06

（點擊

上方藍字

，快速關注我們）

編譯：開源中國

www.oschina.net/translate/big-picture-machine-learning

如有好文章投稿，請點擊 → 這裡了解詳情

開發人員經常說，如果你想開始機器學習，你應該首先學習演算法。但是我的經驗則不是。

我說你應該首先了解：應用程序如何工作。一旦了解了這一點，深入探索演算法的內部工作就會變得更加容易。

那麼，你如何開發直覺學習，並實現理解機器學習這個目的？一個很好的方法是創建機器學習模型。

假設您仍然不知道如何從頭開始創建所有這些演算法，您可以使用一個已經為您實現所有這些演算法的庫。那個庫是 TensorFlow。

在本文中，我們將創建一個機器學習模型來將文本分類到類別中。我們將介紹以下主題：

TensorFlow 的工作原理

什麼是機器學習模型

什麼是神經網路

神經網路如何學習

如何操作數據並將其傳遞給神經網路

如何運行模型並獲得預測結果

你可能會學到很多新東西，所以讓我們開始吧！

TensorFlow

TensorFlow 是一個機器學習的開源庫，由 Google 首創。庫的名稱幫助我們理解我們怎樣使用它：tensors 是通過圖的節點流轉的多維數組。

tf.Graph

在 TensorFlow 中的每一個計算都表示為數據流圖，這個圖有兩類元素：

一類 tf.Operation，表示計算單元

一類 tf.Tensor，表示數據單元

要查看這些是怎麼工作的，你需要創建這個數據流圖：

（計算x+y的圖）

你需要定義 x = [1,3,6] 和 y = [1,1,1]。由於圖用 tf.Tensor 表示數據單元，你需要創建常量 Tensors：

import

tensorflow

constant

([

])

constant

([

])

現在你將定義操作單元：

import

tensorflow

constant

([

])

constant

([

])

add

(

)

你有了所有的圖元素。現在你需要構建圖：

import

tensorflow

my_graph

Graph

()

with

my_graph

as_default

()

constant

([

])

constant

([

])

add

(

)

這是 TensorFlow 工作流的工作原理：你首先要創建一個圖，然後你才能計算（實際上是用操作『運行』圖節點）。你需要創建一個 tf.Session 運行圖。

tf.Session

tf.Session 對象封裝了 Operation 對象的執行環境。Tensor 對象是被計算過的（從文檔中）。為了做到這些，我們需要在 Session 中定義哪個圖將被使用到：

import

tensorflow

my_graph

Graph

()

with

Session

(

graph

my_graph

)

sess

constant

([

])

constant

([

])

add

(

)

為了執行操作，你需要使用方法 tf.Session.run()。這個方法通過運行必要的圖段去執行每個 Operation 對象並通過參數 fetches 計算每一個 Tensor 的值的方式執行 TensorFlow 計算的一』步』：

import

tensorflow

my_graph

Graph

()

with

Session

(

graph

my_graph

)

sess

constant

([

])

constant

([

])

add

(

)

result

sess

run

(

fetches

)

(

result

)

>>>

[

]

預測模型

現在你知道了 TensorFlow 的工作原理，那麼你得知道怎樣創建預測模型。簡而言之

機器學習演算法+數據=預測模型

構建模型的過程就是這樣：

（構建預測模型的過程）

正如你能看到的，模型由數據「訓練過的」機器學習演算法組成。當你有了模型，你就會得到這樣的結果：

（預測工作流）

你創建的模型的目的是對文本分類，我們定義了：

input

: text,

result

: category

我們有一個使用已經標記過的文本（每個文本都有了它屬於哪個分類的標記）訓練的數據集。在機器學習中，這種任務的類型是被稱為監督學習。

「我們知道正確的答案。該演算法迭代的預測訓練數據，並由老師糾正

」?—?Jason Brownlee

你會把數據分成類，因此它也是一個分類任務。

為了創建這個模型，我們將會用到神經網路。

神經網路

神經網路是一個計算模型（一種描述使用機器語言和數學概念的系統的方式）。這些系統是自主學習和被訓練的，而不是明確編程的。

神經網路是也從我們的中樞神經系統受到的啟發。他們有與我們神經相似的連接節點。

感知器是第一個神經網路演算法。這篇文章很好地解釋了感知器的內部工作原理（「人工神經元內部」的動畫非常棒）。

為了理解神經網路的工作原理，我們將會使用 TensorFlow 建立一個神經網路架構。在這個例子中，這個架構被 Aymeric Damien 使用過。

神經網路架構

神經網路有兩個隱藏層（你得選擇網路會有多少隱藏層，這是結構設計的一部分）。每一個隱藏層的任務是把輸入的東西轉換成輸出層可以使用的東西。

隱藏層 1

（輸入層和第一個隱藏層）

你也需要定義第一個隱藏層會有多少節點。這些節點也被稱為特徵或神經元，在上面的例子中我們用每一個圓圈表示一個節點。

輸入層的每個節點都對應著數據集中的一個詞（之後我們會看到這是怎麼運行的）

如這裡所述，每個節點（神經元）乘以一個權重。每個節點都有一個權重值，在訓練階段，神經網路會調整這些值以產生正確的輸出（過會，我們將會學習更多關於這個的信息）

除了乘以沒有輸入的權重，網路也會增加一個誤差 (在神經網路中誤差的角色)。

在你的架構中，將輸入乘以權重並將值與偏差相加，這些數據也要通過激活函數傳遞。這個激活函數定義了每個節點的最終輸出。比如說：想像一下，每一個節點是一盞燈，激活函數決定燈是否會亮。

有很多類型的激活函數。你將會使用 Rectified Linear Unit (ReLu)。這個函數是這樣定義的：

f(x) = max(0,x) [輸出 x 或者 0（零）中最大的數]

例如：如果 x = -1, f(x) = 0(zero); 如果 x = 0.7, f(x) = 0.7.

隱藏層 2

第二個隱藏層做的完全是第一個隱藏層做的事情，但現在第二層的輸入是第一層的輸出。

(第一和第二隱藏層)

輸出層

現在終於到了最後一層，輸出層。你將會使用 One-Hot 編碼得到這個層的結果。在這個編碼中，只有一個比特的值是 1，其他比特的值都是 0。例如，如果我們想對三個分類編碼(sports, space 和computer graphics)編碼：

因此輸出節點的編號是輸入的數據集的分類的編號。

輸出層的值也要乘以權重，並我們也要加上誤差，但是現在激活函數不一樣。

你想用分類對每一個文本進行標記，並且這些分類相互獨立（一個文本不能同時屬於兩個分類）。考慮到這點，你將使用 Softmax 函數而不是 ReLu 激活函數。這個函數把每一個完整的輸出轉換成 0 和 1 之間的值，並且確保所有單元的和等於一。這樣，輸出將告訴我們每個分類中每個文本的概率。

1.2

0.46

0.9

[

softmax

]

0.34

0.4

0.20

現在有了神經網路的數據流圖。把我們所看到的都轉換為代碼，結果是：

# Network Parameters

n_hidden_1

# 1st layer number of features

n_hidden_2

# 2nd layer number of features

n_input

total_words

# Words in vocab

n_classes

# Categories: graphics, space and baseball

def

multilayer_perceptron

(

input_tensor

weights

biases

)

layer_1_multiplication

matmul

(

input_tensor

weights

[

"h1"

])

layer_1_addition

add

(

layer_1_multiplication

biases

[

"b1"

])

layer_1_activation

relu

(

layer_1_addition

)

# Hidden layer with RELU activation

layer_2_multiplication

matmul

(

layer_1_activation

weights

[

"h2"

])

layer_2_addition

add

(

layer_2_multiplication

biases

[

"b2"

])

layer_2_activation

relu

(

layer_2_addition

)

# Output layer with linear activation

out_layer_multiplication

matmul

(

layer_2_activation

weights

[

"out"

])

out_layer_addition

out_layer_multiplication

biases

[

"out"

]

return

out_layer_addition

(我們將會在後面討論輸出層的激活函數)

神經網路怎麼學習

就像我們前面看到的那樣，神經網路訓練時會更新權重值。現在我們將看到在 TensorFlow 環境下這是怎麼發生的。

tf.Variable

權重和誤差存儲在變數（tf.Variable）中。這些變數通過調用 run() 保持在圖中的狀態。在機器學習中我們一般通過正太分布來啟動權重和偏差值。

weights

{

"h1"

Variable

(

random_normal

([

n_input

n_hidden_1

])),

"h2"

Variable

(

random_normal

([

n_hidden_1

n_hidden_2

])),

"out"

Variable

(

random_normal

([

n_hidden_2

n_classes

]))

}

biases

{

"b1"

Variable

(

random_normal

([

n_hidden_1

])),

"b2"

Variable

(

random_normal

([

n_hidden_2

])),

"out"

Variable

(

random_normal

([

n_classes

]))

}

當我們第一次運行神經網路的時候（也就是說，權重值是由正態分布定義的）:

input

values

weights

bias

output

values

expected

values

expected

為了知道網路是否正在學習，你需要比較一下輸出值（Z）和期望值（expected）。我們要怎麼計算這個的不同（損耗）呢？有很多方法去解決這個問題。因為我們正在進行分類任務，測量損耗的最好的方式是交叉熵誤差。

James D. McCaffrey 寫了一個精彩的解釋，說明為什麼這是這種類型任務的最佳方法。

通過 TensorFlow 你將使用 tf.nn.softmax_cross_entropy_with_logits() 方法計算交叉熵誤差（這個是 softmax 激活函數）並計算平均誤差 (tf.reduced_mean())。

# Construct model

prediction

multilayer_perceptron

(

input_tensor

weights

biases

)

# Define loss

entropy_loss

softmax_cross_entropy_with_logits

(

logits

prediction

labels

output_tensor

)

loss

reduce_mean

(

entropy_loss

)

你希望通過權重和誤差的最佳值，以便最小化輸出誤差（實際得到的值和正確的值之間的區別）。要做到這一點，將需使用梯度下降法。更具體些是，需要使用隨機梯度下降。

（梯度下降。源: https://sebastianraschka.com/faq/docs/closed-form-vs-gd.html）

為了計算梯度下降，將要使用 Adaptive Moment Estimation (Adam)。要在 TensorFlow 中使用此演算法，需要傳遞 learning_rate 值，該值可確定值的增量步長以找到最佳權重值。

方法 tf.train.AdamOptimizer(learning_rate).minimize(loss) 是一個語法糖，它做了兩件事情：

compute_gradients(loss, <list of variables>)

apply_gradients(<list of variables>)

這個方法用新的值更新了所有的 tf.Variables ，因此我們不需要傳遞變數列表。現在你有了訓練網路的代碼：

learning_rate

0.001

# Construct model

prediction

multilayer_perceptron

(

input_tensor

weights

biases

)

# Define loss

entropy_loss

softmax_cross_entropy_with_logits

(

logits

prediction

labels

output_tensor

)

loss

reduce_mean

(

entropy_loss

)

optimizer

train

AdamOptimizer

(

learning_rate

minimize

(

loss

)

數據操作

將要使用的數據集有很多英文文本，我們需要操作這些數據將其傳遞給神經網路。要做到這一點，需要做兩件事：

為每一個工作創建索引

為每一個文本創建矩陣，在矩陣里，如果單詞在文本中則值為 1，否則值為 0

讓我們看著代碼來理解這個過程：

import

numpy

#numpy is a package for scientific computing

from

collections

import

Counter

vocab

Counter

()

text

"Hi from Brazil"

#Get all wordsfor word in text.split(" "):

vocab

[

word

]

#Convert words to indexes

def

get_word_2_index

(

vocab

)

word2index

{}

for

word

enumerate

(

vocab

)

word2index

[

word

]

return

word2index

#Now we have an index

word2index

get_word_2_index

(

vocab

)

total_words

len

(

vocab

)

#This is how we create a numpy array (our matrix)

matrix

zeros

((

total_words

dtype

float

)

#Now we fill the valuesfor word in text.split():

matrix

[

word2index

[

word

]]

1print

(

matrix

)

>>>

[

]

上面例子中的文本是『Hi from Brazil』，矩陣是 [ 1. 1. 1.]。如果文本僅是『Hi』會怎麼樣？

matrix

zeros

((

total_words

dtype

float

)

text

"Hi"

for

word

text

split

()

matrix

[

word2index

[

word

lower

()]]

1print

(

matrix

)

>>>

[

]

將會與標籤（文本的分類）相同，但是現在得使用獨熱編碼（one-hot encoding）：

zeros

((

dtype

float

)

categories

訓練模型

在神經網路的術語里，一次 epoch = 一個向前傳遞（得到輸出的值）和一個所有訓練示例的向後傳遞（更新權重）。

還記得 tf.Session.run() 方法嗎？讓我們仔細看看它：

tf.Session.run(fetches, feed_dict=None, options=None, run_metadata=None)

在這篇文章開始的數據流圖裡，你用到了和操作，但是我們也可以傳遞一個事情的列表用於運行。在這個神經網路運行中將傳遞兩個事情：損耗計算和優化步驟。

feed_dict 參數是我們為每步運行所輸入的數據。為了傳遞這個數據，我們需要定義tf.placeholders（提供給 feed_dict）

正如 TensorFlow 文檔中說的：

「佔位符的存在只作為輸入的目標，它不需要初始化，也不包含數據。」?—?Source

因此將要像這樣定義佔位符：

n_input

total_words

# Words in vocab

n_classes

# Categories: graphics, sci.space and baseball

input_tensor

placeholder

(

float32

None

n_input

name

"input"

)

output_tensor

placeholder

(

float32

None

n_classes

name

"output"

)

還將要批量分離你的訓練數據：

「如果為了能夠輸入而使用佔位符，可通過使用 tf.placeholder(…, shape=[None, …]) 創建佔位符來指定變數批量維度。shape 的 None 元素對應於大小可變的維度。」?—?Source

在測試模型時，我們將用更大的批處理來提供字典，這就是為什麼需要定義一個可變的批處理維度。

get_batches() 函數為我們提供了批處理大小的文本數。現在我們可以運行模型：

training_epochs

# Launch the graph

with

Session

()

sess

run

(

init

)

#inits the variables (normal distribution, remember?)

# Training cycle for epoch in range(training_epochs):

avg_cost

total_batch

int

(

len

(

newsgroups_train

data

)

batch_size

)

# Loop over all batches for i in range(total_batch):

batch_x

batch_y

get_batch

(

newsgroups_train

batch_size

)

# Run optimization op (backprop) and cost op (to get loss value)

sess

run

([

loss

optimizer

feed_dict

{

input_tensor

batch_x

output_tensor

batch_y

})

現在有了這個經過訓練的模型。為了測試它，還需要創建圖元素。我們將測量模型的準確性，因此需要獲取預測值的索引和正確值的索引（因為我們使用的是獨熱編碼），檢查它們是否相等，並計算所有測試數據集的平均值：

# Test model

index_prediction

argmax

(

prediction

)

index_correct

argmax

(

output_tensor

)

correct_prediction

equal

(

index_prediction

index_correct

)

# Calculate accuracy

accuracy

reduce_mean

(

cast

(

correct_prediction

"float"

))

total_test_data

len

(

newsgroups_test

target

)

batch_x_test

batch_y_test

get_batch

(

newsgroups_test

total_test_data

)

(

"Accuracy:"

accuracy

eval

({

input_tensor

batch_x_test

output_tensor

batch_y_test

}))

>>>

Epoch

0001

loss

1133.908114347

Epoch

0002

loss

329.093700409

Epoch

0003

loss

111.876660109

Epoch

0004

loss

72.552971845

Epoch

0005

loss

16.673050320

Epoch

0006

loss

16.481995190

Epoch

0007

loss

4.848220565

Epoch

0008

loss

0.759822878

Epoch

0009

loss

0.000000000

Epoch

0010

loss

0.079848485

Optimization

Finished

Accuracy

0.75

就是這樣！你使用神經網路創建了一個模型來將文本分類到不同的類別中。恭喜！

可在這裡(https://github.com/dmesquita/understanding_tensorflow_nn) 看到包含最終代碼的筆記本。

提示：修改我們定義的值，以查看更改如何影響訓練時間和模型精度。

還有其他問題或建議？留下你們的評論。謝謝閱讀！

看完本文有收穫？請轉發分享給更多人

關注「大數據與機器學習文摘」，成為Top 1%

喜歡這篇文章嗎？立刻分享出去讓更多人知道吧！

本站內容充實豐富，博大精深，小編精選每日熱門資訊，隨時更新，點擊「搶先收到最新資訊」瀏覽吧！

請您繼續閱讀更多來自 Python開發者 的精彩文章:

※Fedora 需要幾年時間才能從 Python 2 切換到 Python 3
※我是這樣挑戰不用 for 循環的，15篇 Python 技術熱文
※程序員必知的 Python 陷阱與缺陷列表
※GAFT：一個使用 Python 實現的遺傳演算法框架

TAG:Python開發者 |

您可能感興趣