AI Study Group · Beginner Track | The Master Algorithm: Summary of Chapters 1-2 and Study Guide for Chapter 3

News 05-10

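In recent years, artificial intelligence has made breakthrough advances, drawing strong interest from many students and practitioners across technical fields. For anyone getting started in AI, The Master Algorithm is arguably required reading. At first glance the author, Pedro Domingos, only sketches the mainstream ideas of machine learning, yet he touches on nearly every important application, whether it exists today or has yet to appear. The book lets beginners survey the field from a high level, and it also plants many leads that motivated readers can follow into specific technical problems, which makes it a rare guide-style introduction. Its witty, humorous style keeps the reading enjoyable.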

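Using this book as our text, 機器之心's "AI Study Group · Beginner Track" (人工智慧研學社 · 入門組) will officially start soon (how to join)! We invite all beginners interested in artificial intelligence and machine learning to join us and, by reading and discussing The Master Algorithm, gain a broad, comprehensive view of the history and technical principles of AI. This article briefly summarizes Chapters 1 and 2 of the book and gives a summary outline of Chapter 3 (in Chinese and English), followed by a short quiz. Give it a try!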

Chapters 1-2 Review

Chapter Summary

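Machine learning is famously versatile and goes by many names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. In these two introductory chapters you will become familiar with some of the field's common terms, grouped by application. Several notable trends are highlighted: finance (predicting whether stocks rise or fall), mining corporate databases (customer relationship management, credit scoring, and fraud detection), and e-commerce (personalization).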

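The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces the five tribes of machine learning: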

Symbolists
Connectionists
Evolutionaries
Bayesians
Analogizers


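Machine learning has more technical terms than anyone can count, so you may feel lost at first. In most cases, though, a handful of terms and a few algorithms are enough to grasp the key idea behind the vast majority of applications. In the following chapters the book takes a closer, more detailed look at each tribe. How much a theory simplifies the real world when it is used to describe and model it is one indicator of the theory's power. Can we do well enough? First, we can never obtain enough data to determine the world completely. Second, even with complete knowledge of the world at some point in time, the laws of physics would not let us determine its past and future.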

Chapter 3 Preview

Chapter Summary

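To understand symbolism, we first need to understand what deduction is and why it matters so much. The Master Algorithm should be able to start with a large body of knowledge and use it to guide new generalizations from data. The divide-and-conquer rule induction algorithm cannot do this, but induction treated as the inverse of deduction can.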

Important Sections

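The "no free lunch" theorem:

In machine learning, preconceived notions are indispensable. Our goal is to find the simplest program that can keep writing itself by reading data.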

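Priming the knowledge pump:

A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them when they fail to explain the data.
We also meet the first real learner in the book.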
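
The "start restrictive, then relax" strategy can be sketched in a few lines. This is only an illustrative sketch, not the learner described in the book: it assumes the hypothesis is a conjunction of attribute = value conditions, first allows a single condition, and permits more conditions only when nothing simpler explains the data. The function names and the toy data are hypothetical.

```python
from itertools import combinations, product

def matches(rule, example):
    """A rule is a dict of attribute -> required value (a conjunction)."""
    return all(example[attr] == value for attr, value in rule.items())

def consistent(rule, data):
    """True if the rule predicts the label of every (example, label) pair."""
    return all(matches(rule, example) == label for example, label in data)

def learn_conjunction(data, max_conditions):
    """Try the most restrictive hypothesis space first, then relax it."""
    attributes = sorted(data[0][0])
    values = {a: sorted({example[a] for example, _ in data}) for a in attributes}
    for size in range(1, max_conditions + 1):  # relax: allow one more condition
        for attrs in combinations(attributes, size):
            for combo in product(*(values[a] for a in attrs)):
                rule = dict(zip(attrs, combo))
                if consistent(rule, data):
                    return rule
    return None  # nothing in the allowed hypothesis space explains the data

# Hypothetical toy data: "do we go for a walk?"
data = [
    ({"weather": "sunny", "weekend": True}, True),
    ({"weather": "sunny", "weekend": False}, False),
    ({"weather": "rainy", "weekend": True}, False),
    ({"weather": "rainy", "weekend": False}, False),
]

rule = learn_conjunction(data, max_conditions=2)
print(rule)
```

On this data no single condition is consistent with all four examples, so the search relaxes to two conditions and returns the rule "weather is sunny and weekend is True". Calling `learn_conjunction(data, max_conditions=1)` returns `None`, which is the signal that the assumption was too restrictive.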

How to rule the world:

Examples of two typical learning algorithms

Between blindness and hallucination:

The overfitting problem and several possible remedies

Accuracy you can believe in:

Principles to follow in order to avoid overfitting
The concepts of "bias" and "variance"
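
A small experiment makes the point about believable accuracy concrete. The sketch below is illustrative and not taken from the book: it compares a learner that memorizes its training data (low bias, high variance) with a one-threshold rule (higher bias, lower variance), and scores both on training data and on held-out data. The data set, the noisy label, and the choice to guess `False` for unseen inputs are all assumptions made for the example.

```python
def accuracy(predict, data):
    """Fraction of (x, label) pairs that the predictor gets right."""
    return sum(predict(x) == label for x, label in data) / len(data)

def fit_memorizer(train):
    """Memorize every training pair; guess False for anything unseen."""
    table = dict(train)
    return lambda x: table.get(x, False)

def fit_threshold(train):
    """Pick the cut-off t whose rule 'x >= t' makes the fewest training errors."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates,
               key=lambda t: sum((x >= t) == label for x, label in train))
    return lambda x: x >= best

# Hypothetical data: the true concept is x >= 6, and (3, True) is a noisy label.
train = [(1, False), (2, False), (3, True), (4, False),
         (5, False), (6, True), (7, True), (9, True)]
held_out = [(0, False), (3, False), (8, True), (10, True), (11, True)]

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

for name, model in [("memorizer", memorizer), ("threshold", threshold)]:
    print(name, accuracy(model, train), accuracy(model, held_out))
```

The memorizer scores 1.0 on the training set but only 0.2 on the held-out set, because it has also memorized the noisy label and knows nothing about unseen inputs. The threshold rule scores 0.875 on training data and 1.0 on held-out data. Only the held-out number is an accuracy worth believing.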

Induction is the inverse of deduction:

The basic concepts of "deduction" and "induction"

A game of twenty questions:

The basic concept of decision trees
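
The link between Twenty Questions and decision trees can be seen in a hand-built tree. The sketch below only shows how a finished tree classifies an example by asking one question per level; it does not learn the tree from data, and the questions, labels, and tuple-based representation are hypothetical choices for illustration.

```python
# Internal nodes are (question, {answer: subtree}); leaves are plain labels.
tree = ("has_feathers", {
    True: ("can_fly", {True: "sparrow", False: "penguin"}),
    False: ("lives_in_water", {True: "dolphin", False: "dog"}),
})

def classify(node, example, asked=None):
    """Walk from the root to a leaf, recording each question asked."""
    asked = [] if asked is None else asked
    if not isinstance(node, tuple):  # reached a leaf: this is the answer
        return node, asked
    question, branches = node
    asked.append(question)
    return classify(branches[example[question]], example, asked)

animal = {"has_feathers": True, "can_fly": False, "lives_in_water": True}
label, questions = classify(tree, animal)
print(label, questions)
```

For this example the tree asks only two questions, `has_feathers` and `can_fly`, before answering "penguin". The `lives_in_water` attribute is never consulted, which is what makes a well-chosen sequence of questions efficient.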

The symbolists:

A summary of the basic philosophy of the symbolist tribe of machine learning

Key Concepts

Hume's problem of induction
Overfitting
Bias and variance
Deduction and induction
Decision trees

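Quiz

What is overfitting?
List three methods mentioned in this chapter for solving or mitigating overfitting.
What is induction? Explain it with an example.
Build a decision tree based on a case of your own.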

Chapter #1-2 Review

【Chapter Summary】

Machine learning is notably multifaceted and goes by a variety of names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. In these two introductory chapters, you will start to get familiar with some commonly used terminology in the field, categorized by application. Some notable trends are highlighted here: finance (predicting stock ups and downs), mining corporate databases (customer relationship management, credit scoring, and fraud detection), and e-commerce (personalization).

The Master Algorithm is to algorithms what the hand is to pens, swords, screwdrivers, and forks. The author briefly introduces five tribes in machine learning:

Symbolists
Connectionists
Evolutionaries
Bayesians
Analogizers


Machine learning has a vast, nearly uncountable number of technical terms, so you may feel overwhelmed at first. In most cases, however, a little jargon and a few algorithms are enough to understand the key idea behind the vast majority of applications. The author guides us through a closer, more detailed look at each machine learning tribe in the following chapters. One strong indicator of a theory's power is how much simplification it achieves when used to describe and model the real world. Can we do well enough? First, we will never have enough data to determine the world completely. Second, even if we had complete knowledge of the world at some point in time, the laws of physics would still not allow us to determine its past and future.

Chapter #3 Preview

【Chapter Summary】

To understand symbolism, we first need to know what deduction is and why it is so important. The Master Algorithm should be able to start with a large body of knowledge and use it to guide new generalizations from data. The "divide and conquer" rule induction algorithm can't do this, but induction treated as the inverse of deduction can.

【Important Sections】

The "no free lunch" theorem:

In machine learning, preconceived notions are indispensable. Our goal is to find the simplest program that will continue to write itself by reading data.

Priming the knowledge pump:

A typical strategy in machine learning is to start with restrictive assumptions and gradually relax them if they fail to explain the data.
We also encounter the first actual learner in the book.