Chapter 7 – Ensemble Learning and Random Forests

Outline

. Voting classifiers

. Bagging and pasting

. Random patches and random subspaces

. Random forests

. Boosting

. Stacking

7.0 Setup

In [1]:
# Common imports
import numpy as np
import os
from scipy.io import loadmat
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import font_manager, rc
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from matplotlib.colors import ListedColormap
from sklearn.metrics import mean_squared_error

from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingRegressor


# Seed the pseudo-random number generator for consistent output
np.random.seed(42)

# Matplotlib settings
font_name = font_manager.FontProperties(fname = "c:/Windows/Fonts/malgun.ttf").get_name()
rc('font',family = font_name)

plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['axes.unicode_minus'] = False

# Folder where figures will be saved
PROJECT_ROOT_DIR = "C:/Users/Admin/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/sally/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/User/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/sally/Dropbox/2019-Fall-Semester/ML"

CHAPTER_ID = "ensembles"

IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)

def save_fig(fig_id,tight_layout = True):
    path = os.path.join(IMAGES_PATH,fig_id + ".png")
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path,format = 'png',dpi = 300)

7.1 Voting Classifiers

In [2]:
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

. Aggregating the predictions of a group of predictors often yields better predictions than the best individual model $\Rightarrow$ ensemble learning

. sklearn.ensemble.VotingClassifier : hard/soft voting classifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html

. Hard voting classifier : aggregate each classifier's prediction and predict the class that gets the most votes

. Soft voting classifier : if every classifier can estimate class probabilities, average the predicted probabilities over the individual classifiers and predict the class with the highest average probability
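
As a minimal illustration of the two rules (separate from the classifiers trained below), the sketch takes made-up class-1 probabilities from three hypothetical classifiers and compares the hard-voting and soft-voting decisions:

import numpy as np

# Hypothetical P(class 1) estimated by three classifiers for one sample
probas = np.array([0.45, 0.48, 0.90])

# Hard voting: each classifier casts one vote using a 0.5 threshold
votes = (probas >= 0.5).astype(int)      # -> [0, 0, 1]
hard_pred = np.bincount(votes).argmax()  # majority vote -> class 0

# Soft voting: average the probabilities, then threshold
soft_pred = int(probas.mean() >= 0.5)    # mean 0.61 -> class 1

print(hard_pred, soft_pred)              # the two rules can disagree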

Cap%202018-11-20%2009-17-47-832.jpg

Cap%202018-11-20%2009-18-00-032.jpg

In [3]:
log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(random_state=42)
svm_clf = SVC(random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)
Out[3]:
VotingClassifier(estimators=[('lr', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=42, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('rf', RandomFore...rbf', max_iter=-1, probability=False, random_state=42,
  shrinking=True, tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='hard', weights=None)
In [4]:
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
LogisticRegression 0.864
RandomForestClassifier 0.872
SVC 0.888
VotingClassifier 0.896
In [5]:
svm_clf2 = SVC(probability=True, random_state=42)

voting_clf2 = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf2)],
    voting='soft')
voting_clf2.fit(X_train, y_train)
Out[5]:
VotingClassifier(estimators=[('lr', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=42, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('rf', RandomFore...'rbf', max_iter=-1, probability=True, random_state=42,
  shrinking=True, tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='soft', weights=None)
In [6]:
for clf in (log_clf, rnd_clf, svm_clf, voting_clf2):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
LogisticRegression 0.864
RandomForestClassifier 0.872
SVC 0.888
VotingClassifier 0.912

. Law of large numbers : suppose a coin comes up heads 51% of the time and tails 49% of the time. If you keep tossing it, the probability that heads are in the majority converges to 1 (a quick numerical check follows this list)

.. $\sum_{x=500}^{1000}{1000\choose x}0.51^x 0.49^{1000-x}=1-{\rm pbinom}(499,1000,0.51)=0.7467502$

... If you build an ensemble of 1000 classifiers that are each only 51% accurate (and independent) and predict the majority class, you can hope for about 75% accuracy

.. $\sum_{x=5000}^{10000}{10000\choose x}0.51^x 0.49^{10000-x}=1-{\rm pbinom}(4999,10000,0.51)=0.9777976$

.. $\sum_{x=50000}^{100000}{100000\choose x}0.51^x 0.49^{100000-x}=1-{\rm pbinom}(49999,100000,0.51)=1$
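
The pbinom expressions above use R notation; the same tail probabilities can be checked in Python with scipy.stats.binom:

from scipy.stats import binom

# P(more than half of n independent 51%-accurate votes are correct)
#   = 1 - P(Binomial(n, 0.51) <= n/2 - 1)  =  1 - pbinom(n/2 - 1, n, 0.51)
for n in (1000, 10000, 100000):
    print(n, 1 - binom.cdf(n // 2 - 1, n, 0.51))
# roughly 0.7468, 0.9778, 1.0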

In [7]:
heads_proba = 0.51
coin_tosses = (np.random.rand(10000, 10) < heads_proba).astype(np.int32)
cumulative_heads_ratio = np.cumsum(coin_tosses, axis=0) / np.arange(1, 10001).reshape(-1, 1)
cumulative_heads_ratio[:10,]
Out[7]:
array([[1.        , 0.        , 0.        , 0.        , 1.        ,
        1.        , 1.        , 0.        , 0.        , 0.        ],
       [1.        , 0.        , 0.        , 0.5       , 1.        ,
        1.        , 1.        , 0.        , 0.5       , 0.5       ],
       [0.66666667, 0.33333333, 0.33333333, 0.66666667, 1.        ,
        0.66666667, 1.        , 0.        , 0.33333333, 0.66666667],
       [0.5       , 0.5       , 0.5       , 0.5       , 0.75      ,
        0.5       , 1.        , 0.25      , 0.25      , 0.75      ],
       [0.6       , 0.6       , 0.6       , 0.4       , 0.8       ,
        0.4       , 1.        , 0.2       , 0.2       , 0.8       ],
       [0.5       , 0.5       , 0.5       , 0.33333333, 0.66666667,
        0.33333333, 1.        , 0.33333333, 0.33333333, 0.83333333],
       [0.57142857, 0.57142857, 0.42857143, 0.42857143, 0.71428571,
        0.28571429, 1.        , 0.28571429, 0.42857143, 0.71428571],
       [0.5       , 0.625     , 0.5       , 0.375     , 0.625     ,
        0.25      , 0.875     , 0.375     , 0.5       , 0.75      ],
       [0.44444444, 0.55555556, 0.55555556, 0.44444444, 0.66666667,
        0.33333333, 0.77777778, 0.33333333, 0.44444444, 0.77777778],
       [0.5       , 0.5       , 0.5       , 0.4       , 0.6       ,
        0.4       , 0.7       , 0.4       , 0.5       , 0.8       ]])
In [8]:
plt.figure(figsize=(8,4))
plt.plot(cumulative_heads_ratio)
plt.plot([0, 10000], [0.51, 0.51], "k--", linewidth=2, label="51%")
plt.plot([0, 10000], [0.5, 0.5], "k-", label="50%")
plt.xlabel("동전을 던진 횟수")
plt.ylabel("앞면이 나온 비율")
plt.legend(loc="lower right")
plt.axis([0, 10000, 0.42, 0.58])

save_fig("law_of_large_numbers_plot")
plt.show()

7.2 Bagging and Pasting

. Build random subsets of the training set and train one predictor on each subset. The same sample may be used across several predictors

.. Bagging : sampling with replacement, so the same sample can be drawn several times for a single predictor

.. Pasting : sampling without replacement
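
A minimal sketch of the difference between the two sampling schemes using index sampling only; the training-set size of 375 and subset size of 100 are meant to mirror the moons split and max_samples=100 used in the next cell:

import numpy as np

rng = np.random.RandomState(42)
m, subset_size = 375, 100    # ~len(X_train) above and max_samples below

# Bagging: draw indices with replacement -> duplicates can occur
bag_idx = rng.choice(m, size=subset_size, replace=True)

# Pasting: draw indices without replacement -> all indices distinct
paste_idx = rng.choice(m, size=subset_size, replace=False)

print(len(np.unique(bag_idx)), len(np.unique(paste_idx)))  # typically about 88 vs exactly 100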

Cap%202018-11-20%2011-20-17-273.png

In [9]:
# Bagging and pasting classifiers (predictions are aggregated by soft voting because DecisionTreeClassifier exposes predict_proba)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1, random_state=42)
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=False, n_jobs=-1, random_state=42)
In [10]:
bag_clf.fit(X_train, y_train)
Out[10]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=100, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
In [11]:
paste_clf.fit(X_train, y_train)
Out[11]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
         bootstrap=False, bootstrap_features=False, max_features=1.0,
         max_samples=100, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
In [12]:
y_bag_pred = bag_clf.predict(X_test)
y_paste_pred = paste_clf.predict(X_test)
In [13]:
print(accuracy_score(y_test, y_bag_pred))
print(accuracy_score(y_test, y_paste_pred))
0.904
0.912
In [14]:
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
Out[14]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best')
In [15]:
y_pred_tree = tree_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_tree))
0.856
In [16]:
def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.5, -1, 1.5], alpha=0.5, contour=True):
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if contour:
        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", alpha=alpha)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", alpha=alpha)
    plt.axis(axes)
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
In [17]:
plt.figure(figsize=(11,7))
plt.subplot(221)
plot_decision_boundary(tree_clf, X, y)
plt.title("결정 나무", fontsize=14)

plt.subplot(222)
plot_decision_boundary(bag_clf, X, y)
plt.title("배깅을 사용한 결정 나무", fontsize=14)

plt.subplot(223)
plot_decision_boundary(paste_clf, X, y)
plt.title("패스팅을 사용한 결정 나무", fontsize=14)

save_fig("decision_tree_without_and_with_bagging_plot")
plt.show()

7.2.2 Out-of-Bag Evaluation

. Probability that a given sample is not drawn for a bootstrap sample : as $m$ grows, $(1-\frac{1}{m})^m \rightarrow e^{-1}\approx 0.37$ $\Rightarrow$ on average only about 63% of the training samples are drawn for each predictor

. Out-of-bag (oob) samples : the remaining ~37% of training samples that were never drawn; the 37% left out differs from predictor to predictor

. Each predictor can be evaluated on its own oob samples
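
A quick numerical check of the limit above (the values of m are arbitrary):

import numpy as np

# P(a given sample is never drawn in m draws with replacement) = (1 - 1/m)^m
for m in (10, 100, 1000, 100000):
    print(m, (1 - 1/m) ** m)

print(np.exp(-1))   # limit is about 0.3679, so roughly 63% of samples get drawn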

In [18]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True, random_state=40)
bag_clf.fit(X_train, y_train)
Out[18]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=40, verbose=0, warm_start=False)
In [19]:
print(bag_clf.oob_score_)
0.9013333333333333
In [20]:
y_pred = bag_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
0.912

7.3 Random Patches and Random Subspaces

. BaggingClassifier also supports sampling the features : each predictor is trained on a random subset of the input features

.. Useful for high-dimensional data (e.g., images)

.. Controlled by the max_features and bootstrap_features parameters

. Two variants (a random-subspaces sketch follows this list)

.. Random patches method : sample both the training samples and the features

.. Random subspaces method : sample only the features $\Rightarrow$ bootstrap=False, max_samples=1.0, bootstrap_features=True, max_features < 1.0
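
The next cell sets up the random patches variant on the iris data; as a complement, a minimal random subspaces configuration (all training samples kept, only the features sampled) is sketched below. The value max_features=2 is an illustrative choice, and oob_score is omitted because scikit-learn only computes it when bootstrap=True.

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Random subspaces: keep every training sample, sample only the features
subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),
    n_estimators=500,
    bootstrap=False, max_samples=1.0,           # use the full training set for every tree
    bootstrap_features=True, max_features=2,    # each tree sees 2 of the 4 features
    n_jobs=-1, random_state=42)

subspace_clf.fit(iris["data"], iris["target"])
print(subspace_clf.estimators_features_[:3])    # feature indices used by the first 3 trees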

In [21]:
iris = load_iris()
bag_clf = BaggingClassifier(DecisionTreeClassifier(random_state=42),
                            n_estimators=500, max_samples=50, bootstrap=True, 
                              n_jobs=-1, oob_score=True, random_state=42)
patch_clf = BaggingClassifier(DecisionTreeClassifier(random_state=42), 
                              n_estimators=500, max_samples=50, max_features=2,
                              bootstrap=True, bootstrap_features=True, 
                              n_jobs=-1, oob_score=True, random_state=42)
In [22]:
bag_clf.fit(iris["data"], iris["target"])
Out[22]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=50, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=42, verbose=0, warm_start=False)
In [23]:
patch_clf.fit(iris["data"], iris["target"])
Out[23]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='best'),
         bootstrap=True, bootstrap_features=True, max_features=2,
         max_samples=50, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=42, verbose=0, warm_start=False)
In [24]:
print(bag_clf.oob_score_)
print(patch_clf.oob_score_)
0.9533333333333334
0.9466666666666667

7.4 Random Forests

. An ensemble of decision trees trained via bagging (or pasting)

. Use the RandomForestClassifier (RandomForestRegressor) class

.. sklearn.ensemble.RandomForestClassifier : random forest classifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

. When splitting a node, instead of searching for the best feature among all features, the tree searches for the best feature among a random subset of candidate features

In [25]:
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1, random_state=42)

bag_clf.fit(X_train, y_train)
Out[25]:
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=16,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
            splitter='random'),
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
In [27]:
y_pred = bag_clf.predict(X_test)
In [28]:
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)
Out[28]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=16,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
In [29]:
y_pred_rf = rnd_clf.predict(X_test)
In [30]:
np.sum(y_pred == y_pred_rf) / len(y_pred)  # almost identical predictions
Out[30]:
0.976

7.4.1 Extra-Trees

. To make the trees even more random, instead of searching for the best threshold, use a random threshold for each candidate feature and pick the best of these random splits

. Faster to train

. Use the ExtraTreesClassifier (ExtraTreesRegressor) class

.. sklearn.ensemble.ExtraTreesClassifier : extra-trees classifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html

In [31]:
xtree_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
xtree_clf.fit(X_train, y_train)
Out[31]:
ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=16,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
           oob_score=False, random_state=42, verbose=0, warm_start=False)
In [32]:
y_pred_xt = xtree_clf.predict(X_test)
In [33]:
np.sum(y_pred == y_pred_xt) / len(y_pred)  # almost identical predictions
Out[33]:
0.992
In [34]:
np.sum(y_pred_rf == y_pred_xt) / len(y_pred_rf)  # almost identical predictions
Out[34]:
0.968

7.4.2 Feature Importance

. Measure a feature's importance by how much, on average, the nodes that use that feature reduce impurity across all trees in the random forest

In [35]:
iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])
Out[35]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
In [36]:
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)
sepal length (cm) 0.11249225099876374
sepal width (cm) 0.023119288282510326
petal length (cm) 0.44103046436395765
petal width (cm) 0.4233579963547681
In [37]:
mnist_path = os.path.join("datasets","mnist","mnist-original.mat")
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T, # 전치 70000 * 784 행렬
    "target": mnist_raw["label"][0],# 첫 번째 행
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original"
}
In [38]:
rnd_clf = RandomForestClassifier(random_state=42)
rnd_clf.fit(mnist["data"], mnist["target"])
Out[38]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
In [39]:
def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = matplotlib.cm.hot,interpolation="nearest")
    plt.axis("off")
In [40]:
plot_digit(rnd_clf.feature_importances_)

cbar = plt.colorbar(ticks=[rnd_clf.feature_importances_.min(), rnd_clf.feature_importances_.max()])
cbar.ax.set_yticklabels(['중요하지 않음','매우 중요함'])

save_fig("mnist_feature_importance_plot")
plt.show()

7.5 Boosting

. An ensemble method that combines several weak learners into a strong learner

. Predictors are added sequentially, each one trying to correct the errors made so far

. Representative methods

.. AdaBoost (adaptive boosting)

.. Gradient boosting

7.5.1 AdaBoost (Adaptive Boosting)

. Pays more attention to the training samples that the previous model underfitted by increasing their weights

. Procedure : classification

.. Train a first classifier on the training set with equal sample weights and make predictions

.. Increase the relative weights of the misclassified training samples

.. Train a second classifier on the training set with the updated weights and make predictions

.. Repeat

Cap%202018-11-23%2018-36-27-627.png

In [41]:
m = len(X_train)

plt.figure(figsize=(11, 4))
for subplot, learning_rate in ((121, 1), (122, 0.5)):
    sample_weights = np.ones(m)
    plt.subplot(subplot)
    if subplot == 121:
        plt.text(-0.7, -0.65, "1", fontsize=14)
        plt.text(-0.6, -0.10, "2", fontsize=14)
        plt.text(-0.5,  0.10, "3", fontsize=14)
        plt.text(-0.4,  0.55, "4", fontsize=14)
        plt.text(-0.3,  0.90, "5", fontsize=14)        
    for i in range(5):
        svm_clf = SVC(kernel="rbf", C=0.05, random_state=42)
        svm_clf.fit(X_train, y_train, sample_weight=sample_weights)
        plot_decision_boundary(svm_clf, X, y, alpha=0.2)
        plt.title("learning_rate = {}".format(learning_rate), fontsize=16)
        y_pred = svm_clf.predict(X_train)
        sample_weights[y_pred != y_train] *= (1 + learning_rate)

save_fig("boosting_plot")
plt.show()

. AdaBoost algorithm

.. Each sample's weight is initially set to $\frac{1}{m}$

.. Use the training result of the $j$-th predictor to compute its error rate $r_j$

.. Weighted error rate of the $j$-th predictor : $r_j=\frac{\sum_{i=1,\hat{y}_j^{(i)}\ne y^{(i)}}^m w^{(i)}}{\sum_{i=1}^m w^{(i)}}$

... $\hat{y}_j^{(i)}$ : the $j$-th predictor's prediction for the $i$-th sample

.. Weight of the $j$-th predictor : $\alpha_j=\eta \log(\frac{1-r_j}{r_j})$

... The more accurate the predictor, the higher its weight; a random guesser gets a weight close to 0, and a predictor worse than random gets a negative weight!

.. Weight update (tweaking) : if $\hat{y}_j^{(i)}=y^{(i)}$ then $w^{(i)}=w^{(i)}$; if $\hat{y}_j^{(i)}\ne y^{(i)}$ then $w^{(i)}=w^{(i)}\exp(\alpha_j)$

. $\eta$ : learning rate (default = 1)

.. Normalize the weights (divide by $\sum_{i=1}^m w^{(i)}$)

.. Train a new predictor on the training set with the new weights $\Rightarrow$ obtain the next predictor

.. Repeat (a from-scratch sketch of these updates follows this list)

.. AdaBoost prediction : $\hat{y}(x)={\rm argmax}_k\sum_{j=1, \hat{y}_j(x)=k}^N \alpha_j$, i.e., the predicted class is the one that receives the majority of the weighted votes

... $N$ : number of predictors
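
A minimal from-scratch sketch of the update rules above for the binary moons problem, using decision stumps as the weak learners; n_predictors=30 and eta=1.0 are illustrative assumptions, and X_train, y_train, X_test, y_test are the moons split from section 7.1:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_predictors=30, eta=1.0):
    m = len(X)
    w = np.ones(m) / m                        # initial weight 1/m for every sample
    stumps, alphas = [], []
    for _ in range(n_predictors):
        stump = DecisionTreeClassifier(max_depth=1, random_state=42)
        stump.fit(X, y, sample_weight=w)
        y_hat = stump.predict(X)
        r = w[y_hat != y].sum() / w.sum()     # weighted error rate r_j
        alpha = eta * np.log((1 - r) / r)     # predictor weight alpha_j
        w[y_hat != y] *= np.exp(alpha)        # boost the weights of misclassified samples
        w /= w.sum()                          # normalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X, classes=(0, 1)):
    scores = np.zeros((len(X), len(classes)))
    for stump, alpha in zip(stumps, alphas):
        y_hat = stump.predict(X)
        for k, c in enumerate(classes):
            scores[y_hat == c, k] += alpha    # weighted vote for class c
    return np.asarray(classes)[scores.argmax(axis=1)]

stumps, alphas = adaboost_fit(X_train, y_train)
print(np.mean(adaboost_predict(stumps, alphas, X_test) == y_test))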

. sklearn.ensemble.AdaBoostClassifier : AdaBoost classifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

In [42]:
ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)
Out[42]:
AdaBoostClassifier(algorithm='SAMME.R',
          base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'),
          learning_rate=0.5, n_estimators=200, random_state=42)
In [43]:
plot_decision_boundary(ada_clf, X, y)

7.5.2 Gradient Boosting

. Instead of tweaking the sample weights, fit each new predictor to the residual errors made by the previous predictor

. Boosting for Regression Trees

.. Set $\hat{f}(x) = 0$ and $r_i = y_i$ $\forall i$ in the training set

.. For $b = 1, 2,\ldots,B,$ repeat :

... Fit a tree $\hat f^b$ to the training data $(X, r)$

... Update $\hat{f}$ : $\hat{f}(x)\leftarrow\hat{f}(x) + \lambda\hat{f}^b(x)$

... Update the residuals $r_i\leftarrow r_i − \lambda\hat{f}^b(x_i)$

.. Output the boosted model : $\hat{f}(x) =\sum^B_{b=1}\lambda\hat{ f}^b(x)$
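
The cells that follow carry out this procedure by hand with three trees and an implicit λ = 1; a generic loop over B trees with a shrinkage parameter, mirroring the pseudocode above, is sketched here. B=3 and lam=1.0 are chosen to line up with the manual example, and X_toy, y_toy are hypothetical names that avoid clashing with the X, y defined in the next cell:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy quadratic data of the same form as in the next cell
np.random.seed(42)
X_toy = np.random.rand(100, 1) - 0.5
y_toy = 3 * X_toy[:, 0]**2 + 0.05 * np.random.randn(100)

def boost_regression_trees(X, y, B=3, lam=1.0, max_depth=2):
    residual = y.copy()                              # r_i = y_i
    trees = []
    for _ in range(B):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
        tree.fit(X, residual)                        # fit f^b to (X, r)
        residual = residual - lam * tree.predict(X)  # r_i <- r_i - lam * f^b(x_i)
        trees.append(tree)
    return trees

def boosted_predict(trees, X, lam=1.0):
    return sum(lam * tree.predict(X) for tree in trees)   # f(x) = sum_b lam * f^b(x)

trees = boost_regression_trees(X_toy, y_toy)
print(boosted_predict(trees, np.array([[0.8]])))     # should match the manual 3-tree sum (~0.75)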

In [44]:
np.random.seed(42)
X = np.random.rand(100, 1) - 0.5
y = 3*X[:, 0]**2 + 0.05 * np.random.randn(100)
In [45]:
tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg1.fit(X, y)
Out[45]:
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
In [46]:
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg2.fit(X, y2)
Out[46]:
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
In [47]:
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg3.fit(X, y3)
Out[47]:
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
In [48]:
X_new = np.array([[0.8]])
In [49]:
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))
In [50]:
y_pred
Out[50]:
array([0.75026781])
In [51]:
def plot_predictions(regressors, X, y, axes, label=None, style="r-", data_style="b.", data_label=None):
    x1 = np.linspace(axes[0], axes[1], 500)
    y_pred = sum(regressor.predict(x1.reshape(-1, 1)) for regressor in regressors)
    plt.plot(X[:, 0], y, data_style, label=data_label)
    plt.plot(x1, y_pred, style, linewidth=2, label=label)
    if label or data_label:
        plt.legend(loc="upper center", fontsize=16)
    plt.axis(axes)

plt.figure(figsize=(11,11))

plt.subplot(321)
plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$\hat{f}^1(x_1)$", style="g-", 
                 data_label="훈련 세트")
plt.ylabel("$r$", fontsize=16, rotation=0)
plt.title("잔여 오차와 나무의 예측", fontsize=16)

plt.subplot(322)
plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="$\hat{f}(x_1) = \hat{f}^1(x_1)$", 
                 data_label="훈련 세트")
plt.ylabel("$y$", fontsize=16, rotation=0)
plt.title("앙상블의 예측", fontsize=16)

plt.subplot(323)
plot_predictions([tree_reg2], X, y2, axes=[-0.5, 0.5, -0.5, 0.5], label="$\hat{f}^2(x_1)$", style="g-", 
                 data_style="k+", data_label="잔여 오치")
plt.ylabel("$r - \hat{f}^1(x_1)$", fontsize=16)

plt.subplot(324)
plot_predictions([tree_reg1, tree_reg2], X, y, axes=[-0.5, 0.5, -0.1, 0.8], 
                 label="$\hat f(x_1) = \hat f^1(x_1) + \hat f^2(x_1)$")
plt.ylabel("$y$", fontsize=16, rotation=0)

plt.subplot(325)
plot_predictions([tree_reg3], X, y3, axes=[-0.5, 0.5, -0.5, 0.5], label="$\hat f^3(x_1)$", style="g-", 
                 data_style="k+")
plt.ylabel("$r - \hat f^1(x_1) - \hat f^2(x_1)$", fontsize=16)
plt.xlabel("$x_1$", fontsize=16)

plt.subplot(326)
plot_predictions([tree_reg1, tree_reg2, tree_reg3], X, y, axes=[-0.5, 0.5, -0.1, 0.8], 
                 label="$\hat f(x_1) = \hat f^1(x_1) + \hat f^2(x_1) + \hat f^3(x_1)$")
plt.xlabel("$x_1$", fontsize=16)
plt.ylabel("$y$", fontsize=16, rotation=0)

save_fig("gradient_boosting_plot")
plt.show()

. sklearn.ensemble.GradientBoostingRegressor : gradient boosting regressor

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html

. sklearn.ensemble.GradientBoostingClassifier : gradient boosting classifier

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html

In [52]:
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
Out[52]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=1.0, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=3, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [53]:
plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="앙상블의 예측")
plt.title("learning_rate={}, n_estimators={}".format(gbrt.learning_rate, gbrt.n_estimators), fontsize=14)

plt.show()
In [54]:
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=0.1, random_state=42)
gbrt.fit(X, y)
Out[54]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=3, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [55]:
gbrt_slow = GradientBoostingRegressor(max_depth=2, n_estimators=200, learning_rate=0.1, random_state=42)
gbrt_slow.fit(X, y)
Out[55]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=200, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [56]:
plt.figure(figsize=(11,4))

plt.subplot(121)
plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="앙상블의 예측")
plt.title("learning_rate={}, n_estimators={}".format(gbrt.learning_rate, gbrt.n_estimators), fontsize=14)

plt.subplot(122)
plot_predictions([gbrt_slow], X, y, axes=[-0.5, 0.5, -0.1, 0.8])
plt.title("learning_rate={}, n_estimators={}".format(gbrt_slow.learning_rate, gbrt_slow.n_estimators), fontsize=14)

save_fig("gbrt_learning_rate_plot")
plt.show()

. Gradient boosting with early stopping

In [57]:
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=49)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120, random_state=42)
gbrt.fit(X_train, y_train)
Out[57]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=120, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [58]:
errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]
bst_n_estimators = np.argmin(errors)
print(bst_n_estimators)
55
In [59]:
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators, random_state=42)
gbrt_best.fit(X_train, y_train)
Out[59]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=55, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
In [60]:
min_error = np.min(errors)
print(min_error)
0.002712853325235463
In [61]:
plt.figure(figsize=(11, 4))

plt.subplot(121)
plt.plot(errors, "b.-")
plt.plot([bst_n_estimators, bst_n_estimators], [0, min_error], "k--")
plt.plot([0, 120], [min_error, min_error], "k--")
plt.plot(bst_n_estimators, min_error, "ko")
plt.text(bst_n_estimators, min_error*1.2, "최소", ha="center", fontsize=14)
plt.axis([0, 120, 0, 0.01])
plt.xlabel("나무 개수")
plt.title("검증 오차", fontsize=14)

plt.subplot(122)
plot_predictions([gbrt_best], X, y, axes=[-0.5, 0.5, -0.1, 0.8])
plt.title("최적 모델 (나무 %d 개)" % bst_n_estimators, fontsize=14)

save_fig("early_stopping_gbrt_plot")
plt.show()
In [62]:
gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True, random_state=42)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(X_train, y_train)
    y_pred = gbrt.predict(X_val)
    val_error = mean_squared_error(y_val, y_pred)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # Early stopping
In [63]:
print(gbrt.n_estimators)
61
In [64]:
print("Minimum validation MSE:", min_val_error)
Minimum validation MSE: 0.002712853325235463

7.6 Stacking (stacked generalization)

. Training the blender :

.. Split the training set into two subsets

.. Use the first subset to train the predictors in the first layer

.. Use the first-layer predictors to make predictions on the second subset (the hold-out set)

.. Build a new training set that uses these predictions as input features, keeping the target values as they are

.. Train the blender on this new training set, i.e., train it to predict the target value from the first layer's predictions (a hold-out sketch follows below)
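
The scikit-learn version used in this notebook appears to predate the built-in StackingClassifier, so a minimal hold-out sketch of the steps above is given here. The three first-layer models and the logistic-regression blender are illustrative choices, and the moons data from section 7.1 are rebuilt because X_train was later reused for the regression example:

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Rebuild the moons data and split off a test set
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Split the training set: one subset for layer 1, a hold-out set for the blender
X_sub1, X_hold, y_sub1, y_hold = train_test_split(X_train, y_train, random_state=42)

# First layer: three predictors trained on the first subset
layer1 = [RandomForestClassifier(n_estimators=100, random_state=42),
          ExtraTreesClassifier(n_estimators=100, random_state=42),
          SVC(random_state=42)]
for clf in layer1:
    clf.fit(X_sub1, y_sub1)

# New training set for the blender: layer-1 predictions on the hold-out set
X_blend = np.column_stack([clf.predict(X_hold) for clf in layer1])
blender = LogisticRegression(solver="lbfgs", random_state=42)
blender.fit(X_blend, y_hold)

# Final prediction: feed layer-1 predictions on the test set into the blender
X_test_blend = np.column_stack([clf.predict(X_test) for clf in layer1])
print(accuracy_score(y_test, blender.predict(X_test_blend)))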

Cap%202018-11-24%2014-32-32-095.png

Cap%202018-11-24%2014-32-51-104.png

. The final predictor (the blender, or meta learner) takes the three predictors' predictions as inputs and makes the final prediction

Cap%202018-11-24%2014-31-58-068.png

. Several different blenders can be trained (e.g., linear regression, SVM, random forest) $\rightarrow$ this produces a whole layer of blenders

. Procedure :

. Split the training set into three subsets

. First subset : used to train the first layer

. Second subset : used, via the first-layer predictors, to build the training set for the second layer

. Third subset : used, via the second-layer predictors, to build the training set for the third layer

Cap%202018-11-24%2014-33-23-165.png

The End of Chapter 7