Chapter 7 – Ensemble Learning and Random Forests


. Voting classifiers

. Bagging and pasting

. Random patches and random subspaces

. Random forests

. Boosting

. Stacking

7.0 Setup

# Common imports
import numpy as np
import os
from scipy.io import loadmat
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import font_manager, rc
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from matplotlib.colors import ListedColormap
from sklearn.metrics import mean_squared_error

from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingRegressor

# Seed the pseudo-random number generator for reproducible output
np.random.seed(42)

# Matplotlib settings
font_name = font_manager.FontProperties(fname = "c:/Windows/Fonts/malgun.ttf").get_name()
rc('font',family = font_name)

plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['axes.unicode_minus'] = False

# Folder where figures are saved
PROJECT_ROOT_DIR = "C:/Users/Admin/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/sally/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/User/Desktop/ML/"
# PROJECT_ROOT_DIR = "C:/Users/sally/Dropbox/2019-Fall-Semester/ML"

CHAPTER_ID = "ensembles"

IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)

def save_fig(fig_id, tight_layout=True):
    # Save the current figure as a 300-dpi PNG under IMAGES_PATH
    path = os.path.join(IMAGES_PATH, fig_id + ".png")
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format='png', dpi=300)

7.1 Voting classifiers

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

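. Aggregating the predictions of a group of predictors often gives better predictions than the single best model $\Rightarrow$ ensemble learning

. sklearn.ensemble.VotingClassifier : hard and soft voting classifier


. Hard voting classifier : collects the predictions of each classifier and predicts the class that gets the most votes

. Soft voting classifier : if every classifier can estimate class probabilities, averages the individual probabilities and predicts the class with the highest average probability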
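
The difference between the two voting rules can be seen on a small array of class probabilities. The sketch below uses made-up probabilities for three hypothetical classifiers (A, B, C) and four samples; it is only an illustration of the two rules, not the internals of VotingClassifier.

```python
import numpy as np

# Made-up class probabilities: axis 0 = classifier, axis 1 = sample, axis 2 = class.
probas = np.array([
    [[0.45, 0.55], [0.90, 0.10], [0.30, 0.70], [0.60, 0.40]],  # classifier A
    [[0.40, 0.60], [0.80, 0.20], [0.55, 0.45], [0.45, 0.55]],  # classifier B
    [[0.95, 0.05], [0.70, 0.30], [0.60, 0.40], [0.40, 0.60]],  # classifier C
])

# Hard voting: each classifier casts one vote for its most probable class.
votes = probas.argmax(axis=2)  # shape (3 classifiers, 4 samples)
hard_pred = np.array([np.bincount(col, minlength=2).argmax() for col in votes.T])

# Soft voting: average the probabilities, then take the most probable class.
soft_pred = probas.mean(axis=0).argmax(axis=1)

print("hard:", hard_pred)
print("soft:", soft_pred)
```

The two rules disagree on the first and third samples, where one very confident classifier outweighs two lukewarm ones under soft voting.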



log_clf = LogisticRegression(random_state=42)
rnd_clf = RandomForestClassifier(random_state=42)
svm_clf = SVC(random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)
VotingClassifier(estimators=[('lr', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=42, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('rf', RandomFore...rbf', max_iter=-1, probability=False, random_state=42,
  shrinking=True, tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='hard', weights=None)
for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
LogisticRegression 0.864
RandomForestClassifier 0.872
SVC 0.888
VotingClassifier 0.896
svm_clf2 = SVC(probability=True, random_state=42)

voting_clf2 = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf2)],
    voting='soft')
voting_clf2.fit(X_train, y_train)
VotingClassifier(estimators=[('lr', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=42, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)), ('rf', RandomFore...'rbf', max_iter=-1, probability=True, random_state=42,
  shrinking=True, tol=0.001, verbose=False))],
         flatten_transform=None, n_jobs=None, voting='soft', weights=None)
for clf in (log_clf, rnd_clf, svm_clf, voting_clf2):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
LogisticRegression 0.864
RandomForestClassifier 0.872
SVC 0.888
VotingClassifier 0.912
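. Law of large numbers : suppose a biased coin comes up heads 51% of the time and tails 49% of the time. As the number of tosses grows without bound, the probability that heads make up at least half of the tosses converges to 1

.. $\sum_{x=500}^{1000}{1000\choose x}0.51^x 0.49^{1000-x}=1-{\rm pbinom}(499,1000,0.51)=0.7467502$

... If an ensemble is built from 1,000 classifiers that are each 51% accurate and the most-voted class is taken as the prediction, about 75% accuracy can be expected, provided the classifiers make independent errors

.. $\sum_{x=5000}^{10000}{10000\choose x}0.51^x 0.49^{10000-x}=1-{\rm pbinom}(4999,10000,0.51)=0.9777976$

.. $\sum_{x=50000}^{100000}{100000\choose x}0.51^x 0.49^{100000-x}=1-{\rm pbinom}(49999,100000,0.51)\approx 1$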
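
The binomial tail sums above can be checked in Python without R's pbinom. The helper below, majority_probability, is a hypothetical name introduced for this sketch; it sums the probability mass function in log space so that the large binomial coefficients do not overflow.

```python
from math import lgamma, log, exp

def majority_probability(n, p):
    """P(X >= n/2) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    k_min = n // 2
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for k in range(k_min, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k)
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += exp(log_pmf)
    return total

for n in (1000, 10000, 100000):
    print(n, majority_probability(n, 0.51))
```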

heads_proba = 0.51
coin_tosses = (np.random.rand(10000, 10) < heads_proba).astype(np.int32)
cumulative_heads_ratio = np.cumsum(coin_tosses, axis=0) / np.arange(1, 10001).reshape(-1, 1)
array([[1.        , 0.        , 0.        , 0.        , 1.        ,
        1.        , 1.        , 0.        , 0.        , 0.        ],
       [1.        , 0.        , 0.        , 0.5       , 1.        ,
        1.        , 1.        , 0.        , 0.5       , 0.5       ],
       [0.66666667, 0.33333333, 0.33333333, 0.66666667, 1.        ,
        0.66666667, 1.        , 0.        , 0.33333333, 0.66666667],
       [0.5       , 0.5       , 0.5       , 0.5       , 0.75      ,
        0.5       , 1.        , 0.25      , 0.25      , 0.75      ],
       [0.6       , 0.6       , 0.6       , 0.4       , 0.8       ,
        0.4       , 1.        , 0.2       , 0.2       , 0.8       ],
       [0.5       , 0.5       , 0.5       , 0.33333333, 0.66666667,
        0.33333333, 1.        , 0.33333333, 0.33333333, 0.83333333],
       [0.57142857, 0.57142857, 0.42857143, 0.42857143, 0.71428571,
        0.28571429, 1.        , 0.28571429, 0.42857143, 0.71428571],
       [0.5       , 0.625     , 0.5       , 0.375     , 0.625     ,
        0.25      , 0.875     , 0.375     , 0.5       , 0.75      ],
       [0.44444444, 0.55555556, 0.55555556, 0.44444444, 0.66666667,
        0.33333333, 0.77777778, 0.33333333, 0.44444444, 0.77777778],
       [0.5       , 0.5       , 0.5       , 0.4       , 0.6       ,
        0.4       , 0.7       , 0.4       , 0.5       , 0.8       ]])
plt.plot(cumulative_heads_ratio)  # one curve per series of 10,000 tosses
plt.plot([0, 10000], [0.51, 0.51], "k--", linewidth=2, label="51%")
plt.plot([0, 10000], [0.5, 0.5], "k-", label="50%")
plt.xlabel("Number of coin tosses")
plt.ylabel("Heads ratio")
plt.legend(loc="lower right")
plt.axis([0, 10000, 0.42, 0.58])


7.2 Bagging and pasting

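. Build random subsets of the training set and train one classifier on each. The same instance may be used by several predictors

.. Bagging : sampling with replacement. Within a single predictor the same instance can be drawn several times

.. Pasting : sampling without replacement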
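
The only difference between the two schemes is whether indices are drawn with or without replacement. A minimal sketch with NumPy, using toy sizes (m = 10 instances, subsets of 6) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
m, subset_size = 10, 6  # toy sizes chosen for illustration

# Bagging: sample with replacement, so an index may repeat within one subset.
bagging_idx = rng.choice(m, size=subset_size, replace=True)

# Pasting: sample without replacement, so every index in a subset is unique.
pasting_idx = rng.choice(m, size=subset_size, replace=False)

print("bagging:", np.sort(bagging_idx))
print("pasting:", np.sort(pasting_idx))
```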


# Bagging (soft voting classifier)
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1, random_state=42)
paste_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    max_samples=100, bootstrap=False, n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=100, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
paste_clf.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=False, bootstrap_features=False, max_features=1.0,
         max_samples=100, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
y_bag_pred = bag_clf.predict(X_test)
y_paste_pred = paste_clf.predict(X_test)
print(accuracy_score(y_test, y_bag_pred))
print(accuracy_score(y_test, y_paste_pred))
tree_clf = DecisionTreeClassifier(random_state=42)
tree_clf.fit(X_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
y_pred_tree = tree_clf.predict(X_test)
print(accuracy_score(y_test, y_pred_tree))
def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.5, -1, 1.5], alpha=0.5, contour=True):
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if contour:
        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", alpha=alpha)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", alpha=alpha)
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
plot_decision_boundary(tree_clf, X, y)
plt.title("Decision tree", fontsize=14)

plot_decision_boundary(bag_clf, X, y)
plt.title("Decision trees with bagging", fontsize=14)

plot_decision_boundary(paste_clf, X, y)
plt.title("Decision trees with pasting", fontsize=14)


7.2.2 Out-of-bag (oob) evaluation

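. Probability that a given instance is not drawn into a bootstrap sample : as $m$ grows without bound, $(1-\frac{1}{m})^m \rightarrow e^{-1}\approx 0.37$ $\Rightarrow$ on average only about 63% of the training instances are drawn for each predictor

. oob (out-of-bag) instances : the remaining 37% of the training instances, which are not drawn. The left-out 37% differs from predictor to predictor

. Each predictor can be evaluated on its own oob instances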
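
The 37% figure can be checked both analytically and by simulation. The sketch below assumes a training set of m = 1000 instances, an arbitrary size picked for illustration, and draws a single bootstrap sample.

```python
import math
import random

random.seed(42)
m = 1000  # training-set size, an arbitrary choice for illustration

# Analytic probability that a given instance is never drawn in m draws with replacement.
p_never_drawn = (1 - 1 / m) ** m

# Empirical check: draw one bootstrap sample and count the instances left out.
drawn = {random.randrange(m) for _ in range(m)}
oob_fraction = 1 - len(drawn) / m

print(p_never_drawn, math.exp(-1), oob_fraction)
```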

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=500,
    bootstrap=True, n_jobs=-1, oob_score=True, random_state=40)
bag_clf.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=40, verbose=0, warm_start=False)
y_pred = bag_clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

7.3 Random patches and random subspaces

. BaggingClassifier also supports feature sampling : each predictor is trained on a random subset of the input features

.. Useful for high-dimensional data (e.g., images)

.. Controlled by the max_features and bootstrap_features parameters

. Two approaches

.. Random patches method : samples both training instances and features

.. Random subspaces method : samples features only $\Rightarrow$ bootstrap=False, max_samples=1.0, bootstrap_features=True, max_features < 1.0
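
The random subspaces settings listed above can be tried on the iris data. This is a minimal sketch: the variable name subspace_clf, the 50 estimators, and max_features=0.5 are illustrative choices, not values from the notes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X_iris, y_iris = load_iris(return_X_y=True)

subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),
    n_estimators=50,
    bootstrap=False, max_samples=1.0,           # keep every training instance
    bootstrap_features=True, max_features=0.5,  # sample half of the 4 features
    random_state=42)
subspace_clf.fit(X_iris, y_iris)

# estimators_features_ lists the feature indices each tree was trained on.
print(subspace_clf.estimators_features_[:3])
print(subspace_clf.score(X_iris, y_iris))
```

Because bootstrap_features=True samples features with replacement, a tree may receive the same feature index twice.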

iris = load_iris()
bag_clf = BaggingClassifier(DecisionTreeClassifier(random_state=42),
                            n_estimators=500, max_samples=50, bootstrap=True, 
                              n_jobs=-1, oob_score=True, random_state=42)
patch_clf = BaggingClassifier(DecisionTreeClassifier(random_state=42), 
                              n_estimators=500, max_samples=50, max_features=2,
                              bootstrap=True, bootstrap_features=True, 
                              n_jobs=-1, oob_score=True, random_state=42)
bag_clf.fit(iris["data"], iris["target"])
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=50, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=42, verbose=0, warm_start=False)
patch_clf.fit(iris["data"], iris["target"])
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=True, bootstrap_features=True, max_features=2,
         max_samples=50, n_estimators=500, n_jobs=-1, oob_score=True,
         random_state=42, verbose=0, warm_start=False)
7.4 Random forests

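. An ensemble of decision trees trained with bagging (or pasting)

. Use the RandomForestClassifier (RandomForestRegressor) class

.. sklearn.ensemble.RandomForestClassifier : random forest classifier


. When splitting a node, instead of searching for the best feature among all features, the tree searches for the best feature among a random subset of candidate features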

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1, random_state=42)

bag_clf.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=16,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=42,
         bootstrap=True, bootstrap_features=False, max_features=1.0,
         max_samples=1.0, n_estimators=500, n_jobs=-1, oob_score=False,
         random_state=42, verbose=0, warm_start=False)
y_pred = bag_clf.predict(X_test)
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=16,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
y_pred_rf = rnd_clf.predict(X_test)
np.sum(y_pred == y_pred_rf) / len(y_pred)  # nearly identical predictions

7.4.1 Extra-trees

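. To make the trees even more random, instead of searching for the best threshold, draw a random threshold for each candidate feature and choose the best of those random splits

. Faster to train

. Use the ExtraTreesClassifier (ExtraTreesRegressor) class

.. sklearn.ensemble.ExtraTreesClassifier : extra-trees classifier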
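
The contrast between an exhaustive threshold search and a random threshold can be sketched on a single feature. The helper gini_after_split and the synthetic data are assumptions made for this illustration; this is not how scikit-learn implements its splitters.

```python
import numpy as np

def gini_after_split(x, y, threshold):
    """Weighted Gini impurity of the two children produced by x < threshold."""
    total = 0.0
    for side in (y[x < threshold], y[x >= threshold]):
        if len(side):
            p = np.bincount(side, minlength=2) / len(side)
            total += len(side) / len(y) * (1.0 - np.sum(p ** 2))
    return total

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = (x > 6).astype(int)  # synthetic labels: class 1 whenever x > 6

# Classic tree: evaluate every candidate threshold and keep the best one.
candidates = np.sort(x)
best_t = min(candidates, key=lambda t: gini_after_split(x, y, t))

# Extra-trees style: draw a single random threshold within the feature's range.
random_t = rng.uniform(x.min(), x.max())

print(best_t, gini_after_split(x, y, best_t))
print(random_t, gini_after_split(x, y, random_t))
```

The random threshold costs one impurity evaluation instead of 200, which is where the speed advantage comes from; the price is a split that is usually less pure.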


xtree_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
xtree_clf.fit(X_train, y_train)
ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=16,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
           oob_score=False, random_state=42, verbose=0, warm_start=False)
y_pred_xt = xtree_clf.predict(X_test)
np.sum(y_pred == y_pred_xt) / len(y_pred)  # nearly identical predictions
np.sum(y_pred_rf == y_pred_xt) / len(y_pred_rf)  # nearly identical predictions

7.4.2 Feature importance

. Feature importance is measured by how much the nodes that use a feature reduce impurity on average, across all trees in the random forest

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=500, n_jobs=-1,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)
sepal length (cm) 0.11249225099876374
sepal width (cm) 0.023119288282510326
petal length (cm) 0.44103046436395765
petal width (cm) 0.4233579963547681
mnist_path = os.path.join("datasets","mnist","mnist-original.mat")
mnist_raw = loadmat(mnist_path)
mnist = {
    "data": mnist_raw["data"].T,  # transpose into a 70000 x 784 matrix
    "target": mnist_raw["label"][0],  # first row
    "COL_NAMES": ["label", "data"],
    "DESCR": "mldata.org dataset: mnist-original"
}
rnd_clf = RandomForestClassifier(random_state=42)
rnd_clf.fit(mnist["data"], mnist["target"])
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=None,
            oob_score=False, random_state=42, verbose=0, warm_start=False)
def plot_digit(data):
    image = data.reshape(28, 28)
    plt.imshow(image, cmap = matplotlib.cm.hot,interpolation="nearest")
plot_digit(rnd_clf.feature_importances_)  # one importance value per pixel

cbar = plt.colorbar(ticks=[rnd_clf.feature_importances_.min(), rnd_clf.feature_importances_.max()])
cbar.ax.set_yticklabels(['Not important', 'Very important'])


7.5 Boosting

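. An ensemble method that combines several weak learners into a strong learner

. Predictors are added sequentially, each one correcting the errors of its predecessors

. Representative methods

.. AdaBoost (adaptive boosting)

.. Gradient boosting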

7.5.1 AdaBoost (adaptive boosting)

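. Raises the weights of the training instances that the previous model underfitted

. Procedure (classification) :

.. Train the first classifier on the training set with equal weights and make predictions

.. Increase the relative weight of the misclassified training instances

.. Train the second classifier on the training set with the updated weights and make predictions

.. Repeat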


m = len(X_train)

plt.figure(figsize=(11, 4))
for subplot, learning_rate in ((121, 1), (122, 0.5)):
    sample_weights = np.ones(m)
    if subplot == 121:
        plt.text(-0.7, -0.65, "1", fontsize=14)
        plt.text(-0.6, -0.10, "2", fontsize=14)
        plt.text(-0.5,  0.10, "3", fontsize=14)
        plt.text(-0.4,  0.55, "4", fontsize=14)
        plt.text(-0.3,  0.90, "5", fontsize=14)        
    for i in range(5):
        svm_clf = SVC(kernel="rbf", C=0.05, random_state=42)
        svm_clf.fit(X_train, y_train, sample_weight=sample_weights)
        plot_decision_boundary(svm_clf, X, y, alpha=0.2)
        plt.title("learning_rate = {}".format(learning_rate), fontsize=16)
        y_pred = svm_clf.predict(X_train)
        sample_weights[y_pred != y_train] *= (1 + learning_rate)

. AdaBoost algorithm

.. Each instance weight is initialized to $\frac{1}{m}$

.. Train the $j$-th predictor and use its predictions to compute its error rate $r_j$

.. Weighted error rate of the $j$-th predictor : $r_j=\frac{\sum_{i=1,\hat{y}_j^{(i)}\ne y^{(i)}}^m w^{(i)}}{\sum_{i=1}^m w^{(i)}}$

... $\hat{y}_j^{(i)}$ : prediction of the $j$-th predictor for the $i$-th instance

.. Weight of the $j$-th predictor : $\alpha_j=\eta \log(\frac{1-r_j}{r_j})$

... The more accurate the predictor, the larger its weight; a random guesser gets a weight of 0, and a predictor that is worse than random gets a negative weight

... $\eta$ : learning rate (default 1)

.. Weight update : if $\hat{y}_j^{(i)}=y^{(i)}$, $w^{(i)}\leftarrow w^{(i)}$; if $\hat{y}_j^{(i)}\ne y^{(i)}$, $w^{(i)}\leftarrow w^{(i)}\exp(\alpha_j)$

.. Normalize the weights (divide by $\sum_{i=1}^m w^{(i)}$)

.. Train a new predictor on the training set with the updated weights

.. Repeat

.. AdaBoost prediction : $\hat{y}(x)={\rm argmax}_k\sum_{j=1, \hat{y}_j(x)=k}^N \alpha_j$, that is, the predicted class is the one that receives the majority of the weighted votes

... $N$ : number of predictors
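
The algorithm above can be sketched in NumPy with one-dimensional decision stumps as weak learners. The helper names (fit_stump, stump_predict, adaboost_fit, adaboost_predict), the toy data, and the clipping of $r_j$ are assumptions added for this illustration; scikit-learn's AdaBoostClassifier is implemented differently.

```python
import numpy as np

def fit_stump(x, y, w):
    """Return (threshold, polarity, error) of the 1-D stump with the lowest weighted error."""
    best = (None, 1, np.inf)
    for threshold in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(polarity * (x - threshold) >= 0, 1, 0)
            err = w[pred != y].sum() / w.sum()  # weighted error rate r_j
            if err < best[2]:
                best = (threshold, polarity, err)
    return best

def stump_predict(x, threshold, polarity):
    return np.where(polarity * (x - threshold) >= 0, 1, 0)

def adaboost_fit(x, y, n_estimators=3, eta=1.0):
    m = len(x)
    w = np.full(m, 1.0 / m)                   # initial weights 1/m
    stumps, alphas = [], []
    for _ in range(n_estimators):
        threshold, polarity, r = fit_stump(x, y, w)
        r = np.clip(r, 1e-10, 1 - 1e-10)      # added safeguard against log(0)
        alpha = eta * np.log((1 - r) / r)     # predictor weight alpha_j
        miss = stump_predict(x, threshold, polarity) != y
        w[miss] *= np.exp(alpha)              # raise weights of misclassified instances
        w /= w.sum()                          # normalize
        stumps.append((threshold, polarity))
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(x, stumps, alphas, n_classes=2):
    scores = np.zeros((len(x), n_classes))
    for (threshold, polarity), alpha in zip(stumps, alphas):
        pred = stump_predict(x, threshold, polarity)
        scores[np.arange(len(x)), pred] += alpha  # weighted vote for the predicted class
    return scores.argmax(axis=1)

# Toy 1-D data that no single stump can separate: class 1 lies in the middle.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])

stumps, alphas = adaboost_fit(x, y, n_estimators=3)
train_acc = (adaboost_predict(x, stumps, alphas) == y).mean()
print(stumps, alphas, train_acc)
```

On this toy set the best single stump classifies 7 of the 10 points correctly, while the weighted vote of three stumps gets 9 of 10.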

. sklearn.ensemble.AdaBoostClassifier : AdaBoost classifier


ada_clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)
          base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
          learning_rate=0.5, n_estimators=200, random_state=42)
plot_decision_boundary(ada_clf, X, y)

7.5.2 Gradient boosting

. Instead of tweaking the instance weights, fit each new predictor to the residual errors made by the previous predictor

. Boosting for Regression Trees

.. Set $\hat{f}(x) = 0$ and $r_i = y_i$ $\forall i$ in the training set

.. For $b = 1, 2,\ldots,B,$ repeat :

... Fit a tree $\hat f^b$ to the training data $(X, r)$

... Update $\hat f$ : $\hat{f}(x) \leftarrow \hat{f}(x) + \lambda\hat{f}^b(x)$

... Update the residuals : $r_i\leftarrow r_i - \lambda\hat{f}^b(x_i)$

.. Output the boosted model : $\hat{f}(x) =\sum^B_{b=1}\lambda\hat{ f}^b(x)$
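
The boosting loop above can be sketched with NumPy, using single-split regression stumps in place of general trees. The helper names, B = 50, and $\lambda$ = 0.1 (lam in the code) are assumptions chosen for this illustration.

```python
import numpy as np

def fit_regression_stump(x, r):
    """Best single-split regressor on 1-D input: returns (threshold, left_mean, right_mean)."""
    best, best_sse = None, np.inf
    for threshold in np.unique(x)[1:]:  # skip the minimum so both sides are non-empty
        left, right = r[x < threshold], r[x >= threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best, best_sse = (threshold, left.mean(), right.mean()), sse
    return best

def stump_predict(x, stump):
    threshold, left_mean, right_mean = stump
    return np.where(x < threshold, left_mean, right_mean)

def boost(x, y, B=50, lam=0.1):
    f_hat = np.zeros_like(y)  # f_hat(x) = 0
    r = y.copy()              # r_i = y_i
    stumps = []
    for _ in range(B):
        stump = fit_regression_stump(x, r)  # fit a tree to (X, r)
        update = lam * stump_predict(x, stump)
        f_hat += update                     # f_hat <- f_hat + lam * f_b
        r -= update                         # r_i <- r_i - lam * f_b(x_i)
        stumps.append(stump)
    return stumps, f_hat

rng = np.random.default_rng(42)
x = rng.random(100) - 0.5
y = 3 * x ** 2 + 0.05 * rng.standard_normal(100)

mse_before = np.mean(y ** 2)  # error of the initial model f_hat = 0
stumps, f_hat = boost(x, y, B=50, lam=0.1)
mse_after = np.mean((y - f_hat) ** 2)
print(mse_before, mse_after)
```

A small lam makes each stump contribute only a fraction of its fit, so more rounds are needed but the ensemble moves toward the data more gradually.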

X = np.random.rand(100, 1) - 0.5
y = 3*X[:, 0]**2 + 0.05 * np.random.randn(100)
tree_reg1 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg1.fit(X, y)
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
y2 = y - tree_reg1.predict(X)
tree_reg2 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg2.fit(X, y2)
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
y3 = y2 - tree_reg2.predict(X)
tree_reg3 = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg3.fit(X, y3)
DecisionTreeRegressor(criterion='mse', max_depth=2, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=42, splitter='best')
X_new = np.array([[0.8]])
y_pred = sum(tree.predict(X_new) for tree in (tree_reg1, tree_reg2, tree_reg3))
def plot_predictions(regressors, X, y, axes, label=None, style="r-", data_style="b.", data_label=None):
    # Plot the data points and the summed predictions of all regressors
    x1 = np.linspace(axes[0], axes[1], 500)
    y_pred = sum(regressor.predict(x1.reshape(-1, 1)) for regressor in regressors)
    plt.plot(X[:, 0], y, data_style, label=data_label)
    plt.plot(x1, y_pred, style, linewidth=2, label=label)
    if label or data_label:
        plt.legend(loc="upper center", fontsize=16)
    plt.axis(axes)


plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label=r"$\hat{f}^1(x_1)$", style="g-",
                 data_label="Training set")
plt.ylabel("$r$", fontsize=16, rotation=0)
plt.title("Residuals and tree predictions", fontsize=16)

plot_predictions([tree_reg1], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label=r"$\hat{f}(x_1) = \hat{f}^1(x_1)$",
                 data_label="Training set")
plt.ylabel("$y$", fontsize=16, rotation=0)
plt.title("Ensemble predictions", fontsize=16)

plot_predictions([tree_reg2], X, y2, axes=[-0.5, 0.5, -0.5, 0.5], label=r"$\hat{f}^2(x_1)$", style="g-",
                 data_style="k+", data_label="Residuals")
plt.ylabel(r"$r - \hat{f}^1(x_1)$", fontsize=16)

plot_predictions([tree_reg1, tree_reg2], X, y, axes=[-0.5, 0.5, -0.1, 0.8],
                 label=r"$\hat f(x_1) = \hat f^1(x_1) + \hat f^2(x_1)$")
plt.ylabel("$y$", fontsize=16, rotation=0)

plot_predictions([tree_reg3], X, y3, axes=[-0.5, 0.5, -0.5, 0.5], label=r"$\hat f^3(x_1)$", style="g-",
                 data_style="k+")
plt.ylabel(r"$r - \hat f^1(x_1) - \hat f^2(x_1)$", fontsize=16)
plt.xlabel("$x_1$", fontsize=16)

plot_predictions([tree_reg1, tree_reg2, tree_reg3], X, y, axes=[-0.5, 0.5, -0.1, 0.8],
                 label=r"$\hat f(x_1) = \hat f^1(x_1) + \hat f^2(x_1) + \hat f^3(x_1)$")
plt.xlabel("$x_1$", fontsize=16)
plt.ylabel("$y$", fontsize=16, rotation=0)


. sklearn.ensemble.GradientBoostingRegressor : gradient boosting regressor


. sklearn.ensemble.GradientBoostingClassifier : gradient boosting classifier


gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=1.0, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=3, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="Ensemble predictions")
plt.title("learning_rate={}, n_estimators={}".format(gbrt.learning_rate, gbrt.n_estimators), fontsize=14)

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=0.1, random_state=42)
gbrt.fit(X, y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=3, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
gbrt_slow = GradientBoostingRegressor(max_depth=2, n_estimators=200, learning_rate=0.1, random_state=42)
gbrt_slow.fit(X, y)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=200, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], label="Ensemble predictions")
plt.title("learning_rate={}, n_estimators={}".format(gbrt.learning_rate, gbrt.n_estimators), fontsize=14)

plot_predictions([gbrt_slow], X, y, axes=[-0.5, 0.5, -0.1, 0.8])
plt.title("learning_rate={}, n_estimators={}".format(gbrt_slow.learning_rate, gbrt_slow.n_estimators), fontsize=14)


. Gradient boosting with early stopping

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=49)
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120, random_state=42)
gbrt.fit(X_train, y_train)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=120, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
errors = [mean_squared_error(y_val, y_pred) for y_pred in gbrt.staged_predict(X_val)]
bst_n_estimators = np.argmin(errors) + 1  # errors[i] is the validation error with i + 1 trees
gbrt_best = GradientBoostingRegressor(max_depth=2, n_estimators=bst_n_estimators, random_state=42)
gbrt_best.fit(X_train, y_train)
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=2, max_features=None,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=1,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=56, n_iter_no_change=None, presort='auto',
             random_state=42, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
min_error = np.min(errors)
plt.figure(figsize=(11, 4))

plt.plot(np.arange(1, len(errors) + 1), errors, "b.-")  # x axis = number of trees
plt.plot([bst_n_estimators, bst_n_estimators], [0, min_error], "k--")
plt.plot([0, 120], [min_error, min_error], "k--")
plt.plot(bst_n_estimators, min_error, "ko")
plt.text(bst_n_estimators, min_error*1.2, "Minimum", ha="center", fontsize=14)
plt.axis([0, 120, 0, 0.01])
plt.xlabel("Number of trees")
plt.title("Validation error", fontsize=14)

plot_predictions([gbrt_best], X, y, axes=[-0.5, 0.5, -0.1, 0.8])
plt.title("Best model (%d trees)" % bst_n_estimators, fontsize=14)

gbrt = GradientBoostingRegressor(max_depth=2, warm_start=True, random_state=42)

min_val_error = float("inf")
error_going_up = 0
for n_estimators in range(1, 120):
    gbrt.n_estimators = n_estimators
    gbrt.fit(X_train, y_train)
    y_pred = gbrt.predict(X_val)
    val_error = mean_squared_error(y_val, y_pred)
    if val_error < min_val_error:
        min_val_error = val_error
        error_going_up = 0
    else:
        error_going_up += 1
        if error_going_up == 5:
            break  # early stopping: validation error has not improved for 5 rounds
print("Minimum validation MSE:", min_val_error)
Minimum validation MSE: 0.002712853325235463

7.6 Stacking (stacked generalization)

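. Training the blender :

.. Split the training set into two subsets

.. Use the first subset to train the first layer

.. Use the first-layer predictors to make predictions on the second subset (the hold-out set)

.. Build a new training set that uses these predictions as input features and keeps the target values

.. Train the blender on the new training set (that is, train it to predict the target values from the first layer's predictions)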
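
The hold-out procedure above can be sketched with NumPy alone. Everything concrete here is an assumption made for illustration: the synthetic regression data, the three polynomial fits used as first-layer predictors, and the least-squares linear model used as the blender.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=200)
y = np.sin(3 * X) + 0.1 * rng.standard_normal(200)

# 1. Split the training set into two subsets.
X1, y1, X2, y2 = X[:100], y[:100], X[100:], y[100:]

# 2. Train the first layer on subset 1 (here: polynomial fits of degree 1, 3 and 5).
first_layer = [np.polyfit(X1, y1, deg) for deg in (1, 3, 5)]

# 3. Predict the hold-out subset with every first-layer predictor.
holdout_preds = np.column_stack([np.polyval(coefs, X2) for coefs in first_layer])

# 4-5. The predictions become the input features and the targets stay unchanged;
#      the blender (a linear model with intercept) is trained on this new set.
A = np.column_stack([holdout_preds, np.ones(len(X2))])
blender_w, *_ = np.linalg.lstsq(A, y2, rcond=None)

def stacked_predict(x_new):
    """Run new inputs through the first layer, then through the blender."""
    layer_out = np.column_stack([np.polyval(coefs, x_new) for coefs in first_layer])
    return np.column_stack([layer_out, np.ones(len(x_new))]) @ blender_w

print(stacked_predict(np.array([0.0, 0.5])))
```

Training the blender on hold-out predictions rather than on subset 1 matters because the first-layer predictors have already seen subset 1, so their predictions there would look better than they really are.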



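. The final predictor (blender or meta learner) takes the predictions of the three predictors as input and makes the final prediction


. Several blenders can also be trained (e.g., linear regression, SVM, random forest) $\rightarrow$ this yields a layer of blenders

. Method :

.. Split the training set into three subsets

.. First subset : used to train the first layer

.. Second subset : used with the first-layer predictors to build the training set for the second layer

.. Third subset : used with the second-layer predictors to build the training set for the third layer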


The End of Chapter 7