2016-11-09

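Customizing the loss function of Regularized Greedy Forest (RGF)

Machine Learning

To customize the loss function in the official RGF implementation, you have to edit the C++ code directly. That said, it is not very hard. The following approach is the easy way.

Open src/comm/AzLoss.cpp
Add a branch for loss_type == AzLoss_Xtemp to the AzLoss::getLosses function
Assign the first derivative of the loss function to o._loss1
Assign the second derivative of the loss function to o.loss2
When running, specify 'Xtemp' as the loss

It worked without adding anything to getLoss. getLoss vs. getLosses, _loss1 vs. loss2... the naming is so inconsistent it is alarming.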
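
The two quantities the steps above ask for are the first and second derivatives of the loss with respect to the prediction. Before hard-coding them in C++, it may help to check them numerically. The following is a minimal Python sketch, not RGF code: the log-cosh loss and all function names are assumptions chosen for illustration, and the sign or scaling conventions RGF expects are not verified here.

```python
import numpy as np


def logcosh_loss(pred, target):
    """Hypothetical custom loss: log(cosh(pred - target))."""
    return np.log(np.cosh(pred - target))


def logcosh_derivatives(pred, target):
    """Return (first, second) derivatives of the loss with respect to pred."""
    t = np.tanh(pred - target)
    return t, 1.0 - t ** 2


def finite_difference(f, x, eps=1e-4):
    """Central-difference approximations of f'(x) and f''(x)."""
    first = (f(x + eps) - f(x - eps)) / (2 * eps)
    second = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2
    return first, second


pred = np.array([-1.5, 0.0, 0.3, 2.0])
target = np.array([0.5, 0.0, -0.2, 1.0])

# Analytic derivatives (what would be written into the C++ branch).
d1, d2 = logcosh_derivatives(pred, target)
# Numerical derivatives of the same loss, used only as a cross-check.
n1, n2 = finite_difference(lambda p: logcosh_loss(p, target), pred)

first_ok = np.allclose(d1, n1, atol=1e-6)
second_ok = np.allclose(d2, n2, atol=1e-4)
print(first_ok, second_ok)
```

If either check fails, the analytic formula is probably wrong, which is much cheaper to find here than after recompiling RGF.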

2016-11-02

Making malloc fail on purpose

#define _GNU_SOURCE

#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

typedef void* (*malloc_t)(size_t);

static malloc_t libc_malloc = NULL;
static unsigned long malloc_max_called = 1000 * 1000;

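/* Look up the real malloc; static so the helper is not exported from the .so */
static void initialize(void) {
  libc_malloc = (malloc_t)dlsym(RTLD_NEXT, "malloc");
  if (libc_malloc == NULL) {
    /* dlsym reports failures through dlerror(), not errno */
    const char* err = dlerror();
    fprintf(stderr, "dlsym: %s\n", err ? err : "malloc not found");
    exit(1);
  }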

  char const* const s = getenv("MALLOC_MAX_CALLED");
  if (s == NULL) return;

  sscanf(s, "%lu", &malloc_max_called);
}

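/* Number of successful malloc calls so far; static so the name does not clash with the host program */
static unsigned long count = 0;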

void* malloc(size_t n) {
  if (libc_malloc == NULL) initialize();
  if (count >= malloc_max_called) return NULL;

  count++;
  return libc_malloc(n);
}

% gcc -Wall -g -fPIC -shared failing_malloc.c -o failing_malloc.so -ldl
% MALLOC_MAX_CALLED=1000 LD_PRELOAD=./failing_malloc.so ./a.out

2016-10-26

Demos and articles for building intuition about machine learning algorithms

To be updated as I find more.

Demos

Neural Network

Gradient Boosting

t-SNE

How to Use t-SNE Effectively — Distill

Link collections

Interactive demonstrations for ML courses

Articles

How to Use t-SNE Effectively (Distill): explains that t-SNE output is often misleading
Neural Networks, Manifolds, and Topology (colah's blog): intuition for the intermediate representations produced by a NN's nonlinear transformations

2016-10-21

NIPS's wild promotional video

Machine Learning

www.youtube.com

Watch it with the sound on.

2016-10-21

Python implementations

Psyco
Pyjion
Nuitka
Skulpt

2016-10-19

Installing Chainer with CUDA explicitly disabled

python setup.py --cupy-no-cuda install

https://github.com/pfnet/cupy/blob/master/cupy_setup_build.py#L211

You need this kind of option when you have half-installed CUDA on your local machine just because you wanted tools like nvvp.

2016-09-29

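Debugging deep learning

Mistakes are common when converting code to mini-batches, so apparently it helps to do the following.

Write out the per-sample computation and check the mini-batch result against the values computed one sample at a time
Compare the sum or mean of the results computed with mini-batch size 1 against the result computed on the whole mini-batch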
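
The second check can be sketched in a few lines of numpy. This is an illustration under assumptions, not code from any particular framework: the toy linear model, the mean-squared-error loss, and the name loss_and_grad are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))   # mini-batch of 8 samples, 3 features
y = rng.normal(size=(8, 2))   # targets with 2 outputs
w = rng.normal(size=(3, 2))   # weights of a toy linear model


def loss_and_grad(xb, yb, w):
    """Squared error averaged over samples, and its gradient w.r.t. w."""
    diff = xb @ w - yb
    loss = (diff ** 2).sum() / len(xb)
    grad = 2.0 * xb.T @ diff / len(xb)
    return loss, grad


# Computation on the whole mini-batch.
batch_loss, batch_grad = loss_and_grad(x, y, w)

# Same computation with mini-batch size 1, then averaged over samples.
singles = [loss_and_grad(x[i:i + 1], y[i:i + 1], w) for i in range(len(x))]
mean_loss = np.mean([loss for loss, _ in singles])
mean_grad = np.mean([grad for _, grad in singles], axis=0)

loss_matches = np.isclose(batch_loss, mean_loss)
grad_matches = np.allclose(batch_grad, mean_grad)
print(loss_matches, grad_matches)
```

Whether to compare against the sum or the mean depends on how the loss is normalized; here the loss is averaged over samples, so the mean is the right reference.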

2016-09-25

Matrix multiplication in numpy: matmul, dot, @

stackoverflow.com

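dot and matmul

For 2-D arrays they are exactly the same. For arrays with 3 or more dimensions they behave differently.

dot takes the sum product over the last axis of a and the second-to-last axis of b
matmul treats the inputs as stacks of matrices and computes the matrix product over the last two axes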
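
The difference is easiest to see from the output shapes. A small sketch with arbitrary example arrays (the array names are just for illustration):

```python
import numpy as np

# 2-D case: dot and matmul agree.
a2 = np.arange(6).reshape(2, 3)
b2 = np.arange(12).reshape(3, 4)
same_2d = np.array_equal(np.dot(a2, b2), np.matmul(a2, b2))

# 3-D case: the results differ even in shape.
a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
b = np.arange(2 * 4 * 5).reshape(2, 4, 5)

d = np.dot(a, b)      # every matrix of a against every matrix of b
m = np.matmul(a, b)   # i-th matrix of a against i-th matrix of b
print(d.shape, m.shape)

# matmul pairs matrices by their position in the stack.
matmul_pairs = np.array_equal(m[0], np.matmul(a[0], b[0]))
# dot keeps all combinations: d[i, :, k, :] is a[i] times b[k].
dot_all_pairs = np.array_equal(d[0, :, 1, :], np.matmul(a[0], b[1]))
```

Here dot returns shape (2, 3, 2, 5) while matmul returns (2, 3, 5).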

The @ operator

Python 3.5 and later have the @ and @= operators. @ calls __matmul__ (and @= calls __imatmul__ when it is defined), and in numpy this appears to be equivalent to matmul.
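
A quick way to confirm the dispatch is a toy class that records which special method was called. The Recorder class is hypothetical and exists only for this illustration; the last lines compare @ with np.matmul on small arrays.

```python
import numpy as np


class Recorder:
    """Toy class that records which special method an operator invokes."""

    def __init__(self):
        self.calls = []

    def __matmul__(self, other):
        self.calls.append("__matmul__")
        return self

    def __imatmul__(self, other):
        self.calls.append("__imatmul__")
        return self


r = Recorder()
r = r @ 1   # binary operator
r @= 1      # in-place operator
print(r.calls)

# On numpy arrays, @ gives the same result as np.matmul.
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
same = np.array_equal(a @ b, np.matmul(a, b))
```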

2016-09-24

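Which t-SNE implementation should you use?

Problems with scikit-learn

As a scikit-learn devotee, my first instinct is to use the scikit-learn implementation, but apparently it is not recommended.

- https://www.red dit.com/r/MachineLearning/comments/47kf7w/scikitlearn_tsne_implementation/ (Hatena Blog has a mysterious quirk where pasting a red dit URL causes a bad request and the post cannot be published)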

Besides being slower, sklearn's t-SNE implementation is fine once you realize the default learning rate is way too high for most applications. The definitive bh_tsne implementation by the author sets the learning rate to 200, and the original 2008 paper describes setting it to 100, while sklearn is set to 1000.

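It is slow
The default learning rate is too high

So it says. On top of that, in my own experience, even with Barnes-Hut specified it allocates memory greedily (I suspect it allocates Θ(n^2) memory right away) and dies from running out of memory. No good.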
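
If you still want to use scikit-learn, the learning rate can be set explicitly instead of relying on the default. A minimal sketch, assuming scikit-learn is installed; the random toy data and the perplexity value are arbitrary choices for illustration, and the defaults in your scikit-learn version may differ from those described in the quote.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))  # toy data, for illustration only

tsne = TSNE(
    n_components=2,
    perplexity=10.0,        # must be smaller than the number of samples
    learning_rate=200.0,    # the value the quote attributes to bh_tsne
    method="barnes_hut",
    random_state=0,
)
embedding = tsne.fit_transform(x)
print(embedding.shape)
```

This only addresses the learning-rate point; it does nothing about the speed or memory issues mentioned above.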

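Implementations based on the official one

GitHub - lvdmaaten/bhtsne: Barnes-Hut t-SNE
- The official implementation. It also includes a thin Python wrapper, which works by launching a process. You have to think about where to put the binary, which is a hassle.
GitHub - danielfrg/tsne: A python wrapper for Barnes-Hut tsne
- Bundles the official implementation. Recommended on reddit.
- pip install tsne → fails with an error and could not be installed. Judging from the issues, it does not seem to work on Python 3.
GitHub - dominiek/python-bhtsne: Python module for Barnes-Hut implementation of t-SNE (Cython)
- Also bundles the official implementation. It worked.
- Some parameters, such as max_iter, cannot be set...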

My conclusion

$ pip install bhtsne

followed by

import sklearn.base
import bhtsne
import numpy as np


class BHTSNE(sklearn.base.BaseEstimator, sklearn.base.TransformerMixin):

    def __init__(self, dimensions=2, perplexity=30.0, theta=0.5, rand_seed=-1):
        self.dimensions = dimensions
        self.perplexity = perplexity
        self.theta = theta
        self.rand_seed = rand_seed

    # y is accepted (and ignored) so the class also works where scikit-learn passes a target
    def fit_transform(self, x, y=None):
        return bhtsne.tsne(
            x.astype(np.float64), dimensions=self.dimensions, perplexity=self.perplexity, theta=self.theta,
            rand_seed=self.rand_seed)