AdaBoost & Bayes
May. 29th, 2016 10:36 pm
I've now figured out how AdaBoost looks from the Bayesian point of view:
http://babkin-cep.blogspot.com/2016/05/adaboost-3-and-bayesian-logic.html
Along the way, I think I've also grasped the principle of how boosting works in general, explained in simple terms. Here it is, from that post:
The premise of boosting is that we're able to find a number of methods (what they call "hypotheses" in AdaBoost) to predict the correct outcomes of the training cases, each method correctly predicting more than 50% of the training cases. Then if we collect a large-ish number of these methods, we can predict the correct outcomes of all the training cases simply by averaging the predictions of these methods. And the other cases will follow the training cases (unless overfitting happens). Since more than 50% of the cases are correctly predicted by each method, after the averaging more than 50% of the votes for each training case will be correct, and thus the result will be correct too. Of course this depends on the correct predictions being distributed pretty evenly among the cases. If we have a thousand methods that correctly predict the same cases and incorrectly the other cases, obviously after averaging these other cases will still be predicted incorrectly. So the selection of methods must somehow shuffle the preference for the cases, so that the next picked method predicts well the cases that have been predicted poorly by the previously picked methods. That's it, that's the whole basic idea.
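To make that "shuffling of preference" concrete, here is a minimal AdaBoost sketch of my own (it's not code from the linked post): decision stumps on a one-dimensional feature play the role of the "methods", and after each round the misclassified cases get their weights boosted, so the next stump is forced to do well on them. The toy data set and all the names are purely illustrative.

```python
import math

def train_stump(xs, ys, weights):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for thr in xs:
        for pol in (1, -1):
            preds = [pol if x >= thr else -pol for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best  # (weighted error, threshold, polarity)

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, weights)
        err = max(err, 1e-12)  # guard against log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # this method's vote weight
        ensemble.append((alpha, thr, pol))
        # The reweighting step: misclassified cases grow, correct ones
        # shrink, so the next stump must do well on the hard cases.
        preds = [pol if x >= thr else -pol for x in xs]
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all the methods; the sign is the decision."""
    score = sum(alpha * (pol if x >= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump gets all of it right, but the ensemble does.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys)
print([predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

Note how alpha comes out positive exactly when the method is right on more than 50% of the (weighted) cases, which is the premise from the quote; a method that is right on exactly half of them would get zero vote weight.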