Model Tuning - Label Postprocessing: Practical Application

Data_Pistachio 2020. 10. 2. 01:38

https://steadiness-193.tistory.com/291

Validation - Label Postprocessing


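In the post above, I used train_test_split so that label postprocessing and scoring could be done in a single pass.

In an actual competition, however, the train set is not split that way.

Label postprocessing applies to binary classification, so let's practice on the familiar Titanic dataset.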

www.kaggle.com/c/titanic

Titanic: Machine Learning from Disaster


The preprocessing is described in the post below.

https://steadiness-193.tistory.com/290

Validation - OOF Ensemble (Out-of-Fold)


Unlike that post, however, no rows are dropped and train_test_split is not used.

Results are also compared using accuracy (accuracy_score), the official Titanic evaluation metric.

oof_pred, oof_train

This gave predicted probabilities for all 891 rows of the train set, each row predicted out-of-fold and then collected.
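The original code is not shown here, so the following is only a minimal sketch of how such out-of-fold probabilities can be collected. It assumes that oof_train holds the out-of-fold probabilities for the train rows and that oof_pred holds the fold-averaged probabilities for the test rows. The synthetic data, the LogisticRegression model, and the 5-fold StratifiedKFold are stand-ins, not the setup used in the post.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the preprocessed Titanic data (assumption):
# 891 train rows plus 418 test rows, 8 numeric features.
X_all, y_all = make_classification(n_samples=1309, n_features=8, random_state=0)
X_train, y_train = X_all[:891], y_all[:891]
X_test = X_all[891:]

n_splits = 5
oof_train = np.zeros((len(X_train), 2))  # out-of-fold probabilities for train rows
oof_pred = np.zeros((len(X_test), 2))    # fold-averaged probabilities for test rows

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
for fit_idx, val_idx in skf.split(X_train, y_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[fit_idx], y_train[fit_idx])
    # Each train row is scored only by the model that never saw it.
    oof_train[val_idx] = model.predict_proba(X_train[val_idx])
    # Every fold model also scores the test set; the average is the ensemble.
    oof_pred += model.predict_proba(X_test) / n_splits
```

Because every train row receives exactly one out-of-fold prediction, oof_train can be scored against the true labels without holding out a separate validation set.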

scoring

After 100 iterations, the final threshold came out to 0.413.
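The search procedure itself is not reproduced in this post. One plausible reading of "100 iterations" is a scan over 100 candidate thresholds, keeping the one with the highest accuracy on the out-of-fold probabilities. The sketch below follows that assumption. The helper name best_threshold, the evenly spaced grid, and the toy arrays are all hypothetical.

```python
import numpy as np

def best_threshold(y_true, proba_pos, n_steps=100):
    """Return (threshold, accuracy) with the highest accuracy on an even grid."""
    thresholds = np.linspace(0.0, 1.0, n_steps)
    # Accuracy of the labels produced by cutting the probabilities at each threshold.
    accuracies = [np.mean((proba_pos >= t).astype(int) == y_true) for t in thresholds]
    best = int(np.argmax(accuracies))  # first threshold reaching the maximum
    return thresholds[best], accuracies[best]

# Toy labels and positive-class probabilities (illustrative values only).
y_true = np.array([0, 0, 1, 1, 1, 0])
proba_pos = np.array([0.10, 0.35, 0.42, 0.80, 0.45, 0.30])

threshold, acc = best_threshold(y_true, proba_pos)
```

With real data the call would look like best_threshold(y_train, oof_train[:, 1]), assuming column 1 holds the probability of survival.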

To compare against the labels obtained with plain argmax, let's submit both versions.
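The two label sets differ only for rows whose survival probability falls between the tuned threshold and 0.5. A small sketch of that comparison, using made-up probabilities and assuming the columns of oof_pred are [P(died), P(survived)]:

```python
import numpy as np

THRESHOLD = 0.413  # threshold reported in this post

# Hypothetical test-set probabilities; columns assumed to be [P(died), P(survived)].
oof_pred = np.array([
    [0.70, 0.30],
    [0.57, 0.43],
    [0.45, 0.55],
    [0.59, 0.41],
])

# Plain argmax: equivalent to a 0.5 cutoff in binary classification.
labels_argmax = oof_pred.argmax(axis=1)

# Label postprocessing: predict survival whenever P(survived) reaches the threshold.
labels_threshold = (oof_pred[:, 1] >= THRESHOLD).astype(int)
```

Here only the second row changes label (0.43 is below 0.5 but above 0.413), which is why the two submissions can end up with different scores.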

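Comparing the submission results

Unfortunately, the accuracy turned out lower than that of oof_pred from the plain OOF ensemble.

A threshold tuned on the train set does not always carry over to the test set, so you may need to submit both versions, with and without label postprocessing, and compare them.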
