Kaggle - Bike Sharing Demand : Fit, Predict (Random Forest)

https://steadiness-193.tistory.com/229

Kaggle - Bike Sharing Demand : EDA & Feature Engineering (2)


With the features preprocessed in the post above, and

https://steadiness-193.tistory.com/231

Kaggle - Bike Sharing Demand : Hyperparameter Tuning (하이퍼파라미터 튜닝)


the hyperparameters found in the post above,

let's predict the bike rental counts for the test dataset.

Use Random Forest

The previous post found best_max_depth to be 99 and best_max_features to be 0.861319.

n_estimators is set as high as is practical (3000).
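A minimal sketch of a model configured with these values, assuming scikit-learn's RandomForestRegressor is the estimator in use. The random_state and n_jobs settings are assumptions added for reproducibility and speed; they are not stated in the post.

```python
from sklearn.ensemble import RandomForestRegressor

# Hyperparameter values as reported in the text.
# random_state and n_jobs are assumptions, not values from the post.
model = RandomForestRegressor(
    n_estimators=3000,
    max_depth=99,
    max_features=0.861319,
    random_state=37,
    n_jobs=-1,
)
```

Because max_features is a float, scikit-learn reads it as the fraction of features considered at each split.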

Datasets for training the algorithm

Rather than predicting count directly,

we predict the log-transformed casual and registered columns separately and then add the two results.
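The two log-scale labels could be prepared roughly as follows. This is a sketch on a tiny made-up table: the column names log_casual and log_registered, the feature list, and the use of log1p (log(1 + x), so that hours with zero rentals stay finite) are all assumptions rather than details confirmed by the post.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Kaggle train set (values are made up for illustration).
train = pd.DataFrame({
    "temp": [9.84, 9.02, 15.58, 22.14],
    "humidity": [81, 80, 76, 55],
    "casual": [3, 0, 12, 35],
    "registered": [13, 1, 56, 110],
})

# Assumption: log1p is used so that a count of 0 maps to 0 instead of -inf.
train["log_casual"] = np.log1p(train["casual"])
train["log_registered"] = np.log1p(train["registered"])

# Hypothetical feature list; the real one comes from the EDA posts.
feature_names = ["temp", "humidity"]
X_train = train[feature_names]
y_casual = train["log_casual"]
y_registered = train["log_registered"]
```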

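[Fit]

Train the machine learning algorithm (fitting).

This requires the features of the train dataset and the train labels.

[Predict]

Predict once fitting has finished.

This requires the features of the test dataset.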

Fit & Predict - casual
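A sketch of this step on synthetic data, assuming scikit-learn's RandomForestRegressor and a log1p-transformed label. The feature columns, the name log_casual_predict, and the reduced n_estimators are illustrative choices, not the author's exact code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; the real features come from the earlier posts.
rng = np.random.default_rng(0)
X_train = pd.DataFrame({"temp": rng.uniform(0, 35, 200), "hour": rng.integers(0, 24, 200)})
X_test = pd.DataFrame({"temp": rng.uniform(0, 35, 50), "hour": rng.integers(0, 24, 50)})
casual = rng.poisson(20, 200)

# n_estimators is kept small so the sketch runs quickly (the post uses 3000).
model = RandomForestRegressor(
    n_estimators=50, max_depth=99, max_features=0.861319, random_state=37
)

# Fit: train features plus the log-scale casual label (log1p is an assumption).
model.fit(X_train, np.log1p(casual))

# Predict: only the test features are needed.
log_casual_predict = model.predict(X_test)

# Undo the log so the values are back on the rental-count scale.
casual_predict = np.expm1(log_casual_predict)
```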

Fit & Predict - registered
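The registered label goes through the same steps. One way to avoid repeating them is a small helper, sketched here on synthetic data; fit_and_predict is a hypothetical name, and log1p/expm1 and the reduced n_estimators are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def fit_and_predict(X_train, y_log, X_test):
    """Fit on a log-scale label and return predictions on the original scale."""
    # n_estimators is reduced from the post's 3000 so the sketch runs quickly.
    model = RandomForestRegressor(
        n_estimators=50, max_depth=99, max_features=0.861319, random_state=37
    )
    model.fit(X_train, y_log)
    # expm1 reverses the assumed log1p transform.
    return np.expm1(model.predict(X_test))


# Synthetic stand-in data; the real features come from the earlier posts.
rng = np.random.default_rng(1)
X_train = pd.DataFrame({"temp": rng.uniform(0, 35, 200), "hour": rng.integers(0, 24, 200)})
X_test = pd.DataFrame({"temp": rng.uniform(0, 35, 50), "hour": rng.integers(0, 24, 50)})
registered = rng.poisson(120, 200)

registered_predict = fit_and_predict(X_train, np.log1p(registered), X_test)
```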

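Final prediction

Adding the casual_predict and registered_predict obtained above

gives the final predicted bike rental count.

Caution

The values must be converted back with exp before they are added.

If the log values are added first and exp is applied afterwards, as in the variable weird, the result is clearly different.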
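The difference is easy to check with one hypothetical hour of 10 casual and 90 registered rentals. The sketch assumes the labels were transformed with log1p, so expm1 is the inverse; with a plain log/exp pair the same reasoning holds, since exp(a + b) equals exp(a) * exp(b), not exp(a) + exp(b).

```python
import numpy as np

# Hypothetical log-scale predictions for one hour (10 casual, 90 registered).
log_casual_predict = np.log1p(np.array([10.0]))
log_registered_predict = np.log1p(np.array([90.0]))

# Correct: undo the log on each part, then add. Result is about 100.
predictions = np.expm1(log_casual_predict) + np.expm1(log_registered_predict)

# Wrong: adding logs multiplies the underlying values. Result is about 1000.
weird = np.expm1(log_casual_predict + log_registered_predict)

print(predictions, weird)
```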

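Save and submit

This is the sampleSubmission file provided by Kaggle.

Replace the values in the count column with the final predictions, predictions.

index=False is required; without it, pandas writes the row index as an extra column.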
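A sketch of the save step with pandas. So that it runs without the Kaggle files, an in-memory table stands in for the result of reading sampleSubmission; the prediction values and the output file name submission.csv are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("sampleSubmission.csv"), which has these two columns.
submission = pd.DataFrame({
    "datetime": ["2011-01-20 00:00:00", "2011-01-20 01:00:00"],
    "count": [0, 0],
})
predictions = np.array([11.3, 5.8])  # made-up values for illustration

# Overwrite the placeholder counts with the final predictions.
submission["count"] = predictions

# index=False keeps pandas from writing the row index as an extra first column.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")  # hypothetical file name
submission.to_csv(out_path, index=False)

# Read the file back to confirm it has exactly the expected columns.
saved = pd.read_csv(out_path)
print(list(saved.columns))
```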

Checking the result

Checking the rank

A score of 0.38194 ranks 123rd out of 3,242 teams,

which is roughly the top 3.8%.


Steadiness