Training/Testing Datasets

Hello,

My team is participating in the “Outperform the S&P 500” challenge, and we have a question about the split between the training and testing datasets in the relevant time period. Since we only have access to data from 2007-2016 and our model is evaluated on a backtest over that same period, how are we supposed to limit look-ahead bias?


Basically, are we supposed to train our model on a portion of the same data we are testing it on? Or is there separate testing data we will be evaluated on?


Any help would be appreciated. Really looking forward to competing in the competition.

Jason

Hello Jbohne3,


It is up to you how you split the data, for example into a training set used to generate signals and a separate portion used to evaluate your model as needed. Please have a look at this page to see how to limit look-ahead bias: Alphien Dashboard
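As a concrete illustration of the kind of split described above, here is a minimal sketch of a chronological train/test split in Python. The cutoff date, column shapes, and synthetic price series are all illustrative assumptions, not part of the competition specification; the point is only that the evaluation slice must lie strictly after the training slice so no future data leaks into signal generation.

```python
import numpy as np
import pandas as pd

# Hypothetical daily price series over the 2007-2016 window
# (synthetic data; dates and cutoff are illustrative assumptions).
dates = pd.date_range("2007-01-01", "2016-12-31", freq="B")
rng = np.random.default_rng(0)
prices = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
    index=dates,
)

# Chronological split: fit/calibrate signals only on data up to the
# cutoff, then evaluate on the strictly later out-of-sample portion.
cutoff = pd.Timestamp("2013-12-31")
train = prices.loc[:cutoff]
test = prices.loc[cutoff + pd.Timedelta(days=1):]

# No overlap between the two slices -> no look-ahead from the test period.
assert train.index.max() < test.index.min()
```

A rolling (walk-forward) variant, where the training window advances through time and each window is evaluated on the period immediately after it, follows the same principle: every data point used for evaluation must be dated after every data point used for fitting.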


Please also have a look at the competition selection criteria to see how the performance of the strategy is evaluated: Alphien Dashboard


Thank you.