Question on Task 2 marking criteria

yaow7 · September 25, 2020, 7:34am

Hi all,

I have a question about the marking criteria for Task 2, “Beat the Speed of Complex Pricing Models”. According to the criteria, the ranking will be based on the geometric mean of the maximum absolute error of the predictions. Does this mean that our model will be run on several new batches and that the mean is taken over the maximum absolute error of each batch? Is there any chance we could have the mathematical expression for clarification?

Thanks a lot for helping me understand the criteria and please bear with me if I’m asking stupid questions.

ls · September 25, 2020, 4:41pm

Good morning. Let me try to answer your question:

The error for a batch is max(|Y* - Y|), where Y* is the vector of your predictions for a specific batch and Y is the vector of the real solution. |x| is the absolute value of x, and max gives you the largest element of a vector.
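For anyone who prefers code to notation, here is a minimal NumPy sketch of that per-batch metric. The function name `max_abs_error` and the sample numbers are made up for illustration; this is not the organisers' actual scoring script.

```python
import numpy as np

def max_abs_error(y_pred, y_true):
    """Largest absolute deviation between predictions and true values in one batch."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.max(np.abs(y_pred - y_true)))

# Made-up prices, for illustration only.
y_true = [10.0, 12.5, 9.0, 11.0]
y_pred = [10.1, 12.4, 9.0, 13.0]

# Three predictions are close, but the score is set entirely by the last one.
batch_error = max_abs_error(y_pred, y_true)
```

Note that the mean absolute error of the same batch would look much healthier, which is why a model tuned only on average error can still rank poorly under this metric.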

The geometric mean can be used if there are multiple batches, but for the prediction we will take an ‘unknown’ set of data and look at the maximum absolute error of that batch.

Basically, you need to make sure that your model has ‘no outliers’, as the penalty for an outlier will be very high. This is a pricing model, so you don’t want ‘any’ price to be completely off (even if the chance of having that wrong price is very low).

I hope that clears up the question - let us know if you need more clarification.

Thanks, Lionel.

ccqqkk · September 25, 2020, 8:50pm

Hi Lionel,

Thanks for the detailed answer to this question.

You mentioned that in the actual marking only one batch will be used for prediction, so the “geometric mean of maximum absolute errors” is simply the max of |Y - Y_hat|. Could you please confirm this (at least for the submission on the 27th of Sep)? It does change how we evaluate our models.

Regards

ls · September 25, 2020, 9:03pm

A hidden set of data will be used to evaluate your model. One or multiple batches can be used, and they will not be revealed, as requested by UBS. Yes, it should change how you evaluate your model: you should probably be looking at the max of |Y* - Y| for each batch when you train your model. If multiple batches are used, we may take the geometric mean of the per-batch maximum errors over all the ‘hidden’ batches we use to evaluate your model.
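As a rough sketch of the multiple-batch case, the snippet below computes the maximum absolute error of each batch and then takes the geometric mean of those values. The function name `geometric_mean_of_max_errors`, the batch layout, and the numbers are assumptions for illustration; the exact hidden-set scoring has not been published.

```python
import numpy as np

def geometric_mean_of_max_errors(batches):
    """Geometric mean of per-batch max |Y* - Y|.

    batches: iterable of (y_pred, y_true) pairs, one pair per batch.
    """
    max_errors = np.array([
        np.max(np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)))
        for y_pred, y_true in batches
    ])
    # A perfect batch (max error 0) makes the whole geometric mean 0;
    # handle it explicitly to avoid taking log(0).
    if np.any(max_errors == 0):
        return 0.0
    # Geometric mean computed as exp(mean(log(e))).
    return float(np.exp(np.mean(np.log(max_errors))))

# Made-up batches: max errors are 1.0 and 4.0 respectively.
batches = [
    ([1.0, 2.0], [1.0, 3.0]),
    ([0.0, 5.0], [4.0, 5.0]),
]
score = geometric_mean_of_max_errors(batches)
```

With a single batch this reduces to the plain maximum absolute error of that batch, so validating on the per-batch max covers both cases.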

Basically, you need to train your model with a high penalty on the ‘worst’ outlier. If the worst outlier is not far from the second worst, that’s fine, but if your worst outlier is very far from your second worst, your ranking will suffer.
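One way to check this on your own validation data is to compare the two largest absolute errors in a batch. The helper below, `worst_two_errors`, is a hypothetical diagnostic with made-up numbers, not part of the official evaluation.

```python
import numpy as np

def worst_two_errors(y_pred, y_true):
    """Return (worst, second_worst) absolute errors for one batch."""
    errors = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    if errors.size < 2:
        raise ValueError("need at least two predictions to compare outliers")
    # np.partition with kth=-2 puts the two largest values in the last two slots,
    # in ascending order, without sorting the whole array.
    second_worst, worst = np.partition(errors, -2)[-2:]
    return float(worst), float(second_worst)

# Made-up prices, for illustration only.
y_true = [10.0, 12.5, 9.0, 11.0]
y_pred = [10.1, 12.4, 9.0, 13.0]

worst, second_worst = worst_two_errors(y_pred, y_true)
gap = worst - second_worst  # a large gap means a single point dominates the score
```

If the gap is large, it is probably worth inspecting that one input before submitting, since it alone determines the batch score.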

Does this clarify your point? I would suggest that everyone submit their model early (even as soon as today); depending on our time, we will be able to give you feedback before Sunday’s deadline.

Let us know if we can clarify further, Lionel.