Hi Alphien Team,
How do you rank robustness? Any advice on how one can improve their robustness score?
The idea of robustness is to test your strategy against 'unseen' scenarios, to make sure your model is built to forecast the future rather than merely explain the past.
One way we do this is to add, swap, change, and stress test your code with different data; with this 'unseen' data we check how your code reacts (similar to what could happen with future data).
Robustness testing is done differently for each competition, and to avoid 'gaming' of the robustness scoring we don't want to be 100% transparent. There is no such thing as a perfect robustness score; in financial forecasting, only the future will tell. Alphien's approach to robustness scoring is fair, equitable, and uses the same methodology for everyone.
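To illustrate, the 'unseen data' idea above can be mimicked on your own side by re-running a strategy on perturbed inputs and watching how much the result moves. A minimal sketch on toy data (the momentum rule and all names here are hypothetical illustrations, not Alphien's actual tests):

```python
import numpy as np

rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.02, 500)))  # toy price path

def strategy_pnl(prices):
    # hypothetical momentum rule: long when price is above its trailing 20-day mean
    rets = np.diff(prices) / prices[:-1]
    ma = np.convolve(prices, np.ones(20) / 20, mode="valid")  # trailing 20-day means
    sig = (prices[19:-1] > ma[:-1]).astype(float)             # signal at day t
    return float((sig * rets[19:]).sum())                     # applied to the t -> t+1 return

base = strategy_pnl(prices)
# stress test: re-run the identical code on noisy variants of the same data
stressed = [strategy_pnl(prices * (1 + rng.normal(0, 0.005, prices.shape)))
            for _ in range(20)]
print(base, float(np.mean(stressed)), float(np.std(stressed)))
```

If small perturbations of the inputs swing the P&L wildly, the strategy is probably fitted to the particular sample rather than to a real effect.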
You can improve your robustness score by avoiding the typical mistakes of quantitative investing, in particular for this competition (given the short amount of historical data available):
1- Storytelling: understand why your strategy would work in the future and in a different environment, instead of believing your own 'story' that a model that has worked in the past will work in the future.
2- Signal decay and turnover: understand whether your weight changes make sense; more turnover with less justification for why the model trades usually increases signal decay.
3- Outliers: look at outlier trades, bull runs, and drawdowns. Is your strategy built around these, or is it robust to other environments? Spectacular successes and the avoidance of a few big failures should not account for your entire model.
4- Asymmetric weights on certain assets: is your model constructed to favour one asset in particular, or is it agnostic to the underlying assets? A biased model is likely to be less robust.
5- Last but probably the most important: data mining and snooping bias. In finance this is perceived badly, as the behaviour of extensively searching for the patterns or rules that fit a model perfectly; given that the data you have is insufficient to create a model that works in the future, this will undeniably reduce the robustness of your model.
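On point 5, a simple guard against snooping is to fit on one window of data and evaluate only on the next, unseen window. A minimal walk-forward sketch on toy data (the lookback-picking "fit" is a deliberately naive stand-in, not a recommended model):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0005, 0.02, 1000)  # toy daily returns

def pick_lookback(train):
    # deliberately naive "fit": choose the lookback with the best in-sample result
    best_lb, best_perf = 5, -np.inf
    for lb in (5, 10, 20):
        sig = np.sign([train[max(0, i - lb):i].mean() for i in range(1, len(train))])
        perf = (sig * train[1:]).mean()
        if perf > best_perf:
            best_lb, best_perf = lb, perf
    return best_lb

# walk-forward: fit on one window, evaluate only on the next, unseen window
window = 250
oos_perf = []
for start in range(0, len(returns) - 2 * window, window):
    train = returns[start:start + window]
    test = returns[start + window:start + 2 * window]
    lb = pick_lookback(train)
    sig = np.sign([test[max(0, i - lb):i].mean() for i in range(1, len(test))])
    oos_perf.append(float((sig * test[1:]).mean()))

print(oos_perf)  # mean daily out-of-sample return per fold
```

If the out-of-sample folds look nothing like the in-sample fit, the "pattern" was probably mined from noise.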
The above is a non-exhaustive list of the guiding principles behind what we want to measure in our robustness ranking; happy to hear other ideas if anyone has any.
Happy coding! Lionel.
P.S.: Feel free to check Alphien Dashboard
Thank you for the illuminating post. I have to admit that I've kind of given up on the competitions, instead seeing them as a route into live replication. From what I have seen elsewhere (e.g. the Jane Street competition on Kaggle), the prizes are awarded based on a substantial (3-6 month) out-of-sample period that starts when the competition ends. Hopefully, this is what live replication achieves.
Yes, this is correct. Live replication helps, but it is also not sufficient; in particular, 3-6 months is far from representative, as models can work well in a certain regime that lasts much longer. Running it through one 'cycle' (whether that's an economic cycle or any other seasonality the data suggests) would also help.
The competition will bring results that are imperfect, but the longer live replication runs, the more the model is tested on real data, and the better its chances of being licensed. The competition ranking is almost like a first filter that allows strategies to be featured and licensed in real funds with real money.
At Alphien we intend to keep paper replication running for years after the end of the competition, and this is where the value will come for both the research scientist who built the model and the investor who wants to replicate it!
Thanks for participating, Lionel.
Is the robustness scoring methodology finalized or are you still working on it as you said earlier?
Also, why are the performances (on the public leaderboard) not exactly the same as those from port.backtest(tc=0.1)?
A new, improved version of the robustness scoring is being implemented and documented, and should hopefully be released in a few days.
There are discrepancies in performances because we have implemented a slightly more accurate way to calculate annualised performance. The mismatch should disappear in a few days; thank you for pointing this out.
I also encountered the same issue: backtest results are completely different on the public leaderboard.
If you have implemented a new way to calculate performances, will the formulas for the ratios in port.backtest change too?
The new robustness ranking has been released. In addition to data-implied robustness (testing your model on data other than that provided), it also considers your code quality, the rationale behind your model, and its parameters.
Note that following the steps in Crypto Competition Essential Steps will help with the robustness scoring.
Please share any comments, questions, and feedback; this week's points have been distributed using the new robustness ranking framework.
Thank you to all participants for the contributions and excellent submissions so far.
Adding a small point -
At every version update, the Robustness Scores will be re-calculated.
You may just mention the changes in the Description box when you submit.
Thank you. The most obvious potential weakness I have observed is the sensitivity of some of the submissions to the time period chosen. This can be seen by comparing the Sharpe Ratio on the Community Leaderboard versus the Sharpe Ratio in the Public Leaderboard. I don’t know how far your data goes back, but I struggle to see how a system can attract a stellar robustness ranking if the time period chosen matters so much.
In many ways, the Alphien Team is effectively deciding the results of the competition on a qualitative basis.
I do appreciate the feedback from my submissions, but it makes me quite nervous signing away the rights to my ideas upfront if everything is so subjective.
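For what it's worth, the start-date sensitivity described above is easy to measure yourself: recompute the Sharpe ratio over a range of start dates and look at the spread. A toy sketch (random data, not the Alphien backtester):

```python
import numpy as np

rng = np.random.default_rng(1)
daily = rng.normal(0.001, 0.03, 365 * 4)  # four years of toy daily returns

def sharpe(r, periods=365):
    return float(np.sqrt(periods) * r.mean() / r.std(ddof=1))

# shift the start date forward in 30-day steps and recompute the Sharpe ratio
sharpes = [sharpe(daily[s:]) for s in range(0, 365, 30)]
spread = max(sharpes) - min(sharpes)
print(f"Sharpe spread across start dates: {spread:.2f}")
# a strategy robust to the chosen period should show a small spread
```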
Hi, thanks for the update. I am just wondering how the score is actually calculated.
Is it split between the 4 points you mentioned, like 1/4 from data-implied robustness, 1/4 from code quality, 1/4 from rationale, and 1/4 from parameters?
I have added clear references/motivations, and my commented code is mostly within the payout function, which I was told not to include per the Critical Steps; would that affect the score?
Is the calculation relative to other submissions? I find myself ranked lowest in robustness, so I am trying to understand why.
I kind of agree here. I think the robustness score should be made more robust itself, probably with more weight on data-implied robustness, and/or with its weight relative to other metrics lowered a bit. I mean, if I were buying these models, I would definitely select the ones with a Sharpe on the test set over 6/7 and DD < 40, regardless of a metric which seems to change a lot with small modifications from one submission to the next. Also, same question as orz: where are the code comments and the sensitivity analysis supposed to be displayed? I was also told that the notebook should be essentially empty, with everything in qlib?
Alphien's guiding principle is to be equitable and fair for everyone, and as quantitative as possible.
As you point out in your post, there is a trade-off in using limited data together with the arbitrary choices that we make. The new robustness score framework is meant to address exactly this issue.
Alphien (and the sponsor) may make qualitative choices on the methodology, but the choices are the same for all. We remain neutral to all submissions, with no bias toward anyone. Are you sure you mean that 'everything is so subjective' at Alphien (i.e. based on beliefs or feelings rather than facts)? The entire company is built on objectivity and facts; help us improve and let us know of any mistake we are making.
I would like to point out that we can be qualitative without being subjective; in fact, Alphien's approach (unlike other platforms) is to really partner with you to improve the quality of your work.
Note that your ideas are treated with the utmost confidentiality inside Alphien; only a limited number of individuals have access to your code, in order to preserve confidentiality (even when shared, your code remains your own forever, as per the T&C of the platform).
It is our intention to improve the quality of your submission. Our comments on your submission can be qualitative, but in any research set-up that should be beneficial to you. That is why we review your code, help improve it, have people trying to license it, and provide the entire environment. We will never provide any information on other people's submissions.
I hope that everyone enjoys this and sees that Alphien is trying to achieve a fair upside for both parties; this is not easy to achieve, trust me!
I hope that answers your concerns and makes you more comfortable with Alphien.
Thanks for your participation, Lionel.
Hello - As mentioned above, we do not disclose the full methodology of the scoring. If you have a question about your submission, please send it in for feedback; we will look at it and tell you why the robustness score is weak. The methodology is the same for all.
Thanks for the question, Lionel.
I think I was very clear in my post that there is a major difference between the Sharpe ratio on the Community Leaderboard and on the Public Leaderboard for one of the high-ranking submissions.
It would not be fair to point out which submission, but 60% of the competition ranking seems to be vulnerable to the chosen start date of 2017.01.01.
I wasn't being emotional, and you did ask for feedback. If a submission can win a prize despite being vulnerable to a slight change in the start date, then something doesn't quite seem right.
Is there anything you don't understand in our documentation, Essential Steps? Help us improve it if it's not clear; check point 4. Feel free to reply with questions on your own submission and we will try to be helpful.
Thank you. Lionel.
P.S.: Be assured that we provide interested licensors with all the metrics they require. We do not provide the code or anything that could leak confidential information and allow them to reproduce the algo without Alphien; that would go against our joint interest in continuing to receive and share licensing fees!
At the moment, we are auditing all inconsistencies around the Sharpe ratio calculations; we do not think a slight change in the Sharpe ratio, applied the same way for everyone, will affect the ranking though.
All feedback is appreciated; it is great to hear your concerns, and we are here to help alleviate them.
Thanks for participating, Lionel.
We will try to disclose more and more about our robustness methodology as questions about it come in. Today I shared this article, Degrees of Freedom in a Model, which you may all find interesting.
Thanks and happy coding. Lionel.
For your information: we have found inconsistencies in the community leaderboard. The data in the community leaderboard is not correct and will be corrected by Monday EOD. In particular, the statistics for the full period of the Crypto competition, including drawdown, Sharpe, and others, were all erroneous.
For your ranking and statistics please only refer to :