Data Science Team & Tech Lead

Tag: Forecasting

  • Midpoint review of M6 competition – results

    As a quick follow-up to my last post on the midpoint review of M6 competition, I have looked into the actual performance statistics of my entries in the first half of the competition.

    The results are surprising in a few aspects:

    • The results exhibit huge fluctuations from month to month. (Perhaps partially reflecting the current uncertain and volatile market conditions.)
    • My forecasting results are better than my own initial expectations, beating the benchmark (i.e. 0.16) in most months. (Something happened in Mar & Apr 2022 that I will explain below.)
    • My investment decision results are as bad as expected, due to the conservative strategy that I have used.
    | Submission Number | Date | Overall Rank | Performance (Forecasts) | Rank (Forecasts) | Performance (Decisions) | Rank (Decisions) |
    | --- | --- | --- | --- | --- | --- | --- |
    | 1st | Feb 2022 | 36.5 | 0.15984 | 46 | 4.28492 | 27 |
    | 2nd | Mar 2022 | 111.5 | 0.16619 | 92 | -6.08808 | 131 |
    | 3rd | Apr 2022 | 115.5 | 0.16132 | 109 | 0.04113 | 122 |
    | 4th | May 2022 | 64 | 0.15949 | 60 | -1.96181 | 68 |
    | 5th | June 2022 | 65 | 0.15294 | 6 | -0.7612 | 124 |
    | 6th | July 2022 | 18 | 0.14366 | 1 | 3.79556 | 35 |
    M6 results for Dull AI thus far, broken down by individual month. Note that the results for July 2022 are not finalized yet.

    Huge fluctuations in results

    The fluctuations in results for both the forecasting and investment decision categories are huge, changing quite wildly from month to month despite (mostly) the same approaches being used throughout the period.

    This can be explained by the widely known low signal-to-noise ratio phenomenon observed in stock markets, which makes forecasting and investment decision-making difficult problems to solve.

    However, the stock markets in the first half of 2022 were also more volatile than usual, due to a combination of factors at work (including the pandemic, war, inflation and recession fears). One example of this can be seen in the elevated levels of the CBOE Volatility Index (VIX), which roughly tracks the “fear” in the S&P 500 index.

    CBOE Volatility Index (VIX) taken from Google

    Forecasting results are better than my own expectations

    My forecasting submission was able to beat the benchmark of 0.16 (i.e. score lower than it) in 3 out of the previous 5 months. The benchmark sits at 0.16, which is the score one gets by assuming no knowledge of the future ranked returns of each stock.
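
    For the curious, the benchmark value can be reproduced from the scoring rule itself. As far as I understand the M6 scoring (the snippet below is my own sketch, not official competition code), 0.16 is exactly the Ranked Probability Score of a “know-nothing” forecast that assigns a 20% probability to each of the five return-rank quintiles:

```python
# My understanding of the M6 scoring, not official code: the 0.16 benchmark is the
# expected Ranked Probability Score (RPS) of a uniform forecast over the 5 quintiles.
import numpy as np

uniform_forecast = np.full(5, 0.2)          # 20% probability for each quintile
cum_forecast = np.cumsum(uniform_forecast)  # [0.2, 0.4, 0.6, 0.8, 1.0]

rps_per_outcome = []
for true_quintile in range(5):
    outcome = np.zeros(5)
    outcome[true_quintile] = 1.0            # the quintile the stock actually lands in
    cum_outcome = np.cumsum(outcome)
    rps_per_outcome.append(np.mean((cum_forecast - cum_outcome) ** 2))

# Each quintile is realised for 20% of the stocks, so average over the 5 outcomes.
print(np.mean(rps_per_outcome))  # 0.16
```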

    Something actually happened behind the scenes in the 2 months where I failed to beat the benchmark, mostly because I lacked the time to properly vet and test my solution. For the submissions in Mar and Apr 2022, my algorithm spat out results that I uploaded as usual. But I noticed my performance degraded severely in those 2 months, so I made a few changes to the algorithm:

    • Set up unit tests around a few of the critical functions (a minimal example of this kind of check follows this list)
    • Checked data sources in detail to ensure quality and removed any low-quality data sources as inputs
    • Rewrote my own forecasting module, casting away a pre-built pipeline that I had used from a library
    • Added diagnostic checks around the models that I trained each time to ensure that there was nothing off during the training process
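
    For the unit tests, here is a minimal example of the kind of check I mean (build_submission is a hypothetical stand-in, not my actual function): the submitted forecast matrix must be non-negative and each stock’s quintile probabilities must sum to 1.

```python
# Illustrative unit test only; build_submission is a hypothetical stand-in for the
# real forecasting function, which should return an (n_assets, 5) probability matrix.
import numpy as np

def build_submission(n_assets: int = 100) -> np.ndarray:
    """Stand-in forecasting function: uniform 20% probability per quintile."""
    return np.full((n_assets, 5), 0.2)

def test_submission_probabilities_are_valid():
    probs = build_submission()
    assert probs.shape[1] == 5                  # 5 return-rank quintiles
    assert (probs >= 0).all()                   # no negative probabilities
    assert np.allclose(probs.sum(axis=1), 1.0)  # each row sums to 1
```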

    A funny side story: it was only after I added the diagnostic checks that I noticed the models I trained in Mar and Apr 2022 could not effectively distinguish between the worst- and best-performing stocks (i.e. they treated them almost interchangeably).
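
    To give a flavour of what such a diagnostic could look like (the code below is purely illustrative, not my actual check), one simple test is whether the trained model assigns a noticeably higher top-quintile probability to the stocks that actually ended up performing best than to those that performed worst:

```python
# Hypothetical diagnostic sketch: a sensible model should, on average, give the
# realised best performers a higher "top quintile" probability than the worst ones.
import numpy as np

def separation_check(predicted_top_probs: np.ndarray,
                     realised_returns: np.ndarray,
                     min_gap: float = 0.02) -> bool:
    """Return True if the model separates best and worst performers by at least min_gap."""
    order = np.argsort(realised_returns)
    n = len(order) // 5
    worst, best = order[:n], order[-n:]
    gap = predicted_top_probs[best].mean() - predicted_top_probs[worst].mean()
    return gap > min_gap
```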

    Even now, I still do not know what the exact cause was (if any), whether my algorithm is mostly bug-free now, or whether my current results are just a fluke.

    Investment decision results are as bad as expected

    My investment decision results are as bad as expected.

    Firstly, I spent the least time on this part and have not set up a proper framework (i.e. backtesting integration) even now. Secondly, and partially because of the first reason, I am running my investments using a very traditional and conservative approach, which I will not name yet as the competition is still ongoing.

    Compared to the forecasting component, my investment decision component is so simple that anyone can replicate it in an Excel file. The only more complex part is the portfolio allocation algorithm: I have a simple one written to manage risks, but it is so trivial that it is not worth any special mention.

    As a fun fact, I made a drastic change to my investment strategy as of the 6th submission. It will be interesting to see how it turns out.

    Ending

    While it feels nerve-wracking to post this performance summary when the results for my latest (and best so far) 6th submission are still pending, I decided to share it now just to stick to my blog post schedule.

    Hopefully you get something out of this post, and I hope that it will not be laughed at as an example of premature celebration. Let’s see how things go for the remaining half of this competition!

  • Midpoint review of M6 competition

    With the end of June, we are now at the halfway point of the M6 competition. It may be a good time to do a quick review of my progress and learnings from the M6 competition so far.

    (And also to get me into the habit of regularly writing blog posts!)

    Progress in M6 competition

    For a brief period at the beginning of the M6 competition, I was among the top 20 on the leaderboard (overall rank). But ever since then, I have been languishing between ranks 80 and 110.

    I tried a few ways to improve the results (e.g. adding unit testing, expanding the security universe), however either (1) the results did not improve, or (2) I lacked the time/energy to fully implement them.

    As of now, I still have 24 items/ideas on my to-do list to be tested or implemented to improve my solution for M6 competition!

    That being said, I would still say that I have achieved my original goal, which is to use M6 competition as a motivator to build an investment pipeline (including automated data retrieval, forecasting and portfolio optimisation).

    If you are interested in my exact methodology, perhaps as a counter-example of what not to do, I will share it once the competition is over.

    Learnings from M6 competition

    1. Getting access to required investment data is hard

    Before you jump in and mention that everyone can easily get free price data from Yahoo Finance or other similar sources, I just want to say that I agree with you.

    But getting access to price data is only the first step. Typically you will also want to be able to screen for securities to create your investable universe, and this screening requires data beyond prices, e.g. market cap and valuation metrics. The problem becomes even tougher if you intend to create a cross-country/cross-exchange universe.

    Assuming that you got access to a screening capability (the easiest way is to buy it from a provider), the next step is to build a stable (ideally automated) connection to a chosen data provider. This step can be tricky, as it depends on how much you are willing to pay for a data service, and this roughly correlates with how stable the provided data API will be.

    Don’t even start thinking about getting data from multiple providers to be merged together. Just trying to get the data index (in this case, tickers or other security identifiers) to align will drive you crazy if this is not part of your full-time job.

    Lastly, once you have all these in place, there remains the question of data quality. I briefly tested adjusted OHLC EOD price data from a few retail investor-friendly data sources (i.e. with an annual subscription price of less than 4 digits).

    My rough conclusions are:

    • Yahoo Finance
      • Pretty good pricing data that agrees with direct data from exchanges.
      • But often has random outlier spikes (e.g. 100x price on a single day).
      • Possible to get some fundamental data as well, but the API is very unstable.
      • Free (but in grey area).
    • Interactive Brokers
      • Very good pricing data for recent dates.
      • But historical adjusted prices are systematically off. Perhaps due to a different adjustment calculation method.
      • Requires an IB account and a running instance of TWS to get data.
      • Very cheap subscription price. Single digit per month to get all US pricing data.
    • EOD Historical
      • Just started testing it out, as it was also used by M6 competition to calculate rankings.
      • No comment on data quality yet, as I have yet to test them.
      • Very user-friendly API and reasonable pricing.
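
    To make the outlier-spike issue mentioned above for Yahoo Finance concrete, here is a minimal sketch of the kind of sanity check I have in mind. It assumes the community yfinance package as the data source; the ticker, start date and 5x threshold are arbitrary choices for illustration.

```python
# Minimal sketch of a price-spike sanity check, assuming the community `yfinance`
# package as the Yahoo Finance source; parameters are arbitrary illustrations.
import pandas as pd
import yfinance as yf

def flag_price_spikes(ticker: str, threshold: float = 5.0) -> pd.DataFrame:
    """Flag days where the adjusted close jumps by more than `threshold`x versus the
    previous day -- usually a data error rather than a real market move."""
    prices = yf.Ticker(ticker).history(start="2021-01-01", auto_adjust=True)
    ratio = prices["Close"] / prices["Close"].shift(1)
    suspicious = (ratio > threshold) | (ratio < 1 / threshold)
    return prices.loc[suspicious, ["Close"]]

print(flag_price_spikes("AAPL"))
```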

    2. Forecasting prices is hard (really hard)

    As with any kind of predictive modelling, forecasting stock prices is hard. Any type of price is hard to forecast, because there is often no ground truth to a price, and the relationships between the factors affecting a price change often.

    Most of the time, a price is not a reflection of the intrinsic value of an item, but of how much someone is willing to pay for it at that moment in time.

    With this in mind, I have a feeling (just a hypothesis that I have not yet checked out) that an approach that tries to model prices (or other price proxies) as point estimates is very unlikely to work out. A probabilistic approach seems to be the best bet, but it makes the computation and interpretation of the results (e.g. how to trade on the estimates) more difficult. Plus, this approach also falls slightly outside my knowledge domain.
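
    To illustrate what I mean by a probabilistic approach (a toy sketch, not my competition code): if you can simulate joint return scenarios from some fitted model, M6-style quintile probabilities fall out naturally by ranking the assets within each scenario and counting how often each asset lands in each quintile.

```python
# Toy illustration of turning simulated return scenarios into quintile probabilities.
# The normal draws below stand in for whatever fitted model generates the scenarios.
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_scenarios = 100, 10_000

# Hypothetical simulated monthly returns: (scenario, asset)
simulated_returns = rng.normal(loc=0.0, scale=0.05, size=(n_scenarios, n_assets))

# Rank assets within each scenario, then bucket the ranks into 5 quintiles (0..4)
ranks = simulated_returns.argsort(axis=1).argsort(axis=1)
quintiles = ranks * 5 // n_assets

# Probability of each asset ending up in each quintile; each row sums to 1
quintile_probs = np.stack([(quintiles == q).mean(axis=0) for q in range(5)], axis=1)
print(quintile_probs.shape)            # (100, 5)
print(quintile_probs.sum(axis=1)[:3])  # ~1.0 per asset
```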

    The difficulty of this problem can also be seen in the huge fluctuations in the leaderboard rankings. A +/- 20 position move from month to month is not a rare occurrence, although this may also be due to (1) the current ultra-volatile stock/economic environment and (2) changes in participants’ forecasting methods across submissions.

    This is a rather long competition that lasts for one year. But I have a feeling that a stock market forecasting competition may need to run for 2-5 years to filter out the methods that are winning just due to chance. Only then can we see who is truly swimming naked. (Disclaimer: I am not implying that my method can stand the test of time, because I don’t think it can.)

    3. Setting up trading strategy and portfolio allocation is also hard

    This is another huge topic by itself, and it is often rather distinct from the stock price forecasting problem. As mentioned by Prof. Makridakis (the M6 competition organiser) in one of his LinkedIn posts, there is not a strong correlation between accurate forecasting and good investment returns.

    As I have a rather basic understanding of how to build a profitable trading strategy and portfolio allocation, I am not able to comment much here. But I would say that in the absence of strong convictions, buying the market is not a bad idea in general.

    4. Clean code/solution structuring helps

    For most M6 participants, focusing on code/solution structuring is perhaps among the last things they would do (or so I guess, please correct me if I am wrong). What I mean by code/solution structuring is ensuring that the various parts of the code (e.g. data processing, forecasting, portfolio optimisation) are written and structured according to software engineering best practices.

    For me, this is the part that I spent the most time on. I know that some of you will be laughing at me because you think I deserve to rank near the bottom due to this (again, I agree with you). But I truly enjoyed the time that I took to (1) structure my code to follow a Python package cookiecutter template, (2) incorporate CI/CD practices (e.g. using Git, pre-commit), and (3) write clean code with proper linting and docstrings.

    As I work on the codebase on a part-time basis, having a clean code structure has enabled me to easily dive back into parts of the codebase. It reduces the time I need to figure out how my code all links together, and hopefully ensures that my code is reusable if I decide to repurpose it for something else.

    Ending

    That’s all for my midpoint review. I will continue to participate in the M6 competition by making submissions, but I doubt I will have the time/energy to reverse the tide. Either way, I have already got a lot out of this competition.

    If you have read through my lengthy post, I hope you gained some useful insights (or at least had a fun read)!

  • Time Series Forecasting – sktime

    Code template for running time series forecasting in sktime.

    Link to website: https://sktime.org/

    Link to repository: https://github.com/alan-turing-institute/sktime
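
    The actual template lives in the repository above. As a rough illustration of the basic sktime workflow (a minimal sketch using the bundled airline dataset and a naive baseline, not the full template):

```python
# Minimal sktime forecasting sketch: naive baseline on the bundled airline dataset.
from sktime.datasets import load_airline
from sktime.forecasting.naive import NaiveForecaster

y = load_airline()                             # monthly airline passengers series

forecaster = NaiveForecaster(strategy="last")  # repeat the last observed value
forecaster.fit(y)

y_pred = forecaster.predict(fh=[1, 2, 3])      # forecast the next 3 months
print(y_pred)
```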

  • Time Series Forecasting – pytorch-forecasting

    Code template for running time series forecasting in pytorch-forecasting.

    Link to website: https://pytorch-forecasting.readthedocs.io/en/stable/

    Link to repository: https://github.com/jdb78/pytorch-forecasting
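
    The actual template lives in the repository above. As a rough illustration of the basic pytorch-forecasting workflow (a minimal sketch on a toy series using the built-in Baseline model, not the full template; a real setup would train a proper network such as the Temporal Fusion Transformer):

```python
# Minimal pytorch-forecasting sketch: dataset construction plus a Baseline forecast
# (the Baseline model simply repeats the last observed value).
import pandas as pd
from pytorch_forecasting import Baseline, TimeSeriesDataSet

# Toy data: one series ("A") with a synthetic weekly pattern
data = pd.DataFrame({
    "time_idx": range(30),
    "value": [float(i % 7) for i in range(30)],
    "series": "A",
})

training = TimeSeriesDataSet(
    data[data.time_idx <= 20],
    time_idx="time_idx",
    target="value",
    group_ids=["series"],
    max_encoder_length=10,
    max_prediction_length=5,
    time_varying_unknown_reals=["value"],
)
validation = TimeSeriesDataSet.from_dataset(training, data, predict=True, stop_randomization=True)
val_dataloader = validation.to_dataloader(train=False, batch_size=1)

predictions = Baseline().predict(val_dataloader)
print(predictions)
```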

  • Time Series Forecasting – pmdarima

    Code template for running time series forecasting in pmdarima.

    Link to website: http://alkaline-ml.com/pmdarima/

    Link to repository: https://github.com/alkaline-ml/pmdarima
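
    The actual template lives in the repository above. As a rough illustration of the basic pmdarima workflow (a minimal sketch using the bundled wineind dataset, not the full template):

```python
# Minimal pmdarima sketch: auto_arima order search plus a 12-month forecast.
import pmdarima as pm

y = pm.datasets.load_wineind()  # monthly Australian wine sales (bundled with pmdarima)

# Step-wise search over seasonal ARIMA orders with a 12-month seasonal period
model = pm.auto_arima(y, seasonal=True, m=12, suppress_warnings=True)

forecast = model.predict(n_periods=12)  # forecast the next 12 months
print(forecast)
```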

  • Time Series Forecasting – kats

    Code template for running time series forecasting in kats.

    Link to website: https://facebookresearch.github.io/Kats/

    Link to repository: https://github.com/facebookresearch/Kats
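
    The actual template lives in the repository above. As a rough illustration of the basic Kats workflow (a minimal sketch on a synthetic daily series with the Prophet wrapper, not the full template):

```python
# Minimal Kats sketch: wrap a time/value DataFrame and forecast with the Prophet model.
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.prophet import ProphetModel, ProphetParams

# Kats expects a DataFrame with "time" and "value" columns
df = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=100, freq="D"),
    "value": [float(i) for i in range(100)],
})
ts = TimeSeriesData(df)

params = ProphetParams(seasonality_mode="additive")
model = ProphetModel(ts, params)
model.fit()

forecast = model.predict(steps=30)  # forecast the next 30 days
print(forecast.head())
```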