Data Science Team & Tech Lead

Blog

  • FIRE planner

    Built a new dashboard to help you plan financially for your potential FIRE (financial independence, retire early).

    Link to dashboard: https://cheeyeelim.com/apps/fireplanner

    It takes a few inputs to help you visualize your (+ your partner’s) personal cash flows throughout your lifetime.

    Besides simple income and expense adjustments, it also simulates housing/mortgage and child-related expenses.

    Unfortunately, the dashboard is specific to Singapore-based residents for now, as it uses Singapore average values (e.g. costs to raise children) and incorporates only the Singapore retirement scheme (i.e. CPF).

    All parameters are based on point estimates for now (e.g. inflation, investment return), so complex scenario simulations are not supported.
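
    To make this concrete, below is a minimal sketch of this kind of year-by-year, point-estimate projection. It is not the dashboard's actual code; every name and value is a made-up example, and the 25x-expenses threshold is just a common FIRE rule of thumb.

    # Minimal sketch of a point-estimate FIRE projection (illustrative values only).
    annual_income = 120_000      # yearly take-home income (example value)
    annual_expenses = 50_000     # yearly living expenses (example value)
    net_worth = 100_000          # starting savings (example value)
    inflation = 0.025            # point estimate: expenses grow at this rate
    investment_return = 0.05     # point estimate: savings compound at this rate

    for year in range(1, 41):    # project 40 years ahead
        savings = annual_income - annual_expenses
        net_worth = net_worth * (1 + investment_return) + savings
        annual_expenses *= 1 + inflation
        if net_worth >= 25 * annual_expenses:  # "25x expenses" rule of thumb
            print(f"FIRE threshold reached in year {year}")
            break
    else:
        print("FIRE threshold not reached within 40 years under these assumptions")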

    p/s: This dashboard took me longer than expected to build, not due to the complexity of the simulation, but due to the high number of user inputs supported.

  • Infrastructure and framework behind my personal websites

    I decided to set up my own website at the end of 2020.

    3 years later, I run 2 websites backed by multiple supporting services (see image below), all set up and operated by myself.

    My goals are (1) to set up a robust infrastructure that ensures my websites/services are always up, and (2) to set up a development framework that minimises maintenance effort.

    For the infrastructure, each service is dockerised with custom images and deployed on my favourite cloud service provider (DigitalOcean).

    An uptime monitor (UptimeRobot) and a web analytics service (Google Analytics) have been set up to constantly check the status of the services.
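
    Conceptually, an uptime check boils down to something like the sketch below (a hypothetical illustration, not how UptimeRobot works internally; the endpoint list is just an example):

    import urllib.request

    SERVICES = ["https://cheeyeelim.com"]  # endpoints to watch (example list)

    def is_up(url: str) -> bool:
        """Return True if the endpoint answers with HTTP 200 within 10 seconds."""
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except OSError:  # DNS failure, timeout, connection refused, HTTP error
            return False

    for url in SERVICES:
        print(url, "UP" if is_up(url) else "DOWN")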

    As for the development framework, I develop locally on VS Code with Windows Subsystem for Linux (WSL), with enforced linting and formatting via pre-commit hooks.

    Code is pushed to repos on GitHub, while images are pushed to the container registry on Docker Hub.

    I paid special attention to code quality, especially for the Python code, to make maintenance easier. But the overall code quality is not as high as I would like it to be, because I need to work with multiple languages (i.e. Python, Bash, JavaScript, PHP, HTML/CSS, SQL) on this stack and I am less familiar with some of them.

    So far I am quite on track with these goals: (1) the services have achieved 99.5% uptime yearly over the past 3 years, and (2) each service takes about 3-4 hours of maintenance time per year. Granted, I am not operating high-volume or complex websites, but achieving this still requires some discipline.

    I realise there are some parts that are still missing from this stack/setup, for example, full CI/CD integration, Kubernetes for service deployment, and MLOps services.

    But perhaps I should stop tinkering with the infrastructure and start working on more content creation?

  • Excel + Office Scripts

    If you have worked with Excel automation in the past, you may have painful memories of working with complex Excel functions or VBA code.

    Recently I realized that newer versions of Excel support Office Scripts, which has completely changed my impression of Excel automation.

    Some highlights of Excel + Office Scripts:

    ✔️ Can be run in the cloud, with easy integration into web-based Power Automate

    ✔️ Office Scripts is a variant of TypeScript/JavaScript, a mainstream programming language with wide adoption and good documentation

    ✔️ Excel comes with an embedded code editor (a lightweight variant of VS Code that supports syntax highlighting, linting, etc.) that lets you edit code directly inside Excel

    ✔️ Can connect to the scripts inside Excel files using VS Code

    Some improvement areas of Excel + Office Scripts:

    ❌ No support for external JavaScript libraries at the moment (even common routines need to be coded from scratch)

    ❌ Not easy to version control Office Scripts code, as it sits inside an Excel file

    ❌ Requires a commercial or educational license (not available with personal licenses)

    Definitely good to know that this option exists for those situations where you need to work with Excel-only automation.

  • Makefile

    A small titbit to share today: the Makefile.

    A Makefile can be used to define sets of commonly used commands, to save time and to ensure the commands run in the correct order with the needed prerequisites.

    For example, you can define a list of build-related commands under a target called “build”.

    .PHONY: build

    # GPG_TTY is needed for GPG password prompts (see the p/s note below)
    export GPG_TTY := $(shell tty)

    build:
        docker-compose build image-1
        docker-compose push image-1

    Then next time you can execute the build by calling “make build”, instead of manually typing out all the commands in sequence.

    Recently I have started to use it more often, as it really simplifies the development and deployment steps.

    (p/s: In case you are wondering, the GPG_TTY environment variable is needed for GPG to properly prompt for the password when Docker authenticates with a private container registry.)

  • Rocket League reinforcement learning-trained bot

    As a passionate gamer, I have been reading about the Rocket League Nexto cheat situation with keen interest (https://kotaku.com/rocket-league-machine-learning-cheating-nexto-bot-1849980593).

    For those unfamiliar with games, Rocket League is a competitive online game where players control cars to play football.

    Someone has built a bot trained via reinforcement learning, and offered it as a cheating solution to help people win in competitive ranked games against other unsuspecting players.

    Unsurprisingly, the bot is extremely good at the game, and helped the cheaters win against many human players.

    To be fair, most online competitive games have plenty of “hackers” or people who use cheating solutions to win, so there is nothing new about this.

    However, most of these cheating solutions are simple hacks (e.g. wallhacks, auto-aim) of existing games that give the cheating player certain advantages.

    Using a full-blown reinforcement learning-trained bot to cheat in online games is rare, which makes this news interesting.

    One of the reasons this is possible for Rocket League is that someone built a third-party Python package, RLGym, which solves the challenges of (1) creating a representative gym environment and (2) interacting with the game (reading game state and relaying user input).
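
    For a flavour of what this enables, a minimal control loop in the style of the RLGym quick-start looks roughly like this (a sketch; it assumes Rocket League and the rlgym package are installed, and a random action stands in for a trained policy):

    import rlgym  # third-party package; needs Rocket League installed to run

    # rlgym.make() launches Rocket League and wraps it as a gym-style environment,
    # handling game-state reading and input relaying behind a standard interface.
    env = rlgym.make()

    for episode in range(3):
        obs = env.reset()  # reset to a fresh kickoff and read the initial state
        done = False
        while not done:
            # A trained policy would map obs -> action; we sample randomly here.
            action = env.action_space.sample()
            obs, reward, done, game_info = env.step(action)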

    While I do not condone cheating with reinforcement learning-trained bots, I have a small dream that one day game developers will set up dedicated channels/servers where only user-designed bots can join and fight it out to claim the throne of best AI.

    Such a geeky thing, I know, but imagine how fun that would be!

    (p/s: The consultant in me says that game developers will never set this up, as such a feature would cost a lot to implement and maintain but is unlikely to bring in additional revenue streams.)

  • Timeline of LLM

    “Classic quant signals might work, but you can’t explain them; ChatGPT might not work, but it can explain itself. In a sense this is the opposite of a classic ‘black box’ machine-learning investment algorithm.”

    This is an interesting take by Matt Levine, on the ability of ChatGPT to be an asset manager (https://www.bloomberg.com/opinion/articles/2023-01-26/chatgpt-is-not-much-of-a-pitch-robot).

    One of the reasons ChatGPT is so popular is its ability to elaborate on topics confidently and in a structured way.

    Maybe this highlights an important skill set to be acquired by aspiring/practising data scientists?

    Anyway, if you are lost in the recent deluge of large language models (LLMs), I recommend this nicely curated timeline by Dr Alan Thompson: https://lifearchitect.ai/timeline/.

  • Soft skill – +1 mindset

    Recently I reflected on my professional career and the things I do differently now versus earlier in my career.

    One thing that stood out is that I now practice the +1 mindset.

    The +1 mindset considers not just my own views, but also those of the people I interact with.

    Early on in my career, I tended to question many decisions made by senior management, because I thought some of those decisions did not make sense.

    Disgruntled, I felt. How could they miss something so obvious, I thought.

    “Why wouldn’t you use my ML model when it has been shown to perform better than your Excel model?”

    “Why can’t we do things the proper way, which may cost a lot but is more scalable over the long term?”

    “Why do I have to do this support task? We should hire someone else to do it.”

    As I progressed in my career and saw the same issues popping up everywhere I went, I realized it might be due to my way of thinking and perhaps a lack of understanding of certain issues. I started to put myself in the shoes of my stakeholders, i.e. bosses, clients and cross-functional teams.

    Then things started to get clearer.

    “My ML model performs well on these technical metrics, but they do not translate into the metrics the business actually cares about.”

    “Business value needs to be proven first to justify high costs, especially when we do not know if the tool will really be useful.”

    “End-to-end ownership of a tool, including post-deployment support, gives me a better understanding of the tool I developed. As the tool scales, maybe I can justify getting a support person to handle it.”

  • Things they didn’t teach you in software engineering

    Whenever you feel disillusioned by the mismatch between what you were promised in university/bootcamp and what you actually work on in a job,

    I recommend reading this article, https://vadimkravcenko.com/shorts/things-they-didnt-teach-you/.

    It is written for software engineering, but many of the points apply to data science/analytics as well.

  • Data science solutions – Build vs buy

    Many data scientists are working in companies with less advanced technology infrastructure.

    But these companies still wish to solve their business problems with the help of data science / analytics.

    At this point usually a question arises, “Should we build or buy a solution to solve this problem with data science?”.

    As nicely pointed out by this whitepaper from Anaconda, there are a few factors that we should consider before making the decision:
    ✔️ Cost-effectiveness / return on investment
    ✔️ Need for customisation
    ✔️ Time to value
    ✔️ Vendor dependence
    ✔️ Support required

    Most data scientists, myself included, have a strong urge to build our own solution to solve a problem.

    But is this always the right approach?

  • Being a data scientist in consulting

    A very accurate and vivid description of being a data scientist in a consulting context.

    I can definitely say that I have experienced this myself, and it does take a while to get used to presenting comfortably (and therefore confidently).

    But like it or not, the holy grail in data science has always been connecting and bridging the gap between the business side and the technical side.

    An excerpt from the article (https://vicki.substack.com/p/selling-data-science) below:

    ——————————————————————————

    Me? Present? To executives? The people that wear ties and Bluetooth earpieces and say things like, “Sharon, hold my 2 o’clock, I have a meeting at 1:30 and I’m coming in hot,” unironically? He might as well have asked me to present to the UN Security Council.

    “I’ll help you tweak it a bit,” my manager said, encouragingly. “Ok,” I said. “First, take out all the slides where you talk about how you did the analysis. Put those in the back. Move the charts forward. Delete these numbers. Make the headlines bigger.”