Data Science Team & Tech Lead

Tag: Production Deployment

  • 3 Micro Learnings Over the Weekend

    3 micro learnings from this weekend:

    (1) cloudpickle works better than pickle for storing trained sklearn models

    Have you ever proudly saved a trained sklearn model to be used for serving elsewhere, only for it to complain of missing imports or classes when you try to load it?

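    Besides making the imports or classes available in the model inference environment, I realised that cloudpickle can store the necessary class definitions together with the trained model.

    cloudpickle.register_pickle_by_value to the rescue: it tells cloudpickle to serialise a given module's classes and functions by value, instead of storing only an import reference to them.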
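
    A minimal sketch of the idea, under a few assumptions: the module name my_transformers and the ClipScaler class are made up for illustration and stand in for whatever custom code a real sklearn pipeline depends on. The module is written to a temporary directory only so that the example is self-contained.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

import cloudpickle

# Hypothetical helper module that exists only in the training environment.
MODULE_SOURCE = '''
class ClipScaler:
    """Toy stand-in for a custom preprocessing step or estimator."""

    def __init__(self, upper):
        self.upper = upper

    def transform(self, values):
        return [min(v, self.upper) / self.upper for v in values]
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "my_transformers.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    my_transformers = importlib.import_module("my_transformers")

    # Embed this module's classes in the payload itself instead of
    # storing only an import reference to them.
    cloudpickle.register_pickle_by_value(my_transformers)
    payload = cloudpickle.dumps(my_transformers.ClipScaler(upper=10))

    # Simulate the serving environment: the module is no longer importable.
    sys.path.remove(tmp_dir)
    del sys.modules["my_transformers"]

# The payload loads with plain pickle because the class definition
# travels with it (cloudpickle itself must still be installed).
restored = pickle.loads(payload)
result = restored.transform([5, 20])
```

    Note that the serving environment still needs cloudpickle installed, as well as any third-party libraries (e.g. sklearn) that the pickled objects depend on. Only the registered module is stored by value.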

    (2) the purpose of using SQLAlchemy is to avoid writing raw SQL

    I have been using SQLAlchemy with pandas to interact with various databases for years.

    However, for reasons unknown even to me, I never fully realised that SQLAlchemy is a SQL toolkit and ORM (object-relational mapper) that abstracts SQL operations into Python code, regardless of the underlying SQL dialect.

    As a result, I had been defining SQL tables by hand instead of relying on SQLAlchemy’s MetaData and Table constructs.
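
    Defining a table with MetaData and Table, then inserting and querying without any raw SQL, might look like the sketch below. The table name, columns and the in-memory SQLite URL are made up for illustration, and it assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import (
    Column,
    Integer,
    MetaData,
    String,
    Table,
    create_engine,
    insert,
    select,
)

# In-memory SQLite keeps the example self-contained; swapping the URL
# for another database should not require changing the code below.
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# Hypothetical table describing deployed models.
models = Table(
    "models",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50), nullable=False),
    Column("version", Integer, nullable=False),
)

# Emits the CREATE TABLE statement in the dialect of the engine.
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(
        insert(models),
        [{"name": "churn", "version": 1}, {"name": "churn", "version": 2}],
    )
    query = (
        select(models.c.version)
        .where(models.c.name == "churn")
        .order_by(models.c.version)
    )
    versions = [row.version for row in conn.execute(query)]
```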

    (3) ChatGPT is amazing at writing boilerplate code

    I had to write tests for our local Airflow dev instance.

    Instead of digging through tutorials to find out how to instantiate an Airflow DAG for testing purposes, I asked ChatGPT to write the tests for me.

    Granted, I needed to make minor modifications to the tests written by ChatGPT, but it saved me at least 30 minutes of googling for the required boilerplate code.

  • FIRE planner

    I built a new dashboard to help you plan financially for your potential FIRE (financial independence, retire early).

    Link to dashboard : https://cheeyeelim.com/apps/fireplanner

    It takes a few inputs to help you visualize your (+ your partner’s) personal cash flows throughout your lifetime.

    Besides simple income and expense adjustments, it also simulates housing/mortgage and child-related expenses.

    Unfortunately, the dashboard is specific to Singapore-based residents for now, as it uses Singapore average values (e.g. costs to raise children) and incorporates only the Singapore retirement scheme (i.e. CPF).

    All parameters are based on point estimates for now (e.g. inflation, investment return), so complex scenario simulations are not supported.
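
    A point-estimate projection of this kind can be sketched in a few lines. This is not the dashboard's actual code: the function name, parameters and default rates are illustrative assumptions, and it leaves out housing, children and CPF entirely.

```python
def project_balances(start_balance, annual_income, annual_expense, years,
                     inflation=0.03, investment_return=0.05):
    """Return end-of-year balances under fixed (point-estimate) rates."""
    balances = []
    balance = start_balance
    for year in range(years):
        # Expenses grow with inflation; income is held flat for simplicity.
        expense = annual_expense * (1 + inflation) ** year
        # Last year's balance earns the fixed return, then cash flows apply.
        balance = balance * (1 + investment_return) + annual_income - expense
        balances.append(round(balance, 2))
    return balances


# With both rates set to zero, the balance simply grows by income - expense.
flat = project_balances(100, 50, 30, years=3, inflation=0.0, investment_return=0.0)
```

    Because every rate is a single fixed number, each set of inputs produces exactly one trajectory. Supporting scenario simulations would mean drawing these rates from distributions and running the projection many times.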

    p/s: This dashboard took me longer than expected to build, not because of the complexity of the simulation, but because of the large number of user inputs supported.

  • Infrastructure and framework behind my personal websites

    I decided to set up my own website at the end of 2020.

    3 years later, I run 2 websites backed by multiple supporting services (see image below), all set up and operated by myself.

    My goals are (1) to set up a robust infrastructure that can ensure my websites/services are always up, and (2) to set up a development framework that minimises maintenance efforts.

    For the infrastructure, each service is dockerised with custom images and deployed on my favourite cloud service provider (DigitalOcean).

    An uptime monitor (UptimeRobot) and a web analytics service (Google Analytics) have been set up to continuously track the status of the services.

    As for the development framework, I develop locally on VS Code with Windows Subsystem for Linux (WSL), with enforced linting and formatting via pre-commit hooks.

    Code is pushed to repos on GitHub, while images are pushed to the container registry on Docker Hub.

    I paid special attention to code quality, especially for Python code, to make maintenance easier. But overall code quality is not as high as I would like it to be, because I need to work with multiple languages (i.e. Python, Bash, JavaScript, PHP, HTML/CSS, SQL) on this stack and I am less familiar with some of them.

    So far I am quite on track with my goals, with (1) these services achieving 99.5% uptime yearly over the past 3 years and (2) each service taking about 3-4 hours of maintenance time per year. Granted, I am not operating high-volume or complex websites, but achieving this still requires some discipline.
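
    To put the 99.5% figure in perspective, it can be converted into a yearly downtime budget. A minimal sketch, assuming the figure refers to uptime measured over a 365-day year (the function name is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(uptime_target):
    """Hours of downtime per year allowed by a fractional uptime target."""
    return (1 - uptime_target) * HOURS_PER_YEAR


# 99.5% uptime leaves roughly 43.8 hours of downtime per year.
budget = downtime_budget_hours(0.995)
```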

    I realise there are some parts that are still missing from this stack/setup, for example, full CI/CD integration, Kubernetes for service deployment, and MLOps services.

    But perhaps I should stop tinkering with the infrastructure and start creating more content?

  • Makefile

    A small titbit to share today: the Makefile.

    A Makefile can be used to define sets of commonly used commands, both to save time and to ensure the commands run in the correct order with the required prerequisites.

    For example, you can define a list of build-related commands under a target called “build”.

    # Recipe lines must start with a tab character in a real Makefile.
    .PHONY: build
    build:
        docker-compose build image-1
        docker-compose push image-1

    Then next time you can execute the build by calling “make build”, instead of manually typing out all the commands in sequence.

    Recently I have started to use it more often, as it really simplifies the development and deployment steps.

    (p/s: In case you are wondering about the GPG_TTY environment variable, that is needed for GPG to properly prompt for the password when docker is authenticating with its private container registry.)