I am a Cloud Technical Specialist at Enfinity Solutions Limited, where my role focuses on:
- Architecting and delivering cloud-native solutions to our customers on AWS and Alibaba Cloud, and
- Daily operations and maintenance of our on-premises GitLab and Kubernetes infrastructure
In my spare time, I write blog posts on DevOps practices and tools on my personal website: donaldsebleung.com
Migrated all 8 existing internal and customer-owned Terraform IaC projects to OpenTofu, following the officially supported procedure outlined in the OpenTofu documentation
Eliminated the potential legal risks of managing internal and customer-owned infrastructure with proprietary tooling, ensuring the sustainability of affected IaC projects in the long run
Implemented a centralized identity management solution backed by Active Directory for the on-premises Rancher RKE2 cluster, leveraging Dex as the OIDC connector
Consolidated authN + authZ under a single umbrella across on-premises IT infrastructure with Active Directory as the sole IdP and single source of truth, ensuring the scalability of access management for on-premises resources in the long term
Performed a successful on-premises DR drill for GitLab EE 16.x and the supporting Rancher RKE2 infrastructure, using Kasten K10 for cross-cluster application-level backup and disaster recovery, with:
- Ansible for automated provisioning of a new Rancher RKE2 cluster in a 3-server HA control plane setup with CIS-1.23 hardening
- A confirmed RPO of 48 hours based on daily Kasten K10 backup
- A confirmed RTO of 24 hours based on the actual DR drill duration for complete service resumption of GitLab EE 16.x (and other supporting services)
Confirmed that the DR handbook procedure is fully functional and actionable, ensuring business continuity in the event of a complete infrastructure failure
Drafted a comprehensive DR handbook for the on-premises GitLab EE 16.x installation and supporting Rancher RKE2 infrastructure, leveraging Kasten K10 for cross-cluster application-level backup and recovery, followed by an initial DR test drill on AWS using an in-house automation stack built for efficiency, reproducibility and standardization, with:
- OpenTofu for provisioning AWS resources mirroring the on-premises infrastructure
- Ansible for provisioning a fresh Rancher RKE2 cluster in a 3-server HA control plane setup with CIS-1.23 hardening on provisioned infrastructure
- GitLab CI with manual approval step for E2E validation of the complete OpenTofu + Ansible automation stack
Confirmed the feasibility of performing the procedures outlined in the DR handbook and uncovered fundamental limitations of a lift-and-shift recovery of on-premises Rancher RKE2 infrastructure to AWS, laying the groundwork for performing a complete DR drill going forward and ensuring business continuity in the event of a complete infrastructure failure
Deployed GitLab EE 16.x to an on-premises Rancher RKE2 cluster to accelerate software delivery and enable DevOps, DevSecOps and GitOps workflows, with:
- Rancher RKE2 deployed in HA mode (3 server nodes) for fault tolerance
- Rancher RKE2 deployed with CIS-1.23 profile enabled for security hardening
- Prometheus and Grafana for monitoring, observability and email alerting
- Flux v2 for cluster-wide GitOps management with Microsoft Teams notifications and alerting
Enabled rapid prototyping, drastically reduced time to production and standardized software delivery processes with a security-first approach
Implemented dashboards exporting instance and host-level metrics from 30+ cloud VMs across AWS and Azure for a major airline company, with:
- Amazon CloudWatch for monitoring and observability
- Amazon SNS for email alerting on storage-related events
- AWS Lambda with a container-based deployment model for recurring critical alarms
- GitLab CI for an automated DevSecOps workflow involving multiple pipelines:
  - Terraform pipeline with GitLab-managed remote backend and manual apply step for semi-automated provisioning and management of AWS infrastructure (CloudWatch, SNS)
  - Container-oriented pipeline with unit tests, SAST, image build, Trivy scan and push to Amazon ECR for automated deployment and quality assurance of Lambda-based microservices
Provided real-time visibility into AWS and Azure infrastructure, improving SLA and reducing incident response times
Specialist in IT consulting at China Resources Enterprise Limited (2021/07-2023/05; 1 year 10 months)
Assisted in meeting room setup for videoconferencing, Windows desktop and laptop software installation, Windows and Outlook troubleshooting, and printer troubleshooting at the Wan Chai HQ office
Oversaw the technical execution of the HQ firewall replacement from FortiGate to H3C for a BU in the catering industry; jointly devised and executed an appropriate action plan with a Chinese network service provider during off-hours to minimize business disruption and ensure business continuity
Improved the network security of the affected BU by retiring an EOL firewall product and contributed to the enforcement of Chinese SOE compliance requirements
Oversaw the replacement of a leased line affecting the HQ network topology for a BU in the catering industry; devised and executed an appropriate action plan modifying the HQ FortiGate firewall configuration during off-hours to minimize business disruption and ensure business continuity
Optimized the network topology for the affected BU, reducing network latency and error rates
Implemented the sorting network for the ARTIQ control system used in physics experiments, leveraging:
- nMigen for its Python DSL
- Yosys / SymbiYosys for formal verification
Ported the Minerva RISC-V (RV32M) soft core from Verilog to nMigen, leveraging:
- Python for its powerful abstractions and language features
- Formal verification tools (Yosys, SymbiYosys) backed by automated SMT solvers for verifying the functional correctness of the FPGA core
Assisted in teaching duties for 2 classes of students aged 8-10 on frontend development with HTML5+CSS3+JS
BEng. Computer Science and Engineering (4Y), The Hong Kong University of Science and Technology, Class of 2021, First Class Honors, GGA: 3.742
- Blogging on DevOps practices and tools on donaldsebleung.com
- PR contributor to CNCF Sandbox and Linux Foundation projects
- Former community moderator on deepin Forums (2023 Q2-Q3)
- PR contributor to deepin GNU/Linux projects
Please refer to my profile and CV for details.