Progressive delivery on a high-traffic digital platform

Introduction

Our digital platform is used by about 35 million visitors every month across our different countries. We are proud of it and want to keep growing these numbers by improving the user experience. The business identified the checkout as a key area for improvement: a smoother checkout experience should ease the purchase and therefore increase the conversion rate. Based on these business requirements, the technical team decided to introduce a new technical stack. Here is how we introduced a progressive delivery methodology for the new checkout on our digital platform.

The business target

As a client-centric company, we listen to our clients, follow market standards, and compare ourselves to the competition. These three channels led our Product Owners (PO) to request a deep evolution of our checkout to make it faster for clients and more interactive, for example with live updates of prices and fees based on the payment and delivery options the user selects.

A one-page checkout is a fairly standard approach for many e-commerce websites, with the usual pros and cons. We believed that in our case it would benefit users and increase conversion.

We therefore decided to move from a stepper like this:

Stepper checkout

to a one-page checkout:

SPA Checkout

The technical solution

Usually, when the PO thinks of a one-page checkout, IT people think of a “Single-Page Application” (SPA) built with technical solutions like Angular, Vue or React. We chose React because of the team’s knowledge of and experience with this library. The change from a form-based application to an SPA is huge, and it is not only a matter of language: it means switching from a document with validation to an app with components and events. This goes beyond the front end and implies a different approach to back-end interactions.

Our existing checkout was based on ASP.NET and jQuery, hosted on VMs. For the new one we decided to change not only the development technology but also the hosting: we moved to .NET Core for the services, React for the app and Kubernetes for the hosting. For more details, we invite you to read our previous article on this tech stack. In this one we focus on how to manage such a change on a live platform.

Our checkout transition architecture

The architecture schema of the two checkout versions and the routing logic

Building the new platform is not the tricky part, and it is always nice for a dev team to start from the ground up. Things become more complex when you have to define your go-live strategy. We decided to go live in several steps, linked to the evolution of our back end and back office and to our ability to operate and monitor the new hosting pattern. This means that we kept both checkout technical stacks live (the existing one and the new one) and split the users between them so that we could compare their performance.

The path to a progressive go-live

Once you decide not to go for a big-bang deployment that replaces the existing systems with the new ones, you have to define the rollout strategy and the technical means to achieve it.
We decided to start with single-article baskets containing small articles, then extend to all articles (which impacts the delivery methods and introduces some specifics on delivery fees), and finally handle multi-article orders, which introduce another layer of complexity when the products of an order are not all delivered to the same place.
To carry out this progressive rollout, we set up a custom eligibility rules engine that is triggered when the user starts to validate their basket. It selects eligible baskets based on feature flags associated with each use case and API, then applies a filtering rule that determines what share of the traffic is routed to the new checkout. We chose this custom solution over an off-the-shelf feature flag management system because we did not want to introduce an additional component into the technical stack. Beyond that, we wanted precise control over our feature flag eligibility rules based on basket contents.
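As an illustration only, the two-stage decision described above (eligibility by use-case flag, then a traffic share) could be sketched as follows in TypeScript. The use cases, type names and the `chooseCheckout` function are assumptions made for the example; the actual engine is not shown in this article.

```typescript
// Hypothetical names throughout: this is a sketch, not the real engine.
type UseCase = "singleSmallArticle" | "singleBulkyArticle" | "multiArticle";

interface Basket {
  articleCount: number;
  hasBulkyArticle: boolean;
}

// One flag per use case; a flag is on only when the new checkout supports it.
type FeatureFlags = Record<UseCase, boolean>;

type Checkout = "legacy" | "new";

// Derive the use case a basket belongs to.
function useCaseOf(basket: Basket): UseCase {
  if (basket.articleCount > 1) return "multiArticle";
  return basket.hasBulkyArticle ? "singleBulkyArticle" : "singleSmallArticle";
}

// Stage 1: is the basket eligible? Stage 2: is the user inside the traffic share?
// `userBucket` is assumed to be a stable number in [0, 100) computed elsewhere.
function chooseCheckout(
  basket: Basket,
  flags: FeatureFlags,
  userBucket: number,
  trafficPercent: number,
): Checkout {
  if (!flags[useCaseOf(basket)]) return "legacy";
  return userBucket < trafficPercent ? "new" : "legacy";
}
```

With only the `singleSmallArticle` flag on and a 10% traffic share, a single small article for a user in bucket 5 goes to the new checkout, while any bulky or multi-article basket stays on the legacy one whatever the share.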

Progressive feature activation

We activated features in production progressively, one at a time, from a first limited scope (10% of the features activated) to the full functional scope (100% of the features).

This allowed us to progressively increase the scope of orders redirected to the new checkout, in the usual A/B testing fashion. We defined several conditions that allow a granular selection, such as:

the number of products in the baskets
the current device of the user
the product category
the product vendor
whether the user is whitelisted
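The conditions listed above could be expressed as a declarative rule checked against the basket. The sketch below is hypothetical: the field names, the "omitted field means no constraint" convention and the whitelist bypass are assumptions made for the example.

```typescript
// Illustrative only: the real rule model is not shown in this article.
type Device = "desktop" | "mobile" | "app";

interface BasketContext {
  productCount: number;
  device: Device;
  categories: string[]; // category of each product in the basket
  vendors: string[];    // vendor of each product in the basket
  userId: string;
}

// A rule lists the allowed values; an omitted field means "no constraint".
interface EligibilityRule {
  maxProductCount?: number;
  devices?: Device[];
  categories?: string[];
  vendors?: string[];
  whitelistedUsers?: string[];
}

// True when every value is in the allowed list, or when there is no list.
function allIn(values: string[], allowed: string[] | undefined): boolean {
  if (allowed === undefined) return true;
  const list: string[] = allowed;
  return values.every((value) => list.indexOf(value) !== -1);
}

function matchesRule(ctx: BasketContext, rule: EligibilityRule): boolean {
  // Assumption: whitelisted users (e.g. internal testers) bypass the other conditions.
  if (rule.whitelistedUsers !== undefined && rule.whitelistedUsers.indexOf(ctx.userId) !== -1) {
    return true;
  }
  if (rule.maxProductCount !== undefined && ctx.productCount > rule.maxProductCount) return false;
  if (rule.devices !== undefined && rule.devices.indexOf(ctx.device) === -1) return false;
  return allIn(ctx.categories, rule.categories) && allIn(ctx.vendors, rule.vendors);
}
```

Keeping the rule as plain data means the scope can be widened (for example by raising `maxProductCount` or adding a category) without touching the matching code.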

In addition, we introduced a computation based on part of the user ID to select how many users are placed in the sample for the A/B test.
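One common way to do this kind of sampling is to map each user to a stable bucket between 0 and 99 and compare it to the configured share. The sketch below assumes that the "part of the user ID" is its last characters and that the computation is a small polynomial hash; neither detail is specified here, and the function names are hypothetical.

```typescript
// Assumption: bucket = small polynomial hash of the last characters of the ID.
function bucketOf(userId: string, tailLength: number = 6): number {
  const tail = userId.slice(-tailLength);
  let hash = 0;
  for (let i = 0; i < tail.length; i++) {
    // The modulo keeps the value small, so it stays an exact integer.
    hash = (hash * 31 + tail.charCodeAt(i)) % 1000003;
  }
  return hash % 100; // stable bucket in [0, 99]
}

// A user is in the sample when the bucket falls below the configured share.
function inSample(userId: string, samplePercent: number): boolean {
  return bucketOf(userId) < samplePercent;
}
```

Because the bucket depends only on the ID, the same user always gets the same answer, and raising `samplePercent` only adds users to the sample without removing anyone already in it.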
To keep the user experience consistent, once a user has been routed to the new checkout, they are redirected to it again as long as their future baskets remain compatible with the eligibility rules.
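This stickiness rule can be sketched as a small routing function. The record of already-routed users is hypothetical (it could be a cookie or a server-side store in practice), as are the names used here.

```typescript
type CheckoutVersion = "legacy" | "new";

// `alreadyOnNew` is a hypothetical record of users previously sent to the new checkout.
function routeWithStickiness(
  userId: string,
  basketEligible: boolean,
  userInSample: boolean,
  alreadyOnNew: { [userId: string]: boolean },
): CheckoutVersion {
  // Eligibility always wins: an incompatible basket goes to the legacy checkout.
  if (!basketEligible) return "legacy";
  // A user seen on the new checkout before, or newly sampled, gets the new one.
  if (alreadyOnNew[userId] === true || userInSample) {
    alreadyOnNew[userId] = true;
    return "new";
  }
  return "legacy";
}
```

In this sketch a user stays on the new checkout even if the sample share is later reduced, and falls back to the legacy checkout only for baskets that are not eligible.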
Combined with our ability to deliver on a daily basis, this gave us real agility in adapting the content of the eligible baskets, fixing issues and bringing in the improvements that this live phase allowed us to identify.
It also gives very good control over the production environment: we can activate a feature for 3 or 4 hours without a production delivery, then collect and analyze logs and analytics data. Developers can push deactivated features to production once they are tested, and activate them as soon as the associated back-office services are up and running, without having to synchronize the deliveries.
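To illustrate the idea of shipping a feature switched off and activating it later at runtime, here is a minimal in-memory sketch. The `FlagStore` class and the optional automatic expiry are assumptions made for the example; a real store would live in shared configuration, and the activation window may just as well be closed by hand.

```typescript
// Hypothetical flag state; `activeUntil` is an epoch time in ms, or null for "no end".
interface FlagState {
  enabled: boolean;
  activeUntil: number | null;
}

class FlagStore {
  private flags: { [name: string]: FlagState } = {};

  // Code ships with the flag off: registering it changes nothing for users.
  register(name: string): void {
    this.flags[name] = { enabled: false, activeUntil: null };
  }

  // Turn a flag on, optionally for a limited duration (e.g. a few hours).
  activate(name: string, now: number, durationMs?: number): void {
    this.flags[name] = {
      enabled: true,
      activeUntil: durationMs === undefined ? null : now + durationMs,
    };
  }

  deactivate(name: string): void {
    this.flags[name] = { enabled: false, activeUntil: null };
  }

  // Read on every request, so a change takes effect without a new deployment.
  isActive(name: string, now: number): boolean {
    const flag = this.flags[name];
    if (flag === undefined || flag.enabled !== true) return false;
    return flag.activeUntil === null || now < flag.activeUntil;
  }
}
```

Passing `now` explicitly keeps the sketch easy to test; in a running service it would typically be `Date.now()`.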

What we learnt

Business results

After several weeks of activation, our business metrics show a significant increase in conversion rate of up to 5 points, for desktop users as well as for mobile and app users. This validates the choice of the single-page pattern for our checkout. We are confident that we can improve it further with future enhancements to the page.

Be granular

With the right granularity in the eligibility rules, we controlled very precisely what we opened and what we removed from our tests. This was very helpful for managing the different targets (web vs. mobile apps) and the scope (small articles, bulky articles, marketplace, …).

Be progressive

The business feels more confident activating new features delivered through this process, since they know it will be easy to activate or deactivate each feature. This allowed us to handle the few issues we had by stopping only the impacted part rather than the whole test, and then to reintroduce the feature once the issues were fixed. It is also valuable for the testing teams, who can focus their test cases on the features we introduce progressively rather than having to discover and test a full product scope.

Be analytic

This is a good way to start collecting data step by step on such a big change. It allowed us to validate with the business that the conversion rate met its target, which made the business more confident before deciding to switch all the traffic to the new version. It also means that you need to be ready from the beginning with the right level of tracking to produce your business KPIs. You can read here how we set up our technical integration of tagging for React applications. This is key for a single-page checkout, where tracking is considered more complex to set up.
In addition to classic user event tracking, we used a session recording tool to replay user actions and discover some UX/UI improvements.

Be observable

Beyond the business standpoint, the progressive rollout also allowed us, on the technical side, to check and test the new logging and tracing system we put in place based on OpenTelemetry, and to adapt our monitoring and dashboards with the Ops and support teams. We also used it to make sure that our sizing and management of Kubernetes were sound and reliable.
Changing such a critical component of an online retail website is both exciting and a bit frightening. With this progressive rollout we kept the excitement and turned the frightening part into serenity by letting the business decide which part of the scope is active at each stage of the deployment.

Author

Christophe Bredeche

Digital enthusiast since last century. Lover of dives and waves.