WebProNews

How Grab Scales to Huge Demand While Staying Stable

Grab, a Singapore-based, Uber-type service, is so popular that if its IT infrastructure didn’t remain stable during peak usage, transportation would literally grind to a halt. Grab is the leading ride-sharing service in Singapore, Malaysia, Indonesia, Thailand, Vietnam and the Philippines.

In the US, the Grab app still works via a partnership with Lyft.

“Grab is the leading ride-sharing service in Southeast Asia,” says Ditesh Kumar, Director of Engineering at Grab. “We do 1.5 million bookings (a day). If we are not running, basically, transportation comes to a standstill.”

Kumar says that Grab has the biggest land fleet in Southeast Asia and that they are very concerned about uptime both for their passengers and their drivers, who he says depend on them for their livelihoods.

“With this comes two challenges: because of the tremendous amount of demand, we need to scale, but because so many people depend on us, we need to stay stable.”

Scale and Stability are Two Opposing Forces

Kumar notes the difficulties of keeping your IT infrastructure stable while simultaneously scaling platform usage. “You can scale easily if you don’t have to be stable, and you can be very stable if you don’t have to scale.”

He says that the answer to this huge problem, after a lot of reflection, is addressing their infrastructure fundamentals.

“If we can make sure that our infrastructure is built not just for the needs of today, but also meets the needs of the future, it completely changes the types of conversations we have,” says Kumar.

That’s why Grab decided on AWS infrastructure from the start. All of the AWS components, he says, are built to allow infinite scaling and are also extremely reliable.

Grab started out using a basic AWS setup, but as the company expanded, it added the full gamut of AWS services to support its scaling needs. “We needed to start thinking about caching layers so we started using Amazon ElastiCache,” said Kumar.

They also started using Amazon Redshift, which is a petabyte-scale data warehouse service based in the cloud.

“It’s huge, it’s massive,” exclaimed Kumar. “Everybody in the company uses it. It’s not just the engineers; the product guys, the marketing team and the ATM team all use it.”

Real-Time Computation Requires Real-Time Data Streams

“In addition to that, we are doing real-time computation, and in order to do real-time demand and supply matching we need to have real-time data streams,” says Kumar.

“The end result is that our drivers will be told the demand is at this place right now, because with this high demand the drivers will be paid more.”

Building Predictive Models

Moving forward, the company wants to build predictive models to make its service even more efficient for both passengers and drivers.

“In two hours’ time this area will have high demand and if you want to take advantage of that, move to this area,” explains Kumar. “The way we can do that is by taking into account multiple factors, building data models around it and using the infrastructure to compute those models and come up with an actionable item.”

Why Grab is on AWS

“There are many benefits to being on the cloud,” says Kumar. “Such as not having to deal with physical issues, going down to a data center at 3am to change a failed hard disk or deal with a server that is overheating because a fan has stopped rotating.”

He noted that companies the size of his used to need dedicated operations teams to deal with these sorts of issues. He sees little value in that for the organization and does not consider it a great use of an engineering team. Grab gives every engineer an AWS account to run “full-blown experiments in their sub account” to look for potential problems.

“They would find things that might be a problem three months from now or six months from now, and giving engineers that ability is unparalleled,” he says. “I estimate that we have saved 30-40% of our resourcing and manpower that then went to serving our core focus, and our core focus is about serving our customers.”

He says this allowed his team to move significantly faster. “In the startup environment that is make-or-break!”