Cloud architecture
Cool Maze uses cloud servers to transfer data from the mobile source to the target computer. The service needs cloud components with these capabilities:
- a web server for the static assets (HTML, JS, CSS) of the website coolmaze.io,
- a web backend to handle the logic of API calls from the websites and from the mobile apps,
- transient storage for payloads of a few megabytes in transit,
- a “server push” component able to initiate a message from the server to the target browser,
- a database, for analytics purposes.
Here is our stack.
Some of the logic is handled by the clients. For example, the mobile apps are responsible for encrypting user data, while the web app frontend decrypts it. The server does not handle user data encryption at all. Our end-to-end encryption system ensures that the server manages only opaque encrypted resources and never has access to the cleartext user data or to the secret encryption keys.
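The cipher, key size, and key-exchange mechanism are not detailed in this article, so take the following Go sketch as a purely illustrative picture of what “opaque encrypted resources” means: the client seals the payload and the server only ever sees ciphertext.

```go
// Illustration only: Cool Maze's actual encryption scheme is not specified
// here. The point is that encryption happens on the client, with a key the
// server never receives.
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// encryptOnClient seals a payload with AES-GCM. Only the ciphertext is
// uploaded; the key travels out-of-band, never through the server.
func encryptOnClient(key, payload []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so the receiving web app can open the payload.
	return gcm.Seal(nonce, nonce, payload, nil), nil
}

func main() {
	key := make([]byte, 32)
	rand.Read(key)
	ciphertext, _ := encryptOnClient(key, []byte("a photo, conceptually"))
	fmt.Printf("the server only ever sees %d opaque bytes\n", len(ciphertext))
}
```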
The role of the cloud servers is to enable the mobile app to upload resources, allow the web app to download resources, and store anonymous usage data for analytics.
Note that all of the Google Cloud products mentioned here have a generous free tier, helping reduce operational costs to a bare minimum. A frugal design and several optimizations also contribute to making the service fast and cost-efficient.
We’re hosting the website static assets on Firebase Hosting.
We’re hosting the dynamic backend server on Cloud Run. The server is written in Go. Cloud Run is a fantastic option for our stateless HTTP server, as it is a managed platform that automatically scales the number of instances needed to serve the traffic at any given time.
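Cloud Run only asks the container to serve HTTP on the port given by the `PORT` environment variable. A minimal Go server in that style looks like this (the endpoint name is illustrative, not Cool Maze's actual API):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Cloud Run tells the container which port to listen on via $PORT.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// Hypothetical endpoint, for illustration only.
	http.HandleFunc("/api/dispatch", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```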
Stateless?
If the server has no state, then where is the user data stored during transit? In this context, stateless means that the server instances are not the “source of truth” for the data. This property is very important to enable autoscaling. Two incoming requests may be served by two distinct server instances. When data is involved, the instances need to rely on another component, like a database. Sharing a picture with Cool Maze typically incurs 7 API calls to the servers. We use Firestore as a shared short-term memory where these requests can write and read data.
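As a sketch of that shared short-term memory, here is how a small payload could be written and read back with the Firestore Go client; the collection and field names are made up for illustration:

```go
package transit

import (
	"context"
	"time"

	"cloud.google.com/go/firestore"
)

// savePayload stores a small encrypted payload so that any server instance
// can later serve it. Collection and field names are illustrative.
func savePayload(ctx context.Context, fs *firestore.Client, id string, blob []byte) error {
	_, err := fs.Collection("payloads").Doc(id).Set(ctx, map[string]interface{}{
		"data":    blob,
		"created": time.Now(),
	})
	return err
}

// loadPayload reads the payload back, possibly from a different instance
// than the one that wrote it.
func loadPayload(ctx context.Context, fs *firestore.Client, id string) ([]byte, error) {
	snap, err := fs.Collection("payloads").Doc(id).Get(ctx)
	if err != nil {
		return nil, err
	}
	data, err := snap.DataAt("data")
	if err != nil {
		return nil, err
	}
	return data.([]byte), nil
}
```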
Transient memory
For maximal performance, we also write the data to an in-memory cache local to each instance. When the data we’re reading happens to be in the local cache, the operation is extremely fast. When it is not, we read it from the Firestore database shared by all instances. This strategy works well for small resources up to 1MB.
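A sketch of that read path, assuming a simple map guarded by a mutex (the real cache is likely more elaborate, e.g. with eviction):

```go
package transit

import (
	"context"
	"sync"

	"cloud.google.com/go/firestore"
)

// instanceCache is local to one server instance and guarded by a mutex,
// since an instance serves many requests concurrently.
var (
	mu            sync.RWMutex
	instanceCache = map[string][]byte{}
)

// getPayload returns the payload from the local in-memory cache when
// possible, and falls back to the Firestore document shared by all instances.
func getPayload(ctx context.Context, fs *firestore.Client, id string) ([]byte, error) {
	mu.RLock()
	blob, ok := instanceCache[id]
	mu.RUnlock()
	if ok {
		return blob, nil // fast path: already in this instance's memory
	}

	snap, err := fs.Collection("payloads").Doc(id).Get(ctx)
	if err != nil {
		return nil, err
	}
	data, err := snap.DataAt("data")
	if err != nil {
		return nil, err
	}
	blob = data.([]byte)

	mu.Lock()
	instanceCache[id] = blob
	mu.Unlock()
	return blob, nil
}
```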
For larger resources, such as a high-resolution photo, we use Cloud Storage (GCS) instead. In this case, the backend generates “signed URLs” to let the mobile app write the resource to Cloud Storage.
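One way to produce such a URL with the Cloud Storage Go client is a V4 signed URL; the bucket name and expiry below are illustrative:

```go
package transit

import (
	"time"

	"cloud.google.com/go/storage"
)

// uploadURL returns a short-lived V4 signed URL letting the mobile app PUT
// the encrypted resource straight into the bucket, so the large payload never
// transits through the backend instance. Bucket name and expiry are
// illustrative; on Cloud Run the client can usually sign with the service
// account it runs as, otherwise GoogleAccessID/PrivateKey must be provided.
func uploadURL(client *storage.Client, object string) (string, error) {
	return client.Bucket("coolmaze-transit").SignedURL(object, &storage.SignedURLOptions{
		Scheme:  storage.SigningSchemeV4,
		Method:  "PUT",
		Expires: time.Now().Add(10 * time.Minute),
	})
}
```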
We set an expiration policy on Firestore documents and on GCS objects. Automated expiration is perfect for ciphered data shared via Cool Maze, as a transfer takes only a few seconds, after which the encrypted payload will not be used anymore and can be discarded.
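On the GCS side, expiration can be expressed as a lifecycle rule on the bucket. Here is a sketch in Go, assuming a one-day retention and an illustrative bucket name; Firestore TTL is configured separately, on a timestamp field, via the console or gcloud:

```go
package setup

import (
	"context"

	"cloud.google.com/go/storage"
)

// createTransitBucket creates the bucket with a lifecycle rule that deletes
// objects automatically after one day. Names and retention are illustrative.
func createTransitBucket(ctx context.Context, client *storage.Client, projectID string) error {
	bucket := client.Bucket("coolmaze-transit")
	return bucket.Create(ctx, projectID, &storage.BucketAttrs{
		Lifecycle: storage.Lifecycle{
			Rules: []storage.LifecycleRule{{
				Action:    storage.LifecycleAction{Type: storage.DeleteAction},
				Condition: storage.LifecycleCondition{AgeInDays: 1},
			}},
		},
	})
}
```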
Scalability
We achieve high performance even when the system is under heavy load, because each of the cloud components has a scalable and distributed infrastructure: Firebase Hosting, Cloud Run, Firestore, and Cloud Storage.
All the “Share” actions made by all the Cool Maze users are independent of each other, so each client operates only on its own data, without any contention.
Concurrency
In Cloud Run we leverage three levels of concurrency.
First, a user request may trigger several concurrent operations. For example, when sharing 5 photos, 5 pairs of secure URLs are generated concurrently, using goroutines (see the sketch below).
Second, a server instance can handle many incoming requests concurrently. We don’t need to wait for the previous request to be finished. As each request is being served fast, and requests are processed concurrently, a single instance is usually enough to sustain all of the load.
Third, Cloud Run automatically starts new instances when CPU utilization reaches a threshold, or when too many requests are queueing. This horizontal scaling works smoothly, as the stateless instances never need to communicate directly with each other. As we wrote the server in Go, we enjoy fast instance startup and powerful concurrency within each instance.
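To illustrate the first level, here is a sketch of the fan-out when sharing several photos; `urlPair` and the `gen` signing callback are hypothetical stand-ins for the backend's actual helpers:

```go
package transit

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// urlPair holds one upload URL (used by the mobile app) and one download URL
// (used by the target web page).
type urlPair struct {
	Upload, Download string
}

// makeURLPairs generates one pair of signed URLs per object, concurrently,
// one goroutine per object.
func makeURLPairs(ctx context.Context, objects []string, gen func(object string) (urlPair, error)) ([]urlPair, error) {
	pairs := make([]urlPair, len(objects))
	g, _ := errgroup.WithContext(ctx)
	for i, obj := range objects {
		i, obj := i, obj // capture loop variables for the goroutine
		g.Go(func() error {
			p, err := gen(obj) // e.g. sign one PUT URL and one GET URL
			if err != nil {
				return err
			}
			pairs[i] = p
			return nil
		})
	}
	// Wait for all the concurrent signings to finish.
	return pairs, g.Wait()
}
```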
Server push
When the mobile app sends a message to the cloud backend, it makes a traditional HTTPS request. But then the cloud backend needs to notify the web browser currently displaying the QR code. How can a server effectively initiate a message to a client? We solved the “Server push” problem by (ab)using Firestore’s realtime update capability. Firestore can notify a client currently “listening” for changes to a document. This database feature was designed to let apps always display the fresh, up-to-date value of their data. We built a message bus on top of it, by having the webpage listen to a specific “Message” document. The backend writes to this document, and the webpage is promptly notified and receives the message contents.
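On the backend side, the “push” is therefore just a document write; the browser, which subscribed to that document with the Firestore web SDK’s realtime listener, is notified by Firestore. A sketch in Go, with illustrative collection and field names:

```go
package push

import (
	"context"
	"time"

	"cloud.google.com/go/firestore"
)

// notifyBrowser writes the message into the document that the target web page
// is already listening to. Firestore then pushes the change to the browser;
// the backend never opens a connection to the client itself.
func notifyBrowser(ctx context.Context, fs *firestore.Client, sessionID, message string) error {
	_, err := fs.Collection("messages").Doc(sessionID).Set(ctx, map[string]interface{}{
		"message": message,
		"sent":    time.Now(),
	})
	return err
}
```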
Analytics
The backend asynchronously writes anonymous usage information to a BigQuery dataset: size of the data shared, duration of ciphering and upload, duration of download and deciphering, approximate location (country and city), etc. This dataset is a source of fantastic analytics insights, which we will present in detail in upcoming articles.
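Streaming such a row from Go could look like the sketch below; the struct fields, dataset, and table names are illustrative, not the actual schema:

```go
package analytics

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery"
)

// shareEvent mirrors the kind of anonymous usage row described above.
type shareEvent struct {
	Bytes         int64     `bigquery:"bytes"`
	EncryptMillis int64     `bigquery:"encrypt_millis"`
	UploadMillis  int64     `bigquery:"upload_millis"`
	Country       string    `bigquery:"country"`
	City          string    `bigquery:"city"`
	At            time.Time `bigquery:"at"`
}

// record streams one row into BigQuery; the backend does this asynchronously,
// off the critical path of the user-facing request.
func record(ctx context.Context, bq *bigquery.Client, ev shareEvent) error {
	ins := bq.Dataset("usage").Table("shares").Inserter()
	return ins.Put(ctx, ev)
}
```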