Counting is Hard and the (Pre-Alpha) Solution


Jason Collins, our CTO, has been burdened with a problem: counting. And not just the kind Big Bird is constantly chirping about. Developers are often tasked with building pages that contain many data points, and that is a lot of information to deliver to a user quickly. Jason wants a way to deliver data that is both accurate and fast.

In his example, Jason uses VendAsta’s Social Marketing software. The dashboard houses various data points, including the number of reviews a business receives. So we simply have to ask the database, right? When we issue the simple command to store a review, review.put(), that information is stored in a datastore.

[Figure: datastore 1]

The above, of course, is a simplified version. In the world of Google App Engine, that data is stored in several different datastores in various geographic locations:

[Figure: review.put() sent to datastores in various cities]

The statement attempts to store the review in all locations, but it waits only until it hears back from a majority of the datastores in at least two geographic centers. This algorithm is called Paxos.

[Figure: review A]
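The acknowledgment rule described above can be illustrated with a toy simulation. This is only a sketch of the "majority in at least two regions" check, not Paxos itself and not how App Engine implements it; the Replica class, the put_review function, and the region names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A hypothetical in-memory stand-in for one datastore replica."""
    name: str
    region: str
    available: bool = True
    reviews: set = field(default_factory=set)

    def store(self, review_id: str) -> bool:
        # An unavailable replica never acknowledges the write.
        if not self.available:
            return False
        self.reviews.add(review_id)
        return True


def put_review(replicas, review_id):
    """Send the write to every replica and report whether enough acknowledged.

    The write counts as committed when a majority of replicas, spread over
    at least two regions, have acknowledged it.
    """
    acked = [r for r in replicas if r.store(review_id)]
    has_majority = len(acked) > len(replicas) // 2
    regions = {r.region for r in acked}
    return has_majority and len(regions) >= 2


replicas = [
    Replica("ds1", "us-central"),
    Replica("ds2", "us-central"),
    Replica("ds3", "us-east"),
    Replica("ds4", "us-east", available=False),
    Replica("ds5", "europe", available=False),
]
committed = put_review(replicas, "review-A")
```

Here three of five replicas in two regions acknowledge, so the write is treated as committed even though two replicas have not seen the review yet.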

Although not every datastore holds all the data at any given moment, the information will be “eventually consistent.” The datastores continually copy data to one another to bring the system into consistency, so eventually all reviews will be in all datastores. In the case below, reviews A, B and C will ultimately be in each datastore. Even though the system is updating frequently, it is often out of sync because new information is constantly being added.

[Figure: eventual consistency]
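To see why a count taken from any single datastore can be wrong, here is a small sketch of reviews A, B and C spread unevenly across three replicas. The replica names, the starting contents, and the sync_round helper are made up for illustration; real replication is incremental and asynchronous rather than one all-at-once pass.

```python
def sync_round(replicas):
    """One replication pass: every replica ends up with every review seen anywhere."""
    everything = set().union(*replicas.values())
    for name in replicas:
        replicas[name] = set(everything)


# Hypothetical starting state: no single replica has all three reviews.
replicas = {
    "ds1": {"A", "B"},
    "ds2": {"B"},
    "ds3": {"A", "C"},
}

# Counting on any one replica gives a different (and wrong) answer.
counts_before = {name: len(reviews) for name, reviews in replicas.items()}

sync_round(replicas)

# Once replication has caught up, every replica agrees on the count.
counts_after = {name: len(reviews) for name, reviews in replicas.items()}
```

Before the sync the replicas report 2, 1 and 2 reviews; afterwards they all report 3. In a live system new reviews keep arriving, so the "afterwards" state is a moving target.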

The (pre-alpha) Solution

Accurately counting the information by scanning all the datastores is time-consuming and expensive. Jason has found a way to capture a “snapshot” of the data using BigQuery. After the initial snapshot, which contains information from all the datastores, the system tracks deltas: how much to add or subtract each hour.
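The snapshot-plus-deltas idea boils down to simple arithmetic, sketched below. The SnapshotCounter class and all the numbers are hypothetical; the actual solution takes its snapshot with BigQuery, which this sketch does not attempt to model.

```python
class SnapshotCounter:
    """Keeps a baseline count from a snapshot plus the deltas recorded since."""

    def __init__(self, snapshot_count):
        self.snapshot_count = snapshot_count  # count at the time of the snapshot
        self.deltas = []                      # one net change per hour since then

    def record_delta(self, delta):
        # A positive delta means reviews were added; negative means removed.
        self.deltas.append(delta)

    def current(self):
        # Reading the count is cheap: no scan of the datastores is needed.
        return self.snapshot_count + sum(self.deltas)


counter = SnapshotCounter(snapshot_count=1200)  # made-up snapshot value
counter.record_delta(5)    # hour 1: five new reviews
counter.record_delta(-2)   # hour 2: two reviews removed
counter.record_delta(7)    # hour 3: seven new reviews
estimate = counter.current()  # 1200 + 5 - 2 + 7 = 1210
```

The expensive full count happens once, at snapshot time; every read after that is just an addition.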

[Figure: data snapshot]

Occasionally, it will be necessary to apply a new snapshot, as the information drifts out of sync. It follows that the more frequent the snapshots, the more likely the data is to be accurate.

[Figure: snapshot 2]
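Applying a new snapshot can be thought of as rebasing the count: the fresh snapshot becomes the new baseline, the old deltas are discarded, and the difference shows how far the estimate had drifted. The rebase function and the figures below are illustrative assumptions, not the real implementation.

```python
def rebase(baseline, deltas, fresh_snapshot):
    """Swap in a fresh snapshot and report how far the old estimate had drifted."""
    estimate = baseline + sum(deltas)
    drift = fresh_snapshot - estimate
    # The fresh snapshot is the new baseline, so tracking starts over with no deltas.
    return fresh_snapshot, [], drift


# Made-up values: the running estimate is 1210, but a new snapshot finds 1213.
baseline, deltas, drift = rebase(1200, [5, -2, 7], fresh_snapshot=1213)
```

In this example the estimate was off by 3 reviews. Taking snapshots more often gives drift less time to accumulate between corrections.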

Click on the video to see VendAsta CTO Jason Collins presenting his theory to our office. The real thing is even better than the write-up.

[Video: Counting is Hard]

Nykea Marie Behiel

Nykea is the Director of Content at Vendasta, where she heads up our content marketing team and inbound marketing initiatives.