Metric and Probability-Based Recommendations

How many of you like to eat, play, and most importantly, drink? (Hopefully, everyone!)

I helped create an app called LocALL. It gives 50,000 recommendations for short trips to eat, play, and drink and support local businesses in Austin. We can extend our app to cover ~400 metropolitan cities in the US and create 20 million recommendations. (Big data and machine learning!)

At the heart of our solution is math. We used spherical geometry and probability to create simple, fast recommendations.

1. Problem statement

Given a set of places to eat, places to play, and places to drink, select 1 of each so that the travel distance is small.

maps_austin
Some of the places to eat bbq, go for a hike, and drink local beer in Austin.

We can make several such recommendations. Hence, come up with a way to choose better recommendations more often.

2. Spherical geometry

Imagine that the three places form the vertices of a triangle. Then, the perimeter would tell us how close the places are to each other. We can also compare different triangles (i.e. recommendations). The question is, how should we measure the distance between two vertices to calculate the perimeter?

We can measure the perimeter of a triangle
We can measure the perimeter of a triangle in different ways.

A metric is a function that satisfies certain properties and lets us describe the distance between two items in a set. For example, we can use the Euclidean distance,

d_{E}(P_{1},\,P_{2}) := \sqrt{\bigl(x_{2} - x_{1}\bigr)^{2} + \bigl(y_{2} - y_{1}\bigr)^{2} + \bigl(z_{2} - z_{1}\bigr)^{2}},

to describe the distance between points P_{1} = (x_{1},\,y_{1},\,z_{1}) and P_{2} = (x_{2},\,y_{2},\,z_{2}).

The problem is, the Euclidean distance underestimates the actual distance between two points due to obstacles, such as buildings, water, street turns, and traffic. It also doesn’t match our reality well, since the triangle formed by the Euclidean distance is flat, while the surface of our Earth is curved.

Luckily, we have a branch in geometry dedicated to spheres. The spherical distance, like the Euclidean distance, finds the distance of the shortest path between two points along a sphere. Intuitively, we see that,

d_{S}(P_{1},\,P_{2}) > d_{E}(P_{1},\,P_{2}),\,\,\,\forall\,P_{1} \neq P_{2},

i.e. the spherical distance predicts the actual distance better.

The haversine formula,

\displaystyle d_{S}(P_{1},\,P_{2}) := 2r \,\cdot\, \arcsin\Biggl(\,\sqrt{\sin^{2}\Bigl(\frac{\varphi_{2} - \varphi_{1}}{2}\Bigr) + \cos(\varphi_{1})\cos(\varphi_{2})\sin^{2}\Bigl(\frac{\lambda_{2} - \lambda_{1}}{2}\Bigr)}\,\Biggr)

calculates the spherical distance given the latitude \varphi and longitude \lambda of each point, and the radius r of the Earth. This formula is perfect for our application, as Google APIs use latitudes and longitudes.

When two points are close relative to r (e.g. they are in the same city), we can use the Taylor approximation \arcsin(x) \approx x to simplify the formula:

\boxed{\displaystyle d_{S}(P_{1},\,P_{2}) \approx 2r\,\sqrt{\sin^{2}\Bigl(\frac{\varphi_{2} - \varphi_{1}}{2}\Bigr) + \cos(\varphi_{1})\cos(\varphi_{2})\sin^{2}\Bigl(\frac{\lambda_{2} - \lambda_{1}}{2}\Bigr)}}.

Finally, we use the spherical perimeter to measure the closeness of three places:

\boxed{m(P_{1},\,P_{2},\,P_{3}) := d_{S}(P_{1},\,P_{2}) + d_{S}(P_{2},\,P_{3}) + d_{S}(P_{3},\,P_{1})}.

We will consider the spherical perimeter to be a metric (of three items), because it is a sum of metrics.

3. Probability

We want to display recommendations with a low metric more often, but still give those with a high metric a fighting chance.

Consider a monotonically decreasing function,

\boxed{\displaystyle l(m) = \frac{1}{(\ln(1 + m))^{2}}}.

The function l models the likelihood of an event given its metric. The lower the metric, the higher the likelihood.

A monotonically decreasing function
We can model the likelihood of an event with a monotonically decreasing function.

Given a set of N recommendations, we can now select ones with a lower metric more often. Set the probability of choosing the i-th recommendation to be,

\boxed{\displaystyle p_{i} = \frac{l(m_{i})}{l(m_{1}) + \cdots + l(m_{N})}}.

4. Conclusion

Using spherical geometry and probability, we created simple, fast recommendations for places to eat, play, and drink that are close to each other. Our method is easy to understand and implement—we just need latitudes and longitudes!—but it ignores physical obstacles and opening hours. We will include these constraints in the future.

5. Notes

You can find the code in its entirety here:

Download from GitHub

Advertisements

Leave a reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s