This post covers the PL/pgSQL function volatility property and it's effect on query planning. Although some of you might already know of it, we've recently met it as a performance problem while we were working on a new feature in Top Eleven. And we've met it because we didn't really pay attention to how we defined functions, but rather just always copy the function definition, and just change the SQL code inside of it.

The change I made introduced some replacement IDs for certain clubs, a function that checks whether it is a replacement ID, and a function that translates the ID into the real club ID. Replacement IDs were inside of a certain range (had the same prefix).

The check was just a basic bitwise AND operation, on a predefined mask. For the sake of the example, we'll call the function is_fake_id(id).

Translation was done by reading a static table which mapped translated ids into replacement ids. Example name: get_real_id(id).

The usage was to get data for a club, based on club ID in match, which could be a fake ID. The code was:

IF ("public"."is_fake_id"(home_club_id)) THEN
    SELECT INTO home_club * FROM "club"
    WHERE club.id = "public"."get_real_id"(home_club_id);
ELSE
    SELECT INTO home_club * FROM "club" where club.id = home_club_id;
END IF;

It seemed nice and easy, just check the club's ID and get it. It was also the only change in the already existing function (mind you, it was also executed for away club). Then we were doing regression on a test server, and found out that the fetching of matches was taking a lot of time. We've tried to find the problem, checked logs, checked code, and the lowest level for debugging was checking this SQL function. And, the problem lied there. For regular matches, not containing fake ids, function executes at 10-30ms per match, but for matches containing a fake id, it took over a second. So, we analysed the query plan for the select with get_real_id function call, and it yielded a sequential scan (for the people unfamiliar to SQL, it goes through the entire table and checks each row instead of using an index structure). We haven't had a real idea why, until we realized that VOLATILE keyword in function definition actually has a purpose and it is a directive to the query planner whether it can be called only once per query.

So, other than the VOLATILE option, which causes the query to execute it for each row, there are two more options:

  • STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values. In other words, it's a perfect match for our is_fake_id function.
  • IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values. It also says that it doesn't do database look ups. But, we figured that we can ignore that one, since our table was stable, so we used it for our get_real_id function.

By doing that, we made the function's performance good again.