Look Before You Leap Year

Leap Years are Stupid

Actually, leap years are absurd, and consequently extremely stupid. From the day some long-dead forbear of astronomers¹ and astrologers² peered up at both the Sun and Moon in the same sky, put two and two together, and realized that they were in the same place as the last time it was this bloody cold³, we’ve been stuck with them, whether we realized it or not. In a sad quest to pretend that an ellipse⁴ is, nevertheless, a circle, we doomed ourselves to forever having to maybe offset our mental clocks when looking sufficiently far forward or back in time, lest history and reality drift dangerously out of sync.

In the modern West the leap year is usually attributed to Julius Caesar, who so successfully promoted the concept that it only took the Romans fifty years after his death⁵ to reliably incorporate it into what history now tries to forget as the Julian calendar. Unfortunately Caesar and his scholars – being pretty poor programmers – had made a wee bit of a rounding error, and so in 1582 Pope Gregory XIII was forced to “correct” the situation⁶ by shifting the average year from 365.25 to 365.2425 days. And so was born the Gregorian calendar we all know and use today⁷. This approximation was good enough for the Church, and so history has been positively littered with February 29th every so often ever since.

I feel contempt for the leap year. So why am I writing about it?

It’s an Ideal Programming Exercise

It turns out that determining if a particular Gregorian calendar year⁸ is, or is not, a leap year is a problem with some unusual properties that make it an excellent test of a programmer’s mastery of some of the most fundamental tools they have available to them. That, in turn, means that it’s not only a very good challenge for the budding programmer, but it could – and I’d argue should – also be used as an impressively diagnostic question to ask a candidate in a technical interview.

In every programming language that I’ve yet encountered there exists a short, simple, and performance-optimal solution for this problem. Indeed in most cases the optimum implementation is a single line long⁹, but most who attempt it will end up with something quite far from elegant on the first try. And, unlike most problems with such an ideal answer, if solved inelegantly it’s also almost certainly solved inefficiently.

The beauty of the exercise is that the pathway to the ideal solution can easily be discovered just by employing the sort of straightforward reasoning that is the bread and butter of professional programming. It encourages you to think through your solution by first understanding what you’re trying to solve.

Why does all this matter?

Well maybe you’ve just joined Exercism. Or perhaps you’re just looking for a good code kata. Or you’ve been assigned to a team working with calendars. Maybe you’re simply a masochist. Or worse, perhaps your interviewer is a sadist.

One way or another, let’s assume you’ve just been assigned the task of implementing a function is_leap that takes an integer argument year and returns a Boolean result.

Luckily There’s an Algorithm for That

Unusually for a programming problem the basic algorithm will most likely be presented to you up front, and you need merely implement it. The algorithm is very old, and has been stated in more or less precise ways over the centuries, so it helps to transcribe it into pseudocode:

IF the year is a multiple of 4 THEN
    IF the year is a multiple of 100 THEN
        IF the year is a multiple of 400 THEN
            RETURN True
        RETURN False
    RETURN True
RETURN False

Notice the chevron-like > shape that results from the multiple layers of nesting? It turns out that’s fairly typical of a first-pass mental model of the algorithm, at least in a language where indentation is meaningful. It’s not the most elegant form – we’ll get to that – but it conveys some important properties about our algorithm:

It’s simple. You have one argument, you perform (up to) three tests on it using a basic math operation, and it’s the same operation for all three tests. The most complicated thing you need to know to solve it is how your language performs the modulo operation. Once you’ve got that, comparison with zero, simple conditions, and how to return a Boolean-equivalent value in your language of choice you’re golden. After five minutes of well-focused instruction a novice to any high-level language should be able to bash out a general solution that mimics the above pseudocode.

It’s progressive. After each successive test you either know the answer or you know what to ask next. Most algorithms – indeed most non-trivial problems – aren’t nearly so cut and dried. There is no need for complicated branching, or your language’s version of else_if, you need merely proceed forward until the moment you are certain you know the answer.
It’s robust. There are no bugs, no weird or exceptional edge cases; every possible (integer) input maps directly to a single, unwavering output. Additionally there are no side effects, no state to mutate… you don’t even need to carry forward the results of each modulo test. It’s as pure as a function can get, and so constrained that you can know from glancing at it if it will always return the correct answer.

Know Thine Enemy and Thine Self

Understanding the algorithm is great, but understanding the problem it’s addressing is better, and the insights gained above tell us nothing about the domain we’re working with. We’ve treated the various modulo tests as black boxes, but really they’re specialized filters that apply to specific, progressively smaller, subsets of the overall domain of Gregorian years.

So what can we glean from looking a little more closely at those tests?

The answer is usually no: From the very first test we know that 3 out of every 4 years won’t be a leap year. By looking at the rest we can see that an additional 3 out of every 4 centuries won’t be either… returning False by default will correctly answer >75% of all possible questions without any additional effort.
They rapidly cull the herd: Just 1 in every 4 years is even a potential leap year, just 1 year in every 100 is a century, and just 1 year in 400 is a century that’s also a leap year… we learn a lot fast about our year.
They have a natural order: Each test is a special case of the previous one, and from the above we know that each applies to a progressively smaller number of cases. Sure, if we test division by 400 first we solve the problem very quickly in an extremely small number of cases, but if it doesn’t give us the answer it also doesn’t give us any useful knowledge about the question being asked. Proceed in the natural order, however, and each step refines the last.

Taking the first of these to heart we can rewrite our pseudocode so that we need only ever consider the paths to a True answer:

IF the year is a multiple of 4 THEN
    IF the year is NOT a multiple of 100 THEN
        RETURN True
    IF the year is a multiple of 400 THEN
        RETURN True
RETURN False

But this is interesting. Glancing at the above we can now see that there are exactly two criteria that result in a year being a leap year, and that those criteria are exclusive. So for any given year the answer is True if either of the following is True:

the year is a multiple of 4, but not of 100
the year is a multiple of 4, 100, and 400

The Path of the Righteous

From either of the above the path to the ideal form can pretty neatly unroll in front of you. Especially if we introduce some abstraction and call our three tests a, b, and c.

For instance, if we progressively introduce Boolean logic operations to our pseudocode:

IF a THEN
    IF NOT b OR c THEN
        RETURN True
RETURN False

… becomes…

IF a AND (IF NOT b OR c) THEN
    RETURN True
RETURN False

… which, in a language that allows you to return an expressions becomes:

RETURN a AND (NOT b OR c)

Oooh, that’s pretty concise.

Or We Could Go The Other Way

And if we’d started from the criteria we derived alone?

IF a AND NOT b THEN
    RETURN True
IF a AND b AND c THEN
    RETURN True
RETURN False

… can be combined with another Boolean logic operator as:

IF (a AND NOT b) OR (a AND b AND c) THEN
    RETURN True
RETURN False

… which allows us to extract the common operation:

IF a AND ((NOT b) OR (b AND c)) THEN
    RETURN True
RETURN False

… but there’s no reason to perform the same Boolean test twice, so:

IF a AND (NOT b OR c) THEN
    RETURN True
RETURN False

… and since we’re returning expressions brings us right back to:

RETURN a AND (NOT b OR c)

All Roads Lead Here

And here lies the ultimate, optimal form. In a language that has short-circuiting Boolean operations this is also the most performant form, because neither side of the or will be evaluated for the vast majority of cases. Thankfully most languages either do short-circuit Boolean operations, or they tend to provide an alternate short-circuiting operator like and_also and or_else, so if you’re not sure yours does you’ll need to do a bit of research.

Speaking of research, you might be tempted to remove those parentheses, which certainly does look nicer. But therein lies the final lesson is_leap can teach you; your ability to elide those parentheses depends entirely on the evaluation strategy and operator precedence rules for your language. If your language evaluates left to right and binds or at the same or lower precedence as and, then no, you cannot remove those significant parentheses. At least not without paying the price of checking division by 400 for a year that doesn’t even divide by 4.

Which, like leap years, would be stupid.

Those who look up at the night sky and find it full of stuff. ↩︎
Those who look up at the night sky and find it full of nonsense. ↩︎
They’ve never actually occupied the same absolute positions twice, of course, and never will, but neither angular momentum or orbital mechanics were yet available to our prototypical yutz. ↩︎
Actually something closer to an ellipse being frantically sketched over and over on an imaginary two dimensional plane being pulled away from the artist in four dimensional spacetime. ↩︎
Caesar’s assassination, which famously happened on March 15th, 44 BCE, rather threw off the adoption of leap years. Things got a bit wonky until 8 CE and, since no one in this period could be relied on to write anything down consistently, we know exactly when he died, just not when in our calendar. ↩︎
Gregory XIII may have been liturgically infallible, but the fact that we’ve since had to introduce leap seconds tends to imply he wasn’t mathematically infallible as well. ↩︎
Unless you use the Hindu, Islamic, or Buddhist calendars, are Thai or Chinese, subscribe to the Juche ideology of North Korea, or measure everything in Unix time. ↩︎
Specifically, for the truly pedantic, the proleptic Gregorian calendar with astronomical year numbering and a year zero. Which is fine for the purposes of an exercise. To accurately reflect leap years actually observed by humans you’d need at minimum a quite convoluted lookup table before the 9th century CE, and probably a time machine. ↩︎
Your mileage may vary; in Perl it may even be less, but no one will know because of the braces. If you’re writing in 8086 Assembly Language then, honestly, why are you even reading this? ↩︎