Bold dream
Imagination is limitless. So is stupidity.

Sending null to /dev/null

June 12th 2009 in Programming

In a recent talk at QCon titled “Null References: The Billion Dollar Mistake” [1], Sir Charles Antony Richard Hoare himself, the inventor of Null (and QuickSort, and many other things that shaped our industry), states that Null was and is a bad idea.

What is Null?

Here is an explanation from Wikipedia:

Null is a special pointer value (or other kind of object reference) used to signify that a pointer intentionally does not point to (or refer to) an object.

From an object-oriented language point of view, Null is a value whose type is a subclass of every other type in the system. It’s always at the bottom of the hierarchy. Because of this and the Liskov substitution principle, Null can be used everywhere a value of another type would normally be used. Have a method that is marked to return a string? It can return Null.
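A minimal Java sketch of that point (the class and method names are made up for illustration): null is accepted wherever a reference type is declared, including as the result of a method that promises a String.

```java
import java.util.List;

class NullEverywhere {
    // Declared to return a String, yet the compiler accepts null here.
    static String name() {
        return null;
    }

    public static void main(String[] args) {
        // null is assignable to every reference type, whatever the declared type is.
        String text = null;
        Integer number = null;
        List<String> items = null;
        System.out.println(text + " " + number + " " + items + " " + name());
    }
}
```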

What is the problem with Null

The problem with it is that every method can return Null. And you, as a consumer of that method, need to check every time if the return value is Null. If you don’t – you (usually) get a Null pointer exception and your program crashes.

Now, exceptions are a very good tool, but the Null pointer exception in particular is a bad thing. Why? Because if it bubbles more than one level up, it’s a sign of a leaky abstraction. Let me back up this claim:

First, what does a Null pointer exception mean? Let’s have a look at the documentation of it in Java:

Thrown when an application attempts to use null in a case where an object is required. These include:

  • Calling the instance method of a null object.
  • Accessing or modifying the field of a null object.
  • Taking the length of null as if it were an array.
  • Accessing or modifying the slots of null as if it were an array.
  • Throwing null as if it were a Throwable value.
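The cases in this list can be reproduced with a short Java sketch. The Account class and the throwsNpe helper are hypothetical names introduced only for this illustration.

```java
class NpeCases {
    // Hypothetical domain class, used only for this illustration.
    static class Account {
        int balance;

        int getBalance() {
            return balance;
        }
    }

    // Runs the action and reports whether it ended in a NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Account account = null;
        int[] slots = null;
        RuntimeException nothing = null;

        System.out.println(throwsNpe(() -> account.getBalance()));      // calling an instance method
        System.out.println(throwsNpe(() -> account.balance = 10));      // modifying a field
        System.out.println(throwsNpe(() -> { int n = slots.length; })); // taking the length
        System.out.println(throwsNpe(() -> slots[0] = 1));              // modifying a slot
        System.out.println(throwsNpe(() -> { throw nothing; }));        // throwing null
    }
}
```

Each of the five calls reports a NullPointerException, and none of them is rejected by the compiler.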

Basically it says that an object was expected, but none was provided. Well, this is an implementation detail. I doubt that your business logic talks about references and such. We usually talk about real-world objects like people, accounts, etc. Any method that throws a Null pointer exception unveils parts of its implementation.

How many times have you forgotten to fill in a field in some form and gotten a Null pointer exception?
I forgot to write my email – what does that Null thing have to do with it?!?

A method should never throw Null pointer exceptions (I’ve seen people doing it) – an Illegal argument exception is a much better one. Most of the time the underlying platform is the one throwing Null pointer exceptions.
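As a rough sketch of that convention in Java: the register method, its message, and the SignupForm class are assumptions made up for this example, not part of any real API.

```java
class SignupForm {
    // Hypothetical form handler; the name and the message are assumptions.
    static String register(String email) {
        if (email == null || email.trim().isEmpty()) {
            // Report the problem in the caller's terms instead of leaking a null reference.
            throw new IllegalArgumentException("Please enter your email address");
        }
        return email.trim();
    }

    public static void main(String[] args) {
        try {
            register(null);
        } catch (IllegalArgumentException e) {
            // The message talks about the email, not about pointers.
            System.out.println(e.getMessage());
        }
    }
}
```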

It’s just a convention, but Null pointer exceptions should be extinguished when sighted.

So…

How can we handle Null

Take 1

The most straightforward and widespread solution is to tell everyone to check the values for Null references. But that is a blacklist approach and, for this kind of problem, it doesn’t work. People forget stuff, and sooner or later someone will forget to do it and your 24/7 program will crash at 3 AM on Sunday. Unit testing can help a lot here, but it requires work and you still cannot be 100% sure.

Can we do better?

Take 2

I would like to point out that the following approach makes more sense in a statically typed language, where you have a compiler checking your types for validity.

Most of the time we return Null from a method when there is nothing to return: some return value is expected, so we throw Null at it.

Let’s have a look at the following Java code:

public String tld(String domain) {
    int dot = domain.lastIndexOf('.');
    // No dot at all, or a trailing dot with nothing after it
    if (dot == -1 || dot == domain.length() - 1) {
        return null;
    } else {
        return domain.substring(dot + 1);
    }
}

The method above returns the top-level domain part, or null if none is found.
The “right” way to call this method looks like this:

String d = tld("example.org");
if (d != null) {
    // Save to database
} else {
    // Show the user a message
}

But, as I pointed out earlier, there will be a time when you forget to make the check and the user will see a nasty exception, leaving them clueless about what they did wrong.
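To make that failure concrete, here is a minimal Java sketch. It repeats the null-returning tld so that it stands on its own; the ForgottenCheck class name is made up, and the try/catch exists only so the demo prints a message instead of dying.

```java
class ForgottenCheck {
    // Same idea as the tld method above: null means "no top-level domain".
    static String tld(String domain) {
        int dot = domain.lastIndexOf('.');
        if (dot == -1 || dot == domain.length() - 1) {
            return null;
        }
        return domain.substring(dot + 1);
    }

    public static void main(String[] args) {
        try {
            // Compiles without complaint: nothing in the signature hints at null.
            String upper = tld("localhost").toUpperCase();
            System.out.println(upper);
        } catch (NullPointerException e) {
            System.out.println("Crashed: the null check was forgotten");
        }
    }
}
```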

What we can do instead is signify our intentions that we might not return a value.
In Scala, we have the Option class, heavily inspired by Haskell’s Maybe monad.
The Option class has two (final) subclasses [2]: Some and None. You return Some when you have something to return and None when you don’t.
Let’s look at the tld method implemented in Scala using the Option class:

def tld(domain: String): Option[String] = {
  val dot = domain lastIndexOf '.'
  if (dot == -1 || dot == domain.length - 1) None
  else Some(domain.substring(dot + 1))
}

And the usage of it:

val com: String = tld("example.com").getOrElse("unknown")

The line above will return either the tld part of the domain or "unknown" if none is found. Concise, isn’t it?
If we omit the getOrElse part, the compiler will complain that we are giving it an Option[String] where a String is expected.

// Does not compile!
val com: String = tld("example.com")

That way we are able to catch the error early (say at 17:00 on Friday, which is much better than 3:00 in the morning on Sunday).

What have we done? We told the type system that we might return a value, but not necessarily. Specifying the return type of tld to be Option[String] instead of just String does just that.

So… Can we do that in Java?

Let’s give it a try.
We’ll have a JOption interface with JSome and JNone implementing classes:

public interface JOption<T> {
    public boolean isEmpty();
    public T getOrElse(T other);
}

The JNone class will mark the lack of value.

public class JNone<T> implements JOption<T> {
    public JNone() {}
 
    public boolean isEmpty() {
        return true;
    }
 
    public T getOrElse(T other) {
        return other;
    }
}

And the JSome class will mark its presence.

public class JSome<T> implements JOption<T> {
    private T value;
 
    public JSome(T value) {
        this.value = value;
    }
 
    public boolean isEmpty() {
        return false;
    }
 
    public T getOrElse(T other) {
        return value;
    }
}

And now our revised tld method, using the new JOption machinery:

public JOption<String> tld(String domain) {
    int dot = domain.lastIndexOf('.');
    if (dot == -1 || dot == domain.length() - 1) {
        return new JNone<String>();
    } else {
        return new JSome<String>(domain.substring(dot + 1));
    }
}

And using it becomes:

String com = tld("example").getOrElse("unknown");

The tld method isn’t longer than our original Null-returning version, but now the compiler will help us, because if we write:

// Does not even compile
String com = tld("example");

we’ll get a compiler error.

Final thoughts

The proposed method probably isn’t the best and can be improved [3], but it illustrates the concept.
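One possible direction for those improvements, as a hedged Java sketch rather than a finished design: a get() method, an equals implementation, a final field, and a shared JNone instance. The instance() factory, the choice of NoSuchElementException, and the rejection of null in the JSome constructor are all assumptions of this sketch.

```java
import java.util.NoSuchElementException;

interface JOption<T> {
    boolean isEmpty();
    T getOrElse(T other);
    T get(); // assumed addition: throws when there is no value
}

final class JNone<T> implements JOption<T> {
    // One shared instance is enough, since JNone carries no state.
    private static final JNone<?> INSTANCE = new JNone<>();

    private JNone() {}

    @SuppressWarnings("unchecked")
    static <T> JNone<T> instance() {
        return (JNone<T>) INSTANCE;
    }

    public boolean isEmpty() { return true; }
    public T getOrElse(T other) { return other; }
    public T get() { throw new NoSuchElementException("JNone has no value"); }
}

final class JSome<T> implements JOption<T> {
    private final T value;

    JSome(T value) {
        // Assumption: wrapping null would defeat the purpose, so reject it early.
        if (value == null) {
            throw new IllegalArgumentException("JSome cannot wrap null");
        }
        this.value = value;
    }

    public boolean isEmpty() { return false; }
    public T getOrElse(T other) { return value; }
    public T get() { return value; }

    @Override
    public boolean equals(Object other) {
        // Two JSome values are equal when the wrapped values are equal.
        return other instanceof JSome && value.equals(((JSome<?>) other).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }
}

class JOptionDemo {
    static JOption<String> tld(String domain) {
        int dot = domain.lastIndexOf('.');
        if (dot == -1 || dot == domain.length() - 1) {
            return JNone.instance();
        }
        return new JSome<>(domain.substring(dot + 1));
    }

    public static void main(String[] args) {
        System.out.println(tld("example.org").get());
        System.out.println(tld("example").getOrElse("unknown"));
    }
}
```

Nothing here stops a method declared to return JOption from returning null itself, so the sketch narrows the problem rather than removing it.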

Also, while this solution might help, the Null value still exists and people will (ab)use it.
Scala has a Null value as well – for compatibility with Java.

On the bright side – Haskell, Erlang and other functional languages don’t have anything like Null in them and use some form of the described approach.

But on the dark side we have JavaScript: there you have undefined, null and NaN, which add even more coding horror. F# is in a similar state.

Please share your thoughts below.
__________________
[1] On 25 August 2009 InfoQ published the video.
[2] To be precise, None is an object (a singleton in Scala).
[3] We can also add other methods to JOption like get() that will throw an exception in the case of JNone and return the value for JSome, as well as a proper equals implementation.


18 comments to...
“Sending null to /dev/null”
S2

Thank you, nice post.


Max Bolingbroke

You can do better than this in C# because it has value types, which are guaranteed to be non-null. This can be used to add option types to the language without the danger of accidentally returning null instead of Nothing.

Unfortunately, this approach has two major problems: first, none of the existing libraries use this pattern, so you can only check a subset of the calls. Secondly, normal references can still be null, so you can get errors when building a Just value.

The right solution is to either add non-nullable types to the language (like Spec# has) or to just give up and use a well-thought-out language like Haskell instead (which is what I try and do these days :-)


miles

getOrElse() sounds threatening


Matías Giovannini

A couple of thoughts: since you’re modelling JOption on ML’s option, I’d rather have JNone be a static singleton (there can be only one), and JSome keep its value in a final field.

See http://alaska-kamtchatka.blogspot.com/2007/10/algebraic-data-types-and-java-generics.html for some design decisions I’d rather take.


Emil Ivanov

Matías: Those are very good proposals, and in fact, this is exactly how it’s done in Scala, but in Java this will require a lot of boilerplate (especially the singleton).


Stefan Sonnenberg

Oh my god.
This _is_ the reason why I went over to functional programming.
Either you have an empty list (nil) or a list of values, at least a list with one value.
You will never run into these types of problems, because a map or reduce or whatever on an empty list results in an empty list.

That in general is not a problem of nil, null, None or anything else: it is a problem of your mindset.

In python you could write:

tld = lambda x: x.split('.')[1:][-1:]

It will return a list with the tld if a '.' is present, otherwise an empty list, which is ok.

If _you_ accept that an empty list is ok.

If you insist on getting _something_ you can do that:

tld = lambda x: ([a for a in x.split('.')[1:] if a][-1:] + ['unknown'])[0]

But that is not needed. (The list comprehension is only needed because splitting a string that ends with the separator leaves a trailing empty string, so bool(['']) returns True, but I’d prefer bool([]) -> False.)

Example:

tld = lambda x: [a for a in x.split('.')[1:] if a][-1:]
com = tld('pythonmeister.com')
if com:
    print(com[0])  # com is a list with one entry!
else:
    print('unknown')  # list is empty

and you are sane.

The “getOrElse” thing really sucks.

You are citing an example in scala.
A programming language with functional possibilities.
But you use it as if it were Pascal.

Get a functional mindset, and such problems are gone.


Vadim

Take a look at functionaljava, which has an Option monad as well as some other functional goodies. Although, for the most part, in Java they’re waaay too verbose to use comfortably.


maht

Great, so now you have to check for your own made up “null” which has the value “unknown”.

Instead you can forget that and have it pollute the database.


player2

You should take a look at Objective-C, where messaging nil is perfectly acceptable. The rules for what you get back are well-defined: messages that return void obviously return void, messages that return objects return nil, and messages that return numbers return zero.


Bender

In new projects, I would rather catch the null(able) values by convention.

Methods that may return a null value are prefixed with “try”.

Variables (locals and class) are named with a trailing ‘_’ to indicate that they may be null. All other methods / variables never use null (library code needs to be interfaced appropriately).

I call this the “flatline” convention and it works quite well in my projects.

http://www.replicator.org/node/94


hxa7241

In Scala, this doesn’t entirely solve the problem: where you don’t use Option, you still need to check for null.

Fortunately Scala offers a supporting feature: mix in NotNull to a class, and then it is guaranteed by the type system not to have any null instances (like OCaml and Haskell).


Emil Ivanov

@maht: This is just an example and it’s a perfectly valid case if you are going to show that string to a user. Try seeing the bigger picture…

@hxa7241: I didn’t know about the NotNull trait. Thank you for pointing it out.


Joel

Wrong, a reduce on an empty list will result in an error. Try it. -Joel


Joel

The above comment was @ Stefan. Also, it’s not about having a functional or an imperative mindset, it’s about making sure that you catch errors early, at compile time. Even functional languages like Haskell have explicit typing that helps the compiler catch errors early. Having a notion of not-null can be helpful, more so than throwing a more meaningful exception.

A question comes to mind though,
What about method chaining in languages like Python or Ruby? Say you run a.b().c(), and it is ok for b() to return null in other contexts. But when you run that, you get an error saying method c() was not found on nil.

The getOrElse trick would look ugly: a.b().getOrElse(“something”).c().

Ruby has an ‘andand’ module, but that looks clumsy too, a.b.andand.c . no way

any ideas? -Joel


Joel

one last comment-
as an example of ruby on rails,
rails builds methods automatically based on database reflection (column name ‘age’ becomes an age method in say a Person class).
Now database values commonly are ‘nil’ or none,
and say I want to run:
p = Person.new
p.age.is_teenager?
now if age is nil, we get a null pointer exception (more precisely is_teenager method not found in null object)

Is there any way to avoid this in a dynamically typed language? Because the above looks much better than, say, p.age.orElse(“undefined”).

Probably if Ruby was statically typed, and the function made it explicit that it was returning nil, then the compiler could help. Am I correct? But is there a more elegant way (than checking for null everywhere) to catch such problems early in dynamic languages?


Emil Ivanov

In a dynamically typed language where you don’t have a compilation phase (at least not the classical one) you’re out of luck. Write good tests that will catch those kinds of problems. If you want to be chaining methods, they should not be returning null – throwing an exception is still a reasonable choice. Otherwise, with the Option thing, I can’t come up with something better. (I actually think that this a.b.getOrElse(“…”).c is not that bad.)


Joel

Hi Emil, actually JIT compilation or byte code compilation happens for most dynamic languages for obvious reasons; the type inference part is done at run time, right? Also, in dynamically typed languages like Erlang and Lisp, the compilation phase is pretty useful in catching errors early. I write a lot of Common Lisp and I’ve found SBCL’s compiler warnings to be very useful. I usually make sure that my module compiles without any warnings (though I recently ran into a problem where reduce didn’t like the nil argument I was giving it and I had to put in a check for nil, even though the program did compile without any warnings).

I totally agree with you that writing good test cases is a solution. Probably this also shows that the code and time saved by writing in a dynamic language mean more code and time spent writing unit tests.


Joel

am sorry, jit compilation happens just in time. ;-)



