Json Jerk: a flexible JSON parser

9 12 2011

I just pushed Json Jerk to Github. Json Jerk is a flexible and fast JSON parser written in Java. It consists of several composable parts for tokenizing, (un)escaping, parsing, and handling semantic actions. Furthermore, it provides a lightweight and type-safe object model for JSON documents.

Details and examples are in the readme and in the test cases.

Update: I renamed the parser from Flex Json to Json Jerk due to a name clash with an existing project.





Union types

12 06 2011

In his recent blog post Miles Sabin came up with an ingenious way of expressing union types in Scala. A union type is the union of some types: its values are the union of the values of each of the individual types.

In a nutshell, he first defines the negation of a type as

type ¬[A] = A => Nothing

and then the union of two types via De Morgan’s law

type ∨[T, U] = ¬[¬[T] with ¬[U]]

With the following auxiliary constructs

type ¬¬[A] = ¬[¬[A]]
type |∨|[T, U] = { type λ[X] = ¬¬[X] <:< (T ∨ U) }
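
As a quick sanity check, the compiler can be asked for the evidence directly (this uses nothing beyond the definitions above):

implicitly[¬¬[Int] <:< (Int ∨ String)]       // compiles
implicitly[¬¬[String] <:< (Int ∨ String)]    // compiles
// implicitly[¬¬[Double] <:< (Int ∨ String)] // does not compile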

union types can be used in a very intuitive way

def size[T: (Int |∨| String)#λ](t: T) = t match {
    case i: Int => i
    case s: String => s.length
}

scala> size(3)
res0: Int = 3

scala> size("three")
res1: Int = 5

scala> size(4.2)
:13: error: Cannot prove that ((Double) => Nothing) => Nothing <:< ((Int) => Nothing with (java.lang.String) => Nothing) => Nothing.
       size(4.2)
           ^

… and beyond

With type negation and disjunction from above, it becomes possible to express all types whose set of values can be expressed by a term in propositional calculus. But can we do better? That is, is it possible to express types which don’t have a corresponding term in propositional calculus?

Generalizing the type constructor ∨[T, U] to some arbitrary acceptor

type Acceptor[T, U] = { type λ[X] = ... }

it becomes apparent that all types for which there is a corresponding type level acceptor function are expressible. Since type level calculations in Scala are Turing complete, it should be possible to find an acceptor for any recursive function. This means that, in theory at least, Scala's type system is powerful enough to express any type whose set of values is recursive.
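
As a small, concrete illustration of the acceptor pattern (a sketch of mine, assuming only the definitions of ¬ and ¬¬ from above; the name Union3 is made up), here is a direct three-way disjunction which does not go through nested binary ∨:

type Union3[T, U, V] = { type λ[X] = ¬¬[X] <:< ¬[¬[T] with ¬[U] with ¬[V]] }

def size3[T: Union3[Int, String, Boolean]#λ](t: T) = t match {
  case i: Int => i
  case s: String => s.length
  case b: Boolean => if (b) 1 else 0
}

Nothing forces the body of λ to be built from propositional connectives, though: any type level function of X will do, which is what the argument above relies on.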





Regular expression matching in <100 lines of code

6 12 2010

The recent discussion about the Yacc is dead paper on Lambda the Ultimate sparked my interest in regular expression derivatives. The original idea goes back to the paper Derivatives of Regular Expressions published in 1964 (!) by Janusz A. Brzozowski. For a more modern treatment of the topic see Regular-expression derivatives reexamined.

The derivative of a set of strings with respect to a character is the set of strings which results from removing the first character from all the strings in the set which start with that character. For example, let S = \{foo, bar, baz\}; then \partial_b S = \{ar, az\}. It turns out that regular languages are closed under derivatives. That is, any derivative of a regular language is again a regular language. Furthermore, it is possible to extend the notion of derivatives to regular expressions: given a regular expression r which generates the language \mathcal{L}(r) and a character c, one can derive a regular expression \partial_c r such that \mathcal{L}(\partial_c r) = \partial_c(\mathcal{L}(r)).

This is a key ingredient for a very elegant regular expression matching algorithm: to match a string against a regular expression, repeatedly calculate the derivative of the regular expression for each character in the string. When no character is left, check whether the last derivative accepts the empty string. If it does, we have a match; otherwise we don't. For example, to match the string ab against the expression ab: \partial_a(ab) = b, \partial_b(b) = \epsilon, and \epsilon accepts the empty string, so the match succeeds.

The exact algorithm for finding whether a regular expression is nullable (i.e. accepts the empty string) is given in Regular-expression derivatives reexamined as is the algorithm for calculating derivatives of regular expressions. Below is a direct implementation of that algorithm in Scala (with a slight modification to allow for strings instead of individual characters).

trait RegExp {
  def nullable: Boolean
  def derive(c: Char): RegExp
}

case object Empty extends RegExp {
  def nullable = false
  def derive(c: Char) = Empty
}

case object Eps extends RegExp {
  def nullable = true
  def derive(c: Char) = Empty
}

case class Str(s: String) extends RegExp {
  def nullable = s.isEmpty
  def derive(c: Char) =
    if (s.isEmpty || s.head != c) Empty
    else Str(s.tail)
}

case class Cat(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) =
    if (r.nullable) Or(Cat(r.derive(c), s), s.derive(c)) 
    else Cat(r.derive(c), s)
}

case class Star(r: RegExp) extends RegExp {
  def nullable = true
  def derive(c: Char) = Cat(r.derive(c), this)
}

case class Or(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable || s.nullable
  def derive(c: Char) = Or(r.derive(c), s.derive(c))
}

case class And(r: RegExp, s: RegExp) extends RegExp {
  def nullable = r.nullable && s.nullable
  def derive(c: Char) = And(r.derive(c), s.derive(c))
}

case class Not(r: RegExp) extends RegExp {
  def nullable = !r.nullable
  def derive(c: Char) = Not(r.derive(c))
}

Having these constructors we need a way to match strings against regular expressions.

object Matcher {
  def matches(r: RegExp, s: String): Boolean = {
    if (s.isEmpty) r.nullable
    else matches(r.derive(s.head), s.tail)
  }
}
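
As a quick sanity check of the core algorithm before any syntactic sugar is added (this snippet is not counted in the line tally at the end):

// (a|b)* matches "abba" and the empty string, but not "abc"
val ab = Star(Or(Str("a"), Str("b")))

assert(Matcher.matches(ab, "abba"))
assert(Matcher.matches(ab, ""))
assert(!Matcher.matches(ab, "abc"))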

Here are some pimps to make usage of the regular expression constructors more convenient.

object Pimps {
  implicit def string2RegExp(s: String) = Str(s)

  implicit def regExpOps(r: RegExp) = new {
    def | (s: RegExp) = Or(r, s)
    def & (s: RegExp) = And(r, s)
    def % = Star(r)
    def %(n: Int) = rep(r, n)
    def ? = Or(Eps, r)
    def ! = Not(r)
    def ++ (s: RegExp) = Cat(r, s)
    def ~ (s: String) = Matcher.matches(r, s)
  }

  implicit def stringOps(s: String) = new {
    def | (r: RegExp) = Or(s, r)
    def | (r: String) = Or(s, r)
    def & (r: RegExp) = And(s, r)
    def & (r: String) = And(s, r)
    def % = Star(s)
    def % (n: Int) = rep(Str(s), n)
    def ? = Or(Eps, s)
    def ! = Not(s)
    def ++ (r: RegExp) = Cat(s, r)
    def ++ (r: String) = Cat(s, r)
    def ~ (t: String) = Matcher.matches(s, t)
  }

  def rep(r: RegExp, n: Int): RegExp =
    if (n <= 0) Star(r)
    else Cat(r, rep(r, n - 1))
}
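
Note that r.%(n) expands to n copies of r followed by r.%, i.e. "at least n repetitions"; in particular, digit.%(1) in the example below reads as "one or more digits".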

And finally here is how to use it:

object Test {
  import Pimps._

  val digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
  val int = ("+" | "-").? ++ digit.%(1)
  val real = ("+" | "-").? ++ digit.%(1) ++ ("." ++ digit.%(1)).? ++ (("e" | "E") ++ ("+" | "-").? ++ digit.%(1)).?

  def main(args: Array[String]) {
    val ints = List("0", "-4534", "+049", "99")
    val reals = List("0.9", "-12.8", "+91.0", "9e12", "+9.21E-12", "-512E+01")
    val errs = List("", "-", "+", "+-1", "-+2", "2-")

    ints.foreach(s => assert(int ~ s))
    reals.foreach(s => assert(!(int ~ s)))
    errs.foreach(s => assert(!(int ~ s)))

    ints.foreach(s => assert(real ~ s))
    reals.foreach(s => assert(real ~ s))
    errs.foreach(s => assert(!(real ~ s)))
  }
}

Now that’s 48 + 6 + 32 = 86 lines of code for a regular expression matching library!





Generic array factory in Java: recipe for disaster

4 11 2010

Let’s implement a generic factory method for arrays in Java like this:

static <T> T[] createArray(T... t) {
    return t;
}

We can use this method to create any array. For example an array of strings:

String[] strings = createArray("some", "thing");

Now let’s add another twist:

static <T> T[] crash(T t1, T t2) {
    return createArray(t1, t2);
}

String[] outch = crash("crash", "me"); 

Running this code will result in a ClassCastException on the last line:

Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;

At first this seems strange: there is no cast anywhere in this code. So what is going on? Basically, the Java compiler is lying to us: when we call the crash method with string arguments, it tells us that we get back an array of strings. Looking at the exception, we see that this is not true: what we really get back is an array of objects!

Actually the Java compiler issues a warning on the createArray call in the crash method:

Type safety : A generic array of T is created for a varargs parameter

This is how it admits the lie: “Since I don’t know the actual type of T, I’ll just return an array of Object instead.” I think this is wrong. And others seem to think along the same lines.





So Scala is too complex?

24 08 2010

There is currently a lot of talk about Scala being too complex. Instead of arguing further, I implemented the same bit of functionality in Scala and in Java and let everyone decide for themselves.

There is some nice example code in the manual for the Scala 2.8 Collections API which partitions a list of persons into two lists of minors and majors. Below are the fleshed-out implementations in Scala and Java.

First Scala:

object ScalaMain {
  case class Person(name: String, age: Int)
    
  val persons = List(
    Person("Boris", 40),
    Person("Betty", 32),
    Person("Bambi", 17))

  val (minors, majors) = persons.partition(_.age <= 18) 
   
  def main(args: Array[String]) = {
    println (minors.mkString(", "))
    println (majors.mkString(", "))
  }
}

And now Java:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class Person {
    private final String name;
    private final int age;

    public Person(String name, int age) {
        super();
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        else if (other instanceof Person) {
            Person p = (Person) other;
            return (name == null ? p.name == null : name.equals(p.name))
                    && age == p.age;

        }
        else {
            return false;
        }
    }

    @Override
    public int hashCode() {
        int h = name == null ? 0 : name.hashCode();
        return 39*h + age;
    }

    @Override
    public String toString() {
        return new StringBuilder("Person(")
            .append(name).append(",")
            .append(age).append(")").toString();
    }
}

public class JavaMain {

    private final static List<Person> persons = Arrays.asList(
        new Person("Boris", 40),
        new Person("Betty", 32),
        new Person("Bamby", 17));

    private static List<Person> minors = new ArrayList<Person>();
    private static List<Person> majors = new ArrayList<Person>();

    public static void main(String[] args) {
        partition(persons, minors, majors);
        System.out.println(mkString(minors, ","));
        System.out.println(mkString(majors, ","));
    }

    private static void partition(List<? extends Person> persons,
            List<? super Person> minors, List<? super Person> majors) {

        for (Person p : persons) {
            if (p.getAge() <= 18) minors.add(p);
            else majors.add(p);
        }
    }

    private static <T> String mkString(List<T> list, String separator) {
        StringBuilder s = new StringBuilder();
        Iterator<T> it = list.iterator();
        if (it.hasNext()) {
            s.append(it.next());
        }
        while (it.hasNext()) {
            s.append(separator).append(it.next());
        }
        return s.toString();
    }

}

Impressive huh? And the Java version is not even entirely correct since its equals() method might not cope correctly with super classes of Person.





Type Level Programming: Equality

18 06 2010

Apocalisp has a great series on Type Level Programming with Scala. At some point the question came up whether it is possible to determine equality of types at run time by having the compiler generate types representing true and false respectively. Here is what I came up with.

trait True { type t = True }
trait False { type t = False }

case class Equality[A] {
  def check(x: A)(implicit t: True) = t
  def check[B](x: B)(implicit f: False) = f
}
object Equality {
  def witness[T] = null.asInstanceOf[T]
  implicit val t: True = null
  implicit val f: False = null
}

// Usage:
import Equality._
    
val test1 = Equality[List[Boolean]] check witness[List[Boolean]]
implicitly[test1.t =:= True]
// Does not compile since test1.t is True
// implicitly[test1.t =:= False]  

val test2 = Equality[Nothing] check witness[AnyRef]
// Does not compile since test2.t is False
// implicitly[test2.t =:= True]  
implicitly[test2.t =:= False]
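
One more check in the same vein, using the definitions above: the trick also tells apart types which erase to the same runtime class.

val test3 = Equality[List[Int]] check witness[List[Boolean]]
implicitly[test3.t =:= False]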

Admittedly this is very hacky. For the time being I don’t see how to further clean this up. Anyone?





Working around type erasure ambiguities (Scala)

14 06 2010

In my previous post I showed a workaround for the type erasure ambiguity problem in Java. The solution uses vararg parameters for disambiguation. As Paul Phillips points out in his comment, this solution doesn't directly port over to Scala: Java uses Array to pass varargs, while Scala uses Seq. Unlike Array, Seq is not reified, so Seq[String] and Seq[Int] again erase to the same type, putting us back to square one.

However, there is another way to add disambiguation parameters to the methods: implicits! Here is how:

implicit val x: Int = 0
def foo(a: List[Int])(implicit ignore: Int) { }
  
implicit val y = ""
def foo(a: List[String])(implicit ignore: String) { }

foo(1::2::Nil)
foo("a"::"b"::Nil)




Working around type erasure ambiguities

30 05 2010

In an earlier post I already showed how to work around ambiguous method overloads resulting from type erasure. In a nutshell, the following code won't compile since both overloads of foo erase to the same type.

Scala:

def foo(ints: List[Int]) {}
def foo(strings: List[String]) {}

Java:

void foo(List<Integer> ints) {}
void foo(List<String> strings) {}

It turns out that there is a simple though somewhat hacky way to work around this limitation: in order to make the ambiguity go away, we need to change the signatures of foo in such a way that 1) the erasures of the foo methods are different and 2) the call site is not affected.

Here is a solution for Java:

void foo(List<Integer> ints, Integer... ignore) {}
void foo(List<String> strings, String... ignore) {}

We can now call foo passing either a list of ints or a list of strings without ambiguity:

foo(new ArrayList<Integer>());
foo(new ArrayList<String>());
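
This works because arrays, unlike generics, are reified: the two overloads now erase to the distinct signatures foo(List, Integer[]) and foo(List, String[]), while the compiler silently passes an empty array for the omitted varargs argument, so the call sites remain unchanged.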

This doesn’t directly port over to Scala (why?). However, there is a similar hack for Scala. I leave this as a puzzle for a couple of days before I post my solution.





[ANN] Talking at Scala Days 2010 in Lausanne next Thursday

11 04 2010

I’ll be talking at Scala Days 2010 in Lausanne on April 15th about the Scala scripting engine for Apache Sling. While my talk at Jazoon 09 was mainly about using Scala from Sling, this session will be more focused on internals of the Scala scripting engine.

Unfortunately (or fortunately depending on the point of view) the conference is sold out already. Watch my Scala for scripting page for the session slides and other upcoming support material.





Scala type level encoding of the SKI calculus

29 01 2010

In one of my posts on type level meta programming in Scala, the question of Turing completeness already came up: can Scala's type system be used to force the Scala compiler to carry out any calculation a Turing machine is capable of? Several of my older posts show how Scala's type system can be used to encode addition and multiplication on natural numbers and how to encode conditions and bounded loops.

Motivated by the blog post More Scala Typehackery, which shows how to encode in Scala's type system a version of the Lambda calculus limited to abstraction over a single variable, I set out to explore the topic further.

The SKI combinator calculus


Looking for a calculus which is relatively small, easily encoded in Scala's type system, and known to be Turing complete, I came across the SKI combinator calculus. The SKI combinators are defined as follows:

Ix \rightarrow x,
Kxy \rightarrow x,
Sxyz \rightarrow xz(yz).

They can be used to encode arbitrary calculations, for example reversal of arguments. Let R \equiv S(K(SI))K. Then

R x y \equiv
S(K(SI))K x y \rightarrow
K(SI)x(Kx)y \rightarrow
SI(Kx)y \rightarrow
Iy(Kxy) \rightarrow
Iyx \rightarrow yx.

Self application is used to find fixed points. Let \beta \equiv S(K\alpha)(SII) for some combinator \alpha. Then \beta\beta \rightarrow \alpha(\beta\beta). That is, \beta\beta is a fixed point of \alpha. This can be used to achieve recursion. Let R be the reversal combinator from above. Further define

A_0 x \equiv c for some combinator c, and
A_n x \equiv x A_{n-1}.

That is, A_n is the combinator which applies its argument to the combinator A_{n-1}. (There is a bit of cheating here: I should actually show that such combinators exist. However, since the SKI calculus is Turing complete, I take this for granted.) Now let \alpha be R in \beta from above (that is, we now have \beta \equiv S(KR)(SII)). Then

\beta\beta A_0 \rightarrow c

and by induction

\beta\beta A_n \rightarrow \beta\beta A_{n-1} \rightarrow c.
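
Spelling out the first induction step, using A_1 x \equiv x A_0 and Rxy \rightarrow yx:

\beta\beta A_1 \rightarrow R(\beta\beta)A_1 \rightarrow A_1(\beta\beta) \rightarrow \beta\beta A_0 \rightarrow R(\beta\beta)A_0 \rightarrow A_0(\beta\beta) \rightarrow c.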

Type level SKI in Scala


Encoding the SKI combinator calculus in Scala's type system seems not too difficult at first. It turns out, however, that some care has to be taken regarding the order of evaluation: to guarantee that the normal form is actually found for every term which has one, a lazy evaluation order has to be employed.

Here is a Scala type level encoding of the SKI calculus:

trait Term {
  type ap[x <: Term] <: Term
  type eval <: Term
}
  
// The S combinator
trait S extends Term {
  type ap[x <: Term] = S1[x] 
  type eval = S
}
trait S1[x <: Term] extends Term {
  type ap[y <: Term] = S2[x, y]
  type eval = S1[x]
}
trait S2[x <: Term, y <: Term] extends Term {
  type ap[z <: Term] = S3[x, y, z]
  type eval = S2[x, y]
}
trait S3[x <: Term, y <: Term, z <: Term] extends Term {
  type ap[v <: Term] = eval#ap[v]
  type eval = x#ap[z]#ap[y#ap[z]]#eval
}

// The K combinator
trait K extends Term {
  type ap[x <: Term] = K1[x]
  type eval = K
}
trait K1[x <: Term] extends Term {
  type ap[y <: Term] = K2[x, y]
  type eval = K1[x]
}
trait K2[x <: Term, y <: Term] extends Term {
  type ap[z <: Term] = eval#ap[z]
  type eval = x#eval
}
  
// The I combinator
trait I extends Term {
  type ap[x <: Term] = I1[x]
  type eval = I
}
trait I1[x <: Term] extends Term {
  type ap[y <: Term] = eval#ap[y]
  type eval = x#eval
}
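
Note where the lazy evaluation order shows up in this encoding: a partially applied combinator (S1, S2, K1) simply evaluates to itself, and even the fully applied forms (I1, K2, S3) never evaluate their arguments at application time; x, y and z only get evaluated as part of the reduct, for example in the eval members of K2 and S3.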

Further, let's define some constants to act upon. These will be used to test whether the calculus actually works.

trait c extends Term {
  type ap[x <: Term] = c
  type eval = c
}
trait d extends Term {
  type ap[x <: Term] = d
  type eval = d
}
trait e extends Term {
  type ap[x <: Term] = e
  type eval = e
}

Finally, the following definition of Equals lets us check types for equality:

case class Equals[A >: B <: B, B]()

Equals[Int, Int]     // compiles fine
Equals[String, Int] // won't compile

Now let's see whether we can evaluate some combinators.

  // Ic -> c
  Equals[I#ap[c]#eval, c]
  
  // Kcd -> c
  Equals[K#ap[c]#ap[d]#eval, c]

  // KKcde -> d
  Equals[K#ap[K]#ap[c]#ap[d]#ap[e]#eval, d]
  
  // SIIIc -> Ic
  Equals[S#ap[I]#ap[I]#ap[I]#ap[c]#eval, c]

  // SKKc -> Ic
  Equals[S#ap[K]#ap[K]#ap[c]#eval, c]

  // SIIKc -> KKc
  Equals[S#ap[I]#ap[I]#ap[K]#ap[c]#eval, K#ap[K]#ap[c]#eval]

  // SIKKc -> K(KK)c
  Equals[S#ap[I]#ap[K]#ap[K]#ap[c]#eval, K#ap[K#ap[K]]#ap[c]#eval]

  // SIKIc -> KIc
  Equals[S#ap[I]#ap[K]#ap[I]#ap[c]#eval, K#ap[I]#ap[c]#eval]

  // SKIc -> Ic
  Equals[S#ap[K]#ap[I]#ap[c]#eval, c]
  
  // R = S(K(SI))K  (reverse)
  type R = S#ap[K#ap[S#ap[I]]]#ap[K]
  Equals[R#ap[c]#ap[d]#eval, d#ap[c]#eval]

Next, let's check whether we can do recursion using the fixed point operator from above. First, let's define \beta.

  // b(a) = S(Ka)(SII)
  type b[a <: Term] = S#ap[K#ap[a]]#ap[S#ap[I]#ap[I]]

Further, let's define some of the A_n combinators from above.

trait A0 extends Term {
  type ap[x <: Term] = c
  type eval = A0
}
trait A1 extends Term {
  type ap[x <: Term] = x#ap[A0]#eval
  type eval = A1
}
trait A2 extends Term {
  type ap[x <: Term] = x#ap[A1]#eval
  type eval = A2
}

Now we can do iteration on the type level using a fixed point combinator:

  // Single iteration
  type NN1 = b[R]#ap[b[R]]#ap[A0]
  Equals[NN1#eval, c]

  // Double iteration
  type NN2 = b[R]#ap[b[R]]#ap[A1]
  Equals[NN2#eval, c]

  // Triple iteration
  type NN3 = b[R]#ap[b[R]]#ap[A2]
  Equals[NN3#eval, c]

Finally, let's check whether we can do ‘unbounded’ iteration.

trait An extends Term {
  type ap[x <: Term] = x#ap[An]#eval
  type eval = An
}
  // Infinite iteration: Smashes scalac's stack
  type NNn = b[R]#ap[b[R]]#ap[An]
  Equals[NNn#eval, c]

Well, we can 😉

$ scalac SKI.scala
Exception in thread "main" java.lang.StackOverflowError
        at scala.tools.nsc.symtab.Types$SubstMap.apply(Types.scala:3165)
        at scala.tools.nsc.symtab.Types$SubstMap.apply(Types.scala:3136)
        at scala.tools.nsc.symtab.Types$TypeMap.mapOver(Types.scala:2735)