lablog

Haskell data types and XML

by Sebas


Here at typLAB it wasn’t evident from the beginning what the best choice for a storage back-end would be. We knew that we were about to build a web-based editor and would be dealing with a lot of HTML5 documents with lots of metadata. After some careful consideration we decided to go for an XML database; more specifically, the Berkeley XML Database, lovingly called DBXML by its authors.

We figured that using DBXML would give us some important advantages:

Once we decided to go for a DBXML back-end, we had to figure out how to easily get values from our Haskell program into and out of the database. The rest of this post deals with the last point of our enumeration: how to get a nice mapping from Haskell’s algebraic datatypes to our DBXML back-end.

XML queries

The DBXML binding for Haskell is a shallow wrapper around the existing C++ API. This library allows us to perform the common create, read, update and delete queries on entire XML documents or parts of them. Communication with the XML database happens mainly via XQuery. Queries and query parameters are passed into the API, and results come out of it, as Haskell ByteStrings. It is up to the programmer to set up the queries with the right XML structure and encoding. Take for example the (somewhat simplified) type signature of the query function:

query :: Collection -> Query -> Parameters -> IO [ByteString]

This function takes an identifier for the XML collection (somewhat like a database handle), an XQuery expression and a set of query parameters, and returns a possibly empty list of XML snippets as ByteStrings. Too bad that all our domain objects are well-typed Haskell algebraic datatypes and not raw sequences of bytes. We need a simple XML (de)serialization tool for this.

XML picklers

The Haskell XML Toolbox (HXT) is a library containing an extensive collection of XML processing tools. The library has support for XML parsing, pretty printing, XPath queries, XSL stylesheets, DTD, XSD and RelaxNG schemas and a lot more. Interestingly, HXT exposes a type class and an accompanying set of combinators called XML picklers. XML picklers can be used to build conversion functions from Haskell datatypes to XML and vice versa. The type class looks like this:

  class XmlPickler a where
    xpickle :: PU a

So every instance of the XmlPickler type class provides a PU for its type. The PU datatype is composed of a pair of pickle (serialize) and unpickle (deserialize) functions together with a schema description. Because we won’t be using the schema definitions, we will ignore them for now. There is probably no need to ever touch the functions inside the PU type directly, because the library supplies a large collection of basic pickler combinators to be used instead.
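To make the pair-of-functions idea concrete, here is a toy model of a pickler in plain Haskell. It is a simplified stand-in, not HXT’s actual definition: the Xml tree type, the PU record and the combinators below are our own illustration (the names xpText, xpElem, xpPair and xpWrap are borrowed from HXT), and the schema part is left out entirely.

```haskell
-- Toy model of a pickler: NOT HXT's real PU type, just the core idea.

-- Minimal XML tree: an element with children, or a text node.
data Xml = Elem String [Xml] | Text String
  deriving (Eq, Show)

-- A pickler pairs a serializer with a (possibly failing) deserializer.
-- The deserializer consumes a prefix of a node list and returns the rest.
data PU a = PU
  { pickle   :: a -> [Xml]
  , unpickle :: [Xml] -> Maybe (a, [Xml])
  }

-- Text content.
xpText :: PU String
xpText = PU (\s -> [Text s]) un
  where
    un (Text s : rest) = Just (s, rest)
    un _               = Nothing

-- Wrap the output of a sub-pickler in a named element; when unpickling,
-- the tag must match and the sub-pickler must consume all children.
xpElem :: String -> PU a -> PU a
xpElem tag pu = PU (\a -> [Elem tag (pickle pu a)]) un
  where
    un (Elem t kids : rest)
      | t == tag, Just (a, []) <- unpickle pu kids = Just (a, rest)
    un _ = Nothing

-- Run two picklers one after the other.
xpPair :: PU a -> PU b -> PU (a, b)
xpPair pa pb = PU (\(a, b) -> pickle pa a ++ pickle pb b) un
  where
    un xs = do
      (a, xs')  <- unpickle pa xs
      (b, xs'') <- unpickle pb xs'
      Just ((a, b), xs'')

-- Convert through an isomorphism given as a (forward, backward) pair.
xpWrap :: (a -> b, b -> a) -> PU a -> PU b
xpWrap (to, from) pu = PU (pickle pu . from) un
  where
    un xs = do
      (a, rest) <- unpickle pu xs
      Just (to a, rest)

-- Hypothetical example type.
data Person = Person String String
  deriving (Eq, Show)

personPU :: PU Person
personPU =
  xpElem "person"
  $ xpWrap (\(n, e) -> Person n e, \(Person n e) -> (n, e))
  $ xpPair (xpElem "name" xpText) (xpElem "email" xpText)

-- Unpickle a complete node list, rejecting leftover nodes.
fromXml :: PU a -> [Xml] -> Maybe a
fromXml pu xs = case unpickle pu xs of
  Just (a, []) -> Just a
  _            -> Nothing
```

Each combinator builds both directions at once, which is why a single pickler value can be used for serializing as well as for parsing.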

To illustrate the usage of HXT picklers take this simple Haskell datatype representing a single user in our system:

data User = User
  { username :: String
  , name     :: String
  , password :: String
  , email    :: String
  , openID   :: String
  }

Using some of the basic pickler combinators from the library it is very easy to come up with a suitable XmlPickler instance:

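import Text.XML.HXT.Core  -- module name as of hxt 9

instance XmlPickler User where
  xpickle =
    -- the whole record lives inside a <user> element
    xpElem "user"
    -- convert between the five-tuple and the User constructor
    $ xpWrap
      ( (\(a, b, c, d, e) -> User a b c d e)
      , (\(User a b c d e) -> (a, b, c, d, e))
      )
    -- one element per field, each holding text content
    $ xp5Tuple
      (xpElem "username" xpText)
      (xpElem "name"     xpText)
      (xpElem "password" xpText)
      (xpElem "email"    xpText)
      (xpElem "openid"   xpText0)  -- xpText0 also accepts empty text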

This instance uses the xp5Tuple function to combine five sub-picklers into a single pickler for a five-tuple. Each field is stored as the text content of an appropriately named element. The xpWrap function then converts between the tuple and a value of the User datatype. This is all the code you need to write by hand for both XML serialization and deserialization.

A bit off topic, but interesting to note: the xpWrap function can be seen as a pickler-specific, bidirectional version of the well-known fmap for Functors. The pair of functions passed to xpWrap should form a true isomorphism, each function undoing the other. When we generalize the type of xpWrap to work on arbitrary type constructors (let’s call this function bifmap) and compare the type signatures, the similarity becomes obvious:

fmap   :: (a -> b)         -> f a -> f b
bifmap :: (a -> b, b -> a) -> f a -> f b
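The pair argument is what lets bifmap work for types that mention their parameter both as input and as output, where fmap cannot. The sketch below is our own illustration: the Invariant class (this pattern is usually called an invariant functor), the Codec type and the Age newtype are hypothetical and not part of HXT.

```haskell
import Text.Read (readMaybe)

-- A class for the bifmap idea.
class Invariant f where
  bifmap :: (a -> b, b -> a) -> f a -> f b

-- Any Functor is trivially Invariant: the backward function is ignored.
instance Invariant Maybe where
  bifmap (to, _) = fmap to

-- A codec uses 'a' as input (encode) and as output (decode),
-- so it cannot be a Functor, but it can be mapped with an isomorphism.
data Codec a = Codec
  { encode :: a -> String
  , decode :: String -> Maybe a
  }

instance Invariant Codec where
  bifmap (to, from) (Codec enc dec) = Codec (enc . from) (fmap to . dec)

intCodec :: Codec Int
intCodec = Codec show readMaybe

-- Hypothetical wrapper type, mapped through the Int codec.
newtype Age = Age Int
  deriving (Eq, Show)

ageCodec :: Codec Age
ageCodec = bifmap (Age, \(Age n) -> n) intCodec
```

Encoding needs the backward function (Age to Int) and decoding needs the forward one (Int to Age), which is exactly the pair that xpWrap asks for.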

So, with the XmlPickler instance for our User datatype we can now easily convert users into XML and read them back in. For example, the following value and XML document correspond to each other:

User "jd" "John Doe" "secret" "john@doe" "none"
<user>
  <username>jd</username>
  <name>John Doe</name>
  <password>secret</password>
  <email>john@doe</email>
  <openid>none</openid>
</user>

Using the xpickle function from the type class and the xunpickleVal function from the HXT library, we can now write a more convenient query function on top of the raw version. This version does not return ByteStrings, but values of any type that has an XmlPickler instance. Of course, the XML in the database should match your datatype; otherwise unpickling fails for the mismatching results, and because those results are dropped you may simply get back an empty list.

query :: XmlPickler a => Collection -> Query -> Parameters -> IO [a]
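A minimal sketch of such a typed layer, assuming hypothetical stand-ins for the binding: rawQuery, Collection, Query and Parameters below are fakes that return canned String snippets instead of talking to DBXML, and the Unpickle class plays the role of XmlPickler together with xunpickleVal. The point is only the shape: decode every snippet and drop the ones that fail.

```haskell
import Data.Char (isDigit)
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- Hypothetical stand-ins, not the real DBXML binding.
type Collection = [String]            -- pretend: the stored XML snippets
type Query      = String
type Parameters = [(String, String)]

-- Fake raw query: ignores the query and returns every stored snippet.
rawQuery :: Collection -> Query -> Parameters -> IO [String]
rawQuery coll _ _ = return coll

-- Hypothetical decoding class standing in for XmlPickler + xunpickleVal.
class Unpickle a where
  fromXmlString :: String -> Maybe a

-- The typed layer: decode each snippet, silently dropping failures.
query :: Unpickle a => Collection -> Query -> Parameters -> IO [a]
query coll q ps = mapMaybe fromXmlString <$> rawQuery coll q ps

-- Example type with a hand-written decoder for "<count>N</count>".
newtype Count = Count Int
  deriving (Eq, Show)

instance Unpickle Count where
  fromXmlString s = do
    rest <- stripPrefix "<count>" s
    let (digits, close) = span isDigit rest
    if close == "</count>" && not (null digits)
      then Just (Count (read digits))
      else Nothing
```

Because failures are dropped with mapMaybe, a datatype that does not match the stored XML shows up as missing results rather than as an error, which is the trade-off described above.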

Although this pickler for our User is a very simple one, more complicated datatypes, including multi-constructor and possibly mutually recursive datatypes, can also quite easily be made instances of the XmlPickler class. Unfortunately, we still have to write all of them by hand.

Going generic

After a few years at the University of Utrecht we learned at least one valuable lesson: never write functions by hand when they can be derived generically. We decided to write a generic XML pickler function using the generic programming library Regular, developed (not entirely coincidentally) at the University of Utrecht. Regular is a relatively simple but powerful tool for writing datatype-generic functions. The library can derive embedding-projection pairs (conversions to and from a generic representation) using Template Haskell, and provides enough reflection to inspect constructor names and record labels.

The generic representation is encoded as a type family (the pattern functor, or PF) over the original data type. The generic pickler function we developed has the following signature:

gxpickle :: (Regular a, GXmlPickler (PF a)) => PU a

This means that for every type that we can convert to a generic representation (indicated by the Regular type class), and whose generic representation has a GXmlPickler instance, we can deliver a PU. The regular-xmlpickler package implements the GXmlPickler type class, together with instances for the types from which the representations are composed. So all we need now for our User datatype is to derive a generic representation and use the generic implementation for the XmlPickler instance.

$(deriveAll ''User "PFUser")
type instance PF User = PFUser

instance XmlPickler User where
  xpickle = gxpickle

Using this automatically derived XML pickler and the query function described above we can now query the DBXML backend for all users that satisfy a certain property:

jd :: IO [User]
jd = query myCollection "/user[username=$name]" [("name", "jd")]

Now we’re able to query a database for pieces of XML and reify these as true Haskell values with almost no boilerplate involved! Because of the bidirectional behavior of the XmlPickler type class it shouldn’t be difficult to imagine that the same trick is applicable to inserting and updating database entries.
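The write direction can be layered in the same way. Below is a minimal sketch under loudly stated assumptions: Collection is a hypothetical in-memory IORef standing in for a DBXML collection, putDocument is our own name rather than a DBXML function, and ToXmlString stands in for the pickle half of XmlPickler (without any escaping, so it is only an illustration).

```haskell
import Data.IORef

-- Hypothetical in-memory stand-in for a DBXML collection:
-- (document name, XML text) pairs. Not the real DBXML API.
type Collection = IORef [(String, String)]

-- Hypothetical serialization class standing in for the pickle half
-- of XmlPickler.
class ToXmlString a where
  toXmlString :: a -> String

-- Insert a document, replacing any existing document with the same name.
putDocument :: ToXmlString a => Collection -> String -> a -> IO ()
putDocument coll key v =
  modifyIORef coll (\docs -> (key, toXmlString v) : filter ((/= key) . fst) docs)

-- Example type; no XML escaping is done here, illustration only.
newtype Email = Email String
  deriving (Eq, Show)

instance ToXmlString Email where
  toXmlString (Email e) = "<email>" ++ e ++ "</email>"
```

An update is then just another putDocument under the same name; with a real pickler the serializing half comes for free from the same instance that the query function uses.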

We will discuss writing generic functions using the Regular library in a later post.
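Until then, here is a rough, independent sketch of what a generic pickler does, written with GHC.Generics instead of Regular (a different generics library, swapped in here because it ships with GHC). It only implements the pickle direction, emitting one element per constructor (first letter lowercased) and one per record label. It is not the regular-xmlpickler implementation, and the Xml, ToXmlField, GToXml and gtoXml names are hypothetical.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}
import Data.Char (toLower)
import GHC.Generics

data Xml = Elem String [Xml] | Text String
  deriving (Eq, Show)

-- How a single field value becomes XML content.
class ToXmlField a where
  toField :: a -> [Xml]

instance ToXmlField [Char] where
  toField s = [Text s]

instance ToXmlField Int where
  toField n = [Text (show n)]

-- Generic traversal over the representation type.
class GToXml f where
  gToXml :: f p -> [Xml]

-- Datatype metadata: nothing to emit, just recurse.
instance GToXml f => GToXml (M1 D d f) where
  gToXml (M1 x) = gToXml x

-- Constructor: an element named after the constructor.
instance (Constructor c, GToXml f) => GToXml (M1 C c f) where
  gToXml m@(M1 x) = [Elem (lowerFirst (conName m)) (gToXml x)]

-- Record field: an element named after the record label.
instance (Selector s, GToXml f) => GToXml (M1 S s f) where
  gToXml m@(M1 x) = [Elem (selName m) (gToXml x)]

-- Sum: only the active constructor is emitted.
instance (GToXml f, GToXml g) => GToXml (f :+: g) where
  gToXml (L1 x) = gToXml x
  gToXml (R1 y) = gToXml y

-- Product: fields in declaration order.
instance (GToXml f, GToXml g) => GToXml (f :*: g) where
  gToXml (x :*: y) = gToXml x ++ gToXml y

-- Leaf: delegate to the field class.
instance ToXmlField a => GToXml (K1 i a) where
  gToXml (K1 a) = toField a

-- Constructor without fields.
instance GToXml U1 where
  gToXml U1 = []

lowerFirst :: String -> String
lowerFirst (c:cs) = toLower c : cs
lowerFirst []     = []

gtoXml :: (Generic a, GToXml (Rep a)) => a -> [Xml]
gtoXml = gToXml . from

-- Example types; only 'deriving Generic' is needed.
data User = User { username :: String, name :: String, age :: Int }
  deriving (Generic, Show)

data Status = Active | Banned
  deriving (Generic, Show)
```

Non-record constructors would get empty element names from selName, so a real implementation needs a fallback for positional fields, plus the matching unpickle direction.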

Excellent! Do you have a typed way to write the XQuery expressions as well? It would be nice to see that it matches the schema.

Not yet. Our current system doesn’t really need complex queries, but that might change in the future. Having type safe queries would indeed be a nice addition.

Neat article. You guys really need an RSS feed on this page, though.

Justin: We have a feed for all articles, all comments, and comments on a single article. What more do you need?

Max

Hi, I wrote a similar generic package to derive XmlPickler instances for types. I didn’t use the Regular package, just hand crafted TH. Perhaps we should combine efforts?

Max: When you have implemented features that we did not include in our version we would certainly like to know. We’ll probably soon release the generic pickler code so we can compare the efforts. Is your code on Hackage?

Hamish Mackenzie

I also have something similar, but I did not use XmlPickler or Regular. Just TH and Happstack.Data’s XML types. It's not complete or pretty, but it works for the XML files I have. I would love to try out and contribute to your library.

I need to be able to specify names with different capitalisation (e.g. “UserName” in place of “userName” or “ABCData” in place of “abcData”). Currently I provide a list of names and use a function to convert them in TH to field names (so I get compiler errors if the field is not found).

I also need some data to go in XML attributes. Currently I just read attributes as if they were elements, but it would be nice if it knew to write them as well.
