Haskell data types and XML
Here at typLAB it wasn’t evident from the beginning what the best choice for a storage back-end would be. We knew that we were about to build a web-based editor and would be dealing with a lot of HTML5 documents with lots of metadata. After some careful consideration we decided to go for an XML database. More specifically, the Berkeley XML Database, lovingly called DBXML by its authors.
We figured that using DBXML would give us some important advantages:
- Collections of HTML5 documents will form the basis of our data model. Only one trivial conversion from HTML5 to syntactically valid XML is needed to get our documents into the XML database. Once stored, we can perform some interesting queries over our data.
- XML databases allow for the storage of complex data layouts without having a strict schema. Without a schema it will be easier to adjust our data model over time without instantly breaking our software.
- XQuery is a very expressive (almost-purely functional) querying language which is at least as powerful as SQL and far more flexible in the structure of the data to target.
- XML can be used both to encode strictly defined datatypes and to store free-form documents in the same document collection. This will enable us to put both our metadata and our documents in the same database.
- A quick look on Hackage revealed there is an out-of-the-box easy-to-use Haskell binding available for the Berkeley XML database. No need to create custom bindings ourselves.
- We have the advantage (or disadvantage) of having Haskell as our language of choice for our server software. Because of the hierarchical nature of both XML and Haskell algebraic datatypes, an XML database feels like a perfect fit.
Once we decided to go for a DBXML back-end we had to figure out how to easily get values from our Haskell program in and out of the database. The rest of this post deals with the last point of our list: how to get a nice mapping from Haskell’s algebraic datatypes to our DBXML back-end.
XML queries
The DBXML binding for Haskell is a shallow wrapper around the existing C++ API. This library allows us to perform the common create, read, update and delete queries on entire XML documents or parts of them. Communication with the XML database happens mainly via XQuery. Queries and query parameters are passed into the API (and results come out) as Haskell ByteStrings. It is up to the programmer to set up the queries with the right XML structure and encoding. Take for example the (somewhat simplified) type signature of the query function:
query :: Collection -> Query -> Parameters -> IO [ByteString]
This function takes an identification of the XML collection, which is somewhat like a database handle, an XML query, a set of query parameters and returns a possibly empty list of XML snippets as ByteStrings. Too bad that all our domain objects are well-typed Haskell algebraic datatypes and not raw sequences of bytes. We need a simple XML (de)serialization tool for this.
XML picklers
The Haskell XML Toolbox (HXT) is a library containing a (quite extensive) collection of XML processing tools. The library has support for XML parsing, pretty printing, XPath queries, XSL stylesheets, DTD, XSD and RelaxNG schemas and a lot more. Interestingly, HXT exposes a type class and an accompanying set of combinators called XML picklers. XML picklers can be used to build conversion functions from Haskell datatypes to XML and vice versa. The type class looks like this:
class XmlPickler a where
  xpickle :: PU a
So for every type in the XmlPickler type class there is some PU available. The PU datatype is composed of a pair of pickle (serialize) and unpickle (deserialize) functions together with a schema description. Because we won’t be using the schema definitions we will ignore them for now. There is probably no need to ever touch the functions inside the PU type, because the library supplies a vast amount of basic pickler combinators to be used instead.
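To make the idea of a pickle/unpickle pair concrete, here is a deliberately simplified sketch. It is not HXT's actual PU definition (which works on XML trees and carries a schema). The names Pickler, pText, pPair and pLogin are invented for illustration, and the "document" is just a flat list of strings.

```haskell
-- A toy pickler: a serializer paired with a deserializer.
-- All names here are hypothetical; HXT's real PU is richer.
data Pickler a = Pickler
  { pickle   :: a -> [String]                    -- value to tokens
  , unpickle :: [String] -> Maybe (a, [String])  -- tokens to value plus leftovers
  }

-- Pickle a single string as exactly one token.
pText :: Pickler String
pText = Pickler
  { pickle   = \s -> [s]
  , unpickle = \ts -> case ts of
      (t : rest) -> Just (t, rest)
      []         -> Nothing
  }

-- Run two picklers one after the other.
pPair :: Pickler a -> Pickler b -> Pickler (a, b)
pPair pa pb = Pickler
  { pickle   = \(a, b) -> pickle pa a ++ pickle pb b
  , unpickle = \ts -> do
      (a, rest)  <- unpickle pa ts
      (b, rest') <- unpickle pb rest
      Just ((a, b), rest')
  }

-- A pickler for a (user name, password) pair, built from the parts above.
pLogin :: Pickler (String, String)
pLogin = pPair pText pText

main :: IO ()
main = do
  let tokens = pickle pLogin ("jd", "secret")
  print tokens
  print (unpickle pLogin tokens)
```

Because both directions are built from the same combinators, the serializer and the deserializer cannot drift apart. That is the main attraction of the pickler style.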
To illustrate the usage of HXT picklers take this simple Haskell datatype representing a single user in our system:
data User = User
  { username :: String
  , name     :: String
  , password :: String
  , email    :: String
  , openID   :: String
  }
Using some of the basic pickler combinators from the library it is very easy to come up with a suitable XmlPickler instance:
instance XmlPickler User where
  xpickle =
    xpElem "user"
      $ xpWrap
          ( \(a, b, c, d, e) -> User a b c d e
          , \(User a b c d e) -> (a, b, c, d, e)
          )
      $ xp5Tuple
          (xpElem "username" xpText)
          (xpElem "name" xpText)
          (xpElem "password" xpText)
          (xpElem "email" xpText)
          (xpElem "openid" xpText0)
This instance uses the xp5Tuple function to combine five sub-picklers into one pickler for a big tuple. Each of the five fields becomes an appropriately named element whose text content holds the field’s value. The tuple is converted to and from a value of the User datatype using the xpWrap function. This is all you need to write by hand to get XML serialization and deserialization code.
A bit off topic, but interesting to note: the xpWrap function can be seen as a pickler-specific, bidirectional version of the well-known fmap for Functors. xpWrap is used to define true isomorphisms, so the two functions it takes should be each other’s inverse. When we generalize the type of xpWrap to work on arbitrary containers (let’s call this function bifmap) and compare the type signatures, the similarity becomes obvious:
fmap :: (a -> b) -> f a -> f b
bifmap :: (a -> b, b -> a) -> f a -> f b
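The comparison can be made concrete with a small sketch. The class name IsoFunctor and the Codec type below are invented for this illustration and are not part of HXT. Codec cannot be an ordinary Functor because its type parameter occurs both as an input (encode) and as an output (decode), which is exactly why both directions of the conversion are needed.

```haskell
import Text.Read (readMaybe)

-- Hypothetical class for containers that can only be mapped
-- when a conversion is available in both directions.
class IsoFunctor f where
  bifmap :: (a -> b, b -> a) -> f a -> f b

-- A tiny string codec: the type parameter is consumed by encode
-- and produced by decode.
data Codec a = Codec
  { encode :: a -> String
  , decode :: String -> Maybe a
  }

instance IsoFunctor Codec where
  bifmap (to, from) c = Codec
    { encode = encode c . from      -- convert back, then encode
    , decode = fmap to . decode c   -- decode, then convert forward
    }

intCodec :: Codec Int
intCodec = Codec { encode = show, decode = readMaybe }

newtype Age = Age Int deriving (Eq, Show)

-- Reuse the Int codec for Age through the isomorphism Int <-> Age.
ageCodec :: Codec Age
ageCodec = bifmap (Age, \(Age n) -> n) intCodec

main :: IO ()
main = do
  putStrLn (encode ageCodec (Age 42))
  print (decode ageCodec "42")
  print (decode ageCodec "forty-two")
```

In this sketch xpWrap plays the role that bifmap plays for Codec: the forward function is used when reading and the backward function when writing.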
So, taking the XmlPickler instance for our User datatype we can now easily convert users into XML and read them back in, like the following example:
User "jd" "John Doe" "secret" "john@doe" "none"
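<user>
  <username>jd</username>
  <name>John Doe</name>
  <password>secret</password>
  <email>john@doe</email>
  <openid>none</openid>
</user>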
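The round trip can be tried out with a small self-contained program. This is a sketch that assumes the hxt package is installed and that User carries the five fields the pickler expects. showPickled, pickleDoc and unpickleDoc are taken from Text.XML.HXT.Core, and the value jd is just sample data.

```haskell
import Text.XML.HXT.Core

-- Assumed shape of the user record: five string fields.
data User = User
  { username :: String
  , name     :: String
  , password :: String
  , email    :: String
  , openID   :: String
  } deriving (Eq, Show)

instance XmlPickler User where
  xpickle =
    xpElem "user"
      $ xpWrap
          ( \(a, b, c, d, e) -> User a b c d e
          , \(User a b c d e) -> (a, b, c, d, e)
          )
      $ xp5Tuple
          (xpElem "username" xpText)
          (xpElem "name" xpText)
          (xpElem "password" xpText)
          (xpElem "email" xpText)
          (xpElem "openid" xpText0)

-- Sample value used for the round trip.
jd :: User
jd = User "jd" "John Doe" "secret" "john@doe" "none"

main :: IO ()
main = do
  -- Serialize to an XML string using the default output options.
  putStrLn (showPickled [] jd)
  -- Pickle to an XML tree and unpickle again; this should give back the same value.
  print (unpickleDoc xpickle (pickleDoc xpickle jd) == Just jd)
```

Note that xpText rejects empty strings while xpText0 accepts them, so with this pickler only the openid element may be empty.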
Using the xpickle function from the type class and xunpickleVal from the HXT library we can now write a more suitable query function on top of the raw version. This version does not return ByteStrings, but values of any type we can convert to. Of course, the information in the database should match your datatype, otherwise the unpickler function will just produce parse errors resulting in an empty list.
query :: XmlPickler a => Collection -> Query -> Parameters -> IO [a]
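How such a typed layer could look is sketched below. This is not the actual typLAB implementation. The raw query is passed in as an IO action so that the sketch does not depend on the DBXML binding, the snippets are assumed to be already decoded from ByteString to String, and the names decodeSnippets, typedQuery and Name are hypothetical. It uses the pure unpickleDoc rather than the arrow-based xunpickleVal, and assumes the hxt package is installed.

```haskell
import Data.Maybe (mapMaybe)
import Text.XML.HXT.Core

-- Parse every snippet and unpickle it. Snippets that do not parse or do not
-- match the pickler are silently dropped, so a mismatch yields an empty list.
decodeSnippets :: XmlPickler a => [String] -> [a]
decodeSnippets = mapMaybe (unpickleDoc xpickle) . concatMap parse
  where
    -- Keep only element nodes and strip whitespace-only text nodes,
    -- which could otherwise get in the way of the unpicklers.
    parse = runLA (xread >>> isElem >>> removeAllWhiteSpace)

-- Lift a raw query (hypothetical stand-in for the DBXML call) to typed results.
typedQuery :: XmlPickler a => IO [String] -> IO [a]
typedQuery raw = fmap decodeSnippets raw

-- A small example type to exercise the wrapper.
newtype Name = Name String deriving (Eq, Show)

instance XmlPickler Name where
  xpickle = xpWrap (Name, \(Name n) -> n) (xpElem "name" xpText)

main :: IO ()
main = do
  -- The second snippet does not match the Name pickler and is dropped.
  names <- typedQuery (return ["<name>jd</name>", "<other/>"]) :: IO [Name]
  print names
```

The result type alone selects the pickler, so the same wrapper serves every type with an XmlPickler instance.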
Although this pickler example for our user is a very simple one, even more complicated datatypes, including multi-constructor and possibly mutually recursive datatypes, can quite easily be made an instance of the XmlPickler class. Unfortunately we still have to write them all by hand.
Going generic
After a few years at the University of Utrecht we learned at least one valuable lesson: never write functions by hand when they can be derived generically. We decided to write a generic XML pickler function using the generic programming library Regular, developed (not entirely coincidentally) at the University of Utrecht. Regular is a relatively simple but powerful tool for writing datatype-generic functions. The library has support for deriving embedding-projection pairs (conversions from and to a generic representation) using Template Haskell and provides enough reflection to inspect constructor names and record labels.
The generic representation is encoded as a type family (the pattern functor, or PF) over the original data type. The generic pickler function we developed has the following signature:
gxpickle :: (Regular a, GXmlPickler (PF a)) => PU a
This means that for every type that we can convert to a generic representation (indicated by the Regular type class) and for every type that has a GXmlPickler instance for its generic representation, we can deliver a PU. The regular-xmlpickler package implements the GXmlPickler type class and the instances for the types of which the representations are composed. So, all we need now for our User datatype is to derive a generic representation and use the generic implementation for the XmlPickler instance.
$(deriveAll ''User "PFUser")
type instance PF User = PFUser
instance XmlPickler User where
  xpickle = gxpickle
Using this automatically derived XML pickler and the query function described above we can now query the DBXML backend for all users that satisfy a certain property:
jd :: IO [User]
jd = query myCollection "/user[username=$name]" [("name", "jd")]
Now we’re able to query a database for pieces of XML and reify these as true Haskell values with almost no boilerplate involved! Because of the bidirectional behavior of the XmlPickler type class it shouldn’t be difficult to imagine that the same trick is applicable to inserting and updating database entries.
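To give a feel for what a generic function does with constructor names and record labels, here is a one-directional sketch. It deliberately swaps in GHC.Generics from base instead of Regular, so it is not how regular-xmlpickler is implemented. The class GToXml and the helpers element and toXml are invented names. The sketch only handles single-constructor records whose fields are all Strings, and it performs no XML escaping.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

import Data.Char (toLower)
import GHC.Generics

data User = User
  { username :: String
  , name     :: String
  , password :: String
  , email    :: String
  , openID   :: String
  } deriving (Generic, Show)

-- Serializer defined on the generic representation (hypothetical class).
class GToXml f where
  gToXml :: f p -> String

-- Datatype metadata: nothing to emit, just descend.
instance GToXml f => GToXml (D1 d f) where
  gToXml (M1 x) = gToXml x

-- Constructor: wrap the fields in an element named after the constructor.
instance (Constructor c, GToXml f) => GToXml (C1 c f) where
  gToXml m@(M1 x) = element (map toLower (conName m)) (gToXml x)

-- Product: serialize the fields left to right.
instance (GToXml f, GToXml g) => GToXml (f :*: g) where
  gToXml (a :*: b) = gToXml a ++ gToXml b

-- Record field holding a String: element named after the record label.
instance Selector s => GToXml (S1 s (K1 i String)) where
  gToXml m@(M1 (K1 v)) = element (selName m) v

-- Build an element; no escaping of the body is performed in this sketch.
element :: String -> String -> String
element tag body = "<" ++ tag ++ ">" ++ body ++ "</" ++ tag ++ ">"

toXml :: (Generic a, GToXml (Rep a)) => a -> String
toXml = gToXml . from

main :: IO ()
main = putStrLn (toXml (User "jd" "John Doe" "secret" "john@doe" "none"))
```

A generic pickler additionally needs the reverse direction and support for sums of constructors, but the principle is the same: one instance per building block of the representation instead of one instance per datatype.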
We will discuss writing generic functions using the Regular library in a later post.

Excellent! Do you have a typed way to write the XQuery expressions as well? It would be nice to see that it matches the schema.
Not yet. Our current system doesn’t really need complex queries, but that might change in the future. Having type safe queries would indeed be a nice addition.
Neat article. You guys really need an RSS feed on this page, though.
Justin: We have a feed for all articles, all comments, and comments on a single article. What more do you need?
Hi, I wrote a similar generic package to derive XmlPickler instances for types. I didn’t use the Regular package, just hand crafted TH. Perhaps we should combine efforts?
Max: When you have implemented features that we did not include in our version we would certainly like to know. We’ll probably soon release the generic pickler code so we can compare the efforts. Is your code on Hackage?
I also have something similar, but I did not use XmlPickler or Regular. Just TH and Happstack.Data’s XML types. It’s not complete or pretty, but it works for the XML files I have. I would love to try out and contribute to your library.
I need to be able to specify names with different capitalisation (eg “UserName” in place of “userName” or “ABCData” in place of “abcData”). Currently I provide a list of names and use a function to convert them in TH to field names (so I get compiler errors if the field is not found).
I also need some data to go in XML attributes. Currently I just read attributes as if they were elements, but it would be nice if it knew to write them as well.
hi sebas and erik,
Great work going on at silk! I saw your silkapp appear on the TNW.com website.
Yet these articles are very dated. So what’s up?
The question: The data you are crunching are assertions in nature. So why use an XML database and not a triplestore using RDF?
XML needs a strict data model. If new assertions are made that differ from the specification in your internal data model, it would be more difficult to insert new data and relate this data to existing assertions in your database.
I know the strictness of XML is easy to work with; you get what you asked for because the system ‘knows’ what to expect.
Yet this would make it a very labor-intensive task to insert new data and unexpected assertions beyond your wildest dreams, offering the user a whole lot more than the data in the data model specification. For example, the user can be offered more assertions from the databases in the linked data cloud. The drawback of this, I guess, is that it is harder to create a neat user interface, and that it overwhelms the end user with the richness.
Well, please leave a note about your thoughts using RDF and SPARQL services.
kind regards
Hi Maurice,
You are right, this post is a bit dated. We don’t use XML anymore as the storage for assertions. We now use a separate graph database that encodes the semantic information, structured similarly to RDF. We don’t use SPARQL itself, but a similar, much simpler query language.
Hi Sebas, thanks for your reply.
Sorry, you just tickled my curiosity. Could you provide some details about what graph-database you used and what the simple-sparql language is you use, and why you have made these choices?
I would love to learn more about what is pragmatic and ‘gangbaar’[nl] these days.
kind regards