A non-technical, beginners' guide to ONIX for Books

ONIX is a type of XML (whatever that is)

ONIX is built using something called XML. Let’s not worry what those letters stand for. (It’s ‘Extensible Markup Language’, but you don’t need to know that.)

XML is one of those boring ideas that can make businesses run more smoothly, like ISBN-13 numbers or barcodes. Really it’s just some general rules for how to write down information so that computers as well as people can read it (but mainly computers). It’s not even a full set of rules; it’s just enough to help people make a start on designing their own formats for sharing information.

If there’s a particular kind of information you want to store in a standardised way, you can take the basic rules of XML and add your own extra rules appropriate to the type of data you care about. When you’re finished, you can share that combined set of rules with your friends and call it a standard.

Within publishing, people wanted to share information about new titles, so a consortium of publishing experts called EDItEUR took the basic XML rules and added a lot of book-related rules in order to create what they called the ‘ONIX’ standard.

ONIX is short for ONline Information eXchange (but you’ll never need to know that). Frankly, it’s a terrible acronym because it gives no clue about its purpose; a name like that could apply to almost any form of electronic communication. On the other hand, the word ‘ONIX’ is fairly memorable, even if it doesn’t particularly make you think of publishing.

EDItEUR created rules for how to specify each piece of title-related info until they arrived at a standardised way for describing everything they thought a typical book shop, library or distributor would need to know about a new title – or an old one, for that matter. As you can imagine, that kind of standardisation has the potential to make it much easier for the whole industry to share information with thousands of organisations by sending them all a copy of the same ‘ONIX message’ file.

But XML itself wasn’t created by publishers. In other sectors, XML is being used as the starting point for storing a multitude of different types of information. There are XML standards for creating an invoice or describing a gene – or (to return to publishing) for storing the contents of an e-book in a standardised way. With that in mind, we’ll cover the absolute basics of XML and then - long before your eyes glaze over - we’ll switch back to ONIX and talk about its pros, cons and practicalities.

The guts of XML (made palatable for non-techies)

The first thing to learn about XML is good news: it’s written in English. Or rather it’s written using ordinary words, with a few squiggles added, and not in some sort of computer hieroglyphics.

So, imagine that you’re sending out details of a new book that you’re publishing. Naturally you want everyone to add its details to their stock systems so they can easily order it. Let’s start with the title, ‘The Life and Times of Ned Lud’. A human being can take a guess that it’s a book title, but in XML you always label information to make it clear. So we might write this:

<TitleText>The Life and Times of Ned Lud</TitleText>

It’s like that maxim for giving an informative talk: tell people what you’re going to say, then say it, then tell them what you just said. So that line above says: Here comes something called TitleText, ‘The Life and Times of Ned Lud’, that’s the end of the TitleText. When you put ‘/’ in front of a label you’re marking the end of something.

Of course you can make up your own names for information. You could choose <NameOfBook> or <FieldDD6_Alpha_Gobbledygook>, but it just so happens that <TitleText> is the name that’s been agreed on by a large group of book publishers as part of the ONIX standard. The bit in the angle brackets is called a ‘tag’. It’s fairly easy to see why; you ‘tag’ information to say what it means.

For instance, if all you have is the piece of text ‘Winston Churchill’, it’s difficult to tell whether that’s a book about him or a book written by him – both are plausible. So everything gets tagged for clarity’s sake. Take a look at this:

<Author>Katie Daynes</Author>
<TitleText>Winston Churchill</TitleText>
<ISBN>074606814X</ISBN>
<Publisher>Usborne Publishing Ltd</Publisher>

Hopefully it’s pretty obvious what that all means. Each ‘tag’ has a start and an end, and the bit in the middle is the information you want to share – also known as the ‘contents’ of the tag. A computer can easily read it and so can a person (with a little effort). Unfortunately for us, information about books gets complicated and so the XML used to store it has to get complicated too.

For instance, what do we do if there’s more than one author? Or if there’s no author, just an editor and some contributors? And what about all the other pieces of information we might want to share, like publication date, price and distributor details? The people who wrote ONIX came up with a format that allows you to store a vast amount of information about a title in a structured way.

One feature of XML that the ONIX designers made use of was the idea of putting one tag inside another. It would be helpful to show you what that looks like, but real ONIX documents are difficult to read, so this next example is just a made-up one; it doesn’t follow the ONIX standard. But on the plus side, it’s actually possible for a human to understand it.

    <Book>
      <Title>
        <MainTitle>The Life and Times of Ned Lud</MainTitle>
        <SubTitle>Backward Looking Visionary</SubTitle>
      </Title>
      <Author>
        <FirstName>Emma</FirstName>
        <Surname>Barnes</Surname>
      </Author>
      <Illustrator>
        <FirstName>Rob</FirstName>
        <Surname>Jones</Surname>
      </Illustrator>
    </Book>

So in this made-up example, if a <Surname> tag is inside an <Author> tag, then it’s the name of an Author; if it’s inside an <Illustrator> tag, then it’s the name of an illustrator.
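If you're curious how a computer makes use of that nesting, here is a minimal sketch in Python using its built-in XML reader. It reads the made-up <Book> example above (which, remember, is not real ONIX) and pulls out the author and the illustrator separately. The helper name full_name is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# The made-up <Book> example from above (not real ONIX).
BOOK_XML = """
<Book>
  <Title>
    <MainTitle>The Life and Times of Ned Lud</MainTitle>
    <SubTitle>Backward Looking Visionary</SubTitle>
  </Title>
  <Author>
    <FirstName>Emma</FirstName>
    <Surname>Barnes</Surname>
  </Author>
  <Illustrator>
    <FirstName>Rob</FirstName>
    <Surname>Jones</Surname>
  </Illustrator>
</Book>
"""


def full_name(book, role):
    """Return 'FirstName Surname' for the person tagged with the given role."""
    person = book.find(role)  # e.g. the <Author> tag inside <Book>
    return f"{person.findtext('FirstName')} {person.findtext('Surname')}"


book = ET.fromstring(BOOK_XML)
main_title = book.findtext("Title/MainTitle")  # a tag inside a tag
author = full_name(book, "Author")
illustrator = full_name(book, "Illustrator")
print(main_title, "-", author, "/", illustrator)
```

The same <Surname> tag is read twice, and it is only the tag wrapped around it that tells the program whose surname it is.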

There’s lots more to be said about XML, but it doesn’t get any more interesting. Plus we’ve probably covered everything we need to in order to understand how it applies to ONIX and the publishing industry.

The ONIX Standard

If you want an easy way to tell Nielsen or Amazon or Waterstones about a new book, you can put all the relevant info in an ONIX message and e-mail or FTP it to them. In case you’re interested, the British contributors to the ONIX standard were the BIC, made up of the Library Association, The British Library, The Booksellers Association and The Publishers Association, so it’s got some weight behind it.

It’s a gigantic and complicated standard because it needs to be able to hold gigantic and complicated amounts of information for each title. For instance, it gives you tags for listing the back cover quotes on your book, and giving the names of each quote contributor and the organisations they work for. It lets you include details of discounts and promotions by date and region. It holds information on formats and rights and physical dimensions – and even what units the measurements are being given in.

Unfortunately for anyone who wants to open an ONIX message and actually read the contents, the standard also makes use of numbers where a name would have been easier to read. For instance, if you want to know whether a book has been published yet you could look at the relevant tag. Here it is:

    <PublishingStatus>04</PublishingStatus>

But what does ‘04’ mean? Well, if you hunt down a copy of the ONIX documentation you’ll find a list called ‘ONIX Code Lists Issue 11 List 64: Publishing status’, with entries such as:

  04: Active. The product was published, and is still active in the sense that the publisher will accept orders for it, though it may or may not be immediately available, for which see <SupplyDetail>.

and

    05: No longer our product. Ownership of the product has been transferred to another publisher (with details of acquiring publisher if possible in PR.19).

And many more. There are over 150 of these lists explaining what the various different numbers and codes mean. So while humans can get the gist of what’s in an ONIX message, the details are often hard to follow. The ONIX people could have chosen to use the words ‘Active’ and ‘No longer our product’ instead of the numbers ‘04’ and ‘05’, but they probably took the view that machines, rather than people, would be reading these messages and felt that numbers were more computer-y.
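For the curious, turning a code back into words is simple work for a computer. The sketch below is a rough illustration in Python, not anything taken from the ONIX documentation: the lookup table holds only the two List 64 entries quoted above, and the function name describe_status is made up.

```python
import xml.etree.ElementTree as ET

# Only the two List 64 entries quoted above; the real list has more.
PUBLISHING_STATUS = {
    "04": "Active",
    "05": "No longer our product",
}


def describe_status(xml_fragment):
    """Turn a <PublishingStatus> tag into words a person can read."""
    code = ET.fromstring(xml_fragment).text.strip()
    # Fall back to a plain message for codes missing from our tiny table.
    return PUBLISHING_STATUS.get(code, f"Unknown code {code}")


status = describe_status("<PublishingStatus>04</PublishingStatus>")
print(status)
```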

In what ways might I mess up an ONIX message?

Given all that opaque complexity, it’s easy to imagine trying to create an ONIX message and making a mess of it. In fact there are at least three different levels of mess that you could make.

The first hurdle to clear is whether an ONIX message is ‘well-formed’. For instance, if you’ve opened a tag but not closed it (like with the ISBN element below), then regardless of whatever riveting information your ONIX message contains, it’s going to be consigned to the bin.

    <Author>Katie Daynes</Author>
    <TitleText>Winston Churchill</TitleText>
    <ISBN>074606814X
    <Publisher>Usborne Publishing Ltd</Publisher>

The rules for being ‘well-formed’ are really just the rules for writing XML of any kind. So it doesn’t matter if you’re using the ONIX standard or an XML standard for describing molecules, if you don’t get the basics right, your message will not be readable by anyone’s systems.
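Checking for well-formedness is something any XML software can do. As a small sketch, here is how it might look in Python with its built-in XML reader. The <Book> wrapper is a made-up tag, added only because an XML document needs one outermost tag, and is_well_formed is an invented helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# <Book> is a made-up wrapper tag; XML needs a single outermost tag.
GOOD = "<Book><TitleText>Winston Churchill</TitleText><ISBN>074606814X</ISBN></Book>"
# The same again, but the ISBN tag is never closed.
BAD = "<Book><TitleText>Winston Churchill</TitleText><ISBN>074606814X</Book>"

print(is_well_formed(GOOD), is_well_formed(BAD))
```

Notice that the check knows nothing about books. It only cares whether every tag that was opened has been closed in the right order.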

It’s well-formed but is it valid?

That leads us to the next kind of mistake that one can make. An XML file can be ‘well-formed’ but not ‘valid’. To check validity you need a standard to compare it against, which in our case would be the ONIX standard.

So perhaps your XML is perfect, but by mistake you’ve used the Advanced Rocketry Asteroid Configuration standard. Commiserations! You’ve perfectly described an asteroid, rather than a book. According to the ARAC standard, your file is valid; according to the ONIX standard it is not.

A more likely scenario, however, is that you’ve attempted to use the ONIX standard but you’ve missed out something that’s mandatory or fallen foul of any of hundreds of other pitfalls.

The peeps in charge of XML have provided a way to specify all the requirements that a message must fulfil in order to be valid. They call that specification a DTD. (You don’t need to know what DTD stands for. It’s Document Type Definition, but now you can forget that.)

Each standard should have a DTD. The DTD says which tags are required and which are optional. It lists the tag names that are allowed and the kind of data each one is allowed to contain. It says whether there can be more than one tag of a particular type and whether it’s allowed to contain other tags within it.

If you have the appropriate kind of software, you can show it your attempt at an ONIX message, make sure the software can find the DTD for ONIX and then ask it whether your message has passed or failed its validity check.
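To give a flavour of what such a check involves, here is a deliberately tiny stand-in written in Python. It is not a real DTD check (Python's built-in XML reader can't do one, so proper validating software is needed for that), and the rules in it are invented for the made-up <Book> format from earlier, not taken from ONIX. It only tests two things: that a required tag is present and that no unexpected tags have crept in.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for the made-up <Book> format, NOT the ONIX DTD.
REQUIRED = {"Title"}
ALLOWED = {"Title", "Author", "Illustrator"}


def validity_problems(xml_text):
    """List rule violations for a well-formed <Book> document."""
    book = ET.fromstring(xml_text)
    found = [child.tag for child in book]  # tags directly inside <Book>
    problems = [
        f"missing required tag <{tag}>" for tag in sorted(REQUIRED - set(found))
    ]
    problems += [f"tag <{tag}> is not allowed" for tag in found if tag not in ALLOWED]
    return problems


print(validity_problems("<Book><Title/><Author/></Book>"))
print(validity_problems("<Book><Asteroid/></Book>"))
```

Both test documents are well-formed, but only the first passes these toy rules. The second has described an asteroid where a title should be.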

Don’t (necessarily) blame ONIX

The third and final level of mistake you can make when creating an ONIX message concerns the information it contains. Nothing within the ONIX standard can help you if you mistakenly set the TitleText of your book to ‘The Fridges of Madison County’ instead of ‘The Bridges of Madison County’.

Likewise, if you accidentally send out an ONIX message which details not just the cost price that the recipient of the message must pay but also the cost that the recipient’s competitors are being charged, then the resultant egg on your face is self-inflicted. The ONIX standard tells you what you must include and what you can include, but not what it’s wise to include or whether you spelled it properly.

Of course, after the third and final level of problems comes the fourth and most problematic level which arises when different companies and organisations within the world of publishing invent their own flavours, subsets and artificial dialects of ONIX.

Sometimes these take the form of additional information required or particular ways of specifying things. In many cases these requirements come from limitations in the software or processes that a particular organisation is using. Specify too many of a particular type of tag and that organisation might not be able to read your ONIX message. The standard allows it but their systems can’t handle it.

Standard? Which standard?

That fourth type of problem with your ONIX message can also arise from the fact that standards are often a moving target and not everyone manages to keep up.

Few standardisation processes manage to hit the bullseye first time. The ONIX standard is no exception. As of 2022, we’ve left version 1.0 far behind us, passed through minor revisions, then a big leap to 2.0, then more revisions, both big and small - with version 2.1 being very popular - to arrive at the most recent version, which is 3.0.x - where that x denotes which minor revision of 3.0 you’re using. At the time of writing, the current version is 3.0.8.

But not everyone is using version 3.0. Previous versions work well enough, and updating software or processes costs money. So despite the fact that it was superseded in 2009, version 2.1 is still as far as some companies have progressed.

Send a company whose systems only speak version 2.1 an ONIX message validated against the 3.0 DTD and the outcome is unlikely to be the one you hoped for.
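One way a recipient's software might spot the mismatch before anything goes wrong is to look at which version a message claims to be. The Python sketch below assumes the message announces its version in a release attribute on the outermost <ONIXMessage> tag, as version 3.0 messages do. Older messages may leave it out, in which case the function returns nothing useful. The function names and the cut-down example message are invented for illustration.

```python
import xml.etree.ElementTree as ET


def onix_release(xml_text):
    """Return the release attribute of the outermost tag, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.get("release")


def recipient_can_read(xml_text, supported_releases):
    """True if the message's stated release is one the recipient supports."""
    return onix_release(xml_text) in supported_releases


# An empty shell of a message, just enough to carry the version (assumed shape).
MESSAGE = '<ONIXMessage release="3.0"></ONIXMessage>'

print(onix_release(MESSAGE))
print(recipient_can_read(MESSAGE, {"2.1"}))
```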

How to create an ONIX message

In theory, if you only had a handful of titles to worry about, you could attempt to type out your ONIX messages by hand. They are, after all, just text files. But first of all there are all those pesky tags that require you to know that, say, according to ‘ONIX Code Lists Issue 11 List 7: Product form code’ you need to use the code ‘BC’ to denote a paperback. There’s also the need to check your message against the appropriate DTD to see if it’s valid.

Realistically, hand-typing an ONIX message that contains anything more than the most minimal and basic of information is completely impractical.

The nice thing about using a publishing management system such as Consonance to generate ONIX messages is that it will already know about all of your current and upcoming titles. So when you want to send out an ONIX message, Consonance can generate it for you, filling in all the relevant tags with information about your titles. And (like the example mentioned above) it will know about all the confusing codes such as the need to use ‘04’ for a title with an ‘active’ publishing status. In effect it can convert from the human-readable information you’ve set the system up with into the machine-friendly codes and tags of ONIX.

When you originally entered that information into Consonance, you would have used straightforward, familiar terms. Better still, during data entry many pieces of information will have been selected from lists of valid options so that there’s no possibility of entering something that’s mistyped or garbled.

And if you already have ONIX messages describing your titles and you want to start using Consonance, we can probably read those messages into the system to save you having to type any of it out again.
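As a rough illustration of that conversion from familiar terms into codes and tags, here is a small Python sketch. It is emphatically not how Consonance works internally, and the flat <Book> layout it produces is made up, so the output is not a valid ONIX message. The lookup tables contain only the codes quoted in this guide, and build_fragment is an invented name.

```python
import xml.etree.ElementTree as ET

# Codes quoted earlier in this guide; the real code lists are far longer.
STATUS_CODES = {"Active": "04", "No longer our product": "05"}
FORM_CODES = {"Paperback": "BC"}


def build_fragment(title, status, form):
    """Build a simplified, made-up <Book> fragment using ONIX-style codes."""
    book = ET.Element("Book")  # made-up wrapper, not part of ONIX
    ET.SubElement(book, "TitleText").text = title
    ET.SubElement(book, "PublishingStatus").text = STATUS_CODES[status]
    ET.SubElement(book, "ProductForm").text = FORM_CODES[form]
    return ET.tostring(book, encoding="unicode")


fragment = build_fragment("The Life and Times of Ned Lud", "Active", "Paperback")
print(fragment)
```

Because the words are looked up in a table, a status or format that isn't on the list stops the program with an error instead of quietly producing a garbled code.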

There’s a lot more to be said about the ONIX standard - much of it made up of gotchas and cautionary tales - but if you’ve read this far you’ll hopefully know enough to join in fun dinner party conversations on the subject and look techies in the eye without flinching, so we’ll leave it there.
