XMLserdes — XML Serialisation and Deserialisation

Mechanisms for serializing Python objects to XML, and deserializing them from XML. The top-level object in a serialization is almost always an instance of some class having multiple properties to be de/serialized. Support is provided for declarative specification of how this is to be done.

Top-level functions

xmlserdes.serialize(obj, tag)

Entry point function to serialize a Python object to an XML element.

Parameters:
  • obj (instance of class having xml_descriptor attribute) – Python object to serialize

  • tag (str) – tag for the returned XML element

Returns:

XML element, as instance of etree.Element.

xmlserdes.deserialize(cls, elt, expected_tag)

Entry point function to deserialize a Python object from an XML element.

Parameters:
  • cls – class of object to deserialize

  • elt (etree.Element) – XML element

  • expected_tag (str) – tag which elt must have

Returns:

instance of class cls.

See also xmlserdes.XMLSerializable and xmlserdes.XMLSerializableNamedTuple for an ‘intrusive’ API.
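
The idea behind these two entry points can be illustrated without xmlserdes itself. The following is a minimal sketch, using only the standard library, of descriptor-driven serialization and deserialization. It is not the xmlserdes implementation: the names RECT_DESCRIPTOR, to_element and from_element are hypothetical, and the descriptor is simplified to plain (tag, type) pairs.

```python
import xml.etree.ElementTree as ET
from collections import namedtuple

# Hypothetical, simplified descriptor: one (tag, type) pair per field.
RECT_DESCRIPTOR = [('wd', int), ('ht', int)]
Rect = namedtuple('Rect', [tag for tag, _ in RECT_DESCRIPTOR])


def to_element(obj, tag, descriptor):
    """Build an element called `tag` with one child per described field."""
    elt = ET.Element(tag)
    for child_tag, _field_type in descriptor:
        ET.SubElement(elt, child_tag).text = str(getattr(obj, child_tag))
    return elt


def from_element(cls, elt, expected_tag, descriptor):
    """Rebuild an instance of `cls`, checking the element's tag first."""
    if elt.tag != expected_tag:
        raise ValueError('expected tag "%s" but got "%s"'
                         % (expected_tag, elt.tag))
    values = []
    for child_tag, field_type in descriptor:
        child = elt.find(child_tag)
        if child is None:
            raise ValueError('missing child element "%s"' % child_tag)
        values.append(field_type(child.text))
    return cls(*values)


rect_elt = to_element(Rect(10, 20), 'rect', RECT_DESCRIPTOR)
rect_text = ET.tostring(rect_elt, encoding='unicode')
round_tripped = from_element(Rect, rect_elt, 'rect', RECT_DESCRIPTOR)
```

In this sketch the same descriptor list drives both directions, which is the property the declarative approach is meant to provide.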

Classes and functions

class xmlserdes.XMLSerializable

Base class for types which become serializable to XML via instance method as_xml, and deserializable from XML via class method from_xml. XML behaviour is specified via an xml_descriptor class attribute (of the derived class), which is a list of terse type-descriptor expressions — see xmlserdes.TypeDescriptor.from_terse() for details.

class xmlserdes.XMLSerializableNamedTuple

Base class for types which are essentially named tuples with the field-names taken from the xml_descriptor.

>>> class Rectangle(xmlserdes.XMLSerializableNamedTuple):
...     xml_default_tag = 'rect'
...     xml_descriptor = [('wd', int), ('ht', int)]
>>> r = Rectangle(10, 20)
>>> print(r)
Rectangle(wd=10, ht=20)
>>> r.wd
10
>>> r.ht
20
>>> print(xmlserdes.utils.str_from_xml_elt(r.as_xml()))
<rect><wd>10</wd><ht>20</ht></rect>

If a field’s name specifies that its value is to be stored in an XML attribute (by starting with the '@' character), then the corresponding field name of the class drops that '@':

>>> class Ellipse(xmlserdes.XMLSerializableNamedTuple):
...     xml_default_tag = 'oval'
...     xml_descriptor = [('@major', int), ('@minor', int),
...                       ('colour', str)]
>>> e = Ellipse(8, 5, 'red')
>>> print(xmlserdes.utils.str_from_xml_elt(e.as_xml()))
<oval major="8" minor="5"><colour>red</colour></oval>
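
The '@' convention can be sketched with the standard library alone. The helper add_field below is hypothetical and only illustrates the mapping of '@'-prefixed names to attributes and of other names to sub-elements. It makes no claim about how xmlserdes does this internally.

```python
import xml.etree.ElementTree as ET


def add_field(parent, name, value):
    """Store `value` as an attribute if `name` starts with '@',
    otherwise as the text of a new child element."""
    if name.startswith('@'):
        parent.set(name[1:], str(value))   # drop the leading '@'
    else:
        ET.SubElement(parent, name).text = str(value)


oval = ET.Element('oval')
for name, value in [('@major', 8), ('@minor', 5), ('colour', 'red')]:
    add_field(oval, name, value)

oval_text = ET.tostring(oval, encoding='unicode')
```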

If the class has no xml_default_tag attribute, one is created with value equal to the class name:

>>> class Circle(xmlserdes.XMLSerializableNamedTuple):
...     xml_descriptor = [('radius', int)]
>>> c = Circle(42)
>>> print(c)
Circle(radius=42)
>>> c.radius
42
>>> print(xmlserdes.utils.str_from_xml_elt(c.as_xml()))
<Circle><radius>42</radius></Circle>

To suppress this behaviour, define an xml_default_tag attribute with value None. This is useful if you wish to force callers of as_xml() to supply the tag:

>>> class Sphere(xmlserdes.XMLSerializableNamedTuple):
...     xml_default_tag = None
...     xml_descriptor = [('radius', int)]
>>> s = Sphere(100)
>>> print(xmlserdes.utils.str_from_xml_elt(s.as_xml('round-object')))
<round-object><radius>100</radius></round-object>
>>> x = s.as_xml()
Traceback (most recent call last):
    ...
AttributeError: 'Sphere' object has no attribute 'xml_default_tag'

If you create a subclass of a XMLSerializableNamedTuple subclass, and do not explicitly specify an xml_default_tag, then the sub-subclass inherits the sub-class’s xml_default_tag:

>>> class ShinyCircle(Circle):
...     pass
>>> sc = ShinyCircle(42)
>>> print(sc)
ShinyCircle(radius=42)
>>> sc.radius
42
>>> sc_xml = sc.as_xml()
>>> print(xmlserdes.utils.str_from_xml_elt(sc_xml))
<Circle><radius>42</radius></Circle>

(Note that the tag in the XML is Circle and not ShinyCircle.)

But extracting a ShinyCircle from an XML element works as expected:

>>> sc_round_trip = ShinyCircle.from_xml(sc_xml, 'Circle')
>>> print(sc_round_trip)
ShinyCircle(radius=42)

class xmlserdes.ElementDescriptor(*args, **kwargs)

Object which represents the mapping between an XML element and a property of a Python object, together with the native Python type of that property.

Parameters:
  • tag (str) – tag for XML element to de/serialize from/to

  • value_from (callable) – function (or other callable) which extracts the value from the containing object

  • type_descr (instance of a subclass of xmlserdes.TypeDescriptor) – type-descriptor which can de/serialize the Python object from/to the contents of an XML element

A more convenient way of constructing a xmlserdes.ElementDescriptor is to use the xmlserdes.ElementDescriptor.new_from_tuple() method.

classmethod new_from_tuple(tup)

Construct a new xmlserdes.ElementDescriptor from a two- or three-element tuple, covering the most common cases.

Parameters:

tup (tuple) – two- or three-element tuple describing required instance

The tuple must be one of:

  • a pair (tag, type_descriptor), in which case the value_from field of the resulting ElementDescriptor is attrgetter(tag)

  • a triple (tag, field_name_or_callable, type_descriptor); if field_name_or_callable is a str, it is taken as an attribute and the resulting value_from is attrgetter(field_name_or_callable); otherwise field_name_or_callable must be a callable.
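
The two tuple forms above can be sketched as follows. This is an illustration only, under the assumption that the rules are exactly as listed. The function split_descriptor_tuple is a hypothetical name, and plain types stand in for real type-descriptors.

```python
from collections import namedtuple
from operator import attrgetter


def split_descriptor_tuple(tup):
    """Return (tag, value_from, type_descriptor) for a pair or a triple."""
    if len(tup) == 2:
        tag, type_descr = tup
        # Pair: the tag doubles as the attribute name.
        return tag, attrgetter(tag), type_descr
    if len(tup) == 3:
        tag, field_or_callable, type_descr = tup
        if isinstance(field_or_callable, str):
            # Triple with a string: read that attribute.
            return tag, attrgetter(field_or_callable), type_descr
        if callable(field_or_callable):
            # Triple with a callable: use it directly.
            return tag, field_or_callable, type_descr
        raise TypeError('second element must be a str or a callable')
    raise ValueError('expected a two- or three-element tuple')


Shape = namedtuple('Shape', 'width')
shape = Shape(42)

_, get_direct, _ = split_descriptor_tuple(('width', int))
_, get_renamed, _ = split_descriptor_tuple(('shape-width', 'width', int))
_, get_computed, _ = split_descriptor_tuple(
    ('double-width', lambda s: 2 * s.width, int))
```

The triple form is what allows the XML tag ('shape-width') to differ from the Python attribute name ('width'), or the value to be computed rather than read directly.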

xml_element(obj, _xpath=[])

Serialize, into an XML element, the relevant property from the given object.

>>> descr = ElementDescriptor.new_from_tuple(('width', xmlserdes.Atomic(int)))
>>> shape = collections.namedtuple('Shape', 'width')(42)
>>> print(xmlserdes.utils.str_from_xml_elt(descr.xml_element(shape)))
<width>42</width>
>>> descr_different_tag = ElementDescriptor.new_from_tuple(('shape-width',
...                                                         'width',
...                                                         xmlserdes.Atomic(int)))
>>> print(xmlserdes.utils.str_from_xml_elt(descr_different_tag.xml_element(shape)))
<shape-width>42</shape-width>

extract_from(elt, _xpath=[])

Deserialize, from an XML element, a value of the relevant type.

>>> descr = ElementDescriptor.new_from_tuple(('width', xmlserdes.Atomic(int)))
>>> xml_elt = etree.fromstring('<width>99</width>')
>>> descr.extract_from(xml_elt)
99

xmlserdes.SerDesDescriptor(children)

Convenience function for constructing a list of xmlserdes.ElementDescriptor instances from a list of abbreviated tuples.

Parameters:

children (iterable of tuples) – descriptions of property/sub-element mappings; each should be a tuple suitable for passing to xmlserdes.ElementDescriptor.new_from_tuple().

Returns:

New list of instances of xmlserdes.ElementDescriptor.

class xmlserdes.TypeDescriptor

Instances of classes derived from xmlserdes.TypeDescriptor support two operations on objects of a particular type:

  • xml_element() — serialize a given object as an XML element.

  • extract_from() — extract an object of the correct type from a given XML element.

The following static method is also available:

  • tag_is_valid() — return True or False according to whether the given tag is valid for this type-descriptor. For example, lists cannot be stored in attributes, and so a tag beginning with '@' is not valid for a type-descriptor storing a list.

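This base type is not useful on its own. Concrete derived types documented here include xmlserdes.Atomic, xmlserdes.List, xmlserdes.Instance, xmlserdes.DTypeScalar, xmlserdes.NumpyAtomicVector, and xmlserdes.NumpyRecordVectorStructured.

See those classes’ individual docstrings for more details.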

extract_from(elt, expected_tag, _xpath=[])

Extract and return an object from the given XML element. The tag of elt should be the given expected tag, otherwise an XMLSerDesError is raised.

Parameters:
  • elt (etree.Element) – XML element

  • expected_tag (str) – tag which elt must have

Return type:

depends on concrete subclass of TypeDescriptor

classmethod from_terse(descr)

Method to construct an instance of xmlserdes.TypeDescriptor from a terse expression. Many types for the expression argument are supported:

atomic type object

A xmlserdes.Atomic instance is created for that type. The list of known ‘atomic’ types is stored in TypeDescriptor.atomic_types.

>>> td = TypeDescriptor.from_terse(int)
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(42, 'answer')))
<answer>42</answer>

bool type object

An instance of xmlserdes.AtomicBool is created.

>>> td = TypeDescriptor.from_terse(bool)
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(False, 'is-blue')))
<is-blue>false</is-blue>

Enum-derived class [Python 3.4 onwards]

An instance of xmlserdes.AtomicEnum is created.

>>> import sys
>>> if sys.version_info >= (3, 4):
...     from enum import Enum
...     Animal = Enum('Animal', 'Cat Dog Rabbit')
...     td = TypeDescriptor.from_terse(Animal)
...     pet = Animal.Cat
...     print(xmlserdes.utils.str_from_xml_elt(td.xml_element(pet, 'pet')))
... else:
...     print('<pet>Cat</pet>')
<pet>Cat</pet>

string instance

A xmlserdes.Atomic instance is created, where the contained type is found by interpreting the given string as a Numpy dtype code.

>>> td = TypeDescriptor.from_terse('i2')
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(np.int16(42), 'answer')))
<answer>42</answer>

non-atomic type object

A xmlserdes.Instance instance is created, where the contained type is the given type. The type must have an xml_descriptor attribute.

>>> class Blob(xmlserdes.XMLSerializableNamedTuple):
...     xml_descriptor = [('size', int)]
>>> td = TypeDescriptor.from_terse(Blob)
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(Blob(42), 'blob')))
<blob><size>42</size></blob>

(This example uses the terse type-descriptor format to specify the XML behaviour of the one field of Blob.)

list instance

A xmlserdes.List instance is created.

The given list must have either one or two elements.

If two elements, they are taken as the contained type and contained tag:

>>> td = TypeDescriptor.from_terse([int, 'ans'])
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element([42, 99], 'answers')))
<answers><ans>42</ans><ans>99</ans></answers>

If one element, it must be a type having an xml_default_tag attribute, which is used as the contained tag:

>>> class Blob(xmlserdes.XMLSerializableNamedTuple):
...     xml_default_tag = 'blob'
...     xml_descriptor = [('size', int)]
>>> td = TypeDescriptor.from_terse([Blob])
>>> blobs = [Blob(42), Blob(99)]
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(blobs, 'blobs')))
<blobs><blob><size>42</size></blob><blob><size>99</size></blob></blobs>

tuple instance

Depending on the tuple length, either a xmlserdes.NumpyAtomicVector or a xmlserdes.NumpyRecordVectorStructured is created. In all cases, the first element of the tuple must be the numpy.ndarray type object.

two-element tuple

The second tuple element must be an atomic Numpy dtype, and a xmlserdes.NumpyAtomicVector for that dtype is returned.

>>> td = xmlserdes.TypeDescriptor.from_terse((np.ndarray, np.int32))
>>> xs = np.array([1, 2, 3], dtype = np.int32)
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(xs, 'answers')))
<answers>1,2,3</answers>

three-element tuple

The second element must be a record dtype, and the third element must be a string naming the contained elements. A xmlserdes.NumpyRecordVectorStructured is created.

This example uses very short tag names to keep the output of a reasonable length:

>>> Rect = np.dtype([('w', np.uint16), ('h', np.uint16)])
>>> td = xmlserdes.TypeDescriptor.from_terse((np.ndarray, Rect, 'r'))
>>> rects = np.array([(10, 20), (3, 4)], dtype = Rect)
>>> print(xmlserdes.utils.str_from_xml_elt(td.xml_element(rects, 'rs')))
<rs><r><w>10</w><h>20</h></r><r><w>3</w><h>4</h></r></rs>

xml_element(obj, tag, _xpath=[])

Return an XML element, with the given tag, corresponding to the given object.

Parameters:
  • obj – object to be serialized into an XML element

  • tag (str) – tag for the returned XML element

Return type:

XML element (as etree.Element instance)

See examples under subclasses of xmlserdes.TypeDescriptor for details.

abstract xml_node(obj, tag, _xpath=[])

Return either an XML element or an XML attribute.

class xmlserdes.Atomic(inner_type)

A xmlserdes.TypeDescriptor for handling ‘atomic’ types. The concept of an ‘atomic’ type is not explicitly defined, but anything which can be faithfully represented as a string via str(), and can be parsed from a string using the type name, will work.

Parameters:

inner_type – The native Python atomic type to be serialized and deserialized.
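
The informal notion of an ‘atomic’ type described above can be checked with a simple round trip. The helper round_trips below is hypothetical and uses only the standard library. It tests whether a value survives str() followed by parsing with its own type, which is one reading of the criterion and not a definition taken from xmlserdes.

```python
from fractions import Fraction


def round_trips(value):
    """True if `value` survives str() followed by parsing with its own type."""
    return type(value)(str(value)) == value


int_ok = round_trips(42)                  # int('42') == 42
float_ok = round_trips(2.5)               # float('2.5') == 2.5
fraction_ok = round_trips(Fraction(3, 4))  # Fraction('3/4')

# bool('False') is True, because any non-empty string is truthy,
# so False does not survive the round trip.
bool_ok = round_trips(False)

# list('[1, 2]') splits the string into characters, so lists fail too.
list_ok = round_trips([1, 2])
```

Under this reading int, float and Fraction qualify, while bool and list do not. The bool result is consistent with bool being handled by a separate type-descriptor.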

For example, an xmlserdes.Atomic type-descriptor to handle an integer:

>>> atomic_type_descriptor = xmlserdes.Atomic(int)

Serializing an integer into an XML element:

>>> print(xmlserdes.utils.str_from_xml_elt(atomic_type_descriptor.xml_element(42, 'answer')))
<answer>42</answer>

Deserializing an integer from an XML element:

>>> xml_elt = etree.fromstring('<weight>99</weight>')
>>> atomic_type_descriptor.extract_from(xml_elt, 'weight')
99

Unexpected tag:

>>> atomic_type_descriptor.extract_from(xml_elt, 'length')
Traceback (most recent call last):
    ...
xmlserdes.errors.XMLSerDesError: expected tag "length" but got "weight" at /

class xmlserdes.List(contained_descriptor, contained_tag)

A xmlserdes.TypeDescriptor for handling homogeneous lists of elements.

Parameters:
  • contained_descriptor (xmlserdes.TypeDescriptor) – specification of the type of each element in the lists

  • contained_tag (str) – tag for each sub-element of the sequence element

For example, a xmlserdes.List type-descriptor to handle lists of integers, where each integer in the list will be represented by an XML element with tag answer.

>>> list_of_ints_td = List(xmlserdes.Atomic(int), 'answer')

Serializing a list of integers into an XML element:

>>> print(xmlserdes.utils.str_from_xml_elt(
...     list_of_ints_td.xml_element([42, 123, 99], 'list-of-answers')))
<list-of-answers><answer>42</answer><answer>123</answer><answer>99</answer></list-of-answers>

Deserializing a list of integers from an XML element:

>>> xml_elt = etree.fromstring('''<list-of-answers>
...                                 <answer>1</answer>
...                                 <answer>10</answer>
...                                 <answer>100</answer>
...                               </list-of-answers>''')
>>> list_of_ints_td.extract_from(xml_elt, 'list-of-answers')
[1, 10, 100]

class xmlserdes.Instance(cls)

A xmlserdes.TypeDescriptor for handling instances of a ‘complex’ class having an ‘XML descriptor’.

Parameters:

cls – class whose instances are to be de/serialized; must have attribute named xml_descriptor which is a list of instances of xmlserdes.ElementDescriptor

Note

A possible to-do is to allow the descriptor to be passed in separately, rather than requiring it to be an attribute of the to-be-serialized class.

Define class and augment it with xml_descriptor attribute:

>>> Rectangle = collections.namedtuple('Rectangle', 'wd ht')
>>> Rectangle.xml_descriptor = xmlserdes.SerDesDescriptor([('wd', xmlserdes.Atomic(int)),
...                                                        ('ht', xmlserdes.Atomic(int))])

Define type-descriptor to handle de/serialization:

>>> rectangle_td = Instance(Rectangle)

Serialize instance of the Rectangle class:

>>> r = Rectangle(210, 297)
>>> print(xmlserdes.utils.str_from_xml_elt(rectangle_td.xml_element(r, 'rect')))
<rect><wd>210</wd><ht>297</ht></rect>

Deserialize instance:

>>> xml_elt = etree.fromstring('<rect><wd>4</wd><ht>3</ht></rect>')
>>> rectangle_td.extract_from(xml_elt, 'rect')
Rectangle(wd=4, ht=3)

class xmlserdes.DTypeScalar(dtype)

A xmlserdes.TypeDescriptor for handling Numpy scalars of custom dtype.

The XML representation has one sub-element per field of the dtype. Atomic-type fields are represented as their repr; structured-dtype fields are represented with children corresponding to their fields, and so on.
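
The recursive layout described above can be sketched with the standard library. In this illustration nested dicts stand in for Numpy structured scalars, and record_to_element is a hypothetical helper, not part of xmlserdes. Leaf values are written as their repr, and nested records become nested elements.

```python
import xml.etree.ElementTree as ET


def record_to_element(tag, record):
    """Build an element with one child per field, recursing into
    nested records (represented here as dicts)."""
    elt = ET.Element(tag)
    for field_name, field_value in record.items():
        if isinstance(field_value, dict):
            # Nested record: recurse, using the field name as the tag.
            elt.append(record_to_element(field_name, field_value))
        else:
            # Leaf field: store its repr as element text.
            ET.SubElement(elt, field_name).text = repr(field_value)
    return elt


pattern = {
    'background': {'red': 120, 'green': 140, 'blue': 150},
    'foreground': {'red': 20, 'green': 40, 'blue': 50},
}
pattern_text = ET.tostring(record_to_element('pattern', pattern),
                           encoding='unicode')
```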

Note

Currently the XML tags for the fields of each record are the same as the field names of the dtype. A possible to-do is to allow a mapping between these two sets of names.

Parameters:

dtype – Numpy record dtype of the scalar

Define record dtype whose fields are all atomic types:

>>> import numpy as np
>>> ColourDType = np.dtype([('red', np.uint8),
...                         ('green', np.uint8),
...                         ('blue', np.uint8)])

Define type-descriptor for a scalar instance of it:

>>> colour_scalar_td = xmlserdes.DTypeScalar(ColourDType)

Serialize a scalar (the [()] construct extracts a scalar element from the 0-dimensional array):

>>> colour = np.array((20, 40, 50), dtype = ColourDType)[()]
>>> print(xmlserdes.utils.str_from_xml_elt(
...           colour_scalar_td.xml_element(colour, 'colour'),
...           pretty_print = True).rstrip())
<colour>
  <red>20</red>
  <green>40</green>
  <blue>50</blue>
</colour>
>>> xml_elt = etree.fromstring(
... '<green><red>0</red><green>64</green><blue>0</blue></green>')
>>> extracted_colour = colour_scalar_td.extract_from(xml_elt, 'green')
>>> print(extracted_colour)
(0, 64, 0)
>>> print(extracted_colour.dtype)
[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]

Define a record dtype with nested custom field:

>>> PatternDType = np.dtype([('background', ColourDType),
...                          ('foreground', ColourDType)])

Define type-descriptor for a scalar instance of it:

>>> pattern_scalar_td = xmlserdes.DTypeScalar(PatternDType)

Serialize a scalar (the [()] construct extracts a scalar element from the 0-dimensional array):

>>> pattern = np.array(((120, 140, 150), (20, 40, 50)), dtype = PatternDType)[()]
>>> print(xmlserdes.utils.str_from_xml_elt(
...           pattern_scalar_td.xml_element(pattern, 'pattern'),
...           pretty_print = True).rstrip())
<pattern>
  <background>
    <red>120</red>
    <green>140</green>
    <blue>150</blue>
  </background>
  <foreground>
    <red>20</red>
    <green>40</green>
    <blue>50</blue>
  </foreground>
</pattern>

class xmlserdes.NumpyAtomicVector(dtype)

A xmlserdes.TypeDescriptor for handling Numpy vectors (i.e., one-dimensional ndarray instances) where the dtype is an ‘atomic’ type. Serialization is done as a CSV string. Complex types are not supported.

Parameters:

dtype – Numpy dtype of the vector
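
The CSV representation can be sketched with the standard library, using a plain list in place of an ndarray. The helpers vector_to_element and vector_from_element are hypothetical. Treating an empty element as an empty vector is an assumption of this sketch and is not confirmed behaviour of xmlserdes.

```python
import xml.etree.ElementTree as ET


def vector_to_element(values, tag):
    """Serialize a sequence of atomic values as comma-separated text."""
    elt = ET.Element(tag)
    elt.text = ','.join(str(v) for v in values)
    return elt


def vector_from_element(elt, parse=int):
    """Parse comma-separated element text back into a list."""
    text = (elt.text or '').strip()
    if not text:
        return []   # assumption: empty text means empty vector
    return [parse(token) for token in text.split(',')]


values_text = ET.tostring(vector_to_element([0, 1, 2, 3], 'values'),
                          encoding='unicode')
parsed = vector_from_element(ET.fromstring('<values>10,20,30</values>'))
parsed_empty = vector_from_element(ET.fromstring('<values/>'))
```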

Define type-descriptor to handle de/serialization of a Numpy vector of uint16 elements:

>>> import numpy as np
>>> vector_td = NumpyAtomicVector(np.uint16)

Serialize a vector:

>>> v = np.arange(4, dtype = np.uint16)
>>> print(xmlserdes.utils.str_from_xml_elt(vector_td.xml_element(v, 'values')))
<values>0,1,2,3</values>

Deserialize a vector:

>>> xml_elt = etree.fromstring('<values>10,20,30</values>')
>>> vector_td.extract_from(xml_elt, 'values')
array([10, 20, 30], dtype=uint16)

class xmlserdes.NumpyRecordVectorStructured(dtype, contained_tag)

A xmlserdes.TypeDescriptor for handling Numpy vectors (i.e., one-dimensional ndarray instances) where the dtype is a Numpy record type. The record-type’s fields can be of scalar atomic type or custom dtype in turn.

The XML representation has one sub-element per element of the vector. Each of those sub-elements has sub-sub-elements corresponding to the fields of the record type.

Note

Currently the XML tags for the fields of each record are the same as the field names of the dtype. A possible to-do is to allow a mapping between these two sets of names.

Parameters:
  • dtype – Numpy record dtype of the vector

  • contained_tag (str) – tag to use for the element representing each element of the vector

Define record dtype:

>>> import numpy as np
>>> ColourDType = np.dtype([('red', np.uint8),
...                         ('green', np.uint8),
...                         ('blue', np.uint8)])

Define type-descriptor for it:

>>> colour_vector_td = xmlserdes.NumpyRecordVectorStructured(ColourDType, 'colour')

Serialize a vector:

>>> colours = np.array([(20, 40, 50),
...                     (128, 128, 128),
...                     (255, 0, 255)],
...                    dtype = ColourDType)
>>> print(xmlserdes.utils.str_from_xml_elt(
...           colour_vector_td.xml_element(colours, 'colours'),
...           pretty_print = True).rstrip())
<colours>
  <colour>
    <red>20</red>
    <green>40</green>
    <blue>50</blue>
  </colour>
  <colour>
    <red>128</red>
    <green>128</green>
    <blue>128</blue>
  </colour>
  <colour>
    <red>255</red>
    <green>0</green>
    <blue>255</blue>
  </colour>
</colours>
>>> xml_elt = etree.fromstring(
...     '''<greens>
...          <colour><red>0</red><green>64</green><blue>0</blue></colour>
...          <colour><red>0</red><green>192</green><blue>0</blue></colour>
...        </greens>''')
>>> extracted_colours = colour_vector_td.extract_from(xml_elt, 'greens')
>>> print(extracted_colours)
[(0, 64, 0) (0, 192, 0)]
>>> print(extracted_colours.dtype)
[('red', 'u1'), ('green', 'u1'), ('blue', 'u1')]

Custom dtype one of whose fields is non-atomic:

>>> StripeDType = np.dtype([('colour', ColourDType), ('width', np.uint16)])

Type-descriptor for vector of such elements:

>>> stripe_vector_td = xmlserdes.NumpyRecordVectorStructured(StripeDType, 'stripe')

Serialize a vector:

>>> stripes = np.array([((20, 30, 40), 100), ((120, 130, 140), 200)],
...                    dtype = StripeDType)
>>> print(xmlserdes.utils.str_from_xml_elt(
...           stripe_vector_td.xml_element(stripes, 'stripes'),
...           pretty_print = True).rstrip())
<stripes>
  <stripe>
    <colour>
      <red>20</red>
      <green>30</green>
      <blue>40</blue>
    </colour>
    <width>100</width>
  </stripe>
  <stripe>
    <colour>
      <red>120</red>
      <green>130</green>
      <blue>140</blue>
    </colour>
    <width>200</width>
  </stripe>
</stripes>

xmlserdes.NumpyVector(dtype, contained_tag=None)

Convenience function to create an instance of the appropriate xmlserdes.TypeDescriptor subclass, chosen from xmlserdes.NumpyAtomicVector and xmlserdes.NumpyRecordVectorStructured.

If a contained_tag is given, a xmlserdes.NumpyRecordVectorStructured is created. If not, a xmlserdes.NumpyAtomicVector is created.

>>> import numpy as np
>>> int_vector_td = xmlserdes.NumpyVector(np.int32)
>>> ColourDType = np.dtype([('red', np.uint8),
...                         ('green', np.uint8),
...                         ('blue', np.uint8)])
>>> colour_vector_td = xmlserdes.NumpyVector(ColourDType, 'colour')