26. From dictionary to XML and vice versa

Here, we’ll explore the [xmltodict] module, which allows you to convert:

  • an XML string into a dictionary;
  • a dictionary into an XML string;

Before the advent of JSON, web service responses were often in XML (eXtensible Markup Language). Furthermore, these web services often used SOAP (Simple Object Access Protocol), a protocol that usually runs on top of the web’s HTTP protocol. Currently (2020), web services are mostly of the REST (Representational State Transfer) type. The web services we have studied are of neither type, although they are definitely closer to REST than to SOAP. I prefer to say that they are of the ‘free’ or ‘unknown’ type because they do not follow all the rules of REST.

We’ll show how easy it is to transform our JSON client/server architectures into XML client/server architectures. All you need to do is use the [xmltodict] module.

We’ll start by installing it in a Python terminal:


(venv) C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\packages>pip install xmltodict
Collecting xmltodict
  Using cached xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Installing collected packages: xmltodict
Successfully installed xmltodict-0.12.0

Now that this is done, let’s look at an example of what we can do with this module:

The [xml_01] script is as follows:


from collections import OrderedDict

import xmltodict


# xmltodict.parse to convert XML to a dictionary. The dictionary must have a root
# the resulting dictionary is of type OrderedDict
# xmltodict.unparse to convert from a dictionary to XML

def ordereddict2dict(ordered_dictionary) -> dict:
    ...  # body shown later in this section


def transform(message: str, dictionary: dict):
    # logs
    print(f"\n{message}-------")
    print(f"dictionary={dictionary}")
    # dict -> xml
    xml1 = xmltodict.unparse(dictionary)
    print(f"xml={xml1}")
    # xml -> OrderedDict
    ordereddict_dictionary1 = xmltodict.parse(xml1)
    print(f"ordereddict_dictionary1={ordereddict_dictionary1}")
    # OrderedDict -> dict
    print(f"dict_dictionary1={ordereddict2dict(ordereddict_dictionary1)}")


# test 1
transform("test 1", {"name": "selene"})
# test 2
transform("test 2", {"family": {"father": {"first_name": "andré"}, "mother": {"first_name": "angèle"}, "last_name": "séléné"}})
# test 3
transform("test 3", {"family": {"last_name": "Selene", "father": {"first_name": "Andre"}, "mother": {"first_name": "Angele"},
                                 "hobbies": ["singing", "jogging"]}})
# test 4
transform("test 4", {'response': {
    'errors': ['GET method required with only the [married, children, salary] parameters', '[married] parameter missing',
                '[children] parameter missing', '[salary] parameter missing']}})
# test 5
transform("test 5", {'response': {
    'result': {'id': 0, 'married': 'yes', 'children': 2, 'salary': 50000, 'tax': 1384, 'discount': 384, 'surcharge': 0,
               'reduction': 347, 'rate': 0.14}}})
# test 6
transform("test 6", {"root": {'list': ["one", "two", "three"]}})
# test 7
transform("test 7", {"root": {'list': [{"one": [10, 11]}, {"two": [20, 21]}, {"three": [30, 31]}]}})
  • lines 14–25: the [transform] function takes a message to display [message] and a dictionary [dictionary];
  • line 16: the message is displayed;
  • line 17: the received dictionary is displayed;
  • lines 19–20: this dictionary is converted into an XML string, which is then displayed. The function that performs this operation is [xmltodict.unparse];
  • lines 21–23: the previous XML string is converted back into a dictionary, which is then displayed. The function that performs this task is [xmltodict.parse]. It does not produce a dictionary of type [dict] but one of type [OrderedDict] (imported on line 1);
  • lines 24–25: the resulting [OrderedDict] is converted to a [dict] using the (not yet written) function [ordereddict2dict]. This function works recursively. If certain values in the dictionary are of type [OrderedDict] or [list], the values of these collections are examined to determine whether they, too, are of type [OrderedDict]. If so, they are converted to type [dict]. Note that the [xmltodict.parse] function does not produce any dictionaries of type [dict];

Before examining the missing functions, let’s look at the results to see what we’re looking for:

Test 1 (lines 28–29) produces the following results:


test 1-------
dictionary={'name': 'selene'}
xml=<?xml version="1.0" encoding="utf-8"?>
<name>selene</name>
ordereddict_dictionary1=OrderedDict([('name', 'selene')])
dict_dictionary1={'name': 'selene'}
  • line 2: the dictionary being tested. Note an important point: the [xmltodict.unparse] function requires a dictionary with a single root key, i.e. of the form {'key': value}, where [value] can then be a dictionary or a simple type, possibly containing lists at deeper levels (as in tests 3, 4, 6 and 7);
  • lines 3–4: the XML string generated from the dictionary. It is preceded by the header [<?xml version="1.0" encoding="utf-8"?>\n], which is normally the first line of an XML file;
  • line 5: the [OrderedDict] type obtained by the [xmltodict.parse] function, which takes the preceding XML string as a parameter;
  • line 6: the dictionary of type [dict] obtained by applying the [ordereddict2dict] function to the previous type. This is the original dictionary from line 2;
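
The single-root requirement mentioned for line 2 can be illustrated with a short sketch. It assumes [xmltodict] is installed as shown earlier; the [wrap_root] helper and the [person] dictionary are hypothetical names introduced only for this illustration:

```python
import xmltodict


def wrap_root(data: dict, root: str = "root") -> dict:
    # hypothetical helper: place a dictionary with several top-level keys under one root key
    return {root: data}


person = {"first_name": "Andre", "last_name": "Selene"}

try:
    # two top-level keys: unparse cannot choose a root element
    xmltodict.unparse(person)
    error = None
except ValueError as exc:
    error = exc
print(f"error={error}")

# with a single root key, the conversion works
xml = xmltodict.unparse(wrap_root(person, "person"))
print(f"xml={xml}")
```

Wrapping the data under a root key of your choice is one simple way to satisfy the requirement; the same root key then has to be unwrapped after [xmltodict.parse].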

All other tests follow the same pattern and should help you understand how to convert from a dictionary to an XML string and then from that XML string back to the original dictionary.

The other tests yield the following results:


test 2-------
dictionary={'family': {'father': {'first_name': 'andré'}, 'mother': {'first_name': 'angèle'}, 'last_name': 'séléné'}}
xml=<?xml version="1.0" encoding="utf-8"?>
<family><father><first_name>andré</first_name></father><mother><first_name>angèle</first_name></mother><last_name>séléné</last_name></family>
ordereddict_dictionary1=OrderedDict([('family', OrderedDict([('father', OrderedDict([('first_name', 'andré')])), ('mother', OrderedDict([('first_name', 'angèle')])), ('last_name', 'séléné')]))])
dict_dictionary1={'family': {'father': {'first_name': 'andré'}, 'mother': {'first_name': 'angèle'}, 'last_name': 'séléné'}}

test 3-------
dictionary={'family': {'last_name': 'Selene', 'father': {'first_name': 'Andre'}, 'mother': {'first_name': 'Angele'}, 'hobbies': ['singing', 'jogging']}}
xml=<?xml version="1.0" encoding="utf-8"?>
<family><last_name>Selene</last_name><father><first_name>Andre</first_name></father><mother><first_name>Angele</first_name></mother><hobbies>singing</hobbies><hobbies>jogging</hobbies></family>
ordereddict_dictionary1=OrderedDict([('family', OrderedDict([('last_name', 'Selene'), ('father', OrderedDict([('first_name', 'Andre')])), ('mother', OrderedDict([('first_name', 'Angele')])), ('hobbies', ['singing', 'jogging'])]))])
dict_dictionary1={'family': {'last_name': 'Selene', 'father': {'first_name': 'Andre'}, 'mother': {'first_name': 'Angele'}, 'hobbies': ['singing', 'jogging']}}

test 4-------
dictionary={'response': {'errors': ['GET method required with only the [married, children, salary] parameters', '[married] parameter missing', '[children] parameter missing', '[salary] parameter missing']}}
xml=<?xml version="1.0" encoding="utf-8"?>
<response><errors>GET method required with only the [married, children, salary] parameters</errors><errors>[married] parameter missing</errors><errors>[children] parameter missing</errors><errors>[salary] parameter missing</errors></response>
ordereddict_dictionary1=OrderedDict([('response', OrderedDict([('errors', ['GET method required with only the [married, children, salary] parameters', '[married] parameter missing', '[children] parameter missing', '[salary] parameter missing'])]))])
dict_dictionary1={'response': {'errors': ['GET method required with only the [married, children, salary] parameters', '[married] parameter missing', '[children] parameter missing', '[salary] parameter missing']}}

test 5-------
dictionary={'response': {'result': {'id': 0, 'married': 'yes', 'children': 2, 'salary': 50000, 'tax': 1384, 'discount': 384, 'surcharge': 0, 'reduction': 347, 'rate': 0.14}}}
xml=<?xml version="1.0" encoding="utf-8"?>
<response><result><id>0</id><married>yes</married><children>2</children><salary>50000</salary><tax>1384</tax><discount>384</discount><surcharge>0</surcharge><reduction>347</reduction><rate>0.14</rate></result></response>
ordereddict_dictionary1=OrderedDict([('response', OrderedDict([('result', OrderedDict([('id', '0'), ('married', 'yes'), ('children', '2'), ('salary', '50000'), ('tax', '1384'), ('discount', '384'), ('surcharge', '0'), ('reduction', '347'), ('rate', '0.14')]))]))])
dict_dictionary1={'response': {'result': {'id': '0', 'married': 'yes', 'children': '2', 'salary': '50000', 'tax': '1384', 'discount': '384', 'surcharge': '0', 'reduction': '347', 'rate': '0.14'}}}

test 6-------
dictionary={'root': {'list': ['one', 'two', 'three']}}
xml=<?xml version="1.0" encoding="utf-8"?>
<root><list>one</list><list>two</list><list>three</list></root>
ordereddict_dictionary1=OrderedDict([('root', OrderedDict([('list', ['one', 'two', 'three'])]))])
dict_dictionary1={'root': {'list': ['one', 'two', 'three']}}

test 7-------
dictionary={'root': {'list': [{'one': [10, 11]}, {'two': [20, 21]}, {'three': [30, 31]}]}}
xml=<?xml version="1.0" encoding="utf-8"?>
<root><list><one>10</one><one>11</one></list><list><two>20</two><two>21</two></list><list><three>30</three><three>31</three></list></root>
ordereddict_dictionary1=OrderedDict([('root', OrderedDict([('list', [OrderedDict([('one', ['10', '11'])]), OrderedDict([('two', ['20', '21'])]), OrderedDict([('three', ['30', '31'])])])]))])
dict_dictionary1={'root': {'list': [{'one': ['10', '11']}, {'two': ['20', '21']}, {'three': ['30', '31']}]}}

Process finished with exit code 0
  • lines 23 and 26 highlight an important point:
    • line 23: the values associated with the keys of the [result] dictionary are numbers;
    • line 26: the values associated with the keys of the [ordereddict_dictionary1] dictionary are strings (the same holds for the [dict] version on line 27). This is a limitation of the [xmltodict] library: its [parse] function produces only strings. This is easily understood:
      • line 25: the XML string from which the dictionary is generated. In this string, there is no indication of the type of the data enclosed in the XML tags. [xmltodict.parse] does what makes the most sense: it leaves everything as strings in the resulting dictionary. Other libraries similar to [xmltodict] specify the type of the enclosed data in the tags. For example, one might find the tag [<children type='int'>2</children>];
      • as a consequence, when using a dictionary produced by the [xmltodict] module, one must know the type of the data it contains in order to convert each value from the ‘str’ type to its actual type;
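
The conversion from ‘str’ back to the actual types can be sketched as follows. This is only one possible approach: the [RESULT_SCHEMA] table and the [restore_types] function are hypothetical names, and the schema encodes what we happen to know about the fields of test 5:

```python
# values as xmltodict.parse returned them in test 5: everything is a string
parsed = {"response": {"result": {
    "id": "0", "married": "yes", "children": "2", "salary": "50000",
    "tax": "1384", "discount": "384", "surcharge": "0",
    "reduction": "347", "rate": "0.14"}}}

# hypothetical schema: field name -> function that restores the real type
# fields that are absent from the schema stay strings
RESULT_SCHEMA = {
    "id": int, "children": int, "salary": int, "tax": int,
    "discount": int, "surcharge": int, "reduction": int, "rate": float,
}


def restore_types(record: dict, schema: dict) -> dict:
    # apply the converter registered for each key, or keep the string unchanged
    return {key: schema.get(key, str)(value) for key, value in record.items()}


result = restore_types(parsed["response"]["result"], RESULT_SCHEMA)
print(f"result={result}")
```

With this schema, [result] is equal to the [result] dictionary that was passed to [transform] in test 5.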

Let's now take a closer look at the [ordereddict2dict] method, which converts an [OrderedDict] type to a [dict] type:


# xmltodict.parse to convert from XML to a dictionary. The dictionary must have a root
# the resulting dictionary is of type OrderedDict
# xmltodict.unparse to convert from a dictionary to XML

def check(value):
    # if the value is of type OrderedDict, we convert it
    if isinstance(value, OrderedDict):
        value2 = ordereddict2dict(value)
    # if the value is a list, convert it
    elif isinstance(value, list):
        value2 = list2list(value)
    else:
        # this is a simple type, not a collection
        value2 = value
    # return the new value
    return value2


def list2list(values: list) -> list:
    # the new list
    newlist = []
    # process the elements of the parameter list
    for value in values:
        # add value to the new list
        newlist.append(check(value))
    # return the new list
    return newlist


def ordereddict2dict(ordered_dictionary: OrderedDict) -> dict:
    # Convert OrderedDict to a dictionary recursively
    newdict = {}
    for key, value in ordered_dictionary.items():
        # store the value in the new dictionary
        newdict[key] = check(value)
    # return the dictionary
    return newdict
  • line 30: the [ordereddict2dict] function takes an [OrderedDict] type as a parameter;
  • line 32: the dictionary of type [dict] that will be returned on line 37 by the function;
  • line 33: we iterate over all the (key, value) tuples in the [ordered_dictionary] dictionary;
  • line 35: in the new dictionary, the key [key] is retained, but the associated value is not [value] but [check(value)]. The function [check(value)] is responsible for finding, if [value] is a collection, all elements of type [OrderedDict] and converting them to type [dict];

The [check] method is defined on lines 5–16:

  • line 5: we do not know the type of [value], so we could not write [value: type];
  • lines 7–8: if [value] is of type [OrderedDict], then we recursively call the function [ordereddict2dict] that we have just discussed;
  • lines 9–11: another possible case is that [value] is a list. In this case, on line 11, we call the function [list2list] from lines 19–27;
  • lines 12–14: the final case is that [value] is not a collection but a simple type. The function [check], like the functions [ordereddict2dict] and [list2list], is recursive. We know that in this case, we must always handle the situation where the recursion terminates. Lines 12–14 handle this case;
  • line 16: the [check] function, whether called recursively or not, produces a value [value2] that must replace the [value] parameter in line 5;

The [list2list] method defined in lines 19–27 processes a list passed as a parameter. It will traverse it and replace any [OrderedDict] values found within it with a [dict] type.

  • line 21: the new list that the function will create;
  • lines 23–25: each [value] in the list is visited and [check(value)] is appended to the new list. This [value] may itself contain elements of type [list] or [OrderedDict]. They will be handled correctly by the recursive function [check];
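
The three functions above can also be folded into a single recursive function. The sketch below is an alternative written for illustration, not the script's own code; [to_plain] is a hypothetical name. Because it tests for [dict] rather than [OrderedDict], it should also accept dictionaries that are already plain, which as far as I know is what more recent [xmltodict] releases return:

```python
import json
from collections import OrderedDict


def to_plain(value):
    # isinstance(value, dict) is also true for OrderedDict, which is a dict subclass
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_plain(item) for item in value]
    # simple type: the recursion stops here
    return value


# the OrderedDict printed for test 7
nested = OrderedDict([("root", OrderedDict([("list", [
    OrderedDict([("one", ["10", "11"])]),
    OrderedDict([("two", ["20", "21"])]),
    OrderedDict([("three", ["30", "31"])]),
])]))])

plain = to_plain(nested)
print(f"plain={plain}")

# a JSON round trip gives the same result when every value is JSON-serializable
assert json.loads(json.dumps(nested)) == plain
```

The JSON round trip is shorter to write but does more work and only applies to JSON-serializable values, so the explicit recursion remains the more general option.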