6. XML and Java
In this chapter, we introduce the use of XML documents with Java. We will do so in the context of the tax application studied in the previous chapter.
6.1. XML Files and XSL Style Sheets
Consider the following XML file, simulations.xml, which could represent the results of tax calculation simulations:
<?xml version="1.0" encoding="ISO-8859-1"?>
<simulations>
<simulation married="yes" children="2" salary="200000" tax="22504"/>
<simulation married="no" children="2" salary="200000" tax="33388"/>
</simulations>
When viewed with IE 6, the following result is obtained:

IE6 recognizes that it is dealing with an XML file (thanks to the file’s .xml extension) and formats it in its own way. With Netscape, you get a blank page. However, if you look at the source code (View/Source), you can see the original XML file:

Why doesn’t Netscape display anything? Because it needs a stylesheet to tell it how to transform the XML file into an HTML file that it can then display. It turns out that IE 6 has a default stylesheet when the XML file does not provide one, which was the case here.
There is a language called XSL (eXtensible Stylesheet Language) that allows you to describe the transformations needed to convert an XML file into any text file. XSL supports numerous instructions and closely resembles a programming language. We won’t go into detail here, as that would take dozens of pages. We’ll simply describe two examples of XSL stylesheets. The first is the one that transforms the XML file simulations.xml into HTML code. We modify simulations.xml so that it specifies the stylesheet that browsers can use to transform it into an HTML document, which they can then display:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="simulations.xsl"?>
<simulations>
<simulation married="yes" children="2" salary="200000" tax="22504"/>
<simulation married="no" children="2" salary="200000" tax="33388"/>
</simulations>
The XML instruction
<?xml-stylesheet type="text/xsl" href="simulations.xsl"?>
designates the simulations.xsl file as an xml-stylesheet of type text/xsl, i.e., a text file containing XSL code. This stylesheet will be used by browsers to transform the XML text into an HTML document. Here is the result obtained with Netscape 7 when loading the XML file simulations.xml:

When we view the document’s source code (View/Source), we see the original XML document rather than the displayed HTML document:

Netscape used the simulations.xsl stylesheet to transform the XML document above into a displayable HTML document. It is now time to look at the contents of this stylesheet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Tax Calculation Simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
<hr/>
<table border="1">
<th>married</th><th>children</th><th>salary</th><th>tax</th>
<xsl:apply-templates select="/simulations/simulation"/>
</table>
</center>
</body>
</html>
</xsl:template>
<xsl:template match="simulation">
<tr>
<td><xsl:value-of select="@married"/></td>
<td><xsl:value-of select="@children"/></td>
<td><xsl:value-of select="@salary"/></td>
<td><xsl:value-of select="@tax"/></td>
</tr>
</xsl:template>
</xsl:stylesheet>
- An XSL stylesheet is an XML file and therefore follows XML rules. Among other things, it must be "well-formed," meaning that every opening tag must be closed.
- The file begins with two lines that can be included in any XSL stylesheet, the XML declaration and the opening tag of the root element <xsl:stylesheet>:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The encoding="ISO-8859-1" attribute allows accented characters to be used in the stylesheet.
- The <xsl:output method="html" indent="yes"/> tag tells the XSL interpreter that you want to produce "indented" HTML.
- The <xsl:template match="element"> tag is used to define the element in the XML document to which the instructions found between <xsl:template ...> and </xsl:template> will be applied.
In the example above, the element "/" denotes the root of the document. This means that as soon as the start of the XML document is encountered, the XSL commands located between the two tags will be executed.
- Anything that is not an XSL tag is included as-is in the output stream. The XSL tags themselves are executed. Some of them produce a result that is included in the output stream. Let’s examine the following example:
<xsl:template match="/">
<html>
<head>
<title>Tax calculation simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
<hr/>
<table border="1">
<th>married</th><th>children</th><th>salary</th><th>tax</th>
<xsl:apply-templates select="/simulations/simulation"/>
</table>
</center>
</body>
</html>
</xsl:template>
Note that the XML document being analyzed is as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<simulations>
<simulation married="yes" children="2" salary="200000" tax="22504"/>
<simulation married="no" children="2" salary="200000" tax="33388"/>
</simulations>
From the start of the parsed XML document (match="/"), the XSL processor will output the text
<html>
<head>
<title>Tax calculation simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
<hr>
<table border="1">
<th>married</th><th>children</th><th>salary</th><th>tax</th>
Note that the stylesheet contains <hr/> and not <hr>. We could not write <hr> in the stylesheet: although it is a valid HTML tag, it is not valid XML, and the stylesheet is XML text that must be "well-formed," meaning that every tag must be closed. We therefore write <hr/>, and because we wrote <xsl:output method="html" .../>, the XSL processor transforms <hr/> into <hr>. This text is followed by the text produced by the XSL command:
<xsl:apply-templates select="/simulations/simulation"/>
We will see later what this text is. Finally, the interpreter will add the text:
</table>
</center>
</body>
</html>
The <xsl:apply-templates select="/simulations/simulation"/> directive instructs the XSL processor to apply the matching template to each /simulations/simulation element. It is executed every time the XSL interpreter encounters a <simulation>..</simulation> or <simulation/> tag within a <simulations>..</simulations> tag in the parsed XML text. Upon encountering such a <simulation> tag, the interpreter executes the instructions of the following template:
<xsl:template match="simulation">
<tr>
<td><xsl:value-of select="@married"/></td>
<td><xsl:value-of select="@children"/></td>
<td><xsl:value-of select="@salary"/></td>
<td><xsl:value-of select="@tax"/></td>
</tr>
</xsl:template>
Consider the following XML line:
<simulation married="yes" children="2" salary="200000" tax="22504"/>
The <simulation ..> line matches the select expression of the XSL instruction <xsl:apply-templates select="/simulations/simulation"/>. The XSL interpreter therefore looks for a template that applies to this element. It finds the template <xsl:template match="simulation"> and executes it. Recall that anything that is not an XSL command is passed through unchanged by the XSL interpreter, while XSL commands are replaced by the result of their execution. The XSL instruction <xsl:value-of select="@field"/> is thus replaced by the value of the "field" attribute of the parsed node (here, a <simulation> node). Processing the XML line above produces the following output:
XSL | output |
<tr><td> | <tr><td> |
<xsl:value-of select="@married"/> | yes |
</td><td> | </td><td> |
<xsl:value-of select="@children"/> | 2 |
</td><td> | </td><td> |
<xsl:value-of select="@salary"/> | 200000 |
</td><td> | </td><td> |
<xsl:value-of select="@tax"/> | 22504 |
</td></tr> | </td></tr> |
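The select expressions used by xsl:value-of are XPath expressions, so the same attribute lookups can be tried directly from Java. The following is only a minimal sketch: it assumes a JDK that ships the javax.xml.xpath package (Java 5 or later, which is newer than the JDK 1.4 used elsewhere in this chapter), and the class name XPathDemo is hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates XPath expressions on the simulations document.
public class XPathDemo {
    // Same content as simulations.xml (single quotes avoid escaping in Java strings)
    static final String XML =
        "<simulations>"
      + "<simulation married='yes' children='2' salary='200000' tax='22504'/>"
      + "<simulation married='no' children='2' salary='200000' tax='33388'/>"
      + "</simulations>";

    // Returns the string value of the XPath expression evaluated on XML
    static String eval(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        // @married of the first <simulation> element
        System.out.println(eval("/simulations/simulation[1]/@married"));
        // @tax of the second <simulation> element
        System.out.println(eval("/simulations/simulation[2]/@tax"));
    }
}
```

Inside a template, the expression @married is evaluated relative to the current <simulation> node; the sketch uses absolute paths only because it has no current node.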
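In total, the XML line
<simulation married="yes" children="2" salary="200000" tax="22504"/>
will be converted into the following HTML line:
<tr><td>yes</td><td>2</td><td>200000</td><td>22504</td></tr>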
All these explanations are a bit rudimentary, but it should now be clear to the reader that the following XML text:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simulations.xsl"?>
<simulations>
<simulation married="yes" children="2" salary="200000" tax="22504"/>
<simulation married="no" children="2" salary="200000" tax="33388"/>
</simulations>
accompanied by the following XSL stylesheet simulations.xsl:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Tax Calculation Simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
<hr/>
<table border="1">
<th>married</th><th>children</th><th>salary</th><th>tax</th>
<xsl:apply-templates select="/simulations/simulation"/>
</table>
</center>
</body>
</html>
</xsl:template>
<xsl:template match="simulation">
<tr>
<td><xsl:value-of select="@married"/></td>
<td><xsl:value-of select="@children"/></td>
<td><xsl:value-of select="@salary"/></td>
<td><xsl:value-of select="@tax"/></td>
</tr>
</xsl:template>
</xsl:stylesheet>
produces the following HTML text:
<html>
<head>
<title>Tax Calculation Simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
<hr>
<table border="1">
<th>married</th><th>children</th><th>salary</th><th>tax</th>
<tr>
<td>yes</td><td>2</td><td>200000</td><td>22504</td>
</tr>
<tr>
<td>no</td><td>2</td><td>200000</td><td>33388</td>
</tr>
</table>
</center>
</body>
</html>
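The transformation that the browser performs can also be reproduced from a Java program. The sketch below is an illustration, not part of the tax application: it assumes the standard javax.xml.transform API bundled with the JDK, uses a shortened version of the stylesheet (table rows only), and the class name XslDemo is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: applies an XSL stylesheet to an XML document.
public class XslDemo {
    static final String XML =
        "<simulations>"
      + "<simulation married='yes' children='2' salary='200000' tax='22504'/>"
      + "<simulation married='no' children='2' salary='200000' tax='33388'/>"
      + "</simulations>";

    // Shortened stylesheet: one table row per <simulation> element
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<table border='1'><xsl:apply-templates select='/simulations/simulation'/></table>"
      + "</xsl:template>"
      + "<xsl:template match='simulation'>"
      + "<tr><td><xsl:value-of select='@married'/></td>"
      + "<td><xsl:value-of select='@tax'/></td></tr>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the xml text with the xsl stylesheet and returns the result as a string
    static String transform(String xml, String xsl) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(XML, XSL));
    }
}
```

The exact line breaks of the generated HTML depend on the XSL processor, so the sketch makes no claim about the layout of the output, only about its content.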
The XML file simulations.xml, along with the stylesheet simulations.xsl, when viewed in a modern browser (here, Netscape 7), is displayed as follows:

6.2. Tax calculation application: version 6
6.2.1. The XML files and XSL stylesheets of the tax calculation application
Let’s return to the tax web application and modify it so that the response sent to clients is in XML format rather than HTML. This XML response will be accompanied by an XSL stylesheet so that browsers can display it. In the previous section, we presented:
- the simulations.xml file, which is a prototype of an XML response containing tax calculation simulations
- the simulations.xsl file, which will be the XSL stylesheet accompanying this XML response
We must also account for the case of a response containing errors. The prototype for the XML response in this case will be the following errors.xml file:
<?xml version="1.0" encoding="windows-1252"?>
<?xml-stylesheet type="text/xsl" href="errors.xsl"?>
<errors>
<error>error 1</error>
<error>error 2</error>
</errors>
The errors.xsl stylesheet used to display this XML document in a browser will be as follows:
<?xml version="1.0" encoding="windows-1252"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Tax Calculation Simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
</center>
<hr/>
The following errors occurred:
<ul>
<xsl:apply-templates select="/errors/error"/>
</ul>
</body>
</html>
</xsl:template>
<xsl:template match="error">
<li><xsl:value-of select="."/></li>
</xsl:template>
</xsl:stylesheet>
This stylesheet introduces an XSL command not yet encountered: <xsl:value-of select="."/>. This command outputs the value of the parsed node, in this case a <error>text</error> node. The value of this node is the text between the opening and closing tags, in this case "text".
The errors.xml code is transformed by the errors.xsl stylesheet into the following HTML document:
<html>
<head>
<title>Tax Calculation Simulations</title>
</head>
<body>
<center>
<h3>Tax Calculation Simulations</h3>
</center>
<hr>
The following errors occurred:
<ul>
<li>error 1</li>
<li>error 2</li>
</ul>
</body>
</html>
The errors.xml file, along with its style sheet, is displayed by a browser as follows:

6.2.2. The xmlsimulations servlet
We create an index.html file and place it in the impots application directory. The displayed page is as follows:

This HTML document is a static document. Its code is as follows:
<html>
<head>
<title>impots</title>
<script language="JavaScript" type="text/javascript">
function clear(){
// clear the form
with(document.frmImpots){
optMarie[0].checked=false;
optMarie[1].checked=true;
txtEnfants.value="";
txtSalary.value="";
}//with
}//clear
function calculate(){
// check the parameters before sending them to the server
var fields;
with(document.frmImpots){
// number of children
fields = /^\s*(\d+)\s*$/.exec(txtEnfants.value);
if(fields==null){
// the pattern does not match
alert("The number of children was not provided or is incorrect");
txtEnfants.focus();
return;
}//if
// salary
fields = /^\s*(\d+)\s*$/.exec(txtSalary.value);
if(fields==null){
// the pattern does not match
alert("Salary was not provided or is incorrect");
txtSalary.focus();
return;
}//if
// OK - submit
submit();
}//with
}//calculate
</script>
</head>
<body background="/impots/images/standard.jpg">
<center>
Tax Calculation
<hr>
<form name="frmImpots" action="/impots/xmlsimulations" method="POST">
<table>
<tr>
<td>Are you married?</td>
<td>
<input type="radio" name="optMarie" value="yes">yes
<input type="radio" name="optMarie" value="no" checked>no
</td>
</tr>
<tr>
<td>Number of children</td>
<td><input type="text" size="3" name="txtEnfants" value=""></td>
</tr>
<tr>
<td>Annual salary</td>
<td><input type="text" size="10" name="txtSalary" value=""></td>
</tr>
<tr></tr>
<tr>
<td><input type="button" value="Calculate" onclick="calculate()"></td>
<td><input type="button" value="Clear" onclick="clear()"></td>
</tr>
</table>
</form>
</center>
</body>
</html>
Note that the form data is posted to the URL /impots/xmlsimulations. This application is a Java servlet configured as follows in the impots application's web.xml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
"http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
...........
<servlet>
<servlet-name>xmlsimulations</servlet-name>
<servlet-class>xmlsimulations</servlet-class>
<init-param>
<param-name>xslSimulations</param-name>
<param-value>simulations.xsl</param-value>
</init-param>
<init-param>
<param-name>xslErrors</param-name>
<param-value>errors.xsl</param-value>
</init-param>
<init-param>
<param-name>DSNimpots</param-name>
<param-value>mysql-dbimpots</param-value>
</init-param>
<init-param>
<param-name>admimpots</param-name>
<param-value>admimpots</param-value>
</init-param>
<init-param>
<param-name>mdpimpots</param-name>
<param-value>mdpimpots</param-value>
</init-param>
</servlet>
........
<servlet-mapping>
<servlet-name>xmlsimulations</servlet-name>
<url-pattern>/xmlsimulations</url-pattern>
</servlet-mapping>
</web-app>
- The servlet is named xmlsimulations and is implemented by the xmlsimulations class.
- Its parameters are DSNimpots, admimpots, and mdpimpots, which are required to access the tax database. Additionally, it accepts two other parameters:
- xslSimulations, which is the name of the style sheet file that must accompany the XML response containing the simulations
- xslErrors, which is the name of the style sheet that must accompany the XML response containing any errors
- It has an alias, xmlsimulations, which makes it accessible via the URL http://localhost:8080/impots/xmlsimulations.
The skeleton of the xmlsimulations servlet is similar to that of the simulations servlet already discussed. The main difference is that it must generate XML instead of HTML. This will result in the removal of the JSP files used in previous applications. Their main role was to improve the readability of the generated HTML code by preventing it from being buried within the servlet’s Java code. This role is no longer necessary. The servlet has two types of XML code to generate:
- one for the simulations
- one for errors
We previously presented and examined the two types of XML responses to be provided in these two cases, as well as the style sheets that must accompany them. The servlet code is as follows:
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.util.regex.*;
import java.util.*;
public class xmlsimulations extends HttpServlet{
// instance variables
String errorMessage = null;
String xslSimulations = null;
String xslErrors = null;
String DSNimpots = null;
String admimpots = null;
String mdpimpots = null;
impotsJDBC impots = null;
//-------- GET
public void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException, ServletException{
// retrieve the write stream to the client
PrintWriter out = response.getWriter();
// specify the response type
response.setContentType("text/xml");
// the list of errors
ArrayList errors = new ArrayList();
// Did the initialization succeed?
if (errorMessage != null) {
// initialization failed: send the response with errors to the client
errors.add(errorMessage);
sendErrors(out, xslErrors, errors);
// Done
return;
}
// retrieve previous simulations from the session
HttpSession session = request.getSession();
ArrayList simulations = (ArrayList) session.getAttribute("simulations");
if (simulations == null) simulations = new ArrayList();
// retrieve the parameters of the current request
String optMarie = request.getParameter("optMarie"); // marital status
String txtChildren = request.getParameter("txtEnfants"); // number of children
String txtSalary = request.getParameter("txtSalary"); // annual salary
// Do we have all the expected parameters?
if(optMarie==null || txtChildren==null || txtSalary==null){
// parameters are missing
// send the response with errors
errors.add("Incomplete request. Parameters are missing");
sendErrors(out, xslErrors, errors);
// done
return;
}
// we have all the parameters - let's check them
// marital status
if( ! optMarie.equals("yes") && ! optMarie.equals("no")){
// error
errors.add("Invalid marital status");
}
// number of children
txtChildren = txtChildren.trim();
if(! Pattern.matches("^\\d+$",txtChildren)){
// error
errors.add("Incorrect number of children");
}
// salary
txtSalary = txtSalary.trim();
if(! Pattern.matches("^\\d+$",txtSalary)){
// error
errors.add("Incorrect salary");
}
if (errors.size() != 0) {
// if there are errors, report them
sendErrors(out, xslErrors, errors);
} else {
// no errors
try{
// we can calculate the tax due
int nbChildren = Integer.parseInt(txtChildren);
int salary = Integer.parseInt(txtSalary);
String txtTaxes = "" + impots.calculate(optMarie.equals("yes"), nbChildren, salary);
// add the current result to the previous simulations
String[] simulation = {optMarie.equals("yes") ? "yes" : "no", txtChildren, txtSalary, txtTaxes};
simulations.add(simulation);
// send the response with simulations
sendSimulations(out, xslSimulations, simulations);
} catch (Exception ex) {
// report the failure (for example, a number too large for an int)
// instead of silently sending an empty response
errors.add("Tax calculation failed: " + ex.getMessage());
sendErrors(out, xslErrors, errors);
}
}//if-else
// Put the list of simulations back into the session
session.setAttribute("simulations", simulations);
}//GET
//-------- POST
public void doPost(HttpServletRequest request, HttpServletResponse response)
throws IOException, ServletException{
doGet(request, response);
}//POST
//-------- INIT
public void init(){
// retrieve the initialization parameters
ServletConfig config = getServletConfig();
xslSimulations = config.getInitParameter("xslSimulations");
xslErrors = config.getInitParameter("xslErrors");
DSNimpots = config.getInitParameter("DSNimpots");
admimpots = config.getInitParameter("admimpots");
mdpimpots = config.getInitParameter("mdpimpots");
// Are the parameters OK?
if(xslSimulations==null || xslErrors==null || DSNimpots==null || admimpots==null || mdpimpots==null){
errorMessage="Incorrect configuration";
return;
}
// Create an instance of impotsJDBC
try{
impots = new impotsJDBC(DSNimpots, admimpots, mdpimpots);
} catch (Exception ex) {
errorMessage = ex.getMessage();
}
}//init
//-------- sendErrors
private void sendErrors(PrintWriter out, String xslErrors, ArrayList errors) {
String response="<?xml version=\"1.0\" encoding=\"windows-1252\"?>"
+ "<?xml-stylesheet type=\"text/xsl\" href=\""+xslErrors+"\"?>\n"
+"<errors>\n";
for(int i=0;i<errors.size();i++){
response+="<error>"+(String)errors.get(i)+"</error>\n";
}//for
response+="</errors>\n";
// send the response
out.println(response);
}
//-------- sendSimulations
private void sendSimulations(PrintWriter out, String xslSimulations, ArrayList simulations){
String response="<?xml version=\"1.0\" encoding=\"windows-1252\"?>"
+ "<?xml-stylesheet type=\"text/xsl\" href=\""+xslSimulations+"\"?>\n"
+ "<simulations>\n";
String[] simulation = null;
for (int i = 0; i < simulations.size(); i++) {
// simulation #i
simulation = (String[])simulations.get(i);
response += "<simulation "
+"married=\""+(String)simulation[0]+"\" "
+"children=\""+(String)simulation[1]+"\" "
+"salary=\""+(String)simulation[2]+"\" "
+"tax=\""+(String)simulation[3]+"\" />\n";
}//for
response+="</simulations>\n";
// send the response
out.println(response);
}
}
Let’s break down the main new features of this code compared to what we already knew:
- The init procedure retrieves new parameters from the web.xml configuration file: the names of the two XSL stylesheets that must accompany the response are stored in the variables xslSimulations and xslErrors. These two stylesheets are the simulations.xsl and errors.xsl files discussed earlier. They are located in the impots application directory:
dos>dir E:\data\serge\Servlets\impots\*.xsl
08/27/2002 08:15 1,030 simulations.xsl
08/27/2002 09:23 795 errors.xsl
- The GET procedure begins by checking whether an error occurred during initialization. If so, it calls the sendErrors procedure, which generates the appropriate XML response for that case and then terminates. The XML response includes a tag specifying the style sheet to be used.
- If no errors occurred, the GET procedure analyzes the parameters of the client’s request. If it finds any error, it reports it using the sendErrors procedure. Otherwise, it calculates the new simulation, adds it to the previous ones stored in the current session, and finishes by sending its XML response via the sendSimulations procedure. The latter proceeds in a manner analogous to the sendErrors procedure.
- Note that the servlet declares its response as type text/xml:
response.setContentType("text/xml");
Here are some examples of execution. The initial form is filled out as follows:

The MySQL database has not been started, making it impossible to construct the impots object in the servlet’s init procedure. The servlet’s response is therefore as follows:

The code received by the browser (View/Source) is as follows:

If we now run two more simulations after starting the MySQL database, we get the following result:

This time, the browser received the following code:

Note that our new application is simpler than before due to the removal of the JSP files. Some of the work previously done by these pages has been transferred to the XSL stylesheets. The advantage of our new division of tasks is that once the XML format of the servlet’s responses has been established, the development of the stylesheets is independent of that of the servlet.
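One point deserves attention when XML is built by string concatenation, as sendErrors and sendSimulations do: a message or value containing characters such as < or & would make the response ill-formed. The following is a minimal sketch of an escaping helper that could be applied to each value before it is inserted. The class XmlEscape and its methods are hypothetical names and are not part of the servlet shown above.

```java
// Hypothetical helper: escapes the characters that have a special meaning in XML.
public class XmlEscape {
    // Replaces &, <, >, " and ' with the corresponding predefined XML entities
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds one <error> element the way sendErrors does, with escaping added
    static String errorElement(String message) {
        return "<error>" + escape(message) + "</error>";
    }

    public static void main(String[] args) {
        System.out.println(errorElement("salary < 0 & children > 20"));
    }
}
```

The same escape method would apply to the attribute values written by sendSimulations, since it also handles quotation marks.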
6.3. Parsing an XML Document in Java
Versions 7 and 8 of our impots application will be clients programmed for the previous xmlsimulations servlet. These clients will receive XML code that they will need to parse to extract the information they need. We will now take a break from our various versions to learn how to parse an XML document in Java. We will do this using an example included with JBuilder 7 called MySaxParser. The program is called as follows:
dos>java MySaxParser URI
The MySaxParser application accepts one parameter: the URI (Uniform Resource Identifier) of the XML document to be parsed. In our example, this URI will simply be the name of an XML file located in the MySaxParser application directory. Let’s consider two examples of execution. In the first example, the XML file being parsed is errors.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="errors.xsl"?>
<errors>
<error>error 1</error>
<error>error 2</error>
</errors>
The analysis yields the following results:
dos> java MySaxParser errors.xml
Start of document
Start of <errors> element
Start of <error> element
[error 1]
End of <error> element
Start of <error> element
[error 2]
End of <error> element
End of <errors> element
End of document
We haven't yet explained what the MySaxParser application does, but here we can see that it displays the structure of the parsed XML document. The second example parses the XML file simulations.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simulations.xsl"?>
<simulations>
<simulation married="yes" children="2" salary="200000" tax="22504"/>
<simulation married="no" children="2" salary="200000" tax="33388"/>
</simulations>
The analysis yields the following results:
dos>java MySaxParser simulations.xml
Start of document
Start of <simulations> element
Start of <simulation> element
married = yes
children = 2
salary = 200000
tax = 22504
End of <simulation> element
Start of <simulation> element
married = no
children = 2
salary = 200000
tax = 33388
End of <simulation> element
End of <simulations> element
End of document
The MySaxParser class contains everything we need in our tax application since it was able to retrieve both the errors and the simulations that the web server might send. Let’s examine its code:
import java.io.IOException;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
import java.util.regex.*;
// the class
public class MySaxParser extends DefaultHandler {
// value of an element in the XML tree
private StringBuffer value = new StringBuffer();
// a regular expression for the value of an element when you want to ignore
// the "whitespace" preceding or following it
private static Pattern ptnValue = null;
private static Matcher results = null;
// -------- main
public static void main(String[] argv) {
// Check the number of parameters
if (argv.length != 1) {
System.out.println("Usage: java MySaxParser [URI]");
System.exit(0);
}
// retrieve the URI of the XML file to parse
String uri = argv[0];
try {
// create an XML parser
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
// Tell the parser which object will implement the methods
// startDocument, endDocument, startElement, endElement, characters
MySaxParser MySaxParserInstance = new MySaxParser();
parser.setContentHandler(MySaxParserInstance);
// initialize the value pattern for an element
ptnValue = Pattern.compile("^\\s*(.*?)\\s*$");
// tell the parser which XML document to parse
parser.parse(uri);
}
catch(Exception ex) {
// error
System.err.println("Error: " + ex);
// trace
ex.printStackTrace();
}
}//main
// -------- startDocument
public void startDocument() throws SAXException {
// procedure called when the parser encounters the start of the document
System.out.println("Start of document");
}//startDocument
// -------- endDocument
public void endDocument() throws SAXException {
// procedure called when the parser reaches the end of the document
System.out.println("End of document");
}//endDocument
// -------- startElement
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// procedure called by the parser when it encounters a start tag
// uri: namespace URI of the element (empty string if it has none)
// localName: name of the element, without any namespace prefix
// qName: name of the element as written in the document, prefix included
// attributes: list of the element's attributes
// continued
System.out.println("Start of element <"+localName+">");
// Does the element have attributes?
for (int i = 0; i < attributes.getLength(); i++) {
System.out.println(attributes.getLocalName(i) + " = " + attributes.getValue(i));
}//for
}//startElement
// -------- characters
public void characters(char[] ch, int start, int length) throws SAXException {
// procedure called repeatedly by the parser when it encounters text
// between two <tag>text</tag> tags
// the text is in ch starting from the start character for length characters
// the text is added to the buffer value
value.append(ch, start, length);
}//characters
// -------- endElement
public void endElement(String uri, String localName, String qName)
throws SAXException {
// procedure called by the parser when it encounters an end tag
// uri: namespace URI of the element (empty string if it has none)
// localName: name of the element, without any namespace prefix
// qName: name of the element as written in the document, prefix included
// display the element's value
String strValue = value.toString();
if (ptnValue == null) System.out.println("null");
results = ptnValue.matcher(strValue);
if (results.find() && ! results.group(1).equals("")){
System.out.println("[" + results.group(1) + "]");
}//if
// set the element's value to empty
value.setLength(0);
// next
System.out.println("End of element <"+localName+">");
}//endElement
}//class
First, let’s define an acronym that frequently appears in XML document analysis: SAX, which stands for Simple API for XML. It is a set of Java classes that facilitate working with XML documents. There are two versions of the API: SAX1 and SAX2. The application above uses the SAX2 API.
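The application imports a number of packages:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
The first two come with JDK 1.4, but the third does not. It is provided by the xerces.jar archive, which is available from the Apache project's website. This archive comes with JBuilder 7 as well as with Tomcat 4.x: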

So if you want to compile the previous application outside of JBuilder 7 and you have JDK 1.4 and Tomcat 4.x, you can write:
dos>javac -classpath ".;E:\Program Files\Apache Tomcat 4.0\common\lib\xerces.jar" MySaxParser.java
When running the application, do the same:
dos>java -classpath ".;E:\Program Files\Apache Tomcat 4.0\common\lib\xerces.jar" MySaxParser simulations.xml
The MySaxParser class extends the DefaultHandler class. We’ll come back to that later. Let’s examine the code for the main procedure:
// retrieve the URI of the XML file to be parsed
String uri = argv[0];
try {
// create an XML parser
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
// Tell the parser which object will implement the methods
// startDocument, endDocument, startElement, endElement, characters
MySaxParser MySaxParserInstance = new MySaxParser();
parser.setContentHandler(MySaxParserInstance);
// initialize the value pattern for an element
ptnValue = Pattern.compile("^\\s*(.*?)\\s*$");
// tell the parser which XML document to parse
parser.parse(uri);
}
catch(Exception ex) {
// error
System.err.println("Error: " + ex);
// trace
ex.printStackTrace();
}
To parse an XML document, our application needs an XML parser:
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
The XML parser used is the one provided by the xerces.jar package. The object returned is of type XMLReader. XMLReader is an interface, and we use two of its methods here:
method | role |
void setContentHandler(ContentHandler handler) | tells the parser which ContentHandler object will handle the events it generates while parsing the XML document |
void parse(String systemId) | starts parsing the XML document whose URI is passed as a parameter |
When the parser analyzes the XML document, it emits events such as: "I have encountered the start of the document, the start of a tag, the content of a tag, the end of a tag, the end of the document, ...". It passes these events to the ContentHandler object that has been provided to it. ContentHandler is an interface that defines the methods to be implemented to handle all the events that the XML parser can generate. DefaultHandler is a class that provides a default implementation of these methods. The methods implemented in DefaultHandler do nothing, but they exist. When we need to tell the parser which object will handle the events it generates using the statement
parser.setContentHandler(...);
it is convenient to pass an object of type DefaultHandler as a parameter. If we stopped there, no parser events would be handled, but our program would be syntactically correct. In practice, we pass the parser an object derived from the DefaultHandler class, in which only the methods handling the events that interest us are redefined. This is what is done here:
// we tell the parser which object will implement the methods
// startDocument, endDocument, startElement, endElement, characters
MySaxParser MySaxParserInstance = new MySaxParser();
parser.setContentHandler(MySaxParserInstance);
// we tell the parser which XML document to parse
parser.parse(uri);
We pass the parser an instance of the MySaxParser class, which is our own class and was defined earlier by the declaration
public class MySaxParser extends DefaultHandler
and we start parsing the document whose URI was passed as a parameter. From there, the parsing of the XML document begins. The parser emits events and, for each one, calls a specific method of the object responsible for handling these events, in this case our MySaxParser object. This object handles five specific events; the others are ignored:
event emitted by the parser | handling method |
start of the document | void startDocument() |
end of the document | void endDocument() |
start of an element | public void startElement(String uri, String localName, String qName, Attributes attributes). uri: namespace URI of the element, or an empty string if the element belongs to no namespace. localName: name of the parsed element without any namespace prefix. If the encountered element is <simulations>, localName will be "simulations". qName: qualified name of the parsed element. An XML document can declare a namespace prefix, such as XX. The qualified name of the preceding tag would then be XX:simulations. attributes: list of the tag's attributes. |
text content of an element | public void characters(char[] ch, int start, int length). ch: character array. start: index of the first character to use in the ch array. length: number of characters to take from the ch array. The characters method can be called several times for the same element. To construct the value of an element, we use a buffer to which each call appends its characters and which is read when the end of the element is reached. |
end of an element | void endElement(String uri, String localName, String qName). The parameters are the same as those of the startElement method. |
The startElement method allows you to retrieve the element's attributes using the attributes parameter of type Attributes:
- The number of attributes is available in attributes.getLength()
- The name of attribute i is available in attributes.getLocalName(i)
- The value of attribute i is available in attributes.getValue(i)
- The value of the attribute named name is available in attributes.getValue(name)
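The attribute accessors listed above can be tried in isolation with a small handler. The following is a minimal sketch rather than the program studied in this chapter: the class name AttributeLister and its list method are hypothetical, and it assumes the SAX parser bundled with the JDK (obtained through SAXParserFactory) instead of the xerces.jar parser used in the text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class: records every attribute of every element it meets.
public class AttributeLister extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // walk the attributes by index, as described in the list above
        for (int i = 0; i < attributes.getLength(); i++) {
            lines.add(localName + "." + attributes.getLocalName(i) + "=" + attributes.getValue(i));
        }
    }

    // Parses the XML text and returns one "element.attribute=value" line per attribute.
    public static List<String> list(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is filled in
            XMLReader parser = factory.newSAXParser().getXMLReader();
            AttributeLister handler = new AttributeLister();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.lines;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<simulations>"
            + "<simulation married=\"yes\" children=\"2\" salary=\"200000\" tax=\"22504\"/>"
            + "</simulations>";
        for (String line : list(xml)) {
            System.out.println(line);
        }
    }
}
```

Only startElement is redefined here; every other event falls through to the empty methods inherited from DefaultHandler.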
With this explained, the previous program and its execution examples are self-explanatory. A regular expression was used to retrieve the values of the elements so that an XML text such as:
<error>
error 1
</error>
returns the text "error 1" as the value associated with the <error> tag, stripped of any spaces and line breaks that might precede and/or follow it.
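The effect of this regular expression can be checked outside any parser. The sketch below reuses the pattern from the program (^\s*(.*?)\s*$); the class name TrimValue and the trim method are hypothetical names introduced for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: applies the chapter's pattern to a raw element value.
public class TrimValue {
    // same pattern as ptnValue in the program above
    private static final Pattern PTN_VALUE = Pattern.compile("^\\s*(.*?)\\s*$");

    // Returns the value without leading/trailing whitespace, or "" if nothing matches.
    public static String trim(String raw) {
        Matcher results = PTN_VALUE.matcher(raw);
        return results.find() ? results.group(1) : "";
    }

    public static void main(String[] args) {
        // the text between <error> and </error>, line breaks included
        String raw = "\n   error 1\n";
        System.out.println("[" + trim(raw) + "]"); // prints [error 1]
    }
}
```

Because the dot does not match line terminators by default, a value that itself spans several lines does not match this pattern and yields an empty result here. That is acceptable for one-line error messages, but it is worth keeping in mind.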
6.4. Tax calculation application: Version 7
We now have all the elements to write clients for our tax service that delivers XML. We’ll use version 4 of our application for the client and keep version 6 for the server. In this client-server application:
- the tax calculation simulation service is handled by the xmlsimulations servlet. The server’s response is therefore in XML format, as we saw in version 6.
- the client is no longer a browser but a standalone Java client. Its graphical interface is that of version 4.
Here are a few examples of the application in action. First, an error scenario: the client queries the xmlsimulations servlet even though it failed to initialize correctly because the MySQL DBMS was not running:

We start MySQL and run a few simulations:

The client in this new version differs from the version 4 client only in how it processes the server’s response. Nothing else changes. In version 4, the client received HTML code from which it extracted the information it needed using regular expressions. Here, the client receives XML code from which it retrieves the information it needs using an XML parser.
Let’s review the main steps of the procedure associated with the Calculate menu in version 4 of our client, since that is where the changes are primarily taking place:
void mnuCalculate_actionPerformed(ActionEvent e) {
....
try{
// calculate the tax
calculateTaxes(taxURL, rdYes.isSelected(), numberOfChildren.intValue(), salary);
} catch (Exception ex) {
// display the error
JOptionPane.showMessageDialog(this, "The following error occurred: " + ex.getMessage(), "Error", JOptionPane.ERROR_MESSAGE);
}
....
}//mnuCalculate_actionPerformed
public void calculateTaxes(URL taxURL, boolean married, int numberOfChildren, int salary)
throws Exception{
// tax calculation
// taxURL: URL of the tax service
// married: true if married, false otherwise
// numberOfChildren: number of children
// salary: annual salary
// retrieve the information needed to connect to the tax server from taxURL
....
try{
// connect to the server
....
// Create the TCP client's input and output streams
....
// request the URL - send HTTP headers
....
// read the first line of the response
....
// read the response until the end of the headers, looking for any cookies
while((response = IN.readLine()) != null) {
.... }//while
// the HTTP headers are finished; move on to the HTML code
// to retrieve the simulations
ArrayList simulationList = getSimulations(IN, OUT, simulations);
simulations.clear();
for (int i = 0; i < simulationList.size(); i++) {
simulations.addElement(simulationList.get(i));
}
// Done
....
}//calculateTaxes
private ArrayList getSimulations(BufferedReader IN, PrintWriter OUT, DefaultListModel simulations) throws Exception{
....
}
All of this code remains valid in the new version. Only the processing of the server's HTML response (the call to getSimulations and the loop that copies its result into the display list, above) needs to be replaced by the processing of the server's XML response and its display:
// That's it for the HTTP headers; now we move on to the XML code
// to retrieve the simulations or errors
ImpotsSaxParser parser = new ImpotsSaxParser(IN);
ArrayList errorList = parser.getErrors();
ArrayList simulationList = parser.getSimulations();
// Close connection to the server
client.close();
// Clear the display list
simulations.clear();
// errors
if(errorList.size()!=0){
// concatenate all errors
String errorMessage = "The server reported the following errors:\n";
for(int i=0;i<errorList.size();i++){
errorMessage+=" - "+(String)errorList.get(i);
}
// display errors
throw new Exception(errorMessage);
}//if
// simulations
for (int i=0; i < simulationList.size(); i++) {
simulations.addElement(simulationList.get(i));
}
return;
What does the code snippet above do?
- It creates an XML parser and passes it the IN stream, which contains the XML code sent by the server. This stream also contained the HTTP headers, but these have already been read and processed. Therefore, only the XML portion of the response remains. The parser produces two lists of strings: the list of errors, if there were any, or the list of simulations. These two lists are mutually exclusive.
- If the list of errors is not empty, the messages in the list are concatenated into a single error message, and an exception is thrown with that message as its parameter. This exception is displayed in the mnuCalculate_actionPerformed procedure that called calculateTaxes.
- If the list of simulations is not empty, it is displayed in the jList component of the graphical user interface.
Let’s now explore the parser for the server’s XML response, a parser that stems directly from our previous study on how to parse an XML document in Java:
import java.io.IOException;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
import java.util.regex.*;
import java.io.*;
import java.util.*;
import javax.swing.*;
// the class
public class ImpotsSaxParser extends DefaultHandler {
// value of an element in the XML tree
private StringBuffer value = new StringBuffer();
// a regular expression for the value of an element when you want to ignore
// the "whitespace" preceding or following it
private Pattern ptnValue = null;
private Matcher results = null;
// lists of XML elements
private ArrayList simulationList = new ArrayList();
private ArrayList errorList = new ArrayList();
// XML elements
private ArrayList elements = new ArrayList();
String element = "";
// -------- constructor
public ImpotsSaxParser(BufferedReader IN) throws Exception{
// Create an XML parser
XMLReader parser = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
// tell the parser which object will implement the methods
// startDocument, endDocument, startElement, endElement, characters
parser.setContentHandler(this);
// initialize the value pattern for an element
ptnValue = Pattern.compile("^\\s*(.*?)\\s*$");
// Initially, there is no current XML element
elements.add("");
// parse the document
parser.parse(new InputSource(IN));
}//constructor
// -------- startElement
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
// procedure called by the parser when it encounters a start tag
// uri: namespace URI of the element (empty string if it has none)
// localName: name of the element currently being parsed
// qName: same as above, but "qualified" by a namespace if one exists
// attributes: list of the element's attributes
// note the element's name
element = localName.toLowerCase();
elements.add(element);
// Does the element have attributes?
if(element.equals("simulation") && attributes.getLength()==4){
// It is a simulation: retrieve the attributes
String simulation = attributes.getValue("married") + "," +
attributes.getValue("children") + "," +
attributes.getValue("salary") + "," +
attributes.getValue("tax");
// Add the simulation to the list of simulations
simulationList.add(simulation);
}//if
}//startElement
// -------- characters
public void characters(char[] ch, int start, int length) throws SAXException {
// procedure called repeatedly by the parser when it encounters text
// between two <tag>text</tag> tags
// the text is in ch starting from the start character for length characters
// the text is added to the buffer value if it is the error element
if (element.equals("error"))
value.append(ch, start, length);
}//characters
// -------- endElement
public void endElement(String uri, String localName, String qName)
throws SAXException {
// procedure called by the parser when it encounters an end tag
// uri: namespace URI of the element (empty string if it has none)
// localName: name of the element currently being parsed
// qName: same as above, but "qualified" by a namespace if one exists
// error case
if(element.equals("error")){
// retrieve the value of the error element
String strValue = value.toString();
// remove unnecessary whitespace and store it in the list of
// errors if it is not empty
results = ptnValue.matcher(strValue);
if (results.find() && ! results.group(1).equals("")){
errorList.add(results.group(1));
}//if
}
// set the element's value to empty
value.setLength(0);
// reset the element's name
elements.remove(elements.size()-1);
element = (String)elements.get(elements.size() - 1);
}//endElement
// --------- getErrors
public ArrayList getErrors(){
return errorList;
}
// --------- getSimulations
public ArrayList getSimulations(){
return simulationList;
}
}//class
- The constructor receives the IN XML stream to be parsed and immediately performs this parsing. Once this is complete, the object has been constructed, and the lists (ArrayList) of errors (errorList) and simulations (simulationList) have been created. All that remains for the procedure that constructed the object is to retrieve the two lists using the getErrors and getSimulations methods.
- Only three events generated by the XML parser are of interest here:
  - The start of an XML element, an event handled by the startElement procedure. This procedure handles the tags <simulation married=".." children=".." salary=".." tax=".."> and <error>...</error>.
  - The value of an XML element, an event handled by the characters procedure.
  - The end of an XML element, an event handled by the endElement procedure.
- In the startElement procedure, if we are dealing with the <simulation married=".." children=".." salary=".." tax=".."> element, we retrieve the four attributes using attributes.getValue("attribute name"). In all cases, we store the element name in a variable element and add it to a list (ArrayList) of elements: elem1, elem2, ..., elemN. This list is managed as a stack, where the last element is the XML element currently being parsed. When the "end of element" event occurs, the last element in the list is removed and the new current element is set. This is done in the endElement procedure.
- The characters procedure is identical to the one studied in a previous example. We simply take care to verify that the current element is indeed the <error> element, a precaution that is normally unnecessary here. This type of precaution was also taken in the startElement procedure to verify that we were dealing with a <simulation> element.
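The stack of element names described above can be sketched compactly. This is an illustrative variant, not the class listed earlier: the name ResponseSketch is hypothetical, an ArrayDeque replaces the ArrayList used as a stack, String.trim() stands in for the regular expression, the JDK's built-in SAX parser replaces Xerces, and the <errors> root element in the sample is an assumption about the server's error response.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the response parser, reading from a String.
public class ResponseSketch extends DefaultHandler {
    private final Deque<String> elements = new ArrayDeque<>(); // top = current element
    private final StringBuilder value = new StringBuilder();
    private final List<String> errors = new ArrayList<>();
    private final List<String> simulations = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        elements.push(localName); // the new element becomes the current one
        if (localName.equals("simulation") && attributes.getLength() == 4) {
            simulations.add(attributes.getValue("married") + "," + attributes.getValue("children")
                + "," + attributes.getValue("salary") + "," + attributes.getValue("tax"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // may be called several times for one element: accumulate
        if ("error".equals(elements.peek())) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("error".equals(elements.peek())) {
            String text = value.toString().trim();
            if (!text.isEmpty()) {
                errors.add(text);
            }
        }
        value.setLength(0);
        elements.pop(); // the enclosing element becomes current again
    }

    public List<String> getErrors() { return errors; }
    public List<String> getSimulations() { return simulations; }

    public static ResponseSketch parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is filled in
            XMLReader parser = factory.newSAXParser().getXMLReader();
            ResponseSketch handler = new ResponseSketch();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // assumed shape of an error response
        ResponseSketch r = parse("<errors><error>\n  error 1\n</error></errors>");
        System.out.println(r.getErrors());
    }
}
```

Unlike the class above, this sketch separates parsing from construction with a static parse method. That keeps the constructor free of I/O, but it is a design preference rather than something the application requires.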
6.5. Conclusion
Thanks to its XML response, the tax application has become easier to manage for both its designer and the designers of client applications.
- The design of the server application can now be entrusted to two types of people: the Java developer of the servlet and the graphic designer who will manage the appearance of the server response in browsers. The latter simply needs to know the structure of the server’s XML response to build the stylesheets that will accompany it. Note that these are contained in separate XSL files independent of the Java servlet. The UI designer can therefore work independently of the Java developer.
- Client application designers, too, simply need to know the structure of the server’s XML response. Any changes the graphic designer might make to the style sheets have no impact on this XML response, which always remains the same. This is a huge advantage.
- How can the developer update their Java servlet without breaking everything? First of all, as long as the XML response remains unchanged, they can organize the servlet however they like. They can also update the XML response as long as they retain the <error> and <simulation> elements expected by their clients. They can thus add new tags to this response. The front-end developer will account for them in their style sheets, and browsers will be able to receive the new versions of the response. Programmatic clients, however, will continue to function with the old model, as the new tags are simply ignored. For this to work, the tags being searched for must be clearly identified in the XML parsing of the server’s response. This is what was done in our XML client for the tax application, where the procedures specifically stated that we were processing the <error> and <simulation> tags. As a result, the other tags are ignored.
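The last point can be illustrated with a client that looks only at the tags it knows. In the sketch below, the class name TaxOnlyClient and the <generated> and <note> tags of the extended response are hypothetical, and the JDK's built-in SAX parser is assumed. The same handler is run on an old-style and an extended response and extracts the same tax values from both.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical client: only <simulation> elements are examined.
public class TaxOnlyClient extends DefaultHandler {
    private final List<String> taxes = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // any element other than <simulation> is silently ignored
        if ("simulation".equals(localName)) {
            String tax = attributes.getValue("tax");
            if (tax != null) {
                taxes.add(tax);
            }
        }
    }

    // Returns the tax attribute of every <simulation> element in the XML text.
    public static List<String> taxes(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is filled in
            XMLReader parser = factory.newSAXParser().getXMLReader();
            TaxOnlyClient handler = new TaxOnlyClient();
            parser.setContentHandler(handler);
            parser.parse(new InputSource(new StringReader(xml)));
            return handler.taxes;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String simulation1 = "<simulation married=\"yes\" children=\"2\" salary=\"200000\" tax=\"22504\"/>";
        String simulation2 = "<simulation married=\"no\" children=\"2\" salary=\"200000\" tax=\"33388\"/>";
        String oldResponse = "<simulations>" + simulation1 + simulation2 + "</simulations>";
        // extended response: <generated> and <note> are invented tags
        String newResponse = "<simulations><generated by=\"server-v2\"/>" + simulation1
            + "<note>rounded to the nearest unit</note>" + simulation2 + "</simulations>";
        System.out.println(taxes(oldResponse));
        System.out.println(taxes(newResponse));
    }
}
```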