- XML (Extensible Markup Language) is a text-based language generally used for communication between different applications
- It is a standard means to transport and store data
- Java has a rich set of libraries to parse, modify, or query XML documents
- Topics
- Basic XML concepts
- Usage of Java based XML parsers
- Pre-requisites
- Basic concepts
- XML is a text-based language designed to store and transport data in plain-text format
- Salient features
- XML is a markup language
- XML is a tag-based language, like HTML
- Unlike HTML tags, XML tags are not predefined
- We can define our own tags (hence "extensible")
- XML tags are self-descriptive
- XML is a W3C recommendation for data storage and data transfer
- Example:
<?xml version="1.0"?>
<Class>
  <Name>First</Name>
  <Sections>
    <Section>
      <Name>A</Name>
      <Students>
        <Student>Rohan</Student>
        <Student>Mohan</Student>
        <Student>Sohan</Student>
        <Student>Lalit</Student>
        <Student>Vinay</Student>
      </Students>
    </Section>
    <Section>
      <Name>B</Name>
      <Students>
        <Student>Robert</Student>
        <Student>Julie</Student>
        <Student>Kalie</Student>
        <Student>Michael</Student>
      </Students>
    </Section>
  </Sections>
</Class>
- Technology agnostic: It is plain text, technology independent
- Can be used by any technology for data storage
- Human readable
- Extensible: Custom tags can be created and used easily
- Allows validation: documents can be validated against an XSD or a DTD
- Redundant syntax: usually contains a lot of repetitive terms
- Verbose: larger file sizes increase transmission and storage costs
- Parsing: Going through XML document to access or modify data
- Types of parsers commonly used to parse XML documents:
- DOM Parser: Parses the document by loading its complete contents into a hierarchical tree in memory
- SAX Parser: Parses the XML document using event-based triggers
- Does not load complete document into memory
- JDOM Parser: Parses XML in a manner similar to the DOM parser, but with an easier API
- StAX Parser: Parses XML in a manner similar to the SAX parser, but more efficiently
- XPath Parser: Parses XML based on expressions; usually used in conjunction with XSLT
- DOM4J Parser: Parses XML, XPath, and XSLT using the Java Collections framework
- Supports DOM, SAX and JAXP
- JAXB and XSLT APIs are also available to handle XML in an object-oriented way
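To make the event-based model concrete, here is a minimal sketch that counts `Student` elements with the JDK's built-in SAX parser. The class name `SaxStudentCounter` and the method `countStudents` are illustrative names chosen for this sketch, and the sample XML is a shortened version of the class document shown earlier.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class SaxStudentCounter {

    // Counts <Student> elements by reacting to start-element events.
    // The document is streamed; no tree is built in memory.
    static int countStudents(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if ("Student".equals(qName)) {
                    count[0]++;
                }
            }
        };

        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Class><Name>First</Name><Sections><Section><Name>A</Name><Students>"
                + "<Student>Rohan</Student><Student>Mohan</Student>"
                + "</Students></Section></Sections></Class>";
        System.out.println(countStudents(xml)); // prints 2
    }
}
```

Because the handler only sees one event at a time, any state needed across elements (here, the running count) has to be kept by the handler itself.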
- DOM is an official W3C recommendation
- It defines an interface that enables programs to access and update the style, structure, and contents of XML documents
- XML parsers that support DOM implement this interface
- Need to know a lot about the structure of the document
- Need to move parts of the XML document around
- Sorting certain elements
- ...
- Need to use the information in the XML document more than once
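Rearranging elements, such as the sorting case mentioned above, is where an in-memory tree helps. The following is a minimal sketch, assuming a `Students` root whose children are `Student` elements; the class name `DomSortSketch` and the method `sortedStudentNames` are illustrative, not part of any API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration only.
class DomSortSketch {

    // Sorts the child elements of the root by their text content,
    // then returns the names in their new document order.
    static List<String> sortedStudentNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element students = doc.getDocumentElement();

        // Copy the element children first: NodeList is live and changes as nodes move.
        List<Element> items = new ArrayList<>();
        NodeList children = students.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                items.add((Element) n);
            }
        }

        items.sort(Comparator.comparing(Element::getTextContent));

        // appendChild on a node already in the tree moves it to the end of the parent.
        for (Element e : items) {
            students.appendChild(e);
        }

        // Read the names back from the tree to confirm the new order.
        List<String> names = new ArrayList<>();
        for (Node n = students.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getTextContent());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Students><Student>Rohan</Student><Student>Mohan</Student>"
                + "<Student>Sohan</Student><Student>Lalit</Student><Student>Vinay</Student></Students>";
        System.out.println(sortedStudentNames(xml)); // prints [Lalit, Mohan, Rohan, Sohan, Vinay]
    }
}
```

A streaming parser could not do this in place, since it never holds more than the current event.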
- Parsing with DOM gives you a tree structure that contains all the elements of your document. DOM provides functions to examine the contents and structure of the document.
- Java code written for one DOM-compliant parser should run on any other DOM-compliant parser without modification
- Different DOM interfaces:
- Node: The base datatype of the DOM
- Element: The vast majority of the objects you deal with are Elements
- Attr: Represents an attribute of an element
- Text: The actual content of an Element or Attr
- Document: Represents the entire XML document
- Referred to as the DOM tree
- Methods:
- Document.getDocumentElement() - Returns the root element of the document
- Node.getFirstChild() - Returns the first child of the given Node
- Node.getLastChild() - Returns the last child of the given Node
- Node.getPreviousSibling() - Returns the previous sibling of the given Node
- Element.getAttribute(attrName) - For the given Element, returns the value of the attribute with the requested name
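The methods listed above can be exercised together in a short sketch. The `id` attribute and the names `DomNavigationSketch` and `describe` are assumptions made for this illustration. The sample XML is written without whitespace between tags, because otherwise `getFirstChild()` and `getLastChild()` would return whitespace text nodes rather than elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class for illustration only.
class DomNavigationSketch {

    // Walks the tree with the navigation methods and joins the results with '|'.
    static String describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Element root = doc.getDocumentElement();      // root element of the document
        Node first = root.getFirstChild();            // first child of the root
        Node last = root.getLastChild();              // last child of the root
        Node beforeLast = last.getPreviousSibling();  // sibling just before the last child

        // getAttribute is defined on Element, so the Node must be cast first.
        // The cast is safe here only because the sample has no whitespace text nodes.
        String lastId = ((Element) last).getAttribute("id");

        return root.getNodeName() + "|" + first.getTextContent() + "|"
                + last.getTextContent() + "|" + beforeLast.getTextContent() + "|" + lastId;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Students><Student id=\"1\">Rohan</Student>"
                + "<Student id=\"2\">Mohan</Student><Student id=\"3\">Sohan</Student></Students>";
        System.out.println(describe(xml)); // prints Students|Rohan|Sohan|Mohan|3
    }
}
```

For documents with indentation, checking `getNodeType() == Node.ELEMENT_NODE` before casting is the safer pattern.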
- Steps
- Import XML-related packages
- Create a DocumentBuilder
- Construct a Document from a file or stream
- Extract the root element
- Examine attributes
- Examine sub-elements
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
StringBuilder xmlStringBuilder = new StringBuilder();
xmlStringBuilder.append("<?xml version="