Determine if a String is XML using Java and Regular Expressions

So again I am posting something I have to do every now and then, and each time I have to spend time checking the pattern or usage for it.

Once in a while, an app that does not do much XML, and therefore is not already using an XML parser of some kind, will need to at least determine whether a String is XML. With a pretty simple Regular Expression this is possible in plain old Java, without using any specific XML technology.

I know there are other references out there for doing this, but it is here below as a code sample, for my easy reference and maybe it will help someone else out, who knows. Enjoy.

Are we XML (like) data? :

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class test {

    /**
     * Return true if the String passed in is something like XML.
     *
     * @param inXMLStr a string that might be XML
     * @return true if the string is XML-like, false otherwise
     */
    public static boolean isXMLLike(String inXMLStr) {

        boolean retBool = false;
        Pattern pattern;
        Matcher matcher;

        final String XML_PATTERN_STR = "<(\\S+?)(.*?)>(.*?)</\\1>";

        // IF WE HAVE A STRING
        if (inXMLStr != null && inXMLStr.trim().length() > 0) {

            // IF WE EVEN RESEMBLE XML
            if (inXMLStr.trim().startsWith("<")) {

                pattern = Pattern.compile(XML_PATTERN_STR,
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

                // matches() must consume the whole input, so match the trimmed string
                matcher = pattern.matcher(inXMLStr.trim());
                retBool = matcher.matches();
            }
            // ELSE WE ARE FALSE
        }

        return retBool;
    }
}
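
To get a feel for what the expression accepts and rejects, here is a small sketch that runs the same pattern over a few sample strings. The class name XmlLikeDemo, the helper name isXmlLike and the sample strings are made up for illustration; the pattern and flags are the ones from the listing above. It is a quick way to see that this is a heuristic and not a well-formedness check.

```java
import java.util.regex.Pattern;

class XmlLikeDemo {

    // Same expression and flags as the listing above, compiled once and reused.
    private static final Pattern XML_PATTERN = Pattern.compile(
            "<(\\S+?)(.*?)>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

    // Hypothetical helper: trims the input, then requires the whole string to match.
    static boolean isXmlLike(String s) {
        if (s == null) {
            return false;
        }
        String trimmed = s.trim();
        return trimmed.startsWith("<") && XML_PATTERN.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a>hello</a>",                    // accepted: one element with a matching close tag
            "  <a href=\"x\">text</a>  ",      // accepted: surrounding whitespace is trimmed first
            "hello",                           // rejected: does not start with "<"
            "<a/>",                            // rejected: self-closing root has no close tag
            "<?xml version=\"1.0\"?><a>b</a>", // rejected: the XML declaration comes first
            "<a><b>x</a>"                      // accepted, although <b> is never closed
        };
        for (String sample : samples) {
            System.out.println(isXmlLike(sample) + "  " + sample);
        }
    }
}
```

The last three samples are the ones to keep in mind: the pattern only looks at the first tag and the final closing tag, so it can reject valid XML and accept broken XML.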


DOM Document – get or extract contained document (or Node) as XML Source

Something I have to do every once in a while, and can never remember how to do (especially under a tight deadline, with people standing over my shoulder asking “is it done yet, is it done?” “how much longer?” etc.), is to extract a fragment of a DOM document and get the XML source of the nested or contained document. So I am going to add a note here, for everyone’s easy reference.

The first step is to get a Node to be the Root Node of the new Document. You can do this using methods like Document’s getElementsByTagName(String) and Node.getChildNodes(), or using the selectSingleNode(Node n, String xPath) method of Apache Xalan’s XPathAPI or CachedXPathAPI class.
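
If Xalan is not on the classpath, the same kind of lookup can be sketched with the javax.xml.xpath package that ships with the JDK. Note that this swaps in the JDK XPath API in place of XPathAPI; the class name SelectNodeDemo, the sample XML and the XPath expression are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SelectNodeDemo {

    // Parses the XML text and returns the first node matching the XPath
    // expression, or null when nothing matches.
    static Node selectSingleNode(String xml, String xPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xPath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            // Wrap parser and XPath errors so callers do not need checked exceptions.
            throw new IllegalStateException("Could not select " + xPath, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<outer><inner><value>42</value></inner><other/></outer>";
        Node inner = selectSingleNode(xml, "/outer/inner");
        System.out.println(inner.getNodeName());
    }
}
```

The returned Node is still attached to the original Document, which is fine for the purpose here since it only needs to be read.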

Next we can use a StringWriter and a Transformer to convert the Node to XML Source. Better than a rambling explanation, a simple source example should do the trick. You can use a method something like the nodeToXMLString example below.

  // Needs java.io.StringWriter, org.w3c.dom.Node, and the javax.xml.transform,
  // javax.xml.transform.dom and javax.xml.transform.stream packages.
  private String nodeToXMLString(Node node) throws TransformerException {
    StringWriter sw = new StringWriter();

    // A Transformer created without a stylesheet simply copies the source to the result
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.transform(new DOMSource(node), new StreamResult(sw));

    return (sw.toString());
  }
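
Putting the two steps together, here is a minimal end-to-end sketch using only JDK classes. The class name NodeToXmlDemo, the helper extractFirst and the sample XML are made up for illustration, and setting OMIT_XML_DECLARATION is my own addition so that only the fragment itself comes back; drop that line if you want the XML declaration in front.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeToXmlDemo {

    // Serializes the node and everything below it to XML source.
    static String nodeToXMLString(Node node) throws TransformerException {
        StringWriter sw = new StringWriter();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // Assumption: the caller wants the bare fragment, without <?xml ...?> in front.
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        serializer.transform(new DOMSource(node), new StreamResult(sw));
        return sw.toString();
    }

    // Hypothetical helper: parses the XML text and returns the source of the
    // first element with the given tag name, or null if there is none.
    static String extractFirst(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName(tagName).item(0);
            return node == null ? null : nodeToXMLString(node);
        } catch (Exception e) {
            // Wrap parser and transformer errors so callers do not need checked exceptions.
            throw new IllegalStateException("Could not extract <" + tagName + ">", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<outer><inner><value>42</value></inner><other/></outer>";
        System.out.println(extractFirst(xml, "inner"));
    }
}
```

Running main should print the inner element together with its value child, and nothing from the surrounding outer element.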

Apache XMLBeans – output XML without a namespace

This is again something that I need to know how to do but never remember how when time is tight.

Apache XMLBeans is a great tool for working with XML in Java, but it requires the XML Schema used to create the objects from XML to have a Namespace. The namespace ends up being part of the package structure for the Objects created, and I guess having a unique path for these is a good idea. However, this requirement can be a bit of a pain when working with some simple XML structures that do not have a namespace, especially when you're ready to persist the XML Bean objects to XML source. If a namespace is added to the XML Schema that the XML Bean objects are created from, the XML source generated from them will by default also have the namespace. I can never remember how to output the XML source without a namespace, so I am writing it down here where I can get at it with a click, and maybe this will help someone else as well.

There are two key steps to remove the Namespace when outputting XML Source.

  1. Tell it to use the default namespace:
  2. Tell it that you have already declared the default namespace:
           dnsMap.put("", "");

After this, you can output as normal:

      retString = myXMLDoc.xmlText(xmlops);
      return retString;

Note: the research and testing to solve this was done using XMLBeans v2.4.0