Speaking at SemTech 2011 about rNews


I will be speaking about rNews, the new standard from the IPTC for embedding metadata in on-line news content, at the upcoming SemTech conference in San Francisco. The session is on Wednesday, June 8, 2011, from 09:45 AM to 10:35 AM. Here is the abstract for the talk:

rNews: Embedding Metadata in On-line News

The IPTC, a consortium of the world’s major news agencies, news publishers and news industry vendors, recently released rNews, a semantic standard for on-line news. rNews uses RDFa to annotate HTML documents with news-specific metadata, to help with search, ad placement, aggregation and the sharing of on-line news. Jayson Lorenzen, a software engineer with Business Wire and one of the IPTC Member organization delegates working on rNews, will give an overview of the IPTC, the rNews standard, why rNews is needed and how the standard was eventually created. The talk will include use cases and live demonstrations of rNews and will end with a call to action for you to participate; rNews is currently at version 0.5 and the IPTC is looking for feedback on how to improve the standard.

Determine if a String is XML using Java and Regular Expressions

Once again I am posting something I have to do every now and then, and each time I end up spending time checking the pattern or usage for it.

Once in a while, an app that does not do much XML, and therefore is not already using an XML parser of some kind, will need to, at the least, determine if a String is XML. With a pretty simple Regular Expression, this is possible using plain old Java, without any specific XML technology.

I know there are other references out there for doing this, but it is here below as a code sample, for my easy reference and maybe it will help someone else out, who knows. Enjoy.

Are we XML (like) data? :

import java.util.regex.Pattern;
import java.util.regex.Matcher;


public class test {



    /**
     * return true if the String passed in is something like XML
     *
     *
     * @param inXMLStr a string that might be XML
     * @return true if the string is XML-like, false otherwise
     */
    public static boolean isXMLLike(String inXMLStr) {

        boolean retBool = false;
        Pattern pattern;
        Matcher matcher;

        // REGULAR EXPRESSION TO SEE IF IT AT LEAST STARTS AND ENDS
        // WITH THE SAME ELEMENT
        final String XML_PATTERN_STR = "<(\\S+?)(.*?)>(.*?)</\\1>";



        // IF WE HAVE A STRING
        if (inXMLStr != null && inXMLStr.trim().length() > 0) {

            // IGNORE LEADING AND TRAILING WHITESPACE FOR BOTH TESTS,
            // OTHERWISE PADDED XML PASSES THE FIRST TEST BUT FAILS THE SECOND
            final String trimmed = inXMLStr.trim();

            // IF WE EVEN RESEMBLE XML
            if (trimmed.startsWith("<")) {

                pattern = Pattern.compile(XML_PATTERN_STR,
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);

                // RETURN TRUE IF IT HAS PASSED BOTH TESTS
                matcher = pattern.matcher(trimmed);
                retBool = matcher.matches();
            }
        // ELSE WE ARE FALSE
        }

        return retBool;
    }



}
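
As a quick sanity check, here is a small sketch of how a check like this behaves on a few inputs. The class name XmlLikeDemo and the sample strings are made up for illustration. It uses the same regular expression, compiled once and reused, and it trims the input before matching. Only the DOTALL flag is set, on the assumption that tag names should be compared case-sensitively.

```java
import java.util.regex.Pattern;

public class XmlLikeDemo {

    // Same expression as in the post, compiled once so repeated calls do not recompile it.
    // Assumption: only DOTALL is set, so <a>...</A> is NOT treated as a matching pair.
    private static final Pattern XML_LIKE =
            Pattern.compile("<(\\S+?)(.*?)>(.*?)</\\1>", Pattern.DOTALL);

    public static boolean isXmlLike(String in) {
        if (in == null) {
            return false;
        }
        // Trim once and run both tests against the trimmed text.
        String trimmed = in.trim();
        return trimmed.startsWith("<") && XML_LIKE.matcher(trimmed).matches();
    }

    public static void main(String[] args) {
        System.out.println(isXmlLike("<a>hi</a>"));                         // true
        System.out.println(isXmlLike("  <a x=\"1\">\n<b/>\n</a>  "));       // true
        System.out.println(isXmlLike("hello"));                             // false
        System.out.println(isXmlLike("<a>hi</b>"));                         // false
        System.out.println(isXmlLike("<?xml version=\"1.0\"?><a>hi</a>"));  // false
    }
}
```

Note the last case: a string that starts with an XML declaration is rejected, because the expression expects the text to begin with the root element's opening tag and end with its matching closing tag. In the other direction, two sibling elements with the same name, such as <a>1</a><a>2</a>, pass even though that is not a well-formed document. So this really is an "XML-like" test and not a validity check.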

DOM Document – get or extract contained document (or Node) as XML Source

Something I have to do every once in a while, and can never remember how (especially when under some tight deadline, with people standing over my shoulder asking “is it done yet, is it done?” “how much longer?” etc.) is to extract a fragment of one DOM document to get the XML source of the nested or contained document. So I am going to add a note here, for everyone’s easy reference.

The first step is to get a Node to be the root node of the new document. You can do this with methods like Document’s getElementsByTagName(String) and Node.getChildNodes(), or with the selectSingleNode(Node n, String xPath) method of Xalan’s XPathAPI or CachedXPathAPI class.
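
To make this first step concrete, here is a minimal sketch of selecting the node. The sample XML and the class and method names are invented for illustration. The XPath variant uses the JDK's own javax.xml.xpath package in place of the Xalan classes, simply so the example runs without extra jars.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SelectNodeDemo {

    // Hypothetical sample document, used only for illustration.
    public static final String SAMPLE =
            "<envelope><header/><payload><order id=\"42\"><item>pen</item></order></payload></envelope>";

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Option 1: plain DOM, first element with the given tag name (null if there is none).
    public static Node byTagName(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0);
    }

    // Option 2: the JDK's javax.xml.xpath API, first node matching the expression (null if none).
    public static Node byXPath(Document doc, String expr) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(byTagName(doc, "order").getNodeName());                 // order
        System.out.println(byXPath(doc, "/envelope/payload/order").getNodeName()); // order
    }
}
```

Either way you end up with a plain org.w3c.dom.Node, which is all the serializing step needs.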

Next we can use a StringWriter and a Transformer to convert the Node to XML Source. Better than a rambling explanation, a simple source example should do the trick. You can use a method something like the nodeToXMLString example below.

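  import java.io.StringWriter;
  import javax.xml.transform.Transformer;
  import javax.xml.transform.TransformerException;
  import javax.xml.transform.TransformerFactory;
  import javax.xml.transform.dom.DOMSource;
  import javax.xml.transform.stream.StreamResult;
  import org.w3c.dom.Node;

  private String nodeToXMLString(Node node) throws TransformerException
  {
    StringWriter sw = new StringWriter();

    // AN IDENTITY TRANSFORMER SIMPLY COPIES THE NODE TO THE RESULT
    Transformer serializer = TransformerFactory.newInstance().newTransformer();
    serializer.transform(new DOMSource(node), new StreamResult(sw));

    return sw.toString();
  }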
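
One thing to watch for: the serializer typically writes an XML declaration in front of the fragment, which may not be what you want if the result is going to be embedded somewhere else. The sketch below is a variation on the method above with a flag for that, using the standard OutputKeys.OMIT_XML_DECLARATION property. The class name, the helper firstElement and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeToXmlDemo {

    // Hypothetical helper: parse a string and return the first element with the given tag name.
    public static Node firstElement(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Serializes a node; the declaration flag is the only addition to the method in the post.
    public static String nodeToXmlString(Node node, boolean omitDeclaration) {
        try {
            StringWriter sw = new StringWriter();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                    omitDeclaration ? "yes" : "no");
            serializer.transform(new DOMSource(node), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize node", e);
        }
    }

    public static void main(String[] args) {
        Node inner = firstElement("<outer><inner><v>x</v></inner></outer>", "inner");
        System.out.println(nodeToXmlString(inner, true));   // just the <inner> fragment
        System.out.println(nodeToXmlString(inner, false));  // same fragment, declaration first
    }
}
```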

Apache XMLBeans – output XML without a namespace

This is again something that I need to know how to do but never remember how when time is tight.

Apache XMLBeans (http://xmlbeans.apache.org/) is a great tool for working with XML in Java, but it requires the XML Schema used to generate the objects to have a namespace. The namespace ends up being part of the package structure for the generated objects, and I guess having a unique path for these is a good idea. However, this requirement can be a bit of a pain when working with simple XML structures that do not have a namespace, especially when you're ready to persist the XML Bean objects to XML source. If a namespace is added to the XML Schema that the XML Bean objects are created from, the XML source generated from them will by default also have the namespace. I can never remember how to output the XML source without a namespace, so I am writing it down here where I can get at it with a click, and maybe this will help someone else as well.

There are two key steps to remove the Namespace when outputting XML Source.

  1. Tell the XmlOptions object to use the default namespace:
           XmlOptions xmlops = new XmlOptions();
           xmlops.setUseDefaultNamespace();

  2. Tell it that you have already declared the default namespace:
           Map<String, String> dnsMap = new HashMap<String, String>();
           dnsMap.put("", "http://example.com/schemas/DefaultNnameSpace");
           xmlops.setSaveImplicitNamespaces(dnsMap);
      

After this, you can output as normal:

      xmlops.setSavePrettyPrint();
      xmlops.setSaveNamespacesFirst();
      retString = myXMLDoc.xmlText(xmlops);
      return retString;

Note: the research and testing to solve this was done using XMLBeans v2.4.0