In this issue we welcome Morgan Stanley as a new Steering Member, highlight an article from our Associate Member MarkLogic and another from a recent Hitachi XBRL blog post, get an update from our Emerging Technology working group, note the release of schema v2.3.1 samples, and hear a perspective from Jack Roehrig, Executive Director of RIXML.org.


Morgan Stanley joins RIXML.org as a Steering Member

RIXML.org, a consortium of buy-side, sell-side and vendor firms committed to the development and implementation of the first open standard for investment research, is pleased to announce that Morgan Stanley has joined the organization as a Steering Member. Morgan Stanley is one of the financial industry's thought leaders in equity and fixed income research.

"RIXML.org welcomes Morgan Stanley as a Steering Member. They bring a great deal of expertise and innovative thinking to the research packaging and distribution space. We look forward to their contributions to our organization," said Jack Roehrig, Executive Director, RIXML.org.

"Morgan Stanley is pleased to work with the members of RIXML.org to enhance our ability to distribute research content to clients. Our goal is to make our research available to clients when, where and how they need it," said Eric Marks, Executive Director, Morgan Stanley.

As a Steering Member, Morgan Stanley will participate directly in RIXML technology working groups, alongside fellow Associate and Steering Members, to help develop future releases of the schema.

We look forward to our partnership with Morgan Stanley.


A purpose-built database for XML
Stephen Buxton - Product Manager, MarkLogic and co-author (with Jim Melton) of "Querying XML".

XML is becoming ubiquitous as a standard way to represent all kinds of information, from unstructured to semi-structured to weirdly-structured. At MarkLogic, we've built a database where XML is the data model and XQuery is the native language. That means it's designed differently from other databases, and it can take full advantage of the power of XML, at tremendous speed and scale.

XML is great at representing information in a flexible yet controllable way. Want to add a new piece of metadata, such as a publisher name? How about a whole chunk of metadata, including the first name and last name of the publisher, his location (street address and geo-codes), the date this item was published in each country, and its price in each currency? With XML there's a natural way to add that chunk of metadata to a document. If you enclose each part of this chunk with an XML tag that has a meaningful name, then the whole thing is self-describing - i.e. it's easy for a human reader or a computer program to see what each part of the metadata means. Of course, you wouldn't want people adding ad hoc tags to your documents - that way lies anarchy. XML Schema provides a way to control some parts of your XML while leaving other parts (such as free-flowing text with ad hoc presentational or semantic markup) unchained.
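As a small illustration of the self-describing metadata described above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The element names (document, publisher, firstName, and so on) and all of the values are invented for this example; they are not taken from RIXML or any other real schema.

```python
import xml.etree.ElementTree as ET


def add_publisher_metadata(doc: ET.Element) -> ET.Element:
    """Attach a hypothetical <publisher> metadata chunk to a document element."""
    publisher = ET.SubElement(doc, "publisher")

    # Each part of the chunk gets a tag with a meaningful name.
    ET.SubElement(publisher, "firstName").text = "Jane"
    ET.SubElement(publisher, "lastName").text = "Doe"

    # Location: street address plus geo-codes carried as attributes.
    location = ET.SubElement(publisher, "location")
    ET.SubElement(location, "street").text = "1 Example Street"
    ET.SubElement(location, "geo", lat="40.7", lon="-74.0")

    # One publication date per country and one price per currency.
    ET.SubElement(publisher, "published", country="US").text = "2011-03-01"
    ET.SubElement(publisher, "published", country="GB").text = "2011-03-15"
    ET.SubElement(publisher, "price", currency="USD").text = "25.00"
    ET.SubElement(publisher, "price", currency="GBP").text = "16.00"
    return publisher


# An existing document gains the new chunk without disturbing its other content.
doc = ET.fromstring("<document><title>Sample report</title></document>")
add_publisher_metadata(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Because every value sits inside a named tag, a program can pick out, say, the GBP price without knowing anything else about the document. In practice an XML Schema would decide whether a chunk like this is allowed and where.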

In the XML world you need to be able to store this information just as it is and make it available for search, analytics, and reporting; and you need to be able to ingest vast amounts of XML and still have search and analytics work in real-time. MarkLogic does just that. As information is loaded into the database, MarkLogic creates a patented Universal Index on every XML element and attribute, and on the hierarchical structure of the document, as well as indexing every word and phrase. MarkLogic can index range values for fast analytics, and geo-spatial values for location searches. And all these indexes can also be used to construct alerts, so the database can tell you the instant a document arrives that's of interest to you.

This is where XQuery comes in - now you can perform rich queries that combine structure, words, ranges, and even geo-spatial searches, very quickly over very large amounts of XML (from a few megabytes to hundreds of terabytes, even petabytes). With these rich queries you can quickly and easily build applications to search your XML documents, to repurpose that information into insightful reports, and to display the information in a way that's meaningful to each user. MarkLogic applications can be end-to-end XML, with queries and business logic written entirely in XQuery - no need to go through any non-XML layers - making them quick to build, flexible, and efficient. MarkLogic is in use today as the database for huge amounts of XML at financial institutions, publishing houses, and federal agencies.
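To make the idea of a query that combines structure, words, and ranges more concrete, here is a rough sketch in plain Python. It is not XQuery and it does not use MarkLogic's API; it simply scans a tiny in-memory collection with no indexes at all, so it shows only the shape of such a query, not how a database would execute it. The element names and sample reports are invented for the example.

```python
import xml.etree.ElementTree as ET

# Three hypothetical research reports held as XML strings.
DOCS = [
    "<report><sector>Energy</sector><price currency='USD'>25.00</price>"
    "<body>Oil demand is rising in emerging markets.</body></report>",
    "<report><sector>Energy</sector><price currency='USD'>80.00</price>"
    "<body>Oil supply outlook.</body></report>",
    "<report><sector>Retail</sector><price currency='USD'>10.00</price>"
    "<body>Holiday sales and oil costs.</body></report>",
]


def search(docs, sector, word, max_price):
    """Return parsed reports matching a structure, word, and range condition."""
    hits = []
    for raw in docs:
        doc = ET.fromstring(raw)
        # Structure: the value of a specific element.
        in_sector = doc.findtext("sector") == sector
        # Words: a case-insensitive whole-word match in the body text.
        has_word = word.lower() in doc.findtext("body", "").lower().split()
        # Range: a numeric comparison on the price element.
        price = float(doc.findtext("price", "inf"))
        if in_sector and has_word and price <= max_price:
            hits.append(doc)
    return hits


results = search(DOCS, sector="Energy", word="demand", max_price=50.0)
print([r.findtext("body") for r in results])
```

A linear scan like this becomes impractical long before the collection reaches the sizes mentioned above, which is the gap that indexing every element, word, and range value at load time is meant to close.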

For more information go to http://marklogic.com, or register for the MarkLogic user conference at http://www.marklogicevents.com/.


Interested in joining RIXML.org? Call our Program Office at 212-655-2945 or email us at rixml@jandj.org for additional information.


Realizing the RIXML.org/XBRL.org Partnership: It's ALL about the Workflow

Posted February 28, 2011, Hitachi Data Interactive Blog

RIXML.org (www.rixml.org) remains an organization dedicated to the adoption and implementation of an XML-based tagging standard for research data. Sell-side publishers of research, institutional buy-side consumers of research, and vendor intermediary channels, which are value-added distributors of research, all have a direct stake in the successful implementation of this standard. Why tagging and why now? A colleague of mine in RIXML.org recently passed on an observation that sums up the need for standardized structure (tagging) quite well: this is a transitional time for the research business. Research packaging and distribution decisions are taking on greater urgency because of two different sets of drivers:

  • Business drivers - content that adds to the investment decision making process needs to be clearly paid for. "Fuzzy" pricing models, on the hope of increased commissions and order flow, are fewer and further between. "Content in context" is paramount. Rich, deep tagging to identify such content will help to mitigate the opportunity costs of misplaced or irrelevant research to clients.
  • Technology drivers - these speak directly to the transitional time for the research business. I like to use the iTunes analogy. How many times has anyone searched for songs in iTunes when those songs' tags are absent or misplaced? Those songs will probably not make users' playlists. In the same way, research that is not tagged correctly or not tagged at all will not make consumers' "playlists". The proliferation of PDAs, iPhones, iPads, and their apps among institutional portfolio managers and consumers will only heighten the need to structure (tag) research in its proper context. This is a very exciting time for research to ride along on this technology toolset wave.

RIXML.org is a Direct Association member of XBRL.org. We see our relationship as highly complementary; we face off to XBRL at the working group level. XBRL is clearly designed to bring efficiencies to the research analyst workflow process, saving time in data gathering, reducing error rates, and embedding live extensible data filings into publisher workflow to produce more timely and effective earnings models, fundamental company updates, etc. Our RIXML members see the potential benefit of XBRL. While we are still in the early innings, early models are emerging to bring the two together in practical implementation. We expect these practical models to seek and find their right level in our business in 2011. The membership firms in RIXML.org will continue to benefit from greater understanding of the XBRL vendor applications that may serve as enablers for these workflow models.

RIXML.org: Emerging Technology Working Group Update

The kick-off session of our Emerging Technology working group took place in February. Many thanks to Richard Brandt of Quark for chairing the group. The membership is well represented, and the group reached consensus to focus on:

  • Process - leveraging new toolsets for the test and release of new schema versions
  • Schema Development - building consensus for new tag sets
  • Case Studies - formalizing and evangelizing use case sets for adoption/implementation
  • Content Packets - exploring/identifying potential research "component sets"
  • Smart Device Packaging/Delivery - in the PDA, apps, et al. space

This is a very exciting time for research to ride along the emerging technology toolset wave. The group will convene monthly, and we look forward to its findings.

The Release of RIXML.org Schema Version 2.3.1 Samples

http://www.rixml.org/newsite/specification.html

Our organization is grateful for the contributions of Alan Francis and FactSet in providing representative samples of the latest version of our schema, v2.3.1. The above link provides both basic and advanced versions of Fundamental Company, Economics, Industry and Morning Call reports.

In order to put our "best foot forward", we encourage the ongoing review and critique of current schema samples.

Perspective from Jack Roehrig, Executive Director, RIXML.org

PLEASE NOTE: This viewpoint is entirely my own and is neither the official viewpoint of RIXML.org nor the viewpoint of any of its member organizations.

"Mr. Watson, come here - I want to see you."

The above famous quote, from Alexander Graham Bell in 1876, ushered in the age of the telephone: Bell made the first telephone call in his Boston laboratory, summoning his assistant from the next room. While this "Watson" reference is the most famous, in recent months another "Watson" has emerged on the scene and caught the attention of trivia/knowledge experts, quiz show whizzes and artificial intelligence aficionados.

As per Wikipedia, this most recent "Watson" is an artificial intelligence computer system capable of answering questions posed in natural language, developed in IBM's DeepQA project by a research team led by principal investigator David Ferrucci. Watson was named for IBM's first president, Thomas J. Watson. Watson recently emerged victorious on the quiz show Jeopardy!, in the show's only human-versus-machine match up.

Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage. Watson was not connected to the Internet during the game. IBM describes the system as "an application of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning and Machine Learning technologies to the field of open domain question answering."

In the future, according to IBM, "the goal is to have computers start to interact in natural human terms across a range of applications and processes, understanding the questions that humans ask and providing answers that humans can understand and justify."

Not to get too cosmic here, but when you think about it, in some small way there is indeed a parallel between the goals of the "new world order Watson" and the work done within RIXML.org. Creating order out of the sometimes chaotic structure of research data and tags can help us understand the questions that humans ask, i.e., searching for relevant content across tag sets to get impactful "content in context", and provide answers (result sets, alerts, et al.) that users can understand and are willing to pay for (justify).

On a much larger scale, the bigger question to ponder, somewhere down the road: Is there a "Watson" in the Research future?

Over and out. Beam me up, Scotty...

“After you have exhausted what there is in business, politics, conviviality, and so on - have found that none of these finally satisfy, or permanently wear - what remains? Nature remains.”

— Walt Whitman 1819 - 1892