Emerging Technology

Since our last Quarterly Meeting, Mark Daniels of Thomson Reuters has led two calls for our Emerging Technology work stream, on November 6th and December 4th. (The October and January Emerging Tech calls were canceled due to their proximity to the September quarterly meeting and the New Year's Day holiday, respectively.) The group also met for an in-person workshop on November 17th at Citi in New York.

The Emerging Tech calls prepared for and followed up on the topics covered in the Workshop. The Link-Back conversation has progressed to entitlement synchronization and API standards across publishers and aggregation vendors. The Componentization effort is progressing through the Pilot Program and is also taking a second look at RDFa.

Componentization

Based on recent work, Richard Brandt added a new page to our Wiki called “Pilot Specification” and populated it with objectives and a structure for the Pilot Program for RIXML Componentization.


Objectives
  • Value to buy-side consumers: What will Componentization deliver that can be leveraged to incrementally elevate the Research experience?
  • Implications to producers: What best practices should producers adopt to enable the value chain?
  • Opportunity for products that serve the buy-side: How can buy-side applications (vendor or proprietary) use componentization to add value?

Structure
  • Sample content (need sources)
  • Accessibility/security
  • Product online upload interface: RIXML, PDF, HTML5, XML
  • Manual report loading: RIXML, HTML5, XML
  • XML schema
  • HTML5 schema
  • RIXML base requirements
  • Spot Tag syntax (folksonomy): RIXML context, not so much social media context
  • Lifecycle syntax: attributes for actions
  • Use-cases: component types, UI (i.e. light box), component comparison, search, ratings, valuation survey, year ahead, morning call, etc.

The Componentization working group discussed progress and plans during an October 15th call. We resolved to focus on the descriptive rather than the prescriptive, and to pursue the objectives in that manner. The call resulted in some specific task assignments, as follows:

1. Assemble samples from the list of free sources compiled earlier in the year.

2. Tag the samples with component type labels as described in our Guidelines.

Recall that the team from Moody’s Analytics had proposed augmenting our component tagging with lifecycle attributes, such as unique component IDs, publication actions, date/time stamps, primary indicators, sequence numbers, and cardinality information. We dug further into this proposal via an in-person meeting at Moody’s World Trade Center location on October 9th and found some promising possibilities. Work on the Pilot Program, including the lifecycle attributes, flowed naturally into our subsequent Tech Workshop.

Tech Workshop

Citi hosted our November 17th Tech Workshop down on Greenwich Street. We followed the agenda outlined below.

During the Link-Back portion of the Workshop, we reviewed ideas about Identity & Access Management (IAM), entitlement synchronization, and API standards. The discussion of entitlement synchronization raised several questions. How are unique identifiers initially set up for a given user? Once stored for each channel in a broker database, how are they updated and regularly synchronized? Standardizing an API for this purpose would involve user attribute definitions, IAM data storage, and action definitions. Handshake specifications would need to cover message definitions, acknowledgements, credential conflict resolution, firm-level vs. user-level issues, data privacy provisions, and more.
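As a discussion aid, the sketch below shows, in TypeScript, one possible shape for the user attributes, action definitions, and acknowledgement messages such a handshake might cover. All names and fields here are hypothetical illustrations for this report, not part of any proposed RIXML standard.

// Hypothetical sketch of entitlement-synchronization message shapes.
// Field names and structure are illustrative only, not a RIXML standard.
interface EntitledUser {
  userId: string;            // unique identifier assigned by the publisher
  aggregatorUserId?: string; // identifier used by the aggregation vendor, if different
  firmId: string;            // firm-level identifier
  channels: string[];        // channels for which the identity is stored, e.g. "web", "mobile", "feed"
  entitlements: string[];    // entitlement codes for research packages
}

// A synchronization message exchanged between publisher and aggregator.
interface EntitlementSyncMessage {
  action: "create" | "update" | "delete"; // action definitions
  sentAt: string;                         // ISO 8601 date/time stamp
  user: EntitledUser;
}

// An acknowledgement, including a slot for credential-conflict resolution.
interface EntitlementSyncAck {
  received: boolean;
  conflict?: {
    reason: string;                       // e.g. the same credential stored for two channels
    resolution: "publisher-wins" | "aggregator-wins" | "manual-review";
  };
}

Even a thin shared definition along these lines would make explicit which attributes are firm-level versus user-level and what each side must acknowledge.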


Agenda
  • Link-Back Landscape
    • Entitlement Synchronization
    • Entitlement API Standards
    • Link-Back for PDF/Print/Offline vs. HTML/Online
  • Componentization
    • Expectations
    • Near-Term Deliverables
    • Pilot Program
  • Tech Topics "On the Horizon"
  • 2016 Goals & Deliverables

We’ve seen two different implementations of link generation, one from Citi and one from Thomson Reuters. Each builds a link (URL) from constituent parts and yields dynamic references to remote content in a way that facilitates Link-Back.
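Neither implementation has been published as a standard, so the following is only a minimal TypeScript sketch of the general pattern; the base URL and query parameter names are invented for illustration.

// Hypothetical illustration of assembling a Link-Back URL from constituent parts.
interface LinkBackParts {
  baseUrl: string;      // e.g. "https://research.example.com/linkback"
  documentId: string;   // publisher's document/product identifier
  componentId?: string; // optional component within the document
}

function buildLinkBackUrl(parts: LinkBackParts): string {
  const url = new URL(parts.baseUrl);
  url.searchParams.set("documentId", parts.documentId);
  if (parts.componentId) {
    url.searchParams.set("componentId", parts.componentId);
  }
  return url.toString();
}

// Example:
// buildLinkBackUrl({ baseUrl: "https://research.example.com/linkback", documentId: "ABC123" })
// yields "https://research.example.com/linkback?documentId=ABC123"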

Under the banner of workflow support, we talked about possibilities for handling PDF-based use-cases, such as batch printing and offline consumption. Citi agreed to itemize key workflows they've identified during their own implementation of Link-Back. We're very grateful to Citi, and especially to Sara Noble and Hammad Akbar, for leading this effort and for their continued contributions on this topic.

Moving on to the Componentization part of the Workshop, Richard Brandt demonstrated progress along the plan for the RIXML Componentization Pilot Program. Having assembled two sample documents from Edison, he converted both to HTML5 and tagged one of them according to the RIXML Guidelines. The second is in progress. Richard also began to create a cloud-based showcase environment for further developing and displaying the desired use cases.

Also, the basic framework for the enhanced lifecycle attribute syntax we’d been discussing has been prototyped, as shown below:

<body>
<article>
<section
    title="Sample Section Title"
    data-rixml-sectionID="6171896a-b527-4219-9bf3-9cf0ce9c647e"
    data-rixml-sectionValue="sampleSectionValue"
    data-rixml-sectionName="Sample Section Name"
    data-rixml-action="new | update | delete"
    data-rixml-sectionDateTime="YYYY-MM-DDThh:mm:ssTZD"
    data-rixml-sectionRestriction="public | internal | client | other values..."
    data-rixml-sectionReference="parent:9c2bb5db-3d0d-4411-8342-a2ae050b925c;previous:e0d7c275-7577-4581-b0c5-0634fe02f861">
</section>
</article>
</body>

Moody’s Analytics put together some slides proposing a second look at RDFa for Componentization, with illustrative examples. We’re grateful to Maribeth Martorana, Alex Shifrin, and John Armstrong for their initiative and thoughtful presentation.


Summary

  • Currently, RIXML Componentization implementation is limited to HTML5. HTML5 is a legitimate option and has great adoption across different browsers and devices.
  • We are looking at other, possibly more flexible options that work across different standards, including HTML5; the RIXML Componentization Guidance mentions exploring RDFa.
  • The proposal is to take a second look at RDFa as a complementary implementation for RIXML Components. RDFa provides the flexibility to work with many XML-based formats.

Benefits of RDFa for RIXML Components

  • Decoupled from the Delivery Method
    • Publisher Independence – Publishers can inject RDFa Component tags in existing delivery formats.
    • Each publisher is able to continue using existing processes/delivery formats.
    • RDFa can easily be extracted and queried as structured data.
  • Self Containment
    • Separate XML and HTML sections are not required for the same content.
    • RDFa would be inserted inline with the content, eliminating data duplication (a brief markup sketch follows this list).
    • Consumption is easier due to smaller file sizes.
  • RDFa Can Evolve and Grow
    • RDFa models are extensible in terms of defining a structure for components.
    • RDFa complements current standards (e.g. HTML5) and future standards that may emerge.
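To make the inline-tagging idea concrete, here is a minimal sketch of how the section metadata shown earlier might be carried as RDFa properties directly in the delivered markup. The vocabulary URI and property names are invented for illustration and do not come from the Moody's Analytics slides or any published RIXML guideline; the fragment is held in a TypeScript string constant to stay consistent with the other sketches in this report.

// Hypothetical RDFa sketch. The vocabulary URI and property names are invented;
// they are not part of the Moody's proposal or any published RIXML guideline.
const rdfaSectionSketch: string = `
<section vocab="https://www.rixml.org/vocab/component#" typeof="Section">
  <meta property="sectionID" content="6171896a-b527-4219-9bf3-9cf0ce9c647e" />
  <meta property="action" content="update" />
  <meta property="sectionDateTime" content="2015-11-17T09:30:00-05:00" />
  <h2 property="sectionName">Sample Section Name</h2>
  <p>The body text is tagged in place; no separate XML copy of the content is needed.</p>
</section>`;

Because the properties ride along with the content itself, an aggregator could extract and query them as structured data without a separate XML sidecar, which is the self-containment benefit noted above.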

2016 Objectives for the RIXML Organization

We’re carrying forward our organizational objectives from 2015 into the new year. Our top focus areas will continue to be Componentization and the Link-Back Landscape. Both of these areas have gained traction and we look forward to greater progress in the coming weeks.

Componentization
Finalize and productize our documentation detailing the guidelines for componentization agreed upon by our Working Group. Complete a pilot program to illustrate and exercise our ideas.

Link-Back Landscape
With many research publishers broadening their platforms to embrace various forms of digital content delivery, issues around Identity & Access Management (IAM) and workflow support arise between publishers, consumers, and aggregators. RIXML will continue our conversations about how to add value in this space.

Areas for consideration in a future release of the RIXML core schema receive ongoing attention, so that we’re prepared with our best ideas when the timing is right.

Social Media
Propose specific modifications to the RIXML schema to facilitate the inclusion of social media messages within both new and existing Research authoring and publishing platforms.

Side-Car Schemas
The release of RIXML schema version 2.4 in 2013 included a pair of “side-car” schemas intended for communicating analyst roster and coverage universe data. Explore the adoption of these schemas and evaluate meaningful updates and additions.

Spot Tags
Propose one or more specific solutions to address the need to avoid fragmentation of keywords in “breaking news” situations. RIXML should offer an easy method for research content publishers to tag new products with non-canonical keywords in a fashion consistent across publishers and in timeframes much shorter than the RIXML schema release cycle.

Identifying Authors and Documents
Discover opportunities to do a better job of uniquely and portably identifying authors and documents/products within RIXML. Further our relationships with ORCID and CrossRef.

Largely through our Emerging Technology work stream, the RIXML organization monitors new technology topics of importance to the Research marketplace. We try to keep our eyes and ears open for opportunities to add value by offering standardization ideas.

Big Data
Continue to monitor the opportunities for RIXML at the intersection of the Investment Research marketplace and the application of “Big Data” methods toward discovering actionable investment signals.

Within our Emerging Technology Working Group, we are in the early stages of exploring the potential synergies and benefits of JSON. As described on the JSON website (www.json.org):

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language.

JSON is built on two structures:

  • A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list, or associative array.
  • An ordered list of values. In most languages, this is realized as an array, vector, list, or sequence.

These are universal data structures. Virtually all modern programming languages support them in one form or another. It makes sense that a data format that is interchangeable with programming languages also be based on these structures.
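To ground this in our context, the sketch below shows how a componentized research product's metadata could be expressed using just those two structures, objects and arrays. The field names and values are entirely hypothetical; RIXML has not defined a JSON representation.

// Hypothetical illustration only: RIXML has not defined a JSON representation.
// The object below uses JSON's two structures: name/value pairs (objects) and
// ordered lists (arrays).
const sampleResearchProduct = {
  productId: "EXAMPLE-0001",
  title: "Sample Research Report",
  publishedDateTime: "2015-11-17T09:30:00-05:00",
  authors: [
    { name: "Jane Analyst" }
  ],
  components: [
    { sectionID: "6171896a-b527-4219-9bf3-9cf0ce9c647e", sectionName: "Sample Section Name", action: "new" },
    { sectionID: "e0d7c275-7577-4581-b0c5-0634fe02f861", sectionName: "Another Sample Section", action: "update" }
  ]
};

// JSON.stringify(sampleResearchProduct) produces the interchange text, and
// JSON.parse() reconstructs the object in any JSON-capable language.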

We are grateful to Mark Daniels, Chair, and the members of our Emerging Technology Working Group for their efforts and contributions to these discussions. This working group has a standing call on the first Friday of each month, held from 9:30am – 10:30am (EST), and we encourage RIXML.org members' representatives to participate.

The topic of ReactJS has also been introduced in discussions within our Technology Working Group.

React is a JavaScript library, created by Facebook and Instagram, for building user interfaces. Many people choose to think of React as the V in MVC.

In a recent post on blog.formidable.com, entitled “Using React is a Business Decision, Not a Technology Choice”, author Eric Baer writes:

React has become very popular among developers and there are lots of resources that speak to its technical merits. However, migrating to (or choosing) a new framework ultimately comes down to selling it to everybody at the table — including non-developers. There are very few engineering managers or PMs who would agree to a re-write just because it’s the shiniest new thing, and worse still, many organizations have been burned by the high churn in JavaScript tooling, which sadly moves projects backwards as a part of moving the web forward. This post is not an attempt to teach you something new about React. It is an attempt at the Executive Summary; a starting point for your pitch to try to sell not just developers but everybody on the wonders of React.

React Stability — Facebook is heavily invested in React (Newsfeed, Instagram, Messenger, Ads Marketplace etc.) and has a number of dedicated engineering resources working on the project. This dog-fooding and investment is not present in any of the competing frameworks. In addition to Facebook engineering, there is an enormous groundswell of excitement around React. There are currently 571 contributors (as of Dec 2015) to the project along with a conference series and a regular release cadence.

Summary: React is a library for building composable user interfaces. There are many other tools like Angular, Backbone, Knockout and Ember that do similar things but in comparison, React grew out of solving business problems rather than technical ones.
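For flavor only, here is a minimal sketch of a React component written in TypeScript (TSX). The props and markup are invented for this report; the sketch simply illustrates React's role as the view layer, rendering a hypothetical research section from data.

// Minimal React sketch in TypeScript/TSX. The props are hypothetical; this is
// not a RIXML specification, just an illustration of a composable view.
import * as React from "react";

interface ResearchSectionProps {
  sectionId: string;
  sectionName: string;
  body: string;
}

// Given data (props), the component returns the markup for one research section.
function ResearchSection({ sectionId, sectionName, body }: ResearchSectionProps) {
  return (
    <section data-rixml-section-id={sectionId}>
      <h2>{sectionName}</h2>
      <p>{body}</p>
    </section>
  );
}

export default ResearchSection;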

Our Technology Working Group discussions are often about “the next big thing”: what technologies lie on the horizon and how they may complement RIXML.org content tagging and construct discussions going forward. It is in this spirit that our working groups operate, sharing knowledge for the greater good of all members.

Interested in joining RIXML.org? Call our Program Office at 212-652-4470 or email us for additional information.

PLEASE NOTE: This viewpoint is entirely my own and is neither the official viewpoint of RIXML.org nor the viewpoint of any of its member organizations.

“The Human Face of Big Data”… Got A Smartphone? We ARE Big Data!

I came across a fascinating segment on a recent NPR broadcast highlighting the documentary “The Human Face of Big Data: The Promise and Perils of Growing a Planetary Nervous System”, directed by Sandy Smolan, an award-winning filmmaker whose work spans features, documentaries, television and commercials. He has directed over 50 network series, pilots and TV movies, and as the CEO of Luminous Content, he collaborates with corporations, foundations and NGOs to create character-driven content that focuses on how technology impacts lives around the world. The documentary’s Executive Producer is Sandy’s brother, Rick Smolan, who was interviewed for the NPR segment.

The real attention grabber at the start of the segment was a quote from Eric Schmidt, Chairman of Alphabet, on the impact of present-day data: “Every two days the human race is now generating as much data as we generated from the dawn of humanity through 2003.” To also quote Wavy Gravy: “Far Out, Man!”

The segment notes that, with the rapid growth of Internet-connected devices and sensors, an unstoppable, invisible force has begun changing human behavior in ways from the microscopic to the gargantuan. Billions of interconnected devices are generating tremendous amounts of “Big Data”, a term that was barely used a few years ago but which now affects almost every aspect of our lives, from the moment we first awaken to the extinguishing of the final late-evening light bulb. The segment further highlights that “The Human Face of Big Data” explores how the real-time visualization of data streaming in from satellites, billions of sensors, GPS-enabled cameras and smartphones (as carriers of these modern “human sensors”, we are all transmitters!) is beginning to enable us, as individuals and collectively as a society, to sense, measure and understand aspects of our existence in ways never possible before, offering up, for further thought:

  • Analyzing the “Digital Exhaust” to develop patterns and insights: is this not the daily role of the Research Analyst, perhaps soon to be the Investment Research Data Scientist? A very exciting career path in Research is at hand.
  • Leveraging the impact of cloud storage: analyzing once-incomprehensible data stores at once-unimaginably cost-effective price points.
  • Reckoning with the “Dislocated Cause & Effect” of information events: bridging, closing and eliminating time gaps toward “real-time” interpretation of data patterns as “events”. What next-generation research analyst would not be excited by these prospects? This development trend is very real and is at our doorstep.
  • Will these rapid developments force us to re-examine data ownership (whose data?), data privacy, 4th Amendment issues, selling the data, et al.? It seems the present Apple iPhone/US government dispute might pale in comparison to these emerging, large-scale issues down the road.

The very thought of the evolution of our organization, in breadth and depth, alongside the rapid evolution of our “planetary nervous system” (what a cool term) makes for a very exciting potential ride… hope you get aboard.

“Spring work is going on with joyful enthusiasm.” 
      John Muir, The Wilderness World of John Muir
 

Friday Topic Series

Our Friday Topic Series has concluded; however, we are in the process of making replays of the presentation portion of many of these meetings available. These videos include the list of questions on which we would like your input as we plan for RIXML v3.0, so feel free to watch them, let us know your thoughts, and share them with your colleagues as well!

Componentization

Entitlements

Tagging of Non-Standard Research