Looking ahead: AI and NLP implications for research
At the September 2023 all-member meeting, we picked up a discussion that began in June, when a member noted that the impact of artificial intelligence and natural language processing (AI/NLP) on investment research is becoming a topic of interest. At the September quarterly meeting, we spent some time identifying ways that AI and NLP will or could affect how investment research is created, distributed, tagged, cataloged, aggregated, filtered, and discovered. This is clearly an important topic for our member firms and our industry, and our goal at the September meeting was simply to begin identifying some of the aspects we may want to examine as an organization.
While we did not have a set agenda for this part of the meeting, we certainly had plenty to discuss! Because RIXML all-member meetings involve buyside, sellside, and vendor firms, and a mix of technologists, business-side representatives, and topic experts, we were able to have a robust brainstorming session, which will serve as the beginning of RIXML’s work on this topic.
Below is a summary of what we heard. A huge thank you to Kathleen Stowe of Jordan & Jordan for taking great notes during this fast-paced discussion!
The AI/NLP landscape is rapidly evolving
- While early output was dry and fairly easy to identify as machine-generated, it is rapidly becoming more sophisticated.
- The increased sophistication of the writing style, grammar, etc. is not always matched by the veracity of the information – just because text sounds like it was written by an expert does not mean the information is accurate. Use of AI/NLP tools, at least for the foreseeable future, will require increased attention to fact-checking.
Implications for structured data
- AI/NLP tools might affect the structured tagging that RIXML tag files provide for investment research reports in one of two ways: they could replace the need for structured tagging, or they could leverage it, making accurate structured tagging even more important.
- For RIXML v3.0, we are determining how to allow publishers to indicate whether AI-powered tagging has been included in any particular RIXML tag file. We are looking at adding tag source and tag confidence, and will likely begin by adding this capability for 8-12 key tags.
- We would welcome input from any AI/NLP experts regarding additional considerations.
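As a rough illustration of the tag-source and tag-confidence idea described above, a per-tag provenance annotation might look something like the fragment below. Note that the element and attribute names (`Tag`, `tagSource`, `tagConfidence`) are hypothetical placeholders for discussion, not part of any published RIXML schema.

```xml
<!-- Hypothetical sketch only: names are illustrative, not published RIXML. -->
<Tags>
  <!-- A tag applied by a human analyst -->
  <Tag name="Sector" value="Energy" tagSource="Human"/>

  <!-- A tag applied by an AI/NLP pipeline, carrying a confidence score
       so that consumers can decide whether to trust or re-verify it -->
  <Tag name="Ticker" value="XYZ" tagSource="AI" tagConfidence="0.87"/>
</Tags>
```

Carrying source and confidence alongside each tag would let downstream aggregators and filters treat AI-applied tags differently (for example, re-verifying low-confidence tags) without discarding them outright.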
AI-facilitated content creation
As tools become available to speed up the research process and the process of writing content, firms will need to consider when, whether, and how to incorporate them into the research creation process.
- While these tools may help create a basic structure for a report or an investment idea, the analyst will always be responsible for the content they produce. One person observed that senior analysts already do this when working with junior analysts; a challenge is that AI-generated content may sound more sophisticated, leading to a false sense of data accuracy.
- At present, many firms have decided that these tools cannot be used in the research creation process at all, because the risk of incorrect data passing through the review/authorization process is too high.
- For AI/NLP tools to be incorporated into the research creation process, there will need to be a mechanism to identify what parts of a research report were created using these tools, so that Supervisory Analysts, compliance teams, etc. can identify parts of the report that may need additional review.
- Because this is a rapidly-developing area, we will be keeping an eye out for enhancements that provide the level of confidence needed. Once there are ways to link from AI-generated sentences to the input data used to create them, it may be possible to efficiently fact-check this content.
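One way the identification mechanism described above could work, sketched here with entirely hypothetical markup (none of these names exist in RIXML or any firm's schema today), is a per-section provenance flag inside the report body:

```xml
<!-- Hypothetical sketch: a per-section provenance flag so that review teams
     can locate AI-assisted passages. All names are illustrative only. -->
<ReportBody>
  <Section id="s1" contentSource="Human">
    Analyst-written thesis and recommendation...
  </Section>
  <Section id="s2" contentSource="AI-Assisted" reviewStatus="PendingSAReview">
    AI-drafted background paragraph, flagged for additional fact-checking...
  </Section>
</ReportBody>
```

A compliance workflow could then route every section marked AI-assisted to a Supervisory Analyst for additional review before publication.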
AI/NLP-facilitated customization and summarizations
- One potential use of these tools is to generate multiple, customized versions of a research report: an analyst might write a research report for distribution to investment professionals and use an AI tool to create a version aimed at the general public.
- Another potential use of these tools is to speed up the process of creating summaries, abstracts, and overviews.
- There is also interest in the potential to train an NLP tool on one analyst’s research reports so that it writes in that analyst’s style; while this does not address the data-accuracy concerns, it is one example of using highly curated input data to shape the output.
Considerations regarding input data
- Who owns the output?
- How should sources be identified and cited?
- How can accuracy be ensured?
- Who assumes the risk for the output?
- Some buyside firms may want to use the sellside research that they are entitled to as input data into tools they could use to generate investment ideas. Sellside firms would likely want to (and/or need to, for compliance reasons) ensure that when their content is used as input data, the connection between the original and the output is maintained.
- At what point does the content creator’s responsibility end from a compliance standpoint?
- At what point does their ownership of the generated idea end?
- Research departments will need to determine how to know when content they are using as input data was created using AI.
- At the moment, many firms have decided that the only input data they are comfortable using is content that they have created.
AI-facilitated idea generation and research
- Investment professionals are also interested in how AI can speed up the idea generation process by analyzing large amounts of data, identifying trends, spotting connections, etc. In some situations, this comes very close to content creation, but in other situations, it simply represents a much faster way to sift through large amounts of data to provide insights.
- Buyside firms indicated that they have strong relationships with their sellside partners, and value and trust their ideas. While they would like to use AI to help them sift through large amounts of research, they indicated that it would be critical for them to be able to readily identify the source of any investment idea such a tool produces, because it is the analysts’ expertise that they value, not the AI tools.
- Identifying content that was created using AI tools will also be important. Analysts obtain their information from a wide variety of sources; as more of these sources leverage AI tools, authors will need to be more diligent about ensuring that their sources are accurate. As tools to create content get more sophisticated, so will the tools available to identify and analyze such content.
- Tom Jordan (Jordan & Jordan) pointed out that FINRA and the SEC are always looking for feedback and input. If RIXML makes progress on any of these topics, we might want to reach out to them. Jordan & Jordan has experience in this area, including on topics discussed in the Financial Information Forum, another industry association they manage.
- It will be important to keep the required disclaimers and other fine print connected to information that travels through AI tools.
- Much of what we discussed involved the need to maintain trust. For an AI tool to be useful, firms would need to feed it trusted data, have its output link each assertion back to the underlying data sources, and ensure that the tool ingests only authorized content and delivers output only to authorized users.
One thing became clear during our discussion: there is a lot to learn.
If you missed the meeting, don’t worry: this was just the beginning. We will continue to discuss these issues at quarterly meetings and at more focused meetings geared toward specific aspects of the larger issue.