Description

Michael Iantosca

Where content, knowledge management, and AI converge, you'll find Michael Iantosca.

As many in the AI world flock to probabilistic models like LLMs, Michael takes a deterministic approach to content management and knowledge engineering, using ontologies and knowledge graphs to ground content in concrete facts.

This approach embodies his insight that content and the models that describe it are not static information but rather valuable, ever-evolving enterprise IP assets.

We talked about:

his 44-year career in content, knowledge management, and localization/globalization roles
the three pillars of his work: content, knowledge management, and engineering
the need he sees in his work to move away from probabilistic, vector-based models to deterministic, neuro-symbolic models like knowledge graphs
how he decides which models are appropriate to use with each of the varied kinds of data he works with
his explorations of how to automatically construct a knowledge graph to use to power generative AI solutions
how he acquires and develops ontology skills in his team
how graph technology supports the "total content experiences" he builds
how the non-static nature of content makes it a poor candidate to be managed in a static system like a vector-based model
the relative merits and utility of 1) deterministic retrieval for structured content and 2) probabilistic retrieval for unstructured content
the power of combining content models, knowledge models, and ontologies and how they can become crucial enterprise IP assets
his belief that we are entering a golden age of content and knowledge engineering

Michael's bio
Michael Iantosca is the Senior Director of Knowledge Platforms and Engineering at Avalara, a sales tax automation company. With over four decades of leadership in technical content management, Michael has been a pioneer in advancing the profession, driving innovations in structured content, intelligent authoring, and scalable knowledge platforms. Renowned for bridging engineering and content teams, he has championed the adoption of AI and cutting-edge technologies to enhance user experience. A thought leader and mentor, Michael continues to shape the future of technical communication through his expertise and passion for innovation.
Connect with Michael online

LinkedIn
Medium
ThinkingDocumentation

Video
Here’s the video version of our conversation:

https://youtu.be/WG9Nl5OY3QI
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 16. A lot of work in the AI world these days is about vectorizing giant collections of static, unstructured content and data for LLMs. Michael Iantosca has worked for decades in a world where content is dynamic, always precisely structured, and contextualized with rich metadata. So he has a different take on architectural innovations like graph RAG, favoring knowledge-based deterministic retrieval of content over vector-based models and probabilistic methods.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 16 of the Knowledge Graph Insights podcast. I'm really delighted today to welcome to the show Michael Iantosca. Michael is currently the Senior Director of Knowledge Platforms and Engineering at Avalara, the big tax-compliance automation software company. He also has a long history: he spent a few decades at IBM prior to his role at Avalara. Welcome to the show, Michael. Tell the folks a little bit more about what you're up to these days.

Michael:
Larry, thank you for having me. It's a pleasure and an honor to get a few minutes to talk to you today. Yeah, I have just started my 44th year, primarily in the professional content space, but also in knowledge management and localization/globalization as well. I have been involved with content since the early days of SGML, which began the structured content revolution, and worked my way up through the professional content ranks. I'm also an engineer; IBM, in its grand wisdom, trained me for years as a developer, so I do span both worlds.

Michael:
My responsibility is to develop some of the world's most advanced content supply chains for creating and delivering content for customers, or, as we like to say, to "deliver the right content to the right person at the right time and in the right experience." That's the north star that drives it. I lead an engineering team that builds platforms serving multiple groups throughout the enterprise for their content creation and delivery needs, whether that's in-product user assistance, contextual help, knowledge centers, help centers, support sites, or a litany of other channels. By consolidating that entire supply chain, we can write content once and deliver it to many different channels, including chatbots and generative AI.

Larry:
Cool. I love that. I'm trying to remember, only 16 episodes in, but I think you're the most content-y person I've had on the Knowledge Graph Insights podcast. One of the things we talked about before we went on the air was this notion of, "Hey, dear engineers, it's not just about data, it's equally about knowledge." In the knowledge graph world, the engineers and ontologists are on board with that, but you also bring a content perspective to it. I'm really curious how those three concerns combine in your work. Do you approach data differently with a content lens on? And especially the knowledge management part, because each of those is its own whole thing, but per your title, you're a knowledge platforms guy, combining all three. Tell me how that manifests in your work.

Michael:
Yes, that's critical. We see content, knowledge management (and when I say knowledge management, I'm talking about taxonomies, ontologies, and knowledge graphs), and engineering (the actual coding and infrastructure of building out models and solutions in the AI, especially generative AI, space) as a holy trio, if you will, that have to have equal footing. Developing really advanced generative AI solutions is not just a coding problem. It is equally a knowledge management problem and equally a content challenge.

Michael:
Content isn't generic. Content is constantly changing, and the management of that content is constantly changing. We can't treat content any more generically than we can treat data that changes daily, sometimes weekly. The state of that content, the purpose of that content, is not static. We need the content teams involved because they hold the very fuel of our generative AI solutions, but we also need the people who understand advanced semantic knowledge management, who can help power that content, make it intelligent, and then feed it to the generative AI models, so that these models can be far better than they are today.

Larry:
Yeah, and when you say today, that's going to be way different even tomorrow from when we drop this episode. We were talking, again, before we went on, about the trends developing around the implementation of graph technologies across all of this, but in particular around content. I wonder if you could talk about how you see those trends. One progression I've seen is from LLMs to RAG to graph RAG, and now Tony Seale and others are talking about neuro-symbolic loops and hybrid AI architectures. How is that unfolding in your world?

Michael:
That's a really good question too. I think almost everybody who starts out in the generative AI space follows the same basic path. About three and a half years ago, I think it was, it took us about a week to take a simple vector database (Pinecone, I think we used) and a couple of hundred lines of Python code, and we built a RAG (retrieval-augmented generation) model, because we didn't want to use a general large language model trained on public content, and we didn't want to feed our own content in to train a large language model. It was natural that we wanted to have a private data model of our own and use a vector database to do that. But that's really an old model at this stage. It is a probabilistic retrieval model, and therein lies its core weakness as well.
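The quick vector-database prototype Michael describes can be sketched roughly as below. This is a toy illustration, not his actual implementation: the "embedding" is a simple bag-of-words stand-in for a real embedding model, and the example documents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Private "content" chunks indexed into the vector store (invented examples)
chunks = [
    "Sales tax rates vary by state and product category.",
    "Use the API key header to authenticate requests.",
    "Exemption certificates must be renewed annually.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query, k=1):
    """Probabilistic retrieval: rank chunks by vector similarity to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

def build_prompt(query):
    """Augment an LLM prompt with retrieved context -- the 'A' and 'G' of RAG."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The weakness he points to is visible even here: retrieval returns whatever scores most similar, with no guarantee the answer is grounded in an actual fact.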


Michael:
What we want to do is move away from probabilistic models, like relying completely on vector-based retrieval, and move toward deterministic models, sometimes called neuro-symbolic models, and use mechanisms such as knowledge graphs, which are far better at providing true reasoning and true inference based on a concrete set of facts, or what we call the ground truth. I think what you're seeing now in the marketplace is that the initial models being deployed are good. They're yielding value, people are excited. They're not perfect, but as development teams reach those plateaus, they want to get better. They want better than where they are. This is what I call the precision paradox. The precision paradox says that as models improve, the tolerance for errors and lack of accuracy, relevance, or contextual truth declines. We're partying right now with these models, and eventually that party is going to end and we're going to have to get down and do some of the serious work necessary to move to these deterministic, reliable models that are based on ground truth that we control, not that an LLM controls.
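The deterministic retrieval Michael contrasts with vector search can be sketched with a micro knowledge graph of subject-predicate-object triples. The entities and relations below are invented for illustration; a real system would use a graph database and a query language like SPARQL or Cypher.

```python
# Ground-truth facts as (subject, predicate, object) triples -- a micro
# knowledge graph. Entities and relations here are invented examples.
triples = {
    ("Avalara", "sells", "tax-compliance software"),
    ("AvaTax", "is_a", "product"),
    ("AvaTax", "made_by", "Avalara"),
    ("exemption_certificate", "renewed", "annually"),
}

def query(subject=None, predicate=None, obj=None):
    """Deterministic retrieval: exact pattern match over the graph.
    Every answer is traceable to a stored fact -- no similarity scores."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )

def products_of(company):
    """Simple one-hop inference: products made by a company, per stored facts."""
    return sorted(
        s for (s, p, o) in query(predicate="made_by", obj=company)
        if (s, "is_a", "product") in triples
    )
```

The contrast with the vector approach is the point: a query either matches a stored fact or it doesn't, so answers are grounded and auditable rather than merely similar.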

Michael:
That's where I see the trend going. Every morning when I wake up, I think I read at least 10 to 20 articles on all the different models and variations.