Panos Alexopoulos
Any knowledge graph or other semantic artifact must be modeled before it's built.
Panos Alexopoulos has been building semantic models since 2006. In 2020, O'Reilly published his book on the subject, "Semantic Modeling for Data."
The book covers the craft of semantic data modeling, the pitfalls practitioners are likely to encounter, and the dilemmas they'll need to overcome.
We talked about:
his work as Head of Ontology at Textkernel and his 18-year history working with symbolic AI and semantic modeling
his definition and description of the practice of semantic modeling and its three main characteristics: accuracy, explicitness, and agreement
the variety of artifacts that can result from semantic modeling: database schemas, taxonomies, hierarchies, glossaries, thesauri, ontologies, etc.
the difference between identifying entities with human understandable descriptions in symbolic AI and numerical encodings in sub-symbolic AI
the role of semantic modeling in RAG and other hybrid AI architectures
a brief overview of data modeling as a practice
how LLMs fit into semantic modeling: as sources of information to populate a knowledge graph, as coding assistants, and in entity and relation extraction
other techniques besides NLP and LLMs that he uses in his modeling practice: syntactic patterns, heuristics, regular expressions, etc.
the role of semantic modeling and symbolic AI in emerging hybrid AI architectures
the importance of defining the notion of "autonomy" as AI agents emerge
Panos' bio
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, contributing to building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, Panos currently works as a principal educator at OWLTECH, developing and delivering training workshops that provide actionable knowledge and insights for data and AI practitioners. He also works as Head of Ontology at Textkernel BV, in Amsterdam, Netherlands, leading a team of data professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos has published several papers in international conferences, journals, and books, and he is a regular speaker in both academic and industry venues. He is also the author of the O’Reilly book “Semantic Modeling for Data – Avoiding Pitfalls and Dilemmas”, a practical and pragmatic field guide for data practitioners who want to learn how semantic data modeling is applied in the real world.
Connect with Panos online
Video
Here’s the video version of our conversation:
https://youtu.be/ENothdlfYGA
Podcast intro transcript
This is the Knowledge Graph Insights podcast, episode number 23. In order to build a knowledge graph or any other semantic artifact, you first need to model the concepts you're working with, and that model needs to be accurate, to explicitly represent all of the ideas you're working with, and to capture human agreements about them. Panos Alexopoulos literally wrote the book on semantic modeling for data, covering both the principles of modeling as well as the pragmatic concerns of real-world modelers.
Interview transcript
Larry:
Hi everyone. Welcome to episode number 23 of the Knowledge Graph Insights podcast. I am really excited today to welcome to the show Panos Alexopoulos. Panos is the head of ontology at Textkernel, a company in Amsterdam that works on knowledge graphs for the HR and recruitment world. Welcome, Panos. Tell the folks a little bit more about what you're doing these days.
Panos:
Hi Larry. Thank you very much for inviting me to your podcast. I'm really happy to be here. Yeah, so as you said, I'm head of ontology at Textkernel. Actually, I've been working in the field of data semantics, knowledge graphs, and ontologies for almost 18 years now, even before the era of machine learning, back when it was mostly about symbolic AI. I've been working a lot in this field. I've seen its ups and downs, I've seen its good and bad things, and I think our discussion is going to focus on those. I think that the field of data semantics, even in the era of AI and large language models, et cetera, is even more important, and this is something that I'm actively looking at now. I'm looking a lot at the synergy and the interrelation between large language models and data, knowledge graphs, and ontologies.
Larry:
Yeah, I'd love to talk more about that because that just seems to be in the air. One thing that I want to talk about, and I realize I totally left it out of my intro, is that you wrote this brilliant book called Semantic Modeling for Data.
Larry:
That's right. I kind of buried my lede there, as we say in journalism. But one of the first things I wanted to ask about, and we talked about this a little bit before we went on the air: can you describe to folks what semantic modeling is? What are we doing there when we're modeling?
Panos:
Yes. So the definition I give to the term semantic modeling is the practice of building descriptions of data that have three important characteristics. The first characteristic is that these descriptions should be accurate, that we should describe data and domains in a correct way, right? We don't want to have statements and assertions that are wrong. The second characteristic is that these descriptions should be explicit, both for people and machines. What does that mean? If I have a data set and I give it to you, and when you read it you cannot understand what it is about, that's not good semantics, right? The meaning is lost, and the same applies for systems, for machines. If I call an API and get some data back, and my machine, my enterprise system, is not able to interpret the meaning of this data, then I have an issue.
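As a toy illustration of that explicitness point (the field names and values here are invented, not from Textkernel's systems): the same record with opaque keys loses its meaning, while explicit, agreed-on keys carry the meaning for both people and downstream systems.

```python
# The same data point, described two ways. With opaque keys the
# meaning is lost to anyone who didn't produce the data...
opaque = {"f1": "NL", "f2": 38, "f3": "A"}

# ...while explicit, agreed-on keys make it interpretable by
# people and by downstream systems alike.
explicit = {"country_code": "NL", "age_years": 38, "blood_type": "A"}

# A consumer can act on explicit semantics without guessing:
print(f"Candidate is {explicit['age_years']} years old, "
      f"based in {explicit['country_code']}.")
```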
Panos:
The third characteristic is agreement. It's not enough to have explicit meaning on data. It's also very important that we both agree on the validity of that meaning and that we share the same meaning, right? As an example, it starts with very simple things, like: what is a knowledge graph? If you go and try to find a definition of what a knowledge graph is, you will see many definitions that are not necessarily consistent with each other, right? So there's already disagreement there, and actually the work of a good ontologist, a semantic modeler, would be to start by defining that. So that's what semantic modelers do, and semantic modeling is practically an umbrella term that covers a lot of artifacts that we build. These artifacts can range from database schemas, taxonomies, hierarchies, glossaries, and thesauri to ontologies, as many of our audience have already heard, knowledge graphs, et cetera.
Panos:
So when you do semantic modeling, you're not necessarily building one type of artifact. The key thing to remember is that you are describing data, you're describing domains, by means of formal symbolic representation. That's also another important thing: semantic data modeling is about creating explicit, not only machine-understandable but also human-understandable descriptions of data. That means that embeddings, large language models, et cetera, do not fall into this category, because their underlying representations are subsymbolic, they are numbers. They may be encoding some meaning, but according to my definition, they do not fall into the practice of semantic data modeling.
Larry:
Got it. The way you just said that, sub-symbolic, because I've heard a lot of people talk about symbolic AI versus neural networks, machine learning, and the other, sub-symbolic stuff. Is there a distinct dividing line there, or is this a continuum running through the AI stuff?
Panos:
For me, I don't think it's a continuum. Sub-symbolic is when you have numbers, practically, right? So when you have a word and it's encoded by a vector of 1600 or so numbers, this representation is subsymbolic. Symbolic means words, symbolic means human language, terminology, right? That doesn't happen in machine learning.
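To make the contrast concrete, here is a minimal sketch (the fact, the vector values, and the vector length are illustrative, not taken from any real model): a symbolic representation states a fact in human-readable terms, while a subsymbolic one encodes a concept as an opaque vector of numbers.

```python
# Symbolic representation: an explicit, human-readable
# (subject, predicate, object) statement.
symbolic_fact = ("Data Engineer", "is_subclass_of", "Engineer")

# Subsymbolic representation: the same concept as an opaque vector.
# Real embeddings run to hundreds or thousands of dimensions;
# these four values are made up for illustration.
subsymbolic = [0.12, -0.87, 0.44, 0.05]

# A person can read the symbolic fact directly...
print("Every", symbolic_fact[0], symbolic_fact[1].replace("_", " "),
      symbolic_fact[2])

# ...but the vector carries no human-inspectable meaning on its own.
print(subsymbolic)
```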
Larry:
Got it. Yeah, no, as soon as you said that, I was like, "It's more of a Boolean." Like, "Yep, it's symbolic or it's not."
Panos:
Of course, you can have, let's say, hybrid models. For instance, you can take a knowledge graph of entities and the attributes of each entity. You can have the name of the entity, some other characteristics, and then you may also have an embedding of that entity. So you can have, at the same time, two representations, a symbolic one and a subsymbolic one, for the same entity, and you use them for different purposes. For example, an embedding is very good for finding similarities between entities, which is more difficult to do in the symbolic space.
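A hybrid setup like the one Panos describes might look like this sketch (the entity names, attributes, and embedding values are invented for illustration): each entity keeps a symbolic record alongside a numeric embedding, and cosine similarity over the embeddings surfaces neighbors that a purely symbolic comparison would miss.

```python
import math

# Each entity carries both representations: symbolic attributes that
# people and rule-based systems can read, plus a numeric embedding
# (all values here are invented for illustration).
entities = {
    "python_developer": {"label": "Python Developer",
                         "is_a": "Software Engineer",
                         "embedding": [0.90, 0.10, 0.30]},
    "java_developer":   {"label": "Java Developer",
                         "is_a": "Software Engineer",
                         "embedding": [0.85, 0.15, 0.35]},
    "pastry_chef":      {"label": "Pastry Chef",
                         "is_a": "Chef",
                         "embedding": [0.05, 0.90, 0.20]},
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Subsymbolic side: which entity is most similar to the Python developer?
query = entities["python_developer"]["embedding"]
best = max((k for k in entities if k != "python_developer"),
           key=lambda k: cosine(query, entities[k]["embedding"]))
print(best)

# Symbolic side: the relationship is explicit and explainable.
print(entities[best]["is_a"])
```

The embedding answers "what is similar?" numerically, while the symbolic `is_a` attribute makes the reason human-readable, which is exactly the division of labor Panos describes.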
Larry:
Interesting, yeah. And this gets into RAG architectures and other hybrid architectures that are emerging, or does it? Is that how those come together, that if you're looking for similarities, you revert to a subsymbolic system, but if you're looking for, I don't know, knowledge-based stuff, you shift into the symbolic?
Panos:
Yeah, so RAG is a nice example, right? What's the idea of RAG? RAG stands for Retrieval Augmented Generation. It's the idea that you have your LLM, but you want it to give you answers based on your own data, based on your own knowledge. So when you give a prompt, a query, to the LLM, before the LLM answers, you have a step where you retrieve information relevant to your query from some database, from some set of documents, or, as has lately been more in vogue, structured data and knowledge graphs.
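The retrieve-then-generate flow Panos outlines can be sketched as follows. Everything here is a hypothetical stand-in: the document store is an in-memory list, the retriever is naive keyword overlap (a real system would query a vector database or a knowledge graph), and `call_llm` is a stub rather than a real model API.

```python
# Minimal retrieval-augmented generation flow.
documents = [
    "Textkernel builds a cross-lingual knowledge graph for recruitment.",
    "A taxonomy arranges concepts into a hierarchy.",
    "Retrieval Augmented Generation grounds an LLM in your own data.",
]

def retrieve(query, docs, k=1):
    """Rank documents by how many query words they share (toy retriever)."""
    words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def call_llm(prompt):
    # Hypothetical stand-in for a real LLM call; it just echoes the
    # prompt so the flow is runnable end to end.
    return f"[answer grounded in context]\n{prompt}"

query = "What does Retrieval Augmented Generation do?"
context = retrieve(query, documents)                  # step 1: retrieve
prompt = f"Context: {context[0]}\nQuestion: {query}"  # step 2: augment
print(call_llm(prompt))                               # step 3: generate
```

Swapping the toy retriever for embedding search over a vector store, or for a graph query over a knowledge graph, gives the two hybrid variants discussed above without changing the overall flow.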