Description

Ashleigh Faith

Knowledge graph technology can help content programs in many ways: to aid content discoverability, to discover valuable insights in existing content, and to build transparent personalization programs that build brand loyalty and foster customer trust.

Ashleigh Faith has worked with content and knowledge graphs for more than 15 years and has a knack for explaining the benefits of the technology, most notably via her very popular YouTube channel.

We talked about:

how structured content aids in content discoverability
how to help both machines and humans understand the content in documents
the right way to benefit from machine learning in content practice
the difference between AI and machine learning
what a knowledge graph is, how it works, and how it can help you infer and extract meaning from content
how knowledge graphs enable linked data (not necessarily Linked Open Data)
how a knowledge graph used as a data fabric or a data mesh can quickly connect disparate data and content sources
how these technologies make humans more effective at what they're already doing
the differences between table thinking and graph thinking
the trend of relational databases to add graph componentry
how graph technology can help you find connections between assets that you could never find with traditional technology
how knowledge graph technology can capture human knowledge and turn it into actionable business activities
how knowledge graphs help with content personalization
how a transparent "personal knowledge graph" - a collection of interests that your customers share and your content supports - can build brand loyalty and foster customer trust
the importance, in the midst of all of this fancy content, taxonomy, and metadata technology, of staying focused on the human beings you are serving

Ashleigh's bio
Ashleigh Faith is the Director of Platform Knowledge Graph and Semantic Search at EBSCO, one of the largest global academic search engines. Her focus is bridging the gap between users and content. She has her PhD, focused on Advanced Semantics, and she has worked in the search and data community for over 15 years. Her main focus is knowledge graph, semantic search, and general information architecture.
Connect with Ashleigh online

LinkedIn
YouTube

Video
Here’s the video version of our conversation:

https://youtu.be/B7uUTcT5jf4

Podcast intro transcript
This is the Content Strategy Insights podcast, episode number 121. You may have heard about knowledge graphs from a friend in data science or enterprise governance, or maybe in a news story about the latest trends in scientific research. Ashleigh Faith wants you to know that content programs can also benefit from this powerful technology. Whether you're trying to make your content more discoverable, looking for fresh insights in your existing content, or building a truly user-centered personalization system, knowledge graphs can help.
Interview transcript
Larry:
Hi, everyone. Welcome to episode number 121 of the Content Strategy Insights Podcast. I am really delighted today to welcome to the show Ashleigh Faith. Ashleigh is a long time... Well, she's worked in publishing and content for a long time doing really fancy technical stuff and graph stuff. She also does a brilliant YouTube channel that explains the stuff that she works on all day. So welcome to the show, Ashleigh. Tell the folks a little bit more about what you're up to these days.

Ashleigh:
Wow. Thanks, Larry. So, yeah, I've been working in content for quite some time. I'm at, I think, 15 years now, and I've always worked in and around publishing. So, making sure you understand the importance of structured data behind the scenes of the content, right? Like you can't do any machine learning without that. You can't do any enhancements without that. I fought that good fight for a long time. Still am to a certain extent. And now I primarily focus on search because content is fabulous, but if you can't find it, it's not going to do you much good.

Larry:
Yep. Yeah. No. And you just said a lot right in there. I want to tease out one of the first things you talked about was you talked about data and the importance of having access to that for tasks like machine learning and optimizing search functionality and stuff. Can you talk a little bit about that? How you help make content more discoverable with . . .

Ashleigh:
Yeah, well, just starting with the schema itself, I know XML is not the thing nowadays that people really like a whole lot. It's mostly JSON documents, but looking at Kurt Cagle, he's saying it's coming back. So, maybe we got to look at it again. But all of this to say whichever version of schema you want behind your actual content, it's incredibly important to have well-structured information there because just as an example, if you're doing any machine learning and let's say you have a very large article or very large asset, it's going to take you more time and money to process that. And you're probably getting a lot of noise instead of the targeted information that you want the machine to really pay attention to. So instead, if you have good structured data behind the scenes, you can target. It's called zoning. You can target your machine learning model to specific areas in the content that you want to pay attention to the most.

Ashleigh:
So for instance, in things in the journal space, like academic journals, the very first paragraph and the very last paragraph are usually the summarization of what are we trying to achieve? What did we find and why is it important to you? So the other stuff in the middle is really good, too, but from a machine learning standpoint, you probably only need to look at the first two things there.
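
To illustrate the zoning idea described above, here is a minimal sketch in Python. It assumes the article is stored as XML with a `body` element containing `p` paragraphs; those element names, the sample text, and the `zone_first_last` helper are hypothetical and only stand in for whatever schema sits behind your content. The sketch pulls out just the first and last paragraphs so that a downstream model would see the summary-like zones rather than the whole asset.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; the element names are assumptions for illustration.
SAMPLE_ARTICLE = """
<article>
  <title>Example study</title>
  <body>
    <p>We set out to measure whether tagging improves search.</p>
    <p>Methods, tables, and other detail live in the middle.</p>
    <p>Tagging improved search, which matters for discoverability.</p>
  </body>
</article>
"""


def zone_first_last(xml_text):
    """Return the text of the first and last <p> inside <body>."""
    root = ET.fromstring(xml_text)
    # itertext() also collects text nested in inline tags such as <i> or <b>.
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.findall("./body/p")
    ]
    if len(paragraphs) <= 1:
        # Zero or one paragraph: nothing to trim.
        return paragraphs
    return [paragraphs[0], paragraphs[-1]]


zones = zone_first_last(SAMPLE_ARTICLE)
for text in zones:
    print(text)
```

Only the two zoned paragraphs would be passed on for processing. Without the structural tags there would be no reliable way to pick them out, which is the point being made about structured data.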

Larry:
Right. But there's also, and I'll try not to go too far down this tangent because I'm a structured-content person. But you look at that first and last paragraph, and you can infer a lot about what that article's about, but there would be, are there other... How do you evaluate? There's probably data and tables and things built into that that would be of use to other people. Can you talk a little bit about how you tease out the helpful content that can help ML do its job better and help people do stuff better out of those kind of documents?

Ashleigh:
Yeah. I mean, that's a very good point. I mean, if you're looking at a table, depending on your use case for machine learning, you might not want it to look at those tables. It might get a little confused. Especially in academic publishing there's a lot of things that you would think were data, like an equation, for instance, but it's actually just an image. So you have to be careful of some of those things, because the machine's not going to really understand that. We also like to make sure humans can understand what is on the page effectively. So sometimes we'll have call outs or other things that are visually appealing in the full text. Well, machines don't understand that. If you have maybe a multi-column layout and you don't have it correctly tagged, then the machine is literally going to read the whole way across the page and not have a clue as to what the actual content is saying.
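
The multi-column problem mentioned above can be shown with a small Python sketch. It assumes each piece of text on a page has been captured as a block with a column number and a line number; the `Block` type, the sample text, and both helper functions are hypothetical and are not taken from any particular extraction tool.

```python
from typing import NamedTuple


class Block(NamedTuple):
    column: int  # which column the text sits in (the tagging the machine needs)
    line: int    # vertical position within the page
    text: str


# A hypothetical two-column page, two lines per column.
blocks = [
    Block(0, 0, "Knowledge graphs connect"),
    Block(1, 0, "Search depends on"),
    Block(0, 1, "content to concepts."),
    Block(1, 1, "well-tagged content."),
]


def read_across(page):
    """Ignore the column tags and read straight across the page."""
    ordered = sorted(page, key=lambda b: (b.line, b.column))
    return " ".join(b.text for b in ordered)


def read_by_column(page):
    """Use the column tags: finish one column before starting the next."""
    ordered = sorted(page, key=lambda b: (b.column, b.line))
    return " ".join(b.text for b in ordered)


print(read_across(blocks))
print(read_by_column(blocks))
```

Reading across interleaves the two columns into a sentence that makes no sense, while reading by column recovers the intended text. The column tag is the only thing that distinguishes the two cases.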

Ashleigh:
So those are just some examples. If you're trying to pull out some of those images and do some image recognition, there's a lot of great tools out there. A good one that I've used that you can just go and try out if you want right now is Amazon Rekognition. It does so much more in the last few years. It can identify a person versus a dog. It can identify if something is offensive or not, depending on which geographic region you're in and what that means to you. There's a lot of really cool stuff that you can do with machines, if you can extract those images. But if they're not tagged appropriately, your machine is not going to be able to pick up on it. Machines are no different than very smart calculators, right. They're just tools for us to do our jobs better.

Larry:
Yep. And all of what you're... Not all, but a lot of what you're talking about right there is like the difference between the data that's embedded and encoded and included in a document and the document itself. And I think a lot of CMSs and a lot of publishing and a lot of online content stuff is still very like page-level, document-level stuff. And a lot of what you're talking about, you kind of have to get below that and get down to the... So are we talking mostly about metadata or you're talking about actually extracting facts from these documents themselves?

Ashleigh:
So both, both. So, having well-structured data on this back end is incredibly important for all the reasons that we just described. But also if you're trying to extract something that is meaningful, like what are the most impactful sentences or there's this great tool that's free out there. I think it's called Consider that you can take a few documents that are about similar things and there's a taxonomy, is super helpful, and assemble what are the main arguments from those articles and synthesize them. Now I am one that believes if anybody tries to sell you on a tool that can write human-like abstracts, they're lying. Most of them are not very good and they seem a little uncanny valley. So just be careful of that. And you want to be transparent if you do use those because people get irritated when you don't disclose that they are machine created.

Ashleigh:
But that's the power of what we're talking about here. I'm an advocate of saying that today in our world, content the way we would traditionally define it is just a wrapper of data for that particular asset. Everything inside of it is data. And it should be so that you can use it in multiple ways, which is going to give you a bigger bang for your buck.

Larry:
I got to say, I love that.