Using AI and Web content to improve organization and discovery of knowledge

nlp kb ai

31 Jan 2022

7 minute read

Knowledge management tools and approaches are getting attention as more and more begin their personal knowledge bases (“second brain” / “digital garden”), ie repositories of notes, linked and tagged to form a cohesive whole. Lying under many of the main offerings for knowledge base tools is a large time commitment to label, organize and group your content, codifying the pathways between ideas your brain has already traversed. The resulting structure, although useful, places heavy friction on the user’s knowledge management. How can we use technology to optimize and automate away this load, while building useful and interesting knowledge structures?

An Interlude on the Place of Web Content in our Knowledge Bases #

Consciousness not only flows like a stream, continuously moving with ever-changing content, but also ebbs like a breaking wave, outwardly expanding and then inwardly retreating. This perennial rhythm of the mind—extracting information from the external world, with-drawing to inner musings, and then returning to the outer realm—defines mental life. - Psychology of Learning and Motivation

I was surprised to find how minor the place of web content (articles or blog posts for example) is in most popular knowledge base software. Most knowledge bases merely allow you to link to web content, but lack built-in mechanisms to embed and import large quantities of web content as first-class members of your second brain.

That content is often fundamental to our learning and creativity online. I want my knowledge repository to be more than just a space for me to gather and store my own knowledge — it should be a place for me to discover new links between my knowledge and that of others online. This discovery can bring forth new, original thought and ideas.

In this process, focused on creation / exploration / archival, it should be very easy to directly integrate web content, and in doing so link our personal knowledge base to the Internet, one of the main means of information discovery and consumption. I discover most of the insightful content I consume online, and I learned to code thanks to posts online. This vision inspired me to build Archivy, my personal knowledge management software centered around a) integration with web content and b) extensibility, so users can easily write plugins and interfaces to fetch digital content in the way they please, like posts from forums, social media sites (eg. Hacker News).

In my attempts to directly integrate web content into my creative process, I discovered another problem: I was overwhelmed by the volume of content. I want the right articles to surface when they’re relevant. I don’t want to have to dig too much for potential links in my knowledge.

This is why I started exploring how Natural Language Processing, the use of algorithms to process text, could help organize my knowledge base, drawing from my notes and web content efficiently.

My results are useful whether or not you end up integrating web content in your knowledge base.

Tags in knowledge bases are useful when searching for a given concept or idea in a large knowledge base, where the large amount of search results for certain queries gives each of them limited value.

This organization through labeling and grouping takes significant effort. This time commitment is also necessary for bidirectional links, an increasingly popular feature that allows you to embed references to items in your knowledge base. I find these links more valuable for external linkage: I embed web content into my knowledge base so that I can create connections between my own ideas and the plethora of thought online.

Tagging and bidirectional links enable ambient findability, a term coined by Peter Morville. In the context of knowledge management / discovery / creation, I interpret mechanisms for ambient findability as those that allow you to “stumble” onto content and ideas that are useful to you in a way you wouldn’t have thought of otherwise. You can cultivate a form of intended serendipity. My project Espial can do this by suggesting a link between two distinct items instead of simply reminding you of a pathway you had already created like most knowledge tools. It aims to make discovery and the act of connection —fundamental to the way we think— more efficient.

Live readonly demo of Espial

For example, when I started writing this post, I could look at the notes and content I had on knowledge, and mechanisms for information access:

knowledge results

These results are actually articles, saved into my knowledge base. But I didn’t explicitly say that they were related to the general concept of “knowledge”, my software figured that out itself! Seeing these similarities can help me process and organize my thoughts before I start writing.

I think this type of ambient findability is useful for the diffuse mode of thinking, the “discovery” mode where you’re exploring and considering a wide network of ideas. You’re open to integrate new lines of thought that can link to and improve your own. Examples of this phase are when you’re researching a new idea for a blog post, or deciding to start a project. It’s the brainstorming and exploration phase, before you enter the focused writing / building mode. This period would benefits tremendously from systems that automatically show you new pathways for your current thought —— surfacing domains, ideas, and directions you can brainstorm and explore, related to your focus.

In this mode, your brain is constantly reacting to new inputs, clustering them together, connecting different ideas and hopefully finding new ones. For me, this is one of the main value of tools for thought. The network / hierarchy of organization you build to obtain this, is not a result but merely a means to this end. This is why I’m sometimes confused by the beautiful pictures of huge knowledge graphs, like the one below, in which the aesthetic of the graph and the effort it represents is conflated with the real goal of creating new thought, in rediscovering the old.

The structure is aesthetic but it’s not the main point, the real utility lies in getting the right connection displayed to you at the right time as you’re thinking.

knowledge graph - source

Lessening Friction and Effort, While Preserving Ambient Findability #

So, how do we keep useful networks that allow us to surface ideas productively, without spending large amounts of effort and time labeling and grouping? This is even more necessary if we want to make use of web content in our knowledge base, with which the mass of content our digital brain stores can become large.

The tools and software I am building try to position themselves as an aid in this process, to lessen, if not remove, the burden of organizing everything by yourself. More than just a way to gain time, they also suggest connections you yourself might not have found.

Indeed, using artificial intelligence and natural language processing, one can automatically generate a network of connections, links and ideas between their notes, emulating those made manually by many current knowledge base users.

Human curation is still needed to decide what patterns / links should be saved. See below for an example output of my current iteration of this software, where red nodes are items of my knowledge base, and white nodes are concepts found by the program. The way it’s represented also allows for interesting discovery of “clusters”:

example graph Sample concepts from said graph: family, tools, code, ideas, books, courses, learning, information, future, money.

This graph construction is a useful tool for exploration in and of itself, but the underlying process also allows other means of discovery by highlighting the tags in which the model is most confident, and also finding links between notes.

See the project documentation here and watch the demo gif below to see what Espial can do.


Subscribe to the blog's Valuable Entropy newsletter: