Most IT professionals are at least somewhat familiar with Relational Database Management Systems, or RDBMS: MS SQL Server, MySQL, Oracle, DB2, Sybase, etc. This is the concept where you store data in a durable storage structure, as a “record” (row) in a “table”. You can also use foreign key constraints to relate data across tables.
This concept has worked quite well for decades, and is how most data is stored. Well, up until the recent “social” era.
The birth of “graph” databases:
With the “social” era of sharing, likes, social platforms, social logins, etc., there came about a different way to store data: a “graph database”. As far as I know, there are basically 3 kinds of databases in present day:
- RDBMS – described above, with tables and foreign key constraints.
- NoSQL (Document database) – which basically stores JSON documents with a dynamic structure. (e.g. MongoDB, CouchDB, etc)
- Graph Database – which primarily focuses on the relationships between nodes in the database. (e.g. Neo4J and others)
Think for a moment about browsing Facebook. When you scroll, an entry gets added for “WAS_SCANNED”; if you watch a video, an entry gets added for “WAS_WATCHED”; when you like something, an entry gets added for “LIKED”, etc. So, this graph database is constantly taking notes about every tiny thing you’ve been doing. As far as I can tell, most social platforms work in a similar way.
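To make the "taking notes" idea concrete, here is a minimal sketch in plain Python. It is not how Facebook or any real graph database stores data; the `ActivityGraph` class, the user name, and the node names are all made up for illustration. It only shows the shape of the data: each action becomes a typed, timestamped relationship from a user node to some other node.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ActivityGraph:
    """Tiny in-memory stand-in for a graph database (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a source node to a list of (relationship type, target node, timestamp).
        self.edges = defaultdict(list)

    def record(self, user, rel_type, target, when=None):
        """Add one relationship such as (user)-[:LIKED]->(target)."""
        when = when or datetime.now(timezone.utc)
        self.edges[user].append((rel_type, target, when))

    def targets(self, user, rel_type):
        """Return every node the user is connected to by the given relationship type."""
        return [t for (r, t, _) in self.edges[user] if r == rel_type]


graph = ActivityGraph()
graph.record("alice", "WAS_SCANNED", "post:101")
graph.record("alice", "WAS_WATCHED", "video:7")
graph.record("alice", "LIKED", "post:101")
graph.record("alice", "LIKED", "page:hiking")

print(graph.targets("alice", "LIKED"))
```

A real graph database adds persistence, indexing, and a query language on top, but the underlying picture is the same: nodes, plus named relationships between them.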
This all starts becoming very interesting when you want to ask a complex question. Considering what is publicly known about Facebook, as an example, an advertising person there should be able to ask:
“Show me all of the female users who live in Montana, who Like liberal-leaning things, who likely own a gun, are opposed to abortion, who shop at Wal*Mart, are at least a little overweight, and have a birthday coming up within the next month or so.”
The significance isn’t the content of the question, but the fact that there are so many facets! If you come from an RDBMS background, each one of those facets represents at LEAST one join, but probably many. It might take several days to build up the query and get it right, just to get that list.
With a graph database, these kinds of complex questions are first-class ideas. You can basically query the core elements that you want, and the graph database walks through and finds only the related data that fits. Someone familiar with the query language used by that particular GDB might build up that query in a minute or two.
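Here is a rough sketch of why many facets are less painful when the data is stored as relationships. This is toy Python over an in-memory set of edges, not a real graph database or its query language, and every user, relationship name, and birthday below is invented. The point is only that each extra facet is one more (relationship, target) pair in a list, rather than one more join to design.

```python
from datetime import date

# Each edge is (source, relationship type, target). All names are made up.
EDGES = {
    ("ann", "LIVES_IN", "Montana"),
    ("ann", "LIKES", "hiking"),
    ("ann", "SHOPS_AT", "Walmart"),
    ("bea", "LIVES_IN", "Montana"),
    ("bea", "LIKES", "hiking"),
    ("cat", "LIVES_IN", "Idaho"),
    ("cat", "LIKES", "hiking"),
    ("cat", "SHOPS_AT", "Walmart"),
}

# Node properties: each user's birthday as (month, day).
BIRTHDAYS = {"ann": (6, 20), "bea": (6, 25), "cat": (12, 1)}


def users_matching(edges, facets):
    """Return users that have every (relationship, target) pair in facets."""
    users = {source for (source, _, _) in edges}
    return {u for u in users if all((u, rel, target) in edges for rel, target in facets)}


def birthday_within(user, today, days):
    """True if the user's next birthday falls within `days` days of `today`."""
    month, day = BIRTHDAYS[user]
    next_birthday = date(today.year, month, day)
    if next_birthday < today:
        next_birthday = date(today.year + 1, month, day)
    return (next_birthday - today).days <= days


# Adding a facet to the question is just adding a pair to this list.
facets = [("LIVES_IN", "Montana"), ("LIKES", "hiking"), ("SHOPS_AT", "Walmart")]
today = date(2024, 6, 1)
matches = sorted(u for u in users_matching(EDGES, facets) if birthday_within(u, today, 30))
print(matches)
```

In this made-up data set only "ann" satisfies all three relationship facets and has a birthday within 30 days of the chosen date.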
Graph databases are intended to do all of this extremely fast, and are built with the intention of storing massive amounts of data.
Graph databases aren’t just useful for super-creepy stuff like this; they are a really powerful platform for managing large amounts of data about relationships between things. More on this in a minute.
First – where should you start? I’d say www.neo4j.com – it’s a very mature platform, is simple to use, has GREAT demos and samples you can use right away – and I really like the Cypher language it uses. There are some major RDBMS platforms that “support” graph concepts, but I haven’t seen anything I liked.
Also, if you want to explore other options, see: https://en.wikipedia.org/wiki/Graph_database#List_of_graph_databases
With Neo4J, right from the first interface, you can load samples and directly interact with your graph:
In fact, a great place to start is to run “:play northwind-graph” or select that from the Start on the left. That walks you through how to load data, and actually loads a bunch of sample data into a graph:
The Northwind example above – here’s what the “graph” looks like for all of that data (run “match (n) return n” to select all), zoomed way out:
Depending on the kind of query you write, you can play with an interactive graph like above, or get back a grid of data. When you are using this data from a software program, you would typically call an HTTP endpoint (or use one of the language drivers) and get the data back, for example as a JSON document.
Using a graph database for AI with a chatbot
This was supposed to be about Artificial Intelligence, so how does all of this apply? Well, at work, as we continue to evolve our chatbot, we really need to support the idea of a “conversation”. Bots, it seems to me, are established in these phases:
- Phase 1 – You ask a single question and get the answer. Every question is a new start. Super simple, and limited.
- Phase 2 – Support for conversations, where you can build on previous answers. For example: “of those, how many were in production?” or “from those, which were bad?”. Most bot frameworks support this in one form or another, but only within one “conversation”, on one day. If you want to persist what transpired during that conversation, you need to come up with something on your own.
- Phase 3 – Support for a relationship with the bot, where the bot writes down all of the things you asked, all the answers it gave you, and understands the relationships between those things. Each user builds up pathways in the graph database similar to how neurons myelinate in the brain. The bot has a stronger/weaker connection with each user, depending on how much that user has interacted. This is, to me, a basic “brain”.
Are you beginning to see where a graph database might come in handy – hint: for Phase 3? If we stored all of the user<->bot interactions in a graph database, you could ask questions like “last week, when those servers were down, how are they today?” Because the graph would know you and know what servers are (thanks to the NLP strongly defining them), and since everything has a timestamp and it knows everything you and it said in the past, it could go back in its “memory”, like a human, to “remember” what you talked about last week, and then do a new lookup based on the findings.
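As a rough sketch of that “memory” lookup, here is a toy version in plain Python. Nothing here is a real bot framework or graph database API: `BotMemory`, `current_status`, the user names, and the server names are all hypothetical, and the list of tuples stands in for timestamped relationships between a user and the entities the NLP recognized. It shows the two steps: first recall which servers this user discussed as “down” last week, then do a fresh lookup for each.

```python
from datetime import datetime, timedelta


class BotMemory:
    """In-memory stand-in for the bot's graph of past interactions (hypothetical)."""

    def __init__(self):
        # Each mention is (user, entity type, entity name, status, timestamp).
        self.mentions = []

    def remember(self, user, entity_type, name, status, when):
        self.mentions.append((user, entity_type, name, status, when))

    def recall(self, user, entity_type, status, start, end):
        """Names of entities this user discussed with the given status in [start, end)."""
        return sorted({
            name for (u, t, name, s, when) in self.mentions
            if u == user and t == entity_type and s == status and start <= when < end
        })


def current_status(name):
    """Placeholder for a live lookup; a real bot would query monitoring here."""
    return {"web-01": "up", "db-02": "down"}.get(name, "unknown")


memory = BotMemory()
now = datetime(2024, 6, 10, 9, 0)
memory.remember("dave", "server", "web-01", "down", now - timedelta(days=6))
memory.remember("dave", "server", "db-02", "down", now - timedelta(days=5))
memory.remember("dave", "server", "app-03", "down", now - timedelta(days=30))
memory.remember("erin", "server", "cache-04", "down", now - timedelta(days=6))

# "Last week, when those servers were down, how are they today?"
last_week = memory.recall("dave", "server", "down", now - timedelta(days=7), now)
report = {name: current_status(name) for name in last_week}
print(report)
```

The month-old outage and the other user's conversation are ignored, because the lookup is scoped to this user and this time window, which is exactly the kind of filtering the timestamps and per-user relationships make possible.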
Now, the crazy part is that, like NLPs, graph databases are pretty mainstream now, and building a “brain” for a bot like this is just a matter of coming up with a strategy and implementing it. What was science fiction 5-10 years ago is pretty accessible today!
First, graph databases are pretty cool – and Neo4J has a pretty cool model where you can just stand one up on your desktop. A GDB can store something as complicated as user actions on Facebook for all of the advertisers to consume, but it can also be used to store all kinds of other practical, useful things. For example:
- Within any application, you could store all of the things the user does – not because you are a creepy advertising stalker, but because you could “learn” what that user does most, and could adapt the application better to how they use it.
- For a chatbot, as described above, the chatbot can get to “know” the user and get “smarter” along the way. In the same way that a friend “knows what you mean”, a graph database-backed chatbot could similarly learn what you “probably meant”, and over time get increasingly accurate with that guess.
- I imagine that one day operating systems will include a graph of everything that happens on the computer. In the case of a server, it could easily tell which parts of the file system have a lot of activity, or which users do most of the activities. Or, in addition to system logs, you might be able to see trends of problems (and maybe even predictions).
So the idea here is just to expose you to graph databases if you are not familiar, and if you are building a chatbot, this is about the best way I’ve come up with for how to manage “conversations” with the bot. I’ll be working on this in the coming months, so I’ll likely write another post on whether this did actually work or not.
Am I missing any technology here? Do you have some ways to manage chatbot conversations? Leave a comment below – thanks!