Why You Should Start Thinking About Your Organization as a Graph

Do you think there is no space for a graph database in your company? Or that integrating a graph database into your product would be a huge effort?

I have to tell you: You can use a graph database like Neo4j without touching your product, and you can use it for managing your company’s knowledge as well as to improve your software development process. So, even if your business problem is not inherently graphy (hard to believe in 2018), there are a few reasons why you should think about your environment as a graph.

Without knowing your core business, I am pretty sure that your most important connected dataset is your organization. If you’re in the IT sector, you are already connected and your most important company values are already encoded into graphs.

We have learned that graphs are eating the world, and in this blog we’ll show you why they have already eaten your entire software development process and why you should start using graph databases (if you have not started yet).

Software Architecture and Moving to Microservices


Let’s start with your architecture. I am certain that if you have been up and running with your business for more than a few years, you have experienced the typical hype cycle inside your organization every season. You are somewhere on the road from monoliths to serverless, and there are philosophical debates about what could be better in your system.

It is possible that you’re working on loosely coupled microservices, which often have dependencies (by which I mean both development and deployment dependencies) so complex even your architect and service owners need a proper tool to identify and analyze them.

Even if you are doing microservices very well, there will always be a person from your company who will describe your software architecture as a dependency hell, or a distributed monolith.

Anyhow, you cannot override Conway’s Law; your architecture will always be a copy of your organization’s structure. This is always a complex network/graph, unless you run a one-man show or an underfunded startup with your college besties.

A depiction of a chaotic microservices architecture.

So we have a complex architecture – a network of our software components – which depend on many third-party software components.

Do you remember the left-pad incident? That was the Jenga tower of JavaScript libraries collapsing two years ago. After the developer who published the name-conflicting kik package removed all his packages from npm, half of the internet broke. Thousands of builds failed around the world, thousands of JavaScript developers were crying, and no one had thought that a single package dependency could cause a temporary worldwide crisis. It was a sad day for the IT sector.

So lessons learned, I guess. If you develop open or closed source software, you must be aware of a few facts:

    • On average, 80 percent of an application consists of third-party components, mostly open source
    • Almost 50 percent of those third-party components are outdated, usually by a few years
To see this for myself, I had some fun with a project last year. I loaded the Libraries.io dataset about the open source packages from different package managers into a Neo4j instance, and I figured out that our legacy software depends on thousands of other libraries written by random people around the world. You should have visibility into – and control over – your software product’s dependencies to ensure proper business continuity. This is a perfect use case for a graph database.
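
As a taste of what such a dependency audit looks like in practice, here is a minimal Cypher sketch. The Project and Library labels and the DEPENDS_ON relationship are illustrative assumptions, not the actual Libraries.io schema:

    // Find every transitive dependency of our product and how far away it sits.
    // Capping the traversal depth keeps the query cheap on a huge dependency graph.
    MATCH path = (p:Project {name: 'our-legacy-app'})-[:DEPENDS_ON*1..6]->(lib:Library)
    RETURN lib.name AS dependency,
           min(length(path)) AS shortestChain
    ORDER BY shortestChain, dependency

A query like this immediately surfaces which libraries sit closest to your product – the ones whose disappearance would hurt the most.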

The Challenge: The Chaos of Delivery Dependencies


If you are aware of your architectural and third-party software dependencies and you are ready to ship your own software, then you face the challenge of handling your own delivery dependencies. You will realize that your project has delivery chaos, which could (and should) be tracked by a graph. Because no one can keep a step-by-step deployment plan for every release in their head, you should reduce project delivery chaos with graphs.

You likely use some issue-tracking system to support (and to frustrate) your software engineers during your development process – maybe JIRA or a similar tool that stores all your bug-tracking and project management information.

These systems can become a mess quickly if you do not use them properly. Used as expected, they give you a lot of useful information about your progress, bugs, critical areas and so on, which provides meaningful statistics.

But your issue tracker is not just about fixing bugs: it is also a huge, connected dataset, and a graph database helps you discover the valuable insights hidden in the relationships between your tickets. There is hidden value that is almost impossible to discern from your issue-tracking system directly.

If your company is big enough, then every day the problem-solving skills of your software engineers are documented and encoded into comments on bug tickets. There is a lot of value in the tickets and in the relationships between them. Also, if someone commits a change into your code repository, then ideally the developer references the corresponding ticket as well.

Why shouldn’t you reuse this information to get a better understanding of your own software?

See how a graph database connects data via textual information.
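
As a hedged sketch of that idea – the labels and relationship types below are invented for illustration – linking commits to the tickets they reference turns “who actually knows this component?” into a one-pattern traversal:

    // Developers author commits; commits reference the tickets they resolve.
    MATCH (d:Developer)-[:AUTHORED]->(c:Commit)-[:REFERENCES]->(t:Ticket)
    WHERE t.component = 'billing'
    RETURN d.name AS engineer,
           collect(DISTINCT t.key) AS tickets,
           count(c) AS commits
    ORDER BY commits DESC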

If you take it further, there are plenty of other places with useful textual information: user comments and feedback on your product pages or blog posts, conversations and Q&As in developer communities or on Stack Overflow, and so on. Or just look around the wild internet, where your company is judged by the public. There are a lot of related “metrics” about your product or service.

You can hoard all this information into a graph database and you are already halfway there. If you already have some content in Neo4j, then typically the first use case is to make the data searchable and deliver relevant search results to users.

From there, you will be in a positive feedback loop of continuously improving the search results to satisfy the information needs of your users. The same holds if your users are your software developers. They have to find the relevant documents, the relationships between the documentation and the bug reports, or the gaps between the requirements and the implemented code. There are a lot of complex queries that can help serve them.

The Solution: A Knowledge Graph


Nowadays, it is not enough to provide relevant results; you have to provide meaningful knowledge that users can act on effectively.

Your company has already accumulated more than a hundred years’ worth of collective knowledge, and it keeps growing. You should reuse this knowledge and the lessons learned to operate your company more successfully in the future. You don’t have to make the same mistakes again during your software development; you can make new mistakes. That is the way of learning, and it is how the best companies do their job. So, we have arrived at knowledge graphs.

Look at how a knowledge graph works to give your organization more wisdom.

All learning organizations should maintain their knowledge graph. Previously, knowledge graphs were accessible only to dedicated learning organizations such as NASA. This is no longer the case, and every software development company should have one soon.

Knowledge graphs will become as essential as your ticketing system is today. I think this is the near future for tech companies, allowing them to build and enhance a competitive advantage through the synergy between graph databases, NLP/NLU tools and today’s machine learning algorithms.

It is clear that every organization is a graph from which you can get actionable insights. If you are interested in implementing a knowledge graph based on Neo4j and transforming your data into wisdom, then you should check out GraphAware’s new Hume platform, which provides all the necessary components to utilize your corporate knowledge. After all, wisdom is the ability to increase effectiveness.


Download this white paper, Sustainable Competitive Advantage: Creating Business Value through Data Relationships, and discover how your company can use graph database technology to leave your competition behind.

Read the White Paper

Chip Design on Graphs: 5-Minute Interview with Chuck Calio, Offering Manager, IBM PowerAI

Check out this 5-minute interview with Chuck Calio from IBM PowerAI.

“We use Neo4j to design our next-generation Power chips,” said Chuck Calio, Offering Manager, IBM PowerAI at IBM.

The future of graph technology is in AI, but it’s not the only use case. From designing hardware (naturally a graph) to dynamic pricing, IBM’s Chuck Calio sees use cases for graphs as myriad and the demand from customers strong.

In this week’s five-minute interview (conducted at GraphTour San Francisco) we discuss how IBM designs next generation chips using Neo4j – and the way those chips accelerate graph algorithms.



Tell us about how you use Neo4j at IBM.


Chuck Calio: We started working with the Neo4j graph database as one of the four NoSQL database categories that we got into as a modern data platform for our customers who want NoSQL databases. But we’re quite pleased, over time, to see the evolution of Neo4j on Linux on Power.

Calio: We actually now use Neo4j to design our next-generation Power hardware, and then we run Neo4j on our hardware, and we use those results to re-optimize Neo4j. And now we have acceleration technology for Neo4j on Linux on Power that will optimize graph algorithms. So we’re super excited about that as one of the next steps in this relationship.

How do you use Neo4j in chip design?


Calio: We use Neo4j to design the chips that go into POWER9, and also our System z. So chips are really gate logics, switching logics, and at the end of the day, if you conceptualize that, it’s a graph. It’s a path forward through logic.

Previously, we used relational database technology. It was really slow – it took too long to change. Now we use graph technology from Neo4j.

We used Neo4j software to design the next generation chips. Then we ran Neo4j on that hardware, and we were able to do some traces and identify areas where we could redesign and optimize Neo4j.

And then, in this new field in hardware called acceleration, which we’re all very interested in, we have the ability to use our Coherent Accelerator Processor Interface (CAPI) to accelerate Neo4j graph algorithms, which is the next growth area, as far as we’re concerned.

How is your partnership with Neo4j going?


Calio: We have a running joke that IBM and Neo4j are recursive – we keep calling each other. I’ve managed a lot of vendor relationships in my role at IBM. Neo4j is a great partner and they have a lot of good products. There’s a really high level of client interest in graph in general, and Neo4j is the number one graph provider. That’s why we’re really close with Neo4j.

In terms of our engineering partnership, Neo4j is really strong to work with. We have a good executive partnership. Across the board, it’s a very good partnership.

Where do you find graphs especially compelling?


Calio: In terms of graph in general, it’s a very high-growth solution by itself. But also, at IBM, we’re interested in cognitive and AI, and one of the linkages that we see in the future is the linkage between Neo4j graph and cognitive AI solutions, machine learning, deep learning, and those kinds of things. So we have a big vision of the future of both high-value solutions running well together and optimized deeply. A very bright future for both companies.

What do you think the future of graph technology looks like?


Calio: I think the future of graph will be more with AI. That’s my opinion. I see the use of graph to feed artificial intelligence, machine learning and deep learning. In many cases, feature extraction, for example, is difficult to do, especially with large datasets.

There are many other use cases for graph, though. Graph is leading-edge technology for cybersecurity. Clearly, graph is excellent for master data management across the board, not just in AI scenarios. Recommendation engines are also important. An area I’m fascinated with is dynamic pricing, and how we can use graph technology to do that. I think that’s really one of the things that could fundamentally change the business. Dynamic pricing is a really exciting use case.

Anything else you’d like to add?


Calio: Neo4j’s a great company to partner with. From an IBM standpoint, I’d like to thank Neo4j for being such a great partner, and I know that the future is going to be really high growth, and it’s very important for Neo4j to be out there across the globe with the high-value solutions that they provide.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Take a closer look into the powerhouse behind the analysis of real-world networks: graph algorithms. Read this white paper – Optimized Graph Algorithms in Neo4j – and learn how to harness graph algorithms to tackle your toughest connected data challenge.

Get My Free Copy

Graph Algorithms in Neo4j: The Power of Graph Analytics

According to Gartner, “graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions.”

Why did Gartner say this? Because graphs are the best structure for today’s complex and ever-changing data, and if you can analyze them at scale to uncover key patterns and trends, you will find numerous opportunities that others miss.

Read the second installment of this series on graph algorithms in Neo4j.

In this series on graph algorithms, we explain how using Neo4j Graph Analytics empowers organizations to make new discoveries and develop intelligent solutions faster.

Last week we kicked off this extensive series by explaining why we are so passionate about graph algorithms, and why you should be, too. This week we’ll delve a bit into network science and its many applications – and how graph algorithms unlock the information in complex networks.

In the coming weeks, we’ll cover the rise of graph analytics and increasingly dive deeper into the practical applications of graph algorithms, using examples from Neo4j, the world’s leading graph database.

Network Science & the Rise of Graph Models


Networks are a representation, a tool to understand complex systems and the complex connections inherent in today’s data. For example, you can represent how a social system works by thinking about interactions between pairs of people.

By analyzing the structure of this representation, we answer questions and make predictions about how the system works or how individuals behave within it. In this sense, network science is a set of technical tools applicable to nearly any domain, and graphs are the mathematical models used to perform analysis.

Networks also act as a bridge for understanding how microscopic interactions and dynamics lead to global or macroscopic regularities, and how small-scale clusters relate to larger-scale structures.

Networks bridge between the micro and the macro because they represent exactly which things are interacting with each other. It’s a common assumption that the average of a system is sufficient because the results will even out. However, that’s not true.

For example, in a social setting, some people interact heavily with others while some only interact with a few. An averages approach to data completely ignores the uneven distributions and locality within real-world networks.

Transportation data cluster of global airports and routes.

Transportation networks illustrate the uneven distribution of relationships and groupings. Source: Martin Grandjean

An extremely important effort in network science is figuring out how the structure of a network shapes the dynamics of the whole system. Over the last 15 years we’ve learned that for many complex systems, the network is important in shaping both what happens to individuals within the network and how the whole system evolves.

Graph analytics, based on the specific mathematics of graph theory, examine the overall nature of networks and complex systems through their connections. With this approach, we understand the structure of connected systems and model their processes to reveal hard-to-find yet essential information:

    • Propagation pathways, such as the routes of diseases or network failures
    • Flow capacity and dynamics of resources, such as information or electricity
    • The overall robustness of a system
Networks and connections

Understanding networks and the connections within them offers immense potential for breakthroughs: unpacking structures and revealing patterns drives scientific and business innovation and safeguards against vulnerabilities, especially those hidden deep within the network.

The Power of Graph Algorithms


Researchers have found common underlying principles and structures across a wide variety of networks and have figured out how to apply existing, standard mathematical tools (i.e., graph theory) across different network domains.

But this raises questions: How do people who are not mathematicians conversant in network science apply graph analytics appropriately? How can everyone learn from connected data across domains and use cases?

This is where graph algorithms come into play. In the simplest terms, graph algorithms are mathematical recipes based on graph theory that analyze the relationships in connected data.

Even a single graph algorithm has many applications across multiple use cases. For example, the PageRank graph algorithm – co-invented by Google cofounder Larry Page – is useful beyond organizing web search results.

It’s also been used to study the role of species in food webs, to research telomeres in the brain, and to model the influence of particular network components in just about every industry.

For example, in studying the brain, scientists found that the lower the PageRank of a telomere, the shorter it was – and there’s a strong correlation between short telomeres and cellular aging.
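
To make this concrete, here is a minimal PageRank run. It uses the procedure signatures of Neo4j’s current Graph Data Science library (the older Graph Algorithms library available when this series was written exposed similar procedures under the algo.* namespace), and the graph name and labels are assumptions:

    // Project pages and their links into an in-memory graph, then stream PageRank.
    CALL gds.graph.project('pages', 'Page', 'LINKS');

    CALL gds.pageRank.stream('pages')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS page, score
    ORDER BY score DESC
    LIMIT 10;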

Graph algorithms and connected data

Conclusion


Graph algorithms play a powerful role in graph analytics, and the purpose of this blog series is to showcase that role. In our next blog, we’ll step back and look at the rise of graph analytics as a whole and its many applications in exploring connected data.


Find the patterns in your connected data
Learn about the power of graph algorithms in this ebook, A Comprehensive Guide to Graph Algorithms in Neo4j. Click below to get your free copy.


Read the Ebook


5 Noteworthy Use Cases of Graph Technology and Graph Analytics

When discussions about any technological concept begin to trend in online media and among business stakeholders, it is only natural to wonder whether it is all hype or whether the technology actually solves any real business problems.

Gartner’s well-publicized hype cycle attempts to provide a guideline on the path that most technologies follow, and when it comes to graph technology, you might be concerned that it is at its peak of inflated expectations when reality doesn’t quite match up to all the chatter.

This article attempts to cut through the noise by walking you through five noteworthy use cases of graph technology and graph analytics so you can make up your own mind.

Learn about the top five use cases for graph technology and graph analytics.

First, though, a quick definition of graph technology and its benefits.

Graph Technology Defined


Graph technology is formed around the idea of building databases using mathematical graph theory to store data and the links between data as relationships. While the underlying math is quite complex, the crux is that graph databases emphasize the connections between data as much as the individual data points by explicitly storing those connections as relationships.

Why should you care about relationships?

Because much of the value residing in your data comes from unearthing information on the connections between data points. Think social media data; perhaps you want to know which of your followers on social media bought a certain product in your online store as a result of a certain campaign.

Other types of database systems, such as relational databases, infer data connections for these types of queries using JOIN tables. A large number of JOINs ends up degrading query performance in your database application for deeper queries, which run much faster in graph database systems.
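
The social media question above, for example, reads almost verbatim as a Cypher pattern. This is only a sketch with made-up labels, but it shows why deep queries stay readable in a graph database:

    // Which of our followers clicked a given campaign and then bought something?
    MATCH (:Brand {name: 'Acme'})<-[:FOLLOWS]-(f:Person)-[:CLICKED]->(:Campaign {id: 'spring-sale'}),
          (f)-[:BOUGHT]->(p:Product)
    RETURN f.name AS follower, collect(p.name) AS products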

5 Proven Graph Technology Use Cases


1. Machine Learning

Machine learning technology is now more accessible than ever to businesses. Open source machine learning frameworks and commercial deep learning platforms equip developers and data scientists with the tools and knowledge to benefit from machine learning/deep learning use cases such as intelligent image recognition, speech recognition, and intelligent chatbots.

Because graph-structured data inherently excels at representing the relationships between data points, it is being used to power one of the most widely used applications of machine learning: recommendation engines. By following co-occurrences and frequencies between customer, social and product data, for example, companies can build powerful, intelligent real-time recommendation engines.

2. Fraud Detection

As the world becomes more oriented towards doing everything online, from individuals shopping and banking, to businesses running marketing and advertising campaigns, there is a growing problem with fraud.

For businesses looking to do more to protect themselves and their customers from fraud, graph analytics can prove extremely useful. Since graph technology facilitates modeling data relationships at scale with a lot of flexibility, businesses can analyze large amounts of transactional data rapidly to detect fraud.

Graph analysis can also detect fraudulent social media accounts (bots); these bots can skew the results you obtain from marketing campaigns, leading to inaccurate conclusions from your data.

3. Regulatory Compliance

The need to comply with regulations such as HIPAA, PCI DSS, and GDPR impacts businesses and organizations in a plethora of industries. When you collect personal information about customers, these regulations create a real need to maintain visibility over that data as it makes its way through various enterprise systems.

Due to the built-in relationships in graph databases, tracing sensitive data through enterprise systems is much more straightforward than in relational databases, which require complex queries across numerous JOIN tables.

With graph systems, single queries that track sensitive data for compliance purposes can run in seconds, and you get a visual representation of the results showing the data flowing through different systems. This provides a transparent and efficient way to achieve and maintain regulatory compliance.
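
A hypothetical lineage query might look like the sketch below; the DataField and System labels and the FLOWS_TO relationship are assumptions made for illustration:

    // Trace every route a piece of personal data takes through enterprise systems.
    MATCH path = (d:DataField {classification: 'PII'})-[:FLOWS_TO*1..8]->(:System)
    RETURN d.name AS field,
           [n IN nodes(path) | n.name] AS route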

4. Identity and Access Management

Managing identity and access authorizations for employees across the growing range of cloud-based and on-premise apps and systems is becoming increasingly difficult. The design of graph databases allows for more robust, real-time, cross-platform management of all this data, including administrator data, end-user data, files, roles and access rules for different resources.

As data grows, managing all this information without a graph database is impractical, particularly for what is a mission-critical business function. Graph databases can handle complex and connected access control structures that span many relationships with a level of performance unrivaled by directory services or custom-built solutions.

5. Supply Chain Transparency

As a result of globalization, business supply chains are now more complex than ever, resembling vast interconnected networks. Brands source products from multiple suppliers all over the globe, and the result is exactly the kind of interconnected system that graph databases excel at modeling.

Graph technology brings the ability to model the complex relationships inherent in modern supply chains in addition to the scalability and agility needed to adapt to growing networks and perform rapid searches on data. The end result is much greater transparency into business supply chains, which can help to find inefficiencies and streamline operations.
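
For instance, finding every upstream supplier of a product, however deep the tier, is a single variable-length pattern in Cypher (the labels are invented for this sketch):

    // Walk the supply chain upstream from a finished product.
    MATCH path = (:Product {sku: 'X-100'})<-[:SUPPLIES*1..10]-(s:Supplier)
    RETURN s.name AS supplier, length(path) AS tier
    ORDER BY tier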

Wrap Up


From improved fraud detection to powering deep learning models to making supply chains more transparent, graph databases have several clear and beneficial real-world use cases for businesses.


Download this white paper, Sustainable Competitive Advantage: Creating Business Value through Data Relationships, and discover how your company can use graph database technology to leave your competition behind.

Read the White Paper

NLP at Scale for Maintenance and Supply Chain Management

Watch Ryan Chandler's presentation on using natural language processing at Caterpillar with Neo4j.
Editor’s Note: This presentation was given by Ryan Chandler at GraphConnect New York in October 2017.

Presentation Summary


Caterpillar is a 90-year-old company that manufactures large industrial machinery, including some autonomous vehicles. For the last decade, the company has been exploring natural language processing (NLP) for purposes such as vehicle maintenance and supply chain management.

NLP is, in essence, a computer’s ability to understand human language as it is written. As we parse sentences, it becomes clear that graphs are a natural representation of language – largely because graphs provide the lowest level of structure with the highest degree of flexibility.

Through a variety of use cases, we explore the best ways to interact with a machine through dialog (which includes expanding the structure of the graph to include verbs and nouns, as well as the relationships between them) and how to apply this concept at mind-blowing scale. And by exploring a film-related dataset through virtual reality, we can physically see how our data is organized as we increasingly refine our model.

Full Presentation: NLP at Scale for Maintenance and Supply Chain Management


I want to discuss natural language processing (NLP) at scale for maintenance, but what this blog actually delivers is something much more exciting: natural language processing at a mind-exploding scale.

I work for Caterpillar, a high-tech company that does things like manufacture large industrial machinery, including autonomous vehicles, generate power and mine resources. My role at Caterpillar as a senior data scientist is becoming increasingly focused on capturing knowledge and storing it in a graph database. I’m also a PhD student in cognitive and linguistic informatics, and work at a satellite location in a small lab at the University of Illinois with undergraduate and graduate students.

NLP, AI and Graph Databases


Why would Caterpillar even be interested in natural language processing? It might seem like a far leap, especially for supply chain management. But let’s take a look at a video from GE back in 2012, which shows the crossover between what they and Caterpillar are doing:



I agree that analytics makes this possible. Natural language dialog between people and machines will mostly be about analytics, and the mechanism to make it happen is natural language processing. Graph databases make this possible because they are a very natural fit for language processing.

Many people will tell you that 80 percent of data is unstructured text, which is based on a study released about 15 years ago. But, as I mentioned, I’m a contrarian and wanted to find out if that number was actually real. A dissenting opinion from Phil Russom at TDWI says that only a little over half of the data at an organization is unstructured, which includes things like sensor data and telemetry in addition to text. There is also a significant amount of knowledge stored in things like social media and internet pages, which we would like to tap into as well.

Artificial intelligence also lends itself naturally to graphs, because they can facilitate the ontologies and knowledge representations that have been around for a long time.

Alfred Korzybski, the founder of “general semantics,” is known for saying, “The map is not the territory. If the words are not the things – meaning they are just representations of things – and the maps are not the actual territory, then obviously the only possible link between the objective world and the linguistic world is found in structure and structure alone.”

Graph is the lowest level of structure, and it gives us massive flexibility. I won’t give you a full linguistics lesson, but I will discuss a couple of schools of thought regarding language processing. Often, language processing breaks a sentence down either into a dependency structure – which takes the verb and draws arcs from the verb to the other words, according to their relationship to the verb – or into a constituency tree.

Below is our binary tree, and you can see how these are amenable to graph representation:

See how graph databases are a natural architecture.

These are graphs, right? One of the overriding principles here is that we parse, and we parse into graph. I’ll describe two use cases for you, and hopefully you’ll see a progression of sophistication as we move from simple samples to real-world examples, and how – through trial and error – we kept restructuring and figuring out the right way to parse, at least for our uses.

Use Case 1: Dialog System


This example ties back to Bill’s GE talk in which he says that in the future, we’ll be able to talk to the machine. We wanted to see if we could develop a small example of how we might interact with a machine through dialog.

Executives often ask us for reports, so we asked for some data we could use for a dialog system. Below is what they gave us:

Learn more about relational to graph mapping.

In the area of business intelligence, people often say that no matter what type of report you give somebody, there’s always a resulting report to meet the needs of another person or department.

We wanted to create a system that would allow someone to ask any type of question as long as it’s in the domain. This meant creating a dialog system to test the use of a graph, demonstrate an open-ended user interface capable of answering questions, and to develop a capability to create spoken human machine interface.

When you look at this, you see that we have a manufacturing facility, and a manufacturer’s part has a relationship to a product. These concern factory shipments, and thus you would be able to ask a system like this, “How many trucks did we manufacture in Decatur and ship to Asia?”

This is just a fragment of the larger ontology – the larger graph model – we expanded into. And as before, you saw that the products are attached to a manufacturing facility:

See what fragment of ontology looks like.

Here, we intermediate that relationship with a manufacturing function. You can think of the blue ovals as nouns and green ovals as verbs, and by teasing apart this information in a linguistic fashion, we end up with something very readable. We can say that a product has a sales model, and a manufacturing facility produces a product that is delivered via shipping to a dealer.

Using a model like this, you might ask the question, “How many of a certain type of product did we ship from this division to this district?” The nodes wouldn’t necessarily have to be adjacent. They might even have a variable number of intermediary nodes that mediate between them.

Ninety-eight percent of dialog systems consist of a knowledge representation, which in this case is the Neo4j graph. The other part of this system is a query interpreter:

Learn more about parsing the graph.

The query interpreter determines whether or not a query is well-formed, which is exactly what the above constituency rules of grammar show. A query can be formed by putting a metric with a noun phrase.

For example, if you just ask “how many trucks do we have?”, it would give you the number of trucks that have ever been in the system across all facilities – which doesn’t provide very meaningful information.

But this interpreter structures the types of queries you’re able to ask. When we bring a phrase in, we look for certain words. If the query asks for counts, minimums or maximums, that’s a metric. Nouns and verbs are tagged as such, and regular expressions match things like dates.

So if you want to ask the query, “How many trucks were manufactured on this date?” you have an infinite number of responses that can be generated. You could instead ask, “How many trucks were made in Decatur and shipped to Asia? How many were built in Peoria and shipped somewhere else?” Thus, you can repeat those queries indefinitely and create any number of generative strings.

But we didn’t want to simply have the knowledge representation and use the graph to instantiate that. We wanted to make as much of the application as possible in graph.

Below is the query processing model we developed for our graph:

Discover more about process input text using a graph database.

User input occurs at the bottom, represented by the red node, and the words are stored as yellow nodes. From the interpreter’s perspective, it’s only those things which are in gray that link to our lexicon or database of known entities.

We’ve also come up with a way to deal with synonyms. You can see that “build,” links to “manufacturing,” for example. We route all those synonyms to one canonical definition of what those words mean to ensure consistency. Those, if you remember from the graph model, link directly to parts of that graph.

Now let’s look at a specific glyph (described at the bottom of the slide) that refers to the gray node “quarry and construction trucks.” Its function is listed as a noun, and it has a specific graph_rep that shows you exactly where to find that node in the graph. The system reads that in and finds the appropriate node.

Below is the code that maps those grammatical expressions or patterns to the graph query:

Check out an example of query mapping.

We have a front end that interprets the query and a back end that provides the data. Here you simply see that we have a Return Count, which refers to “How many” trucks we built in Decatur. We’re linking through the verb, and the dependency parse emphasizes that the verb is the most important element in the sentence because it contains the action.

Thus, we use the verb manufacturing again and again to link those parts of the fragment, or sub-graphs, together. And once you traverse all the sub-graphs, you get your constrained result representing the number of items.
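
As a hedged sketch – the labels and relationship types are my own shorthand, not Caterpillar’s actual model – the query generated for “How many trucks did we build in Decatur in 2006?” might look like this:

    // Link the question's fragments together through the verb node "manufacturing".
    MATCH (:Facility {name: 'Decatur'})<-[:OCCURS_AT]-(m:Manufacturing)-[:PRODUCES]->(t:Product {type: 'Truck'}),
          (m)-[:OCCURS_ON]->(:Date {year: 2006})
    RETURN count(t) AS finalCount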

This is what our interface looks like:



For our purposes, we have people type in their query, but you could use a speech-to-text API, like Google’s, to do this with voice. You can see the response to “How many trucks did we build in Decatur in 2006?” comes back as a Final Count of 14, which was returned in a quarter of a second.

We’re only looking at a fairly small database set, only 100,000 lines or so, but because of the scalable nature of graph databases, we believe that even with a much larger dataset we will continue to have high performance.

Let’s go over a quick recap of what we just learned:

Discover what we learned about Cypher and what works.

We learned that we had to expand the structure of the graph to include verbs and nouns, as well as the relationships between them, to make the graph more readable. When we expanded that relationship that said “manufacture” and made it a proper node, this could have relationships with other nodes, like “when did the manufacturing occur?”

If you look closely, you would’ve seen that there was actually a date hierarchy that showed what year, what month-year, and what date those were manufactured, for aggregation purposes.

Again, there were variable links between any one of those nodes that represented “manufacturing” and “products.” You can see the three different Cypher queries at the bottom of this slide. We went from just looking at things that had to be adjacent to, “Hey, let’s make them variable by putting asterisks in there.” But when you make everything variable, you have a combinatorial explosion that causes query performance to go way down because you’re looking at every possible combination.

So we had the system look at the overall schema of the graph, and when it found those objects of interest based on the dialog, it mapped that entire path. Then we can issue the precise path with the precise directionality on those nodes so that, again, the query comes back almost instantly.
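
The progression described above looks roughly like this (a sketch with invented labels, mirroring the three queries on the slide):

    // 1. Adjacent nodes only – fast, but misses anything with intermediate nodes.
    MATCH (m:Manufacturing)-[:PRODUCES]->(p:Product) RETURN count(p);

    // 2. Fully variable paths – complete, but combinatorially explosive.
    MATCH (m:Manufacturing)-[*]-(p:Product) RETURN count(p);

    // 3. Precise path derived from the schema – complete and fast.
    MATCH (m:Manufacturing)-[:PRODUCES]->(:Batch)-[:CONTAINS]->(p:Product) RETURN count(p);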

Use Case 2: Reading Warranty Documents at Scale


The first use case I showed you was a simple dialog to get information back from the graph database, while this use case is about reading warranty documents – which brings us to the mind-exploding scale portion I mentioned earlier.

Learning about this use case for reading warranty documents at scale.

I try not to oversell this, but the code that we’re going to develop over the next few years will allow us to read documents at scale and instantiate human knowledge in text. If we can read at scale for meaning, that will be an extremely powerful technology.

So how do we do this?

We’re going to look through a dataset of warranty documents, which are recorded when someone brings in a vehicle for maintenance. The technician writes down the customer complaint, which in this case is engine noise, and conducts an analysis of the problem, which is an oil test that shows high iron content. The root cause was found to be a broken rocker arm, and a solution was implemented.

This is what the data looked like when we got it:



It looks like a relational database where we have all these strings. We have a complaint, which is that the adapter of the bucket was broken off, and the cause. But these are all together rather than in separate rows, so we had to figure out how to break them up.

Typically as a linguist, you don’t end up with data that is tagged or annotated like this. However, this has 27 million annotations as part of a huge, very focused data set – another critical factor. It’s extremely difficult to have open reasoning across text, so if you can really constrain it, words have far fewer synonyms and you probably know what they mean.

Below is what our data architecture looks like:



We simply create a pipeline that ingests text via an open-source, freely available tool called the Natural Language Toolkit, which uses Python to chunk the raw text into sentences, correct sentence boundaries, and get rid of garbage in the text. We do a little bit of machine learning classification, and then issue a “dependency parse.”

Again, that was the flat parse from the beginning that doesn’t break down into constituency trees; it just has the verb and the connections to the verb. We thought it would be more robust against badly formed, short sentences, so that we wouldn’t be thrown off by missing information.

We brought all of that into the Neo4j database and added some WordNet, which is an electronic dictionary that has all the definitions for a given word, called a lexicographic dictionary or database.

Why did we incorporate machine learning classification?



Half of the items were tagged as the complaint, the cause, and the solution, which provides a great resource to train a machine learning algorithm to tag the other half. That’s exactly what we did here, and we ended up with fairly decent results. This is an F score, so it’s both precision and recall, and we got better results because we combined solution and correction. We were able to tag everything as a result.

This is what the parse looks like:

Learn more about parse using the Stanford Dependency Parser.

We use the open-source, freely-available Stanford Parser to parse the documents. In this particular case, the adapter of the bucket was broken off. If you do a naive keyword search, you might simply look for the word “broken” next to parts identified as components, and when you find them, you know they’re associated because the words are close together.

But in this case you can see that in fact the bucket wasn’t broken, the adapter of the bucket was broken. The “nsubjpass” relationship here relates those two – that’s a nominal subject passive between the verb and that word. We can study all the problem words and component words to see which type of relationships can be found between those two words, and better understand what we should be looking for.

See why parsing without a graph is computationally expensive.

Parsing is computationally expensive. A probabilistic, context-free parser looks at lots of examples and takes a statistics-based approach to properly breaking down a sentence. This ran on 32 logical processors on one machine, and even though we could push 100,000 lines through it a day, that is still a lot of time when you have 27 million documents.

So why are we doing this in graph?



We use graph so that we can find patterns and connections, and build hierarchies and ontologies. And this is simply those parses represented as a graph.

You can imagine that when we run 27 million documents through our database, build a graph, and look at it in the default Neo4j browser, it quickly becomes hard to deal with. At first, we created a single node for each unique word, and every time we saw a sentence in a claim, we created a trace – or parse – through those shared word nodes. That way we wouldn’t recreate the same nodes over and over and over.

Graph databases, every sentence traces through words.

This is problematic because if I try to search on a claim number, I can’t: you can’t index relationships out of the box. You can index the properties of nodes or use APOC, but it still didn’t perform very well.

Ultimately we corrected this using the following model:

With graph databases, every word becomes its own node.

Each series of red nodes is a sentence, and each word is related to the next word in the sentence with “hasnextword.” Note that you can’t see the parsing structure in this slide; that’s hidden.

But now any time the word “engine” is used – shown in the blue nodes – it links to another golden record, the one that’s stored in the dictionary. Then if I still want to look at anything connected to engine, I can just show it this pattern. And those nodes are pretty cheap.
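
In Cypher terms, the final model behaves something like the sketch below (the label and relationship names are illustrative, not the production schema):

    // Every occurrence of "engine" is its own cheap node linked to one golden record,
    // so finding the claims that mention it is a short, index-friendly pattern.
    MATCH (:Word {text: 'engine'})<-[:INSTANCE_OF]-(occ)<-[:CONTAINS]-(s:Sentence)
    RETURN s.claimId AS claim, count(occ) AS mentions
    ORDER BY mentions DESC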

Another aspect of the implementation is instantiating knowledge as a structure:

Graph database ontological structure.

This is a very simple, ontological, super-class/sub-class relationship.

The yellow node reads “component” and all the yellow nodes attached to it are components, such as a pump, powertrain and engine. The red node reads “problem” and all the red nodes attached to it indicate problems, such as a leak. We just looked at high-frequency words to quickly build these graphs.

This ontology provides the structure to create the following simple Cypher query:

Querying issues in a graph database.


This allows you to take millions of documents, parse them, create a very shallow ontology, and then create this trace-through that gives you meaning. This allows you to extract meaning at scale from these text documents.
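
Since the slide itself is not reproduced here, the following is a hedged reconstruction of what such an ontology-driven query could look like (all names invented):

    // Find sentences where any known component co-occurs with any known problem.
    MATCH (c)-[:IS_A]->(:Category {name: 'component'}),
          (p)-[:IS_A]->(:Category {name: 'problem'}),
          (s:Sentence)-[:CONTAINS]->(c),
          (s)-[:CONTAINS]->(p)
    RETURN c.text AS component, p.text AS problem, count(s) AS claims
    ORDER BY claims DESC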

Semantic Frames


Up until this point, we have concentrated on a low-level, small example of tracing through a sentence. Now we’re going to shift to the topic of semantic frames:

Semantic frames


In the 1960s and 1970s, Marvin Minsky and Charles Fillmore created this idea called semantic frames. In short, what it means is that a lot of what is understood via communication in words is not literally communicated. If I say, “I went and bought a DVD last night,” you know that I might have gotten into a car, gone to a retail location, and used some form of payment to pick up the DVD. This is all background information that’s tied to us cognitively. Ultimately, semantic frames represent a slot-filling system.

Remember, earlier we classified those sentences. If we know what type a sentence is, we can fill in slots – expectations about what else should be present. If I know that when I see a complaint I should also see a part and a problem, and I don’t find all of those components mentioned, I might need to start looking at additional sentences. As you set these expectations, you get more and more through inference and deduction.

In Conclusion: A Virtual Reality Graph Exploration


To recap what we’ve gone over so far, I’d like to emphasize again that structure is critical. The most important thing is how you set up your structure.

We made each instance of a word a node and then connected it to a golden version of that word. And as I mentioned, we found out that it was difficult to view these graphs in the default Neo4j browser because they were so large. We said, “Okay, well, it’s hard to view tens of thousands, hundreds of thousands of nodes. Can we do this in VR somehow?”

For this VR demonstration, we’re relying on a large online database of actors and movies. I think there are about 27,000 total entries, but I’m only going to show around 1,000, randomly placed on two separate planes: actors on one plane, movies on the other, with connections drawn between the planes. The demo runs on an Oculus Rift and was developed in the Unity game engine.



We are currently working on a project with the National Center for Supercomputing Applications – the group that created the Mosaic web browser – to develop a visualization of text documents at scale.

How could we build something that would allow us to browse semantics and meaning at a large scale? We are also working to develop a more complete theoretical foundation for building knowledge structures in graph, based on constituency parsing and formal semantics.


Take a closer look into the powerhouse behind the analysis of real-world networks: graph algorithms. Read this white paper – Optimized Graph Algorithms in Neo4j – and learn how to harness graph algorithms to tackle your toughest connected data challenge.

Get My Free Copy

Graph Databases for Beginners: Why We Need NoSQL Databases

NoSQL databases are one of those things in life that are unhelpfully defined only by what they are not rather than by what they are, i.e., an anti-definition.

NoSQL is a cheeky acronym for Not Only SQL – or more confrontationally – No to SQL. This anti-definition tells you a lot about why the NoSQL movement began: SQL-based relational databases aren’t always enough.

Relational databases (RDBMS) still have their perfect use cases, and RDBMS often work well alongside NoSQL databases to tap the strengths of both technologies. (This is why Neo4j officially prefers Not only SQL as the definition of NoSQL, because SQL still has its place in any backend.) But it’s still abundantly clear that the relational data model can’t meet every data need.

So, once other data stores – and their accompanying data models – became available, there was (and continues to be) a meteoric rise in the popularity of NoSQL database technologies. Today, we’re going to define NoSQL databases in addition to justifying why we need them now more than ever.

Learn why NoSQL databases are needed to face some of today's biggest data challenges that SQL can't


In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics (and pitfalls) of data modeling, why a database query language matters, the differences between imperative and declarative query languages, predictive modeling using graph theory and the basics of graph search algorithms.

This week, we’ll discuss the diverse and sundry world of NoSQL databases – and why they’ve become so popular.

The Many & Motley World of NoSQL Databases


NoSQL databases are a spectrum of data storage technologies that are more different than they are alike, so it’s difficult to make sweeping generalizations about their characteristics.

In the following weeks, we’ll explore a few types of NoSQL databases and other important NoSQL definitions. Our tour will encompass the group collectively known as aggregate stores (highlighted in blue below), including key-value stores, column family stores and document stores, as well as the various types of graph technologies (in green), which include property graphs, hypergraphs and RDF triple stores.

An overview of NoSQL database types and categories

An overview of the NoSQL database space. Quadrants in blue are collectively known as aggregate stores.

Historically, most enterprise-grade web applications ran on top of a relational database (RDBMS). But in the past decade alone, the data landscape has shifted significantly and in a way that traditional RDBMS deployments simply can’t manage.

The NoSQL database movement has emerged particularly in response to four of these data challenges:
    • Data volume
    • Data velocity
    • Data variety
    • Data valence
We’ll explore each of these challenges in further detail below.

Data Volume


It’s no surprise that as data storage has increased dramatically, data volume (i.e., the size of stored data) has become the principal driver behind the enterprise adoption of NoSQL databases.

Large datasets simply become too unwieldy when stored in relational databases. In particular, query execution times increase as the size of tables and the number of JOINs grow (so-called JOIN pain).

This isn’t always the fault of the relational databases themselves though. Rather, it has to do with the underlying data model.

In order to avoid JOIN pain, the NoSQL world has several alternatives to the relational model. While these NoSQL data models are better at handling today’s larger datasets, most of them are simply not as expressive as the relational model. The only exception is the graph model, which is actually more expressive. (More on that in the weeks to come.)

Data Velocity


But volume isn’t the only problem modern enterprise systems have to deal with. Besides being big, today’s data often changes rapidly.

Thus, data velocity (i.e., the rate at which data changes over time) is the next major challenge that NoSQL databases are designed to overcome.

Velocity is rarely a static metric. A lot of velocity measurements depend on the context of both internal and external changes to an application, some of which have considerable system-wide impact.

Coupled with high volume, variations in data velocity require a database to not only handle high levels of edits (tech lingo: write loads), but also deal with surging peaks of database activity. Relational databases simply aren’t prepared to handle a sustained level of write loads and can crash during peak activity if not properly tuned.

But there’s also another aspect of data velocity NoSQL technology helps us overcome: the rate at which the data structure changes. In other words, it’s not just about the rapid change of specific data points but also the rapid change of the data model itself.

Data structures commonly shift for two major reasons. First is the fast-moving nature of business. As an enterprise changes, so does its data needs.

Second is that data acquisition is often experimental. Sometimes your application captures certain data points just in case you might need them later on. The data that proves valuable to your business usually sticks around, but if it isn’t worthwhile, then those data points often fall by the wayside. Consequently, these experimental additions and eliminations affect your data model on a regular basis.

Both forms of data velocity are problematic for relational databases to handle. Frequently high write loads come with expensive processing costs, and regular data structure changes come with high operational costs (just ask your DBA).

NoSQL databases address both data velocity challenges by optimizing for high write loads and by having more flexible data models.

Data Variety


The third challenge in today’s data landscape is data variety – that is, it can be dense or sparse, connected or disconnected, regularly or irregularly structured.

Today’s data is far more varied than what relational databases were originally designed for. In fact, that’s why many of today’s RDBMS deployments have a number of nulls in their tables and null checks in their code – it’s all a workaround to adjust to today’s data variety.

On the other hand, NoSQL databases are designed from the bottom up to adjust for a wide diversity of data and to flexibly address future data needs, each adopting its own strategy for handling data variety.

Data Valence


Whenever you talk about data, there are always a lot of “V”s thrown around (I’ve chosen three above, but there are about a million to choose from). But there’s almost always one powerful “V” missing: data valence.

The Latin root of valence is the same as that of value – valere, which means to be strong, powerful, influential or healthy. In chemistry, valence is the combining power of an element; in psychology, it is the intrinsic attractiveness of an object; and in linguistics, it’s the number of elements a word can combine with. In the context of big data, valence is the tendency of individual data points to connect, as well as the overall connectedness of a dataset.

The valence of a dataset is measured as the ratio of connections to the total number of possible connections. The more connections within your dataset, the higher its valence.
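
As a rough worked example (the numbers are invented for illustration): an undirected dataset of 10 nodes has 10 × 9 / 2 = 45 possible connections, so if it contains 12 actual relationships, its valence is 12 / 45 ≈ 0.27. Add relationships without adding nodes and the valence climbs toward 1.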

Data valence increases over time but not uniformly. Network scientists (i.e., super nerds) have described preferential attachment (for example, the rich get richer) as leading to power-law distributions and scale-free networks with hub and spoke structures. Literally nothing in that previous sentence can be analyzed using a relational database.

Over time, highly dense and lumpy data networks tend to develop, in effect growing both your big data and its complexity. This is significant because densely yet unevenly connected data is difficult to unpack and explore with traditional analytics (such as those based on RDBMS data stores). Thus, the need for NoSQL technologies where relational databases aren’t enough.

(If you’re interested in learning more about data valence in particular, check out this ebook by Amy Hodler and Mark Needham, portions of which were used in this blog post.)

Conclusion


Relational databases can no longer handle the challenges posed by today’s data volume, velocity, variety or valence. Yet understanding how NoSQL databases overcome these challenges is only the prelude to finding the right database for your enterprise use case.

In the coming weeks, we’ll explore the strengths and weaknesses of various NoSQL technologies so you can make the most informed decision possible.


Now that you’ve learned about NoSQL in general, it’s time to look closer at graph technology in particular: Get your copy of the O’Reilly Graph Databases book and start using graph technology to solve real-world problems.

Get the Book



Catch up with the rest of the Graph Databases for Beginners series:

Interchangeable Parts: 5-Minute Interview with Preston Hendrickson, Principal Systems Analyst at CALIBRE

“It’s easy to teach people to use Neo4j. It’s hands-on training, not death by PowerPoint,” said Preston Hendrickson, Principal Systems Analyst at CALIBRE.

CALIBRE works with large government customers, including the U.S. Army, where the maintenance, operation and support costs of equipment represent as much as 80 percent of total lifecycle costs (depending on the program and its longevity), and a single tank has about 10 million parts to track.


In this week’s five-minute interview (conducted at GraphTour DC) we discuss how CALIBRE has replaced recursive SQL queries with Neo4j, and is now able to train analysts right alongside developers.

Talk to us about how you use Neo4j at CALIBRE.


Preston Hendrickson: One of our customers asked us to do a deep dive into parts and ordering. For example, say you have a chair. That chair has legs, a back, a seat and armrests. And some of those parts are interchangeable. We need to know which parts can also fit on other chairs. With that information, we can build chairs, swap parts out and so forth.

Why did you choose Neo4j for the project?


Hendrickson: We chose Neo4j because it allows us to take those parts and actually go down as many levels as required. Chairs are a minor example; it could be a car. In a car, you have hundreds or thousands of parts, and we need to know what interchangeable parts there are across models and across different vendors, like an auto parts store.

In cases like this, Neo4j becomes the better candidate so that we don’t have to write recursive SQL or dynamic SQL. We just write queries for Neo4j and traverse as many levels as we want and trace relationships, which is a lot faster than writing code or querying a database.
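
To illustrate that kind of variable-depth traversal, here’s a minimal Cypher sketch over a hypothetical parts hierarchy (the Assembly and Part labels and the HAS_PART and INTERCHANGEABLE_WITH relationship types are invented for the example, not CALIBRE’s actual schema):

```
// Walk a parts hierarchy to any depth and collect interchangeable
// alternatives for each part. All names are illustrative.
MATCH (:Assembly {name: 'Chair'})-[:HAS_PART*1..]->(p:Part)
MATCH (p)-[:INTERCHANGEABLE_WITH]-(alt:Part)
RETURN p.name AS part, collect(alt.name) AS alternatives
```

The `*1..` in the first pattern is what replaces recursive SQL: it follows HAS_PART relationships through as many levels as the data contains.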

Can you talk to me about some of your most interesting or surprising results you had while using Neo4j?


Hendrickson: The number one thing we’ve found is how easy it is to teach people to use Neo4j. Anytime you change technology, the first thing you do is sit people in a room and train them to death – death by PowerPoint. Neo4j was mainly hands-on training.

We had not only developers and others with strong technical skills, but people across the gamut. We included anyone who was new to the company. We had analysts in the room who were able to pick up Neo4j in the exact same way that developers did.

Now we’re all in one accord, and we can all share the graph database.

If you could start over with Neo4j, taking everything you know now, what would you do differently?


Hendrickson: The first thing I would do is, before touching it, I would try to train my brain and not think about it like a traditional RDBMS. It took me a couple of weeks to figure out that I should not model like a traditional database, with third normal form and all that stuff.

I tried to do it that way, and I did not get that message until well into the 34th time rebuilding the graph. If I had to start over, I’d work on understanding what NoSQL and schema-free really mean versus building hands-on like I’m used to.

What do you see as the future of graphs in your projects?


Hendrickson: In our area, instead of just data retrieval, we want to move more into data science. We are looking into using Python a lot more to connect to the database directly versus taking data, exporting it into something else, having Python read it, getting answers, and pushing data back in. We’re trying to integrate those processes.

Anything else you want to add or say?


Hendrickson: This is exciting for us. It is a new area, and we’re trying to get more of our analytical teams more involved. At the same time, other people are watching those using Neo4j, and they’ve been getting more questions, and it spreads. We like it. We’re having a ball here.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Want to learn more on how relational databases compare to their graph counterparts? Get The Definitive Guide to Graph Databases for the RDBMS Developer, and discover when and how to use graphs in conjunction with your relational database.

Get the Ebook

Graph Algorithms in Neo4j: The Rise of Graph Analytics

Learn about the rise of graph analytics and how they apply to graph algorithms.
According to Gartner’s 2018 Magic Quadrant for Data Management Solutions, the biggest reason for using the Neo4j graph database “is to drive innovation.”

This blog series is designed to help organizations better leverage graph analytics to effectively innovate and develop intelligent solutions faster.

This week we’ll trace the rise of graph analytics and answer the question, “Why now?” Part of the answer lies in the convergence of analytics with transactions, sometimes called “translytics.” In the coming weeks, we’ll dive deeper into the practical applications of graph algorithms, using examples from Neo4j, the world’s leading graph database.

The Roots of Graph Analytics


Graph analytics has a history dating back to 1736, when Leonhard Euler solved the “Seven Bridges of Königsberg” problem. The problem asked whether it was possible to visit all four areas of a city, connected by seven bridges, while only crossing each bridge once. It wasn’t. With the insight that only the connections themselves were relevant, Euler set the groundwork for graph theory and its mathematics.

Learn about the roots of graph analytics.

Source: Wikipedia

But graph analytics did not catch on immediately. Two hundred years would pass before the first graph textbook was published in 1936. In the late 1960s and 1970s, network science and applied graph analytics really began to emerge.

In the last few years, there’s been an explosion of interest in and usage of graph technologies. In 2017, Forrester survey data indicated that “69 percent of enterprises have or plan to implement graph databases within the next 12 months.” Demand is accelerating based on a need to better understand real-world networks and forecast their behaviors, which is resulting in many new graph-based solutions.

Why Now? Forces Fueling the Rise in Graph Analytics


This growth in network science and graph analytics is the result of a combined shift in technical abilities, new insights and the realization that existing business intelligence systems and simple statistics fail to provide a complete picture of real-world networks. Several forces are driving the rise in graph analytics.

First of all, we’ve seen real-world applications of graph analytics and their impact on us all. The power of connected data for business benefit has been demonstrated in disruptive success stories such as Google, LinkedIn, Uber and eBay, among many others.

At the same time, digitization and the growth in computing power (and connected computing) have given us an unprecedented ability to collect, share and analyze massive amounts of data. But despite the masses of data they have, organizations are frustrated with the unfulfilled promises of big data and their inability to analyze it.

The majority of analytics used today handle specific, well-crafted questions efficiently but fall short in helping us predict the behavior of real systems, groups and networks. Most networks defy averages and respond nonlinearly to changes. As a result, more businesses are turning to graph analytics, which are built for connected data and responsive to dynamic changes.

In addition, there’s been a recognition of how graphs enhance machine learning and provide a decision-making framework for artificial intelligence. From data cleansing for machine learning to feature extraction in model development to knowledge graphs that provide rich context for AI, graph technology is enhancing AI solutions.

Bringing Together Analytics & Transactions


Historically, the worlds of analytics (OLAP) and transactions (OLTP) have been siloed despite their interdependence (analytics drives smarter transactions, which creates new opportunities for analysis), which is especially true with connected data.

In recent years this line has blurred, and modern data-intensive applications combine real-time transactional queries with less time-sensitive analytics queries. The merging of analytics and transactions enables continual analysis to become ingrained in regular operations. As data is gathered – from point-of-sale (POS) systems, manufacturing equipment, IoT devices, or wherever else – analytics at the moment and location of capture support an application’s ability to make real-time recommendations and decisions. This blending of analytics and transactions was observed several years ago, and terms to describe it include “translytics” and Hybrid Transactional and Analytical Processing (HTAP).

“We need to combine transactional and analytic systems into transalytic systems and stop thinking about these as two separate systems. 2018 is going to be the year we’ll see major corporations collapse these two systems together, so that you have simplified architecture and can move at the pace of business.” – Bill Powell, Director of Enterprise Architecture, Automotive Resources International (ARI)
“[HTAP] could potentially redefine the way some business processes are executed, as real-time advanced analytics (for example, planning, forecasting and what-if analysis) becomes an integral part of the process itself, rather than a separate activity performed after the fact. This would enable new forms of real-time business-driven decision-making process. Ultimately, HTAP will become a key enabling architecture for intelligent business operations.”– Gartner


Discover how to converge graph analytics with transactions.

Conclusion


Graph algorithms provide the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways through which contagions or network failures spread, and the influences on and resiliency of groups.

Neo4j brings together analytics and transactional operations in a native graph platform, helping not only uncover the inner nature of real-world systems for new discoveries, but also enabling faster development and deployment of graph-based solutions with more closely integrated processing for transactions and analytics.

In the coming weeks, we’ll delve into the many use cases graph algorithms support, from real-time recommendations to fraud detection and prevention.


Find the patterns in your connected data
Learn about the power of graph algorithms in this ebook, A Comprehensive Guide to Graph Algorithms in Neo4j. Click below to get your free copy.


Read the Ebook



If You Build It They Will Come: Starting a GraphDB Meetup Where There Is None

Learn how and why to start a DB Meetup group in your area to connect with graph database enthusiasts.
About seven years ago, I came across Neo4j while trying to find a way to visually represent stored information and make connections like the human brain does. At the time, I only really needed to build an awesome recommendation engine and not a brain in a box. I can say now that I was reading way too much Ray Kurzweil at the time.

When I first started trying to understand how Neo4j works, I was completely out of my skill set and knew I would need some help. I started looking up meetups in my area and couldn’t find anything on graph database or Neo4j. Not being deterred, I took to the internets and tried to hire a few different developers who could help me build what I was looking to build. However, I couldn’t find anyone willing to learn how Neo4j works.


At that time, it was a rare few who would say NoSQL and graph databases are where it’s at. As we now know, the paradigm has since changed and graphs are everywhere.

Fast forward four years, countless meetups and a career change to full-time tech. I met my future business partner Jason Cox by accident at a random civic hackathon, on the wrong day at the wrong time. We shared what we’re nerdy about, and I pitched him on what I was trying to build and how I wanted to build it with Neo4j. Jason told me he was sold on the idea and the technology and wanted to help me build it. I shared everything I’d learned about the technology, and we dove in.

Well, not so fast. I realized the hard way how difficult it can be to adopt a new technology and become proficient in how it works. Instead we spent the next six months starting our first company together and learning everything we could about Neo4j.

Then one day I saw a LinkedIn update from my friend Karin Wolok who just became Neo4j’s Community Development Manager. I sent Karin a message:

Me: Hey, a little while ago you got an awesome new title. I’m totally jealous! (Community Manager at Neo4j).

Me: I’m completely obsessed with using Neo4j and want to build so many things with it!

Karin: Really? That’s awesome! I love to hear that. Not many people are really aware of graph databases and their benefits.

Karin: What do you build with it?

That’s when it all started! I went on to overload Karin with all of the things we wanted to build using Neo4j. She told me to come to the next DataPhilly meetup – she was going too, so we could talk more. We got to talking at the meetup, and Karin inspired me to start the first Philadelphia Neo4j and GraphDB meetup. She basically just said, “You guys should start a Neo4j Meetup.” That was all I needed to hear!

One month later, the Philly GraphDB meetup group was formed and our first meetup was posted. Karin said she’d help us get set up, give us some good guidelines to get started and provide a hosting budget for food and drinks. To make it even better, Karin knew a Neo4j engineer, Dave Fauth, who happened to be in our area and agreed to be a guest speaker for our first event.

With an attendee list of 56 people, we worked our butts off figuring out the logistics and planning for the meetup. The day of the meetup was very stressful, but everything went as planned, and Karin and Dave made it easy for us to enjoy ourselves and set a relaxed tone. More than 30 people showed up in person and online.

Since our first meetup, we’ve hosted 16 more amazing meetups and counting. After each one, we’ve refined how we communicate what interests us while building a community of graph database and Neo4j enthusiasts.

Every month we choose a topic outside the usual use cases or with a civic perspective on graph databases. This gives us a focal point for cultivating interest outside the tech community. It’s our belief that graphs are for everyone, and anyone should be able to learn how to use graph database technologies like Neo4j.

So every time we host a meetup, we incorporate a live demo using the Neo4j online sandbox or desktop app. This gives everyone a gateway to participate and get hands-on with the technology (like I wish I could have all those years ago).

We’ll be writing more content soon around the dos and don’ts of hosting a GraphDB meetup, lessons we’ve learned and how to plan for your first meetup.


Want to take your Neo4j skills up a notch? Take our online training class, Neo4j in Production, and learn how to scale the world’s leading graph database to unprecedented levels.

Take the Class

Graphs in Time and Space: A Visual Example

Read Dan Williams' presentation on developing graph visualizations in time and space.
Editor’s Note: This presentation was given by Dan Williams at GraphConnect Europe in May 2017.

Presentation Summary


Graph databases are helping to solve some of today’s most pressing challenges. From managing critical infrastructure and understanding cyber threats to detecting fraud, we have worked with hundreds of developers building all kinds of mission-critical graph applications powered by Neo4j.

In this blog, we’ll explore two dimensions of graphs that, from our experience, cause the most confusion but potentially contain vital data insight: space and time. We’ll use visual examples to explain the quirks (and importance) of dynamic and geospatial graphs, and how they can be stored, explored and queried in Neo4j. We’ll then show how graph and data visualization tools empower users to explore connections between people, events, locations and times.

Full Presentation: Graphs in Time and Space: A Visual Example


What we’re going to be talking about today is how to gain actionable data insights by incorporating both space and time into your graph visualizations:



I do product management for the KeyLines team at Cambridge Intelligence, and since I’m a physicist, I thought a presentation on space and time seemed like fun.

I’m completely new to the world of graph databases, and have only been working with them for a few months now. Before this, I was using my engineering and physics background to work with some of the material scientists on the Orion program.

We know that data insights are incredibly important, and visualizations in the context of space and time will help us get to those meaningful insights. Below is an example graph to get us started:

Data insights and a graph example.

This is a sample dataset that we could have pulled from Neo4j with a Cypher query, but this densely connected network doesn’t provide much helpful information. We can use more helpful graph visualizations to change that.

Graph Visualization


Let’s take a quick step back and talk about graph visualizations. What do I really mean by that? The following is an example of a graph query, in which the nodes represent people and the links represent emails:

Take a look at graph visualizations.

Visualization is all about taking data out of graph queries and displaying it in a useful way. Using this example, we can make the visualization more helpful by making the lines thicker to indicate email volume, and using a centrality measure to change the size and colors of nodes to identify the gatekeepers of information and determine the most influential people in the network:

Graph visualizations pulled from graph queries.

These graphs represent emails pulled in the wake of the Enron scandal, but to truly get any insight from this data, we need to understand the passage of time. Otherwise, how do we know who emailed whom first? Was the chaos on the right side of the graph triggered by something that happened on the left side? What was the cause, and what was the effect?

Because Neo4j has an open schema, it can store essentially any type of information, including timestamps. But how do we visualize this and end up with meaningful insights?
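
As a minimal sketch of that storage (the Person label, EMAILED relationship type and sentAt property are hypothetical names, not the actual schema of the Enron dataset), a timestamped relationship can be created like this:

```
// Timestamps live naturally as relationship properties,
// so every email carries its own moment in time.
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CREATE (a)-[:EMAILED {sentAt: datetime('2001-05-14T09:30:00')}]->(b)
```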

Visualizing Data in Time


There are a number of options for visualizing time, including a series of snapshots. Consider the following example, which shows how members of the United States House of Representatives voted and collaborated over time:

Visualizing data in time.

This collection of images shows an evolution of behavior over time, with the top left image showing a high degree of collaboration, while the most recent image in the bottom right shows an increasing level of partisanship.

What if we wanted to see how a particular member of Congress changed their voting behavior over time?

In the following example, we combine a graph view with a time view, which is shown as a bar along the bottom of the slide:

Combining a graph view with a time view.

In this example, nodes represent people and links represent phone calls. The size of the bars in September compared to the following months demonstrates a relatively high rate of activity in that month. We can also zoom in on a particular set of days to see how that changes the graph:

Graph view combined with a time view.

We can animate the graph to see how new cliques form and change over time, or zoom in on a particular person (in green, below) and see how their activity compares to everyone else’s:

Graph visualization with KeyLines.

We can see that this person initially made relatively few phone calls, but then increased the number of calls as time went on.

So while there’s a lot of dynamic information in our database, getting actionable insights requires some clever manipulations to data visualizations.

Visualizing Data: Incorporating Space


In graph theory, we learn that space shouldn’t matter. Topology is topology, and the following two graphs are really the same:

Graphs in space.

Unfortunately, there’s a lot of insight that can be gained from space, and where things are physically located in the real world.

Take the following network as an example, which shows flights between airports in the U.S.:

Graphs in space, airports in the U.S.

Different airlines are represented by different colors, and based on how these lines are organized around cities, we can also pinpoint hubs. And while we could learn some information from this graph, the more obvious thing to do would be to stick this data onto a map.

Again, the beauty of Neo4j is that you can store any information you want as a property, including latitude and longitude. Time often lives on the link, while location often lives on the node. In my KeyLines visualization tool, I can switch over to the map mode, reorganize our data and get even more helpful insights:

KeyLines visualization tool.

It’s much easier to identify the hubs, to understand why they’re located where they are, and to identify the large geographic areas served by a small number of airports compared to a very dense geographical area served by a lot of airports, and so on. This is very, very easy to spot – and it’s what people expect.

Insights are all about people. Not everyone who uses a graph database is a graph scientist or a data scientist. People in roles that range from business analysts to police officers need to be able to translate the data into a world they’re familiar with.

Visualizing Data in Space and Time


We pull all of these concepts together in the following example. We have a time bar down at the bottom of the slide, and all of our data points on our map:

Read Dan Williams' presentation on developing graph visualizations in time and space.

This is information pulled from the open source Boston Hubway dataset, which shows the trips taken on publicly available bicycles over a certain period of time. Data pulled from geographic networks tends to be very dense, as it is in this example, which makes it difficult to spot any meaningful insights right off the bat.

But let’s see what happens when we add time and location. We can select a single node (highlighted in red) and look at the journeys to and from that particular location. If you look at the time bar at the bottom of the slide, you can start to see patterns in the data and how those patterns change over time:

Data visualization in geographic networks.

The red journeys are the ones that end at this location, while the green journeys are trips that begin at that location. At this particular location, a lot of journeys end there in the morning and begin there in the evening. Based on that information, you might be able to spot that this is a place of work, where people commute to at the beginning of the day and leave from at the end of the day.

If we pick another station further out of town, we see exactly the opposite pattern: journeys begin there in the morning and end there in the evening:

More data visualization for geographic networks.

This is how you can combine time and space information to get a whole lot more out of your graph than you would get by just storing latitude, longitude and timestamps.
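
To make that combination concrete, here’s a minimal Cypher sketch contrasting morning arrivals with evening departures at a single station – the Station label and the TRIP relationship with hour properties are invented for the example, not the actual Hubway schema:

```
// Commuter pattern at one station: trips arriving before 10am
// versus trips departing after 5pm. All names are illustrative.
MATCH (:Station)-[arr:TRIP]->(s:Station {name: 'Downtown'})
WHERE arr.endHour < 10
WITH s, count(arr) AS morningArrivals
MATCH (s)-[dep:TRIP]->(:Station)
WHERE dep.startHour >= 17
RETURN morningArrivals, count(dep) AS eveningDepartures
```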

KeyLines and Neo4j


These demos are web browser examples that we put together at KeyLines, which is a component for building visualizations in JavaScript. It’s fully compatible with Neo4j, and is easy to use for these types of visualizations that bridge the gap between your organization’s data and end users who are looking for actionable data insights from graphs.


New to graph technology?

Grab yourself a free copy of the Graph Databases for Beginners ebook and get an easy-to-understand guide to the basics of graph database technology – no previous background required.


Get My Copy

We Just Closed the Largest Single Investment in the Graph Space. Now What?

Learn why we just closed the largest single investment in the graph technology space and what's next
I’m thrilled to announce that Neo4j has just closed $80 million in a Series E funding round.



We are happy to welcome One Peak Partners and Morgan Stanley Expansion Capital to the graph of Neo4j funders. I’d also like to thank all of our existing investors who participated in this round, including Creandum, Eight Roads and Greenbridge Partners. (Be sure to check out our official announcement with all the other details here.)

For those keeping track at home, Neo4j has now raised $160 million in growth funding, representing the largest cumulative investment in the graph technology category. I’m incredibly proud to be a part of that vote of confidence in the power of graphs.

So, where do we go from here?

A Quick Look Back at Our Series D


It’s been two years since we raised our Series D round, and a lot has changed.

Two years ago, graph technology was just starting to go mainstream. For years before that, graph databases had been considered “boutique” or “niche” by analysts, journalists and development teams. But in late 2016, graphs hit an inflection point in enterprise adoption.

That inflection was due to the convergence of several factors: First, the maturity of our product made it possible for more and more enterprises to use Neo4j for mission-critical applications. At the same time, we’d reached a tipping point of awareness around specific use cases like fraud detection and real-time recommendation engines. This awareness was further reinforced by many of our enterprise deployments becoming publicly referenceable.

It was at this critical elbow of the adoption S-curve that we raised our Series D, perfectly positioning us to take advantage of the rapidly accelerating market.

Fast forward to today, and graphs are mainstream.

Connected data is an imperative for large organizations, and graph technology is on every enterprise shopping list. Businesses that don’t have a graph-powered solution are looking to get one, and those that already use graph technology are developing their second, third or fourth(!) application. Adoption is exploding and far past “going mainstream.”

I think this slide from my GraphConnect keynote says it best:

A summary of enterprise adoption of graph technology


Why Fundraise, Why Now


So, why did we fundraise now?

One word: Adoption.

The graph paradigm shift is fully underway, and the demand for graph technology is accelerating faster than ever before. Our customers are pushing the envelope on the what, where and how of graphs. They’re asking for more, more, more of everything.

Raising this round is about meeting that demand. With our Series E, we’ll maintain our leadership in the space by continuing to build out the vision of our graph platform. Since day one, we’ve been singularly focused on making connected data accessible to developers through the Neo4j database. Now as we transition into a platform company, we retain that focus on accessible connected data but with an entire stack of native graph technologies.

But let’s double click on one particular factor that looms largest not just on the horizon of the graph space but on the horizon of the entire tech space: artificial intelligence.

On the Near Horizon: Graphs and AI


Artificial intelligence and graph technology are at a critical crossroads. Graphs and AI have a symbiotic relationship that further strengthens and accelerates one another.

To prove to you, dear reader, that this isn’t just “AI washing” (!), let me take you back six years ago to GraphConnect 2012. We had maybe 150 people in the room (my six-month-old daughter was one of them), and our keynote speaker was Dr. James Fowler from UCSD.

Dr. James Fowler & Emil Eifrem at GraphConnect 2012


At the risk of simplifying his research so much that I misrepresent it, here’s the crux of his presentation: Imagine two parallel universes. In the first universe, I know everything about you. I know your name, your gender, your height, your weight, your genes, your medical history, your diet, your schedule, your breakfast choices, etc. Everything.

In the second universe, I know nothing about you except that you exist. I don’t even know your gender, for example. But, in this universe I know just a bit about your friends and your friends of friends (your graph!). I don’t know them intimately – it’s sufficient to know whether they smoke or whether they plan on voting or whatnot.

Which universe do you think gives me the best information at predicting your behavior?

To a lot of people’s surprise, in the second universe – driven by connections – I can more accurately predict an individual’s behavior than in the first. Dr. Fowler proved this scientifically in his research, as chronicled in his book Connected.

In other words, predictions are best driven not by discrete, individual records but by relationships.

Let’s look at how that plays out in machine learning. Most machine learning pipelines look something like this:

Today's machine learning pipelines rely on discrete data points


When we train our ML models, we use a sequence of discrete data records, almost like rows in an RDBMS. Each data point is identified and processed discretely. This is an example of our first universe, where we know only data about the individual.

What would machine learning look like in our second universe? It would use not just the individual data records, but how those records are connected (i.e., the graph) to train our ML models. Our new machine learning pipeline would look something like this:

A machine learning pipeline imagined using a graph of connected data


On the near horizon, I believe we’ll see the majority of machine learning shift from analyzing individual rows of data to also analyzing graphs of connected data. That shift will result in more accurate predictions and thus better decisions.
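
As a small illustration of what a connected feature can look like – a hypothetical sketch, with the Customer label and its properties invented for the example – plain Cypher can export a relationship-based feature such as degree alongside the usual discrete attributes:

```
// Export each customer's discrete attributes plus a simple
// connected feature (degree) for model training.
MATCH (c:Customer)
RETURN c.id AS id, c.age AS age, size((c)--()) AS degree
```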

This change is happening already. Across every use case and industry, we’re seeing graph-and-AI deployments that feed one another with the much-needed context for each technology to grow to the next level, not just incrementally but exponentially.

We are on the cusp of a new Cambrian explosion of graph-powered artificial intelligence.

I believe it’s time to seize that opportunity.


How to Know What You Know: 5-Minute Interview with Dr. Alessandro Negro, Chief Scientist at GraphAware

Read this interview on graph technology with Dr. Alessandro Negro of GraphAware.
“I want to know what I know. That describes what knowledge graphs do for companies,” said Dr. Alessandro Negro, Chief Scientist at GraphAware.

In this week’s five-minute interview, we discuss how GraphAware uses natural language processing to help companies gain a better understanding of the knowledge that is spread across their organization.


Can you tell us a little bit about GraphAware?


Dr. Alessandro Negro: We are mainly graph consultants. That means we deliver different kinds of professional services and solutions to companies that, at some point, decided to start using a graph – and specifically Neo4j – as part of their infrastructure.

We also help companies that already have Neo4j and want to get more from it. In this case specifically, we deliver services – like machine learning, recommendations or natural language processing – built on top of Neo4j.

Can you tell us about some of the Neo4j projects that you’re working on?


Negro: We’ve been working on a lot of interesting projects in the last few years. Currently, we are helping companies to get more from the data they already have. For instance, we are helping banks implement better fraud detection analysis. We use graph algorithms on top of a graph model to spot fraud that is difficult to find with classical rule-based mechanisms. We deliver graph visualizations that help analysts perform a more effective and efficient investigation.

We also help companies, such as those in ecommerce, to deliver better recommendations or better search for their customers, implementing search customization or semantic search. Recently, more and more companies are asking us to extract knowledge from textual data.

GraphAware does a lot of work with natural language processing in Neo4j. Can you tell us about that?


Negro: We started working on natural language processing a couple of years ago when we worked with a charity organization. They asked us to merge people’s comments and stories and use this textual data to provide recommendations for a place to eat or sleep, get medical help or find a better job. We found that supplementing the graph model with natural language processing offered a better way to deliver these kinds of services to the end user.

Generally, when we talk about textual data, we think of it as unstructured data, but that is not completely true because language has a lot of structure related to grammar and the lexicon and so forth. Most of this structure is hidden in the language itself. Through natural language processing, you can extract these structures and store them in the form of a graph. We found that storing the data in a graph format has a lot of advantages.

We started with one client, but then we abstracted this concept to a Neo4j plug-in that delivered natural language processing on top of the graph database. We evolved it over time into a product called Hume.

Can you tell us more about what Hume is and what it does?


Negro: I’ll describe it in the words of a potential customer who said, “I called you because I want to know what I know.” When I heard that, I thought, “That is an amazing way to describe what Hume can do for companies.” Hume helps companies gather data from multiple data silos and sources and organize it into a structured, homogeneous, collaborative knowledge graph. It is a knowledge graph in the broader sense – an interconnected set of entities with attributes. It is collaborative because it is the result of collaboration among people in multiple organizational units.

At the end of this process, what companies get is a new asset that creates new value for the company, because they have a better understanding of what they already know. It is their knowledge; it’s just distributed across different organizational units or different people.

In this way, they have this new infrastructure – this new data, well organized – and on top of it they can deliver a different set of services to end users, use the knowledge to improve the quality of their products or the way they create products, or save the company costs.

We have a lot of companies that are interested in getting more from their data. And of course, the Neo4j graph database, at the core of this knowledge graph, plays a key role in delivering these types of services to our customers.

What do you think the future of graph technology looks like?


Negro: I’m writing a book called Graph-Powered Machine Learning. It summarizes my thoughts about the role of graph and graph models in the coming years.

I have been working with Neo4j since 2012, and been a part of the Neo4j community since that time. And I follow the evolution, not only of Neo4j but also the evolution of the market around graph technology.

In the beginning, a lot of people came to the meetups with no idea about graphs. They would say, “What is a graph? What can a graph do for my company?”

Now the scenario is completely different. You meet people who know what graphs are and who may already have graphs in their infrastructure, but they would like to get more from graph technology. We are in the second phase of the evolution of graph technology. People are aware of the power of the technology and would like to get advanced services on top of it. And most of them are machine-learning projects.

I hear people say, “I’m thinking about recommendation engines” or “I’m thinking about using a knowledge graph for powering conceptual search.” Classical natural language processing, when applied to a graph, can get you more – both in terms of analyzing the data and also in accessing the data. That is completely different from the approach based on the inverted index, for example, that was commonly used in search engines.

I’m expecting that in the next five years this process will evolve more and more because more people already have their data in the graph or are planning to move their data to the graph so they can extract insight and wisdom from it.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Using graph databases for journalism or investigation?
Read this white paper The Power of Graph-Based Search, and learn to leverage graph database technology for more insight and relevant database queries.


Discover Graph-Based Search

Graph Algorithms in Neo4j: Use Cases for Graph Transactions & Analytics

Explore the many concrete use cases for graph technology and graph analytics.
Today’s most pressing data challenges center around connections, not just tabulating discrete data. Graph analytics accelerate breakthroughs across industries with more intelligent solutions.

This blog series is designed to help you better leverage graph analytics so you can effectively innovate and develop intelligent solutions faster.

Last week we traced the rise of graph analytics. This week we’ll explore a few of the many concrete use cases for graph technology.


In the coming weeks, we’ll look at how graph technology is driving emerging AI applications and then dive deeper into the practical applications of graph algorithms, using examples from Neo4j, the world’s leading graph database.

From Chatbots to Cybersecurity


Graph technology is versatile and supports a broad swath of use cases. For example, eBay’s ShopBot uses Neo4j to deliver real-time, personalized user experiences and recommendations (see graph below).

See an example how eBay's ShopBot uses Neo4j for recommendations.

Cybersecurity and fraud systems correlate network, social and IoT data to uncover patterns. More accurate modeling and decisioning for a range of dynamic networks drives use cases from subsecond packaging of financial commodities and routing logistics to IT service assurance to predicting the spread of epidemics.

Graph technologies help businesses with many practical use cases across industries and domains, a few of which are highlighted in the sections that follow.

Real-Time Fraud Detection


Traditional fraud prevention measures focus on discrete data points such as specific account balances, money transfers, transaction streams, individuals, devices or IP addresses. However, today’s sophisticated fraudsters escape detection by forming fraud rings made up of stolen and synthetic identities.

To uncover such fraud rings, it is essential to look beyond individual data points to the connections that link them. Connections are key to identifying and stopping fraud rings and their ever-shifting patterns of activities. Graph analytics enable us to detect fraud in real time and show us that, indeed, fraud has a shape.
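
As a minimal sketch of what that shape can look like in a query – the Account and Identifier labels and the USES relationship are invented for the example – a single Cypher pattern surfaces pairs of accounts that share identifiers, the connective tissue of a fraud ring:

```
// Pairs of accounts sharing an identifier (a phone number,
// address or SSN, say). All names are illustrative.
MATCH (a:Account)-[:USES]->(ident:Identifier)<-[:USES]-(b:Account)
WHERE id(a) < id(b)
RETURN a.number AS account1, b.number AS account2,
       collect(ident.value) AS sharedIdentifiers
```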

Real-Time Recommendations


Graph-powered recommendation engines help companies personalize products, content and services by contextualizing a multitude of connections in real time. Making relevant recommendations in real time requires the ability to correlate product, customer, historic preferences and attributes, inventory, supplier, logistics and even social sentiment data.

A real-time recommendation engine requires the ability to instantly capture any new interests shown during the customer’s current visit – something that batch processing can’t accomplish.
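
Here’s a minimal Cypher sketch of one such correlation, collaborative filtering as a two-hop traversal – the Customer and Product labels and the BOUGHT relationship are invented for the example, and a production recommender would blend several scoring signals:

```
// "Customers who bought what you bought also bought..."
MATCH (me:Customer {id: $customerId})-[:BOUGHT]->(:Product)
      <-[:BOUGHT]-(:Customer)-[:BOUGHT]->(rec:Product)
WHERE NOT (me)-[:BOUGHT]->(rec)
RETURN rec.name AS recommendation, count(*) AS strength
ORDER BY strength DESC LIMIT 5
```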

360-Degree View of Data


As businesses become more customer centric, it has never been more urgent to tap the connections in your data to make timely operational decisions. This requires a technology to unify your master data, including customer, product, supplier and logistics information to power the next generation of ecommerce, supply chain and logistics applications.

Organizations gain transformative, real-time business insights from relationships in master data when storing and modeling data as a graph. This translates to highlighting time- and cost-saving queries around data ownership, customer experience and support, organizational hierarchies, human capital management and supply chain transparency.

A flexible graph database model organizes and connects all of an organization’s master data to provide a live, real-time 360° view of customers.

Streamline Regulatory Requirements


Graph technology offers an effective and efficient way to comply with sweeping regulations like the EU’s General Data Protection Regulation (GDPR), which requires that businesses connect all of the data that they have about their customers and prospects.

Organizations manage enterprise risk by providing both the user-facing toolkit that allows individuals to curate their own data records and the data lineage proof points to demonstrate compliance to authorities.

Management & Monitoring of Complex Networks


Graph platforms are inherently suitable for making sense of complex interdependencies central to managing networks and IT infrastructure. This is especially important in a time of increasing automation and containerization across both cloud and on-premises data centers. Graphs keep track of these interdependencies and ensure that an accurate representation of operations is available at all times, no matter how dynamic the network and IT environment.

Identity & Access Management


To verify an identity accurately, the system needs to traverse a highly interconnected dataset that is continually growing in size and complexity as employees, partners and customers enter and leave the system. Users, roles, products and permissions are not only growing in number but also in matrixed relationships where standard “tree” hierarchies are less relevant.

Traditional systems no longer deliver the real-time query performance required by two-factor authentication systems, resulting in long wait times for users. Using a graph database for identity and access management enables you to quickly and effectively track users, assets, devices, relationships and authorizations in this dynamic environment.
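
A minimal sketch of such a check, assuming a hypothetical model of User, Group and Permission nodes connected by MEMBER_OF and GRANTS relationships, resolves a user’s effective permissions through group nesting of arbitrary depth:

```
// Effective permissions through nested group memberships,
// however deep the hierarchy goes. All names are illustrative.
MATCH (u:User {name: 'alice'})-[:MEMBER_OF*1..]->(:Group)
      -[:GRANTS]->(p:Permission)
RETURN DISTINCT p.action AS permission
```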

Social Applications or Features


Social media networks are already graphs, so there’s no point converting a graph into tables and then back again by building a social network on an RDBMS. Having a data model that directly matches your domain model helps you better understand your data, communicate more effectively and avoid needless work.

A graph database such as Neo4j enables you to easily leverage social connections or infer relationships based on user activity to power your social network application or add social features to internal applications.
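
For instance, a friend-of-friend suggestion – sketched here with hypothetical Person nodes and FRIEND relationships – is a direct two-hop traversal rather than a self-join on a table:

```
// People two hops away who aren't already friends,
// ranked by mutual connections. All names are illustrative.
MATCH (me:Person {id: $userId})-[:FRIEND]-()-[:FRIEND]-(fof:Person)
WHERE NOT (me)-[:FRIEND]-(fof) AND me <> fof
RETURN fof.name AS suggestion, count(*) AS mutualFriends
ORDER BY mutualFriends DESC LIMIT 10
```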

Conclusion


We’ve covered just a few of the use cases that are fueled by graph technology. Next week we’ll take a deeper dive into one particularly exciting area: graph technology and AI applications.


Find the patterns in your connected data
Learn about the power of graph algorithms in this ebook, A Comprehensive Guide to Graph Algorithms in Neo4j. Click below to get your free copy.


Read the Ebook


Xfinity xFi & User Personalization with Graphs

Jessica Lowing, Technical Product Manager at Comcast, discusses graph database technology for XFINITY xFi smart home.
Editor’s Note: This presentation was given by Jessica Lowing at GraphConnect New York in September 2017.

Presentation Summary


We’ve come a long way towards making our homes smarter – but we still have a long way to go. Many of the tasks we want our homes to perform, such as “turn out the lights in my kids’ room,” require insight into complex semantic and social relationships.

Comcast has embarked on creating and perfecting its xFi smart home prototype, which – based on research – they knew had to include connected devices, connected people, rich interfaces and automation to be truly helpful. Some products that grew out of this prototype include tools like Kidwatch, which notifies you when your child arrives home from school, and Porchcam, which displays on your TV screen the person who just rang your doorbell.

The xFi team quickly learned that the key to making these tools successful was to use rich definitions for their data in order to leverage relationships that make the tools as useful as possible. This led to the XFINITY profile graph, which allows Comcast customers to personalize their smart homes:



Have you ever fallen asleep with the lights on in another room? Or left the house wondering if the oven was on or if the windows or garage door were open? Or left the house without taking an umbrella or jacket? Or been at the grocery store and wondered whether or not you had any chicken left in the fridge?

If our homes were actually smart, they would do all of this stuff for us. Today, there are a lot of things our homes could be doing for us that they aren’t. And unfortunately, the only progress we’ve made so far is to connect a bunch of things. But in reality, we still have a long way to go towards making our homes truly interactive and helpful.

When it comes down to it, our homes still aren’t very smart.

Smart Home, Turn Off My Kid’s Lights!


Let’s dive into an example with my BFFs River and Lily. Let’s say we want our smart home to turn off the lights in Lily’s room, which is a pretty simple request that your average smart home could probably do. “Alexa, turn off the floor lamp in Lily’s room.” Alexa has that covered.

How about, “Start the washer and dryer after this movie is over?” This request gets a bit trickier, so Alexa probably couldn’t handle this one. If you live in a giant house, you might also want to know whether or not the girls are home, which is something your smart home can’t currently tell you.

Our light example hinges on semantics and social relationships, where we have to know a lot of information: who is speaking, who the kids are, what’s in the kids’ room, whether they have lights in their room, and so on.

Let’s look at this from a computer’s point of view:

Comcast smart home with Neo4j.

The answers to our questions above are as follows: I am Maggie (a red node), my kids are Lily and River (also red nodes), the room I’m talking about is the kids’ bedroom (purple node), and the subset of light devices in the room I want to control are a floor lamp, nightstand light and ceiling light (green nodes).

If you were truly in a smart home, you would be able to speak naturally to it. It would know you and your family, the context of your real life, and the world you live in — and have the ability to do the things you want, when you want. As a parent, you want to feel safe, comfortable, in control of the sea of devices in your home, as well as connected to the people outside of your home.

The experience of a smart home should be easy and rewarding, not a burden. This is the big problem we’ve been looking at for a long time.

Building the Smart Home Prototype


Comcast is a large telecommunications company that provides cable television, internet service, home security products, mobile phones and telephony. We own NBCUniversal and have syndication partners across North America.

In 2013, we started a new team based out of Sunnyvale, California – with part of the team in Philadelphia, where Comcast is headquartered – to research and prototype a smart home of the future. We named our team “Jarvis” after none other than Tony Stark’s smart home in Iron Man, and I joined the team as a technical product manager. My master’s from MIT’s AI lab in distributed robotic control systems helped me bring insight to this connected home and the problem space in which we’re working.

We started by taking a long, hard look at the question, “What would the connected home of the future really look like?” We based our prediction on university research, trends in the market and products offered by different companies, which together led to some core themes:

Core themes of a smart home.

“Connected devices” refers to the ability of all the devices in your home to interact naturally; “connected people” to integrating the natural relationships between people in the real world; “rich interfaces” to the voices and interfaces of different devices; and “automation” to controlling those devices.

Next, we set out to develop some functional prototypes that would incorporate these core themes:

Functional prototypes for smart homes.

These included:

    • PorchCam on TV: When my doorbell rings, I want to see my PorchCam on my X1 television and tell my smart home to unlock the door.
    • Where’s Waldo: Ask my smart home, “Where is my daughter?” and see a map of Lily’s coordinates within the city, displayed on my TV.
    • BYODevice: Bring any device to my home, and quickly and easily integrate it into my smart home of other devices.
    • Kidwatch: Tell me when my youngest daughter comes home from school, or if she fails to arrive by 4:00 PM.
    • Jawbone Granny: Jawbone is like a FitBit but with an open API. Has my grandmother exercised today? If not, we should plan a family walk around dinnertime.

From Prototype to Product


We held a series of internal demos within Comcast, and the response was very positive – so positive that we were told to go build these tools. Our next step was to evaluate the gap between prototype and production by examining the current services available at Comcast, and on the broader market, and determining the work it would take to bridge that gap.

We started with Xfinity Home, a home security product with a suite of devices such as cameras, lights and door locks that is controlled via an interface. But this system doesn’t include a socially embedded functional network, because it only includes the subset of devices that came off the truck with your technician.

In 2015, we launched Works with Xfinity Home, which let you add other devices to your Xfinity Home. Now “Turn off all the lights” could include your Philips Hue or your Lutron dimmer. In 2016, we tackled automation by building the Xfinity Rules Engine, Rulio. It’s a really powerful, open-source tool that lets you ask questions like, “Tell me when River comes home from school.”

Let’s review how we accomplish this within Xfinity Home:

Xfinity home with Neo4j graph database.

I want to know when the kids come home from school, so I’ll set up a push notification or SMS whenever the door opens or the system is disarmed. I only have two rules to manage, and all is good in the world. Now I live in a smart home.

But what happens when you add more components, like facial and voice recognition, or when specific devices join your home WiFi network?

Smart home device-based rules.

No human in their right mind is going to want to manage this number of device-based rules.

When it comes down to it, device automation is not the solution here. So even though we’ve built a lot of connected tools, we still haven’t built a smart home yet.

The Key to Success: Rich Definitions


Circling back to our initial example, we want to know when River comes home. But this notification doesn’t currently differentiate who is arriving home. It could be River, or the dog-walker or the babysitter.

To begin making these differentiations, we need a richer definition of what a person is within our systems:

Smart home rich definition of a person.

A person is not just an ID. A person is a set of relationships to personal information, locations, people and devices.

Similarly, we also needed a rich definition of a home:

Smart home rich definition of home.

What is a home? It’s a physical building in a specific location with a unique topology. There are people who live there and come and go, devices that come and go, and services that support those devices.

Why do we need this rich definition? Since people are at the center of these smart homes, they also need to be at the center of our automation – which brings us back to modeling social and semantic relationships.

This takes us to the XFINITY profile graph, which is a scalable, flexible, multi-tenant user-profile service for extending personal information and relationships across XFINITY products. It lets us model our customers’ real-life relationships, and provides the context so that XFINITY applications provide a more personalized experience for users.

The XFINITY Profile Graph


Design Principles


Before I get into the details, I want to cover a few core ideas and design principles we used when building this graph:

Xfinity profile graph Neo4j.

Big idea #1: We need to build a shared platform at the household level that can be used by any Xfinity application to provide the user(s) the same set of information.

Big idea #2: We need to model this set of relationships as a graph data structure. This data is richly connected, and the real value provided is the set of relationships between the data points.

Big idea #3: We need a native graph database. A bad design would be to support graph semantics on a relational database, or in a non-relational database with indexing. But in either of those structures, the graph complexity would grow along with the data, which would require giant crawls through the database to take advantage of those relationships. In a native graph, we bypass this problem.

GraphQL and Neo4j


The Xfinity profile graph is built with GraphQL APIs on Neo4j, a natural fit. This allows platform developers to build generic, expressive APIs for our clients, and provides client developers with the benefit of being intuitive and flexible.

It provides the ability to query for information each of the different applications are interested in, and supports the specific traversal through that graph. For the end customer, it provides applications that support unique experiences for each customer based off that same data service, but with unique information for each household.

Let’s take a look at this again: “Turn off the lights in my kid’s room.” It was kind of tricky before, but it’s not so bad when you look at it through GraphQL:

Jessica Lowing, Technical Product Manager at Comcast, discusses graph database technology for Xfinity xFi smart home.

We can build this and give it to our applications.
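
To show the shape of the underlying traversal, here’s a minimal Cypher sketch of the same question – the Person, Room and Device model below is invented for the example, not Comcast’s actual profile graph schema:

```
// "Turn off the lights in my kids' room": find the light devices
// in the rooms of the speaker's children. All names are illustrative.
MATCH (me:Person {name: 'Maggie'})-[:PARENT_OF]->(:Person)
      -[:OCCUPIES]->(room:Room)<-[:LOCATED_IN]-(d:Device {type: 'light'})
RETURN room.name AS room, collect(d.name) AS lights
```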

The User Interface


Let’s take a look at how this can surface to an end customer, taking a person’s WiFi network as an example. This is traditionally how people manage their home WiFi network: you write your network name and your password on a sticky note and give it to whoever needs to use it. This provides no insight into your network, or any sort of connection to the people in your home.

Xfinity xFi is Comcast’s personalized WiFi experience that is tailored to customers’ individual needs. xFi has introduced user profiles, which lets you name people in your home and assign devices to them. There’s also a concept of “guest” and “household.”

Xfinity Xfi Comcast smart home device.

xFi lets you manage your network in terms of the people in your home – a novel approach to network management that allows you to really dig in and see what’s going on with your devices:

Xfi smart home using Neo4j graph technology.

For each device, you can go in and see individual usage patterns and pause individual devices. Now you can pause all of your kids’ devices when it’s dinnertime, which is really the people-based experience parents are after.

Overall, it gives you an easy way to see what’s going on and control devices at the household, person, and device level. You can learn more about what Xfinity xFi does here. We also made some big announcements about the future of xFi at CES in February of 2018.

Apply Smart Home Technology to Your Industry


If you knew more about your customers, how could you change the products you’re building today? You are just as empowered as my team was back in 2013 to take a hard look at the state of the world and make predictions about where it’s going, to explore, to try things out and prototype.

Connect with your users. Let your product fit naturally into their lives because every customer’s life is going to be unique and different. Go build great personalized products, because that’s what’s getting us closer to a smart home. And you are empowered to do exactly that.


Want to learn more about what you can do with graph databases? Click below to get your free copy of the O’Reilly Graph Databases book and discover how to harness the power of connected data.

Download My Free Copy

How Real-Time Recommendations Increase Revenues, Optimize Margins and Delight Customers [Infographic]

Check out this infographic on the benefits of using graph technology to power real-time recommendations.
“You may also like” sounds simple, but there’s a lot happening behind the scenes.

Real-time recommendations work best when they take into account both the user’s needs (what is of interest to them) and your business strategy (items you need to promote).

The truly amazing thing is how real-time recommendation strategies are now being adopted by so many industries beyond retail, travel and entertainment. New industries finding the benefits of real-time recommendation engines include government services, financial services, healthcare and job recruiting.

What’s fueling these game-changing recommendations?

Graph technology. With the Neo4j Graph Platform, we have built a hybrid recommender framework that uses a score-based approach to provide best-fit recommendations, leveraging multiple techniques such as collaborative filtering, content filtering, business rules and knowledge-based filtering.

Check out the infographic* below to learn how graph technology connects all of your data and enables you to use multiple methods to put you in control.

Read this infographic on real-time recommendations using graph technology.

*Also available in French and German!

Like this infographic? Share it with your network on Twitter, LinkedIn or Facebook.


Level up your recommendation engine:
Learn why a recommender system built on graph technology is more powerful and efficient with this white paper, Powering Recommendations with Graph Databases – get your copy today.


Read the White Paper

New Features, Now: 5-Minute Interview with Mark Hashimoto, Senior Director of Engineering, Digital Home at Comcast

Check out this 5-minute interview with Mark Hashimoto of Digital Home at Comcast.
“One of the most surprising things I’ve seen with Neo4j is the speed at which we’re able to innovate and deliver features to our customers,” said Mark Hashimoto, Senior Director of Engineering, Digital Home at Comcast.

In this week’s five-minute interview, we discuss how Comcast uses the flexibility of the graph data model to develop and launch new features rapidly using Neo4j for persistence.


Tell us about how Comcast is using Neo4j.


Mark Hashimoto: I work in the Digital Home space, which includes XFINITY Home Security and high-speed Internet. We use Neo4j to develop new features for our customers. Neo4j is our persistent store for personalizing and adding contextual value to the products we already have.

What made you choose Neo4j?


Hashimoto: Four years ago, we were thinking about the data model that we needed to model a digital smart home – people, places (like rooms), all your IoT devices, your laptop, your Xbox, your Alexa speaker and so on. We thought, “What’s the best, most intuitive way to model that, persistently?”

One of my engineers read a paper about graph databases. We started doing a survey of the market. We looked at a lot of different graph technologies. Then we started trying out graph products, both commercial and open source.

Something about Neo4j that we really liked is the support model. At Comcast, open source is great, but when something breaks, you can’t just yell into the ether, “Hey, fix it, random developer X.” We need somebody to have our back in case we have a production issue, and that quickly eliminated a number of players.

What have been some of the most surprising or interesting results you’ve had while using Neo4j?


Hashimoto: One of the most surprising things that I’ve seen with Neo4j is the speed at which we’re able to innovate and deliver features to our customers. Especially at GraphConnect, most people are talking about fraud detection, recommendations, and putting your data into a data fabric and looking at it and getting insights. For us, it was more about getting features out to customers very rapidly.

Since graph databases are inherently schemaless, the graph model allows us to add new data types and new paradigms and just attach them to an existing profile or device or person. That was surprisingly powerful – more powerful than I ever thought it would be.
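
As a rough illustration of the point (the labels and properties below are hypothetical, not Comcast’s actual model), attaching a brand-new data type to an existing profile is a single Cypher write, with no schema migration beforehand:

    // Hypothetical model: hang a new preference type off an existing
    // profile node. No schema change is required first.
    MATCH (p:Profile {id: 'household-123'})
    CREATE (p)-[:HAS_PREFERENCE]->(:ConcertPreference {artist: 'Kelly Clarkson'})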

We launched another feature just this week where, for example, Kelly Clarkson fans who want to see her in concert can buy tickets through their X1 Voice Remote. They can actually buy the tickets directly off the television. We wouldn’t be able to do that unless it was this flexible. It was surprising how fast we were able to iterate.

If you could start over with Neo4j, taking everything you know now, what would you do differently?


Hashimoto: We would probably do more diligence in trying to figure out the best way to do a multi-region, all-public-cloud deployment of Neo4j. That’s something we would invest in sooner rather than later.

What do you think the future of graph technology looks like in your industry?


Hashimoto: At Comcast, we have many lines of business. We have cable television, high-speed internet, theme parks in Florida and California as well as China and Japan. But I believe that where our industry is going is creating a data fabric. Like many very large Fortune 50 companies, each business runs independently at its own speed.

Because they do that, their IT departments tend to be siloed or specialized: they each do one thing and do it extremely well. But your cable customers are probably very different from the customers who are going to Universal Studios, bringing their families and riding the rides. If we could pull some of that data together, we would probably find some really interesting business opportunities.

I think that’s where the graph industry’s actually going: to become more of a data fabric. To connect all this data from different silos, within the same business.

Anything else you’d like to add?


Hashimoto: I want to thank everybody at Neo4j for being great partners to us and running a great conference. It’s been a pleasure. I see a lot of people smiling. It’s a very friendly conference, so if you haven’t gone to GraphConnect, you should definitely go.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Download this white paper, Sustainable Competitive Advantage: Creating Business Value through Data Relationships, and discover how your company can use graph database technology to leave your competition behind.

Read the White Paper

FOSDEM 2019: Join Us in the Graph Developer Room!


The fall conference season is still in full swing, and we’re already preparing for the next batch of conferences in spring 2019.

We’re happy to announce that the graph processing and graph database developer room has been accepted again for FOSDEM 2019. Like every year, we expect a packed room for the whole day, with fascinating sessions on many graph-related topics.

Check out all the details and dates here, including information about submissions: http://graphdevroom.org.


FOSDEM is a free annual conference at the Université Libre de Bruxelles in Belgium, gathering 8,000+ free and open source enthusiasts from all over the world. It’s all about sharing ideas, meeting contributors, collaborating and – of course – Belgian beer. An important part of the conference is the very popular Developer Rooms (DevRooms): all-day sessions focused on a specific topic, e.g., graphs, free Java, data science, etc.

Next year, we are proud to host the 7th edition of the Graph DevRoom, again with the support of the Neo4j community and several partners from the graph ecosystem. The DevRoom will take place on Saturday, February 2, 2019, with a welcome at 10am and talks from 10:30am till 7pm.

We want to invite you, the graph community, to join the conference and the Graph DevRoom – as visitors, but also as speakers. We will again give creators and maintainers of graph solutions, researchers, geeks and open source hackers the opportunity to present their latest work to an attentive audience.

This includes:

  • Graph databases, RDF stores and specialized network databases
  • Graph query languages (e.g. (open)Cypher, GQL, GraphQL or SPARQL) and user-friendly APIs (e.g. Gremlin)
  • (Distributed) Graph processing / analytics frameworks
  • Graph streaming and its applications
  • Machine learning and artificial intelligence applications using graphs
  • Natural language processing
  • Semantic Graphs (RDF) / Knowledge Graphs
  • Benchmarks
  • (Large-scale) graph visualization
  • Graphs and the Internet-of-Things
  • Real-life application of graph processing including industry experience

Of course, other related topics are welcome.

If you want to give a talk or a demo of your or someone else’s graph project, please submit a short proposal by Dec 2, 2018.

Here are the important dates:

  • Submission deadline: Dec 2, 2018
  • Notification of accepted speakers: Dec 9, 2018
  • Schedule publication: Dec 16, 2018
  • DevRoom day (room TBA): Feb 2, 2019

If you want to get an impression of what these talks look like, check out our DevRoom from earlier this year.

And the best thing: The FOSDEM Conference is free to attend and always a great experience.

Can’t wait to see you there!
Michael for the Neo4j Team

#GraphCast: Graph Karaoke Featuring The Knife’s “Heartbeat”

GraphCast is a new Neo4j blog series featuring videos you should see.
Welcome to our new biweekly Sunday series, #GraphCast, which aims to unearth digestible, notable and just plain fun Neo4j YouTube videos (of which there are a lot).

Whether we focus on some of our most popular videos or highlight a particularly solid educational piece on graph technology that may’ve slipped past you, #GraphCast is meant to be short, sweet and the perfect companion piece to your Sunday morning bowl of cereal (or two, if you’re hungry, we don’t judge).


This week, we decided to dig deep in the YouTube library and kick things off with Graph Karaoke. Though there are several in this music-inspired series featuring graph database Cypher querying around song lyrics, we’re highlighting The Knife’s “Heartbeat” today because it has an excellent Sunday groove.



If you like what you see, other Graph Karaoke vids feature The Boss, Michael Jackson, AC/DC and ABBA. Also, we’d be remiss if we didn’t encourage you to subscribe to the Neo4j YouTube channel, which is updated weekly with tons of graph database goods.


Want to take your Neo4j skills up a notch? Take our online training class, Neo4j in Production, and learn how to scale the world’s leading graph database to unprecedented levels.

Take the Class

Graph Algorithms in Neo4j: Graph Technology & AI Applications

Learn the use cases of graph technology and AI applications.
Graph technologies are the scaffolding for building intelligent applications, enabling more accurate predictions and faster decisions. In fact, graphs are underpinning a wide variety of artificial intelligence (AI) use cases.

This blog series is designed to help you better leverage graph analytics to effectively innovate and develop intelligent applications faster.


Last week we looked at a variety of use cases for graph analytics, from real-time fraud detection to recommendation engines. This week we’ll delve deeper into a few of the ways that a graph database like Neo4j supports numerous AI use cases.




Knowledge Graphs


Andrew Ng, a preeminent thought leader in the field, includes knowledge graphs as one of the five main areas of AI. Knowledge graphs represent knowledge in a form usable by machines.

Graph analysis surfaces relationships and provides richer and deeper context for prescriptive analytics and AI applications like TextRank (a PageRank derivative) alongside natural language processing (NLP) and natural language understanding (NLU) technologies.

For example, in the case of a shopping chatbot, a knowledge graph representation helps an application intelligently get from text to meaning by providing the context in which the word is used (such as the word “bat” in sports versus zoology).
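
As a minimal sketch of that idea in Cypher (the Term/Context model here is hypothetical), the graph stores each sense of a word as a relationship into its context, and the chatbot picks the sense that matches the conversation topic:

    // Hypothetical word-sense model: one term, two senses.
    CREATE (bat:Term {word: 'bat'}),
           (sports:Context {name: 'sports'}),
           (zoology:Context {name: 'zoology'}),
           (bat)-[:MEANS {sense: 'club used to hit a ball'}]->(sports),
           (bat)-[:MEANS {sense: 'flying mammal'}]->(zoology);

    // Given the conversation topic, resolve the right sense.
    MATCH (:Term {word: 'bat'})-[m:MEANS]->(:Context {name: 'sports'})
    RETURN m.sense;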

Machine Learning Model Enhancement & Accelerated AI


Graphs are used to feed machine learning models and surface new features for training, subsequently speeding up AI decisions. Graph centrality algorithms such as PageRank identify influential features, feeding more accurate machine learning models and delivering measurable predictive lift. Graph analysis computes Boolean (yes/no) answers in real time and continuously provides them as a tensor for AI recalculation and scoring.
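
As a hedged sketch of that feature-engineering step, here is roughly how PageRank scores could be written onto nodes with the Neo4j Graph Algorithms library (the algo.pageRank procedure; exact syntax varies by version, and the User/INTERACTS_WITH graph and property name are hypothetical):

    // Assumes the Graph Algorithms library plugin is installed.
    // Writes each node's score to a 'pagerank' property, which can then
    // be exported as a feature column for model training.
    CALL algo.pageRank('User', 'INTERACTS_WITH',
      {iterations: 20, dampingFactor: 0.85, write: true, writeProperty: 'pagerank'})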

Graph Execution of AI & Decision Tracking


An operational graph – replacing a rules engine to run AI – is a natural next step for intelligent applications. As coding AI systems in graphs becomes the norm, it will enable the tracking of AI decisions. This kind of decision-tree lineage is essential for the adoption and maintenance of AI logic in critical applications.

Global Graph Analytics for Theory Development


Graph analytics lift out global structures and reveal patterns in your data – without requiring any prior knowledge of the system. For example, community detection and other algorithms are used to organize groups, suggest hierarchies, and predict missing or vulnerable relationships. In this way, you are essentially using graph-driven theory development that infers micro and macro behaviors.
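
A minimal sketch of community detection with the same Graph Algorithms library (the Louvain procedure; the User/FRIEND graph is hypothetical, and syntax varies by version):

    // Stream Louvain communities and group members by community id.
    CALL algo.louvain.stream('User', 'FRIEND', {})
    YIELD nodeId, community
    MATCH (u:User) WHERE id(u) = nodeId
    RETURN community, collect(u.name) AS members
    ORDER BY size(members) DESC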

AI Visibility


The adoption of AI depends largely on the ability to trust the results. Human-friendly graph visualizations display and explain machine learning processes that are otherwise never exposed inside ML’s “black box.” These visualizations serve as an abstraction that accelerates data scientists’ work and provides a visual record of how a system’s logic has changed over time. They help explain AI solutions and build confidence and comfort in them.

System of Record for AI Connections


Graphs serve as a source of truth for all your related AI components, creating a pipeline for iterative tasks. They automate the sourcing and capture of related AI components so that data scientists can focus on analysis and more easily share frameworks.


Learn more about AI, machine learning and deep learning with graph technology.

Source: Curt Hopkins

Conclusion


AI is a burgeoning area for graph technology. Many customers are building intelligent applications using Neo4j.

In our next blog in this series, we’ll dive deeper into graph algorithms in the Neo4j platform and explain the three main types of graph algorithms.


Find the patterns in your connected data
Learn about the power of graph algorithms in this ebook, A Comprehensive Guide to Graph Algorithms in Neo4j. Click below to get your free copy.


Read the Ebook


Graph Databases for Beginners: ACID vs. BASE Explained

Learn the differences between the ACID and BASE data consistency models and their trade-offs
When it comes to NoSQL databases, data consistency models can be strikingly different from those used by relational databases (and they also vary considerably from one NoSQL store to another).

The two most common consistency models are known by the acronyms ACID and BASE. While they’re often pitted against each other in a battle for ultimate victory (please someone make a video of that), both consistency models come with advantages – and disadvantages – and neither is always a perfect fit.

Let’s take a closer look at the trade-offs of both database consistency models.



In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics (and pitfalls) of data modeling, why a database query language matters, the differences between imperative and declarative query languages, predictive modeling using graph theory, the basics of graph search algorithms and why we need NoSQL databases.

This week, we’ll take a closer look at the key differences between ACID and BASE database consistency models and what their trade-offs mean for your data transactions.

The ACID Consistency Model


Many developers are familiar with ACID transactions from working with relational databases. As such, the ACID consistency model has been the norm for some time.

The key ACID guarantee is that it provides a safe environment in which to operate on your data. The ACID acronym stands for:

Atomic
    • All operations in a transaction succeed or every operation is rolled back.
Consistent
    • On the completion of a transaction, the database is structurally sound.
Isolated
    • Transactions do not contend with one another. Contentious access to data is moderated by the database so that transactions appear to run sequentially.
Durable
    • The results of applying a transaction are permanent, even in the presence of failures.
ACID properties mean that once a transaction is complete, its data is consistent (tech lingo: write consistency) and stable on disk, which may involve multiple distinct memory locations.

Write consistency is a wonderful thing for application developers, but it also requires sophisticated locking, which is a heavyweight pattern for most use cases.

When it comes to NoSQL technologies, most graph databases (including Neo4j) use an ACID consistency model to ensure data is safe and consistently stored.
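
As a minimal sketch of what atomicity buys you in Cypher (the Account model is hypothetical), both updates below run in one transaction – if anything fails, neither balance changes:

    // A single Cypher statement executes as one ACID transaction:
    // the debit and the credit commit together or not at all.
    MATCH (from:Account {id: 'A'}), (to:Account {id: 'B'})
    SET from.balance = from.balance - 100,
        to.balance = to.balance + 100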

The BASE Consistency Model


For many domains and use cases, ACID transactions are far more pessimistic (i.e., they’re more worried about data safety) than the domain actually requires.

In the NoSQL database world, ACID transactions are less fashionable as some databases have loosened the requirements for immediate consistency, data freshness and accuracy in order to gain other benefits, like scale and resilience.

(Notably, the .NET-based RavenDB has bucked the trend among aggregate stores in supporting ACID transactions.)

Here’s how the BASE acronym breaks down:

Basic Availability
    • The database appears to work most of the time.
Soft-state
    • Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency
    • Stores exhibit consistency at some later point (e.g., lazily at read time).
BASE properties are much looser than ACID guarantees, but there isn’t a direct one-for-one mapping between the two consistency models (a point that probably can’t be overstated).

A BASE data store values availability (since that’s important for scale), but it doesn’t offer guaranteed consistency of replicated data at write time. Overall, the BASE consistency model provides a less strict assurance than ACID: data will be consistent in the future, either at read time (e.g., Riak) or it will always be consistent, but only for certain processed past snapshots (e.g., Datomic).

The BASE consistency model is primarily used by aggregate stores, including column family, key-value and document stores.

Navigating ACID vs. BASE Trade-offs


There’s no right answer to whether your application needs an ACID versus BASE consistency model. Developers and data architects should select their data consistency trade-offs on a case-by-case basis – not based just on what’s trending or what model was used previously.

Given BASE’s loose consistency, developers need to be more knowledgeable and rigorous about consistent data if they choose a BASE store for their application. It’s essential to be familiar with the BASE behavior of your chosen aggregate store and work within those constraints.

On the other hand, planning around BASE limitations can sometimes be a major disadvantage when compared to the simplicity of ACID transactions. A fully ACID database is the perfect fit for use cases where data reliability and consistency are essential (banking, anyone?).

In the coming weeks we’ll dive into more ACID/BASE specifics when it comes to aggregate stores and other graph technologies.

Move beyond the basics:
Get your copy of the O’Reilly Graph Databases book and start using graph technology to solve real-world problems.


Get the Book



Catch up with the rest of the Graph Databases for Beginners series: