Channel: Blog – Neo4j Graph Database Platform

Neo4j: From Graph Database to Graph Platform

Today at GraphConnect New York, Neo4j has announced our transformation from the provider of a graph database into the creator of a graph platform.

We are making this change to address the evolving needs of customers deploying Neo4j: the need to interoperate within a complex IT infrastructure, and the need to help a variety of users and roles succeed with Neo4j.

The figure below depicts both the elements of the platform as well as those user roles that they serve. As we look at this new graph platform strategy, we’ll consider these capabilities within those user contexts.

Learn how Neo4j is evolving beyond being a graph database and becoming a graph technology platform


The Neo4j Graph Platform is designed to help you introduce graph technology and Neo4j beyond just developers to new users, like data scientists, data analysts, business users, and executives.

In deploying the platform, your big data IT teams will enjoy new data integration features designed to make fitting Neo4j within your infrastructure easier, so that data scientists and analysts can use new tooling like Cypher for Apache Spark and the graph algorithms library to support their graph analytics needs, all while helping business users understand and participate in the evolution of graph-based applications.

Looking Deeper at the Platform


The Neo4j Graph Platform is built around the Neo4j native graph database and includes the following components:

Introducing Neo4j Desktop


Neo4j Desktop is the new mission control console for developers. It’s free with registration, and it includes a local development license for Enterprise Edition, and an installer for the APOC user-defined procedures library. It is available immediately and will become the vehicle by which most users experience Neo4j.

Neo4j Desktop features


The Neo4j Desktop will evolve quickly: eventually it will connect to your production servers, install other components like graph algorithms, run Neo4j ETL and upgrade Java. For developers, it offers exposure to Enterprise Edition convenience features like:

    • Built-in user management, user security, Kerberos authentication and LDAP integration
    • Performance boost from compiled Cypher, enterprise lock management and space reuse
    • Schema features like Node Keys, existence constraints and composite indexes
    • Scaling features like unlimited nodes and relationships, and supported Bolt language drivers
    • All platform components and interfaces like query management and the Neo4j Browser
    • Exposure to production deployment features like high availability, disaster recovery, secure Causal Clustering, IPv6 and least-connected load balancing.
The amount of time these features save for developers is astounding. Users can download Neo4j Desktop immediately.

Now Shipping: Neo4j Graph Database 3.3


Also for Neo4j developers, we have shipped Neo4j 3.3, the latest version of the Neo4j graph database. In this version, we focused on improving write and import performance in Community Edition as well as cluster throughput and security in the Enterprise Edition. The result is that 3.3 writes are more than 50% faster than 3.2 and twice as fast as 3.1.

Let’s take a closer look at what we have improved since Neo4j 3.2:

Write data faster: Improvements to import and write speeds were the most popular request from users. Here’s what we’ve done to address that:
    • Native indexes for numeric properties dramatically improve insert and update speeds.
    • The bulk data importer uses 40% less memory, and adds the ability to use the page cache when memory is full.
    • In clusters, during high speed writing activities, we now pre-fetch node IDs in batches, which eliminates an annoying performance bottleneck.
    • For very large graphs, we have moved the metadata about the page cache into the page cache itself, which allows more of the actual graph to be held in memory and delays the need to cache to disk.
This is the fifth consecutive release to improve overall write speed, and this release’s improvement bumps are impressive.
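The node-ID prefetching idea above can be sketched in a few lines of Python. This is a hypothetical illustration (names and batch size invented), not Neo4j's actual implementation: each writer grabs a whole batch of IDs in one trip to the shared allocator, so the contended lock is taken once per batch rather than once per node.

```python
from threading import Lock

class BatchedIdAllocator:
    """Toy ID allocator: writers take the shared lock once per batch
    of IDs instead of once per node, cutting contention under heavy
    concurrent write activity."""

    def __init__(self, batch_size=1000):
        self._next_id = 0
        self._lock = Lock()
        self._batch_size = batch_size

    def allocate_batch(self):
        # One locked increment hands out a whole contiguous range.
        with self._lock:
            start = self._next_id
            self._next_id += self._batch_size
        return range(start, start + self._batch_size)

alloc = BatchedIdAllocator(batch_size=4)
first = list(alloc.allocate_batch())   # [0, 1, 2, 3]
second = list(alloc.allocate_batch())  # [4, 5, 6, 7]
```

Each writer then consumes IDs from its private range lock-free, returning to the allocator only when the batch is exhausted.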

Write performance speed for the Neo4j graph database


Cypher is faster: The Cypher compiler has been refactored, resulting in 30-50% faster Cypher execution.

Less intrusive locking and configuration updates: Local database locking rules are less restrictive, and some server configuration features can be changed without requiring the database to be restarted.

Cluster operations are faster and more secure in Enterprise Edition

Least-Connected Load Balancing: We have changed the automatic load balancer from round-robin selection to choosing the active servers that are “least connected” first. This optimizes the throughput of the cluster, and it works with the existing routing rules for transactionality, write consensus with Causal Cluster core servers, replication priority for read-replica servers, data center routing priorities, and read-your-own-write (RYOW) consistency support.
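As a rough illustration of the routing policy (server names and data shapes here are invented for the example, not the driver's API), least-connected selection simply picks the live server with the fewest open connections:

```python
import random

def pick_least_connected(servers):
    """Pick the active server with the fewest open connections;
    break ties randomly so load spreads evenly across equals."""
    active = {name: conns for name, conns in servers.items()
              if conns is not None}  # None marks a server that is down
    fewest = min(active.values())
    candidates = [name for name, conns in active.items() if conns == fewest]
    return random.choice(candidates)

# Open-connection counts as the load balancer might see them.
servers = {"core-1": 12, "replica-1": 3, "replica-2": 7, "replica-3": None}
target = pick_least_connected(servers)  # "replica-1"
```

Unlike round-robin, this policy adapts when one server is bogged down by slow queries: new sessions drain toward the servers with spare capacity.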

Intra-Cluster Encryption is now supported in Enterprise Edition, automatically securing transmissions to data centers or cloud zones. Because we are managing security ourselves, the certificate generation and management is built into the Enterprise Edition binary. Intra-cluster encryption supports backup and disaster recovery routing as well.

Enterprise Edition Binary supports commercial licenses only. In order to support the encryption functions above, the pre-built binary for Enterprise Edition is only available through a commercial license.

IPv6: Causal Clustering also now supports IPv6 to extend its horizontal scalability.

Graph Analytics with Graph Algorithms to Extend OLTP Functionality


Graph analytics help organizations gain a connections-first perspective of their data assets that may never have been revealed before. The new analytics functions include the ability to materialize graphs from relational and big data (or any data), and then explore them in a “hypothesis-free” manner – i.e., without knowing what you are looking for.

The new graph algorithms library supports the ability to detect hard-to-find patterns and structures in your connected data including:
    • Community detection algorithms to evaluate how your graph is clustered or partitioned
    • Pathfinding algorithms to quickly find the shortest path or evaluate route availability and quality
    • Centrality algorithms like PageRank to determine the importance of distinct nodes in the network
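To make the centrality example concrete, here is a minimal, self-contained PageRank iteration in Python. It illustrates the algorithm's idea only; it is not the tuned implementation that ships in the Neo4j graph algorithms library, and it skips dangling-node handling for brevity.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Iterative PageRank over a directed edge list [(src, dst), ...]."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [d for s, d in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in nodes}
        for src, targets in out_links.items():
            if targets:
                share = rank[src] / len(targets)  # spread rank over out-links
                for dst in targets:
                    incoming[dst] += share
        rank = {n: (1 - damping) / len(nodes) + damping * incoming[n]
                for n in nodes}
    return rank

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = pagerank(edges)
# "c" is linked by both "a" and "b", so it ends up with the top rank
```

The same scoring idea, run natively over relationships in the graph, is what lets the library rank the importance of nodes in very large networks.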
The Neo4j graph algorithms library


Why Do Graph Analytics Matter?


The Neo4j Graph Platform helps operationalize connections revealed as “big data analytic insights” and turn them into high-value graph applications. In fact, we are starting to see customers support workflow loops that involve the activities of developers, big data IT data suppliers and data scientists.

Here’s a sketch of what that workflow loop looks like:
    • Developers build real-time graph traversal applications, which quickly grow hungry for more data to connect and traverse.
    • Big data IT can supply that new data and increase the utility of the data lake.
    • Data scientists can use this existing and new data to explore, test and develop new algorithms and share them back with developers to operationalize in the application.
    • The data scientists’ algorithms not only operate upon the expanded graph, they can add data and connections to the graph itself, essentially making it smarter for each subsequent query as the application operates.
    • If you look at the interaction of these teams, you’ll notice a loop where developers, big data IT and data scientists help the graph application grow by itself. This is the foundation of artificial intelligence, and we call it the AI workflow loop enabled by the Graph Platform.
We see this AI loop in play in recommendation engines seeking deeper contextual connections, fraud detection applications mapping transaction and money flow, cybersecurity applications, and more.

The AI workflow loop with the Neo4j Graph Platform


We also see other emerging workflow loops:
    • Knowledge Graph workflow loops involve big data IT, data scientists and traditional business analysts who all want a common catalog (ontology) of data assets and metadata; these Knowledge Graphs represent how all those assets are related.
    • Digital Transformation workflow loops between big data IT, business analysts who know both data and business processes, and the C-level initiative holder – like a Chief Data Officer or Chief Security Officer, or Chief of Compliance – all of whom need to see and understand how their information assets and business processes relate to each other.

Cypher for Apache Spark


We recognized the indisputable popularity of Apache Spark as an in-memory computation engine that supports relational, streaming, learning and bulk graph processing. Yet we also noticed that Spark’s graph capability is missing a declarative query language that enables pattern matching and graph construction.

To address this, and to advance the adoption of graphs within the Spark and big data market, we have donated an early version of the Cypher for Apache Spark™ (CAPS) language toolkit to the openCypher project. This contribution will allow big data analysts to incorporate graph querying in their workflows, making it easier to bring graph algorithms to bear and dramatically broadening how they reveal connections in their data.

Cypher for Apache Spark implements the new multiple graph and composable query features emerging from the work of the openCypher Implementers Group (oCIG) which formed earlier this year. The openCypher project is hosting Cypher for Apache Spark as alpha-stage open source under the Apache 2.0 license, in order to allow other contributors to join in the evolution of this important project at an early stage.

Cypher for Apache Spark is the first implementation of Cypher to allow queries to return graphs, as well as tables of data. CAPS introduces the newest features of the Cypher query language adopted in the openCypher Implementers Group.

The new features in Cypher will include:
    • Multiple named graphs which allow Cypher to identify specific graphs upon which to operate.
    • Compositional graph queries allow users to save graph query results as graphs, in addition to Cypher’s default of returning tables. This lets queries be chained together into a function chain of graph algorithms, providing the building blocks for multiple-graph-handling queries.
    • Path control: New language features are in gestation to give user control over path uniqueness while defining traversal types like “walks,” “trails” and “paths” as specific patterns.
    • Regular path expressions apply regular expression syntax in concisely stating complex path patterns when extracting subgraphs. It applies a powerful path expression language based on the leading-edge research language GXPath.
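The idea behind compositional graph queries, where results are themselves graphs so steps can be chained, can be sketched without any Cypher at all. In this toy model (not the CAPS API), a graph is a (nodes, edges) pair and every step returns a new graph:

```python
def subgraph(graph, keep_label):
    """Keep only nodes with keep_label and the edges between them."""
    nodes, edges = graph
    kept = {n for n, label in nodes.items() if label == keep_label}
    return ({n: nodes[n] for n in kept},
            [(s, d) for s, d in edges if s in kept and d in kept])

def add_reverse_edges(graph):
    """Return the graph with every edge mirrored."""
    nodes, edges = graph
    return (nodes, edges + [(d, s) for s, d in edges])

# A toy property graph: node -> label, plus a directed edge list.
g = ({"p1": "Person", "p2": "Person", "m1": "Movie"},
     [("p1", "p2"), ("p1", "m1")])

# Because each step takes a graph and returns a graph, steps compose.
result = add_reverse_edges(subgraph(g, "Person"))
```

Because the output type matches the input type, arbitrary pipelines of graph transformations can be assembled, which is exactly what returning graphs (rather than only tables) buys Cypher.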

Cypher for Apache Spark and data sources


In making Cypher available for Apache Spark, we looked closely at the way Spark works with immutable datasets. Then, in coordination with the openCypher group, we brought in facilities that let graph queries operate over the results of graph queries, along with an API that allows graphs to be split, transformed, snapshotted and linked together in processing chains. This gives huge flexibility in shaping graph data, including data from users’ data lakes.

Data Integration


Fresh from the Neo4j engineering labs, the following are available as pre-release software from Neo4j product management or via special request through Neo4j sales to support enterprise data integration needs. We will continue to invest in both Neo4j ETL and the Data Lake Integrator (and more) to ease moving data in and out of graphs.

Neo4j ETL

Neo4j ETL reveals graph connections from relational/tabular data and delivers an exceptional initial and ongoing experience moving data into the Neo4j graph database. Its graphical interface allows the DBA to:
    • Connect to popular databases via JDBC.
    • Arrange table structures as labeled nodes and identify JOINs and JOIN tables as relationships.
    • Map or change labels and properties prior to execution.
    • Export data as graph-ready CSVs fed to the Neo4j importer.
    • Materialize graph connections through relational JOINs as data is imported, persisting them permanently after import.
Neo4j ETL also provides a visual editor to transform relational data into graph data.
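A stripped-down sketch of the relational-to-graph step in Python (the table data is invented for illustration; the `:ID`/`:START_ID` header convention follows Neo4j's bulk importer format): tables become labeled-node CSVs, and a JOIN table becomes a relationship CSV.

```python
import csv
import io

# Toy relational tables: customers and a JOIN table linking them to products.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [{"customer_id": 1, "product_id": 10},
          {"customer_id": 2, "product_id": 10}]

def rows_to_csv(fieldnames, rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Each table becomes a labeled-node CSV for the bulk importer...
customer_nodes = rows_to_csv(
    ["id:ID(Customer)", "name"],
    [{"id:ID(Customer)": c["id"], "name": c["name"]} for c in customers])

# ...and the JOIN table becomes a relationship CSV: one row per edge.
bought_rels = rows_to_csv(
    [":START_ID(Customer)", ":END_ID(Product)"],
    [{":START_ID(Customer)": o["customer_id"],
      ":END_ID(Product)": o["product_id"]} for o in orders])
```

The relationship file is where the JOIN table "disappears": what was an intermediate table in the RDBMS becomes a first-class relationship in the graph.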

The Neo4j ETL visual editor


Data Lake Integrator

Data lakes struggle to derive value from their accumulated data. While it’s easy to fill the data lake, wrangling its contents, adding context and delivering it to analytical and operational use cases remains an IT challenge. Transforming that data to reveal and traverse its connectedness is all but impossible.

The Neo4j Data Lake Integrator surfaces value submerged in data lakes by making it possible to read, interpret and prepare data as both in-memory and persisted graphs for use in real-time transactional applications and graph analytic exercises within the Graph Platform. The Data Lake Integrator can help enterprises:

    • Build metadata catalogs: Discover definitions of data objects and their relationships with each other to weave a highly connected view of information in the data lake. The resulting metadata graph will help data exploration, impact analysis and compliance initiatives.
    • Wrangle data: Combine data from the lake with other sources including Neo4j, wrangling it with Cypher – the SQL for graphs. Composable queries allow Cypher queries to return data in an in-memory graph format using Apache Spark™, as well as in tabular or CSV formats.
    • Conduct graph analytics: Import data directly from the data lake into the Neo4j Graph Platform for faster and intuitive graph analytics using graph algorithms such as PageRank, community detection and path finding to discover new insights.
    • Operationalize data: Import data directly from the data lake into the Neo4j Graph Platform for real-time transactional and traversal applications.
    • Snapshot graphs: The output of the Data Lake Integrator is a properly structured CSV for reconstituting Neo4j graphs. These files can be saved as snapshots, versioned, diffed, reused and backed up in HDFS.

Discovery and Graph Visualization


Finally, the Graph Platform is also able to reach business analysts and users via our collection of partners like Linkurious, Tom Sawyer Software, Tableau, JetBrains and KeyLines. We also have the Neo4j Browser and our in-house professional services team to help construct custom visualizations.

Neo4j graph visualization partners


We hope you enjoy the Neo4j Graph Platform and look forward to our investment in it.

Jeff Morris,
on behalf of the entire Neo4j team


The Neo4j Graph Platform is here!
Download Neo4j 3.3 – along with other parts of the platform – and find out for yourself why it’s the #1 platform for connected data.


Explore the Graph Platform

The post Neo4j: From Graph Database to Graph Platform appeared first on Neo4j Graph Database.


Neo4j Graph Database 3.3 Release: Everything You Need to Know

As part of the Neo4j Graph Platform announced at GraphConnect New York last week, we are excited to announce the general availability release of the Neo4j Graph Database version 3.3.

At the heart of the Graph Platform is the native graph database itself, which has a number of release-defining features to satisfy stakeholders across the IT organization.

Let’s take a closer look at the improvements and upgrades new to the Neo4j Database 3.3:

Learn all about the 3.3 General Availability (GA) release of the Neo4j Graph Database


What’s New in Neo4j 3.3 – All Editions:


Cypher and Write Performance

Write performance is 55% faster than Neo4j 3.2 and nearly 350% faster than version 2.3, making it possible to ingest more data in a shorter time. Transactional writes benefit from new native indexes, which replace Neo4j’s Lucene-based indexes for numeric properties. You’ll recall that in 3.2, we went native for label indexes which accounted for most of the performance boost in that release.

As with the previous indexes, Neo4j’s ACID guarantees encompass the indexes in addition to the data. Bulk writes have also received a major boost: reducing the memory footprint by up to 40% and leveraging virtual memory in RAM-constrained environments.

Cypher performance has also improved. The new Cypher interpreter is now 40-70% faster according to internal tests.

Data Import, Integration & ETL

Neo4j’s data import functionality uses 40% less memory and adds page caching for large imports.

The new (pre-release) Neo4j Data Lake Integrator toolkit supports the import and export of HDFS files into graphs and back again.

Also in pre-release is the new Neo4j ETL that simplifies the process of morphing data from relational database management systems (RDBMS) into graph data. This process reveals previously invisible long-path relationships hidden within structured data.

The Neo4j ETL and Neo4j Import tools

RDBMS-to-graph and bulk importing improvements let you hit the ground running with Neo4j.

Upgrades for Graph Analytics

As part of the Graph Analytics component of the native Graph Platform, data scientists now enjoy the new impressively fast graph algorithms library – included as part of the APOC library – that now ships with Neo4j Database 3.3.

The Neo4j graph algorithms library

Use graph algorithms to reveal unseen patterns in your connected data.

Also of significant note is Cypher for Apache Spark (CAPS) which not only binds the ease of writing Cypher with a massively scalable, in-memory graph analytics processor, but its queries also return subgraphs. This makes Cypher composable and allows users to chain graph queries in sequence to carry out complex algorithmic logic. Graphs can now be saved in both Neo4j and as snapshot files in HDFS.

What’s New in Neo4j 3.3 – Enterprise Edition Only Features:


Neo4j Desktop

The leading Enterprise Edition feature of the 3.3 release is the introduction of Neo4j Desktop. This package is the new developers’ mission control console.

It provides a free local instance of Neo4j Enterprise Edition for development and includes access to Neo4j graph visualization and development tools and an installer for the APOC and graph algorithm libraries.

The Neo4j Desktop is the new mission control for developers

The Neo4j Desktop package is your mission control for organizing and managing all of your Neo4j projects.

Neo4j Database Kernel Improvements

Numerous improvements to the database kernel expand database availability during upgrades.

Neo4j Enterprise Edition now allows key configuration parameters to be changed on the fly, without needing to recycle a database instance. Local Schema Locks automatically narrow the scope of locks, avoiding the need for the database to take a global schema lock when creating or changing a schema object or constraint.

The database now also offers a cache-hit ratio, available at both a query and database level, as a new metric for helping to size one’s database cache.

Scalability Improvements: Load Balancing, Security & More

Other Enterprise Edition feature upgrades and improvements worth noting include:
    • Intra-cluster encryption keeps all server-to-server transmissions safe, from data center to data center and across cloud zones – all managed in Bolt.
    • The Bolt driver’s new least-connected load balancing replaces round-robin selection in order to maintain high cluster throughput under high demand conditions.
    • Also, node IDs are fetched in batches, removing an otherwise pesky bottleneck revealed when clusters are under heavy activity.
    • Clusters now support IPv6 which sets the stage for trillions of device and IP connections.

Conclusion


As you can see, with the advent of the Neo4j Graph Platform the core graph database has not only improved within itself but also in relation to all other components of the platform.

We’re confident that the 3.3 release of the Neo4j Graph Database will by far be everyone’s favorite release to date. If you haven’t already, make the upgrade to 3.3 and see why Neo4j is the #1 platform for connected data.


Don’t just take our word for it:
Download Neo4j Desktop right now and experience the fastest, most scalable version of the Neo4j Database to date.


Download Neo4j 3.3


Cypher – the SQL for Graphs – Is Now Available for Apache Spark

Learn about the alpha release of Cypher for Apache Spark™ making Cypher available for analytics

In case you missed it at GraphConnect New York: The Neo4j team has announced the public alpha release of Cypher for Apache Spark™ (CAPS).

We’ve been building Cypher for Apache Spark for over a year now and have donated it to the openCypher project under an Apache 2.0 license, allowing for external contributors to join at this early juncture. Find the current language toolkit on GitHub here.

Making Cypher More Accessible to Data Scientists


Cypher for Apache Spark will allow big data analysts and data scientists to incorporate graph querying into their workflows, making it easier to leverage graph algorithms and dramatically broadening how they reveal data connections.

Up until now, the full power of graph pattern matching has been unavailable to data scientists using Spark or building data wrangling pipelines. Now, with Cypher for Apache Spark, data scientists can iterate more easily and connect adjacent data sources to their graph applications much more quickly.

As graph-powered applications and analytic projects gain success, big data teams are looking to connect more of their data and personnel into this work. This is happening at places like eBay for recommendations via conversational commerce, Telia for smart home, and Comcast for smart home content recommendations.

CAPS: A Closer Look


Follow the openCypher blog and read the latest post for the full technical details of Cypher on Apache Spark (CAPS).

Cypher for Apache Spark enables the execution of Cypher queries on property graphs stored in an Apache Spark cluster in the same way that SparkSQL allows for the querying of tabular data. The system provides both the ability to run Cypher queries and a more programmatic API for working with graphs, inspired by the API of Apache Spark.

Screenshot of Cypher for Apache Spark


CAPS is the first implementation of Cypher with support for working with multiple named graphs and query composition. Cypher queries in CAPS can access multiple graphs, dynamically construct new graphs, and return such graphs as part of the query result.

Furthermore, both the tabular and graph results of a Cypher query may be passed on as input to a follow-up query. This enables complex data processing pipelines across multiple heterogeneous data sources to be constructed incrementally.

CAPS provides an extensible API for integrating additional data sources for loading and storing graphs. Initially, CAPS will support loading graphs from HDFS (CSV, Parquet), the file system, session local storage, and via the Bolt protocol (i.e., from Neo4j). In the future, we plan to integrate further technologies at both the data source and the API level.

Cypher for Apache Spark is also the first open source implementation of Cypher in a distributed memory / big data environment outside of academia. Property graphs are represented as a set of scan tables, each corresponding to all nodes with a certain label or all relationships with a certain type.
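That scan-table representation can be mimicked in plain Python to see how label scans and relationship expansions become simple table reads. This is a toy model with invented data, not the CAPS internals, where the tables are Spark DataFrames:

```python
# One "scan table" per node label and per relationship type.
node_scans = {
    "Person": [{"id": 0, "name": "Alice"}, {"id": 1, "name": "Bob"}],
    "City":   [{"id": 2, "name": "Malmö"}],
}
rel_scans = {
    "LIVES_IN": [{"src": 0, "dst": 2}, {"src": 1, "dst": 2}],
}

def match_label(label):
    """A label scan is just a read of the corresponding table."""
    return node_scans.get(label, [])

def expand(src_ids, rel_type):
    """Follow relationships of one type from a set of source nodes."""
    return [r["dst"] for r in rel_scans.get(rel_type, []) if r["src"] in src_ids]

people = [n["id"] for n in match_label("Person")]   # [0, 1]
cities = expand(set(people), "LIVES_IN")            # [2, 2]
```

Representing the graph as flat tables is what lets a tabular engine like Spark execute graph pattern matching as joins over scans.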

Conclusion


True to our open source roots, CAPS is the first release of a full Cypher system within the ecosystem of the openCypher project, making it available for re-use, modification and extension by the wider community. This is an early alpha release, and we will help further develop and refine CAPS until the first public release of 1.0 next year.

Until then, we look forward to your feedback and contributions. The data industry is recognizing the true power of graph technology, and we’re happy to be building the de facto graph query language alongside our amazing community.


New to the world of graph technology?
Click below to get your free copy of the O’Reilly Graph Databases book and discover how to harness the power of graph database technology.


Get My Free Book


Neo4j & GRANDstack at GraphQL Summit [+Developer Challenge]

The Neo4j Developer Relations team is tasked with making sure developers can build applications backed by Neo4j using their favorite technologies. One technology we’ve been really excited about recently is GraphQL.

GraphQL is a relatively new paradigm for building APIs. You can learn more about GraphQL by reading this ref-card. The DevRel Engineering team has spent some time building integrations around GraphQL and Neo4j, and we’re happy to share them with you!

GraphQL Summit


Already in its second year, GraphQL Summit is a developer-focused conference showcasing how developers are using GraphQL. At GraphQL Summit, Will and Michael presented our integrations of GraphQL for Neo4j.

We really enjoyed the conference and spoke to a lot of developers from startups and, surprisingly, a significant number of large companies that have started to use GraphQL to unify their API landscape and make it easier to develop applications. The talks covered a wide range of uses, initiatives, problems and solutions around GraphQL. We were especially thrilled by topics around subscriptions and authentication, and by open source announcements and launches from IBM, Apollo and Graphcool.

One prominent theme at GraphQL Summit this year was on the ecosystem of tools being developed and made available to the community. This included a new release of the most popular GraphQL client, Apollo Client 2.0.

The 2.0 release makes Apollo Client more modular, allowing for more customization and incremental adoption. Apollo Client includes integrations with many frontend frameworks, including React. You can see an example of Apollo Client 2.0 as part of a simple GRANDstack app here. Apollo also demonstrated Engine, their tool for performance tracing, schema management, error tracking and caching.

We unveiled an early version of neo4j-graphql-js, a JavaScript library that uses the GraphQL schema to drive the Neo4j data model and translate GraphQL to Cypher. It integrates really well with existing GraphQL client and server tools, namely apollo-client and apollo-server, including the just-released Apollo Client 2.0 together with Engine and Link.

GRANDstack


We’re happy to announce the launch of GRANDstack (GraphQL, React, Apollo, Neo4j Database) just in time for GraphQL Summit. GRANDstack is a combination of technologies and integrations to enable full-stack development, taking advantage of synergies and symbiotic relationships between technologies in the stack.

Learn about Neo4j and GRANDstack, including news from the GraphQL Summit and a new challenge


Full-stack developers have an amazing choice of technologies available to them and it’s important to move beyond the LAMP and MEAN stacks as technologies have evolved. GRANDstack is our take on how to realize developer productivity and performance from modern tools.

GRANDstack: GraphQL, React, Apollo Client and Neo4j Database


GRANDstack builds on integrations between technologies, such as neo4j-graphql-js, providing a prescriptive and opinionated combination of technologies and integrations for building full-stack applications. Our approach of combining the power of the graph-based data model of Neo4j with the expressiveness of the GraphQL schema and queries was well received at GraphQL Summit.

By inspecting the GraphQL query and schema we’re able to generate a single Cypher query that efficiently retrieves the requested data from Neo4j. We can also generate query types and mutations for the types in your schema.
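For a flat selection set, the translation idea looks roughly like this. It is a deliberately simplified sketch in Python for illustration; the real neo4j-graphql-js translator is JavaScript and also handles nested types, arguments, mutations and `@cypher` directives:

```python
def selection_to_cypher(type_name, fields):
    """Build one Cypher query string for a flat GraphQL selection set,
    aliasing each property back to its GraphQL field name."""
    returned = ", ".join(f"n.{f} AS {f}" for f in fields)
    return f"MATCH (n:{type_name}) RETURN {returned}"

# A GraphQL query like { Movie { title released } } becomes one query:
query = selection_to_cypher("Movie", ["title", "released"])
# "MATCH (n:Movie) RETURN n.title AS title, n.released AS released"
```

The key point is that the whole selection collapses into a single database round trip, however many fields the client asks for.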

Translate GraphQL to Cypher


This means less boilerplate code for GraphQL resolvers for fields, query types and mutations. Incidentally, it also saves you from updating all those resolver queries when refactoring your schema. And because only one query runs against your backend database, you avoid the n+1 query issue that usually arises from sending one query per field resolver.

To add the power of Cypher to your application, you can either annotate fields, mutations or query types with @cypher directives, e.g., to return recommendations, tree summaries or shortest paths as computed information, or to just handle common cases for querying and updating through Cypher.

If you want to learn more, visit grandstack.io or attend one of our upcoming GRANDstack meetups or training classes. You can also see the slides from Will’s presentations at GraphConnect New York here on neo4j-graphql-js and a GRANDstack overview from the San Francisco JS Meetup here.

Neo4j-GraphQL Apollo Launchpad Challenge


Fork our example Apollo Launchpad and win a neo4j-graphql hacker T-shirt and sticker!

You can see an example in this Apollo Launchpad, which shows a GraphQL server providing an endpoint for a simple Movie Graph. To try it out, just fork the launchpad, spin up a Neo4j Recommendations Sandbox and add the credentials as secrets.

The Neo4j, Apollo Launchpad and GraphQL challenge


This gives you not just a running GRANDstack backend: we’ll also send you a Neo4j and GraphQL hacker T-shirt and sticker if you get your forked Launchpad working with Neo4j. Just tweet out the URL for your working Launchpad with #GRANDstack and #Neo4j to claim your T-shirt and sticker.


Level up your graph database game:
Click below to register for our online training class, Introduction to Graph Databases and master the world of graph technology in no time.


Sign Me Up


Retail & Neo4j: Personalized Promotion & Product Recommendations

Today’s retailers face a number of complex and emerging challenges.

Thanks to lower overhead and higher volume, online behemoths like Amazon can deliver products faster and at a lower price, driving smaller retailers out of business.

In order to compete, retailers need a new approach – and the fresh technologies that go along with it.

Learn how Neo4j is used for product recommendations and personalized promotions in the retail sector


In this new series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. This week, we’ll start with personalized promotions and product recommendations.

Neo4j Powers Personalized Promotion & Product Recommendation Engines


Delivering real-time recommendations to online shoppers is a proven way to maximize revenue. It improves the customer experience and increases sales.

However, shoppers expect finely tuned product recommendations and react poorly to one-size-fits-all or uninformed recommendations (e.g., “I’ve already bought that. Why are they showing it to me again?”). To be effective, recommendations must be personalized based on the individual consumer’s preferences, shopping history, interests and needs – in addition to what’s already in their current shopping cart.

Real-time recommendations require data products that connect masses of complex buyer and product data (and connected data in general) to gain insight into customer needs and product trends.

This cannot be achieved with relational database (RDBMS) technology; the SQL queries are too complex and take too long to provide recommendations in real time. The same goes for big data processing technologies like Hadoop and Spark. These technologies work well for something like email recommendations – which are delivered once a day – but they are not real time.

The Challenges with Traditional Systems


A legacy relational database attempting to do retail product recommendations

Data stored in relational databases and other silos is too disconnected and slow for real-time recommendations.

By design, graph databases quickly query customers’ past purchases and instantly capture any new interests shown during the current online visit – both essential for real-time recommendation engines.

Because relationships are treated as first-class entities in a graph database, retailers can connect customers’ browsing history with their purchasing history, as well as their offline product and brand interactions.

This enables a real-time product recommendation algorithm to use a customer’s past and present choices to offer personalized promotions and recommendations. No offline pre-compute is necessary, eliminating the associated delay.
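As a sketch of what such a query might look like – the labels, relationship types and parameter here are illustrative, not taken from a real retail schema – a graph-based recommendation can combine purchase history and the current cart in a single traversal:

```cypher
// Hypothetical schema: recommend products that customers with
// similar purchase histories also bought, excluding anything
// this customer already owns or has in their current cart.
MATCH (c:Customer {id: $customerId})-[:BOUGHT]->(:Product)
      <-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(rec:Product)
WHERE NOT (c)-[:BOUGHT]->(rec)
  AND NOT (c)-[:HAS_IN_CART]->(rec)
RETURN rec.name AS recommendation, count(DISTINCT peer) AS strength
ORDER BY strength DESC
LIMIT 5
```

Because the traversal starts from the one customer node and only follows its local relationships, latency stays low regardless of how large the overall catalog grows.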

Graph technology uses for the retail industry

With a graph database, a retailer’s entire inventory, supply chain, customer data and other systems are all connected.

Furthermore, in order to counter dynamic pricing from the likes of Amazon, retailers need the ability to change pricing and promotions at any level of a product hierarchy in real time. For example, they must be able to mark down all 60-inch televisions by 10% for the next two hours if the right economic and competitive factors indicate such a move is necessary.

Similarly, retailers must be able to implement competing promotions. They might reduce all smartphone prices except Apple iPhones due to Apple’s strict pricing guidelines.

Real-time promotions such as these involve complex rules that become simple when handled by a graph database like Neo4j. A single parent node in the product hierarchy may govern millions of relationships, so retailers can change one relationship type rather than updating a thousand products and all their prices.
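To illustrate the idea – again with a hypothetical schema – a single Promotion node attached at the category level can discount every product beneath it at query time, with no per-product price rewrites:

```cypher
// Hypothetical hierarchy: one Promotion node attached to a
// Category node discounts every Product in that category at
// read time, so individual product records stay untouched.
MATCH (promo:Promotion)-[:APPLIES_TO]->(cat:Category {name: "60-inch Televisions"})
MATCH (cat)<-[:IN_CATEGORY]-(p:Product)
RETURN p.name AS product,
       p.price * (1 - promo.discount) AS promoPrice
```

Ending the promotion is then a matter of deleting or expiring one node, rather than re-pricing every affected product.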

Walmart and Neo4j retail case study

Case Study: Walmart


Walmart became the world’s largest retailer by understanding its customers’ needs better than any competitor. An important tool in achieving that understanding is the Neo4j Graph Database.

Walmart’s Brazilian ecommerce group wanted to understand the behavior and preferences of its online buyers with enough speed and in enough depth to make real-time, personalized “you may also like” recommendations. However, Walmart quickly recognized that it would be difficult to deliver such functionality using traditional relational database technology.

“A relational database wasn’t satisfying our requirements about performance and simplicity, due to the complexity of our queries,” said Marcos Wada, software developer at Walmart.

To address this, Marcos’s team decided to use Neo4j, the leading graph database. Matching a customer’s historical and session data is trivial for graph databases, enabling them to easily outperform relational and other NoSQL data products.

Walmart deployed Neo4j in its remarketing application run by the company’s ecommerce IT team based in Brazil, and it has been using Neo4j in production since early 2013. Neo4j enables Walmart to understand online shoppers’ behavior, as well as the relationship between customers and products.

As a result, the retailer has also been able to up- and cross-sell major product lines in core markets.

“With Neo4j we could substitute a complex batch process that we used to prepare our relational database with a simple and real-time graph database. We could build a simple and real-time recommendation system with low latency queries,” Marcos said.

Case Study: Top-Ten Retail Company


One Top 10 US-based, bricks-and-mortar retailer turned to Neo4j after its burgeoning online operation was almost overwhelmed by the volume of customer traffic it attracted on Cyber Monday 2015.

The company was running its site on an IBM DB2 relational database, and on Cyber Monday 2015, it offered an across-the-board 15% discount to site visitors. While the retailer had pulled in more customers than any other bricks-and-mortar rival – one of the project’s target metrics – the price paid was unacceptable: The site’s checkout function kept working that day, but 90% of customer traffic was delayed.

As one senior company executive said: “We pushed a lot of guests to the site and we were very successful in terms of volume. But the reality was we got significantly more traffic than we ever projected, and we couldn’t handle it. We protected checkout so the site functioned. But we disappointed way too many guests, and that’s never okay, period.”

The biggest bottleneck was the crucial but complex personalized promotions process, where the company invites shoppers to add last-minute extras to their online cart. To flash up exactly the right recommendations requires software that can instantly analyze the shopper’s cart contents and their buying history, and dig through 15-30 layers of data – such as promotion types, qualifying manufacturers, product names and categories – all in real time.

This proved beyond a conventional relational database like DB2. So, the retailer considered Neo4j, which is optimized to rapidly carry out such complex searches among masses of connected data.

The company already knew its biggest rival, Walmart, had turned to Neo4j to provide the best web experience for its customers (see case study above), so in mid-2016 the company rolled out both a new Neo4j-based front-end and backend to its website, transforming the company’s real-time personalized promotions engine and online cart promotion calculations.

Neo4j now processes 90% of the retailer’s 35 million-plus daily transactions – which involve between three and 22 hops across different layers of data – in 4 milliseconds or less. And during Q4 2016 – the vital Christmas retail period – the company’s digital sales rose 34% to a record high, helped by the friction-free Neo4j solution.

Conclusion


An effective product recommendation engine can’t be half-baked or only partially efficient. Either recommendations (or promotions) are timely and relevant or they convince your would-be customers that your ecommerce site only offers stale, pre-computed suggestions.

The only way to craft truly personalized promotions or product recommendations – ones that consider not only past buying history but also current session data – is to use graph technology to power your recommendation engine. As the leader in the space, Neo4j is the graph technology of choice.

In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including customer experience personalization, ecommerce delivery service routing, supply chain visibility, revenue management and IT operations.


It’s time to up your retail game:
Witness how today’s leading retailers are using Neo4j to overcome today’s toughest industry challenges with this white paper, Driving Innovation in Retail with Graph Technology. Click below to get your free copy.


Read the White Paper


The post Retail & Neo4j: Personalized Promotion & Product Recommendations appeared first on Neo4j Graph Database.

Going Meta: Exploring the Neo4j Graph Database…as a Graph

The graph data model is inherently visual. Try explaining a graph to someone new. You’ll inevitably draw a picture, or wave your hands around to convey what you mean by ‘nodes… links…. and more nodes’. People think in graphs, and they interpret graph intelligence visually. That’s what makes graph visualization such a powerful tool.

Graph visualization is deeply intuitive and harnesses the brain’s unrivalled ability to spot patterns. It’s also flexible enough to apply to virtually any dataset. If there’s an interesting relationship in your data somewhere, you’ll find value in graph visualization.

To prove it, I thought we’d go a bit meta.

In this post, I’ll use KeyLines’ graph visualization power to explore the Neo4j GitHub community. It’ll show how KeyLines makes your graph data more accessible, insightful and valuable.

Our Stack


First we’ll look at the technologies we’ll use for this GitHub exploration tool:
    • KeyLines – for the visualization front-end
    • GraphQL – to fetch data from the GitHub API
    • Neo4j – as a graph datastore ‘cache’ for our GitHub data
    • Angular – to neatly tie the project together
The Neo4j KeyLines graph visualization architecture


KeyLines and Neo4j integrate seamlessly. With a Neo4j backend, we can cache a copy of our data locally for faster response times. It also gives us access to powerful graph query and analysis functionality.

Neo4j and KeyLines play nicely with the new kid on the block too – GraphQL. The Facebook-backed query language is a specification for pulling data efficiently and in a more ‘type aware’ way than REST. Particularly exciting is its capacity to support nested queries, reducing the number of calls our app needs to make.

Loading a GitHub Account


Let’s kick off our visual exploration by searching GitHub for the world’s most popular graph database:

The initial load of Neo4j GitHub repos

Loading Neo4j’s 20 most recently updated repos

With each KeyLines interaction (a search, a click, a double-click, etc.), we trigger a series of actions:

  1. Send an event to the service provider.
  2. The service auto-generates some Cypher to query the Neo4j database:
    MATCH (User {login: "christian-cam"})-[PullRequest:PULL_REQUEST]->(Repository)
    RETURN User, PullRequest, Repository

  3. If the response is blank, it sends a GraphQL query to the GitHub API:
    query ($login: String!) {
      user(login: $login) {
        id
        name
        avatarUrl
        login
        company
        pullRequests(first: 50, states: [MERGED], orderBy: {field: CREATED_AT, direction: DESC}) {
          nodes {
            id
            title
            number
            commits {
              totalCount
            }
            repository {
              id
              name
              owner {
                id
                login
                avatarUrl
              }
            }
          }
        }
      }
    },
    {"login": "christian-cam"}
    

  4. The GitHub API returns some data, which is cached in our Neo4j instance. It’s then loaded into KeyLines and styled according to your customization code.

The Neo4j GitHub org contains 26 repos, hundreds of users, and millions of pull requests and diffs. It’s a vast dataset that we couldn’t hope to understand in its raw format. Thankfully, graph visualization helps us distill thousands of lines of data into an interactive chart.

graph data model visualization

My graph data model. It looks complex, but KeyLines will help us explore it manageably

Understanding the Structure with Automated Layouts


The beauty of graph visualization is its ability to convey complex graph structures that you can understand right away. Automated layouts are critical. Each of KeyLines’ seven graph layouts will reveal different features of the network.

The structural layout helps to reveal distinct communities. Here’s the Neo4j GitHub community:

meta exploration of Neo4j graph visualization


We can see that each repo has a distinct community of contributors – some large, some small. At the heart of the GitHub community, we have a core of contributors acting as bridges:

Neo4j network bridges

Some of Neo4j’s cross-repo community heroes

I’ve used a simple color code to indicate the type of connections between people and repos:
    • Grey = pull requests
    • Red = issues raised
    • Blue = pull request reviews
Pull requests have been bundled into weighted links to avoid chart clutter. With this view, one account that stands out is lutovich, especially in some of the driver repos.
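The bundling step can be expressed in Cypher by aggregating the individual pull requests into one weighted pair per user and repository – a sketch using the labels from the data model above:

```cypher
// Collapse individual PULL_REQUEST relationships into a single
// weighted (user, repo) pair so the chart draws one link whose
// width reflects the contribution count.
MATCH (u:User)-[pr:PULL_REQUEST]->(r:Repository)
RETURN u.login AS user, r.name AS repo, count(pr) AS weight
ORDER BY weight DESC
```

The front-end then draws one link per row, sized by `weight`, instead of hundreds of parallel links.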

Double-clicking on lutovich reveals their most recent commits:

Explore the Neo4j Graph Database community on GitHub using the power of Neo4j graph visualization


This ‘expand and layout’ approach is a powerful way to explore large graphs. It puts the user in the driving seat, so they can explore details at their own pace.

Filtering the Graph by Time


A prominent feature of my GitHub app is the KeyLines time bar – a neat component for exploring temporal networks.

On first load, the spike of pull requests around September stands out. There’s no surprise that this was around the time of Neo4j 3.2.5 and 3.3 beta.

KeyLines time bar visualization

Pull requests (grey), issues (red) and reviews (blue) all see a spike in September 2017

Let’s zoom in to that time period on the chart:

Neo4j GitHub activity in September

A GitHub pull request spike in September

We can see a great deal of development effort going into the core product, plus the Neo4j Browser, docs, OGM and a couple of drivers – insight that we couldn’t see in our initial graph view.

Social Network Analysis


So we’ve zoomed into our graph’s details – now let’s try exploring outwards. Here I’ve added a few Neo4j partners with GitHub accounts:

The Neo4j partner community graph


Our starting point is familiar. KeyLines’ standard layout spaces nodes around the chart, revealing three distinct clusters with some collaboration between them. But the chart is still fairly cluttered.

KeyLines’ new and improved combos functionality is a powerful way to declutter a graph, highlighting the most important nodes and links. Here’s what happens when we combine our repos and run the standard layout:

Combos and layout in KeyLines graph visualization


In two clicks we’ve transformed a cluttered chart into a clear graph visualization. We can instantly see Neo4j’s GitHub community contributors, and the bridges between the different projects. This approach can be applied to any kind of graph dataset, revealing trends and patterns that would otherwise be hidden.

Neo4j partner community overview


Try It Yourself


Inspired to try some graph visualization? We’re happy to help! You can see the power of KeyLines’ new combos functionality and find working examples of KeyLines with Neo4j on our SDK site, or follow the tutorials on our blog.


Cambridge Intelligence was a Silver sponsor of GraphConnect New York.


Missed out on GraphConnect New York?
Check out videos from all of the sessions now posted on GraphConnect.com with more being posted every day!


Catch Up on GraphConnect

The post Going Meta: Exploring the Neo4j Graph Database…as a Graph appeared first on Neo4j Graph Database.

Neo4j: The Power behind the Paradise Papers

Once again, the International Consortium of Investigative Journalists (ICIJ) has shaken the world with a far-reaching, in-depth investigation into the shadowy world of offshore finance: The Paradise Papers.

Discover how Neo4j graph technology has helped power the Paradise Papers investigation by the ICIJ


Using Neo4j, the ICIJ has built upon their Pulitzer Prize-winning investigation of 2016 – the Panama Papers – and they’ve begun to add politicians featured in early Paradise Paper reports to their Offshore Leaks Database.

The new 1.4 TB of data – 13.4 million documents – includes information leaked from the trust company Asiaciti and from Appleby, a 100-year-old offshore law firm specializing in tax havens. The files were obtained by German newspaper Süddeutsche Zeitung and shared with the Washington D.C.-headquartered ICIJ, a network of independent reporting teams around the world.

As in previous investigations, Neo4j plays a key role in revealing the connections between the wealthy, their money and the taxation-friendly countries in which it resides.

The reason: Graph databases excel at managing highly connected data and complex queries.

Instead of using tables the way a relational database does, graphs use special structures incorporating nodes, properties and relationships to define and store data, making them highly proficient at analyzing the relationships and any interconnections between data — and allowing journalists to “follow the money” more easily than ever.

The Paradise Papers investigation powered by Neo4j


Unprecedented Volumes of Highly Connected Data


Pierre Romera, chief technology officer of the ICIJ, told Business Insider: “Most of the leaks we get are not structured since they are raw documents.

“With the Paradise Papers, those documents represented 1.4 TB of data and were gathered from different sources. Putting them in a single database was a challenge for us. With Neo4j and [visualisation tool] Linkurious, and after a few weeks of research, we were able to propose to our 382 journalists a way to explore the data and also to share visualisations from stories they were working on. It’s surprising how intuitive a graph database can be for non-tech savvy people. Thanks to this approach, we could both investigate and prepare the future releases.”

According to Mar Cabra, the ICIJ’s Data and Research Unit Editor, using Neo4j was the only solution available to meet her requirements when they broke the Panama Papers investigation last year.

“It’s a revolutionary discovery tool that’s transformed our investigative journalism process,” she said, “because relationships are all important in telling you where the criminality and secrecy lies, who works with whom, and so on. Understanding relationships at huge scale is where graph techniques excel.

“With at least 11.5 million documents – far larger than any data leak we had investigated before – we needed a technology that could handle these unprecedented volumes of highly connected data quickly, easily and efficiently.

“We also needed an easy-to-use and intuitive solution that didn’t require the intervention of any data scientist or developers, so that journalists around the globe would work with the data, regardless of their technical abilities. Linkurious Enterprise was the best platform to explore this data and to share insights in a secure way. Using the Linkurious graph visualization platform with Neo4j is a powerful combination,” she added.

According to Neo4j Co-Founder and CEO Emil Eifrem: “Whatever else we can be sure of as the Paradise Papers investigation unfolds, it’s only with world-class tools like Neo4j and Linkurious that world-class investigation of vast and complex datasets like this can happen in our Age of Connections.

“Graph databases are the only option when trying to make sense of the vast terabytes of connected data that we are producing more and more of, and they are an essential tool for international agencies, governments, financial services and security firms trying to uncover the truth.”

Stay Tuned for More Coverage of the Paradise Papers


In the coming days and weeks, the Neo4j team will continue to unveil how graph technology powered the Paradise Papers investigation, including an in-depth look at the ICIJ data model with example queries, graph visualizations and more.

In the meantime, continue to follow the ICIJ’s Paradise Papers coverage exploring the political and economic dimensions of the investigation as they continue to unfold.


Learn more about how Neo4j powers fraud detection and AML solutions across the globe with this white paper: Fraud Detection: Discovering Connections with Graph Databases.

Read the White Paper

The post Neo4j: The Power behind the Paradise Papers appeared first on Neo4j Graph Database.

Analyzing the Paradise Papers with Neo4j: A Closer Look at Queries, Data Models & More


Our friends from the ICIJ (International Consortium of Investigative Journalists) just announced the Paradise Papers this past week, a new trove of leaked documents from the law firm Appleby and trust company Asiaciti.

Similar to the Panama Papers before (which we covered in-depth here), we’ve learned from those records that a large number of people and organizations use shell companies in tax havens and offshore jurisdictions to hide, move and spend huge amounts of money (billions to trillions of USD) without the necessary fiscal oversight.

In the last few days we saw a huge number of reports being published covering activities of companies like Nike, Apple, unsavory involvement by the Queen’s investment group, connections of Russian investments to politicians like Wilbur Ross or companies like Facebook and Twitter and many more.

The more than 13 million documents, emails and database records have been analysed using text analysis, full-text and faceted search, and – most interesting to us – graph visualization and search.

From the Paradise Papers about section:

Special thanks go to the Pulitzer Center on Crisis Reporting for supporting visual elements of the project, and to Neo4j and Linkurious for database support.

We are especially proud that Manuel Villa, whose position was sponsored by our “Connected Data Fellowship,” was able to contribute to the research and data work.

As before, the leaked information was added to the already large body of data from previous leaks in a comprehensive Neo4j database that was available both to the data team as well as the investigative journalists.

The ICIJ published a fraction of the Paradise Papers data as part of their Power Players visualization at the same time as the reported stories. There are about 1000 nodes and 3000 relationships in this preliminary dataset.

We’re expecting the ICIJ to release larger parts of the dataset soon, and we will keep you updated with further findings.

Data Model


The data model the ICIJ used for the Panama Papers is quite straightforward. We’ve applied this same data model to the available Paradise Papers dataset. The model includes:

    • A company, trust or fund created in a low-tax, offshore jurisdiction by an agent (Entity)
    • People or companies who play a role in an offshore entity (Officer)
    • Addresses (Address)
    • Law firms or middlemen (Intermediary) that ask an offshore service provider to create an offshore firm for a client

Each of these carries different properties, including name, address, country, status, start and end date, validity information and more.

Relationships between the elements capture the roles that people or other companies play in the offshore entities (often shell companies). We see many officer_of relationships for directors, shareholders, beneficiaries, etc.

Other relationships capture similar addresses or the responsibility of creating a shell company by a law firm (intermediary_of).
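As a quick sketch against this model (leaving the relationship direction unspecified, as the queries later in this post do), we can rank the intermediaries by how many offshore entities they were involved in creating:

```cypher
// Which intermediaries (often law firms) are connected to the
// most offshore entities in this dataset?
MATCH (i:Intermediary)-[:intermediary_of]-(e:Entity)
RETURN i.name AS intermediary, count(DISTINCT e) AS entities
ORDER BY entities DESC
LIMIT 10
```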

The Paradise Papers data model


Until the data is officially published we used the information from the ICIJ’s website for some examples on the reported stories.

Here is the full graph of (currently) independent reports that will come together in the full dataset:

Learn how the Paradise Papers investigation used Neo4j for connected data queries and data models


This initial Neo4j graph database consists of 212 legal entities and 669 officers connected to those entities, established through 15 different intermediaries (often the law firms used to incorporate the legal entities).

We can now use the data in Neo4j for interactive graph visualization, but also for full graph querying and, later, for applying graph algorithms (Part 2 of this blog series).

Basic statistics over the data tell us that we have these entities:

MATCH (n)
RETURN labels(n), count(*)
ORDER BY count(*) DESC

╒════════════════╤══════════╕
│"labels(n)"     │"count(*)"│
╞════════════════╪══════════╡
│["Officer"]     │675       │
├────────────────┼──────────┤
│["Entity"]      │212       │
├────────────────┼──────────┤
│["Address"]     │134       │
├────────────────┼──────────┤
│["Intermediary"]│15        │
├────────────────┼──────────┤
│["Other"]       │11        │
└────────────────┴──────────┘

The distribution of degrees for our entities follows a typical power law: some addresses have 127 companies registered to them, and some people have almost 90 shell companies registered to their name.

Remember, this is a tiny fraction of the Paradise Papers data.

MATCH (n)
WITH labels(n) AS type, size( (n)--() ) AS degree
RETURN type,
       max(degree) AS max, round(avg(degree)) AS avg, round(stdev(degree)) AS stdev

╒════════════════╤═════╤═════╤═══════╕
│"type"          │"max"│"avg"│"stdev"│
╞════════════════╪═════╪═════╪═══════╡
│["Other"]       │14   │4    │4      │
├────────────────┼─────┼─────┼───────┤
│["Address"]     │127  │5    │15     │
├────────────────┼─────┼─────┼───────┤
│["Intermediary"]│44   │10   │14     │
├────────────────┼─────┼─────┼───────┤
│["Officer"]     │89   │4    │8      │
├────────────────┼─────┼─────┼───────┤
│["Entity"]      │112  │12   │13     │
└────────────────┴─────┴─────┴───────┘

These are the relationship types; as you can see, the data is mostly officer and registered-address relationships connecting entities and officers.

MATCH ()-[r]->()
RETURN type(r), count(*)
ORDER BY count(*) DESC

╒════════════════════╤══════════╕
│"type(r)"           │"count(*)"│
╞════════════════════╪══════════╡
│"officer_of"        │2079      │
├────────────────────┼──────────┤
│"registered_address"│639       │
├────────────────────┼──────────┤
│"connected_to"      │86        │
├────────────────────┼──────────┤
│"intermediary_of"   │67        │
├────────────────────┼──────────┤
│"same_name_as"      │28        │
├────────────────────┼──────────┤
│"same_id_as"        │2         │
└────────────────────┴──────────┘

If we break down the officer_of relationships by their link property, we see this:

MATCH ()-[r:officer_of]->()
RETURN toLower(r.link), count(*)
ORDER BY count(*) DESC
LIMIT 10

╒═══════════════════════════╤══════════╕
│"toLower(r.link)"          │"count(*)"│
╞═══════════════════════════╪══════════╡
│"director"                 │661       │
├───────────────────────────┼──────────┤
│"is shareholder of"        │326       │
├───────────────────────────┼──────────┤
│"alternate director"       │238       │
├───────────────────────────┼──────────┤
│"secretary"                │209       │
├───────────────────────────┼──────────┤
│"appleby assigned attorney"│102       │
├───────────────────────────┼──────────┤
│"vice-president"           │65        │
├───────────────────────────┼──────────┤
│"auditor"                  │54        │
├───────────────────────────┼──────────┤
│"president"                │52        │
├───────────────────────────┼──────────┤
│"ultimate beneficial owner"│48        │
├───────────────────────────┼──────────┤
│"is signatory for"         │41        │
└───────────────────────────┴──────────┘

What is interesting here is the number of “appleby assigned attorney” and “ultimate beneficial owner” relationships, which are not visible in public fiscal records.

Jurisdictions


This version of the dataset contains limited address information, but we can check to see the distribution of addresses across countries:

MATCH (n:Address) WHERE exists(n.country)
RETURN n.country, count(*)
ORDER BY count(*) DESC
LIMIT 10

╒═══════════╤══════════╕
│"n.country"│"count(*)"│
╞═══════════╪══════════╡
│"US"       │23        │
├───────────┼──────────┤
│"BM"       │12        │
├───────────┼──────────┤
│"BR"       │7         │
├───────────┼──────────┤
│"GB"       │7         │
├───────────┼──────────┤
│"IM"       │6         │
├───────────┼──────────┤
│"KY"       │4         │
├───────────┼──────────┤
│"ID"       │4         │
├───────────┼──────────┤
│"JO"       │3         │
├───────────┼──────────┤
│"HK"       │3         │
├───────────┼──────────┤
│"BS"       │3         │
└───────────┴──────────┘

One important question we can ask is: What are the most popular offshore jurisdictions used by people in other countries?

For example, for Officers with addresses in the U.S., which offshore jurisdictions are most common?

// Most popular offshore jurisdictions for Officers, by country of address
MATCH (a:Address {country: "US"})--(o:Officer)--(e:Entity)
RETURN e.jurisdiction_description AS jurisdiction, COUNT(*) AS num
ORDER BY num DESC LIMIT 10

╒══════════════════════════╤═════╕
│"jurisdiction"            │"num"│
╞══════════════════════════╪═════╡
│"Bermuda"                 │94   │
├──────────────────────────┼─────┤
│"Cayman Islands"          │74   │
├──────────────────────────┼─────┤
│"United States of America"│3    │
├──────────────────────────┼─────┤
│"State of Delaware"       │1    │
└──────────────────────────┴─────┘

We can see that the most common jurisdictions of entities with connections to people with addresses in the U.S. are Bermuda and the Cayman Islands, which also confirms common knowledge and the reason why so few Americans were in the Panama Papers, since Panama is not a common offshore jurisdiction for Americans.

Of course keep in mind that this is only querying a subset of the data. In our next blog post we’ll look at analyzing the full dataset.

Wilbur Ross


The current U.S. Secretary of Commerce, Wilbur Ross, was revealed to have connections to offshore companies, as reported by the ICIJ earlier last week.

// What are the jurisdictions of Ross's connected entities?
MATCH (o:Officer)--(e:Entity)
WHERE o.name CONTAINS "Ross"
RETURN e.jurisdiction_description AS jurisdiction, COUNT(*) AS num
ORDER BY num DESC

╒═══════════════════╤═════╕
│"jurisdiction"     │"num"│
╞═══════════════════╪═════╡
│"Cayman Islands"   │16   │
├───────────────────┼─────┤
│"State of Delaware"│1    │
└───────────────────┴─────┘

We see that Ross is connected to 17 legal entities, 16 of which are registered in the Cayman Islands:

// Wilbur Ross’s connections in the Paradise Papers
MATCH (o:Officer)-->(e:Entity)-[:intermediary_of]-(i:Intermediary)
WHERE o.name CONTAINS "Ross"
MATCH (e)--(o2:Officer)
RETURN *
Paradise Papers Wilbur Ross data model


We could write a similar Cypher statement to query for Ross’s second-degree network, without specifying node labels or relationship types:

MATCH p=(o:Officer)-[*..2]-()
WHERE o.name CONTAINS "Ross"
RETURN p

Queen Elizabeth Investment House


The Duchy of Lancaster – Queen Elizabeth II’s private estate and portfolio – appears in the Paradise Papers dataset. The Duchy made an investment of several million dollars in a Cayman entity, the Dover Street VI Cayman Fund. While the Queen’s estate regularly reports some domestic investments and holdings, offshore holdings – including the Dover Street VI Cayman Fund investment – had not been previously reported.

We can write a Cypher query for all two-degree connections to the Duchy:

MATCH p=(o:Officer {name: "The Duchy of Lancaster"})-[*..2]-()
RETURN p

Paradise Papers using Neo4j to look at the Duchy of Lancaster data model


We can see a common pattern for the orchestration of offshore entities. If we expand two of the Officer nodes, “Robinson – Lynniece” and “Sutherland – Linda M”, we see that they serve as officers for many other offshore entities:

The Paradise Papers investigation by the ICIJ into the Duchy of Lancaster using Neo4j


These individuals serve as corporate services managers, where they may serve as officers for hundreds of offshore entities.
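One way to surface these corporate services managers – sketched against the same data model – is to rank officers by how many entities they are connected to:

```cypher
// Officers connected to an unusually large number of entities
// are often corporate services managers rather than the
// ultimate beneficial owners.
MATCH (o:Officer)-[:officer_of]->(e:Entity)
RETURN o.name AS officer, count(DISTINCT e) AS entities
ORDER BY entities DESC
LIMIT 10
```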

Looking Ahead


This post was an introduction to the Paradise Papers dataset, with some examples from the recently published investigations.

In our next post, we will explore how we can apply graph algorithms, virtual graph projections, more advanced querying and spatial analysis to the entire Paradise Papers dataset.


Want to learn more about what you can do with graph database technology? Click below to get your free copy of the O’Reilly Graph Databases book and discover how to harness the power of connected data.

Get My Free Copy

The post Analyzing the Paradise Papers with Neo4j: A Closer Look at Queries, Data Models & More appeared first on Neo4j Graph Database.


An In-Depth Graph Analysis of the Paradise Papers

Today, the ICIJ has publicly released data from its most recent year-long investigation into the offshore industry, known as the Paradise Papers. In the weeks since the ICIJ announced their investigation, we’ve seen many reports published covering the activities of companies like Nike and Apple, the Queen of England’s estate, and connections of Russian investments to politicians like Wilbur Ross and companies like Facebook and Twitter.

More than 13 million leaked documents, emails and database records have been analyzed using text analysis, full-text and faceted search and, most interestingly to us, graph visualization and graph-based search.

The International Consortium of Investigative Journalists (ICIJ) makes use of the Neo4j graph database internally to aid their investigations. As the ICIJ says on their website:
Graph databases are the best way to explore the relationships between these people and entities — it’s much more intuitive to use for this purpose than a SQL database or other types of NoSQL databases. In addition, graphs allow to understand these networks in a very intuitive way and easily discover connections.
The ICIJ has built a powerful search engine that sits atop Neo4j, allows searching of the Paradise Papers dataset, and is available to the public as a web application. However, releasing the data as a Neo4j database enables much more powerful analysis. Since Neo4j is an open source database, everyone has access to the same powerful tools for making sense of the data.

In a previous post we showed how graph analysis and Cypher – the query language for graphs – can be used to find connections in the Paradise Papers data. In this post, we show some techniques for querying and analyzing the data in Neo4j, including how we can create data visualizations to help us draw insight, and how we can use graph analysis to learn more about the offshore finance industry.

Graph Querying


For a more thorough overview of the data model and example queries, see our previous post here.

The Data Model


The Paradise Papers dataset uses the property graph data model to represent data about offshore legal entities, officers who may be beneficiaries or shareholders of the entities, and the intermediaries that acted to create the legal entities.



Nodes in the graph represent the entities, and relationships connect them. We also store key-value properties on both nodes and relationships, such as names, addresses and data provenance attributes.

Graph visualization is a powerful way to explore the data. For example, identifying highly connected clusters of nodes can be done by visually examining the graph.
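For instance, a quick way to surface the most highly connected nodes before visualizing them is to sort by degree (the threshold of 100 below is an arbitrary choice for illustration):

MATCH (o:Officer)
WHERE SIZE((o)--()) > 100
RETURN o.name AS officer, SIZE((o)--()) AS degree
ORDER BY degree DESC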



Exploratory Queries


We can also perform aggregations when we query for tabular data. Let’s examine the overall size and shape of the Paradise Papers dataset.

How many nodes are there in the Paradise Papers dataset?

MATCH (n) RETURN labels(n) AS labels, COUNT(*) AS count ORDER BY count DESC
╒════════════════╤═══════╕
│"labels"        │"count"│
╞════════════════╪═══════╡
│["Officer"]     │77012  │
├────────────────┼───────┤
│["Address"]     │59228  │
├────────────────┼───────┤
│["Entity"]      │24957  │
├────────────────┼───────┤
│["Intermediary"]│2031   │
├────────────────┼───────┤
│["Other"]       │186    │
└────────────────┴───────┘

We can see that the data consists of information on over 77,000 officers (people or companies who play a role in an offshore company) with connections to almost 25,000 offshore legal entities, across more than 59,000 addresses. The addresses will become important to us later when we make use of location data.

We can also count the number of the different types of relationships in the dataset:

MATCH ()-[r]->() RETURN type(r), COUNT(*) ORDER BY COUNT(*) DESC
╒════════════════════╤══════════╕
│"type(r)"           │"COUNT(*)"│
╞════════════════════╪══════════╡
│"OFFICER_OF"        │221112    │
├────────────────────┼──────────┤
│"REGISTERED_ADDRESS"│128311    │
├────────────────────┼──────────┤
│"CONNECTED_TO"      │10552     │
├────────────────────┼──────────┤
│"INTERMEDIARY_OF"   │4063      │
├────────────────────┼──────────┤
│"SAME_NAME_AS"      │416       │
├────────────────────┼──────────┤
│"SAME_ID_AS"        │2         │
└────────────────────┴──────────┘

And compute degree distribution, to give us an idea of how connected different pieces of the graph are, on average:

MATCH (n) WITH labels(n) AS type, SIZE( (n)--() ) AS degree
RETURN type, MAX(degree) AS max, ROUND(AVG(degree)) AS avg, ROUND(STDEV(degree)) AS stdev
╒════════════════╤═════╤═════╤═══════╕
│"type"          │"max"│"avg"│"stdev"│
╞════════════════╪═════╪═════╪═══════╡
│["Other"]       │2891 │44   │236    │
├────────────────┼─────┼─────┼───────┤
│["Address"]     │9268 │2    │59     │
├────────────────┼─────┼─────┼───────┤
│["Intermediary"]│115  │5    │8      │
├────────────────┼─────┼─────┼───────┤
│["Officer"]     │2726 │4    │20     │
├────────────────┼─────┼─────┼───────┤
│["Entity"]      │312  │11   │13     │
└────────────────┴─────┴─────┴───────┘

The Shortest Path from the Queen of England to Rex Tillerson


One powerful feature of a graph database like Neo4j is the ability to query for paths of arbitrary length. This allows us to find connections between nodes when we don’t know what the connections are, or even the length of the path.

I was curious to see if there were any indirect connections between two public figures who appear in the Paradise Papers dataset: Rex Tillerson, the U.S. Secretary of State who had connections to a Bermuda-based oil and gas company with operations in Yemen, and the Queen of England, whose estate, it was reported, was an investor in a Bermuda-based company. We can easily query for such a path using Cypher:

MATCH p=shortestPath((rex:Officer)-[*]-(queen:Officer))
WHERE rex.name = "Tillerson - Rex" AND queen.name = "The Duchy of Lancaster"
RETURN p



This shows us a single shortest path connecting the Queen of England and Rex Tillerson. The path goes through several offshore entities and officers with connections to these entities. If we adjust our query slightly to include all shortest paths, we see that several of the officers in our path share connections with many legal entities.

MATCH p=allShortestPaths((rex:Officer)-[*]-(queen:Officer))
WHERE rex.name = "Tillerson - Rex" AND queen.name = "The Duchy of Lancaster"
RETURN p



A quick Google search reveals that these individuals are corporate services managers: individuals who are paid to serve as directors of offshore entities to handle the administration of these entities.

Graph Algorithms


Querying the data using Cypher is useful for exploring the graph and answering questions we have, such as “What are all the offshore legal entities that Wilbur Ross is connected to?” But what if we want to know which are the most influential nodes in the network? Or which elements of the graph have the highest transitive relevance?

We can easily run the PageRank centrality algorithm on the whole graph dataset using Cypher:

CALL algo.pageRank(null,null,{write:true,writeProperty:'pagerank_g'})

and then query for the Entity nodes with the highest PageRank scores:

MATCH (e:Entity) WHERE exists(e.pagerank_g)
RETURN e.name AS entity, e.jurisdiction_description AS jurisdiction,
       e.pagerank_g AS pagerank ORDER BY pagerank DESC LIMIT 15

╒═════════════════════════════════════════════════╤════════════════╤══════════════════╕
│"entity"                                         │"jurisdiction"  │"pagerank"        │
╞═════════════════════════════════════════════════╪════════════════╪══════════════════╡
│"WORLDCARE LIMITED"                              │"Bermuda"       │18.110508499999998│
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Ferrous Resources Limited"                      │"Isle of Man"   │17.326935999999996│
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"American Contractors Insurance Group Ltd."      │"Bermuda"       │15.6201275        │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Gulf Keystone Petroleum Limited"                │"Bermuda"       │12.81925          │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Warburg Pincus (Bermuda) Private Equity X, L.P."│"Bermuda"       │12.312412         │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Madagascar Oil Limited"                         │"Bermuda"       │11.611646499999999│
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Coller International Partners IV-D, L.P."       │"Cayman Islands"│11.394854         │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Milestone Insurance Co., Ltd."                  │"Bermuda"       │11.224089         │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"CL Acquisition Holdings Limited"                │"Cayman Islands"│11.0752455        │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Alpha and Omega Semiconductor Limited"          │"Bermuda"       │10.965910000000001│
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤
│"Coller International Partners V-A, L.P."        │"Cayman Islands"│10.8205005        │
├─────────────────────────────────────────────────┼────────────────┼──────────────────┤

Geo Analysis


The registered addresses of many of the officers and legal entities are available in the Paradise Papers data. Using a service such as the Nominatim API or Google’s geocoding API, we can perform a lookup to turn these address strings into latitude and longitude points.
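Once the coordinates come back from the geocoder, they can be written onto the Address nodes with a parameterized Cypher statement (the identifier and property names here are assumptions for illustration and are not guaranteed to match the dataset):

MATCH (a:Address {node_id: $id})
SET a.latitude = $lat, a.longitude = $lng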

Once we have geocoded these addresses, we can use geographic analysis to find more insights into the data. Neo4j has a JavaScript driver which makes it easy to build web applications that query Neo4j using Cypher.

One visualization tool we can use is a heatmap, where observations are represented as colors. More intense colors mean more addresses in that area. Examining a heatmap of Paradise Papers addresses shows a high concentration of addresses in the Atlantic, just off the coast of North America. Many of these addresses are in Bermuda, a known offshore jurisdiction.

Heatmap of Paradise Papers geocoded addresses. Try it live.

If we compare this heatmap with a heatmap of geocoded addresses from the Panama Papers dataset (an earlier leak investigated by ICIJ), we can see we have quite a different geographic distribution of addresses.

Instead of a large concentration in the Atlantic, we see a higher concentration in Asia and, to a lesser degree, Europe. The Panama Papers leak has a high number of addresses in Singapore and Kuala Lumpur.

Heatmap of Panama Papers geocoded addresses

Using the geocoded addresses, we can also interactively explore the Paradise Papers as a map. Clicking on an address marker of interest issues a Cypher query to find the Officer and Entity nodes connected to this address.

Exploring the ritzy suburbs of Las Vegas, we can see many addresses that show up in the Paradise Papers. In fact, we easily stumble upon the casino magnate Sheldon G. Adelson who, it was revealed, has a connection to a Bermuda company he uses to register his casino’s private jets, transferring tens of millions of dollars to a tax-free jurisdiction.

Annotated map of geocoded addresses in Paradise Papers showing registered address of Officer nodes and connected legal entities and jurisdictions. Try it live.

Entity Jurisdictions


When looking at the implications of the structure of the offshore finance industry, one of the questions investigative journalists try to answer is “Who are the enablers?” One aspect of finding enablers is to look at the jurisdictions that make the offshore industry possible.

One can theorize about historical, legal and economic reasons why some jurisdictions may be favored by citizens of certain countries, but data like the Paradise Papers are so important for gaining insight into the offshore finance industry precisely because much of this world is so secretive. Next, we examine some of the jurisdiction information in the data.

MATCH (e:Entity)
WITH e.jurisdiction_description AS juris, COUNT(*) AS count
WHERE count > 20
RETURN *
ORDER BY count ASC



We can see that Bermuda and the Cayman Islands far outnumber the other jurisdictions. This makes sense given what we know about the main source of the data, which was a law firm with offices in Bermuda (and many other countries).

We can extend our analysis to begin to answer the questions, “Are there certain jurisdictions that citizens of particular countries prefer?” or “What are the most popular offshore jurisdictions, by country of residence of the beneficiary or officer?” We can begin to answer that by creating a bipartite graph of Officer country and Entity jurisdiction. We can visualize this data in a chord diagram that shows the relative distribution of flow through the bipartite graph.

MATCH (a:Address)--(o:Officer)--(e:Entity)
WITH a.countries AS officer_country, e.jurisdiction_description AS juris,
COUNT(*) AS num
RETURN * ORDER BY num DESC



This diagram shows us that the United States is by far the most popular country for officers to give as their registered address. And of those officers with addresses in the US, Bermuda and the Cayman Islands are the most popular offshore jurisdictions. This is not surprising, as we saw earlier that those two jurisdictions are by far the most common in the dataset.

What Can You Find?



This was an overview of the now-public Paradise Papers dataset released by ICIJ. ICIJ has released the leaked data packaged as a Neo4j database to enable everyone to use the same open source software they use for making sense of the complex web of the offshore finance industry.

You can find the Paradise Papers dataset available on the Neo4j Sandbox and soon available for download as a Neo4j database on the ICIJ website. We encourage you to explore the data and see what insights you can find about the offshore finance industry.

As you explore the data, be sure to check out some of the great resources for learning Cypher and graph databases. And if you like the work that the ICIJ is doing, remember that they are an independent media organization and rely on your generous donations to operate.

You can find the code for generating all visualizations in this post on GitHub.




Editor’s note: ICIJ has published this data with the following note: “There are legitimate uses for offshore companies and trusts. We do not intend to suggest or imply that any people, companies or other entities included in the ICIJ Offshore Leaks Database have broken the law or otherwise acted improperly. Many people and entities have the same or similar names. We suggest you confirm the identities of any individuals or entities located in the database based on addresses or other identifiable information.”

The post An In-Depth Graph Analysis of the Paradise Papers appeared first on Neo4j Graph Database.

Welcome to the Graph Community, Amazon Neptune!

This post originally appeared on Emil Eifrem’s CEO blog on 1 December 2017.

Graph technology has come a long way, and today the transformative nature of graphs is publicly visible through examples such as financial fraud detection in the Panama and Paradise Papers, contextual search and retrieval of historical information in NASA’s knowledge graph and the use of conversational ecommerce in eBay’s ShopBot.

The Paradise Papers powered by Neo4j and the ICIJ


The Growing Graph Database Space


When I look back a decade, it was just us and some hobbyists in the graph technology space; it took another five years for other graph database startups to emerge. And since then we’ve watched the graph space grow as mega-vendors like Oracle, Microsoft, SAP and IBM introduced graph products of their own.

I always found Amazon’s absence from this list ironic given that their business models in both ecommerce and data centers on tap are graph-based disruptions. Amazon Neptune signals the arrival of graph database technology into mainstream ecosystems, both in the cloud and on prem. As evangelists of the graph database category, we helped pioneer, establish and propel this space, and we are both proud and elated to see it transform and grow this way.

Learn more about how Neo4j welcomes the entrance of Amazon Neptune into the graph community


Amazon’s entry into the graph database market adds to an increasingly large palette of choices for end users, and I believe it is part of a rising tide to lift all boats. As with all markets, more competition and more choices will lead to a stronger market and better products. Ultimately, the end users of graph technology will benefit.

The Game Is Only Beginning


Now that all of the major database players have jumped on the graph database bandwagon, you might rightly ask, what’s next?

It’s clear to me that the game is only beginning. Part of this is obvious: while we at Neo4j have invested over a decade in our native graph database, many of today’s offerings are brand new, or (like Neptune) not yet GA.

More broadly, it’s clear that we’re still just scratching the surface of what a graph-powered solution is. While a database like Neo4j or Amazon Neptune is a foundational element of a graph technology stack, integration with different types of data sources, comprehensive graph analytics, easy-to-use graph visualization tools and purpose-built graph-based applications will be essential for broadscale adoption.

The Neo4j Graph Platform announced in my GraphConnect New York keynote describes our own efforts to chart the next decade of evolution in the graph technology space by offering a graph platform.

The Neo4j Graph technology Platform

The Neo4j Graph Platform

That’s the first piece.

Coalescing around a Query Language


Second, to achieve adoption that mirrors that of the venerable RDBMS, we also need a standard graph query language analogous to SQL that is simple as well as easy to learn and implement. As more users learn about graph technology and as more tools and vendors enter the graph space, we’re at a time when a shared, declarative graph query language – agnostic of vendor or platform – will be a massive benefit to both vendors and users.

After trying nearly every other approach, I continue to believe that Cypher is this standard. Why? Because in addition to years of real-world validation, it has by far the widest adoption among actual graph end users.

As a quick data point, compare the question counts on Stack Overflow for “Cypher” (17,000+) with “Gremlin” (3,300) or “Tinkerpop” (1,200). I believe 80+% of all graph applications today use Cypher. Nothing else comes close.

So our bet is on Cypher as the SQL for graphs. And it’s our strong belief that an open language will lead to the best result for end users: so much so that in 2015 we broke Cypher out of Neo4j and donated it to the openCypher project, whose governance model is open to the community.

The openCypher project makes Cypher available to any technology as the easy, standard graph query language. So far – besides Neo4j – databases like SAP HANA, Redis Graph and AgensGraph, among others, have standardized on Cypher, and more are in the works. This is an area where we’d love to work together with Amazon Neptune, to make sure that their users can leverage the most popular property graph query language on the planet.

Another major donation we recently made was an early version of the Cypher for Apache Spark™ language toolkit to the openCypher project. This will enable big data analysts (or any Spark user) to materialize graph data from their data lakes, incorporate graph querying into their workflows, and make it easier to bring graph algorithms to bear throughout their enterprise data investments, dramatically broadening how they reveal connections in their data.

The Graph Community Is Growing


Last but not least, as a graph community, we need to continue to address the fact that demand to adopt the graph paradigm is growing faster than expertise, yielding a skills shortage.

Over the years an amazing community has grown around Neo4j. It now boasts some 55,000 meetup members across 109 meetup groups. Last year, the community organized and attended more than 400 events about Neo4j. (As a side note: that is a staggering number! Think about it: almost two events per working day!).

The broader graph community needs to build upon this momentum to make sure that every developer, data scientist and data architect is skilled in graph technology. With the entry of larger players like Microsoft and Amazon, I feel confident that we (the community) will continue to develop graph skills necessary for large-scale adoption of this paradigm.

At Neo4j we have a single focus: graphs. To date, we have made the industry’s largest dedicated investment in graph technology, resulting in more than ten million downloads, a huge developer community deploying groundbreaking graph applications around the globe and more than 250 commercial customers, including global enterprises like Walmart, Comcast, Cisco, eBay and UBS. However, our work as a company, a community and a movement has only begun.

This year has been a massive year for graphs. We are excited to see Amazon Neptune join the graph community, and we look forward to growing the space together with them and with you, connecting one node at a time.

Emil

The post Welcome to the Graph Community, Amazon Neptune! appeared first on Neo4j Graph Database.

The ICIJ Releases Neo4j Desktop Download of Paradise Papers

Early this morning (1 Dec.), the Pulitzer Prize-winning International Consortium of Investigative Journalists (ICIJ) released an ICIJ version of Neo4j Desktop which includes the Paradise Papers and the other Offshore Leaks graph data.

This desktop package – available for Windows, Mac and Linux – includes the Neo4j Graph Database and the Neo4j Browser tool, with several interactive guides explaining the dataset and providing investigative Cypher queries to run against the data.

If you’re not a developer, you can use the Paradise Papers Explorer graph application to examine the data by just searching the full dataset for shell companies and their officers without learning a new query language.

Using the Paradise Papers Explorer


The Paradise Papers Explorer app built on the Neo4j graph database


The interactive guide features our new Neo4j graph algorithms library and shows you how to perform PageRank on the graph of officers in the Paradise Papers.

Screencast: Neo4j Desktop for ICIJ




The Neo4j Graph Database – with its powerful capabilities for storing and querying connections – has been popular as a data journalism tool.

Now, the ICIJ demonstrates the other capabilities of the Neo4j Graph Platform, including graph visualization, graph analytics and other apps. These features can be used to make it easier for journalists, researchers and everyone else to explore and understand complex data relationships.

Thanks to the excellent work of the ICIJ team – including the Neo4j Connected Data Fellow Manuel Villa – for their great story highlighting the capability of graph technology with this release. ICIJ data engineer Miguel Fiandor Gutiérrez led the effort to make the data available both in its raw format and as the Neo4j Desktop package.

If you’re a data journalist, apply to our Data Journalism Accelerator Program if you think graphs can help turn your data into knowledge graphs.



The post The ICIJ Releases Neo4j Desktop Download of Paradise Papers appeared first on Neo4j Graph Database.

Beta Release: Java Driver with Async API for Neo4j

Learn more about the most recent beta release of the Java driver 1.5.0 for the Neo4j graph database

Introduction


In this article, I would like to introduce the new 1.5.0-beta03 pre-release version of the Bolt Java driver for Neo4j, which is now built on an asynchronous, Netty-based infrastructure.

Previous versions of the driver used blocking I/O, which meant that the number of threads needed to handle N concurrent connections was also N. With the new non-blocking I/O, the number of threads can be significantly reduced, because one thread can handle multiple network connections asynchronously. This functionality is exposed through a new set of asynchronous methods, which allow queries and transactions to be executed without blocking.

Asynchronous processing of results is especially valuable in environments where code should block as little as possible, like Akka actors or Spring Data reactive repositories.

One important thing to note is that starting from 1.5, the Neo4j Java driver requires Java 8. The required Java version was raised in order to use the existing async programming APIs and interfaces, like CompletionStage and CompletableFuture, which are available only from Java 8 onwards. They are now used in async API calls, like Session#runAsync(), Transaction#runAsync(), Session#readTransactionAsync(), etc. The previous driver version 1.4 still supports Java 7 and will remain maintained.
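As a standalone illustration of this CompletionStage-based composition style (no database involved; the runAsync() below is a stand-in for the driver’s Session#runAsync(), not the real API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class AsyncStyleDemo {
    // Stand-in for an async driver call such as Session#runAsync():
    // completes on another thread with a list of "records".
    static CompletionStage<List<String>> runAsync() {
        return CompletableFuture.supplyAsync(() -> Arrays.asList("r1", "r2", "r3"));
    }

    public static void main(String[] args) {
        runAsync()
            .thenApply(List::size)            // transform the eventual result
            .whenComplete((count, error) -> { // handle either value or failure
                if (error != null) error.printStackTrace();
                else System.out.println("records: " + count);
            })
            .toCompletableFuture()
            .join(); // block here only so the demo does not exit early
    }
}
```

The same chaining methods (thenApply, thenCompose, whenComplete) are what the driver’s async API returns control through.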

Async API


This section describes the new async APIs present in the 1.5.0-beta03 Java driver version. It does not discuss their blocking counterparts; please refer to the Neo4j Developer Manual for more details. The blocking API has been re-implemented on top of the async API and so shares the same underlying infrastructure.

Driver Initialization


The main entry point of the driver API remains unchanged: it is the GraphDatabase class, which can be used to create a driver like this:

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;

Driver driver = GraphDatabase.driver("bolt://localhost",
                                     AuthTokens.basic("neo4j", "test"));

Driver is a thread-safe, application-wide object from which all Neo4j interactions derive.

Sessions


The Driver instance should be used to obtain Session instances which allow for running queries and creating transactions.

Session is a client-side abstraction for logically grouping one or more units of work. It is designed for single-threaded use and may be backed by a TCP connection when executing requested operations. In the routing driver, created for a bolt+routing URI, all transactions within a single session will be implicitly causally chained using bookmarks.

See the causal chaining section of the Neo4j Developer Manual for more information.

New sessions can be created like this:

import org.neo4j.driver.v1.Session;

Session session = driver.session();

Using the session we can run our first async query:

import org.neo4j.driver.v1.StatementResultCursor;

CompletionStage<StatementResultCursor> cursorStage =
      session.runAsync("UNWIND range(1, 10) AS x RETURN x");

cursorStage.thenCompose(StatementResultCursor::listAsync)
           .whenComplete((records, error) -> {
               if (records != null) System.out.println(records);
               else error.printStackTrace();
               session.closeAsync();
           });

Invocation of Session#runAsync() returns a CompletionStage of StatementResultCursor, which is the main abstraction for consuming query results received asynchronously from the database. In this example, all results are eagerly fetched into a list, which is later printed.

This might require a lot of heap memory, depending on the size of the result. Large results can benefit from incremental consumption using #forEachAsync() and #nextAsync(). Session objects should be explicitly closed at the end of the chain.

The Session#runAsync() method has various overloads that accept query parameters, records and org.neo4j.driver.v1.Statement objects for convenience.

It is possible to safely retrieve a single record from the cursor, while asserting that only a single record is returned:

import org.neo4j.driver.v1.StatementResultCursor;
import org.neo4j.driver.v1.types.Node;

import static java.util.Collections.emptyList;

CompletionStage<StatementResultCursor> cursorStage =
      session.runAsync("MATCH (n) RETURN n LIMIT 1");

cursorStage.thenCompose(StatementResultCursor::singleAsync)
           .thenApply(record -> record.get(0).asNode())
           .thenApply(Node::labels)
           .exceptionally(error -> {
               error.printStackTrace();
               return emptyList();
           })
           .thenAccept(labels -> System.out.println(labels))
           .thenCompose(ignore -> session.closeAsync());

This code prints all labels of the fetched node. It also explicitly handles errors (database unavailable, network error, no nodes were fetched, …) by printing the stacktrace and returning an empty label list instead.

Sometimes it might be required to consume result records one by one or as a stream. StatementResultCursor allows this using two methods:
    • CompletionStage<Record> nextAsync() – returns a stage completed with the next record in the result stream, or with null when the end of the stream has been reached. The stage can also be completed exceptionally when the query fails.
    • CompletionStage<ResultSummary> forEachAsync(Consumer<Record> action) – returns a stage completed with the result summary after applying the supplied action to every record of the result stream.
Method #forEachAsync() can be used to convert StatementResultCursor to an rx.Observable from RxJava 1.x library. A naïve example using rx.subject.PublishSubject would be:

import org.neo4j.driver.v1.Record;
import org.neo4j.driver.v1.Session;

import rx.Observable;
import rx.subjects.PublishSubject;

Observable<Record> fetchRecords(Session session, String query) {
    PublishSubject<Record> subject = PublishSubject.create();
    session.runAsync(query)
           .thenCompose(cursor -> cursor.forEachAsync(subject::onNext))
           .whenComplete((summary, error) -> {
               if (error != null) {
                   subject.onError(error);
               } else {
                   System.out.println(summary);
                   subject.onCompleted();
               }
           });
    return subject;
}

Observable<Record> recordsObservable = fetchRecords(session, "MATCH (n:Person) RETURN n");
recordsObservable.subscribe(
    record -> System.out.println(record),
    error -> error.printStackTrace(),
    () -> System.out.println("Query completed")
);

All incoming records are consumed using #forEachAsync() and pushed to a PublishSubject, so that its subscribers can access every record.
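The #nextAsync() variant, by contrast, lends itself to a recursive composition: process one record, then schedule retrieval of the next. The following standalone sketch mimics that pattern with a plain iterator in place of StatementResultCursor (the nextAsync() here is our stand-in, not the driver API):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class NextAsyncPattern {
    // Stand-in for StatementResultCursor#nextAsync(): completes with the
    // next element, or with null once the stream is exhausted.
    static CompletionStage<Integer> nextAsync(Iterator<Integer> it) {
        return CompletableFuture.supplyAsync(() -> it.hasNext() ? it.next() : null);
    }

    // Recursive composition: consume one record, then compose the next step.
    static CompletionStage<Integer> sumAll(Iterator<Integer> it, int acc) {
        return nextAsync(it).thenCompose(record ->
                record == null
                        ? CompletableFuture.completedFuture(acc)
                        : sumAll(it, acc + record));
    }

    public static void main(String[] args) {
        int total = sumAll(Arrays.asList(1, 2, 3, 4).iterator(), 0)
                .toCompletableFuture().join();
        System.out.println("sum: " + total);
    }
}
```

Each recursive step only runs after the previous record has arrived, so no thread ever blocks waiting for results.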

Transactions


Sessions not only allow running standalone queries but also running queries within explicit transactions. Callers have control over beginning transactions, executing Cypher queries and committing or rolling them back.

It is recommended to use the Transaction Function API, as detailed in the Neo4j Developer Manual, over explicit transactions. This is true for both the blocking and async API.

In this section we’ll take a look at Async Transaction Functions:

The two main entry points for Async Transaction Functions are:

    • Session#readTransactionAsync(TransactionWork<CompletionStage<T>>)
    • Session#writeTransactionAsync(TransactionWork<CompletionStage<T>>)
These allow the execution of read/write transactions, denoted by the given TransactionWork objects, in an asynchronous fashion.

A simple write transaction that creates a node might look like:

session.writeTransactionAsync(tx ->
  tx.runAsync("CREATE (n:Person) RETURN n")
     .thenCompose(StatementResultCursor::singleAsync)
).whenComplete((record, error) -> {
    if (error != null) error.printStackTrace();
    else System.out.println(record);
    session.closeAsync();
});

It creates a single Person node in a write transaction and prints the resulting record. Transactions allow queries to be executed asynchronously via various overloads of Transaction#runAsync() and return the same StatementResultCursor as Session#runAsync(), described above. The transaction will automatically commit when the given TransactionWork succeeds and will roll back when it fails.

A read transaction consisting of a single statement might look like this:

session.readTransactionAsync(tx ->
    tx.runAsync("MATCH (n:Person) RETURN n")
      .thenCompose(cursor -> cursor.forEachAsync(System.out::println))
).whenComplete((ignore, error) -> {
    if ( error != null ) error.printStackTrace();
    session.closeAsync();
});

In the example above, all records are consumed and printed within the body of the transaction. It will be automatically committed or rolled back afterwards.

Driver Termination


Driver instances hold on to all established network connections and should be explicitly closed when your application is shutting down or has finished interacting with Neo4j. Not doing so might result in file descriptor leaks or prevent the application from exiting. The driver can be closed like this:

driver.close();

The close operation terminates all network connections and I/O threads. It is a blocking operation and returns when all resources are terminated.

Use with Maven


The new Java driver release is now available in the Maven Central repository and can be included in a Maven project using this dependency definition:

<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>1.5.0-beta03</version>
</dependency>

The driver has a compile-time dependency on Netty, but it's shaded into the final driver artifact, so there should be no version dependency conflicts.

Other Notable Changes


This Java driver release also adds a couple of new features, apart from the async API. Most prominent are:
    • A new load-balancing strategy for Causal Clustering uses a least-connected strategy instead of round-robin, which might result in better performance and less degradation when some cluster members perform poorly due to network or other similar issues.
    • Improved connection pooling: The Java driver now allows setting a limit on the number of connections in the pool per server address via Config.build().withMaxConnectionPoolSize(25) and a connection acquisition timeout via Config.build().withConnectionAcquisitionTimeout(10, TimeUnit.SECONDS)
    • Maximum connection lifetime: The Java driver allows limiting the lifetime of a connection, which can be configured using Config.build().withMaxConnectionLifetime(1, TimeUnit.HOURS)
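Taken together, these options can be chained on a single Config builder and passed to GraphDatabase.driver(). A minimal sketch, assuming the 1.x driver API (the Bolt URI and credentials below are placeholders):

```java
import java.util.concurrent.TimeUnit;

import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Config;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;

public class DriverConfigExample {
    public static void main(String[] args) {
        // Combine the pooling options described above into one Config.
        Config config = Config.build()
                .withMaxConnectionPoolSize(25)                          // cap connections per server address
                .withConnectionAcquisitionTimeout(10, TimeUnit.SECONDS) // fail fast when the pool is exhausted
                .withMaxConnectionLifetime(1, TimeUnit.HOURS)           // retire long-lived connections
                .toConfig();

        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"), config);
        // ... use the driver ...
        driver.close();
    }
}
```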

Conclusion


The Bolt Java driver 1.5 is a rather large release with a lot of new functionality. The new asynchronous API is the most involved part of it and allows users of the driver to interact with it in a different way. It also introduces access to newer, richer Java 8 APIs, such as CompletionStage.

At this point, community input about the new async API would be immensely helpful and would allow us to fine tune the API designs and provide as much value to async code bases as possible.

The driver described here is a pre-release version and should not be used in production. Click here for the most stable version of the Neo4j Java driver.


Level up your skills with graph databases and Neo4j: Click below to register for our free online training class, Introduction to Graph Databases and master the world of graph technology in no time.

Sign Me Up

The post Beta Release: Java Driver with Async API for Neo4j appeared first on Neo4j Graph Database Platform.

Announcing the Neo4j GraphTour in 8+ Cities Worldwide

Learn more about the global Neo4j GraphTour, including which cities and customers will be presenting

This spring, Neo4j is coming to a city near you to deliver a full day of sessions on how graph database technology is revolutionising the modern enterprise.

The GraphTour Basics


This one-day local event will turn you into a graph expert no matter your technical background or familiarity with graph technology, featuring speakers from the Neo4j team as well as local customers and other members of the Neo4j community. And the best part: it's free to register.

At each GraphTour stop you’ll hear first-hand about the advantages of Neo4j’s native Graph Platform, which includes not only the Neo4j graph database, but also graph analytics and data integration.

GraphTour Presenters


Speaking at every stop of the GraphTour will be Neo4j Chief Scientist Jim Webber diving deep into why native graph technology matters now more than ever.

Other special guest speakers appearing on the tour include Neo4j CEO & Co-Founder Emil Eifrem, as well as Philip Rathle, Michael Hunger, Mark Needham, Andreas Kollegger, Kurt Freytag, Alastair Green, Jesús Barrasa, Benoît Simard and Kees Vegter.

But you won’t just be taking our word for it: At each stop on the GraphTour, local enterprise customers will be showcasing how Neo4j improves their operations, ensures legal compliance, increases revenue, catches fraudsters and contributes to bottom-line growth. Check back frequently for the schedule on Neo4j.com/GraphTour to see which Neo4j customers will be presenting in your city. (We’ll update it whenever new customers are added.)

Sessions in all locations will be presented in English, with the possible exception of local customer presentations. Advice at our GraphClinics may also be in the local language.

GraphClinics & Neo4j Solutions


At every stop on the GraphTour, Neo4j engineers, consultants and experts will be on hand for (free!) one-on-one help with your project. Ask them for advice on graph data modelling, Cypher queries and other troubleshooting questions you might have – they’re happy to help.

Also featured at the GraphTour will be our Solutions team, available to advise you on how to build your end-to-end enterprise solutions for anything from fraud detection to real-time recommendations. If you’re looking to leverage connected data at your enterprise, this is the team to talk to!

The GraphTour Itinerary (So Far)


For an always up-to-date schedule of GraphTour stops and locations, visit Neo4j.com/GraphTour. Here are all of the cities currently on the itinerary (U.S. cities to be announced soon):

    • Tel Aviv: Tuesday, 13 February 2018
    • Madrid: Thursday, 15 February 2018
    • Berlin: Tuesday, 27 February 2018
    • London: Thursday, 1 March 2018
    • Paris: Tuesday, 6 March 2018
    • Stockholm: Thursday, 8 March 2018
    • Amsterdam: Wednesday, 21 March 2018
    • Milan: Wednesday, 11 April 2018
    • United States: Locations and dates to be announced soon

Special Thanks to Our GraphTour Sponsors


Our partners and sponsors with local expertise will join us at relevant stages of the GraphTour. We are proud to announce that PRODYNA AG will be joining us in Berlin, London and Amsterdam, while Graph Everywhere will support us in Madrid.

Check back here as more partners are announced.

See You on the Neo4j GraphTour!


There’s a relationship-rich community waiting for you on the Neo4j GraphTour, and we hope you’ll join us in a city near you, no matter which side of the Atlantic you’re on (U.S. dates still TBD). There won’t be a GraphConnect Europe in 2018, so don’t miss your chance to see Neo4j on the road this spring!


Space is limited at all stops along the Neo4j GraphTour – click below to register for a GraphTour event in a city near you and get your FREE ticket today.

Get My Free Ticket

The post Announcing the Neo4j GraphTour in 8+ Cities Worldwide appeared first on Neo4j Graph Database Platform.

Forrester Research: Graph Databases Vendor Landscape [Free Report]

Learn from Forrester Research on the state of the graph database technology vendor landscape

In 2015, analyst firm Forrester Research published a vendor landscape report on the state of graph databases. It included a few graph technology vendors, several graph use cases and described Neo4j as the “most popular graph database.” Since then, graph database technology has come a long way.

Now, Forrester has reissued their graph databases vendor landscape report with a greater number of vendors, an explosion of new graph use cases and the analysis that “Neo4j continues to dominate the graph database market.”

Connected Data Is Creating New Business Opportunities


Here’s a preview of what’s included in this newest vendor landscape report by Noel Yuhanna:

It’s all about connected data! Connecting data helps companies answer complex questions, such as “Is David’s credit card purchase a fraud, given his buying patterns, the type of product that he is buying, the time and location of the purchase, and his likes and dislikes?” or “From the thousands of products, what is Jane likely to buy next given her buying behavior, products she has reviewed, her purchasing power, and other influencing factors?”

Developers could write Java, Python, or even SQL code to get answers to such complex questions, but that would take hours or days to program and in some cases might be impractical. What if business users want answers to such ad hoc questions quickly, with no time for custom code or with no access to the technical expertise needed to write those programs?

While organizations have been leveraging connections in data for decades, the need for rapid answers amid radical changes in data volume, diversity, and distribution has driven enterprise architects to look for new approaches.
That approach is to use graph database technology to leverage connected data for a sustainable competitive advantage.

You Don’t Have to Take Our Word for It


Throughout this detailed analyst report, Yuhanna gives you example after example of how today’s leading enterprises are using graph technology to transform their industries and disrupt the competition. You will walk away from this report with well-formed ideas and plans on how to apply graph-powered solutions to your industry and circumstances.

While we believe the Neo4j native graph database is the market leader, you don’t have to take our word for it – you’ll get side-by-side comparisons of the various strengths and trade-offs of today’s leading graph database vendors so that you can decide which technology is best fit for your organization and use case. We believe the choice will be obvious.

I highly encourage you to download this limited-time offer for a free copy of the Forrester Research report Vendor Landscape: Graph Databases: Leverage Graph Databases To Succeed With Connected Data by clicking below.


Click below to get your free copy of Vendor Landscape: Graph Databases from Forrester Research – this analyst report will only be available for a limited time:

Get My Free Report

The post Forrester Research: Graph Databases Vendor Landscape [Free Report] appeared first on Neo4j Graph Database Platform.

Data Profiling: A Holistic View of Data using Neo4j


Summary

Data profiling is a widely used methodology in the relational database world to analyse the structure, contents and metadata of a data source. Generally, data profiling consists of a series of jobs executed against the data source to collect statistics and produce informative summaries about the underlying data.

As a general rule, data evolves over time. After some years, the actual data stored and used in a database may vary significantly from what people think it is, or from what the database was originally designed for. Data profiling helps not only to understand anomalies and assess data quality, but also to discover, register and assess enterprise metadata.

The Neo4j graph database excels at analysing connected, high-volume and variably structured data assets. That makes data profiling all the more valuable, as it helps us better understand the data, identify hidden patterns more easily and potentially improve query performance.

This article will share practical data profiling techniques using the Cypher graph query language.

The following are system requirements:

    • Neo4j Graph Database version 3.2.x, either Community or Enterprise Edition, Linux or Windows (I use Windows 10)
    • Internet browser to access the Neo4j Browser (I use Chrome)
    • A graph database. Here I imported data from the Stack Overflow Questions dataset, which contains more than 31 million nodes, 77 million relationships and 260 million properties. The total database size on Windows 10 is about 20 GB.
All of the Cypher scripts are executed inside the Neo4j Browser, and the outcomes – mostly screenshots – are taken from it, unless specified otherwise.

1. Database Schema Analysis

Database schema analysis is usually the first step of data profiling. The simple purpose of it is to know what the data model looks like and what objects are available.

Note: for most of the scripts used in this section, you can find them in Neo4j Browser, under the menu Favorites > Data Profiling.

1.1 Show the graph data model (the meta model)


Cypher script:

// Show what is related, and how (the meta graph model)
CALL db.schema()

Outcome

A meta graph data model in Neo4j


Description

The Stack Overflow database has three node labels:
    • User
    • Post
    • Tag
And it has four relationship types:
    • User POSTED Post
    • Post HAS_TAG Tag
    • Post is PARENT_OF Post
    • Post ANSWER another Post

1.2 Show existing constraints and indexes


Cypher script:

// Display constraints and indexes
:schema

Outcome

Indexes
   ON :Post(answers) ONLINE
   ON :Post(createdAt) ONLINE
   ON :Post(favorites) ONLINE
   ON :Post(score) ONLINE
… …

Constraints
   ON ( post:Post ) ASSERT post.postId IS UNIQUE
   ON ( tag:Tag ) ASSERT tag.tagId IS UNIQUE
   ON ( user:User ) ASSERT user.userId IS UNIQUE

Description

Indexes tell us the properties that will have the best query performance when used for matching.

Constraints tell us the properties that are unique and can be used to identify a node or relationship.

1.3 Show all relationship types


Cypher script:

// List relationship types
CALL db.relationshipTypes()

Outcome

Show data relationships types using Cypher in Neo4j


Description

A list of available relationship types.

1.4 Show all node labels / types


Cypher script:

// List node labels
CALL db.labels()

Outcome

Show node labels and types using Cypher in Neo4j


Description

A list of available node labels / types.

1.5 Count all nodes


Cypher script:

// Count all nodes
MATCH (n) RETURN count(n)

Outcome

Count all nodes in a Neo4j database using Cypher


Description

It only takes 1 ms for Neo4j to count more than 31 million nodes.

1.6 Count all relationships


Cypher script:

// Count all relationships
MATCH ()-[r]->() RETURN count(*)

Outcome

Count all data relationships in Neo4j using the Cypher query language


Description

Again, it only takes 1 ms for Neo4j to return the total number of relationships.

1.7 Show data storage sizes


Cypher script:

// Data storage sizes
:sysinfo

Outcome

Show data storage sizes in a Neo4j graph database


1.8 Sample data


Cypher script:

// What kind of nodes exist
// Sample some nodes, reporting on property and relationship counts per node.
MATCH (n) WHERE rand() <= 0.1
RETURN
DISTINCT labels(n),
count(*) AS SampleSize,
avg(size(keys(n))) as Avg_PropertyCount,
min(size(keys(n))) as Min_PropertyCount,
max(size(keys(n))) as Max_PropertyCount,
avg(size( (n)-[]-() ) ) as Avg_RelationshipCount,
min(size( (n)-[]-() ) ) as Min_RelationshipCount,
max(size( (n)-[]-() ) ) as Max_RelationshipCount

Outcome

Data profiling for a statistical sample size using Cypher in Neo4j


You may have noticed the first line of the script – MATCH (n) WHERE rand() <= 0.1 effectively chooses 10% (0.1) of the total nodes for sampling. Changing this value would change the sample size (e.g., using 0.01 uses 1%).

2. Node Analysis

Node analysis is more or less similar to table and column analysis for the profiling of a relational database (RDBMS). The purpose of node analysis is to reveal facts about nodes, as well as properties of nodes.

2.1 Count nodes by their labels / types


Cypher script:

// List all node types and counts
MATCH (n) RETURN labels(n) AS NodeType, count(n) AS NumberOfNodes;

Outcome

Count nodes by label or type in Neo4j using the Cypher query language


Description

Node counting gives a clearer idea of the volume of each type of node in the database.

2.2 Property Analysis


2.2.1 List all properties of a node

Cypher script:

// List all properties of a node
MATCH (u:User) RETURN keys(u) LIMIT 1

Outcome

List all properties of a node in Neo4j


2.2.2 List all properties of a relationship

Cypher script:

// List all properties of a relationship
MATCH ()-[t:POSTED]-() RETURN keys(t) LIMIT 1

Outcome

There are no properties on this relationship.

2.2.3 Uniqueness of the property

Cypher script:

// Calculate uniqueness of a property
MATCH (u:User)
RETURN count(DISTINCT u.name) AS DistinctName,
       count(u.name) AS TotalUser,
       100*count(DISTINCT u.name)/count(u.name) AS Uniqueness;

Outcome

Calculate the uniqueness of a property using the Cypher query language


Description

It seems 78% of the user names are unique. A property having unique values can be a good candidate as the ID.

2.2.4 Nullability of the property

Cypher script:

// Calculate nullability of a property
MATCH (u:User) WHERE u.name IS null RETURN count(u);

Outcome

Calculate the nullability of a property in the Neo4j graph database


Description

There are no null values for the name property of User nodes.

2.2.5 Min, Max, Average and Standard Deviation of the Values of a property

Cypher script:

// Calculate min, max, average and standard deviation of the values of a property
MATCH (p:Post)
RETURN min(p.favorites) AS MinFavorites,
       max(p.favorites) AS MaxFavorites,
       avg(p.favorites) AS AvgFavorites,
       stDev(p.favorites) AS StdFavorites;

Outcome

How to calculate the min, max, average and standard deviation of a property using Cypher in Neo4j


2.2.6 Occurrences of values of a property

Cypher script:

// Count the occurrences of each value of a property
MATCH (p:Post)
RETURN p.answers AS Answers, count(p.answers) AS CountOfAnswers
ORDER BY Answers ASC;

Outcome

Calculate the occurences of values of a property in Neo4j


Description

From the results, there are 1.17 million posts that have 0 answers, 4.66 million have 1 answer, and so on.

2.3 Node Rank (Centrality)


2.3.1 Importance of a user

Cypher script:

// Calculate node rank / Centrality of a node
// i.e., the relevance of a node by counting the edges from other nodes:
// in-degree, out-degree and total degree.

MATCH (u:User)
WHERE u.name STARTS WITH 'T'
WITH u, size( (u)-[:POSTED]->() ) AS OutDepth, size( (u)<-[:POSTED]-() ) AS InDepth
ORDER BY OutDepth, InDepth
RETURN u.name, min(OutDepth), max(OutDepth), min(InDepth), max(InDepth)

Outcome

Calculate node rank / centrality of a node in Neo4j


Description

For a user, max(OutDepth) represents the maximum number of posts he/she has submitted. When max(InDepth) is 0, it means no relationship ends at that User node.

The users with the highest OutDepth can be considered more important within the community.

Note: as this is a heavy query, make sure there is enough heap size (specified by dbms.memory.heap.max_size in the neo4j.conf file). Alternatively, use a filter to limit the scope of the query as shown in the sample, which only looks for users whose name starts with "T".

2.3.2 Importance of a post

By looking at which posts have the most answers, we can tell the importance of – and the attention received by – a post.
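A simple ranking sketch for this, using the answers property from section 2.2.5 and the postId property referenced in the constraints of section 1.2:

```
// Top 10 posts by number of answers
MATCH (p:Post)
RETURN p.postId AS PostId, p.answers AS Answers
ORDER BY p.answers DESC
LIMIT 10;
```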

2.4 Orphan Nodes


Cypher script:

// Orphans: nodes with no relationships
MATCH (u:User)
WITH u, size( (u)-[:POSTED]->() ) AS posts WHERE posts = 0
RETURN u.name, posts;

Outcome

Find orphan nodes in Neo4j


Description

These are users who have never submitted any post or answer.

3. Relationship Analysis

Relationship analysis focuses on relationships in a graph database. It can help us understand the completeness, integrity and density of certain relationships between nodes.

What is unique to graph databases – compared to normal RDBMSs – is the powerful analyses available to reveal the hidden knowledge of connected data. One example is to find out the shortest path between two nodes. Another one is to identify relationship triangles.

3.1 Statistics on relationships


Cypher script:

// Count relationships by type
MATCH (u)-[p]-()
WITH type(p) AS RelationshipName, count(p) AS RelationshipNumber
RETURN RelationshipName, RelationshipNumber;

Outcome

Calculate data relationship statistics in Neo4j using Cypher


Description

Display the total number of each relationship type in the database.

Another query to get similar results is given below; however, it takes much longer to complete:

MATCH ()-[r]->() RETURN type(r), count(*)

3.2 Find the shortest path between two nodes


Cypher script:

// Find all shortest path between 2 nodes
MATCH path =
      allShortestPaths((u:User {name:"Darin Dimitrov"})-[*]-(me:User {name:"Michael Hunger"}))
RETURN path;

Outcome

Learn more about data profiling using the Neo4j graph database and the APOC library

The shortest path between the two chosen users has a length of 6.

Description

The shortest paths between the two users – highlighted by red arrows in the diagram above – tell us the two users are connected by posts having the same tags (red nodes). These are not necessarily the only paths, as users may post answers to each other’s questions, but in this case a connection through the same tag – i.e., a common area of interest – is the fastest way to connect the two users.

In a large graph like this one, it may not be viable to calculate the shortest path between any two users. However, it may be valuable to check the connectivity among the most important people or among posts having the most interest.
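For a quick connectivity check between two specific users, a single shortestPath() call with a bounded path length is cheaper than allShortestPaths(). A sketch, reusing the user names from the example above (the upper bound of 10 hops is an arbitrary cutoff):

```
// Length of one shortest path, capped at 10 hops
MATCH path = shortestPath((u:User {name:"Darin Dimitrov"})-[*..10]-(me:User {name:"Michael Hunger"}))
RETURN length(path);
```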

3.3 Triangle detection


Cypher script:

// Triangle detection
MATCH (u:User)-[p1:POSTED]-(x1), (u)-[p2:POSTED]-(x2), (x1)-[r1]-(x2)
WHERE x1 <> x2 RETURN u, p1, p2, x1, x2 LIMIT 10;

// Count all triangles in the graph
MATCH (u:User)-[p1:POSTED]-(x1), (u)-[p2:POSTED]-(x2), (x1)-[r1]-(x2)
WHERE x1 <> x2 RETURN count(p1);

Outcome

How to detect triadic closures (triangles) in the Neo4j graph database using the Cypher query language


Description

Triangles are another key concept in graph theory. A triangle is formed by three mutually connected nodes, regardless of relationship direction. Identifying triangles – or the lack of them – provides interesting insights into the underlying data asset.

Triangles are also referred to as triadic closures. As per Graph Databases, 2nd Edition (O'Reilly Media):

A triadic closure is a common property of social graphs, where we observe that if two nodes are connected via a path involving a third node, there is an increased likelihood that the two nodes will become directly connected at some point in the future.
Putting this concept into our daily life, it’s a familiar social occurrence. If we happen to be friends with two people who don’t know one another, there’s an increased chance that those two people will become direct friends at some point in the future.

By discovering the existence of triangles in a graph database, we can create more efficient queries to avoid circular traversal.

4. Using the APOC Library

Since Neo4j 3.0, users can implement customized functionality in Java to extend Cypher with highly complex graph algorithms, a capability known as user-defined procedures.

The APOC library is one of the most powerful and popular Neo4j libraries. It consists of many procedures (about 300 at the time of writing) that help with tasks in areas like data integration, graph algorithms and data conversion. Unsurprisingly, it also has several procedures for analysing the metadata of the graph database.

To enable APOC in Neo4j 3.x, there are a few simple steps:

  1. Stop Neo4j service
  2. Download and copy the most recent version of the APOC JAR file to the plugins folder under the database, e.g., graph.db\plugins
  3. Add the following line to the neo4j.conf file: dbms.security.procedures.unrestricted=apoc.*
  4. Start Neo4j service again
The out-of-the-box procedures for profiling are all under apoc.meta.*

Below are some samples:

•  CALL apoc.meta.data()

This will list all nodes and relationships as well as properties of each.

List all nodes, relationships and properties in Neo4j using the APOC library


•  CALL apoc.meta.graph()

This is equivalent to CALL db.schema() (refer to section 1.1 above).

•  CALL apoc.meta.stats()

This will list statistics of nodes and relationships. It also shows the cardinality of each relationship by node types. For example, the following stats communicate the fact that the INTERACTS relationship is between the nodes of label Character:

{
  "labelCount": 1,
  "relTypeCount": 1,
  "propertyKeyCount": 3,
  "nodeCount": 107,
  "relCount": 352,
  "labels": {
    "Character": 107
  },
  "relTypes": {
    "(:Character)-[:INTERACTS]->()": 352,
    "()-[:INTERACTS]->(:Character)": 352,
    "()-[:INTERACTS]->()": 352
  }
}

•  CALL apoc.meta.schema()

This will return metadata of all node labels, relationship types and properties.

•  CALL apoc.meta.subGraph({labels:['Character'],rels:['INTERACTS']})

This is a very useful function especially for a very large graph, as it allows you to analyse a subset of nodes and relationships (subgraphs). The complete specification looks like this:

CALL apoc.meta.subGraph({labels:[labels],rels:[rel-types],excludes:[label,rel-type,…​]})
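Applied to the Stack Overflow graph used in this article, a call restricted to users and their posts might look like this (label and relationship names as introduced in section 1.1):

```
// Analyse only the User/Post portion of the graph
CALL apoc.meta.subGraph({labels:['User','Post'], rels:['POSTED']})
```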

5. Further Discussion

There is a huge advantage to storing data as a graph. The graph data model enables us to do much more powerful analysis of relationships over large amounts of data, and to unearth buried connections among vast numbers of individual data elements (nodes).

Data profiling on a graph database like Neo4j gives us a more insightful understanding of the actual data we are working with. The results obtained can then be used for further detailed analysis, performance tuning, database schema optimization and data migration.

As a native graph database, Neo4j provides native graph data storage and native query processing through the Cypher query language. Some of the most useful data profiling tasks can be easily done using Cypher, as shown in this article.

There are also extensions that support more complex and advanced graph analysis, for example Betweenness Centrality and Connected Components. The Neo4j graph algorithms library is the one I’ve been using to perform more complex data profiling. Once installed, its functions can be called directly as part of a Cypher query and the results visualized inside the Neo4j Browser. I plan to cover more of this in coming articles.


Learn how relational databases compare with graph databases:
Download this ebook, The Definitive Guide to Graph Databases for the RDBMS Developer, and discover when and how to use graphs in conjunction with your relational database.


Get the Ebook

The post Data Profiling: A Holistic View of Data using Neo4j appeared first on Neo4j Graph Database Platform.


Retail & Neo4j: Customer Experience Personalization

To remain viable, today’s retailers must be nimble enough to face their colossal online competition while also addressing another new reality of retail: The customer is now at the center of the value chain.

In order to adapt to these new realities, retailers must have real-time control of inventory, payment, and delivery systems. However, real-time responsiveness is difficult for traditional retailers slowed down by legacy infrastructure.

Nowhere are such nimbleness and real-time responsiveness more required than in the management and personalization of the customer experience.

Learn how Neo4j is used for customer experience personalization with this global retailer case study


In this series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. In our previous post, we covered personalized promotions and product recommendation engines.

This week, we’ll discuss customer experience personalization.

Why Customer Experience Personalization Is Critical


Retailers can personalize the online customer experience by serving relevant content based on the customer’s desires, interests, and needs. Doing so improves customer engagement and is likely to lead to increased revenue and customer loyalty.

For example, by serving relevant blog posts beside product descriptions, retailers can portray themselves as experts on how to use a particular product. Customers will increase their visits – and purchases – because they know they can get valuable information from a reliable source.

Retailers can also use path analytics to help improve outcomes. This involves analyzing customer behavior leading up to a purchase and using that data to guide customers along a more profitable path. This may entail adjusting content, or changing where a link takes future customers.

Retailers can also identify a dimension shared by a group of customers and cluster them based on these attributes. For example, customers could be clustered around the attribute of having (or not having) children, or customers could be clustered based on profession and tenure, such as an early-career engineer versus a seasoned VP of marketing.

Different dimensions of consumers have different responsibilities and incomes, and therefore different buying habits. Retailers can use this information to personalize content for each customer.

The Essential Role of Graph Technology


Retailers have plenty of data that can be used to determine the best paths and content to serve customers. That includes data pertaining to products, markets, social media, master data, digital assets, and the like. However, this data often resides in information silos, making it difficult to consolidate and identify opportunities to serve customers the most relevant content.

When it comes to combining all these data sources into a personalization engine, relational databases can’t do complex recommendation computations in real time.

You could move the data into Hadoop or a data warehouse to pre-compute recommendations for each customer, but the recommendations will always be slightly out of date. In addition, it is inefficient to pre-compute recommendations for an entire customer base every day when only a small portion of the customer base visits the website on any given day.

Rather than forklift all customer data into a centralized system, a graph database allows you to keep data where it is and add a graph analysis overlay.

Each customer can be given a department identifier, which is then tied back to the main customer identifier. The identifier for each department or line of business consists of individual identifiers, resulting in a two-overlap graph of each customer. This gives you a view of the bigger picture of the customer relationship and lets you quickly navigate back into the original systems anytime a customer interacts with the company.

Case Study: Global Sporting Goods Retailer


A global leader in the sporting goods industry wanted to offer a more personalized experience to its online customers. Unlike other online retailers that offer static content to all website visitors, this sporting goods retailer wanted to serve content based on user interests, local languages, regional sporting news and market-specific product offerings.

There was just one problem: The data required to provide personalized web experiences was spread across various information silos.

“We have many different silos, many different data domains, and in order to make sense out of our data, we needed to bring those together and make them useful for us,” said a senior project manager.

On a technical level, data models didn’t align between information silos, and there wasn’t a standard, consistent way to communicate between the different data domains. Rather than consolidate all the data into a single place, the project manager wanted to create a “Shared Metadata Service” that would allow employees to categorize and search for content across every platform and division within the enterprise.

The Shared Metadata Service would also allow the Group to target audiences with content organized by language, country, sport and athlete. In addition, the Service would need to support search engine optimization (SEO) for content and govern data roles, defining who has the right to change data and which employees own it, to ensure high-quality data.

The Neo4j graph database proved to be ideal for creating the Service, offering access and searchability to all relevant data, along with support for emerging services. In order to implement the Shared Metadata Service on Neo4j, the engineering team had to first unify the different models between content, product data and master data.

With the help of Neo4j consultants, the retailer’s team defined an optimal data model that connects all three domains, relating information as diverse as marketing campaigns, product specifications, contracted athletes and associated teams, sports categories, gender and more.

Today, the Neo4j-powered Shared Metadata Service has two million nodes with nearly ten million relationships, but for the sporting goods retailer, this is only the first step. The ultimate goal is to build a recommendation engine that uses Neo4j to offer relevant, real-time suggestions to online shoppers.

Conclusion


Customer experience personalization is a pass/fail operation: Either an experience delights a customer by offering the right content, context or recommendation at exactly the right moment, or the effort fails and a customer grows frustrated at an irrelevant experience.

The best way to deliver personalized customer experiences is to use a database platform natively designed to store customer context – i.e., relationships to various interconnected factors – and retrieve that related data in real time. The only connected data platform that perfectly meets those requirements is Neo4j.

In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including ecommerce delivery service routing, supply chain visibility, revenue management and IT operations.


It’s time to up your retail game:
Witness how today’s leading retailers are using Neo4j to overcome today’s toughest industry challenges with this white paper, Driving Innovation in Retail with Graph Technology. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the retail and Neo4j blog series:

The post Retail & Neo4j: Customer Experience Personalization appeared first on Neo4j Graph Database Platform.

How to Import the Bitcoin Blockchain into Neo4j [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

This guide runs through the basic steps for importing the bitcoin blockchain into a Neo4j graph database.

Neo4j bitcoin data model


The whole process is just about taking data from one format (blockchain data) and converting it into another format (a graph database). The only thing that makes this slightly trickier than typical data conversion is that it's helpful to understand the structure of bitcoin data before you get started.

However, once you have imported the blockchain into Neo4j, you can perform analysis on the graph database that would not be possible with SQL databases. For example, you can follow the path of bitcoins to see if two different addresses are connected:

Neo4j connected data for addresses

Screenshot of connected Bitcoin Addresses in the Neo4j Browser.

In this guide I will cover:

  1. How bitcoin works, and what the blockchain is.
  2. What blockchain data looks like.
  3. How to import the blockchain data into Neo4j.

This isn’t a complete tutorial on how to write your own importer tool. However, if you’re interested, you can find my bitcoin-to-neo4j code on GitHub, although I’m sure you could write something cleaner after reading this guide.

1. What Is Bitcoin?


Bitcoin is a computer program.

It’s a bit like uTorrent; you run the program, it connects to other computers running the same program, and it shares a file. However, the cool thing about bitcoin is that anyone can add data to this shared file, and any data already written to the file cannot be tampered with.

Learn how to import the bitcoin blockchain data into the Neo4j graph database using Cypher queries


As a result, Bitcoin creates a secure file that is shared on a distributed network.

What can you do with this?


In bitcoin, each piece of data that gets added to this file is a transaction. Therefore, this decentralised file is being used as a “ledger” for a digital currency (i.e., cryptocurrency).

This ledger is called the blockchain.

The bitcoin blockchain


Where can I find the blockchain?


If you run the Bitcoin Core program, the blockchain will be stored in a folder on your computer:

    • Linux: ~/.bitcoin/blocks
    • Windows: C:\Users\YourUserName\Appdata\Roaming\Bitcoin\blocks
    • Mac: ~/Library/Application Support/Bitcoin/blocks

When you open this directory you should notice that instead of one big file, you will find multiple files with the name blkXXXXX.dat. This is the blockchain data, but split across multiple smaller files.

2. What Does the Blockchain Look Like?


The blk.dat files contain serialized data of blocks and transactions.

The bitcoin blockchain


Blocks


Blocks are separated by magic bytes, which are then followed by the size of the upcoming block.
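To make the framing concrete, here is a minimal Python sketch (the helper name and toy stream are my own, not from the original importer) that scans a blk.dat-style stream for the mainnet magic bytes (f9beb4d9), reads the 4-byte little-endian size, and yields each raw block:

```python
import struct
from io import BytesIO

MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network magic

def read_blocks(stream):
    """Yield raw serialized blocks from a blk.dat-style stream."""
    while True:
        magic = stream.read(4)
        if len(magic) < 4:
            return  # end of file
        if magic != MAGIC:
            raise ValueError("stream out of sync: expected magic bytes")
        size = struct.unpack("<I", stream.read(4))[0]  # little-endian block size
        yield stream.read(size)

# Toy stream with two fake "blocks" to show the framing:
fake = MAGIC + struct.pack("<I", 3) + b"abc" + MAGIC + struct.pack("<I", 2) + b"xy"
blocks = list(read_blocks(BytesIO(fake)))
```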

Each block then begins with a block header:

A block in the bitcoin blockchain

A block is basically a container for a list of transactions. The header is like the metadata at the top.

Block Header Example:

000000206c77f112319ae21489b66774e8acd379044d4a23ea7498000000000000000000821fe1890186779b2cc232d5dbecfb9119fd46f8a9cfd1141649ff1cd907374487d8ae59e93c011832ec0399
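That 80-byte header unpacks with a few lines of Python. This is a sketch (the helper name is my own; offsets follow the standard header layout) applied to the example header above. Note that hashes are stored byte-reversed on disk, which is why the prevblock field is flipped before hex-encoding:

```python
import struct

def parse_header(header: bytes) -> dict:
    """Parse an 80-byte block header (integer fields are little-endian)."""
    version, = struct.unpack("<I", header[0:4])
    prevblock = header[4:36][::-1].hex()     # hashes are stored byte-reversed
    merkleroot = header[36:68][::-1].hex()
    time, bits, nonce = struct.unpack("<III", header[68:80])
    return {"version": version, "prevblock": prevblock,
            "merkleroot": merkleroot, "time": time,
            "bits": bits, "nonce": nonce}

header_hex = ("000000206c77f112319ae21489b66774e8acd379044d4a23ea7498000000000000000000"
              "821fe1890186779b2cc232d5dbecfb9119fd46f8a9cfd1141649ff1cd9073744"
              "87d8ae59e93c011832ec0399")
fields = parse_header(bytes.fromhex(header_hex))
```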

Transactions


After the block header, there is a variable-length integer (a single byte for counts below 253) that tells you the number of transactions in the block. After that, you get serialized transaction data, one transaction after the other.
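That count is Bitcoin's compactSize encoding: values below 0xfd fit in one byte, while 0xfd, 0xfe and 0xff prefix 2-, 4- and 8-byte little-endian integers. A minimal decoder sketch (helper name is my own):

```python
import struct

def read_varint(data: bytes, pos: int = 0):
    """Decode a compactSize integer; returns (value, new_pos)."""
    prefix = data[pos]
    if prefix < 0xfd:
        return prefix, pos + 1
    if prefix == 0xfd:
        return struct.unpack_from("<H", data, pos + 1)[0], pos + 3
    if prefix == 0xfe:
        return struct.unpack_from("<I", data, pos + 1)[0], pos + 5
    return struct.unpack_from("<Q", data, pos + 1)[0], pos + 9
```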

A transaction is just another piece of serialized data, but transactions are more structurally interesting.

A transaction in the bitcoin blockchain


Each transaction has the same pattern:

  1. Select Outputs (we call these Inputs).
     • Unlock these inputs so that they can be spent.
  2. Create Outputs.
     • Lock these outputs to a new address.
So after a series of transactions, you have a transaction structure that looks something like this:

    Bitcoin blockchain transactions

    This is a simplified diagram of what the blockchain looks like. As you can see, it looks like a graph.

    Transaction Example:

    0200000001f2f7ee9dda0ba82031858d30d50d3205eea07246c874a0488532014d3b653f03000000006a47304402204df1839028a05b5b303f5c85a66affb7f6010897d317ac9e88dba113bb5a0fe9022053830b50204af15c85c9af2b446338d049672ecfdeb32d5124e0c3c2256248b7012102c06aec784f797fb400001c60aede8e110b1bbd9f8503f0626ef3a7e0ffbec93bfeffffff0200e1f505000000001976a9144120275dbeaeb40920fc71cd8e849c563de1610988ac9f166418000000001976a91493fa3301df8b0a268c7d2c3cc4668ea86fddf81588ac61610700
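For a legacy (pre-segwit) transaction like the example above, the txid is simply the double SHA-256 of the raw bytes, displayed byte-reversed. A sketch (helper name is my own), run against the example transaction:

```python
import hashlib

def txid(raw_tx: bytes) -> str:
    """txid of a legacy transaction: double SHA-256, displayed byte-reversed."""
    return hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()[::-1].hex()

raw = "0200000001f2f7ee9dda0ba82031858d30d50d3205eea07246c874a0488532014d3b653f03000000006a47304402204df1839028a05b5b303f5c85a66affb7f6010897d317ac9e88dba113bb5a0fe9022053830b50204af15c85c9af2b446338d049672ecfdeb32d5124e0c3c2256248b7012102c06aec784f797fb400001c60aede8e110b1bbd9f8503f0626ef3a7e0ffbec93bfeffffff0200e1f505000000001976a9144120275dbeaeb40920fc71cd8e849c563de1610988ac9f166418000000001976a91493fa3301df8b0a268c7d2c3cc4668ea86fddf81588ac61610700"
example = txid(bytes.fromhex(raw))
```

(Segwit transactions hash a stripped serialization for the txid, which is one of the differences mentioned in the conclusion.)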
    

    3. How to Import the Blockchain into Neo4j


Now that we know what the blockchain data represents (and that it looks a lot like a graph), we can go ahead and import it into Neo4j. We do this by:

    1. Reading through the blk.dat files.
    2. Decoding each block and transaction we run into.
    3. Converting the decoded block/transaction into a Cypher query.

    Here’s a visual guide to how I represent Blocks, Transactions and Addresses in the database:

    Blocks


    Neo4j import for a block of the bitcoin blockchain


    1. CREATE a :block node, and connect it to the previous block it builds upon.
      • SET each field from the block header as properties on this node.
    2. CREATE a :coinbase node coming off each block, as this represents the “new” bitcoins being made available by the block.
      • SET a value property on this node, which is equal to the block reward for this block.
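The subsidy part of that block reward can be derived from the block height, since it halves every 210,000 blocks. A sketch of the calculation, assuming you track heights during the import (note that the real coinbase value also includes the transaction fees collected in the block):

```python
def block_subsidy(height: int) -> int:
    """Block subsidy in satoshis: 50 BTC at genesis, halving every 210,000 blocks."""
    halvings = height // 210_000
    if halvings >= 64:
        return 0  # right-shifting further would be undefined; subsidy is long gone
    return (50 * 100_000_000) >> halvings
```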

    Transactions


    Import a bitcoin transaction into Neo4j graph database


    1. CREATE a :tx node, and connect it to the :block we had just created.
      • SET properties (version, locktime) on this node.
    2. MERGE existing :output nodes and relate them [:in] to the :tx.
      • SET the unlocking code as a property on the relationship.
    3. CREATE new :output nodes that this transaction creates.
      • SET the respective values and locking codes on these nodes.
    Addresses

    If the locking code on an :output contains an address…

    Import a bitcoin blockchain address into a Neo4j graph data model


    1. CREATE an :address node, and connect the output node to it.
      • SET the address as a property on this node.
      • Note: If different outputs are connected to the same address, then they will be connected to the same address node.

    4. Cypher Queries


    Here are some example Cypher queries you could use for the basis of inserting blocks and transactions into Neo4j.

    Note: You will need to decode the block headers and transaction data to get the parameters for the Cypher queries.

    Block


MERGE (block:block {hash:$blockhash})
CREATE UNIQUE (block)-[:coinbase]->(:output:coinbase)
SET
   block.size=$size,
   block.prevblock=$prevblock,
   block.merkleroot=$merkleroot,
   block.time=$timestamp,
   block.bits=$bits,
   block.nonce=$nonce,
   block.txcount=$txcount,
   block.version=$version

MERGE (prevblock:block {hash:$prevblock})
MERGE (block)-[:chain]->(prevblock)

    Parameters (example):

    {
    	"blockhash": "00000000000003e690288380c9b27443b86e5a5ff0f8ed2473efbfdacb3014f3",
    	"version": 536870912,
    	"prevblock": "000000000000050bc5c1283dceaff83c44d3853c44e004198c59ce153947cbf4",
    	"merkleroot": "64027d8945666017abaf9c1b7dc61c46df63926584bed7efd6ed11a6889b0bac",
    	"timestamp": 1500514748,
    	"bits": "1a0707c7",
    	"nonce": 2919911776,
    	"size": 748959,
    	"txcount": 1926,
    }
    

    Transaction


    MATCH (block :block {hash:$hash})
    MERGE (tx:tx {txid:$txid})
    MERGE (tx)-[:inc {i:$i}]->(block)
SET tx += $tx
    
    WITH tx
    FOREACH (input in $inputs |
             MERGE (in :output {index: input.index})
             MERGE (in)-[:in {vin: input.vin, scriptSig: input.scriptSig, sequence: input.sequence, witness: input.witness}]->(tx)
             )
    
    FOREACH (output in $outputs |
             MERGE (out :output {index: output.index})
             MERGE (tx)-[:out {vout: output.vout}]->(out)
             SET
                 out.value= output.value,
                 out.scriptPubKey= output.scriptPubKey,
                 out.addresses= output.addresses
             FOREACH(ignoreMe IN CASE WHEN output.addresses <> '' THEN [1] ELSE [] END |
                     MERGE (address :address {address: output.addresses})
                     MERGE (out)-[:locked]->(address)
                     )
            )

Note: This query uses the FOREACH hack, which acts as a conditional and will only create the :address nodes if output.addresses actually contains an address (i.e., if it is not empty).

    Parameters (example):

    {
       "txid":"2e2c43d9ef2a07f22e77ed30265cc8c3d669b93b7cab7fe462e84c9f40c7fc5c",
       "hash":"00000000000003e690288380c9b27443b86e5a5ff0f8ed2473efbfdacb3014f3",
       "i":1,
       "tx":{
          "version":1,
          "locktime":0,
          "size":237,
          "weight":840,
          "segwit":"0001"
       },
       "inputs":[
          {
             "vin":0,
             "index":"0000000000000000000000000000000000000000000000000000000000000000:4294967295",
             "scriptSig":"03779c110004bc097059043fa863360c59306259db5b0100000000000a636b706f6f6c212f6d696e65642062792077656564636f646572206d6f6c69206b656b636f696e2f",
             "sequence":4294967295,
             "witness":"01200000000000000000000000000000000000000000000000000000000000000000"
          }
       ],
       "outputs":[
          {
             "vout":0,
             "index":"2e2c43d9ef2a07f22e77ed30265cc8c3d669b93b7cab7fe462e84c9f40c7fc5c:0",
             "value":166396426,
             "scriptPubKey":"76a91427f60a3b92e8a92149b18210457cc6bdc14057be88ac",
             "addresses":"14eJ6e2GC4MnQjgutGbJeyGQF195P8GHXY"
          },
          {
             "vout":1,
             "index":"2e2c43d9ef2a07f22e77ed30265cc8c3d669b93b7cab7fe462e84c9f40c7fc5c:1",
             "value":0,
             "scriptPubKey":"6a24aa21a9ed98c67ed590e849bccba142a0f1bf5832bc5c094e197827b02211291e135a0c0e",
             "addresses":""
          }
       ]
    }
    

    5. Results


If you have inserted the blocks and transactions using the Cypher queries above, then these are some examples of the kind of results you can get out of the graph database.

    Block


    Cypher query results for a bitcoin block in Neo4j


    MATCH (block :block)<-[:inc]-(tx :tx)
    WHERE block.hash='$blockhash'
    RETURN block, tx
    

    Transaction


    Cypher query result for a bitcoin transaction in Neo4j


    MATCH (inputs)-[:in]->(tx:tx)-[:out]->(outputs)
    WHERE tx.txid='$txid'
    OPTIONAL MATCH (inputs)-[:locked]->(inputsaddresses)
    OPTIONAL MATCH (outputs)-[:locked]->(outputsaddresses)
    OPTIONAL MATCH (tx)-[:inc]->(block)
    RETURN inputs, tx, outputs, block, inputsaddresses, outputsaddresses
    

    Address


    Cypher query results for a bitcoin address in a Neo4j graph database


MATCH (address :address {address:'$address'})<-[:locked]-(output :output)
RETURN address, output
    

    Paths


    Finding paths between transactions and addresses is probably the most interesting thing you can do with a graph database of the bitcoin blockchain, so here are some examples of Cypher queries for that:

    Between Outputs

    Neo4j output of a bitcoin blockchain path


MATCH (start :output {index:'$txid1:vout1'}), (end :output {index:'$txid2:vout2'})
    MATCH path=shortestPath( (start)-[:in|:out*]-(end) )
    RETURN path
    

    Between Addresses

    A bitcoin path address result in Neo4j


    MATCH (start :address {address:'$address1'}), (end :address {address:'$address2'})
    MATCH path=shortestPath( (start)-[:in|:out|:locked*]-(end) )
    RETURN path
    

    Conclusion


    This has been a simple guide on how you can take the blocks and transactions from blk.dat files (the blockchain) and import them into a Neo4j database.

    I think it’s worth the effort if you’re looking to do serious graph analysis on the blockchain. A graph database is a natural fit for bitcoin data, whereas using an SQL database for bitcoin transactions feels like trying to shove a square peg into a round hole.

    I’ve tried to keep this guide compact, so I haven’t covered things like:

    1. Reading through the blockchain. Reading the blk.dat files is easy enough. However, the annoying thing about these files is that the blocks are not written to these files in sequential order, which makes setting the height on a block or calculating the fee for a transaction a bit trickier (but you can code around it).
    2. Decoding blocks and transactions. If you want to use the Cypher queries above, you will need to get the parameters you require by decoding the block headers and raw transaction data as you go. You could write your own decoders, or you could try using an existing bitcoin library.
    3. Segregated Witness. I’ve only given a Cypher query for an “original” style transaction, which was the only transaction structure used up until block 481,824. However, the structure of a segwit transaction is only slightly different (but it might need its own Cypher query).
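The out-of-order problem from point 1 can be worked around in memory: once you know each block's prevblock pointer, heights follow by walking back to the genesis block, whose prevblock field is all zeros. A toy sketch (the data and helper names are hypothetical):

```python
def assign_heights(blocks: dict) -> dict:
    """Given {block_hash: prevblock_hash}, compute each block's height.

    The genesis block's prevblock pointer is 32 zero bytes.
    """
    GENESIS_PREV = "00" * 32
    heights = {}

    def height(h):
        if h in heights:
            return heights[h]
        prev = blocks[h]
        heights[h] = 0 if prev == GENESIS_PREV else height(prev) + 1
        return heights[h]

    for h in blocks:
        height(h)
    return heights

# Blocks listed out of order, as in the blk.dat files: c -> b -> a -> genesis
chain = {"c": "b", "a": "00" * 32, "b": "a"}
result = assign_heights(chain)
```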

    Nonetheless, hopefully this guide has been somewhat helpful.

    But as always, if you understand how the data works, converting it to a different format is just a matter of sitting down and writing the tool.

    Good luck.


    Want in on awesome projects like this?
    Click below to get your free copy of the Learning Neo4j ebook and catch up to speed with the world’s leading graph database technology.


    Get the Free Book

    The post How to Import the Bitcoin Blockchain into Neo4j [Community Post] appeared first on Neo4j Graph Database Platform.

    What You Need to Know about How Meltdown and Spectre Affect Neo4j

Learn how the Meltdown and Spectre security vulnerabilities affect the Neo4j graph database

Following the public announcement of the Meltdown (CVE-2017-5754) and Spectre (CVE-2017-5753 and CVE-2017-5715) security vulnerabilities earlier this month, the Neo4j team wants to keep you informed on how these vulnerabilities affect users and customers of the Neo4j Graph Platform.

    Here are the most frequent questions and answers we have received on these security vulnerabilities:

    Meltdown & Spectre: Frequently Asked Questions


    Q: What are Meltdown and Spectre?

    A: Meltdown and Spectre are exploits of vulnerabilities affecting almost all modern processors. These exploits could allow a malicious program to read data from another program running on the same server. The links in this article provide further details about how they work and their potential impacts. Abuse of these exploits would be very hard to detect, but so far there are no known cases.

    Q: How might Meltdown and Spectre affect Neo4j?

    A: In theory, a malicious program running on the same server as Neo4j could read graph data from Neo4j’s memory. In practice, Neo4j is usually deployed on secure servers which are free of malicious programs, so the risk is small. However, it is still important to eliminate this risk through fixes or workarounds.

    Q: Does Neo4j need to be patched to work around these vulnerabilities?

    A: No, it’s only possible to work around these vulnerabilities with changes in the levels below Neo4j: in the operating system or in firmware. Patches are already available for all of our supported operating systems, and we expect further OS patches and firmware patches to become available over the next weeks and months.

    Q: Will Neo4j performance be affected by the OS-level workarounds?

    A: We are conducting tests to discover the impact on Neo4j. We are comparing performance before and after applying OS-level workarounds, for the latest patch release of each supported version of Neo4j. This testing may lead us to make changes to Neo4j to mitigate any performance degradation.

    At present, it’s too early to tell what the performance impact might be or whether changes to Neo4j itself will be helpful.

    Q: What actions should I take as a Neo4j user in response to these vulnerabilities?

    A: The Neo4j team recommends applying the relevant patches provided as they become available from your operating system vendor. Since many of the patches are very new, there have been teething problems, so we recommend testing the OS upgrade before rolling it out to your production systems. Please contact Neo4j Support for further advice.

    Further Updates Are Forthcoming


    As of this writing, the Neo4j team is conducting further research into how the Meltdown and Spectre vulnerabilities affect the security and performance of the Neo4j graph database.

    Please check back frequently as this blog post (and other locations across our website) will be updated as more information becomes available.

    Resources on Meltdown & Spectre


    The post What You Need to Know about How Meltdown and Spectre Affect Neo4j appeared first on Neo4j Graph Database Platform.

    Retail & Neo4j: Ecommerce Delivery Service Routing

    As a retailer, if you think keeping up with Amazon is expensive and time-consuming, consider the alternative: extinction.

    When it comes to delivery and fulfillment, Amazon is the uncontested emperor of ecommerce. Yet, their efficiency in tracking and delivering orders isn’t a complete secret: graph technology.

    Now that off-the-shelf graph database platforms like Neo4j are available to smaller-than-Amazon retailers, it’s time to consider one way you can take back the lead: ecommerce delivery.

    Learn how the Neo4j Graph Platform powers ecommerce delivery service routing in this eBay case study


    In this series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. In our previous posts, we’ve covered personalized promotions and product recommendation engines as well as customer experience personalization.

    This week, we’ll discuss ecommerce delivery service routing.

    How Neo4j Transforms Ecommerce Delivery Service Routing


    Amazon has set the standard for shipping and delivery. Thanks to its free, two-day shipping for Amazon Prime members, ecommerce shoppers aren’t willing to wait any longer than two days to receive their online purchases. As a result, retailers must meet or beat the standard or risk losing customers to Amazon.

    To shorten delivery times, retailers must have visibility into inventory at storefronts and distribution centers, as well as the transit network. They need to know, for example, whether a routing problem could delay a product shipped from a distribution center located closer to the customer, or whether a shortage of products makes it impossible to meet a specific delivery date altogether. Identifying the fastest delivery route requires support for complex routing queries at scale with fast and consistent performance.

Ecommerce delivery service routing is a natural fit for a graph database given the highly connected nature of the data. It's not just that it requires a lot of "hops" across data points, but that there can be many different paths with any number of permutations.

    Those permutations may be optimized and deemed the best path at different times of the year and for different products, even within a single order. A graph database can take these various factors into account and support complex routing queries to streamline delivery services.

    Case Study: eBay


    Even before its acquisition by global ecommerce leader eBay, London-based Shutl sought to give people the fastest possible delivery of their online purchases. Customers loved the one-day service, and it grew quickly. However, the platform Shutl built to support same-day delivery couldn’t keep up with the exponential growth.

    The service platform needed a revamp in order to support the explosive growth in data and new features. The MySQL queries being used created a code base that was too slow and too complex to maintain.

    The queries used to select the best courier were simply taking too long, and Shutl needed a solution to maintain a competitive service. The development team believed a graph database could be added to the existing Service-Oriented Architecture (SOA) to solve the performance and scalability challenges.

    Neo4j was selected for its flexibility, speed and ease of use. Its property graph model harmonized with the domain being modeled, and the schema-flexible nature of the database allowed easy extensibility, speeding up development. In addition, it overcame the speed and scalability limitations of the previous solution.

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code. At the same time, Neo4j allowed us to add functionality that was previously not possible,” said Volker Pacher, Senior Developer for eBay.

The Cypher graph query language allowed queries to be expressed in a very compact and intuitive form, speeding development. The team was also able to take advantage of existing code, using a Ruby library for Neo4j that also supports Cypher.

    Implementation was completed on schedule in just a year. Queries are now easy and fast. The result is a scalable platform that supports expansion of the business, including the growth it is now experiencing as the platform behind eBay Now.

    Conclusion


    Effectively competing with Amazon means your solution needs to be fail-safe, flexible and future-proof. While other technology solutions have narrow use-cases or fixed schemas, Neo4j allows you to evolve your ecommerce delivery platform as variables and circumstances change.

    Using the power of graph algorithms that find the shortest path between your fulfillment centers and your customers, routing ecommerce deliveries will be a snap – and not a headache.

    In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including supply chain visibility, revenue management and IT operations.


    It’s time to up your retail game:
    Witness how today’s leading retailers are using Neo4j to overcome today’s toughest industry challenges with this white paper, Driving Innovation in Retail with Graph Technology. Click below to get your free copy.


    Read the White Paper


    Catch up with the rest of the retail and Neo4j blog series:

    The post Retail & Neo4j: Ecommerce Delivery Service Routing appeared first on Neo4j Graph Database Platform.

    Retail & Neo4j: Supply Chain Visibility & Management

    Now more than ever, supply chains are vast and complex.

Products are often composed of different ingredients or parts that move through different vendors, and each of those parts may be composed of subparts, which may in turn come from still other suppliers in various parts of the world.

    Because of this complexity, retailers tend to know only their direct suppliers, which can be a problem when it comes to risk and compliance. As supply chains become more complex – and also more regulated – supply chain visibility is more important than ever.

    Fortunately, graph technology makes multiple-hop supply chain management simple for retailers and their suppliers.

    Learn how Neo4j brings simplicity to the complex challenge of supply chain visibility and management


    In this series on Neo4j and retail, we’ll break down the various challenges facing modern retailers and how those challenges are being overcome using graph technology. In our previous posts, we’ve covered personalized promotions and product recommendation engines, customer experience personalization and ecommerce delivery service routing.

    This week, we’ll discuss supply chain visibility and management.

    How Neo4j Enables Crystal-Clear Supply Chain Visibility


    Retailers need transparency across the entire supply chain in order to detect fraud, contamination, high-risk sites, and unknown product sources.

    If a specific raw material is compromised in some way, for example, companies must be able to rapidly identify every product impacted. This requires managing and searching large volumes of data without delay or other performance issues – especially if consumer health or welfare is on the line.

    Supply chain transparency is also important for identifying weak points in the supply chain or other single points of failure. If a part or ingredient was previously available from three suppliers but is now only available from one, the retailer needs to know how that might affect future output.

    Achieving visibility across the supply chain requires deep connections. A relational database is simply not built to handle a lot of recursive queries or JOINs, and as a result performance suffers.

    A graph database, however, is designed to search and analyze connected data. The architecture is built around data relationships first and foremost. This enables retailers and manufacturers to manage and search large volumes of data with no performance issues and achieve the supply chain visibility they need.

    Case Study: Transparency-One


    Recognizing the inherent risks of the supply chain, Transparency-One sought to build a platform that allows manufacturers and brand owners to learn about, monitor, analyze and search their supply chain, and to share significant data about production sites and products.

    Transparency-One initially considered building the platform on a classic SQL database-type solution. However, the company quickly realized that the volume and structure of information to be processed would have a significant impact on performance and cause considerable problems. So, Transparency-One began looking at graph databases.

    Neo4j was the only graph database that could meet Transparency-One’s requirements, including the capacity to manage large volumes of data. Neo4j is also the most widely used graph database in the world, both by large companies and startups.

    “We tested Neo4j with dummy data for several thousand products, and there were no performance issues,” said Chris Morrison, CEO of Transparency-One. “As for the search response time, we didn’t have to worry about taking special measures, since we got back results within seconds that we would not have been able to calculate without this solution.”

    Using Neo4j, Transparency-One got up and running and built a prototype in less than three months. Since then, the company has extended Neo4j with new modules and the platform is currently deployed by several companies.

    Conclusion


    With so many partners, suppliers and end-consumers growing more interconnected than ever before, retailers must have complete end-to-end supply chain visibility in order to proactively address issues in their supply chains – whether it’s a contamination outbreak or a faulty part.

    However, when retailers reimagine their data as a graph, they transform a complex problem into a simple one. Furthermore, using graph visualization to manage and oversee complex supply chains allows human managers (and not just algorithms) the ability to instantly pinpoint and fix critical junctures, single points of failure and other problems within the supply network.

    In the coming weeks, we’ll take a closer look at other ways retailers are using graph technology to create a sustainable competitive advantage, including revenue management and IT operations.


    It’s time to up your retail game:
    Witness how today’s leading retailers are using Neo4j to overcome today’s toughest industry challenges with this white paper, Driving Innovation in Retail with Graph Technology. Click below to get your free copy.


    Read the White Paper


    Catch up with the rest of the retail and Neo4j blog series:

    The post Retail & Neo4j: Supply Chain Visibility & Management appeared first on Neo4j Graph Database Platform.
