
Congrats to Cerved and LARUS for Winning the Big Data Analytics Award from Digital360

In July 2018, Cerved – a Neo4j customer – was awarded the Big Data Analytics prize at the Digital360 Awards 2018 by a panel of 50 CIOs from the most important Italian companies, thanks to its Graph4you project.

Cerved is an Italian data-driven company that gives businesses and institutions the data and tools to guide their decisions, and a team of people to turn those decisions into action. Cerved helps its clients with credit information, marketing and the management of bad credit. Its group includes one of the leading European rating agencies.

Cerved and Larus win a Digital360 award for Big Data Analytics.

Graph4you was born within the Innovation group at Cerved during the last quarter of 2015, specifically to explore the potential of graph technologies to support financial decision making.

The technology behind their solution is based on a Neo4j graph database. The database brings together data from different sources within Cerved, such as geo-localized information on people and companies, real estate data, public administration records and business ownerships. Today, the Neo4j cluster holds more than 45 million nodes and 100 million relationships.

Graph4you reveals, in fractions of a second, the hidden relationships among the entities stored in the graph, highlighting connected paths. With this graph database solution, users can easily investigate all the information along interconnected entities, surfacing potential troubles and issues.

Today, Graph4you users can query and navigate through the Italian business network to discover connections between entities or get the corporate linkages of companies and individuals. The web application uses Ogma.js (by Linkurious) as a graph visualization library.

Congratulations to Cerved and LARUS for winning the Digital 360 award for their Graph4You project


Furthermore, Graph4you exposes a REST API to simplify integration and automate customer workflows. The platform was built by Quantyca using Spring and React.

With Graph4you, Cerved helps its customers accelerate, improve and clarify decision-making for procurement, fraud detection and business intelligence problems. This is possible thanks to Neo4j’s native graph database and its built-in algorithms.

However, connecting and navigating such amounts of data also requires a big effort in extending the capabilities of Neo4j. Connecting companies – in this case, Cerved and LARUS – did the trick.

LARUS provided their full support in the project, developing the weighted shortest-path algorithm and engineering most of the queries a user can perform over the database.

Check out a sample of how to query and navigate connected data with Graph4you.

Based on the well-known Dijkstra algorithm, the procedure developed by LARUS lets the user discover the weighted distance between pairs of nodes, or between a node and a customer list (e.g., a blacklist, whitelist or prospect list).

For each query, the user can apply a custom set of weights (a scenario) at runtime, as well as filter for specific nodes and relationships.
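
To make this concrete, here’s a minimal sketch of a weighted shortest-path query. It uses the Dijkstra procedure from the Neo4j Graph Algorithms library rather than the custom LARUS procedure (which isn’t public), and the :Company label and weight property are illustrative assumptions:

// A minimal weighted shortest-path sketch using the Graph Algorithms
// library's Dijkstra procedure - not the custom LARUS procedure.
// The :Company label and 'weight' property are assumed here.
MATCH (a:Company {name: 'Acme S.p.A.'}),
      (b:Company {name: 'Beta S.r.l.'})
CALL algo.shortestPath.stream(a, b, 'weight')
YIELD nodeId, cost
// Resolve the internal node ids back to nodes with plain Cypher
MATCH (n) WHERE id(n) = nodeId
RETURN n.name AS entity, cost
ORDER BY cost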

Once more, Neo4j did the job. At the core of the business, Neo4j provides full support for the Graph4you engine and allowed two companies to combine their experience and skills, bringing a powerful product to the marketplace.


Want in on awesome projects like this?

Click below to get your free copy of the Learning Neo4j ebook and get up to speed with the world’s leading graph database technology.


Get the Free Book

Financial Risk Reporting: Building a Risk Metadata Foundation

Regulations such as BCBS 239 are driving banks to reexamine the way they manage financial risk data.

Building a connected data foundation supports a world of innovative uses of your enterprise data – including 360-degree visibility of your customers, detecting and preventing fraud, proactively assessing credit applications and driving what-if scenarios that improve productivity and profitability.

In this series, we describe how connected data and graph database technologies are transforming risk reporting in modern banks to help them meet stringent demands of risk reporting compliance.

Learn how connected data is helping banks with financial risk reporting.

Last week, we explored key data challenges associated with BCBS 239 compliance. This week, we’ll take a closer look at how a federated approach to risk management drives compliance – and creates a foundation for business value. We’ll also discuss how to choose the right graph database technology for risk reporting compliance.

Using a Federated Approach to BCBS 239 Compliance


Integrating information into a single, enterprise-wide logical data model is very difficult and time consuming. In some cases, the structure and location of much of the data makes it all but impossible to address in a single, centralized data store. And ironically, moving everything into a single repository makes tracing data lineage even more difficult.

After earlier failed attempts to centralize enterprise information in data warehouses and operational data stores, most banks have accepted their data will remain in silos.

Given these complications, many banks are now embracing a federated model that leaves the data dispersed in its original locations, while maintaining control of the model using centralized metadata.

Leave data in original locations while using a centralized metadata model.

Federated metadata models make it considerably easier to relate entity identities, maintain data consistency, and describe end-to-end data lineage.
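
To illustrate the lineage side with Cypher, here’s a minimal sketch. The :Report and :Dataset labels and the :DERIVED_FROM relationship are an assumed metadata model, not a prescription:

// Trace a risk report back to its original, authoritative sources.
// The :Report/:Dataset labels and :DERIVED_FROM relationship are
// hypothetical - adapt them to your own metadata graph.
MATCH path = (r:Report {name: 'Liquidity Risk Q3'})
             -[:DERIVED_FROM*]->(src:Dataset)
WHERE NOT (src)-[:DERIVED_FROM]->()   // stop at nodes with no further origin
RETURN [n IN nodes(path) | n.name] AS lineage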

Building a Financial Risk Metadata Foundation


Banking institutions have no choice but to address BCBS 239 regulations. But rather than reactively responding to the new mandates, progressive banks are using BCBS 239 as justification for building a strong metadata foundation for risk management, regulatory and analytic applications.

Traditional metadata management technologies might appear to be an obvious choice for building your new metadata foundation. But they are not capable of handling highly-connected risk-management data, tracing data lineage or adjusting for temporal inconsistencies in reports.

The complexity of risk management and BCBS 239 requires more than simple metadata managers. To tackle the modeling and management requirements of the new regulations, you need graph database technology.

Choosing the Right Graph Database Technology


Data management experts agree that metadata challenges should be solved using graph databases, not old, traditional technologies. But not all graph technologies are the same; many are thin veneers built atop old relational or NoSQL engines, with inherent problems.

Relational Databases Aren’t Graph Ready


Relational databases masquerading as graph technologies are fraught with systemic faults. As complex queries traverse a graph model, query hops translate into a flurry of relational table joins that use computing resources inefficiently and cripple application performance. In sharp contrast, native graph databases use graph methods to store and query graph-based metadata. The result is fast, consistent data retrieval, presentation and management.

Non-Native NoSQL Databases Provide No Solution


Non-native NoSQL databases also fall short of core BCBS 239 requirements. Instead of storing data relationships as native graph elements, they add a graph translation layer that reduces query performance. And NoSQL engines lose transactions and relationships regularly, making them unreliable for tracing data lineage back to original information sources.

Neo4j: Native Graph Platform Ready for Financial Risk Compliance


The most popular and successful graph database is Neo4j, which is used in a large majority of graph installations worldwide. As a 100% native graph database, Neo4j eliminates the data consistency and corruption problems caused by non-native approaches to graph applications. And its dependable query performance delivers instant results, even for Value at Risk, Potential Future Exposure and other complex risk-reporting requests.

A modern graph approach to financial risk reporting.

Conclusion


By choosing Neo4j for risk reporting compliance, you get a lot more than the world’s leading graph database. Neo4j supports global financial terminology standards and is backed by professional services that guarantee success and provide new visibility into your compliance efforts and day-to-day operations.

Next week, we’ll explain how the Financial Industry Business Ontology (FIBO) and graph technology complement one another to solve risk reporting challenges. We’ll also explore real-world risk reporting solutions.


Catch up with the rest of the financial risk reporting blog series:

Comply and innovate:
Find out how financial services firms use connected data to comply and get a competitive edge in this white paper, The Connected Data Revolution in Financial Risk Reporting: Connections in Financial Data Are Redefining Risk and Compliance Practices. Click below to get your free copy.


Read the White Paper


The Top 10 Reasons to Attend GraphConnect 2018 in New York City

Once again, GraphConnect makes its way back to the vibrant island of Manhattan this autumn! Whether you’re new to the world of graph database technology or you’ve been a part of the movement for some time, there’s something new for everybody at this year’s event.

Here are just a few of the best reasons why you should attend GraphConnect 2018 in NYC on September 20 and 21:

10. The Location


Ten reasons why you must attend GraphConnect 2018 in New York City.

This year, GraphConnect 2018 takes place at the Marriott Marquis in Times Square:

    • GraphConnect sessions, Training and Workshops will all conveniently take place in one spot, which means maximizing your time to extract the most value out of this year’s experience.
    • Getting the most out of your experience also means stepping out of the hotel and into the bright lights and long avenues of Gotham City. Get a hug from your favorite cartoon character, stay up for the Midnight Moment, go on a hunt for the original Ray’s pizza slice (hint: it’s not in Times Square), or simply stand there and let the digital glow of rampant consumerism envelop you. It’s quite a thing.
    • Ready to sip a cocktail overlooking the bustle of Times Square? Head on over to the St. Cloud rooftop bar at the Knickerbocker Hotel. If you want a more old-world New Yorker place to grab a drink with your group, we suggest visiting Jimmy’s Corner. This narrow Times Square “dive” bar features Sinatra and the Delfonics on the jukebox and boxing memorabilia covering the walls.
Come early or stay late, because there’s so much to enjoy throughout the whole city!

9. In-Person Access to Neo4j Engineers


Meet Neo4j engineers in person to ask any questions you may have.

Get the full graph database experience by meeting the makers of the world’s #1 platform for connected data!

At GraphConnect 2018, you’ll get to spend time with the engineers who help put Neo4j into production and make our customers successful in their graph database endeavors. With this personal access, you’re encouraged to network, idea swap and ask any quick questions or advanced technical quandaries you may have.

Plenty of Neo4j experts will also be in the DevZone GraphClinic to help get your graph database off the ground and going full speed ahead.

8. The DevZone


Relax in the DevZone and rub shoulders with your peers and graph database experts.

Back by popular demand is our DevZone, a place for developers to chat with the speakers, lounge on couches, play Graph Karaoke, get Neo4j Certified and grab a snack.

Stop by to connect with other developers, learn something new and have fun!

7. More Trainings Than Ever Before


At GraphConnect 2018, you'll have access to more training sessions than ever before.

The second day of GraphConnect – Friday, September 21 – is devoted to instructor-led classroom training. Each half-day session, led by top Neo4j engineers, trainers and partners, allows for substantial immersion in the given topic.

Workshops this year include a graph modeling introduction, Cypher tuning and performance, data science and machine learning, extensions for analytics and operations, full-stack development, graph algorithms and lots of other enlightening graph-related subject matter.

Feed your head with lots of graph technology goodies!

6. The GraphHack


Stay for the GraphHack, a hackathon for creating new applications with Neo4j and other popular technologies.

This year, we’re doing the GraphHack – our graph database hackathon – on Saturday, September 22 to close out the conference. This year’s hackathon is organized around the theme of “Buzzword Bingo” to highlight a number of useful Neo4j integrations with other popular technologies.

Developers divide into teams to build cool new applications using Neo4j and other technologies listed on their Bingo card. Each team must make a Bingo (four in a row vertically, horizontally or diagonally) for a valid hackathon submission.

With a chance to win some great prizes (BB8 Droid, anyone?) you don’t want to miss this fun and innovative event.

5. The Amazing Speaker Lineup


See an impressive lineup of speakers at GraphConnect 2018.

GraphConnect attracts the top speakers and innovators in the graph technology ecosystem. This year is no exception.

This year’s speaker highlights include:

    • David Fox, Adobe
    • Ann Grubbs, Lockheed Martin Space Systems
    • Dr. Peng Sun, CA Technologies
    • Seth Dimick, Nordstrom
    • Gary Stewart, ING
    • Tatiana Romina Hartinger, Cognitiva
    • Alexander Jarasch, The German Center for Diabetes Research
    • Bajal Mohiyudeen, AT&T
Of course, there are many more brilliant speakers to fill the day, and you’ll find the full list of speakers in the “Speakers” section on GraphConnect.com.

4. The Takeaways


Take away tons of great information about graph database technology.

No matter what role you have, you are sure to take away ideas from the event to apply to your job and make a difference at your company or for your customers.

Through the power of connected data, some attendees learn how to streamline processes while others see a new way to approach a business problem. You’ll definitely leave with a snapshot of where the future capabilities of graph technology are headed.

And while we can’t say exactly what might be introduced on the keynote stage this year, rest assured the latest updates to graph technology will improve your understanding of how to best connect your data.

3. The Content


Experience speakers and sessions with great content about graph technology.

With so many amazing submissions for this year’s Call for Papers, we had a really difficult time choosing the sessions. For you, that means amazing, high-quality presentations all around!

This year we have six tracks spanning a variety of technologies, industries and use cases. Key topics include graph database evolutions in AI and machine learning, discovery and visualization, biotech and healthcare, knowledge graphs and digital transformation.

2. The disConnect Party


Decompress at GraphConnect at the disConnect party.

On Thursday, September 20, you’ll spend the day watching both CEO Emil Eifrem and Chief Scientist Jim Webber talk about the future of Neo4j and graphs, attending a dozen sessions and lightning talks, getting help with your data modeling in the GraphClinic, and perhaps doing some Graph Karaoke.

Afterwards, you’ll likely be ready to decompress and socialize with fellow engineers and business execs.

The post-conference disConnect party lets you mingle with your new connections — all while making plans on how you’ll change the world with the power of graph database technology.

Plus, there will be free drinks and snacks. Need I say more?

1. The Relationships


Make new friends at GraphConnect, because we're all about relationships.

Graph technology is powerful because it leverages data relationships. GraphConnect is powerful because it builds your person-to-person relationships.

After all, trainings, presentations and announcements can all be experienced elsewhere or even remotely, but in-person networking and relationship building with fellow graphistas can’t happen anywhere else but GraphConnect.

You’ll find it difficult to be an orphan node at GraphConnect 2018. That’s because everyone in attendance already understands the inherent value of relationships over individual data points.


Do you really need any other reasons to attend GraphConnect this year?

I didn’t think so. We’ll see you at GraphConnect 2018 – the premier global gathering of the graph technology community.


Register Now

Financial Risk Reporting: Graph Technology Is a Game Changer

How do you prevent another global financial crisis like the one in 2008?

The answer lies in better visibility into the deep connections in risk data. Armed with an understanding of risk data lineage, financial houses could have limited their exposure.

A connected data foundation not only streamlines financial risk reporting – it also supports a world of innovative uses of your enterprise data, including 360-degree visibility of your customers, detecting and preventing fraud, proactively assessing credit applications and driving what-if scenarios that improve productivity and profitability.

Learn how graph technology is transforming the world of financial risk reporting.

In this series, we describe how connected data helps you gain valuable insights from the relationships hidden in that information. In previous weeks, we explored the connected nature of financial risk and the benefits of creating a risk metadata foundation to streamline compliance and drive business value.

In this final post, we’ll discuss the synergy between advances in standard terms for risk reporting and graph databases and highlight innovative applications that Neo4j customers have developed.

The Finance Industry Business Ontology (FIBO)


FIBO is a collaborative effort to define and evolve a set of standard terms for investment instruments, business entities, market data, legal obligations and corporate actions affecting global financial markets.

Authored by The Enterprise Data Management Council (EDM), the FIBO ontology has given rise to a series of technical standards governed by the Object Management Group (OMG).

The Benefits of FIBO Terminology


FIBO’s great strength stems from its ability to clearly and completely describe the entities, instruments and relationships involved in financial transactions. Such clarity enables financial organizations worldwide to:

    • Align data elements across multiple data repositories and silos to achieve data consistency
    • Trace the lineage of investment data back to its original, authoritative sources
    • Use a standard, common language for communications between financial houses, and between business and technical audiences
    • Build rigorous, robust solutions for financial reporting and compliance

FIBO Is Best Represented in Graphs


FIBO is a conceptual model that is best represented as a graph ontology, as depicted by the visual below.

Learn about the FIBO conceptual model, represented here as graph ontology.
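
To give a flavor of what that looks like in practice, here’s a minimal sketch of a FIBO-style class hierarchy expressed as a property graph. The class names and the :SUBCLASS_OF relationship are a simplified illustration, not the official FIBO serialization:

// A tiny, simplified FIBO-style fragment: ontology classes as nodes,
// subsumption as relationships (not the official FIBO serialization).
CREATE (instr:Class {name: 'FinancialInstrument'}),
       (sec:Class {name: 'Security'}),
       (debt:Class {name: 'DebtInstrument'}),
       (sec)-[:SUBCLASS_OF]->(instr),
       (debt)-[:SUBCLASS_OF]->(sec)

// Walking the hierarchy is then a variable-length traversal
// (run as a separate statement):
MATCH (c:Class {name: 'DebtInstrument'})-[:SUBCLASS_OF*]->(parent)
RETURN parent.name AS ancestor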

Keeping Pace with Industry Change


The flexibility of FIBO and graph technology makes them excellent standards for the ever-changing financial industry. Their extensibility allows them to adapt as financial markets, technology, industry players and regulations evolve. As the FIBO standard spreads across the industry, the pressure on banks to express risk metadata as a graph of standard language is mounting rapidly.

Neo4j Is FIBO-Ready


The FIBO ontology is now available as an add-on to the Neo4j platform, so you can create an enterprise canonical data model that:

    • Uses the same infrastructure to store risk data lineage as well as governance metadata such as definitions, related terms, etc.
    • Provides a unified access layer that spans data silos
As a result, by using Neo4j, you can integrate data governance, compliance reporting and real-time data movement into a single solution that guarantees data consistency across operational and regulatory systems.

Handling All Aspects of Compliance


Bank compliance with risk regulations spans business, financial and legal domains. The Legal Knowledge Interchange Format (LKIF) establishes an ontology and information exchange rules that support SEC regulations, forms, submissions and responses. The Financial Regulation Ontology (FRO) is an open-source ontology based on FIBO and LKIF, and consolidates regulations for banking, insurance, funds and hedge funds.

Discover how regulatory bodies worldwide are evolving financial regulation ontologies.

Early adopters are utilizing the clarity and flexibility of graph modeling to create an enterprise platform for visualizing, analyzing, reporting and governing financial risk.

Real-World Risk Reporting Solutions


Neo4j customers have developed innovative graph solutions for addressing the data lineage and metadata management challenges of financial risk reporting.

Online Data Distribution and Knowledge Base Platform


A leading Global 500 financial services firm needed a data distribution platform that:

    • Included a knowledge base that described the lineage of datasets and attributes to online customers
    • Could accommodate new data sources, datasets, consumers and rules easily
The knowledge base needed to be able to easily answer numerous questions such as:

    • What datasets and attributes do we provide?
    • How are the datasets related?
    • Which consumers are using which attributes?
    • How are users receiving our data?
The firm chose to store the metadata model for the knowledge base in Neo4j instead of Oracle. Neo4j’s flexible schema enabled the firm to model all its data flows and rapidly answer questions about how and where its data is used.
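
As a sketch of how such questions translate into Cypher, assuming a simple, hypothetical metadata model of :Consumer, :Dataset and :Attribute nodes:

// "Which consumers are using which attributes?" against a hypothetical
// knowledge-base model; the labels and relationship types are assumptions.
MATCH (c:Consumer)-[:USES]->(a:Attribute)<-[:HAS_ATTRIBUTE]-(d:Dataset)
RETURN c.name AS consumer,
       d.name AS dataset,
       collect(a.name) AS attributes
ORDER BY consumer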

Given the success realized with Neo4j, the firm plans on widening its coverage of datasets and offering the solution to other parts of the bank.

Real-Time Risk Assessment of Breaking News


Another leading global financial services firm is using Neo4j to make connections between news events and stock market volatility. Their system predicts potential impacts of news events on stock prices and associated portfolio exposure.

Designers of the system sought to answer questions like how a mine explosion that spikes copper prices might affect the stock price of Apple, a heavy user of the metal. The application sends real-time alerts to the phones of money and risk managers who hold Apple shares.

“These are people who don’t want to be caught off guard,” says a senior vice president at the bank’s innovation center. The unit contributes a few percent of the bank’s revenue. But bank executives expect that revenue contribution to grow and that the new services can add value even when the bank doesn’t charge for them.

“It is my hope that as we digitize the bank we will be more valuable to our customers and we will gain market share,” the CEO said.

BCBS 239 and Graph Technology Are Game Changers


The risk-reporting mandates of BCBS 239 place new demands on data architectures at banks and financial houses worldwide. The need for fast access to real-time data lineage and financial risk information has given organizations solid justification for revisiting the old, relational reporting systems they’ve struggled with for years.

Forward-looking banks are using BCBS 239 as justification to proactively build federated databases that add centralized metadata control over operational data in existing silos and sources.

The new risk-reporting platforms combine graph database technology and financial industry ontologies to address crucial data lineage, latency, visualization and reporting challenges presented by BCBS 239 requirements.

As a result of their proactive efforts, innovative banks that adopt graph-based, financial risk reporting distance themselves from competition as they:

    • Analyze financial risk faster and with more accuracy
    • Use financial ontologies to simplify communications and speed development
    • Build risk reporting applications that bring them into regulatory compliance

A Foundation for Innovative Applications


By using Neo4j to address BCBS 239 financial risk reporting requirements, you build a connected data foundation that supports a world of innovative uses of your enterprise data – including 360-degree visibility of your customers, detecting and preventing fraud, proactively assessing credit applications, and driving what-if scenarios that improve productivity and profitability.

This concludes our series on financial risk reporting. We hope these blogs inspire you to explore how you can make use of connected data to drive game-changing results.


Comply and innovate:
Find out how financial services firms use connected data to comply and get a competitive edge in this white paper, The Connected Data Revolution in Financial Risk Reporting: Connections in Financial Data Are Redefining Risk and Compliance Practices. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the financial risk reporting blog series:

Graph Databases for Beginners: Why a Database Query Language Matters (More Than You Think)

Languages (the natural, human kind) shape how you view the world.

From color to time to gender relations, there’s no escaping how language limits (or expands) your worldview. Words are the categories and labels that we use to process and understand reality – and then to communicate that understanding to others.

So when it comes to analyzing and describing data (a subset of reality), language matters.

Just like their natural counterparts, technical languages shape how you understand and process your data. If a given programming language or query language doesn’t have a label or category or approach to a given data problem, you’ll think about the challenge differently (and subsequently how your application will process it).

Finding the best database for your application or development stack is about more than just features, scalability and performance. While all of those are essential, there’s another backend element that too many architects overlook: the database query language itself.

Learn why your database query language matters because of its (dis-)connection to your data model


In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics of data modeling and how to avoid the most common (and fatal) data modeling mistakes.

This week, we’ll discuss why a database query language matters – even (especially?) if you’re not a developer.

Why We Need Database Query Languages


Up to this point in our beginner’s series, all of our database models have been in the form of diagrams like the one below.

A graph technology data model of a social network


Graph diagrams like this one are perfect for describing a graph database outside of a technology context. However, when it comes to actually using a database, every developer, architect and business stakeholder needs a concrete mechanism for creating, manipulating and querying data. That is, we need a query language.

To use a natural language example, this is the difference between drawing a map (i.e., the process of data modeling) versus asking for turn-by-turn directions, communicating those directions to the driver, pointing out that purple cow on the side of the road, and telling the driver to slam on the brakes before he or she hits aforementioned purple cow (i.e., the capabilities of a query language).

Most relational databases (RDBMS) use a variant of SQL (Structured Query Language), making SQL the de facto database query language amongst most data professionals. But for the most part, SQL – the query language used by developers and data architects – is too arcane and esoteric to be understood by business teams.

This meant that a lot of development time was spent translating business requirements into SQL, and then if a particular query wasn’t possible, that problem had to be retranslated back to the business decision makers in a way they’d understand. The result: A lot of wasted time.

But there is a way to eliminate this back and forth translation (and it doesn’t involve teaching your entire business team to be fluent in SQL): Use a language everyone understands.

Just as graph technology has made the data modeling process more understandable for the uninitiated, so has a graph query language made it easier than ever for the common person to understand and create their own queries.

Why Linguistic Efficiency (& Effectiveness) Matters


If you’re not super technical, you might be wondering why the choice of a database query language matters at all. After all, if query languages are anything like natural human languages, then shouldn’t they all be able to ultimately communicate the same point with just a few differences in phrasing?

The answer is both yes and no.

Query Language Efficiency

Let’s first look at small-scale language efficiency with a few natural language examples.

In English, you might say, “I used to enjoy after-dinner conversation” while reminiscing about your childhood. In Spanish, this same phrase is written as, “Disfrutaba sobremesa.” Both languages express the same idea, but one is far more efficient at communicating it.

Similarly, in English you might want to express, “I love my younger sister as well as my grandmother on my father’s side” (14 words, 70 characters). But in Mandarin Chinese, you could just say, “我爱我的妹妹和奶奶” (six words, nine characters).

When it comes to a database query language, the linguistics of efficiency are similar. A single query in SQL can be many lines longer than the same query in a graph database query language like Cypher. Don’t just take my word for it: Make sure you click that link above and explore the example – it’s just too long to wholly repeat here. (Seriously, I will wait.)

Another aspect of language efficiency to consider: Lengthy queries not only take more time to run, but they are also more likely to include human coding mistakes because of their complexity. Plus, shorter queries increase the ease of understanding and maintenance across your team of developers.

For example, imagine if a new developer had to pick through a long, complicated SQL query and try to figure out the intent of the original developer – trouble would certainly ensue.

But what level of efficiency gains are we talking about between SQL queries and graph queries? How much more efficient is one versus another? The answer: Fast enough to make a significant difference to your business.

The efficiency of graph technology queries means they run in real time, and in an economy that runs at the speed of a single tweet, that’s a bottom-line difference you can’t afford to ignore.

Query Language Effectiveness

Disclaimer: I totally stole this from Ravi Pappu‘s talk at GraphTour DC. (Unfortunately, we weren’t given permission to post the video recording.)

In Eurasia a good while back, humanity had two primary ways of counting: using an abacus or using the Hindu-Arabic numeral system (1, 2, 3, 4, 5 and so on). In terms of counting and arithmetic, both methods were about equally efficient.

But there’s a reason that we aren’t still using abaci today: Arabic numerals could do more than just count up things. They could be used for so much more.

From algebra to accounting, the Arabic numeral system was far more effective because it could be used to accomplish a far broader set of functions. It was like another language: allowing everyone to process and understand reality in a fundamentally different way.

Abaci were efficient at one particular task (counting), but you couldn’t do algebra with them (or, at least, it would be really time consuming if you tried). The abacus isn’t in the dustbin of history because it wasn’t good at its job (it was), but because it only did one job when the world needed more.

The Intimate Relationship between Data Modeling and Querying


Before diving into the mechanics of a graph database query language below, it’s worth noting that a query language isn’t just about asking (a.k.a. querying) the database for a particular set of results; it’s also about modeling that data in the first place.

We know from previous posts that data modeling for a graph database is as easy as connecting circles and lines on a whiteboard. What you sketch on the whiteboard – including the relationships – is what you store in the database.

On its own, this ease of modeling has many business benefits, the most obvious of which is that you can understand what the hell your database developers are actually creating. But there’s more to it: An intuitive model shaped with the right query language ensures there’s no mismatch between how you built the data and how you analyze it.

A query language closely represents its model. That’s why SQL is all about tables and JOINs, while Cypher is about pattern matching relationships between entities. Just as the graph model is more natural to work with, so is Cypher: it borrows from the pictorial representation of circles connected by arrows, which even a child can understand.

In a relational database, the data modeling process is so far abstracted from actual day-to-day SQL queries that there’s a major disparity between analysis and implementation. In other words, the process of building a relational database model isn’t fit for asking (and answering) questions efficiently from that same model.

And a model mismatch means mental mismatch means wasted time and energy.

Graph database models, on the other hand, not only communicate how your data is related, but they also help you clearly communicate the kinds of questions you want to ask of your data model. Graph models and graph queries are just two sides of the same coin.

The right database query language helps us traverse both sides.

An Introduction to Cypher, the Graph Database Query Language


It’s time to dive into specifics. Most relational databases use a dialect of SQL as their query language, and while the graph database world has a few query languages to choose from, a growing number of vendors and technologies have adopted Cypher as their graph database query language (including Neo4j).

This introduction isn’t a reference document for Cypher but merely a high-level overview.

Cypher is designed to be easily read and understood by developers, database professionals and business stakeholders alike. It’s easy to use because it matches the way we intuitively describe graphs (i.e., the way we intuitively describe data) using whiteboard-like diagrams.

The basic notion of Cypher is that it allows you to ask the database to find data that matches a specific pattern. Colloquially, we might ask the database to “find things like this,” and the way we describe what “things like this” look like is to draw them using ASCII art.

Consider the simple pattern in the figure below.

A graph technology data model of a social network


This graph diagram describes three mutual friends.

If we want to express the pattern of this basic graph in Cypher, we would write:

(emil)<-[:KNOWS]-(jim)-[:KNOWS]->(johan)-[:KNOWS]->(emil) 

This Cypher statement describes a path which forms a triangle that connects a node we call jim to the two nodes we call johan and emil, and which also connects the johan node to the emil node. As you can see, Cypher naturally follows the way we draw graphs on the whiteboard.

Now, while this Cypher pattern describes a simple graph structure, it doesn’t yet refer to any particular data in the database. To bind the pattern to specific nodes and relationships in an existing dataset, we first need to specify some property values and node labels that help locate the relevant elements in the dataset.

Here’s our more fleshed-out query:

(emil:Person {name:'Emil'})
     <-[:KNOWS]-(jim:Person {name:'Jim'})
     -[:KNOWS]->(johan:Person {name:'Johan'})
     -[:KNOWS]->(emil)

Here we’ve bound each node to its identifier using its name property and Person label. The emil identifier, for example, is bound to a node in the dataset with a label Person and a name property whose value is Emil. Anchoring parts of the pattern to real data in this way is normal Cypher practice.

The Beginner’s Guide to Cypher Clauses


(Disclaimer: This section is still for beginners, but it’s definitely developer-oriented. If you’re just curious about database query languages in general, skip to the “Other Graph Query Languages” section below for a nice wrap-up.)

Like most query languages, Cypher is composed of clauses.

The simplest queries consist of a MATCH clause followed by a RETURN clause. Here’s an example of a Cypher query that uses these two clauses to find the mutual friends of a user named Jim:

MATCH (a:Person {name:'Jim'})-[:KNOWS]->(b)-[:KNOWS]->(c), 
      (a)-[:KNOWS]->(c) 
RETURN b, c

Let’s look at each clause in further detail:

MATCH

The MATCH clause is at the heart of most Cypher queries.

Using ASCII characters to represent nodes and relationships, we draw the data we’re interested in. We draw nodes with parentheses, just like in these examples from the query above:

(a:Person {name:'Jim'})
(b)
(c)
(a)

We draw relationships using pairs of dashes with greater-than or less-than signs (--> and <--), where the < and > signs indicate relationship direction. Between the dashes, relationship names are enclosed by square brackets and prefixed by a colon, like in this example from the query above:

-[:KNOWS]->

Node labels are also prefixed by a colon. As you see in the first node of the query, Person is the applicable label.

(a:Person … )

Node (and relationship) property key-value pairs are then specified within curly braces, like in this example:

( … {name:'Jim'})

In our original example query, we’re looking for a node labeled Person with a name property whose value is Jim. The return value from this lookup is bound to the identifier a. This identifier allows us to refer to the node that represents Jim throughout the rest of the query.

It’s worth noting that this pattern (a)-[:KNOWS]->(b)-[:KNOWS]->(c), (a)-[:KNOWS]->(c) could, in theory, occur many times throughout our graph data, especially in a large user set.

To confine the query, we need to anchor some part of it to one or more places in the graph. In specifying that we’re looking for a node labeled Person whose name property value is Jim, we’ve bound the pattern to a specific node in the graph — the node representing Jim.

Cypher then matches the remainder of the pattern to the graph immediately surrounding this anchor point based on the provided information on relationships and neighboring nodes. As it does so, it discovers nodes to bind to the other identifiers. While a will always be anchored to Jim, b and c will be bound to a sequence of nodes as the query executes.
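
Incidentally, anchoring with inline properties is equivalent to filtering with an explicit WHERE clause (introduced below), which often reads better as the filtering logic grows:

// Equivalent to the inline {name:'Jim'} anchor, written with an
// explicit WHERE clause.
MATCH (a:Person)-[:KNOWS]->(b)-[:KNOWS]->(c),
      (a)-[:KNOWS]->(c)
WHERE a.name = 'Jim'
RETURN b, c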

RETURN

This clause specifies which expressions, relationships and properties in the matched data should be returned to the client. In our example query, we’re interested in returning the nodes bound to the b and c identifiers.

Other Cypher Clauses

Other clauses you use in a Cypher query include:

WHERE
     Provides criteria for filtering pattern matching results.

CREATE and CREATE UNIQUE
     Create nodes and relationships.

MERGE
     Ensures that the supplied pattern exists in the graph, either by reusing existing nodes and relationships that match the supplied predicates, or by creating new nodes and relationships.

DELETE/REMOVE
     Removes nodes, relationships and properties.

SET
     Sets property values and labels.

ORDER BY
     Sorts results as part of a RETURN.

SKIP LIMIT
     Skips results at the top and limits the number of results.

FOREACH
     Performs an updating action for each element in a list.

UNION
     Merges results from two or more queries.

WITH
     Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in Unix.
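
To give a feel for how these clauses compose, here’s a minimal sketch against the same social graph used above. The age property, :Event label and :INVITED_TO relationship are invented for this example:

// Find Jim's friends who are 21 or older, keep the first ten
// alphabetically, and idempotently link each of them to an event.
MATCH (a:Person {name: 'Jim'})-[:KNOWS]->(friend)
WHERE friend.age >= 21                     // filter the matches
WITH friend
ORDER BY friend.name                       // sort before passing results on
LIMIT 10                                   // cap the result set
MERGE (party:Event {name: 'GraphParty'})   // reuse or create the event node
MERGE (friend)-[:INVITED_TO]->(party)      // create each link only once
RETURN friend.name AS invited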

If these clauses look familiar – especially if you’re a SQL developer – that’s great! Cypher is intended to be easy to learn for SQL veterans while also being easy for beginners. At the same time, Cypher is different enough to emphasize that we’re dealing with graphs, not relational sets.

In addition, Cypher borrows the idea of pattern matching from SPARQL, and some of the collection semantics have been borrowed from languages such as Haskell and Python.

(Click here for the most up-to-date Cypher Refcard to take a deeper dive into the Cypher query language.)

Other Graph Query Languages


Cypher isn’t the only graph database query language (though it’s certainly the dominant one); other graph technologies have their own means of querying data as well. Some support the RDF query language SPARQL (linked above), or the imperative, path-based query language Gremlin.

At the time of this writing, there is also an industry-wide effort to standardize around a single, vendor-neutral graph query language known as GQL.

Conclusion


Not everyone gets hands-on with their database query language on the day-to-day level; however, your down-in-the-weeds development team needs a practical way of modeling and querying data, especially if they’re tackling a graph-based problem.

If your team comes from a SQL background, a query language like Cypher will be easy to learn and even easier to execute. And when it comes to your mission-critical application, you’ll be glad that the language underpinning it all is built for speed and efficiency across connected data.

At this point, it’s worth reflecting. Take a closer look at your data and ask yourself: How would I solve my data challenges differently if my entire approach – vocabulary, syntax, semantics, conceptual model – was distinctly matched to the nature of the challenge?

Don’t be afraid to explore those implications.


Take your first step:
Click below to get your free copy of the O’Reilly Graph Databases book and explore the potential of your connected data.


Download My Free Copy



Catch up with the rest of the Graph Databases for Beginners series:

Neo4j Community Mavens: Because Being a Supernode Isn’t Always a Bad Thing

The Neo4j community is full of different kinds of people: people who help others with technical questions, those who write blogs to help educate others about graph databases, those who contribute to open source projects and more.

Our community also has those special individuals who yearn to connect to their community by sharing knowledge and ideas, learning from others, and paving the way for a graph-thinking epidemic.

Their enthusiasm is infectious, and they are energized by connecting others within their community. They are motivated by gaining knowledge and have a constant desire to know more. What sets these people apart from those who merely absorb knowledge is that they love to share what they know and always look for ways to help.

In turn, those connection lovin’ individuals are seen by the community as thought leaders and connectors. They are the ones that spin the gears to make community engagement happen.

We’re looking for those community-driven supernodes.

Introducing the Neo4j Community Maven Program


We recently created a program to support those people, the Neo4j Community Maven program.

Wait, what’s a maven? If you’ve ever had the opportunity to read Malcolm Gladwell’s book The Tipping Point, you may already be familiar with the term “maven.”

In Gladwell’s model, mavens are one of the three causes of epidemics (or the three “agents of change”). Mavens are “information specialists,” or “people we rely upon to connect us with new information.” They accumulate knowledge, especially about the marketplace and know how to share it with others.

“A maven is someone who wants to solve other people’s problems, generally by solving his [or her] own.”
According to Gladwell, mavens start “word-of-mouth epidemics” due to their knowledge, social skills and ability to communicate.

What Being a Maven Entails…


So, what do we expect from our Neo4j Community Mavens?

Well, for one, good intentions – we want them to want to connect and help people in the graph technology community. And two, we want them to be the go-to node in their local (geographic) community. We plan to support them as they grow into an informational resource for their local community.

With this program, we hope to encourage and support more individuals to take an active role in engaging with their local Neo4j community by bringing people together, sharing information and advocating for the graph-thinking paradigm.

…and What Mavens Get in Return


Great things happen when you grow and guide a community you believe in. At the very minimum, it leads to personal and professional growth, enrichment, learning and friendships! Aside from studies showing that being involved in a community is good for the mind and body, it also offers you the opportunity to learn from others and grow professionally.

When you’re leading a community, people begin to recognize you as a community thought leader, which has the potential to bring new professional opportunities. It also just feels really good to develop strong connections with like-minded people and create something awesome around your passions.

The best part is, you’re not doing it alone… you’re joining something much bigger. There are other Neo4j Community Mavens all over the world to connect with! This program positions you as the pivotal node (a la betweenness centrality graph algorithm) between communities across the world.

Become a Neo4j Community Maven


Interested in being that pivotal supernode (the good kind) in your city or region?

I encourage you to learn more about the Neo4j Community Maven program and if you’re interested, go ahead and apply here.

Have other ideas on how you want to be involved? Shoot me an email, I’d love to hear from you! karin@neo4j.com.

Cheers,

Karin Wolok


Want to take your Neo4j skills up a notch?
Take our online training class, Neo4j in Production, and learn how to deploy the #1 platform for connected data like a pro.


Take the Class

Which GraphConnect Training Should You Take? [Quiz]

It’s that time of year again.

In less than five weeks, Emil will be getting on stage at GraphConnect 2018 in the heart of Times Square, NYC and announcing the new release of… wow, I think we’re getting a little ahead of ourselves, aren’t we?

Let’s refocus: You know you’re going to attend GraphConnect, but you haven’t bought your tickets yet. No time like the present, right?

So you head over to GraphConnect.com to get yours…but wait. As your mouse (or finger) hovers over that training option you suddenly realize:

Take this six-question quiz to determine which Neo4j training you should take at GraphConnect 2018


…but which Neo4j training should you sign up for? There’s like thirteen to choose from (more than ever before).

It’s time to make that decision a little easier.

Find the Neo4j Training That’s Right for You


Luckily for you, we’ve put together an awesome, six-question quiz to make your choice clear. Click below to get started!



No quiz is perfect, but we hope you’ve found the droids…err, the Neo4j training…that you’re looking for. Click here to take the quiz again or tweet me with your angry comments!

Neo4j Training Classes Offered at GraphConnect 2018


This year, we’ve changed up our entire training selection with mostly half-day courses, allowing you to take two training classes in one day (in most cases).

Here’s this year’s line-up, organized by role:

    • For beginners:
    • For data scientists and BI analysts:
    • For architects, DBAs and data modelers:
    • For developers:

No matter what Neo4j training you choose, we wish you the best of luck during your training session and we hope you enjoy all of the great speakers (among other reasons) at GraphConnect 2018!

See You Soon!


Of course, if you have more questions about Neo4j training at GraphConnect 2018, you can always reach out to the friendly team at graphconnect@neo4j.com. They’ll help you sort out any questions, concerns or last-minute details that require our attention.


What are you waiting for?
Click below to register for GraphConnect 2018 on September 20-21, 2018 in Times Square, New York City – and connect with leading graph experts from around the globe.


Sign Me Up

Powering Recommendations with a Graph Database: Connect Buyer and Product Data

Effective recommendations increase revenue and drive up average order value. But delivering highly relevant, real-time recommendations requires as much context as possible. Connecting the user to the perfect recommendation is an art.

In this series, we explore using recommendations to support use cases from online commerce to logistics. This week, we’ll explain why a graph database is a natural fit for delivering relevant, real-time recommendations.



“You May Also Like…”


Product recommendations help businesses maximize their online revenue.

This requires advanced technology, but it is now available off the shelf and is already being used by Walmart and other market leaders.

“You may also like” is a deceptively simple phrase that encapsulates a new era in customer relationship management. In offering tailored suggestions, businesses maximize the value they deliver by providing highly targeted, real-time product recommendations to their online consumers.

The ability to make compelling offers requires a new generation of technology. That technology must capture the customer’s buying history and also instantly analyze their current choices, before immediately matching them to the most appropriate product recommendations. And all of this analysis must be done in real time before the customer moves to a competitor’s website.

The key technology in enabling real-time recommendations is the graph database, a technology that is quickly leaving traditional relational databases (RDBMS) behind. Graph databases easily outperform relational and other NoSQL data products for connecting masses of buyer and product data (and connected data, in general) to gain insight into customer needs and product trends.

Significantly, graph databases are a core technology platform of internet giants like Google, Facebook and LinkedIn. But while those pioneers had to build their own in-house data stores from scratch, off-the-shelf graph databases – especially Neo4j – are now available to any business wanting to make the most of real-time recommendations.

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.” – Volker Pacher, Senior Developer, eBay
The profit and productivity improvements graph databases offer over relational systems are astounding.

Graph Databases: An Uncontroversial Choice


eBay is not alone in selecting an off-the-shelf graph database like Neo4j as a core platform for business-critical systems.

Neo4j is the world’s most popular graph database, according to database monitoring site DB-Engines. Graph databases are growing in popularity faster than any other type of database – by around 250% since last year alone. The DB-Engines authors noted excitedly that “graph databases are grabbing an ever-larger slice of developers’ attention. If you haven’t used them yet, perhaps it’s time to have a closer look.”

The key to understanding graph database systems is that they give equal prominence to storing both the data (customers, products, categories) and the relationships between them (who bought what, who “likes” whom, which purchase happened first). In a graph database, you don’t have to live with the semantically poor data model and expensive, unpredictable JOINs from the relational database world.

Instead, graph databases support many named, directed relationships between entities (or nodes) that give a rich semantic context for the data.

And queries are super-fast since there is no JOIN penalty. This makes graph databases especially suited to formulating recommendations, because making the best recommendations – and maximizing value – involves more than simply offering products because they are best sellers.

Best sellers can be a successful part of a recommendation, but they are – by their nature – an aggregate picture of all customers. Nowadays, customers expect finely tuned recommendations in the long tail, and they react poorly to one-size-fits-all suggestions.

Real-time recommendations require the ability to understand the customer’s past purchases, quickly query this data and match the customer to the people who are the closest match to them both in their social network and in buying patterns.

To make real-time recommendations also requires the ability to instantly capture any new interests shown in the customers’ current visit. Matching historical and session data like this is trivial for Neo4j.
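
For instance, a basic “customers who bought this also bought” query is a single pattern match in Cypher. The :Customer/:Product model and the :BOUGHT relationship here are illustrative assumptions:

// Recommend products that buyers of my products also bought,
// excluding anything I already own. The data model is hypothetical.
MATCH (me:Customer {id: 42})-[:BOUGHT]->(p:Product)
      <-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(rec:Product)
WHERE NOT (me)-[:BOUGHT]->(rec)
RETURN rec.name AS recommendation, count(*) AS strength
ORDER BY strength DESC
LIMIT 5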

Conclusion


Today’s users expect highly personalized recommendations. Delivering smart recommendations drives engagement – and gives you even more input to improve your recommendations and continue to engage your users.

The only way to craft truly personalized promotions or product recommendations – that consider past buying history and current session data – is to use graph technology to power your recommendation engine.

Next week, we’ll delve into the business benefits of using graph technology and explore why leaders like Walmart and eBay have chosen Neo4j for real-time recommendations.


Deliver real-time relevance:
See how leading companies are using Neo4j to drive personalization at scale with this white paper, Powering Recommendations with a Graph Database. Click below to get your free copy.


Read the White Paper



Imperative vs. Declarative Query Languages: What’s the Difference?

The evolution of programming languages has grown in parallel with the evolution of computing itself, with new languages being created with each new advance and paradigm shift.

After all, if language shapes how we view reality, then changes in computing (i.e., thinking) will necessitate a corresponding shift in language. Databases are no exception.

The two main paradigms of database query languages are imperative and declarative. Understanding the difference between these two approaches is essential if you’re going to be successful in database development.

Explore the various trade-offs and differences between imperative and declarative query languages


In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics (and pitfalls) of data modeling and why a database query language matters in the first place.

In this blog post, we will discuss the trade-offs and differences between imperative and declarative query languages. Selecting which type of query language to use depends upon your specific situation.

Imperative Query Languages: Definition & Example


If the query languages were human archetypes, imperative languages would be the micromanaging boss who gives instructions down to the final detail.

In the most basic sense, imperative query languages are used to describe exactly how you want something done. This is accomplished with explicit control in a detailed, step-by-step manner; the sequence and wording of each line of code plays a critical role.

Some well-known general imperative programming languages include Python, C and Java.

In the world of graph database technology, there aren’t any purely imperative query languages. However, both Gremlin and Neo4j’s Java API include imperative features. These two options give you more fine-grained control over how your queries execute. If written correctly, there are no surprises – you will get exactly what you want done.

However, imperative database query languages can also be limiting and not very user-friendly, requiring an extensive knowledge of the language and deep technical understanding of physical implementation details prior to usage. Writing one part incorrectly creates faulty outcomes.

As a result, imperative languages are more prone to human error. Additionally, users must double-check the environment before and after the query and be prepared to deal with any potential erroneous scenarios.

To better illustrate the differences, imagine you have two children: Isabel and Duncan. Isabel represents an imperative query language and Duncan the declarative query language.

To get the two children to make their beds, you take differing approaches. For Duncan, it is easy. Simply instruct Duncan to make his bed and he will do it however he sees fit. Yet, he might make it differently than you had in mind, especially if you’re a picky parent.

Isabel requires an entirely different process. You must first inform her that she needs both sheets and blankets to make her bed, and that those materials are found on top of her bed. Then she requires step-by-step instructions, such as “spread the sheet over the mattress” and “tuck in the edges.”

The final result will be very similar to Duncan’s (or perhaps exactly the same). At the end of the process, both children have their beds made.

Declarative Query Languages: Definition and Examples


Declarative query languages are often defined as any database query language that is not imperative. However, to define them in such a manner is too broad.

Declarative query languages let users express what data to retrieve, letting the engine underneath take care of seamlessly retrieving it. They function in a more general manner and involve giving broad instructions about what task is to be completed, rather than the specifics on how to complete it. They deal with the results rather than the process, focusing less on the finer details of each task.

Some well-known general declarative programming languages include Ruby, R and Haskell. SQL (Structured Query Language) is a declarative query language and is the industry standard for relational databases.

In the graph technology ecosystem, several query languages are considered declarative: Cypher, SPARQL and Gremlin (which also includes some imperative features, as mentioned above).
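To make the contrast concrete, here is a small declarative query in Cypher. It uses a hypothetical social graph of Person nodes connected by FRIEND relationships – the point is that we declare the pattern we want and leave the execution strategy entirely to the database engine:

// Find the names of Alice's friends-of-friends.
// We state what we want; the engine decides how to traverse.
MATCH (alice:Person {name: 'Alice'})-[:FRIEND]->()-[:FRIEND]->(fof:Person)
WHERE fof <> alice
RETURN DISTINCT fof.name

Nothing in this query says which node to start from or which index to use – those “how” decisions belong to the query planner.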

Using a declarative database query language may also result in better code than what can be created manually, and it is usually easier to understand the purpose of the code written in a declarative language. Declarative query languages are also easier to use as they simply focus on what must be retrieved and do so quickly.

However, declarative languages have their own trade-offs. Users have little to no control over how inputs are dealt with; if there is a bug in the language, the user will have to rely on the providers of the language to fix the problem. Likewise, if the user wants to use a function that the query language doesn’t support, they are often at a loss.

In the previous example of the children, Duncan was able to complete his task in a way that was faster and easier for his parent than Isabel’s. However, imagine now that you want them to wash the dishes.

It is the same process for Isabel: You’d need to walk through each step with her so she can learn how the process works.

For Duncan, however, we have hit a snag. Duncan has never learned how to wash the dishes. You will remain at that impasse with Duncan unless his programming engineers decide to teach him how to wash the dishes. (Duncan isn’t like most children.)

Conclusion


This post is not meant to pit the two types of database query languages against each other; it is meant to highlight the basic pros and cons to consider before deciding which query language to use for your project or application.

You should select the best query language paradigm for your specific use case. Neither paradigm is better than the other; they each have different strengths for software development.

If your project requires finer accuracy and control, imperative query languages do the job well. If the speed and productivity of the development process matter more, declarative languages offer the flexibility of getting results without as much effort. Ultimately, the choice depends on you.


Are you new to the world of graph technology?
Click below to get your free copy of the O’Reilly Graph Databases book and discover how to use graph databases for your application or project.


Get the Book



Catch up with the rest of the Graph Databases for Beginners series:

eBay ShopBot: Graph-Powered Conversational Commerce

Get a glance of eBay's ShopKnowledge system architecture.
Editor’s Note: This presentation was given by Ajinkya Kale and Anuj Vatsa at GraphConnect New York in October 2017.

Presentation Summary


eBay ShopBot is a chatbot powered by knowledge graphs that supports conversational commerce. In creating the system, the eBay team wondered how they could determine the next question that the chatbot should ask the user.

Natural Language Understanding (NLU) is used to break down queries into their component parts. Learning from one type of query can be captured and transferred to other contexts to further enrich the knowledge graph. They believe that graphs are the future for AI. For eBay, Neo4j is more than a database; it also powers machine learning on the knowledge graph.

eBay uses a Dockerized system on top of Google Cloud Platform to deploy ShopBot. eBay originally ran two services in one container: a monolithic Scala service and Neo4j. The team decided to migrate to microservices, adhering to Docker best practices.

One challenge was the size of the data model. By leveraging an alpha feature of Kubernetes called StatefulSet, eBay was able to scale ShopBot to support increased traffic in one locale (eBay US) as well as to roll it out to additional locales (eBay Australia).

Full Presentation: eBay ShopBot: Graph-Powered Conversational Commerce


What we’re going to be talking about today is our use of Neo4j as a backend to the AI technology in eBay’s virtual shopping assistant, ShopBot:



Ajinkya Kale: I lead the research efforts on knowledge graphs in the New Product Development group at eBay. We focus on cutting-edge technology and also on the artificial intelligence side of eBay efforts.

At eBay, we have around 160 million active buyers and about $11 billion in mobile sales:

eBay Inc., at a glance.

We have more than a billion live listings. There’s a misconception that eBay is all about used items, but 81% of the items that are sold on eBay are actually new. And we have almost 13 million listings added through mobile every week.

Check out eBay Marketplace at a glance.

Every country, every site, has its own priorities. In the UK, there’s a makeup product sold every three seconds. Australia has a wedding item purchased every 26 seconds.

Check out eBay velocity stats by region.

eBay ShopBot


eBay ShopBot is a chatbot powered by knowledge graphs. ShopBot is a personal shopping assistant. We built ShopBot as a shopping assistant initially within the Facebook Messenger platform, but it supports multiple platforms now. Because of the way it is built, ShopBot integrates with any chatbot platform.

Learn about the components of eBay's ShopBot feature.

We created ShopBot to bridge the gap between regular search and natural language search. If you enter a query like, “I want a gift for an 11-year-old who likes Iron Man,” most other search engines will fail. It’s a really hard problem to go from a regular search engine to a search engine which powers natural language.

ShopBot supports natural language understanding (NLU).

What Is Conversational Commerce?


Conversational commerce is basically a system where you interact with the agent as you would interact with a salesperson in a shop.

Conversational commerce is interacting with the agent as you would a salesperson in a shop.

If you go to a shop to buy some sneakers and say, “I’m looking for some shoes,” the salesperson might ask you, “Okay, what are you going to use them for?” Once you tell him, he might ask what brand you prefer, then maybe what color you prefer, and what size shoes you wear.

This kind of multi-turn interactivity is really hard to accomplish in a regular search engine.

Figuring Out the Next Question to Ask


In creating the system, we had to ask ourselves, “How do we determine the next best question to ask when a user is trying to search for an item on the platform?” There are a lot of bot frameworks – like Api.ai, wit.ai and others – but they’re all based on rules engines. You input rules and the system starts acting according to them. It’s almost like a bunch of “if-then-else” statements.

eBay offers almost everything under the sun. We have more than 20,000 product categories and more than 150,000 attributes such as brand, the color of the object, and any other attribute about an object that you’re trying to buy. You really cannot use a rules engine to support a huge catalog like eBay, with more than a billion items in inventory.

A rules engine cannot support a huge catalog like eBay.

We needed a solution that would scale and that could encode the inherent behavior of the human brain: when I talk to someone about shoes, the next thing they’ll think about is a brand like Nike or Adidas. For humans, this is very easy. It comes inherently with the knowledge that we have been building since childhood.

Rather than just encoding rules, we decided this was more of a probabilistic inference problem: given the user’s intent, you decide, based on some probability, the next best question to ask.


Building a probabilistic inference graph with Neo4j.

eBay has around 160 million active buyers. Almost every product that could be searched for has been searched for, and almost every attribute that could be used has been used.

We went from eBay’s core user behavior data to form a probabilistic graph so that we can drive conversation on what questions to ask next. And this is where you would see a graph database being used.

There’s a nice paper by Peter Norvig (“The Unreasonable Effectiveness of Data”) that talks about how data is the key, even in the deep learning or machine learning era. You can have amazing algorithms, but without data you cannot do much with them.

The biggest challenge we had was that, although there is a lot of academic research talking about how natural language generation can be driven through machine learning or deep learning, no one had really productionized an actual conversational system based on deep learning.

No one has really productionalized a conversational system based on machine learning.

We ended up building an expert system based on a collaborative approach where you use past user behavior data and apply it to the next user in a probabilistic way.

Check out eBay's ShopBot under the hood.

Using Natural Language Understanding to Break Down Queries


This diagram shows the natural language understanding system that we have built:

ShopBot's natural language understanding.

Right now you can go to Facebook Messenger and try ShopBot with queries like this one:

“My husband needs some new black leather dress shoes, but I want to spend less than $80. What do you have?”

We break the query into multiple components to get a deeper understanding of the user intent. We detected the gender from the query. You cannot just use the actual shopper’s gender; you have to look at the gender in the query. In this case, a woman is searching for her husband, so we’ve detected that the gender is male.

The intent here is needs, which means it’s a shopping intent. Then there is item condition: on eBay, you can buy used items as well as new, and she is clearly looking for a new item. Black, leather and dress shoes are attributes of the product she’s looking for. There are also constraints like price, which we have to apply when we filter our inventory.

Here is a simplified example of the graph underneath:

A simplified example of the graph underneath eBay's ShopBot.

If the query is about shoes, we first data-mine which gender the user is shopping for.

Women’s shoes have different attributes that should be asked about next compared to men’s shoes. Perhaps women care more about brand than men do. Men may care more about color (which is almost always brown).

Based on those probabilities, we decide how to infer this and basically convert it into a Markov chain where you only need the current context to ask the next question.
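Sketched in Cypher, the lookup for the next question might resemble the following. Every label, relationship type and property here is purely illustrative – this is not eBay’s actual model:

// Given the current context (women's shoes), pick the attribute
// that past shoppers most often specified next.
MATCH (ctx:Context {category: 'Shoes', gender: 'female'})-[asked:NEXT_QUESTION]->(attr:Attribute)
RETURN attr.name AS nextQuestion, asked.probability AS p
ORDER BY p DESC
LIMIT 1

Because only the current context node is needed to rank candidate questions, the Markov property holds: the conversation can move forward one hop at a time.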

Transferring Learning


NLU handles the natural language understanding for the query and then we have a piece that does the ecommerce understanding:

Learn more about eBay's ShopBot transfer learning capabilities.

If you say something like, “I’m looking for eggplant Foamposites,” I would have no idea what you are talking about. But because we have so much past user behavior data, there has to be someone who’s an expert in sneakers who must have clicked on the right item.

We know from that item’s attributes that when someone says, “Eggplant Foamposite,” the product should be in the athletic shoes category, the product brand is Nike, they’re mostly used as basketball shoes, we know what the product release date was, the color is mostly purple and the material is Foamposite.

But there is another really nice and unique thing about this process: from this query, the graph learns that eggplant corresponds to purple. The cool thing is that you can apply this in cases where you do not have much past user behavior data.

So if someone says, “I’m looking for an eggplant iPhone case,” most of the time the person is not looking for pictures of eggplants on the case. He or she is looking for a purple case.

A probabilistic knowledge graph helped with transfer learning.

From the previous query, where we learned that Nike has tagged all these eggplant Foamposites with the color purple, we associated – or learned – through our knowledge graph that eggplant means purple, and now we can use it in a situation where we otherwise wouldn’t know what someone means when they say, “Eggplant iPhone case.”

That type of transfer is very good and is possible only because of the probabilistic knowledge graph that we built.
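A hedged sketch of the idea in Cypher – the labels, relationship types and properties below are illustrative only, not eBay’s schema:

// "Eggplant" was linked to purple by sneaker shoppers' behavior;
// the same edge now answers a query in a category that has no
// behavioral data of its own.
MATCH (:Term {text: 'eggplant'})-[:MEANS]->(color:Color)
MATCH (phoneCase:Product {category: 'iPhone Cases'})
WHERE phoneCase.color = color.name
RETURN phoneCase
LIMIT 10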

Architecture Overview of eBay ShopKnowledge


This is an overview of the system architecture:



We have a huge set of data, which includes ecommerce data, as well as world knowledge data, which comes from Wikipedia, Wikidata, and other data sources such as Freebase and DBpedia.

We combine this data using the Apache Airflow scheduler to get the data into a knowledge graph form through Google Cloud and Spark. The data is then eventually modeled as a graph and pushed to Neo4j.

More Than a Database


For us, technically, the graph is not just a database, as it is for most of the users of a graph database. It’s not for BI or analytics either. For us, it’s a store that is used as a cache for a machine-learning knowledge graph.

Once the data goes to the knowledge graph database system, we do graph inferencing on it to power query understanding, entity extraction, price prediction and determination of trends.

In the previous example, we looked at how inferencing works. Once we have the data in the graph structure, we use a Dockerized system on top of Google Cloud Platform to deploy it. The deployment structure and the services will be explained later in this blog.

This is a high level summary of what we have in our current knowledge graph:

A high-level summary of eBay's ShopBot knowledge graph.

We have around half a billion nodes, and now about 20 billion relationships. We have combined ecommerce data with Wikidata to power more world knowledge, such as when the product was released and what other trends people are following. Things keep changing, so you have to add more knowledge into your knowledge graph.

There are also some machine learning aspects. As I mentioned, we don’t use the graph just as our database, but also to store the probabilistic graphical model and use it as a cache for the runtime system.

We have implemented some supervised models on top of it. For example, if you are talking about winter jackets, you don’t want sports jackets, because eBay has a winter jacket category. You want to categorize correctly so that you get attributes from the winter jacket category. Those supervised models also live in the graph.

We also use semi-supervised approaches like label propagation. We have a whole team that curates the graph for trends. Because we have such a huge graph, they cannot curate each of the nodes in the graph and mark the trends. They annotate a subset of the graph with trends, and then we use label propagation to spread those labels through the entire graph.

Use Case: What Is It Worth?


One cool use case that we have recently served on Google Assistant, and which you can try, is What Is It Worth? You can say, “I want to talk to eBay,” and then you can either try to find out how much an item that you have is worth or you can check the price for the latest products that are going to be launched.

You can say, “What is the iPhone 8 going to cost?” Or you can say that “I have this old backpack that’s lying around. It’s this particular model. It’s this particular color. What can I get for it?” That’s also served through our knowledge graph.

Neo4j: The End of Data JOINs


Why Neo4j?

When we started, we did try some relational datasets, but we soon found out that the more datasets we had, the more the JOINs kept growing. We said, “Let’s just forget about all these JOINs. Let’s put everything in a graph.” Now adding a new dataset to our pipelines takes anywhere from one to two weeks, and we never have to worry about JOINing ever again.
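As a hedged illustration – the labels and relationship types below are hypothetical – a question that would span several relational JOINs becomes a single traversal:

// Catalog data and world knowledge (e.g., Wikidata) answered in
// one pattern match, with no JOIN tables in sight.
MATCH (p:Product {name: 'iPhone 8'})-[:SAME_AS]->(w:WikidataEntity)-[:RELEASED_ON]->(d:Date)
RETURN p.name AS product, d.value AS releaseDate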

Obviously, Neo4j is battle-tested. It has good production support and the tooling system is also good. We could very easily do a lot of experimentation because of the interactive browser and the visualizations that they have. And it’s the only solution that provides graph algorithms on top of a graph database.

eBay ShopBot, why did they choose Neo4j.

Graphs Are the Future for AI


I agree with Emil [Eifrem, Neo4j’s CEO] that graphs are the future for AI.

I had a lot of search experience before I joined the AI team at eBay. What I found is that a search system usually needs an autocomplete system, a query recommendation system and an item recommendation system. All of that can be powered through a graph.

You can think of it as your machine learning model cache, and some other system can do your backend indexing. All your business logic and the creative juices can live in the graph where you actually do the inferencing. And you can push only the part where you have to pull items to the backend indexing store.

Why graphs are the future of AI.

Containerizing Neo4j


Anuj Vatsa: Now let’s focus on how we evolved with Neo4j over the last year and a half and share some key takeaways about how we containerized the Neo4j database and how we deployed huge data models into Google Cloud using Kubernetes.

As a part of the New Products Development team, we were doing a lot of prototyping and that’s how we started using Neo4j, because it fits our case very well. And one of the main aspects of this was also deploying to the cloud.

Containerize and scale Neo4j in Google Cloud.

Our tech stack includes Docker. We use Kubernetes and we have polyglot services in the backend. We have some services written in Scala, some in Java and some in Go, so we wanted to make sure that everything works out of the box.

eBay's ShopBot tech stack.

Why Docker and Kubernetes?


So why did we decide to use Docker? Docker gave us a very easy way to build, ship and run applications through lightweight containers in the cloud. We use Kubernetes because it is the best orchestration layer out there, allowing us to scale our containers in or out depending on the traffic volume.

In the early days, because we were prototyping, we were using a huge monolithic Python service, with lots of different modules running inside a single process. One of those modules, in a huge Python code base, was the knowledge graph module.

eBay ShopBot, monolithic Python Svc.

When you think of this from a containers perspective, we had two processes: the Python service and the graph database. Initially, our models were pretty small, so at one point the graph database actually ran on localhost alongside the service.

The problem with the huge Python service was, as many of you might have encountered, we ran into the global interpreter lock issues. That meant we could only serve a couple of requests per second, so we were spawning more pods to serve more users.

We went back and thought we should actually split all the services into individual microservices:

eBay ShopBot microservice using a graph database.

Once the data science team built the graph models, we used to bake the graph models as base Docker images. We used to put them in the Google Cloud repository, which you can think of as a GitHub for Docker images.

In our service deployments, we used to have this Docker file refer to the current base image that was supposed to be deployed. It used to download the data and spawn the graph database and then we wrote a Scala service on top of the database to serve our APIs and all other use cases.

In this way, the graph was still local to the system and we had two processes running: the Scala service and the graph database.

This limited us in a couple of ways. First, if our models grew huge, their deployment would become a bottleneck. Second, pushing and downloading these models was taking a lot of time. We wanted a solution that would work in all these cases.

In this version of the graph, we were still using Cypher queries, and we saw latencies on the order of a couple of seconds. When we switched over to procedures, we brought our latencies down to less than a hundred milliseconds – an improvement of 25x over the baseline.
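For illustration only – the procedure name below is made up – the switch amounts to replacing a deep traversal written in Cypher with a single call to a compiled user-defined procedure:

// A user-defined procedure runs as compiled Java inside Neo4j,
// avoiding query-planning and deep-traversal overhead per request.
CALL shopbot.nextBestQuestion('athletic shoes') YIELD attribute, probability
RETURN attribute, probability
ORDER BY probability DESC
LIMIT 1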

Following Docker Principles


One of the Docker principles says that you should never run multiple processes within a single container, and that is what we were doing.

We were running a Scala service and Neo4j as processes within one container, and we wanted to get away from that. The other issue arose whenever we needed to do a pod restart: we had to go back and download the model before restarting the pod, so that also became a bottleneck (see the list of limitations below).

Docker principles limitations.

Handling Multi-Terabyte Models


In the meantime, we had requirements coming up where our models were on the order of multiple terabytes. At the same time, we were thinking of launching in different locales.

The first version of ShopBot was only for the U.S., but now we also offer it for the Australian dataset.

Handling multi-terabyte models and different locales.

One of our requirements was minimal deployment time. We also wanted the switch from the old dataset to the new dataset to be pretty seamless. The end user shouldn’t even know that we have switched over to new data.

That’s how we arrived at the containerized solution. A lot of folks spawn their own VMs and load balance their services across those VMs. We could have done that too, but it would have meant giving up all the goodness of Kubernetes and taking all of that load balancing and scaling logic onto ourselves.

Even though spawning your own VMs was a tried and tested model, we went with containerizing the Neo4j graph database.

Most of the Kubernetes apps, by default, are stateless. That means they don’t have any storage and the way you provide storage for a Kubernetes app is through persistent volumes. If you want to use a persistent volume and make sure your pod doesn’t copy over the data each time, there is an alpha feature in Kubernetes called StatefulSet.

StatefulSet ensures that when you start a pod, the data associated with that pod is constant across all rollouts. We copy the data only the first time, when we are switching over to a new dataset. If there is a pod restart or a hardware failure, we don’t have to copy the data again on the next pod initialization. That means we avoid multiple duplicate copies.

eBay ShopBot containerized Neo4j.

Scaling with Kubernetes


Remember that we used to bake the models into these base Docker images. Instead of baking them into the Docker images, we started baking the models into Google Persistent Disks (PDs). During a deployment, that PD gets copied to the local PD. And this happens only the first time.

Then through a Kubernetes service definition, we could easily route the traffic from our Scala service to the pods, as shown below.

Kubernetes takes care of scaling out.

So in the example above, we have two pods, and the Kubernetes service definition knows when the pods are coming up, and it load balances and routes the traffic to each of the pods. If our traffic increases, all we have to do is change the deployment script from 2 to x number of pods, and Kubernetes takes care of scaling out.

The diagram above shows the Scala service talking to Neo4j US. One of the requirements was also supporting multiple locales. With this model, we were able to get the Scala service to route to different Kubernetes service definitions depending on the locale. Each of those deployments is independent.

Scala services routes on different Kubernetes service definitions.

When the data science team comes up with new data, we go through the whole process of creating a new original PD. When we do the switchover, StatefulSet does an ordered termination of pods and an ordered initialization of pods.

That ensures that if there is any data corruption or problem we didn’t catch in testing, we can stop that particular pod and halt the rollout, because StatefulSet makes sure that pod N is up before working on the next one.

We are still continuously evolving. This was pretty new and like I said, some of the things that we used were alpha features. We are still working with the Kubernetes team, and we’re working on ways we can improve.

One of the limitations of this method is that, when you create a pod for the first time, the init phase takes a lot of time because it’s a multi-terabyte model. It takes us a few hours to get one of the pods up because of the data copy. Hopefully we can get to a stage where our deployment times are not on the order of hours but of a couple of minutes.


Inspired by Ajinkya and Anuj’s talk? Click below to register for GraphConnect 2018 on September 20-21 in Times Square, New York City – and connect with leading graph experts from around the globe.

Get My Ticket

11 Must-See Speakers at GraphConnect 2018 in New York City

Discover the must-see speakers at this year's GraphConnect 2018.
There are a lot of great reasons to attend GraphConnect 2018, but one of the best reasons is that every year we feature a fresh, new lineup of the world’s best graph experts sharing their experiences on how graph database technology impacted their business.

Discover the must-see speakers at this year's GraphConnect 2018.

This year’s GraphConnect is no exception! While we have countless speakers this year, there are certainly a few that are top of mind when it comes to “must-see” status.

Here are the 11 must-see presenters at GraphConnect 2018 (in no particular order):

Ann Grubbs, Lockheed Martin Space Systems


See Ann Grubbs, Chief Data Engineer for IT at Lockheed Martin Space Systems, speak at GraphConnect 2018.As Chief Data Engineer for IT at Lockheed Martin Space Systems, Ann Grubbs is responsible for technical solutions for data governance and data management. Throughout her career she’s been a database developer, object modeler, software developer and project manager. As an early adopter of graph technology and a graph evangelist at Lockheed Martin, she strongly believes in leveraging the right tool for the job. You can even call her Poly (as in Polyglot).

At GraphConnect, Ann will be presenting on Product DNA: Master Data Graph Enabling the Digital Transformation.


Brandy Freitas, Pitney Bowes


See Brandy Freitas, Senior Data Scientist at Pitney Bowes, speak at GraphConnect 2018.Brandy Freitas is a Senior Data Scientist at Pitney Bowes who specializes in machine learning for predictive analytics. Her undergraduate degree is in Physics and Biochemistry. For graduate school, Brandy was a National Science Foundation Graduate Research Fellow in Biophysics at Harvard University. She specialized in single particle cryo-electron microscopy, with a focus on machine learning in automated protein structure determination.

At GraphConnect, Brandy is presenting on Enhancing Machine Learning with Graph Metrics.


David Fox, Adobe


See David Fox, Software Engineer at Adobe, speak at GraphConnect 2018.David Fox is a Software Engineer at Adobe, with a focus on application development and data engineering. He has 10 years of experience developing high-performance backend systems and working with a large variety of databases alongside massive datasets.

David will be presenting on Harnessing the Power of Neo4j for Overhauling Legacy Systems at GraphConnect.


Dr. Alexander Jarasch, The German Center for Diabetes Research (DZD)


See Dr. Alexander Jarasch of DZD speak at GraphConnect 2018.As Head of Data and Knowledge Management at DZD, Dr. Alexander Jarasch is responsible for a knowledge graph database and machine learning techniques in diabetes research. With a background in bioinformatics, Dr. Jarasch has a Ph.D. in biochemistry and structural bioinformatics from Gene Center Munich (LMU). As postdoctoral fellow on behalf of Evonik Industries AG, he developed protein engineering software for 3D modeling of enzyme candidates with biotech application. As part of the Global Strategic Team at Roche Diagnostics GmbH, he developed machine learning techniques for predicting more chemically stable antibodies.

Dr. Alexander Jarasch will be presenting on Graphs to Fight Diabetes at GraphConnect.


Seth Dimick, Nordstrom


See Seth Dimick, Data Scientist at Nordstrom, speak at GraphConnect 2018.As a Data Scientist at Nordstrom, Seth Dimick is tasked with creating innovative solutions that leverage data for personalization, automation, optimization and measurement in fashion retail. He mines click-stream, customer and website data to perform behavioral research that informs feature and UX design, measures success and creates statistical models powering feature personalization.

At GraphConnect, Seth will be presenting on Graph Recommendations at Nordstrom.


Dr. Tatiana Romina Hartinger, Cognitiva


See Tatiana Romina Hartinger of Cognitiva speak at GraphConnect 2018.Dr. Tatiana Romina Hartinger is a Solutions Expert at Cognitiva, where she implements projects related to artificial intelligence. She has a Ph.D. in Discrete Mathematics; her research focused mainly on graph theory and combinatorics. She completed her Ph.D. at the University of Primorska in Koper, Slovenia, where she worked as a teaching assistant at the Faculty of Mathematics, Natural Sciences and Information Technologies (FAMNIT) and as a young researcher at the Andrej Marušič Institute.

At GraphConnect, Dr. Hartinger is presenting A Conversation with Graphs.


Gary Stewart & Will Bleker, ING


See Gary Stewart of ING speak at GraphConnect 2018.
See Will Bleker of ING speak at GraphConnect 2018. Gary Stewart is a hands-on platform architect at ING for distributed data. With over 17 years of experience in integration, business process management and RDBMS, in the past few years Gary has immersed himself in the NoSQL world to make ING more resilient and scalable without trading away consistency.

Will Bleker is Chapter Lead and Middleware Engineer at ING. He has over 15 years of experience in the platform engineering field. With high-availability systems as his main focus, he has moved from technologies like Solaris Cluster to distributed NoSQL databases and now Neo4j.

Gary and Will are presenting on Being In Control and Staying Agile with Graph Requires Shifting Left at GraphConnect.


Amy Hodler, Neo4j


See Amy Hodler of Neo4j speak at GraphConnect 2018.Amy Hodler manages the Neo4j graph analytics programs and marketing. She loves seeing how our ecosystem uses graph analytics to reveal structures within real-world networks and infer dynamic behavior. In her career, Amy has consistently helped teams break into new markets at startups and large companies including EDS, Microsoft and Hewlett-Packard (HP). She most recently comes from Cray Inc., where she was the analytics and artificial intelligence market manager. She has a love for science and art, with an extreme fascination for complexity science and graph theory.

At GraphConnect, Amy will be presenting on 6 Ways Graph Technology Is Changing Artificial Intelligence and Machine Learning.


Dr. Alessandro Negro, GraphAware


See Dr. Alessandro Negro, Chief Scientist at GraphAware, speak at GraphConnect 2018.Dr. Alessandro Negro is the Chief Scientist at GraphAware, where he specializes in recommendation engines, graph-aided search and NLP. He has been a long-time member of the graph community and is the main author of the first-ever recommendation engine based on Neo4j.

Dr. Alessandro is speaking on Graph-Based Natural Language Understanding, Part 1 and Part 2 (both sessions have limited capacity seats).


Pat Patterson, StreamSets


See Pat Patterson of StreamSets speak at GraphConnect 2018.Pat Patterson has been working with Internet technologies for over two decades, building software and working with developer communities at Sun Microsystems, Salesforce and StreamSets. At Sun, Pat was best known as the community lead for the OpenSSO open source project; as a developer evangelist at Salesforce, he focused on identity, integration and the Internet of Things. As a technical director at StreamSets, Pat has been responsible for building the community around the open source StreamSets Data Collector, speaking at events around the world and educating the big data community on the value of data in motion.

Pat is presenting on Ingesting Data into Neo4j for Master Data Management at GraphConnect.


Dr. Peng Sun, CA Technologies


Dr. Peng Sun, a Principal in Strategic Research at CA Technologies, specializes in building smart applications leveraging streaming, graph-aided search, NLP and neural networks. He holds a Ph.D. in Planetary Sciences from the University of Arizona, with a dissertation on cosmic ray transport theories leveraging high-performance numerical simulation techniques. Dr. Sun enjoys the convenience Neo4j provides in handling graph data representation, storage and comprehensive queries.

Dr. Sun is speaking on Accelerating Digital Transformation in CA Technologies with Neo4j at GraphConnect.


Don’t wait til later!
Click below to register for GraphConnect 2018 on September 20-21 and connect with leading graph experts from around the globe.


Get My Ticket

Powering Recommendations with a Graph Database: Proven Business Benefits [+ Case Studies]

Discover how real-time recommendations support a number of different use cases that translate into business value.
Relevant, real-time recommendations drive revenue, but they are challenging to deliver. That’s because good recommendations require bringing together so much data, surfacing the relationships between all that data and delivering just the right suggestion in context and in the moment.

Discover how real-time recommendations support a number of different use cases that translate into business value.

In this series, we discuss how real-time recommendations support a number of different use cases, from product recommendations to logistics. Last week, we explained why many organizations are choosing a graph database for real-time recommendations.

In this final post, we’ll see how using Neo4j for real-time recommendations translates into business value and describe how companies from eBay to Walmart are using real-time recommendations.

Business Benefits of Neo4j


Companies from around the globe have incorporated Neo4j into their data architecture to take advantage of the powerful real-time recommendations the graph database provides. These enterprises have experienced a variety of business benefits as a direct result.

Improved Competitiveness


Neo4j enables new types of business functionality that are often not possible with other technologies, allowing you to make real-time decisions based on connected data.

For example, Walmart uses Neo4j to make real-time product recommendations by using information about what users prefer. Additionally, most of the top dating and online job sites use Neo4j to recommend jobs or dates by incorporating a knowledge of the extended network (friends-of-friends) into the recommendation, again in real time, substantially improving the accuracy of the recommendation.

Reduced Project Time and Cost


Neo4j cuts the overhead on many types of projects, particularly those involving connected data. Many customers cite the huge acceleration that occurs when a graph model is brought to bear on a connected data problem.

For example, eBay cites that with Neo4j it requires 10-100 times less code than it did with SQL, and Telenor, one of the world’s top telecom providers, uses Neo4j for the authorization system on its business customer portal, improving performance by 1,000 times.

Faster Product Time to Market and Better Performance


Neo4j requires developers to produce less code than RDBMS alternatives. Less code means higher quality and an increased success rate on projects. Neo4j’s performance is dramatically better for connected datasets – often the difference between something being possible and not possible.

eBay cites that “Neo4j allowed us to add functionality that was previously not possible.” Many customers experience improvements on a similar scale, so much so that Neo4j is often described as decreasing query times from “minutes to milliseconds” for connected data queries.

Walmart and Other Leading Adopters [Case Studies]


Market leaders are using Neo4j to serve up real-time recommendations in areas such as retail, industrial spare parts, jobs, movies, entertainment, restaurants and even online dating.

Case Study #1: Walmart


Walmart calls Neo4j “a perfect tool for real-time product recommendations.” The retailer has sales of more than $482 billion and employs 2.3 million associates worldwide, serving more than 260 million customers weekly through its 11,500 stores in 28 countries and ecommerce websites in 11 countries.

“Neo4j helps us to understand our online shoppers’ behavior and the relationship between our customers and products, providing a perfect tool for real-time product recommendations,” said Walmart Software Developer Marcos Wada. “As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”

Case Study #2: Movie Recommendation Website


A leading movie recommendation website is revolutionizing the way the film industry promotes projects by enabling fans to discover the best upcoming releases before they hit the big screen and make recommendations based on individual taste. In turn, it provides movie studios with insights into the preferences and behavior of film fans, enabling them to more effectively target their marketing campaigns.

They considered MySQL for their recommendation system, but after seeing the amount of data required, they looked at other databases and chose Neo4j. Their CTO said: “We wanted to quickly connect audiences to the right movies, and Neo4j just fits our philosophical standpoint. We are very happy that we discovered Neo4j. We increased the speed of generating recommendations and matching users to movies, which is a core part of our business model.”

Case Study #3: eBay


eBay uses the delivery coordination platform Shutl to make the delivery of online and mobile orders quick and convenient. This eliminates the biggest roadblock between retailers and online shoppers: the lack of an option to have your item delivered the same day.

Switching from MySQL to Neo4j allowed the fast-growing service to quickly match orders with couriers with relatively constant performance, and in a data model that allows queries to remain localized to their respective portions of the graph. “We achieved constant query performance by using Neo4j to create a graph that is its own index. That’s awesome development flexibility,” said Volker Pacher, Senior Developer at eBay.

Case Study #4: Wobi


Wobi is a price comparison website for pensions and insurance that uses detailed financial pictures of customers to provide the best “value offers” to users. To achieve such a detailed level of customer understanding, Wobi needed a single customer database where it could rapidly drill down into each individual’s history and add new information on the fly – which is exactly the model that Neo4j provides.

Neo4j is currently handling half a million customers with an average of eight pensions and insurance policies and products each – a total of 4 million nodes and 30 million relationships. It has the capacity to expand much further. According to Shai Bentin, Chief Technology Officer at Wobi, “It’s not a large database yet – but it will be. And I feel safe with Neo4j.”

Case Study #5: Fortune 200 Company


An international Fortune 200 hospitality company adopted Neo4j to power its real-time pricing recommendation engine after running into significant slow-downs with its prior database architecture. Since working with Neo4j, the company has significantly reduced request processing time as well as the company’s hardware requirements, and performance has improved so substantially that they have seen a 300% growth in the volume of generated price changes.

Conclusion


Neo4j is used by thousands of companies around the world, including more than 50 of the Global 2000 such as eBay, Walmart, Hewlett-Packard and Cisco. These companies have all recognized the value in – and necessity of – finding and leveraging connections between data for a variety of uses that ultimately provide customers with better user experiences.

Whether it’s eBay providing customers with more seamless same-day delivery, Walmart offering accurate real-time product recommendations, or a well-known Fortune 200 hospitality company providing a more powerful price-comparison tool, Neo4j and the real-time recommendations it provides are a consistent driving force of success.

Ready to see what Neo4j can do for your company? Learn how to master the emerging world of graph databases by reading O’Reilly’s free Graph Databases ebook, have your development team take Neo4j for a spin and explore the variety of available online training options to get up and running with Neo4j in no time.


Deliver real-time relevance:
See how leading companies are using Neo4j to drive personalization at scale with this white paper, Powering Recommendations with a Graph Database. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the real-time recommendation blog series:

Announcing a New Neo4j Community Site & Forum

Check out and join the new Neo4j community site page and forum.
We are very excited to announce our brand new Neo4j Community site! We created this place to allow our community to ask and answer technical questions, share and discover open source projects with each other, contribute content and collaborate on ideas.

Check out and join the new Neo4j community site page and forum.


Technical Help Forum


This is where you are able to ask technical questions (or help others by answering questions!). We have created categories that align with specific technical topics.

Before you ask a question, you are actually able to search first to see if your question has already been answered. When posting a topic, you are able to add tags that help other users find your topic if they encounter the same question.

We ask that you please share all details you have for others to understand and reproduce your question. You’ll notice that each category also has an introductory message with a number of helpful links.

Community Engagement


Tell the community about your projects, share your blogs to help others learn or discuss ideas in your local groups. Don’t have a project to share but are curious about what others are doing? Browsing is encouraged here as well!

Why did we move from Slack?


As our user Slack grew to 8,750 members, it became more and more difficult to answer questions properly and many discussions descended into private messages.

Also, the 10,000-message limit made it impossible to find out whether a question had already been answered, requiring the helpful folks to repeat themselves time and again. Other communities that encountered the same issues came to the same conclusion and moved to a dedicated forum; several also moved to Discourse, as we did.

Going forward, we will reduce the number of Slack channels, as well as the technical conversations there, to foster more casual chats.

Go ahead to the site, create an account and introduce yourself!



For the first five weeks we will be selecting five random users that have signed up and introduced themselves to win a $50 USD coupon for the Neo4j Graph Gear Store. So make sure you’re in the crowd we choose from!

If you have any questions, feedback or ideas you’d like to see as we add features to the site, please share it in the Feedback category!


The Graphie Awards: What They Are & How to Win One at GraphConnect 2018

Learn how your Neo4j project could be recognized with a Graphie Award at GraphConnect 2018
Editor’s Note: This is literally the last day before tickets to GraphConnect 2018 go up in price. If you were holding out (for some reason?), this is your last chance before you have to fork over another Benjamin.

You just got one more reason to attend GraphConnect 2018: We’re giving away awards, and you can get one.

Everything You Need to Know about the Graphies


Officially known as the Neo4j GraphConnect Awards, the “Graphies” recognize excellence in connected data across a number of categories. Graphies are open to Neo4j customers, Neo4j partners, startup program members, investigative/data journalists, Neo4j community members and Neo4j ambassadors.

Winners will be honored on September 20 during the disConnect Party at the end of the GraphConnect conference day in New York City, so stick around to hear who won this year’s awards.

We’ve already got a great pool of nominations, but we don’t want anyone in the Neo4j ecosystem to feel left out of the process or be overlooked (we’re not omniscient…yet). So if you think your graph-powered application or project is award-worthy (remember: multiple categories), then let us know and submit your graph for consideration.

How You Can Get a Graphie


Ready to get your Graphie? To be considered, all potential award winners must complete our nomination form (you can also access this form via GraphConnect.com).

Besides basic organization and contact info, here’s what information you’ll need to provide for the Graphie nomination process:
    • A high-level overview of your Neo4j-powered project or application
    • What problem you were solving with Neo4j
    • Why/how you chose Neo4j
    • Any results (quantitative or otherwise) you can share about your project
    • The general number of nodes or relationships in your graph
    • The largest or longest traversal (number of hops) in your graph
    • Any other supporting material (visuals, code, etc.) to support your nomination
Intimidated by any of these categories or measures? Don’t be!

There are lots of qualifying categories – we’re not just looking for the biggest graph or largest bottom-line impact – so tell us what makes your graph unique or interesting (like a data journalism investigation, or a machine learning project, or something else entirely!).

Of course, we’re always here to answer your questions about the Graphie Awards or GraphConnect in general. For more information, email us at graphconnect@neo4j.com.


What are you waiting for?
Get your ticket to the world’s leading graph technology conference on September 20-21 and connect with leading graph database experts from around the globe.


Get My Ticket

Powering Recommendations with a Graph Database: A Rapid Retail Example

Check out this rapid retail example of Cypher query code for graph data recommendations.
It’s one thing to say that Neo4j streamlines real-time recommendations; it’s another to show you the code so you can see for yourself.

Check out this rapid retail example of Cypher query code for graph data recommendations.

In this series, we discuss how real-time recommendations support a number of different use cases, from product recommendations to logistics. Last week, we explained how and why organizations are using a graph database for real-time recommendations.

In previous posts, we covered how recommendations connect buyer and product data as well as highlighting real-world success stories.

In this final post, we’ll walk through code for a quick retail example so you can see exactly how easy real-time recommendations are using Neo4j.

Rapid Example: A Retail Recommendation Engine Using Neo4j


In a retail scenario (either online or brick-and-mortar), we could store the baskets that customers have purchased in a graph like the one below.

This graph shows how we use a simple linked list of shopping baskets connected by NEXT relationships to create a purchase history for the customer.

Check out this quick example of a real-time recommendation graph.

In the graph above, we see that the customer has visited three times and saved their first basket for later (the SAVED relationship between the customer and basket nodes).

Ultimately, the customer bought one basket (indicated by the BOUGHT relationship between customer and basket node) and is currently assembling a basket, shown by the CURRENT relationship that points to an active basket at the head of the linked list.

It’s important to understand this isn’t a schema or an entity-relationship (ER) diagram but represents actual data for a single customer. A real graph of many such customers would be huge (far too big for examples in a blog) but would exhibit the same kind of structure.

In graph form, it’s easy to figure out the customer’s behavior: They became a (potential) new customer but failed to commit to buying toothpaste; they came back a day later and bought toothpaste, bread and butter. Finally, the customer settled on buying bread and butter in their next purchase – a repeated pattern in their purchase history we could ultimately use to serve them better.

Now that we have a graph of customers and the products they’ve bought in the past, we can think about recommendations to influence their future buying behavior.

By far, the simplest recommendation is to show popular products across the store. This is trivial in Cypher as we see in the following query:

Learn a simple Cypher query for graph database recommendations.
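The original post showed this query as an image. Reconstructed from the description that follows – with the HAS relationship type between baskets and products as an assumption – it looks essentially like this:

// The five best-selling products across all customers.
MATCH (customer:Customer)-[:BOUGHT]->()-[:HAS]->(product:Product)
RETURN product, count(product) AS purchases
ORDER BY purchases DESC
LIMIT 5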

The Cypher query above showcases a lot about Cypher.

First, the MATCH clause shows how ASCII-art is used to declare the graph structure (or pattern) that we’re looking for. In this case, it can be read as “customers who bought a basket that had a product in it” except since baskets aren’t particularly important for this query we’ve elided them using the anonymous node ().

Then we RETURN the data that matched the pattern and operate on it with some (familiar looking) aggregate functions. That is, we return the node representing the product(s) and the count of how many product nodes matched, then order by the number of nodes that matched in a descending fashion. We’ve also limited the returns to the top five, which gives us the most popular products in the purchasing data.

However, this query isn’t really contextualized by the customer but by all customers, and so isn’t optimized for any given individual (though it might be very useful for supply chain management). We can do better without much additional work by recommending historically popular purchases that the customer has made themselves, as in the following query:

Discover a simple Cypher query for graph database recommendations.
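Again a reconstruction (the same schema assumptions apply), where the only difference is the name constraint on the customer node:

// The customer's own most frequently purchased products.
MATCH (customer:Customer {name: 'Alice'})-[:BOUGHT]->()-[:HAS]->(product:Product)
RETURN product, count(product) AS purchases
ORDER BY purchases DESC
LIMIT 5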

The only change in this query, compared to the previous one, is the inclusion of a constraint on the customer node: it must contain the key name with the value Alice. This is actually a far better query from the customer’s point of view, since it’s egocentric (as good recommendations should be!).

Of course, in an age of social selling it’d be even better to show the customer popular products in their social network rather than just their own purchases since this strongly influences buying behavior.

As you’d expect, adding a social dimension to a Neo4j graph database is easy, and querying for friends/friends-of-friends/neighbors/colleagues or other demographics is straightforward as in this query:

Check out this graph database recommendations Cypher query.
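Reconstructed, as before, from the explanation that follows (the FRIEND relationship type appears in the original text; the rest of the schema is assumed):

// Products popular among Alice's friends and friends-of-friends.
MATCH (customer:Customer {name: 'Alice'})-[:FRIEND*1..2]->(friend:Customer)
WHERE customer <> friend
WITH DISTINCT friend
MATCH (friend)-[:BOUGHT]->()-[:HAS]->(product:Product)
RETURN product, count(product) AS frequency
ORDER BY frequency DESC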

To retrieve the purchased products of both direct friends and friends-of-friends, we use the Cypher WITH clause to divide the query into two logical parts, piping results from the first part into the second. In the first part of this query, we see the familiar syntax where we find the current customer (Alice) and traverse the graph matching for either Alice’s direct friends or their friends (her friends-of-friends).

This is a straightforward query since Neo4j supports a flexible path-length notation, like so: -[:FRIEND*1..2]-> which means one or two FRIEND relationships deep. In this case, we get all friends (depth one) and friend-of-friends (at depth two), but the notation can be parameterized for any maximum and minimum depth.

In matching, we must take care not to include Alice herself in the results (because your friend’s friend is you!). It is the WHERE clause that ensures there is only a match when the customer and the candidate friend are not the same node.

We don’t want to get duplicate friends-of-friends that are also direct friends (which often happens in groups of friends). Using the DISTINCT keyword ensures that we don’t get duplicate results from equivalent pattern matches.

Once we have the friends and friends-of-friends of the customer, the WITH clause pipes the results from the first part of the query into the second. In the second half of the query, we’re back in familiar territory, matching against customers (the friends and friends-of-friends) who bought products and ranking them by sales (the number of bought baskets each product appeared in).

Conclusion


Graph technology enables you to incorporate customer feedback, adjust for seasonal trends or suggest birthday gift ideas based on data on the customer’s Facebook friends. And all in real time, without clever coding, and with no fear of the relational JOIN bomb.


Deliver real-time relevance:
See how leading companies are using Neo4j to drive personalization at scale with this white paper, Powering Recommendations with a Graph Database. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the real-time recommendation blog series:

Graph Databases for Beginners: Graph Theory & Predictive Modeling

Learn how to use concepts in graph theory and predictive analysis to understand your connected data
There’s a common one-liner, “I hate math…but I love counting money.”

Except for total and complete nerds, a lot of people didn’t like mathematics while growing up. In fact, of all school subjects, it’s the most consistently derided in pop culture (which is the measure of all things, we’re sure).

Since you’re a reader who’s interested in a technical topic (graph databases) but wants a non-technical introduction to it, I’m going to make a not-too-bold assumption: You probably don’t like math very much.

But what if I told you that, instead of using math just to count your money, you could use math to make more money? Interested? Turns out, that’s exactly what we’re going to talk about today.

Learn how to use concepts in graph theory and predictive analysis to understand your connected data


In this Graph Databases for Beginners blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graph technology is the future, why connected data matters, the basics (and pitfalls) of data modeling, why a database query language matters and the differences between imperative and declarative query languages.

This week, we’re going to take a step back from graph database technology and look at the mathematics powering it all: graph theory. Then we’ll look at how that math – in conjunction with graph technology – helps companies grow their bottom line.

Graph Theory (Not Chart Theory)


Skip the definitions and take me right to the predictive modeling stuff!

First, let’s define just a few terms.

If you’ve been with us through the Graph Databases for Beginners series, you (hopefully) know that when we say “graph” we mean this…

An example graph of connected data


…and not this:

An example of a bar chart (image credit: Innesw, Wikimedia Commons)

This is a chart, not a graph (for the sake of this blog post, and really for the sake of this whole website). Image source

Graph theory is a type of math that doesn’t use a lot of numbers. A total nerd came up with it to stop his friends (not really his friends) from bugging him about getting out of the house more (he didn’t). Fortunately for you, you too can use this math to avoid getting out of the house and lose your friends.

So when we say “math,” you don’t have to find “x” or even use numbers. We’re good, right?

Also, for the sake of the examples of this blog post, we’re going to be using a lot of social science and network science examples.

“Wait,” I hear you say, “we have to use math and science now too? Ugh.” But this science doesn’t involve cutting open frogs or finding the velocity of the train headed to Philadelphia or working on anything that was clearly a hoax invented by the Chinese.

So if you came for the “make money” part of this blog post, stick around. We’re going to use math and science to make money, and while the examples come from just a few narrow fields of aforementioned math and science, they’re applicable to your business too. We promise, because graphs are everywhere.

Triadic Closures


One of the most common properties of graphs is that of triadic closures. This is the observation that if two nodes are each connected to a mutual third node, there is an increased likelihood of the two becoming directly connected in the future.

In a social setting (social science!), a triadic closure would be a situation where two people with a mutual friend have a higher chance of meeting each other and becoming acquainted.

The triadic closure property is most likely to be upheld when a graph has a node A with a strong relationship to two other nodes, B and C. This then gives B and C a chance of a relationship, whether it be weak or strong. Although this is not a guarantee of a potential relationship, it serves as a credible predictive indicator.

Let’s take a look at this example.

A triadic closure example in graph theory


Above is an organizational hierarchy where Alice manages both Bob and Charlie. It would be rather strange if Bob and Charlie were unacquainted with one another while sharing the same manager.

As it is, there is a strong possibility they will end up working together due to the triadic closure property. This creates either a WORKS_WITH (strong) or PEER_OF (weak) relationship between the two of them, closing the triangle – hence the term triadic closure.

Possibilities for triadic closures in an example graph
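To make the property concrete in Cypher, here’s a minimal sketch (over a hypothetical Person schema with WORKS_WITH and MANAGES relationships, not taken directly from the example above) that surfaces triadic-closure candidates – pairs of people who share a contact but aren’t yet directly connected:

// Find pairs who share a mutual contact but have no direct tie yet;
// id(b) < id(c) avoids returning each pair twice.
MATCH (b:Person)-[:WORKS_WITH|MANAGES]-(a:Person)-[:WORKS_WITH|MANAGES]-(c:Person)
WHERE id(b) < id(c)
  AND NOT (b)--(c)
RETURN b.name, c.name, count(a) AS mutualContacts
ORDER BY mutualContacts DESC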


But How Do I Use It?

“Social graph theory definitions are great and all, but how do I use the math to make more money (and therefore have more money to count)?” I hear you asking in your head.

To dive into every single use case of graph technology would take a George R.R. Martin-length series of novels that wouldn’t even cover them all (and still wouldn’t be finished), so we’re going to settle for a few examples as we define each new term. Hopefully these illustrations help get your creative juices flowing on how graphs – and particularly graph algorithms – help build your bottom line.

Social Networks: Imagine you have a social network application (whether external or internal), and you want to suggest connections to your users in a way that delivers the most value (i.e., not just random suggestions). You would likely build your predictive model to suggest connections that complete triadic closures. This gives your users the most relevant and useful recommendations. Happy users (almost always) = more business value (i.e., money) from your social network.

Other Examples: Not running a social network? Here are some other applications of triadic closures:
    • A real-time recommendation engine might use triadic closures to suggest new products to ecommerce shoppers that are closely related to past purchases.
    • In Customer 360, predicting triadic closures helps you build a more complete (i.e., less siloed) view of a customer because you’re able to see the entire picture of how their data is connected.
    • In network management, investigating triadic closures may help you identify which servers or other network components are dependent on the same bottleneck, making your impact analysis more accurate.

Structural Balance


Another aspect to consider in the formation of stable triadic closures is the quality of the relationships involved in the graph. To illustrate the next concept, assume that the MANAGES relationship is somewhat negative (who likes to be managed?) while the PEER_OF and WORKS_WITH relationships are more positive.

Based on the triadic closure property alone, we could fill in the third relationship with any label: having everyone manage each other (weird), as in the first image below, or the (somehow even weirder) situation in the second image below.

An anti-example of triadic closure in graph theory


A weird example of a triadic closure in a graph


However, you can see how uncomfortable those working situations would be in reality. In the second image, Charlie finds himself both the peer of a boss and a fellow worker. It would be difficult for Bob to figure out how to treat Charlie – as a fellow coworker or as the peer of his boss?

We have an innate preference for structural symmetry and rational layering. In graph theory, this is known as structural balance.

A structurally balanced triadic closure is made of relationships of all strong, positive sentiments (such as the first example below) or of two relationships with negative sentiments and a single positive relationship (second example below).

A stable triadic closure composed of positive relationships, forming a graph triangle


A stable triadic closure of two negative relationships and one positive relationship, forming a triangle


Balanced closures help with predictive modeling in graphs. Simply searching for opportunities to create balanced closures allows you to modify the graph structure for accurate predictive analysis.

But How Do I Use It?

Fraud Detection: Imagine you’re trying to catch the bad guys in the financial services sector with a fraud detection application. When you look at the data as a graph, fraud rings always have a particular shape. So if you build your predictive model to identify potential instances of structural balance (such as two synthetic identities listed under the same phone number or any number of other known fraud tactics), then, bam, you now have a subset of your data to investigate for potential fraudulent activity.
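As a hedged illustration (the AccountHolder and PhoneNumber labels and the HAS_PHONE relationship are hypothetical, not a prescribed fraud schema), the shared-phone-number pattern might be expressed as:

// Flag pairs of account holders sharing a phone number –
// a common marker of synthetic identities in fraud rings.
MATCH (p1:AccountHolder)-[:HAS_PHONE]->(phone:PhoneNumber)<-[:HAS_PHONE]-(p2:AccountHolder)
WHERE id(p1) < id(p2)
RETURN p1.name, p2.name, phone.number AS sharedPhone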

Other Examples: Here are a few other ways you might use the structural balance principle of triadic closures to further your world domination goals (that’s what you’re here for, right?):
    • If you’re building an artificial intelligence application, then having your system learn how to identify structural balance is essential to making relevant, context-driven decisions.
    • Within identity and access management, analyzing triadic closures with structural balance helps you identify users who have (or need to have) access to structurally similar assets or other applications – this helps you set more accurate inheritance rules accordingly.
    • If you’re managing an organizational knowledge graph, pinpointing structural balance is key to interconnecting your data that may otherwise be siloed.

Local Bridges


We go further and gain more valuable insight into the communications flow of our organizations by looking at local bridges.

A local bridge is a tie between two nodes where the endpoints are not otherwise connected, nor do they share any common neighbors. Think of local bridges as connections between two distinct clusters of the graph. In this case, one of the ties has to be weak.

For example, the concept of weak links is relevant in algorithms for job search. Historical studies have shown that the best sources of job leads come from looser acquaintances rather than close friends. This is because close friends tend to share a similar worldview (they sit in the same cluster of the graph), while looser acquaintances across a local bridge belong to a different social network (a different cluster of the graph).

A local bridges example in graph theory


In the image above, Davina and Alice are connected by a local bridge but belong to different clusters of the graph. If Davina were to look for a new job, she would be more likely to find a successful recommendation from Alice than from Frances.

This property of local bridges being weak links is something that is found throughout social graphs. As a result, we make predictive analyses based on empirically derived local bridge and strong triadic closure notions.
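One rough way to hunt for local bridges in Cypher – a sketch over a hypothetical KNOWS schema, not a full bridge-detection algorithm – is to look for ties whose endpoints share no common neighbors:

// A tie is a local-bridge candidate when its endpoints
// have no neighbors in common.
MATCH (a:Person)-[:KNOWS]-(b:Person)
WHERE id(a) < id(b)
  AND NOT (a)-[:KNOWS]-()-[:KNOWS]-(b)
RETURN a.name, b.name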

But How Do I Use It?

Data Lineage: So, you’re the person tasked with tracking user data for GDPR or some other regulatory compliance? Graphs have got you covered. The principle of local bridges – and the graph algorithms that help you identify them – means you can more accurately predict how two (or more) clusters of data are in fact related to the same user. Imagine how a sales department may be storing data on a particular user, but marketing might also be storing data on the same user. By identifying a local bridge between these similar clusters, you more accurately meet GDPR requests on stored user data.

Other Examples: Here are a few other applications of the local bridges concept when it comes to predictive modeling:
    • Consider the example above (about job searches) in greater detail: If you’re building a professional networking site aimed at job seekers, then building local bridges within your social network app makes your job recommendations more accurate.
    • If you’re a cybersecurity professional, it’s imperative to identify local bridges between clusters of vulnerable network components, firewalls, or other possible lines of attack. Your opponents certainly will be.
    • Local bridges are also the bedrock of advanced recommender systems, because leveraging local bridges allows you to accurately model and predict links between various clusters of data that include users, their past purchases and their current shopping carts.

Conclusion


While graphs and our understanding of them are rooted in hundreds of years of mathematical study (i.e., graph theory), we continue to find new ways to apply them to our personal, social and business lives. Technology today offers another method of understanding these principles in the form of the modern graph database.

As you’ve seen through this Graph Databases for Beginners series so far, you simply need to understand how to apply graph theory and related analytical techniques in order to achieve your business goals. Come to the math side – we have money.


Ready to dive deeper into the world of graph technology?
Learn how to apply graph theory to real-world challenges with your free copy of the O’Reilly Graph Databases book.


Get the Book



Catch up with the rest of the Graph Databases for Beginners series:

Cypher Philly: Civic Empowerment and Actionable Change Using Data-Driven Storytelling

Meet Jason Cox and Jess Mason who run a meetup called Philly GraphDB to explore graph technologies.
Jason Cox and Jess Mason are the co-founders of Untitled Folder, LLC, a software development and consulting agency based out of Philadelphia. We help early-stage startups on their journey from an Untitled Folder of ideas to building a scalable Minimum Viable Product (MVP). In our spare time, we run a meetup called Philly GraphDB where we explore graph database technologies and share what we’ve learned with the community.

Meet Jason Cox and Jess Mason, who run a meetup called Philly GraphDB to explore graph technologies.

How It Started


While working with open city data in Neo4j, Jason and I discovered there was a lot of open public data freely available to anyone. Our goal then switched to how we could utilize this data to its fullest potential.

Having seen the impact software engineers, data scientists and journalists could have with data – like that of the Paradise and Panama Papers – we were inspired.

One afternoon, while preparing open data examples for our investigative journalism meetup, hosting Will Lyon from Neo4j, we conceived the foundations for the Cypher Philly initiative.

The initiative instantly gained support when we pitched the idea at the end of the meetup. We also grew our team after bringing on our friend, Marieke Jackson, an expert data scientist who’s passionate about using civic data to build scrollytelling stories.

The Initiative


Cypher Philly is an open source project designed to empower citizens, journalists, data scientists, coders and creatives with the ability to harness open data for civic good.

Our goal is to simplify the process of telling data-driven stories using open public data to bring about actionable change, while also informing citizens and governments alike.

Cypher Philly uses open civic data to foster activism.

Collaborations


The Cypher Philly team has been growing and collaborating with various organizations, groups and communities to build and expand our reach.

To help us grow, we’ve gained local sponsorship from Linode to cloud host all of our related civic app projects. We also picked up Azavea as a sponsor to host our meetups in their offices, as well as lend support in building geospatial models of our data.

We also gained support from the Code for Philly and Code for America communities to expand our reach and expose the project to more people who want to contribute. Additionally, we gained support from the Philadelphia Design Activists community, the Committee of Seventy organization and various journalistic groups.

How It Works


The Cypher Philly team and participants have access to a collection of digital tools and methods we’ve built for finding, scraping, importing and storing data for civic-related projects using open public data. These tools and methods are freely available from our open source GitHub repository and may be used by any group or individual.

We meet frequently with different participants in smaller groups based on their skill sets and interest in participation. In our meetings, we discuss our goals and tasks for contributing to the projects we’re currently working on. All participants can see, in real time, what tasks are available for contribution.

Our project tasks and GitHub issues – available for anyone to assign themselves to an issue and work on that issue on their own time – live at the Untitled Folder Projects Board.

Collaboratively, we contribute to completing a data story that our city/state’s Cypher team deems most relevant in addressing the major civic issues we face.

Currently, Cypher Philly is working to address PA gerrymandering and how citizens are represented by district.

Currently, Cypher Philly is working to address PA gerrymandering and how citizens are represented by district.


Want in on projects like this? Click below to get your free copy of the Learning Neo4j ebook and get up to speed with the #1 platform for connected data.

Learn Neo4j Today

Hilary Mason [ML at Cloudera] & Stephen O’Grady [Principal at RedMonk] Will Keynote at GraphConnect 2018

Learn about GraphConnect 2018 keynotes Hilary Mason (at Cloudera) and Stephen O'Grady (at RedMonk)
We’re pleased to announce the addition of two phenomenal keynote speakers to the already-awesome lineup of presenters at GraphConnect 2018 in New York City. (Psst: Register now if you haven’t already, as space is limited!).

Morning Keynote Speaker: Hilary Mason, Cloudera


In addition to Neo4j CEO & Co-Founder Emil Eifrem, the morning keynote session at GraphConnect will feature Hilary Mason, GM of Machine Learning at Cloudera and Founder & CEO of Fast Forward Labs (acquired by Cloudera).

Learn about GraphConnect 2018 keynotes Hilary Mason (at Cloudera) and Stephen O'Grady (at RedMonk)


Hilary Mason is a pioneer in modern artificial intelligence and founded Fast Forward Labs in 2014 – an applied machine learning and artificial intelligence firm – prior to its acquisition by Cloudera in 2017. Formerly the Chief Data Scientist at Bitly, Hilary brings a wealth of real-world experience in AI to GraphConnect 2018.

Hilary will present “The Present and Future of Artificial Intelligence and Machine Learning” during her morning keynote session.

A pioneer in the field of modern AI, Hilary will share thoughts on where AI is today and where it’s going. This session will separate the hype from the reality around AI, discussing where organizations are reaping the biggest benefits from AI and where they see the most future potential.

Hilary will also cover where graph databases add value to AI applications, and share her real-world experiences from graph-enabled AI projects.

Evening Keynote Speaker: Stephen O’Grady, RedMonk


Stephen O'Grady, Principal Analyst & Co-Founder of RedMonkAfter the breakout sessions are finished for the day, the closing keynote address at GraphConnect 2018 will be delivered by Stephen O’Grady, Principal Analyst and Founder of Redmonk, the world’s leading developer-focused industry analyst firm.

Stephen’s keynote session is titled: “What Will You Build, and *Why*? The motivations, ethics, and career opportunities of modern application development.”

Stephen O’Grady will discuss the opportunities and considerations around modern software development that go beyond technology requirements, including many important personal and moral questions developers need to answer, such as:
    • How do I maximize my personal financial opportunity in application development?
    • Where do my personal and professional ethics fit as I choose jobs and projects?
    • How can I get as much career advancement and learning out of the projects I deliver?
Stephen’s session will explore how developers maximize their career growth and financial success while staying true to their personal and professional ethics. If you’re a developer, lead developers or work with developers, this session is not to be missed!

We’ll See You at GraphConnect 2018!


With the addition of Hilary Mason and Stephen O’Grady, we’re excited to give you two more great reasons to attend GraphConnect 2018 in New York City this autumn. We can’t wait to see you there!


Why wait?
Get your ticket to GraphConnect 2018, and we’ll see you on September 20th!


Get My Ticket

Effective Internal Risk Models for FRTB Compliance: The Importance of Risk Model Approval

Learn why risk model approval is critical to effective internal risk models for FRTB compliance
Sweeping regulations are changing the way banks handle risk. The Fundamental Review of the Trading Book (FRTB) represents an important shift designed to provide a firm foundation for the future. While laws passed after the financial crisis offered a patchwork, the FRTB is a change that offers banks a motivation for putting in place a strong infrastructure for the future.

In this series on the FRTB, we explore what it takes to create effective internal risk models using a graph database like Neo4j. This week, we’ll look at the major areas impacted by the FRTB, including raising risk reserves, the trading desk, and the role and approval of internal risk models.

Learn why risk model approval is critical to effective internal risk models for FRTB compliance


What Is the FRTB?


Fundamental Review of the Trading Book (FRTB) regulations are part of the upcoming Basel IV set of reforms and create specific capital-reserve requirements for bank trading desks based on investment-risk models. The new regulations require banks to reserve sufficient capital to maintain solvency through market downturns and avoid the need for governmental bailouts.

Banks are using FRTB mandates as an opportunity to build a firm foundation for future risk management and compliance applications that lowers development and staffing expenses, optimizes reserve ratios, maximizes available capital and drives investment profits.

FRTB Raises Basel Reserve Requirements


In the financial crisis a decade ago, banks worldwide held large risk exposures in their trading books without sufficient capital reserves to weather the length and depth of the plunge in investment markets. As a result, regulators created new data management and capital-reserve requirements to avoid another market meltdown.

In turn, banks created risk compliance models that were tested and approved by regulators. But at many institutions, those models were not maintained, and as time passed, market and internal changes exposed the banks to new investment risks.

Today, risk-compliance problems are addressed by BCBS 239 (Basel Committee on Banking Supervision standard 239) and FRTB (Fundamental Review of the Trading Book) regulations. BCBS 239 puts forth principles for risk-data governance, aggregation and reporting, and associated IT infrastructure.

FRTB standards – which are part of BCBS and the upcoming Basel IV set of reforms – create specific capital-reserve requirements for bank trading desks based on investment-risk models.

Intense Focus on the Trading Desk


FRTB regulators develop guidelines that require banks to reserve sufficient capital to maintain solvency through market downturns and avoid the need for governmental bailouts.

The reserve requirements for trading books are higher than those for banking books, tempting institutions to engage in regulatory arbitrage – the movement of assets between books to affect reserve requirements – a practice that is now being tightly scrutinized and regulated.


Risk management trading desks.

The Role of Internal Risk Models


FRTB regulations include default reserve calculations that result in measurably higher capital requirements designed to account for new levels of trading-book risk unaccounted for by the Basel II risk framework. The higher capital requirements translate directly to lower levels of investment capital, flexibility, revenues and profits.

Banks may accept BCBS’s reserve calculations or develop their own internal risk models to calculate capital-reserve requirements. To use internal-model results, banks must obtain the approval of national regulators by proving how well models represent risk in the banks’ investment strategies.

The approval process requires a bank to forecast hypothetical profits and losses using its model’s calculated capital reserves as well as to backtest the model with real pricing and holdings data. FRTB also requires that internal models implement expected shortfall calculations to address outlying tail risks in investment strategies.

The Importance of Risk Model Approval


To satisfy supervisory authorities of the accuracy of an internally developed risk model, banks must prove all of the following:

    • Their data is complete, accurate and consistent; and the components of the risk model can be traced back to original, authoritative data sources
    • There is sufficient pricing and transaction history to test the model back to 2007
    • Their aggregation rules are accurate and comply with BCBS regulations
    • Their risk models are sufficiently realistic and robust to represent market realities in normal and emergency situations
    • Their framework models historical, current and what-if market scenarios
    • Their policies and procedures for data governance, aggregation and validation are complete and consistently enforced
    • Their IT infrastructure handles intraday fair-market valuations, scheduled reports, and ad hoc requests from internal and external risk supervisors
If a bank fails the regulatory audit, regulators use standard BCBS formulas to determine substantially higher amounts of capital that the bank must reserve to cover potential losses.

If the internal model passes the audit, the model’s calculated capital reserve requirements replace regulators’ default reserve requirements as well as traditional value-at-risk (VaR) measures.


The importance of internal risk modeling approval.

Conclusion


FRTB mandates default reserve requirements that are higher than those typically calculated by banks’ internal risk models. Internal risk model approval therefore leads to lower reserves and higher levels of investment capital, flexibility, revenue and profits.

Implementing this demands the ability to trace data dependencies through many levels of complexity. A graph database offers an effective way to capture all these connections at scale. Neo4j is the world’s leading graph platform.
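As a small illustration of that traceability (the RiskMetric and DataSource labels and the DERIVED_FROM relationship below are hypothetical), a graph query can walk a reported figure back to its authoritative sources in a single statement:

// Trace a risk metric back through every intermediate
// calculation to its original data sources.
MATCH path = (metric:RiskMetric {name: 'expected-shortfall'})-[:DERIVED_FROM*]->(source:DataSource)
RETURN path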

In the coming weeks, we’ll explore how to trace data lineage across data silos and how traditional technologies like spreadsheets, relational databases, and data warehouses fall short. We’ll dive into why banks need a modern graph platform as the foundation for effective internal risk models that meet FRTB requirements.


Risk demands a strong foundation
Find out why leading financial services firms rely on Neo4j graph technology for compliance and innovation with this white paper, Effective Internal Risk Models Require a New Technology Foundation. Click below to get your free copy.


Read the White Paper


5 Ways to Tackle Big Graph Data with KeyLines and Neo4j

Learn about graph visualization for Neo4j using KeyLines
Understanding big graph data requires two things: a robust graph database and a powerful graph visualization engine. That’s why hundreds of developers have combined Neo4j with the KeyLines graph visualization toolkit to create effective, interactive tools for exploring and making sense of their graph data.

But humans are not big data creatures. Given that most adults can hold only four to seven items in their short-term memory, loading an overwhelming quantity of densely connected items into a chart won’t generate insight.

That presents a challenge for those of us building graph analysis tools.

How do you decide which subset of data to present to users? How do they find the most important patterns and connections?

That’s what we explore in this blog post. You’ll discover that, with some thoughtful planning, big data doesn’t have to be a big problem.

The Challenge of Massive Graph Visualization


For many organizations, “big data” means collecting every bit of information available and then figuring out how to use it later. One of the many problems with this approach is that it’s incredibly challenging to go beyond aggregated analysis to understand individual elements.

Learn about the challenges of massive graph database visualization.

20,000 nodes visualized in KeyLines. Pretty, but pretty useless if you want to understand specific node behavior. Data from The Cosmic Web Project.

To provide your users with something more useful, you need to think about the data funnel. Through different stages of backend data management and front-end interactions, the funnel reduces billions of data points into something a user can comprehend.

How the data funnel brings big data down to a human scale.

The data funnel to bring big data down to a human scale.

Let’s focus on the key techniques you’ll apply at each stage of the funnel:

1. Filtering in Neo4j: ~1,000,000+ nodes


There’s no point visualizing your entire Neo4j instance. You want to remove as much noise as possible, as early as possible. Filtering with Cypher queries is an incredibly effective way to do this.

KeyLines’ integration with Cypher means you can give users some nice visual ways to create custom filtering queries – sliders, tick-boxes or selecting from a list of cases.

In the example below, we’re using Cypher queries to power a “search and expand” interaction in KeyLines:

MATCH (movie:Movie {title: $name})<-[rel]-(actor:Actor)
RETURN *, { id: actor.id, degree: size((actor)-->(:Movie)) } AS degree


First, we’re matching Actors related to a selected Movie before returning them to be added to our KeyLines chart:

Learn about graph visualization for Neo4j using KeyLines


There’s no guarantee that filtering through search is enough to keep data points at a manageable level. Multiple searches might return excessive amounts of information that’s difficult to analyze.

Filtering is effective, but it shouldn’t be the only technique you use.

2. Aggregating in Neo4j: ~100,000 nodes


Once filtering techniques are in place, you should consider aggregation. There are two ways to approach this.

First, there’s data cleansing to remove duplicates and errors. This is often time-consuming but, again, Cypher is your friend. Cypher functions like “count” make it really easy to aggregate nodes in the backend:

MATCH (e1:Employee)-[m:MAILS]->(e2:Employee)
RETURN e1 AS sender, e2 AS receiver, count(m) AS sent_emails


Second, there’s a data modeling step to remove unnecessary clutter from entering the KeyLines chart in the first place.

Questions to ask in terms of decluttering: Can multiple nodes be merged? Can multiple links be collapsed into one?

It’s worth taking some time to get this right. With a few simple aggregation decisions, it’s possible to reduce tens of thousands of nodes into a few hundred.
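For instance – assuming the APOC plugin is installed and a hypothetical Person schema keyed on email – duplicate nodes can be collapsed in the backend before they ever reach the chart:

// Collapse duplicate Person nodes sharing an email address
// into a single node (requires the APOC procedures plugin).
MATCH (p:Person)
WHERE exists(p.email)
WITH p.email AS email, collect(p) AS dupes
WHERE size(dupes) > 1
CALL apoc.refactor.mergeNodes(dupes, {properties: 'combine'})
YIELD node
RETURN email, node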

Use link aggregation to reduce graph database nodes.

Using link aggregation, we’ve reduced 22,000 nodes and links into a much more manageable chart.

3. Create a Clever Visual Model: ~10,000 – 1,000 nodes


By now, Neo4j should have already helped you reduce 1,000,000+ nodes to a few hundred. This is where the power of data visualization really shines. Your user’s visualization relies on a small proportion of what’s in the database, but we may then use visual modelling to simplify it further.

The below chart shows graph data relating to car insurance claims. Our Neo4j database includes car and policyholders, phone numbers, insurance claims, claimants, third parties, garages and accidents:

Graph data relating to car insurance claims.

Loading the full data model is useful, but with some carefully considered re-modelling, the user may select an alternative approach suited to the insight they need.

Perhaps they want to see direct connections between policyholders and garages:

Update your data model


Or the user may want a view that removes unnecessary intermediate nodes and shows connections between the people involved:

Update your data model in KeyLines


The ideal visual data model will depend on the questions your users are trying to answer.

4. Filters, Combining and Pruning: ~1,000 nodes


Now that your users have the relevant nodes and links in their chart, you should give them the tools to declutter and focus on their insight.

A great way to do this is filtering – adding or removing subsets of the data on demand. For better performance, present users with a filtered view first, then give them controls to bring in more data. There are plenty of ways to do this – tick boxes, sliders, the time bar or “expand and load.”

Another option is KeyLines’ combos functionality. Combos let users group certain nodes, giving a clearer view of a large dataset without actually removing anything from the chart. It’s an effective way to simplify complexity and to offer a “detail on demand” user experience that makes graph insight easier to find.

Group nodes into combos to give a clearer data set view.

Combos clear chart clutter and clarify complexity.

A third example of decluttering best practices is to remove unnecessary distractions from a chart. This might mean giving users a way to “prune” leaf nodes, or making it easy to hide “super nodes” that clutter the chart and obscure insight.

KeyLines and Neo4j data visualization


Leaf, orphan and super nodes rarely add anything to your graph data understanding, so give users an easy way to remove them.

KeyLines offers plenty of tools to help with this critical part of your graph data analysis. This video on managing chart clutter explains a few more.

5. Run a Layout: ~100 nodes


By this point, your users should have a tiny subset of your original Neo4j graph data in their chart. The final step is to help them uncover insight. Automated graph layouts are great for this.

A good force-directed layout goes beyond simply detangling links. It should also help you see the patterns, anomalies and clusters that direct the user towards the answers they’re looking for.

KeyLines' latest organic layout for data visualization.

KeyLines’ latest layout – the organic layout. By spreading the nodes and links apart in a distinctive fan-like pattern, the underlying structure becomes much clearer.

With an effective, consistent and powerful graph layout, your users will find that answers start to jump out of the chart.

Bonus Tip: Talk to Your Users


This blog post is really just a starting point. There are plenty of other tips and techniques to help you solve big graph data challenges (we’ve not even started on temporal analysis or geospatial visualization).

Probably the most important tip of all is this: Take time to talk to your users.

Find out what data they need to see and the questions they’re trying to answer. Use the data funnel to make that process as simple and fast as possible, and use the combined powers of Neo4j and KeyLines to turn the biggest graph datasets into something genuinely insightful.

Visit our website to learn more about graph visualization best practices or get started with the KeyLines toolkit.

Cambridge Intelligence is a Gold Sponsor of GraphConnect 2018. Use code CAM20 to get 20% off your ticket to the conference and training sessions, and we’ll see you in New York!


Meet graph experts from around the globe working on projects just like this one when you attend GraphConnect 2018 on September 20-21. Grab the discount code above and get your ticket today.

Get My (Discounted!) Ticket