
When Graph Meets Big Data: Obstacles and Opportunities for Visualization

Love the buzzword or hate it, “big data” is an inescapable reality. Whether it’s for business intelligence or bioinformatics, recommendation engines or risk analysis, the volume, variety and velocity of data is only going to increase.

Graph is an undeniably powerful tool for working with big data. Unfortunately, one of graph’s greatest strengths – intuitive visualization – is restricted by an essentially fixed resource: screen size. Even a large monitor only accommodates a few hundred nodes before the graph’s structure becomes difficult to parse.

Too Damn Dense


My first startup, in 2012, leveraged social networks as a trust backing for the sharing economy. (It was also great practice for Buzzword Bingo at GraphConnect 2018). We looked at the graph signatures of many real and fake social network accounts. Although it’s difficult to quantify the difference between these accounts, the difference in patterns presents clearly when visualized.

We had no luck with the popular crowd, though. For example, the graph below depicts a user with ~2,000 friends. The graph layout, with different colors representing separate communities, was done in 2D with Gephi.

Graph layout done in 2D with Gephi.

Although it’s beautiful and does reveal a few distinct clusters, it also reveals the shortcomings of this approach. Most glaring is the giant blob of nodes on the left where everything overlaps. 2D layouts do effectively separate discrete clusters, but they’re hopeless when those clusters are interconnected.

What we’re left with is a graph visualization that only shows, well, the graph. If I want to cluster by parameters, like number of connections, level of activity, location, etc., I can’t do much beyond re-assigning color or size. It’s not easy to visually sort and classify nodes.

Of course, we can bring the data to a separate application to produce 2D scatter plots of those attributes, or spread the nodes on a map. But we’ll lose the connection between those charts and their relationships that are so intuitively captured by the graph.

Furthermore, I can’t easily shift from the (u0:User)-[:friend_of]-(u1:User) perspective above to representations of, say:

  1. (u:User)-[:mentioned_in]->(p:Post)
  2. (p0:Post)-[:mentioned_same_user_with]-(p1:Post)
  3. (u0:User)-[:mentioned_in_same_post_with]-(u1:User)
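
As a hedged illustration, perspective (3) could be derived from the base mention data with a short Cypher query – assuming :User and :Post nodes connected by :mentioned_in relationships, as in the patterns above:

// Materialize the user-to-user co-mention perspective from the base data.
// The (:User)-[:mentioned_in]->(:Post) schema is assumed from the patterns above.
MATCH (u0:User)-[:mentioned_in]->(p:Post)<-[:mentioned_in]-(u1:User)
WHERE id(u0) < id(u1)
MERGE (u0)-[r:mentioned_in_same_post_with]-(u1)
ON CREATE SET r.weight = 1
ON MATCH SET r.weight = r.weight + 1

Once materialized, that derived relationship can be laid out and styled independently of the friend graph.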

Into the 3rd Dimension


Two years later I founded my current startup, Kineviz. Our first client, Box, brought us in to visualize collaboration on their file sharing platform. These “Collab Graphs” were commissioned for the BoxWorks customer conference, where attendees would use gesture controls to interact with the graph on a large screen.



To take full advantage of the gestural interface, we decided to perform the layout in 3D. The first thing that struck us was how this 3D graph with 3,000+ nodes didn’t look nearly as dense as a 2D graph of 300 nodes!

Clusters separated visually without losing interconnections. Intuitively, it made sense that high-dimensional information benefits from a high-dimensional visualization (given that each connection in a graph counts as a dimension). Adding just one dimension, from 2D to 3D, yields an exponential increase in the amount of data that can be comfortably laid out.

Minority Reporting with VR


3D data visualization is hardly new, and it’s not without its drawbacks. On a 2D screen, depth information is lost. And any screen-based visualization suffers from the disconnect between examining a local structure and maintaining the global context (a.k.a. the Google Maps problem: zoom in to street level and you lose the context of the city).

Back in 2014, the Oculus DK2 virtual reality headset had just shipped and we were eager to try it out. Because we’d developed the Collab Graphs in WebGL, the foundation of the WebXR standard, we were able to bring them into VR without too much trouble. We did it partly because it would be cool, but also because we had an intuition this might address the shortcomings of 3D on a 2D screen.

The result was the inspiration for our hybrid VR and 2D data visualization platform, GraphXR. If you haven’t tried VR, it’s hard to convey just how well it mirrors the experience of physical space. Much as you know whether someone’s standing behind you right now (boo!) or where the door is without having to turn and look, VR grants you situational awareness of a complex pattern without having to constantly zoom in and out or jump between views.

A great deal of research remains to be done before we can make definitive claims about the efficacy of VR data visualization, but anecdotally, users of our first generation tool reported between 15x and 150x(!) speed gains in analyzing their data.



It’s worth mentioning that, while the future is bright for Extended Reality (XR is the superset of VR and Augmented Reality), we have yet to see the benefits of VR dataviz carry over to AR.

The current generation of AR headsets suffers from a narrow field of view, effectively limiting your viewing area to a screen. Because content never enters the viewer’s peripheral vision, it isn’t loaded into the brain’s spatial buffer the way it is in VR. This limitation will undoubtedly be eliminated in future generations of AR headsets.

The Big Picture


These are exciting times to be a data nerd, especially if visualization plays a significant role in your work. I’ve only covered a handful of the challenges and solutions to visualizing large graphs.

Bloom and Neo4j’s whole ecosystem of data visualization partners offer a range of strategies for working with big data. Skyrocketing GPU power enables Graphistry to address large graph layout problems, while companies like 3Data and Virtualitics explore the possibilities of VR data visualization. Like big data and graph adoption, the options for visualization are only going to grow.


Kineviz is a Silver Sponsor of GraphConnect 2018. Use code KIN10 to get 10% off your ticket to the conference and training sessions, and we’ll see you in New York!


Meet graph experts from around the globe working on projects just like this one when you attend GraphConnect 2018 on September 20-21. Grab the discount code above and get your ticket today.

Get My (Discounted!) Ticket

Fighting Money Laundering and Corruption with Graph Technology

The shocking revelations of the International Consortium of Investigative Journalists (ICIJ), who released both the Panama and Paradise Papers, as well as the West Africa Leaks, have shown that aggressive tax avoidance and money laundering are a widespread and worldwide problem.

Money laundering often correlates with other illegal activities such as terrorist financing and corruption in politics and businesses, while tax avoidance leads to political and social tensions.

Why Risk Management and Money-Laundering Prevention Is Important


The total volume of money laundering worldwide is estimated at 2,000 billion U.S. dollars – an incredibly high figure (for Germany alone, the estimate is 100 billion euros). No wonder governments and companies are stepping up their efforts to combat these illegal activities, leading to drastically tightened regulatory requirements in many areas.

Corporations of all sizes are facing the challenge of adjusting both their business processes and risk management practices. Companies with a decentralized, multi-layered structure and a huge customer base (e.g. franchise systems) face a constant regulatory demand for risk control and network transparency on a large scale.

If a business can’t comply with all regulatory requirements quickly enough, it will slow down and dry up in the long term.

What Is AML Compliance?


Anti-money laundering (AML) compliance is more than just ticking a box on a spreadsheet. An effective AML compliance management solution means conforming to all regulatory rules (laws, policies and regulations) by instituting a variety of measures:

    • Risk analysis
    • Transaction monitoring
    • Supporting money laundering officers
    • Employee training and review
    • KYC (know-your-customer) principle
    • PEP (politically exposed person) checks
    • UBO (ultimate beneficial ownership) identification
    • Whistleblower systems
    • Case management and interface with authorities
    • Documentation
    • Long-term, audit-proof archiving

Why do you need an IT system in the first place?


The huge amounts of data you deal with have inherently complex network structures.

There are individuals acting in different roles for all types of entities, records documenting relationships and interactions, and millions of financial transactions, each of which could be part of a money-laundering pattern.

It goes without saying that you can only handle this amount of data with an IT system, and KERBEROS, the solution we have built, is a good example of a system that meets these requirements.

But data is just one part of the game: A business-supporting IT solution needs to be practical, cost-efficient and flexible. Most importantly, the solution needs to be effective. Companies need a detailed “X-ray vision” beyond the first layer of huge data to effectively understand, visualize, control and mitigate their risks 24 hours a day, seven days a week, for all 365 days of a year.


Fig 1. Comprehensive AML Report generated in real-time based on graph data.


What are the challenges for building an effective AML solution within your IT system?


What would a fully digitized 360-degree risk management and compliance solution look like if you must guarantee full compliance across a complex organization at reasonable costs? And why is it hard to build a 24×7 software tool to ensure premium compliance in an organization with a lean and cost-sensitive resource investment?

It’s not only the volume of the data you need to take into account, but also its structure. To store and efficiently manage these kinds of networks, you need the right database, one that has a data model optimized for connected data. You need a native graph database.

Native graph databases like Neo4j, the database selected for the solution, are designed and built from the ground up to store data in the form of nodes and relationships – no other database model is as efficient and optimized for handling connected data.
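
To make this concrete, here is a minimal Cypher sketch of the kind of ownership-chain traversal a UBO check relies on. The labels, the :OWNS relationship and its share property are illustrative assumptions rather than the actual KERBEROS data model, and $companyName is a hypothetical parameter; the 25% threshold reflects the common regulatory definition of an ultimate beneficial owner.

// Illustrative only: labels, relationship type and the 'share' property are assumed.
// Follow chains of ownership and flag persons whose effective shareholding
// in the target company exceeds the common 25% UBO threshold.
MATCH path = (p:Person)-[:OWNS*1..5]->(c:Company {name: $companyName})
WITH p, reduce(share = 1.0, r IN relationships(path) | share * r.share) AS effectiveShare
WITH p, sum(effectiveShare) AS totalShare
WHERE totalShare > 0.25
RETURN p.name AS potentialUBO, totalShare
ORDER BY totalShare DESC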

The other important component to a comprehensive AML solution is a flexible application platform that allows you to develop the “solution” part in an agile way. Structr, the product used here for building the application, provides built-in functionality so developers can focus on the important issues instead of being slowed down by annoying standard tasks.

In addition, the entire definition of the application is stored in the native graph database, allowing the developers to evolve the application with an unprecedented level of flexibility.


Fig 2. Algorithm to calculate UBO status as defined in Structr’s Flow Editor.


Flexibility is key: It helps developers follow the evolution of legislation, which is a moving target, but also assists in mapping the non-uniform, inconsistent regulations that international companies face. Only with a high level of flexibility are companies able to adapt their business processes and risk management practices fast enough to survive.

The latest evolutionary step in the KERBEROS solution is Structr’s brand-new Flow Engine (see Fig. 2). It allows business users to define and run database queries and algorithms as a graph without the help of developers, using only the integrated Flow Editor.

Summary


The same graph technology that helped the bold data journalists of the ICIJ analyze and structure leaked data about business entities, wealthy individuals and public officials is just as powerful for other use cases. It turns out graph technology is also ideal for overcoming rigorous compliance challenges such as multi-layered risk management regulations, e.g. anti-money-laundering.

The German company KERBEROS Compliance Management Systems GmbH [1], together with a team at the Neo4j partner Structr [2], has developed an effective IT solution for risk management and money-laundering prevention. It’s a great example of how to fight money-laundering and therefore corruption with Neo4j as the graph platform and Structr as the graph application platform – two products that perfectly match.

Meet Us at GraphConnect New York


At the upcoming GraphConnect 2018 conference in New York (Sep. 20th, 2018), Christian Tsambikakis, Julian Schibberges and Axel Morgner will do an in-depth review on how creating an effective compliance management system is only possible with graph technology. They will also present the upscaled solution they’ve built with Neo4j and Structr for one of the top international providers of sports betting and casino games, enabling them to detect, analyze, quantify, document and report suspicious connections and transactions between their customers and entities of their business partner network.


Structr is a Gold Sponsor of GraphConnect 2018. Use code STR20 to get 20% off your ticket to the conference and training sessions, and we’ll see you in New York!

Meet graph experts from around the globe working on projects just like this one when you attend GraphConnect 2018 on September 20-21. Grab the discount code above and get your ticket today.

Get My (Discounted!) Ticket

GraphConnect 2018 Agenda: Everything You Need to Know

You already know there are many exciting reasons you should attend GraphConnect 2018 – and, of course, we’ve got you properly jazzed on this year’s roster of must-see presenters (including our two most recently announced keynote presenters).

But the best reason of all to buy the ticket and take the ride? A bottom-to-top plentiful and packed agenda of all the keynotes, presentations and lightning talks you can attend at the graph technology event of the year.

Check out the highlights and final agenda for talks and more at GraphConnect 2018.

See the full agenda right here on GraphConnect.com or check out the highlights below:

Keynotes


Kicking off the day will be two keynotes, the first given by Neo4j’s own CEO, Emil Eifrem, who will be talking about the state of the graph union, and – more importantly – where he sees the graph world going in the future (hint: lots of places).

As always, Emil reserves the biggest announcements of the year for his GraphConnect keynote, so you won’t want to miss this big start to an action-packed day.



Shortly after Emil blows your freakin’ mind, Hilary Mason – the GM of Machine Learning at Cloudera – will blow your freakin’ mind again with her keynote “The Present and Future of Artificial Intelligence and Machine Learning.”

As the founder of Fast Forward Labs (acquired by Cloudera) and former Chief Data Scientist at Bitly, Hilary knows what the hell she’s talking about when it comes to machine learning and AI (no buzzword commandeering here).

Closing us out for the day will be Stephen O’Grady, Principal Analyst and Co-founder of RedMonk, a developer-focused industry analyst firm.

Stephen’s areas of expertise center on open source, cloud computing, databases (relational and NoSQL), application development and big data. As a go-to thought leader in the technology space, he’s been quoted in such respected publications as Businessweek, The New York Times and The Wall Street Journal.

Key Topics


AI & Machine Learning: AI and machine learning are on everyone’s minds, but – as we’re starting to see – graph technology is emerging as a key to the context necessary for computer systems to “learn.” At this year’s GraphConnect, there will be no lack of talks, case studies and general discussions about the power of graph technology in supporting AI and machine learning.

Be sure to catch the following presentations:

Data Discovery & Graph Visualization: Data discovery is critical to business analysis when it comes to big data. But data discovery is only as effective as the tools used to expose patterns and outliers hidden in the data. At this year’s GraphConnect, you’ll have plenty of chances to hear discovery and data visualization experts speak on how graph technology is their tool of choice.

Highlights of this key topic area include:

Biotech & Healthcare: No doubt, biotechnology and healthcare organizations accumulate large, complex volumes of data often stretching across an entire enterprise. From life-saving research to improving the healthcare system as a whole, this year’s GraphConnect features speakers who are using graph technology to drive necessary change.

Must-see presentations on these industries include:

Knowledge Graphs: A knowledge graph is conceptually on point in terms of search, but to be truly effective it requires a highly contextual search solution. Neo4j augments knowledge graph-based search to deliver only relevant results. At this year’s GraphConnect, you will see firsthand how businesses are using graph technology to improve search capabilities of product, services, content and knowledge catalogs.

Talks you won’t want to miss include:

Digital Transformation: You know your business would benefit from leveraging existing and emerging technologies, but making this transformation comes with some reasonable trepidation. Will it be a huge pain? Will it be prohibitively expensive? Graph technology is leading the way for digital transformation to be strategic, flexible and quickly deployed.

Talks you won’t want to miss include:

In between talks, be sure to visit the DevZone and Graph Clinic to rub elbows with Neo4j graph database experts.

Throughout the Day: GraphClinics, DevZone and Fikas


All day, whenever you’re between presentations or lightning talks, the GraphClinics are open for free consulting and troubleshooting of your Neo4j deployment. The GraphClinics are staffed by Neo4j engineers and consultants, so you’ll receive invaluable tips and insights from experts of the leading graph database technology.

The Neo4j Developer Relations team is also hosting a DevZone, where members of the team will be on hand to talk with you about using Neo4j Bloom, graph algorithms and AI, ETL tools, graph-based solutions and more!

And be sure to visit the 5th floor foyer for a quick fika, where you’ll have an opportunity to meet Neo4j executives and engineers.

Celebrate an enlightening, busy day by grabbing some snacks and a drink at the disConnect party.

The disConnect Party: (allNodes)-[:PARTY_WITH]->(allOtherNodes)


After a full day of graph tech talks, the time will be nigh to disconnect from all the heady topics, grab a drink and a snack, and start building relationships with your fellow engineers and business execs.

The post-conference disConnect party is located in the Sponsor exhibition hall in the 5th floor lobby of the Marriott Marquis. This is your chance to mingle with graph enthusiasts and chat about all the plans you have to bring about powerful change with graph database technology.


With such a jam-packed agenda and all-star, prestigious speaker lineup, you simply can’t afford to miss GraphConnect 2018. Click below to get your ticket, and we’ll see you on September 20th!

Get Your Ticket

Join Us at GraphHack 2018: Neo4j Buzzword Bingo Hackathon

GraphConnect is almost here, which means it’s time for our annual GraphHack Hackathon!

Join other graph hackers for a fun day of building applications featuring Neo4j’s great integrations with other popular technologies (a.k.a. buzzwords). This year’s event will be hosted at the Stack Overflow office (28th floor) in New York City all day on Saturday, September 22nd.

All Your GraphHack Questions Answered


Do I need a GraphConnect ticket to attend?

Nope. While the GraphHack is always tied to GraphConnect, you don’t need a ticket to attend. We want as many people from the Neo4j community (veterans or newbies) to be able to attend.

Note that you must present a picture ID to enter the building.

What’s this year’s theme?

The topic for this year’s event is Buzzword Bingo.

This year we thought it would be fun to highlight many of the useful Neo4j integrations with other technologies in a “Buzzword Bingo” format. This means that teams will be building applications using Neo4j and other technologies listed on their Bingo cards.

Don’t worry, our Bingo rules are very flexible – a valid submission can use any four technologies listed on the card.

Learn all about the GraphHack 2018 hackathon happening soon after GraphConnect 2018 in New York City

(this card is only an example)


Will there be prizes?

Yes, totally! This year’s prizes include:
    • Oculus Rifts
    • GoPros
    • Bose SoundLink Color Bluetooth Speakers
    • And more!

What’s the full agenda for the hackathon?

Here’s a quick breakdown of the GraphHack schedule:
    • 9:00-10:00 a.m. – Optional Neo4j Workshops (includes breakfast; details)
    • 10:00 a.m.-3:00 p.m. – GraphHack! (includes lunch)
      • 10:00-10:30 a.m. – Form hacking teams
      • 10:30-11:00 a.m. – Presentation / kickoff
      • 11:00 a.m.-3:00 p.m. – Hacking
    • 3:00-4:00 p.m. – Presentations
    • 4:00 p.m. – GraphHack cocktail hour (because you deserve it!)
Plan to spend the day hacking on a cool graph application with new friends.

What if I’m new to Neo4j and graph databases?

If you’re not currently a graph hacker or want to learn more (or brush up on old skills), then come early for a hands-on workshop with an intro to Neo4j and an overview of many of the popular “buzzwords” (i.e., Neo4j integrations) featured at the hackathon.

The optional workshops happening from 9:00 a.m. to 10:00 a.m. include:

    • Intro to Neo4j: New to graph databases and Neo4j? Start here. This workshop will start with an overview of the property graph data model, graph thinking, an introduction to querying Neo4j using Cypher and the Neo4j Browser. This workshop will show how to use the Neo4j Sandbox to work with existing datasets and load your own data.
    • Intermediate Neo4j: For those who have some basic experience with Neo4j, this workshop will focus on how to use the Neo4j drivers to build an application using Cypher, and it will cover more advanced data import techniques like loading from web APIs and using the APOC library.
    • Full-Stack Development with GRANDstack: Learn how to build modern web applications backed by Neo4j using GraphQL, React and Apollo (i.e., the GRANDstack).
    • Natural Language Processing (NLP) with Neo4j: Learn how to apply NLP techniques to enrich the graph model and find insights within large text-based datasets.
    • Building Graph Apps to Run on Neo4j Desktop: Part of the Graph Platform, Graph Apps run directly in Neo4j Desktop and leverage components specifically designed for building graph applications.
    • Spatial Neo4j: Learn how to take advantage of geospatial functionality in Neo4j. This workshop will cover using spatial features with Cypher (see the short sketch after this list) and how to use OpenStreetMap data for tasks like routing.
    • Data Visualization with yFiles: Visualize and work with graph data using the yFiles data visualization tool from yWorks. Learn how to build interactive graph applications and visually explore your data.
    • GPU Visual Analytics Hands-On with Graphistry: Join Graphistry’s CEO to explore combining GPU visual technology with Neo4j and notebooks, and hear what’s coming from GoAi GPU startups.
Please arrive early if you plan to attend a workshop as they will begin at 9:00 a.m.
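
For a taste of the spatial workshop topic, here is a minimal Cypher sketch using Neo4j’s built-in point() and distance() functions; the :PointOfInterest label and its properties are hypothetical.

// Find points of interest within 500 meters of an example location.
// :PointOfInterest and its lat/lon properties are hypothetical.
WITH point({latitude: 40.709, longitude: -74.006}) AS venue
MATCH (poi:PointOfInterest)
WITH poi, distance(point({latitude: poi.lat, longitude: poi.lon}), venue) AS meters
WHERE meters < 500
RETURN poi.name, meters
ORDER BY meters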

What are the categories for winning?

Judges will award prizes to winners in the following categories:
    • Most complete application
    • Best use of Cypher
    • Most buzzwords used
    • Best fully functional application

What are the other rules for the GraphHack?

Here are just a few other rules to keep in mind:
    • You must join a team. (It’s okay to come alone, meet new friends, etc.)
    • You must present at the end of the day for a chance to win.
    • Please include a slide or diagram that shows the different “buzzwords” (integrations) your team used.
    • A valid submission should use any combination of four technologies listed on their Bingo card.
    • You must register your team on the HackDash board so we can keep track of your team members and progress during the hackathon. Be sure to tell us what technology buzzwords you are using in your project description.

Anything else to keep in mind?

You will need a laptop and power cord. We also suggest that you already have Neo4j installed. If you’re not familiar with Neo4j, then definitely attend one of the 9:00 a.m. workshops!

Download Neo4j or try out the (free) hosted Neo4j Sandbox suitable for development.

One last thing: Guests will need a photo ID to enter the building!

GraphHack 2018 Partners


The GraphHack 2018 is in partnership with our friends at:

Logistics & Details:


Date & Time:
Saturday, September 22nd, 2018
9:00 a.m. – 4:00 p.m. EDT

Location:
Stack Overflow
110 William Street
28th Floor
New York, NY 10038
United States


Join us for GraphHack 2018 in New York City!
Click below to RSVP on Eventbrite – see you there?


Sign Me Up

Custom Visualization Solutions: Getting the Most Out of Your Data

Neo4j is perfect for storing and processing large amounts of connected data. For example, in web analytics, where click-paths and custom events get logged, it is easy to get that data into Neo4j. Once the data is in the graph database, complex queries may be executed that help both site administrators and site owners get a good understanding of how their users and visitors use the website or ecommerce store.

Sure, the results are used to improve the predictions of recommendation engines. But with that knowledge, shop architects can also optimize the structure of their sites and further improve the customer experience.

Importing the data into Neo4j is one thing – creating or optimizing recommendation engines is a whole different story for which you will find lots of information here, and at this year’s GraphConnect.

Discover custom data visualization solutions with Neo4j and yFiles.

However, one aspect that is often overlooked is how the visualization of graphs helps in understanding the query results and the automatic decisions of recommendation engines, as well as in planning for optimizations of customer journeys.

The Neo4j browser is an invaluable tool for experimenting with Cypher queries and refactoring the schema of your database contents. The developer tool also comes with a visualization component that displays query results in a graphical manner.

The simple, low-level interface is great for developers, but less technically inclined end-users prefer less-generic, highly-customized solutions that suit the exact requirements of their specific use-cases.

There are several aspects that need to be considered to get the most out of data visualization and the best end-user experience:

Rich Visualization


Being able to visualize connected items and their properties in diagrams helps a lot in understanding the data and processes. For the best user experience, rich element visualizations should be used.

Many generic solutions provide a master/detail view of the graph, where mainly the structure of the visualization is presented in one part of the screen. The details behind each of the elements are seen in a separate, mostly textual detail window beside the main visualization.

This quickly becomes difficult to use once the diagram is explored in more detail. Users have to manually select each of the elements one by one to find more information about the element. Exporting such a diagram for reports becomes almost impossible, because the detail information is missing from the visualization or is only available for a single element.

With rich item visualizations, more information can be put into the diagram and included in the visualization for every element on the screen or later in the report.

For example, in a customer journey diagram, not only the name or ID of the page could be shown in the visualization, but also numeric data like “time spent on page,” “drop-off-rate,” “average page value” and “number of visitors.”

These values can be added to the visualization of each element. They don’t need to be represented as naked numbers either, but data can be implemented via color-coding, gauges, level-meters, varying sizes and so on.

Of course, auxiliary textual data can be displayed as well, along with one or more badges that indicate a certain state of the page or event (“high-performer,” “needs-attention,” “data-anomaly”). These techniques result in drastic improvements in the user experience.
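
As a rough illustration, the data behind such a rich journey visualization might be gathered with a query like the one below. The :Page label, the :NAVIGATED_TO relationship and the property names are hypothetical stand-ins for whatever your analytics model uses; mapping the numbers to gauges, colors or badges happens in the visualization layer.

// Hypothetical journey model: (:Page)-[:NAVIGATED_TO]->(:Page), metrics stored as properties.
MATCH (p:Page)
OPTIONAL MATCH (p)-[nav:NAVIGATED_TO]->(:Page)
RETURN p.url AS page,
       p.avgTimeOnPage AS timeSpentOnPage,
       p.dropOffRate AS dropOffRate,
       p.avgPageValue AS averagePageValue,
       count(nav) AS outgoingNavigations
ORDER BY dropOffRate DESC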



Just like in Neo4j, where different node labels depict a different type of entity in the database, the visualization should make it easy for the user to tell the difference between the different kinds of elements. Not all elements need to follow the same visualization scheme. Instead, the visualization should make it easy for the user to recognize which types of elements are visualized.

Sometimes there is a problem with having many details shown inside the visualization: If the number of elements in the diagram or on the screen gets large – if the user needs to zoom out of the diagram to understand the bigger picture – that additional information can become distracting and make the graph impossible to decipher.

In this case, sophisticated visualization solutions can use a technique called “level of detail rendering.” Depending on the number of elements on the screen, or the zoom level, element visualizations of varying detail can be employed.

Once the user zooms into the diagram, more information becomes visible. If the visualization permits, switching between those levels can be animated, resulting in smooth transitions between the various levels of detail.



Arranging Elements Meaningfully


Another aspect of diagram visualization that’s just as important as how to render the elements on the screen is where to place those elements. For many applications, the location of the elements in the diagram can encode additional information for the user right into the visualization. Simple solutions only try to reduce the distance between elements that are interconnected by relations.

More advanced placement algorithms can encode additional information into the layout of the diagram. That information could be importance or a total ordering of elements. More important elements could be rendered on top of the drawing. Or, if there is a flow of data or goods, a common main direction of the connections in the diagram helps a lot in understanding the flow through the data.

But there are more subtle things than flow and order. Alignment and different connection routing styles help just the same – not all relationships must be visualized by an arrow pointing from one element to another. Especially for hierarchically nested structures and relationships, using container visualizations or partitions in the background is often the better choice and leads to less cluttered, more enjoyable diagrams.



Sometimes there isn’t a single aspect or layout that suits the application domain, and sometimes the diagrams’ contents change over time.

For these scenarios, it’s important to have versatile layout algorithms, as well as smooth transitions between the various visualizations. The latter helps the users keep their mental maps of the diagram data. Watching the animations is a pleasure, too.

Interaction and Animation


Finally, interactivity is king when it comes to complex visualizations. If you can give the user the option to interact with the visualization, they will not only enjoy navigating and exploring the database; they will be able to do this a lot more efficiently, too.

This is not just about being able to pan the drawing area. Even for the simple use-case of exploration, things like mouse-hovers, highlights, tool-tips and context menus, contextual filters, guided view port animations, etc. can improve the user-experience dramatically.



A Perfect Match: Neo4j and yFiles


Neo4j is the right tool to store and query your data, and the Neo4j Browser is great for developers.

For end-user facing applications, more elaborate visualization solutions exist. For connected data, the graph visualization library yFiles is a perfect match, since it provides all the possibilities described above.

Here’s how yFiles is employed in the above scenarios:

    • With yFiles, you don’t need to just render the raw data in the database. With software-defined mappings you can show more useful abstractions of the data stored in Neo4j.
    • Create perfect item visualizations, specific to the data that will be displayed on the screen. Level of detail rendering and custom reactive displays of the data result in easy-to-understand-and-follow diagrams and visualizations.
    • With the right layout, the structure of the data and their dependencies and relationships can be highlighted. Clever edge-routing and labeling algorithms ensure that no information is hidden behind other items. This removes the need for the user to manually untangle the diagram. The yFiles library comes with the most complete set of diverse automatic layout algorithms for you to play with and choose from.
    • Interactions and built-in animations help the user understand and navigate the diagram more easily and improve the user experience.
    • You can add user interactions and the ability to edit both the structure and the properties of the diagram. Let interactions trigger updates in the Neo4j database or in external systems.
    • Embed the visualizations into new or existing end-user applications as a white-label solution. Create standalone applications or integrate the functionality into larger dashboards and tools.
It is important to understand that yFiles is much more than an off-the-shelf application with only limited customization options. yFiles is a powerful tool for developers to create applications that meet even the most advanced requirements.

Explore the demos and download yFiles today and enjoy the endless possibilities you get when combining it with the power of Neo4j.

Happy diagramming!

yWorks is a Gold Sponsor of GraphConnect 2018. Use code YWO20 to get 20% off your ticket to the conference and training sessions, and we’ll see you in New York!

Meet graph experts from around the globe working on projects just like this one when you attend GraphConnect 2018 on September 20-21. Grab the discount code above and get your ticket today.

Get My (Discounted!) Ticket

Bring Order to Chaos: A Graph-Based Journey from Textual Data to Wisdom

Data is everywhere. News, blog posts, emails, videos and chats are just a few examples of the multiple streams of data we encounter on a daily basis. The majority of these streams contain textual data – written language – containing countless facts, observations, perspectives and insights that could make or break your business.

Be overwhelmed by this graphic of data generation.


Data in its native form is of little use on its own: it is sparse, distributed and unstructured – it is chaotic.

To make sense of the data, we have to transform and organize it – a process that produces information. However, turning information into knowledge, something that is actually learned, requires more work. Knowledge is connected information. There is a big jump between information and knowledge: it is a change in quality, and not an easy one. It requires a transformation process which, by connecting the dots, creates sense, significance and meaning from the information.

Discover how to turn data into wisdom through visualization.

Insight and wisdom are above knowledge. They aim to identify meaningful pieces of information and relate them to each other by using, for instance, cause-effect relationships, similarity or dissimilarity. Insight and wisdom gained from connected data provide guidance on producing better products, making users happier, reducing costs, delivering new services, etc.

This is how to realize the full value of data, after a long transformation path, in which machine learning provides the necessary “intelligence” for distilling value from it. The graph database supports a proper descriptive model for representing knowledge, as well as a powerful processing framework to get wisdom back in return.

A mental shift (away from classical KPI-based thinking), new computational tools and a proper representational model are required to help organize and analyze vast amounts of information.

This blog post describes some of the techniques needed to bring order to the chaos of unstructured data using GraphAware Hume (formerly known as GraphAware Knowledge Platform) and Neo4j.

Hume transforms your data into searchable, understandable and actionable knowledge by combining state-of-the-art techniques from natural language understanding, graph analysis and deep learning to deliver a wide range of solutions for your most challenging problems.

Step 1: Representation Matters


Whether you are working on an enterprise search engine, a recommendation engine or any kind of analytics platform, the traditional approach to organizing text, based on a pure inverted index – common to all search engines – is not flexible enough to handle the multiple machine learning algorithms required for processing it. An inverted index organizes data for fast retrieval; it doesn’t produce or store any knowledge.

The task of transforming data into knowledge has two main challenges: knowledge representation and knowledge learning and construction.

Knowledge representation refers to the way in which information is modeled so that a computer program can access it autonomously for solving complex tasks. It plays a fundamental role since, if properly designed, it speeds up processing by making the concepts [re]usable and extensible. It represents an ordered and connected version of the same information that’s otherwise isolated, distributed and disorganized.

A knowledge graph is the representational model used in Hume. Knowledge graphs consist of a set of interconnected typed entities and their attributes. Here, the knowledge graph sits in the middle of the evolutionary path of data and represents the concrete enabler for AI. It collects and organizes the data from multiple data sources and analyses results, providing flexible and extensible access patterns to it.

Hume uses a combination of frameworks and technologies borrowed from Natural Language Processing [1] (NLP) and, more generally, machine learning as well as external knowledge sources for knowledge learning and construction. Hume’s knowledge graph creation and analysis process is described in the following image.

See a process workflow of data.

The order of the steps above can change and each step can be executed multiple times. Step by step, the knowledge graph grows in content and capability to organize and connect concepts and documents. At first, Hume extracts the text’s structure, and represents it in the first knowledge graph.

View this representation of the first graph visualization extracted from Hume.

Hume’s knowledge graph has been modeled to allow multiple representations of the text for feeding other algorithms in the pipeline.

Let’s consider the most common:

Bag of Words (BoW): represents a text (such as a sentence or a document) as the multiset (a bag) of its words, disregarding grammar and order but keeping the frequency, which represents each word’s weight in the vector.

TF-IDF: extends BoW’s weighting scheme, which is based on the raw frequency of words in the text (Term Frequency, TF), by weighting it relative to the number of times the words occur in the overall corpus (Inverse Document Frequency, IDF). Words that appear more often in the current text than in the corpus as a whole are more relevant.

N-Gram: BoW and TF-IDF lose a lot of the meaning inherent in the order of words in the original sentence. By extending the representation to include multi-word tokens the NLP pipeline can retain much of the meaning inherent in the order of words in our statements. N-Grams are sequences containing up to N tokens which appear one after the other in the original text.

Co-Occurrence Graph: a graph representation of a document where each node is a word and an edge between two words exists if they appear together in an N-gram. This is a totally different text representation compared with the vector-based ones. In Hume, it is the input for keyword extraction algorithms, which use PageRank to find the most interesting words in the text. Details are available in a previous blog post. Here’s an example of how to extract a BoW vector from Hume’s knowledge graph:

// Build the bag-of-words vector for one document.
// $documentId is an assumed parameter holding the document's internal id.
MATCH (n:Document)
WHERE id(n) = $documentId
MATCH (n)-[:HAS_ANNOTATED_TEXT]->(:AnnotatedText)-[:CONTAINS_SENTENCE]->(:Sentence)-[r:HAS_TAG]->(t:Tag)
WITH n, t, sum(r.tf) AS tf
RETURN collect(t.value + " : " + tf) AS BoW


Which generates the following result:

Bag of words.
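
The TF-IDF representation described above can be derived from the same schema. The following is a hedged sketch using the textbook tf · log(N/df) weighting; Hume’s built-in implementation may differ in detail, and $documentId is again an assumed parameter.

// Hedged sketch: TF-IDF derived from the same schema as the BoW query above.
MATCH (d:Document) WITH count(d) AS totalDocs
MATCH (n:Document)-[:HAS_ANNOTATED_TEXT]->(:AnnotatedText)-[:CONTAINS_SENTENCE]->(:Sentence)-[r:HAS_TAG]->(t:Tag)
WHERE id(n) = $documentId
WITH totalDocs, t, sum(r.tf) AS tf
// document frequency: in how many documents does this tag occur?
MATCH (t)<-[:HAS_TAG]-(:Sentence)<-[:CONTAINS_SENTENCE]-(:AnnotatedText)<-[:HAS_ANNOTATED_TEXT]-(other:Document)
WITH totalDocs, t, tf, count(DISTINCT other) AS df
RETURN t.value AS term, tf * log(toFloat(totalDocs) / df) AS tfIdf
ORDER BY tfIdf DESC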

Step 2: Every Word Counts


The first step extracts the text’s hidden structure using grammatical and lexical analysis. This analysis creates a basic graph that can be used for further analysis, but it doesn’t provide any hint about the meaning of the words or their semantic relationships.

The second step uses machine learning techniques and external sources to enrich the knowledge graph with word’s meanings.

Named Entity Recognition


Named entities are specific language elements that belong to certain well-known categories, such as people names, locations, organizations, chemical elements, devices, etc.

Recognizing them allows Hume to:

    • Improve search capabilities
    • Connect documents (e.g. connecting people in a financial document with information from a business registry)
    • Relate causes (e.g. weather conditions, accidents, news) with effects (e.g. flight or tram delays, stock price changes)
There are several approaches to Named Entity Recognition, which typically require extensive training or complex configuration.

By combining multiple techniques and algorithms, Hume delivers high-quality Named Entity Recognition models. We’ve created NER models that can be quickly added to projects for some of the most common use cases, like companies, people, points of interest, etc. Adding named entities to the knowledge graph gives Hume more contextual information to use for building connections.

View this visual data extract from Hume.

Word2Vec


BoW, TF-IDF and N-Grams treat words as atomic units. The advantage of that approach is simplicity and robustness. However, to transform text into knowledge, you need to identify semantic relations between words.

Word2Vec is a deep learning algorithm that encodes the meaning of words in vectors of modest dimensions [2]. The algorithm learns the meaning of words by processing a large corpus of unlabeled text. No one has to tell the algorithm that the “Timbers” are a soccer team, that Los Angeles and San Francisco are cities, that soccer is a sport, or that a team is a group of people. Word2vec can learn those things and much more on its own. All you need is a corpus large enough to mention “Timbers,” “Los Angeles” and “San Francisco” near other words associated with soccer or cities.

Hume provides comprehensive support for word2vec including:

    • Computing word2vec from the imported corpus
    • Importing word2vec (tested with Numberbatch and Facebook fasttext)
    • Computing similarity between words
Computing or importing the vector for each tag in the knowledge graph allows you to extend tag nodes with a property that can be used to compute semantic distances between words. These distances are valuable since they express how closely two words are related and can be used in multiple ways.

For instance, in Hume, the distances are used for filtering out spurious named entities or finding more relevant concepts in the ontology hierarchies imported (described later).

View this Hume filter for spurious names.
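
For illustration, the cosine similarity between two word vectors can even be computed directly in Cypher, assuming the vectors are stored as a word2vec list property on Tag nodes as described above (the two tag values are passed as hypothetical parameters):

// Cosine similarity between the word2vec vectors of two tags.
MATCH (a:Tag {value: $word1}), (b:Tag {value: $word2})
WITH a.word2vec AS v1, b.word2vec AS v2
WITH reduce(dot = 0.0, i IN range(0, size(v1) - 1) | dot + v1[i] * v2[i]) AS dot,
     sqrt(reduce(s = 0.0, x IN v1 | s + x * x)) AS norm1,
     sqrt(reduce(s = 0.0, x IN v2 | s + x * x)) AS norm2
RETURN dot / (norm1 * norm2) AS cosineSimilarity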

Ontology Enrichment


Sometimes the text in the corpus is not comprehensive enough for machines to automatically find the kinds of connections that humans can easily find.

Suppose you are analyzing some news and you find two articles describing earthquakes that were felt in Los Angeles and San Francisco respectively. The machine can easily identify the two cities as locations, but it may not connect these two events because they happened in distinct locations.

To solve this problem, Hume integrates with multiple external knowledge bases. These knowledge bases are designed to help computers understand the meaning of words by building a hierarchy of concepts. Hume queries external knowledge on demand to find new relationships.

The Cypher procedure that implements enrichment can be invoked as follows:

MATCH (n:Tag)
CALL ga.nlp.enrich.concept({enricher: 'conceptnet5', tag: n, depth:1, admittedRelationships: ["IsA","PartOf"]})
YIELD result
RETURN result


In our example scenario, Hume will learn that Los Angeles and San Francisco are both located in California, which gives it another way to connect the two events in the news articles. The enriched version of the knowledge graph looks like:

Enriched version of the knowledge graph.

Step 3: Close To Me


A powerful navigational pattern for large datasets is finding related content based on similarity. While reading a paragraph, it could be helpful for the reader to be able to find other content that expresses the same idea in a simpler or more detailed way.

Hume supports similarity computation at different levels, including documents, paragraphs, sentences and words through simple procedures.

// Compute pairwise cosine similarity between the word2vec vectors
// of all Tag nodes that carry a vector.
MATCH (a:Tag:VectorContainer)
WITH collect(a) AS nodes
CALL ga.nlp.ml.similarity.cosine({
  input: nodes,
  property: 'word2vec'
})
YIELD result
RETURN result;


Storing distances or similarities (whichever way you prefer to see them) is a trivial task in a graph; here is the result:

Knowledge graph storing distances or similarities.

Similarities between items are useful not only for navigation – they are part of graph construction techniques which help to create a graph where we can run PageRank to identify, for instance, relevant paragraphs. This approach allows Hume to provide summarization.

Step 4: Like By Like


The typical (old style) way we access and navigate information is by using search and link. We type keywords into a search engine and find a set of documents related to them. We then go over the documents in that result set and possibly navigate to other linked documents.

This approach is a useful way for interacting with online archives, but has many limitations since you have to know upfront the keywords and the filters. With the amount of text available today, it is impossible for humans to access it in an effective way using this approach.

Suppose you had a mechanism that allows you to “zoom in” and “zoom out” to find specific or broader themes; you might look at how those themes changed through time or how they are connected to each other. So, rather than starting with a keyword search for documents, you might first find the theme you are interested in, and then examine the documents related to that theme.

By leveraging machine learning tools, Hume allows you to organize the corpus in themes or topics. The resulting “thematic structure” is a new view which you can use to explore and digest the collection of documents.

Probabilistic Topic Modeling


Probabilistic topic modeling algorithms, such as Latent Dirichlet Allocation (LDA), are statistical methods that analyze the words in the texts to discover the themes that run through them, how those themes are connected to each other, and how they change over time [3].

Probabilistic topic modeling algorithms do not require any prior annotations or labeling of the documents – topics emerge autonomously from the analysis of the original texts. This is a huge advantage for this kind of algorithm since they don’t require any “previous” effort in annotating documents. The topics emerge from the corpus itself.

View this depiction of probabilistic topic modeling.

Hume provides topic modeling by using LDA through a couple of procedures:

CALL ga.nlp.ml.enterprise.lda.compute({
  iterations: 10,
  clusters: 35,
  topicLabel: 'LDATopic'
})
YIELD result
RETURN result


Once computed, topics point to the related documents. They become new entry points for accessing or navigating your information.

View this knowledge graph with new entry points for navigating information.
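
A query along the following lines could then list how many documents sit behind each topic. The :DESCRIBES relationship type is an assumption made for illustration; the exact schema created by the LDA procedure may differ.

// Assumed schema: (:LDATopic)-[:DESCRIBES]->(:AnnotatedText) - verify against your installation.
MATCH (topic:LDATopic)-[:DESCRIBES]->(:AnnotatedText)<-[:HAS_ANNOTATED_TEXT]-(doc:Document)
RETURN topic, count(doc) AS relatedDocuments
ORDER BY relatedDocuments DESC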

Each topic is described by a number of words, since the machine cannot (yet) abstract a single word that summarizes the content of the cluster.

Step 5: Caring About Sentiment


It is often useful to relate a piece of text with the sentiment expressed in it. Extracting and processing sentiments from text provides a new emotional access pattern to your corpus and also new knowledge that reveals new insights.

Suppose you want to build a recommendation engine which leverages reviews to spot detailed strengths and weaknesses of different hotels (e.g. good location but bad staff).

Sentiment analysis is a difficult task, because the same sentence can have different meanings in different contexts. Many models predict sentiment based on the BoW approach, while others use a recursive deep neural network to build a representation of the complex underlying structure of sentences [4].

Hume integrates and combines multiple approaches. Users can choose from or customize a sentiment model for their specific use case. In a previous blog post, we compared the different approaches, all available in Hume, to show the advantages and disadvantages of each of them.

The knowledge graph improved with sentiment information looks like:

Knowledge graph with sentiment information.

Sentiment can be computed either for the entire document or for each sentence, according to the specific use case. Once computed, sentiment can easily be related to people, keywords, topics, etc.
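
As a hedged example, assuming the sentiment model adds labels such as :Positive and :Negative to Sentence nodes (the exact labels depend on the configured model), document-level sentiment could be aggregated like this:

// Aggregate sentence-level sentiment per document.
// The :Positive / :Negative labels are assumptions about the sentiment model's output.
MATCH (d:Document)-[:HAS_ANNOTATED_TEXT]->(:AnnotatedText)-[:CONTAINS_SENTENCE]->(s:Sentence)
RETURN d.title AS document,
       sum(CASE WHEN s:Positive THEN 1 ELSE 0 END) AS positiveSentences,
       sum(CASE WHEN s:Negative THEN 1 ELSE 0 END) AS negativeSentences,
       count(s) AS totalSentences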

Conclusion


The techniques, tools and the knowledge graph representation described here show how to bring order to the chaos inherent in unstructured data.

By integrating these techniques and others, Hume makes it easier for you to transform your data into actionable knowledge which will help you realize the full value of your data, create new services, deliver better results, improve productivity and reduce costs.

Get in touch with GraphAware to see what Hume can do for you.

Bibliography


[1] Cole Howard, Hannes Hapke and Hobson Lane, “Natural Language Processing in Action”, Manning, 2018

[2] Tomas Mikolov, “Statistical Language Models Based on Neural Networks”, PhD thesis, Brno University of Technology, 2012

[3] David M. Blei, “Probabilistic Topic Models”, Communications of the ACM, April 2012, Vol. 55, No. 4, pp. 77-84

[4] Richard Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank”, Conference on Empirical Methods in Natural Language Processing (EMNLP 2013)


GraphAware is a Gold Sponsor of GraphConnect 2018. Use code GRA20 to get 20% off your ticket to the conference and training sessions, and we’ll see you in New York!

Meet graph experts from around the globe working on projects just like this one when you attend GraphConnect 2018 on September 20-21. Grab the discount code above and get your ticket today.

Get My (Discounted!) Ticket

Graphs4Good: Connected Data for a Better World

You’re reading this because of a napkin.

It was the year 2000, and I was on a flight to Mumbai. Peter, Johan and I had been building an enterprise content management system (ECM) but kept running up against the challenge of using an RDBMS for querying connected data.

That’s when an idea struck. I grabbed a napkin and quickly sketched the first property graph model. Together, we used the idea on that napkin to create the world’s first graph database: Neo4j.

What I couldn’t have possibly imagined was the worldwide impact that tiny back-of-the-napkin sketch would create. There’s been a huge business impact to be sure, but there’s also been a steadily growing and (up to this point) quiet impact outside of bottom-line considerations.

Collectively, the Neo4j community has been using graph technology to solve some of society’s most pressing problems: combating climate change, curing cancer, advancing women in technology, fighting money laundering and pushing the boundaries of human knowledge.

Whether they’re working at non-profits, government agencies, newsrooms or research labs, these changemakers are working countless hours to make the world a better place. Sometimes they’re individuals and sometimes they’re disparate global teams, but they’re all working against strong headwinds with limited resources and tight (or non-existent) budgets.

Yet, they persist in their work – investigating shady connections, unwinding tangled webs of illegal activity, modeling the molecules that wreak havoc on the human body (alongside those that heal it), and mapping our course to the stars.

Here and there, you may have heard of some of their projects, but as a whole, they’ve gone unsung. Today, that changes.

Introducing Graphs4Good


Neo4j’s company vision is to help the world make sense of data. But building a graph database (and other graph technology products) is only a part of bringing that vision into reality.

I believe a key pillar of that vision is connecting and enabling those who work with data so they’re more effective. And that’s precisely why we’re launching the Graphs4Good program.

Learn about the Graphs4Good project and how it supports graph-powered, positive social change


This new program aims to showcase – and then support, encourage and connect others to – graph-powered projects that effect positive social change, uphold democratic principles and take on some of the world’s toughest challenges.

While we’re officially launching Graphs4Good today (at GraphConnect 2018), the success of these projects – and the support that Neo4j has given them – has already been ongoing for many years. Let me tell you some of their stories.

Graphs4Good in Data Journalism


This all starts with the International Consortium of Investigative Journalists (ICIJ). In 2015, they started their groundbreaking work with Neo4j and graph visualization tool Linkurious for their work on the Swiss Leaks story – a look into the secretive and shady world of Swiss private banking.

From there, they shook the world with the largest data leak in recorded history: The Panama Papers. The leak of 11.5 million documents – around 2.6 Terabytes – from the Panamanian law firm Mossack Fonseca recorded 40 years’ worth of transactions, accounts and intricate webs of shell companies from some of the world’s biggest names.

The subsequent reporting exposed the fraud, tax evasion, money laundering and evasion of international sanctions (along with other illicit activities) among national leaders and celebrities across more than 45 countries. In sum, reporters at 100 news media outlets working in 25 languages used the leaked documents to expose corruption.

Their hard work would eventually win them a Pulitzer Prize, but none of this would have been possible without an accessible model of connected data.

The possibilities of the connected data model


A year later, the ICIJ released the Paradise Papers: a 13.5-million-document, 1.4-Terabyte leak from Appleby and Asiaciti Trust. Over 380 journalists used this leak to expose the wrongdoings (whether illegal, unethical or both) of big names like Apple, Nike and Facebook, in addition to heads of state such as Queen Elizabeth and members of the Trump administration.

It was data journalism investigations like these – alongside the continued spread of fake news – that inspired the Neo4j team to initiate the Neo4j Connected Data Fellowship at the ICIJ, as well as continue our support of the Neo4j Data Journalism Accelerator Program.

Since these major projects, the ICIJ has kept up their good work with the West Africa Leaks and a second (but much smaller) leak of Panama Papers from Mossack Fonseca.

Within the world of data journalism, I also have to mention the great work done by the team at NBC News in mapping the intricate connections among 200,000 tweets by Russian trolls.

Work like this is the heart and soul of what we hope the Graphs4Good program will continue to showcase and support. But data journalism isn’t the only sector where graph technology has been making a positive impact.

Graphs4Good in Other Sectors


How could I possibly summarize all of the other impactful projects being powered by graph technology? I can’t – these are only a few notable highlights from names you might recognize. Many, many more are listed on our new Graphs4Good program page.

From the dawn of human life, we’ve been fighting disease. We’ve already won against some of the most deadly maladies, but there’s a long way to go, especially when it comes to cancer. I’m personally aware of eight different graph-powered projects working on a cure for cancer. One powerful story in that space is the Candiolo Cancer Institute, and I encourage you to go and read their story.

In a similar vein, we’re also happy to welcome the German Center for Diabetes Research (DZD), which uses Neo4j in its complex struggle both to help current diabetes patients and to eliminate the condition once and for all.

Pivoting to an entirely different field, NASA has been using a Neo4j knowledge graph to advance their mission to Mars.

David Meza, the Chief Knowledge Architect at NASA, has said that, “Neo4j saved well over two years of work and one million dollars of taxpayer funds,” specifically in relation to their Orion spacecraft missions. I think we can all agree such work more than qualifies as good.

Impact of Neo4j at NASA


Along the same lines of scientific inquiry, we also welcome the International Salmon Data Laboratory (officially launching in October 2018) to the Graphs4Good program in order to support their work on the holistic analysis of salmon habitats worldwide – and the ecological implications beyond.

And finally, this very brief summary of positive-impact graph projects can’t fail to mention what might seem like a small thing for now – but its ripples will be felt for years to come.

This past summer, Neo4j team members were able to sponsor and lead a data science camp through Pink Programming – an organization with the mission to advance the role of women in technology. While the camp this year was small (20 women), they were able to learn how to use graph technology to grow their skills in data science. Twenty years from now, I hope to hear the stories of how those women were able to positively influence the tech ecosystem for decades to come.

Concluding Thoughts


Graphs are a new(-old) way of looking at the world, and that new perspective gives us a tremendous power to transform our society for the better.

Each day, I hear stories of how graph technology is being deployed to make positive change, and I continue to be amazed and humbled by the world-changing work of the Neo4j community. Many of the projects in the Graphs4Good program were born as (and some continue to be) community-led, open source projects.

I cannot emphasize that factor enough: The Neo4j community is here to help you make a positive global impact. This community is a friendly and dedicated graph of people who generously contribute their time and effort to get your graph-powered project up and running (or keep it running smoothly) – no matter your cause.

Today, we launch the Graphs4Good project with hundreds of stories of positive impact using connected data. Next year, let’s make it thousands.

Our team’s impact started with a napkin. Yours starts with a graph platform – what will you do with it?

Emil

Meet the 2018 Graphie Award Winners

Last night at GraphConnect 2018, we announced the 11 winners of the 2018 Neo4j Graphie Awards at the conference’s closing reception.

The Graphie Awards celebrate the world’s most innovative graph technology applications, recognizing success in connected data across multiple categories – not only for Neo4j customers but across the entire Neo4j community and ecosystem.

Check out who won the 2018 Graphies at GraphConnect including eBay, Adobe, Comcast and others

Partners, startup program members, data journalists, community members and ambassadors were all encouraged to submit their graph-based projects for consideration.

We had a wonderful pool of nominations, and choosing just 11 winners was definitely a difficult task. Here’s who took home the honors this year.

The 2018 Graphie Award Winners


Congratulations to the 2018 Graphie winners, who include:

Adobe logo

Adobe Behance Graph Impact: Cloud Infrastructure Savings
Adobe Behance drastically reduced infrastructure costs and increased performance using Neo4j, resulting in an order-of-magnitude decrease in DevOps staff hours and a faster sign-in-to-initial-activity experience.

Microsoft

Microsoft Graph Impact: Scalability
To perform customer segmentation with near infinite granularity across massive datasets, Microsoft turned to Neo4j for a graph that provides their sales and marketing organization with the data needed to enter new markets.

Comcast

Comcast Graph Vision: Artificial Intelligence
Comcast uses Neo4j to personalize and enrich their customer experience from Xfinity X1 to XFi.



eBay Graph Impact: Revenue & Reach
To power hub-pages and “virtual browse nodes” that aid in discoverability and PageRank score, eBay partnered with Neo4j and was able to improve their Google search rankings.

Neo4j Customer: Pitney Bowes

Pitney Bowes Graph Vision: Embedded Machine Learning
By combining Neo4j’s graph platform with powerful machine learning, Pitney Bowes helped their clients to identify money laundering networks, Ponzi schemes and fraud structures three years before they were reported.

DZD German Center for Diabetes Research

The German Center for Diabetes Research (DZD) Graph Impact: Medical Research
The German Center for Diabetes Research uses Neo4j to combine heterogeneous research data sources and connect data across disciplines and locations in order to halt the emergence and progression of diabetes.

DXC Technology

DXC Technology Graph Impact: Digital Transformation
DXC Technology is globally impacting how customers engage and view the potential of digital transformation within their industry, and using Neo4j, they were able to identify $500M of new potential revenue worldwide.

Convergys

Convergys Graph Impact: Risk & Compliance
Convergys used Neo4j to deliver a successful hybrid cloud GDPR solution spanning multiple data sources across hundreds of clients and 10,000+ employees.

Graphen

Graphen Graph Impact: Graph Analytics
Graphen built a tool to migrate relational data into Neo4j, as well as an immersive augmented virtual reality environment to visualize graph data.

Juit

Juit Graph Impact: Unstructured Data
Juit used Neo4j to consolidate data across the complex Brazilian legal system, comprising 93 courts, 18,000+ judges, 60+ legal research sources, 200+ million lawsuits and 26 million verdicts to simplify and accelerate jurisprudence research.

Iryna Feuerstein, Software Developer, Prodyna

Iryna Feuerstein Graph Community MVP
Iryna is recognized for her commitment to teaching and growing the Neo4j community since 2014. She runs the Neo4j Düsseldorf meetup, has delivered multiple Neo4j + R workshops and has written articles on graph data processing with Neo4j and Apache Spark. Iryna is a consultant for Neo4j partner PRODYNA.

Get Your Graphie Next Year!


Didn’t get an award this year? We know we’re only honoring a sliver of the great ideas out there and we want to see more awards (and especially more nominations) in the future.

The great news is that we will now be doing the Graphies annually. So if your graph-powered project or application pushes the envelope, rises to meet a new challenge or develops an innovative graph use case, you’re probably on track for a 2019 Graphie.

Award categories will not necessarily always remain the same either, so if you felt like your project didn’t fit into any of the awards above, take heart – you’re not out of the running at all.

For more information on this year’s Graphie Award winners, visit neo4j.com/graphies.


Need to sharpen your graph skills in order to get a Graphie next year? Your training starts today.
Click below to register for our online training class, Get Started with Graph Databases using Neo4j and you’ll master the world of graph technology in no time.


Sign Me Up

Effective Internal Risk Models for FRTB Compliance: Risk Modeling Requires Data Lineage

Where did the data come from, originally? That’s a key question that the Fundamental Review of the Trading Book (FRTB) rules will require banks to answer in real-time. Banks must be able to decompose risk models to uncover the full lineage of investment data. Think of it as Ancestry.com for data.

In this series on the FRTB, we explore what it takes to create effective internal risk models using a graph database like Neo4j. Last week, we looked at the major areas impacted by the FRTB. This week we’ll explore the relationship between risk modeling and data lineage, and next week we’ll describe why modern graph technology is an effective foundation for compliance applications.


Discover why data lineage is so important for FRTB compliance and risk modeling.

Risk Modeling Requires Data Lineage


Risk modeling – especially at large banks, hedge funds and aggressive investment houses – has complex requirements and requires organizations to trace data connections across a web of investment baskets, holdings, financial instruments and pricing data.

FRTB requirements for historical testing require banks to decompose risk models into their individual risk components and trace back through time to available pricing and position information. This requires data managers to uncover the lineage of their investment information, including:

    • Which data is relevant
    • How the data is sourced or calculated
    • Whether data sources are authentic and authoritative
    • What risk factors affect all upstream information dependencies
    • Whether all calculations are based on approved BCBS aggregation rules
    • Where and how the data maps into the bank’s risk model
Banks must be able to trace data dependencies through many levels of complexity before reaching original, authoritative data sources – a crucial requirement that existing bank systems simply can’t address. This shortcoming is a key reason why FRTB compliance requirements have been delayed from 2019 to January 2022.

Learn about the connectedness of investment data.

The diagram above illustrates the connectedness of investment data, but it is a gross simplification of the actual interdependencies that exist among trading desks, investment baskets, investment instruments and market prices.

The image below drills down deeper on the Batteries Basket but still presents a high-level picture of the connected data that drives investment risk analysis.

See what connected data risk analysis looks like.

Building and Testing Internal Risk Models


After internal analysts identify risk factors in investment strategies, banks must link all interrelated risk data to ensure they can trace risk inherent in their models now and in the future. In the diagram below, tracing data lineage starts at the trading desk and moves to the right through baskets, holdings and prices, eventually reaching authoritative data sources on the right.

Once the lineage of each investment position is modeled back to its most basic components, risk applications plug prices into the model and move back to the left, aggregating risk back up to the level of the trading desk. At the end of the analysis, the risk model calculates a capital reserve ratio that mitigates the risk inherent in the investment strategy.
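As a purely illustrative Cypher sketch of that traversal – the Desk, Basket, Holding, Price and Source labels and relationship types here are assumptions based on the diagrams in this series, not a prescribed schema:

// walk from a trading desk through baskets, holdings and prices to authoritative sources
match path = (d:Desk {name: 'Equities Desk 7'})-[:TRADES]->(:Basket)-[:HOLDS]->(:Holding)-[:PRICED_BY]->(:Price)-[:SOURCED_FROM]->(s:Source)
where s.authoritative = true
return path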

Track risk data lineage for risk aggregation.

Banks must then prove the accuracy of those models all the way back to 2007 using real data for each individual risk factor to which trading desks are exposed. After the internal risk models are backtested historically and approved by regulators, banks must continuously maintain them by evaluating 24 dates from the last year using real position and pricing data with no two evaluations more than one month apart.

When banks fail to meet this requirement for any risk, it is deemed a non-modellable risk factor (NMRF), and the bank must set aside capital to offset the increased risk.

Evaluating model risk.

Data Governance Is Key to Risk Modeling


Modeling bank risk is challenging due to a variety of investment, regulatory and data management factors. At virtually all banks, investment data detailing trades, holdings, historical prices and market prices reside in discrete data silos. And those silos often exist at various trading desks or other locations rather than centralized at the institutional level – making risk data management all the more daunting.

More importantly, the demands of bank risk modeling go far beyond the calculations used in traditional financial and analytic models. The added complexity stems from the interrelatedness and complexity of market information and the ever-rising diversity and complexity of investment instruments and positions.

Those dependencies can cascade many levels deep, making their associated risks all but impossible to visualize or calculate. That very interdependence brought many funds and trading desks down like a house of cards in the wake of the Lehman Brothers collapse a decade ago.

These complexities demand that compliance efforts begin with a bullet-proof data governance foundation. Without such a framework, risk aggregation, reserve calculations and required reporting are nearly impossible to achieve.

Conclusion


Banks need effective internal risk models that can trace many layers of dependencies. As prices and positions change in everyday operations, internal models must reevaluate risk multiple times a day, and more often during market events. In the coming weeks, we’ll take a closer look at how modern graph technology provides a strong foundation for risk modeling that serves both compliance requirements and drives innovation.


Risk demands a strong foundation
Find out why leading financial services firms rely on Neo4j graph technology for compliance and innovation in our white paper, Effective Internal Risk Models Require a New Technology Foundation. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the FRTB Compliance and Neo4j blog series:

ICYMI: (Mostly) Everything Important that Happened at GraphConnect 2018

Unless you live under a graph database rock, you know that the biggest graph technology event of the year – GraphConnect 2018 – happened last week, and it was wondrous to behold.

From September 20-22, Neo4j customers, partners, celebrities, prospects, skeptics, employees and community members all came together for a fantastic few days of conference sessions, training workshops and a massive hackathon.

In case you missed it (ICYMI) – really, we’re kind of wondering why you weren’t there – here are the highlights of everything important from GraphConnect 2018.

In case you missed it, here's a complete recap on everything at GraphConnect 2018


Live-Streamed Keynotes from Hilary Mason & Emil Eifrem (Watch Now)


By far the biggest happenings at GraphConnect 2018 were the keynote addresses by Neo4j CEO Emil Eifrem and by Hilary Mason, the GM of Machine Learning at Cloudera (also the founding CEO of Fast Forward Labs, acquired by Cloudera in 2017). Both keynotes were livestreamed and are available to watch now.

Watch Emil Eifrem’s keynote on the State of the Graph here:



Catch up with Hilary Mason’s keynote, “The Present and Future of Artificial Intelligence & Machine Learning” right here:



The video recording of the closing keynote from Stephen O’Grady – Founder and Principal Analyst of RedMonk – will be available soon.

Neo4j 3.5 Announced and Now Available for Preview


It wouldn’t be a proper GraphConnect if we didn’t give you a breakdown of what’s coming up in the latest release of the Neo4j Graph Platform.

During Emil’s keynote, Neo4j VP of Products Philip Rathle gave GraphConnect attendees an overview of what features will be rolling out in the Neo4j 3.5 release slated for later this year. Watch his presentation below:



But you don’t have to take our word for it. Here’s a sampling of what the tech press also thought about Neo4j 3.5:

Special Thanks to GraphConnect 2018 Sponsors


EY was the platinum sponsor of GraphConnect 2018


GraphConnect wouldn’t be possible without our awesome conference sponsors who supported the event. Some sponsors also gave talks and participated in the hackathon.

We’d like to extend extra special thanks to platinum sponsor EY for their support of GraphConnect 2018. Their team shared the Neo4j-based solutions they’re bringing to market and had a little fun at the booth with a virtual reality graph explorer that showcased connected data in another dimension.

Throughout the day, the sponsor area was busy with other amazing demos and many meaningful conversations between sponsors and attendees. From data governance to digital transformation, our sponsors showed off their value to Neo4j customers, prospects and even other partners with solutions built on or around Neo4j.



Launch of the Graphs4Good Program


At Neo4j, our company and community vision is to help the world make sense of data. That’s why at GraphConnect 2018, we launched the Graphs4Good program to connect and enable those who work with data so that they’re more effective.

This new program aims to showcase – and then support, encourage and connect others to – graph-powered projects that effect positive social change, uphold democratic principles and take on some of the world’s toughest challenges.

Check out Emil’s blog post announcing the launch. Also, be sure to read what RTInsights had to say about the impact of the Graphs4Good program.

2018 Graphie Awards Recognize Innovative Graph Tech


Finally, we wrapped up GraphConnect 2018 with the Graphie Awards.

The Graphie Awards celebrate the world’s most innovative graph technology applications, recognizing success in connected data across multiple categories – not only for Neo4j customers but across the entire Neo4j community and ecosystem.

Read about this year’s 11 winners – including Microsoft, Adobe, Comcast, eBay, Pitney Bowes and more – in this wrap-up article of the 2018 Graphie Award winners.

It’s Never Too Early to Think about Next Year


We really hope you enjoyed GraphConnect 2018 – or that you now insanely regret not going!

We couldn’t have done it without all of the many contributions by Neo4j partners, customers and community members who not only made the event a success but also make the Neo4j ecosystem such a joy to be part of throughout the entire year.

Video recordings of all other breakout sessions will be uploaded to the Neo4j YouTube channel in the next two to three weeks. Be sure to subscribe and enable notifications so that you know precisely when GraphConnect videos drop on our channel.

It’s never too early to think about next year and what GraphConnect 2019 will hold – and what you might contribute to it. We hope to see you there.


New to the world of graph technology?
Download this white paper, The Top 5 Use Cases of Graph Databases, and discover the diverse power of graph tech for your enterprise.


Get the White Paper

Mapping a Connected World: The Value of Geospatial Graph Visualization

One of the world’s first maps featured the night sky. Surprisingly, it also featured connected data.

Over 1,000 years ago, Chinese astronomers plotted 1,300 stars on a manuscript, recognizing the importance of making connections between clusters to identify constellations. Their mathematical projections turned out to be remarkably accurate, despite relying solely on the naked eye.

See how Chinese astronomers mapped the stars over 1,000 years ago.

An extract from the Dunhuang Chinese Star Chart, currently held at the British Library.

Back in the 21st century – whether it’s monitoring traffic flow at rush hour or identifying routes used by persons of interest – the ability to visualize connections on maps is still highly valuable. Fortunately, we have a wealth of technology to help us gain insight from links between locations.

By combining the power of Neo4j’s graph database with the features in version 5 of KeyLines (a JavaScript SDK for interactive graph visualization), exploiting geospatial information has never been easier.

This blog post describes three compelling use cases to help you get the most out of connected data on maps. We’ll start by looking at dashboards.

Maps as Dashboards


From 30,000 feet above, the earth looks very different. You get to survey the entire landscape, notice key features and ignore details that aren’t so important from a distance. Adopting a bird’s eye view is useful in software applications, too. Dashboards offer the “big picture” at a glance, and a mapping element is an insightful component.

Map dashboarding is common in the world of cyber threat intelligence. It also helps global and national IT networking companies manage system performance to monitor faults quickly and efficiently.

In a typical scenario, the dashboard shows the health of a network on a world map, with links representing the status of connections.

Check out this dashboard view of a global corp's IT network.

The dashboard view of a global corporation’s IT network.

The magic happens when analysts drill into a particular location to get detail on demand. The dashboard becomes the access point to the vast amounts of data in the graph database.

KeyLines provides incremental data loading and ways to visually group items.

See how Keylines Combos create deeper graph visualization.

Use KeyLines combos to go deeper into the data and explore what’s happening at the level of individual devices or sub-networks.

Notice we’ve switched from a topographic view of the data with real geolocations to a topological view. What’s important at this level is the content itself, not its geographic accuracy. The iconic topological London Underground Tube map is a great example.

In our dashboard, items are grouped by location but laid out in a more schematic way. Our top-level “nodes” are just visual representations of a “location” property in the Neo4j database. So on the dashboard, we use a single node to summarize all nodes with the New York property.

Running a Cypher query achieves something similar with links. This one uses links to represent a summary of connections between “New York” and “London”:

match (:Device {location: 'New York'})-[l:LINK]-(:Device {location: 'London'}) return count(l)

When the user wants to dig into the detail, a very different Cypher query interrogates the real nodes and links in the data. But in the dashboard view, what you see doesn’t have to exactly mirror the underlying Neo4j graph database. The improved geo component in KeyLines 5 lets you move smoothly between these different visual models.
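As a sketch of that drill-down – reusing the Device nodes and LINK relationships from the summary query above – the detail view might return the individual devices and their links rather than a count:

// drill into the individual devices and links behind the New York-London summary
match (a:Device {location: 'New York'})-[l:LINK]-(b:Device {location: 'London'})
return a, l, b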

Maps as a Way to Give New Perspective on Old Information


Sometimes data just doesn’t make sense until you see it in the context of the world you live in.

The police are well aware of this and have used maps as investigative tools for years. It’s why so many detective movies include that scene where investigators stare quizzically at a big map on the incident room wall covered in pins and string.

The world of fraud investigation is only just beginning to see the benefits.

Here’s a data model representing typical insurance fraud claims. People are linked to policies, policies have claims, claims involve vehicles, witnesses, repair shops and damage reports.

KeyLines data model for fraud detection.

When you try plotting this information on a map, there’s a challenge. Most of the data in your Neo4j database doesn’t have geospatial information. The items that do – addresses of policy owners and repair shops – aren’t directly linked to each other.

There is a clever solution that combines the flexibility of Neo4j with the advanced geospatial and visualization features of KeyLines 5.

A Cypher query tells us which repair shop is associated with a policy holder’s address by examining every claim that links the two.

match (p:Person)-[:MADE]-(:Claim)-[c:REPAIRED]-(g:Repairshop) return distinct p, g

KeyLines then takes these people and repair shops, and makes a virtual link between them.
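If you also want to weight those virtual links, a small variation on the query above (still a sketch against the same model) returns a claim count for each person and repair shop pair, which can drive link thickness on the map:

// weight each virtual link by the number of claims connecting a person to a repair shop
match (p:Person)-[:MADE]-(:Claim)-[:REPAIRED]-(g:Repairshop)
return p, g, count(*) as claims
order by claims desc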

What insight do we gain from seeing these links on a map? How does it help our fraud investigation?

In this case, they show the journey people made to take their damaged car to get fixed.

See this graph visual model on a map.

It’s only after we’ve used KeyLines to take this visual modeling journey – transitioning from raw Neo4j data to a visual model on a map – that we spot an anomaly.

Notice how most people (purple) travel to their nearest repair shop (green) to get their car fixed. But some travel unusually long distances to one repair shop in particular (red links). Is it just the best one around? Or should we look into whether they’re fraudulently inflating claims?

Maps as Familiar Context


We get many requests from law enforcement and intelligence communities to support their mapping source of choice. U.S. law enforcement agencies are familiar with ESRI’s ArcGIS mapping ecosystem, just as the Ordnance Survey National Grid reference system is well known to British police forces.

Everyone has their favorite map tiles, from satellite views to vector-based street maps.

This isn’t just about what looks good. It’s no use visualizing battlefield intelligence if you can’t overlay the latest satellite imagery. You might miss insights if your graph data can’t exploit tools like geofencing to track whether key targets stray outside virtual boundaries.

We’ve used this feedback to drive development of KeyLines 5.

You can choose from a huge ecosystem of map tile providers, projection systems and third party plugins to integrate with KeyLines. If your Neo4j instance contains any kind of geospatial information then you’re good to go. KeyLines doesn’t just put it on a map, it puts it on exactly the map you want.

With this flexible approach, finding map insight isn’t restricted to typical street or country views. We can take the “CRS.Simple” Coordinate Reference System (CRS) and plot graph data with an image as a “map” backdrop. From prisons to casinos, airport terminal maps to crime scenes, floor plans make great mapping context.
Check out this airport diagram depicting volume of flight connections.

Using the new CRS options in KeyLines 5 to overlay data. This airport diagram from www.airportshuttles.com shows the volume of flight connections involving a transfer between terminals.

Stick with What You Know


A final point about familiarity. These use cases also showcase the range of customizable options available in KeyLines. Every detail of the chart – its colors, styles, iconography and layout – can be quickly and easily modified to suit the needs of your users. If giving a familiar and intuitive user experience is important, styling is a great way to do it.

Find Out More


This post looks at some of the ways you can combine Neo4j with KeyLines 5 to help find insight in your geospatial data.

If you have geocoded information in your graph data, are you exploiting its full potential? Can you deliver the kind of interactive user experience we’ve demonstrated?

If you want to learn more, contact us or start your own trial.


Want to learn more about what you can do with graph databases? Click below to get your free copy of the O’Reilly Graph Databases book and discover how to harness the power of connected data.

Download My Free Copy

Visualizing Enterprise Architecture: 5-Minute Interview with Jessica Dembe & Patrick Elder, Blackstone Technology Group

“Native visualization was something that stood out to us, and we had struggled trying to do the same thing with other tools,” said Patrick Elder, Product Architect at Blackstone Technology Group.

Enterprise architecture connects numerous IT assets using information from diverse systems. Once it’s possible to visualize all those connections across disparate data stores, decisionmakers see opportunities for optimization and cost reduction.

In this week’s five-minute interview (conducted at GraphConnect New York), we discuss with Patrick and Jessica Dembe, Front-End Engineer, how Blackstone Technology Group uses Neo4j to rapidly enable its government client to visualize its enterprise architecture as the first step in digging into and analyzing its use of IT assets.

Check out this 5-minute interview with Jessica Dembe and Patrick Elder of Blackstone Technology Group.

Talk to us about how you use Neo4j with your government client.


Patrick Elder: Our client is in the Enterprise Architecture Office, so they’re looking for a way to visualize how systems are related within an organization.

The data that they aggregate is all about showing different parts of the organization and how it ties together, so a lot of our job is stitching together information from disparate sources and different kinds of data.

For example, we connect systems with the technologies they use, how they’re funded, what types of activities and capabilities they have, what mission they support, what organization they’re in – all those kinds of things.

We saw Neo4j as a natural fit given that use case, and the ability to visualize connections natively within the tool was really attractive to us.

What made you choose Neo4j?


Jessica Dembe: The way that you could visualize relationships and how things are connected to each other. All the systems, investments, everything, and how it is all connected together.

Elder: Native visualization was something that stood out to us, and we had struggled trying to do the same thing with other tools.

What we were really excited by was the next step. Once we get beyond the initial visualization, what is the analysis that’s going to be done? The fact is that our clients can’t even ask the question yet because without Neo4j, they can’t even see what’s possible. That’s what is really exciting as we take the next steps with Neo4j.

What have been some of your most surprising results while using Neo4j?


Elder: I think the biggest thing for us was the “Wow” factor – how quickly we were able to put this together, how fast we were able to put our data into Neo4j from a relational database (RDBMS), create something visible, and show it to our clients. They were really impressed by that.

If you could start over with Neo4j, taking everything you know now, what would you do differently?


Dembe: I think we would start with iconography for different investment areas, using icons in visualizations; that’s something we’re looking to implement.

Elder: One of the things that I’ve really noticed here at GraphConnect is all the different types of analysis and how people are applying different tools to the graph database.

From our side, I think that’s something that we could have gone to the client first and said, “The visualization is something that we can show you right away, but what is the analysis you want to do?” And maybe that could help inform us on how to design the database with different kinds of queries and different tools that we could bring in to deliver that in the first release.

What do you think the future of graph technology looks like in the federal space?


Elder: Certainly from where we sit in enterprise architecture, graph technology is going to be really valuable. It’s a type of analysis that our clients need to do that they don’t even conceive as possible yet.

I’m thinking about what it could do for document analysis, which is a very manual process, using keyword searches and SharePoint. This can now be something that we can operationalize with Natural Language Processing and give the analysts ten documents to look at instead of 1,000.

Dembe: I agree. It wasn’t until I got here that I saw that’s possible, and I’m pretty sure once we bring that back, they will want to see what we can do with that capability.

Is there anything else you’d like to add?


Elder: Overall GraphConnect has been a great experience. There’s been a lot of learning for us as we’re relatively new to Neo4j, and we’ve had the opportunity to see some more mature projects and bigger datasets. In our environment, our dataset’s relatively small. Here at GraphConnect, we heard about billions of nodes and relationships – we’re orders of magnitude smaller than that.

We’ve seen that, even with a larger dataset, there’s so much power and efficiency there that we really can’t tap any other way.

Dembe: From my vantage point, I’ve had so much feedback on how you are visually representing data. It’s given me a lot to think about when I get back in terms of how I can model data and the best way to approach it.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Want to take your Neo4j skills up a notch? Take our online training class, Neo4j in Production, and learn how to scale the world’s leading graph database to unprecedented levels.

Take the Class

Effective Internal Risk Models for FRTB Compliance: Modern Graph Technology Is the Answer

Relational database technology can’t handle what is coming in banking and risk modeling. By the 2020s, Accenture predicts current banking business models will be swept away by a tide of ever-evolving technology and other rapidly occurring changes.

The right foundation for building compliance solutions is graph database technology. Neo4j answers the demands of Fundamental Review of the Trading Book (FRTB) regulations while building a foundation for future investment and risk compliance applications. Neo4j is the world’s leading graph database platform and the ideal solution for tracking investment data lineage.

Learn why graph technology is the answer to internal risk models for FRTB compliance.

In this series on the FRTB, we explore what it takes to create effective internal risk models using a graph database like Neo4j. In previous weeks, we explored the requirements of FRTB compliance and the relationship between risk modeling and data lineage. In this final post, we explain why modern graph technology is an effective foundation for compliance applications.

The Aggressive Demands of FRTB Compliance Applications


The complexity of the FRTB models requires a software platform that enables banks to:

    • Trace the lineage of risk factors back to their original, authoritative data sources
    • Span pricing, position, cash management and other data silos into a unified dataset
    • Work with regulators to visualize and modify risk model graph diagrams
    • Enable the easy modification of risk models to keep pace with changing market conditions, organizational changes and investment strategies
    • Handle mergers, divestitures and reorganizations that affect the historical and future operation and performance of trading desks
These strict demands require a technology platform that understands connected data and that models the interdependence and complexity of data lineage in modern markets and investment instruments.

Traditional Technologies Can’t Handle Lineage and Modeling


Traditional spreadsheet, relational database and data warehousing technologies can’t address the requirements of investment risk modeling because they:

    • Cannot handle the complexity and connectedness of modern investment instruments and markets
    • Cannot trace multi-tiered data lineage efficiently
    • Cannot produce computational results in near real-time across the risk chain
    • Cannot help risk managers and regulators visualize, understand and evolve risk models as market and bank conditions change
For all these reasons and more, traditional technologies are simply unfit for creating and maintaining investment risk models and compliance reporting applications.

Modern Graph Technology Is the Answer


The right foundation for building compliance solutions is graph database technology. A native graph database stores, accesses and processes information not in tables, but as directly connected data – which is the precise way that data must be managed to build efficient, reliable risk models.

Compliance applications using native graph technology iterate back and forth through data connections to produce lightning-fast risk assessments.

Such high performance provides the agility that trading desks need to take full advantage of market opportunities while remaining compliant with the ever-changing and dynamic nature of risk regulations.

Neo4j: The World’s Leading Graph Platform


As the world’s leading native graph database platform, Neo4j is the ideal solution for effectively capturing investment data lineage across internal and external applications and data sources. Due to its ability to store this lineage as a graph of connected data, Neo4j traverses the connections in real time to assess risk factors and compute capital requirements across all positions and trading desks.

Additionally, the Neo4j graph platform has data visualization tools that enable analysts, supervisors and regulators to visualize the lineage of risk factors to create compliance models, and later, to run what-if scenarios to test and improve them.

Trace Risk Data Lineage across Data Silos


The greatest challenge in developing and maintaining investment-risk models is integrating information that resides in many discrete silos across the enterprise. These data silos exist because trading, fund management, accounting, cash management and pricing systems operate and store data independently of each other.

It is impractical and prohibitively expensive to integrate all these investment applications into a single compliance solution. Instead, Neo4j enables an organization to use a federated metadata model to unite investment data silos into a unified dataset.

See how Neo4j provides data lineage across data silos.

Build a Foundation for Compliance Applications


Once the federated metadata layer exists in Neo4j, risk supervisors can trace the lineage of risk factors back to their original, authoritative data sources, thereby solving the number one problem in building compliance systems.
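As a hedged sketch of what such a lineage query might look like – the RiskFactor, Dataset and System labels and the DERIVED_FROM and PUBLISHES relationships are purely illustrative, not a fixed Neo4j schema:

// trace a risk factor back through derived datasets to the authoritative systems that publish them
match lineage = (r:RiskFactor {name: 'EUR/USD volatility'})-[:DERIVED_FROM*]->(:Dataset)<-[:PUBLISHES]-(s:System)
where s.authoritative = true
return lineage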

But this new, connected data foundation can also support a full spectrum of innovative uses – including credit risk analysis, value-at-risk calculations, fundamental research, market and sector analysis, investment-desk performance studies, return on invested capital analysis, and many more mission-critical systems.

Visualize and Modify Risk Models Easily


Neo4j includes a variety of data visualization tools that enable organizations to build, understand and improve even the most complex investment risk models. By using Neo4j to build risk compliance solutions, organizations:

    • Create and evolve models to keep pace with changing market conditions, organizational changes and investment strategies
    • Work with regulators to improve and certify risk models
    • Handle mergers, divestitures and reorganizations that affect the historical and future operation and performance of trading desks
    • Build a foundation for analyzing performance across trading desks, departments, markets, business sectors and investment strategies

Conclusion


While BCBS’s current deadline for achieving FRTB compliance is January 2022, banks are acting now to use FRTB mandates to streamline their internal systems and build a firm foundation for future compliance applications.

By investing in compliance now, organizations answer the demands of Basel and FRTB regulations while building an infrastructure that produces remarkable savings in software development and staffing expenses. By using their compliance foundation to determine optimal reserve ratios on a continual basis, financial institutions of all sizes maximize available capital and drive investment profits.

Risk demands a strong foundation
Find out why leading financial services firms rely on Neo4j graph technology for compliance and innovation in our white paper, Effective Internal Risk Models Require a New Technology Foundation. Click below to get your free copy.


Read the White Paper


Catch up with the rest of the FRTB and Neo4j blog series:

Microservice and Module Management with Neo4j

Editor’s Note: This presentation was given by John Lavin at GraphConnect New York in October 2017.

Presentation Summary


Refactoring monolithic applications into microservices requires putting thought into managing code and its dependencies. At Vanguard Group, some of the existing Java archives (jars) have 3 to 4 million lines of code.

The team first tried visualizing the dependencies among jars and services in a desktop tool, then used a spreadsheet as an architecture tool. The team realized that the management of modules and services is really a graph problem and so adopted Neo4j.

Using a Maven plugin, they added in jar dependencies every time they ran a build, then used Nexus to add all their existing artifacts into the graph, along with their dependencies. They then began evaluating the relationships using best practices and building metrics from that. Next they added in information from their architecture spreadsheet to enrich their schema.

The team built out two tools to visualize relationships between jars and enforce best practices such as constraining the number of service-to-service calls to reduce risk. With full visibility that Neo4j offers into the relationships and dependencies among all their code, the team can effectively address technical debt moving forward.

Full Presentation: Microservice and Module Management with Neo4j


This blog is all about how we use Neo4j to manage our modules and microservices:



Vanguard Group is the largest provider of mutual funds. We’re a financial services company. I am an enterprise architect at the Vanguard Group and am helping migrate our monolithic software to services and ultimately to the cloud.

Our founder, John Bogle, was the creator of index funds, and that’s still a primary focus with us at Vanguard. I am in the institutional subdivision of IT managing the business of your 401(k) if your 401(k) is with us. We’re also the second largest ETF company, but we’re closing in.

Managing code is critical. That might be pretty obvious, but when I have conversations with other people, the focus really is more on feature delivery, time to market and getting things out the door. If you don’t make it easy to manage code, it just doesn’t get done.

It’s not unusual for some of our oldest jars to have 3 to 4 million lines of code. We have used multiple design patterns over 10 to 15 years. Sometimes we don’t have any design patterns at all. We have dead code, and we have some difficult to accomplish or incomplete impact assessments to be made.

If you’re familiar with code smells, you’re probably familiar with the shotgun surgery antipattern. We have had these difficulties in our modules in the past, and we wanted to easily manage our Java modules and prevent new code from having the same problems.

Visualizing Jar Dependencies Using Structure101


Our first answer was to use a third-party product called Structure101. It is a tool in which you load your jar files into memory on your desktop. It allows you to visualize and depict dependencies and all the relationships between the jars.

The problem we quickly found with Structure101 is that it only works if you know all the dependencies and jars you’re relying on, so you can load them and see them in the visualization. We have a lot of different subdivisions in our IT department, and a development team is quite often using a jar file without the owning team’s knowledge. It is very difficult to find all the interactions if you can’t even load all the jars into the tool.

When we started building out our services and trying to disentangle our monolithic software, we had so many jars in the monolith’s libraries that it would actually crash Structure101. It simply could not handle the visualizations.

And even when we knew what the jar dependencies were, knew who was using them and could still load everything in memory, it was still a highly manual process. We’d have a developer load up the jars, bring them into the tool on their desktop and then act on it. These were problems the tool wasn’t equipped to handle.

Managing Our Services in a Spreadsheet


As we move from our large monolithic applications to microservices – with additional services and deployments to manage on top of that – the old libraries and the business logic don’t go away. We’re adding this extra complexity on top of our existing software.

To tackle this, we started with managing our services in a spreadsheet. (Doesn’t everything really start with a spreadsheet?) It was compiled over the course of a year, where we tried to inventory the services that we would build if we were given an infinite amount of time and an infinite amount of resources.

However, we don’t have an infinite amount of time or an infinite amount of resources. What ended up happening is that, if the business logic hadn’t changed, we just built a service off of it. Usually, we would take the existing software and deploy it as its own service.

We also had cases where we just wanted to put a new user interface on top of the software. In these cases, we just provided a REST endpoint to deliver a new UI and leave the existing software and business logic that works alone.

What that meant is that we had a lot of different services in a lot of different states, and we’re all tracking it in a spreadsheet. We tried to group things and document the dependencies as best we could in the spreadsheet, but it really wasn’t going to work out long-term.

There are other questions that were never answered by a spreadsheet or some other tool.

The basic functions were captured, but there were large gaps (see below). We didn’t know where our services were deployed. We didn’t know which services were being called by other services. We didn’t know which jars and business logic were being used by the services, and we had no idea how to categorize whether a service was healthy or carrying a lot of technical debt that we might need to fix later on.

We needed something better.



The Move to Neo4j: Services and Modules Form a Graph


That’s when we learned about Neo4j. And this was our first foray into using graph databases and learning about them.

We realized that the management of our modules and services was really a graph problem. We started out very simply, with a simple node and one dependency or one relationship to that node (see below).

Initial data model schema.

We stored our jar dependencies – the Maven group, name and version – and to automate this, we wrote a Maven plugin to run every single time we did a build, gather this information and send it over to Neo4j.

As we build out new artifacts and new products, we’re actually just sending the data for the jars and its dependencies on other modules along to Neo4j.

After we were successful with gathering the data on our ongoing builds, we went back to our artifact repository, Nexus, and ran an automation job. We got the jar data and module data for previously built artifacts and loaded that into Neo4j. This provided us a wealth of information about a simple jar and dependency graph.
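As an illustration of the kind of write such a build-time plugin might issue – the Module label, DEPENDS_ON relationship, property names and coordinates below are assumptions for the sketch, since the talk doesn’t show the actual statements:

// record the module that was just built and one of its declared dependencies
merge (m:Module {group: 'com.example.app', name: 'account-service', version: '2.0'})
merge (d:Module {group: 'com.example.lib', name: 'customer-core', version: '1.4'})
merge (m)-[:DEPENDS_ON]->(d)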

Tracking Metrics on Jars


Robert C. Martin wrote a book back in 2003 called Agile Software Development: Principles, Patterns, and Practices.

In Chapter 20, there is an important section on Java package design. Martin was referring to the partitioning of Java packages within a Java module, but we believe that the principles he espoused for Java packages apply equally to the Java libraries themselves.

In our graph, we implemented the Acyclic Dependencies Principle. We fail our build if we have any module cycles, no matter how deep they go. When different subdivisions use the same jars, we would quite often find that they come back around and end up depending on each other. We wanted to avoid that and fail the builds immediately to enforce good, healthy jars.
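Detecting those cycles is a natural graph query. Here is a minimal sketch that a build could fail on, using the same illustrative Module/DEPENDS_ON naming as the example above:

// fail the build if any module can reach itself through dependency relationships
match (m:Module)-[:DEPENDS_ON*]->(m)
return distinct m.group as group, m.name as name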

We track a couple of other metrics. Afferent coupling is the number of incoming dependencies to a jar. Efferent coupling is the number of outgoing dependencies of a jar. Instability is the efferent coupling divided by the total of both: I = Ce / (Ca + Ce). You can understand the relationships of a Java library based on these metrics.

One other metric, Levelize Modules, is inspired by a book by Kirk Knoernschild called Java Application Architecture. The idea is to understand where the relationships of different modules are in relationship to your entire stack of Java modules.

Knoernschild came up with different levels, where if a jar that you’re developing only depends on external libraries outside of your company, it will be a level one. External modules are level zero. Level one modules depend only on level zero, and level two modules are only dependent on level zero or level one, and so forth.

This is a little bit more fine-grained than if you have different layers in your architecture, such as an entity layer, a data access layer, etc. You can have many different levels within a layer. That gave us some insights on exactly how the modules in our architecture are laid out.

Grouping Jar Libraries with Metadata


Once we moved past getting the data for particular jar libraries in, we then took on the incorporation of the data from the spreadsheet where we were managing our services and began to model that in Neo4j.

At the top of the diagram below, we put services within groups that we want to logically define. And we can have different metadata tagged such as whether it’s in the public cloud or in an on-prem database. We have the ability to dynamically tag any of our artifacts, which are mostly services right now. We have many different attributes that we want to assign to them. We pulled those from our spreadsheet.

Check out the expanded data model schema using Neo4j graph database.

Every artifact is built as one or more instances. Every single time we build our web services, we create a new instance through the Maven plugin running on those services, recording the instance that was just built and the jar dependencies it uses.

This ties in with our preexisting jar libraries, and we can associate our services all the way down through our jars. Any service can call another service, so we have a USES relationship on the artifact.

Modularity Assessment Tool Suite and Enterprise Service Catalog


On top of this data, gathered through our Maven plugins, we created two sets of tools.

The first one we named Modularity Assessment Tool Suite or MATS, which is primarily designed to manage our jar files. The second one, the Enterprise Service Catalog, which we code-named Excelsior, was for the management of what used to be a spreadsheet and is now stored in Neo4j.

Learn about the Modularity Assessment Tool Suite by Vanguard.

We designed a Maven plugin for each tool.

For the MATS, we also have a Sonar plugin. SonarQube is a code-quality analysis tool that provides your code-coverage numbers, your code-smells, all of the nitty-gritty static analysis. And this is really where our tools are playing – in static analysis. We designed a plugin so that we can represent the data that we’re storing in Neo4j in Sonar.

Check out the tool components of the Modularity Assessment Tool Suite.

Another thing we designed for the Modularity Assessment Tool Suite is a set of D3 and Angular visualizations. For the Service Catalog, we have another Maven plugin. Our catalog is completely Angular-based, and we plan to integrate our Angular UI JavaScript modules in as well.

So, we have a planned Grunt plugin, and we have some D3 visualizations modeled off our Modularity Assessment Tool Suite visualizations planned as well.

Here is a small graph with some example information of how the data might look.

Simple example of a graph.

From left to right, I have a single artifact, which is an account service, and it built a 1.0 and a 2.0 version. The green nodes are jar modules. Since this is such a small graph, you can see that I did introduce a cycle in version 2 in this graph. This is the sort of data we’re looking to store.

We can find our afferent coupling – the incoming dependencies – with just a simple query against the group, artifact and version, and see how many incoming dependencies there are.

Afferent couple and incoming dep.

Reversing the query shows the outgoing dependencies. Something that used to be a manual process in a third-party tool – loaded into memory on a desktop – now becomes an extremely simple Cypher query.

Efferent coupling, outgoing dep.

Here’s one last Cypher query that just gets me the cycle that we saw on the illustration above.

See module cycles in a graph database.

With just these simple queries and a little bit of math, we get all of these module metrics.
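For example, a single Cypher sketch – again using the illustrative Module label and DEPENDS_ON relationship, which are naming assumptions rather than the exact schema from the talk – can compute afferent coupling, efferent coupling and instability for every module:

// afferent coupling (ca), efferent coupling (ce) and instability = ce / (ca + ce) per module
match (m:Module)
optional match (m)<-[in:DEPENDS_ON]-()
with m, count(in) as ca
optional match (m)-[out:DEPENDS_ON]->()
with m, ca, count(out) as ce
return m.name as module, ca, ce,
  case when ca + ce = 0 then 0 else toFloat(ce) / (ca + ce) end as instability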

The following are a couple of the visualizations and metrics that we use. Our Sonar plugin (below) shows code quality metrics.

See how visualizations work in Neo4j.

This is not only good for our development team to see their code health, but it is also essential for engaging our management team in understanding not just whether we are delivering on time, but whether what we are delivering is of high quality.

From a management perspective, a single project isn’t as useful as an aggregate view – whether that’s an entire monolith, everything under one product owner or all of the jars that a particular manager owns.

We created a grade: based on the type of module it is, we can assign a red, yellow or green to that metric.

Once we assign a grade overall to the module, then we can roll up our grades to give an aggregate grade for a different collection of many different modules. This is really helpful for engaging our management with how it’s trending and whether we’re getting better or worse.

Below is a screenshot from the enterprise service catalog.

Discover how data visualization works with Neo4j.

Since managers love spreadsheets, we came up with a visualization of our graph data that is more in spreadsheet form (see below).

We offer a lot of different ways to filter on the data. You can do a full-text search on all the services at the top. But we also have a pull down so you can select services in production or in public cloud, or have certain types of technical debt if you’re tagging your services or jars with technical debt.

You can do queries based on those filters and then shorten your list and render them out.

See how data visualization works with a Neo4j graph database.

If you click on any of the services, you get a detailed view – all of the information and metadata we have – organized into groupings on the tabs at the left.

Selected here is the jar dependencies tab. If you select a service and want its jar dependencies, you select the version, and we query Neo4j to get all of the direct dependencies that service has. Expanding further brings in the transitive dependencies – we do a full-depth search on all the artifact dependencies, no matter how deep they go.

Now we move to D3 visualizations that depict our module and service dependencies in a way that lets us understand the depth and fan-out of those dependencies throughout our architecture (see below).

Graph database visualizations using Neo4j.

On the left-hand side, I have an example of a Vanguard module and all of its dependencies. In this case, we’re four layers deep with our dependencies that I’ve expanded out.

Static Analysis at Build Time versus at Runtime via an API Gateway


Tools like an API gateway look at your runtime. By collecting this data at build time and storing it in Neo4j, we’re doing static analysis: we look at the actual code for all the code paths and the dependencies declared there.

In the financial services industry, we have services that are only called at certain times of the year, such as statements that run every month, every quarter or at year end, and tax forms. A lot of that call graph won’t be populated by anything that only watches active runtime service calls, such as an API gateway.

For us to do a complete impact assessment when we’re making changes, we really wanted the complete view that static analysis gives us. Those one-off calls get swept in if you’re looking at the code; they will not be visible in an API gateway.

This approach is great, not only for impact assessment, but also for production support.

If we’re having an outage, with errors on one service or across a number of services, we can query on those services and see where the failure will fan out. Because if one service depends on another, we can see where trouble in one service will reach other places on our site. We can see cascading failures, and we can trace them back to particular services they may have in common.
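A hedged sketch of that kind of impact query, assuming hypothetical :Service nodes and :CALLS relationships: everything that directly or transitively calls the failing service is a candidate for a cascading failure.

MATCH (failing:Service {name: 'account-service'})<-[:CALLS*]-(impacted:Service)
RETURN DISTINCT impacted.name AS potentiallyImpacted;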

Containing Service to Service Calls


We want to contain the number of service-to-service calls we make.

Just as with modules, we don’t want service A calling 15 other services to render its data. If service A at the very front is a critical system that we can’t have down, then the more services it depends on, the greater the likelihood that it will be unavailable more often.

What we would like to do is design metrics around this: count the number of services that a particular service depends on and put a hard limit on that. If you have too many services being called, that probably means you’ve got the bounded context of your services wrong.
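A sketch of that metric, with hypothetical names and an illustrative threshold, flagging services that call more than an agreed number of other services:

MATCH (s:Service)-[:CALLS]->(downstream:Service)
WITH s, count(DISTINCT downstream) AS callCount
WHERE callCount > 15
RETURN s.name AS service, callCount
ORDER BY callCount DESC;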

Determining Call Depth


The other thing we want to see is call depth.

We might decide that we don’t want to go any deeper than two or three service calls. If there are serial calls from service to service to service, there’s a greater likelihood that any one of those could take the whole chain down. We’re investigating ways to determine appropriate metrics to identify problem areas before we have an outage in production.
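One way to surface overly deep chains is a variable-length match with a minimum length, for example anything more than three calls deep (the labels and the threshold here are illustrative only):

MATCH path = (a:Service)-[:CALLS*4..]->(b:Service)
RETURN [svc IN nodes(path) | svc.name] AS chain, length(path) AS depth
ORDER BY depth DESC
LIMIT 25;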

Here is a visualization of our modules and the dependencies that those modules have.

Graph visualizations using Neo4j.

We added some bells and whistles that we found very helpful.

We are able to size the dots based on the scale of incoming or outgoing dependencies, their instability or any other metric. The larger the dot, the larger the number of outgoing dependencies. You can see where we have some centrality, and exactly where these modules are playing key roles for other modules.

On the top right above, if I were to depict one of our large monolithic applications with all the jars that depend on it, I would have a hard time finding any particular node. So we added a full-text search: you can search for any module in the graph and it will be highlighted in the depiction.

We have multiple ways to filter. We record external dependencies, such as Spring and other Apache libraries. Since we don’t have any control over those, we might not want to visualize them, so we have a radio button that turns them off and removes them from the visualization.

On the right-hand side, we have a legend that shows the colors of the different groupings of modules. It isn’t just a legend, either: the checkboxes let us immediately remove different types of modules from the visualization so we can focus on the groupings that matter for the situation at hand.

Future Plans


In terms of future plans, while we have a Sonar plugin for our module metrics, we do not yet have one for our services. We also want to be able to look up a web service in our SonarQube system and see the metrics for that service right there. This would engage our teams by letting them see these metrics while they’re developing their services.

Whenever we talk to managers, they go right to the center dashboards and they look at the numbers. They see whether it’s trending up or down, and then they make a phone call to their project manager and tech lead and say, “Hey, why are my numbers going down? My boss is going to tell me about it.”

It’s a really great way of pulling these numbers and metrics up, and keeping your code clean just because you’re all looking at it.

Future data modeling plans at Vanguard.

We’ll have afferent and efferent coupling. We’re going to look at service-level depth, service-call depth, and the total number of services.

There are some other metrics that we’re looking at capturing and rendering on our center dashboard. Right now, we’re primarily automating our services. We have entered our UIs manually, but we want to develop a Grunt plugin and automate gathering our user interfaces.

At Vanguard, we’re migrating to an AngularJS, client-side architecture. We want to be able to identify which services those user interfaces are calling, so we want to build a Grunt plugin, gather that data and store it in Neo4j.

On the backend, we also want to pull in our data management and governance, again using static analysis.

For this, we’re using BCEL to gather our data access routines. We’re still mostly a relational database shop, so we’re calling tables and columns. We want to gather the tables, columns and stored procedures that we call, and associate those stored procedures and tables with the jar file that uses them.

This will give us an end-to-end look at where our critical and sensitive data is being used and where it’s exposed. We’ll also be able to identify problems and ask questions like, “Why are you sending this social security number over to this UI? Why are you exposing this personal credential information?”
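As an illustration only (none of these labels exist yet; they are stand-ins for the model we would build), a question like that could become a lineage query from a sensitive column out to the UIs that can reach it:

MATCH (col:Column {name: 'SSN'})<-[:READS]-(jar:Module)<-[:DEPENDS_ON*]-(svc:Service)<-[:CALLS]-(ui:UI)
RETURN DISTINCT ui.name AS userInterface, svc.name AS service, jar.artifact AS jar;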

We have a process to manage our data, but some of the steps are manual. We’re looking to use Neo4j to automate those steps so that we have a much more accurate and complete picture of how our data is being used and where it is being exposed throughout all of our sites.

We have a lot of dead code and libraries that are 10 or 15 years old: big, difficult-to-manage jar files. We want to eliminate those slowly. We want to take our services and jars, identify where deprecated libraries are being used, and say, “While you’re in this service, let’s get rid of this dead code. Let’s get rid of this old logic that we’ve been looking to retire,” and be able to identify them at a glance with a simple Cypher query.
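That glance could be as simple as the following sketch, assuming we flag deprecated libraries with a property (the flag and labels are hypothetical):

MATCH (svc:Service)-[:DEPENDS_ON*]->(lib:Module {deprecated: true})
RETURN svc.name AS service, collect(DISTINCT lib.artifact) AS deprecatedLibraries;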

The last thing we’re doing is using our service catalog not only to look at what is, but at what we want to be.

When a project team goes into their planning sprint and their early-working sprints, we use this tool – this is one of our primary use cases – to identify which services we want to be using. We want to provide additional information as far as where those services are being used, what’s partially built out and where they may add their logic so that it makes sense in our overall architectural scheme.


Want to learn more on how relational databases compare to their graph counterparts? Get The Definitive Guide to Graph Databases for the RDBMS Developer, and discover when and how to use graphs in conjunction with your relational database.

Get the Ebook

Graphs Are Game-Changing for Cybersecurity: 5-Minute Interview with Eric Spiegelberg, Senior Consultant at GraphAware

“Cyber threat intelligence is high volume, unstructured and highly related. That last attribute makes it great for graphs,” said Eric Spiegelberg, Senior Consultant at GraphAware.

In this week’s five-minute interview (conducted at GraphConnect New York), we discuss GraphAware’s work on cybersecurity and its natural language processing framework for Neo4j. Eric Spiegelberg believes that graph technology will be omnipresent, and that – with the release of the Panama Papers – it has already changed the world.

Read this 5-minute interview with Eric Spiegelberg, Senior Consultant at GraphAware.

Talk to us about the kinds of Neo4j projects you’re working on at GraphAware.


Eric Spiegelberg: I’ve been spending a lot of my time at GraphAware recently on a project involving cybersecurity research. We had a blog published on the GraphAware website as well as on the Neo4j blog.

We’ve been investigating how we can apply graphs to cybersecurity research and the industry in general because we feel that graphs in general – and specifically Neo4j – are an excellent fit for each other.

One of the reasons for that is the shape of the data. We found that cyber threat intelligence is high volume, unstructured and highly related, and it’s particularly that last attribute that makes it great for graphs.

With anything that’s highly related, the more related it is, the better graphs are going to do, and the more we find that Neo4j thrives when applied to cybersecurity.

Tell us a bit about GraphAware’s natural language processing framework.


Spiegelberg: Our natural language processing framework is our big announcement here at GraphAware.

We’ve expended significant time and energy doing a lot of research and development, and our team has just announced the first release of our natural language platform. We’re very excited.

We’ll have as much information as possible available on our website, and we’re really curious to gauge the reaction from the community, generate lots of interest and see how we can all make the best use of it.

As a developer, what first attracted you to Neo4j?


Spiegelberg: That is a great question. I got my first taste of graphs, and there’s no going back. For Neo4j, the slogan is that “graphs are everywhere.” And once you get a taste of a graph database, everywhere you look, graphs are, in fact, everywhere. Whether you’re talking about something in computers or technology, art or life, graphs truly are everywhere.

Once I got a taste of it as a technologist, it was really a life-changing and a career-changing moment. I knew it was something I wanted to go into. And then, as you get interested in graphs, immediately Neo4j bubbles up to the top – at least it did for me – in terms of leadership of the technology, as well as the size and intensity of the community.

And it’s a very welcoming community. When you look at Stack Overflow, there’s a high amount of traffic, with questions and answers.

A lot of the people I talk to get very excited very quickly. But because it’s so flexible and so powerful, you quickly find yourself asking, “What’s the best way to do this?” And I found that the Neo4j community really supports each other; they want to help each other succeed.

Can you talk to me about some of the most interesting or surprising results you’ve seen from the Neo4j projects you’ve worked on?


Spiegelberg: For me, the most insightful or surprising result of graph databases is the Panama Papers. That’s something I’m very interested in on a technical level, but the geopolitical ramifications of the whole situation are just astounding.

For me, that was an “aha” moment, because it showed that, while Neo4j is this fantastically interesting technology, in the real world not everyone cares about technology. They want results. They don’t necessarily care how you get there. But the Panama Papers show that graph technology is really going to change the world, and in this case, it already has.

If you could start over with Neo4j, taking everything you know now, what would you do differently?


Spiegelberg: If I could go back to my first Neo4j project, I don’t think I would do very much differently. Naturally, there’s a learning curve that we all experience. On your first project, you learn, and you make a lot of mistakes.

What I found is a great example of the power of graphs: they’re so flexible. As my experience level and comfort with graphs and Neo4j grew, the graph database was flexible enough that we could make adjustments and accommodate new learnings on the fly. With other technologies or other projects, you would be stuck and have to live with it.

And so looking back at the power of Neo4j and the power of graphs, the reason that there’s nothing I need to change now is because I was able to change it as I went.

What do you think the future of graph technology looks like in cybersecurity?


Spiegelberg: Right now, thought leaders are working on specifications and technologies to advance cybersecurity, and they have recognized graph technology as a fundamentally game-changing technology they can apply.

In the area of cybersecurity, I think the sky’s the limit for graphs, and the adoption level is going to go through the roof. And I think it’s also true that graphs are everywhere. It’s hard to overstate the future of graphs, because I think graph technology is going to become omnipresent and infiltrate every aspect of technology.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Want to learn more about what you can do with graph databases? Click below to get your free copy of the O’Reilly Graph Databases book and discover how to harness the power of connected data.

Download My Free Copy

This Week in Neo4j – Supercharge Developer Productivity with new release of neo4j-graphql.js, Cosine similarity on GoT, New Kettle Plugins Released


Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.

This week we learn how to supercharge developer productivity with the latest release of neo4j-graphql.js, there’s a new release of the Kettle plugins for Neo4j, we have a GraphConnect experience report, and blog posts showing how to use the new Jaccard and Cosine Similarity algorithms.


This week’s featured community member is Ralf Becher, Managing Director at TIQ Solutions GmbH.


Ralf Becher – This Week’s Featured Community Member

Ralf has been a member of the Neo4j community for more than 6 years and has built integrations with the Tableau and QlikView business intelligence products, as well as presenting at many meetups on this subject.

Ralf was also interviewed on the Graphistania podcast in April 2015, where he explained how the combination of graphs and BI tools can help us gain even more insight into our data.

On behalf of the Neo4j community, thanks for all your work Ralf!

Announcing New Features In neo4j-graphql.js


Will Lyon announced a new version of neo4j-graphql.js, which now makes it possible to spin up a GraphQL API backed by a graph database with just type definitions.

The 1.0.1 release has several new features to help supercharge your developer productivity. These include:

    • Auto-generate Query/Mutation types and resolvers
    • Augment a GraphQL schema with pagination, ordering, and _id fields
    • Flexible handling of relationship types, including relationship properties
    • Middleware support can be used to implement authentication/authorization with generated resolvers

If you haven’t tried out GraphQL with Neo4j, now is the time!

Creating a Neo4j Sandbox


My colleague Elaine Rosenberg has created a video that shows how to get up and running with the Neo4j Sandbox.



The Neo4j Sandbox creates a temporary Neo4j instance in the cloud for learning about Neo4j graphs. We have several sandboxes with built-in datasets, such as the Panama Papers and Russian Twitter Trolls, but you can also create a blank sandbox if you just want to play with Neo4j and aren’t able to install the Neo4j Desktop.

Cosine similarity on GoT, Finding your neighbours, Jaccard similarity on product categories


Releases: Kettle plugins for Neo4j


Matt Casters released a new version of the Kettle plugins for Neo4j. This version adds Metadata Injection support to handle more complex scenarios.

For those that want to test WebSpoon, you can use a Docker image that Matt created with all the latest plugins installed. To run the latest Neo4j server alongside to test it, Matt has also created a Docker Compose file which you can find in the dockerfile-webspoon-neo4j GitHub repository.

GraphConnect Experience Report, Word-Pair Frequency Graph, Medium Graph


    • Mos Zhang has written up an experience report from the GraphConnect conference that was hosted in New York a couple of weeks ago.
    • Devansh Trivedi has been participating in the 100 days of Machine Learning challenge, and on Day 1 built a Word-Pair Frequency Graph in Neo4j. Good luck with the rest of the challenge Devansh!
    • Sahil Jadon has written a blog post showing how you might build a graph based on the Medium blogging platform. Sahil then shows how to write recommendation queries to suggest new content for users to read.

On the podcast: Michael McKenzie


This week Rik interviewed Michael McKenzie on the Graphistania podcast. Michael was our featured community member of the week for 4th August 2018.

Michael explains how he first became interested in graphs while trying to work out how building codes and texts were interrelated – an information graph being the solution to this problem.

Michael goes on to explain his experience working with the Cypher query language and his use of the GRANDstack on some personal passion projects.

Next Week


What’s happening next week in the world of graph databases?

Date: October 11th 2018
Title: GraphTalk Edinburgh
Group: Glasgow Graph Databases

Tweet of the Week


My favourite tweet this week was by Lilach Manheim:

Don’t forget to RT if you liked it too.

That’s all for this week. Have a great weekend!

Cheers, Mark

Enterprise AI for IT Ops: 5-Minute Interview with Clayton Ching, Global Head of Customer Success at Digitate

“I believe everyone in the IT ops space is going to have to implement graph data a lot sooner than they think,” said Clayton Ching, Global Head of Customer Success at Digitate.

IT operations is complex, with thousands of systems and connections and dependencies between those systems. Overlay the business context of those systems and you’ve got an ideal fit for graph technology.

In this week’s five-minute interview, we discuss why Clayton Ching views a graph database as a requirement for Ignio, Digitate’s flagship product for cognitive automation of complex IT infrastructure.

Read this 5-minute interview with Clayton Ching of Digitate to learn about AI and graph databases.

Talk to us about how you guys use Neo4j at Digitate.


Clayton Ching: We’re in the IT operations space. Ignio is designed around the concept of cognitive automation.

We have several products in the mix right now: Ignio for IT operations, or IT ops; Ignio for Batch; and Ignio for SAP ERP. These three product offerings fundamentally operate off of our foundational platform called Ignio.

One of the key elements of Ignio is that it can understand context across IT enterprise applications and infrastructure. We have to understand the dependencies and the relationships in that structure. In other words, we have to understand that server A is connected to server B, server B is connected to an app server, that app server is connected to a web server, and there may be databases attached to these systems.

Within the enormous complexity of an IT infrastructure, we have to understand dependencies and the relationships that are attached to them. In addition to that, we also have to understand the business context behind all of this and the relationships within the business flow.

Dealing with that type of complexity and the enormity of an IT enterprise, we then require the use of Neo4j, which provides us with graph data that leads us very nicely to being able to create, update and understand relationships within an IT enterprise.

What made you choose Neo4j?


Ching: We chose Neo4j because it is the leading graph database solution on the market. We were very impressed with your technology. We were very impressed with your customer base. And we’re certainly impressed with our dealings with the individuals within the company. I think all of that coming together really helped the selection process.

Can you talk to me about some of your most interesting or surprising results you’ve had while using Neo4j?


Ching: We are now able to visualize and understand the relationships within an extremely complex IT environment. We couldn’t do that before. At least, we could not accomplish it easily.

We were working with other types of databases, for instance, relational databases and NoSQL databases. Generating relationships and querying information in that type of environment is just an enormous task.

What we’re finding right now is that with Neo4j, we’re able to accomplish a lot of these activities a lot more efficiently, a lot more effectively, and in ways that we never could before.

If you could start over, taking everything you know now, what would you do differently?


Ching: I think we’ve learned that we should have started with a graph database to begin with. It would have been a much easier implementation, and it would have given us the ability to understand the relationships of the system, which we’ve always required and have never been able to do as effectively. Our big takeaway is that we should’ve done this sooner.

The current release of Ignio does not run on Neo4j’s graph database; it does not use graph data at all right now. Our next release will use the Neo4j solution.

What do you think about graphs and AI?


Ching: Naturally, there are many components in the creation of a proper AI framework. You’re going to need to look at not only data – starting with contextual awareness – but also at the different types of ways of understanding data and analyzing different types of data required in an AI environment. I think graph is going to play a key role in that process, along with other types of data and methods.

To move into the AI world, you’re going to need to look at data from many different perspectives. You’re going to have to store. You’re going to have to render. You’re going to get analytics from data from a variety of different sources of which, I believe, graph data will be a fundamental part for a long time to come.

Where do you feel the future of graph technology is headed within the IT operations space?


Ching: I think graph data is an absolute requirement. I believe everyone in the IT ops space is going to have to implement graph data a lot sooner than they think.

What graph data provides is the reduction of the complexity. It takes everything to a point where you’re able to understand relationships and context in a way that a typical database simply cannot provide. A few years from now – and I think it will be a lot sooner – it’ll be an absolutely essential part of products in the IT ops space.

Want to share about your Neo4j project in a future 5-Minute Interview? Drop us a line at content@neo4j.com


Want to learn more on how relational databases compare to their graph counterparts? Get The Definitive Guide to Graph Databases for the RDBMS Developer, and discover when and how to use graphs in conjunction with your relational database.

Get the Ebook

This Week in Neo4j – Weighted PageRank, Backups on Kubernetes, Modeling Financial Risks


Welcome to this week in Neo4j where we round up what’s been happening in the world of graph databases in the last 7 days.

This week we work out the best tennis player in the world using weighted PageRank, we learn how to do backups on Kubernetes, and how to model financial risks. We also have a great story about using Neo4j to store the storylines of an interactive Theatre Production, and there’s the launch of the Graph Gallery Graph App.


This week’s featured community member is Dimitry Solovyov.


Dimitry Solovyov – This Week’s Featured Community Member

Dimitry has been working with our partner Neueda. Over the last year he was one of the main contributors to the Cypher for Gremlin transpiler.

Dimitry is an active member of the openCypher implementers group and has frequently presented on the progress of the project at the group’s meetings and other events.

He also co-presented on the topic at GraphConnect 2017 in NYC in Cypher Everywhere: Neo4j, Hadoop/Spark and the Unexpected.

On behalf of the Neo4j community, thanks for all your work Dimitry!

Weighted PageRank with Neo4j Graph Algorithms


This week we released version 3.4.8.0 of the Neo4j Graph Algorithms library, which now has support for weighted PageRank. Tomaz Bratanic and I have written blog posts showing how to use it.

Tomaz beat me to it, showing the difference between non-weighted and weighted PageRank for finding the most influential IP addresses on an AT&T Network telecommunications dataset from Kaggle.

I then wrote a blog post where I attempted to reproduce Filippo Radicchi’s paper in which he works out who was the best tennis player ever. Spoiler alert: It’s Roger Federer!

How to Model Financial Risk with a Graph Database


Joe Depeau presented a webinar this week in which he showed how to model financial risk in a graph database, with a particular focus on FRTB compliance.



Joe shows how to model investment risk at the trading desk level as a graph, and finishes with a demo of such a model using Neo4j Bloom.

How to Painlessly Unite Art with Java, JavaScript, and Graphs


Alex Tavgen published an article explaining one of the coolest uses of Neo4j that I’ve come across.

Alex and his team produced a theatre production where the story evolves based on audience participation. After each scene the audience votes and the next scene is based on the outcome of the vote. If they vote for a utopia, it will descend into dystopia.

Behind the scenes they use Neo4j to store a graph of all the scenarios built by the scriptwriters. It’s all wired together to a web application that uses Spring Boot, which has support for Neo4j out of the box.

Extensibility for Java Developers, Kubernetes Backups, Next Generation Chatbots


    • Jennifer Reif shared the slides from her talk at the DevUp conference titled Extensibility for Java Developers in Neo4j. Jennifer covers a diverse range of topics, including Spring Data Neo4j, APOC procedures, a Kafka to Neo4j integration, and more.
    • David Allen has written a blog post showing How to backup Neo4j Running in Kubernetes using a specialized Docker container that has Neo4j installed and stores the resulting archive in a Google storage bucket.
    • Adrián Rivero has written an article titled the next generation of chatbots with NLP services and Graphs. Adrián explains how graphs will sit at the middle of chatbot systems, providing the context needed to answer questions effectively.
    • Max De Marzi has written Part 3 of his series on Dynamic Decision Trees. In this post Max shows how to extend the approach to handle cases where not all the facts are known up front, but instead are asked at each step of the tree.

Meet the Graph Gallery – Graph Examples on your Desktop


Graph Apps are single-page applications that take advantage of services provided by Neo4j Desktop around the management of Neo4j databases.

This week Michael Hunger launched a new Graph App – “Graph Gallery”. It allows you to browse and search Graph Examples (also known as Graph Gists) provided by the Neo4j Community across a variety of use cases and industries.

With a single click, you can launch any of those examples as a Browser Guide in the Neo4j Browser of your currently running database.

Graph Analytics + Graph Viz, Personalising Category Pages


On the podcast: Michael Simons


This week Rik interviewed Michael Simons, a Software Engineer on our Spring Data Neo4j and Neo4j OGM team.

Michael gives an overview of Spring Boot and how Neo4j OGM and Spring Data Neo4j play in that ecosystem, and explains his entry into the world of graphs via jQAssistant, the popular software analytics tool.

Michael also shares his views on the future of the software industry and his plans to build a new talk around the intersection of SQL and Cypher analytical queries. I look forward to seeing that talk when it’s ready!



Want to help build Neo4j as a Service?


The Neo4j Cloud team are growing and need SRE and engineering people to help build and power the managed Neo4j-as-a-service offering.

If you’re interested or know somebody who might be, you can learn more at the link below.

Next Week


What’s happening next week in the world of graph databases?

Date: October 15th 2018
Title: Algorithms, Graphs and Awesome Procedures
Group: GraphDB Sydney

Tweet of the Week


My favourite tweet this week was by Jessica Kerr:

Don’t forget to RT if you liked it too.

That’s all for this week. Have a great weekend!

Cheers, Mark

Graph Algorithms in Neo4j: Connected Data & Graph Analysis

Until recently, adopting graph analytics required significant expertise and determination, since tools and integrations were difficult and few knew how to apply graph algorithms to their quandaries and business challenges. It is our goal to help change this.

We are writing this blog series to help organizations better leverage graph analytics so they make new discoveries and develop intelligent solutions faster.

While there are other graph algorithm libraries and solutions, this series focuses on the graph algorithms in the Neo4j platform. However, you’ll find these blogs helpful for understanding more general graph concepts regardless of what graph database you use.

This week we’ll explore why graph algorithms are required to analyze today’s connected data.

Learn about graph algorithms in Neo4j along with connected data and graph analysis


Today’s Data Needs Graph Algorithms


Connectivity is the single most pervasive characteristic of today’s networks and systems.

From protein interactions to social networks, from communication systems to power grids, and from retail experiences to supply chains – networks with even a modest degree of complexity are not random, which means connections are not evenly distributed nor static.

This is why simple statistical analysis alone fails to sufficiently describe – let alone predict – behaviors within connected systems. Consequently, most big data analytics today do not adequately model the connectedness of real-world systems and have fallen short in extracting value from huge volumes of interrelated data.

As the world becomes increasingly interconnected and systems increasingly complex, it’s imperative that we use technologies built to leverage relationships and their dynamic characteristics.

Not surprisingly, interest in graph analytics has exploded because it was explicitly developed to gain insights from connected data. Graph analytics reveal the workings of intricate systems and networks at massive scales – not only for large labs but for any organization. Graph algorithms are processes used to run calculations based on mathematics specifically created for connected information.

Making Sense of Connected Data


There are four to five “Vs” often used to help define big data (volume, velocity, variety, veracity and sometimes value) and yet there’s almost always one powerful “V” missing: valence.

In chemistry, valence is the combining power of an element; in psychology, it is the intrinsic attractiveness of an object; and in linguistics, it’s the number of elements a word combines with.

Although valence has a specific meaning in certain disciplines, in almost all cases there is an element of connection and behavior within a larger system. In the context of big data, valence is the tendency of individual data to connect as well as the overall connectedness of datasets.

Some researchers measure the valence of a data collection as the ratio of connections to the total number of possible connections. The more connections within your dataset, the higher its valence.
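On a Neo4j graph, one rough way to compute that ratio (treating every stored relationship as a single connection between a pair of nodes) might be:

MATCH (n) WITH count(n) AS nodes
MATCH ()-[r]->() WITH nodes, count(r) AS rels
RETURN rels * 2.0 / (nodes * (nodes - 1)) AS valence;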

Your data wants to connect, to form new data aggregations and subsets, and then connect to more data and so forth. Moreover, data doesn’t arbitrarily connect for its own sake; there’s significance behind every connection it makes. In turn, this means that the meaning behind every connection is decipherable after the fact.

Although this may sound like something that’s mainly applicable in a biological context, most complex systems exhibit this tendency. In fact, we can see this in our daily lives with a simple example of highly targeted purchase recommendations based on the connections between our browsing history, shopping habits, demographics, and even current location. Big data has valence – and it’s strong.

Scientists have observed the growth of networks and the relationships within them for some time. Yet there is still much to understand and active work underway to further quantify and uncover the dynamics behind this growth.

What we do know is that valence increases over time but not uniformly. Scientists have described preferential attachment (for example, the rich get richer) as leading to power-law distributions and scale-free networks with hub-and-spoke structures.

Hub-and-spoke data, preferential attachment.

Highly dense and lumpy data networks tend to develop, in effect growing both your big data and its complexity. This is significant because densely yet unevenly connected data is very difficult to unpack and explore with traditional analytics.

In addition, more sophisticated methods are required to model scenarios that make predictions about a network’s evolution over time such as how transportation systems grow. These dynamics further complicate monitoring for sudden changes and bursts, as well as discovering emergent properties.

For example, as density increases in a social group, you might see accelerated communication that then leads to a tipping point of coordination and a subsequent coalition or, alternatively, subgroup formation and polarization.

This data-begets-data cycle may sound intimidating, but the emergent behavior and patterns of these connections reveal more about dynamics than you learn by studying individual elements themselves.

For example, you could study the movements of a single starling but until you understood how these birds interact with each other in a larger group, you wouldn’t understand the dynamics of a flock of starlings in flight.

In business you might be able to make an accurate restaurant recommendation for an individual, but it’s a significant challenge to estimate the best group activity for seven friends with different dietary preferences and relationship statuses. Ironically, it’s this vigorous connectedness that uncovers the hidden value within your data.

Hidden values in connected data.

Economist Jeffrey Goldstein defined emergence as “the arising of novel and coherent structures, patterns and properties during the process of self-organization in complex systems.” That includes the common characteristics of:

    • Radical novelty (features not previously observed in systems);
    • Coherence or correlation (meaning integrated wholes that maintain themselves over some period of time);
    • A global or macro “level” (i.e., there is some property of “wholeness”);
    • Being the product of a dynamical process (it evolves); and
    • An ostensive nature (it can be perceived). (Source: Wikipedia)

Conclusion


For today’s connected data, it’s a mistake to scrutinize data elements and aggregations for insights using only simple statistical tools, because they make data look uniform and hide evolving dynamics. Relationships between data are the linchpin of understanding real-world behaviors within – and of – networks and systems.

In the coming weeks, we’ll explore the power of graph algorithms and the way that they reveal the dynamics of your ever-changing connected data, empowering you to understand your data in new ways and uncover patterns that are undiscoverable using traditional analytics approaches.


Find the patterns in your connected data
Learn about the power of graph algorithms in this ebook, A Comprehensive Guide to Graph Algorithms in Neo4j. Click below to get your free copy.


Read the Ebook


Decyphering Your Graph Model

Editor’s Note: This presentation was given by Dom Davis at GraphConnect Europe in May 2017.

Presentation Summary


Graphs really are everywhere, and building your graph database model from the highest possible vantage point using natural language – and the language specific to your domain – helps you develop a model that truly stands the test of time.

Full Presentation: Decyphering Your Graph Model


In this blog, we’re discussing how to develop the best graph model for your particular domain from the highest possible level:



At the startup Tech Marionette, we’re building the next generation of configuration management databases. This is backed by Neo4j because the assets in an enterprise don’t live in silos of a relational database. They’re interconnected graphs. And graphs really are everywhere! It’s not just a catchy marketing slogan.

Finding graphs is easy, but modeling them is the fun part. Most basic texts on graphs start with vertices and edges:

Learn how graphs start with vertices and edges.

From there, we dive off into graph theory. That said, making the leap from the world of numbers and letters into something slightly more useful isn’t that hard. And since Neo4j is a property graph, we can embellish our data with some useful stuff.

But jumping straight into Cypher isn’t necessarily the best way to go about discovering the model of your world.

Building Your Model Using Natural Language


A graph is essentially a way of modeling the world using interconnected triples in the format of noun-verb-noun. Take the below example, graphs (noun) are (verb) everywhere (noun):

Watch Dom Davis' presentation on how to decypher your graph model.

It’s just English. The astute among you may notice there are countless other languages out there, many of which don’t follow this particular format. But we can make this work for any language, regardless of the order of subject, verb and object: you simply reason about your model in your natural language and then map it back to the subject-verb-object form of the graph when you’re done.

Building a Natural Language Model in Your Domain


If you’re going to model the world, let’s start with the nouns of that world. If I were going to model this conference, we might start with the nouns below:

Discover how natural language processing works for a graph model.

Taking our nouns, we can then form sentences with verbs:

Learn how natural language processing works by connecting verbs to nouns.

We’re creating a model that we can reason about because it’s using natural language. And once we have our nouns and our verbs, we have labels and relationships. The graph model just falls out nice and easy:

See an example of Cypher natural language processing for a graph model.
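As a purely illustrative sketch (the labels, relationship types and property names are whatever your domain language suggests), those sentences translate almost directly into Cypher:

// Speaker gives talk, talk is in room
CREATE (s:Speaker {name: 'Dom Davis'})
CREATE (t:Talk {title: 'Decyphering Your Graph Model', start: '10:35'})
CREATE (r:Room {name: 'Main Stage'})
CREATE (s)-[:GIVES]->(t)-[:IN]->(r);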

Now we can start embellishing our data. A “speaker” has a name, and the phrase “has a” implies a property. “Room” also has a name, and “talk” has a title and a start time – but does it have an end time? Or does it have a duration?

This really comes down to the question you’re going to be asking of your model. Questions like “How long did I spend in talks?” and “How long did I spend giving talks?” are possibly better answered with a duration, because it’s an easier calculation. But a question like “Will I be out of talk A in time for talk B?” may be easier with an end time.

We could put “company” and “roles” as properties of the speaker, but someone could have multiple roles at different companies. Also, “speaker has role at company” looks very much like verbs and nouns. Not only that, but “delegate has role at company,” too.

So let’s build these as part of the model, not tucked away inside properties.

Now we have the basis of a model that we’ve developed using language that’s easily understood, even by people who aren’t familiar with Cypher or Neo4j. This allows you to speak with these domain experts and build your model using natural language.

From Model to Graph


There are some considerations you need to take into account when you convert your model into a graph. While our verbs made sense of our nouns, we’re now viewing the world as instances of those nouns:

See an example of a graph model.

While “speaker has role” in our model made sense, “Dom Davis has CTO” doesn’t work in English, which shows that the semantics of our model didn’t survive the translation into the graph world.

I’ve highlighted another potential issue by having “role” as a one-to-many relationship, which requires the speaker-to-role relationship to be one-to-one.

To understand why, we need to look at a slightly different data set:

Learn about what a flawed graph model looks like.

Person (blue) has role at company (green). Because director (yellow) has many relationships in and many relationships out, with this particular model, it’s impossible to tell who is the director of which company.

Instead, we need to have an unambiguous route or path for us to follow with the below, Model A:

See an example of a graph model.

But this isn’t the only way we could have modeled the data. If we just care about companies and company directors, Model B might actually be more sensible:

A graph model connected more nodes.

Data-wise, Models A and B are pretty much the same. Although Model A is more flexible, having hundreds of different relationships between roles is not a good design.

We can store properties like “start dates” on the role, as well as on the “has_director” relationship, but we can’t index those properties and they are extremely inefficient to search. Relationship properties are really only there to help you make a traversal decision, or to give you data once you’ve made that particular traversal.

If you’re going to search on relationship properties, that’s a sign you may need to stick them in a node — even if adding extra nodes into the model may be an alien concept. But unless you have an atomic node and an atomic relationship with no properties, your graph could always be described with more nodes.

Diving Into Cypher


While (:Speaker {name: "Dom Davis"}) can also be written as (:Speaker)-[:HAS_NAME]->(:`Dom Davis`), let’s consider the “speaker-has-role” path in determining which works better:



For my conference profiles, I needed to include a primary role that could be dropped into my bio. We could tag this in our relationship with the “has role.”

But wherever you see this particular construct…



… you can also replace it with a specific relationship type:



You can also record it as a new node, which in this case has something coming off the role:
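Roughly, and with hypothetical relationship and label names, those three alternatives look something like this:

// 1. A property on the generic relationship
(dom:Speaker)-[:HAS_ROLE {primary: true}]->(cto:Role {name: 'CTO'})

// 2. A specific relationship type
(dom:Speaker)-[:HAS_PRIMARY_ROLE]->(cto:Role {name: 'CTO'})

// 3. A new node hanging off the role
(dom:Speaker)-[:HAS_ROLE]->(pr:PrimaryRole)-[:OF]->(cto:Role {name: 'CTO'})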



The abundance of ways to describe things within the graph is why you really want to drive the model with the language of the domain, not from the Cypher query.

If you consider the graph model that I’m working with, we have the idea of concepts, properties and relationships. This might sound like one-to-one mapping with nodes, properties and relationships in the graph, but it’s more complex than that. I have no idea how many properties a particular concept may have, and I have no idea what they’re going to be called.

Hopefully, we all agree that the below setup is absolute madness:



And while this next example is more extensible, I don’t want to see the query plan for things like “find me all the concepts with a ‘name’ property:”



Instead, we looked at how we described the domain. Concepts have properties, so while “has a” implies a property on a node, “has many” implies relationships and nodes.

The solution is the following, which effectively defines property nodes using property nodes and relationships (which is all very meta):



And then below this, we have the idea of instances, which have values:



So we’re defining a schema on our graph and then storing data under that schema.
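A crude Cypher sketch of that split, with entirely hypothetical names, might look like this:

// Schema level: a concept and a property it may have
CREATE (c:Concept {name: 'Speaker'})
CREATE (p:Property {name: 'name'})
CREATE (c)-[:HAS_PROPERTY]->(p)

// Instance level: an instance of that concept, with a value for that property
CREATE (i:Instance)-[:INSTANCE_OF]->(c)
CREATE (v:Value {value: 'Dom Davis'})
CREATE (i)-[:HAS_VALUE]->(v)-[:OF_PROPERTY]->(p);

The point is that the schema and the data stored under it live in the same graph, so both can be queried with the same Cypher.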

In fact, we even have a schema node, which lets us do some really interesting stuff in our meta-model. Because the concepts defined in our model can be called different things by different people, we can include the idea of “aliases” and “primary language.” You can then define aliases on that model and start asking questions using the terms you would naturally use.

Take a ticketing system for example. You could talk about any ticketing system you’d like, such as Jira or Bugzilla, and each ticketing system could have tickets, issues or tasks. All you have to do is add the aliases:



Conclusion


While the building blocks that Neo4j provides are simple, they’re also incredibly flexible and powerful. In the preceding example I’ve used them to model something that’s very basic, but which in itself lets you model something quite complex.

Have I just reinvented the wheel? No, because when we came to model the domain, we weren’t talking about ticketing systems. We were talking about arbitrary concepts — schemas, concepts, properties, relationships, instances and values — with properties and relationships between them. These were my nouns as I was discussing the domain.

We shouldn’t ignore what the language of the domain is telling us. If we wrote our model at the level of labels, nodes, relationships and properties as our nouns, we would continually have to change our queries and extend our query library every time the model changed.

When we use our model with more arbitrary concepts, it gives us two models to describe and reason about: the meta-model, which is mostly complete and static, and the model itself, which keeps evolving. We reason about both using the same type of language, and the same advice applies, because it really is just graphs all the way down.


Want to learn more about graph databases and Neo4j? Click below to register for our online training class, Introduction to Graph Databases and master the world of graph technology in no time.

Sign Me Up