Channel: Blog – Neo4j Graph Database Platform

Introducing the New Blazing-Fast Query Optimizer for Neo4j

SAN MATEO, Calif. – APRIL 1, 2018 – As a strong kickoff to the second quarter of 2018, Neo4j, Inc. is officially announcing a new, faster-than-ever Cypher query optimizer as part of the Neo4j Graph Platform.

If you read this as real Neo4j news, then you’ve been fooled. Happy April Fools’ Day 2018!


The new query optimizer far exceeds the performance of the old rule-based planner and the more recent cost-based planner (nicknamed “Ronja”) for Cypher queries of any length or complexity. We’re confident that our enterprise customers and community users alike will enjoy the benefits of this newer, faster and more humanly accessible query optimizer.

The Backstory


The development roots of the Neo4j graph database go all the way back to the turn of the millennium, but the development roots of our new query optimizer date all the way back to 1975 (when East Germany was still a thing).

Early on in building the company that would later become Neo4j, Inc. (even before we hired employee #10), CEO Emil Eifrem knew he needed a blazing fast way to optimize graph queries for his new graph database. His reasoning was that if the most powerful graph database on the planet is the human brain, then the best query optimizer would be one that could operate at the speed of thought.

In order to make this dream a reality, Emil did what most of us would have done in similar circumstances: Embark on a “geek cruise” around the Baltic Sea (seriously though, who does that?).

Hours after presenting his graph database to a bunch of fellow geeks on the cruise, a connection was made: A way was introduced to integrate this sort of heuristic optimization approach into Neo4j. The year was 2008.

Two years later, that approach would become more deeply embedded within the company, even while other Cypher query optimizers were also in development. And now, a full decade after that world-changing moment, we’re proud to release this new query optimizer to the world. (For a grand total of 43 years of development!)

Introducing the Der Hunger Query Optimizer


In the spirit of our other Northern-European-nicknamed query optimizers (looking at you, Ronja), we’ve christened this new one the Der Hunger optimizer.

Why that name? First, it just seemed natural. When Emil first made that connection back in 2008, the name just came with the optimizer. Second, Der Hunger is hungry. Hungry for graphs, and hungry for helpfulness.

Der Hunger is 10000x faster than our standard cost-based optimizer (hit the bricks, Ronja!) and will be slowly rolled out to all Neo4j instances as quickly as we’re able to scale the solution (pending FDA and probably OSHA approvals). Until then, optimization times will just have to wait their turn.

The Der Hunger optimizer will also fully integrate with all aspects of the Neo4j Graph Platform, including even the lesser-known Neo5j graph database, Neo4j Matrix Edition, Neo 4 Java and NullDB features.

While planning and optimizing your queries as fast as Der Hunger can manage, its AI-comparable algorithms also answer millions of questions on Stack Overflow, Slack and Twitter. Why? Because it never sleeps. No really, we can’t emphasize it enough: Der Hunger never sleeps. Like ever. (“He” was born that way.)

In fact, we’ve even been able to tune the neural network functionality behind Der Hunger in order to get it to write a few blog posts for us. You’ve probably read one without even knowing it. (Der Hunger writes under a pen name for obvious reasons.)

How Can You Use the New Optimizer?


Similar to other experimental features in our Cypher runtime, the new optimizer can be enabled with a query prefix.

This sends your query to the new optimizer, which creates a highly efficient query plan that is then re-incorporated into your execution plan cache for later use.

Just use:

cypher planner=DerHunger 
MATCH … my complex query …

Der Hunger’s Exclusive Beta Release This Spring


With the start of Q2 2018, Der Hunger will be made available in an exclusive beta process to both our enterprise customers and community users alike.

Why not just customers exclusively? Because the Der Hunger optimizer loves graphs and values relationships too much; he simply can’t exclude optimizing someone’s queries just because they’re not a paying customer.

While innovating with query optimization, we figured we may as well innovate in our business model as well. During the beta stage, all maintenance fees for Der Hunger optimizer will be paid in coffee and sushi.

Unfortunately, the beta release will be limited because if everyone jumps on using the new optimizer, it will result in a worldwide DoH (Denial of Hunger) attack across all Neo4j instances.

Finally, even though Neo4j data is built to maintain strict ACID compliance, the Der Hunger optimizer will use an eventually consistent approach until our IRB finally allows us to just clone him already.

Ready to sign up for the beta? Click the button below to get added to the waitlist:

Optimize My Queries, Herr Hunger!

The post Introducing the New Blazing-Fast Query Optimizer for Neo4j appeared first on Neo4j Graph Database Platform.


A BIG Thank You to the Neo4j Community! [GraphTour Update]

Learn how awesome the Neo4j community was (and is) during the EMEA GraphTour in 2018

Graph love is going global, and we’ve taken note.

That’s why this year, instead of one centralized conference in Europe, we went on tour to eight cities across Europe and the Middle East.

While eight one-day events in eight cities may seem like a lot, we just weren’t satisfied. We wanted to do more. That’s where the Neo4j community (you!) came in, helping us to organize the community GraphTour!

How the Neo4j Community Stepped Up


Luckily for us, the Neo4j community shares our passion and enthusiasm for changing the world through graph technology. And, because our community lives in every city in the entire world (and because they’re awesome) they took the opportunity to organize a Neo4j Community GraphTour on their own. The response was astounding!

Across all the countries in the world – all the different languages, cultures, ideas, and people – there’s one thing that undoubtedly connects us: our love for graphs. We received over 60 submissions from individuals in over 24 different cities who wanted to be involved in the Neo4j Community GraphTour – from students and researchers, developers and architects, CTOs and CEOs, and more!

Here are all of the European & Middle Eastern cities where a Neo4j Community GraphTour happened (or will happen very soon):

Check out all of the cities where the Neo4j community planned events in conjunction with the GraphTour in Europe and the Middle East


While the Neo4j team was available to help with guidance and support, the organization of these events was in the hands of the community. From finding the venue, scheduling the speakers, preparing the talks, ordering food, publishing and promoting the event – it was all managed by the Neo4j community.

Our Biggest Thank You!


We know how much work it is to organize events like this, and we want to extend our deepest gratitude for all our community members who volunteer their time and energy to contribute, teach and lead others to make the Neo4j community stronger.

All combined, your enthusiasm for graph databases made the Community GraphTour events so successful. We take off our hats and thank you, Neo4j community!

It’s because of people like you that (graphs)-[:ARE]->(everywhere).

Karin Wolok, for the Neo4j team


Want in on all of this graph love?
Click below to register for the North America GraphTour happening in a city near you all across the U.S. and Canada – and connect with Neo4j community members from around the globe.


Get My Ticket to the GraphTour


Graph Algorithms in Neo4j: How Connections Drive Discoveries

Graph algorithms are the powerhouse behind the analysis of real-world networks — from identifying fraud rings and optimizing the location of public services to evaluating the strength of a group and predicting the spread of disease or ideas.

Learn how to drive discoveries in connected data using graph algorithms in the Neo4j Graph Platform


In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. This week, we’ll take a look at how powerful graph algorithms offer a practical approach to graph analytics and review an example of how to find the most influential categories in Wikipedia using Neo4j.

Algorithms: The Graph Analysis Powerhouse


Based on the unique mathematics of graph theory, graph algorithms use the connections between data to evaluate and infer the organization and dynamics of complex systems. Data scientists use these penetrating graph algorithms to surface valuable information hidden in connected data. They then use this analysis to iterate prototypes and test hypotheses.

A Practical Approach to Graph Analytics


Graph analytics have value only if you have the skills to use them and if they can quickly provide the insights you need. Therefore, the best graph algorithms are easy to use, fast to execute and produce powerful results.

For transactions and operational decisions, you need real-time graph analysis to provide a local view of relationships between specific data points. To discover the overall nature of networks and model the behavior of intricate systems, you need global graph algorithms that provide a broad view of patterns and structures across all data and relationships.
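
As a concrete sketch of the two modes in Cypher – the labels, relationship types and the algo.pageRank procedure call (from the Neo4j Graph Algorithms library of this era) are illustrative assumptions, not part of any example in this post, and exact procedure signatures vary by library version:

```cypher
// Local, real-time analysis: a view anchored on one specific node
MATCH (c:Customer {id: 42})-[:BOUGHT]->(:Product)
      <-[:BOUGHT]-(:Customer)-[:BOUGHT]->(rec:Product)
WHERE NOT (c)-[:BOUGHT]->(rec)
RETURN rec.name, count(*) AS score
ORDER BY score DESC LIMIT 5;

// Global analysis: rank every node across all data and relationships
CALL algo.pageRank('Customer', 'BOUGHT',
  {iterations: 20, dampingFactor: 0.85, write: true, writeProperty: 'pagerank'});
```

The first query touches only one customer’s neighborhood and can answer in milliseconds; the second walks the entire graph and is the kind of workload that needs the global optimizations described above.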

Other analytics tools layer graph functionality atop databases with non-native graph storage and computation engines. These hybrid solutions seldom support ACID transactions, which can ruin data integrity. Also, they must execute complicated JOINs for each query, crippling performance and wasting system resources.

Alternatively, you could maintain multiple environments for graph analytics, but then your algorithms aren’t integrated with – nor optimized for – a graph data model. This bulky approach is less efficient, less productive, more costly and greatly increases the risk of errors.

Real-time graph algorithms require exceptionally fast (millisecond-scale) results whereas global graph algorithms can be very computationally demanding. Graph analytics must have algorithms optimized for these different requirements with the ability to efficiently scale — analyzing billions of relationships without the need for super-sized or burdensome equipment. This kind of versatile scale necessitates very efficient storage and computational models as well as the use of state-of-the-art algorithms that avoid stalling or recursive processes.

Finally, a collection of graph algorithms must be vetted so your discoveries will be trustworthy and include ongoing educational material so your teams will be up to date. With these fundamental elements in place, you make progress on your breakthrough applications with confidence.

Example: Analyzing Category Influence in Wikipedia


Let’s look at an example of how to use Neo4j graph analytics to analyze the most influential categories in Wikipedia searches.

The graph below shows only the largest of 2.6 million clusters found, with the most influential categories in green. It reveals that France has significant influence as a large cluster-category with many high-quality transitive links.

Using the PageRank graph algorithm to look at category influence in Wikipedia


The Neo4j Label Propagation algorithm grouped related pages into cluster-categories in 24 seconds, and PageRank then identified the most influential categories by looking at the number and quality of transitive links in 23 seconds (on a 144-CPU machine with SSD storage, using 32 GB of its 1 TB of RAM).
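
A hedged sketch of how those two steps might look in Cypher, assuming a simple (:Page)-[:LINK]->(:Page) model; the algo.* procedure names come from the Neo4j Graph Algorithms library available at the time of writing, and exact signatures vary by version:

```cypher
// Step 1: group related pages into cluster-categories
CALL algo.labelPropagation('Page', 'LINK', 'OUTGOING',
  {iterations: 10, write: true, partitionProperty: 'cluster'});

// Step 2: score influence by the number and quality of transitive links
CALL algo.pageRank('Page', 'LINK',
  {iterations: 20, dampingFactor: 0.85, write: true, writeProperty: 'influence'});

// Then inspect the most influential categories
MATCH (p:Page)
RETURN p.cluster AS cluster, p.title AS page, p.influence AS influence
ORDER BY influence DESC LIMIT 10;
```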

Conclusion


Graph algorithms must be optimized to support different use cases. Real-time recommendations demand real-time graph analysis, while finding patterns in large datasets requires global graph analysis. Optimized graph algorithms support all use cases with high performance.

In the coming weeks, we’ll take a closer look at the Neo4j Graph Platform that supports graph analytics, including its performance versus Spark GraphX. We’ll also explore specific examples of the wide range of powerful algorithms that Neo4j supports.


Find the hidden value in your connected data with powerful graph algorithms: Click below to get your copy of Optimized Graph Algorithms in Neo4j and learn how Neo4j jumpstarts your analysis for any use case.

Read the White Paper



Neo4j on All the Swag: The GraphGear Store Is Here! [+Get a $10 Discount]

A lot of folks in our community ask us about Neo4j swag. They are excited when they receive it, and they wear it often and with pride.

We took a moment and asked ourselves: Why do people love Neo4j swag so much? We came to this conclusion: people use their clothing to project their passions and personal identity. Those who are in the ridiculously awesome Neo4j community want the whole world to know they’re a part of something great.

Introducing the GraphGear Store


In order to let graphistas show off their part in the Neo4j community, we decided to create the GraphGear Store to make Neo4j swag available to all the graph lovers around the world!

Learn about the new GraphGear store that has all of the Neo4j swag you could ever, ever want (ever)


In the online store, you’ll find stickers, hoodies, backpacks, T-shirts, and even shoelaces. So, you can really rock your favorite graph database from head to toe. We even have onesies for the youngest graphistas in your life!

New items arrive seasonally, so take a look around and make sure to keep up with all the new things we have in store. The GraphGear store is also a great place to find a gift for a graphista in your life!

Have an idea for a swag item we don’t yet have in store? Let us know! Email: community@neo4j.com.

Note: Our merch warehouse is in the US, but we ship internationally. Get more info on shipping policies here.

4 Tips on How to Get Neo4j Swag for Free


Tip #1: Get Certified

Want a free Neo4j T-shirt? Every Neo4j Certified Developer gets one. Click here to get started on Neo4j certification.

Tip #2: Organize a Neo4j Event

Are you organizing an event around Neo4j and want some swag for it? Reach out to us at community@neo4j.com and we can send you some free swag for your event.

Tip #3: Help the Neo4j Community

Think you deserve a free piece of swag for something you did for the Neo4j community? Email us at community@neo4j.com and tell us why!

Tip #4: Complete Our New Developer Survey

Want $10 off for your first purchase at the GraphGear store? Fill out this new Neo4j developer survey for your discount code!

What Are You Waiting For?


We are so excited about the new GraphGear store, and we know you will be too.

Check out the GraphGear store today and get your own box of awesome!


Seriously, though:
Treat yourself to $10 worth of Neo4j swag at the GraphGear store just by filling out this super-simple survey:


Get My $10 Discount Code


Neo4j as a Critical Aspect of Human Capital Management (HCM)

Editor’s Note: This presentation was given by Luanne Misquitta at GraphConnect Europe in May 2017.

Presentation Summary


In this presentation, Luanne Misquitta shares her experience engaging with the challenges of human capital management (HCM) in contemporary organizations using graph technology.

She describes the evolution of HCM, how contemporary organizations are made up of teams that form and reform depending on current needs, and how this translates into the traditional problems of recruiting, learning, performance evaluations, and talent management.

She explains how graphs can help you find hidden potential in your organization and address the latest HCM trends including people analytics and viewing your organization as networks of teams. She explains organizational network analysis and describes how graphs enable you to quite literally bring it all together and take your HCM to a new level.

Full Presentation: Neo4j as a Critical Aspect in Human Capital Management


What we are going to be talking about today is the application of Neo4j in human capital management (HCM).



The Evolution of Human Capital Management


Human capital management (HCM) is very interesting to me personally because I started my whole journey with graph technology working at a people management company. Today I’m going to talk about the application of Neo4j in people management or HCM, as we know it.

We are not going to talk about HR – a very old concept which has been transformed into HCM.

HR was human resources. It’s always been a part of most companies, and back in the day it dealt with personnel files, paying salaries, generating performance reviews, awarding bonuses, and resolving minor conflicts. But that’s about all it was. It was called human resources, and nobody calls people working in companies “resources” anymore (or at least they shouldn’t).

Human capital management is now focused on people as capital for your company. People are the most valuable asset you have. They drive your profits. Good teams contribute to your revenue. Bad teams can do terrible things to your company.

So HCM manages the life of a person in her career while she is at your company. Also today, HCM is important because people are the ambassadors of your company, thanks to places like Glassdoor, where people can go and publicly complain (or compliment) their boss and their companies.

In the ’50s, when the whole industrialization movement started – mostly driven by Henry Ford – the focus was on operational efficiency. You drove profits by operations. People were tools to get the job done, so HR was very applicable at this time.

But then new management styles started being introduced between the ’60s and ’80s that changed the structure of the organization, making it more hierarchical, with the executives holding all the power. People were heavily managed. They were managed by goals. They were managed by objectives.

In the 1990s, with thought leaders such as Steve Jobs, you begin to focus more on the people in your company but the primary focus is still on the individual. You had star performers in your company. And those were the people you recognized. You also had people who were not star performers, and as you’ll see, it is a bit of a problem when you focus only on individuals.

Teams and Graphs


Customer service became a driving factor, and therefore top-down, hierarchical management no longer worked because it never goes hand-in-hand with customer success. Today, people are still important but the focus is not on the individual: it’s the team that’s important.

So if your team performs well, you do well collectively. It’s no longer acceptable to have a high-performing individual who’s not a team player because this causes more problems in the long run. You still have high-performance individuals, but then you have to find a role that suits them because they are not team players.

So we are in an age where we talk about networks of teams and the moment you see networks, it automatically indicates that you should probably be using a graph.

A graph of a person's connections in a company


People in companies are very connected. Everything about a person at your company is connected. Think about skills, what position he occupies, what position you want to promote him to. And what if he wants a different position?

Her experience, the teams that she belongs to, her certifications: there is a well-defined career path that you as an organization would like to have your employees work on because that helps you retain employees. Endorsements, awards, education: everything is connected. They are connected to a person, but they are also interconnected.

Because everything is connected, there is value in relationships. People data is highly connected.

Any organization is made up of connections. And this data is not always very structured, especially nowadays because you have internal social networks and employees providing feedback on various items in various forms. Your data could start out sparse and it could grow. It is no longer something that you know upfront, plan for and put it into structured tables.

More importantly, there are very interesting implications in your second- and your third-level relationships. You’ve already seen what the first-level relationships are. You can do a lot with just first-level relationships. But you can do even more with second- and third-level relationships.

For example, what skills do people have which are not part of their formal job description? You probably hired someone based on skills that you were looking for. Most likely, they have additional skills that you can leverage in your company. But do you know what those skills are? Do you know how you can apply them in your company?
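
Questions like these map directly onto graph patterns. A minimal Cypher sketch – the label and relationship names here are assumptions for illustration, not a fixed schema:

```cypher
// Skills people hold that no position they occupy formally requires
MATCH (p:Person)-[:HAS_SKILL]->(s:Skill)
WHERE NOT (p)-[:OCCUPIES]->(:Position)-[:REQUIRES_SKILL]->(s)
RETURN p.name AS person, collect(s.name) AS hiddenSkills;
```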

Some people serve as top talent attractors. They attract more and more talent into the organization. And if you don’t know who these people are and they leave, what it really implies is that you’ve lost people who bring in good people. So how do you find top talent attractors? What learning activities do you recommend to your top performers to get them ready for the next step in their career path?

How do you retain top performers? You give them interesting work. You enrich their career path. You put them on track to be promoted. But how do you find top performers? How do you know what they want to do? How do you recommend things for them to do? It all goes back to what Neo4j CEO Emil Eifrem has always said in the past: what is core to your business has to be connected. And people are core to any business.

Now let’s talk about the Neo4j graph database. It’s actually very simple. It’s a labeled property graph model – not relations but properties.

What’s important here is that your relationships are first-class citizens. A relational database (RDBMS) also has relationships, but they are not first-class entities. And it’s very difficult to extract value out of those relationships.

With Neo4j, there’s no need to infer connections using out-of-band processing or using foreign keys. It’s very explicit. It’s right there for you to exploit. Second, it’s flexible. You can find patterns easily. It’s efficient to query, and it’s schema-free so it’s very easy to evolve over time, unlike traditional database tables where you have to know your data structure in advance and when you need to change it you’re in a lot of trouble.

With Neo4j you can easily evolve, which is very relevant to today’s organizations where the amount of data you collect from various sources keeps growing and changing over time, and you need to be able to model this data quickly to take advantage of it before it’s too late.
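
A minimal sketch of that flexibility, with all names as illustrative assumptions:

```cypher
// Model what you know today...
CREATE (:Person {name: 'Anna'})-[:HAS_SKILL {since: 2015}]->(:Skill {name: 'Cypher'});

// ...and evolve later without a migration: new labels and relationship
// types are just new data
MATCH (anna:Person {name: 'Anna'})
CREATE (anna)-[:ENDORSED_BY {on: '2017-05-11'}]->(:Person {name: 'Ben'});
```

No ALTER TABLE, no downtime: the second statement introduces a relationship type the original model never declared.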

A company I worked at had complex structures like this:

Complex connections within an RDBMS data model for human capital management (HCM)


Everything was put into tables and columns, but the fact is that your data does not really fit into tables and columns. It’s forced into that structure because 10 years ago that was all you had.

The problem with this is that it’s easy to put the data in, but when you want to get it out and examine relationships between data elements, it gets harder and harder, as you join across more and more, and still more, tables. Next you start to denormalize the tables. And then you end up with the inverse of what you actually wanted. You wanted a structured database but it didn’t fit, so you denormalized it. Now it’s not structured, and it’s still not efficient.

It was when I was trying to find direct and indirect reports in an organizational hierarchy that I first started using graph database technology. There are only so many CONNECT BYs you can do. In a relational database, you sometimes get into complicated stored procedures to find very simple things in deep hierarchies. But if you were to translate it to Cypher, it would be this:

A Cypher query for finding direct and indirect reports in an organization


This is simple, easy to understand, easy to write, and much, much faster than you would ever get out of a relational database. And this is the core of people management. It’s your reporting chain. Imagine how much more you can do when you extend this to all other aspects of your organization.
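
The query on the slide isn’t reproduced here, but a direct-and-indirect-reports query in Cypher typically takes this shape – a single variable-length pattern instead of recursive SQL (the names are illustrative):

```cypher
// Every direct and indirect report of one manager, at any depth
MATCH path = (boss:Person {name: 'Jane'})<-[:REPORTS_TO*]-(report:Person)
RETURN report.name AS report, length(path) AS depth
ORDER BY depth;
```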

Talent Management Analytics


People management suites traditionally have been broken down into a couple of areas, and they’ve been fairly isolated silos.
    • Recruiting and onboarding: find people, get them into your organization, and train them.
    • Formal training processes: These are needed for two reasons. One is because you need to train employees to get them new skills so you can get customers, and the other is for compliance reasons.
    • Performance: The much-dreaded performance review is a whole product in people management suites. It exists to rate employees and deliver a review.
    • Talent: This is one of the later modules that came about, maybe 12 or 15 years ago, and it focused on retaining talent, identifying people who were likely to leave.
The reality is, people management software has really been more or less the same for all these years. If you continue reading the reports produced by analysts, such as Bersin, you will find that they mostly say the same things over and over again. There hasn’t been a good solution to all this.

But apart from these modules is one more which is fairly hot these days: people analytics. About 71% of companies see people analytics as a high priority, according to Deloitte’s 2017 Human Capital Trends report. It’s a very high priority, but where are we with people analytics?

The talent analytics maturity model in human capital management (HCM)


If you look at the bottom, the lowest level is Operational Reporting, which is a standard report for compliance. Then you go into Advanced Reporting where you have analysis of trends, maybe some kinds of benchmarks.

Next is Advanced Analytics, where you’re trying to find issues and solutions through data. The top level is Predictive Analytics where you’re looking for things like which people are likely to leave. How is my company growing? What does my succession pipeline look like? Who do I think will fit this role in the next five years?

Sadly, 56% of companies are still at level one. And although 71% of companies think people analytics is very important, there are really just 4% doing this type of analytics. And this has been a trend for a fairly long period of time considering that people analytics is not new. It’s not this year’s new technology. It’s been around for a while.

What are the problems and why are we still at this operational reporting level? I’m going to walk you through what the traditional products are that comprise a people management suite.

Recruiting and Onboarding


Let’s start with recruiting and onboarding because that’s where people enter your company.

You no longer – well, we hope that you no longer – recruit people by cold calling them and not knowing anything about them, operating on a standard job description. When such people enter your company you don’t know what to do with them or you put them into a role that they didn’t join your company for in the first place.

Most large companies go through this. I think startups are off the hook here.

What has changed in recruiting: before, human resources had very large job descriptions that were collated over time and then sent out to the market. You got candidates after a very long cycle, and by the time you got them, maybe the reason why you wanted them in the first place had changed. This is very common nowadays.

But the other more important issue is now the model of working is moving to teams. You want to assemble teams for the purpose of a particular project. You assemble teams with complementing skills, they do the job, they disassemble and they move on.

This is very similar to SWAT teams or what they call the Hollywood principle where you assemble a cast for a movie. They have the right skills for the job at that given point in time. They do the job well. And then they disassemble and various parts of the team go on to form other teams.

You no longer want that team that’s been in your company for 10 years. They’re just there as a team because that’s the way they’re painted on your organization chart. That also leads to problems such as: How do you train them? How do they learn new things?

When you start recruiting for a team, you need to be very agile. You need to know immediately what this person needs to do in the team. What is the role he or she needs to fulfill? What skills do they need? And then you need to recruit very precisely.

You don’t want to recruit someone and then have them go through a long training program; by then your team has moved on. These are some of the changes in recruiting and onboarding. And you will note that it’s very much linked to your team composition, the positions open at the time, the skills required, and the learning required.

Learning


The next is learning or training. It could be for compliance, and it could be for retaining people.

A BBC Capital study showed that only 12% of employees apply new skills learned in their training to their actual jobs. This shows a very large disconnect between training departments and the teams actually working on the ground. What you are trained for is not necessarily what you can apply to your job.

This is good in one sense because employees learn new things, but it’s also bad because they learn irrelevant new things. And then they take that skill and leave your company to apply their learning somewhere else where it’s relevant.

You might think that you are training to retain, but if you train for the wrong thing, you actually lose people because they find that they can no longer use those wonderful new skills that you’ve trained them on. You have nothing for them to do.

So this is a problem with training, with learning really. And with learning, there are a couple of stages that are not addressed very well in today’s learning management suites.

Learning management suite pie chart as it relates to human capital management


The first stage is immediate learning where you need to learn enough to be good at your job right away. The second is intermediate, where you need to catch up on competencies and grow in your current role. You’re in a position. You want to grow further somewhere, so what do you learn? What should you learn?

This depends on two things. First, what is your employee interested in doing? Second, what career plans do you have for your employee? This is intermediate learning.

Finally, you have long-term, transitional goals: what you need to do as a person to grow your career some number of years down the line. If companies can help employees do this learning – or provide them with enough opportunity to find this learning themselves – they end up retaining these people and also building succession plans.

Here’s a simple example (pictured below), if you were to put learning into a graph. Learning can be quite complex, especially when you go down to competency management, fulfilling competencies in a variety of ways. You have different ways to acquire certifications, and you have a lot of compliance around certifications, which expire at various points in time.

For some of them, you take an online course. For others, you must attend an instructor-led course. Some of them require a combination of courses and work experience and so on. It’s hard to query. It’s hard to find out – given a person and an end goal – all the steps she needs to do to acquire all the skills if you are querying a relational database.

Employee learning recommendation engine graph data model


But if you’re using a graph, it is extremely easy. You can already see that you’ve got a person who is interested in a position. Now that position requires a certification, which is fulfilled by a couple of learning activities. Some of them are courses; some of them are work experience.

The interesting thing is this person already holds a certification. He’s completed the learning. He’s completed the work experience. And that work experience is also applicable to the new certification that he needs for that position.

So really the delta that he’s looking at is attending one course. And that is a very easy answer. It is very easy to motivate people to actually learn something to move them up. And it’s relevant because you know that this skill is needed for the new position.
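In a graph, that "delta" question becomes a short Cypher query. Here is a sketch; the labels and relationship types are assumptions for illustration, not the actual model from the slides:

```cypher
// Assumed model: a person is interested in a position, which requires a
// certification; the certification is fulfilled by learning activities
// (courses or work experience), some of which the person has completed.
MATCH (p:Person {name: 'Alice'})-[:INTERESTED_IN]->(pos:Position)
MATCH (pos)-[:REQUIRES]->(cert:Certification)-[:FULFILLED_BY]->(a:LearningActivity)
WHERE NOT (p)-[:COMPLETED]->(a)
RETURN pos.title AS position, cert.name AS certification,
       collect(a.name) AS remainingActivities
```

The query matches the path from person to required learning activities and filters out anything already completed; what remains is the delta the person needs, such as the single course in the example above.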

Performance Evaluations


Everyone hates performance evaluations.

82% of companies report that they are not worth the time. Most of them report that there's widespread manager bias and the results do not motivate anyone. What changed performance management was the moment millennials started joining companies.

Millennials are used to providing feedback. With everything you do on the Internet, you provide some sort of feedback. So if a manager can rate me, I also want to rate my manager. I think that's fair, right? Most employees agree that you should be able to rate your managers, and you would expect that if your manager had a bad rating, you would no longer work for that manager. But we all know it never happens: managers rate you badly, and that's about it.

Performance management is really important because of the whole feedback trend. You need people to take action on their performance reviews and this actually links back to learning. You need to know who to promote. You need to identify the kinds of people in these teams who could be high-performance people but are not team players. They’re still very valuable to your organization, but maybe you need to move them out of your team.

Talent Management


Talent management is extremely important, as everyone knows. You want to find people who attract talent. You want to recommend learning to people so that they can take on the next challenge. You want to find new skills that people have which you are not immediately aware of just because it’s not on their official job description.

There are a lot of studies over the years that show that intelligence, agility, self-management, and self-discipline are characteristics of high-performance individuals, and that high-performance individuals learn by doing. These are the kinds of people who you want to identify and you want to promote because that is how they learn. And there are always characteristics that apply to certain kinds of people. This is highly related to talent management, and it’s still a problem that people are trying to solve.

An overview of talent management


Finding Hidden Potential


Here’s another use case: Finding hidden potential if you were to model your organization in a graph.

Learn how Neo4j plays an essential role in making your human capital management (HCM) more effective


In the image above, we have a person in the center in blue, and he holds certifications, has some skills, currently works in position 1, has some work experience, but he also wrote blogs, maybe externally, maybe on your internal company network.

He has attended various courses, and he’s interested in another vacant position – which you may or may not know directly. You might know because you just did a performance review. You might know because he’s applied to that position through your internal hiring portal.

What’s also important is that you get a lot of information. Sure everyone writes a blog, but how good is the blog? What is the blog about? Maybe it’s about skills that you did not know this person had, or maybe it’s validated by an expert in your company who has a skill needed for the position that your person is interested in.

So you’re basically endorsing people internally by finding out who has endorsed him and validated the fact that he knows what he is talking about. Who has rated him? Who has reviewed him? This is a real use case from a previous place I worked at: there were people who had all these skills and wanted to be in positions, yet the company was hiring from the outside. They just didn’t know that these people were sitting in their organization.

When we put it into a graph, we surfaced these people and, in fact, those four people went on to be the new iOS team and were hugely successful afterward.

How hard was it to create that query? We called it “finding hidden potential.” If you have a graph of your organization, it only takes a couple of hours to write the query and surface the information you need.

You just need one Cypher query of this form (pictured below), which basically says, find the position that’s vacant, the skills it requires, people who have these skills, preferably people who have been validated by other people who are hopefully experts. It’s very simple. Even if you do nothing fancy, this query will give you a lot of information on skills that you didn’t know existed in your company.

A Cypher query example of finding employee hidden potential in an organization
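A query of that shape could look roughly like the sketch below; the labels and relationship types are assumptions, not necessarily those on the slide:

```cypher
// Find people inside the organization who hold the skills a vacant
// position requires, preferably validated/endorsed by other people.
MATCH (pos:Position {status: 'VACANT'})-[:REQUIRES]->(s:Skill)<-[:HAS_SKILL]-(p:Person)
OPTIONAL MATCH (expert:Person)-[:ENDORSED]->(p)
RETURN pos.title AS position, p.name AS candidate,
       collect(DISTINCT s.name) AS matchingSkills,
       count(DISTINCT expert) AS endorsements
ORDER BY endorsements DESC
```

Even this simple shape surfaces candidates ranked by how well-validated their skills are.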


People Analytics


Now we get to people analytics, which brings together what was once called human resources and your business data. Neither makes sense alone. You have to merge them.

You need to identify high-performing people. You need to analyze flight risk. Most companies are fairly surprised when key people leave. This presents compliance risks, which goes back to learning, identifying career paths, and having people on this career path.

You have a very high-value career path with someone who has probably entered a leadership position. But do you have anyone in your company that you can actually groom for this position?

What are the characteristics of high-performance sales teams, for example? These are very strong use cases for people analytics. Once you find out why your sales teams are so good, you figure out which people comprise those teams and what their characteristics are. You then look for similar people – whether inside or outside your company.

A fairly recent example of people analytics being applied across domains is sentiment analysis, which is extremely important in companies for two reasons. One is negative sentiment, where you identify toxic people. Toxic people have a way of spreading negative vibes quickly. The sooner you find these people and the sooner you remove them, the better.

On the positive side, some people are very good at communicating messages. They amplify messages. And finding these people is also important because sometimes companies need to push new policies which might be slightly controversial. But if you find the right people to convince and carry that message throughout your company, you will have a much higher success rate rather than sending out an official notice from the CEO. We know how that works.

Sentiment analysis is extremely important: What do people feel about the projects they’re working in and the people they’re working with? The managers they’re working with? It ties all the way back to performance and learning.

Networks of Teams


Let’s take a look at what is interesting in this year’s predictions, especially the one from Bersin. All through the previous years, they have been talking about finding one magic database that will combine information from all the systems. But this year, it looks like everyone has finally accepted that this is never going to happen.

There are too many large companies with too many large systems and a lot of investment in them. So these systems of record are here to stay. It’s good that this is finally put to rest: you’re not going to replace everything with one database. And that is perfect, because what you want to do is extract the valuable insights from all these systems of record and connect them.

Here’s another interesting trend. Earlier when these reports started off, from Forrester and others, they started with a hierarchy for team structure. Over the last two years, they started talking about networks of teams, which is where things are now. But in this year’s report, they’ve actually shown how teams keep changing and how they’re united: they are united by common goals and values, reward, feedback and sharing knowledge.

An organizational chart of a network of teams


This is very important and represents a marked shift: in last year’s report this did not even exist, although we spoke about it. And this validates where we are going: everything has to be connected for you to get value out of it.

Organizational Network Analysis


The other important thing is that when you have connected information, you start finding out things that were hard to find out before. Organizational network analysis is a big topic, and there are a lot of people in this field. One of the very simple things you can get out of it fairly quickly is identifying three types of people.

First, you’ll find core connectors, the red figures in the image below. Core connectors are highly collaborative people. They work well within teams and across teams. They make sure that your organization stays together and that knowledge flows through the organization. They are very, very key people.

If you remove enough of them, you’ll find that your communication breaks down, so it’s very important to make sure that this does not happen. You need to keep a good balance of knowledge flowing in your organization.

Bridges, core connectors and peripheral people in an organization


The second type of person is a bridge (indigo in the image above), who connects two disparate sets of people or sub-organizations. There are many organizations that are still fairly siloed and you’ll always find one or two people who act as communicators between these silos. It’s good because at least you have some communication between them. It’s bad because this person becomes a bottleneck.

The same actually applies to core connectors, who are usually high-performance individuals. It’s good, and you want to encourage this, but you also want to know who those people are, because over time fatigue sets in. There are well-known studies showing that high-performance people experience burnout at some point; there is only so much they can coordinate while maintaining relations with other people in the organization. You want to identify these people, and you want to move them around.

And the last are the peripherals (grey in the image above), the people who sit on the edge. They’re probably very good, but they’re also not very good team players, so you want to make sure that they stay on the edge, where they contribute as best they can. If you put a non-team player into a team and force them to be a team player, it almost never works.

Nobody will be happy: not the team, and not the peripheral. So it’s very important to find this out, and it’s easy to do so because once you have your social networks and you’re analyzing connections, you get this for free.
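With the collaboration network in Neo4j, these three groups can be approximated directly. A sketch, assuming a (:Person)-[:COLLABORATES_WITH]-(:Person) model; the betweenness call uses the separate neo4j-graph-algorithms library, whose procedure signatures have varied across versions:

```cypher
// Core connectors (and, at the bottom of the ranking, peripherals):
// people ranked by their number of distinct collaboration ties.
MATCH (p:Person)-[:COLLABORATES_WITH]-(c:Person)
RETURN p.name AS person, count(DISTINCT c) AS connections
ORDER BY connections DESC LIMIT 10;

// Bridges: people with high betweenness centrality, i.e. those who sit
// on many shortest paths between otherwise separate groups.
CALL algo.betweenness.stream('Person', 'COLLABORATES_WITH', {direction: 'both'})
YIELD nodeId, centrality
MATCH (p:Person) WHERE id(p) = nodeId
RETURN p.name AS person, centrality
ORDER BY centrality DESC LIMIT 10;
```

The peripherals fall out of the first query as well – they are simply the people at the low end of the degree ranking.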

Bringing It All Together with Graphs


Now I would like to quickly revisit people management. We spoke about these four products within the people management suite.

People management revisited as a graph data model


Sitting in individual silos, you’re applying graphs to each of them, but the real value is not in each silo. The real value is when you bring them all together.

Recruiting is impacted by learning. It’s impacted by talent attractors in your company. It’s impacted by what you’re looking for, the skills you need. Talent is also impacted by the learning you provide to your employees. It’s also impacted by the quality of people you hire.

If you dilute the culture of your company, you’ll find that the people who started that culture will leave. Performance is extremely important because it links to everything else. It cannot remain the standard 360-degree, rate-yourself-against-organizational-goals exercise.

Human capital management graph data model overview


When you connect everything – and this is a very small snapshot of everything – you have people and the things they do in your company. They write blogs, they write articles, they earn positions, they have skills, they are part of teams, they provide feedback to other people within teams across projects.

You have sentiment within teams across projects, experts in your company, knowledge sharing. So you want to find out who is contributing content and who are your engaged employees. Engaged employees are productive. Disengaged employees tend to become dangerous employees. You want to engage them, or let them go, or you want to fix your company. One of the three.

Many companies have studies to show that they have run into losses purely because knowledge was not shared at the right time with the right people. You want to enhance knowledge sharing. You want to make sure that the people who share knowledge continue to share knowledge. And you want to enrich all the data you have in your company with other complementing data. Learning to talent, talent to performance, performance to recruiting, and so on.

To summarize, people in organizations naturally fit into graphs. Your organization is a graph, there’s no denying that. All your information lies in the relationships.

Graphs open up unexplored avenues in these relationships. The moment you put your data into a graph, you will immediately find many things that you can do that you hadn’t even thought of before. Finding cohesive teams, detecting communities, finding influences, finding talent attractors are all classic graph problems.


Inspired by Luanne’s talk?
Click below to register for the North America GraphTour happening in a city near you all across the U.S. and Canada – and connect with leading graph experts from around the globe.


Get My Ticket to the GraphTour

The post Neo4j as a Critical Aspect of Human Capital Management (HCM) appeared first on Neo4j Graph Database Platform.

Meet SemSpect: A Different Approach to Graph Visualization [Community Post]


[As community content, this post reflects the views and opinions of the particular author and does not necessarily reflect the official stance of Neo4j.]

Understanding large graphs is challenging. Sure, a proper Cypher query can retrieve valuable information. But how do you find the pivotal queries when the structure of your graph is not known? In this post, I discuss SemSpect – a tool built on a visualization paradigm that lets you visualize ad hoc and interactively query large graphs to understand, analyze and track your graph data as a whole.

Given a large property graph, how do you gain meaningful insights from it?

For instance, what groups of nodes relate to each other? Are there any characteristics in the network or unexpected connections?

Exploring such patterns can help you grasp the overall graph structure and discover anomalies in the data. Trying to invent Cypher queries to make all those patterns explicit is not always a reasonable solution.

Fortunately, the Neo4j apoc.meta.* procedures provide some helpful features in this respect. They ship with the optional APOC procedure library available from Neo4j. For instance, to depict the overall structure of a Neo4j graph you can use:

CALL apoc.meta.graph

For the Neo4j dump of the Paradise Papers data from the ICIJ, the result looks as follows:

A Neo4j graph visualization of the Paradise Papers dataset


While already helpful, this graph visualization is just a static rendering and does not expose any relationships to nodes of the underlying original graph. Furthermore, one can imagine that this meta graph may itself be confusing when there are more diverse node labels or relationships.
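Other procedures in the same apoc.meta.* family can complement the picture; for example (the exact output shape varies between APOC versions):

```cypher
// Overall counts of labels, relationship types, property keys,
// nodes and relationships in the database
CALL apoc.meta.stats();

// A per-label description of properties and relationships
CALL apoc.meta.schema();
```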

Overview: Details on Demand


In our experience with business-critical graphs, making sense of a large graph requires data-driven exploration and data-sensitive visualization.

Our SemSpect tool aims at enabling even domain and query novices to carry out sophisticated graph research by interacting with a visual representation of the data network.

This data visualization approach is different from commonly known property graph renderings. SemSpect groups nodes by their label and aggregates relationships between groups unless the user asks for details. That difference is key to keeping users oriented and informed in large graphs.

Let’s see how this works by playing with the previously mentioned Paradise Papers: consider if a user selects the Officer group as the root of an exploration query (see the image below).

Node expansion in SemSpect


SemSpect depicts this group as a labeled circle showing its number of nodes in the center. The tool guides the user by offering data-driven choices for expanding the exploration graph with the help of a menu to choose a group (called a category in SemSpect) and a relationship for instant exploration.

The expansion choice above will result in an exploration graph – depicted as a tree, spanning from a root group from left to right – showing all officers and those entities to which there is an OFFICER_OF relationship.

As mentioned before, SemSpect aggregates nodes and individual relationships for clarity and comprehensibility. Only when the overall number of nodes in a group is below an adjustable threshold are nodes shown as gray dots within the group, as displayed for the 39 underlying intermediaries of all officers below.

Learn all about SemSpect, a new tool for intuitive graph visualization compatible with Neo4j


A number in a particular node indicates the number of related nodes in the preceding group. When a node is selected, its property keys are shown in a dossier and its directly or indirectly related nodes in other groups are highlighted (when visible).

Connecting the Dots of a Graph


A tabular view lists details of nodes on demand as shown in the screenshot below.

The SemSpect data visualization tool


To create a custom group of Officers from Monaco, we just need to open the tabular view for Officers (1) and search for “Monaco” in the countries column (2). The resulting selection can be applied as a filter with one click (3). As a consequence of filtering the Officer group, all other dependent groups in the exploration graph are adapted accordingly.

The Officers from Monaco can now be named and saved as a custom group. There are many more features in SemSpect such as selective filter propagation, reporting, etc., so I’ll have to elaborate in a follow-up blog post.

Fairly complex queries can be built by successively exploring groups or nodes and interactive filtering. Clearly, the query expressivity of SemSpect does not cover all of Cypher. Instead, its specific strength lies in the data-driven guidance while exploring and intuitive filtering options for querying the graph without learning any query syntax.

For those who often poke around in the dark with their Cypher queries, SemSpect is a great tool to explore their graph data, to answer complex queries and to find data quality issues.

If you want to try it on your own with the Offshore Leaks data, just jump to http://offshore-leaks.semspect.de.

The Technology Underneath


SemSpect has a Web UI based on HTML5/JavaScript. The Java backend incorporates GraphScale, a technology that can inject reasoning to graph stores such as Neo4j as I briefly introduced in a previous blog post.

This implies that SemSpect can draw on full RDFS and OWL 2 RL reasoning capabilities. However, RDF-based data is not a requirement. We are currently adapting SemSpect such that it can be applied directly to virtually any Neo4j graph database. In such a case, the graph abstraction computed by GraphScale is used as the key index for graph exploration and filtering.


Want to learn more about graph databases and Neo4j? Click below to register for our online training class, Introduction to Graph Databases and master the world of graph technology in no time.

Sign Me Up

The post Meet SemSpect: A Different Approach to Graph Visualization [Community Post] appeared first on Neo4j Graph Database Platform.

Your Open Invitation to Publish with the Neo4j Community on Medium

Learn how you're invited to blog with the rest of the Neo4j community in our new Medium publication

Hello everyone,

We, the developer relations team at Neo4j, are always looking for new ways to support the Neo4j developer community. We are starting a Medium publication around Neo4j-related topics, not only to share lessons learned, tips and tricks, but also to encourage every one of you to contribute and share as well.

How You Can Get Involved


So if you have written a Neo4j or graph-database-related article on Medium in the past, or are planning to publish one in the future, reach out to us at devrel@neo4j.com. We can add you as a writer to our Medium publication and consider your stories for inclusion, allowing you to educate a larger group of people.

Even if you aren’t a blogger or writer on Medium, you can still follow our official Neo4j account on Medium and follow the new Neo4j publication.

What’s Ahead for Neo4j on Medium


Please also let us know in the comments which kinds of content you’re most interested in, so we can provide the types of articles that you need.

For weekly updates from the Neo4j community, check out “This Week in Neo4j,” which we are considering publishing here as well, to have all the developer updates in one place.

If you have quick questions – technical or otherwise – be sure to join the neo4j-users Slack and ask in one of the channels there.

Happy building & writing,

Michael Hunger for the Neo4j team


New to graph technology?

Grab yourself a free copy of the Graph Databases for Beginners ebook and get an easy-to-understand guide to the basics of graph database technology – no previous background required.


Get My Copy

The post Your Open Invitation to Publish with the Neo4j Community on Medium appeared first on Neo4j Graph Database Platform.

This Week in Neo4j – Medium, GraphTour, GraphQL, Survey, Swag


Welcome to This Week in Neo4j, where we round up what’s been happening in the world of graph databases in the last seven days. As my colleague Mark Needham is still on his well-earned vacation, I’m filling in this week.


Martin Preusse is a cell biology researcher and data integrator working in Munich.

Martin works at the Helmholtz Institute of Computational Biology and runs his own startup, Knowing Health, which focuses on biological data integration to build a universal “cell map”.

Martin Preusse - This Week’s Featured Community Member

Martin Preusse – This Week’s Featured Community Member

Martin has been promoting and teaching the use of graphs in the life sciences for a long time. Most of his work deals with large graphs of hundreds of millions of elements, capturing the intrinsic relationships between DNA, RNA and proteins and their creation, mutation and use in biological pathways.

He has given several meetup and conference talks and represented Neo4j at healthcare hackathons. Martin also presented at and helped organize our Graphs in Life Sciences Workshop in Berlin.

On behalf of the Neo4j community: thanks for all your work, Martin, and good luck with your research!

GraphTour 2018


This week in Milano, Italy, GraphTour EMEA finished with a great event featuring impressive presentations and interesting discussions.

We really enjoyed meeting so many of you during the tour and want to thank everyone involved in making the GraphTour events so successful.

For those of you in North America, it’s only getting started. After the first event in DC this past week, the upcoming stops of GraphTour are:

    • May 2nd, San Francisco
    • May 3rd, Toronto
    • May 8th, Boston
    • July 12th Seattle
    • July 19th Chicago

You’ll find members of our developer relations team (Jennifer, Karin, Ryan, Will) at any of these.

From our Team: GraphGear, Survey, Medium Publication, CLI, Visualization


This week we launched the long-awaited Graph Gear Store, where you can order Neo4j swag to your heart’s content.

Thanks to everyone who answered our developer survey. We were really thrilled by the positive response and suggestions for improvements. So far, 400 of you have already claimed your $10 discount for the new swag store.

To make it easier for everyone to publish and promote articles around graphs and Neo4j, we launched a Neo4j Publication on Medium.

We already have 15 interesting stories for our more than 6,000 followers there. If you want to be added as a contributor, please drop us an email at devrel@neo4j.com. Feel free to submit your existing Medium posts, but especially new ones.

Something to share with friends and colleagues who wanted to try out Neo4j: Jennifer Reif wrote a quick 30-minute guide on how to get started.

After coming across the Open CLI Framework (OCLIF) from Heroku, I decided to give it a spin and write a small bolt-shell; read more about it here.

This week we started an article series on graph visualization with Neo4j. In the first post, we looked at efficient Cypher queries for visualization needs with the JavaScript driver. Stay tuned for a weekly update; the next one is from Will on Neovis.js.

GraphQL


If you have interesting feedback or use cases to share for GraphQL and Neo4j, let us know – we’re happy to publish your articles. Also, please make sure to send your feedback via the #neo4j-graphql channel in the Neo4j-Users Slack or via GitHub issues on the projects in the neo4j-graphql organization.

Articles & More


Chris Betz announced a new Clojure library called neo4j-clj which is built on top of the Java Bolt driver. It has some cool concepts. Check it out!

Thorsten Liebig of Derivo published his second blog post on the visual inference tool SemSpect on top of Neo4j.

semspect node expansion

A really cool experiment is this CMR (Common Metadata Repository) project, which uses Neo4j as storage for NASA’s EOSDIS (Earth Observing System Data and Information System). It can be used for recommendations and visualization of metadata related to earthdata.nasa.gov.

The Gavagai Lexicon is a live representation of term usage in many languages: “Our semantic memories learn language constantly from live data feeds with millions of documents per day from both social and news media.” This project creates a Neo4j graph from entries of the lexicon.

This library implements a token store in Neo4j for the “passwordless” Express extension, which allows building web apps whose users can be authenticated without passwords.

Interviews


You remember Niklas Saers’ articles on Theo and GraphGopher? Now Rik van Bruggen interviewed Niklas in his podcast with interesting insights for Swift and iOS developers.

Rik also interviewed Dilyan Damyanov from Snowplow Analytics about how you can use a graph database for enhancing your event analytics, specifically for clickstream analysis. You might remember the Snowplow articles on that topic.

Next Week


What’s happening next week in the world of graph databases?

    • April 20: Distributed processing of graph data with Neo4j and Apache Spark (DataScienceFest) – Iryna Feuerstein
    • April 20: Discovering the power of graph databases with Python and Neo4j (PyConIT) – Fabio Lamanna
    • April 20: Detecting immigrant communities in cities through the language of Twitter (DataBeersItaly) – Fabio Lamanna
    • April 23: Building Knowledge Graph Using Neo4j (GraphDB Sydney) – Joshua Yu
    • April 19/20: Fundamentals & Modeling Training (Seattle) – Michael Kilgore

Tweet of the Week


Don’t forget to RT if you liked it too.

That’s all for this week. Have a great weekend!

Cheers, Michael

The post This Week in Neo4j – Medium, GraphTour, GraphQL, Survey, Swag appeared first on Neo4j Graph Database Platform.


Graph Algorithms in Neo4j: Streamline Data Discoveries with Graph Analytics

To analyze the billions of relationships in your connected data, you need efficiency and high performance, as well as powerful analytical tools that address a wide variety of graph problems.

Fortunately, graph algorithms are up to the challenge.

Streamline your data discoveries with graph analytics by using the graph algorithms library in Neo4j


In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries. This week, we’ll take a closer look at Neo4j’s Graph Analytics platform and put its performance to the test.

The Neo4j Graph Analytics Platform


Neo4j offers a reliable and performant native-graph platform that reveals the value and maintains the integrity of connected data.

First, we delivered the Neo4j graph database, originally used in online transaction processing with exceptionally fast traversals. Then we added advanced, yet practical, graph analytics tools for data scientists and solutions teams.

An overview of the Neo4j graph analytics library


Streamline Your Data Discoveries


We offer a growing, open library of high-performance graph algorithms for Neo4j that are easy to use and optimized for fast results. These algorithms reveal the hidden patterns and structures in your connected data around community detection, centrality and pathways with a core set of tested (at scale) and supported algorithms.

The highly extensible nature of Neo4j enabled the creation of this graph library and its exposure as procedures – without making any modification to the Neo4j database.

These algorithms can be called upon as procedures (from our APOC library), and they’re also customizable through a common graph API. This set of advanced, global graph algorithms is simple to apply to existing Neo4j instances so your data scientists, solutions developers and operational teams can all use the same native graph platform.

Neo4j also includes graph projection, an extremely handy feature that places a logical sub-graph into a graph algorithm when your original graph has the wrong shape or granularity for that specific algorithm.

For example, if you’re looking to understand the relationship between drug results for men versus women, but your graph is not partitioned for this, you’ll be able to temporarily project a sub-graph to quickly run your algorithm upon and move on to the next step.
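With the Cypher projection mode of the graph algorithms library, such a sub-graph can be handed to an algorithm directly. A sketch under assumed labels, properties and relationship types; the exact procedure signature depends on the library version:

```cypher
// Run PageRank only over female patients and the relationships among
// them, by projecting nodes and relationships with two Cypher statements.
CALL algo.pageRank(
  'MATCH (p:Patient {gender: "F"}) RETURN id(p) AS id',
  'MATCH (a:Patient {gender: "F"})-[:SHARED_TREATMENT]-(b:Patient {gender: "F"})
   RETURN id(a) AS source, id(b) AS target',
  {graph: 'cypher', iterations: 20, write: true, writeProperty: 'pagerank'}
)
```

The original graph is untouched; the projection exists only for the duration of the algorithm run.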

Example: High Performance of Neo4j Graph Algorithms


Neo4j graph algorithms are extremely efficient so you can analyze billions of relationships using common equipment and get your results in seconds to minutes, and in a few hours for the most complicated queries.

The chart below shows how Neo4j’s optimized algorithms yield results up to three times faster than Apache Spark(TM) GraphX for Union-Find (Connected Components) and PageRank on the Twitter-2010 dataset with 1.4 billion relationships.

The Neo4j Graph Platform vs Apache Spark GraphX


Even more impressive, running the Neo4j PageRank algorithm on a significantly larger dataset with 18 billion relationships and 3 billion nodes delivered results in only 1 hour and 45 minutes (using 144 CPUs and 1TB of RAM).

In addition to optimizing the algorithms themselves, we’ve parallelized key areas such as loading and preparing data as well as algorithms like breadth-first search and depth-first search where applicable.

Conclusion


As you can see, graph algorithms help you surface the hidden connections and actionable insights obscured within your troves of data, but even more importantly, the right graph algorithms are optimized to keep your computing costs and time investment to a minimum. Those graph algorithms are available to you now via the Neo4j Graph Platform – and they’re waiting to help you with your next data breakthrough.

Next week, we’ll explore specific graph algorithms, describing what they do and how they’re used.


Find the hidden value in your connected data with powerful graph algorithms: Click below to get your copy of Optimized Graph Algorithms in Neo4j and learn how Neo4j jumpstarts your analysis for any use case.

Read the White Paper


Catch up with the rest of the graph algorithms in Neo4j blog series:

The post Graph Algorithms in Neo4j: Streamline Data Discoveries with Graph Analytics appeared first on Neo4j Graph Database Platform.

Graph Technology Is in the POLE Position to Help Law Enforcement [Video]

A great graph database use case that is starting to emerge is the use of graph technology as a way to better support the police and other law enforcement officials, which also has applications for other security and investigative use cases like anti-terrorism, border control, and social services.

The main way practitioners are doing this is leveraging the POLE (Person, Object, Location, Event) data model for working with crime data.

It turns out that the POLE data model is a great fit for graph database technology and graph algorithms, and can be made even more useful by linking it to data visualization front-ends, including the Neo4j Browser and popular tools like Tableau.

A POLE + Graph Data Model Proof of Concept


To understand the potential of using graph technology for POLE investigations, you can explore the results of a proof of concept we produced using some publicly available datasets in this video:



The basic idea we tested out was whether authorities – not just the police, but also social services and other government agencies – could gain useful insights for investigations based on connections when it comes to navigating complex data.

The idea hinges on who knows who. If person X has come to the attention of the authorities for whatever reason, then who else in X’s network might be of interest?

In some cases, that “interest” could be that they are in relationships with people who have connections to other people with criminal records, for example. They could be an ex-offender who the authorities are worried could slip back into unhealthy relationship patterns; they might be family members that could be potentially drawn into trouble or placed at risk.

Graph Technology as a New Way to Help the Public


This kind of complexity is hard to capture and explore through conventional database technologies like an RDBMS, whereas graph databases excel at mining connected data.

We took a sample dataset of street-level crime in Greater Manchester for a certain month last year and cross-connected a number of other data sources, from geotagging data to addresses to randomly generated person information to see how deep a picture of these connections we could generate.

The results: We built a Neo4j graph database of 29,000 crimes in 15,000 locations, generating 106,000 relationships between the nodes.

Employing relationships like lives-with and party-to, we were able to find deep and complex networks of connections, obscure family relationships, social associations, and clusters of people and crimes that seemed suggestive. These insights could then be used to support ongoing criminal investigations or initiate new ones.

Learn how law enforcement officials are effectively using a POLE data model with graph technology

This exercise shows the scale of what graph technology brings to POLE investigations. What happens next will be down to the relevant authorities and agencies.

However, graph database software and crime data are a potent combination to better protect the public, enabling data-driven investigations and decision-making, and allowing police forces and law enforcement agencies to intelligently maximize their resources in the face of budget constraints and an ever-evolving landscape of crime and other security threats.


Use graph tech to catch the bad guys:
Read this white paper – Fraud Detection: Discovering Connections with Graph Databases – to discover how Neo4j is used to proactively detect and prevent fraud across multiple use cases.


Get My Copy

The post Graph Technology Is in the POLE Position to Help Law Enforcement [Video] appeared first on Neo4j Graph Database Platform.

Graph Algorithms in Neo4j: 15 Different Graph Algorithms & What They Do

Graph analytics have value only if you have the skills to use them and if they can quickly provide the insights you need. Therefore, the best graph algorithms are easy to use, fast to execute and produce powerful results.

Neo4j includes a growing, open library of high-performance graph algorithms that reveal the hidden patterns and structures in your connected data.

Learn about the 15 most powerful and effective graph algorithms in the Neo4j Graph Platform


In this series on graph algorithms, we’ll discuss the value of graph algorithms and what they can do for you. Last week, we explored how data connections drive future discoveries and how to streamline those data discoveries with graph analytics.

This week, we’ll take a detailed look at the many graph algorithms available in Neo4j and what they do.

15 Graph Algorithms Optimized for the Neo4j Graph Platform


Using Neo4j graph algorithms, you’ll have the means to understand, model and predict complicated dynamics such as the flow of resources or information, the pathways that contagions or network failures spread, and the influences on and resiliency of groups.

And because Neo4j brings together analytics and transaction operations in a native graph platform, you’ll not only uncover the inner nature of real-world systems for new discoveries, but also develop and deploy graph-based solutions faster and have easy-to-use, streamlined workflows. That’s the power of an optimized approach.

Here is a list of the many algorithms that Neo4j uses in its graph analytics platform, along with an explanation of what they do.


Traversal & Pathfinding Algorithms


1. Parallel Breadth-First Search (BFS)
What It Does: Traverses a tree data structure by fanning out to explore the nearest neighbors and then their sub-level neighbors. It’s used to locate connections and is a precursor to many other graph algorithms.

BFS is preferred when the tree is less balanced or the target is closer to the starting point. It can also be used to find the shortest path between nodes or avoid the recursive processes of depth-first search.

How It’s Used: Breadth-First Search can be used to locate neighbor nodes in peer-to-peer networks like BitTorrent, GPS systems to pinpoint nearby locations and social network services to find people within a specific distance.
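The fan-out traversal described above can be sketched in a few lines of Python (an illustrative sketch, not Neo4j's parallelized implementation; the adjacency-dict graph is a made-up example):

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first traversal: visit the nearest neighbors first,
    then their sub-level neighbors. adj maps a node to its neighbors."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```

The FIFO queue is what guarantees the level-by-level order; swapping it for a stack turns the same loop into a depth-first search.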

2. Parallel Depth-First Search (DFS)
What It Does: Traverses a tree data structure by exploring as far as possible down each branch before backtracking. It’s used on deeply hierarchical data and is a precursor to many other graph algorithms. Depth-First Search is preferred when the tree is more balanced or the target is closer to an endpoint.

How It’s Used: Depth-First Search is often used in gaming simulations where each choice or action leads to another, expanding into a tree-shaped graph of possibilities. It will traverse the choice tree until it discovers an optimal solution path (e.g., win).
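For contrast with BFS, here is an equally minimal iterative DFS sketch in Python (again illustrative, not Neo4j's implementation; the example graph is invented):

```python
def dfs(adj, start):
    """Depth-first traversal: explore each branch as far as possible
    before backtracking. adj maps a node to its neighbors."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # push neighbors in reverse so the first-listed neighbor is explored first
        for nbr in reversed(adj.get(node, [])):
            stack.append(nbr)
    return order

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Note how D is reached through B before C is ever visited: the branch is exhausted first, which is exactly the behavior that makes DFS suit deep choice trees.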

3. Single-Source Shortest Path
What It Does: Calculates the paths between a node and all other nodes such that the summed value (weight of relationships such as cost, distance, time or capacity) along each path is minimal.

How It’s Used: Single-Source Shortest Path is often applied to automatically obtain directions between physical locations, such as driving directions via Google Maps. It’s also essential in logical routing such as telephone call routing (least-cost routing).
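A standard way to compute single-source shortest paths over non-negative weights is Dijkstra's algorithm; a compact Python sketch (illustrative only, with an invented weighted road graph) looks like this:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths over non-negative weights (Dijkstra).
    adj maps node -> list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for nbr, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

roads = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": []}
print(dijkstra(roads, "A"))  # {'A': 0, 'B': 3, 'C': 1}
```

The direct A-B road costs 4, but the detour through C costs only 3, which is the "least-cost routing" behavior described above.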

4. All-Pairs Shortest Path
What It Does: Calculates a shortest-path forest (group) containing all shortest paths between the nodes in the graph. Commonly used for understanding alternate routing when the shortest route is blocked or becomes sub-optimal.

How It’s Used: All-Pairs Shortest Path is used to evaluate alternate routes for situations such as a freeway backup or network capacity. It’s also key in logical routing to offer multiple paths; for example, call routing alternatives.

5. Minimum Weight Spanning Tree (MWST)
What It Does: Calculates the paths along a connected tree structure with the smallest value (weight of the relationship such as cost, time or capacity) associated with visiting all nodes in the tree. It’s also employed to approximate some NP-hard problems such as the traveling salesman problem and randomized or iterative rounding.

How It’s Used: Minimum Weight Spanning Tree is widely used for network designs: least-cost logical or physical routing such as laying cable, fastest garbage collection routes, capacity for water systems, efficient circuit designs and much more. It also has real-time applications with rolling optimizations such as processes in a chemical refinery or driving route corrections.


Centrality Algorithms


6. PageRank
What It Does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.

How It’s Used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.

PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to the biggest chain reaction of species death.
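The "rank flows along links" idea can be illustrated with a small power-iteration sketch in Python (a textbook simplification, not Neo4j's implementation; the three-page link graph is invented):

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over a directed graph.
    adj maps node -> list of out-neighbors."""
    nodes = set(adj) | {n for nbrs in adj.values() for n in nbrs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for v, nbrs in adj.items():
            if nbrs:
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    nxt[u] += share  # each page shares its rank with its links
        rank = nxt
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
# C is linked from both A and B, so it accumulates the most rank
print(max(ranks, key=ranks.get))  # C
```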

7. Degree Centrality
What It Does: Measures the number of relationships a node (or an entire graph) has. It’s broken into indegree (flowing in) and outdegree (flowing out) where relationships are directed.

How It’s Used: Degree Centrality looks at immediate connectedness for uses such as evaluating the near-term risk of a person catching a virus or hearing information. In social studies, indegree of friendship can be used to estimate popularity and outdegree as gregariousness.
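Indegree and outdegree are a single pass over the relationship list; a tiny Python sketch (with a made-up follower list) shows the split:

```python
from collections import Counter

def degree_centrality(edges):
    """Indegree (flowing in) and outdegree (flowing out)
    per node of a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

follows = [("ann", "bob"), ("carl", "bob"), ("bob", "ann")]
indeg, outdeg = degree_centrality(follows)
print(indeg["bob"], outdeg["bob"])  # 2 1  (popular, less gregarious)
```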

8. Closeness Centrality
What It Does: Measures how central a node is to all its neighbors within its cluster. Nodes with the shortest paths to all other nodes are assumed to be able to reach the entire group the fastest.

How It’s Used: Closeness Centrality is applicable in a number of resource, communication and behavioral analyses, especially when interaction speed is significant. It has been used to identify the best locations for new public services for maximum accessibility.

In social network analysis, it is used to find people with the ideal social network location for faster dissemination of information.
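One common formulation is (n - 1) divided by the sum of shortest-path distances to every other node; a single-node Python sketch (BFS distances on an invented unweighted star graph, not Neo4j's implementation) looks like this:

```python
from collections import deque

def closeness(adj, node):
    """Closeness = (n - 1) / sum of shortest-path lengths to all other
    reachable nodes, with lengths found by BFS on an unweighted graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nbr in adj.get(cur, []):
            if nbr not in dist:
                dist[nbr] = dist[cur] + 1
                queue.append(nbr)
    total = sum(d for v, d in dist.items() if v != node)
    return (len(dist) - 1) / total if total else 0.0

# a star graph: the hub reaches everyone in one hop
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(closeness(star, "hub"))  # 1.0  (maximally close)
print(closeness(star, "a"))    # 0.6  (two hops to the other spokes)
```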

9. Betweenness Centrality
What It Does: Measures the number of shortest paths (first found with Breadth-First Search) that pass through a node. Nodes that most frequently lie on shortest paths have higher betweenness centrality scores and are the bridges between different clusters. It is often associated with the control over the flow of resources and information.

How It’s Used: Betweenness Centrality applies to a wide range of problems in network science and is used to pinpoint bottlenecks or likely attack targets in communication and transportation networks. In genomics, it has been used to understand the control certain genes have in protein networks for improvements such as better drug-disease targeting.

Betweenness Centrality has also been used to evaluate information flows between multiplayer online gamers and within expertise-sharing communities of physicians.


Community Detection Algorithms


This category is also known as clustering algorithms or partitioning algorithms.

10. Label Propagation
What It Does: Spreads labels based on neighborhood majorities as a means of inferring clusters. This extremely fast graph partitioning requires little prior information and is widely used in large-scale networks for community detection. It’s a key method for understanding the organization of a graph and is often a primary step in other analysis.

How It’s Used: Label Propagation has diverse applications from understanding consensus formation in social communities to identifying sets of proteins that are involved together in a process (functional modules) for biochemical networks. It’s also used in semi- and unsupervised machine learning as an initial preprocessing step.
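The core mechanism (every node repeatedly adopts the majority label among its neighbors) fits in a short Python sketch. This is a synchronous, deterministically tie-broken variant chosen to keep the example reproducible; real implementations, including Neo4j's, typically update asynchronously and break ties randomly:

```python
from collections import Counter

def label_propagation(adj, rounds=10):
    """Each node repeatedly adopts the most common label among its
    neighbors; densely connected groups converge on a shared label."""
    labels = {v: v for v in adj}  # start with unique labels
    for _ in range(rounds):
        nxt = {}
        for v, nbrs in adj.items():
            counts = Counter(labels[n] for n in nbrs)
            if counts:
                top = counts.most_common(1)[0][1]
                # deterministic tie-break: smallest label among the winners
                nxt[v] = min(l for l, c in counts.items() if c == top)
            else:
                nxt[v] = labels[v]
        labels = nxt
    return labels

# two triangles joined by a single bridge (2-3)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))  # {0: 0, 1: 0, 2: 0, 3: 2, 4: 2, 5: 2}
```

The two triangles settle on two distinct labels, which is exactly the community structure a human would draw.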

11. Strongly Connected
What It Does: Locates groups of nodes where each node is reachable from every other node in the same group following the direction of relationships. It’s often applied from a depth-first search.

How It’s Used: Strongly Connected is often used to enable running other algorithms independently on an identified cluster. As a preprocessing step for directed graphs, it helps quickly identify disconnected groups. In retail recommendations, it helps identify groups with strong affinities that then are used for suggesting commonly preferred items to those within that group who have not yet purchased the item.

12. Union-Find / Connected Components / Weakly Connected
What It Does: Finds groups of nodes where each node is reachable from any other node in the same group, regardless of the direction of relationships. It provides near constant-time operations (independent of input size) to add new groups, merge existing groups and determine whether two nodes are in the same group.

How It’s Used: Union-Find / Connected Components is often used in conjunction with other algorithms, especially for high-performance grouping. As a preprocessing step for undirected graphs, it helps quickly identify disconnected groups.
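The near constant-time behavior comes from the classic disjoint-set structure; a minimal Python sketch (with path compression but, for brevity, without union-by-rank; the four-node example is invented) looks like this:

```python
def find(parent, x):
    """Walk to the root of x's group, flattening the path as we go
    (path halving), which is what keeps later lookups near constant time."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the groups containing a and b."""
    parent[find(parent, a)] = find(parent, b)

parent = {v: v for v in ["a", "b", "c", "d"]}
union(parent, "a", "b")
union(parent, "c", "d")
print(find(parent, "a") == find(parent, "b"))  # True  (same component)
print(find(parent, "a") == find(parent, "c"))  # False (disconnected groups)
```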

13. Louvain Modularity
What It Does: Measures the quality (i.e., presumed accuracy) of a community grouping by comparing its relationship density to a suitably defined random network. It’s often used to evaluate the organization of complex networks and community hierarchies in particular. It’s also useful for initial data preprocessing in unsupervised machine learning.

How It’s Used: Louvain is used to evaluate social structures in Twitter, LinkedIn and YouTube. It’s used in fraud analytics to evaluate whether a group has just a few bad behaviors or is acting as a fraud ring that would be indicated by a higher relationship density than average. Louvain revealed a six-level customer hierarchy in a Belgian telecom network.

14. Local Clustering Coefficient / Node Clustering Coefficient
What It Does: For a particular node, it quantifies how close its neighbors are to being a clique (every node is directly connected to every other node). For example, if all your friends knew each other directly, your local clustering coefficient would be 1. Small values for a cluster would indicate that although a grouping exists, the nodes are not tightly connected.

How It’s Used: The Local Clustering Coefficient is important for estimating resilience by understanding the likelihood of group coherence or fragmentation. Analysis of a European power grid using this method found that clusters with sparsely connected nodes were more resilient against widespread failures.
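The coefficient is simply the fraction of possible neighbor-to-neighbor links that actually exist; a short Python sketch (undirected graph as neighbor sets, with an invented friendship example) makes the "do my friends know each other?" framing literal:

```python
def local_clustering(adj, node):
    """Fraction of possible links among a node's neighbors that exist.
    adj maps node -> set of neighbors in an undirected graph."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))

friends = {
    "you": {"ann", "bob", "cat"},
    "ann": {"you", "bob"},
    "bob": {"you", "ann"},
    "cat": {"you"},
}
# ann-bob is the only pair of your friends who know each other: 1 of 3
print(local_clustering(friends, "you"))  # 1/3
```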

15. Triangle-Count and Average Clustering Coefficient
What It Does: Counts the triangles that pass through each node and measures the degree to which nodes tend to cluster together. The average clustering coefficient is 1 when there is a clique, and 0 when there are no connections. For the clustering coefficient to be meaningful, it should be significantly higher than a version of the network where all of the relationships have been shuffled randomly.

How It’s Used: The Average Clustering Coefficient is often used to estimate whether a network might exhibit “small-world” behaviors which are based on tightly knit clusters. It’s also a factor for cluster stability and resiliency. Epidemiologists have used the average clustering coefficient to help predict various infection rates for different communities.
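A brute-force triangle count is easy to write down (a naive sketch over node triples on an invented undirected graph; production implementations use far faster enumeration strategies):

```python
from itertools import combinations

def triangle_count(adj):
    """Count triangles in an undirected graph.
    adj maps node -> set of neighbors."""
    triangles = 0
    for u, v, w in combinations(sorted(adj), 3):
        # a triangle exists when all three pairwise relationships exist
        if v in adj[u] and w in adj[u] and w in adj[v]:
            triangles += 1
    return triangles

g = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
print(triangle_count(g))  # 1  (the a-b-c triangle)
```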

Conclusion


The world is driven by connections. Neo4j graph analytics reveals the meaning of those connections using practical, optimized graph algorithms including the ones detailed above.

This concludes our series on graph algorithms in Neo4j. We hope these algorithms help you make sense of your connected data in more meaningful and effective ways.


Find the hidden value in your connected data with powerful graph algorithms: Click below to get your copy of Optimized Graph Algorithms in Neo4j and learn how Neo4j jumpstarts your analysis for any use case.

Read the White Paper


Catch up with the rest of the graph algorithms in Neo4j blog series:

The post Graph Algorithms in Neo4j: 15 Different Graph Algorithms & What They Do appeared first on Neo4j Graph Database Platform.

Introducing Neo4j Bloom: Graph Data Visualization for Everyone

Today at GraphTour San Francisco, CEO of Neo4j, Inc. Emil Eifrem announced the arrival of an entirely new product being added to the Neo4j Graph Platform: Neo4j Bloom.

Neo4j Bloom is a breakthrough graph communication and data visualization product that allows graph novices and experts alike the ability to communicate and share their work with peers, managers and executives – no matter their technical level.

Its illustrative, codeless search-to-storyboard design makes it the ideal interface for non-technical project participants to share in the innovative work of their graph analytics and development teams.

GPU-accelerated rendering in Neo4j Bloom graph visualization


GPU-accelerated rendering scales to over 100,000 nodes and relationships at once in Neo4j Bloom.

Why Neo4j Bloom

Neo4j Bloom is designed to help traditional Neo4j users communicate with their non-technical peers in a simple manner. Bloom reveals and explains the concepts of data connectedness for people who may not naturally think that way.

Graph visualization of node clusters

Bloom quickly visualizes related node clusters.

Bloom’s goal is to accelerate the occurrences of “graph epiphanies” – the realizations around how people, data, devices, systems and activities throughout the enterprise are all connected – regardless of technical skill.

Neo4j Bloom Reveals Connections


As with all graph visualization tools, Neo4j Bloom reveals non-obvious connections and materializes abstract graph ideas and concepts in a tangible way that users can see and navigate.

Bloom visually reveals the value of data relationships and identifies connectedness paths between interesting clusters and nodes. These situations often include:
    • Identifying the relationship (or hidden path) between individuals
    • Connecting people to activities, locations, companies, devices and other objects
    • Demonstrating to management the innovative impact of graphs
    • Illustrating the context and paths of graph designs and Cypher queries

What Neo4j Bloom Does

On a high level, Bloom is a codeless search- and keyword-based graph visualization tool. It is fully connected to the Neo4j Graph Platform and allows for both the navigation and editing of graph datasets stored in the Neo4j graph database.

Bloom is a full graph visualizer and editor

Bloom is a full-featured graph visualizer and editor.

What You See on First Launch


When you first launch Neo4j Bloom, you’re presented with a template that offers a pre-built or auto-generated graph schema that defines the initial perspective for your dataset.

Auto-generated templates attempt to identify node categories by reading the data and identifying what makes the most sense. As a result, the template defines the node category color scheme, search phrase suggestions and node icons.

Neo4j Bloom node category color scheme

This diagram represents the users’ first Bloom data visualization.

Additional Features of Neo4j Bloom


Bloom gives you the ability to:
    • Inspect the animated graph by panning and zooming across the visible domain
    • Snapshot scenes using a screen capture tool and paste to publish
    • Select a node and toggle to understand properties and adjacent nodes based on its relationships
    • Edit nodes, relationships and properties
    • Pick a template and view the metadata perspective of that template against your data
    • Initiate queries within the search box based on suggestions and template phrases
    • Advance the scene and choose a new query to execute in the search box
    • Save your history including “hints” that inform the illustration so Bloom remembers where you left off

Since Bloom knows your metadata (node types, relationships and property values), its search functionality offers suggested search phrases to advise you on the structure of your queries. Bloom search also allows you to apply regex operators and logical operators as search filters, in addition to specifying parameter values, such as $nodetype. Finally, you can paste Cypher queries directly into search.

A graph visualization of the Paradise Papers dataset

This Bloom data visualization maps the Paradise Papers dataset and shows the connections to the tax sheltering firm Appleby.

Where Neo4j Bloom Fits within the Neo4j Graph Platform

We are very excited to announce Neo4j Bloom today – slated for release in June 2018 – but this new product is only one part of the releases happening across the Neo4j Graph Platform this spring, including Neo4j Database 3.4 (more details coming soon).

The introduction of Bloom to the Neo4j Graph Platform

Neo4j Bloom is one of the many parts of the Graph Platform being released in Q3 2018.

In order to run Neo4j Bloom, you’ll need to meet the following prerequisites:
    • Access to a running instance of Neo4j Enterprise Edition (local or networked)
    • A licensed Neo4j Desktop instance
    • The Neo4j database to which Bloom connects must be indexed for the data which will be visualized
    • A Bloom license attached to that server instance
    • An input device (keyboard) supported by the Bloom Dialog Box

Conclusion

The whole Neo4j team is proud to announce the upcoming release of Neo4j Bloom as the first entirely separate product that we’ve produced in years. We believe that graph visualization is the logical next step in realizing the vision that (graphs)-[:ARE]->(everywhere).

We hope you enjoy it.


New to graph technology?

Grab yourself a free copy of the Graph Databases for Beginners ebook and get an easy-to-understand guide to the basics of graph database technology – no previous background required.


Get My Copy

The post Introducing Neo4j Bloom: Graph Data Visualization for Everyone appeared first on Neo4j Graph Database Platform.

It’s Time for a Single Property Graph Query Language [Vote Now]

The time has come to create a single, unified property graph query language.

Different languages for different products help no one. We’ve heard from the graph community that a common query language would be powerful: more developers with transferable expertise; portable queries; solutions that leverage multiple graph options; and less vendor lock-in.

One language, one skill set.

The Property Graph Space Has Grown…a Lot


Property graph technology has a big presence, from Neo4j and SAP HANA to Oracle PGX and Amazon Neptune. An international standard would accelerate the entire graph solution market, to the mutual benefit of all vendors and – more importantly – to all users.

That’s why we are proposing a unified graph query language, GQL (Graph Query Language), that fuses the best of three property graph languages.

Relational Data Has SQL, Property Graphs Need GQL


Although SQL has been fundamental for relational data, we need a declarative query language for the powerful – and distinct – property graph data model to play a similar role.

Like SQL, the new GQL needs to be an industry standard. It should work with SQL but not be confined by SQL. The result would be better choices for developers, data engineers, data scientists, CIOs and CDOs alike.

Right now there are three property graph query languages that are closely related. We have Cypher (from Neo4j and the openCypher community). We have PGQL (from Oracle). And we have G-CORE, a research language proposal from the Linked Data Benchmark Council [LDBC] (co-authored by world-class researchers from the Netherlands, Germany, Chile, the U.S., and technical staff from SAP, Oracle, Capsenta and Neo4j).

The proposed GQL (Graph Query Language) would combine the strengths of Cypher, PGQL & G-CORE into one vendor-neutral and standardized query language for graph solutions, much like SQL is for RDBMS.

Each of these three query languages has a similar data model, syntax and semantics. Each has its merits and gaps. Yet their authors share many ambitions for the next generation of graph querying, such as a composable graph query language with graph construction, views and named graphs; and a pattern-matching facility that extends to regular path queries.

Let Your Voice Be Heard on GQL


The Neo4j team is advocating that the database industry and our users collaborate to define and standardize one language.

Bringing PGQL, G-CORE and Cypher together, we have a running start. Two of them are industrial languages with thousands of users, and combined with the enhancements of a research language, they all share a common heritage of ASCII art patterns to match, merge and create graph models.

What matters most right now is a technically strong standard, with strong backing among vendors and users. So we’re appealing for your vocal support.

Please vote now on whether we should unite to create a standard Graph Query Language (GQL), in the same manner as SQL.



For more information, you can read the GQL manifesto here and watch for ongoing updates.


Emil Eifrem, CEO;
Philip Rathle, VP of Products;
Alastair Green, Lead, Query Languages Standards & Research;
for the entire Neo4j team

The post It’s Time for a Single Property Graph Query Language [Vote Now] appeared first on Neo4j Graph Database Platform.

Neo4j Graph Database 3.4 GA Release: Everything You Need to Know

Author’s note: What a hectic week in the world of Neo4j! In addition to finalizing the delivery of Neo4j 3.4, we simultaneously built the GQL Manifesto, a call to support a common, unified Graph Query Language. Thank you to the graph community for your strong vote of support! If you have not already voted, please do so.

Learn what's new in Neo4j 3.4, including Multi-Clustering, Cypher performance and new data types

The Neo4j graph database has always been the technology closest to the core of our mission: to help the world make sense of data.

With today’s general availability release of Neo4j Graph Database version 3.4, we believe that mission is advanced further than ever before.

The native graph database is the foundation around which the rest of the Neo4j Graph Platform is built, and we’re proud to be releasing this version that will delight both longstanding community developers and enterprise DBAs alike.

3.4 Features by Edition (Community and Enterprise)

Data Types
    • Date/Time data type
    • 3D geospatial data types

Performance Improvements
    • Native string indexes – up to 5x faster writes
    • Backups up to 2x faster
    • Enterprise Cypher runtime up to 70% faster
    • Resumable bulk importer for 100B+ records

Enterprise Scaling & Administration
    • Multi-Clustering (partition of clusters)
    • Automatic cache warming
    • Rolling upgrades
    • Resumable copy/restore of a cluster member
    • New diagnostic metrics and support tools
    • Property blacklisting

Here is a closer look at the release-defining features of Neo4j Database 3.4:

Multi-Clustering

Multi-Clustering is the flagship feature of Neo4j Database 3.4, advancing the Graph Platform in scale, expanded uses and performance.

Multi-Clustering on a global scale in Neo4j 3.4


With Multi-Clustering, you can create and manage multiple domain-specific database clusters, effectively partitioning the graph into independent parts. We view this as a step in our march toward fully-sharded horizontal scaling of graph data.

Multi-Clustering can be used to logically partition graphs; create highly-available, large-scale multi-tenant SaaS systems; or oversee multiple graph implementations across the enterprise. For example, Multi-Clustering is perfect for building GDPR-compliant data lineage systems by country, or segmenting a graph database according to product line or division.

Directory Service


Multi-Clustering comes with a new directory service that manages a routing table of locations for each named database cluster. The directory service lives within lower levels of Bolt drivers at the same level as cluster load balancing and routing logic, all of which saves developers innumerable headaches.

Multi-Clustering Scalability Use Cases & Strategies


Here are just a few scalability use cases of Multi-Clustering we initially imagined (we’re sure you’ll surprise us with even more):

1. Physical Graph Partitioning

For the horizontal scaling of databases with logically distinct graphs, Multi-Clustering can be used to adopt a strategy of physical graph partitioning.

Physical graph partitioning might include naming and managing graphs according to geography (e.g., country), customer ID, products, use cases, versions, or data center as individual clustered instances. Or, you could use this approach for the creation and storage of multiple analytic graphs derived from graph-based analysis.

Physical graph partitioning is a cloud-friendly model, especially considering server-to-server encryption, multi-data center or zone support in conjunction with the above-mentioned strategies.

2. Cluster-Based Multi-Tenancy

Using Multi-Clustering for a cluster-based multi-tenant strategy allows you to define baseline schemas and data templates independent of a given tenant. You can also name graph data according to tenant ID and route it accordingly.

This strategy allows SaaS providers to deploy tenants as triplets of cloud instances that both separates individual customer data and provides high availability and customer-centric security – all without disturbing the top-level behavior and operation of the application or service.

3. Multi-Graph Operations within the Enterprise

Finally, Multi-Clustering can be used to combine oversight of use-specific graphs within an enterprise organization, such as metadata, GDPR compliance services, identity management, network topology management and Customer 360 experience data.

New Data Types

Neo4j Database 3.4 introduces two brand-new data types: date/time data and three-dimensional geospatial data. These new data types enable optimized Cypher queries for searches across time or space.

Temporal Data in Neo4j


The introduction of date/time data expands graph-based thinking into other types of temporal (time-situated) logic and queries that match modern research happening at leading universities across the globe. Temporal data is also important for Internet of Things (IoT) use cases, versioning and other changes-over-time implementations.

With the new date/time data types, you can more easily tap into a variety of use cases, such as:
    • Time trees
    • Change logs
    • Temporal incentives (“Offer this coupon until this date.”)
    • Complements to spatial queries (“Optimize route based on commute hour.”)
The new date/time data includes a variety of formats and conforms to a familiar SQL-like model.
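As a rough sketch of what this looks like in practice, the temporal-incentive use case above might be expressed as follows (the :Coupon label and property names are illustrative, not from the release notes):

```cypher
// Create a coupon with an expiry date using the new date() type
CREATE (c:Coupon {code: 'SPRING18', validUntil: date('2018-06-30')});

// "Offer this coupon until this date": return only coupons still valid today
MATCH (c:Coupon)
WHERE c.validUntil >= date()
RETURN c.code, c.validUntil;
```

Because these are real temporal values rather than strings, comparisons like the one above operate on the time semantics directly.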

3D Spatial Data in Neo4j


In addition to traditional latitude and longitude, the new geospatial data types in Neo4j also include Cartesian coordinates (x, y, z), radial distances, altitude, depth and slope.

A Cypher query for 3D geospatial data

Neo4j Database 3.4 now supports three-dimensional geospatial search as a data type and in Cypher queries.

These new data types greatly expand the types of searches and use cases for graph data, including location-based searches (“Find me a coffee shop within 100 meters”) and 3D routing requests (“Route the delivery to the 3rd floor”).

Another example: Using these new data formats, you could build a real-time bike-messenger delivery system that could not only locate addresses, but also specify time of delivery and elevation changes for the rider.

A three-dimensional geospatial search in Neo4j 3.4

Another 3D geospatial search example: Recommend a shirt available in a multi-floor store close by in the men’s department. In Neo4j 3.4, Cypher queries now support the data types necessary to complete such searches and recommendations.
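A minimal sketch of such a 3D proximity query, assuming a :Shop label and a location property holding a 3D point (all names and coordinates here are illustrative):

```cypher
// Find shops within 100 meters of a 3D position
// (WGS 84 latitude/longitude plus a height component)
MATCH (s:Shop)
WHERE distance(s.location,
        point({latitude: 40.7580, longitude: -73.9855, height: 10.0})) < 100
RETURN s.name, s.location;
```

The same point() function accepts Cartesian coordinates (x, y, z) when geographic coordinates aren't appropriate.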

Performance Improvements

Neo4j 3.4 is faster in terms of both reads and writes, and these overall performance improvements are proportionally reflected in both Community and Enterprise Editions (with some differences).

The 3.4 release removes multiple layers of APIs between the kernel, interpreters and compilers, producing impressive performance improvements that other graph-layered products will find challenging to reproduce.

Blazing-Fast Writes


    • Writes are now up to 5x faster for nodes with indexed string properties, thanks to native string indexes and lessening dependence on third-party libraries.
    • A new kernel API streamlines internal instructions.
    • Bulk imports can handle over 100 billion nodes and relationships.
    • Transaction states consume less memory thanks to various efficiency improvements (including native indexing) working together.
Writes with Native String Indexes
Native string indexes offer a 500% improvement in Neo4j 3.4

Writes are now up to 5x faster for nodes with indexed string properties, thanks to native string indexes. This reduces Neo4j’s dependency on the popular external indexing library Lucene, and gives Neo4j finer-grained control over index response times.
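No new syntax is required to benefit: any single-property schema index on a string property, created as usual, now uses the native index under the hood. For example:

```cypher
// Standard schema index creation; string values in this index
// are now stored in Neo4j's native index rather than Lucene
CREATE INDEX ON :Person(name);
```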


Speedy Reads


    • Internal testing shows that the Cypher runtime in Neo4j 3.4 Community Edition is 20% faster than in Neo4j 3.3, and the Cypher runtime in Neo4j 3.4 Enterprise Edition is 50-70% faster than in Neo4j 3.3 Enterprise Edition.
Improved Cypher runtime read performance in Neo4j 3.4

Internal testing shows that Cypher runtime is 50-70% faster in Neo4j 3.4 Enterprise Edition than in Neo4j 3.3 Enterprise Edition.

New Administrator Features

Database administrators, DevOps and other support staff have had an important voice in strengthening Neo4j both in the past and in the 3.4 release. Some of the key highlights include:

    • Hot backups are now twice as fast as in previous releases.
    • After a restart or restore, active cache warming now automatically warms the page cache to its previous operational state, getting servers back online in record time. This warm-up also cascades to Read Replicas within the Causal Cluster, so applications enjoy peak operational responsiveness immediately.
    • A new diagnostic utility (dump tool) improves the speed and accuracy of collaboration support cases between customers and Neo4j Support.

Cluster Member Management


    • Data store copy and catch-up features enable a new, empty instance to join a cluster and become operational in no time. These features transfer the full transaction history as well as bulk-loaded historic data and any remaining transactions.
    • Catch-up functions can be stopped and resumed, and also include ongoing Raft log updates, making a new instance fully armed and operational.

Rolling Upgrades


    • Rolling upgrades allow for updating older instances while keeping other members stable and without requiring a restart of the environment.
    • All new patch, minor and major versions will support rolling upgrades starting from Neo4j 3.4.
    • Rolling upgrades will operate with both read-only and read/write instances.
Rolling upgrades are now possible in Neo4j 3.4

Neo4j 3.4 now supports rolling upgrades so you can update older instances while keeping other members stable and without requiring a restart of the environment.


Database Security Advancements

As with past releases, the Neo4j Database 3.4 release continues to robustly uphold modern database security principles, often not available in competing graph stores or other NoSQL databases.

Our current database security features include:
    • User- and role-based security within the database
    • LDAP and AD directory integration
    • Kerberos authentication (ticket-based)
    • HTTPS access to all user-facing interfaces
    • TLS-encrypted traffic among cluster routers and members, through Bolt application drivers, and across data centers
    • Encrypted data at rest via file-system encryption
With Neo4j 3.4, administrators can now implement property blacklisting by name or role, securing property visibility. This feature is similar to SQL-style column-level security, without impacting performance.

The Neo4j 3.4 graph database allows for property-level blacklisting and security

With Neo4j 3.4, administrators can now implement property security by name, blacklisting properties for users.
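As a sketch, this blacklisting is driven by server configuration rather than Cypher. The setting names below reflect the 3.4 property-level security feature, but treat them as an assumption and verify against the operations manual for your exact version:

```
# neo4j.conf – enable property-level security (Enterprise Edition)
dbms.security.property_level.enabled=true

# Users with roleX cannot read propertyA;
# users with roleY cannot read propertyB or propertyC
dbms.security.property_level.blacklist=roleX=propertyA;roleY=propertyB,propertyC
```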

Conclusion

As the core of the Neo4j Graph Platform, this 3.4 release of the Neo4j Graph Database indicates an upgrade for the entire platform that relies on it. We’re confident that the upgrades in Neo4j 3.4 will deliver stunning spillover results into all of the new products and features of the Graph Platform as they roll out later this year.

What's new in the Neo4j Graph Platform with the Neo4j 3.4 release

Neo4j Database 3.4 is just one of many recent or soon-to-be-released upgrades to the Neo4j Graph Platform.

We encourage you to download Neo4j Database 3.4 and try it out for yourself – whether as part of Neo4j Desktop or as part of your Enterprise Edition license.

While we know some of the most clear and obvious ways that this release will help you harness your data connections, we’re even more excited to hear how our millions of users worldwide will use these new features to build applications beyond the limits of our wildest imagination.

For all of the Neo4j team,

Philip Rathle

Start pushing boundaries today:
Download Neo4j Desktop right now and see for yourself what you can do with the leading platform for connected data.


Download Neo4j 3.4

The post Neo4j Graph Database 3.4 GA Release: Everything You Need to Know appeared first on Neo4j Graph Database Platform.

APOC Release for Neo4j 3.4 with Graph Grouping


Just in time for the Neo4j 3.4.0 release, we also pushed out two versions of APOC – 3.3.0.3 and 3.4.0.1. You can download them from GitHub, Maven or most conveniently with a single click in Neo4j Desktop.

desktop apoc

Please note that the “self-upgrade” process in Neo4j Desktop might leave the previous APOC version in your plugins directory, so you’ll have to remove it yourself if your DB fails to restart after upgrading.

This time, we had to spend much more effort on updating the internals, as Neo4j 3.4 comes with a new SPI (Kernel API) for more efficient interaction with the Cypher runtime.

As APOC uses that SPI in a number of places, we can thank Stefan Armbruster who took on the job of updating all of those.

I also want to thank everyone who contributed or reported back issues or feature requests.

Although this release is lighter on features, two things made it in that I hope will make all your lives easier.

Load Excel (XLS)


Much business data still lives in Excel spreadsheets today, because quick computation, summarization, charts and formatting are very handy. And as I learned a long time ago from Simon Peyton Jones, Excel’s expression language is the most widely used, immutable functional language in the world.

Another very useful feature is grouping data that belongs together into several sheets of the same file. The main difference between apoc.load.xls and apoc.load.csv is that with XLS you can address individual sheets or even regions.

We use the Apache POI library to read Excel, but as I didn’t want to grow APOC by many megabytes, you’ll have to add these dependencies yourself if you want to use Excel loading.

They are linked in the documentation for both procedures. Those two also come with a number of other cool features like:

  • Provide a line number
  • Provide both a map and a list representation of each line
  • Automatic data conversion (including split into arrays)
  • Option to keep the original string formatted values
  • Ignoring fields (makes it easier to assign a full line as properties)
  • Headerless files
  • Replacing certain values with null

So if you have a sheet like this (below), you can access not only the individual sheets, but also a region, as shown here. The name of the sheet is Offset, so don’t get confused 🙂

load.xls


CALL apoc.load.xls('file:///path/to/file.xls','Offset!B2:F3',
  {mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})

Resulting in:

String   Boolean   Integer   Float   Array
"Test"   true      2         1.5     [1,2,3]

Graph Grouping


The other feature that I’m really happy about is graph grouping. Quite some time ago, Martin Junghanns told me about the graph operators in Gradoop, one of which is graph grouping.

I found this concept to be a really cool and useful idea, and I implemented a first version in APOC a while ago.

  1. This is a way of summarizing a graph by grouping nodes by one or more properties, resulting in virtual nodes that represent these groups.
  2. Then for each of the virtual nodes, all of the relationships between each group are aggregated too.
  3. And you can provide additional aggregation functions for both nodes and relationships besides just counting them (e.g., sum of values or min/max of timestamps, that also turn into properties of the aggregated graph entities).

The documentation for this procedure also details all the command line and configuration options, such as skipping orphans or post-filtering the results.

Here is a quick example:

CALL apoc.nodes.group(['User'],['country','gender'])
YIELD node, relationship RETURN *;


group user country gender

This is especially helpful for getting a bird’s-eye view of the graph, such as a summarization.

So for example, you can group a User graph by country and gender or a citation-graph by publication year.
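Beyond counting, aggregation functions can be supplied for nodes and relationships, as point 3 above describes. A hedged sketch (the age property is an assumption, and the aggregation-map arguments follow the pattern in the APOC documentation – verify against your APOC version):

```cypher
// Group users by country; aggregate node count and average age,
// and count the relationships between each pair of groups
CALL apoc.nodes.group(['User'], ['country'],
     [{`*`: 'count', age: 'avg'}, {`*`: 'count'}])
YIELD node, relationship
RETURN *;
```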

As Martin and Max had built a nice JavaScript application demoing that feature, I took it and adapted it to use a Neo4j / APOC backend.

Find the adapted source code in this GitHub repository. It is self-contained and even hosts a running app.

group user country

As part of this I also added a number of functions to access node and relationship attributes that also work with virtual nodes and relationships.

Going forward, I want to make it run on graph projections and also improve performance further.

Release Summary


Features/Improvements


  • apoc.load.xls for loading data from Excel files, supports both .xls and .xlsx

  • Improvements for apoc.nodes.group, e.g., filtering of rel-types or of outputs by counts

  • Accessor functions for (virtual) entities (e.g., to postfilter them by property or label)

  • Dijkstra algorithm supporting multiple results

  • date.format(null) returns null, also add ISO8601 convenience format


Bugfixes


  • Fix for apoc.periodic.iterate with statements that already started with WITH

  • Fix for deleted nodes in an explicit index

  • apoc.cypher.runTimeboxed uses separate thread

  • Missing Iterator Utils in APOC .jar file

  • Add missing apoc.coll.combinations()

  • Check for availability before running sync index update thread


Documentation


  • Docs for apoc.load.csv and apoc.load.xls

  • Docs for apoc.group.nodes

  • Docs for apoc.coll.contains

So please go ahead and try out the new features and update your APOC dependency to the latest version.

You should also make it a habit to learn one new APOC procedure or function each day. There are more than enough for every day of the year 🙂

Start with CALL apoc.help('keyword') so you never get lost again.

Cheers,
Michael



Want to take your Neo4j skills up a notch? Take our online training class, Neo4j in Production, and learn how to scale the world’s leading graph database to unprecedented levels.

Take the Class

The post APOC Release for Neo4j 3.4 with Graph Grouping appeared first on Neo4j Graph Database Platform.


Neo4j 3.4 Release Highlights in Less Than 8 Minutes [Video]

Hi everyone,

My name is Ryan Boyd, and I’m on the Developer Relations team here at Neo4j. I want to talk with you today about our latest release Neo4j 3.4.



Overview


In Neo4j 3.4 we’ve made improvements across the entire graph database system, from scalability and performance, to operations, administration and security. We’ve also added several new key features to the Cypher query language, including spatial querying support and date/time types.

Scalability


Let’s talk about the scalability features in Neo4j 3.4.

In this release, we’ve added Multi-Clustering support. This allows your global Internet apps to horizontally partition their graphs by domain, such as country, product, customer or data center.

Multi-Clustering scalability with Neo4j 3.4


Now, why might you want to do this? You might want to use this new feature if you have a multi-tenant application that wants to store each customer’s data separately. You might also want to use this because you want to geopartition your data for certain regulatory requirements or if you want enhanced write scaling.

Look at the four clusters shown in the image above. Each of these clusters has a different graph, but they are managed together. They can also be used by a single application with Bolt routing the right data to the right cluster, and the data is kept completely separate.

Read Performance


As with all releases, in Neo4j 3.4 we made a number of improvements to read performance.

If you look at a read benchmark in a mixed workload environment, you can see that from Neo4j 3.2 to 3.3 we improved performance by 10%.

Chart of Cypher runtime read performance in Neo4j 3.4


Now, for this release, we spent the last several release cycles working on an entirely new runtime for Neo4j Enterprise Edition. I’m proud to say that in Neo4j 3.4 we’ve made all queries use this new Cypher runtime, and that improves performance by roughly 70% on average.

Write Performance


Write performance is also important.

In our ongoing quest to take writes to the next level, we’ve been hammering away at one component that incurs roughly 80% of all overhead when writing to a graph. Now, which component that is may not be so obvious – it’s indexes.

Lucene is fantastic at certain things. It’s awesome at full text for instance. But it turns out to be not so good for ACID writes with individually indexed fields. So we’ve moved from using Lucene as our index provider to using our native Neo4j index.

We’ve actually moved to a native index for our label groupings in 3.2, for numerics in 3.3, and now, with the string support in 3.4 we’ve added a lot of the common property types to the new native index. This is what results in our significantly faster performance on writes.

Our native index is optimized for graphs. It is ACID-compliant, allows fast reads and, as you can see, approximately 10 times faster writes. The image below shows the write performance for the first 3.4 release candidate when writing strings.

Write performance improvement due to native string indexes with the Neo4j 3.4 release candidate


At the point at which we implemented the new native string index, we have approximately a 500% improvement in the overall write performance.

Ops and Admin


We’ve also made a number of improvements around operations and administration of Neo4j in the 3.4 release. Perhaps the most important is rolling upgrades.

Neo4j powers many mission-critical applications, and something many customers have told us is that they want the ability to upgrade their cluster without any planned downtime. This feature enables just that. So if you’re moving from Neo4j 3.4 to the next release, you could do it by upgrading each member in the cluster separately in a rolling fashion.

Neo4j 3.4 also adds auto cache reheating. So let’s say that you normally heat up your cache when your Neo4j server starts. When you restart your server the next time, we’ll automatically handle the reheating of your cache for you.

The performance of backups is also important to many of our customers and they are now two times faster.

Spatial & Date/Time Data


With Neo4j 3.4, we’ve now added the power of searching by spatial queries. Our geospatial graph queries allow you to search in a radius from a particular point and find all of the items that are located within that radius. This is indexed and highly performant.

In addition to supporting the standard X and Y dimensions, we’ve also added support so that you can run your queries in three dimensions. Now, how you might use this is totally up to you.

3D spatial search in Neo4j 3.4


Think about a query like “Recommend a shirt available in a store close by in the men’s department”. You can take your location and find the different stores. And then, once you’re in a particular store you can use that third dimension support – the Z axis – to find the particular floor and rack where that shirt is available.

In addition to the spatial type, we’ve also added support for date and time operations.

Database Security


We’ve also added a new security feature in this release that focuses on property-level security for keeping private data private.

Property-level security allows you to blacklist certain properties so that users with particular roles are unable to access those properties. In this case, users in Role X are unable to read property A. And users with Role Y are unable to read properties B and C.

Try It Out with the Neo4j Sandbox


For the GA release of Neo4j 3.4, we’ve created a special Neo4j Sandbox. The 3.4 sandbox includes a guide that walks you through the new date/time types and spatial querying support.

Watch the video for a quick demo of the new Neo4j Sandbox, or try it out yourself by clicking below.

Explore spatial and time data in Neo4j like never before:
Get started with the download-free Neo4j Sandbox and play with our pre-populated datasets or load your own.


Try Out the Neo4j Sandbox

The post Neo4j 3.4 Release Highlights in Less Than 8 Minutes [Video] appeared first on Neo4j Graph Database Platform.

Neo4j ETL 1.2.0 Release: What’s New + Demo

In the seven months since GraphConnect New York, we have worked on enhancing the Neo4j ETL tool, adding support for all relational databases with a JDBC driver.

We also made some backend optimizations and a few changes to the UI for Neo4j ETL. With data and databases being messy, we also fixed a number of issues resulting from under-specified data operations.

The tool is now fully integrated with Neo4j Desktop (from version 1.1.3), and a Neo4j ETL activation key will unlock it for you. Please ask your trusted Neo4j contact for one or send an email to devrel@neo4j.com. After adding the key, you can then add Neo4j ETL as an additional graph-app to your projects.

The new release has a number of new features and capabilities that we’ll discuss below. But first, we want to demonstrate the tool in action.

If you’d rather watch it in action, we also recorded a quick demo:



ETL Data Transfer from Microsoft SQL Server


Let’s see how to define a connection to a relational database (for this example I use a Docker instance of Microsoft SQL Server).

Project Selection


After starting the Neo4j ETL tool you select the project you want to work in from the drop-down box.

Project selection in the Neo4j ETL tool


Connection Setup


On the left sidebar, click on Connections and then set up the database connection.

ETL database connection setup with Neo4j


MySQL and PostgreSQL drivers are bundled, so if you use a different database, then you have to provide a valid JDBC driver (jar-file) via JDBC driver path.

You can also change the suggested connection URL.

Test and Save the MS-SQL WideWorldImporters connection.

JDBC connection setup in Neo4j ETL


Metadata Mapping


Afterwards, we’ll continue on the other sidebar tab (>), a.k.a. Import data from source, and then click on IMPORT DATA.

Now we’ll inspect the relational database metadata and see the resulting data mapping.

Before doing that, you have to choose:
  1. The source, i.e., the JDBC connection to the relational database, and
  2. The Neo4j target instance.
The ETL tool is fully integrated with Neo4j Desktop, so I can see all Neo4j graphs that are defined for a specific project and also the current status (running, stopped, etc.).

The JDBC driver options for data import using Neo4j ETL


From this frame, you can also delete your source connections using the (x) icon on the connection box.

After clicking Start Mapping, the Neo4j ETL tool starts to inspect the relational database and extract the database metadata into a JSON file representing how tables and columns are to be mapped into the Neo4j graph database. The log output is then displayed in the lower part of the frame.

Metadata mapping using the ETL tool for Neo4j


Clicking Next takes us to the next screen, which is the Mapping Editor.

Mapping Editor


The WideWorldImporters database is made up of many tables, so the resulting mapping is quite complex. Here I focus only on the Country table, where I can see all the columns, their data types and how they are converted to Neo4j data types.

Using the Mapping Editor, I can change:
  1. The name of the resulting node,
  2. The name of each property and
  3. The resulting data type
The latter might result in conversion issues so pay attention when doing data type conversions.

Once I have completed all the changes, I can Save Mapping which stores the edits. And then I start to import data from the WideWorldImporters database to the Neo4j instance.

Learn what's new in the 1.2.0 release of the Neo4j ETL tool and see for yourself with this demo


Data Import


In the last frame, you can choose one of four import modes:

The first two modes (neo4j-import, neo4j-shell) are offline (i.e., Neo4j should be stopped before running them) while the others (cypher-shell, direct cypher) are online modes.

Data import modes in Neo4j ETL


If you try to import your data with an online mode while your instance is stopped, you will get an error, and vice versa.

The import will stream results and also report the update statistics.

After the import from the relational database to Neo4j is complete, you can explore your data through the Neo4j Browser. For example, starting from a country (Italy), you can see the (limited) related entities up to the third level of relationships.

Neo4j Browser for graph data visualization


New Features and Bugfixes


    • Multi-schema support: The Neo4j ETL tool can now “parse” more than one schema at a time.
    • Additional driver support: The Neo4j ETL tool comes with two embedded JDBC drivers (for MySQL and PostgreSQL), but you can set up an external JDBC driver for Microsoft SQL Server, IBM DB2 and Oracle RDBMS (the list is not limited to these drivers but they are the default ones in the combobox). You can add a jar using the --driver parameter.
    • The resulting mapping file can now be written to a file without output redirection using --output-mapping-file. The mapping file is also now different for each import; currently the following naming convention is supported: <databasetype>_<databasename>_<schemaname>_mapping.json. In future releases, we want to move this file into the same directory where all the CSV files are created.
    • A fetch size has been added, with a default value of 10,000 records. It will be configurable in future releases.
    • The Neo4j ETL tool has undergone additional testing with Microsoft SQL Server sample databases AdventureWorks and WideWorldImporters and a DB2 sample database in addition to the previous tests.
    • When importing through Cypher, all fields are now mapped correctly according to their data type. Now the Neo4j ETL tool creates a separate directory for each schema/catalog when writing the CSV files.
    • Schema names are well-separated from table names: The Neo4j ETL tool no longer relies on splitting names at . in order to separate the schema name from the table name. These changes are also reflected in the mapping.json file, where the name of the schema is explicitly written.
    • The concept of catalog/schema has been generalized, so when you need to filter what you’re going to inspect use the --schema parameter. If using the UI, you don’t need to think about this.
    • CSV files are now escaped according to the standard CSV escaping rules. No more backticks appear as a quoting character.
    • No more numeric overflow when converting numbers from Oracle 11 or older versions.
    • Constraints are now created correctly according to the information that is retrieved by the SchemaCrawler. Currently there is no support for multi-column primary keys, but we are working on it, and it will be available in the next release.

Neo4j Desktop ETL – UI Updates


    • UI updates now reflect all the previously listed capabilities (see screenshots above).
    • Once a database connection is defined, it’s now possible to remove it using a X icon on the top-right corner of the connection box.
    • Bugfix: No more stdout buffer exceeded errors when creating big mapping JSON output from the UI, because the way the UI writes the JSON file has been reworked to handle big JSON files better.

Documentation


The documentation was updated to explain how to set up a Docker container with an MS-SQL sample dataset, in addition to new command-line interface options.

Try It Now


Try out the new release via either the command-line tools or the mentioned Neo4j Desktop Graph App.

A big thanks to everyone for giving us feedback and suggestions that help us to improve the Neo4j ETL tool. Please continue to provide feedback by either submitting GitHub issues or joining neo4j.com/slack and asking in the #neo4j-etl channel.


Want in on the awesome world of graph tech?
Click below to get your free copy of the Learning Neo4j ebook and catch up to speed with the world of graph database technology.


Get the Free Book

The post Neo4j ETL 1.2.0 Release: What’s New + Demo appeared first on Neo4j Graph Database Platform.

Are You the GraphConnect 2018 Presenter We’ve Been Looking for? [CFP Is Now Open!]

Bad news, droids: You’re not the presenters we’re looking for.

Learn about the Call for Papers (CFP) at GraphConnect 2018 and submit your talk before July 1st!


However, we’ll need some help finding the perfect presenters for GraphConnect 2018. That’s why the Call for Papers (CFP) is now open!

Wait, What’s GraphConnect?


Connected data isn’t just powering tomorrow’s innovations – it’s changing how we innovate. GraphConnect is the global conference where graph technology innovators share their stories of success, best practices and trail-blazing innovation.

Now in its sixth year, GraphConnect draws graph enthusiasts from a wide array of roles, industries and locales. You’ll have the opportunity to share your story with hundreds of developers, software engineers, data scientists and business executives from all over the globe.

GraphConnect 2018 will be held September 20th and 21st at the Marriott Marquis in New York City. Keynotes and conference sessions are on the 20th, while the 21st is a full day of Neo4j training classes led by the expert engineers.

The Talks We *Are* Looking for


So, what sort of presentations are we looking for at GraphConnect 2018? Here are just a few topics we have in mind – but think of these as inspirations, not limitations!

    • Digital transformation
    • Cloud deployments
    • Artificial intelligence & machine learning
    • Design thinking
    • Graphs at scale
    • Integrations: API integration, GraphQL, polyglot persistence, etc.
Given any of these topics (including the one in your head right now that isn’t listed above), here are the types of presentations we’re looking for:

    • Developer best practices, i.e., lessons learned from the front lines of Neo4j production deployments. Share your knowledge, experience and code with other developers so we can all learn together.
    • Business case studies: How did graph technology transform your business model or bottom line? How did you get sign-off from decision makers? How has the change impacted your operations?
    • Technical narratives: You had a problem, and you solved it. Show us how you did it! Walk us through your design thinking. Show us your stack. How does graph tech fit in, and why was it the solution?

Wait, But What If I’m Not the Presenter You’re Looking for?


You are. You are the presenter we’re looking for at GraphConnect 2018!

If you’re a first-time, padawan speaker, we’ll help you prepare for your talk by providing guidance on content as well as rehearsals to help with speaking style. We may even be able to set you up to talk at a local meetup to build your confidence.

And if you’re a veteran, jedi-level presenter, be sure to let us know! We can also provide help fine-tuning content and speaking style if you desire.

Bottom line: we welcome everyone – padawan and jedi alike.

Everything Else You Need to Know


GraphConnect presentations are typically 45 minutes long. You also have the option of submitting a lightning talk, which is 15 minutes in length.

Need more inspiration? Check out videos of past presentations at GraphConnect.com.

The Call for Papers (CFP) is open until July 1st, so be sure to submit yours today.

We look forward to reading your proposals!


Do. Or do not. There is no try:
Submit your presentation idea to the GraphConnect 2018 CFP before July 1st – the galaxy is counting on you!


Submit My Talk

The post Are You the GraphConnect 2018 Presenter We’ve Been Looking for? [CFP Is Now Open!] appeared first on Neo4j Graph Database Platform.

Matt Casters – the Mind Behind Kettle – Has Joined the Neo4j Team

Matt Casters has joined the Neo4j team

Speaking on behalf of the Neo4j team, I am excited to announce that Matt Casters has joined Neo4j, Inc. Matt has a sterling reputation in the open source community for his work leading the Kettle project.

Matt and I also worked together at Pentaho, so I was more than happy to sit down and interview him when he recently joined the team.

For readers who don’t know you, could you give us a little background?


Matt Casters: Sure. I’m originally from Flanders, Belgium. I come from a background of system management and databases, originally working as an Oracle DBA. After that, I got into data integration and started work on Kettle.

The Kettle project was open sourced in 2005, and shortly thereafter joined the Pentaho project. In 2015, Hitachi Data Systems bought Pentaho, and I transitioned a lot of the Kettle development work to the HDS team, which allowed me to pursue other projects.

What is Kettle? Does it connect to Neo4j?


Matt: Kettle – a.k.a. Pentaho Data Integration – is an ETL [Extract, Transform, Load] engine, along with companion applications, that allows the user to define data integration jobs and other data transformations.

For Neo4j developers, you can load your RDBMS or other NoSQL data into a graph database for further connected data analysis – or you can read data from your Neo4j instance via Kettle and then send that data to relational reporting and data visualization tools like Tableau.

There’s a lot of other possibilities too between Neo4j and Kettle, including workflow management, data lineage, central metadata, MDM, data quality and more.

Right now you can download Neo4j plugins from the Pentaho marketplace (just search for “Neo4j”) with more plugins and integrations to follow soon.



What made you want to join the Neo4j team?


Matt: Neo4j is quite successful, and I’ve known it for quite a while. It is a nice team of people and very open source minded. I think my new position at Neo4j will give me a real opportunity to make a difference. The Neo4j team has lots of expertise, so I think it’s a big win-win.

I’m excited to continue to work with the Kettle community out there and now the Neo4j community as well.

And what will you be doing in your new role at Neo4j?


Matt: As the Chief Solutions Architect, I’ll be building a solutions integration architecture for Neo4j.

My role will be part of the Neo4j Solutions team. Ultimately, by using Kettle, we want to accelerate deployment by streamlining integration and make best practices more easily repeatable across different Neo4j projects.

That’s a broad goal, but I think a few incarnations of that include rounding out the Graph Platform, creating more data integration choices for Neo4j, and streamlining the data import process so that customers can widen their graph analysis with less effort.

What are you most excited about joining the Neo4j team?


Matt: I think I’m most excited for the growth potential of the technology.

For instance, we can do incredible things with the graph engine. It’s not just like any database out there; it’s very different. Integrating the world of Kettle with Neo4j will be a killer combination.

How can Neo4j community members get to know you better?


Matt: Online, I blog at http://www.ibridge.be/, and I’m also on GitHub and LinkedIn. Follow me on Twitter (@mattcasters) to see which events I’ll be attending or speaking at next.

I’m looking forward to being a part of the Neo4j community!


New to graph technology?
Grab yourself a free copy of the Graph Databases for Beginners ebook and get an easy-to-understand guide to the basics of graph database technology – no previous background required.


Get My Copy

The post Matt Casters – the Mind Behind Kettle – Has Joined the Neo4j Team appeared first on Neo4j Graph Database Platform.

What Advice Do *You* Have for a First-Time GraphConnect Presenter? [Submit Your Tips & Tricks]

Calling all conference speakers (and seasoned audience members)!

In case you missed the news, we recently opened up the Call for Papers (CFP) for GraphConnect 2018. In our effort to make all presenters feel welcome and to help them reach their full potential, we want to put together a list of helpful tips and tricks for prospective speakers.

But, we need your help!



What Makes a Conference Presentation Great?


We want to hear from you: What are some things that make talks GREAT? What are some of your best practices to make a proposal stand out from the crowd?

Even if you’re not a Jedi-level presenter – perhaps you’ve never presented at all, but you’re a veteran attendee of tech conferences – we’d still love to know:
    • What makes a speaker more engaging?
    • What have past presenters done that made your experience more memorable or fun?
    • What would you like more speakers to do in their presentations that they (mostly) aren’t doing now?

Tell Us What You Think and We’ll Share the Best Ideas


Think you know what makes a presentation great?

Let us know on Twitter by tweeting to @GraphConnect, or by filling out the handy Google form below.

We’ll collect ideas until next Thursday (June 14th) and then we’ll publish the best ideas – as many as we can – next Friday (June 15th).



We’re excited to hear your ideas and share them with future GraphConnect speakers and the rest of the Neo4j community. Check back for the final list next week!


Want to share your Neo4j story at GraphConnect?
Submit your presentation idea to the GraphConnect 2018 CFP before July 1st – and we hope to see you on the stage this autumn in New York City!


Let’s Do This

The post What Advice Do *You* Have for a First-Time GraphConnect Presenter? [Submit Your Tips & Tricks] appeared first on Neo4j Graph Database Platform.
