This repository was archived by the owner on Jun 9, 2020. It is now read-only.
Michael Hunger edited this page Mar 18, 2011 · 1 revision

Tutorial Spring Data Graph

Allow me to introduce - Cineasts.net

Once upon a time I wanted to build a social movie database myself. First things first - I had a name: "Cineasts" - the people crazy about movies. So I went ahead and bought the domain, cineasts.net. So, the project was almost done.

I had some ideas as well. Of course there should be Actors who play Roles in Movies. I needed the Cineast, too, someone had to rate the movies after all. And while they were there, they could also make friends. Find someone to accompany them to the cinema or share movie preferences. Even better, the engine behind all that could recommend new friends and movies to them, derived from their interests and existing friends.

I looked for possible data sources. IMDB was my first stop, but they charge 15k for data usage. Fortunately I found themoviedb.org, which has liberal terms and conditions and a nice API for fetching the data.

There were many more ideas but I wanted to get something done over the course of one day. So this was the scope I was going to tackle.

Scope: Spring

Being a Spring Developer, I would, of course, choose components of the Spring Framework to do most of the work. I’d already come up with the ideas - that should be enough.

What database would fit the complex network of cineasts, movies, actors, roles, ratings and friends, and also support the recommendation algorithms that I had in mind? I had no idea. But wait, there was the new Spring Data project, started in 2010, which brings the convenience of the Spring programming model to NoSQL databases. That should fit my experience and help me get started. I looked at the list of projects supporting the different NoSQL databases. Only one mentioned the kind of social network I was thinking of - Spring Data Graph for Neo4j, a graph database. Neo4j’s pitch of "value in relationships" and the accompanying docs looked like what I needed. I decided to give it a try.

Preparations - Required Setup

To set up the project I created a public GitHub account and began setting up the infrastructure for a Spring web project using Maven as the build system. I added the dependencies for the Spring Framework libraries and put the web.xml for the DispatcherServlet and the applicationContext.xml in the webapp directory.

TODO setup code?

With this setup I was ready for the first spike: creating a simple MovieController showing a static view. Check. Next was the setup for Spring Data Graph. I looked at the README on GitHub and then checked it against the manual. Quite a lot of Maven setup for AspectJ, but otherwise not much to add. I added just a few lines to my Spring configuration.

TODO config code?

I spun up Jetty to see if there were any obvious issues with the config. Check.

Setting the Stage - Movies Domain

The domain model was the next thing I planned to work on. I wanted to flesh it out first before diving into library details. Following the ideas outlined before, I came up with this. I also peeked at the data model of my import data source, themoviedb.org, to confirm that it matched my expectations.

In Java code it looked like this. Pretty straightforward.

class Movie {
    int id;
    String title;
    int year;
    Set<Role> cast;
}

class Actor {
    int id;
    String name;
    Set<Movie> filmography;
    Role playedIn(Movie movie, String role);
}
class Role {
    Movie movie;
    Actor actor;
    String role;
}
class User {
    String login;
    String name;
    String password;
    Set<Rating> ratings;
    Set<User> friends;
    Rating rate(Movie movie, int stars, String comment);
    void befriend(User user);
}
class Rating {
    User user;
    Movie movie;
    int stars;
    String comment;
}

I wrote some basic tests to ensure that the basic plumbing worked. Check.
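The original plumbing tests are not shown. As a rough sketch of what such a check might look like, here is a plain Java version of part of the domain (no graph annotations yet) with a tiny hand-rolled check. The constructors, the method body of playedIn and the class DomainPlumbingCheck are assumptions made for illustration, not the tutorial's actual code.

```java
import java.util.HashSet;
import java.util.Set;

// Plain Java sketch of the domain, before any graph annotations are added.
class Movie {
    final int id;
    final String title;
    final int year;
    final Set<Role> cast = new HashSet<Role>();

    Movie(int id, String title, int year) {
        this.id = id; this.title = title; this.year = year;
    }
}

class Actor {
    final int id;
    final String name;
    final Set<Movie> filmography = new HashSet<Movie>();

    Actor(int id, String name) { this.id = id; this.name = name; }

    // Links actor and movie in both directions and returns the connecting role.
    Role playedIn(Movie movie, String roleName) {
        Role role = new Role(movie, this, roleName);
        filmography.add(movie);
        movie.cast.add(role);
        return role;
    }
}

class Role {
    final Movie movie;
    final Actor actor;
    final String role;

    Role(Movie movie, Actor actor, String role) {
        this.movie = movie; this.actor = actor; this.role = role;
    }
}

// Hypothetical stand-in for a unit test: fails loudly if the wiring is wrong.
class DomainPlumbingCheck {
    public static void main(String[] args) {
        Movie movie = new Movie(1, "Forrest Gump", 1994);
        Actor tom = new Actor(1, "Tom Hanks");
        Role role = tom.playedIn(movie, "Forrest Gump");

        if (!movie.cast.contains(role)) throw new AssertionError("role missing from cast");
        if (!tom.filmography.contains(movie)) throw new AssertionError("movie missing from filmography");
        System.out.println(tom.name + " as " + role.role + " in " + movie.title);
    }
}
```

Note that this plain version initializes its collections itself, which is fine only as long as no persistence library manages those fields.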

Graphs ahead - Learning Neo4j

Then came the unknown - how to put these domain objects into the graph. First I read up about graph databases, especially Neo4j. Their data model consists of nodes and relationships, all of which can have properties. Relationships as first-class citizens - I liked that. Then there was the possibility to index both by field/value pairs to quickly get hold of them as starting points for further processing. Other useful operations were manual traversal of relationships and a powerful traversal based on a query-like TraversalDescription. That all seemed pretty easy.

I also learned that Neo4j was transactional and provided the known ACID guarantees for my data. This was unusual for a NoSQL database but easier for me to get my head around than non-transactional eventual persistence. That also meant that I had to manage transactions somehow. Keep that in mind.

// Assumes static imports of RelationshipTypes.ACTS_IN and org.neo4j.graphdb.Direction.INCOMING
enum RelationshipTypes implements RelationshipType { ACTS_IN }

GraphDatabaseService gds = new EmbeddedGraphDatabase("/path/to/store");
Transaction tx = gds.beginTx(); // write operations have to run inside a transaction
try {
    Node forrest = gds.createNode();
    forrest.setProperty("title", "Forrest Gump");
    forrest.setProperty("year", 1994);
    gds.index().forNodes("movies").add(forrest, "id", 1);

    Node tom = gds.createNode();
    tom.setProperty("name", "Tom Hanks");

    Relationship role = tom.createRelationshipTo(forrest, ACTS_IN);
    role.setProperty("role", "Forrest Gump");
    tx.success();
} finally {
    tx.finish();
}

// Look the movie up via the index and list its cast
Node movie = gds.index().forNodes("movies").get("id", 1).getSingle();
System.out.println(movie.getProperty("title"));
for (Relationship actsIn : movie.getRelationships(ACTS_IN, INCOMING)) {
    Node actor = actsIn.getOtherNode(movie);
    System.out.println(actor.getProperty("name") + " as " + actsIn.getProperty("role"));
}

Conjuring Magic - Spring Data Graph

Decorations - Annotated Domain

But that was the pure graph database. Using this in my domain would pollute my classes with lots of graph database details. I didn’t want that. Spring Data Graph promised to do the heavy lifting for me. So I checked that next. Obviously it depended heavily on AspectJ magic. So there would be certain behaviour that was just observable without being visible in my code. But I was going to give it a try.

I looked at the documentation again, found a simple Hello-World example and tried to understand it. The entities were annotated with @NodeEntity, that was simple, so I added it too. Relationships got their own annotation named @RelationshipEntity. Property fields should be taken care of automatically.

OK, let's put this into a test. How to ensure that a field was persisted to the graph store? There seemed to be two possibilities. The first was to get a GraphDatabaseContext injected and use its getById() method. The other one was a Finder approach, which I ignored for now. Let's keep things simple. How to persist an entity and how to get its id? No idea. Further study of the documentation revealed that there were a bunch of methods introduced to the entities by the aspects. That was not obvious. But I found the two that would help me here - entity.persist() and entity.getNodeId().

So my test looked like this.

@Autowired GraphDatabaseContext graphDatabaseContext;

@Test public void persistedMovieShouldBeRetrievableFromGraphDb() {
    Movie forrestGump = new Movie("Forrest Gump", 1994).persist();
    Movie retrievedMovie = graphDatabaseContext.getById(forrestGump.getNodeId());
    assertEquals("retrieved movie matches persisted one", forrestGump, retrievedMovie);
    assertEquals("retrieved movie title matches", "Forrest Gump", retrievedMovie.getTitle());
}

That worked, cool. But what about transactions? I hadn’t declared the test to be transactional. After further reading I learned that persist() creates an implicit transaction - so that was like an EntityManager would behave. OK for me. I also learned that for more complex operations on the entities I needed external transactions.

Do I know you? - Indexing

Then there was an @Indexed annotation for fields. I wanted to try this too. That would guide the next test. I added an @Indexed to the id field of the movie. This field is intended to represent the external id that will be used in URIs and will be stable across database imports and updates. This time I went with the Finder to retrieve my indexed movie.

@NodeEntity
class Movie {
    @Indexed
    int id;
    String title;
    int year;
}

@Autowired FinderFactory finderFactory;

@Test public void persistedMovieShouldBeRetrievableFromGraphDb() {
    int id=1;
    Movie forrestGump = new Movie(id, "Forrest Gump", 1994).persist();
    NodeFinder<Movie> movieFinder = finderFactory.createNodeEntityFinder(Movie.class);
    Movie retrievedMovie = movieFinder.getByPropertyValue(id);
    assertEquals("retrieved movie matches persisted one", forrestGump, retrievedMovie);
    assertEquals("retrieved movie title matches", "Forrest Gump", retrievedMovie.getTitle());
}

TODO This failed with an exception about not being in a transaction. Oh, I forgot to add the @Transactional. So I added it to the test.

Serving a good cause - Repository

That was the first method to add to the repository. So I created a repository for my application, annotated it with @Repository and @Transactional.

@Repository @Transactional
public class CineastsRepository {
    FinderFactory finderFactory;
    Finder<Movie> movieFinder;
    @Autowired
    public CineastsRepository(FinderFactory finderFactory) {
        this.finderFactory = finderFactory;
        this.movieFinder = finderFactory.createNodeEntityFinder(Movie.class);
    }
    public Movie getMovie(int id) {
        return movieFinder.getById(id);
    }
} 

A convincing act - Relationships

Value in Relationships - Creating them

Next were relationships. Direct relationships didn’t require any annotation. Unfortunately I had none of those. So I went for the Role relationship between Movie and Actor. It had to be annotated with @RelationshipEntity and the @StartNode and @EndNode had to be marked. So my Role looked like this:

@RelationshipEntity
class Role {
    @EndNode
    Movie movie;
    @StartNode
    Actor actor;
    String role;
}

When writing a test for that I tried to create the relationship entity with new, but got an exception saying that this was not allowed. Some weird restriction about having only correctly constructed RelationshipEntities. So I remembered a relateTo method from the list of introduced methods on the NodeEntities. After quickly checking it turned out to be exactly what I needed. I added the method for connecting movies and actors to the actor - seemed more natural.

public Role playedIn(Movie movie, String roleName) {
    Role role = relateTo(movie, Role.class, "ACTS_IN");
    role.setRole(roleName);
    return role;
}

What was left - accessing those relationships. I already had the appropriate fields in both classes. Time to annotate them correctly. For the fields providing access to the entities on the other side of the relationship this was straightforward. After providing the target type again (thanks to Java’s type erasure) and the relationship type (which I learned about in the Neo4j lesson before), there was only the direction left. It defaults to OUTGOING, so I only had to specify it for the movie.

@NodeEntity
class Movie {
    @Indexed
    int id;
    String title;
    int year;
    @RelatedTo(elementClass = Actor.class, type = "ACTS_IN", direction = Direction.INCOMING)
    Set<Actor> cast;
}

@NodeEntity
class Actor {
    @Indexed
    int id;
    String name;
    @RelatedTo(elementClass = Movie.class, type = "ACTS_IN")
    Set<Movie> filmography;

    public Role playedIn(Movie movie, String roleName) {
        Role role = relateTo(movie, Role.class, "ACTS_IN");
        role.setRole(roleName);
        return role;
    }
}

May I introduce ? - Accessing Relationships themselves

While reading about those relationship sets I learned that they are handled by managed collections of Spring Data Graph. So whenever I add something to the set or remove it, that is automatically reflected in the underlying relationships. Neat. But this also meant I mustn’t initialize the fields myself. That is something I will certainly forget in the future, so watch out for it.

I didn’t forget to add tests for those. So I could ensure that the collections worked as advertised (and also ran into the initialization problem above).
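To make the idea of a managed collection more tangible, here is a toy sketch in plain Java. It is not how Spring Data Graph implements its collections; ToyRelationshipStore and ManagedRelationshipSet are hypothetical names, and a flat list stands in for the graph. The point is only that the set holds no data of its own, so every add or remove ends up in the store.

```java
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Toy stand-in for the graph: a flat list of (start, type, end) relationships.
class ToyRelationshipStore {
    static final class Rel {
        final String start, type, end;
        Rel(String start, String type, String end) {
            this.start = start; this.type = type; this.end = end;
        }
    }
    final List<Rel> rels = new ArrayList<Rel>();
}

// A set with no storage of its own: every operation goes straight to the store.
class ManagedRelationshipSet extends AbstractSet<String> {
    private final ToyRelationshipStore store;
    private final String start, type;

    ManagedRelationshipSet(ToyRelationshipStore store, String start, String type) {
        this.store = store; this.start = start; this.type = type;
    }

    // Snapshot of all end points reachable from `start` via `type`.
    private List<String> targets() {
        List<String> result = new ArrayList<String>();
        for (ToyRelationshipStore.Rel rel : store.rels) {
            if (rel.start.equals(start) && rel.type.equals(type)) result.add(rel.end);
        }
        return result;
    }

    @Override public boolean add(String end) {
        if (contains(end)) return false;
        store.rels.add(new ToyRelationshipStore.Rel(start, type, end));
        return true;
    }

    @Override public Iterator<String> iterator() {
        final Iterator<String> it = targets().iterator();
        return new Iterator<String>() {
            private String current;
            public boolean hasNext() { return it.hasNext(); }
            public String next() { current = it.next(); return current; }
            public void remove() { // removing from the set deletes the relationship
                Iterator<ToyRelationshipStore.Rel> relIt = store.rels.iterator();
                while (relIt.hasNext()) {
                    ToyRelationshipStore.Rel rel = relIt.next();
                    if (rel.start.equals(start) && rel.type.equals(type) && rel.end.equals(current)) {
                        relIt.remove();
                    }
                }
            }
        };
    }

    @Override public int size() { return targets().size(); }
}

class ManagedSetDemo {
    public static void main(String[] args) {
        ToyRelationshipStore store = new ToyRelationshipStore();
        Set<String> filmography = new ManagedRelationshipSet(store, "Tom Hanks", "ACTS_IN");
        filmography.add("Forrest Gump");
        System.out.println(store.rels.size());    // the relationship now exists in the store
        filmography.remove("Forrest Gump");
        System.out.println(store.rels.isEmpty()); // and is gone again after the removal
    }
}
```

Seen this way, the initialization pitfall is plausible: if the field were overwritten with an ordinary HashSet, additions would stay in memory and never reach the store.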

But I still couldn’t access the Role relationships. There was more to read about this. For accessing the relationship in between the nodes there was a separate annotation @RelatedToVia. And I had to declare the field as readonly Iterable<Role>. That should make sure that I never tried to add Roles (which I couldn’t create on my own anyway) to this field. Otherwise the annotation attributes were similar to those used for @RelatedTo. So off I went, creating my first real relationship (just kidding).

@NodeEntity
class Movie {
    @Indexed
    int id;
    String title;
    int year;
    @RelatedTo(elementClass = Actor.class, type = "ACTS_IN", direction = Direction.INCOMING)
    Set<Actor> cast;

    @RelatedToVia(elementClass = Role.class, type = "ACTS_IN", direction = Direction.INCOMING)
    Iterable<Role> roles;
}

After the tests proved that those relationship fields really mirrored the underlying relationships in the graph and instantly reflected additions and removals I was satisfied with my domain so far and went for some coffee and chocolate.

Requisites - Populating the database

Time to put this on display. But I needed some test data first. So I wrote a small class for populating the database which could be called from my controller. To make it safe to call it several times I added index lookups to check for existing entries. A simple /populate endpoint for the controller that called it would be enough for now.
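The populating class itself is not shown here. As a minimal sketch of the "look up first, create only if missing" idea, the following plain Java version uses a HashMap in place of the graph index. ToyMovieStore, ToyPopulator and the sample movies are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index lookup plus entity creation.
class ToyMovieStore {
    static final class StoredMovie {
        final int id; final String title; final int year;
        StoredMovie(int id, String title, int year) {
            this.id = id; this.title = title; this.year = year;
        }
    }

    private final Map<Integer, StoredMovie> moviesById = new HashMap<Integer, StoredMovie>();

    // Returns the existing movie for this id, or creates and indexes a new one.
    StoredMovie getOrCreateMovie(int id, String title, int year) {
        StoredMovie existing = moviesById.get(id);
        if (existing != null) return existing;
        StoredMovie created = new StoredMovie(id, title, year);
        moviesById.put(id, created);
        return created;
    }

    int movieCount() { return moviesById.size(); }
}

class ToyPopulator {
    // Safe to call repeatedly: the second call finds everything already present.
    static void populate(ToyMovieStore store) {
        store.getOrCreateMovie(1, "Forrest Gump", 1994);
        store.getOrCreateMovie(2, "Apollo 13", 1995);
    }

    public static void main(String[] args) {
        ToyMovieStore store = new ToyMovieStore();
        populate(store);
        populate(store);
        System.out.println(store.movieCount()); // still 2 after the second call
    }
}
```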

TODO code

Behind the scenes - Peeking at the Datastore

Eye candy - Neoclipse visualization

After filling the database I wanted to see what the graph looked like. So I checked out two tools that are available for inspecting the graph. First Neoclipse, an Eclipse RCP application or plugin that connects to existing graph stores and visualizes their content. After getting an exception about concurrent access, I learned that I had to use Neoclipse in read-only mode when my webapp had an active connection to the store. Good to know.

TODO neoclipse image

Hardcore "Hacking" - Neo4j Shell

Besides my movies and actors connected by ACTS_IN relationships there were some other nodes. The reference node which is kind of a root node in Neo4j and can be used to anchor subgraphs for easier access. And Spring Data Graph also represented the type hierarchy of my entities in the graph. Obviously for some internal housekeeping and type checking.

For us console junkies there is also a shell that can reach into a running Neo4j store (if that one was started with enableRemoteShell) or provide read-only access to a graph store directory.

neo4j-shell -readonly -path /path/to/my/graphdb

It uses some shell metaphors like cd and ls to navigate the graph. There are also more advanced commands for using indexes and traversals. I tried to play around with them in this shell session.

TODO shell session

Showing off - Web views

After I had the means to put some data in the graph database, I also wanted to show it. So adding the controller method to show a single movie with its attributes and cast in a JSP was straightforward. Actually it just used the repository to look the movie up and add it to the model. Then forward to the /movies/show view and voilà.

TODO screenshot of movie display

What was his name? - Searching

The next thing was to allow users to search for some movies. So I needed some fulltext search capabilities. As the index provider implementation of Neo4j builds on Lucene I was delighted to see that fulltext indexes are supported out of the box.

So I happily annotated the title field of my Movie class with @Indexed(fulltext = true) and was told by an exception that I had to specify a separate index name for that. So it became @Indexed(fulltext = true, indexName = "search"). The corresponding finder method is called findAllByQuery. So there was my second repository method, for searching movies. To restrict the size of the returned set I just added a limit for now that cuts off the result after that many entries.

public List<Movie> searchForMovie(String query, int count) {
    List<Movie> movies=new ArrayList<Movie>(count);
    for (Movie movie : movieFinder.findAllByQuery("title", query)) {
        if (movies.size() >= count) break; // stop once the limit is reached
        movies.add(movie);
    }
    return movies;
}
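The limiting logic does not depend on the finder at all, so it can be tried out in isolation. The helper below is a hypothetical sketch (ResultLimiter and firstN are not part of Spring Data Graph) that copies at most a given number of elements from any Iterable and stops iterating early. Comparing the result size against the limit is less error-prone than counting down a separate variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResultLimiter {
    // Copies at most `limit` elements from `source`, stopping iteration early.
    static <T> List<T> firstN(Iterable<T> source, int limit) {
        List<T> result = new ArrayList<T>();
        if (limit <= 0) return result;
        for (T item : source) {
            result.add(item);
            if (result.size() == limit) break;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Forrest Gump", "Apollo 13", "Cast Away");
        System.out.println(firstN(titles, 2)); // prints [Forrest Gump, Apollo 13]
    }
}
```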

Look what I’ve found - Listing Results

I then used this result in the controller to render a list of movies driven by a search box. The movie properties and the cast were accessed via the getters in the domain classes.

TODO jsp fragment

Movies 2.0 - Adding social

But this was just a plain old movie database (POMD). My idea of socializing this business was not realized.

See, mom, a Cineast! - Users

So I took the User class that I had already coded up before and made it a full-fledged Spring Data Graph member.

@NodeEntity
class User {
    @Indexed
    String login;
    String name;
    String password;
    // Ratings are relationship entities, so they are exposed read-only via @RelatedToVia
    @RelatedToVia(elementClass=Rating.class, type="RATED")
    Iterable<Rating> ratings;

    @RelatedTo(elementClass=User.class, type="FRIEND")
    Set<User> friends;

    public Rating rate(Movie movie, int stars, String comment) {
        return relateTo(movie, Rating.class, "RATED").rate(stars, comment);
    }
    public void befriend(User user) {
        this.friends.add(user);
    }
}
@RelationshipEntity
class Rating {
    @StartNode User user;
    @EndNode Movie movie;
    int stars;
    String comment;
    public Rating rate(int stars, String comment) {
       this.stars=stars; this.comment = comment;
       return this;
    }
}

Beware, Critics - Rating

I also put a ratings field into the movie to be able to show its ratings. And a method to average the stars it got.

class Movie {
    @RelatedToVia(elementClass=Rating.class, type="RATED", direction = Direction.INCOMING)
    Iterable<Rating> ratings;

    // Average number of stars, or 0 if the movie has not been rated yet
    public float getStars() {
        int stars = 0, count = 0;
        for (Rating rating : ratings) {
            stars += rating.getStars(); count++;
        }
        return count == 0 ? 0 : (float)stars / count;
    }
}

Fortunately my tests showed me the division by zero error when calculating the stars for a movie without ratings. I also added a few users and ratings to the database population code. And three methods to rate movies, look up users and add friends to the repository.
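The unrated-movie case is easy to pin down with a check that needs no database. The following is a small sketch under the assumption that the averaging is pulled out into a static helper working on plain star values. StarAverage is a hypothetical name, not part of the application.

```java
import java.util.Arrays;
import java.util.Collections;

class StarAverage {
    // Average of the given star values, or 0 when there are no ratings yet.
    static float average(Iterable<Integer> stars) {
        int sum = 0, count = 0;
        for (int value : stars) { sum += value; count++; }
        return count == 0 ? 0f : (float) sum / count;
    }

    public static void main(String[] args) {
        System.out.println(average(Collections.<Integer>emptyList())); // prints 0.0
        System.out.println(average(Arrays.asList(4, 5)));              // prints 4.5
    }
}
```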

TODO code

Protecting Assets - Adding Security

To use the user in the webapp I had to put it in the session and add login and registration pages. Of course the pages that only worked with a valid user account had to be secured as well.

I used Spring Security to do that, writing a simple XXX provider that used my repository for looking up the users and validating their credentials.

TODO example config, code
TODO code

After that a logged-in user was available in the session and could thus be used for all the social interactions. Most of the work done next was adding controller methods and JSPs for the views.

Oh the Glamour - More UI

TODO screenshots

The dusty archives - Importing Data

Now it was time to pull the data from themoviedb.org. Registering there and getting an API key was simple, and so was using the API on the command line with curl. Looking at the JSON returned for movies and people I decided to pimp my domain model and add some more fields so that the representation in the UI was worth the effort.

For the import process I created a separate importer that used HttpClient and JSON to fetch and parse the data and then some transactional methods to actually insert it as movies, roles and actors. User data was not available so I created an anonymous user called 'Cineast' that I attributed all the ratings and comments to. I also created a version of the importer that read the JSON files from local disk, so that I didn’t have to strain the remote API that much and that often.

TODO import code

Movies! Friends! Bargains! - Recommendations

In the last part of this exercise I wanted to add some recommendation algorithms to my app. One was the recommendation of movies that my friends liked very much (and their friends in descending importance). The second was recommendations for new friends that also liked the movies that I liked most.

Implementing this kind of ranking algorithm is the real fun with graph databases. Such algorithms are applied to the graph by traversing it in a certain order, collecting information on the go and deciding which paths to follow and what to include in the results.

Let’s say I’m only interested in the top 10 recommendations each.

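// Scoring idea: 1/path.length()*stars
// Sketch against the Neo4j traversal framework; assumes userNode is the user's underlying node
Traverser recommendations = Traversal.description()
    .breadthFirst()
    .relationships(FRIEND, Direction.OUTGOING)
    .relationships(RATED, Direction.OUTGOING)
    .evaluator(new Evaluator() {
        public Evaluation evaluate(Path path) {
            if (path.length() > 5) return Evaluation.EXCLUDE_AND_PRUNE;
            Relationship rating = path.lastRelationship();
            if (rating != null && rating.isType(RATED)) {
                // still to do: weight rating.getProperty("stars") by 1.0 / path.length()
                return Evaluation.INCLUDE_AND_PRUNE;
            }
            return Evaluation.INCLUDE_AND_CONTINUE;
        }
    }).traverse(userNode);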
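The scoring idea from the comment above (stars divided by path length, so that ratings from closer friends count more) can be tried out without a graph database. The following is a toy sketch in plain Java, not the application's recommendation code. ToyRecommender, its maps and the user names are assumptions for illustration. It walks the friend network breadth-first, sums stars / pathLength per movie and returns the best-scoring titles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class ToyRecommender {
    final Map<String, List<String>> friends = new HashMap<String, List<String>>();
    final Map<String, Map<String, Integer>> ratings = new HashMap<String, Map<String, Integer>>();

    void befriend(String user, String friend) {
        if (!friends.containsKey(user)) friends.put(user, new ArrayList<String>());
        friends.get(user).add(friend);
    }

    void rate(String user, String movie, int stars) {
        if (!ratings.containsKey(user)) ratings.put(user, new HashMap<String, Integer>());
        ratings.get(user).put(movie, stars);
    }

    // Breadth-first walk over FRIEND edges; each rating found at the end of a
    // path contributes stars / pathLength to the movie's score.
    List<String> recommend(String start, int maxPathLength, int limit) {
        final Map<String, Double> scores = new HashMap<String, Double>();
        Map<String, Integer> depth = new HashMap<String, Integer>();
        Queue<String> queue = new ArrayDeque<String>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String user = queue.remove();
            int hops = depth.get(user);
            int pathLength = hops + 1; // friend hops plus the final RATED hop
            if (hops > 0 && ratings.containsKey(user)) {
                for (Map.Entry<String, Integer> rating : ratings.get(user).entrySet()) {
                    Double old = scores.get(rating.getKey());
                    double gain = rating.getValue() / (double) pathLength;
                    scores.put(rating.getKey(), (old == null ? 0.0 : old) + gain);
                }
            }
            // Do not expand further if a friend's rating would exceed the maximum path length
            if (pathLength >= maxPathLength || !friends.containsKey(user)) continue;
            for (String friend : friends.get(user)) {
                if (!depth.containsKey(friend)) {
                    depth.put(friend, hops + 1);
                    queue.add(friend);
                }
            }
        }
        List<String> movies = new ArrayList<String>(scores.keySet());
        Collections.sort(movies, new Comparator<String>() {
            public int compare(String a, String b) {
                return Double.compare(scores.get(b), scores.get(a)); // highest score first
            }
        });
        return movies.subList(0, Math.min(limit, movies.size()));
    }
}

class ToyRecommenderDemo {
    public static void main(String[] args) {
        ToyRecommender recommender = new ToyRecommender();
        recommender.befriend("me", "alice");
        recommender.befriend("alice", "bob");
        recommender.rate("alice", "Forrest Gump", 4);
        recommender.rate("bob", "Forrest Gump", 5);
        recommender.rate("bob", "Apollo 13", 5);
        // Maximum path length 5 and top 10, as in the text above
        System.out.println(recommender.recommend("me", 5, 10));
    }
}
```

In the demo, "Forrest Gump" scores 4/2 + 5/3 and "Apollo 13" scores 5/3, so the movie rated by the direct friend comes first. Movies the user has already rated are not filtered out in this sketch.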