# Database
There are several goals with respect to the database and the ORM (Object Relational Mapping) tool:
### Platform Independence

PostgreSQL and MySQL are both good open-source databases, and many organizations use Oracle despite its licensing costs. Users might prefer one over the others. Therefore, we want to be able to support all of them.

### Be Able to Use AWS

We want to be able to use all the features of AWS, including being able to use multiple platforms, replication, scaling, etc. Therefore, we want to use PostgreSQL, MySQL, or Oracle.

### Minimize SQL

To maintain platform independence, we need to minimize hand-written SQL, since some SQL is specific to a particular database.

### Work with POJO Objects

We want to specify an object, including all of its database attributes, in the Java class itself. We want all reads and writes to be transparent and happen automatically.
## Hibernate

Hibernate allows us to achieve our goals of platform independence, AWS compatibility, minimized SQL, working with POJO objects, and more. Once one gets the hang of it, it greatly reduces the amount of code that needs to be written for reading and writing objects.

The downside is that, while it hides a great deal of the database work, it can obscure problems and make things more difficult until one truly learns the Hibernate system.
It is intended that annotations be used (as opposed to an XML-based schema config file) to specify how an object is mapped to the database. This way, the mapping info is directly in the class definition, and one doesn't need to deal with a separate file.
Every object to be persisted needs a unique ID per row so that Hibernate can compare objects for equality and such. This also means that each table should have a primary key column, which, of course, is also expected for any table in a relational database. If multiple columns are needed to establish a unique ID/primary key, then we can simply annotate multiple columns with @Id. It should be noted that one cannot have null values in primary key columns. This is unfortunate, because null should be considered a valid identifier. So for AvlReports, block can't be part of the primary key because it will sometimes be null.
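The composite-key mapping described above can be sketched with JPA annotations. This is a hypothetical, simplified version of an AVL report entity, not the project's actual class; note that Hibernate accepts multiple `@Id` fields directly, whereas strict JPA would also want an `@IdClass`:

```java
import java.io.Serializable;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Simplified, illustrative AVL report entity. The two fields marked @Id
// together form the composite primary key, so neither may ever be null.
@Entity
public class AvlReport implements Serializable {

    @Id
    @Column
    private String vehicleId;

    @Id
    @Column
    private Date time;

    // block is NOT part of the primary key because it can sometimes be null
    @Column
    private String block;

    // no-arg constructor required by Hibernate
    protected AvlReport() {}
}
```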
The primary key automatically gets an index so that the database can quickly confirm that an object being inserted is unique. So we don't need to create a separate index on the primary key column to speed up queries that would benefit from an index on that column; it already exists. But if the primary key spans multiple columns, things are much more complicated. When multiple columns are used for a primary key, an index is still created, but it will most likely be based simply on a concatenation of what is in the columns. This is adequate for the database to quickly make sure that an object is unique before inserting it, but it will not be adequate for speeding up some queries. For example, for AvlReport the primary key is on time and vehicleId, so the index is effectively something like time||vehicleId (a concatenation). If one queries for reports within a certain time frame, the index unfortunately won't help, since it is really just a string that happens to include time and vehicleId, so rows can't be found based on time alone. For this case, we need a separate index to speed up such queries.
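Such a separate index can be declared in the mapping itself. A minimal sketch using the JPA 2.1 `@Index` element of `@Table` (the table name, index name, and fields here are illustrative, not the project's actual declarations):

```java
import java.io.Serializable;
import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Index;
import javax.persistence.Table;

// Illustrative sketch: a secondary index on the time column so that
// time-range queries don't depend on the composite primary key's index.
@Entity
@Table(name = "AvlReports",
       indexes = {@Index(name = "AvlReportsTimeIndex", columnList = "time")})
public class IndexedAvlReport implements Serializable {

    @Id
    private String vehicleId;

    @Id
    private Date time;

    protected IndexedAvlReport() {}
}
```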
The C3P0 db connection pooler is used because the one that comes with Hibernate is not intended for production use. Also, C3P0 appears to be widely used. If you want to get rid of the C3P0 status output that is printed, by default, when Hibernate starts, you need to recompile the C3P0 sources after changing com.mchange.v2.c3p0.Debug.DEBUG to false. It is a public static final field that cannot be changed via configuration files.
In order to delete all config data, you need to avoid constraint violations by deleting in the following order. Note: this will delete the data for all configuration revisions. If you want to delete it only for certain revs, then you need to add a WHERE clause to each statement.
```sql
-- Deleting entire config
delete from trippattern_to_path_jointable;
delete from traveltimesfortrip_to_traveltimesforpath_jointable;
delete from block_to_trip_jointable;
delete from activerevisions;
delete from agencies;
delete from calendardates;
delete from calendars;
delete from fareattributes;
delete from farerules;
delete from frequencies;
delete from routes;
delete from stoppaths;
delete from stops;
delete from transfers;
delete from traveltimesforstoppaths;
delete from trips;
delete from traveltimesfortrips;
delete from trippatterns;
delete from blocks;
```
If you are making a bunch of changes to the schema, then it can be easier to drop the old tables and then read in the new DDL SQL file from scratch. But the tables need to be dropped in a particular order, and that order is unfortunately not correct in the DDL file.
```sql
drop table if exists ActiveRevisions;
drop table if exists Agencies;
drop table if exists ArrivalsDepartures;
drop table if exists AvlReports;
drop table if exists Block_to_Trip_joinTable;
drop table if exists Blocks;
drop table if exists CalendarDates;
drop table if exists Calendars;
drop table if exists FareAttributes;
drop table if exists FareRules;
drop table if exists Frequencies;
drop table if exists Matches;
drop table if exists PredictionAccuracy;
drop table if exists Predictions;
drop table if exists Routes;
drop table if exists Stops;
drop table if exists Transfers;
drop table if exists VehicleEvents;
-- This is where the tricky stuff starts
drop table if exists TravelTimesForTrip_to_TravelTimesForPath_joinTable;
drop table if exists TripPattern_to_Path_joinTable;
drop table if exists StopPaths;
drop table if exists TravelTimesForStopPaths;
drop table if exists Trips;
drop table if exists TravelTimesForTrips;
drop table if exists TripPatterns;
```
A variant that drops only the configuration tables, leaving the collected data tables (ArrivalsDepartures, AvlReports, Matches, Predictions, etc.) in place:

```sql
drop table if exists ActiveRevisions;
drop table if exists Agencies;
-- drop table if exists ArrivalsDepartures;
-- drop table if exists AvlReports;
drop table if exists Block_to_Trip_joinTable;
drop table if exists Blocks;
drop table if exists CalendarDates;
drop table if exists Calendars;
drop table if exists FareAttributes;
drop table if exists FareRules;
drop table if exists Frequencies;
-- drop table if exists Matches;
-- drop table if exists PredictionAccuracy;
-- drop table if exists Predictions;
drop table if exists Routes;
drop table if exists Stops;
drop table if exists Transfers;
-- drop table if exists VehicleEvents;
-- This is where the tricky stuff starts
drop table if exists TravelTimesForTrip_to_TravelTimesForPath_joinTable;
drop table if exists TripPattern_to_Path_joinTable;
drop table if exists StopPaths;
drop table if exists TravelTimesForStopPaths;
drop table if exists Trips;
drop table if exists TravelTimesForTrips;
drop table if exists TripPatterns;
```
To clear out data one can, of course, do a DELETE SQL call. But this doesn't actually make the space available again to the database or the operating system. To make the space available to the database, you should do a VACUUM. To return space to the operating system, you need to do a VACUUM FULL, but this takes a while and locks the table, which could be a big problem. For details, see http://www.postgresql.org/docs/9.1/static/sql-vacuum.html

It is recommended that one do a VACUUM ANALYZE frequently, such as nightly; it does a regular vacuum plus an analyze.

Note that verbose output can be requested using VACUUM (VERBOSE, ANALYZE) TABLE_NAME;
If you have lots of data and can't afford to lock the table with VACUUM FULL, then things are a bit complicated. The best solution is likely to rename the table and then create the table again. Since renaming and creating are so fast, this shouldn't take significant time, so you shouldn't lose any data. Then you can copy into the new table just the data that you want to keep around. Finally, drop the renamed table; dropping it will free all of the disk space it used. This way, you never have to fully lock the tables and won't lose data.
With Hibernate 4.0, we ran into lots of problems with the documentation, including the online documentation, being out of date. For example, the Hibernate @Entity annotation has been deprecated and one must now use the JPA one by specifying the appropriate import. Even the whole way of creating a session has changed and will continue to change. This is a nuisance, but bearable.
One other subtle gotcha with Hibernate is that Sessions are not threadsafe. Don't pass them between threads!
Storing time is important, yet is a nuisance when MySQL is used. Previously MySQL did not support fractional seconds. But we really do want to store milliseconds as part of times. This is especially true for AVL data, where we need to avoid duplicate key problems with respect to a primary key that uses a timestamp. Fortunately, since MySQL 5.6.4 one can specify fractional timestamps (and other time values). See http://dev.mysql.com/doc/refman/5.6/en/fractional-seconds.html for details. When specifying a timestamp, you need to specify the number of digits past the decimal point. The default for MySQL is 0 for backwards compatibility; for other databases the default is 6! So it is best to explicitly specify the precision as TIMESTAMP(3) (or perhaps 6) so that fractional seconds will work with any database.
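One way to pin the precision in the mapping itself is via the `columnDefinition` element of `@Column`. A minimal sketch, with a hypothetical entity and field name:

```java
import java.io.Serializable;
import java.util.Date;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

// Illustrative sketch: explicitly requesting millisecond precision so the
// generated DDL is TIMESTAMP(3) regardless of the database's default.
@Entity
public class TimedEvent implements Serializable {

    @Id
    @Temporal(TemporalType.TIMESTAMP)
    @Column(columnDefinition = "TIMESTAMP(3)")
    private Date time;

    protected TimedEvent() {}
}
```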
Timezones are an incredible nuisance. With PostgreSQL and Hibernate, it is difficult to use the type "timestamp with time zone" because a Java Date is usually simply mapped to a plain "timestamp". It can be done by the SchemaGenerator class by declaring a special schema type for Postgres, but that would be rather complicated. Therefore, times are simply stored as timestamps without a timezone.
A further complication is that one uses Java Date objects for queries, including for storing or retrieving objects with dates. For this to work, the application doing the queries must have its timezone set properly, using TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"));
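Since the stored timestamps carry no zone information, the setup boils down to setting the JVM default before any sessions are opened. A minimal sketch (the zone ID is just an example):

```java
import java.util.TimeZone;

public class TimeZoneSetup {

    // Set the JVM default timezone before opening any Hibernate sessions,
    // so that Date values line up with the zone-less timestamps in the db.
    public static void configure(String zoneId) {
        TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
    }

    public static void main(String[] args) {
        configure("America/New_York");
        System.out.println(TimeZone.getDefault().getID()); // prints "America/New_York"
    }
}
```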
Useful info on PostgreSQL and timezones is at http://www.postgresql.org/docs/9.1/static/datatype-datetime.html