DataReporter

Alexius Wadell edited this page Jan 19, 2017 · 2 revisions

The DataReporter object handles importing MoTeC log files into the Datastore for further analysis via Datamaster. Ideally, only a single machine/user uses DataReporter to update a central Datastore (e.g. hosted on a network drive), with the remaining users connecting to the shared Datastore but not adding to it via DataReporter. This is not a strict requirement, however the

Prerequisites

Before DataReporter can be used the following must first occur:

  1. Install MoTeC i2Pro
  2. Create a Google Drive folder for storing MoTeC log files
  3. Create an OAuth 2.0 Client ID and Client Secret for the Google Drive API (see Google's instructions), using an account that has access to the Google Drive folder used for storing MoTeC log files.
  4. Create the Master SQLite Database using MasterDirectory.sql. DB Browser for SQLite is a solid open source program for creating and working with SQLite databases. The path to this file is the master_directory_path.
  5. Create a directory for storing the datasources' *.MAT files; this is the datastore_path. It is recommended (but not required) to place the master directory inside the datastore folder.
  6. Update config.ini to include:
    • client_id, client_secret from step 3
    • master_directory_path from step 4
    • datastore_path from step 5

Additional Suggestions

Importing MoTeC log files can take a significant amount of time (~1000 files/hr), but once the initial import is done, checking for new files via dr.RefreshDatastore takes only minutes (~1000 files/min). Additionally, once the files are imported, Datamaster can quickly process hundreds of datasources with minimal effort. Given the large amount of time required for the initial export, it is recommended that the Datastore and Master Directory be stored on a networked drive so multiple users can access the Datastore without personally running an import.

Additionally, by maintaining a shared version of the Datastore that is updated by a single user (or a limited number of users), most users can be spared having to set up DataReporter.

Unfortunately, accessing files over a network connection can be significantly slower than accessing files stored locally. While the time difference can be minimal over a fast connection, some users may want to keep a local copy of the Datastore and regularly check the shared version for updates. This can be done simply by changing the values of master_directory_path and datastore_path in config.ini to point to the local copy.

Export Process

Once DataReporter has been set up, refreshing the Datastore is a simple matter of calling:

dr = DataReporter;
dr.RefreshDatastore;

and waiting for the process to complete. Please note that DataReporter will block the current MATLAB session while it runs. By opening a second MATLAB window on the same (or a different) machine, Datamaster can still be used to examine datasources as they are exported.

Finding MoTeC Log Files

The first step of the export process is to poll the Google Drive API for a list of every file with an *.LD or *.LDX extension that the user (the account used to create the OAuth 2.0 Client Secret/ID) has access to. The API in turn returns the following information for each file:

| Property Name | Description |
| --- | --- |
| id | The name Google uses internally for the file |
| name | The filename of the file |
| md5Checksum | The MD5 checksum of the file, used later for detecting modifications/duplicates |
| modifiedTime | The last time the file was modified |
| webContentLink | The URL that can be used to download the file from Google Drive |
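As a rough illustration of working with this metadata, the sketch below splits a file listing into *.LD and *.LDX entries and converts modifiedTime into a number that can be compared. It assumes the listing has already been decoded into a struct array with the fields from the table above; the sample values, the variable names, and the RFC 3339 timestamp layout are assumptions for illustration, not DataReporter's actual code.

```matlab
% Hypothetical sample of the metadata returned for each file (values are made up).
files = struct( ...
    'id',             {'a1', 'a2', 'a3', 'a4'}, ...
    'name',           {'run1.ld', 'run1.ldx', 'notes.txt', 'RUN2.LD'}, ...
    'md5Checksum',    {'111', '222', '333', '444'}, ...
    'modifiedTime',   {'2017-01-10T14:03:00.000Z', '2017-01-10T14:03:05.000Z', ...
                       '2017-01-11T09:00:00.000Z', '2017-01-12T16:30:00.000Z'}, ...
    'webContentLink', {'', '', '', ''});

% Split the listing by extension (case-insensitive).
[~, ~, ext] = cellfun(@fileparts, {files.name}, 'UniformOutput', false);
ldFiles  = files(strcmpi(ext, '.ld'));
ldxFiles = files(strcmpi(ext, '.ldx'));

% Convert the timestamps to serial date numbers (in days) so they can be compared.
% Assumes the first 19 characters look like 'yyyy-mm-ddTHH:MM:SS'.
toDatenum = @(s) datenum(strrep(s(1:19), 'T', ' '), 'yyyy-mm-dd HH:MM:SS');
t = cellfun(toDatenum, {files.modifiedTime});
```

Keeping modifiedTime as a plain number makes the later "oldest file" and "closest in time" comparisons a matter of min and abs.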

Matching the *.LD file to the *.LDX file

Recall that each MoTeC log file is really two files: one foo.LDX and one foo.LD. The next step in the export process is to match each *.LD file to its *.LDX counterpart. The problem is that some files have been duplicated and others are missing.

LD/LDX Matching Problem

| Case | Description |
| --- | --- |
| Everything Matches | An *.LD file can be matched to an *.LDX file on name alone |
| Duplicate File | Multiple *.LD/*.LDX files share the same MD5 hash (i.e. are copies of each other), but each can be paired to an *.LDX file based on name |
| Missing File | An *.LD file has no matching *.LDX, or vice versa |

In the case of missing files, DataReporter will simply ignore the orphan file and move on. For duplicate files, Datamaster will pick the oldest *.LD file and the *.LDX file with the modifiedTime closest to the *.LD's modifiedTime. This procedure was designed with the intent of always picking the original/unmodified version.
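The matching rules can be sketched as follows. This is one possible reading of the procedure, not the actual implementation: it assumes duplicates are grouped by the MD5 hash of the *.LD file, that the candidate *.LDX files for a group are those whose base name matches any copy in the group, and that modifiedTime is already numeric. The sample data and variable names are hypothetical.

```matlab
% Hypothetical metadata; modifiedTime is already a number (e.g. a serial date).
files = struct( ...
    'name',         {'a.ld', 'a.ldx', 'copy.ld', 'copy.ldx', 'b.ld', 'c.ldx'}, ...
    'md5Checksum',  {'H1',   'X1',    'H1',      'X1',       'H2',   'X3'}, ...
    'modifiedTime', {10,     10.1,    20,        20.2,       30,     40});

[~, base, ext] = cellfun(@fileparts, {files.name}, 'UniformOutput', false);
base   = lower(base);                    % compare names case-insensitively
ldIdx  = find(strcmpi(ext, '.ld'));
ldxIdx = find(strcmpi(ext, '.ldx'));
times  = [files.modifiedTime];

% Group *.LD files that are copies of each other (same MD5 hash).
[~, ~, group] = unique({files(ldIdx).md5Checksum});

pairs = zeros(0, 2);   % each row: [index of *.LD, index of *.LDX] into files
for g = 1:max(group)
    members = ldIdx(group == g);
    % Candidate *.LDX files: name-matched to any copy in this group.
    cand = ldxIdx(ismember(base(ldxIdx), base(members)));
    if isempty(cand)
        continue       % missing file: ignore the orphan and move on
    end
    [~, k] = min(times(members));                 % oldest *.LD in the group
    ldPick = members(k);
    [~, k] = min(abs(times(cand) - times(ldPick))); % *.LDX closest in time
    pairs(end + 1, :) = [ldPick, cand(k)];
end
```

In this sample, a.ld and copy.ld are duplicates, so only the older a.ld is kept and paired with a.ldx; b.ld and c.ldx have no counterpart and are skipped.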

Exporting from i2Pro

Each *.LD/*.LDX file pair is then downloaded from Google Drive using its webContentLink. Once downloaded, the MoTeC i2Pro API is used to export the log file to a MATLAB *.MAT file. Interestingly, the i2Pro API claims to require a license. Luckily, the features of the API that are needed by DataReporter can be used without one. This may disappear in a future release of i2Pro, but it does work as of MoTeC i2Pro 1.1.2.475. Additionally, the *.LDX file is parsed directly using regular expressions to extract information stored in the log file details, such as the Venue and Driver.
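A minimal sketch of the regular-expression step is shown below. It assumes the details are stored in the *.LDX file as XML elements of the form `<String Id="..." Value="..."/>`; that layout, the sample text, and the variable names are assumptions for illustration and should be checked against a real *.LDX file.

```matlab
% Hypothetical excerpt of an *.LDX file (the element layout is an assumption).
ldxText = sprintf('%s\n', ...
    '<Details>', ...
    ' <String Id="Venue" Value="Example Raceway"/>', ...
    ' <String Id="Driver" Value="Jane Doe"/>', ...
    '</Details>');

% Capture every Id/Value pair. Ids that are not valid MATLAB field names
% (e.g. containing spaces) do not match this pattern and are skipped.
tokens = regexp(ldxText, '<String Id="([A-Za-z]\w*)" Value="([^"]*)"', 'tokens');

details = struct();
for i = 1:numel(tokens)
    details.(tokens{i}{1}) = tokens{i}{2};
end
```

After this runs, details.Venue and details.Driver hold the extracted strings.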

The exported *.MAT file is then reopened and the channel data is re-saved using single-precision floating point rather than MATLAB's default double precision. Doing this cuts the file size in half with minimal reduction in precision, as the ADL3 logs channel data using single-precision floating point in the first place.
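The re-save step could look roughly like the sketch below. It assumes the exported file holds one scalar struct per channel with double-valued fields; the channel name Engine_RPM, its fields, and the temporary file are hypothetical stand-ins, and the sketch converts every double field rather than reproducing whatever selection DataReporter actually applies.

```matlab
% Hypothetical stand-in for an exported file: one struct per channel.
matFile = [tempname() '.mat'];
Engine_RPM = struct('Time', (0:0.1:1)', 'Value', 6000 + 100 * rand(11, 1), ...
                    'Units', 'rpm');
save(matFile, 'Engine_RPM', '-v7');

% Reopen the file and convert double data to single precision.
data  = load(matFile);
names = fieldnames(data);
for i = 1:numel(names)
    ch = data.(names{i});
    if isstruct(ch) && isscalar(ch)
        f = fieldnames(ch);
        for j = 1:numel(f)
            if isa(ch.(f{j}), 'double')
                ch.(f{j}) = single(ch.(f{j}));   % 8 bytes -> 4 bytes per sample
            end
        end
    elseif isa(ch, 'double')
        ch = single(ch);
    end
    data.(names{i}) = ch;
end

% Write each field of data back as its own variable, replacing the original file.
save(matFile, '-struct', 'data', '-v7');
```

Non-numeric fields such as the units string are left untouched, so only the sampled data shrinks.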