Best option for storing metadata associated with each data file

I have thousands of files which contain columns of timeseries data from experimental 'games' (e.g. continuous player positions, ball positions, etc). I need to store metadata for each file containing info about the experiment that the data comes from (e.g. Experiment X; protocol #__; success criteria; etc). I primarily use Matlab to read in data. I know nothing about methods for storing metadata, so I am a bit overwhelmed with choices and don't want to use an unnecessarily complex file type. I'm learning about XML and JSON, but is there a simple, beginner-friendly solution anyone can recommend that will work easily with Matlab?
I won't need to access the metadata very often... it'll exist mostly just for record keeping purposes. I will, however, need to regularly read in the data itself (and ignore the metadata).

Answers (2)

Paul Shoemaker on 2 Dec 2019
Edited: Paul Shoemaker on 2 Dec 2019
Hello Lilly,
If your data really is time-based, then you should consider storing it as a timetable, which has Properties that you can populate with various metadata. Here's an example (straight from the Matlab help for timetable):
MeasurementTime = datetime({'2015-12-18 08:03:05';'2015-12-18 10:03:17';'2015-12-18 12:03:13'});
Temp = [37.3;39.1;42.3];
Pressure = [30.1;30.03;29.9];
WindSpeed = [13.4;6.5;7.3];
WindDirection = categorical({'NW';'N';'NW'});
TT = timetable(MeasurementTime,Temp,Pressure,WindSpeed,WindDirection)
TT =
  3×4 timetable
      MeasurementTime       Temp    Pressure    WindSpeed    WindDirection
    ____________________    ____    ________    _________    _____________
    18-Dec-2015 08:03:05    37.3    30.1        13.4         NW
    18-Dec-2015 10:03:17    39.1    30.03       6.5          N
    18-Dec-2015 12:03:13    42.3    29.9        7.3          NW
Matlab timetables are really easy to work with once you get used to them. Now that you have this timetable (TT), you can access its properties, like so:
TT.Properties
ans =
  TimetableProperties with properties:
             Description: ''
                UserData: []
          DimensionNames: {'MeasurementTime' 'Variables'}
           VariableNames: {'Temp' 'Pressure' 'WindSpeed' 'WindDirection'}
    VariableDescriptions: {}
           VariableUnits: {}
      VariableContinuity: []
                RowTimes: [3×1 datetime]
               StartTime: 18-Dec-2015 08:03:05
              SampleRate: NaN
                TimeStep: NaN
        CustomProperties: No custom properties are set.
      Use addprop and rmprop to modify CustomProperties.
Notice that there are lots of fields in here you can use to capture metadata. You can define the Description field:
TT.Properties.Description = 'My example time table' % Or whatever you want
The Properties.UserData field is a catch-all that can hold any Matlab value, for example a struct with one field per piece of experiment information. You can also define your own properties for the table via the CustomProperties option.
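To make the UserData and CustomProperties options concrete, here is a minimal sketch. The variable names, metadata fields, and values (Experiment, Protocol, ProtocolNumber, the 50 Hz rate) are made up for illustration and are not from the original question. As far as I know, addprop and CustomProperties only exist from R2018b onward, so on an older release such as the R2017b listed for this question, only the UserData part would apply.

```matlab
% Small example timetable with relative (elapsed) time as row times.
Time = seconds((0:4)' / 50);                 % assumed 50 Hz sampling
PlayerX = [0.10; 0.12; 0.15; 0.19; 0.24];    % made-up positions
BallX = [1.00; 0.98; 0.95; 0.91; 0.86];
TT = timetable(Time, PlayerX, BallX);

% UserData can hold any value; a struct keeps the metadata organized.
meta = struct('Experiment', 'X', ...
              'Protocol', 12, ...
              'SuccessCriteria', 'ball crosses goal line');
TT.Properties.UserData = meta;

% Built-in fields cover common per-variable metadata.
TT.Properties.VariableUnits = {'m', 'm'};

% Custom property (assumed to require R2018b or later).
% 'ProtocolNumber' is a hypothetical name; 'table' means one value per table.
TT = addprop(TT, {'ProtocolNumber'}, {'table'});
TT.Properties.CustomProperties.ProtocolNumber = 12;
```

A struct in UserData is probably the most portable choice here, since it works the same way for table and timetable and does not depend on the newer custom-property feature.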
Once you have it the way you want, save the timetable variable to a MAT-file (for example with save). When you reload it into Matlab, all of those properties you set will still be there. Note that exporting to a plain text or CSV file would keep the data but not these properties.
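A short sketch of the save-and-reload round trip, assuming the timetable is stored in a MAT-file. The file name trial_0001.mat and the description text are hypothetical.

```matlab
% Build a tiny timetable and attach a description (made-up values).
Time = seconds((0:2)' / 50);
PlayerX = [0.10; 0.12; 0.15];
TT = timetable(Time, PlayerX);
TT.Properties.Description = 'Experiment X, protocol 12';

% Save the variable to a MAT-file (hypothetical file name in the temp folder).
fname = fullfile(tempdir, 'trial_0001.mat');
save(fname, 'TT');

% Reload it; the properties travel with the variable.
S = load(fname, 'TT');
disp(S.TT.Properties.Description)

% For routine analysis, pull out the data and ignore the metadata.
x = S.TT.PlayerX;
```

Loading into a struct (S = load(...)) rather than straight into the workspace avoids silently overwriting an existing variable called TT.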
I hope this helps!
Paul Shoemaker
MatlabInvesting.com
  2 Comments
Centauri Jolene on 3 Dec 2019
Thanks Paul - My data is timeseries data, but it's usually sampled at around 50-100 measurements per second (Hz), and global/real-world time doesn't really matter. Would a timetable data format be helpful for this as well?
Paul Shoemaker on 3 Dec 2019
Lilly,
Yes, I think timetable would work well in that application. Another option to consider is simply the table format, which is similar to timetable in that it has a Properties field, but it doesn't have to be time-based. In many cases, the application of these two formats can overlap.
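For data where only elapsed time matters, both options can be sketched as follows. The sampling rate, variable names, and values are assumptions for illustration. The timetable uses duration row times (seconds since the start of the trial) instead of calendar dates, and the plain table stores time as an ordinary numeric column.

```matlab
Fs = 100;                          % assumed sampling rate in Hz
n = 5;                             % number of samples in this toy example
t = seconds((0:n-1)' / Fs);        % elapsed time since trial start (duration)
BallX = linspace(0, 1, n)';        % made-up ball positions

% Option 1: timetable with relative time as row times.
TT = timetable(t, BallX);
TT.Properties.Description = 'Trial sampled at 100 Hz, relative time';

% Option 2: plain table with time as a numeric column in seconds.
T = table(seconds(t), BallX, 'VariableNames', {'Time_s', 'BallX'});
T.Properties.VariableUnits = {'s', 'm'};
T.Properties.UserData = struct('SampleRateHz', Fs);
```

With a fixed sampling rate, the time column is arguably redundant, so the table variant could also drop it and keep only the rate in the metadata.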
As Rik said below, you can also store the metadata in a separate file, but personally I prefer to embed it in the data file so long as the metadata is not so big that it materially affects file load time. I just like files that are self-contained. I'll even go as far as to create carbon copies of the scripts that were used to create the file and store them in the actual data file, just in case the scripts change over time.
HDF, which I think is what Rik mentioned below, is yet another option, but I think it's a bit less approachable than the built-in timetable and table options in Matlab.
Paul Shoemaker
MatlabInvesting.com



Rik on 3 Dec 2019
Edited: Rik on 3 Dec 2019
Another way of thinking about metadata is that it is data itself. Store it as a separate plain text file with the same name as your data file, plus a suffix like '.meta' or '.description'.
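A minimal sketch of the separate-file idea, assuming one key=value pair per line. The data file name, the '.meta' suffix placement, and the metadata fields are all hypothetical.

```matlab
% Hypothetical data file; the metadata file gets the same name plus '.meta'.
dataFile = fullfile(tempdir, 'trial_0001.csv');
metaFile = [dataFile '.meta'];

% Made-up metadata for illustration.
meta = struct('Experiment', 'X', 'Protocol', 12);

% Write one key=value line per field; text values go in double quotes.
fid = fopen(metaFile, 'w');
keys = fieldnames(meta);
for k = 1:numel(keys)
    val = meta.(keys{k});
    if ischar(val)
        fprintf(fid, '%s="%s"\n', keys{k}, val);
    else
        fprintf(fid, '%s=%g\n', keys{k}, val);
    end
end
fclose(fid);

% The metadata is only read when it is actually needed.
txt = fileread(metaFile);
```

Because the data file itself is untouched, existing code that reads the data does not need to change at all.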
If you want to keep your data and metadata together you should consider changing your storage format so it allows a header.
Edit: to avoid confusion, I didn't mean HDF (although you can of course consider that). I meant something like this format, where the metadata is written as a plain text header above the data:
someParameter1=1.0
someParameter2="car brand"
someUID="1239062jkasdfawe9823"
~~~~
2019-12-01 1.2 2.4 100
2019-12-02 4.1 8.7 827
2019-01-08 5.9 7.1 534
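One way to read such a file back is sketched below, assuming the '~~~~' line always separates header from data and every header line has the form key=value. The file name is hypothetical, and the example first writes a small file in this layout so that it runs on its own.

```matlab
% Write a small example file in the header-plus-data layout.
fname = fullfile(tempdir, 'trial_0001.txt');   % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, 'someParameter1=1.0\n');
fprintf(fid, 'someParameter2="car brand"\n');
fprintf(fid, '~~~~\n');
fprintf(fid, '2019-12-01 1.2 2.4 100\n');
fprintf(fid, '2019-12-02 4.1 8.7 827\n');
fclose(fid);

% Read all lines and locate the separator line.
lines = regexp(strtrim(fileread(fname)), '\r?\n', 'split');
sep = find(strcmp(lines, '~~~~'), 1);
header = lines(1:sep-1);
dataText = strjoin(lines(sep+1:end), sprintf('\n'));

% Parse the header into a struct; values are kept as text, quotes removed.
meta = struct();
for k = 1:numel(header)
    tok = regexp(header{k}, '^(\w+)=(.*)$', 'tokens', 'once');
    meta.(tok{1}) = strrep(tok{2}, '"', '');
end

% Parse the data block: one date column followed by three numeric columns.
C = textscan(dataText, '%s %f %f %f');
dates = datetime(C{1}, 'InputFormat', 'yyyy-MM-dd');
values = [C{2:4}];
```

If only the data is needed, the header loop can simply be skipped; everything after the separator is parsed independently of the metadata.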

Release

R2017b