Labelling columns of a large array in a searchable way?

I'm looking for some advice on working with large datasets.
I am trying to label individual strips of continuous data in a way that the label can then be used to group/sort by a specific tag: 1, 2 or 3.
Currently, for each dataset I reshape the data into an array (800x43200), generate variable names containing the tag (a 1x43200 cell), make a table and save it as a .txt file.
Then, if I need all the tag-1 columns from all datasets, I have to read each table, use a for-loop and str2num on each variable name to parse out the tag, and use that to gather the correct columns.
This doesn't seem like the best way of doing it. I thought perhaps I should be using tabular text datastores or tall tables, but these don't seem to help with sorting/averaging by specific tags.
Any advice you can offer to point me in the right direction will be greatly appreciated.
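For reference, reshape fills its output column by column (MATLAB arrays are column-major), so each column of the 800x43200 array is one contiguous 800-sample strip of the original data. A minimal sketch, using small hypothetical sizes (4 samples per strip, 3 strips) in place of 800 and 43200:

```matlab
% Hypothetical small sizes standing in for 800 samples x 43200 strips
stripLen  = 4;
numStrips = 3;
rawdata = 1:(stripLen*numStrips);      % one long row, like a raw dataset
epcdata = reshape(rawdata, stripLen, numStrips);
% Column k holds samples (k-1)*stripLen+1 through k*stripLen
disp(epcdata)
% Strip 2 is therefore the second contiguous block of samples
strip2 = epcdata(:,2);
```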
Cris LaPierre on 20 Mar 2021
Perhaps I don't quite get your naming scheme, but with a table, if you know the variable name you want to load, you shouldn't have to use a for loop and str2num to get it. See how to access data in tables. The syntax depends on whether you want a table or an array returned, but there are various ways you can use the variable name as is to return either.
load patients.mat
T = table(Age,Smoker,Height,Weight,Systolic,Diastolic);
T.Height
ans = 100×1
    71
    69
    64
    67
    64
    68
    64
    68
    68
    66
T(:,["Height","Weight"])
ans = 100×2 table
    Height    Weight
    ______    ______
      71       176
      69       163
      64       131
      67       133
      64       119
      68       142
      64       142
      68       180
      68       183
      66       132
      68       128
      66       137
      71       174
      72       202
      65       129
      71       181
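Applied to the naming scheme in the question (names such as '12_3', meaning strip 12 with tag 3), the variable names can also be matched directly, without str2num and a loop. A minimal sketch, assuming every name has the form '<index>_<tag>' with single-digit tags; the table here is small and made up:

```matlab
% Small made-up table whose variable names follow the '<index>_<tag>' scheme
data = magic(4);
vnames = {'1_1','2_3','3_1','4_2'};
T = array2table(data, 'VariableNames', vnames);  % names starting with a digit need R2019b or later
% Logical mask of the variables whose name ends in '_1'
isTag1 = endsWith(T.Properties.VariableNames, '_1');
% Brace indexing returns a numeric array; average across the selected columns
tag1mean = mean(T{:, isTag1}, 2);
```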
Jack Bray on 20 Mar 2021
Edited: Jack Bray on 20 Mar 2021
Thanks for the quick replies, and sorry I wasn't clearer. I'll try to explain what I mean in MATLAB:
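% each dataset starts as one long column; I have over 1000 datasets
% currently it looks something like this:
for kk = 1:numel(datasets)
    rawdata = load(datasets{kk});            % rawdata = 1x34560000 double
    epcdata = reshape(rawdata,800,43200);    % one 800-sample strip per column
    tag = zeros(1,43200);                    % preallocate
    for ii = 1:43200
        % tag(ii) = <use data to get tag: 1, 2 or 3>
    end
    % tag = 1x43200 double
    vnames = cell(1,43200);                  % preallocate
    for ii = 1:43200
        vnames{ii} = [num2str(ii) '_' num2str(tag(ii))];
    end
    T = array2table(epcdata,'VariableNames',vnames);
    % without a file name, writetable writes T.txt and overwrites it on every pass;
    % Tables{kk} (hypothetical) holds a separate .txt file name for each dataset
    writetable(T,Tables{kk})
end
% This is all so I can search each dataset for specific tags, like this for tag = 1:
tag1mean = zeros(800,numel(Tables));         % naming this "ones" would shadow the built-in
for kk = 1:numel(Tables)
    % 'preserve' keeps names such as '12_3' intact (R2020a or later)
    T = readtable(Tables{kk},'VariableNamingRule','preserve');
    tag = zeros(1,width(T));
    for ii = 1:width(T)
        tag(ii) = str2double(T.Properties.VariableNames{ii}(end));
    end
    tag1mean(:,kk) = mean(T{:,tag == 1},2);  % brace indexing returns a numeric array
end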
This method of using the variable names of the table as labels to search for seems silly, but I can't figure out how something like this should be done.
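If the names have to stay as they are, the inner loop that parses each name can be replaced by one vectorized call. A minimal sketch, assuming every name has the form '<index>_<tag>'; the names below are hypothetical:

```matlab
% Hypothetical variable names in the '<index>_<tag>' scheme from the question
vnames = {'1_2','2_1','3_3','4_1'};
% extractAfter returns the text after '_'; str2double converts all of it at once
tag = str2double(extractAfter(string(vnames), "_"));
% Column indices that carry tag 1
cols = find(tag == 1);
```

Unlike indexing the last character, this also copes with multi-digit tags, should they ever appear.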


Accepted Answer

Seth Furman on 22 Mar 2021
table supports custom metadata properties.
In your case, you could add a "tag" custom variable property to T as in the following example.
rng default
rawdata = randi(100,1,34560000); % rawdata = 1x34560000 double
epcdata = reshape(rawdata,800,43200);
vnames = string(1:43200);
T = array2table(epcdata,'VariableNames',vnames);
tag = randi(3,1,43200);
T = addprop(T,"tag","variable");
T.Properties.CustomProperties.tag = tag;
Now the "tag" and variable name properties are distinct:
>> T(1:5,1:5)
ans =
5×5 table
1 2 3 4 5
__ __ __ __ __
82 69 68 82 25
91 14 44 19 39
13 73 70 13 44
92 12 26 83 84
64 12 1 64 83
>> T.Properties.CustomProperties.tag(1:5)
ans =
3 3 1 1 1
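With the tag stored as a custom variable property, selecting and averaging the tag-1 columns becomes a logical index on that property. A minimal sketch on a small made-up table (the sizes and tag values are illustrative only):

```matlab
% Small made-up table: 3 rows, 4 strips (variables Var1..Var4)
T = array2table([1 2 3 4; 5 6 7 8; 9 10 11 12]);
T = addprop(T, "tag", "variable");         % one tag value per variable (column)
T.Properties.CustomProperties.tag = [1 3 1 2];
% Logical mask built from the custom property
isTag1 = T.Properties.CustomProperties.tag == 1;
% Row-wise mean over the tag-1 columns (brace indexing returns a double array)
tag1mean = mean(T{:, isTag1}, 2);
```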
Note that you will have to write your table to a MAT file instead of a text file in order to preserve the custom property you added.
save T.mat T
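Across many datasets, each saved MAT file can then be loaded and reduced the same way. A minimal sketch, assuming each file holds a table T saved as above; the file names are hypothetical and the files are created in tempdir only for the demonstration:

```matlab
% Create two hypothetical MAT files, each holding a table T with a "tag" property
files = {fullfile(tempdir,'dataset1.mat'), fullfile(tempdir,'dataset2.mat')};
for kk = 1:numel(files)
    T = array2table(kk*ones(3,4));
    T = addprop(T, "tag", "variable");
    T.Properties.CustomProperties.tag = [1 2 1 3];
    save(files{kk}, 'T');
end

% Gather the mean of the tag-1 columns from every file, one column per dataset
tag1mean = zeros(3, numel(files));         % preallocate (rows x datasets)
for kk = 1:numel(files)
    S = load(files{kk}, 'T');              % load into a struct, not the workspace
    isTag1 = S.T.Properties.CustomProperties.tag == 1;
    tag1mean(:,kk) = mean(S.T{:, isTag1}, 2);
end
```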
Please let me know if this meets your use case.
Jack Bray on 22 Mar 2021
Thank you for your answer, I think I can use this approach to make my code much more efficient.
You saved me a lot of hassle: I was just about to attempt to convert it all into HDF5 and use the attributes as custom tags, but using tables will be much more convenient.
Thanks again!


More Answers (1)

Jan on 20 Mar 2021
vnames{ii} = [num2str(ii) '_' num2str(tag(ii))];
This hides the tags in the names of the variables. This complicated method requires even more complicated methods to access the tags later.
Store the tags as numbers, e.g. as an additional column.
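One way to keep the tags as plain numbers is a separate tag vector saved next to the data, so selecting strips is a single logical index. A minimal sketch, assuming the data matrix and its tags are saved together in one MAT file; the file name and sizes are hypothetical:

```matlab
% Hypothetical small data: 3 samples per strip, 4 strips, one tag per strip
epcdata = [1 2 3 4; 5 6 7 8; 9 10 11 12];
tag = [2 1 1 3];
file = fullfile(tempdir, 'dataset_with_tags.mat');  % hypothetical file name
save(file, 'epcdata', 'tag');                      % data and tags saved side by side

% Later: load both and select the tag-1 strips with a logical index
S = load(file, 'epcdata', 'tag');
tag1mean = mean(S.epcdata(:, S.tag == 1), 2);
```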
Jack Bray on 20 Mar 2021
Thanks for this. I had thought about adding an extra row for tags and just using the numbers, but it still requires reading each table in order to search for specific tags. Maybe this is just the easiest way to do it.
I just imagined there might be an easier way of organising/searching this kind of columnar data, perhaps using datastores, tall tables, etc.
