Extracting Data from Cell array with multiple values per column

Hi, I have this cell array of 4 columns, but in each column there are 3 values.
RESHAPED
D2 =
195×4 cell array
{[" 95.288 95.307 95.…"]} {[" 131.028 131.028 131.…"]} {[" 0.745516 0.745517 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.4433 95.4633 95.4…"]} {[" 131.028 131.028 131.…"]} {[" 0.745511 0.745509 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.6074 95.6285 95.6…"]} {[" 131.028 131.028 131.…"]} {[" 0.745486 0.745481 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.7799 95.802 95.8…"]} {[" 131.028 131.028 131.…"]} {[" 0.745438 0.745432 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
{[" 95.9603 95.9834 96.0…"]} {[" 131.028 131.028 131.…"]} {[" 0.745381 0.745373 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]}
How can I create 4 columns of 1D vectors, (and numeric values not strings), where each of the 3 values in a row are stacked under each other vertically like this
95.288 131.028 0.745516 -3.99239E-008
95.307 131.028 0.745517 -3.99239E-008
95.… 131..... 0.745… -3.99239E-008
95.4433 131.028 0.745511 -3.99239E-008

9 Comments

Can you upload the variable, in a MAT file? You can use the paper clip icon in the INSERT section of the toolbar.
Also, it is unclear to me what size/shape/type you want to output to be. Still a 195x4 cell array? 4 different vectors? Or a single numeric array?
Sure, here's my original data,
I actually did this for the question above as there are 4 groups of data embedded
D2=reshape(data,[],4)
And 4 single vectors would be fine (Im only going to add them to a uitable so can concatenate them)
So, it looks like there is no guarantee on what the lengths of the embedded strings are, no guarantee that there will be a whitespace character between the numbers within the string, and also how many numbers are within the string is not constant. The most difficult aspect of this is probably parsing the string, not the reformatting.
There's definetly 4 seperate groups of data, and when i did the reshape like this
{[" 95.288 95.307 95.…"]} {[" 131.028 131.028 131.…"]} {[" 0.745516 0.745517 0.745…"]} {["-3.99239E-008-3.99239E-008-…"]
There will always be the 4 columns - and the number of values in each row within each column will be the same - could we do a count on the 1st element to see how many in each column - row?
I mean within the cells. For example,
load("testdata.mat","data")
D2=reshape(data,[],4);
D2{end,3}
ans = "-3.99239E-008-3.99239E-008-3.99239E-008-3.99239E-008-3.99239E-008-3.99239E-008"
D2{end,4}
ans = "9.132E-0059.132E-0059.132E-0059.132E-0059.132E-005 -999 -999 -999"

Hmm... so theres a lot of data there thats not useful for me. Im only after data that the 1st column (which is position) is between 100 and 120.

In this region, the number of elements in each row should be the same

My raw data contains 3950 rows by 4 cols and is pre allocated a "marker value" of -999
Then the elements are filled from an experiment and the number of experimental points is a variable.
The data seems to be outputed from my instrument in columns of 8
I read them into matlab using this (to eliminate the rows containing all -999)
nb=s.NumBytesAvailable;
data={}; ct=0; idx=0;
while nb>0
d = readline(s);
G=isempty(regexp(d,'^(\s+-999)+|:$','once'));
if G==1
idx=idx+1;
data{idx,1}=d;
end
nb=s.NumBytesAvailable;
end
Its this output that I have uploaded as a mat file
That's all well and good, but I don't see how it avoids the hard problem. Look at row 125 of D2 (which I transposed here, so it is easier to see):
load("testdata.mat","data")
D2=reshape(data,[],4);
D2(125,:)'
ans = 4×1 cell array
{[" 117.76 117.79 117.82 117.85 117.88 117.91 117.94 117.97"]} {[" 131.021 131.021 131.021 131.021 131.021 131.021 131.021 131.021"]} {[" 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383 0.740383"]} {["5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006" ]}
We need to parse this row, because the first cell has values in 100-120. But 4th cell has the problem I mentioned, that there is no whitespace separting the numbers. And there seem to be only 6 numbers there, not the 8 I might expect from the other elements. And the numbers in the 4th cell don't occupy just 8 characters.
My impression is that you need to inspect your data more carefully, come up with all the steps of the algorithm (i.e. the "rules") that gives what you need, and then get the coding help you need here.
To take nothing for granted and leave no stone unturned: if you have control over the data output, work on that rather than undoing the mess at the data import stage.
That said, starting with the 4 columns, here's a start for the parsing problem the cyclist highlighted: assume that the exponent always has 3 digits (with an optional minus sign), then my mind goes to regexp. The following works, but returns different size arrays for each column in your reshaped cell array - if you are expecting 1560 rows from the data set, then it seems the 3rd and 4th columns are so malformed they don't even return the right number of data points (assuming the parsing worked). If that is the case, then even if you only want rows where some conditions are met on the first 2 columns, you would not be sure that you can safely associate the rows.
load("testdata.mat")
% reshaped dimensions are given/known
data_r = reshape(data,[],4);
col = cell(1,4); % pre-allocate
for i = 1:4
% convert the cell column to string
tmp = string(data_r(:,i));
% join everything together into a super string to parse
tmp = join(tmp);
% parse using regexp
% key assumption: exponent always has 3 digits
parsed = reshape(regexp(tmp,"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?","match"),[],1);
parsed = str2double(parsed);
parsed(parsed==-999) = nan;
nVars(i) = numel(parsed);
col{i} = parsed;
end
% pad all data to max number of rows to inspect in table form
nVarsMax = max(nVars)
nVarsMax = 1560
for i = 1:4
p = nVarsMax - nVars(i);
col{i} = padarray(col{i},p,nan,'post');
end
T = table(col{:});
head(T,15)
Var1 Var2 Var3 Var4 ______ ______ _______ ___________ 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08 93.993 131.03 0.74552 -3.9924e-08
tail(T,15)
Var1 Var2 Var3 Var4 ______ _______ ____ ____ 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN 131.03 0.74552 NaN NaN

Sign in to comment.

 Accepted Answer

I noticed big problems with array dimensions in columns 3 and 4, this will not allow to get arrays of type double. Here is some combinations of lengths:
C =
8 8 6 8
8 8 8 6
8 8 8 7
8 8 8 8
For this task I got the result by expanding arrays to the same size (filling missing elements with zeros), but I would recommend to check the correctness of the initial data recording, maybe there are problems there.
%% Source data reading
load('testdata.mat')
D2=reshape(data,[],4)
%% Converting data
% add spaces for all problem strings in cellarray, "5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006"
D2 = cellfun(@(x) regexprep(x,'([-\d\.]+E-?\d{3})|(.+)','$1 '),D2,'UniformOutput',false);
% convert strings to nums inside cells
D2 = cellfun(@(x) str2num(x),D2,'UniformOutput',false);
%% Check cells data lenght
data_length = cellfun(@(x) length(x),D2); % аггау with length of all elements
C= unique(data_length,'rows') % unique combinations
C =
8 8 6 8
8 8 8 6
8 8 8 7
8 8 8 8
max(data_length) % now we can see that max length is 8 - it will be our target length
ans =
8 8 8 8
%% Add zeroes to short rows (when length of row < 8)
D2 = cellfun(@(cell_element) fill_with_zeros_broken_data(cell_element), D2,'UniformOutput',false)
%% Reshaping
result = reshape([D2{:}],[],4)
%% Local function
function cell_element = fill_with_zeros_broken_data(cell_element)
if length(cell_element)<8
cell_element(8)=0;
end
end

More Answers (1)

Hi! Here is a short 3 line solution, but it may be a bit hard to understand in some places. I will leave comments, but feel free to ask me questions if anything is unclear.
Sample data:
testdata = ...
{[" 95.288 95.307 95.1"], [" 131.028 131.028 131.1"], [" 0.745516 0.745517 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.4433 95.4633 95.4"], [" 131.028 131.028 131.1"], [" 0.745511 0.745509 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.6074 95.6285 95.6"], [" 131.028 131.028 131.1"], [" 0.745486 0.745481 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.7799 95.802 95.8"], [" 131.028 131.028 131.1"], [" 0.745438 0.745432 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"];
[" 95.9603 95.9834 96.1"], [" 131.028 131.028 131.1"], [" 0.745381 0.745373 0.745"], ["-3.99239E-008-3.99239E-008-3.99239E-008"]}
testdata = 5×4 cell array
{[" 95.288 95.307 95.1"]} {[" 131.028 131.028 131.1"]} {[" 0.745516 0.745517 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.4433 95.4633 95.4" ]} {[" 131.028 131.028 131.1"]} {[" 0.745511 0.745509 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.6074 95.6285 95.6" ]} {[" 131.028 131.028 131.1"]} {[" 0.745486 0.745481 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.7799 95.802 95.8" ]} {[" 131.028 131.028 131.1"]} {[" 0.745438 0.745432 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]} {[" 95.9603 95.9834 96.1" ]} {[" 131.028 131.028 131.1"]} {[" 0.745381 0.745373 0.745"]} {["-3.99239E-008-3.99239E-008-3.99239…"]}
Solution
The data in the last column needs to be split: we'll add a space between the string numbers.
{["-3.99239E-008-3.99239E-008-3.99239E-008"]} % we'll repalce this
{["-3.99239E-008 -3.99239E-008 -3.99239E-008"]} % with this
The fast way to do this is with regular expressions. To apply the regular expression to each cell, we'll use the cellfun():
testdata(:,end) = cellfun(@(x) regexprep(x,'(-.+?E.+?)(?=-|$)','$1 '),testdata(:,end),'UniformOutput',false)
testdata = 5×4 cell array
{[" 95.288 95.307 95.1"]} {[" 131.028 131.028 131.1"]} {[" 0.745516 0.745517 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.4433 95.4633 95.4" ]} {[" 131.028 131.028 131.1"]} {[" 0.745511 0.745509 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.6074 95.6285 95.6" ]} {[" 131.028 131.028 131.1"]} {[" 0.745486 0.745481 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.7799 95.802 95.8" ]} {[" 131.028 131.028 131.1"]} {[" 0.745438 0.745432 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]} {[" 95.9603 95.9834 96.1" ]} {[" 131.028 131.028 131.1"]} {[" 0.745381 0.745373 0.745"]} {["-3.99239E-008 -3.99239E-008 -3.992…"]}
OK, now we can convert strings to numbers inside cells:
testdata = cellfun(@(x) str2num(x),testdata,'UniformOutput',false)
testdata = 5×4 cell array
{[95.2880 95.3070 95.1000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.4433 95.4633 95.4000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.6074 95.6285 95.6000]} {[131.0280 131.0280 131.1000]} {[0.7455 0.7455 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.7799 95.8020 95.8000]} {[131.0280 131.0280 131.1000]} {[0.7454 0.7454 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]} {[95.9603 95.9834 96.1000]} {[131.0280 131.0280 131.1000]} {[0.7454 0.7454 0.7450]} {[-3.9924e-08 -3.9924e-08 -3.9924e-08]}
Now we need to get the data out of cells. The main hack here is to use [testdata{:}] which creates a row of doubles. Then all that remains is to split it into 4 columns with reshape():
result = reshape([testdata{:}],[],4)
result = 15×4
95.2880 131.0280 0.7455 -0.0000 95.3070 131.0280 0.7455 -0.0000 95.1000 131.1000 0.7450 -0.0000 95.4433 131.0280 0.7455 -0.0000 95.4633 131.0280 0.7455 -0.0000 95.4000 131.1000 0.7450 -0.0000 95.6074 131.0280 0.7455 -0.0000 95.6285 131.0280 0.7455 -0.0000 95.6000 131.1000 0.7450 -0.0000 95.7799 131.0280 0.7454 -0.0000 95.8020 131.0280 0.7454 -0.0000 95.8000 131.1000 0.7450 -0.0000 95.9603 131.0280 0.7454 -0.0000 95.9834 131.0280 0.7454 -0.0000 96.1000 131.1000 0.7450 -0.0000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
Zeros in the last column may be a little confusing, but when changing the display format we see that the result is correct:
format shortE
result
result = 15×4
9.5288e+01 1.3103e+02 7.4552e-01 -3.9924e-08 9.5307e+01 1.3103e+02 7.4552e-01 -3.9924e-08 9.5100e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5443e+01 1.3103e+02 7.4551e-01 -3.9924e-08 9.5463e+01 1.3103e+02 7.4551e-01 -3.9924e-08 9.5400e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5607e+01 1.3103e+02 7.4549e-01 -3.9924e-08 9.5629e+01 1.3103e+02 7.4548e-01 -3.9924e-08 9.5600e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5780e+01 1.3103e+02 7.4544e-01 -3.9924e-08 9.5802e+01 1.3103e+02 7.4543e-01 -3.9924e-08 9.5800e+01 1.3110e+02 7.4500e-01 -3.9924e-08 9.5960e+01 1.3103e+02 7.4538e-01 -3.9924e-08 9.5983e+01 1.3103e+02 7.4537e-01 -3.9924e-08 9.6100e+01 1.3110e+02 7.4500e-01 -3.9924e-08
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

4 Comments

if i read correctly, your regexp relies on a minus sign separating the exponent of the previous value from the start of the next value. But if you look at the data extract the cyclist highlighted, this is not always going to work.
@J. Alex Lee is correct, this regular expression fails to handle some data correctly:
x = "5.53099E-0066.03304E-0066.53553E-0067.04676E-0067.57499E-0068.1279E-006";
regexprep(x,'(-.+?E.+?)(?=-|$)','$1 ')
ans = "5.53099E-0066.03304E-0066.53553E -0067.04676E-0067.57499E -0068.1279E-006 "
@J. Alex Lee @Stephen23 Sorry, I haven't read the thread until now and didn't notice the message with the source file. I'll redo the regular expression now, but I see that there are big problems with the dimensions of the arrays in the cells...
Rather than use regexp to find only E's and add a space and split the string up later, you can just match regexp instances directly on the full expression of a value in scientific notation (with or without E).
Google search gave me this expression - modified at the end with \d{3} to assert that there will always be 3 digits after the "e".
"[+-]?(\d+\.\d+|\d+)([eE][+-]?\d{3})?"
As Pavel noted, the number of data in each column are inconsistent as demonstrated in my comment.

Sign in to comment.

Categories

Products

Release

R2024b

Tags

Asked:

on 19 Jun 2025

Commented:

on 25 Jun 2025

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!