agrupacion de datos de una tabla

8 views (last 30 days)
Holas,
tengo una tabla 'tData' de tamano: 152709x14, que resume senal de varias variables de un instrumento de medicion (FTIR) y quisiera disminuir el ruido agrupando los valores de la senial en grupos de a 10 es decir obtener una nueva tabla 'tDataReduced' de un tamano 15270x14. la tabla tData cuenta con una columna de texto con el nombre de cada serie de medicion llamada 'MeasID', una columna de fecha y hora llamada 'Time' y 12 columnas con valores numericos.
Para este proposito, Estoy usando este codigo:
% (1) Crear un vector de índices para agrupar las filas en grupos de 10
indices = repelem(1:ceil(height(tData)/10), 10, 1);
indices = indices(1:height(tData));
% (2) Calcular el promedio de cada grupo de 10 filas
tDataReduced = grpstats(tData, indices, 'mean');
% (3) Eliminar la columna de índices generada por grpstats
tDataReduced(:, 'GroupCount') = [];
%%%%%%%%%%%%%%%%%%%5
Pero se tranca en el paso (2) indicando este error:
%*****************
Index exceeds the number of array elements. Index must not exceed 14.
Error in dsgrpstats (line 97)
[group,glabel,groupname] = mgrp2idx(a_data(groupvars),a_nobs);
Error in grpstats (line 144)
[varargout{1:nargout}] = dsgrpstats(x,group,whichstats,varargin{:});
%******************
Algunba idea que me pueda ayudar?

Accepted Answer

Voss
Voss on 17 Jul 2024
Edited: Voss on 17 Jul 2024
% first, I construct a table similar to yours
N = 152709;
MeasID = cellstr(char(64+randi(26,N,4)));
Time = datetime(randi(30,N,6));
C = num2cell(rand(N,12),1);
tData = table(MeasID,Time,C{:})
tData = 152709x14 table
MeasID Time Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 ________ ____________________ ________ _______ _________ _______ ________ ________ _______ _______ _______ _________ _________ ________ {'BAIX'} 14-Oct-0016 15:04:08 0.74662 0.66584 0.97243 0.87197 0.16557 0.47469 0.2506 0.60395 0.86371 0.39017 0.17175 0.61027 {'KQVE'} 11-Mar-0013 23:01:14 0.89935 0.57533 0.97518 0.15691 0.81941 0.024631 0.28559 0.67214 0.9678 0.10555 0.23931 0.75911 {'RTDP'} 28-Jun-0004 01:03:30 0.68013 0.83434 0.19336 0.39673 0.24662 0.66457 0.03651 0.77372 0.96092 0.9078 0.1652 0.45785 {'EFAT'} 20-Mar-0014 02:26:13 0.053861 0.5456 0.27471 0.38658 0.71067 0.52239 0.88472 0.69064 0.4962 0.68826 0.27238 0.66459 {'GYEY'} 24-Sep-0017 04:13:15 0.07359 0.74494 0.63175 0.76147 0.32292 0.081403 0.45976 0.33457 0.97788 0.36092 0.6186 0.93396 {'BTAL'} 11-Nov-0023 16:26:29 0.87689 0.78009 0.036542 0.78547 0.63962 0.87256 0.94112 0.98786 0.1229 0.80758 0.72292 0.15229 {'OBIW'} 17-Sep-0018 17:20:20 0.65785 0.75491 0.4657 0.62428 0.48063 0.82777 0.5978 0.52455 0.37039 0.0096658 0.88045 0.43789 {'YQDL'} 25-Apr-0006 09:15:15 0.97658 0.85495 0.079595 0.2679 0.61404 0.23152 0.61481 0.17822 0.81922 0.65674 0.81724 0.11674 {'MHVL'} 19-Jul-0022 23:13:20 0.032466 0.38461 0.20308 0.89468 0.10013 0.49899 0.91968 0.45019 0.32341 0.30873 0.15687 0.7115 {'ARYA'} 01-Mar-0023 19:11:10 0.17789 0.91077 0.30519 0.52604 0.095028 0.36208 0.5468 0.75843 0.33732 0.37849 0.62022 0.16637 {'IOLX'} 27-Feb-0030 15:12:06 0.41864 0.95644 0.13891 0.90773 0.014836 0.90455 0.65606 0.63802 0.17448 0.6043 0.63695 0.85284 {'KHJC'} 27-Dec-0009 16:01:19 0.63461 0.27975 0.88259 0.85227 0.46034 0.96396 0.32063 0.88355 0.39404 0.042222 0.70572 0.91728 {'TTTW'} 01-Apr-0024 19:20:08 0.96205 0.88056 0.0078747 0.58999 0.70797 0.12374 0.30285 0.22808 0.73668 0.76896 0.0025572 0.093245 {'COOO'} 05-Oct-0001 14:19:10 0.33551 0.87829 0.22246 0.64423 0.59508 0.9902 0.93756 0.99927 0.32911 0.1336 0.59499 0.74522 {'SFNO'} 04-Jul-0022 21:03:26 0.29299 0.54507 0.78344 0.95124 0.29257 0.72571 0.13398 0.41114 0.74129 0.9658 0.98474 0.31675 {'ORFM'} 06-Jun-0009 22:02:08 0.51373 0.65547 0.099982 0.92044 0.6864 0.11025 0.18933 0.47621 0.93311 0.76635 0.42722 0.073502
% second, use grpstats to calculate the mean of each group, with grouping
% done according to indices, over all the numeric and datetime table variables
indices = repelem(1:ceil(height(tData)/10), 10, 1);
indices = indices(1:height(tData));
tData.group_idx = indices(:);
vars = tData.Properties.VariableNames;
vars(ismember(vars,{'MeasID','group_idx'})) = [];
tDataReduced = grpstats(tData,'group_idx','mean','DataVars',vars)
tDataReduced = 15271x15 table
group_idx GroupCount mean_Time mean_Var3 mean_Var4 mean_Var5 mean_Var6 mean_Var7 mean_Var8 mean_Var9 mean_Var10 mean_Var11 mean_Var12 mean_Var13 mean_Var14 _________ __________ ____________________ _________ _________ _________ _________ _________ _________ _________ __________ __________ __________ __________ __________ 1 1 10 07-Feb-0016 05:55:29 0.51752 0.70514 0.41375 0.5672 0.41946 0.45606 0.55374 0.59743 0.62398 0.46139 0.46649 0.50105 2 2 10 13-Feb-0015 18:09:59 0.63774 0.70607 0.3648 0.77554 0.34877 0.55199 0.49512 0.64596 0.47208 0.52363 0.54743 0.42753 3 3 10 11-May-0013 06:47:28 0.4609 0.52306 0.40594 0.5143 0.51193 0.36895 0.65375 0.51091 0.44099 0.43514 0.5454 0.41105 4 4 10 05-Oct-0016 04:06:59 0.50106 0.48854 0.38952 0.5806 0.55353 0.52183 0.46437 0.449 0.35994 0.48053 0.50099 0.56911 5 5 10 02-Jul-0014 12:40:20 0.42546 0.47703 0.60594 0.48473 0.49456 0.52169 0.45005 0.50737 0.59695 0.49354 0.53942 0.51315 6 6 10 24-Jun-0015 05:20:08 0.54365 0.42454 0.50526 0.3918 0.54657 0.34156 0.50776 0.64644 0.41142 0.52169 0.37793 0.54506 7 7 10 26-May-0018 12:10:02 0.223 0.62658 0.57745 0.43137 0.64486 0.59084 0.27219 0.47831 0.4504 0.52788 0.4444 0.61095 8 8 10 26-Sep-0015 06:05:38 0.36107 0.45844 0.43259 0.44789 0.53122 0.51577 0.43466 0.44741 0.30144 0.48877 0.44208 0.53721 9 9 10 20-Jul-0014 02:30:31 0.52693 0.52128 0.62287 0.56108 0.55619 0.40734 0.56754 0.45903 0.51274 0.53433 0.40313 0.53245 10 10 10 11-Apr-0013 12:28:32 0.55496 0.37775 0.49104 0.64185 0.583 0.50644 0.59117 0.57325 0.58854 0.48762 0.53708 0.50528 11 11 10 14-Dec-0015 20:58:18 0.47345 0.56358 0.50469 0.42712 0.58192 0.44643 0.51888 0.55781 0.35132 0.53893 0.49807 0.61024 12 12 10 30-Oct-0018 05:14:32 0.67559 0.48449 0.42596 0.46722 0.57329 0.58917 0.52009 0.4463 0.46092 0.36197 0.61369 0.59372 13 13 10 16-Oct-0018 01:22:52 0.3809 0.47299 0.43709 0.60162 0.40392 0.67956 0.4347 0.37734 0.33699 0.4328 0.42047 0.29406 14 14 10 12-Apr-0019 05:11:28 0.44723 0.49497 0.48745 0.65644 0.5099 0.52495 0.58134 0.47569 0.41992 0.39552 0.56101 0.49381 15 15 10 18-Mar-0013 10:42:37 0.45406 0.57539 0.54544 0.45573 0.6396 0.59733 0.57942 0.46537 0.51966 0.34679 0.54309 0.41337 16 16 10 24-Mar-0022 02:02:55 0.48386 0.29842 0.38997 0.69434 0.35066 0.49995 0.55025 0.47631 0.65821 0.46389 0.43511 0.71093
  4 Comments
Camilo Cárdenas
Camilo Cárdenas on 18 Jul 2024
Moved: Voss on 18 Jul 2024
Again: Thank you very much!
Here with I close the question!
Regards, Camilo
Camilo Cárdenas
Camilo Cárdenas on 14 Sep 2024
hello I would like to add an additional comment to my original question.
I have found that with the “movmean” function (I am using Matlab version 2021b) it does in one line and much faster exactly what I want. Maybe it will be of help to anyone who goes through this question.

Sign in to comment.

More Answers (1)

Camilo Cárdenas
Camilo Cárdenas on 19 Jul 2024
Hi Voss,
I realized that there is now an error if the variable “Time” is not excluded.
Additionally, taking into account that the variable “MeasID” groups the data by measurement, and if the amount of data of a measurement is not a multiple of the N value used to reduce the table (10 in this case), then data from two different measurements would be combined. That's why I share here the solution that I applied, sure there is a faster way, but now it is working.
%% Reduce el numero de filas agrupandolas en grupos de 10
disp('.... comenzando')
N = 10;
indices = repelem(1:ceil(height(tData)/N), N, 1);
indices = indices(1:height(tData));
tData.grp_idx = indices(:);
vars = tData.Properties.VariableNames;
vars(ismember(vars,{'MeasID', 'Time', 'grp_idx'})) = []; %segun el forum si se quita Time de aqui tambien se promediaria pero no funciona
tReduced = tData(1:0, :); %tabla vacia con mismas columnas de tData para almacenar los datos ya agrupados
IDs = unique(tData.MeasID);
for ii = 1 : length(IDs)
ID_value = char(IDs(ii));
tTemp = tData(strcmp(tData.MeasID, ID_value), :); %tabla del ii-esimo MeasID
for kk = tTemp.grp_idx(1) : tTemp.grp_idx(end) % kk representa el numero de grupo que se esta trabajando
tTempGrp = tTemp(tTemp.grp_idx == kk, :); % tabla temporal agrupada (de acuerdo del mismo indice grp_idx)
tTempRed = grpstats(tTempGrp,'grp_idx','mean','DataVars',vars); %reduccion de la tabla temporal agrupada (promedio del grupo)
tTempRed.Properties.RowNames = {};
tTempRed.mean_MeasID = tTempGrp.MeasID(1);
tTempRed.mean_Time = mean(tTempGrp.Time);
newColumnNames = replace(tTempRed.Properties.VariableNames, 'mean_', '');
tTempRed.Properties.VariableNames = newColumnNames;
tTempRed = removevars(tTempRed, 'GroupCount');
tTempRed = movevars(tTempRed, {'MeasID', 'Time'}, 'Before', 1);
tReduced = vertcat(tReduced, tTempRed);
end
end
disp('.... ###########')
%%
Saludos!

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!