READ TABLE WITH DIFFERENT SETTING INSIDE

1 view (last 30 days)
hello everyone, i have a problem with a dataset, in the same dataset, the data in the rows are writen in different way, in particular, have different space in the table (first and second screen) and matlab, read only the data written in the first way (with a lot of space from tcoluomns), someone knows the way to read both ?
with small data i'm able to load the file in excell and save, but this is bigger for excell

Accepted Answer

dpb
dpb on 21 Feb 2021
Edited: dpb on 21 Feb 2021
It's a fignewton of looking at the text in the editor or on the screen--some was tab-delimited, some space but there's only one delimiter between columns.
I shortened the first (LH) attachment to only a few lines that contained samples of both...then used detectImportOptions with some guidance and seems to work just fine--
>> opt=detectImportOptions('dataprova.txt',"ExpectedNumVariables",5,'ReadVariableNames',0,"NumHeaderLines",0);
>> prova=readtable('dataprova.txt',opt);
>> prova=table2timetable(prova(:,3:end),'RowTimes',prova.Var1+prova.Var2)
prova =
23×3 timetable
Time Var3 Var4 Var5
____________________ ________ ________ ____________________________
06-Sep-2009 04:19:31 {'M049'} {'ON' } {'R1_Bed_Toilet_Transition'}
06-Sep-2009 04:19:33 {'M049'} {'OFF' } {0×0 char }
08-Sep-2009 08:12:56 {'M025'} {'ON' } {0×0 char }
08-Sep-2009 08:12:58 {'M024'} {'ON' } {0×0 char }
08-Sep-2009 08:12:58 {'M026'} {'OFF' } {0×0 char }
08-Sep-2009 08:12:59 {'M021'} {'ON' } {0×0 char }
08-Sep-2009 08:13:00 {'M019'} {'ON' } {'R2_Meal_Preparation' }
08-Sep-2009 08:13:01 {'M021'} {'OFF' } {0×0 char }
08-Sep-2009 08:13:01 {'M025'} {'OFF' } {0×0 char }
08-Sep-2009 08:13:02 {'M018'} {'ON' } {0×0 char }
15-Sep-2009 13:57:58 {'P001'} {'1619'} {0×0 char }
15-Sep-2009 13:57:58 {'M017'} {'OFF' } {0×0 char }
15-Sep-2009 13:58:00 {'M018'} {'OFF' } {0×0 char }
15-Sep-2009 13:58:01 {'P001'} {'1633'} {0×0 char }
15-Sep-2009 13:58:11 {'M018'} {'ON' } {0×0 char }
15-Sep-2009 13:58:15 {'P001'} {'1623'} {0×0 char }
15-Sep-2009 13:58:16 {'M019'} {'OFF' } {0×0 char }
16-Sep-2009 07:10:18 {'M047'} {'OFF' } {0×0 char }
16-Sep-2009 07:10:20 {'M048'} {'ON' } {'R1_Work' }
16-Sep-2009 07:10:20 {'M046'} {'OFF' } {0×0 char }
04-Oct-2009 01:17:02 {'M029'} {'ON' } {0×0 char }
04-Oct-2009 01:17:04 {'M037'} {'ON' } {'R2_Personal_Hygiene' }
04-Oct-2009 01:17:04 {'M036'} {'OFF' } {0×0 char }
>>
I'd carry on from here by turning the Var3 (and possibly Var4 altho the mix of data there is confusing as to what will want as far as interpreting it) into categorical variable(s). Var5 might be more suited as a string than cellstr, your call. You need to provide meaningful variable names for them, of course.
You could try defining the import data format strings for the columns to read the date/time as a single datetime instead of as datetime/duration into two variables as above, an "exercise left for Student".
The file as it appeared when shortened is:
2009-09-06 04:19:31.032325 M049 ON R1_Bed_Toilet_Transition begin
2009-09-06 04:19:33.050316 M049 OFF
2009-09-08 08:12:56.088385 M025 ON
2009-09-08 08:12:58.088943 M024 ON
2009-09-08 08:12:58.091278 M026 OFF
2009-09-08 08:12:59.098453 M021 ON
2009-09-08 08:13:00.082885 M019 ON R2_Meal_Preparation begin
2009-09-08 08:13:01.091802 M021 OFF
2009-09-08 08:13:01.094138 M025 OFF
2009-09-08 08:13:02.035362 M018 ON
2009-09-15 13:57:58.081746 P001 1619
2009-09-15 13:57:58.090266 M017 OFF
2009-09-15 13:58:00.014018 M018 OFF
2009-09-15 13:58:01.082517 P001 1633
2009-09-15 13:58:11.050943 M018 ON
2009-09-15 13:58:15.082673 P001 1623
2009-09-15 13:58:16.066399 M019 OFF
2009-09-16 07:10:18.092304 M047 OFF
2009-09-16 07:10:20.048173 M048 ON R1_Work begin
2009-09-16 07:10:20.092411 M046 OFF
2009-10-04 01:17:02.070578 M029 ON
2009-10-04 01:17:04.040807 M037 ON R2_Personal_Hygiene begin
2009-10-04 01:17:04.043248 M036 OFF
>>

More Answers (2)

Emanuele De Astis
Emanuele De Astis on 21 Feb 2021
opt=detectImportOptions('Twor.txt',"ExpectedNumVariables",5,'ReadVariableNames',0,"NumHeaderLines",0)
opt.VariableNames={'Date','Sensor','Value','Activity'};
MTA=readtable('Twor.txt',opt);
MTA=MTA(contains(MTA.Activity,({'Sleep','Leave_Home'})),:); % save only the ''Sleep','Leave_Home' ' Activity for now
MTA.Sensor=categorical(MTA.Sensor); % turn sensor into categorical variable
MTA.Activity=categorical(MTA.Activity);
MTA.Value=str2double(MTA.Value); % and data into numeric
meanBySensor=rowfun(@mean,MTA,'InputVariables',{'Value'},'GroupingVariables',{'Sensor'},'OutputVariableNames','SensorMeans');
i try with your opt but don't change nothing
  12 Comments
dpb
dpb on 21 Feb 2021
I showed that
>> opt=detectImportOptions('dataprova.txt',"ExpectedNumVariables",5,'ReadVariableNames',0,"NumHeaderLines",0);
>> prova=readtable('dataprova.txt',opt);
>> prova=table2timetable(prova(:,3:end),'RowTimes',prova.Var1+prova.Var2)
prova =
23×3 timetable
Time Var3 Var4 Var5
____________________ ________ ________ ____________________________
06-Sep-2009 04:19:31 {'M049'} {'ON' } {'R1_Bed_Toilet_Transition'}
06-Sep-2009 04:19:33 {'M049'} {'OFF' } {0×0 char }
08-Sep-2009 08:12:56 {'M025'} {'ON' } {0×0 char }
08-Sep-2009 08:12:58 {'M024'} {'ON' } {0×0 char }
08-Sep-2009 08:12:58 {'M026'} {'OFF' } {0×0 char }
08-Sep-2009 08:12:59 {'M021'} {'ON' } {0×0 char }
08-Sep-2009 08:13:00 {'M019'} {'ON' } {'R2_Meal_Preparation' }
08-Sep-2009 08:13:01 {'M021'} {'OFF' } {0×0 char }
08-Sep-2009 08:13:01 {'M025'} {'OFF' } {0×0 char }
08-Sep-2009 08:13:02 {'M018'} {'ON' } {0×0 char }
15-Sep-2009 13:57:58 {'P001'} {'1619'} {0×0 char }
15-Sep-2009 13:57:58 {'M017'} {'OFF' } {0×0 char }
15-Sep-2009 13:58:00 {'M018'} {'OFF' } {0×0 char }
15-Sep-2009 13:58:01 {'P001'} {'1633'} {0×0 char }
15-Sep-2009 13:58:11 {'M018'} {'ON' } {0×0 char }
15-Sep-2009 13:58:15 {'P001'} {'1623'} {0×0 char }
15-Sep-2009 13:58:16 {'M019'} {'OFF' } {0×0 char }
16-Sep-2009 07:10:18 {'M047'} {'OFF' } {0×0 char }
16-Sep-2009 07:10:20 {'M048'} {'ON' } {'R1_Work' }
16-Sep-2009 07:10:20 {'M046'} {'OFF' } {0×0 char }
04-Oct-2009 01:17:02 {'M029'} {'ON' } {0×0 char }
04-Oct-2009 01:17:04 {'M037'} {'ON' } {'R2_Personal_Hygiene' }
04-Oct-2009 01:17:04 {'M036'} {'OFF' } {0×0 char }
>>
Continuing on from there is straightforward-enough...
prova.Properties.VariableNames={'Sensor','Value','Activity'}; % set variable names
prova.State=prova.Value; % create new "State" column
prova.Value=str2double(prova.Value); % convert Value to numeric
prova.State(isfinite(prova.Value))={''}; % set State numeric-->missing
prova.State=categorical(prova.State); % convert to categorical
prova.Sensor=categorical(prova.Sensor);
prova.Activity=categorical(prova.Activity);
prova=prova(:,[1 2 4 3]); % rearrange order for convenience
Above results in:
>> prova
prova =
23×4 timetable
Time Sensor Value State Activity
____________________ ______ _____ ___________ ________________________
06-Sep-2009 04:19:31 M049 NaN ON R1_Bed_Toilet_Transition
06-Sep-2009 04:19:33 M049 NaN OFF <undefined>
08-Sep-2009 08:12:56 M025 NaN ON <undefined>
08-Sep-2009 08:12:58 M024 NaN ON <undefined>
08-Sep-2009 08:12:58 M026 NaN OFF <undefined>
08-Sep-2009 08:12:59 M021 NaN ON <undefined>
08-Sep-2009 08:13:00 M019 NaN ON R2_Meal_Preparation
08-Sep-2009 08:13:01 M021 NaN OFF <undefined>
08-Sep-2009 08:13:01 M025 NaN OFF <undefined>
08-Sep-2009 08:13:02 M018 NaN ON <undefined>
15-Sep-2009 13:57:58 P001 1619 <undefined> <undefined>
15-Sep-2009 13:57:58 M017 NaN OFF <undefined>
15-Sep-2009 13:58:00 M018 NaN OFF <undefined>
15-Sep-2009 13:58:01 P001 1633 <undefined> <undefined>
15-Sep-2009 13:58:11 M018 NaN ON <undefined>
15-Sep-2009 13:58:15 P001 1623 <undefined> <undefined>
15-Sep-2009 13:58:16 M019 NaN OFF <undefined>
16-Sep-2009 07:10:18 M047 NaN OFF <undefined>
16-Sep-2009 07:10:20 M048 NaN ON R1_Work
16-Sep-2009 07:10:20 M046 NaN OFF <undefined>
04-Oct-2009 01:17:02 M029 NaN ON <undefined>
04-Oct-2009 01:17:04 M037 NaN ON R2_Personal_Hygiene
04-Oct-2009 01:17:04 M036 NaN OFF <undefined>
>>
As noted, you'll want to find the 'State' records that also contain the Activity comment and split those off as well; it's just more of similar kind of edit operations to find "who's who in the zoo!" of containing something other than what is supposed to be in the field.
It still would be far better to get the data written correctly to begin with; but if it can't be regenerated from some other format in which it has been save, then something like the above is the option available to you.

Sign in to comment.


Emanuele De Astis
Emanuele De Astis on 21 Feb 2021

Categories

Find more on Data Type Conversion in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!