MATLAB save file and data corruption
Hello all,
I am having a data corruption issue in MATLAB. The issue is as follows:
I am running a function from the command line. The function takes a path to a directory containing a number of binary data files, loops over each file, and 'maps' it (these are datagram binary files, so I am recording packet locations for quicker reading later). The map is a structure array, f_map, which is saved to a .mat file. In addition to the map file, an index file is created that keeps track of mapping success in map_index. The code is shown below.
I keep having to add spaghetti-code checks so the function does not fail or corrupt the data files. Originally I didn't save the index, but when saving f_map failed I had to re-create map_index, so I switched to saving the index each time. Then I ran into save corrupting map_index. So I switched to a matfile object, which is better suited for this, and when that also ran into corruption issues I decided to ask for help.
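For readers unfamiliar with the matfile approach mentioned above, here is a minimal sketch of updating a single element of a struct array stored in a v7.3 MAT-file. The file name, the two fields, and the array size are placeholders rather than the real index, and the file is written to the local temp directory instead of a network drive.

```matlab
% Sketch: update one element of a struct array on disk through matfile.
% demo_file and the two fields are illustrative placeholders.
demo_file = [tempname, '.mat'];

% Preallocate a 1-by-3 struct array and save it as v7.3, the format
% matfile needs for partial reads and writes.
idx = struct('raw_file', cell(1,3), 'mapped', cell(1,3));
save(demo_file, 'idx', '-v7.3');

% Open the file writable and replace element 2 only.
m = matfile(demo_file, 'Writable', true);
entry = struct('raw_file', 'file_0002', 'mapped', 1);
m.idx(1,2) = entry;

% Read the element back to confirm the write landed.
check = m.idx(1,2);
fprintf('entry 2: %s, mapped = %d\n', check.raw_file, check.mapped);

% Release the matfile object and remove the demo file.
clear m
delete(demo_file);
```

Each assignment through the matfile object opens, writes, and closes the file, which is the step that reports the "Error closing file" message in the code below.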
Additional information:
Dataset and map file directories are network drives. I access them through a mapped drive path, as our IT suggested back in the day.
Files are ~2.15GB each.
Map files are saved as .mat v7.3
My machine has 32GB of ram, a 1TB disk and a 1Gbps ethernet connection.
I am doing this work remotely.
The program, if it didn't crash, would likely still take two full days to map.
Anyone have any ideas or improvements to mitigate this situation?
function map_index = map_all_SCORE_datafiles(data_dir_path,map_dir_path)
% Function generates a packet map of each data file in data_dir_path. Maps
% for each file are saved in the map_dir_path as .mat files. An index of
% map files stores the map file information and mapping success.
%
% USAGE:
% map_all_SCORE_datafiles(data_dir_path,map_dir_path);
%
% map_index = map_all_SCORE_datafiles(data_dir_path,map_dir_path);
%
%
% INPUTS:
% data_dir_path - string containing the directory path to
% the binary datafiles
%
% map_dir_path - string containing the directory path to
% save the index files.
%
%
% OUTPUTS:
% map_index - a structure array containing information
% on the success of the function
% execution
%
%
%% Parameters
fname_map_index = 'index_of_map_files';
save_ext = '.mat';
%% Check for directory existence
% Check for data directory
if ~exist(data_dir_path,'dir')
error('data directory not found')
end
% Check for map directory
if ~exist(map_dir_path,'dir')
error('map file directory not found');
end
%% Get the list of all files in the data
d_dir = dir(fullfile(data_dir_path,'**\**'));
d_dir = d_dir(~[d_dir.isdir]);
num_files = numel(d_dir);
% sort directory by time (using the seconds in the file name)
temp = cellfun(@(x) x(6:end),{d_dir(:).name}','UniformOutput',false);
temp = cellfun(@str2num,temp);
[~,inx] = sort(temp);
d_dir = d_dir(inx);
%% Preallocate the mapfile_index structure array
%check for an index file
file_index = fullfile(map_dir_path,[fname_map_index,save_ext]);
if exist(file_index,'file')
mp_io_inx = matfile(file_index,'Writable',true);
[~,sz_index] = size(mp_io_inx.map_index);
if sz_index ~= num_files
% this index is not the same size as the dataset, idk why it would
% be that way but it means that we need to recreate the index file
delete(mp_io_inx); clear mp_io_inx; % clear should work fine but sometimes the matfile object keeps control of the object....
mk_index = true;
else
mk_index = false;
end
else
% index not found, create it
mk_index = true;
end
if mk_index == true
map_index.raw_file = [];
map_index.raw_folder = [];
map_index.mapped = [];
map_index.file_size = [];
map_index.packet_count = [];
map_index.start_time = [];
map_index.end_time = [];
map_index.map_folder = [];
map_index(num_files).map_file_name = [];
save(file_index,'map_index');
mp_io_inx = matfile(file_index,'Writable',true);
end
%% Get list of existing map files in map directory
mp_dir = dir(map_dir_path);
mp_dir = mp_dir(~[mp_dir.isdir]);
mp_dir_fname = {mp_dir(:).name};
%% Loop over each file in the data directory and map if necessary.
fprintf(1,'Mapping dataset at : %s \n\n',data_dir_path)
t_func_start = tic;
t_iter_av = 0;
for ii = 1:num_files
t_iter_start = tic;
%% update command window
fprintf(1,'working on file #%d of %d \n',ii,num_files);
%% manage file names
raw_file_name = fullfile(d_dir(ii).folder, d_dir(ii).name);
map_file_name = ['map_',d_dir(ii).name,save_ext];
map_file_path = fullfile(map_dir_path,map_file_name);
fprintf(1,'full file name: %s \n',raw_file_name);
%% Check if already mapped
[map_check] = check_map_file(mp_dir_fname,map_file_path,mp_io_inx.map_index(1,ii));
if map_check == 0
% Map not present, or mapfile data does not match map index data
%% map data file
[f_map, map_stats,map_success] = map_SCORE_datafile(raw_file_name,1);
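%% update stats
% Start from a blank entry with every index field, so values from a
% previous iteration cannot leak into this one when mapping fails.
t_map_index = struct('raw_file',d_dir(ii).name,'raw_folder',d_dir(ii).folder, ...
'mapped',0,'file_size',[],'packet_count',[],'start_time',[],'end_time',[], ...
'map_folder',[],'map_file_name',[]);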
if map_success
t_map_index.mapped = 1;
t_map_index.file_size = map_stats.file_size;
t_map_index.packet_count = map_stats.packet_count;
t_map_index.start_time = map_stats.start_time;
t_map_index.end_time = map_stats.end_time;
t_map_index.map_folder = map_dir_path;
t_map_index.map_file_name = map_file_name;
fprintf(1,'\tfile size : %0.2fGB \n',map_stats.file_size/1e9);
fprintf(1,'\tpacket count: %d \n',map_stats.packet_count);
else
t_map_index.mapped = 0;
fprintf(1,'\tmapping operation failed!\n');
end
%% save data map - The try/catch block is because of save() failing to close the file, corrupting it.
if map_success
fprintf(1,'\tsaving map file : %s \n',map_file_path);
try
save(map_file_path,'f_map','map_stats');
catch
% it failed to save. try again
fprintf(1,'\t cannot save map file: trying again \n');
pause(1)
try
delete(map_file_path); % function form also works for paths containing spaces
save(map_file_path,'f_map','map_stats');
catch
% it failed a second time, skip it.
fprintf(1,'\t saving failed for second time. skip file \n');
delete(map_file_path); % function form also works for paths containing spaces
t_map_index.mapped = 0;
t_map_index.map_folder = [];
t_map_index.map_file_name = [];
end
end
end
%% Update index info
mp_io_inx.map_index(1,ii) = t_map_index; % ERRORS:Error closing file ...\data_file_maps\index_of_map_files.mat. The file may be corrupt.
elseif map_check == 2
% Update the map index instead of mapping file. This only happens because the index gets corrupted by the index write above.
load(map_file_path,'map_stats');
temp_index = mp_io_inx.map_index(1,ii);
temp_index.raw_file = d_dir(ii).name;
temp_index.raw_folder = d_dir(ii).folder;
temp_index.mapped = 1;
temp_index.file_size = map_stats.file_size;
temp_index.packet_count = map_stats.packet_count;
temp_index.start_time = map_stats.start_time;
temp_index.end_time = map_stats.end_time;
temp_index.map_folder = map_dir_path;
temp_index.map_file_name = map_file_name;
mp_io_inx.map_index(1,ii) = temp_index; % ERRORS:Error closing file ...\data_file_maps\index_of_map_files.mat. The file may be corrupt.
fprintf(1,'\tfile size : %0.2fGB \n',map_stats.file_size/1e9);
fprintf(1,'\tpacket count: %d \n',map_stats.packet_count);
end
%% update window
t_iter_end = toc(t_iter_start); %seconds
t_iter_av = t_iter_av + (t_iter_end-t_iter_av)/ii; %seconds
t_iter_rem = (t_iter_av*(num_files-ii))/60; % minutes
fprintf(1,'mapping elapsed time : %0.2f \n',t_iter_end);
fprintf(1,'estimated time remaining: %0.2f min (%0.2f hr) \n',...
t_iter_rem, t_iter_rem/60);
fprintf(1,' \n\n');
end
t_func_end = toc(t_func_start); % seconds
fprintf(1,'process finished!!!\n');
fprintf(1,'elapsed time : %0.2f hr\n',t_func_end/3600);
map_index = mp_io_inx.map_index; % Pull the map index back in for user inspection at end of run
t_files_read = sum([map_index(:).mapped]);
fprintf(1,'total files mapped: %d of %d \n\n',t_files_read,num_files);
end
%% Helper function
function [map_check] = check_map_file(mp_dir_fname,map_file_path,map_index)
% Helper function. Checks if map_file already exists and if the data is
% up-to-date
[~,map_file_name,ext] = fileparts(map_file_path);
%% Check if we have mapped this file already
if any(strcmp(mp_dir_fname,[map_file_name,ext]))
% mapfile is present -> check data consistency
% Check mapfile can be opened and has the map stats.
warning('off','MATLAB:whos:UnableToRead'); % suppress the warning % this is because
mp_io = matfile(map_file_path);
try
map_stats = mp_io.map_stats;
catch
map_stats = [];
end
warning('on','MATLAB:whos:UnableToRead');
delete(mp_io); clear mp_io;
if isstruct(map_stats)
% mapfile can be opened -> check stats match
temp = struct2cell(map_index);
temp = temp([1,4:7]);
temp2 = struct2cell(map_stats);
if all(cellfun(@isempty,temp))
% This is an empty map entry, just fill the entry. I hate
% matlab at this point
fprintf(1,'\tMap check: file already mapped, map_index out-of-date -> update index, Continue\n');
map_check = 2;
elseif isequal(temp,temp2)
% map stats match the map index info -> return true
fprintf(1,'\tMap check: file already mapped, map_index is up-to-date -> Continue\n');
map_check = 1;
else
% map stats do not match the index info -> return false
fprintf(1,'\tMap check: map_index is out of date -> re-map raw file.\n');
map_check = 0;
end
else
% mapfile cannot be opened -> return false
fprintf(1,'\tMap check: cannot open map_file -> re-map raw file.\n');
map_check = 0;
end
else
% File is not present -> map
fprintf(1,'\tMap check: file not yet mapped.\n');
map_check = 0;
end
end
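A possible mitigation for the failed saves onto the network drive is to write each MAT-file to the local disk first, check that it is readable, and only then move it to the mapped drive. The helper below is a sketch under that assumption. save_staged is a hypothetical name and is not part of the code above, and whether staging avoids the "Error closing file" failure on this particular network is untested.

```matlab
function ok = save_staged(dest_path, vars)
% Sketch: save the fields of struct VARS to a local temp file, check that
% the file is readable, then move it to DEST_PATH (e.g. a network drive).
% save_staged is a hypothetical helper, not part of the original code.
ok = false;
local_path = [tempname, '.mat'];            % file on the local disk
try
    % Each field of vars becomes one variable in the MAT-file
    save(local_path, '-struct', 'vars', '-v7.3');
    % whos lists the variables it can read back from the file
    info = whos('-file', local_path);
    if ~all(ismember(fieldnames(vars), {info.name}))
        error('save_staged:incomplete', 'Saved file is missing variables.');
    end
    % One file-level transfer to the destination instead of many small writes
    [ok, msg] = movefile(local_path, dest_path, 'f');
    if ~ok
        warning('save_staged:moveFailed', '%s', msg);
    end
catch err
    warning('save_staged:saveFailed', '%s', err.message);
end
if exist(local_path, 'file')
    delete(local_path);                     % clean up if the move did not happen
end
end
```

In the loop above, the call would replace the nested try/catch around save, for example by setting vars.f_map = f_map; and vars.map_stats = map_stats; and then calling ok = save_staged(map_file_path, vars); with t_map_index.mapped set to 0 when ok is false.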