Comparing the same items in two massive files
4 views (last 30 days)
Show older comments
I need to read two excel files (first column only), skip a few headers and then select whats common to both.
There are 848,000 lines so before I embark on this, I was wondering if someone could give me some pointers on how to do the compare as the files are so large. Thanks jason
2 Comments
Adam
on 9 Apr 2015
If you can't fit ~1.7 million cell results in memory at one time or it is too unwieldy for working with in Matlab just read the column in in chunks of e.g. 100,000 lines at a time and do your comparison on those.
Accepted Answer
Adam
on 9 Apr 2015
Edited: Adam
on 9 Apr 2015
I originally assumed you meant "common to both" on the same line in each file. Comparing every line with every other is a little more complex then!
Are we talking about data here or just random strings and other stuff?
If it is data you could just read the two columns of data and do an
intersect
on the results. Again, if the data is too large to load in all at once then you can load in the data in blocks again, taking all blocks of the 2nd file for each block of the first, do the same process and consolidate all the results at the end.
2 Comments
Adam
on 10 Apr 2015
Yeah, intersect won't work on cell arrays.
ismember( col1, col2 );
should give you the indices of elements in col1 that are also in col2.
As for reading from excel I'm afraid it isn't something I have ever done so I don't know any tips or tricks for that.
You should be able to run a few test programs reading in different amounts and timing how long it takes though and using any arguments to the xlsread function that might help (e.g. I don't know what the 'basic' flag does or if it speeds it up).
When reading most files indexing in certain ways is more efficient than in other ways even if you read in far more data than you want and then chuck away what you don't want after reading it in. I don't know for Excel data though.
More Answers (0)
See Also
Categories
Find more on Data Import from MATLAB in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!