Comparing a million values from CSV files takes too much time

Hello everyone,
I am quite new to MATLAB and need some help with this problem. I want to compare 1 million numbers to make sure that no two adjacent numbers are equal (n-1 ~= n). I wrote the code below and measured it with tic/toc: the elapsed time was 40944.541765 seconds, and that is for just one CSV file. Eventually I want to run the code on every CSV file in the folder, but that is more complicated, so for now I am focusing on the calculation for one CSV file. How can I optimize this piece of code and make the calculation more accurate? Thank you.
data = csvread('data.csv',9); % Read the CSV file
a = zeros(1,999999); % Preallocate the result array
for i=1:999998
    t = data(i) ~= data(i+1); % True if element i differs from element i+1
    a(i) = t; % Save t in the array
    v=sum(a(:)==0); % Count the zeros (false entries) in the array
end
csvwrite('count.csv',v); % Write the count to a new CSV file

Accepted Answer

Bhaskar R on 17 Sep 2022
I assume you want to count how many values differ from the value that follows them.
This can be done without loops; this may help:
tic
data = randi(100, [1, 999999]); % random data with the same length as yours
v = sum(diff(data) ~= 0);
toc
Elapsed time is 0.022935 seconds.
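To see what diff does here, a minimal sketch on a small made-up vector may help. The values in x are hypothetical and chosen only for illustration; diff(x) returns the difference between each element and the one before it, so a zero marks a repeated neighbour.

```matlab
% Hypothetical 6-element vector, for illustration only
x = [5 5 7 7 7 2];

d = diff(x);                % d = [0 2 0 0 -5], one entry per adjacent pair
nChanged  = sum(d ~= 0);    % adjacent pairs where the value changes
nRepeated = sum(d == 0);    % adjacent pairs where the value repeats

fprintf('%d changed, %d repeated\n', nChanged, nRepeated);
```

Use ~= 0 to count changes and == 0 to count repeats; the two counts always add up to numel(x) - 1.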
  1 Comment
Rizky Alfi on 17 Sep 2022
Thank you, sir. I had first done the calculation in Microsoft Excel, using =a2<>a1 in column B and =COUNTIF(B1:B1000000;"false"), to make sure the MATLAB output is correct. Your answer is insightful. I tried it, and the only adjustment I needed was to change
v = sum(diff(data) ~= 0);
to
v = sum(diff(data8a) == 0);
to get the same result as Microsoft Excel, since the COUNTIF formula counts the pairs that are equal. I will accept your answer. Thank you.
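For the original goal of running this on every CSV file in a folder, one possible approach is to loop over the output of dir. This is only a sketch under stated assumptions: the temporary folder, the file names a.csv and b.csv, and the headerRows variable are all hypothetical and exist just so the example runs on its own. For the files in the question, headerRows would be 9 and folder would point to the real data.

```matlab
% Hypothetical setup: a temporary folder with two tiny CSV files,
% so the sketch is self-contained. Replace with your real folder.
folder = tempname;
mkdir(folder);
csvwrite(fullfile(folder, 'a.csv'), [1; 1; 2; 2; 3]);
csvwrite(fullfile(folder, 'b.csv'), [4; 5; 6]);
headerRows = 0;   % assumption for the demo files; the question skips 9 rows

% List every CSV file in the folder and count repeated neighbours in each
files  = dir(fullfile(folder, '*.csv'));
counts = zeros(numel(files), 1);
for k = 1:numel(files)
    data = csvread(fullfile(folder, files(k).name), headerRows, 0);
    counts(k) = sum(diff(data(:)) == 0);   % adjacent pairs that are equal
    fprintf('%s: %d repeated neighbours\n', files(k).name, counts(k));
end
```

The loop here runs once per file, not once per number, so the vectorized diff still does the heavy work. Each count could then be written out with csvwrite, as in the question.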
