
How to perform row operations on a large MATRIX or CELL efficiently?

I have a large cell array on the order of 18000x4 (or a matrix of similar size). Each cell element is a 'name' and therefore a string (in the matrix case it is a number). The operation I need to perform is explained in the following example (please find the relevant files attached). Example:
CELL INPUT
'1' '3' '4' []
'1' '3' '5' []
'1' '3' '6' []
'1' '4' '4' []
'1' '4' '5' []
'1' '4' '6' []
'2' '3' '4' []
'2' '3' '5' []
'2' '3' '6' []
'2' '4' '4' []
'2' '4' '5' []
'2' '4' '6' []
'5' '4' [] []
'5' '7' [] []
'6' '8' '4' []
'6' '8' '7' []
'9' '10' [] []
CELL OUTPUT
'1' '4' [] []
'1' '3' '5' []
'1' '3' '6' []
'2' '4' [] []
'2' '3' '5' []
'2' '3' '6' []
'4' '5' [] []
'5' '7' [] []
'4' '6' '8' []
'6' '7' '8' []
'10' '9' [] []
There are 2 main operations done to get the output:
1) Delete the duplicate entries within each row: (1 4 4) ---> (1 4) and (2 4 4) ---> (2 4) in the above example.
2) After step (1), compare the rows and delete any row that is a superset of another row: (1 3 4), (1 4 5) and (1 4 6) are deleted because they are supersets of (1 4). Similarly, (2 3 4), (2 4 5) and (2 4 6) are deleted because they are supersets of (2 4).
I have written a MATLAB function called 'minimize' that performs these operations. But when I run it on a large cell array of order 8656x4, MATLAB keeps running with no result even after 48 hours. I need to do this for even larger cell arrays, and within a time frame of 1-2 hours.
Could you please provide any method/suggestion to efficiently solve this?
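For reference, the two steps can be transcribed directly as follows (a minimal sketch in Python rather than MATLAB, since the actual 'minimize' code is not shown in this thread; rows are modeled as sets):

```python
def minimize_naive(rows):
    """Naive version of the two operations described above.

    Step 1: remove duplicate entries within each row.
    Step 2: drop every row that is a proper superset of some other row.
    The pairwise comparison in step 2 is O(m^2), which is the bottleneck.
    """
    sets_ = [frozenset(r) for r in rows]                # step 1: per-row de-duplication
    return [sorted(s) for s in sets_
            if not any(other < s for other in sets_)]   # '<' tests proper subset
```

On the 17-row example above this returns the 11 rows of the CELL OUTPUT (up to row order).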
  2 Comments
Carl on 10 Oct 2017
Edited: Carl on 10 Oct 2017
Hi Ashish. I see from your .mat files that you have analogous matrices of type 'cell' and 'double'. Just to clarify, is it only slow when performing those operations on the 'cell' data? Or do you see the same performance when operating on the 'double' type matrices as well?
Initial suggestions for improving performance would be to take a vectorized, rather than loop-based, approach to copying data. For example, change:
for k = 1:size(temp_name,2)
minimal_row(i,k) = temp_name(k);
end
to
minimal_row(i,:) = temp_name;
Or,
for j = 1:size(number_matrix,2)
if number_matrix(i,j) == 0
break;
else
temp_name(j) = number_matrix(i,j) ;
end
end
to
temp_name = number_matrix(i,:);
temp_name = temp_name(temp_name ~= 0);
MATLAB can execute this vectorized syntax far more efficiently than an element-by-element loop.
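One subtlety worth noting about the second rewrite: the original loop stops at the first zero, while the filtered version removes all zeros, so the two agree only if zeros appear solely as trailing padding. The difference can be sketched in Python (illustrative only; the thread's data layout is assumed):

```python
def trailing_nonzeros_loop(row):
    # literal transcription of the original early-exit loop
    temp_name = []
    for x in row:
        if x == 0:
            break
        temp_name.append(x)
    return temp_name

def nonzeros_filtered(row):
    # analogue of Carl's one-liner: take the whole row, then drop the zeros
    return [x for x in row if x != 0]
```

Both give [1, 4, 4] for the zero-padded row [1, 4, 4, 0], but for [1, 0, 4, 0] the loop returns [1] while the filter returns [1, 4].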
Ashish Keshkamat on 11 Oct 2017
Hello Carl, It is relatively faster when I solve using 'double' type matrices as compared to cells, but the difference is not appreciable.
Thank you for your feedback. Let me try the vectorised approach.


Accepted Answer

Ashish Keshkamat on 14 Nov 2017
Hello All,
Thank you for your time and effort in helping me solve this problem. I have figured out how to run 'minimize' more efficiently. This function is part of a bigger program I am working on, and was the bottleneck. There is a pattern in how the input to 'minimize' is generated: I track the duplicates while generating the input, and then compare only those rows and delete the supersets among them. This reduces the comparisons from thousands to hundreds, and sometimes even to tens.
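Ashish's exact bookkeeping is not shown, but one way to realize the "only compare candidate rows" idea (a Python sketch and an assumption on my part, not his actual code) is to observe that after step 1, only rows with fewer than n distinct entries can be a proper subset of another row:

```python
def minimize_candidates(rows, n=4):
    sets_ = [frozenset(r) for r in rows]
    # After de-duplication, a full-length row can never be a proper subset of
    # another row, so only the shortened rows need to be tested as subsets.
    candidates = {s for s in sets_ if len(s) < n}
    return [sorted(s) for s in sets_
            if not any(c < s for c in candidates)]   # '<' tests proper subset
```

This keeps the inner comparison loop over the (typically small) candidate set instead of over all m rows.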
Carl, Thanks for the suggestion of vectorization. I used it in many instances.
Sharath, Thank you for pointing out the flaws in my Algorithm. It helped me to think in different ways to reduce the number of computations.
With all the optimization done, I am able to get my output using input of the order 8656x4 in 20 seconds.
Best Regards, Ashish

More Answers (1)

Sharath Chandran on 11 Oct 2017
Edited: Sharath Chandran on 3 Nov 2017
Hi Ashish,
I reckon this has more to do with your algorithm.
Let us consider a matrix of dimension m x n (m rows and n columns). The time complexity of the first operation you perform is O(m * n). With m = 8656 and n = 4, this is roughly 34,624 computations, which is a reasonable number.
The second operation, on the other hand, has time complexity O(m^2 * n), which comes to at least 299,705,344 computations for m = 8656 and n = 4. This is clearly the bottleneck of your current algorithm, and it is what needs to be optimized.
Alternate approach:
1. After performing step 1, store each row in a hash map - O(m) (let us ignore the cost of the hash-map operations for the time being).
2. For every row, generate its subsets and check whether any of them is present in the hash map. This is not much overhead, as the maximum number of subsets of a 4-element row is 2^4 - 2 = 14; we subtract 2 because we skip the empty set and the set itself - O(2^n) per row.
3. If any subset is already present in the hash map, reject that row, since one of its subsets already exists.
4. Total time complexity: O(m * 2^n), which comes to roughly 138,496 computations - a drastic improvement.
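The four steps above can be sketched as follows (in Python, where the built-in set plays the role of the hash map; in MATLAB, containers.Map with a string key would be one analogue):

```python
from itertools import combinations

def minimize_hashed(rows):
    # Step 1: de-duplicate within each row and store every row in a hash set
    sets_ = [frozenset(r) for r in rows]
    present = set(sets_)                       # O(m) inserts
    kept = []
    for s in sets_:
        items = sorted(s)
        # Step 2: enumerate all proper non-empty subsets (at most 2^4 - 2 = 14
        # for a 4-element row) and look each one up in O(1) on average.
        has_subset = any(
            frozenset(c) in present
            for k in range(1, len(items))
            for c in combinations(items, k)
        )
        # Step 3: reject the row if one of its proper subsets already exists
        if not has_subset:
            kept.append(items)
    return kept                                # Step 4: total O(m * 2^n)
```

On the 17-row example from the question, this reproduces the expected output rows.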
I hope this was helpful.
Please comment below in case you have any queries.
Regards,
MathWorks Technical Support Team
Note: this is not an exact time-complexity analysis, as we have not taken into account the cost of storing data in the hash map.
  1 Comment
Ashish Keshkamat on 7 Nov 2017
Hello Sharath,
Thank you for your feedback. I understand the proposed algorithm reduces the number of computations drastically, but I am not familiar with hash maps (hash tables).
Let me see what can be done.

