Count the Number of Times a Specific String Occurs in a given Column

Question

Midimistro on 23 Dec 2016

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/318028-count-the-number-of-times-a-specific-string-occurs-in-a-given-column

Commented: Midimistro on 5 Jan 2017

I have the following code that is meant to find the best time offset between TVATime and TVBTime. Each time stamp acts as part of a "block" that must overlap at some point with another "block", with the blocks in the correct order. Basically, it acts like a comb in hair. For example, TVBTime is made up of two subvectors, TVBTimeStart and TVBTimeEnd [not shown], which together define a block. Each TVBTime block also has a string identifier, either "IDA" or "IDB", where IDA greatly outnumbers IDB [about 1 IDB to every 330 IDA].

The current problem with the code is that tmpVecD, the stuck-in-the-middle case, produces a condition where every string in a column is "IDA". That condition is not possible given the nature of the data, so it indicates a false offset. Instead, the column must contain as many "IDB" strings as possible, although there can still be more IDA strings than IDB strings. Is there a better way to write this code, or is there a way to fix the existing code below, specifically the if(numel(toothType(:,toothColumn)=='IDA')/TVAsize==1) line?

TVAsize=numel(TVATime);
TVBsize=numel(TVBTime);
Offsets=UserDefinedStart:.001:UserDefinedEnd;
for b = Offsets
        tic;
        TestVectorA = TVAStartTime+b;   %Start (typical value is TVATime+.003[user defined: missFixer/2])
        TestVectorB = TVAFinishTime+b;  %Finish(typical value is TVATime-.003[user defined: missFixer/2])
        TestVectorC = TVATime+b;        %Center
        for i = 1:TVBsize
            tmpVecA=logical.empty;       %Memory Management
            tmpVecB=logical.empty;
            tmpVecC=logical.empty;
            tmpVecD=logical.empty;
            tmpTypeVec=strings;
            tmpVecA=(TestVectorA >= TVBTimeStart(i)) & (TestVectorA <= TVBTimeEnd(i));%StartVector
            tmpVecB=(TestVectorB >= TVBTimeStart(i)) & (TestVectorB <= TVBTimeEnd(i));%FinishVector
            tmpVecC=(TestVectorC >= TVBTimeStart(i)) & (TestVectorC <= TVBTimeEnd(i));%CenterVector
            tmpVecD=(TestVectorA <= TVBTimeStart(i)) & (TestVectorB >= TVBTimeEnd(i)) & (round(abs(TestVectorA-TestVectorC),6)==round(missFixer/2,6));%Stuck-in-Middle Case 1
            tmpTypeVec=transmissionType(i); 
            tmpVec=(tmpVecA|tmpVecB|tmpVecC|tmpVecD);
            if any(tmpVec)
            %see if a value of TVA falls within TVB and its life (or vice versa)
            %if it does:
            %1) add a tick to be used as a percentage later
            %2) add the corresponding TVB identifier (either "IDA" or "IDB")
                toothCount(tmpVec==1, toothColumn) = toothCount(tmpVec == 1, toothColumn) + 1;
                toothType(tmpVec==1,toothColumn)=toothType(tmpVec==1,toothColumn)+tmpTypeVec;
            end
        end
        %The following line is where the code fails
        %(always seems to go to this if, even if the column not 100% filled with "IDA"):
        if(numel(toothType(:,toothColumn)=='IDA')/TVAsize==1)
            %"IDA" count must be less than 100% in any given column.
            %If it is equal to 100%, do the following:
            disp(['Bad Match']);
            quality(toothColumn)='bad';
            %Reset toothCount for that column to 0 since it is providing an impossible match.
        else
            quality(toothColumn)='good';
        end;
        tick=(numel(find(toothCount(:,toothColumn)~=0)));
        disp(['ticks = ', num2str(tick)]);
        tickcount(end+1) = tick;
        %tickcount = [tickcount; tick];
        percentCalc = tick/CPsize*100.0;
        %calculate the percentage
        disp(['percent = ', num2str(percentCalc)]);
        offset(end+1)= b; %adds an additional element of the offset "a" to the growing vector of "offset" to be used for later comparison
        %offset = [offset; b] % Legacy version, column vector format
        percent(end+1)= percentCalc; %does same thing as previous line.
        %percent = [percent; percentCalc] %Legacy version, column vector format
        percentCalc = 0; %reset percentCalc
        disp(['percent reset = ', num2str(percentCalc)]);        
        toothColumn=toothColumn+1;
        toc;
    end
  %Find max Column with maximum "IDB" Count
  IDBPerCol=sum((toothType=='IDB'),1);
  maxIDBIndex=find(max(IDBPerCol));
  %show the value closest to true offset
  [bestPercent] = percent(maxIDBIndex);
  bestOffset = offset(maxIDBIndex);
  bestTick = tickcount(maxIDBIndex);

Here's a small sample data set:

TVA=[1.002; 1.017; 32.006; 32.027; 33.100; 60.003; 60.028; 60.051]; %significantly different size than TVB
TVBStart=1:.0157:75;
TVBEnd=1.000256:.0157:75.000256; %Same size as TVBStart
TVBID=???; %Can be randomly generated; Must be where IDA is the primary, and IDB is sporadic; same size as TVBStart and TVBEnd;
missFixer=.006; %not included is the code that divides missFixer by 2 and uses it to create TVAStartTime and TVAEndTime
UserDefinedStart=-10; %or user defined value, works in seconds
UserDefinedEnd=10; %or user defined value, works in seconds
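
Since TVBID is left open above, one possible way to generate a stand-in is sketched below. This is only an illustration under stated assumptions: the fixed seed, the helper name isIDB, and the 1/331 probability (aiming at roughly 1 IDB per 330 IDA) are not from the original post.

```matlab
% Hypothetical stand-in for TVBID; not the original data.
rng(0);                                 % fixed seed so the sketch is repeatable
TVBStart = 1:.0157:75;                  % same definition as in the sample set
n = numel(TVBStart);

TVBID = repmat(string('IDA'), 1, n);    % start with every block labeled "IDA"
isIDB = rand(1, n) < 1/331;             % assumed rate: about 1 IDB per 330 IDA
TVBID(isIDB) = 'IDB';                   % sprinkle in the sporadic "IDB" labels

fprintf('IDA: %d, IDB: %d\n', sum(TVBID == 'IDA'), sum(TVBID == 'IDB'));
```

The result is a string array the same size as TVBStart, which is what the sample set asks for.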

Let me know if you need any additional details or have questions. Without tmpVecA, tmpVecB, and tmpVecD (which implement the "block" behavior for TVATime) and without the string comparison, the code runs fine and returns the expected offsets (let me know if you need the code for this). The problem appears when I give TVATime a width like TVB, but I need this width for TVA to investigate the data more closely.
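
As a side note on the block tests, the start-inside, finish-inside, and spans-the-block cases can usually be collapsed into one closed-interval overlap test. The sketch below uses made-up block values (aStart, aEnd, bStart, bEnd are hypothetical names) and deliberately leaves out the rounding check on the half-width that tmpVecD also applies, so it is not a drop-in replacement.

```matlab
% Hypothetical blocks: three TVA-style blocks tested against one TVB-style block.
aStart = [0.9; 1.2; 3.0];
aEnd   = [1.1; 1.3; 3.5];
bStart = 1.0;
bEnd   = 2.0;

% Case-by-case tests, mirroring tmpVecA, tmpVecB and tmpVecD
% (without the rounding check on the block half-width).
startInside = (aStart >= bStart) & (aStart <= bEnd);
endInside   = (aEnd   >= bStart) & (aEnd   <= bEnd);
spansBlock  = (aStart <= bStart) & (aEnd   >= bEnd);
byCases     = startInside | endInside | spansBlock;

% Single closed-interval overlap test: two blocks overlap when each one
% starts no later than the other one ends.
overlaps = (aStart <= bEnd) & (aEnd >= bStart);

disp(isequal(byCases, overlaps));
```

This equivalence assumes every block has a start no later than its end. The center test (tmpVecC) is then redundant, because a center inside the TVB block already implies overlap.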

The output needs to be the best offset and best percent, where the best offset is the one at which the "IDB" string count in a given column of toothType is at its highest and tickcount is at its highest. In other words, all elements in TVA have an equal in TVB (maximum tick count), and that same column in toothType has the highest possible count of "IDB" strings.
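
One way to read this selection rule is sketched below, with made-up values for four candidate offsets. The order of the two criteria is an assumption: the sketch first keeps the columns with the highest tick count and then picks the one with the most "IDB" entries among them. The names candidates, best, and nTeeth are hypothetical.

```matlab
% Hypothetical results for four candidate offsets (one column per offset).
offset    = [-0.002 -0.001 0 0.001];
toothType = string({'IDA','IDB','IDA','IDA'; ...
                    'IDA','IDB','IDA','IDB'; ...
                    'IDA','IDA','IDB','IDA'});
nTeeth    = size(toothType, 1);
tickcount = [2 3 3 3];
percent   = tickcount / nTeeth * 100;

% Count "IDB" entries per column.
IDBPerCol = sum(toothType == 'IDB', 1);

% Assumed priority: highest tick count first, then most "IDB" entries.
candidates = find(tickcount == max(tickcount));
[~, k]     = max(IDBPerCol(candidates));
best       = candidates(k);

bestOffset  = offset(best);
bestPercent = percent(best);
bestTick    = tickcount(best);
```

With these values, columns 2 to 4 tie on tick count and column 2 has the most "IDB" entries, so best is 2. If several columns tie on both criteria, max returns the first of them.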

I'll understand if this is extremely hard for anyone to grasp.

2 Comments

John BG on 23 Dec 2016

And TVBTime, or a sample of it, is not supplied because ...?

Midimistro on 5 Jan 2017

Edited: Midimistro on 5 Jan 2017

because TVBTime is the struct that consists of TVBStart and TVBEnd. As mentioned before, TVATime and TVBTime are nothing more than "blocks" that consist of 2 arrays of the same size that contain a starting time and an end time for the respective test vector.

Answer 1

Greg on 24 Dec 2016

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/318028-count-the-number-of-times-a-specific-string-occurs-in-a-given-column#answer_248358

Edited: Greg on 24 Dec 2016

Replace "numel" with "sum" (or "nnz" if you like...)

numel(toothType(:,toothColumn)=='IDA') --> sum(toothType(:,toothColumn)=='IDA')

Also, I recommend using strcmp instead of ==, but that's not part of the original question.
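
To illustrate the difference, here is a small sketch on a made-up column (col is a hypothetical stand-in for toothType(:,toothColumn)). numel returns the size of the logical mask regardless of its contents, while sum and nnz count only the true entries.

```matlab
% Hypothetical column of identifiers as a string array.
col = string({'IDA'; 'IDB'; 'IDA'; 'IDA'});

mask = (col == 'IDA');       % logical column: [1; 0; 1; 1]

nElements = numel(mask);     % 4: size of the mask, matches or not
nMatches  = sum(mask);       % 3: number of true entries
nMatches2 = nnz(mask);       % 3: same count via nnz

% strcmp builds the same mask and also accepts cell arrays of char vectors.
maskCmp = strcmp(col, 'IDA');

% Comparing the count with the column length tells whether every entry is "IDA".
allIDA = (nMatches == numel(col));   % false here, since one entry is "IDB"
```

Because numel(mask) always equals the number of rows, dividing it by the column length always gives 1, which would explain why the original if branch was always taken.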

2 Comments

Greg on 24 Dec 2016

I further recommend comparing the two counts directly, rather than comparing their quotient to 1, i.e., sum(...) == TVAsize.

Midimistro on 5 Jan 2017

Your answer is correct; however, both of us missed the following additional correction:

Original:

maxIDBIndex=find(max(IDBPerCol));

Fix:

maxIDBIndex=(max(IDBPerCol)==IDBPerCol);

The original did not give me the index of the maximum, which was what I needed: max(IDBPerCol) returns a single value, so find returns 1 whenever that value is nonzero. The fix produces a logical mask marking the locations where the max value occurs. Now the code works flawlessly. Thank you! I didn't expect anyone to solve or even understand half of what I was trying to accomplish... I give you credit for that :)
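
A short sketch with made-up counts shows how the variants behave. The values in IDBPerCol are hypothetical, and the two-output form of max is shown only as an alternative for when a single index is enough.

```matlab
% Hypothetical "IDB" counts for four columns; the maximum occurs twice.
IDBPerCol = [2 5 1 5];

% max returns the scalar 5, and find of a nonzero scalar is 1,
% so this is 1 no matter where the maximum actually sits.
wrongIdx = find(max(IDBPerCol));

% Logical mask of every column that reaches the maximum.
maxMask = (IDBPerCol == max(IDBPerCol));

% Numeric indices of those columns.
allIdx = find(maxMask);

% Two-output max returns the index of the first maximum only.
[~, firstIdx] = max(IDBPerCol);
```

Here wrongIdx is 1, maxMask is [0 1 0 1], allIdx is [2 4], and firstIdx is 2. Indexing with the mask, as in percent(maxMask), returns one value per tied column.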
