Count the Number of Times a Specific String Occurs in a given Column

I have the following code that is meant to find the best time offset between TVATime and TVBTime. Each time stamp acts as part of a "block" that must overlap at some point with another "block" in the correct order of blocks. Basically, it acts like a comb in hair. For example, TVBTime is split into two subvectors, TVBTimeStart and TVBTimeEnd [not shown], that make up a block. Each TVBTime block also has a string identifier, either "IDA" or "IDB", where IDA greatly outnumbers IDB [roughly 1 IDB to every 330 IDA].
The current problem is that tmpVecD, the stuck-in-the-middle case, can produce a column in which every string is "IDA". That outcome is not possible given the nature of the data, so it is a false offset. Instead, a valid match must contain as many "IDB" strings as possible, although there can still be more IDA strings than IDB strings. Is there a better way to write this code, or a way to fix the existing code below, specifically the numel(toothType(:,toothColumn)=='IDA')/TVAsize==1 line?
TVAsize=numel(TVATime);
TVBsize=numel(TVBTime);
Offsets=UserDefinedStart:.001:UserDefinedEnd;
for b = Offsets
tic;
TestVectorA = TVAStartTime+b; %Start (typical value is TVATime+.003[user defined: missfixer/2])
TestVectorB = TVAFinishTime+b; %Finish(typical value is TVATime-.003[user defined: missfixer/2])
TestVectorC = TVATime+b; %Center
for i = 1:TVBsize
tmpVecA=logical.empty; %Memory Management
tmpVecB=logical.empty;
tmpVecC=logical.empty;
tmpVecD=logical.empty;
tmpTypeVec=strings;
tmpVecA=(TestVectorA >= TVBTimeStart(i)) & (TestVectorA <= TVBTimeEnd(i));%StartVector
tmpVecB=(TestVectorB >= TVBTimeStart(i)) & (TestVectorB <= TVBTimeEnd(i));%FinishVector
tmpVecC=(TestVectorC >= TVBTimeStart(i)) & (TestVectorC <= TVBTimeEnd(i));%CenterVector
tmpVecD=(TestVectorA <= TVBTimeStart(i)) & (TestVectorB >= TVBTimeEnd(i)) & (round(abs(TestVectorA-TestVectorC),6)==round(missFixer/2,6));%Stuck-in-Middle Case 1
tmpTypeVec=transmissionType(i);
tmpVec=(tmpVecA|tmpVecB|tmpVecC|tmpVecD);
if ( any(find(tmpVec==1)) )
%see if a value of TVA falls within TVB and its life (or vice versa)
%if it does:
%1) add a tick to be used as a percentage later
%2) add the corresponding TVB identifier (either "IDA" or "IDB")
toothCount(tmpVec==1, toothColumn) = toothCount(tmpVec == 1, toothColumn) + 1;
toothType(tmpVec==1,toothColumn)=toothType(tmpVec==1,toothColumn)+tmpTypeVec;
end
end
%The following line is where the code fails
%(always seems to go to this if, even if the column not 100% filled with "IDA"):
if(numel(toothType(:,toothColumn)=='IDA')/TVAsize==1)
%"IDA" count must be less than 100% in any given column.
%If it is equal to 100%, do the following:
disp(['Bad Match']);
quality(toothColumn)='bad';
%Reset toothCount for that column to 0 since it is providing an impossible match.
else
quality(toothColumn)='good';
end;
tick=(numel(find(toothCount(:,toothColumn)~=0)));
disp(['ticks = ', num2str(tick)]);
tickcount(end+1) = tick;
%tickcount = [tickcount; tick];
percentCalc = tick/CPsize*100.0;
%calculate the percentage
disp(['percent = ', num2str(percentCalc)]);
offset(end+1)= b; %adds an additional element of the offset "b" to the growing vector of "offset" to be used for later comparison
%offset = [offset; b] % Legacy version, column vector format
percent(end+1)= percentCalc; %does same thing as previous line.
%percent = [percent; percentCalc] %Legacy version, column vector format
percentCalc = 0; %reset percentCalc
disp(['percent reset = ', num2str(percentCalc)]);
toothColumn=toothColumn+1;
toc;
end
%Find max Column with maximum "IDB" Count
IDBPerCol=sum((toothType=='IDB'),1);
maxIDBIndex=find(max(IDBPerCol));
%show the value closest to true offset
[bestPercent] = percent(maxIDBIndex);
bestOffset = offset(maxIDBIndex);
bestTick = tickcount(maxIDBIndex);
Here's a small sample data set:
  • TVA=[1.002; 1.017; 32.006; 32.027; 33.100; 60.003; 60.028; 60.051]; %significantly different size than TVB
  • TVBStart=1:.0157:75;
  • TVBEnd=1.000256:.0157:75.000256; %Same size as TVBStart
  • TVBID=???; %Can be randomly generated; Must be where IDA is the primary, and IDB is sporadic; same size as TVBStart and TVBEnd;
  • missFixer=.006; %not included is the code that divides missFixer by 2 and uses it to create TVAStartTime and TVAEndTime
  • UserDefinedStart=-10; %or user defined value, works in seconds
  • UserDefinedEnd=10; %or user defined value, works in seconds
Let me know if you need any additional details or have questions. Without tmpVecA, tmpVecB, tmpVecD (which implement the "block" ability for TVATime) and without the string comparison implementation, the code runs fine and returns the expected offsets (let me know if you need the code for this). The problem is when I give TVATime a width like TVB, but I need this width for TVA for closer investigation of the data.
The output needs to be the best offset and best percent, where the best offset is the one at which the "IDB" string count in a given column of toothType is at its highest and tickcount is at its highest. In other words, all elements in TVA have a match in TVB (maximum tick count), and that same column in toothType has the highest possible count of "IDB" strings.
I'll understand if this is extremely hard for anyone to grasp.
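One way to read the selection rule described above, as a minimal sketch only: keep the offsets with the maximum tick count, then pick the one with the most "IDB" strings among them. The sample values are made up, and the priority order (ticks first, then IDB count) is an assumption rather than something stated in the question.

```matlab
% Hypothetical per-offset results (one entry per candidate offset).
offset    = [-0.002 -0.001 0 0.001];
tickcount = [6 8 8 7];      % matched TVA elements per offset
IDBPerCol = [1 0 3 2];      % "IDB" strings per toothType column

% Assumption: tick count takes priority, IDB count breaks ties.
candidates = find(tickcount == max(tickcount));  % offsets with the most ticks
[~, k]     = max(IDBPerCol(candidates));         % most "IDB" among those
bestIdx    = candidates(k);
bestOffset = offset(bestIdx);
bestTick   = tickcount(bestIdx);
```

If the IDB count should take priority instead, the two steps would be swapped.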
  2 Comments
Midimistro on 5 Jan 2017
Edited: Midimistro on 5 Jan 2017
because TVBTime is the struct that consists of TVBStart and TVBEnd. As mentioned before, TVATime and TVBTime are nothing more than "blocks" that consist of 2 arrays of the same size that contain a starting time and an end time for the respective test vector.


Accepted Answer

Greg on 24 Dec 2016
Edited: Greg on 24 Dec 2016
Replace "numel" with "sum" (or "nnz" if you like...)
numel(toothType(:,toothColumn)=='IDA') --> sum(toothType(:,toothColumn)=='IDA')
Also, I recommend using strcmp instead of ==, but that's not part of the original question.
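The difference can be seen in a small sketch. numel returns the number of elements in the logical array no matter how many of them are true, which would explain why the original condition always fired. The column below is hypothetical, and it uses a cell array of char vectors with strcmp rather than the string array from the question.

```matlab
% Hypothetical column of identifiers (cell array of char vectors).
col = {'IDA'; 'IDA'; 'IDB'; 'IDA'};

isIDA = strcmp(col, 'IDA');    % logical column, true where the entry is 'IDA'

nElems    = numel(isIDA);      % size of the mask, not the number of matches
nMatches  = sum(isIDA);        % number of true entries
nMatches2 = nnz(isIDA);        % same count via nnz

% The column is "100% IDA" only if every entry matched.
allIDA = (nMatches == numel(col));
```

Here numel gives 4 while sum and nnz give 3, so only the latter two detect that one entry is not 'IDA'.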
  2 Comments
Greg on 24 Dec 2016
I further recommend comparing the 2 sizes directly, rather than comparing the quotient to 1. I.e., sum(...) == TVAsize
Midimistro on 5 Jan 2017
Your answer is correct; however, both of us missed the following additional correction:
Original:
maxIDBIndex=find(max(IDBPerCol));
Fix:
maxIDBIndex=(max(IDBPerCol)==IDBPerCol);
The original passed the max value itself to find, which does not return the index I needed. The fix finds the indexes (locations) where the max value occurs. Now the code works flawlessly. Thank you! I didn't expect anyone to solve or understand even half of what I was trying to accomplish... I give you credit for that :)
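A short sketch of the difference, using made-up counts. It also shows the second output of max as another option, which is not from the thread and returns only the first maximum.

```matlab
IDBPerCol = [2 5 1 5];            % hypothetical per-column "IDB" counts

% max(...) returns the scalar maximum, so find sees a single nonzero
% value and reports position 1, whatever the data looks like.
wrongIdx = find(max(IDBPerCol));

% Logical mask that is true at every column holding the maximum.
maskIdx = (max(IDBPerCol) == IDBPerCol);
allIdx  = find(maskIdx);          % numeric positions of all ties

% Alternative: second output of max gives the first maximum only.
[~, firstIdx] = max(IDBPerCol);
```

With ties, the logical mask selects several columns, so indexing offset or percent with it returns more than one value.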
