Outerjoin: resulting table has repeated rows and has erased some data

28 views (last 30 days)
Hi everyone!
I have two tables: table 1, and table 2 (sorry it's showing up as table 11 in the pictures). They each have one column in common - it's called "a". They are of different lengths: table 1 is a 96514x2 double, and table 2 is a 95646x2 double.
I want to combine these two tables. Any times they have in common, I'd like them to share the data and put it into one row, and any times they don't have in common, I'd like the table that doesn't have that time to just be NA.
I have used the function outerjoin. however, when I use this, I get the resulting table T, which is a 459,134x3 double. As you can see from the attached file, many of these are repeats, AND it's smaller than the parent two tables, so I lose a lot of data.
I don't know why this is happening, or what I should do instead.
and then finally, table T
here is my code:
%this is me making my tables (which are originally an array, haha, sorry) into actual tables
table1=array2table(table1)
table1.Properties.VariableNames(1:2)={'a', 'b'}
table2=array2table(table2)
table2.Properties.VariableNames(1:2)={'a','c'}
%now me using the outerjoin function
T=outerjoin(table1, table2, 'MergeKeys', true)
thank you for your help!!
  2 Comments
Dyuman Joshi
Dyuman Joshi on 2 Nov 2023
"I don't know why this is happening"
Because the values are not the same, though they might appear to be so.
Can you share the data you are working with? If they are too big to attach, a sample of it would be good enough.
Use the paperclip button to attach.
Ariana
Ariana on 27 Nov 2023
Hi Dyuman,
you were right! I did have many many duplicate rows. I was able to identify them and remove them! Thank you so much for your help

Sign in to comment.

Answers (2)

Vatsal
Vatsal on 17 Nov 2023
Hi,
The outerjoin function in MATLAB performs an outer join operation on two tables based on one or more key variables. It includes all rows from both tables and fills the missing values with NaN (Not-a-Number) for the non-matching rows. However, it seems that the resulting table T is not as expected.
One possible reason for this behaviour is that the key variable "a" might contain duplicate values in both tables. When performing an outer join, the duplicate values can cause the resulting table to have repeated rows.
To address this issue, you can try the following steps:
  • Remove any duplicate rows from both tables using the unique function. This ensures that each key value appears only once in each table.
  • Perform the outer join on the cleaned tables using the outerjoin‘ function.
To learn more about the usage and syntax ofouterjoin, you may refer to the MathWorks documentation links below:
I hope this helps!
  1 Comment
Ariana
Ariana on 27 Nov 2023
Hi Vatsal,
Thank you for this! It turned out there were many duplicate rows, I just didn't know it. Removing them helped a lot!

Sign in to comment.


Steven Lord
Steven Lord on 27 Nov 2023
Rather than storing your data as tables with one variable containing serial date numbers, consider converting those serial date numbers to datetime and converting your tables to timetables using table2timetable. If you do that you could synchronize the timetables (which would allow you to change the time basis, synchronize them to times that appear in either the intersection or union of the times, etc.)

Categories

Find more on Tables in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!