Script to remove polynomial/quadratic error off CSV data

[tl;dr: read a CSV, fit a curve, subtract it from the data, and write the result back to the CSV]
Hello everyone,
for a research project I have large amounts of data coming off a profilometer. If you don't know it: this is a device that measures a surface profile (in my case, of a thin film on a piece of glass) and stores it as X/Y data in .csv form. Inherent to this data is an error caused by the curvature of the glass plate, which needs to be removed. One such measurement produces about 40,000 lines of data.
I have determined that a quadratic compensation is good enough for what I'm looking to measure. There is an area in front of and behind the film, as well as in the middle, where there is no film; these areas can be used to fit a quadratic polynomial. The data is quite noisy, so you need to average over a few hundred points. What I would like is a script that reads a CSV file, fits a quadratic polynomial to the areas that are known to be bare glass, and subtracts this polynomial from the data, so I hopefully end up with data that is compensated for the curvature of the glass plate. The corrected data would then be added to the CSV file, ideally as a third column, if that is even possible.
Unfortunately, I am quite new to MATLAB. Although I managed to cobble together a script that could read a CSV file and plot it in the past, I don't know where to even start with this one. Has anyone ever done this, or does anyone know how to do it?
Best, IJ

6 Comments

Start with what you already had/have and work from there...
Lay out the steps in a logical order and then implement those steps. It's not as hard as it may seem.
There are built-in fitting tools in the MATLAB base product (polyfit, polyval) that will do the job easily enough; there are more sophisticated tools in the Curve Fitting and/or Statistics Toolboxes if you have those and want to do more with the fit as far as test statistics, etc., etc., ...
The first step will be to have some way to identify just which pieces of the data are those to fit and then to look carefully at the kind of data you are getting to see what you might want/need to do about smoothing it first or the like.
We can't really say much about that without the data; attaching at least one, and preferably a few, of these profiles would certainly lead to a much higher likelihood of somebody really doing something specific.
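The region-selection-plus-smoothing approach suggested above could be sketched roughly like this (a minimal sketch, not the poster's actual script: the file name 'profile.csv', the region width of 500 points, and the 101-point smoothing window are all placeholder assumptions to adapt to the real data):

```matlab
% Minimal sketch, assuming x in column 1 and height (nm) in column 2.
D   = readmatrix('profile.csv');   % read the profilometer trace (hypothetical file)
N   = size(D, 1);                  % total number of points
Npt = 500;                         % points per glass-only reference region
mid = round(N/2);                  % centre index of the trace
idx = [1:Npt, mid-Npt/2:mid+Npt/2, N-Npt+1:N];   % start, middle, end regions

ysm = movmean(D(:,2), 101);        % moving average to tame the noise (R2016a+)
p     = polyfit(D(idx,1), ysm(idx), 2);   % quadratic fit to the glass regions
ycorr = D(:,2) - polyval(p, D(:,1));      % subtract the fitted curvature
```

Fitting the smoothed trace rather than the raw one is a design choice here; with very spiky data the averaging keeps individual outliers from pulling the parabola around.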
I tried to attach some of the files, but I can't see if it worked. Identifying the spot is easy: it's right in the middle, and you could safely go 250 points either way. I am actually already stumbling at the first step: how do I tell MATLAB to use the first 500, middle 500, and last 500 points and disregard the rest?
Just use an indexing vector...nothing difficult in that, particularly -- but finding an algorithm to fit something smooth to these data is going to be a trick methinks given the characteristics -- I looked at just the first trace--
Npt=500; % the number of points in each region to use
N=size(G60X,1); % total number of points in the trace
N2=round(N/2); % index of the middle point
ix={1:Npt; N2+(-Npt/2:Npt/2); N-Npt+1:N}; % build reference cell array of regions
Let's see what that gives us...
plot(G60X(:,1),G60X(:,2)) % plot the whole trace first
hold on % get ready to add on top
cellfun(@(ix)plot(G60X(ix,1),G60X(ix,2),'r-'),ix) % add the sections in red
xlim([0 max(G60X(:,1))]) % blow up so can see interest areas
ylim([-200 200])
results in
Will need to look at those areas in red much more closely, but simply fitting the data will not produce anything at all approximating the baseline -- and the center area "hump" is peculiar to my eye...
As I said, the data is ugly. The y axis reads in nanometers, so any particle, scratch, or even noise in the room causes these spikes to happen. That's just the joy of profilometry for you. Looking at this, you could even fit to a much broader area, but not on every sample will the transition from substrate to film be this evident.
Also, our substrates are cheap float glass. Given the number of samples we produce and the yield we get, polished glass would just be prohibitively expensive. But float glass has these dips and crests that need to be accounted for.
That hump in the middle can be disregarded. As I said, a quadratic correction will be plenty accurate. If the three red areas end up on the y=0 axis, it will be good enough for what I need. Nobody expects the plots to be good-looking; I'll just need to get an idea of the film thickness across my sample.
Ah...that's a lot less restrictive of a problem statement than I had inferred from prior... :)
Are the spikes "real" in that they're going to be influencing this estimate across the sample or would/should rejecting them be part of the algorithm?
I've not looked at the rest; there are relatively few really large spikes, from 2-3X to 5-6X the surrounding area, with extremely large excursions at the beginning/end, although they have some noise/structure at the peak (which may or may not be real?). Would it be desirable/acceptable to remove those and replace them with, say, a spline interpolant between?
That likely could be done reasonably robustly; then, having done that in your three selected areas, just fit the parabola on the means of those locations. You could investigate the effect of fitting the raw data as well, but I suspect it wouldn't help much and would, in fact, reintroduce more noise than it would remove.
I've got other tasks right now, but I'll try to look again later this evening...those would be my thoughts on what I'd probably try. findpeaks, if you have the Signal Processing Toolbox, could be very helpful in peak-locating.
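The despiking idea above could be sketched as follows (a hedged sketch, assuming the trace is an N-by-2 matrix D; the 'movmedian' window of 201 samples and the 50 nm prominence threshold are placeholders; filloutliers needs R2017a or later, findpeaks needs the Signal Processing Toolbox):

```matlab
% Replace outlier spikes with a spline interpolant through their neighbours
yclean = filloutliers(D(:,2), 'spline', 'movmedian', 201);

% Alternatively, locate the spikes explicitly with findpeaks
[pks, locs] = findpeaks(abs(D(:,2)), 'MinPeakProminence', 50);
```

Either way, the cleaned (or spike-located) trace would only be used for the baseline fit; the original data would still be what the correction is subtracted from.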
The spikes are part of the data and should actually stay in the representation, at least in the film. They may indicate contaminants, air bubbles in the film etc. which help to judge the surface quality. You could still do the interpolation for your data processing, but I wouldn't bother to be honest. Making the selected area bigger or moving it to a less noisy spot is probably easier, and it does not have to be perfect at all, I just need to take out the overall bend.
Thanks anyway for taking so much time out of your day.


Answers (2)

The detrend function and/or the Remove Trends task may be of interest or use to you.

4 Comments

Thanks, I will give it a try
detrend is linear only; no curvature.
I've never seen the "Remove Trends" thingie before; you can probably experiment with it to see, but I suspect you're still going to have difficulties removing the spikes programmatically. What is clear to the eye isn't necessarily that simple a task to code in general.
As of release R2019a detrend allows you to remove polynomial trends. See the Release Notes.
The Remove Trends task is new as of release R2019b. See the Release Notes.
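For completeness, the polynomial form of detrend mentioned above looks roughly like this (a minimal sketch, assuming FirstSeries is the N-by-2 data matrix and R2019a or later):

```matlab
y     = FirstSeries(:, 2);   % the height column
ycorr = detrend(y, 2);       % remove a quadratic trend fitted to ALL points
```

Note that detrend fits the trend to the entire vector; there is no option to restrict the fit to selected glass-only regions, which may be exactly the difficulty reported in the next comment and why a manual polyfit on chosen indices ends up being the better tool here.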
I tried detrend, but I could not get it to select the right part of my data.


Hey everyone,
just to update you that I managed to perform the correction using the polyfit function. I selected three slices of my original data into a new matrix, fitted a polynomial to them, and then subtracted it. Here's the section of the code that's doing the job:
FirstSeries = readmatrix('G53_in.csv'); % read the raw trace
% SecondSeries = readmatrix('G53_cross.csv');
% take the first, middle, and last ~2000-point slices (bare glass)
FirstSeries_selection = [FirstSeries(1:2000, :); ...
    FirstSeries(ceil(end/2)-1000:ceil(end/2)+1000, :); ...
    FirstSeries(end-2000:end, :)];
corrector = polyfit(FirstSeries_selection(:, 1), FirstSeries_selection(:, 2), 2); % quadratic fit
x_axis = FirstSeries(:, 1);
y_1 = FirstSeries(:, 2);
% y_2 = SecondSeries(:, 2);
y_1_fit = polyval(corrector, x_axis); % evaluate the fit over the whole trace
y_1_corrected = y_1 - y_1_fit;        % subtract the curvature
And here's a plot generated with the original data, the fit and the corrected data.
Writing back to the CSV data is still a work in progress.
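The write-back step could be sketched with writematrix (R2019a or later; the output file name here is hypothetical, and the variable names follow the code above):

```matlab
% Append the corrected profile as a third column and write a new CSV
out = [x_axis, y_1, y_1_corrected];
writematrix(out, 'G53_in_corrected.csv');
```

Since writematrix cannot easily append a column to an existing file in place, writing a new file (or overwriting the old one with all three columns) is the simplest route.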

Release: R2021a

Asked: 9 May 2021
Edited: 18 May 2021
