# How to match mean and standard deviation of 2 datasets for data that cannot be less than 0

Yoni Verhaegen -WE-1718- on 7 Feb 2024
Commented: Les Beckham on 8 Feb 2024
Hi all,
I have two datasets of monthly precipitation sums at a certain location. One of them is observed; the other is modelled by a climate model. I want to match the mean and standard deviation of both datasets, so that the time series from the model matches with those from the observations. However, when I apply the formula below, some values in the transformed time series of modelled monthly precipitation sums become negative as a result of fitting the standard deviation. Is there a solution for this, or does anyone know how I can avoid it?
Thanks!
```matlab
data_mod=[70.74191271 66.54238669 28.60091702 55.56554018 66.04186858 77.06576381 72.57394329 99.62497103 42.51156832 22.81012399 107.3993961 48.45702239 33.71119171 61.09975519 74.43277952 39.14433747 113.1039794 67.93592923 96.95537867 12.99913771 53.6158074 48.05637989 52.27533536 99.27060261 54.73806827 148.462539 17.94473213 93.65016815 32.89454535 52.36015655];
data_obs=[27 38.6 146.1 61.8 44.6 7.5 50.4 23.2 8.1 89.7 23.1 83.3 86.5 46 14.5 27.7 81 30 50.3 165.7 15.5 106.7 56.7 52.5 75.1 100.1 6.9 18.7 93.4 16.6];
% Shift and scale the model data so its mean and std equal those of the observations
data_transformed = mean(data_obs(:)) + (data_mod - mean(data_mod(:)))*(std(data_obs(:))/std(data_mod(:)));
mean(data_transformed(:))
std(data_transformed(:))
```
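For readers without MATLAB, here is a minimal sketch of the same linear rescaling, assuming Python 3 with only the standard library. `statistics.stdev` uses the N-1 normalization, which matches MATLAB's default `std`. The helper name `match_mean_std` is hypothetical. The sketch lists which months come out negative and shows what happens if those values are simply clipped to zero.

```python
# Sketch of the question's linear mean/std matching (standard library only).
# statistics.stdev uses N-1 normalization, like MATLAB's default std.
from statistics import mean, stdev

data_mod = [70.74191271, 66.54238669, 28.60091702, 55.56554018, 66.04186858,
            77.06576381, 72.57394329, 99.62497103, 42.51156832, 22.81012399,
            107.3993961, 48.45702239, 33.71119171, 61.09975519, 74.43277952,
            39.14433747, 113.1039794, 67.93592923, 96.95537867, 12.99913771,
            53.6158074, 48.05637989, 52.27533536, 99.27060261, 54.73806827,
            148.462539, 17.94473213, 93.65016815, 32.89454535, 52.36015655]
data_obs = [27, 38.6, 146.1, 61.8, 44.6, 7.5, 50.4, 23.2, 8.1, 89.7,
            23.1, 83.3, 86.5, 46, 14.5, 27.7, 81, 30, 50.3, 165.7,
            15.5, 106.7, 56.7, 52.5, 75.1, 100.1, 6.9, 18.7, 93.4, 16.6]


def match_mean_std(mod, obs):
    """Hypothetical helper: shift and scale `mod` to the mean and sample std of `obs`."""
    m_mod, s_mod = mean(mod), stdev(mod)
    m_obs, s_obs = mean(obs), stdev(obs)
    return [m_obs + (x - m_mod) * (s_obs / s_mod) for x in mod]


data_transformed = match_mean_std(data_mod, data_obs)

# Months whose rescaled precipitation is negative (physically impossible).
negative = [(i + 1, round(v, 2)) for i, v in enumerate(data_transformed) if v < 0]
print("negative months (1-based index, value):", negative)

# Clipping negatives to zero makes every value plausible, but it raises the
# mean and lowers the std, so the statistics no longer match the observations.
clipped = [max(v, 0.0) for v in data_transformed]
print("clipped mean/std: ", round(mean(clipped), 4), round(stdev(clipped), 4))
print("observed mean/std:", round(mean(data_obs), 4), round(stdev(data_obs), 4))
```

Clipping is shown only to illustrate the trade-off. Once the negatives are removed, the mean and standard deviation can no longer both equal the observed values.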

Les Beckham on 7 Feb 2024
Edited: Les Beckham on 7 Feb 2024
Your model data doesn't look at all like the observed data, so I'm not surprised that trying to force their statistics to match gives you unexpected results.
```matlab
data_mod=[70.74191271 66.54238669 28.60091702 55.56554018 66.04186858 77.06576381 72.57394329 99.62497103 42.51156832 22.81012399 107.3993961 48.45702239 33.71119171 61.09975519 74.43277952 39.14433747 113.1039794 67.93592923 96.95537867 12.99913771 53.6158074 48.05637989 52.27533536 99.27060261 54.73806827 148.462539 17.94473213 93.65016815 32.89454535 52.36015655];
data_obs=[27 38.6 146.1 61.8 44.6 7.5 50.4 23.2 8.1 89.7 23.1 83.3 86.5 46 14.5 27.7 81 30 50.3 165.7 15.5 106.7 56.7 52.5 75.1 100.1 6.9 18.7 93.4 16.6];
plot(1:numel(data_obs), data_obs, '.-', 1:numel(data_mod), data_mod, 'o-')
legend('Observed', 'Modeled')
grid on
mean(data_mod) % <<< Model mean is much higher than observed mean!
% ans = 63.6862
mean(data_obs)
% ans = 54.9100
data_transformed = mean(data_obs(:)) + (data_mod - mean(data_mod(:)))*(std(data_obs(:))/std(data_mod(:)));
mean(data_transformed(:))
% ans = 54.9100
std(data_transformed(:))
% ans = 40.8168
```
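One way to see why negatives appear is to derive it from the question's formula; this derivation is not part of the original thread. Write $\bar x, s_x$ for the model mean and standard deviation and $\bar o, s_o$ for the observed ones. Because $s_o / s_x > 0$:

$$
y_i = \bar o + (x_i - \bar x)\,\frac{s_o}{s_x}
\qquad\Longrightarrow\qquad
y_i \ge 0 \iff \frac{x_i - \bar x}{s_x} \ge -\frac{\bar o}{s_o}
$$

Every model month must therefore lie no more than $\bar o / s_o$ model standard deviations below the model mean. With the values printed above, $\bar o / s_o = 54.91 / 40.8168 \approx 1.35$. The driest modelled month (about 13.0, against a model mean of 63.69) lies further below the mean than that, so it maps to a negative value. Matching both the mean and the standard deviation fixes the shift and the positive scale factor uniquely, so no other linear rescaling avoids this.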
Yoni Verhaegen -WE-1718- on 7 Feb 2024
Yes, precipitation patterns are often very difficult for climate models to capture, especially in mountainous areas where precipitation varies strongly over small spatial scales. But is my request solvable anyway, even though the data do not look similar?
Les Beckham on 8 Feb 2024
Sorry, but I would have to say no, unless I misunderstand what your "request" or goal really is. Perhaps if you explain your ultimate goal in more detail, someone can provide more suggestions. You say that you "want to match the mean and standard deviation of both datasets, so that the time series from the model matches with those from the observations." However, since the model very clearly isn't anything like the observations, this is going to be very difficult.