# Find block of minimum distance between values

Beder on 20 Mar 2017
Answered: Afiq Azaibi on 24 Mar 2017
Dear community, thank you all for investing time in reading my issue. Hopefully I have explained it precisely enough. Peter
I'm currently working on a project with weather data, with about 1 TB of input data. The data is stored as datatype single (in h5 files). I'm trying to scale the data to an integer range and then save it.
Example: The temperature range is from -80°C to 50°C. So I scale it and additionally save the endpoints of the mapping (-80°C = 0, 50°C = 255 for an 8-bit integer) to be able to convert it back.
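The scaling described above could look roughly like the following minimal sketch. It assumes 8-bit storage (`uint8`) and uses a made-up sample vector `T`; the variable names are only illustrative and are not taken from the actual workflow.

```matlab
% Hypothetical sample data; in practice this would be read from the h5 files.
T = single([-80 -42.5 0 21.3 50]);

% Range endpoints that have to be saved together with the integers.
tMin = min(T(:));
tMax = max(T(:));

% Map [tMin, tMax] linearly onto [0, 255]; uint8() rounds to the nearest integer.
q = uint8((T - tMin) ./ (tMax - tMin) .* 255);

% Inverse mapping, using only q, tMin and tMax.
Trec = single(q) ./ 255 .* (tMax - tMin) + tMin;

% The reconstruction error should not exceed half a quantization step.
step   = (tMax - tMin) / 255;
maxErr = max(abs(Trec(:) - T(:)));
```

Note that the sketch does not handle the case `tMax == tMin` (constant data), which would need a special case to avoid dividing by zero.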
Since I would like to have the same procedure for all variables (temperature, radiation, wind speed, pressure...), I'm currently looking for an algorithm that might be able to automatically identify smart clusters of data.
Stupid Cluster Example: Scaling blocks of predefined size to integer. If the size is too big, this might result in clustering a hot desert and a cold mountain region together and therefore losing a lot of precision. If the size is too small, it results in unnecessarily big files.
Smart Cluster Example: The Arctic has temperatures from -80°C to -50°C. The algorithm should scale this area on its own and write its max and min. Or, for radiation: there is no use in scaling over the full range of 0 to 1200 W/m². It should only scale the "daylight areas".
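To make the precision argument concrete, here is a small worked calculation of the quantization step size. It assumes 8-bit storage (255 steps between min and max), which is an assumption for illustration only.

```matlab
nLevels = 255;   % assumption: uint8 storage, i.e. 255 steps between min and max

% Step size when the whole temperature range is scaled at once.
stepGlobal = (50 - (-80)) / nLevels;     % about 0.51 degC per integer step

% Step size when only the -80 to -50 degC region is scaled.
stepRegion = (-50 - (-80)) / nLevels;    % about 0.12 degC per integer step
```

Under this assumption, scaling the cold region separately gives roughly four times finer resolution than scaling the global range.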
Is there any algorithm out there, or ideally MATLAB code, which might help me solve this issue?
My current workflow (stupid clusters adjusted to each input variable) does not suit my needs properly.

Afiq Azaibi on 24 Mar 2017
You can use kmeans clustering to group sets of similar temperature values together. Choosing a suitable number of clusters k helps keep the range of each cluster neither too wide nor too narrow. Once you have the clusters, the min and max of each cluster give you a range for an efficient conversion.
https://www.mathworks.com/help/stats/kmeans.html#inputarg_X
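As a rough sketch of how this suggestion might be applied: the example below clusters synthetic temperature values with `kmeans` (Statistics and Machine Learning Toolbox) and then scales each cluster to `uint8` with its own min and max. The data, the choice `k = 3`, and the 8-bit target are all assumptions for illustration.

```matlab
% Hypothetical data: 600 temperature values drawn from three distinct ranges.
rng(0);                                   % reproducible example
cold = -80 + 30 * rand(200, 1);           % -80 to -50 degC
mild = -10 + 30 * rand(200, 1);           % -10 to  20 degC
hot  =  25 + 25 * rand(200, 1);           %  25 to  50 degC
T = single([cold; mild; hot]);

k = 3;                                    % assumed number of clusters, needs tuning
idx = kmeans(double(T), k, 'Replicates', 5);   % cluster label for each value

% Scale each cluster separately, keeping its own min and max.
q  = zeros(size(T), 'uint8');
lo = zeros(k, 1, 'single');
hi = zeros(k, 1, 'single');
for c = 1:k
    m = (idx == c);
    lo(c) = min(T(m));
    hi(c) = max(T(m));
    q(m) = uint8((T(m) - lo(c)) ./ (hi(c) - lo(c)) .* 255);
end

% Decoding needs q together with idx, lo and hi.
Trec = single(q) ./ 255 .* (hi(idx) - lo(idx)) + lo(idx);
```

One caveat with this sketch: clustering on the values alone means a label (`idx`) has to be stored for every value, which costs extra space. Clustering spatial blocks or regions instead, so that one label covers many values, may fit the original goal better, but that is a design choice not covered here.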