I need help with a clustering task. The essence of the problem: I have data on vegetation indices. Here is a simple example in R:
clu=structure(list(ndvi_mr75_60_40 = c(0.97, 0.97, 0.8, 0.87, 0.84, 0.83, 0.87, 0.78, 0.99, 0.87, 0.85, 0.91, 0.89, 0.75, 0.97, 0.82, 0.97, 0.99, 0.93, 0.94), ndre_m75_60p20 = c(0.4, 0.42, 0.52, 0.55, 0.37, 0.32, 0.46, 0.5, 0.35, 0.33, 0.37, 0.47, 0.44, 0.43, 0.38, 0.47, 0.44, 0.53, 0.29, 0.51), ndwi_m75_60p20 = c(0.24, 0.26, 0.35, 0.3, 0.31, 0.27, 0.3, 0.28, 0.09, 0.08, 0.21, 0.27, 0.22, 0.31, 0.12, 0.28, 0.2, 0.27, 0.09, 0.29), arvi_m75_60p20 = c(0.58, 0.58, 0.79, 0.75, 0.47, 0.43, 0.7, 0.68, 0.57, 0.45, 0.52, 0.6, 0.68, 0.65, 0.52, 0.61, 0.62, 0.7, 0.37, 0.72), evi_m75_60p20 = c(0.45, 0.44, 0.6, 0.64, 0.39, 0.33, 0.56, 0.55, 0.41, 0.33, 0.39, 0.48, 0.53, 0.51, 0.4, 0.49, 0.49, 0.59, 0.27, 0.61), evi_mr75_p20 = c(0.38, 0.38, 0.55, 0.4, 0.41, 0.3, 0.36, 0.39, 0.55, 0.51, 0.52, 0.37, 0.55, 0.44, 0.45, 0.39, 0.4, 0.4, 0.54, 0.38), wri_m75_60p20 = c(0.47, 0.51, 0.29, 0.31, 0.8, 0.68, 0.5, 0.41, 0.38, 0.52, 0.45, 0.36, 0.4, 0.39, 0.4, 0.39, 0.34, 0.29, 0.71, 0.31), wri_mr75_45p10 = c(0.55, 0.58, 0.39, 0.33, 0.94, 0.79, 0.65, 0.59, 0.68, 0.91, 0.53, 0.56, 0.57, 0.42, 0.63, 0.48, 0.54, 0.4, 0.81, 0.53), wri_mr75_20 = c(0.74, 0.77, 0.39, 0.32, 0.97, 0.82, 0.77, 0.54, 0.61, 0.98, 0.47, 0.59, 0.52, 0.36, 0.65, 0.38, 0.55, 0.36, 0.92, 0.45), ndvi_s85_50 = c(48.51, 47.65, 45.27, 52.05, 37.47, 26.14, 47.43, 45.54, 57.16, 44.9, 47.7, 46.19, 57.25, 44.47, 60.44, 43.22, 57.02, 64.49, 49.04, 56.35), cluster = c(1L, 1L, 1L, 1L, 3L, 5L, 1L, 1L, 4L, 3L, 1L, 1L, 4L, 3L, 4L, 3L, 4L, 4L, 1L, 4L)), class = "data.frame", row.names = c(NA, -20L))
I used k-means; the cluster column is the cluster number assigned to each observation. Next, there is yield data for each cluster. Here is an example:
yield=structure(list(cluster = c(1L, 1L, 1L, 1L, 3L, 5L, 1L, 1L, 4L, 3L, 1L, 1L, 4L, 3L, 4L, 3L, 4L, 4L, 1L, 4L), yield = c(2260L, 2016L, 2777L, 1701L, 2202L, 2260L, 1254L, 2103L, 2942L, 1318L, 1633L, 2190L, 2270L, 2767L, 1463L, 2190L, 1773L, 2280L, 1855L, 1670L)), class = "data.frame", row.names = c(NA, -20L))
With this, I can plot a histogram of yield for each cluster.
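For reference, per-cluster histograms of this kind can be drawn with base R. This is only a minimal sketch, not necessarily the code behind my plot: the 2 x 2 layout, the bin width of 250, and the names yield_by_cluster, brks and yield_range are my own assumptions. The yield values are repeated from the example above so that the snippet runs on its own.

```r
# Same values as the yield data frame above, repeated so the snippet is self-contained.
yield <- data.frame(
  cluster = c(1L, 1L, 1L, 1L, 3L, 5L, 1L, 1L, 4L, 3L,
              1L, 1L, 4L, 3L, 4L, 3L, 4L, 4L, 1L, 4L),
  yield = c(2260L, 2016L, 2777L, 1701L, 2202L, 2260L, 1254L, 2103L, 2942L, 1318L,
            1633L, 2190L, 2270L, 2767L, 1463L, 2190L, 1773L, 2280L, 1855L, 1670L)
)

# Split the yield values by cluster number.
yield_by_cluster <- split(yield$yield, yield$cluster)

# Shared breaks so that the panels are directly comparable (bin width is an assumption).
brks <- seq(1000, 3000, by = 250)

# One histogram panel per cluster present in the sample.
old_par <- par(mfrow = c(2, 2))
for (k in names(yield_by_cluster)) {
  hist(yield_by_cluster[[k]], breaks = brks,
       main = paste("Cluster", k), xlab = "Yield")
}
par(old_par)

# Min and max yield per cluster, to inspect the overlap numerically.
yield_range <- sapply(yield_by_cluster, range)
print(yield_range)
```

The yield_range matrix gives a quick numeric view of how much the per-cluster yield ranges overlap, which is the problem the histograms show.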
Here, a histogram of yield is shown for each of the six clusters. As you can see, the yield histograms of the different clusters overlap very strongly. What I need is to obtain a uniform distribution of probabilities for yields.
In other words:
1. We have vegetation indices and yields as input. Only the vegetation indices are clustered; the yields are not.
2. After clustering the vegetation indices and obtaining the cluster numbers, we add a column with yield. The rows of the vegetation indices and the yield values correspond to each other.
3. Then we get a histogram of yield for each cluster, like the one I provided. Each row is the ID of a garden bed (that is, the vegetation index data for that bed and its yield).
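The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not my exact code: it uses scale() (z-score standardization) as one possible normalization, centers = 4 is an arbitrary choice, and only three of the index columns from clu are copied to keep it short. The names idx, yield_values, km, res and yield_summary are made up for illustration. It does not by itself make the yields of different clusters stop overlapping.

```r
# Three index columns copied from clu above (a subset, to keep the sketch short).
idx <- data.frame(
  ndvi_mr75_60_40 = c(0.97, 0.97, 0.8, 0.87, 0.84, 0.83, 0.87, 0.78, 0.99, 0.87,
                      0.85, 0.91, 0.89, 0.75, 0.97, 0.82, 0.97, 0.99, 0.93, 0.94),
  wri_mr75_20 = c(0.74, 0.77, 0.39, 0.32, 0.97, 0.82, 0.77, 0.54, 0.61, 0.98,
                  0.47, 0.59, 0.52, 0.36, 0.65, 0.38, 0.55, 0.36, 0.92, 0.45),
  ndvi_s85_50 = c(48.51, 47.65, 45.27, 52.05, 37.47, 26.14, 47.43, 45.54, 57.16, 44.9,
                  47.7, 46.19, 57.25, 44.47, 60.44, 43.22, 57.02, 64.49, 49.04, 56.35)
)
# Yield per garden bed, in the same row order as idx.
yield_values <- c(2260L, 2016L, 2777L, 1701L, 2202L, 2260L, 1254L, 2103L, 2942L, 1318L,
                  1633L, 2190L, 2270L, 2767L, 1463L, 2190L, 1773L, 2280L, 1855L, 1670L)

# Step 1: cluster the vegetation indices only.
# scale() centers each column and divides by its standard deviation, so that
# ndvi_s85_50 (values around 50) does not dominate the Euclidean distances.
set.seed(1)  # k-means uses random starting centers
km <- kmeans(scale(idx), centers = 4, nstart = 25)  # centers = 4 is an assumption

# Step 2: attach the cluster number and the yield row by row.
res <- cbind(idx, cluster = km$cluster, yield = yield_values)

# Step 3: summarize yield per cluster (count, min, max).
yield_summary <- aggregate(
  yield ~ cluster, data = res,
  FUN = function(v) c(n = length(v), min = min(v), max = max(v))
)
print(yield_summary)
```

Note that yield is never passed to kmeans() here; it is only joined afterwards, which matches the setup described in the list.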
Therefore, the main question of this topic is: how should I normalize the vegetation index data so that, after clustering, the yields of different clusters do not overlap, i.e. so that I achieve a uniform distribution of probabilities like this (drawn artificially)?
Or which method is best for achieving the "correct result"?
Any help would be valuable to me.