To summarize the goal as presented in the question, starting with a vector $x\in[0,1]$, perform a transformation $f(x)=x'$ such that:
- $x'\in[0,1]$
- $\bar{x'}=p$, where $p\in[0,1]$
- If $x_i>x_j$, then $x'_i>x'_j$ ($f$ is an increasing function)
There are myriad transformations that can accomplish this. One general strategy is to transform $x$ to the positive reals via some function $g$, multiply the transformed values by a positive constant, $a$, and then invert the transformation:
$$x'=g^{-1}(a\cdot g(x))$$
Alternatively, transform $x$ to $(-\infty,\infty)$, add or subtract $a$, and then invert the transformation.
Notice that $g$ can be the quantile function of any distribution with support on $[0,\infty)$ (or $(-\infty,\infty)$). A few examples:
- Exponential distribution: $g(x)=-\ln(x)$ (which results in $x'=x^a$)
- Log-logistic distribution: $g(x)=x/(1-x)$
- Half-Cauchy distribution: $g(x)=\text{tan}(\pi x/2)$
Each will behave a little differently, so choose a transformation that has desirable properties for what you are doing.
Demonstrating in R:
set.seed(1839) x <- runif(1000) p <- 0.56 np <- p*length(x) f <- vector("list", 6) # log-logistic f[[1]] <- function(a) (y <- a^2/(1/x - 1))/(y + 1) # exponential f[[2]] <- function(a) pexp(qexp(x, exp(a))) # gamma f[[3]] <- function(a) pgamma(qgamma(x, 1, 1), 1, exp(a)) # chi-squared f[[4]] <- function(a) pchisq(exp(a)*qchisq(x, 1), 1) # log-normal f[[5]] <- function(a) plnorm(exp(a)*qlnorm(x)) # normal f[[6]] <- function(a) pnorm(qnorm(x), a) a <- sapply(f, \(f) uniroot(\(a) sum(f(a)) - np, 0:1, extendInt = "yes", tol = 1e-9)$root) y <- lapply(1:length(f), \(i) f[[i]](a[i])) max(abs(sapply(y, mean) - p)) #> [1] 4.24305e-11 r <- rank(x) all(sapply(y, \(y) all.equal(rank(y), r))) #> [1] TRUE
Another consideration is efficiency when implemented in code. Here are a few candidate functions in R to benchmark.
remean1 <- function(x, p, np = p*length(x)) { x^uniroot(\(r) sum(x^r) - np, log(p)/log(range(x)), tol = 1e-9)$root } remean2 <- function(x, p, np = p*length(x)) { a <- uniroot(\(a) sum((y <- a/(1/x - 1))/(y + 1)) - np, (1/p - 1)/(1/range(x) - 1), tol = 1e-9)$root (y <- a/(1/x - 1))/(y + 1) } remean3 <- function(x, p, np = p*length(x)) { a <- uniroot(\(a) 2*sum(atan(tan(x*pi/2)/a))/pi - np, atanh(range(x))/atanh(p), tol = 1e-9)$root 2*atan(tan(x*pi/2)/a)/pi } remean4 <- function(x, p, np = p*length(x)) { a <- uniroot(\(a) sum(tanh(atanh(x)/a)) - np, atanh(range(x))/atanh(p), tol = 1e-9)$root tanh(atanh(x)/a) } mean(remean1(x, p)) - p #> [1] -2.251033e-11 mean(remean2(x, p)) - p #> [1] 0 mean(remean3(x, p)) - p #> [1] -9.473866e-12 mean(remean4(x, p)) - p #> [1] -4.674039e-14
Benchmark
microbenchmark::microbenchmark( remean1(x, p), remean2(x, p), remean3(x, p), remean4(x, p) ) #> Unit: microseconds #> expr min lq mean median uq max neval #> remean1(x, p) 2117.7 2176.00 2504.393 2236.10 2475.75 15133.2 100 #> remean2(x, p) 237.0 336.30 391.087 351.55 400.60 1573.0 100 #> remean3(x, p) 513.0 569.80 816.528 593.15 660.60 13371.0 100 #> remean4(x, p) 702.0 759.95 988.719 781.40 950.65 10681.8 100