
I am trying to use R to find the optimal solution to my problem, with the constraint that the coefficients be positive. Here are my data:

   th inp         tcyc       tinst      tmem     tcom
1   2   2  26219765385  1975872868  52449810   782964
2   2   4  38080459431  3155342008  76744867  1878903
3   2   8  64572439641  6230494010 137754355  4351706
4   2  16 140168021516 13757989992 285524252 10605705
5   2  32 308925389816 31497131498 628391048 26040711
6   4   2  13206650786   988226883  25631315   844126
7   4   4  19078145632  1577873809  37085281  2125333
8   4   8  33742095874  3114415906  65962626  5222236
9   4  16  70956149286  6881357755 134957687 12180392
10  4  32 153411672670 15754506070 296548768 31057252
11  8   2   6572843040   494094967  12380740   808816
12  8   4   9452222628   788984621  17538152  2034061
13  8   8  16765943294  1557329849  30549900  5016827
14  8  16  34677550217  3440679505  61614420 12493699
15  8  32  74852648112  7876116794 133525620 29824686
16 16   2   3252373719   247026385   5958559   672396
17 16   4   4669800482   394452497   8097991  1676579
18 16   8   8269859136   778889584  13651458  4196829
19 16  16  16353025378  1720301596  26775255 10393194
20 16  32  37113657641  3938965759  55505822 25011009
21 32   2   1630888153   123512114   2683400   461526
22 32   4   2293598746   197173135   3682504  1213596
23 32   8   4045995970   389408822   5858031  3055324
24 32  16   8217603991   860041282  10973460  7502244
25 32  32  17978101850  1969647650  22909347 17953100
26 48   2   1064344042    82295143   1822133   381178
27 48   4   1523091067   131488491   2331228   949354
28 48   8   2677097592   259536252   3552229  2381626
29 48  16   5400541381   573140686   6489032  5875310
30 48  32  11837404077  1313066425  13318331 13968230

I use linear regression in R, s <- lm(tcyc ~ 0 + tinst + tmem + tcom, data = fit), to fit the model with the intercept fixed at 0, but I get negative coefficients, which do not make any sense.

coef(s)
  tinst      tmem      tcom
20.8745 -281.2288 -320.7204

I am not sure whether this is the best way to model the data and find the optimal parameters for tinst, tmem and tcom. How do you find positive coefficients for the model?
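One base-R way to force non-negative coefficients is to treat the fit as a box-constrained optimization with optim(method = "L-BFGS-B", lower = 0). A minimal sketch, using synthetic data with the same column names as above (the data frame and its true coefficients here are invented for illustration):

```r
set.seed(1)
n <- 60
# synthetic stand-in for the real data, kept at similar magnitudes
dat <- data.frame(tinst = runif(n, 1, 10),
                  tmem  = runif(n, 1, 10),
                  tcom  = runif(n, 1, 10))
dat$tcyc <- 20 * dat$tinst + 5 * dat$tmem + 2 * dat$tcom + rnorm(n, sd = 0.1)

X <- as.matrix(dat[, c("tinst", "tmem", "tcom")])
y <- dat$tcyc
rss <- function(b) sum((y - X %*% b)^2)       # residual sum of squares

# lower = 0 constrains every coefficient to be non-negative
res <- optim(par = c(1, 1, 1), fn = rss,
             method = "L-BFGS-B", lower = 0)
res$par   # estimated coefficients, each >= 0 by construction
```

Note that when the unconstrained optimum has a genuinely negative coefficient, the constrained solution will pin that coefficient to 0 (the same behaviour you would see from nnls), so the constraint masks rather than cures a misspecified model.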

Further explanation of the problem in detail:

Background: I am trying to predict the execution time of an application on future many-core systems empirically, by learning the application's behavior. As it is a multithreaded program, it will have a communication-contention bottleneck if the application demands heavy inter-core communication. The general system equation looks like:

Total execution time in cycles (T_cyc) = total cycles spent on instructions (T_inst) + total cycles spent on memory instructions (T_mem) + total cycles spent on communication (T_com),

i.e. T_cyc = T_inst + T_mem + T_com.

If I use a simulator, I can get T_inst, T_mem and T_com directly and find the independent contribution of each component to T_cyc. But on real hardware, I can only get counts of events, i.e. N_inst, N_mem and N_com. So what I have is

T_cyc = a*N_inst + b*N_mem + c*N_com,

where a, b and c have to be determined.
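When this linear model actually holds with positive per-event costs, ordinary least squares recovers a, b and c without any constraints. A sketch on synthetic counts (the variable names follow the notation above; the counts, true costs a = 4, b = 120, c = 300, and noise level are all invented):

```r
set.seed(42)
n <- 30
# synthetic event counts, roughly matching the magnitude ordering
# described in the question: N_inst >> N_mem > N_com
N_inst <- runif(n, 1e6, 1e7)
N_mem  <- runif(n, 1e4, 1e5)
N_com  <- runif(n, 1e3, 1e4)

# true per-event cycle costs: a = 4, b = 120, c = 300
T_cyc <- 4 * N_inst + 120 * N_mem + 300 * N_com + rnorm(n, sd = 1e4)

m <- lm(T_cyc ~ 0 + N_inst + N_mem + N_com)
coef(m)   # close to (4, 120, 300) when the model is correctly specified
```

If the real data refuse to produce positive coefficients under this setup, that is evidence the additive model is missing something (overhead, nonlinearity, contention effects), not merely a solver problem.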

I tried solving the problem with lsqnonneg (non-negative least squares) in MATLAB to find a, b and c. At times I get values of b and c that are exactly ZERO, which is totally meaningless.

Things to notice: N_inst is very large, while N_mem and N_com are considerably lower in magnitude, and hence I face this problem of b and c coming out as zero.
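One practical mitigation for this magnitude imbalance is to rescale each predictor (e.g. by its column mean) before fitting, then convert the coefficients back to the per-event scale afterwards. A sketch, again on invented data with the column names used above:

```r
set.seed(7)
# synthetic data with deliberately imbalanced column magnitudes
fit <- data.frame(tinst = runif(40, 1e9, 1e10),
                  tmem  = runif(40, 1e7, 1e8),
                  tcom  = runif(40, 1e5, 1e6))
fit$tcyc <- 20 * fit$tinst + 300 * fit$tmem + 500 * fit$tcom

# divide each count column by its own mean so all predictors are O(1)
mu <- sapply(fit[c("tinst", "tmem", "tcom")], mean)
scaled <- transform(fit,
                    tinst = tinst / mu[["tinst"]],
                    tmem  = tmem  / mu[["tmem"]],
                    tcom  = tcom  / mu[["tcom"]])

s2 <- lm(tcyc ~ 0 + tinst + tmem + tcom, data = scaled)
coef(s2) / mu   # back on the original per-event scale: (20, 300, 500)
```

Rescaling changes the numerical conditioning of the solve, not the underlying least-squares solution, so it helps most when the zeros come from numerical trouble rather than from a genuinely misspecified model.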

Questions:

1. Is this a proper tool for solving such a linear system? If not, what else should I try?
2. Is the problem due to the sample size fed to the solver?
3. I see that for most applications the trends of N_cyc, N_inst and N_mem are monotonic, but N_com is non-monotonic. Can this affect the solved values? If so, how can I isolate this component and find its contribution individually?

  • Try s <- lm(tcyc ~ tinst + tmem + tcom - 1, data = fit) – Commented Mar 12, 2014 at 15:33
  • @Georg, as far as I know, + 0 and - 1 are fully equivalent in R; they both suppress the intercept. – Commented Mar 12, 2014 at 16:01
  • Yes, Patrick is right. I tried the nnls package, but it still gave some coefficients as 0, which again doesn't hold good. – Commented Mar 12, 2014 at 16:05

1 Answer


It is often the case that suppressing the intercept leads to regression coefficients that don't make sense. In my experience there are rarely cases where suppressing the intercept makes sense, even when scientific plausibility suggests it might be justifiable (such as stopping distance versus cruising speed, or creatinine clearance versus kidney mass in grams: you LEAVE the intercept IN for such analyses!). This is a problem of extrapolation.

Just eyeballing these data, I imagine that the estimated intercept would be decidedly non-zero. Since these data appear to come from some sort of computing-time measurement, comparing flops versus elapsed time, etc., the non-zero intercept could have a host of interpretations: a boot time for running a process, a system lag as memory is allocated for an operation, or any other non-negligible system processes that aren't measured as part of an experimental run. Furthermore, and more subtly, there may be non-linear effects influencing your results. The regression coefficient from intercept-in OLS still provides a great way of estimating the first-order linear trend through the data, even if the trend is curvilinear... but only when you leave the intercept IN.
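The fixed-overhead interpretation can be illustrated on synthetic data (column names as in the question; the overhead of 5e7 cycles and the per-event costs here are invented): when the data-generating process has a constant per-run cost, the intercept-in fit absorbs it and the slopes stay interpretable.

```r
set.seed(3)
n <- 40
runs <- data.frame(tinst = runif(n, 1e6, 1e7),
                   tmem  = runif(n, 1e5, 1e6),
                   tcom  = runif(n, 1e4, 1e5))
# a fixed per-run overhead of 5e7 cycles plus per-event costs of 3, 40, 200
runs$tcyc <- 5e7 + 3 * runs$tinst + 40 * runs$tmem + 200 * runs$tcom +
  rnorm(n, sd = 1e5)

with_int <- lm(tcyc ~ tinst + tmem + tcom, data = runs)      # intercept in
no_int   <- lm(tcyc ~ 0 + tinst + tmem + tcom, data = runs)  # intercept out
coef(with_int)  # intercept near 5e7; slopes near (3, 40, 200)
```

Comparing coef(no_int) with coef(with_int) shows how forcing the fit through the origin pushes the unmodeled overhead into the slopes.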

My first recommendation is to look at the output of pairs(fit) and just look at the trends.

Nonetheless, if your goal is simply to find optimal positive coefficients for the model, you can do so with by-hand optimization, either maximum likelihood or Gibbs sampling, though don't be surprised if those results make no sense. Example of by-hand optimization:

X <- model.matrix(~ tinst + tmem + tcom - 1, data = fit)
y <- fit$tcyc
negLogLik <- function(b) {
  b <- exp(b)            ## restrict to positive-only values
  yhat <- X %*% b        ## calculate fitted values
  sum((y - yhat)^2)      ## objective: residual sum of squares
}
res <- nlm(negLogLik, c(1, 1, 1))   ## minimize the objective
exp(res$estimate)                   ## coefficients on the original scale
  • +1, excellent discussion of the issues surrounding suppression of the intercept. I once made a similar point (when it is OK to remove the intercept), but not nearly as well as this. – Commented Mar 12, 2014 at 18:19
  • Can this problem be solved by increasing the number of samples, or by running more experiments with various thread sizes? – Commented May 19, 2014 at 12:36
  • @user41797 Nope. An analysis is like an engine: sample size is just the throttle, and a bad probability model is like bad gas. Feed crap into the carbs, get crap for acceleration. – Commented May 19, 2014 at 19:55
  • Why would stopping distance vs. cruising speed require an intercept, if the relationship is linear? – Commented May 31, 2015 at 15:06
  • @RobertKubrick Only use an intercept-less model for calculating a within-pair differences analysis. Carlin's paper is a good review. – Commented Jun 1, 2015 at 18:40
