I am trying to use R to fit a linear model whose coefficients must all be positive. Here is my data:
       th inp         tcyc       tinst      tmem     tcom
     1  2   2  26219765385  1975872868  52449810   782964
     2  2   4  38080459431  3155342008  76744867  1878903
     3  2   8  64572439641  6230494010 137754355  4351706
     4  2  16 140168021516 13757989992 285524252 10605705
     5  2  32 308925389816 31497131498 628391048 26040711
     6  4   2  13206650786   988226883  25631315   844126
     7  4   4  19078145632  1577873809  37085281  2125333
     8  4   8  33742095874  3114415906  65962626  5222236
     9  4  16  70956149286  6881357755 134957687 12180392
    10  4  32 153411672670 15754506070 296548768 31057252
    11  8   2   6572843040   494094967  12380740   808816
    12  8   4   9452222628   788984621  17538152  2034061
    13  8   8  16765943294  1557329849  30549900  5016827
    14  8  16  34677550217  3440679505  61614420 12493699
    15  8  32  74852648112  7876116794 133525620 29824686
    16 16   2   3252373719   247026385   5958559   672396
    17 16   4   4669800482   394452497   8097991  1676579
    18 16   8   8269859136   778889584  13651458  4196829
    19 16  16  16353025378  1720301596  26775255 10393194
    20 16  32  37113657641  3938965759  55505822 25011009
    21 32   2   1630888153   123512114   2683400   461526
    22 32   4   2293598746   197173135   3682504  1213596
    23 32   8   4045995970   389408822   5858031  3055324
    24 32  16   8217603991   860041282  10973460  7502244
    25 32  32  17978101850  1969647650  22909347 17953100
    26 48   2   1064344042    82295143   1822133   381178
    27 48   4   1523091067   131488491   2331228   949354
    28 48   8   2677097592   259536252   3552229  2381626
    29 48  16   5400541381   573140686   6489032  5875310
    30 48  32  11837404077  1313066425  13318331 13968230

I use linear regression in R, s <- lm(tcyc ~ 0+tinst+tmem+tcom, data=fit), to fit the model with the intercept fixed at 0. But I get negative coefficients, which do not make sense for this problem.
    coef(s)
        tinst      tmem      tcom
      20.8745 -281.2288 -320.7204

I am not sure whether this is the best way to model the problem and estimate the parameters for tinst, tmem and tcom. How can I obtain positive coefficients for the model?
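A minimal sketch that reproduces the fit described above from the 30 rows of the table and adds a collinearity check. The check is an addition, not part of the original fit: strongly correlated predictors are a common reason for least-squares coefficients to change sign. The name pred_cor is made up for this sketch.

```r
# The 30 measurements from the table above (first column = row index)
fit <- read.table(header = TRUE, text = "
th inp tcyc tinst tmem tcom
1 2 2 26219765385 1975872868 52449810 782964
2 2 4 38080459431 3155342008 76744867 1878903
3 2 8 64572439641 6230494010 137754355 4351706
4 2 16 140168021516 13757989992 285524252 10605705
5 2 32 308925389816 31497131498 628391048 26040711
6 4 2 13206650786 988226883 25631315 844126
7 4 4 19078145632 1577873809 37085281 2125333
8 4 8 33742095874 3114415906 65962626 5222236
9 4 16 70956149286 6881357755 134957687 12180392
10 4 32 153411672670 15754506070 296548768 31057252
11 8 2 6572843040 494094967 12380740 808816
12 8 4 9452222628 788984621 17538152 2034061
13 8 8 16765943294 1557329849 30549900 5016827
14 8 16 34677550217 3440679505 61614420 12493699
15 8 32 74852648112 7876116794 133525620 29824686
16 16 2 3252373719 247026385 5958559 672396
17 16 4 4669800482 394452497 8097991 1676579
18 16 8 8269859136 778889584 13651458 4196829
19 16 16 16353025378 1720301596 26775255 10393194
20 16 32 37113657641 3938965759 55505822 25011009
21 32 2 1630888153 123512114 2683400 461526
22 32 4 2293598746 197173135 3682504 1213596
23 32 8 4045995970 389408822 5858031 3055324
24 32 16 8217603991 860041282 10973460 7502244
25 32 32 17978101850 1969647650 22909347 17953100
26 48 2 1064344042 82295143 1822133 381178
27 48 4 1523091067 131488491 2331228 949354
28 48 8 2677097592 259536252 3552229 2381626
29 48 16 5400541381 573140686 6489032 5875310
30 48 32 11837404077 1313066425 13318331 13968230
")

# Ordinary least squares through the origin, as in the question
s <- lm(tcyc ~ 0 + tinst + tmem + tcom, data = fit)
print(coef(s))

# Pairwise correlations between the predictors (added diagnostic):
# entries close to 1 mean the columns carry nearly the same information
pred_cor <- cor(fit[, c("tinst", "tmem", "tcom")])
print(pred_cor)
```

If the off-diagonal correlations turn out to be close to 1, the individual coefficients are poorly determined even when the overall fit is good.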
Explaining the problem in more detail:
Background: I am trying to predict, empirically, the execution time of an application on future many-core systems by learning the application's behavior. Because it is a multithreaded program, it will suffer a communication contention bottleneck if the application demands heavy inter-core communication. The general system equation looks like this:
Total execution time in cycles (T_cyc) = total cycles spent in instructions (T_inst) + total cycles spent in memory instructions (T_mem) + total cycles spent in communication (T_com)
i.e. T_cyc = T_inst + T_mem + T_com.
If I use a simulator, I can get T_inst, T_mem and T_com directly and find the independent contribution of each component to T_cyc. On real hardware, however, I can only get event counts, i.e. N_inst, N_mem and N_com. So what I have is
T_cyc = a*N_inst + b*N_mem + c*N_com
where a, b and c (the average cycles per event of each type) have to be determined.
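Written out over the individual measurements, determining a, b and c in the least-squares sense under the sign requirement amounts to the problem below. The index i over measurements and the count n are notation introduced here for illustration.

$$
\min_{a,\,b,\,c \,\ge\, 0} \; \sum_{i=1}^{n} \left( T_{cyc,i} - a\,N_{inst,i} - b\,N_{mem,i} - c\,N_{com,i} \right)^2
$$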
I tried solving the problem with lsqnonneg (non-negative least squares) in MATLAB to find a, b and c. For some data sets it returns b and c as exactly zero, which is meaningless here.
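For comparison in R, a non-negative least-squares fit can be sketched with base R alone by minimizing the residual sum of squares under the bounds a, b, c >= 0. This is an illustration, not lsqnonneg itself: it assumes optim with L-BFGS-B box constraints is an acceptable substitute, uses only the th = 2 rows of the data to stay short, and rescales every column first. The names col_scale, rss, grad and coef_nn are made up for this sketch. The CRAN package nnls offers a dedicated solver.

```r
# Subset of the data (rows with th = 2), used only to keep the sketch short
fit <- data.frame(
  tcyc  = c(26219765385, 38080459431, 64572439641, 140168021516, 308925389816),
  tinst = c(1975872868, 3155342008, 6230494010, 13757989992, 31497131498),
  tmem  = c(52449810, 76744867, 137754355, 285524252, 628391048),
  tcom  = c(782964, 1878903, 4351706, 10605705, 26040711)
)
X <- as.matrix(fit[, c("tinst", "tmem", "tcom")])
y <- fit$tcyc

# Rescale so that every column and the response are at most 1
col_scale <- apply(X, 2, max)
Xs <- sweep(X, 2, col_scale, "/")
ys <- y / max(y)

# Residual sum of squares and its gradient in the rescaled problem
rss  <- function(beta) sum((ys - Xs %*% beta)^2)
grad <- function(beta) as.vector(-2 * t(Xs) %*% (ys - Xs %*% beta))

# Box-constrained minimization: every coefficient is kept >= 0
opt <- optim(par = rep(1 / 3, 3), fn = rss, gr = grad,
             method = "L-BFGS-B", lower = 0)

# Map the rescaled solution back to cycles per event
coef_nn <- opt$par * max(y) / col_scale
names(coef_nn) <- colnames(X)
print(coef_nn)
```

A coefficient reported as exactly 0 here sits on the bound, which means the unconstrained fit would prefer a negative value for it.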
Things to notice: N_inst is very large, while N_mem and N_com are considerably smaller in magnitude, and I believe this is why b and c come out as zero.
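One way to test whether the differing magnitudes alone are responsible: least-squares estimates are equivariant under column rescaling, so dividing a count by a constant should only multiply its coefficient by that constant. The sketch below checks this on the th = 2 rows. The scale factors 1e9 and 1e6 are arbitrary choices, and the names raw, scaled, resc and back are made up for this sketch.

```r
# Subset of the data (rows with th = 2), used only to keep the sketch short
fit <- data.frame(
  tcyc  = c(26219765385, 38080459431, 64572439641, 140168021516, 308925389816),
  tinst = c(1975872868, 3155342008, 6230494010, 13757989992, 31497131498),
  tmem  = c(52449810, 76744867, 137754355, 285524252, 628391048),
  tcom  = c(782964, 1878903, 4351706, 10605705, 26040711)
)

# Fit on the raw counts
raw <- coef(lm(tcyc ~ 0 + tinst + tmem + tcom, data = fit))

# Fit again with every predictor brought to a similar order of magnitude
scaled <- transform(fit, tinst = tinst / 1e9, tmem = tmem / 1e6, tcom = tcom / 1e6)
resc <- coef(lm(tcyc ~ 0 + tinst + tmem + tcom, data = scaled))

# Convert the rescaled coefficients back to per-event units
back <- resc / c(1e9, 1e6, 1e6)

# The two rows should agree up to rounding error
print(rbind(raw = raw, back = back))
```

If the two rows agree, the signs and zeros in the solution are a property of the data rather than of the units, and rescaling by itself will not change them.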
Questions:
1. Is this a proper tool to solve such a linear equation system? If not, what else should I try?
2. Is the problem due to the sample size fed to the solver?
3. I see that for most applications the trends of N_cyc, N_inst and N_mem are monotonic, but N_com is non-monotonic. Can this affect the solved values? If so, how can I isolate this component and find its contribution individually?
s <- lm(tcyc ~ tinst+tmem+tcom-1, data=fit)
+0 and -1 are fully equivalent in R. They both suppress the intercept.