
I am very new to Docker and recently wrote a Dockerfile to containerize a mathematical optimization solver called SuiteOPT. However, when testing the solver on a few test problems I am seeing slower performance inside Docker than outside of it. For example, one demo linear-programming problem (demoLP.py) takes ~12 seconds to solve on my machine, but in Docker it takes ~35 seconds. I have spent about a week looking through blogs and Stack Overflow posts for solutions, but no matter what changes I make the timing in Docker is always ~35 seconds. Does anyone have any ideas about what might be going on, or could anyone point me in the right direction?
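For reference, this is roughly how I am timing the two runs (a sketch: suiteopt-demo is just a placeholder for the image name, and I am assuming demoLP.py sits in the image's working directory):

 time python demoLP.py                                  # on the host: ~12 [s]
 docker build -t suiteopt-demo .
 time docker run --rm suiteopt-demo python demoLP.py    # in the container: ~35 [s]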

Below are links to the Docker Hub and PyPI pages for the optimization solver:

Docker Hub for SuiteOPT

PyPI page for SuiteOPT

Edit 1: Adding an additional thought due to a comment from @user3666197. While I did not expect SuiteOPT to perform as well inside a Docker container, I was mainly surprised by the ~3x slowdown for this demo problem. Perhaps the question can be restated as follows: How can I determine whether this slowdown is caused purely by the fact that I am executing CPU-, RAM-, and I/O-intensive code inside a Docker container, rather than by some other issue with the configuration of my Dockerfile?

Note: The purpose of this containerization is to provide a simple way for users to get started with the optimization software in Python. While the optimization software is available on PyPI, there are many non-Python dependencies that could cause installation problems for people wishing to use it.

  • Besides the undoubtedly positive benefits of using containers for reasonably repetitive or mass deployment of pre-configured, ready-to-use eco-systems in an almost COTS fashion, what has led you to the assumption that such docker-containerisation technology will execute { CPU- | RAM- | I/O }-intensive code inside an abstracted container without any negative externalities - be it the add-on costs of running the abstraction / containerisation engine ( read: slower ), plus the loss of the L1/L2/L3-cache-efficient reuse effects that will not happen inside the container? Commented Feb 19, 2020 at 22:42
  • You may have to dig deeper by analyzing your specific case with a tool like perf. For example, in the article Another reason why your Docker containers may be slow the performance was bad due to a library used for logging. To visually inspect what perf record ... captures, check Flame Graphs and Netflix FlameScope ( a command sketch follows this list ). Commented Feb 20, 2020 at 9:36
  • Always welcome @chrundle. As Anastasios has posted above, the awfully adverse inefficiency comes from the immense cross-dependency of cgroups sharing - the biggest sin in performance hunting in distributed systems. Let me propose an A / B / C test - run the same workload on the bare metal [A], next inside a VM on the same bare-metal device ( you may use a VMware tool for "packing" the bare metal as-is into a VM, plus VMware Player for private use ) [B], and finally as a container [C]. If performance matters, the VM-isolation vs Docker-shared-cgroups data will tell you. Commented Feb 20, 2020 at 10:30
  • I've taken some notes here: github.com/tgogos/flamescope_test. I'm not sure but they might help :-) Commented Feb 21, 2020 at 14:04
  • @user3666197 Your A / B / C test sounds like a good idea. I will try to get around to setting up and running the B test soon. Commented Feb 21, 2020 at 14:05
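Following up on the perf / flame-graph suggestion above, a minimal sketch of how such a capture could look (assumptions: perf is installed where the workload runs, the container has the privileges perf needs, and stackcollapse-perf.pl / flamegraph.pl are taken from Brendan Gregg's FlameGraph repository):

 perf record -F 99 -g -- python demoLP.py       # sample stacks at 99 Hz while the demo runs
 perf script > out.perf                         # dump the samples as text
 ./stackcollapse-perf.pl out.perf > out.folded  # fold the stacks ( FlameGraph repo script )
 ./flamegraph.pl out.folded > demoLP.svg        # render the flame graph; compare host vs container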

1 Answer


Q : How can I determine whether this slowdown is caused purely by the fact that I am executing CPU-, RAM-, and I/O-intensive code inside a Docker container, rather than by some other issue with the configuration of my Dockerfile?

The battlefield :

[ Figure: performance-analysis flowchart ( Credits: Brendan GREGG ) ]

Step 0 : collect data about the Host-side run of the processing :


 mpstat -P ALL 1 ### 1 [s] sampled CPU counters in one terminal-session (may log to file) 

 python demoLP.py # <TheWorkloadUnderTest> expected ~ 12 [s] on bare metal system 
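A minimal way to combine the two commands above in a single shot (a sketch; mpstat comes from the sysstat package and the log-file name is arbitrary):

 mpstat -P ALL 1 > mpstat_host.log &     # per-CPU counters, 1 [s] sampling, logged to a file
 MPSTAT_PID=$!
 time python demoLP.py                   # the workload under test, ~12 [s] expected on bare metal
 kill $MPSTAT_PID
 # later: compare %usr / %sys / %iowait / %steal per CPU against the in-container run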

Step 1 : collect data about the same processing but inside the Docker-container

plus review the policies set in --cpus and --cpu-shares ( potentially --memory + --kernel-memory, if used )
plus review the effects shown in throttled_time ( ref. Pg.13 )

cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat
nr_periods 0
nr_throttled 0
throttled_time 0 <-------------------------------------------------[*] increasing?

plus review the Docker-container's workload view-from-outside the box by :

cat /proc/<_PID_>/status | grep nonvolu     ### in one terminal session
nonvoluntary_ctxt_switches: 6 <------------------------------------[*] increasing?

systemd-cgtop ### view <Tasks> <%CPU> <Memory> <In/s> <Out/s> 
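Pulling the pieces of Step 1 together, a hedged sketch (the image and container names are placeholders, and the flag values are only examples of explicit policies, not recommendations):

 # start the container under explicit, known CPU / RAM policies
 # --cpus       : hard cap ( here at most 2 CPUs worth of time per period )
 # --cpu-shares : relative weight when the CPUs are contended
 # --memory     : RAM cap ( add --kernel-memory only if you actually use it )
 docker run -d --name suiteopt-demo-run \
        --cpus="2.0" --cpu-shares=1024 --memory=4g \
        suiteopt-demo python demoLP.py

 # the workload's PID as seen from the host, for the /proc/<_PID_>/status check above
 PID=$(docker inspect --format '{{.State.Pid}}' suiteopt-demo-run)
 grep nonvolu /proc/$PID/status          # nonvoluntary_ctxt_switches increasing?

 # the cpu.stat counters shown above are read inside the container
 # ( or from the container's own cgroup directory on the host )
 docker exec suiteopt-demo-run cat /sys/fs/cgroup/cpu,cpuacct/cpu.stat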

Step 2 :

Check the observed indications against the absolute CPU-cap policy and the CPU-shares policy that were set, using the flowchart above.
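A quick way to read back which policies the container is actually running under before walking the flowchart (a sketch; the fields live in docker inspect's HostConfig section and the container name is a placeholder):

 docker inspect --format \
   'NanoCpus={{.HostConfig.NanoCpus}}  CpuShares={{.HostConfig.CpuShares}}  CpuQuota={{.HostConfig.CpuQuota}}  CpuPeriod={{.HostConfig.CpuPeriod}}  Memory={{.HostConfig.Memory}}' \
   suiteopt-demo-run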
