
I'm using blitz.io to basically blast my site with traffic and see how my 'learning sysadmin' setup holds up under load. I realize this is about as subtle and realistic as a sledgehammer, but I really just wanted a comparison point when I change a setting.

My setup, in order of communication (all on Amazon EC2, running the Amazon Linux AMI):

  • 1x Amazon Elastic Load Balancer
  • 2x nginx servers, which upstream to...
  • 1x php-fpm server (soon to be 2x), which connects to...
  • 1x RDS MySQL server

Everything is inside a VPC.

For my testing, the site I'm serving is a WordPress installation with W3 Total Cache.

Originally I had 1x nginx + 1x php-fpm (1x RDS is implied), all as micro instances. I believe I got 850 req/sec before I started getting a lot of timeouts (response times > 1000 ms).

During that test, CPU went to 100% on both the php-fpm and nginx instances. So...

I then added a second nginx server. After that, I converted both nginx servers to 'large' instances, as well as the php-fpm server.

I multiplied my php-fpm pool settings by 5 and, much to my dismay, the test results were nearly identical... the only difference this time is that CPU and memory peaked at about 5% on all three servers. It's like hardly ANY resources were being used. I looked through my logs for errors and didn't really see much...

I have looked over my settings many times and I know I'm missing something huge...

The WordPress section of the site is completely cacheable... if I update anything on it, I clear the cache. There's a second half to my site, but it's ALL static content, no DB queries. I do use a PHP 'loader' script that pulls in various content from include files, but that's it... pretty lightweight.

I have heard something about ulimit (file descriptor limits)... could that be a problem?
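
For reference, this is roughly how I'd check what descriptor limits the worker processes actually inherited, and where they could be raised. The commands and directive names are standard, but the 65535 values below are just placeholders, not something I've tested:

    # descriptor limits the running workers actually got (PIDs looked up by name)
    cat /proc/$(pgrep -o nginx)/limits   | grep 'open files'
    cat /proc/$(pgrep -o php-fpm)/limits | grep 'open files'

    # nginx can raise its own limit in nginx.conf (main context):
    #   worker_rlimit_nofile 65535;
    # php-fpm has a per-pool equivalent in www.conf:
    #   rlimit_files = 65535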

I'm trying to simulate 6,000 users over the course of 1 minute.

My configs

Server 1: Nginx

nginx.conf

    user www www;
    worker_processes auto;

    events {
        worker_connections 1024;
    }

    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;

    http {
        server_tokens off;
        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;

        keepalive_timeout 65;
        sendfile on;
        tcp_nopush on;
        tcp_nodelay off;

        gzip on;
        gzip_http_version 1.0;
        gzip_disable "msie6";
        gzip_comp_level 5;
        gzip_buffers 16 8k;
        gzip_min_length 256;
        gzip_proxied any;
        gzip_vary on;
        gzip_types
            # text/html is always compressed by HttpGzipModule
            text/css
            text/plain
            text/x-component
            application/javascript
            application/json
            application/xml
            application/xhtml+xml
            application/x-font-ttf
            application/x-font-opentype
            application/vnd.ms-fontobject
            image/svg+xml
            image/x-icon;

        ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
        ssl_ciphers RC4:HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;

        upstream php {
            # ip_hash;
            server 10.0.0.210:9001;
        }

        include sites-enabled/*;
    }

Relevant nginx settings:

/etc/nginx/conf/cache-descriptors.conf

    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

Server 2: php-fpm

php-fpm.conf

    include=/etc/php-fpm.d/*.conf

    [global]
    pid = /var/run/php-fpm/php-fpm.pid
    error_log = /var/log/php-fpm/error.log
    log_level = notice
    emergency_restart_threshold = 5
    emergency_restart_interval = 2

php.ini

I honestly didn't change much in php.ini except for the CGI path setting (cgi.fix_pathinfo) to guard against the well-known nginx path-info exploit. Maybe one or two more settings, but it's vanilla for the most part.

/etc/php-fpm.d/www.conf

    [www]
    listen = 9001
    ; # nginx-master, nginx-2
    listen.allowed_clients = 10.0.0.248,10.0.0.155
    user = www
    group = www

    pm = dynamic
    pm.max_children = 500
    pm.start_servers = 150
    pm.min_spare_servers = 50
    pm.max_spare_servers = 250
    pm.max_requests = 1200
    request_terminate_timeout = 30

    slowlog = /var/log/php-fpm/www-slow.log
    security.limit_extensions = .php

    php_flag[display_errors] = off
    php_admin_value[error_reporting] = 0
    php_admin_value[error_log] = /var/log/php-fpm/www-error.log
    php_admin_flag[log_errors] = on
    php_admin_value[memory_limit] = 128M
    php_value[session.save_handler] = files
    php_value[session.save_path] = /var/lib/php/session
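
One thing I'm considering, to confirm whether the pool itself is the bottleneck, is enabling php-fpm's built-in status page. The directive names are standard, but the path and the allowed network below are just examples for my VPC, not tested values:

    ; added to /etc/php-fpm.d/www.conf
    pm.status_path = /fpm-status

    # matching location block on the nginx servers, kept internal
    location = /fpm-status {
        allow 10.0.0.0/24;
        deny all;
        include fastcgi_params;
        fastcgi_pass php;
    }

The 'listen queue' and 'max children reached' counters on that page should show whether requests are piling up at php-fpm or never getting there at all.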

If anyone has any ideas, it would be greatly appreciated. I'm definitely hitting some kind of 'invisible limit' that I'm not seeing.

Thanks!

PS: If you have a better way to benchmark, I'm all ears...

A screenshot of the RDS stats is in the comments below (I kept it at micro).

Here is what happened with the test: [blitz.io results screenshot]

  • Did you increase your database server? What are the database server stats like during this period? Commented Mar 31, 2013 at 15:09
  • I didn't increase RDS because there's hardly a dent in it (I assume because caching is doing all the work). Unless I'm reading this wrong (it's the 2nd big spike.. first one was from my first test): i.imgur.com/UszMGWT.png Commented Mar 31, 2013 at 15:17
  • Are you able to confirm that the limit is not anything to do with the capacity of the client performing the test rather than the server doing the work? Perhaps you ran out of available throughput for example. Commented Mar 31, 2013 at 15:31
  • Blitz.io is meant for load testing up to 100,000, so I don't think 6,000 would be a problem :/ Commented Mar 31, 2013 at 15:32
  • This is what happened i.imgur.com/72Aufcb.png Commented Mar 31, 2013 at 15:41

1 Answer


Yes, it seems you're hitting limits: exhaustion of the ephemeral port range, the maximum number of open file descriptors, depletion of sockets, memory being swapped out to disk, or even running out of socket memory.
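
For example, those limits can be inspected with standard Linux tools while a test is running; none of this is specific to your setup:

    sysctl net.ipv4.ip_local_port_range   # ephemeral port range
    sysctl fs.file-max                    # system-wide file descriptor cap
    ulimit -n                             # per-process descriptor limit in this shell
    ss -s                                 # socket summary, including TIME-WAIT counts
    vmstat 1 5                            # memory pressure / swapping during the test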

Look at /var/log/messages, dmesg and /proc/net/sockstat for clues about where your bottleneck is.
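
Something along these lines, run on each box during a test, usually narrows it down; the grep patterns are just examples of the messages these limits tend to produce:

    dmesg | egrep -i 'conntrack|socket memory|too many open files|overflowed'
    tail -n 100 /var/log/messages
    cat /proc/net/sockstat
    netstat -s | egrep -i 'overflow|drop|listen'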

Without logs it's hard to be of help.
