Tuesday, September 23, 2008

Lessons Learned from Scalability Testing

I currently work on the JBoss Portal project, and recently I have been focusing on its performance and scalability. The outcome of that effort is published in this blog entry. It was a prolonged round of testing that required a lot of patience, working with several teams, and learning many new tools and concepts. But now it feels like every bit of it was worth it. Here are the lessons I learned, in no particular order:

  • Performance and scalability testing is not easy. Besides being heavily technical, I believe it's an art, and you mature with experience.
  • Set a goal first, otherwise there is no end to this. Be very concrete: I want my application to handle x concurrent users with a response time of less than y second(s), and it should scale by more than z%.
  • If the application/system whose performance and scalability you are trying to improve was not developed by you, it is a must that the people who developed it are aware of the scope of your work and are willing to help. I had the good fortune of this being the case. I do not even want to imagine how it would have been otherwise.
  • The best way to find bottlenecks is to take several thread dumps and look at them very closely. I found this more revealing and helpful than any profiler (commercial or open source). TDA is a good tool for analyzing thread dumps. I used three profilers: JBoss Profiler, JProfiler, and SAP's Memory Analyzer. (There is a small thread-dump sketch after this list.)
  • Regardless of what people say, open source load testing tools serve most purposes. I found Grinder very straightforward and not that hard to configure, and it has a very vibrant community.
  • For managing a distributed deployment, use or extend a distributed framework. I used SmartFrog components to manage the instances of JBoss AS and to start up the load generator, the Apache web server, and the load balancer; otherwise I would have had to log into around 15 servers to start/stop/configure these components.
  • Try to optimize the most common code execution paths. Optimizing code that crawls but is rarely used won't help you much.
  • The JVM's default garbage collection configuration is pretty good. I toyed with several configurations and none of them seemed to have any significant impact in my use case. My point is that you should play with GC settings only after you have fixed the other problems. (An illustrative set of GC flags appears after this list.)
  • JConsole, which comes free with the Sun JDK, is a pretty good tool for observing GC behavior, memory footprint, and the different logical memory spaces of the heap (eden, perm, etc.). Use it as the first tool for your profiling. (The JMX settings needed to attach JConsole to a remote server are sketched after this list.)
  • It's common knowledge that the database is always a bottleneck. Hibernate's show_sql switch is a very good friend. Make sure you do not see queries being executed more often than they should be. Tuning those bumped up performance quite a bit. (A show_sql configuration fragment appears after this list.)
  • The first thing to turn off after you have done basic optimization is logging. It has a big overhead, simply because logging means writing to a file and therefore disk access. (A sample log4j configuration is sketched after this list.)
  • Don't use NFS for testing.
  • If the servers are part of a cluster, make sure they are all on the same subnet and isolated from any other network traffic.
  • Find a compromise between loading everything into memory and loading everything on demand (lazy loading). The former may show good performance in the beginning but will crawl once memory fills up and a lot of full GCs start happening. The latter will have higher DB utilization. These settings depend on the nature of your application and on how it will be used. There is no universal setting. Let me know if you find one. (A small fetch-strategy sketch appears after this list.)
  • This blog is a WIP. I will add more bullet points as I think of them. :-)
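
Below are a few illustrative sketches for some of the points above; none of them is the exact code or configuration from my tests.

For the thread-dump point: on Sun JDKs a dump can be triggered with kill -3 <pid> or the jstack tool, and the output can be opened in TDA. The minimal Java sketch below captures the same information programmatically by printing the stack of every live thread; it is only meant to show what a dump contains.

    import java.util.Map;

    // Minimal sketch: print the stack trace of every live thread.
    // The same information comes from `jstack <pid>` or `kill -3 <pid>`.
    public class ThreadDumper {

        public static void dumpAllThreads() {
            Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
                Thread t = entry.getKey();
                System.out.println("\"" + t.getName() + "\" state=" + t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
                System.out.println();
            }
        }

        public static void main(String[] args) {
            dumpAllThreads();
        }
    }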
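
On GC tuning: if you do experiment, keep the flags in one place and always enable GC logging so the effect is measurable. The fragment below is only an illustrative starting point using standard Sun JDK flags, not the settings from my tests.

    # run.sh / run.conf fragment -- illustrative starting point only
    JAVA_OPTS="-Xms512m -Xmx1024m \
               -XX:+UseParallelGC \
               -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
               -Xloggc:gc.log"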
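
To point JConsole at a remote test machine, the JVM on that machine has to expose JMX. These are the standard Sun JDK properties for doing so; the port is an arbitrary example, and disabling authentication and SSL is only acceptable on an isolated test network.

    # Allows `jconsole <host>:9004` to attach (example port)
    JAVA_OPTS="$JAVA_OPTS \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9004 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"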
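
For the show_sql point: the switch lives in the Hibernate configuration, and format_sql makes the output readable. An illustrative hibernate.cfg.xml fragment:

    <!-- hibernate.cfg.xml fragment: print every SQL statement Hibernate issues -->
    <property name="hibernate.show_sql">true</property>
    <property name="hibernate.format_sql">true</property>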
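
For the logging point, the cheapest win is raising the log4j threshold so that fewer messages ever reach the file appender. A hedged log4j.properties sketch (JBoss AS configures the same thing through its log4j XML file):

    # log4j.properties sketch -- keep only WARN and above on disk
    log4j.rootLogger=WARN, FILE
    log4j.appender.FILE=org.apache.log4j.FileAppender
    log4j.appender.FILE.File=server.log
    log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
    log4j.appender.FILE.layout.ConversionPattern=%d %-5p [%c] %m%n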
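
On the eager-versus-lazy compromise: with Hibernate/JPA the decision is usually made per association. The entities below are hypothetical (they are not from JBoss Portal) and only show the mechanics: a small, always-needed association fetched eagerly and a large collection left lazy.

    import java.util.List;
    import javax.persistence.*;

    // Hypothetical entities, only to illustrate per-association fetch strategies.
    @Entity
    public class PortalPage {

        @Id
        private Long id;

        // Small and always needed with the page: fetch eagerly.
        @ManyToOne(fetch = FetchType.EAGER)
        private Layout layout;

        // Potentially large and rarely needed up front: load on demand.
        @OneToMany(mappedBy = "page", fetch = FetchType.LAZY)
        private List<Window> windows;
    }

    @Entity
    class Layout {
        @Id
        private Long id;
    }

    @Entity
    class Window {
        @Id
        private Long id;

        @ManyToOne
        private PortalPage page;
    }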