Friday, February 29, 2008

Java Performance Tuning

I Love Binary Search (Page last updated June 2006, Added 2006-06-29, Author Tim Bray, Publisher Tim Bray). Tips:

* Binary search is O(log2(N)). This means that a 1 ms search of a million items will take 1.05 ms to search 2 million, 1.10 ms to search 4 million, and a whole 1.301 ms when you get up to 64 million!
* In absolute terms, a simple O(log2(N)) routine is often faster than a more complex O(1) routine up to any usable size of structure.
* With tree structures you can potentially search really large data sets, too large to fit into memory, using persistent on-disk structures.
* Binary search has minimal memory requirements.
* Binary search is simple enough that you can implement specialized searches that directly reference internal structures (which can avoid unnecessary access overheads, though a JIT might eliminate those anyway).
* Memory is cheap and virtual memory mapping is efficient, paging data in on demand, so memory mapping a file could be an efficient way to access a huge data structure.
* You can write a bunch of complicated code to manage the on-disk vs. in-memory parts of your data, or you can pretend it's all in memory and use memory mapped files to take care of the details - and any increase in RAM transparently improves performance.
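
As a rough illustration of the last few tips, here is a minimal sketch of a binary search that runs directly over a ByteBuffer of sorted 8-byte values. The class name, method names and file layout (big-endian longs stored back to back) are assumptions made for this example, not something taken from the article. Because a MappedByteBuffer is a ByteBuffer, the same search works on an in-memory buffer or on a memory-mapped file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedBinarySearch {
    /**
     * Maps a whole file read-only; the OS pages data in on demand.
     * A single mapping is limited to Integer.MAX_VALUE bytes.
     */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /**
     * Binary search over sorted 8-byte longs stored back to back in buf
     * (assumed layout). Returns the record index of key, or -1 if absent.
     */
    static int indexOf(ByteBuffer buf, long key) {
        int lo = 0;
        int hi = buf.limit() / Long.BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;                  // unsigned shift avoids int overflow
            long value = buf.getLong(mid * Long.BYTES); // absolute read, position untouched
            if (value < key) {
                lo = mid + 1;
            } else if (value > key) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }
}
```

With a real file, `MappedBinarySearch.indexOf(MappedBinarySearch.map(path), key)` would touch only about log2(N) pages per lookup, leaving the operating system to decide which pages stay in RAM.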

http://weblogs.java.net/blog/jfarcand/archive/2006/05/tricks_and_tips_1.html
Why you must handle NIO OP_WRITE (Page last updated May 2006, Added 2006-06-29, Author Jean-Francois Arcand, Publisher java.net). Tips:

* When the write buffer is full, SocketChannel.write will return 0, and this needs to be handled in such a way that avoids CPU cycling.
* Obtaining a globally pooled object seems to be quicker than getting a thread local one.
* Checking select to see if a channel is writable in a separate thread can be more efficient than merging the select to the main select call then spinning off to a write thread (and maintaining the writable data accessible across threads) when it is writable - assuming clients are not normally slow consumers.
* Check for 0 (slow client) and -1 (disconnected client) from Selector.select() calls to handle slow/bad clients.
* Time 0 returns (slow client) and check for -1 (disconnected client) from Channel.read. If there are too many 0 returns, consider explicitly closing the client, as it is producing data too slowly and is consuming resources.
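
The first tip can be sketched roughly as follows, assuming a non-blocking channel that is already registered with a Selector. The class and method names (DeferredWriteSketch, writeOrDefer) are hypothetical and are not taken from the article or from Grizzly. The idea is that a 0 return switches on OP_WRITE interest and hands control back to the selector instead of retrying in a loop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.WritableByteChannel;

class DeferredWriteSketch {
    /**
     * Writes as much of buf as the channel accepts right now.
     * Returns true if buf was fully written. If write() returns 0
     * (send buffer full), registers interest in OP_WRITE and returns
     * false so the caller can retry once the selector reports the
     * channel as writable, rather than spinning on the CPU.
     */
    static boolean writeOrDefer(SelectionKey key, ByteBuffer buf) throws IOException {
        WritableByteChannel channel = (WritableByteChannel) key.channel();
        while (buf.hasRemaining()) {
            int written = channel.write(buf);
            if (written == 0) {
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                return false;
            }
        }
        // Everything was sent: stop asking for writability notifications,
        // otherwise the selector would wake up continuously.
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        return true;
    }
}
```

When the selector later reports the key as writable, the caller would invoke writeOrDefer again with the same buffer, which still holds the unsent bytes.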

http://weblogs.java.net/blog/jfarcand/archive/2006/06/tricks_and_tips.html
Why SelectionKey.attach() is evil (Page last updated June 2006, Added 2006-06-29, Author Jean-Francois Arcand, Publisher java.net). Tips:

* Under load you might end up with 10 000 connections, so 10 000 active SelectionKeys. If they all have a ByteBuffer or other attachment, then a lot of memory will be consumed, reducing your scalability and having fun eating all your memory.
* Use SelectionKey.channel() to retrieve the SocketChannel, rather than having separate support in your framework.
* Leave a socket read on its own thread for a configurable length of time, assuming (optimistically) that the read will complete soon enough and allow you to avoid moving the data across to the generic selector thread.
* If a socket read is taking too long to get all the data, you can move the data (ByteBuffer) across to the generic selector thread (e.g. attached to the selector). In tests with Grizzly, on a slow network, with broken clients, etc., blocking read threads scale better than moving a dormant ByteBuffer to the main selector thread.
* Aim to have one ByteBuffer per Thread, not per selector - this significantly improves scalability by not overloading the VM with dormant ByteBuffers.
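
One way to get a buffer per thread rather than per connection is a ThreadLocal. The following is only a sketch under that assumption: the class name, the acquire() method and the 8192-byte size are illustrative choices, not the approach Grizzly itself takes.

```java
import java.nio.ByteBuffer;

class PerThreadBuffer {
    // Hypothetical buffer size chosen for the example.
    static final int BUFFER_SIZE = 8192;

    // One buffer per worker thread, created lazily on first use,
    // instead of one buffer attached to each of 10 000 SelectionKeys.
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(BUFFER_SIZE));

    /** Returns the calling thread's buffer, cleared and ready for a new read. */
    static ByteBuffer acquire() {
        ByteBuffer buf = BUFFER.get();
        buf.clear();
        return buf;
    }
}
```

With this arrangement the number of live buffers is bounded by the size of the worker thread pool, not by the number of open connections.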

http://www.ddj.com/dept/java/188700760
Java 5 & 6 features (Page last updated June 2006, Added 2006-06-29, Author Matt Love, Publisher DrDobbs). Tips:

* The String concatenation operator (+, +=) should be avoided.
* StringBuilder is faster than StringBuffer, but not synchronized.
* Autoboxing imposes an order of magnitude overhead.
* Java 6 provides useful speedups, especially if escape analysis optimizes away object creation.
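
To make the first three tips concrete, here is a small sketch contrasting the two styles. The class and method names are made up for this example, and it makes no timing claims of its own: each pair of methods returns the same result, and the difference lies only in how many temporary objects are created along the way.

```java
class StringAndBoxingSketch {
    /** Builds "0,1,2,...,n-1" using a single StringBuilder. */
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    /** Same result with +=, which creates a new String on each append. */
    static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                s += ",";
            }
            s += i;
        }
        return s;
    }

    /** Sums 0..n-1 with a primitive accumulator: no objects involved. */
    static long sumPrimitive(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    /** Same sum with a boxed accumulator: each += unboxes and re-boxes. */
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }
}
```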

http://www.javaworld.com/javaworld/jw-06-2006/jw-0619-tuning.html
Common Java EE performance problems (Page last updated June 2006, Added 2006-06-29, Author Steven Haines, Publisher Javaworld). Tips:

* If the garbage collector cannot free enough memory to hold the new object, then it throws an OutOfMemoryError.
* Out-of-memory errors are associated with: application server crashes; degraded performance; or seemingly endless repeated garbage collections that nearly halt processing and usually lead to an application server crash.
* A Java memory leak is the result of maintaining a lingering reference to an unused object: you are finished using an object, but because one or more other objects still reference it, the garbage collector cannot reclaim its memory.
* The following settings are recommended for monitoring garbage collection in a Sun JVM: -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. Sun reports the overhead for this level of logging at approximately 5 percent.
* To determine the cause of a memory leak, run your application in a memory profiler and: Execute the use-case; Take a snapshot of the heap; Execute the use-case again; Take another snapshot of the heap; Compare the two heap snapshots and look for objects that should not remain in the heap after executing the use-case.
* Wait until your application reaches a steady state prior to performing any trend analysis on the heap.
* Memory leaks from Web requests usually come from sessions. Look at: Page scope; Request scope; Session scope; Application scope; Static variables; Long-lived class variables; the HttpServletRequest object; the servlet init() method.
* Page- or request-scoped variables are automatically cleaned up before a web request completes.
* A workaround for large sessions is to increase the heap or decrease the session time-out.
* Store the minimum information in session-scoped variables.
* Explicitly invalidate sessions when users log out.
* Tune session time-out.
* Classes are loaded into Perm Space, and if this fills up, a full GC is triggered. With -noclassgc, no classes will be deleted from Perm space. This can lead to continual full GCs for no apparent reason.
* Perm space of 128m is reasonable, 256m reasonable if you have a particularly large number of (generated) classes. 512m suggests an architectural problem.
* After tuning memory, the tuning option with the biggest impact in an application server is the size of the execution thread pool.
* Too small a thread pool will leave requests waiting in the queue for processing; too large and the CPU could spend too much time context switching.
* Tune the thread pool size by looking at CPU utilization, the thread pool utilization and the number of pending requests (queue depth).
* The recommendation for a starting point when tuning thread pool size is between 50 and 75 threads per CPU.
* Garbage collection causes CPU spikes, while saturated thread pools (thread pool size too large) cause consistently high CPU utilization.
* A saturated CPU results in abysmal performance across the board, and performance is better if a request arrives, waits in a queue, and then is processed optimally. Aim for CPU between 75 and 85 percent utilized during normal user load.
* If an application is using too much CPU after thread pool tuning, you need to either tune the application code with a code profiler, or add additional hardware.
* Tune database connection pool so that utilization is running at 70 to 80 percent during average load and threads are rarely observed waiting for a connection. But avoid overloading the database.
* Make sure the prepared statement cache in the database is sufficiently large to cache all plans.
* Tune your cache sizes to optimize successful cache hits. Avoid cache thrashing - a low ratio of hits to misses on the cache.
* Minimize the number of transaction rollbacks.
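
The thread pool tips above mention three numbers to watch: pool size, pool utilization and queue depth. As a rough sketch, these can be read from a plain java.util.concurrent.ThreadPoolExecutor as shown below. An application server exposes its own pool and its own monitoring, so the class and method names here are purely illustrative, and the 50 to 75 threads per CPU figure is the starting point quoted in the tips, not a value this code validates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingSketch {
    /** Starting pool size: threadsPerCpu (the tips suggest 50 to 75) times the CPU count. */
    static int startingPoolSize(int cpus, int threadsPerCpu) {
        return cpus * threadsPerCpu;
    }

    /** Fixed-size pool with a bounded queue for requests waiting on a free thread. */
    static ThreadPoolExecutor newPool(int size, int queueCapacity) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Approximate fraction of pool threads currently running a task. */
    static double utilization(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    /** Number of requests waiting in the queue (pending requests). */
    static int queueDepth(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }
}
```

A call such as `PoolSizingSketch.startingPoolSize(Runtime.getRuntime().availableProcessors(), 50)` gives an initial size. From there, utilization stuck at 1.0 together with a growing queue depth suggests the pool is too small, provided the CPU is not already saturated.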

http://www.developer.com/java/other/article.php/3609776
Using the Full-Screen Exclusive Mode API in Java (Page last updated May 2006, Added 2006-06-29, Author Richard G. Baldwin, Publisher developer.com). Tips:

* The Full-Screen Exclusive Mode API allows you to write Java programs that take over the entire screen.
* Active rendering is more responsive than passive rendering, and is usually used with full screen mode applications.
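
A minimal sketch of the two tips together, using the standard AWT classes (GraphicsDevice, Frame, BufferStrategy). The class name, the run() method and what gets drawn are illustrative assumptions, not code from the article. The program takes over the screen, then draws each frame itself in a loop (active rendering) instead of waiting for AWT to call paint().

```java
import java.awt.Color;
import java.awt.Frame;
import java.awt.Graphics;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferStrategy;

class FullScreenSketch {
    /** Draws the given number of frames full screen; returns false if that is not possible. */
    static boolean run(int frames) {
        if (GraphicsEnvironment.isHeadless()) {
            return false; // no display to take over
        }
        GraphicsDevice device =
                GraphicsEnvironment.getLocalGraphicsEnvironment().getDefaultScreenDevice();
        if (!device.isFullScreenSupported()) {
            return false;
        }
        Frame frame = new Frame(device.getDefaultConfiguration());
        frame.setUndecorated(true);
        frame.setIgnoreRepaint(true); // active rendering: ignore OS repaint requests
        try {
            device.setFullScreenWindow(frame); // enter full-screen exclusive mode
            frame.createBufferStrategy(2);     // double buffering
            BufferStrategy strategy = frame.getBufferStrategy();
            for (int i = 0; i < frames; i++) {
                Graphics g = strategy.getDrawGraphics();
                try {
                    g.setColor(Color.BLACK);
                    g.fillRect(0, 0, frame.getWidth(), frame.getHeight());
                    g.setColor(Color.WHITE);
                    g.drawString("frame " + i, 50, 50);
                } finally {
                    g.dispose();
                }
                strategy.show(); // flip the back buffer to the screen
            }
        } finally {
            device.setFullScreenWindow(null); // always restore windowed mode
            frame.dispose();
        }
        return true;
    }
}
```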
