Links and descriptions of tools together with a play project for exploring JVM/ART memory problems
Code in this repository is explained here
- Working set: the amount of memory a program requires during some operation. For example, handling HTTP request headers might have a working set of a few kB, but then, as a response, the server creates a 5 MB image in memory before writing it to the response. Here the working set of one response is ~5 MB.
- Garbage collection: Automated process of finding objects that are no longer in use and freeing their memory. GC from now on.
- Allocation rate: How much memory the newly created objects take up within some timeframe. Usually measured in MB/s.
- Heap: The amount of memory available for application data. This is strictly smaller than the memory size of the whole process.
- JMX: Java Management Extensions is a technology for both observing and commanding a JVM.
- Reachable: An object that the program can still get to through references and that therefore won't be collected by GC, either by design or because of a bug.
Two ways to look at memory: rate of allocating data & largest working set.
- Automated way to reclaim one resource: memory
- We might still run out of other resources, say, sockets or file descriptors
- TODO What are root sets, an image + explanation
Problems with memory on garbage-collected runtimes such as JVM and ART can be split into two:
- We don't have enough memory to do all the things we want
- We're creating and abandoning a lot of things, and GC has to do more work to clean up after us.
In theory, our largest possible working set should be as big as the amount of memory we've given to the runtime. If we
go past that, we'll get the dreaded java.lang.OutOfMemoryError. Due to various issues related to actual implementation
choices (fragmentation, generation sizes, humongous allocations), we'll always get this earlier.
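To see how much memory the runtime has actually been given, the JVM can be asked directly. The following is a minimal sketch using the standard `Runtime` API; the class name `HeapInfo` is made up for illustration and is not part of this repository.

```java
// Minimal sketch: reading the heap limits the JVM reports about itself.
final class HeapInfo {
    /** Upper bound for the heap, roughly what -Xmx sets. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently occupied by objects, including garbage not yet collected. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("used %d MB of max %d MB%n",
                usedHeapBytes() >> 20, maxHeapBytes() >> 20);
    }
}
```

The used figure includes objects that are already unreachable but not yet collected, so it is only an upper bound for the current working set.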
When does one get OOM on ART?
Identify potential culprits using jmap and Eclipse Memory Analyzer (EMA). If the
objects kept in memory are legitimate, the working set is too big for the given heap, and the choices are to
increase the heap size or to slim down the objects.
Use EMA to figure out which objects hold most of the memory and see whether they are all needed, or whether some
parts of the data could be replaced with primitive values (IntArray instead of List<Int>).
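The Java counterpart of that replacement is `int[]` instead of `List<Integer>`. The sketch below computes the same sum both ways; the class name is hypothetical. As a rough rule on a typical 64-bit HotSpot with compressed references (an assumption, other runtimes differ), the boxed list costs a reference plus a separate `Integer` object of about 16 bytes per element, while the array costs 4 bytes per element.

```java
import java.util.ArrayList;
import java.util.List;

final class PrimitiveVsBoxed {
    /** Stores n values as boxed Integers: one reference per element plus Integer objects. */
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            values.add(i); // autoboxing; values outside the small Integer cache allocate
        }
        long sum = 0;
        for (int v : values) {
            sum += v; // unboxing on every read
        }
        return sum;
    }

    /** Stores n values in one contiguous int[]: no per-element objects. */
    static long sumPrimitive(int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            values[i] = i;
        }
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```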
The other case is a memory leak. We might be, for example, keeping some map in a thread-local variable even though it was supposed to be cleared after each request.
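A thread-local leak of that kind could look roughly like the sketch below. `RequestContext`, its scratch map and the 1 kB payload are all hypothetical; the point is only that on a pooled thread the map lives as long as the thread unless it is removed explicitly.

```java
import java.util.HashMap;
import java.util.Map;

final class RequestContext {
    // Hypothetical per-thread scratch map. On a pooled thread it stays reachable
    // for as long as the thread lives.
    private static final ThreadLocal<Map<String, byte[]>> SCRATCH =
            ThreadLocal.withInitial(HashMap::new);

    /** Leaky variant: the entry stays reachable after the request is done. */
    static int handleLeaky(String requestId) {
        SCRATCH.get().put(requestId, new byte[1024]);
        return SCRATCH.get().size();
    }

    /** Fixed variant: the thread-local is dropped in finally, even on exceptions. */
    static int handleAndClear(String requestId) {
        try {
            SCRATCH.get().put(requestId, new byte[1024]);
            return SCRATCH.get().size();
        } finally {
            SCRATCH.remove();
        }
    }

    /** Number of entries still held by the current thread. */
    static int retainedEntries() {
        return SCRATCH.get().size();
    }
}
```

In a heap dump, the leaky variant would show up as a map reachable from a `Thread` object through its thread-local table.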
Many of the garbage collection algorithms have parts where they need to stop execution of the application altogether. During this stop-the-world pause nothing happens: no new frames get created, no input gets read and no response from a web server is written back. Occasional pauses of a few milliseconds are rarely noticeable, but if this happens more often, the application suffers.
We might have a case of running out of memory where GC is constantly trying to free memory without much success. This is most often fatal: eventually the application runs out of memory.
The other case is that the application is allocating a lot of short-lived objects that do get collected but cause GC to run often. This is not fatal, but if the application drops frames or a service takes a long time to respond, it doesn't provide a quality experience to users.
- Check GC logs (see Garbage collection logs)
- Check JMX GC indicators (less accurate)
- Profile allocations using Java Flight Recorder / Android Profiler to discover allocation rate.
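The JMX GC indicators mentioned in the list can also be read from inside the process. This is a minimal sketch using the standard `java.lang.management` API; the class name `GcIndicators` is made up, and the collector names printed depend on the JVM and the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcIndicators {
    /** Collections so far across all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; undefined times are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these two numbers periodically and looking at the deltas gives a coarse view of how often GC runs and how much time it takes.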
- Running out of Metaspace
- "Direct memory" allocated through ByteBuffer#allocateDirect, which is used by some libraries like NIO.
- Native memory on JVM
- ART probably has various resources
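On the JVM, direct memory usage is visible through the standard `BufferPoolMXBean`. The sketch below allocates a direct buffer and reads the pool's usage; the class name is hypothetical, and the pool name `"direct"` is what HotSpot reports, which is an assumption for other JVMs.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectMemory {
    /** Bytes used by the "direct" buffer pool, or -1 if the JVM does not report one. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** Allocates a direct buffer and returns the pool usage while it is still alive. */
    static long usedAfterAllocating(int bytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes); // lives outside the Java heap
        long used = directBytesUsed();
        buffer.put(0, (byte) 1); // touch the buffer so it is still reachable at this point
        return used;
    }
}
```

This memory does not count against the heap, so a process can run out of it while heap usage still looks healthy.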
- Allocate less
- Find the biggest allocators through profiling and reduce the amount they allocate
- Add more memory for less frequent GC
On JVM, the first thing to do is to enable GC logs. On JVM 9+, use -Xlog:gc*:file=<GC-FILE-PATH>; with earlier JVM
versions this is dependent on the implementation. On OpenJDK < Java 9, use the following switches:
-verbose:gc
-Xloggc:<GC-FILE-PATH>
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
On Android, you can view the GC logs using Logcat. GC runs are only logged when the GC is considered slow: the application is not running in the background and there is a pause of over 5 ms or the whole GC takes more than 100 ms. Thus, very frequent but short GC runs won't show up in the logs.
For looking into the memory usage of a JVM, we can use jmap. It can give both a coarse-grained look at the heap, showing
what classes are taking the most memory, and a more fine-grained look by taking snapshots of the heap.
For a coarse-grained look, run jmap -histo <PID>. This can give you insights like "we have a lot of strings", but won't
tell you why that is. Sometimes seeing the classes with the most instances is enough to explain why we're using more memory
than expected.
To figure out the why, it's better to take a snapshot of the heap (usually called a heap dump).
To dump all objects that are still reachable, that is, objects that GC would not free:
jmap -dump:live,format=b,file=heap_dump_$(date "+%Y%m%d_%H%M%S").bin <PID>
Change live to all to dump all objects, including unreachable ones. This is rarely needed, but is useful when trying
to figure out issues related to References.
This heap dump can then be analyzed using Eclipse Memory Analyzer to find culprits for potential memory leaks.
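When attaching jmap from the outside is not practical, a dump can also be triggered from inside the process. This is a sketch that assumes a HotSpot-based JVM, since `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific; the class name `HeapDumper` and the file names are made up. The target file must not exist yet, and newer JDKs expect the `.hprof` suffix.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {
    /** Writes a heap dump of this JVM; liveOnly=true corresponds to jmap's "live" option. */
    static Path dump(Path target, boolean liveOnly) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toString(), liveOnly);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Dumps reachable objects into a fresh temporary directory. */
    static Path dumpToTempDir() {
        try {
            Path dir = Files.createTempDirectory("heap-dumps");
            return dump(dir.resolve("heap_dump.hprof"), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```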
Java Mission Control is an open-source tool for both profiling and for observing JMX metrics in a JVM instance.
This can be used to do both ad-hoc profiling and exploring profiles that have been recorded earlier. Both CPU and memory metrics can be recorded.
Profiling can, and usually needs to, be run without the JMC GUI. This feature is called Java Flight Recorder (JFR), and it used to require a commercial license until Java 11 or so. So make sure that using it is fine with the JDK you're using.
Red Hat has a very brief tutorial, and the official documentation has the relevant command line switches.
It's one of the best ways to find out what parts of the application are allocating the most and causing too much time to be spent in GC.
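Besides command line switches, a recording can be started from code through the `jdk.jfr` API. The following sketch assumes JDK 11 or newer with JFR available; the class name, the temp file and the artificial allocation loop are made up for illustration. It enables the two allocation events, allocates some arrays, and counts the recorded allocation events.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class AllocationRecording {
    static volatile Object sink; // keeps the JIT from removing the allocations

    /** Records allocation events around a small workload and returns how many were seen. */
    static long recordAllocationEvents() {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ObjectAllocationInNewTLAB");
            recording.enable("jdk.ObjectAllocationOutsideTLAB");
            recording.start();
            for (int i = 0; i < 10_000; i++) {
                sink = new byte[10_000]; // artificial workload, about 100 MB in total
            }
            recording.stop();

            Path file = Files.createTempFile("allocations", ".jfr");
            recording.dump(file); // the same file could be opened in Java Mission Control
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            return events.stream()
                    .filter(e -> e.getEventType().getName().startsWith("jdk.ObjectAllocation"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The events are samples taken when a thread gets a new allocation buffer or allocates outside of one, so the count is far smaller than the number of objects created.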
When you're trying to understand in more detail what actually is in your heap or where you might be leaking memory, Eclipse Memory Analyzer (EMA) is a handy tool. It will open any heap dump < 64 GB, but it might need a large amount of heap for its initial processing of the dump. Processing creates indices on disk that EMA can later use to load and analyze the dump faster.
Its main functions are leak detection and analyzing how much memory specific objects or classes hold. This can
help us find that one Map that just happens to keep half of all objects alive. A good, although old, introduction to
the tool can be read here
Android Profiler allows you to capture and analyze heap dumps. It seemingly has the same functionality as Eclipse Memory Analyzer and Java Mission Control, with a more modern UI.
In case EMA is needed, Android Profiler can export the heap dump, and the dump can then be converted to a form that EMA understands (the Android SDK ships the hprof-conv tool for this).
Java Microbenchmark Harness (JMH) is the way to do microbenchmarks. It allows you to accurately measure how much the
piece of code you're benchmarking allocates. It's similar to what Measurements.kt in this repository does, but is
more heavy-weight and correct.
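For a quick check without a benchmark harness, per-thread allocation counters can be read directly. This is only a sketch, not what Measurements.kt or JMH actually do: it assumes a HotSpot-based JVM, where the thread MXBean can be cast to `com.sun.management.ThreadMXBean`, and the class name `AllocationMeter` is made up.

```java
import java.lang.management.ManagementFactory;

final class AllocationMeter {
    static volatile Object sink; // publish results here so allocations are not optimized away

    /** Bytes allocated by the current thread while running the task (HotSpot-specific). */
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> sink = new byte[1_000_000]);
        System.out.println("allocated about " + bytes + " bytes");
    }
}
```

A single run like this says nothing about steady-state behavior after JIT compilation, which is one of the things a proper harness takes care of.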
Java Object Layout can be used to figure out how much memory a specific object actually uses. Aleksey Shipilëv, who's a great authority on JVM internals, has a good blog post on using it.
The subclasses of the JDK Reference class, together with ReferenceQueue, provide a way to implement caches and to free non-memory resources.
- WeakReference is a reference that does not prevent the referent from being collected.
- SoftReference is also a reference that does not prevent the referent from being collected. The difference to WeakReference is that GCs have some heuristics that make recently used or created referents less likely to be collected, and thus SoftReference can be used to implement in-memory caches that will be freed if there's need for more memory.
- PhantomReference is a reference that is only enqueued after the referred-to object has been collected. It can be used to free other resources, such as file descriptors or memory allocations, after the object that has been using them is gone.
- ReferenceQueue is a queue where the references will end up after the thing they refer to has been freed. This means that when ReferenceQueue#poll returns a reference, the object it refers to has been freed, and we're free to, say, close the file descriptor the object used.
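The resource-freeing pattern from the list could be sketched as follows. `ResourceTracker`, `Handle` and the integer descriptor are hypothetical stand-ins for a real resource; the sketch only counts released handles where real code would close them. When exactly a reference gets enqueued depends on the GC, so callers should not rely on timing.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class ResourceTracker {
    /** Phantom reference that remembers which (hypothetical) descriptor to release. */
    static final class Handle extends PhantomReference<Object> {
        final int descriptor;

        Handle(Object owner, int descriptor, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.descriptor = descriptor;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The reference objects themselves must stay reachable, otherwise they are never enqueued.
    private final Set<Handle> pending = ConcurrentHashMap.newKeySet();

    /** Starts tracking a descriptor that should be released once owner is collected. */
    void track(Object owner, int descriptor) {
        pending.add(new Handle(owner, descriptor, queue));
    }

    /** Releases handles whose owners were collected; returns how many were released. */
    int releaseCollected() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Handle handle = (Handle) ref;
            pending.remove(handle);
            // A real implementation would close handle.descriptor here.
            released++;
        }
        return released;
    }

    /** Number of descriptors whose owners have not been seen as collected yet. */
    int pendingCount() {
        return pending.size();
    }
}
```

`releaseCollected` would typically be called periodically or from a dedicated thread. Since Java 9 the JDK also offers `java.lang.ref.Cleaner`, which wraps the same mechanism.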