Building Memory-efficient Java Applications
Nick Mitchell, Gary Sevitsky (presenter)
IBM TJ Watson Research Center, Hawthorne, NY, USA

OOPSLA 2008, October 19–23, 2008, Nashville, Tennessee, USA. Copyright is held by the author/owner(s). ACM 978-1-60558-220-7/08/10.

Quiz
(Assume a 32-bit platform throughout.)

Small boxes? Q: What is the size ratio of Integer to int?
a. 1:1   b. 1.33:1   c. 2:1   d. ?
A: d. 4:1

Small things? Q: How many bytes in an 8-character String?
a. 8   b. 16   c. 28   d. ?
A: d. 64

Bigger? Better? Q: Which of the following is true of HashSet relative to HashMap?
a. does less, smaller   b. does more, smaller   c. similar amount of functionality, same size   d. ?
A: d. does less, larger

Small collections? Q: Put the following 2-element collections in size order: ArrayList, HashSet, LinkedList, HashMap.
A: ArrayList, LinkedList, HashMap, HashSet

Collections? Q: How many live collections are in a typical heap?
a. between five and ten   b. tens   c. hundreds   d. ?
A: d. tens or hundreds of thousands, even millions

Roadmap
• Quiz
• Background & myths
• Memory health
• Patterns of memory usage
• Process
Case studies throughout, with JVM background mixed in.

Background
• Our group has been diagnosing memory and performance problems in large Java systems for more than 8 years
• Worked with dozens of applications: open source, large commercial applications, software products
• Servers, clients, applications, frameworks, generated code, etc.
• Common thread: it is easy to build systems that consume large memory resources relative to the work accomplished

The big pile-up
Heaps are getting bigger:
• Grown from 500M to 2-3G or more in the past few years
• But not necessarily supporting more users or functions
Surprisingly common:
• Requiring 1G of memory to support a few hundred users
• Saving 500K of session state per user
• Requiring 2M for a text index per simple document
• Creating 100K temporary objects per web hit
Consequences for scalability, power usage, and performance.

The big pile-up
• Not a reflection on the quality of programmers; many are expert
• More abstraction means less awareness of costs
• It is easy for costs to pile up, just piecing together building blocks
• The iceberg effect: the application sits on framework, upon framework, upon framework

Myths
Things are fine:
• Objects (or Strings, HashMaps, ...) are cheap
• Frameworks are written by experts, so they've been optimized (for my use case!)
• The JIT and GC will fix everything
Things are not fine:
• "I knew foo was expensive; I didn't know it was this expensive!"
• It's no use: O-O plus Java is always expensive
• Efficiency is incompatible with good design

Goals
It's not a lost cause!
• Raise awareness of costs
• Give you a way to make informed tradeoffs
Patterns of memory usage
• Data types. High overhead: delegation, fields, base classes. High data: duplication, representation.
• Collections. Many, with high overhead: empty, small, special purpose. Unused space. Large, with high per-entry cost. Special purpose.
• Lifetime. Short: complex temporaries. Long: in-memory designs, space vs. time, correlated lifetime.

Roadmap
• Quiz
• Background & myths
• Memory health
• Patterns of memory usage:
  • Modeling your data types
  • Modeling relationships
  ... break ...
  • More relationships
  • More data type modeling
  • Object lifetime
• Process

Note about measurements
Measurements shown are estimates obtained from experiments on a sampling of different JVMs. In practice costs will vary across JVMs. Measures we report may be subject to errors due to data collection, analysis tools, or interpretation of results. They are intended only to illustrate the nature of memory problems.
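One simple way to reproduce estimates like these is to difference the used heap around a bulk allocation. The harness below is a rough sketch of that idea, not a precise tool: System.gc() is only a hint to the collector, and results will vary by JVM, but with a large entry count the per-entry figure is usually stable enough to illustrate the costs discussed next.

import java.util.TreeMap;

public class PerEntryCost {
    // Approximate used-heap reading; GC is only encouraged, never guaranteed.
    static long usedHeap() throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 100000;                 // a large count averages out noise
        long before = usedHeap();
        TreeMap<Double, Double> map = new TreeMap<Double, Double>();
        for (int i = 0; i < n; i++) {
            map.put((double) i, (double) i);  // note: each put boxes two Doubles
        }
        long after = usedHeap();
        // Referencing the map here keeps it live across the second reading.
        System.out.println(map.size() + " entries, ~"
                + (after - before) / n + " bytes per entry");
    }
}

On the 32-bit JVMs sampled in this tutorial, a run like this should land in the neighborhood of the 88 bytes per entry derived for TreeMap<Double, Double> below.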
Memory health

The idea of health: a 100-entry TreeMap
• Schematic of a data structure: distinguish data type regions from collection regions
• A region includes all of its implementation classes
• Collection region: TreeMap, x1 = 3.9KB, with an average fanout of 100 to each of the key and value regions
• Data type regions: Double keys x100 = 2.3KB; Double values x100 = 2.3KB
• Total cost: 8.6KB. What are the bytes accomplishing? How much is actual data vs. overhead?

Data type health: one Double (24 bytes)
• Data: 8 bytes (the double itself). JVM-imposed overhead: 16 bytes.
• 33% is actual data; 67% is representation overhead
• From one 32-bit JVM; varies with JVM and architecture

Data type health: an 8-character String (64 bytes)
• String object: JVM overhead 16 bytes, bookkeeping fields 12 bytes, pointer to the char[] 4 bytes
• char[] object: JVM overhead 16 bytes, character data 16 bytes
• Only 25% is the actual data; 75% is overhead of representation
• The String would need 96 characters for the overhead to be 20% or less

Collection health: a 100-entry TreeMap (x1 = 3.9KB)
• How does a TreeMap spend its bytes? Collections have fixed and variable costs
• TreeMap fixed overhead: 48 bytes. TreeMap$Entry per-entry overhead: 40 bytes.

Data structure health: the whole 100-entry TreeMap
• The TreeMap region is 100% overhead; each Double region is 67% overhead
• 82% overhead overall
• The design enables updates while maintaining sort order. Is it worth the price?

Data structure health: alternative implementation (100 entries)
• Two parallel sorted arrays: double[] keys (x1 = 816 bytes) and double[] values (x1 = 816 bytes), with binary search against the sorted key array
• Less functionality: suitable for a load-then-use scenario
• 2% overhead
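A minimal sketch of that alternative follows; the class and method names are ours, not from the case study. All overhead beyond the two array headers disappears, at the price of giving up cheap insertion and deletion.

import java.util.Arrays;

/**
 * Load-then-use alternative to TreeMap<Double, Double>: two parallel
 * sorted arrays plus binary search. No per-entry objects, ~2% overhead.
 * Assumes the data is loaded in one batch and not updated afterwards.
 */
public class SortedDoubleMap {
    private final double[] keys;
    private final double[] values;

    /** Keys must already be sorted, with values[i] belonging to keys[i]. */
    public SortedDoubleMap(double[] sortedKeys, double[] values) {
        this.keys = sortedKeys;
        this.values = values;
    }

    /** Returns the value for the key, or NaN if the key is absent. */
    public double get(double key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : Double.NaN;
    }
}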
Health as a gauge of scalability: TreeMap at 10,000 entries
• TreeMap x1 = 391KB; Double keys x10000 = 234KB; Double values x10000 = 234KB
• Overhead is still 82% of the cost: overhead is not amortized in this design
• High constant cost per element: 88 bytes
• Plotting data vs. overhead from 1 to 400K entries shows the overhead fraction holding steady at 82%

Health as a gauge of scalability: alternative implementation
• For the sorted double[] implementation, overhead starts out low (2%) and quickly goes to ~0% as entries grow
• Cost per element is 16 bytes, pure data

Summary: health
Distinguish actual data from the overhead of representation:
• Overhead from your data types
• Overhead from your collection choices, fixed vs. variable
• Many other ways to break down overhead costs: JVM object overhead, delegation costs, empty array slots, unused fields, duplicate data, ...
Can help answer:
• How much room for improvement?
• Is the functionality worth the price?
• Which overhead will be amortized? If constant, how large?

Modeling your data types
• High-overhead data types: object and delegation costs

Background: the cost of objects
The JVM and hardware impose costs on objects, and they can be substantial for small objects. Headers enable functionality and performance optimizations; this JVM uses 8-byte alignment. Costs vary with JVM and architecture. From an experiment on one 32-bit JVM:
• Boolean: 16 bytes (header 12, boolean 1, alignment 3)
• Double: 24 bytes (header 12, double 8, alignment 4)
• char[2]: 24 bytes (header 16, 2 chars 4, alignment 4)

The cost of delegation
Example: an 8-character String (64 bytes)
• 31% is overhead due to modeling as two objects, the String plus its char[]: a second object header and a pointer
• The effect varies with the size of the String

The culture of objects
C++ has 5 ways to organize fields into data types: delegation, composition, single inheritance, multiple inheritance, and union types. Java has 2: delegation and single inheritance. Software engineering culture favors reuse and loosely coupled designs.

Fine-grained modeling
Case study: server framework, part of a connection
• 34 instances to represent one request: each Request (x46K = 67MB) delegates to Entry objects, From/To Contacts, NameAddresses, Urls, Params, ...
• Cost: 1.5K per request. Will not scale.
• 36% of the cost is delegation overhead
• The overhead is constant per Request, and it can magnify the costs of other choices

Background: 32- vs. 64-bit JVMs
• 64-bit architectures can have a big impact on memory costs, especially in designs that have a lot of small objects and pointers
• Using 64-bit addressing to solve memory problems can cause new ones:
  • Increases object header, alignment, and pointer overhead
  • One study shows a 40-50% average increase in heap sizes for benchmarks
• Some JVMs have options for extended 32-bit addressing, allowing access to larger heaps without the footprint cost

32- vs. 64-bit example: an 8-character String (96 bytes)
• String object: JVM overhead 24 bytes, bookkeeping fields 12 bytes, pointer 8 bytes, alignment 4 bytes. char[]: JVM overhead 24 bytes, data 16 bytes.
• 50% larger than on 32-bit
• The delegated design is responsible for the extra object header and pointer costs; fine-grained designs incur especially high costs

Modeling your data types
• High-overhead data types: large instance sizes

Bookkeeping fields
Simple example: an 8-character String
• String users pay a 12-byte tax to store offset, length, and hashcode, unnecessary for the most common use cases
• 19% overhead for an 8-character String
• Premature optimization: a cautionary tale for library designers!

Large instance sizes II
Case study: CRM system, part of session data (Profile x1.95K = 4.6MB)
• Highly delegated design: each Profile delegates to a Party, which delegates to ElectronicAddress, PhysicalAddress, PhoneNumber, ...; roughly 40 instances per Profile
• Large base class in addition to the delegation costs: each part carries a 12-byte Object header plus 40 bytes of ContactInfo base-class fields (createDate, enteredBy, updateDate, updateBy, primary, typeId, type)
• Totals per instance: ElectronicAddress 100 bytes, PhysicalAddress 152 bytes, PhoneNumber 112 bytes
• Problem 1: functionality too fine grained, which magnifies the base class cost
• Problem 2: storing computed fields

Large instance sizes III
Case study: modeling framework
• Goal: model-based programming for large models; support models with 100K objects
• A modeled object costs 68 bytes plus your object's own cost: a ModelObjectImpl (JVM overhead 16 bytes, bookkeeping 16 bytes, pointer 4 bytes) delegating to a PropertiesHolder (JVM overhead 12 bytes, bookkeeping 20 bytes)
• High base class costs can severely limit modeling choices, forcing other inefficient choices; this design illustrates a variety of common data type modeling problems
• Problem: a constant field (the Class) stored as an instance variable. Replaced with a static method.
• Problem: fields allocated for features not used in many models, e.g. notification and dynamic types. Refactored, introducing BasicObjectImpl with no storage.
• Problem: rarely used fields had been moved to the lazily allocated PropertiesHolder side object, but the lazy allocation was not working. Fixed.
• Problem: 5 fields were never used at the same time. Combined them into one field, using casts.
• Problem: stored computations. Recompute as needed.
• Problem: some models make heavy use of fields in the side object, e.g. for cross-model references; memory costs had forced models to be broken into fragments, increasing the need for those references. Solution: refactoring; FlatObjectImpl avoids the delegation cost.

Large instance sizes: summary
Many reasons:
• Expensive base classes: some fields are not needed in the general case, or serve rarely used features; fine-grained designs using a common base class multiply the cost of the base class design
• Data fields: semi-constant fields, sparse fields, and recomputable data saved unnecessarily (often the result of premature optimization); both scalar and reference fields
• Typically, many different cases occur together in the same data model

Data type modeling: challenges for developers
• Java's limited data modeling means tradeoffs require care:
  • Moving rarely used fields to side objects can incur delegation costs (see the sketch below)
  • Moving sparse fields to a map can incur high map-entry costs
  • Verifying actual costs and benefits is essential
• Fixing problems of high-overhead data usually means refactoring data models: not easy late in the cycle; using interfaces and factories up front can help
More data type patterns later.
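The side-object tradeoff in the first bullet above is easy to get wrong, as the modeling-framework case study showed (lazy allocation that never actually deferred anything). Below is a minimal sketch of the pattern with hypothetical names, not the framework's own code; the point is that reads must not trigger allocation, so objects that never touch the rarely used features pay only for one null field.

/**
 * Sketch: rarely used fields moved to a lazily allocated side object.
 * Objects that never use these features carry just one null pointer.
 */
public class ModelObject {
    private SideData side;   // null until a rarely used feature is touched

    private static final class SideData {
        Object notificationList;   // hypothetical rarely used feature fields
        Object dynamicType;
    }

    private SideData side() {
        if (side == null) {
            side = new SideData();   // the allocation the case study had to make truly lazy
        }
        return side;
    }

    public void setDynamicType(Object type) {
        side().dynamicType = type;   // writes may allocate
    }

    public Object getDynamicType() {
        return side == null ? null : side.dynamicType;  // reads must not allocate
    }
}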
Representing relationships
• Many, high-overhead collections
• Small collections

Small collections in context
Case study: planning system, level graph edge index
• Index: HashMap x1 = 1.8MB. Keys are 2-element ArrayLists x65K = 3.1MB, each holding one Vertex and one boxed Integer level (Integer x65K = 1MB). Values are HashSets x65K = 16MB, averaging 4.5 Edges each (Edge x297K = 9MB).
• Two examples of small, high-overhead collections
• 297K edges cost 31MB. Overhead of representation: 83%.
• The overhead will not improve with more vertices

Small collections in context: map with multiple values per entry
• Only 5% of the sets had more than a few elements each

Inside the Java collections: HashSet
HashSet embeds many usage assumptions:
• It is built by reuse: it wraps a HashMap. Reuse of library code was considered important. Cost: 24 bytes per set + 4 per entry, on top of the map. Total cost: 72+ bytes fixed, 28 bytes per entry.
• Default capacity is 16: for a 5-entry set, that is 44+ bytes of empty slots in the backing array
• HashMap assumes its entry, key, and value views are all commonly used. Cost: 12 bytes.
• HashMap$Entry saves the hashcode, which is usually already saved elsewhere. Cost: 4 bytes per entry.
Lessons:
• Not a good choice for small collections
• Users: look before you leap; always measure
• Framework designers: beware of making usage assumptions

Small collections in context: remedy
• Switched the value sets from HashSet to ArrayList (x65K = 3.7MB). Saved 77% of that region.
• HashSet functionality was not worth the cost: uniqueness was already guaranteed elsewhere
Wish list: gracefully growing collections.

Small collections in context: multipart key as a 2-element ArrayList
• ArrayList has a high fixed cost, and the design also required boxing the integer level

Inside the Java collections: ArrayList
• Cost of a minimally sized 2-element ArrayList: 40 bytes fixed + 4 bytes per entry
• The fixed costs come from delegation to an Object[] plus bookkeeping fields
• Much lower fixed and variable costs than HashMap or HashSet, but the fixed costs can still add up for small collections
• The default size and growth policy can mean overhead from empty slots

Small collections in context: multipart key class, remedy
• Introduced a Pair class holding (Vertex, int level): keys x65K = 1.3MB (see the sketch below)
• Again, the functionality of the original design was not worth the cost
• Reduced key overhead by 68%

Multipart key
Case study: Apache Commons MultiKeyMap
• The Apache Commons collections framework has the same pattern: a MultiKeyMap entry points to a MultiKey, which points to an array of key parts
• Paying for flexibility that is not needed. Cost: an additional 20 bytes per entry.
• Could easily have created specialized MultiKey2, MultiKey3, etc. to avoid the delegation cost
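A sketch of the Pair-style key from the remedy above, with hypothetical names (the case study's class held a Vertex and an int level). Compared with a 2-element ArrayList key it drops the array object, the empty slots, and the boxed Integer; compared with MultiKey it drops the key-part array. equals() and hashCode() must be defined for it to work as a HashMap key.

/** Sketch of a two-part map key: one small object, no boxing, no array. */
public final class VertexLevelKey {
    private final Object vertex;   // stand-in for the application's Vertex type
    private final int level;       // primitive field: no boxed Integer

    public VertexLevelKey(Object vertex, int level) {
        this.vertex = vertex;
        this.level = level;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof VertexLevelKey)) return false;
        VertexLevelKey k = (VertexLevelKey) o;
        return level == k.level && vertex.equals(k.vertex);
    }

    @Override public int hashCode() {
        return 31 * vertex.hashCode() + level;
    }
}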
Growth policies
Example: creating default-size ArrayLists (planning system index)
• 28% overhead in the ArrayLists just from empty slots: x65K = 5.2MB, where optimal sizing would give 3.7MB
• Collections are optimized for growth: large defaults and large jumps (doubling)
• A 10% tax on some copies
Remedies:
• Set the initial capacity
• trimToSize() after load

Inside the Java collections: cost of a 2-element collection

              Minimal size (bytes)   Default size (bytes)   Slots for 2 elements at default size
LinkedList    96                     96                     3
ArrayList     48 or 56               56 or 80               10 or 12
HashMap       116 or 168             168                    16
HashSet       132 or 184             184                    16

From experiments with a few different JVMs, all 32-bit.

The cost of empty collections
Case study: CRM system, part of session data
• A small run had 26M of session data. Will not scale.
• 210 empty collections per session = 28% of the session cost
• SessionData x330 = under 1MB, each with a Person and 3 other structures; Profile x1.95K = 4.6MB, each with ~70 ArrayLists (x101K = 7.9MB); 15MB in this region
Remedies (see the sketch below):
• Lazily allocate
• Collections.emptySet()
• Avoid giving out references

The not-so-empty collections
• Minimum of 2 objects each: the component parts are always allocated. ArrayList, HashSet, and HashMap each wrap an array; LinkedList always allocates a sentinel entry.
• Default sizing can increase the cost, e.g. 16 elements for HashMap/HashSet, and a 0- or 10-slot ArrayList default depending on the JVM

Inside the Java collections: cost of an empty collection

              Minimal size (bytes)   Default size (bytes)   Default # of slots
LinkedList    48                     48                     1 sentinel entry
ArrayList     48                     48 or 80               0 or 10
HashMap       56 or 120              120                    16
HashSet       72 or 136              136                    16

From experiments with a few different JVMs, all 32-bit.
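A minimal sketch of the lazy-allocation remedies above, with hypothetical names. The collection field stays null until first use, readers of an absent collection get the shared Collections.emptyList() constant (the List analogue of the Collections.emptySet() remedy), and the initial capacity is set small rather than left at the default.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: sessions that never add a listener pay for one null field only. */
public class Session {
    private List<Object> listeners;   // null until actually needed

    public void addListener(Object l) {
        if (listeners == null) {
            listeners = new ArrayList<Object>(4);  // small capacity, not the default 10
        }
        listeners.add(l);
    }

    public List<Object> getListeners() {
        // Callers get a shared immutable constant or a read-only view,
        // never a direct reference to our internal list.
        return listeners == null
                ? Collections.<Object>emptyList()
                : Collections.unmodifiableList(listeners);
    }
}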
Representing relationships
• Many, high-overhead collections
• Small collections
• Special-purpose collections

Small concurrent maps
Case study: chat server framework
• Active sessions: a ConcurrentHashMap x1 = 4MB of 110K chat sessions (x110K = 10MB), each holding a Subscribers ConcurrentHashMap (x110K = 173MB) that contains a single Subscriber (x110K = 4.2MB)
• Nested ConcurrentHashMaps: more than 1600 bytes each!
• The cost was 90% of this structure, and 10-20% of the total heap
What went wrong:
• The library was not intended for use at this scale
• Concurrency requirements were different at fine vs. coarse grain
Remedies:
• First considered reducing the width of the inner ConcurrentHashMap from 16 to 3. Savings: 67%.
• Used Hashtable instead, since a high level of concurrency was not needed (x110K = 17MB). Savings: 90+%.
• Note: Hashtable is less expensive than the similar Collections$SynchronizedMap

Inside the Java collections: wrapped collections
• Collections$SynchronizedMap and Collections$UnmodifiableMap are based on delegation: each wrapper adds 28 bytes on top of the underlying HashMap
• The costs are significant when collections are small; fine for larger collections

Small wrapped collections
Case study: media e-commerce site (64-bit JVM)
• Cache: a ConcurrentHashMap x1 = 3.8MB of 64K CachedElements (x63K = 137MB), whose Titles are reached through UnmodifiableMap wrappers (x1.9M = 106MB for the wrappers alone; 465MB for the wrapped region including the underlying HashMaps x1.9M = 359MB) of Title Strings (x2M = 156MB)
• 108MB just for UnmodifiableMap wrappers, at 56 bytes each: twice the cost on a 32-bit JVM
What went wrong:
• The functionality was not worth the cost at this scale; unmodifiable wrappers serve a development-time purpose

Multikey map: design I
Level graph edge index as a nested map (assume 10K vertices, 5 levels)
• Outer HashMap x1 = .3MB keyed by Vertex (10K entries), pointing to inner HashMaps x10K = 2.4MB keyed by level (5 entries each), pointing to edge lists
• (10K + 1) x HashMap fixed overhead + 60K x HashMap per-entry overhead
• Total cost: 2.6MB

Multikey map: design II
Nested map, reordered: key by level first (see the sketch below)
• Outer HashMap x1 = under 1KB with 5 level keys, pointing to inner HashMaps x5 = 1.4MB keyed by Vertex (10K entries each)
• 6 x HashMap fixed overhead + (50K + 5) x HashMap per-entry overhead. Total: 1.4MB.
• Switching the order eliminated the nested collections' fixed costs. Savings: 46%, and the savings are consistent as vertices increase.
• A good approach if you know the distribution

Multikey map: design III
Single map with a multikey
• One HashMap x1 = 1.4MB keyed by Pairs (x50K = 1MB) of (Vertex, Integer or int level), pointing to edge lists
• 1 x HashMap fixed overhead + 50K x HashMap per-entry overhead + 50K x Pair overhead. Total: 2.4MB.
• 11% better than design I, 70% worse than design II. The results were surprising.
• Trades the fixed costs of many small collections for per-element cost in one large collection: a 28-byte HashMap entry plus a 20-byte Pair
• Wish list: the ability to extend entry classes

Multikey map: comparison
• Plotting incremental bytes per vertex against the number of levels (2 to 10): assuming the number of levels is much smaller than the number of vertices, design II is consistently better than design I, with a constant delta of 128 bytes per vertex
• The difference of design III vs. the others is sensitive to the number of levels, even within a small range
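A sketch of design II, with stand-in Object types where the case study had Vertex and Edge. Keying by the low-cardinality level first means only a handful of inner maps ever exist, so the nested maps' fixed costs stop mattering; the per-entry costs remain.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of design II: 5 levels means ~6 maps in total, not 10,001. */
public class EdgeIndex {
    private final Map<Integer, Map<Object, List<Object>>> byLevel =
            new HashMap<Integer, Map<Object, List<Object>>>();

    public void add(int level, Object vertex, Object edge) {
        Map<Object, List<Object>> byVertex = byLevel.get(level);
        if (byVertex == null) {
            byVertex = new HashMap<Object, List<Object>>();
            byLevel.put(level, byVertex);           // one inner map per level
        }
        List<Object> edges = byVertex.get(vertex);
        if (edges == null) {
            edges = new ArrayList<Object>(4);       // small edge lists: size accordingly
            byVertex.put(vertex, edges);
        }
        edges.add(edge);
    }

    public List<Object> edges(int level, Object vertex) {
        Map<Object, List<Object>> byVertex = byLevel.get(level);
        return byVertex == null ? null : byVertex.get(vertex);
    }
}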
... break ...

Representing relationships
• Large collections: high per-entry overhead relative to the data

Large collections and scaling
Level graph edge index as a single map with a multikey
• The per-element cost is constant, but the constant is large relative to the actual data: the cost is dominated by the HashMap per-entry cost (28n bytes) plus the Pair cost (20n bytes)
• Cost: 48 bytes per element. Overhead: 83%.
What went wrong: high collection per-entry costs plus delegation costs.

Inside the Java collections: standard collections, per-entry costs

              Per-entry cost (bytes)
LinkedList    24
ArrayList     4
HashMap       28 or 36
HashSet       28 or 36

From experiments with a few different JVMs, all 32-bit. Excludes amortized per-collection costs such as empty array slots; includes the pointer to the entry. These costs may be in addition to the overhead of introducing another object.

Nested collections, high per-element costs
Case study: collaboration service, storing properties
• Stores 7 properties per subscription via a session API: Subscription x20K, Session x20K, each with a properties Hashtable (x20K = 7MB); attribute names are Strings shared across sessions; values are Integer, Long, etc. (x140K = 2.7MB)
• Hashtable per-entry costs and boxing add 350 bytes of overhead per session, impeding scalability
What went wrong: the cost was obscured by multiple layers and fanouts.
What went right: attribute names were shared across sessions.
Remedy:
• Combined the properties into a single high-level property object, inlining the scalar values: Hashtable x20K = 2.6MB, SubscriptionProperty x20K = 1.2MB
• 7:1 reduction in collection entry costs, plus reduced boxing costs
• Note: still paying the Hashtable fixed cost

Representing relationships
• Special-purpose collections

Collections involving scalars
Case study: monitoring infrastructure
• TreeMap x52 = 537MB, with 265K entries each: Double keys x13.4M = 342MB and Double values x13.4M = 342MB
• The data structure took 1.2GB, and overhead is still 82% at this giant scale
• Some alternative scalar maps and collections are available, with much lower overhead

Identity maps
Comparison: HashMap vs. IdentityHashMap, 10,000 keys and values
• HashMap x1 = 298KB; IdentityHashMap x1 = 128KB
• For maintaining a map of unique objects, where the reference is the key; equality is based on ==
• The open addressing implementation avoids the cost of Entry objects
• Cost reduced by 59% in this experiment
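A minimal sketch of the identity-map pattern, with hypothetical names: tagging unique objects where the reference itself is the key. Note that IdentityHashMap is only correct when reference equality is what you mean; it silently treats equal-but-distinct objects as different keys.

import java.util.IdentityHashMap;
import java.util.Map;

/** Sketch: per-object tags keyed by identity, with no Entry objects to pay for. */
public class NodeTags {
    private final Map<Object, String> tags = new IdentityHashMap<Object, String>();

    public void tag(Object node, String label) {
        tags.put(node, label);     // keyed by ==, not equals()
    }

    public String tagOf(Object node) {
        return tags.get(node);
    }
}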
Collections & scaling
Some costs are not amortized as scale increases:
• Overhead of small, nested collections: the total cost of each collection is what matters, both the fixed and variable parts
• Per-element overhead of large collections: collection per-entry and data delegation overheads

The standard collections
JDK standard collections:
• Speed has been the focus, not footprint
• The IBM (Harmony) and Sun implementations are not that different in footprint
• Hard-wired assumptions, few policy knobs (e.g. growth policies)
• Specialized collections are worth learning about: IdentityHashMap, WeakHashMap, ConcurrentHashMap, etc.

Collections alternatives
Apache Commons:
• Many useful collections: Flat3Map, MultiMap, MultiKeyMap
• Focus is mostly on functionality; footprint is similar to the standard collections, with a few exceptions
GNU Trove:
• Many space-efficient implementations, e.g. scalar collections, and list entries without the delegation cost
Also: Cliff Click's nonblocking collections; Javolution; Amino; specialized collections within frameworks you already use.
Important: check your corporate policy regarding specific open source frameworks.

Summary: using collections
• Choosing and configuring carefully really matters, since collection implementations are often expensive
• Avoid writing your own if possible

Modeling your data types
• Too much data

Saving formatted data I
Case study: one layer of a chat framework
• Session bridge: Session x111K = 42MB, each saving the computation of toString() in 3 StringBuffers (x334K = 187MB)
• 82% of the cost of this layer was due to saving the toString() results
What went wrong:
• Empty space overhead in the StringBuffers
• The space cost was not worth the time savings
Remedies:
• String, not StringBuffer
• Recompute as needed

Saving formatted data I: delegation effects
• Inside each Session, the data type had been split in three: SessionWrapper, SessionBase, SessionImpl, with the same coding pattern (a saved StringBuffer) copied to each part
• The delegated design magnified the other costs

Saving formatted data II
Case study: CRM system, session state fragment
• Profiles stored booleans as the Strings "Y" or "N": a health ratio of 48:1
• Some values were constants ("10%"); some had only a few possible values ("Y", "N")
What went wrong:
• Duplicating data in a high-overhead representation
• The space cost was not worth the time savings

Duplicate, immutable data
Case study: text analysis system, concordance
• ConcordanceEntry x131K = 41MB, each with an Annotation pointing to its own copy of a Type and its String (and char[]) data
• 17% of the cost was due to duplication of the Type and its String data; there were only a small number of immutable Types
What went wrong:
• The interface design did not provide for sharing, and the full cost of the duplication was hidden
Remedy:
• Use the shared immutable factory pattern
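A sketch of the shared immutable factory pattern named in the remedy, using a hypothetical Type class. The factory hands out one canonical instance per name, so callers never hold private copies.

import java.util.concurrent.ConcurrentHashMap;

/** Sketch: a shared immutable factory over a small, fixed set of values. */
public final class Type {
    private static final ConcurrentHashMap<String, Type> POOL =
            new ConcurrentHashMap<String, Type>();

    private final String name;

    private Type(String name) { this.name = name; }   // instances only via the factory

    public static Type valueOf(String name) {
        Type t = POOL.get(name);
        if (t == null) {
            Type fresh = new Type(name);
            t = POOL.putIfAbsent(name, fresh);   // first caller to register wins
            if (t == null) t = fresh;
        }
        return t;
    }

    public String name() { return name; }
}

Note that this pool holds strong references, which is fine for a small, fixed population of Types; for open-ended values it has exactly the unbounded-growth problem discussed under object lifetime later in this tutorial.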
Background: sharing low-level data
String.intern():
• You specify which Strings to share; it shares both the String object and the character array
• Make sure it's worth it, since there is a space cost
• The claim that it causes memory leaks is a myth, though you can hit permspace limits
Boxed scalars:
• Integer.valueOf(), Boolean.valueOf(), etc. share some common values (not all)
• Make sure you don't rely on ==

Common-prefix data
Case study: application server, class cache
• A class loader map of class names to jar file info: more than 120MB of Strings, mostly duplicate prefix information
What went wrong:
• Duplication cost; the deeper problem was a misplaced optimization
Remedy:
• Implemented a trie; a simpler 2-part factoring can also work

Managing object lifetime
• Short-lived data

Temporaries
Aren't temporary objects free these days? Some are, and some definitely aren't.

Expensive temporaries
Example: SimpleDateFormat
• Construction is costly: the default constructor builds a GregorianCalendar, a DecimalFormat with its DecimalFormatSymbols, a TimeZone, a Date, and assorted String[] and int[] structures
• Each call to the default constructor results in 123 calls to 55 distinct methods and 44 new instances
• Designed for its costs to be amortized over many uses
• Remedy: reuse, via a local variable or thread-local storage

Background: ThreadLocal storage
• ThreadLocal is a JDK-supplied per-thread variable; an application can create many of these for different purposes
• Enables reuse without introducing concurrency problems

Tradeoffs
• Converters, formatters, factories, schemas, connections, etc. may be good candidates for reuse: they can be expensive to create, and are often designed for reuse
• Use ThreadLocal or specialized resource pools, depending on requirements; sometimes local variables are good enough (see the sketch below)
• Avoid rolling your own resource pools
• Not worth caching simple temporaries: some are inexpensive to create (e.g. Integer, many iterators), and a ThreadLocal access is usually a hash lookup
• Verify the cost/benefit
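A minimal sketch of the thread-local remedy for SimpleDateFormat (the class name and pattern string here are arbitrary). Each thread builds the formatter once and reuses it, which also sidesteps SimpleDateFormat's lack of thread safety.

import java.text.SimpleDateFormat;
import java.util.Date;

/** Sketch: one SimpleDateFormat per thread, built once and reused. */
public class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> TIMESTAMP =
            new ThreadLocal<SimpleDateFormat>() {
                @Override protected SimpleDateFormat initialValue() {
                    // Paid once per thread, then amortized over many calls.
                    return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                }
            };

    public static String format(Date d) {
        return TIMESTAMP.get().format(d);   // reuses this thread's instance
    }
}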
Managing object lifetime
• Long-lived data

Managing lifetime: understanding requirements
Three very different reasons for long-lived data:
1. In-memory design: the data is in memory forever
2. Space vs. time: the data may be discarded and recomputed
3. Correlated lifetime: the data is alive only during the lifetime of other objects, or during specific phases
Each has its own best practices and pitfalls, and many problems stem from misunderstanding the requirements. A caution: if not careful, extending the lifetime of objects can introduce concurrency problems, leaks, and additional memory overhead from the structures that manage lifetime.

Managing object lifetime: in-memory designs

The limits of objects
Case study: memory analysis tool
• Some object-oriented designs will never fit in memory; estimate and measure early
• Requirement: analyze an 80-million-object heap. Design: one object per target application object.
• Hypothetical minimum: if each object needed just 4 fields (type, id, pointer to references, flags), that is 80M x 32 bytes = 2.5GB just to model the application's objects. Modeling references (2-3 per object) and leaving scratch space for algorithms would require at least 10GB.
• Note: an earlier design used a modeling framework with high overhead costs; just optimizing those costs would not have been sufficient
Solution:
• A backing store using memory-mapped files (java.nio)
• Built a column-based storage infrastructure with scalar arrays, to reduce the working set and avoid object header costs, specialized for this application's access patterns. Don't try this at home!
• The result is highly scalable: it runs in a 256M heap. Some XML DOM implementations use this approach.
Recommendations:
• java.nio is one way to implement a backing store
• A column-based approach is a last resort, for optimization of highly specialized and protected components

Managing object lifetime: space vs. time designs

Space vs. time designs: mechanisms
Mechanisms for saving time by maintaining data in memory:
• Caches
• Resource pools
Also: thread-local storage, and adding computed fields to data types.

Background: soft references
On the reachability spectrum from weak to soft to strong, soft references:
• Tell the GC to reclaim these objects only when the space is really needed
• Keep an object alive after it is no longer strongly referenced, just in case it is needed again
• Are used mostly to avoid recomputation, e.g. for caches and resource pools, and for side objects (cached fields) that can be recreated if lost
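A minimal sketch of a cache along these lines, with hypothetical names. Following the best practices on the next slides, the explicit size bound is the primary policy and the soft references are only a failsafe; the eviction here is deliberately crude.

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: a bounded cache whose values the GC may reclaim under memory
 * pressure, in which case we simply recompute them.
 */
public abstract class RecomputingCache<K, V> {
    private final int maxEntries;
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    protected RecomputingCache(int maxEntries) { this.maxEntries = maxEntries; }

    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = compute(key);          // never cached, or cleared by the GC
            if (map.size() >= maxEntries) {
                map.clear();               // crude bound; a real cache would evict LRU
            }
            map.put(key, new SoftReference<V>(value));
        }
        return value;
    }

    /** Subclasses supply the expensive computation being cached. */
    protected abstract V compute(K key);
}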
Caches & pools: a sampling
Case study: class loader "cache"
• More than 100M of classname strings: an in-memory design had been implemented where the purpose, performance, called for a small, bounded cache
• Unbounded growth (a leak). Solution: soft references.
Also: an object pool framework was used for 20 different purposes, to improve performance, with unbounded size and strong references.
Case study: financial web application
• The cache itself was only needed during startup
Case study: high-volume web application
• The cache was sized too large, aiming for a 95% hit rate
• Result: performance problems due to excessive GC
Lessons: caches and pools should always be bounded, and larger caches aren't necessarily better.

Caches & resource pools: best practices
• Soft references are useful for implementing simple caches and pools, but relying solely on them gives up control over policy, and may not leave enough headroom for temporary objects, causing the GC to run more often
• Caches and pools should in general be bounded in size; soft references can be used as an additional failsafe mechanism
• Many implementations of caches and resource pools are available; avoid writing your own if possible

Managing object lifetime: correlated lifetime designs

Correlated lifetime
Objects needed only while other objects are alive:
• e.g. annotations on existing objects, sharing pools, listeners
... or during specific phases or time intervals:
• e.g. loading; session state, for a bounded length of time

Sharing and growing
Case study: planning system, sharing pool
• Each iteration of the algorithm creates hundreds of thousands of new expressions, holding only transient references for one iteration
• Used the shared immutable factory pattern, with a HashMap pool, to save space on common subexpressions
• Result: unbounded growth due to the pool's keys. The HashMap keeps subexpressions (and map entries) around forever.

Background: weak references
On the reachability spectrum from weak to soft to strong, a weak reference:
• Tells the GC it may reclaim an object as soon as it is no longer needed, i.e. as long as there are no stronger references to the object
• Is useful for preventing leaks: it ties the lifetime of objects to that of other objects, e.g. for annotations, sharing pools, and listener lists

Sharing and not growing
Case study: planning system, sharing pool, remedy
• Apache Commons ReferenceMap with strong reference keys and weak reference values: a pool entry is removed when its value is no longer needed
• Note: soft references were also considered, but each iteration used different expressions, so there was no need to prolong lifetime. The goal was space, not time.

Using weak references
A few common usage patterns (see the sketch below):
• Weak key, strong value: the standard Java WeakHashMap. Example usage: the key is the object to be annotated, the value is the annotation. Caution if the key is the same as, or strongly reachable from, the value.
• Strong key, weak value: as in the sharing pool example above.

Background: weak and soft reference costs
• Weak and soft references are objects themselves, and so incur footprint costs: e.g. 24 bytes for each WeakReference on one 32-bit JVM
• Some weak/soft map entries extend Weak/SoftReference; others add yet another level of delegation, e.g. Apache Commons ReferenceMap has at least 2 objects per entry
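A minimal sketch of the weak-key, strong-value pattern, with hypothetical names. WeakHashMap entries disappear once the annotated object is otherwise unreachable, so the annotations cannot outlive their keys; the strong-key, weak-value direction of the sharing pool would instead use something like the ReferenceMap above.

import java.util.Map;
import java.util.WeakHashMap;

/**
 * Sketch: annotations whose lifetime is tied to the annotated object.
 * Caution (as noted above): the value must not strongly reference its key,
 * or the entry can never be collected.
 */
public class Annotations {
    private final Map<Object, String> notes = new WeakHashMap<Object, String>();

    public void annotate(Object node, String note) {
        notes.put(node, note);    // entry is cleared when node becomes unreachable
    }

    public String noteFor(Object node) {
        return notes.get(node);
    }
}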
Leaks & drag: a sampling
Case study: CRM application
• Leak: a bug in end-of-request processing failed to remove an object from a listener queue
• Immediate fix: fixed the bug in the request processing. For robustness: have the listener queue use weak references.
Case study: development tool
• A large index was needed only during load time. Easy solution: nulled out the pointer.
Case study: CRM application
• Session state was retained for 8 hours, made worse by costly session state (200K per user)
• Easy solution: fixed the configuration
Note: failure to unregister listeners is a common cause of leaks.

Entries too large
Case study: CRM application
• 200K of session state per user!
• Often a pile-up of multiple problems within one session: small and empty collections, duplicated data, highly delegated representations with hundreds or thousands of instances, and large substructures retained by accident

Process
Simple techniques, tools, and resources.

Surprises are everywhere
Case study: CRM application
• Developers expected 2K and found 200K of session state per user
• Unit costs can be very difficult to predict, at every level

Measurement
• There are many surprises. Especially in Java, it is essential to verify assumptions empirically throughout the lifecycle.

What and when
A few techniques among many:
• Small, synthetic experiments are extremely valuable, to test out frameworks and design patterns before they are adopted
• Of course, incorporate measurement into unit and system tests
• Use detailed diagnostic tools to periodically check for scale, and look for surprises
• Be mindful of normalization units when designing tests: how many concurrent users? active sessions?
• Understand the costs of the major units used in lower layers
• Run experiments at different scales early on: are costs amortized as expected?
• Cardinality of relationships: state it as part of the design; verify it periodically; then use it, in combination with measurement, as the basis for estimation
• Caches and pools: verify that they are working and that they are worth it

Tools for heap analysis
• For analyzing the sources of memory bloat and for verifying assumptions, tools that rely on heap snapshots are the most valuable
• Some free tools from IBM and Sun: check IBM DeveloperWorks & alphaWorks, and the Sun Developers Network
• Commercial and open source tools: SAP MAT, now open source (Eclipse); YourKit, JProfiler, ...; many read only the hprof format, not phd

Gathering heap data
• The IBM and Sun diagnostic guides have information on gathering and analyzing heap snapshots, and pointers to free tools:
  • IBM: http://www.ibm.com/developerworks/java/jdk/diagnosis/
  • Sun: http://java.sun.com/javase/, info with specific JDKs
• Formats: hprof (Sun & IBM); phd (IBM only)
• The choice of when to take snapshots is key:
  • For footprint: at steady state, with a known load
  • For the footprint of a single feature, or for suspected growth: before/after a fixed number of operations, starting after the system is warmed up

Additional resources
• JDK library source code is freely available, and can be very worthwhile to consult
• Many valuable articles on the web: IBM DeveloperWorks and the Sun Developer Network are good starting points; best practices and tuning guides exist for specific frameworks; some misinformation is occasionally found, even on reputable sites
• Garbage collection and overall heap usage: the IBM and Sun diagnosis sites have GC tuning guides and free tools, e.g. the IBM Pattern Modeling and Analysis Tool for GC (PMAT) on alphaWorks; some performance analysis tools have heap monitoring features
• Object allocation: most Java performance profilers can show allocation information with calling context, e.g. hprof (free)

Conclusions
• Distributed development, layers of frameworks, and Java's modeling limitations make it easy to create bloated data designs
• In this environment, awareness of costs is essential for making informed tradeoffs
• Some tradeoffs can be made without sacrificing speed or sound design
• The concept of data structure health, the ratio of actual data to its representation, can illuminate where there is room for improvement, and point out aspects of a design that will not scale

Acknowledgments
Thanks to: Matthew Arnold, Dave Grove, Tim Klinger, Trevor Parsons, Peter Santhanam, Edith Schonberg, Yeti.

See also: N. Mitchell, G. Sevitsky, "The Causes of Bloat, the Limits of Health", in Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) 2007, Montreal, Canada.