Building Memory-efficient Java Applications
Nick Mitchell, Gary Sevitsky (presenter)
IBM TJ Watson Research Center, Hawthorne, NY, USA

OOPSLA 2008, October 19–23, 2008, Nashville, Tennessee, USA. Copyright is held by the author/owner(s). ACM 978-1-60558-220-7/08/10.

Quiz
(Assume a 32-bit platform throughout.)

Small boxes? Q: What is the size ratio of Integer to int?
a. 1:1   b. 1.33:1   c. 2:1   d. ?
A: d. 4:1

Small things? Q: How many bytes in an 8-character String?
a. 8   b. 16   c. 28   d. ?
A: d. 64

Bigger? Better? Q: Which of the following is true of HashSet relative to HashMap?
a. does less, smaller   b. does more, smaller   c. similar amount of functionality, same size   d. ?
A: d. does less, larger

Small collections? Q: Put the following 2-element collections in size order: ArrayList, HashSet, LinkedList, HashMap.
A: ArrayList, LinkedList, HashMap, HashSet

Collections? Q: How many live collections are in a typical heap?
a. between five and ten   b. tens   c. hundreds   d. ?
A: d. tens or hundreds of thousands, even millions

Roadmap
• Quiz
• Background & myths
• Memory health
• Patterns of memory usage
• Process
Case studies throughout, with JVM background mixed in.

Background
• Our group has been diagnosing memory and performance problems in large Java systems for more than 8 years
• Worked with dozens of applications: open source, large commercial applications, software products
• Servers, clients, applications, frameworks, generated code, etc.
• Common thread: it is easy to build systems that consume large memory resources relative to the work accomplished

The big pile-up
Heaps are getting bigger:
• Grown from 500M to 2-3G or more in the past few years
• But not necessarily supporting more users or functions
Surprisingly common:
• Requiring 1G of memory to support a few hundred users
• Saving 500K of session state per user
• Requiring 2M for a text index per simple document
• Creating 100K temporary objects per web hit
Consequences for scalability, power usage, and performance.

The big pile-up
• Not a reflection on the quality of programmers; many are expert
• More abstraction means less awareness of costs
• It is easy for costs to pile up, just piecing together building blocks
• The iceberg effect: the application sits on framework, upon framework, upon framework

Myths
Things are fine:
• Objects (or Strings, HashMaps, ...) are cheap
• Frameworks are written by experts, so they've been optimized (for my use case!)
• The JIT and GC will fix everything
Things are not fine:
• "I knew foo was expensive; I didn't know it was this expensive!"
• It's no use: O-O plus Java is always expensive
• Efficiency is incompatible with good design

Goals
It's not a lost cause!
• Raise awareness of costs
• Give you a way to make informed tradeoffs
Patterns of memory usage
• Data types. High overhead: delegation, fields, base classes. High data: duplication, representation.
• Collections. Many, with high overhead: empty, small, special purpose. Unused space. Large, with high per-entry cost. Special purpose.
• Lifetime. Short: complex temporaries. Long: in-memory designs, space vs. time, correlated lifetime.

Roadmap
• Quiz
• Background & myths
• Memory health
• Patterns of memory usage:
  • Modeling your data types
  • Modeling relationships
  ... break ...
  • More relationships
  • More data type modeling
  • Object lifetime
• Process

Note about measurements
Measurements shown are estimates obtained from experiments on a sampling of different JVMs. In practice costs will vary across JVMs. Measures we report may be subject to errors due to data collection, analysis tools, or interpretation of results. They are intended only to illustrate the nature of memory problems.
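One simple way to reproduce estimates like these is to difference the used heap around a bulk allocation. The harness below is a rough sketch of that idea, not a precise tool: System.gc() is only a hint to the collector, and results will vary by JVM, but with a large entry count the per-entry figure is usually stable enough to illustrate the costs discussed next.

import java.util.TreeMap;

public class PerEntryCost {
    // Approximate used-heap reading; GC is only encouraged, never guaranteed.
    static long usedHeap() throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        final int n = 100000;                 // a large count averages out noise
        long before = usedHeap();
        TreeMap<Double, Double> map = new TreeMap<Double, Double>();
        for (int i = 0; i < n; i++) {
            map.put((double) i, (double) i);  // note: each put boxes two Doubles
        }
        long after = usedHeap();
        // Referencing the map here keeps it live across the second reading.
        System.out.println(map.size() + " entries, ~"
                + (after - before) / n + " bytes per entry");
    }
}

On the 32-bit JVMs sampled in this tutorial, a run like this should land in the neighborhood of the 88 bytes per entry derived for TreeMap<Double, Double> below.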
Memory health

The idea of health: a 100-entry TreeMap
• Schematic of a data structure: distinguish data type regions from collection regions
• A region includes all of its implementation classes
• Collection region: TreeMap, x1 = 3.9KB, with an average fanout of 100 to each of the key and value regions
• Data type regions: Double keys x100 = 2.3KB; Double values x100 = 2.3KB
• Total cost: 8.6KB. What are the bytes accomplishing? How much is actual data vs. overhead?

Data type health: one Double (24 bytes)
• Data: 8 bytes (the double itself). JVM-imposed overhead: 16 bytes.
• 33% is actual data; 67% is representation overhead
• From one 32-bit JVM; varies with JVM and architecture

Data type health: an 8-character String (64 bytes)
• String object: JVM overhead 16 bytes, bookkeeping fields 12 bytes, pointer to the char[] 4 bytes
• char[] object: JVM overhead 16 bytes, character data 16 bytes
• Only 25% is the actual data; 75% is overhead of representation
• The String would need 96 characters for the overhead to be 20% or less

Collection health: a 100-entry TreeMap (x1 = 3.9KB)
• How does a TreeMap spend its bytes? Collections have fixed and variable costs
• TreeMap fixed overhead: 48 bytes. TreeMap$Entry per-entry overhead: 40 bytes.

Data structure health: the whole 100-entry TreeMap
• The TreeMap region is 100% overhead; each Double region is 67% overhead
• 82% overhead overall
• The design enables updates while maintaining sort order. Is it worth the price?

Data structure health: alternative implementation (100 entries)
• Two parallel sorted arrays: double[] keys (x1 = 816 bytes) and double[] values (x1 = 816 bytes), with binary search against the sorted key array
• Less functionality: suitable for a load-then-use scenario
• 2% overhead
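A minimal sketch of that alternative follows; the class and method names are ours, not from the case study. All overhead beyond the two array headers disappears, at the price of giving up cheap insertion and deletion.

import java.util.Arrays;

/**
 * Load-then-use alternative to TreeMap<Double, Double>: two parallel
 * sorted arrays plus binary search. No per-entry objects, ~2% overhead.
 * Assumes the data is loaded in one batch and not updated afterwards.
 */
public class SortedDoubleMap {
    private final double[] keys;
    private final double[] values;

    /** Keys must already be sorted, with values[i] belonging to keys[i]. */
    public SortedDoubleMap(double[] sortedKeys, double[] values) {
        this.keys = sortedKeys;
        this.values = values;
    }

    /** Returns the value for the key, or NaN if the key is absent. */
    public double get(double key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : Double.NaN;
    }
}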
Health as a gauge of scalability: TreeMap at 10,000 entries
• TreeMap x1 = 391KB; Double keys x10000 = 234KB; Double values x10000 = 234KB
• Overhead is still 82% of the cost: overhead is not amortized in this design
• High constant cost per element: 88 bytes
• Plotting data vs. overhead from 1 to 400K entries shows the overhead fraction holding steady at 82%

Health as a gauge of scalability: alternative implementation
• For the sorted double[] implementation, overhead starts out low (2%) and quickly goes to ~0% as entries grow
• Cost per element is 16 bytes, pure data

Summary: health
Distinguish actual data from the overhead of representation:
• Overhead from your data types
• Overhead from your collection choices, fixed vs. variable
• Many other ways to break down overhead costs: JVM object overhead, delegation costs, empty array slots, unused fields, duplicate data, ...
Can help answer:
• How much room for improvement?
• Is the functionality worth the price?
• Which overhead will be amortized? If constant, how large?

Modeling your data types
• High-overhead data types: object and delegation costs

Background: the cost of objects
The JVM and hardware impose costs on objects, and they can be substantial for small objects. Headers enable functionality and performance optimizations; this JVM uses 8-byte alignment. Costs vary with JVM and architecture. From an experiment on one 32-bit JVM:
• Boolean: 16 bytes (header 12, boolean 1, alignment 3)
• Double: 24 bytes (header 12, double 8, alignment 4)
• char[2]: 24 bytes (header 16, 2 chars 4, alignment 4)

The cost of delegation
Example: an 8-character String (64 bytes)
• 31% is overhead due to modeling as two objects, the String plus its char[]: a second object header and a pointer
• The effect varies with the size of the String

The culture of objects
C++ has 5 ways to organize fields into data types: delegation, composition, single inheritance, multiple inheritance, and union types. Java has 2: delegation and single inheritance. Software engineering culture favors reuse and loosely coupled designs.

Fine-grained modeling
Case study: server framework, part of a connection
• 34 instances to represent one request: each Request (x46K = 67MB) delegates to Entry objects, From/To Contacts, NameAddresses, Urls, Params, ...
• Cost: 1.5K per request. Will not scale.
• 36% of the cost is delegation overhead
• The overhead is constant per Request, and it can magnify the costs of other choices

Background: 32- vs. 64-bit JVMs
• 64-bit architectures can have a big impact on memory costs, especially in designs that have a lot of small objects and pointers
• Using 64-bit addressing to solve memory problems can cause new ones:
  • Increases object header, alignment, and pointer overhead
  • One study shows a 40-50% average increase in heap sizes for benchmarks
• Some JVMs have options for extended 32-bit addressing, allowing access to larger heaps without the footprint cost

32- vs. 64-bit example: an 8-character String (96 bytes)
• String object: JVM overhead 24 bytes, bookkeeping fields 12 bytes, pointer 8 bytes, alignment 4 bytes. char[]: JVM overhead 24 bytes, data 16 bytes.
• 50% larger than on 32-bit
• The delegated design is responsible for the extra object header and pointer costs; fine-grained designs incur especially high costs

Modeling your data types
• High-overhead data types: large instance sizes

Bookkeeping fields
Simple example: an 8-character String
• String users pay a 12-byte tax to store offset, length, and hashcode, unnecessary for the most common use cases
• 19% overhead for an 8-character String
• Premature optimization: a cautionary tale for library designers!

Large instance sizes II
Case study: CRM system, part of session data (Profile x1.95K = 4.6MB)
• Highly delegated design: each Profile delegates to a Party, which delegates to ElectronicAddress, PhysicalAddress, PhoneNumber, ...; roughly 40 instances per Profile
• Large base class in addition to the delegation costs: each part carries a 12-byte Object header plus 40 bytes of ContactInfo base-class fields (createDate, enteredBy, updateDate, updateBy, primary, typeId, type)
• Totals per instance: ElectronicAddress 100 bytes, PhysicalAddress 152 bytes, PhoneNumber 112 bytes
• Problem 1: functionality too fine grained, which magnifies the base class cost
• Problem 2: storing computed fields

Large instance sizes III
Case study: modeling framework
• Goal: model-based programming for large models; support models with 100K objects
• A modeled object costs 68 bytes plus your object's own cost: a ModelObjectImpl (JVM overhead 16 bytes, bookkeeping 16 bytes, pointer 4 bytes) delegating to a PropertiesHolder (JVM overhead 12 bytes, bookkeeping 20 bytes)
• High base class costs can severely limit modeling choices, forcing other inefficient choices; this design illustrates a variety of common data type modeling problems
• Problem: a constant field (the Class) stored as an instance variable. Replaced with a static method.
• Problem: fields allocated for features not used in many models, e.g. notification and dynamic types. Refactored, introducing BasicObjectImpl with no storage.
• Problem: rarely used fields had been moved to the lazily allocated PropertiesHolder side object, but the lazy allocation was not working. Fixed.
• Problem: 5 fields were never used at the same time. Combined them into one field, using casts.
• Problem: stored computations. Recompute as needed.
• Problem: some models make heavy use of fields in the side object, e.g. for cross-model references; memory costs had forced models to be broken into fragments, increasing the need for those references. Solution: refactoring; FlatObjectImpl avoids the delegation cost.

Large instance sizes: summary
Many reasons:
• Expensive base classes: some fields are not needed in the general case, or serve rarely used features; fine-grained designs using a common base class multiply the cost of the base class design
• Data fields: semi-constant fields, sparse fields, and recomputable data saved unnecessarily (often the result of premature optimization); both scalar and reference fields
• Typically, many different cases occur together in the same data model

Data type modeling: challenges for developers
• Java's limited data modeling means tradeoffs require care:
  • Moving rarely used fields to side objects can incur delegation costs (see the sketch below)
  • Moving sparse fields to a map can incur high map-entry costs
  • Verifying actual costs and benefits is essential
• Fixing problems of high-overhead data usually means refactoring data models: not easy late in the cycle; using interfaces and factories up front can help
More data type patterns later.
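The side-object tradeoff in the first bullet above is easy to get wrong, as the modeling-framework case study showed (lazy allocation that never actually deferred anything). Below is a minimal sketch of the pattern with hypothetical names, not the framework's own code; the point is that reads must not trigger allocation, so objects that never touch the rarely used features pay only for one null field.

/**
 * Sketch: rarely used fields moved to a lazily allocated side object.
 * Objects that never use these features carry just one null pointer.
 */
public class ModelObject {
    private SideData side;   // null until a rarely used feature is touched

    private static final class SideData {
        Object notificationList;   // hypothetical rarely used feature fields
        Object dynamicType;
    }

    private SideData side() {
        if (side == null) {
            side = new SideData();   // the allocation the case study had to make truly lazy
        }
        return side;
    }

    public void setDynamicType(Object type) {
        side().dynamicType = type;   // writes may allocate
    }

    public Object getDynamicType() {
        return side == null ? null : side.dynamicType;  // reads must not allocate
    }
}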
Representing relationships
• Many, high-overhead collections
• Small collections

Small collections in context
Case study: planning system, level graph edge index
• Index: HashMap x1 = 1.8MB. Keys are 2-element ArrayLists x65K = 3.1MB, each holding one Vertex and one boxed Integer level (Integer x65K = 1MB). Values are HashSets x65K = 16MB, averaging 4.5 Edges each (Edge x297K = 9MB).
• Two examples of small, high-overhead collections
• 297K edges cost 31MB. Overhead of representation: 83%.
• The overhead will not improve with more vertices

Small collections in context: map with multiple values per entry
• Only 5% of the sets had more than a few elements each

Inside the Java collections: HashSet
HashSet embeds many usage assumptions:
• It is built by reuse: it wraps a HashMap. Reuse of library code was considered important. Cost: 24 bytes per set + 4 per entry, on top of the map. Total cost: 72+ bytes fixed, 28 bytes per entry.
• Default capacity is 16: for a 5-entry set, that is 44+ bytes of empty slots in the backing array
• HashMap assumes its entry, key, and value views are all commonly used. Cost: 12 bytes.
• HashMap$Entry saves the hashcode, which is usually already saved elsewhere. Cost: 4 bytes per entry.
Lessons:
• Not a good choice for small collections
• Users: look before you leap; always measure
• Framework designers: beware of making usage assumptions

Small collections in context: remedy
• Switched the value sets from HashSet to ArrayList (x65K = 3.7MB). Saved 77% of that region.
• HashSet functionality was not worth the cost: uniqueness was already guaranteed elsewhere
Wish list: gracefully growing collections.

Small collections in context: multipart key as a 2-element ArrayList
• ArrayList has a high fixed cost, and the design also required boxing the integer level

Inside the Java collections: ArrayList
• Cost of a minimally sized 2-element ArrayList: 40 bytes fixed + 4 bytes per entry
• The fixed costs come from delegation to an Object[] plus bookkeeping fields
• Much lower fixed and variable costs than HashMap or HashSet, but the fixed costs can still add up for small collections
• The default size and growth policy can mean overhead from empty slots

Small collections in context: multipart key class, remedy
• Introduced a Pair class holding (Vertex, int level): keys x65K = 1.3MB (see the sketch below)
• Again, the functionality of the original design was not worth the cost
• Reduced key overhead by 68%

Multipart key
Case study: Apache Commons MultiKeyMap
• The Apache Commons collections framework has the same pattern: a MultiKeyMap entry points to a MultiKey, which points to an array of key parts
• Paying for flexibility that is not needed. Cost: an additional 20 bytes per entry.
• Could easily have created specialized MultiKey2, MultiKey3, etc. to avoid the delegation cost
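A sketch of the Pair-style key from the remedy above, with hypothetical names (the case study's class held a Vertex and an int level). Compared with a 2-element ArrayList key it drops the array object, the empty slots, and the boxed Integer; compared with MultiKey it drops the key-part array. equals() and hashCode() must be defined for it to work as a HashMap key.

/** Sketch of a two-part map key: one small object, no boxing, no array. */
public final class VertexLevelKey {
    private final Object vertex;   // stand-in for the application's Vertex type
    private final int level;       // primitive field: no boxed Integer

    public VertexLevelKey(Object vertex, int level) {
        this.vertex = vertex;
        this.level = level;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof VertexLevelKey)) return false;
        VertexLevelKey k = (VertexLevelKey) o;
        return level == k.level && vertex.equals(k.vertex);
    }

    @Override public int hashCode() {
        return 31 * vertex.hashCode() + level;
    }
}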
Growth policies
Example: creating default-size ArrayLists (planning system index)
• 28% overhead in the ArrayLists just from empty slots: x65K = 5.2MB, where optimal sizing would give 3.7MB
• Collections are optimized for growth: large defaults and large jumps (doubling)
• A 10% tax on some copies
Remedies:
• Set the initial capacity
• trimToSize() after load

Inside the Java collections: cost of a 2-element collection

              Minimal size (bytes)   Default size (bytes)   Slots for 2 elements at default size
LinkedList    96                     96                     3
ArrayList     48 or 56               56 or 80               10 or 12
HashMap       116 or 168             168                    16
HashSet       132 or 184             184                    16

From experiments with a few different JVMs, all 32-bit.

The cost of empty collections
Case study: CRM system, part of session data
• A small run had 26M of session data. Will not scale.
• 210 empty collections per session = 28% of the session cost
• SessionData x330 = under 1MB, each with a Person and 3 other structures; Profile x1.95K = 4.6MB, each with ~70 ArrayLists (x101K = 7.9MB); 15MB in this region
Remedies (see the sketch below):
• Lazily allocate
• Collections.emptySet()
• Avoid giving out references

The not-so-empty collections
• Minimum of 2 objects each: the component parts are always allocated. ArrayList, HashSet, and HashMap each wrap an array; LinkedList always allocates a sentinel entry.
• Default sizing can increase the cost, e.g. 16 elements for HashMap/HashSet, and a 0- or 10-slot ArrayList default depending on the JVM

Inside the Java collections: cost of an empty collection

              Minimal size (bytes)   Default size (bytes)   Default # of slots
LinkedList    48                     48                     1 sentinel entry
ArrayList     48                     48 or 80               0 or 10
HashMap       56 or 120              120                    16
HashSet       72 or 136              136                    16

From experiments with a few different JVMs, all 32-bit.
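A minimal sketch of the lazy-allocation remedies above, with hypothetical names. The collection field stays null until first use, readers of an absent collection get the shared Collections.emptyList() constant (the List analogue of the Collections.emptySet() remedy), and the initial capacity is set small rather than left at the default.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: sessions that never add a listener pay for one null field only. */
public class Session {
    private List<Object> listeners;   // null until actually needed

    public void addListener(Object l) {
        if (listeners == null) {
            listeners = new ArrayList<Object>(4);  // small capacity, not the default 10
        }
        listeners.add(l);
    }

    public List<Object> getListeners() {
        // Callers get a shared immutable constant or a read-only view,
        // never a direct reference to our internal list.
        return listeners == null
                ? Collections.<Object>emptyList()
                : Collections.unmodifiableList(listeners);
    }
}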
Representing relationships
• Many, high-overhead collections
• Small collections
• Special-purpose collections

Small concurrent maps
Case study: chat server framework
• Active sessions: a ConcurrentHashMap x1 = 4MB of 110K chat sessions (x110K = 10MB), each holding a Subscribers ConcurrentHashMap (x110K = 173MB) that contains a single Subscriber (x110K = 4.2MB)
• Nested ConcurrentHashMaps: more than 1600 bytes each!
• The cost was 90% of this structure, and 10-20% of the total heap
What went wrong:
• The library was not intended for use at this scale
• Concurrency requirements were different at fine vs. coarse grain
Remedies:
• First considered reducing the width of the inner ConcurrentHashMap from 16 to 3. Savings: 67%.
• Used Hashtable instead, since a high level of concurrency was not needed (x110K = 17MB). Savings: 90+%.
• Note: Hashtable is less expensive than the similar Collections$SynchronizedMap

Inside the Java collections: wrapped collections
• Collections$SynchronizedMap and Collections$UnmodifiableMap are based on delegation: each wrapper adds 28 bytes on top of the underlying HashMap
• The costs are significant when collections are small; fine for larger collections

Small wrapped collections
Case study: media e-commerce site (64-bit JVM)
• Cache: a ConcurrentHashMap x1 = 3.8MB of 64K CachedElements (x63K = 137MB), whose Titles are reached through UnmodifiableMap wrappers (x1.9M = 106MB for the wrappers alone; 465MB for the wrapped region including the underlying HashMaps x1.9M = 359MB) of Title Strings (x2M = 156MB)
• 108MB just for UnmodifiableMap wrappers, at 56 bytes each: twice the cost on a 32-bit JVM
What went wrong:
• The functionality was not worth the cost at this scale; unmodifiable wrappers serve a development-time purpose

Multikey map: design I
Level graph edge index as a nested map (assume 10K vertices, 5 levels)
• Outer HashMap x1 = .3MB keyed by Vertex (10K entries), pointing to inner HashMaps x10K = 2.4MB keyed by level (5 entries each), pointing to edge lists
• (10K + 1) x HashMap fixed overhead + 60K x HashMap per-entry overhead
• Total cost: 2.6MB

Multikey map: design II
Nested map, reordered: key by level first (see the sketch below)
• Outer HashMap x1 = under 1KB with 5 level keys, pointing to inner HashMaps x5 = 1.4MB keyed by Vertex (10K entries each)
• 6 x HashMap fixed overhead + (50K + 5) x HashMap per-entry overhead. Total: 1.4MB.
• Switching the order eliminated the nested collections' fixed costs. Savings: 46%, and the savings are consistent as vertices increase.
• A good approach if you know the distribution

Multikey map: design III
Single map with a multikey
• One HashMap x1 = 1.4MB keyed by Pairs (x50K = 1MB) of (Vertex, Integer or int level), pointing to edge lists
• 1 x HashMap fixed overhead + 50K x HashMap per-entry overhead + 50K x Pair overhead. Total: 2.4MB.
• 11% better than design I, 70% worse than design II. The results were surprising.
• Trades the fixed costs of many small collections for per-element cost in one large collection: a 28-byte HashMap entry plus a 20-byte Pair
• Wish list: the ability to extend entry classes

Multikey map: comparison
• Plotting incremental bytes per vertex against the number of levels (2 to 10): assuming the number of levels is much smaller than the number of vertices, design II is consistently better than design I, with a constant delta of 128 bytes per vertex
• The difference of design III vs. the others is sensitive to the number of levels, even within a small range
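A sketch of design II, with stand-in Object types where the case study had Vertex and Edge. Keying by the low-cardinality level first means only a handful of inner maps ever exist, so the nested maps' fixed costs stop mattering; the per-entry costs remain.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of design II: 5 levels means ~6 maps in total, not 10,001. */
public class EdgeIndex {
    private final Map<Integer, Map<Object, List<Object>>> byLevel =
            new HashMap<Integer, Map<Object, List<Object>>>();

    public void add(int level, Object vertex, Object edge) {
        Map<Object, List<Object>> byVertex = byLevel.get(level);
        if (byVertex == null) {
            byVertex = new HashMap<Object, List<Object>>();
            byLevel.put(level, byVertex);           // one inner map per level
        }
        List<Object> edges = byVertex.get(vertex);
        if (edges == null) {
            edges = new ArrayList<Object>(4);       // small edge lists: size accordingly
            byVertex.put(vertex, edges);
        }
        edges.add(edge);
    }

    public List<Object> edges(int level, Object vertex) {
        Map<Object, List<Object>> byVertex = byLevel.get(level);
        return byVertex == null ? null : byVertex.get(vertex);
    }
}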
... break ...

Representing relationships
• Large collections: high per-entry overhead relative to the data

Large collections and scaling
Level graph edge index as a single map with a multikey
• The per-element cost is constant, but the constant is large relative to the actual data: the cost is dominated by the HashMap per-entry cost (28n bytes) plus the Pair cost (20n bytes)
• Cost: 48 bytes per element. Overhead: 83%.
What went wrong: high collection per-entry costs plus delegation costs.

Inside the Java collections: standard collections, per-entry costs

              Per-entry cost (bytes)
LinkedList    24
ArrayList     4
HashMap       28 or 36
HashSet       28 or 36

From experiments with a few different JVMs, all 32-bit. Excludes amortized per-collection costs such as empty array slots; includes the pointer to the entry. These costs may be in addition to the overhead of introducing another object.

Nested collections, high per-element costs
Case study: collaboration service, storing properties
• Stores 7 properties per subscription via a session API: Subscription x20K, Session x20K, each with a properties Hashtable (x20K = 7MB); attribute names are Strings shared across sessions; values are Integer, Long, etc. (x140K = 2.7MB)
• Hashtable per-entry costs and boxing add 350 bytes of overhead per session, impeding scalability
What went wrong: the cost was obscured by multiple layers and fanouts.
What went right: attribute names were shared across sessions.
Remedy:
• Combined the properties into a single high-level property object, inlining the scalar values: Hashtable x20K = 2.6MB, SubscriptionProperty x20K = 1.2MB
• 7:1 reduction in collection entry costs, plus reduced boxing costs
• Note: still paying the Hashtable fixed cost

Representing relationships
• Special-purpose collections

Collections involving scalars
Case study: monitoring infrastructure
• TreeMap x52 = 537MB, with 265K entries each: Double keys x13.4M = 342MB and Double values x13.4M = 342MB
• The data structure took 1.2GB, and overhead is still 82% at this giant scale
• Some alternative scalar maps and collections are available, with much lower overhead

Identity maps
Comparison: HashMap vs. IdentityHashMap, 10,000 keys and values
• HashMap x1 = 298KB; IdentityHashMap x1 = 128KB
• For maintaining a map of unique objects, where the reference is the key; equality is based on ==
• The open addressing implementation avoids the cost of Entry objects
• Cost reduced by 59% in this experiment
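A minimal sketch of the identity-map pattern, with hypothetical names: tagging unique objects where the reference itself is the key. Note that IdentityHashMap is only correct when reference equality is what you mean; it silently treats equal-but-distinct objects as different keys.

import java.util.IdentityHashMap;
import java.util.Map;

/** Sketch: per-object tags keyed by identity, with no Entry objects to pay for. */
public class NodeTags {
    private final Map<Object, String> tags = new IdentityHashMap<Object, String>();

    public void tag(Object node, String label) {
        tags.put(node, label);     // keyed by ==, not equals()
    }

    public String tagOf(Object node) {
        return tags.get(node);
    }
}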
Collections & scaling
Some costs are not amortized as scale increases:
• Overhead of small, nested collections: the total cost of each collection is what matters, both the fixed and variable parts
• Per-element overhead of large collections: collection per-entry and data delegation overheads

The standard collections
JDK standard collections:
• Speed has been the focus, not footprint
• The IBM (Harmony) and Sun implementations are not that different in footprint
• Hard-wired assumptions, few policy knobs (e.g. growth policies)
• Specialized collections are worth learning about: IdentityHashMap, WeakHashMap, ConcurrentHashMap, etc.

Collections alternatives
Apache Commons:
• Many useful collections: Flat3Map, MultiMap, MultiKeyMap
• Focus is mostly on functionality; footprint is similar to the standard collections, with a few exceptions
GNU Trove:
• Many space-efficient implementations, e.g. scalar collections, and list entries without the delegation cost
Also: Cliff Click's nonblocking collections; Javolution; Amino; specialized collections within frameworks you already use.
Important: check your corporate policy regarding specific open source frameworks.

Summary: using collections
• Choosing and configuring carefully really matters, since collection implementations are often expensive
• Avoid writing your own if possible

Modeling your data types
• Too much data

Saving formatted data I
Case study: one layer of a chat framework
• Session bridge: Session x111K = 42MB, each saving the computation of toString() in 3 StringBuffers (x334K = 187MB)
• 82% of the cost of this layer was due to saving the toString() results
What went wrong:
• Empty space overhead in the StringBuffers
• The space cost was not worth the time savings
Remedies:
• String, not StringBuffer
• Recompute as needed

Saving formatted data I: delegation effects
• Inside each Session, the data type had been split in three: SessionWrapper, SessionBase, SessionImpl, with the same coding pattern (a saved StringBuffer) copied to each part
• The delegated design magnified the other costs

Saving formatted data II
Case study: CRM system, session state fragment
• Profiles stored booleans as the Strings "Y" or "N": a health ratio of 48:1
• Some values were constants ("10%"); some had only a few possible values ("Y", "N")
What went wrong:
• Duplicating data in a high-overhead representation
• The space cost was not worth the time savings

Duplicate, immutable data
Case study: text analysis system, concordance
• ConcordanceEntry x131K = 41MB, each with an Annotation pointing to its own copy of a Type and its String (and char[]) data
• 17% of the cost was due to duplication of the Type and its String data; there were only a small number of immutable Types
What went wrong:
• The interface design did not provide for sharing, and the full cost of the duplication was hidden
Remedy:
• Use the shared immutable factory pattern
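A sketch of the shared immutable factory pattern named in the remedy, using a hypothetical Type class. The factory hands out one canonical instance per name, so callers never hold private copies.

import java.util.concurrent.ConcurrentHashMap;

/** Sketch: a shared immutable factory over a small, fixed set of values. */
public final class Type {
    private static final ConcurrentHashMap<String, Type> POOL =
            new ConcurrentHashMap<String, Type>();

    private final String name;

    private Type(String name) { this.name = name; }   // instances only via the factory

    public static Type valueOf(String name) {
        Type t = POOL.get(name);
        if (t == null) {
            Type fresh = new Type(name);
            t = POOL.putIfAbsent(name, fresh);   // first caller to register wins
            if (t == null) t = fresh;
        }
        return t;
    }

    public String name() { return name; }
}

Note that this pool holds strong references, which is fine for a small, fixed population of Types; for open-ended values it has exactly the unbounded-growth problem discussed under object lifetime later in this tutorial.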
Background: sharing low-level data
String.intern():
• You specify which Strings to share; it shares both the String object and the character array
• Make sure it's worth it, since there is a space cost
• The claim that it causes memory leaks is a myth, though you can hit permspace limits
Boxed scalars:
• Integer.valueOf(), Boolean.valueOf(), etc. share some common values (not all)
• Make sure you don't rely on ==

Common-prefix data
Case study: application server, class cache
• A class loader map of class names to jar file info: more than 120MB of Strings, mostly duplicate prefix information
What went wrong:
• Duplication cost; the deeper problem was a misplaced optimization
Remedy:
• Implemented a trie; a simpler 2-part factoring can also work

Managing object lifetime
• Short-lived data

Temporaries
Aren't temporary objects free these days? Some are, and some definitely aren't.

Expensive temporaries
Example: SimpleDateFormat
• Construction is costly: the default constructor builds a GregorianCalendar, a DecimalFormat with its DecimalFormatSymbols, a TimeZone, a Date, and assorted String[] and int[] structures
• Each call to the default constructor results in 123 calls to 55 distinct methods and 44 new instances
• Designed for its costs to be amortized over many uses
• Remedy: reuse, via a local variable or thread-local storage

Background: ThreadLocal storage
• ThreadLocal is a JDK-supplied per-thread variable; an application can create many of these for different purposes
• Enables reuse without introducing concurrency problems

Tradeoffs
• Converters, formatters, factories, schemas, connections, etc. may be good candidates for reuse: they can be expensive to create, and are often designed for reuse
• Use ThreadLocal or specialized resource pools, depending on requirements; sometimes local variables are good enough (see the sketch below)
• Avoid rolling your own resource pools
• Not worth caching simple temporaries: some are inexpensive to create (e.g. Integer, many iterators), and a ThreadLocal access is usually a hash lookup
• Verify the cost/benefit
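A minimal sketch of the thread-local remedy for SimpleDateFormat (the class name and pattern string here are arbitrary). Each thread builds the formatter once and reuses it, which also sidesteps SimpleDateFormat's lack of thread safety.

import java.text.SimpleDateFormat;
import java.util.Date;

/** Sketch: one SimpleDateFormat per thread, built once and reused. */
public class DateFormats {
    private static final ThreadLocal<SimpleDateFormat> TIMESTAMP =
            new ThreadLocal<SimpleDateFormat>() {
                @Override protected SimpleDateFormat initialValue() {
                    // Paid once per thread, then amortized over many calls.
                    return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                }
            };

    public static String format(Date d) {
        return TIMESTAMP.get().format(d);   // reuses this thread's instance
    }
}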
Managing object lifetime
• Long-lived data

Managing lifetime: understanding requirements
Three very different reasons for long-lived data:
1. In-memory design: the data is in memory forever
2. Space vs. time: the data may be discarded and recomputed
3. Correlated lifetime: the data is alive only during the lifetime of other objects, or during specific phases
Each has its own best practices and pitfalls, and many problems stem from misunderstanding the requirements. A caution: if not careful, extending the lifetime of objects can introduce concurrency problems, leaks, and additional memory overhead from the structures that manage lifetime.

Managing object lifetime: in-memory designs

The limits of objects
Case study: memory analysis tool
• Some object-oriented designs will never fit in memory; estimate and measure early
• Requirement: analyze an 80-million-object heap. Design: one object per target application object.
• Hypothetical minimum: if each object needed just 4 fields (type, id, pointer to references, flags), that is 80M x 32 bytes = 2.5GB just to model the application's objects. Modeling references (2-3 per object) and leaving scratch space for algorithms would require at least 10GB.
• Note: an earlier design used a modeling framework with high overhead costs; just optimizing those costs would not have been sufficient
Solution:
• A backing store using memory-mapped files (java.nio)
• Built a column-based storage infrastructure with scalar arrays, to reduce the working set and avoid object header costs, specialized for this application's access patterns. Don't try this at home!
• The result is highly scalable: it runs in a 256M heap. Some XML DOM implementations use this approach.
Recommendations:
• java.nio is one way to implement a backing store
• A column-based approach is a last resort, for optimization of highly specialized and protected components

Managing object lifetime: space vs. time designs

Space vs. time designs: mechanisms
Mechanisms for saving time by maintaining data in memory:
• Caches
• Resource pools
Also: thread-local storage, and adding computed fields to data types.

Background: soft references
On the reachability spectrum from weak to soft to strong, soft references:
• Tell the GC to reclaim these objects only when the space is really needed
• Keep an object alive after it is no longer strongly referenced, just in case it is needed again
• Are used mostly to avoid recomputation, e.g. for caches and resource pools, and for side objects (cached fields) that can be recreated if lost
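A minimal sketch of a cache along these lines, with hypothetical names. Following the best practices on the next slides, the explicit size bound is the primary policy and the soft references are only a failsafe; the eviction here is deliberately crude.

import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: a bounded cache whose values the GC may reclaim under memory
 * pressure, in which case we simply recompute them.
 */
public abstract class RecomputingCache<K, V> {
    private final int maxEntries;
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    protected RecomputingCache(int maxEntries) { this.maxEntries = maxEntries; }

    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = compute(key);          // never cached, or cleared by the GC
            if (map.size() >= maxEntries) {
                map.clear();               // crude bound; a real cache would evict LRU
            }
            map.put(key, new SoftReference<V>(value));
        }
        return value;
    }

    /** Subclasses supply the expensive computation being cached. */
    protected abstract V compute(K key);
}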
Caches & pools: a sampling
Case study: class loader "cache"
• More than 100M of classname strings: an in-memory design had been implemented where the purpose, performance, called for a small, bounded cache
• Unbounded growth (a leak). Solution: soft references.
Also: an object pool framework was used for 20 different purposes, to improve performance, with unbounded size and strong references.
Case study: financial web application
• The cache itself was only needed during startup
Case study: high-volume web application
• The cache was sized too large, aiming for a 95% hit rate
• Result: performance problems due to excessive GC
Lessons: caches and pools should always be bounded, and larger caches aren't necessarily better.

Caches & resource pools: best practices
• Soft references are useful for implementing simple caches and pools, but relying solely on them gives up control over policy, and may not leave enough headroom for temporary objects, causing the GC to run more often
• Caches and pools should in general be bounded in size; soft references can be used as an additional failsafe mechanism
• Many implementations of caches and resource pools are available; avoid writing your own if possible

Managing object lifetime: correlated lifetime designs

Correlated lifetime
Objects needed only while other objects are alive:
• e.g. annotations on existing objects, sharing pools, listeners
... or during specific phases or time intervals:
• e.g. loading; session state, for a bounded length of time

Sharing and growing
Case study: planning system, sharing pool
• Each iteration of the algorithm creates hundreds of thousands of new expressions, holding only transient references for one iteration
• Used the shared immutable factory pattern, with a HashMap pool, to save space on common subexpressions
• Result: unbounded growth due to the pool's keys. The HashMap keeps subexpressions (and map entries) around forever.

Background: weak references
On the reachability spectrum from weak to soft to strong, a weak reference:
• Tells the GC it may reclaim an object as soon as it is no longer needed, i.e. as long as there are no stronger references to the object
• Is useful for preventing leaks: it ties the lifetime of objects to that of other objects, e.g. for annotations, sharing pools, and listener lists

Sharing and not growing
Case study: planning system, sharing pool, remedy
• Apache Commons ReferenceMap with strong reference keys and weak reference values: a pool entry is removed when its value is no longer needed
• Note: soft references were also considered, but each iteration used different expressions, so there was no need to prolong lifetime. The goal was space, not time.

Using weak references
A few common usage patterns (see the sketch below):
• Weak key, strong value: the standard Java WeakHashMap. Example usage: the key is the object to be annotated, the value is the annotation. Caution if the key is the same as, or strongly reachable from, the value.
• Strong key, weak value: as in the sharing pool example above.

Background: weak and soft reference costs
• Weak and soft references are objects themselves, and so incur footprint costs: e.g. 24 bytes for each WeakReference on one 32-bit JVM
• Some weak/soft map entries extend Weak/SoftReference; others add yet another level of delegation, e.g. Apache Commons ReferenceMap has at least 2 objects per entry
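A minimal sketch of the weak-key, strong-value pattern, with hypothetical names. WeakHashMap entries disappear once the annotated object is otherwise unreachable, so the annotations cannot outlive their keys; the strong-key, weak-value direction of the sharing pool would instead use something like the ReferenceMap above.

import java.util.Map;
import java.util.WeakHashMap;

/**
 * Sketch: annotations whose lifetime is tied to the annotated object.
 * Caution (as noted above): the value must not strongly reference its key,
 * or the entry can never be collected.
 */
public class Annotations {
    private final Map<Object, String> notes = new WeakHashMap<Object, String>();

    public void annotate(Object node, String note) {
        notes.put(node, note);    // entry is cleared when node becomes unreachable
    }

    public String noteFor(Object node) {
        return notes.get(node);
    }
}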
Leaks & drag: a sampling
Case study: CRM application
• Leak: a bug in end-of-request processing failed to remove an object from a listener queue
• Immediate fix: fixed the bug in the request processing. For robustness: have the listener queue use weak references.
Case study: development tool
• A large index was needed only during load time. Easy solution: nulled out the pointer.
Case study: CRM application
• Session state was retained for 8 hours, made worse by costly session state (200K per user)
• Easy solution: fixed the configuration
Note: failure to unregister listeners is a common cause of leaks.

Entries too large
Case study: CRM application
• 200K of session state per user!
• Often a pile-up of multiple problems within one session: small and empty collections, duplicated data, highly delegated representations with hundreds or thousands of instances, and large substructures retained by accident

Process
Simple techniques, tools, and resources.

Surprises are everywhere
Case study: CRM application
• Developers expected 2K and found 200K of session state per user
• Unit costs can be very difficult to predict, at every level

Measurement
• There are many surprises. Especially in Java, it is essential to verify assumptions empirically throughout the lifecycle.

What and when
A few techniques among many:
• Small, synthetic experiments are extremely valuable, to test out frameworks and design patterns before they are adopted
• Of course, incorporate measurement into unit and system tests
• Use detailed diagnostic tools to periodically check for scale, and look for surprises
• Be mindful of normalization units when designing tests: how many concurrent users? active sessions?
• Understand the costs of the major units used in lower layers
• Run experiments at different scales early on: are costs amortized as expected?
• Cardinality of relationships: state it as part of the design; verify it periodically; then use it, in combination with measurement, as the basis for estimation
• Caches and pools: verify that they are working and that they are worth it

Tools for heap analysis
• For analyzing the sources of memory bloat and for verifying assumptions, tools that rely on heap snapshots are the most valuable
• Some free tools from IBM and Sun: check IBM DeveloperWorks & alphaWorks, and the Sun Developers Network
• Commercial and open source tools: SAP MAT, now open source (Eclipse); YourKit, JProfiler, ...; many read only the hprof format, not phd

Gathering heap data
• The IBM and Sun diagnostic guides have information on gathering and analyzing heap snapshots, and pointers to free tools:
  • IBM: http://www.ibm.com/developerworks/java/jdk/diagnosis/
  • Sun: http://java.sun.com/javase/, info with specific JDKs
• Formats: hprof (Sun & IBM); phd (IBM only)
• The choice of when to take snapshots is key:
  • For footprint: at steady state, with a known load
  • For the footprint of a single feature, or for suspected growth: before/after a fixed number of operations, starting after the system is warmed up

Additional resources
• JDK library source code is freely available, and can be very worthwhile to consult
• Many valuable articles on the web: IBM DeveloperWorks and the Sun Developer Network are good starting points; best practices and tuning guides exist for specific frameworks; some misinformation is occasionally found, even on reputable sites
• Garbage collection and overall heap usage: the IBM and Sun diagnosis sites have GC tuning guides and free tools, e.g. the IBM Pattern Modeling and Analysis Tool for GC (PMAT) on alphaWorks; some performance analysis tools have heap monitoring features
• Object allocation: most Java performance profilers can show allocation information with calling context, e.g. hprof (free)

Conclusions
• Distributed development, layers of frameworks, and Java's modeling limitations make it easy to create bloated data designs
• In this environment, awareness of costs is essential for making informed tradeoffs
• Some tradeoffs can be made without sacrificing speed or sound design
• The concept of data structure health, the ratio of actual data to its representation, can illuminate where there is room for improvement, and point out aspects of a design that will not scale

Acknowledgments
Thanks to: Matthew Arnold, Dave Grove, Tim Klinger, Trevor Parsons, Peter Santhanam, Edith Schonberg, Yeti.

See also: N. Mitchell, G. Sevitsky, "The Causes of Bloat, the Limits of Health", in Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA) 2007, Montreal, Canada.