. . 黄 . . .. . Concurrent Building Blocks in Common Lisp 澗石 (Jianshi Huang) 2010 年 9 月 20 日 . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Today’s Topic Low-level atomic operations & user-space locks . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Today’s Topic Low-level atomic operations & user-space locks APIs in SBCL/LispWorks . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Today’s Topic Low-level atomic operations & user-space locks APIs in SBCL/LispWorks Usage and examples . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Today’s Topic Low-level atomic operations & user-space locks APIs in SBCL/LispWorks Usage and examples Performance . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Today’s Topic Low-level atomic operations & user-space locks APIs in SBCL/LispWorks Usage and examples Performance Possible improvements . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores We have customers who have 32 cores on a single machine . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores We have customers who have 32 cores on a single machine Algorithms design becomes difficult . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores We have customers who have 32 cores on a single machine Algorithms design becomes difficult Amdahl’s Law speedup = t1 1 = tp (1 − P) + . 黄 澗石 (Jianshi Huang) P S . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores We have customers who have 32 cores on a single machine Algorithms design becomes difficult Amdahl’s Law speedup = t1 1 = tp (1 − P) + P S Which means, if 90% of the algorithm is parallelized, we get at most 10x speedup. . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems Single-CPU (→) Multi-Core (↑) Customers want software that can use all of their cores 3 years ago: 2 cores 1 year ago: 4 cores now: 4˜8 cores We have customers who have 32 cores on a single machine Algorithms design becomes difficult Amdahl’s Law speedup = t1 1 = tp (1 − P) + P S Which means, if 90% of the algorithm is parallelized, we get at most 10x speedup. We need to squeeze as much as we can (a little is enough to beat the competitors) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) We cannot ignore shared data access (Amdahl’s Law) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) We cannot ignore shared data access (Amdahl’s Law) Concurrent data structures/stores are necessary . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) We cannot ignore shared data access (Amdahl’s Law) Concurrent data structures/stores are necessary List, Trees, Hashtables, etc . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) We cannot ignore shared data access (Amdahl’s Law) Concurrent data structures/stores are necessary List, Trees, Hashtables, etc But makes synchronization overhead noticeable . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Problems (Cont.) The best way to parallelize an algorithm is to design one so that each CPU shares nothing Not that easy Many algorithms needs data synchronization Sometimes depends on global state (max/min values, for example) We cannot ignore shared data access (Amdahl’s Law) Concurrent data structures/stores are necessary List, Trees, Hashtables, etc But makes synchronization overhead noticeable We need low overhead and scalable synchronization methods . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . We’ll start from bottom up CPU Instruction-level atomic operations . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . We’ll start from bottom up CPU Instruction-level atomic operations User-space locks (against OS locks) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . We’ll start from bottom up CPU Instruction-level atomic operations User-space locks (against OS locks) Higher-level abstractions . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Essential for multi-core CPUs . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Essential for multi-core CPUs cache . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Essential for multi-core CPUs cache memory read/write race . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Essential for multi-core CPUs cache memory read/write race Relatively expensive but still lowest overhead . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Essential for multi-core CPUs cache memory read/write race Relatively expensive but still lowest overhead All the higher abstractions depend on it . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations Atomic CPU instructions Instruction Exchange(XCHG) Fetch-and-Add Compare-And-Swap Load-Link/Store-Conditonal Exchange Processor x86, x86-64, Sparc x86, x86-64 x86, x86-64, Sparc Alpha, ARM, MIPS, PPC 1 ( defun exchange ( p l a c e new ) 2 ( atomically 3 ( s e t f ( r e f p l a c e ) new ) ) ) Compare-and-swap 1 ( defun compare−and−swap ( p l a c e o l d new ) 2 ( atomically 3 ( l e t ( ( cu r r en t − v a lu e ( r e f p l a c e ) ) ) 4 (when ( eq c u rr e n t − v a l u e o l d ) 5 ( s e t f ( r e f p l a c e ) new ) ) 6 c u r r e n t −v a l ue ) ) ) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations (Cont.) There are more subtle issues . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations (Cont.) There are more subtle issues Out-of-order execution . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations (Cont.) There are more subtle issues Out-of-order execution Fence instructions (barriers) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Instruction-level atomic operations (Cont.) There are more subtle issues Out-of-order execution Fence instructions (barriers) But we will not cover them today . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref LispWorks . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref LispWorks system:atomic-incf/decf, atomic-fixnum-incf/decf . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref LispWorks system:atomic-incf/decf, atomic-fixnum-incf/decf system:atomic-exchange . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref LispWorks system:atomic-incf/decf, atomic-fixnum-incf/decf system:atomic-exchange system:compare-and-swap . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . APIs in Common Lisp Implementaions SBCL sb-ext:atomic-incf/decf Limitations: only structure accessor of sb-ext::word type sb-ext:compare-and-swap Limitations: car, cdr, symbol-plist, symbol-value, svref LispWorks system:atomic-incf/decf, atomic-fixnum-incf/decf system:atomic-exchange system:compare-and-swap Limitations car, cdr, symbol-value, svref, sturcture accessor, CLOS slot-value . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Lock Now let’s move on to locks atomic instructions can be simulated as 1 ( defun atomic−fun ( p l a c e ) 2 ( with−lock ( l o c k ) 3 ( fun p l a c e ) ) ) POSIX thread provides mutex It has to assumes the lock might be grabbed by a single thread indefinitely Will block the thread if it cannot obtain the lock Context switching and thread scheduling overhead We also need user-space locks . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Spinlock Busy waiting thread when lock is unavailable 1 ( defun s p i n l o c k − l o c k ( l o c k ) 2 ( atomically 3 ( loop w h i l e ( locked−p l o c k ) ) 4 ( do−lock l o c k ) ) ) Rationale Many synchronized operations are short (several instructions) A lock is only obtained transiently (several to dozens of CPU cycles) A context switching takes at least several hundreds of CPU cycles So don’t give up the thread control to the OS, just keep retrying complements atomic CPU instructions when operation is complicated Actually OS locks also apply this strategy But we cannot customize its algorithm in our application . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Spinlock implementation Unfortunately, SBCL has bogus spinlock implementation 1 #!+sb−thread 2 ( f l e t (( cas () 3 ( i f ( sb ! e x t : compare−and−swap ( s p i n l o c k − v a l u e s p i n l o c k ) n i l new ) 4 ( thread−yield ) 5 ( return−from g e t − s p i n l o c k t ) ) ) ) 6 ( i f ( and ( not * i n t e r r u p t s − e n a b l e d *) * a l l o w − w i t h − i n t e r r u p t s *) 7 ( progn 8 ( with−interrupts ) 9 ( loop 10 ( loop r e p e a t 128 do ( c a s ) ) ; 128 i s a r b i t r a r y h e r e 11 ( sb ! u n i x ::% c h e c k − i n t e r r u p t s ) ) ) 12 ( loop ( c a s ) ) ) ) ) And LispWorks doesn’t have it built-in . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Spinlock implementation (Cont.) So I made my own version (naive though, problems explained later) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ; ; ; c a r o f a cons i s used as a l o c k ; ; ; n i l i s unlocked , t i s l o c k e d ( declaim ( i n l i n e l o c k u n l o c k ) ) ( l o c a l l y ( d e c l a r e ( o p t i m i z e speed ( s a f e t y 0) ) ) ( defun l o c k ( l o c k ) ( d e c l a r e ( t y p e cons l o c k ) ) #+l i s p w o r k s ( loop w h i l e ( system : atomic−exchange−car l o c k n i l t ) ) #+s b c l ( loop w h i l e ( compare−and−swap ( c a r l o c k ) n i l t ) ) ) ( defun u n l o c k ( l o c k ) ( d e c l a r e ( t y p e cons l o c k ) ) ( setf ( car lock ) n i l ) ) ( defmacro with− spinloc k ( ( l o c k ) &body body ) ; ; non−proper macro , use once−only i n s t e a d ‘( let (( lock , lock ) ) ( unwind−protect ( progn ( lock lock ) ( l o c a l l y , @body ) ) ( unlock lock ) ) ) ) ) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmarks . Too much concepts and now let’s do some practice… .. .. . 黄 澗石 (Jianshi Huang) . . . . Concurrent Building Blocks in Common Lisp . . . . . Benchmarks (Counter using atomic-incf) 1 ; ; ; atomic−incf v e r s i o n 2 ( l o c a l l y ( d e c l a r e ( o p t i m i z e speed ( s a f e t y 0) ) ) 3 ( defstruct counter 4 ( p l a c e 0 : t y p e #+s b c l SB−EXT:WORD #+l i s p w o r k s fixnum ) ) 5 6 ( defun d o i t ( c o u n t e r ntimes ) 7 ( d e c l a r e ( type counter counter ) 8 ( t y p e fixnum ntimes ) ) 9 ( dotimes ( i ntimes ) 10 ( declare ( ignorable i ) ) 11 #+s b c l ( atomic−incf ( counter−place c o u n t e r ) ) 12 #+l i s p w o r k s ( system : atomic−fixnum−incf ( counter−place c o u n t e r ) ) ) ) 13 14 ( defun bench ( n t h r e a d s ntimes ) 15 ( d e c l a r e ( t y p e fixnum n t h r e a d s ntimes ) ) 16 ( l e t * ( ( c o u n t e r ( make−counter ) ) 17 ( threads 18 ( loop r e p e a t n t h r e a d s 19 collect 20 #+s b c l ( sb−thread : make−thread ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) 21 #+l i s p w o r k s (mp: process−run−function "" ( ) ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) ) ) ) 22 #+s b c l (mapc #’sb−thread : j oi n− th re ad t h r e a d s ) 23 #+l i s p w o r k s (mapc #’mp: p r o c e s s − j o i n t h r e a d s ) 24 ( a s s e r t (= ( counter−place c o u n t e r ) 25 (* n t h r e a d s ntimes ) ) ) ) ) ) atomic-incf benchmark . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmarks (Counter using spinlock) 1 ; ; ; u s i n g custom s p i n l o c k i m p l e m e n t a t i o n 2 ( l o c a l l y ( d e c l a r e ( o p t i m i z e speed ( s a f e t y 0) ) ) 3 ( defstruct counter 4 ( p l a c e 0 : t y p e fixnum ) 5 ( l o c k ’ ( n i l ) : t y p e cons ) ) 6 7 ( defun d o i t ( c o u n t e r ntimes ) 8 ( d e c l a r e ( type counter counter ) 9 ( t y p e fixnum ntimes ) ) 10 ( l e t ( ( l o c k ( counter−lock c o u n t e r ) ) ) 11 ( d e c l a r e ( t y p e cons l o c k ) ) 12 ( dotimes ( i ntimes ) 13 ( declare ( ignorable i ) ) 14 ( with −spin lock ( l o c k ) 15 ( i n c f ( counter−place c o u n t e r ) ) ) ) ) ) 16 17 ( defun bench ( n t h r e a d s ntimes ) 18 ( d e c l a r e ( t y p e fixnum n t h r e a d s ntimes ) ) 19 ( l e t * ( ( c o u n t e r ( make−counter ) ) 20 ( threads 21 ( loop r e p e a t n t h r e a d s 22 collect 23 #+s b c l ( sb−thread : make−thread ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) 24 #+l i s p w o r k s (mp: process−run−function "" ( ) ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) ) ) ) 25 #+s b c l (mapc #’sb−thread : j oi n− th re ad t h r e a d s ) 26 #+l i s p w o r k s (mapc #’mp: p r o c e s s − j o i n t h r e a d s ) 27 ( a s s e r t (= ( counter−place c o u n t e r ) 28 (* n t h r e a d s ntimes ) ) ) ) ) ) spinlock benchmark . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmarks (Counter using OS lock) 1 ; ; ; blockable lock version 2 ( l o c a l l y ( d e c l a r e ( o p t i m i z e speed ( s a f e t y 0) ) ) 3 ( defstruct counter 4 ( p l a c e 0 : t y p e fixnum ) 5 #+s b c l ( l o c k ( sb−thread : make−mutex) : t y p e sb−thread : mutex ) 6 #+l i s p w o r k s ( l o c k (mp: make−lock ) : t y p e mp: l o c k ) 7 ) 8 9 ( defun d o i t ( c o u n t e r ntimes ) 10 ( d e c l a r e ( type counter counter ) 11 ( t y p e fixnum ntimes ) ) 12 ( l e t ( ( l o c k ( counter−lock c o u n t e r ) ) ) 13 ( dotimes ( i ntimes ) 14 ( declare ( ignorable i ) ) 15 #+s b c l ( sb−thread : with−mutex ( l o c k ) 16 ( i n c f ( counter−place c o u n t e r ) ) ) 17 #+l i s p w o r k s (mp: with−lock ( l o c k ) 18 ( i n c f ( counter−place c o u n t e r ) ) ) ) ) ) 19 20 ( defun bench ( n t h r e a d s ntimes ) 21 ( d e c l a r e ( t y p e fixnum n t h r e a d s ntimes ) ) 22 ( l e t * ( ( c o u n t e r ( make−counter ) ) 23 ( threads 24 ( loop r e p e a t n t h r e a d s 25 collect 26 #+s b c l ( sb−thread : make−thread ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) 27 #+l i s p w o r k s (mp: process−run−function "" ( ) ( lambda ( ) ( d o i t c o u n t e r ntimes ) ) ) ) ) ) 28 #+s b c l (mapc #’sb−thread : j oi n− th re ad t h r e a d s ) 29 #+l i s p w o r k s (mapc #’mp: p r o c e s s − j o i n t h r e a d s ) 30 ( a s s e r t (= ( counter−place c o u n t e r ) 31 (* n t h r e a d s ntimes ) ) ) ) ) ) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmark Result Atomic instruction v.s. OS lock threads 1 2 4 8 16 32 atomic-incf(SBCL) 0.539 1.424 2.021 1.984 2.479 3.417 atomic-incf(LispWorks) 0.705 1.117 1.567 1.750 2.115 1.953 OS lock 9.118 31.084 45.857 46.599 51.223 40.298 Measured in time, avg of 5 runs, unit: seconds . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmark Result Atomic instruction v.s. OS lock Concurrently increment 64,000,000 times to a counter threads 1 2 4 8 16 32 atomic-incf(SBCL) 0.539 1.424 2.021 1.984 2.479 3.417 atomic-incf(LispWorks) 0.705 1.117 1.567 1.750 2.115 1.953 OS lock 9.118 31.084 45.857 46.599 51.223 40.298 Measured in time, avg of 5 runs, unit: seconds . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmark Result Atomic instruction v.s. OS lock Concurrently increment 64,000,000 times to a counter SBCL/Linux, on a 32core machine threads 1 2 4 8 16 32 atomic-incf(SBCL) 0.539 1.424 2.021 1.984 2.479 3.417 atomic-incf(LispWorks) 0.705 1.117 1.567 1.750 2.115 1.953 OS lock 9.118 31.084 45.857 46.599 51.223 40.298 Measured in time, avg of 5 runs, unit: seconds . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Benchmark Result (Cont.) Spinlock vs OS lock threads 1 2 4 8 16 32 cas-spinlock 1.211 5.925 12.388 26.770 54.061 100.454 xchg-spinlock 2.641 9.519 24.313 43.074 92.910 150.665 OS lock 9.118 31.084 45.857 46.599 51.223 40.298 Measured in time, avg of 5 runs, unit: seconds . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Performance Problem The problem lies in current CPU-memory architecture. . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores Both SBCL and LispWorks have concurrent Hashtable . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores Both SBCL and LispWorks have concurrent Hashtable SBCL has a lock-free queue implementation . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores Both SBCL and LispWorks have concurrent Hashtable SBCL has a lock-free queue implementation Array in LispWorks is thread-safe by default . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores Both SBCL and LispWorks have concurrent Hashtable SBCL has a lock-free queue implementation Array in LispWorks is thread-safe by default But we need more . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Higher-level Abstractions POSIX Thread conditional variables nessage queue barrier etc. Future (task-level parallelization) Implemented for CLML project To make resource utilization easier (maps tasks onto thread-pool) Low overhead Used OS utilities for synchronization, now added atomic instructions to queues, will use spinlocks later. Concurrent data structures/stores Both SBCL and LispWorks have concurrent Hashtable SBCL has a lock-free queue implementation Array in LispWorks is thread-safe by default But we need more List, Trees, Priority Queues, etc. . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Possible Improvements We need a sophisticated user-space spinlock implementation . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Possible Improvements We need a sophisticated user-space spinlock implementation Maybe a portable spinlock library is desirable . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . Possible Improvements We need a sophisticated user-space spinlock implementation Maybe a portable spinlock library is desirable We need portable concurrent data structure libraries . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . . . End ご清聴ありがとうございます。 . 黄 澗石 (Jianshi Huang) . . . Concurrent Building Blocks in Common Lisp . .