Monte Carlo Methods
Stéphane Paltani
ISDC Data Center for Astrophysics, Astronomical Observatory of the University of Geneva
Statistics Course for Astrophysicists, 2010–2011

Outline
1. What are Monte-Carlo methods?
2. Generation of random variables
3. Markov chains Monte-Carlo
4. Error estimation
5. Numerical integration
6. Optimization

Caveat
This presentation aims at being user-focused and at presenting usable recipes. Do not expect a fully mathematically rigorous description! It has been prepared in the hope of being useful even in the stand-alone version. Please provide me feedback on any misconception or error you may find, and on any topic not addressed here that should be included.

What are Monte-Carlo methods?

General concepts
Monte-Carlo methods:
- were invented in the context of the development of the atomic bomb in the 1940s
- are a class of computational algorithms
- can be applied to a vast range of problems
- are not a statistical tool
- rely on repeated random sampling
- generally provide approximate solutions
- are used in cases where analytical or numerical solutions do not exist or are too difficult to implement
- can be used by the Lazy Scientist™ even when an analytical or numerical solution can be implemented

Overview of the method
Monte-Carlo methods generally follow these steps:
1. Determine the statistical properties of the possible inputs
2. Generate many sets of possible inputs that follow the above properties
3. Perform a deterministic calculation with these sets
4. Analyze the results statistically
The error on the results typically decreases as 1/√N. A minimal sketch of this recipe is given below.
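As a concrete illustration of the four steps (a minimal sketch; the toy problem, estimating E[x²] = 1/3 for x ~ U(0,1), is my choice, and the GSL is assumed for the random numbers; compile with gcc file.c -lgsl -lgslcblas -lm):

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);   /* step 1: inputs are U(0,1) */
    for (long N = 100; N <= 1000000; N *= 100) {
        double sum = 0.0;
        for (long i = 0; i < N; i++) {             /* step 2: generate the inputs */
            double x = gsl_rng_uniform(r);
            sum += x * x;                          /* step 3: deterministic calculation */
        }
        double est = sum / N;                      /* step 4: analyze the results */
        printf("N=%8ld  estimate=%.6f  |error|=%.2e\n",
               N, est, fabs(est - 1.0 / 3.0));
    }
    gsl_rng_free(r);
    return 0;
}
```

Each factor of 100 in N should shrink the error by roughly a factor of 10, consistent with the 1/√N scaling.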
Applications

Numerical integration
Most problems can be solved by integration, and Monte-Carlo integration is the most common application of Monte-Carlo methods. The basic idea: do not use a fixed grid, but random points, because:
1. Curse of dimensionality: a fixed grid in D dimensions requires N^D points
2. The step size must be chosen first

Error estimation
Given any arbitrary probability distribution, and provided one is able to sample the distribution properly with a random variable (i.e., x ~ f(x)), Monte-Carlo simulations can be used to:
- determine the distribution's properties (mean, variance, ...)
- determine confidence intervals, i.e. P(x > α) = ∫_α^∞ f(x) dx
- determine compositions of distributions, i.e. given P(x), find P(h(x)), with h(x) = x², cos(x) − sin(x), ...
Note that these are all integrals!

Optimization problems
Numerical solutions to optimization problems incur the risk of getting stuck in local minima. A Monte-Carlo approach can alleviate the problem by permitting random exits from a local minimum, to find another, hopefully better, minimum.

Numerical simulations
- Radiation transfer is, by Google's count, the main astrophysical application of Monte-Carlo simulations
- In particle physics and high-energy astrophysics, many more physical processes can be simulated
Some physical processes are discretized and random by nature, so Monte-Carlo is particularly well adapted.

Numerical simulations: GEANT4
GEANT4 is also used to determine the performance of X-ray and gamma-ray detectors for astrophysics.

Simple examples

Probability estimation: head vs tail
What is the probability of obtaining either 3, 6 or 9 heads when flipping a coin ten times?
- Binomial probability: P = B(3; 10, 0.5) + B(6; 10, 0.5) + B(9; 10, 0.5) ≈ 0.33
- Monte-Carlo simulation (see the sketch below):
  1. Given a random variable y ~ U(0,1), define "head" if y < 0.5, "tail" otherwise
  2. Draw 10 random variables x_i ~ U(0,1), i = 1, ..., 10
  3. Count the number of heads H, and increment a counter T if H = 3, 6 or 9
  4. Repeat 2.–3. N times, with N reasonably large
  5. The probability is approximately T/N
- Note that this is an integration over a probability distribution, even if it is discrete!
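A minimal sketch of the coin experiment above, with the GSL assumed for the uniform deviates:

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const long N = 1000000;          /* number of repetitions, "reasonably large" */
    long T = 0;
    for (long n = 0; n < N; n++) {
        int H = 0;
        for (int i = 0; i < 10; i++)
            if (gsl_rng_uniform(r) < 0.5) H++;   /* "head" if y < 0.5 */
        if (H == 3 || H == 6 || H == 9) T++;
    }
    printf("P(3, 6 or 9 heads) ~ %.4f (binomial result: 0.33)\n", (double)T / N);
    gsl_rng_free(r);
    return 0;
}
```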
Error estimation: what is the uncertainty on the mean?
Assuming N random variables x_i ~ N(0, σ), i = 1, ..., N, the estimator of the mean is x̄ = N⁻¹ Σ_{i=1}^N x_i, and its uncertainty is σ_x̄ = σ/√N.
The Monte-Carlo way:
1. Draw a set of N random variables y_i ~ N(0, σ), i = 1, ..., N
2. Calculate the sample mean ȳ = N⁻¹ Σ_{i=1}^N y_i
3. Redo 1.–2. M times
4. The uncertainty on the mean σ_x̄ is the root mean square of the ȳ_j, j = 1, ..., M, i.e. σ_x̄² = (M − 1)⁻¹ Σ_{j=1}^M (ȳ_j − ŷ)², with ŷ = M⁻¹ Σ_{j=1}^M ȳ_j

Numerical integration: how to calculate π?
1. Draw N points (x, y) uniformly at random in the unit square
2. Count the C points for which x² + y² < 1
3. The ratio C/N converges towards π/4, with an error decreasing as 1/√N
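The corresponding sketch, again assuming the GSL:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const long N = 1000000;
    long C = 0;
    for (long i = 0; i < N; i++) {
        double x = gsl_rng_uniform(r), y = gsl_rng_uniform(r);
        if (x * x + y * y < 1.0) C++;           /* inside the quarter circle */
    }
    printf("pi ~ %.5f (|error| = %.1e, expect ~1/sqrt(N) = %.1e)\n",
           4.0 * C / N, fabs(4.0 * C / N - M_PI), 1.0 / sqrt((double)N));
    gsl_rng_free(r);
    return 0;
}
```

With N = 10⁶ the error is typically of order 10⁻³, as the 1/√N scaling predicts.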
Generation of random variables

Random number generators: basic principles
- We want to draw many random variables N_i ~ U(0,1), i = 1, ..., which satisfy (or approximate sufficiently well) all randomness properties
- N_i ~ U(0,1) ∀i is not sufficient: we also want f(N_i, N_j, ...) to have the right properties ∀i, j, ...
- Correlations in k-space are often found with bad random-number generators
- Another issue is the period of the generator
- The rand() function in libc has been (very) bad; "Do not use this function in applications when good randomness is needed", says man 3 rand

Random number generators: basic algorithm
- Many random number generators are based on the recurrence relation N_{j+1} = a·N_j + c (mod m). These are called linear congruential generators; c is actually useless
- "Divide" by m + 1 to get a number in the range [0; 1[
- Choices of a and m in standard libraries are found to range from very bad to relatively good
- A "minimal standard" set is a = 7⁵ = 16807, c = 0, m = 2³¹ − 1 = 2147483647. This is RAN0
- Note that the period is at most m

Random number generators: improvements on RAN0
1. Multiplication by a doesn't span the whole range of values: if N_i = 10⁻⁶, then N_{i+1} ≤ 0.017, failing a simple statistical test
   → Shuffle consecutive output values: generate a few values (~32) and, at each new call, pick one of them at random. This is RAN1
2. The period m = 2³¹ − 1 might be too short
   → Add the outcomes of two RAN1 generators with (slightly) different m's (and a's). The period is the least common multiple of m₁ and m₂, ~2·10¹⁸. This is RAN2
3. The generator is too slow
   → Use, inline in C, N_{i+1} = 1664525·N_i + 1013904223 with unsigned long arithmetic, and patch the bits into a real number (machine dependent). This is RANQD2

Random number generators: implementations and recommendations
(NR: Numerical Recipes; GSL: GNU Scientific Library)

Library | Generator | Relative speed | Period
NR      | RAN0      | 1.0            | ~2³¹
NR      | RAN1      | 1.3            | ~2³⁶
NR      | RAN2      | 2.0            | ~2⁶²
NR      | RANQD2    | 0.25           | ~2³⁰
GSL     | MT19937   | 0.8            | ~2¹⁹⁹³⁷
GSL     | TAUS      | 0.6            | ~2⁸⁸
GSL     | RANLXD2   | 8.0            | ~2⁴⁰⁰

Always use GSL! See the GSL documentation for the many more algorithms available.
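A minimal sketch of the standard gsl_rng interface; gsl_rng_env_setup() lets the generator and seed be overridden at run time through the GSL_RNG_TYPE and GSL_RNG_SEED environment variables:

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng_env_setup();                          /* honor GSL_RNG_TYPE / GSL_RNG_SEED */
    gsl_rng *r = gsl_rng_alloc(gsl_rng_default);  /* MT19937 unless overridden */
    printf("generator: %s\n", gsl_rng_name(r));
    for (int i = 0; i < 5; i++)
        printf("%.6f\n", gsl_rng_uniform(r));     /* uniform deviates in [0,1) */
    gsl_rng_free(r);
    return 0;
}
```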
Transformation method: the method
The transformation method allows one, in principle, to draw values at random from any distribution:
1. Given a distribution p(y), the cumulative distribution function (CDF) of p(y) is F(y) = ∫₀^y p(w) dw
2. We want to draw y uniformly in the shaded area, i.e. uniformly over F(y); by construction 0 ≤ F(y) ≤ 1
3. We draw x ~ U(0,1) and find y so that x = F(y)
4. Therefore y(x) = F⁻¹(x), with x ~ U(0,1)

Transformation method: example
Exponential deviates:
p(y) = λ e^{−λy}
F(y) = 1 − e^{−λy} = x
y(x) = −(1/λ) ln(1 − x)
Note: this is equivalent to y(x) = −(1/λ) ln(x), since, if x ~ U(0,1), then 1 − x ~ U(0,1) as well.
Note also that it is rather uncommon to be able to calculate F⁻¹(x) analytically. Depending on the accuracy required, a numerical approximation can be calculated. (A sketch of the analytical case follows.)
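A minimal sketch of the exponential transformation, assuming the GSL for the uniforms:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const double lambda = 2.0;
    const long N = 1000000;
    double sum = 0.0;
    for (long i = 0; i < N; i++) {
        double x = gsl_rng_uniform_pos(r);   /* in (0,1), so log(0) cannot occur */
        sum += -log(x) / lambda;             /* y = F^-1(x) for the exponential */
    }
    printf("sample mean = %.4f (expected 1/lambda = %.4f)\n",
           sum / N, 1.0 / lambda);
    gsl_rng_free(r);
    return 0;
}
```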
Transformation method: a point in space
To draw a point in a homogeneous sphere of radius R:
1. φ can be drawn uniformly from U(0, 2π)
2. θ has a sine distribution, p(θ) = sin(θ)/2, θ ∈ [0; π[; the transformation is θ = arccos(1 − 2x), with x ~ U(0,1)
3. Each radius shell has a volume f(R) ∝ R² dR, so R ∝ x^{1/3}
4. Alternatively, draw a point at random on the surface of a sphere as (x, y, z)/√(x² + y² + z²), with x, y, z ~ N(0, 1)

Rejection method: the method
If the CDF of p(x) is difficult to estimate (and you can forget about inverting it), the rejection method can be used:
1. Find a comparison function f(x) that can be sampled, such that f(x) ≥ p(x) ∀x
2. Draw a random deviate x₀ from f(x)
3. Draw a uniform random deviate y₀ from U(0, f(x₀))
4. If y₀ < p(x₀), accept x₀; otherwise discard it
5. Repeat 2.–4. until you have enough values
The rejection method can be very inefficient if f(x) is very different from p(x).

Rejection method: example
The Poisson distribution is discrete: P(n; α) = αⁿ e^{−α} / n!
Make it continuous: P(x; α) = α^{[x]} e^{−α} / [x]!
A Lorentzian f(x) ∝ 1/((x − α)² + c²) is a good comparison function.

Rejection method: a point in space
A simpler way to draw a point in a homogeneous sphere of radius R, based on rejection (see the sketch below):
1. Draw three random variables x, y, z from U(−R, R)
2. Keep the point if x² + y² + z² < R², reject it otherwise
3. Repeat 1.–2. until you have enough values
The efficiency is (4π/3)/2³ = π/6 ≈ 0.52.
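A minimal sketch of the rejection recipe above, assuming the GSL:

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const double R = 1.0;
    const long wanted = 100000;
    long kept = 0, tried = 0;
    while (kept < wanted) {
        double x = (2.0 * gsl_rng_uniform(r) - 1.0) * R;  /* U(-R, R) */
        double y = (2.0 * gsl_rng_uniform(r) - 1.0) * R;
        double z = (2.0 * gsl_rng_uniform(r) - 1.0) * R;
        tried++;
        if (x * x + y * y + z * z < R * R) kept++;        /* inside: accept */
    }
    printf("efficiency = %.4f (pi/6 ~ 0.5236)\n", (double)kept / tried);
    gsl_rng_free(r);
    return 0;
}
```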
Distributions
The GNU Scientific Library implements (not exhaustive!): Gaussian, correlated bivariate Gaussian, exponential, Laplace, Cauchy, Rayleigh, Landau, log-normal, gamma, beta, χ², F, t, binomial, Poisson, spherical 2D and 3D.

Quasi-random sequences: what is random?
All the sets of points fill the area [0;1]×[0;1] "randomly". The left and center images are "sub-random" and fill the area more uniformly. These sequences are also called low-discrepancy sequences. They can be used to replace the RNG whenever x ~ U(a, b) is needed.

Quasi-random sequences: filling of the plane
The sequence fills the plane more or less uniformly for all N.

Quasi-random sequences: examples of algorithms
- Sobol's sequence: count in binary, but using Gray code, and put a radix point in front: 0.1, 0.11, 0.01, 0.011, 0.001, 0.101, 0.111, ... This can be generalized to N dimensions. (This is the red set of points.)
- Halton's sequence: H(i) is constructed the following way: take i expressed in a (small) prime-number base b (say b = 3), e.g. i = 17 ≡ 122 (base 3). Reverse the digits and put a radix point in front, i.e. H(17) = 0.221 (base 3) ≈ 0.92593. This is generalized to N dimensions by choosing a different b in each dimension. (This is the blue set of points.)

Quasi-random sequences: accuracy
Convergence in some cases of numerical integration can reach ~1/N.

Quasi-random sequences in GSL
The GNU Scientific Library implements Sobol's, Halton's and Niederreiter's sequences.

Markov chains Monte-Carlo

Markov chains: definition
A set of random variables x_i, i = 1, ..., is a Markov chain if:
P(x_{i+1} = x | x₁, ..., x_i) = P(x_{i+1} = x | x_i)

Markov chains: examples
- State machine
- White noise: x_{i+1} = ε_i, ε_i ~ N(µ, σ)
- Random walk: x_{i+1} = x_i + ε_i, ε_i ~ N(µ, σ)
- AR[1] process: x_{i+1} = φ·x_i + ε_i, ε_i ~ N(µ, σ), |φ| < 1
- Google's PageRank

Markov chains: properties
- Homogeneity: a Markov chain is homogeneous if P(x_{i+1} = x | x_i) = P(x_{j+1} = x | x_j) ∀i, j
- Ergodicity: a Markov chain is ergodic if E(n | x_n = x_i, n > i) < ∞ ∀i (probably only valid when the number of states is finite). Ergodicity means that all possible states will be reached at some point
- Reversibility: a Markov chain is reversible if there exists a distribution Π(x) such that Π(α) P(x_{i+1} = α | x_i = β) = Π(β) P(x_{i+1} = β | x_i = α) ∀i, α, β

Markov chains: more on reversibility
If a Markov chain is reversible, integrating the reversibility condition over α gives:
∫_α Π(α) P(x_{i+1} = β | x_i = α) dα = ∫_α Π(β) P(x_{i+1} = α | x_i = β) dα = Π(β) ∫_α P(x_{i+1} = α | x_i = β) dα = Π(β)
This property is also called detailed balance. Π(x) is then the equilibrium distribution of the Markov chain.

Markov chains Monte-Carlo: definition
Can we build a Markov chain that has pretty much any requested equilibrium distribution? YES!! The answer is Markov chain Monte-Carlo.
- MCMC is one of the ten best algorithms ever! (along with the FFT, QR decomposition, Quicksort, ...)
- MCMC uses a homogeneous, ergodic and reversible Markov chain to generate consistent samples drawn from any given distribution
- No specific knowledge of the distribution, no specific support function, is required
- Construction of the Markov chain Monte-Carlo is even straightforward

Markov chains Monte-Carlo: the Metropolis algorithm
Assuming a distribution f(x) that we want to sample (a sketch follows):
1. Choose a proposal distribution p(x) and an initial value x₁
2. Select a candidate step using p(x), so that x̂ = x_i + ε, ε ~ p(x)
3. If f(x̂) > f(x_i), accept x_{i+1} = x̂; otherwise accept x_{i+1} = x̂ with probability f(x̂)/f(x_i), else reject and start again at 2.
4. Continue at 2. until you have enough values
5. Discard the early values (the burn-in phase), which are influenced by the choice of x₁
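A minimal sketch of the Metropolis algorithm for a 1D target; the target (an unnormalized Gaussian) and all tuning constants are illustrative choices, not part of the slides:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

static double f(double x) { return exp(-0.5 * x * x); }  /* target, unnormalized */

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const long N = 100000, burnin = 1000;
    const double width = 3.0;   /* uniform update U(-width, width); tune so that
                                   roughly 30-70% of the steps are accepted */
    double x = 5.0;             /* deliberately bad starting value */
    double sum = 0.0, sum2 = 0.0;
    long kept = 0;
    for (long i = 0; i < N; i++) {
        double xhat = x + width * (2.0 * gsl_rng_uniform(r) - 1.0);  /* candidate */
        if (gsl_rng_uniform(r) < f(xhat) / f(x))  /* accept with prob min(1, ratio) */
            x = xhat;
        if (i >= burnin) {                        /* discard the burn-in phase */
            sum += x; sum2 += x * x; kept++;
        }
    }
    printf("mean = %.3f, variance = %.3f (target: 0, 1)\n",
           sum / kept, sum2 / kept - (sum / kept) * (sum / kept));
    gsl_rng_free(r);
    return 0;
}
```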
Markov chains Monte-Carlo: random walk MCMC
Random-walk updates, i.e. the proposal distribution is constant, x̂ = x_i + ε_i, ε_i ~ p(x):
- Gaussian update: p(x) = N(0, σ)
- Uniform update: p(x) = U(−υ, υ)
- Student's t update
- Lévy-flight update: p(x) is heavy-tailed (a power law)

Markov chains Monte-Carlo: non random walk MCMC
- Independence sampler: x̂ ~ Y, i.e. do not use x_i at all! This is equivalent to the rejection method
- Langevin update: x̂ = x_i + ε_i + (σ²/2) ∇log f(x_i). Uses some information about f(x) to propose a better move
- Gareth's diffusion: Gaussian update, but with σ(x_i) = (max_x f(x))/f(x_i). Makes bigger steps when we are in an area of small probability
- Gaussian AR[1] update: x̂ = a + b(x_i − a) + ε_i, ε_i ~ N(0, σ), |b| ≤ 1. Not sure what good this does

Markov chains Monte-Carlo: further notes on implementations
- Check the rejection ratio; values between 30 and 70% are conventionally accepted
- Discard the burn-in phase; the autocorrelation function is a standard way to check whether the initial value has become irrelevant or not
- The width of the proposal distribution (e.g. σ for a Gaussian update or υ for a uniform update) should be tuned during the burn-in phase to bring the rejection fraction into the right bracket
- Reflection can be used when an edge of f(x) is reached, e.g. set x̂ = |x_i + ε_i| if x must remain positive
- Be careful with multimodal distributions: not all modes may be sampled

Metropolis-Hastings MCMC: limitations of the Metropolis algorithm
- A symmetric proposal distribution might not be optimal
- Boundary effects: less time is spent close to boundaries, which might therefore not be well sampled
- A correction factor, the Hastings ratio, is applied to correct for the bias
- The Hastings ratio usually speeds up convergence
- The choice of the proposal distribution becomes, however, more important

Metropolis-Hastings MCMC: the Hastings ratio
- The proposed update is x̂ ~ Q(x; x_i), where Q(a; b) is the density of proposing a when the chain is at b
- The probability A of accepting the next point is modified so that, instead of A = min(1, f(x̂)/f(x_i)), the probability is corrected by the Hastings ratio:
  A = min(1, [f(x̂) Q(x_i; x̂)] / [f(x_i) Q(x̂; x_i)])
- If Q(α; β) = Q(β; α), this reduces to the Metropolis algorithm
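In code the corrected acceptance probability is a one-liner; the helper below is hypothetical (names and signature are mine, not from any library):

```c
#include <math.h>

/* Metropolis-Hastings acceptance probability, with Q(a; b) the density of
 * proposing a when the chain is at b:
 *   A = min(1, f(xhat) Q(x_i; xhat) / (f(x_i) Q(xhat; x_i))) */
double hastings_accept_prob(double f_cand, double f_curr,
                            double q_curr_given_cand, double q_cand_given_curr)
{
    double a = (f_cand * q_curr_given_cand) / (f_curr * q_cand_given_curr);
    return a < 1.0 ? a : 1.0;
}
```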
Error estimation

Error distribution: when hypotheses are not satisfied
Are these points correlated (N = 45 points, r = 0.42)? The null-hypothesis probability that there is no correlation is 0.005. However, some parts of the plane (y < x − 0.5) are not accessible. The correct probability can be estimated using a Monte-Carlo simulation.

Error distribution: using Monte-Carlo simulations
(A sketch is given below.)
1. Draw N couples (x_i, y_i) ~ (U(0,1), U(0,1)), rejecting those for which y_i < x_i − 0.5
2. Calculate the correlation coefficient r
3. Repeat 1.–2. many times
4. Study the distribution of r. In this case the null hypothesis has a probability of ~10%

Statistical significance: when hypotheses are not satisfied
Is the distribution of these N values Gaussian? Use a Kolmogorov-Smirnov test Q_KS, and then use the null-hypothesis probabilities for this test. OK, but I do not know σ:
1. Find the parameter σ_observed that minimizes Q_KS,obs, and the corresponding Q_KS,obs,min
2. Draw N points from a Gaussian with rms σ_observed
3. Find the minimum of Q_KS,sim as a function of the trial σ
4. Repeat 2.–3. many times, and check the distribution of Q_KS,sim,min against Q_KS,obs,min. In this case, 20% of the Q_KS,sim,min exceed Q_KS,obs,min
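A minimal sketch of the correlation null-distribution recipe above (numbers follow the example; the GSL routine gsl_stats_correlation computes Pearson's r):

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_statistics.h>

int main(void)
{
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
    const int N = 45, M = 10000;
    const double r_obs = 0.42;
    double x[45], y[45];
    int exceed = 0;
    for (int m = 0; m < M; m++) {
        for (int i = 0; i < N; i++) {
            do {                            /* reject the inaccessible region */
                x[i] = gsl_rng_uniform(rng);
                y[i] = gsl_rng_uniform(rng);
            } while (y[i] < x[i] - 0.5);
        }
        if (gsl_stats_correlation(x, 1, y, 1, N) >= r_obs) exceed++;
    }
    printf("P(r >= %.2f | null) ~ %.3f\n", r_obs, (double)exceed / M);
    gsl_rng_free(rng);
    return 0;
}
```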
Error propagation: the case of the parallax
- One measures the parallax π of a given object
- Assume that the known uncertainty σ on π is Gaussian, i.e. the true parallax Π and π are related by π = Π + ε, ε ~ N(0, σ)
- In the limit σ ≪ π, we have the error propagation on the distance D = 1/π:
  ΔD = |∂D/∂π| Δπ = Δπ/π² = σ/π²
- But higher-order terms in the series expansion make the error distribution of D very asymmetrical
- A better solution is to perform the error propagation with a Monte-Carlo simulation

Error propagation: the case of the parallax, illustrated
Monte-Carlo simulations allow one to simulate the distribution of the distance D of a source for a parallax of 15 ± 1″ (left) and 3 ± 1″ (right). Any inference on confidence intervals (for instance) is bound to be erroneous when σ ≪ π is not satisfied. A sketch of this simulation follows.
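A minimal sketch for the 3 ± 1″ case; the quantile calls summarize the asymmetry of the distribution of D (how to handle the rare unphysical negative parallaxes is a modelling choice; here they are simply redrawn):

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_sort.h>
#include <gsl/gsl_statistics.h>

#define N 100000

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const double pi0 = 3.0, sigma = 1.0;      /* parallax 3 +/- 1 */
    static double D[N];
    long n = 0;
    while (n < N) {
        double p = pi0 + gsl_ran_gaussian(r, sigma);
        if (p > 0.0) D[n++] = 1.0 / p;        /* keep positive parallaxes only */
    }
    gsl_sort(D, 1, N);
    printf("D: median %.3f, 16%%-84%% interval [%.3f, %.3f]; "
           "asymmetric around 1/pi = %.3f\n",
           gsl_stats_quantile_from_sorted_data(D, 1, N, 0.50),
           gsl_stats_quantile_from_sorted_data(D, 1, N, 0.16),
           gsl_stats_quantile_from_sorted_data(D, 1, N, 0.84),
           1.0 / pi0);
    gsl_rng_free(r);
    return 0;
}
```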
Error propagation: real-life example: black-hole mass I
- M_BH = c·τ·⟨v⟩² is a non-linearly derived parameter, obtained from two measured parameters with non-Gaussian uncertainty distributions
- Pick many values of τ and ⟨v⟩ from their distributions and build the distribution of M_BH

Error propagation: real-life example: black-hole mass II
The resulting distribution can be explored to determine its moments and confidence intervals. Two independent estimates can be combined using a maximum-likelihood estimation.

Resampling methods: bootstrap

Bootstrap: a simple error estimation problem
Assume we have two sets of random variables x_i, i = 1, ..., N and y_j, j = 1, ..., N, with N = 20 and a correlation coefficient r = −0.63. Are the two variables significantly correlated? Alternatively, what is the uncertainty on r? In this case we can simulate new sets using Monte-Carlo under the null hypothesis, and estimate the significance and the uncertainty.

Bootstrap: getting something from nothing
What if we have no idea about the underlying distribution? Intuition says that would be akin to pulling on one's bootstraps to lift oneself. But the data contain hidden information about their distribution!

Bootstrap: general principle
- Generate many pseudo-samples by random, Monte-Carlo, direct extraction of data points from the original sample
- Each pseudo-sample has the same size as the original sample
- As a consequence, most pseudo-samples contain some repeated data points and some missing data points
- The statistic of interest is computed for all pseudo-samples, and its distribution is sampled, allowing one to infer its statistical properties (uncertainty, mean, confidence intervals, ...)

Bootstrap: a practical example
We generate a similar histogram with the data only (see the sketch below):
1. Draw N indices (integers) α_i from [1; N]
2. Calculate the correlation coefficient r from the new sets x_{α_i} and y_{α_i}, i = 1, ..., N
3. Build and study the distribution of r by repeating 1.–2. many times
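A minimal sketch of the bootstrap of r; the synthetic (x, y) sample is a placeholder for real data:

```c
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_statistics.h>

#define N 20
#define M 10000

int main(void)
{
    double x[N], y[N], xb[N], yb[N], rboot[M];
    gsl_rng *rng = gsl_rng_alloc(gsl_rng_mt19937);
    for (int i = 0; i < N; i++) {            /* fake anticorrelated data */
        x[i] = gsl_rng_uniform(rng);
        y[i] = -x[i] + 0.5 * gsl_rng_uniform(rng);
    }
    for (int m = 0; m < M; m++) {
        for (int i = 0; i < N; i++) {
            int a = (int)gsl_rng_uniform_int(rng, N);  /* index with replacement */
            xb[i] = x[a];
            yb[i] = y[a];
        }
        rboot[m] = gsl_stats_correlation(xb, 1, yb, 1, N);
    }
    printf("bootstrap r = %.3f +/- %.3f\n",
           gsl_stats_mean(rboot, 1, M), gsl_stats_sd(rboot, 1, M));
    gsl_rng_free(rng);
    return 0;
}
```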
Bootstrap: smoothed bootstrap
If the statistic of interest is discrete (i.e., it can take only a few values), the bootstrap distribution will be quite bad. One can add a smoothing random variable ~N(0, σ²), with σ = 1/√N, to all the picked values of a bootstrap test. Even in the non-smoothed case, the confidence intervals can be pretty good.

Numerical integration

Monte-Carlo algorithm: basic principle
Computing ∫_V f dV on a grid of step h in M dimensions requires ~(1/h)^M steps. Given a volume V and a function f, with ⟨f⟩ the average of f over the N sampled points:
∫_V f dV ≈ V ⟨f⟩ ± V √((⟨f²⟩ − ⟨f⟩²)/N)
This is the basic theorem of Monte-Carlo integration:
1. Draw N points at random within V
2. Calculate ⟨f⟩ = (1/N) Σ_{i=1}^N f_i (and ⟨f²⟩)
3. Multiply by V to obtain the integral

Monte-Carlo algorithm: volume determination
The volume of V can be determined by the Monte-Carlo algorithm as ∫_V f dV with f ≡ 1. If it is not possible or not easy to sample the volume uniformly, one has to find a volume W containing V, which can be sampled and whose volume is known, and integrate over W the function
f(w) = 1 if w ∈ V, 0 if not.
The error is then W √((⟨f²⟩ − ⟨f⟩²)/N).

Monte-Carlo algorithm: how to minimize the error
The error term depends on the variance of f within V:
V √((⟨f²⟩ − ⟨f⟩²)/N)
Can we reduce ⟨f²⟩ − ⟨f⟩²? If f varies strongly over V (e.g., f(x, y, z) = e^{5z}), one can perform a change of variable:
s = e^{5z}/5 → ds = e^{5z} dz
⟹ ∫_x ∫_y ∫_{z=−1}^{1} e^{5z} dz = ∫_x ∫_y ∫_{s=e^{−5}/5}^{e^{5}/5} ds
This is equivalent to a transformation method, but it is useless: if the change of variable can be done analytically, the integral along that axis is already solved.
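A minimal sketch of the basic theorem on the strongly varying integrand above, over V = [0,1]² × [−1,1] (volume V = 2); the large error bar motivates the variance-reduction techniques that follow:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    const long N = 1000000;
    const double V = 2.0;
    double s = 0.0, s2 = 0.0;
    for (long i = 0; i < N; i++) {
        double z = 2.0 * gsl_rng_uniform(r) - 1.0;  /* x and y do not enter f */
        double f = exp(5.0 * z);
        s += f; s2 += f * f;
    }
    double mean = s / N, var = s2 / N - mean * mean;
    printf("integral = %.3f +/- %.3f (exact: (e^5 - e^-5)/5 = %.3f)\n",
           V * mean, V * sqrt(var / N), (exp(5.0) - exp(-5.0)) / 5.0);
    gsl_rng_free(r);
    return 0;
}
```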
Stratified sampling: divide and conquer
If F̂ is the estimator of F = (1/V) ∫_V f dV, then:
Var(F̂) = Var(f)/N ≡ σ²/N
If one cuts V into two equal volumes V₁ and V₂, the new estimator of F is:
F̂′ = (F̂₁ + F̂₂)/2 ⟹ Var(F̂′) = (1/4) (σ₁²/N₁ + σ₂²/N₂)
The minimum is reached for N_i/N = σ_i/(σ₁ + σ₂):
⟹ Var(F̂′) = (σ₁ + σ₂)²/(4N)
One can therefore cut V into many separate pieces, estimate the rms of f in each of them, and draw the points in each subvolume proportionally to that rms.

Stratified sampling at work
Sampling is concentrated in the subvolumes where σ is large.

Stratified sampling: the MISER algorithm
When V has many dimensions d, stratified sampling needs to cut the volume into K^d subvolumes. The MISER algorithm uses recursive stratified sampling, cutting the volume in two parts along one dimension at a time:
1. For each dimension i, cut the current volume in two, V₁ and V₂, and determine σ₁ and σ₂
2. Find the most favorable dimension, e.g. by minimizing σ₁² + σ₂²
3. Perform additional MC evaluations in V₁ and V₂, distributed so as to minimize the error
4. Iterate 1.–3. as long as the number of evaluations remains not too prohibitive

Importance sampling: using a sampling distribution
Instead of sampling uniformly, we use a distribution p with ∫_V p dV = 1:
∫_V f dV = ∫_V (f/p) p dV = ⟨f/p⟩ ± √((⟨f²/p²⟩ − ⟨f/p⟩²)/N)
Given N, the square root is minimized if:
p = |f| / ∫_V |f| dV
The idea of importance sampling is to find a distribution p as close as possible to f. Importance sampling requires some prior knowledge of f.

Importance sampling: the VEGAS algorithm
The VEGAS algorithm implements importance sampling with a separable distribution in d dimensions:
p ∝ g₁(x₁) · g₂(x₂) · g₃(x₃) · ... · g_d(x_d)
VEGAS uses an adaptive strategy to determine the g_i(x_i):
1. Choose arbitrary g_i(x_i) (e.g., constant)
2. Cut each axis into K pieces, and determine the integral and the K·d importance-sampling weights g_i(k), k = 1, ..., K
3. Iterate 2. a (small) number of times to improve the weights g_i(k) and the integral
VEGAS' efficiency depends strongly on the validity of the separability of f, and will be small if the integrand is dominated by a subspace of dimension d₁ < d.
Both MISER and VEGAS are implemented in the GSL; a usage sketch follows.
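A sketch of the GSL Monte-Carlo integration interface, shown for MISER (VEGAS has the same calling sequence with the gsl_monte_vegas_* routines); the integrand reuses the earlier e^{5z} example:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_monte.h>
#include <gsl/gsl_monte_miser.h>

/* f(x,y,z) = e^{5z} on [0,1]^2 x [-1,1], as in the earlier example */
static double g(double *x, size_t dim, void *params)
{
    (void)dim; (void)params;
    return exp(5.0 * x[2]);
}

int main(void)
{
    double xl[3] = { 0.0, 0.0, -1.0 }, xu[3] = { 1.0, 1.0, 1.0 };
    double res, err;
    gsl_monte_function G = { &g, 3, NULL };
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    gsl_monte_miser_state *s = gsl_monte_miser_alloc(3);
    gsl_monte_miser_integrate(&G, xl, xu, 3, 500000, r, s, &res, &err);
    printf("MISER: %.4f +/- %.4f (exact %.4f)\n",
           res, err, (exp(5.0) - exp(-5.0)) / 5.0);
    gsl_monte_miser_free(s);
    gsl_rng_free(r);
    return 0;
}
```

The reported error should be visibly smaller than the plain Monte-Carlo estimate at the same number of calls, since the sampling concentrates where the variance is large.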
Optimization

Stochastic optimization: the problem of local minima
Minimization of complex functions in many dimensions often encounters multiple minima, and purely deterministic searches get stuck in local ones. Random (i.e., Monte-Carlo) approaches can help. In Nature, the problem is often encountered... and solved (or at least approximated).

Swarm intelligence: the ant colony optimization
Numerous agents explore the landscape, with simple interactions between them. Ants move at random in the solution space and deposit pheromones behind them, which attract other ants. Shorter paths are traversed more quickly, so more pheromone is deposited on them, attracting more ants. Try also bees!

Genetic algorithms: principle
Use Darwinian evolution of a gene pool to find the fittest genes:
- Each chromosome represents a candidate solution to the problem at hand, represented as a series of genes
- A fitness function can be calculated for each chromosome
- At each generation:
  1. Genes mutate
  2. Chromosomes reproduce sexually
  3. Non-biological mechanisms are also possible
  4. Chromosomes are selected according to their fitness
- After the stopping criterion is met, the best chromosome provides the approximate solution

Genetic algorithms: example
You want to build the best possible average spectrum from a sample of N low-S/N high-redshift galaxies, but you know the sample contains 50% interlopers (galaxies with wrongly identified redshifts):
- The chromosomes are series of N bits, half of them set to 1, the rest to 0
- The fitness function is the summed equivalent width of the expected line features
- Mutations are random inclusions or exclusions of galaxies
(A toy sketch of the mechanics follows.)
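A toy sketch of a genetic algorithm on bitstring chromosomes, with tournament selection and mutation only (crossover, i.e. sexual reproduction, is omitted for brevity); the fitness here simply counts set bits, as a stand-in for a real objective such as the summed equivalent width above:

```c
#include <stdio.h>
#include <string.h>
#include <gsl/gsl_rng.h>

#define NBITS 64
#define POP   40
#define NGEN  200

static int fitness(const char *c)
{
    int f = 0;
    for (int i = 0; i < NBITS; i++) f += c[i];
    return f;
}

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    char pop[POP][NBITS], next[POP][NBITS];
    for (int p = 0; p < POP; p++)                 /* random initial gene pool */
        for (int i = 0; i < NBITS; i++)
            pop[p][i] = gsl_rng_uniform(r) < 0.5;
    for (int gen = 0; gen < NGEN; gen++) {
        for (int p = 0; p < POP; p++) {
            /* selection: binary tournament, fitter of two random parents wins */
            int a = (int)gsl_rng_uniform_int(r, POP);
            int b = (int)gsl_rng_uniform_int(r, POP);
            memcpy(next[p], pop[fitness(pop[a]) > fitness(pop[b]) ? a : b], NBITS);
            /* mutation: flip each bit with small probability */
            for (int i = 0; i < NBITS; i++)
                if (gsl_rng_uniform(r) < 1.0 / NBITS) next[p][i] ^= 1;
        }
        memcpy(pop, next, sizeof pop);
    }
    int best = 0;
    for (int p = 0; p < POP; p++)
        if (fitness(pop[p]) > best) best = fitness(pop[p]);
    printf("best fitness after %d generations: %d / %d\n", NGEN, best, NBITS);
    gsl_rng_free(r);
    return 0;
}
```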
Simulated annealing: principle
In nature, crystals (e.g., ice) can form over large areas, thus minimizing the internal energy. Annealing is the process of slowly cooling a material to relieve internal stress. Nature performs an annealing by slowly decreasing the temperature, so that misaligned ice crystals can melt and freeze again in a more favorable direction.

Simulated annealing: implementation
Simulated annealing requires:
1. A configuration of all possible states S, i.e. the space of the problem
2. An internal energy E, which we want to minimize
3. "Options" for the next state after the current state, similar to the updates in Metropolis algorithms
4. A temperature T, which sets the probability of accepting an increase in energy:
   S′ is accepted if ς ~ U(0,1) < exp(−ΔE/kT)
5. A cooling schedule, which determines how T evolves with the number of iterations

Simulated annealing: implementation II
- {S} and E are the problem to solve, i.e. we want S_min such that E(S_min) < E(S) ∀S ≠ S_min
- The options are a tricky point, as a completely random move in many dimensions might be very inefficient in the case of narrow valleys. The downhill simplex (amoeba) algorithm can be tried
- The acceptance condition makes it similar to the Metropolis algorithm
- There are also several possibilities for the cooling schedule, for instance T → (1 − ε)T at each step

Simulated annealing: example
(A sketch follows.)
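A minimal sketch on an illustrative 1D multimodal "energy" E(x) = x² + 10 sin(3x), with Gaussian moves and the exponential cooling schedule T → (1 − ε)T mentioned above; all tuning constants are my choices:

```c
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>

static double E(double x) { return x * x + 10.0 * sin(3.0 * x); }

int main(void)
{
    gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937);
    double x = 10.0, T = 10.0;                    /* start far away, start hot */
    const double eps = 1e-4;
    double xbest = x;
    while (T > 1e-6) {
        double xnew = x + gsl_ran_gaussian(r, 1.0);          /* candidate move */
        double dE = E(xnew) - E(x);
        if (dE < 0.0 || gsl_rng_uniform(r) < exp(-dE / T))   /* uphill allowed */
            x = xnew;
        if (E(x) < E(xbest)) xbest = x;           /* track the best state seen */
        T *= 1.0 - eps;                           /* cooling schedule */
    }
    printf("minimum near x = %.4f, E = %.4f\n", xbest, E(xbest));
    gsl_rng_free(r);
    return 0;
}
```

Early on, the high temperature lets the chain escape the local minima of the sine term; as T drops, the acceptance of uphill moves becomes rare and the state settles into (hopefully) the global minimum.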