Math, Statistics & Malaysia: Monte Carlo Integration

1. What is Monte Carlo Integration?

Monte Carlo Integration is a tool that researchers can use to compute the integral of a complex function. For instance, the cumulative distribution function of the Normal($\mu, \sigma$) distribution, which gives the probability that the variable $X$ is less than or equal to a certain value $x$ (written as $Pr(X \le x)$), is hard to solve analytically because the integral is 'non-elementary'. This can be seen in the following equation

$$ Pr(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi \sigma^2}} e^{-\frac{(u-\mu)^2}{2\sigma^2}} du.$$

2. When to apply Monte Carlo Integration?

We can use this technique whenever we need to compute the value of a complex integral. For instance, in my master's research, I needed to calculate a double integral of the following form.

$$\int_{0}^{x} \int_{0}^{t} \frac{1}{\sqrt{2\pi \sigma^2}}\lambda e^{u\beta} e^{-e^{u\beta}\lambda v} e^{-\frac{(u-\mu)^2}{2\sigma^2}} dv du$$

where the variable $X$ represents a biomarker and the variable $T$ represents the time-to-event. The double integral cannot be solved analytically, and other numerical techniques lead to a tedious trial-and-error procedure for setting the initial values. Thus, Monte Carlo Integration came to the rescue.
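As a rough illustration of how such a double integral can be attacked by simulation, the sketch below reads the integrand as a Normal($\mu, \sigma$) density in $u$ multiplied by an Exponential density in $v$ with rate $\lambda e^{u\beta}$. All parameter values and variable names (`mu`, `sigma`, `lambda`, `beta`, `x`, `t_max`) are assumptions made for this example, not values from my research. The method itself is explained in the sections that follow.

```r
# Hypothetical parameter values, chosen only for illustration
mu <- 1; sigma <- 0.5       # Normal parameters of the biomarker
lambda <- 0.2; beta <- 0.5  # parameters of the time-to-event part
x <- 1.5; t_max <- 3        # upper limits of the double integral

set.seed(1)                 # fixed seed so the run is repeatable
N <- 100000
u <- rnorm(N, mean = mu, sd = sigma)         # biomarker draws
v <- rexp(N, rate = lambda * exp(u * beta))  # event time drawn given each u

# Average of the indicator "the draw falls inside [0, x] x [0, t_max]"
mc_estimate <- mean(u >= 0 & u <= x & v <= t_max)

# Cross-check: the inner integral over v has the closed form
# 1 - exp(-lambda * exp(u * beta) * t_max), leaving a 1-D integral over u
inner <- function(u) {
  (1 - exp(-lambda * exp(u * beta) * t_max)) * dnorm(u, mean = mu, sd = sigma)
}
quad_value <- integrate(inner, lower = 0, upper = x)$value

c(monte_carlo = mc_estimate, quadrature = quad_value)
```

The two numbers should agree to within the sampling error of the simulation.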

3. Why does it work?

This technique works because it uses random numbers to estimate an expected value. In probability theory, if we know that a variable (let's say $Y$) follows a certain distribution $g(y)$ (e.g. Normal, Weibull, Exponential, Poisson, etc.), then we can calculate the expected value of a function $f(y)$ by solving the following equation

$$ E[f(y)] = \int f(y) \times g(y) dy$$

if $g(y)$ is a continuous distribution and

$$ E[f(y)] = \sum f(y) \times g(y) $$

if $g(y)$ is a discrete distribution.

In statistics, $E[f(y)]$ can be estimated using its unbiased estimator $\hat{E[f(y)]}$, which is

$$ \hat{E[f(y)]} = \frac{1}{n} \sum_{i=1}^{n} f(y_{i})$$

where the $y_i$'s are random draws from $g(y)$. In layman's terms, it is the average of the $f(y_i)$'s.
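A small R sketch can make this concrete. The choices below ($g$ as the Normal(0,1) distribution and $f(y) = y^2$) are assumptions picked only because the exact answer, $E[Y^2] = 1$, is known.

```r
set.seed(1)                      # fixed seed so the run is repeatable
n <- 100000
y <- rnorm(n, mean = 0, sd = 1)  # y_i drawn from g(y) = Normal(0, 1)
f <- function(y) y^2             # illustrative choice of f
estimate <- mean(f(y))           # (1/n) * sum of f(y_i)
estimate                         # should be close to the exact value 1
```

The sample average of the $f(y_i)$'s plays the role of the integral $\int f(y) g(y) dy$, without evaluating that integral directly.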

4. How to perform Monte Carlo Integration?

i. Assume that our observations $y_i$ come from a specific distribution. (Typically a Uniform(0,1) distribution. However, in the following examples, I will use the Normal and Exponential distributions.)
ii. Generate a lot of $y_i$'s.
iii. For each $i$, if $y_i$ lies inside the region of integration (e.g. $y_i \le a$ for an upper limit $a$), compute $f(y_i)$. Otherwise, assign 0.
iv. Compute the average of these values.
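The four steps above can be sketched as one small R function. The helper name `mc_integrate` and its arguments are hypothetical, introduced only for this illustration. The test case, $\int_{0}^{2} y e^{-y} dy = 1 - 3e^{-2}$, is likewise an assumed example.

```r
# Hypothetical helper: names and arguments are illustrative only
mc_integrate <- function(f, sampler, in_region, n = 100000) {
  y <- sampler(n)                          # first two steps: draw many y_i
  values <- ifelse(in_region(y), f(y), 0)  # third step: f(y_i) inside, 0 outside
  mean(values)                             # fourth step: average
}

set.seed(1)
est <- mc_integrate(
  f = function(y) y,                        # integrand without the density
  sampler = function(n) rexp(n, rate = 1),  # Exponential(1) draws
  in_region = function(y) y <= 2            # region of integration [0, 2]
)
c(estimate = est, exact = 1 - 3 * exp(-2))
```

Passing the sampler as a function keeps the sketch independent of any particular distribution, which mirrors how the examples below switch between Exponential, Normal and Uniform draws.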

Examples:

a. Find $E[X^3]$ for $x \in [0,10]$ if $X$ follows an Exponential(0.3) distribution.

Solution:

Here, we want to compute the following integral.

$$ \int_{0}^{10} x^3 (0.3)e^{-0.3x} dx.$$

If we use Monte Carlo Integration, our answer can be approximated by

$$ \hat{E[X^3]} = \frac{1}{N}\sum_{i=1}^{N} (x_i)^3 \, 1(x_i \le 10)$$

where the $x_i$'s are randomly generated from the Exponential(0.3) distribution and $1(x_i \le 10)$ equals 1 if $x_i \le 10$ and 0 otherwise.

Using the R script below, we get an estimate of 77.13.

x_i <- rexp(10000, rate=0.3) # generate x randomly
# create indicator function
f<- function(x){
  if(x <= 10){
    return(x^3)
  }
  else{
    return(0)
  }
}

f_xi <- lapply(x_i, f)

mean(unlist(f_xi)) # compute the average

## [1] 77.1272

Solved analytically, the integral above can be evaluated with integration by parts (IBP). Applying IBP, we obtain the following result.

$$ E[X^3] = 0.3[-x^3 e^{-0.3x}(1/0.3) -3x^2 e^{-0.3x}(1/0.09) -6x e^{-0.3x}(1/0.027) - 6e^{-0.3x}(1/0.0081)]_{0}^{10}$$

$$ E[X^3] = 0.3(-479.4310287 + 740.7407407) = 78.3929136.$$
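As a sanity check on the IBP result, the sketch below evaluates the same integral with R's built-in `integrate()` and also reports a standard error for the Monte Carlo estimate. The seed and the variable names are assumptions for this illustration, so the Monte Carlo figure will differ slightly from the 77.13 reported earlier.

```r
# Numerical quadrature of x^3 times the Exponential(0.3) density on [0, 10]
integrand <- function(x) x^3 * dexp(x, rate = 0.3)
check <- integrate(integrand, lower = 0, upper = 10)
check$value  # to compare with the IBP result

# Monte Carlo estimate with its standard error
set.seed(1)
x_i <- rexp(10000, rate = 0.3)
vals <- ifelse(x_i <= 10, x_i^3, 0)
se <- sd(vals) / sqrt(length(vals))
c(estimate = mean(vals), std_error = se)
```

The standard error gives a rough sense of how far a single run with 10,000 draws can be expected to sit from the analytical value.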

b. Find $Pr(X \ge 5)$ where $X$ comes from a Normal($\mu=3,\sigma=0.8$) distribution.

Solution:

Here, we are trying to evaluate the following integral

$$ \int_{5}^{\infty} 1 \times \frac{1}{\sqrt{2\pi (0.8^2)}} e^{-\frac{(x-3)^2}{2(0.8^2)}} dx$$

Our Monte Carlo estimator can be calculated as follows

$$ \hat{Pr}(X \ge 5) = \frac{1}{N}\sum_{i=1}^{N} 1_i$$

where $1_i$ is our indicator function with the following rule

$$ 1_i = \left\{\begin{array}{l} 0 , x_i < 5 \\ 1 , x_i \ge 5 \end{array} \right.$$

Using the R script below, we get an estimate of 0.0061, close to the true value of 0.0062.

x_i <- rnorm(10000, mean=3, sd=0.8) # generate x randomly
# create indicator function
f<- function(x){
  if(x >= 5){
    return(1)
  }
  else{
    return(0)
  }
}

f_xi <- sapply(x_i, f)

mean(unlist(f_xi)) # compute the average

## [1] 0.0061

# comparing with the true value
1 - pnorm(5, mean=3, sd=0.8)

## [1] 0.006209665
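Because the estimator here is an average of 0/1 indicators, its precision can be gauged with the usual binomial standard error $\sqrt{\hat{p}(1-\hat{p})/N}$. The sketch below is a vectorised variant of the script above with that standard error added. The seed and the names `p_hat` and `se` are assumptions for this illustration.

```r
set.seed(1)
N <- 10000
x_i <- rnorm(N, mean = 3, sd = 0.8)
p_hat <- mean(x_i >= 5)              # average of the indicator 1(x_i >= 5)
se <- sqrt(p_hat * (1 - p_hat) / N)  # binomial standard error of p_hat
c(estimate = p_hat, std_error = se, true_value = 1 - pnorm(5, mean = 3, sd = 0.8))
```

For a small probability like this one, only a few dozen of the 10,000 draws land in the region of interest, so the relative error is noticeably larger than in example a.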

c. For a 2-dimensional problem, showing the R script is a little more complex, but the following steps can be used to estimate the solution.

Let's say that we need to find the solution to the following integral,

$$ \int_{c}^{d}\int_{a}^{b} f(x,y) dxdy$$

The steps to use are:

i. Assume that the variables $X$ and $Y$ follow Uniform(a,b) and Uniform(c,d) distributions respectively.
ii. Randomly generate a lot of $x$'s and $y$'s.
iii. The answer is approximated as

$$ (d-c)(b-a) \frac{1}{N}\sum_{i=1}^{N} f(x_i,y_i).$$
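A minimal R sketch of these steps is given here for one assumed case: $f(x,y) = xy$ with $x \in [0,1]$ and $y \in [0,2]$, whose exact integral is 1. The integrand, the limits and the variable names are all illustrative choices, not part of the original problem.

```r
set.seed(1)
N <- 100000
x_lo <- 0; x_hi <- 1  # limits a, b for x (assumed)
y_lo <- 0; y_hi <- 2  # limits c, d for y (assumed)

f <- function(x, y) x * y  # illustrative integrand

x_i <- runif(N, x_lo, x_hi)  # X ~ Uniform(a, b)
y_i <- runif(N, y_lo, y_hi)  # Y ~ Uniform(c, d)

# (d - c)(b - a) times the average of f(x_i, y_i)
estimate_2d <- (y_hi - y_lo) * (x_hi - x_lo) * mean(f(x_i, y_i))
estimate_2d  # should be close to the exact value 1
```

Since `f` is written with vectorised arithmetic, no loop over the draws is needed.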

d. Find the area of a circle with radius 2.

Solution:

Since a circle can be divided into four quadrants, it is sufficient to compute the area of a single quadrant and then multiply the answer by four.

Recall that a circle with centre (0,0) can be expressed as

$$ x^2 + y^2 = r^2,$$

so a point $(x, y)$ lies inside the circle whenever $x^2 + y^2 \le r^2$. The area of one quadrant is therefore the integral of an indicator function over any square that contains the quadrant. Here we use the square $[0,3] \times [0,3]$, so we only need to solve the following integral

$$ \int_{0}^{3} \int_{0}^{3} 1(x^2 + y^2 \le 2^2) dxdy$$

where $1(\cdot)$ equals 1 when the condition holds and 0 otherwise.

To apply Monte Carlo Integration, we just need to manipulate the above integral so that it is multiplied by a probability density of our choice. For this case, we choose Uniform(0,3) & Uniform(0,3) for our $x$ and $y$. Letting $g(x)$ and $f(y)$ denote the chosen Uniform densities, the integral above is extended into

$$ \int_{0}^{3} \int_{0}^{3} 1(x^2 + y^2 \le 2^2) dxdy = \int_{0}^{3} \int_{0}^{3} 1(x^2 + y^2 \le 2^2) \frac{1}{g(x)f(y)} g(x)f(y) dxdy.$$

The Uniform(0,3) density function is

$$ g(x) = f(y) = \frac{1}{3-0},$$

so $\frac{1}{g(x)f(y)} = (3-0)(3-0)$ is a constant and can be taken outside the integral:

$$= (3-0)(3-0) \int_{0}^{3} \int_{0}^{3} 1(x^2 + y^2 \le 2^2) g(x)f(y) dxdy.$$

The final expression simply means that we need to find the expected value of the indicator, where $x$ & $y$ are randomly generated from the Uniform distributions, and multiply it by $(3-0)(3-0)$. Hence the desired integral is approximated as

$$ \approx (3-0)(3-0) \frac{1}{N} \sum_{i=1}^{N}1_i$$

where

$$ 1_i = \left\{\begin{array}{l} 0 , x_{i}^2 + y_{i}^2 > 2^2 \\ 1 , x_{i}^2 + y_{i}^2 \le 2^2 \end{array} \right.$$

The following is the R script used.

x_i <- runif(10000, 0, 3) # generate x randomly
y_i <- runif(10000, 0, 3) # generate y randomly

# create indicator function
f<- function(x,y){
  if(x^2 + y^2 <= 2^2){
    return(1)
  }
  else{
    return(0)
  }
}

result <- matrix(NA,ncol=1, nrow=10000)
for (i in 1:10000){
  result[i] <- f(x_i[i],y_i[i])
}

4*(3-0)*(3-0)*mean(result) # 4 * average in a quadrant

## [1] 12.5352

# comparing with the true value
pi*2^2

## [1] 12.56637
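The error of a Monte Carlo estimate shrinks roughly in proportion to $1/\sqrt{N}$, so more draws give a closer answer. The sketch below repeats the circle calculation in vectorised form for a few sample sizes. The helper name `circle_area`, the seed and the chosen sizes are assumptions for this illustration.

```r
set.seed(1)

# Hypothetical helper: estimate the area of the radius-2 circle from N draws
circle_area <- function(N) {
  x <- runif(N, 0, 3)
  y <- runif(N, 0, 3)
  4 * (3 - 0) * (3 - 0) * mean(x^2 + y^2 <= 2^2)  # 4 * quadrant estimate
}

sizes <- c(100, 10000, 1000000)
estimates <- sapply(sizes, circle_area)
data.frame(N = sizes, estimate = estimates, abs_error = abs(estimates - pi * 2^2))
```

Individual runs fluctuate, so the error does not have to fall at every step, but the largest sample size should land closest to $\pi \times 2^2$ in most runs.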

5. Conclusion.

We have seen how Monte Carlo Integration works. In particular, I have shown how we can use this numerical technique to solve integration problems that revolve around the computation of a complex function. We have also seen how this method can be extended to multiple dimensions and how we can use any known distribution to sample our observations before calculating the average (refer to examples a, b, c). Moreover, if our integral does not involve any probability function, we can still use this technique by manipulating the integral to contain a probability function (refer to example d).

With the improvement in the speed and efficiency of computers nowadays, this technique has become more practical for researchers and professionals to apply in solving their daily problems.

😄😄😄

Monday, September 18, 2023
