f(x) = \left\{ \begin{array}{ll} \frac{h}{c-a}(x-a) & \mbox{if } a \le x \le c \\ \frac{h}{c-b}(x-b) & \mbox{if } c \lt x \le b \\ 0 & \mbox{otherwise} \end{array} \right. \qquad (1)
Integrating the pdf in equation (1) to solve for h,
\int f(x) dx = 1
\frac{h}{c-a}\int_a^c (x-a) dx + \frac{h}{c-b}\int_c^b (x-b) dx = \frac{h(b-a)}{2}
h = \frac{2}{b-a} \qquad (2)
Substituting back into equation (1),
f(x) = \left\{ \begin{array}{ll} \frac{2(x-a)}{(b-a)(c-a)} & \mbox{if } a \le x \le c \\ \frac{2(x-b)}{(b-a)(c-b)} & \mbox{if } c \lt x \le b \\ 0 & \mbox{otherwise} \end{array} \right. \qquad (3)
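As a quick sketch (not part of the derivation), the density in equation (3) can be coded directly in R; the name dtri and its defaults are purely illustrative, and the triangle package's dtriangle provides this same density.

```r
# Minimal sketch of the pdf in equation (3), assuming a < c < b.
# (dtri is an illustrative name; the 'triangle' package's dtriangle()
# implements the same density.)
dtri <- function(x, a = 0, b = 1, c = 0.5) {
  ifelse(a <= x & x <= c, 2 * (x - a) / ((b - a) * (c - a)),
  ifelse(c <  x & x <= b, 2 * (x - b) / ((b - a) * (c - b)), 0))
}
```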
Integrating equation (3) to find E(X),
E(X) = \int x f(x) dx = \frac{h}{c-a}\int_a^c (x^2-ax) dx + \frac{h}{c-b}\int_c^b (x^2-bx) dx
= \frac{a+b+c}{3}
V(X) = E(X^2) - (E(X))^2 = \int x^2 f(x) dx - \left(\frac{a+b+c}{3}\right)^2
= \frac{h}{c-a}\int_a^c x^2(x-a) dx + \frac{h}{c-b}\int_c^b x^2(x-b) dx - \left(\frac{a+b+c}{3}\right)^2
= \frac{a^2+b^2+c^2-ab-ac-bc}{18}
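As a sketch, both closed forms can be checked numerically against the density coded above (the parameter values are arbitrary illustrations):

```r
# Numerical check of E(X) = (a+b+c)/3 and the variance formula,
# using the dtri() sketch above.
a <- 1; b <- 5; c <- 2
m1 <- integrate(function(x) x * dtri(x, a, b, c), a, b)$value
m2 <- integrate(function(x) x^2 * dtri(x, a, b, c), a, b)$value
c(m1, (a + b + c) / 3)                                  # both ~ 2.6667
c(m2 - m1^2, (a^2 + b^2 + c^2 - a*b - a*c - b*c) / 18)  # both ~ 0.7222
```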
Define:
a_l = \log_{\phi}(a), \quad b_l = \log_{\phi}(b), \quad c_l = \log_{\phi}(c), \quad h = \frac{2}{b_l - a_l}, \quad \phi = \mbox{the logarithm base}
f(z) = \left\{ \begin{array}{ll} \frac{h}{c_l-a_l}(z-a_l) & \mbox{if } a_l \le z \le c_l \\ \frac{h}{c_l-b_l}(z-b_l) & \mbox{if } c_l \lt z \le b_l \\ 0 & \mbox{otherwise} \end{array} \right. \qquad (4)
However,
E(\phi^z) \ne \phi^{E(z)} \qquad (5)
Therefore, transforming variables,
Y = \phi^Z
Z = \log_{\phi}(Y)
w(y) = \log_{\phi}(y)
w'(y) = \frac{dz}{dy} = \frac{1}{y\log(\phi)}
g(y) = f(w(y)) \, w'(y)
g(y) = \left\{ \begin{array}{ll} \frac{2}{(c_l-a_l)(b_l-a_l)\log(\phi)} \cdot \frac{\log_{\phi}(y)-a_l}{y} & \mbox{if } 0 \lt a \le y \le c \\ \frac{2}{(c_l-b_l)(b_l-a_l)\log(\phi)} \cdot \frac{\log_{\phi}(y)-b_l}{y} & \mbox{if } c \lt y \le b \\ 0 & \mbox{otherwise} \end{array} \right. \qquad (6)
Define:
\beta_1 = \frac{2}{(c_l-a_l)(b_l-a_l)}
\beta_2 = \frac{2}{(c_l-b_l)(b_l-a_l)}
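A minimal R sketch of the log-triangle density in equation (6), assuming 0 < a < c < b; the name dltri and the default base are illustrative only:

```r
# Sketch of the log-triangle density g(y) in equation (6).
# Assumes 0 < a < c < b; 'p' is the logarithm base (phi in the text).
dltri <- function(y, a, b, c, p = 10) {
  al <- log(a, p); bl <- log(b, p); cl <- log(c, p)
  b1 <- 2 / ((cl - al) * (bl - al))
  b2 <- 2 / ((cl - bl) * (bl - al))
  ifelse(a <= y & y <= c, b1 * (log(y, p) - al) / (y * log(p)),
  ifelse(c <  y & y <= b, b2 * (log(y, p) - bl) / (y * log(p)), 0))
}
```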
Finding the CDF,
G(y) = \int_{-\infty}^{y} g(t) dt
for a \le y \le c, \quad G(y) = \frac{\beta_1}{\log(\phi)}\int_a^y \left[\frac{\log(t)}{t\log(\phi)} - \frac{a_l}{t}\right] dt
= \beta_1\left[\frac{\log_{\phi}^2(y)}{2} - a_l\log_{\phi}(y) - \frac{a_l^2}{2} + a_l^2\right]
for c \lt y \le b, \quad G(y) = G(c) + \frac{\beta_2}{\log(\phi)}\int_c^y \left[\frac{\log(t)}{t\log(\phi)} - \frac{b_l}{t}\right] dt
= G(c) + \beta_2\left[\frac{\log_{\phi}^2(y)}{2} - b_l\log_{\phi}(y) - \frac{c_l^2}{2} + b_l c_l\right]
Checking that the CDF is 1 at b,
G(b) = \frac{c_l^2 - 2a_l c_l + a_l^2}{(c_l-a_l)(b_l-a_l)} + \frac{-b_l^2 - c_l^2 + 2b_l c_l}{(c_l-b_l)(b_l-a_l)}
= \frac{c_l-a_l}{b_l-a_l} + \frac{-(c_l-b_l)}{b_l-a_l} = 1
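The same check can be made numerically against the density sketch above (arbitrary illustrative parameters):

```r
# The log-triangle density should integrate to 1 over [a, b].
integrate(function(y) dltri(y, a = 2, b = 50, c = 10), lower = 2, upper = 50)$value  # ~ 1
```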
Now calculating E(Y),
E(Y) = \int y \, g(y) \, dy
= \frac{\beta_1}{\log(\phi)}\int_a^c \left[\frac{\log(y)}{\log(\phi)} - a_l\right] dy + \frac{\beta_2}{\log(\phi)}\int_c^b \left[\frac{\log(y)}{\log(\phi)} - b_l\right] dy
= \frac{c\beta_1}{\log^2(\phi)}\left[\log(c) - 1 - \log(a) + \frac{a}{c}\right] + \frac{c\beta_2}{\log^2(\phi)}\left[-\frac{b}{c} - \log(c) + 1 + \log(b)\right]
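As a sketch, the closed form for E(Y) can be compared against numerical integration (illustrative parameters and base, using the dltri() sketch above):

```r
# Compare the closed-form E(Y) with numerical integration.
a <- 2; b <- 50; c <- 10; p <- 10
al <- log(a, p); bl <- log(b, p); cl <- log(c, p)
b1 <- 2 / ((cl - al) * (bl - al)); b2 <- 2 / ((cl - bl) * (bl - al))
ey <- c * b1 / log(p)^2 * (log(c) - 1 - log(a) + a / c) +
      c * b2 / log(p)^2 * (-b / c - log(c) + 1 + log(b))
c(ey, integrate(function(y) y * dltri(y, a, b, c, p), a, b)$value)  # should agree
```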
Solving the mean formula for the mode gives a moment-based estimate of c,
E(X) = \frac{a+b+c}{3}
c = 3E(X) - a - b = 3\bar{x} - \min(x_i) - \max(x_i)
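A minimal R sketch of this moment-based estimate (the function name moment_start is illustrative; it gives rough starting values, not the MLE):

```r
# Moment-based starting values: a and b from the sample extremes,
# c from solving E(X) = (a + b + c) / 3 for c.
moment_start <- function(x) {
  a <- min(x); b <- max(x)
  cc <- 3 * mean(x) - a - b
  c(a = a, b = b, c = min(max(cc, a), b))  # clamp the mode into [a, b]
}
```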
The procedure for maximum likelihood estimation involves maximizing the likelihood with respect to c for a fixed a and b, followed by minimizing the negative log likelihood with respect to a and b for a fixed c.
This discussion follows the results of Samuel Kotz and Johan René van Dorp, Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications (World Scientific, 2004).
For the purposes of this section, with a and b fixed, the sample can easily be rescaled to a=0 and b=1. This section therefore proceeds on [0,1] with the mode satisfying 0 \le c \le 1.
w(x) = \left\{ \begin{array}{ll} \frac{2x}{c} & \mbox{if } 0 \le x \lt c \\ \frac{2(1-x)}{1-c} & \mbox{if } c \le x \leq 1 \\ 0 & \mbox{otherwise} \end{array} \right.
L(x|c) = \prod_{i=1}^{n} w(x_i|c)
Assume that the sample is ordered into order statistics X_{(1)} \lt \dots \lt X_{(n)}, and suppose that X_{(r)} \le c \lt X_{(r+1)}. In other words, the mode falls between the r^{th} and (r+1)^{th} order statistics.
L(x|c) = \prod_{i=1}^{r} \frac{2x_{(i)}}{c} \prod_{i=r+1}^{n} \frac{2(1-x_{(i)})}{1-c} = \frac{2^n \prod_{i=1}^{r} x_{(i)} \prod_{i=r+1}^{n} (1-x_{(i)})}{c^r(1-c)^{n-r}}
To maximize the likelihood, we can first maximize with respect to r and then locate c between the r^{th} and (r+1)^{th} order statistics. For notation purposes, also define X_{(0)} = 0 and X_{(n+1)} = 1.
\large \max_{0 \le c \le 1} L(x|c) = \max_{r \ \in \ \{0,\dots,n\}} \ \ \max_{x_{(r)} \le c \le x_{(r+1)}} \ \ L(x|c)
Noticing that maximizing the likelihood is equivalent to minimizing the denominator:
\large \max L(x|c) = \max_{r \ \in \ \{1,\dots,n-1\}} \ \ \min_{x_{(r)} \le c \le x_{(r+1)}} \ \ c^r(1-c)^{n-r}
Since c^r(1-c)^{n-r} is unimodal with respect to c, it should be sufficient to test the endpoints of an interval to find the minimum on that interval.
\large = \max_{r \ \in \ \{1,\dots,n-1\}} \ \ \min_{c \ \in \ \{x_{(r)},\ \ x_{(r+1)}\}} \ \ c^r(1-c)^{n-r}
Therefore, for this case, it is sufficient to test the likelihood using c at each of the sampled points and find the largest.
Letting z = c^r(1-c)^{n-r},
\frac{dz}{dc} = rc^{(r-1)}(1-c)^{n-r} + c^r(n-r)(1-c)^{n-r-1}(-1) = c^{(r-1)}(1-c)^{n-r-1}(r - cn)
\frac{dz}{dc} = 0 at c=0,\ 1,\ \frac{r}{n}. For 0 < c < \frac{r}{n}, \frac{dz}{dc} is positive, and for \frac{r}{n} < c < 1, \frac{dz}{dc} is negative. Therefore, z is unimodal on (0,1).
For the case r = 0, that is, c \le x_{(1)},
\large \max L(x|c) = \max_{0 \le c \le x_{(1)}} \prod_{i=1}^{n} \frac{2(1-x_{(i)})}{1-c} = \prod_{i=1}^{n} \frac{2(1-x_{(i)})}{1-x_{(1)}}
Choosing the largest endpoint of the interval creates the smallest denominator and the largest likelihood.
Therefore, for this case, it is sufficient to test the likelihood using c at the first sampled point.
For the case r = n, that is, x_{(n)} \le c,
\large \max L(x|c) = \max_{x_{(n)} \le c \le 1} \prod_{i=1}^{n} \frac{2x_{(i)}}{c} = \prod_{i=1}^{n} \frac{2x_{(i)}}{x_{(n)}}
Choosing the smallest value of c in the interval creates the smallest denominator and the largest likelihood. Again, it is sufficient to test the likelihood using c at the largest sample point.
For all cases, it is sufficient to compute the sample likelihood using c equal to each of the samples, and choosing the largest likelihood from the n options to find the corresponding c. This calculation is performed with a fixed a and b, so the test must be performed iteratively as a and b are separately optimized.
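A minimal R sketch of this search, assuming the sample has already been rescaled so that a = 0, b = 1, and 0 < x_i < 1 (the function name mle_c01 is illustrative):

```r
# Evaluate the log likelihood with c set to each sample point; keep the best.
mle_c01 <- function(x) {
  loglik <- function(cc) {
    sum(log(ifelse(x < cc, 2 * x / cc, 2 * (1 - x) / (1 - cc))))
  }
  cand <- sort(unique(x))
  cand[which.max(sapply(cand, loglik))]
}
```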
nLL = -\log(L) = -\log\left(\prod_i^n f(x_i)\right)
= - \sum_i^n \log\left(f(x_i)\right) = - \sum_{i: \ a \le x_i \lt c}^{n_1} \log\left(f(x_i)\right) - \sum_{i: \ c \le x_i \le b}^{n_2} \log\left(f(x_i)\right)
where n = n_1 + n_2
When every observation falls at or above the mode (c \le x_i for all i),
nLL = - \sum_{i}^{n} \left[\log(2) + \log(b-x_i) - \log(b-a) - \log(b-c)\right]
= -n\log(2) + n\log(b-a) + n \log(b-c) - \sum_{i}^{n} \log(b-x_i)
When every observation falls at or below the mode (x_i \le c for all i),
nLL = - \sum_{i}^{n} \left[\log(2) + \log(x_i - a) - \log(b-a) - \log(c-a)\right]
= -n\log(2) + n\log(b-a) + n\log(c-a) - \sum_{i}^{n} \log(x_i - a)
In the general case, where n_1 observations fall below the mode and n_2 at or above it,
nLL = - \sum_{i: \ a \lt x_i \lt c}^{n_1} \left[\log(2) + \log(x_i - a) - \log(b-a) - \log(c-a)\right] - \sum_{i: \ c \le x_i \lt b}^{n_2} \left[\log(2) + \log(b-x_i) - \log(b-a) - \log(b-c)\right]
= -n\log(2) + n\log(b-a) + n_1\log(c-a) + n_2 \log(b-c) - \sum_{i: \ a \lt x_i \lt c}^{n_1} \log(x_i - a) - \sum_{i: \ c \le x_i \lt b}^{n_2} \log(b-x_i)
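A direct R sketch of this general-case expression (the name nll is illustrative; it assumes a < c < b and all observations strictly inside (a, b)):

```r
# Negative log likelihood of the triangle distribution for fixed (a, b, c).
nll <- function(x, a, b, c) {
  x1 <- x[x < c]; x2 <- x[x >= c]
  n <- length(x); n1 <- length(x1); n2 <- length(x2)
  -n * log(2) + n * log(b - a) + n1 * log(c - a) + n2 * log(b - c) -
    sum(log(x1 - a)) - sum(log(b - x2))
}
```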
The negative log likelihood is not differentiable with respect to c because the limits of the sums (n_1 and n_2) are functions of c. Therefore, the gradient and Hessian are derived as if c were fixed. The three cases below follow the same order as above: all observations at or above the mode, all observations at or below the mode, and the general case.
\frac{\partial nLL}{\partial a} = - \frac{n}{b-a}
\frac{\partial nLL}{\partial b} = \frac{n}{b-a} + \frac{n}{b-c} - \sum_i^{n} \frac{1}{b-x_i}
\frac{\partial nLL}{\partial a} = - \frac{n}{b-a} - \frac{n}{c-a} + \sum_i^{n} \frac{1}{x_i - a}
\frac{\partial nLL}{\partial b} = \frac{n}{b-a}
\frac{\partial nLL}{\partial a} = - \frac{n}{b-a} - \frac{n_1}{c-a} + \sum_i^{n_1} \frac{1}{x_i - a}
\frac{\partial nLL}{\partial b} = \frac{n}{b-a} + \frac{n_2}{b-c} - \sum_i^{n_2} \frac{1}{b-x_i}
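A sketch of the general-case gradient in (a, b) for fixed c, under the same assumptions as the nll() sketch above:

```r
# Gradient of the negative log likelihood with respect to (a, b), c fixed.
nll_grad <- function(x, a, b, c) {
  x1 <- x[x < c]; x2 <- x[x >= c]
  n <- length(x); n1 <- length(x1); n2 <- length(x2)
  c(da = -n / (b - a) - n1 / (c - a) + sum(1 / (x1 - a)),
    db =  n / (b - a) + n2 / (b - c) - sum(1 / (b - x2)))
}
```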
\frac{\partial^2nLL}{\partial a^2} = - \frac{n}{(b-a)^2}
\frac{\partial^2 nLL}{\partial b^2} = -\frac{n}{(b-a)^2} - \frac{n}{(b-c)^2} + \sum_i^{n} \frac{1}{(b-x_i)^2}
\frac{\partial^2 nLL}{\partial a\partial b} = \frac{\partial^2 nLL}{\partial b\partial a} = \frac{n}{(b-a)^2}
\frac{\partial^2 nLL}{\partial a^2} = - \frac{n}{(b-a)^2} - \frac{n}{(c-a)^2} + \sum_i^{n} \frac{1}{(x_i - a)^2}
\frac{\partial^2 nLL}{\partial b^2} = - \frac{n}{(b-a)^2}
\frac{\partial^2 nLL}{\partial a\partial b} = \frac{\partial^2 nLL}{\partial b\partial a} = \frac{n}{(b-a)^2}
\frac{\partial^2 nLL}{\partial a^2} = - \frac{n}{(b-a)^2} - \frac{n_1}{(c-a)^2} + \sum_i^{n_1} \frac{1}{(x_i - a)^2}
\frac{\partial ^2 nLL}{\partial b^2} = -\frac{n}{(b-a)^2} - \frac{n_2}{(b-c)^2} + \sum_i^{n_2} \frac{1}{(b-x_i)^2}
\frac{\partial ^2 nLL}{\partial a\partial b} = \frac{\partial ^2 nLL}{\partial b\partial a} = \frac{n}{(b-a)^2}
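And a sketch of the general-case Hessian in (a, b), again for fixed c and under the same assumptions (the positive cross-partial comes only from the n\log(b-a) term):

```r
# Hessian of the negative log likelihood with respect to (a, b), c fixed.
nll_hess <- function(x, a, b, c) {
  x1 <- x[x < c]; x2 <- x[x >= c]
  n <- length(x); n1 <- length(x1); n2 <- length(x2)
  daa <- -n / (b - a)^2 - n1 / (c - a)^2 + sum(1 / (x1 - a)^2)
  dbb <- -n / (b - a)^2 - n2 / (b - c)^2 + sum(1 / (b - x2)^2)
  dab <-  n / (b - a)^2
  matrix(c(daa, dab, dab, dbb), 2, 2, dimnames = list(c("a", "b"), c("a", "b")))
}
```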
For the optimization of (a,b) given c, we can use the inverse of the Hessian of the negative log likelihood as an estimate of the covariance matrix of \hat{a} and \hat{b}. For the variance of \hat{c}, we use the variance of the r^{th} order statistic corresponding to c. The covariance of (a,b) with c is not computed because the negative log likelihood is not differentiable with respect to c.
Let H denote the Hessian matrix, and let H^{-1}[1,1] be V(\hat{a}), H^{-1}[2,2] be V(\hat{b}), and H^{-1}[1,2] = H^{-1}[2,1] be Cov(\hat{a}, \hat{b}). Then,
V([\hat{a}, \hat{b}, \hat{c}]) = \begin{bmatrix} H^{-1}[1,1] & H^{-1}[1,2] & 0 \\ H^{-1}[2,1] & H^{-1}[2,2] & 0 \\ 0 & 0 & V(\hat{c}) \\ \end{bmatrix}
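A sketch of assembling this matrix in R, given a Hessian H from the sketch above and a separately computed variance v_c for \hat{c}:

```r
# Combine the inverse Hessian for (a, b) with V(c-hat) from the order statistic.
vcov_triangle <- function(H, v_c) {
  V <- matrix(0, 3, 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
  V[1:2, 1:2] <- solve(H)
  V[3, 3] <- v_c
  V
}
```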
The density of the r^{th} order statistic is
f(x_{(r)}) = \frac{n!}{(r-1)!(n-r)!} f(x) [F(x)]^{(r-1)}[1-F(x)]^{(n-r)}
A closed form solution to V(X_{(r)}) is not easily obtainable for the triangle distribution, so numerical integration is used with f(x) as dtriangle and F(x) as ptriangle.
V\left(X_{(r)}\right) = \int_a^b x^2 f(x_{(r)}) dx - \left[\int_a^b x f(x_{(r)}) dx \right]^2
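A sketch of this numerical integration using dtriangle and ptriangle from the triangle package (the wrapper name v_order_stat is illustrative; the factorial form is fine for moderate n):

```r
# V(X_(r)) for the triangle distribution by numerical integration.
library(triangle)
v_order_stat <- function(r, n, a, b, c) {
  fr <- function(x) {                       # density of the r-th order statistic
    factorial(n) / (factorial(r - 1) * factorial(n - r)) *
      dtriangle(x, a, b, c) * ptriangle(x, a, b, c)^(r - 1) *
      (1 - ptriangle(x, a, b, c))^(n - r)
  }
  m1 <- integrate(function(x) x * fr(x), a, b)$value
  m2 <- integrate(function(x) x^2 * fr(x), a, b)$value
  m2 - m1^2
}
```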