Multivariate random variables
As alluded to on the previous page, econometricians are often interested in the relationship between two or more random variables. Therefore, we need measures that give us insight into the joint probability distribution of these random variables.
An important quantity is the covariance, which is a measure of the joint variability of two random variables. The sign of the covariance (i.e. + or -) indicates whether the random variables generally move in the same or in opposite directions. The magnitude of a covariance is generally not straightforward to interpret, because the values it can take are not bounded. For that reason, the correlation, which only takes values between -1 and +1, is often preferred.
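As a quick numerical illustration, both measures can be computed from simulated data with NumPy. The data-generating process below (the coefficient 0.5, the seed, and the sample size) is a hypothetical choice made only for this sketch:

```python
import numpy as np

# Hypothetical simulated data: y moves with x, plus independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Covariance: the sign shows the direction of co-movement,
# but the magnitude depends on the units of x and y
cov_xy = np.cov(x, y)[0, 1]

# Correlation: rescaled covariance, always between -1 and +1
corr_xy = np.corrcoef(x, y)[0, 1]

print(cov_xy, corr_xy)
```

Note that rescaling either series (say, measuring x in cents instead of euros) changes the covariance but leaves the correlation unchanged, which is why the latter is easier to compare across settings.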
While the previous measures are useful, they only tell part of the story. More specifically, they only provide information on how two random variables are related linearly. Correlation is, however, just one way for random variables to be dependent. Note for example that it is possible for two random variables to be uncorrelated, yet dependent.
Example. Suppose we have two random variables X and Y, such that Y = X². Assume that X can only take the values -1, 0 and 1, each with equal probability (alternatively, we say that X is uniformly distributed on {-1, 0, 1}). Note that, using elementary results from probability theory:

E[X] = (-1)·(1/3) + 0·(1/3) + 1·(1/3) = 0,

E[XY] = E[X³] = (-1)·(1/3) + 0·(1/3) + 1·(1/3) = 0.

Because Cov(X, Y) = E[XY] - E[X]E[Y] = 0, X and Y are uncorrelated. We can however show that they are dependent:

P(Y = 1) = P(X = -1) + P(X = 1) = 2/3,

P(Y = 1 | X = 0) = 0.

Because (for example) P(Y = 1 | X = 0) ≠ P(Y = 1), X and Y are dependent.
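This kind of example can be verified with a short exact computation. The sketch below assumes the specification Y = X² with X uniform on {-1, 0, 1}, a standard instance of uncorrelated-but-dependent random variables:

```python
import numpy as np

# X is uniform on {-1, 0, 1}; Y = X**2
support = np.array([-1, 0, 1])
probs = np.array([1/3, 1/3, 1/3])

e_x = np.sum(support * probs)         # E[X]  = 0
e_y = np.sum(support**2 * probs)      # E[Y]  = 2/3
e_xy = np.sum(support**3 * probs)     # E[XY] = E[X^3] = 0

# Zero covariance: X and Y are uncorrelated
cov_xy = e_xy - e_x * e_y

# Yet they are dependent: P(Y = 1) = 2/3, while P(Y = 1 | X = 0) = 0
p_y1 = probs[support**2 == 1].sum()
```

The point of the example survives the computation: the linear measure (covariance) is exactly zero even though knowing X pins down Y completely.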
So far we have focused on the relation between only two random variables. However, often we deal with (many) more random variables. The multivariate generalizations of the concepts we discussed above are the covariance matrix and the correlation matrix. We characterize one of these in the example below.
Example. Suppose we are given three random variables X₁, X₂, X₃. Then we can define the covariance matrix Σ of these three random variables as

Σ = [ Var(X₁)       Cov(X₁, X₂)   Cov(X₁, X₃)
      Cov(X₂, X₁)   Var(X₂)       Cov(X₂, X₃)
      Cov(X₃, X₁)   Cov(X₃, X₂)   Var(X₃) ].
In words: the matrix has the respective variances of the three random variables on the diagonal, while the off-diagonal elements consist of the covariances.
Thus, in the case of multivariate random variables we often use vectors and matrices to "summarize" the relationships between random variables. More generally, we use concepts from linear algebra to investigate the properties of statistical quantities. For example, one could check whether the covariance matrix above has full rank (which is a necessary condition for the matrix to be invertible).
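A minimal sketch of these ideas in practice, assuming some hypothetical simulated data with the three random variables stored as columns:

```python
import numpy as np

# Hypothetical data: 500 draws of three random variables (one per column)
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))

# Sample covariance matrix: variances on the diagonal,
# covariances off the diagonal (rowvar=False -> variables are columns)
sigma = np.cov(data, rowvar=False)

# Full rank (here: rank 3) is a necessary condition for invertibility
rank = np.linalg.matrix_rank(sigma)
print(sigma, rank)
```

A rank below 3 would signal that one variable is (almost surely) a linear combination of the others, in which case the covariance matrix cannot be inverted.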
Let us now go back to the concept of a joint distribution. Consider the case of two random variables X and Y. The joint distribution, denoted f_{X,Y}(x, y), then determines the probability distribution over all pairs of outcomes. Simply said, it considers all possible combinations of outcomes of X and Y and assigns a probability to each of these combinations. However, the random variables X and Y also still have their own, separate distributions, denoted f_X(x) and f_Y(y), which we call marginal distributions.
The joint and marginal distributions are related through conditional distributions, denoted f_{Y|X}(y | x) and f_{X|Y}(x | y). The concept of conditioning plays a central role in econometrics and allows us to ask questions such as "What is the probability of Y = y given that X takes on a certain value x?". That is, knowledge of the outcome of a related variable might influence the likelihood of Y. It is generally not straightforward to work with conditional distributions, conditional means and conditional variances. But despair not: there are useful theorems out there, such as the law of total expectation and the law of total variance, that aid in simplifying difficult computations.
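The law of total expectation, E[Y] = E[ E[Y | X] ], can be checked with a small exact computation. The discrete distribution below (X uniform on {-1, 0, 1} with Y = X², echoing the earlier setup) is a hypothetical choice made for concreteness:

```python
import numpy as np

# Distribution of X: uniform on {-1, 0, 1}
x_support = np.array([-1, 0, 1])
p_x = np.array([1/3, 1/3, 1/3])

# Conditional mean E[Y | X = x]: since Y = X**2, it is simply x**2
e_y_given_x = x_support**2

# Law of total expectation: average the conditional means
# over the marginal distribution of X to recover E[Y]
e_y = np.sum(e_y_given_x * p_x)   # equals 2/3
```

The theorem lets us compute an unconditional mean in two easier steps, first conditioning on X and then averaging over X, which is often simpler than working with the joint distribution directly.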