Ordinary least squares (OLS) is "the process of determining a regression or prediction equation to predict Y from X, with the method of least squares. In the resulting regression line, the sum of the squared discrepancies between the actual dependent values and the corresponding values predicted by the line are as small as possible, hence the name 'least squares'" (Hassard, 1991). The estimated regression equation is:

$$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 D + \hat{e}$$

where the $\hat{\beta}$s are the OLS estimates of the true coefficients $\beta$. OLS minimizes the sum of the squared residuals.
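
As a minimal numerical sketch of the least-squares idea, the Python snippet below fits a one-regressor line with NumPy and checks that moving the coefficients away from the OLS estimates can only raise the sum of squared residuals. The simulated data, the true coefficients (1.0 and 2.0), and the helper names `ols_fit` and `ssr` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ssr(beta, x, y):
    """Sum of squared residuals for the line y = beta[0] + beta[1] * x."""
    resid = y - (beta[0] + beta[1] * x)
    return float(resid @ resid)

rng = np.random.default_rng(0)                  # fixed seed, simulated data only
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

beta_hat = ols_fit(x, y)
resid = y - (beta_hat[0] + beta_hat[1] * x)

# Compare the OLS fit with a slightly perturbed line
best = ssr(beta_hat, x, y)
worse = ssr(beta_hat + np.array([0.05, -0.05]), x, y)
```

Because the regression includes an intercept, `resid.mean()` is zero up to rounding error and `resid` is uncorrelated with `x`, and `best` is smaller than `worse` for any non-zero perturbation of the coefficients.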

## OLS minimizes $\sum \hat{e}^2$

The residual, $\hat{e}$, is the difference between the actual Y and the predicted Y and has a zero mean. In other words, OLS calculates the coefficients so that the difference between the predicted Y and the actual Y is minimized. The residuals are squared so that negative and positive errors are treated alike and do not cancel each other out. The properties are:

1. The regression line defined by the estimated coefficients passes through the means of the observed values.
2. The mean of the predicted Ys for the sample will equal the mean of the observed Ys for the sample.
3. The sample mean of the residuals will be 0.
4. The correlation between the residuals and the predicted values of Y will be 0.
5. The correlation between the residuals and the observed values of X will be 0.

## Stationarity

A time series $y_t$ is covariance (or weakly) stationary if, and only if, its mean and variance are both finite and independent of time, and the autocovariance depends only on the lag $s$ and not on time, for all $t$ and $t-s$:

1. Finite mean: $E(y_t) = E(y_{t-s}) = \mu$
2. Finite variance: $\mathrm{Var}(y_t) = E[(y_t - \mu)^2] = E[(y_{t-s} - \mu)^2] = \sigma^2$
3. Finite autocovariance: $\mathrm{Cov}(y_t, y_{t-s}) = E[(y_t - \mu)(y_{t-s} - \mu)] = \gamma_s$

## Non-Stationarity
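
One way to see what failing the weak-stationarity conditions looks like is to simulate many paths of a process and compare the cross-sectional variance at two dates. The sketch below does this for a stationary first-order autoregression and for a random walk. The recursion, the coefficient name `phi`, the unit shock variance, and the function `simulate_ar1` are assumptions chosen for illustration.

```python
import numpy as np

def simulate_ar1(phi, n_paths, n_steps, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with y_0 = 0 and e_t ~ N(0, 1).

    Column j of the result holds y at time j + 1 for every path."""
    shocks = rng.normal(size=(n_paths, n_steps))
    y = np.zeros((n_paths, n_steps))
    y[:, 0] = shocks[:, 0]
    for t in range(1, n_steps):
        y[:, t] = phi * y[:, t - 1] + shocks[:, t]
    return y

rng = np.random.default_rng(1)
stationary = simulate_ar1(0.5, n_paths=5000, n_steps=200, rng=rng)
random_walk = simulate_ar1(1.0, n_paths=5000, n_steps=200, rng=rng)

# Variance across paths at time 50 and at time 200
var_stat = stationary.var(axis=0)[[49, 199]]
var_rw = random_walk.var(axis=0)[[49, 199]]

# For phi = 0.5 the theoretical variance is 1 / (1 - phi**2) = 4/3 at both dates;
# for phi = 1 the variance equals the number of accumulated shocks and keeps growing.
```

In this setup `var_stat` stays close to 4/3 at both dates, while `var_rw` is roughly four times larger at time 200 than at time 50.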

In a non-stationary series the variance is time dependent and goes to infinity as time approaches infinity. A time series that is not stationary in the mean can be made stationary by differencing. Differencing is a popular and effective method of removing a stochastic trend from a series. Non-stationarity in a time series occurs when there is no constant mean, no constant variance, or neither. It can originate from various sources, but the most important one is the unit root.

## Unit root
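
The effect of differencing on a stochastic trend can be sketched in a few lines. The example below builds a random walk from simulated shocks and takes its first difference; the sample size, the seed, and the helper `lag1_autocorr` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
shocks = rng.normal(size=2000)
random_walk = np.cumsum(shocks)        # y_t = y_{t-1} + e_t, a stochastic trend

diff = np.diff(random_walk)            # first difference: y_t - y_{t-1}

def lag1_autocorr(series):
    """Sample first-order autocorrelation of a series."""
    centred = series - series.mean()
    return float(centred[1:] @ centred[:-1] / (centred @ centred))

rho_level = lag1_autocorr(random_walk)  # close to one: shocks never die out
rho_diff = lag1_autocorr(diff)          # close to zero: the differences are the shocks
```

Differencing the random walk simply recovers the underlying white-noise shocks, which is why a series with one unit root becomes stationary after differencing once.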

Any sequence that contains one or more characteristic roots equal to one is known as a unit root process. The simplest model that may contain a unit root is the AR(1) model. Consider the autoregressive process of order one, AR(1), below:

$$Y_t = \phi Y_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ denotes a serially uncorrelated white-noise error term with a mean of zero and a constant variance. If $\phi = 1$, the model becomes a random walk without drift, that is, a non-stationary process. In that case we face what is called the unit root problem, which means that we are facing a situation of non-stationarity in the series. If, however, $|\phi| < 1$, then the series $Y_t$ is stationary. The stationarity of the series is important because correlation can persist in non-stationary time series even if the sample is very large, and may result in what is called spurious (or nonsense) regression (Yule, 1989). The unit root problem can be solved, or stationarity can be achieved, by differencing the data set (Wei, 2006).

## Testing of Stationarity
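
The spurious regression problem that motivates stationarity testing can be illustrated with a small Monte Carlo experiment. The sketch below regresses one series on another, independent one and records how often the slope appears significant at the conventional 1.96 threshold. The sample size, number of replications, and all function names are assumptions made for illustration.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def white_noise(rng, n):
    return rng.normal(size=n)

def random_walk(rng, n):
    return np.cumsum(rng.normal(size=n))

def rejection_rate(make_series, n_obs=100, n_reps=500, seed=3):
    """Share of replications with |t| > 1.96 for two independent series."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        x, y = make_series(rng, n_obs), make_series(rng, n_obs)
        hits += int(abs(slope_t_stat(x, y)) > 1.96)
    return hits / n_reps

rate_stationary = rejection_rate(white_noise)   # near the nominal 5% level
rate_unit_root = rejection_rate(random_walk)    # far above 5% despite independence
```

With stationary inputs the false rejection rate stays near its nominal level, whereas two unrelated random walks appear "significantly" related in most replications.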

If the time series has a unit root, the series is said to be non-stationary. Tests that can be used to check for stationarity are:

1. The partial autocorrelation function and the Ljung and Box statistics.
2. Unit root tests.

To check for stationarity and for the presence of a unit root in the series, the best-known unit root tests are those derived by Dickey and Fuller and described in Fuller (1976). The Augmented Dickey-Fuller (ADF), or Said-Dickey, test is the most widely used.

## Dickey-Fuller (DF) test:

Dickey and Fuller (DF) considered the estimation of the parameter $\alpha$ from the models:

1. A simple AR(1) model: $y_t = \alpha y_{t-1} + \varepsilon_t$
2. An AR(1) model with a constant: $y_t = \mu + \alpha y_{t-1} + \varepsilon_t$
3. An AR(1) model with a constant and a time trend: $y_t = \mu + \beta t + \alpha y_{t-1} + \varepsilon_t$

It is assumed that $y_0 = 0$ and that $\varepsilon_t$ is independent and identically distributed, i.i.d. $(0, \sigma^2)$. The hypotheses are:

$$H_0: \alpha = 1 \qquad H_1: |\alpha| < 1$$

The ADF test can be run on at least three possible models. They are restrictions of the test regression $\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \varepsilon_t$, in which $\alpha$ now denotes the drift, $\beta$ the trend coefficient, and $\gamma$ the autoregressive coefficient minus one.

(i) A pure random walk without a drift. This is defined by imposing the constraints $\alpha = 0$, $\beta = 0$ and $\gamma = 0$, which leads to the equation

$$\Delta y_t = \varepsilon_t$$

The equation above is a non-stationary series because its variance grows with time (Pfaff, 2006).

(ii) A random walk with a drift. This is obtained by imposing the constraints $\beta = 0$ and $\gamma = 0$, which yields the equation

$$\Delta y_t = \alpha + \varepsilon_t$$

(iii) A deterministic trend with a drift. For $\beta \neq 0$, the regression becomes the following deterministic trend with a drift model:

$$\Delta y_t = \alpha + \beta t + \varepsilon_t$$

The sign of the drift parameter ($\alpha$) causes the series to wander upward if positive and downward if negative, whereas the magnitude of its value affects the steepness of the series (Pfaff, 2006).

## Augmented Dickey-Fuller (ADF):

The Augmented Dickey-Fuller test is an augmented version of the Dickey-Fuller test that accommodates some forms of serial correlation and can be used for a larger and more complicated set of time series models. If there is higher-order correlation, the ADF test is used, whereas the DF test is used for an AR(1) process. The testing procedure for the ADF test is the same as for the Dickey-Fuller test, but it is applied to the AR(p) equation:

$$y_t = \alpha + \beta t + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t$$

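Assume that there is at most one unit root, so that the process is unit-root non-stationary. After reparameterizing this equation, we obtain the test equation for the AR(p) case:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{i=1}^{p-1} \delta_i \Delta y_{t-i} + \varepsilon_t$$

Here $\gamma$ equals the sum of the autoregressive coefficients minus one, and the $\delta_i$ are the coefficients on the lagged differences.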
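
The reparameterized regression can be estimated by OLS, and the test statistic is the t-ratio on $\gamma$. The following is a minimal sketch under stated assumptions: the function name `adf_t_stat`, its `trend` codes, and the simulated series are hypothetical, and the statistic must be compared with Dickey-Fuller critical values (approximately -2.86 at the 5% level for the constant-only case in large samples) rather than with the usual t tables.

```python
import numpy as np

def adf_t_stat(y, n_lags=0, trend="c"):
    """t-statistic on gamma in
    dy_t = [alpha] + [beta * t] + gamma * y_{t-1} + sum_i delta_i * dy_{t-i} + e_t.

    trend: "n" (no deterministic terms), "c" (constant), "ct" (constant and trend).
    With n_lags=0 this reduces to the plain Dickey-Fuller regression."""
    dy = np.diff(y)
    target = dy[n_lags:]
    cols = [y[n_lags:-1]]                                    # y_{t-1}
    cols += [dy[n_lags - i:len(dy) - i] for i in range(1, n_lags + 1)]  # lagged differences
    if trend in ("c", "ct"):
        cols.append(np.ones(len(target)))
    if trend == "ct":
        cols.append(np.arange(len(target), dtype=float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0] / se_gamma

rng = np.random.default_rng(6)
shocks = rng.normal(size=500)
random_walk = np.cumsum(shocks)                 # unit root: gamma = 0
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + shocks[t]   # gamma = -0.5

t_rw = adf_t_stat(random_walk, n_lags=1, trend="c")
t_stationary = adf_t_stat(stationary, n_lags=1, trend="c")
```

For the stationary series the statistic is strongly negative, so the unit root null is rejected, while for the random walk it typically stays above the critical value.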

Each version of the test has its own critical value, which depends on the size of the sample. In each case, the null hypothesis is that there is a unit root, $\gamma = 0$. In both tests, the critical values are those calculated by Dickey and Fuller, and they depend on whether the regression includes an intercept and/or a deterministic trend, and on whether it is a DF or an ADF test.

The test has problems. It has low statistical power to reject a unit root, and power is reduced further by adding the lagged differences. The ADF test is also affected by size distortions that occur when a large first-order moving average component exists in the time series. Diebold and Rudebusch (1991) show that the test has low power against the alternative of fractionally integrated series. Perron (1989, 1993) shows that when a time series is generated by a process that is stationary around a broken trend, standard DF tests of an I(1) null can have very low power. Conversely, Leybourne, Mills and Newbold (1998) show that when a time series is generated by a process that is I(1) but with a break, routine application of the DF test can result in a severe problem of spurious rejection of the null when the break occurs early in the sample period.

## Granger Causality test

Granger causality (Granger, 1980) measures whether one thing happens before another thing and helps predict it, and nothing else. Granger's definition¹ for probabilistic causality assumes three basic axioms: (1) the cause must precede the effect in time; (2) the cause contains some unique information concerning the effect's future value; (3) while the strength of causal relations may vary over time, their existence and direction are time-invariant (Granger, 1980; 1988a, b). The general definition for probabilistic causality,

$$\text{if } F(Y_{t+j} \mid U_t) \neq F(Y_{t+j} \mid U_t - X_t), \text{ then } X_t \text{ causes } Y_{t+j},$$

states that if the j-step-ahead (where j represents the time delay between the perceived cause and effect) conditional probability distribution (F) of the random variable $Y_{t+j}$ in period t + j is changed by removal of X from the universal information set (U) existing in period t, then $X_t$ causes $Y_{t+j}$. $U_t$ would contain all possible information in existence up to and including period t. $X_t$ would contain all past and present values of variable X. The change would be due to some unique information $X_t$ has concerning Y's future distribution. If X occurs, and X and Y are causally related, Y's probability of future occurrence changes. Note that $U_t$ includes Y, so that $X_t$ contains some information about the value of future Y not found in past or present Y (Granger, 1980; 1988a, b). The general definition implies that if a variable X causes variable Y, then if one is trying to forecast a distribution of future Y, one will frequently be better off using the information contained in past and present values of X (Granger, 1980; 1988a, b).

Granger (1980), noting the absence of a universally accepted definition for causality, offered a probabilistic definition which he suggested might be useful in econometric research. Granger (1980) proposed two operational definitions which he derived from his general one. The first he referred to as causality-in-mean. The second he referred to as full causality or causality-in-distribution. Full causality is preferred to mean causality when decision-making populations are characterized by non-linear utility functions (Ressler and Kling, 1990). Ashley et al. (1980) proposed and applied a method of testing for a mean causal relationship between two variables.
Given a prior belief that X caused Y, mean causality was inferred if the mean squared error of a one-step-ahead point forecast of Y from a bivariate model (an information set of past and present Y and X) was significantly less than that from a univariate model (past and present Y) over the same out-of-sample period.

¹ Source: "Testing for Granger's Full Causality" by Ted Covey and David A. Bessler.

² Granger causality tests are mostly used in situations where we are willing to consider 2-dimensional systems. If the data are well described by a 2-dimensional system ("no $z_t$ variables"), the Granger causality concept is likely to be straightforward to think about and to test, noting that there are special problems with testing for Granger causality in co-integrated relations (see Toda and Phillips (1991)).

## Engle and Granger

Non-stationary time series that exhibit a long-term equilibrium relationship are said to be cointegrated. The possibility that non-stationary time series can be cointegrated was considered in the 1970s by Engle and Granger. They define cointegrated variables in their 1987 paper in the following way. Consider two non-stationary time series, $y_t$ and $x_t$, where each of the time series becomes stationary after differencing once, i.e. they are both integrated of order one, I(1). These non-stationary time series are then said to be cointegrated of order one-one, CI(1,1), if there exists a cointegrating vector $\alpha$ that in a linear combination of the two variables yields a stationary term $\mu_t \sim I(0)$, in the regression $\mu_t = y_t - \alpha x_t$. Cointegration means that these non-stationary variables share a long-run relationship, so that the new time series obtained by combining the related non-stationary time series is stationary, i.e. the deviations have finite variance and a constant mean.

In general, two series are cointegrated when they are both integrated of order d, I(d), and a linear combination of them has a lower order of integration, (d-b), where b > 0. Time series need to be non-stationary in order to be cointegrated. Thus, one stationary variable and one non-stationary variable cannot have a long-term co-movement, because the first has a constant mean and finite variance whereas the second does not, hence the difference between the two will not be stationary. However, if there are more than two time series in a system, it is possible for them to have different orders of integration. Consider three time series, $y_t \sim I(2)$, $x_t \sim I(2)$, $q_t \sim I(1)$.
If $y_t$ and $x_t$ are cointegrated, so that their linear combination yields a disturbance term $\mu_t = y_t - \alpha x_t$ that is integrated of order 1, I(1), then it is potentially feasible that $\mu_t$ and $q_t$ are cointegrated with a resulting stationary disturbance term $s_t = q_t - \beta \mu_t$, where $\alpha$ and $\beta$ are cointegrating vectors. Generally, with n integrated variables there can potentially exist up to n-1 cointegrating vectors. This does not mean that all integrated variables are cointegrated. It is possible to find, for example, a pair of I(d) variables that are not cointegrated. If two variables are integrated of different orders, they cannot be cointegrated. However, it is possible to have cointegration among variables of different orders when there are more than two variables. Pagan and Wickens (1989: 1002) illustrate this point clearly, showing that it is possible to find cointegration among variables of different orders (when there are more than two variables). Enders (2004: 323) agrees with Pagan and Wickens (1989) that it is possible to find cointegration among sets of variables that are integrated of different orders when there are more than two variables. This is also supported by Harris (1995: 21).

## Vector Auto-regression (VAR)

Vector autoregressions (VARs) were introduced into empirical economics by Sims (1980), who demonstrated that VARs offer a flexible and tractable framework for analyzing economic time series. A VAR is an econometric model that has been used primarily in macroeconomics to capture the relationships and interdependencies between important economic variables. According to Brooks and Tsolacos (2010), one benefit of VAR modeling is that all of the variables are endogenous. Consequently, one is able to capture more features of the data, and OLS can be used separately on each equation. Brooks and Tsolacos (2010) also cite Sims (1972) and McNees (1986), who find that VAR models often perform better than traditional structural models. They also point out some disadvantages, one of them being that VAR models are a-theoretical by nature: they do not rely heavily on economic theory except for selecting the variables to be included in the VAR. Lag-length determination is also critical to finding the best VAR specification. The VAR can be viewed as a means of conducting causality tests, or more specifically Granger causality tests. Granger causality requires that lagged values of variable X are related to subsequent values of variable Y, holding constant the lagged values of Y and any other explanatory variables. The VAR model therefore provides a natural framework for testing Granger causality between each pair of variables. A VAR model estimates and describes the relationships and dynamics of a set of endogenous variables.
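
The Granger causality requirement described above can be sketched as an F-test comparing two regressions of Y: one on its own lags only, and one that also includes lags of X. The code below is an illustrative sketch only. The helper names, the lag length of two, and the simulated data (in which X leads Y by one period) are assumptions, and the resulting statistic would be compared with an F critical value with `n_lags` and `dof` degrees of freedom.

```python
import numpy as np

def lagged_matrix(series, n_lags):
    """Columns series[t-1], ..., series[t-n_lags] for t = n_lags, ..., T-1."""
    T = len(series)
    return np.column_stack([series[n_lags - k:T - k] for k in range(1, n_lags + 1)])

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def granger_f_stat(y, x, n_lags):
    """F-statistic for H0: lags of x do not help predict y given lags of y."""
    target = y[n_lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged_matrix(y, n_lags)])
    unrestricted = np.hstack([restricted, lagged_matrix(x, n_lags)])
    ssr_r, ssr_u = ssr(restricted, target), ssr(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / n_lags) / (ssr_u / dof)

rng = np.random.default_rng(4)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()   # x leads y by one period

f_x_to_y = granger_f_stat(y, x, n_lags=2)   # large: lags of x improve the forecast of y
f_y_to_x = granger_f_stat(x, y, n_lags=2)   # small: lags of y do not help predict x
```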
For a set of n time series variables $y_t = (y_{1t}, y_{2t}, \dots, y_{nt})'$, a VAR model of order p (VAR(p)) can be written as:

$$y_t = A_0 + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t$$

where:

- p is the number of lags to be considered in the system;
- n is the number of variables to be considered in the system;
- $y_t$ is an (n × 1) vector containing each of the n variables in the VAR;
- $A_0$ is an (n × 1) vector of intercept terms;
- $A_i$ is an (n × n) matrix of coefficients;
- $\varepsilon_t$ is an (n × 1) vector of error terms.

A critical element in the specification of VAR models is the determination of the lag length of the VAR. Various lag length selection criteria have been defined by different authors, such as Akaike's (1969) final prediction error (FPE), the Akaike Information Criterion (AIC) suggested by Akaike (1974), the Schwarz Criterion (SC) (1978) and the Hannan-Quinn Information Criterion (HQ) (1979).

## Impulse response functions
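
Impulse responses are computed from an estimated VAR, so it may help to first see how the VAR(p) system defined in the previous section can be fitted by OLS, one equation at a time. The sketch below is a minimal illustration: `fit_var`, the simulated two-variable system, and the "true" matrices `A0_true` and `A1_true` are hypothetical and not taken from the original study.

```python
import numpy as np

def fit_var(data, p):
    """Equation-by-equation OLS for y_t = A0 + A1 y_{t-1} + ... + Ap y_{t-p} + e_t.

    data has shape (T, n). Returns (A0, [A1, ..., Ap], residual covariance)."""
    T, n = data.shape
    lags = [data[p - k:T - k] for k in range(1, p + 1)]      # each block is (T - p, n)
    X = np.hstack([np.ones((T - p, 1))] + lags)              # same regressors in every equation
    Y = data[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)                # shape (1 + n * p, n)
    resid = Y - X @ B
    A0 = B[0]
    A = [B[1 + k * n:1 + (k + 1) * n].T for k in range(p)]   # transpose rows back to A_k
    sigma = resid.T @ resid / (T - p - X.shape[1])
    return A0, A, sigma

rng = np.random.default_rng(5)
A0_true = np.array([1.0, -0.5])
A1_true = np.array([[0.5, 0.1],
                    [0.2, 0.3]])
T = 2000
data = np.zeros((T, 2))
for t in range(1, T):
    data[t] = A0_true + A1_true @ data[t - 1] + rng.normal(size=2)

A0_hat, A_hat, sigma_hat = fit_var(data, p=1)
```

Because every equation shares the same right-hand-side variables, a single least-squares call with a matrix of dependent variables gives the same estimates as running OLS separately on each equation.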

An impulse response function (IRF) traces the effect of a one-time shock to one of the innovations on current and future values of the endogenous variables. If the innovations $\varepsilon_t$ are contemporaneously uncorrelated, the interpretation of the impulse response is straightforward. The ith innovation $\varepsilon_{i,t}$ is simply a shock to the ith endogenous variable $y_{i,t}$. According to Runkle (1987), reporting impulse response functions without standard error bars is equivalent to reporting regression coefficients without t-statistics. In many empirical studies impulse response functions have been used to distinguish temporary from permanent shocks (Bayoumi and Eichengreen, 1994); in our case they will be used to determine the extent to which each endogenous variable reacts to an innovation in each variable. Traditionally, VAR studies do not report estimated parameters or standard test statistics. Coefficients of estimated VAR systems are considered of little use in themselves, and their large number (i.e. p × (k × k) autoregressive coefficients) does not invite individual reporting. Instead, the approach of Sims (1980) is often used to summarize the estimated VAR systems by IRF. The IRF traces out the effect of an exogenous shock or an innovation in an endogenous variable on all of the endogenous variables in the system over time, to provide an answer to the following question: "What is the effect of a shock of size $\delta$ in the system at time t on the state of the system at time t + $\tau$, in the absence of other shocks?" In particular, a VAR's impulse responses examine how the dependent variables respond to shocks from each independent variable. The accumulated effects of unit impulses are measured by appropriate summation of the coefficients of the impulse response functions (Lin 2006). However, Lutkepohl and Reimers (1992) stated that traditional impulse response analysis requires orthogonalization of shocks.
Moreover, the results vary with the ordering of the variables in the VAR. The higher the correlations between the residuals, the more important the variable ordering is. To overcome this problem, Pesaran and Shin (1998) developed generalized impulse response functions, which remove the influence of the ordering of the variables on the impulse response functions. To identify orthogonalised innovations in each of the variables and the dynamic responses to such innovations, the variance-covariance matrix of the VAR was factorized using the Cholesky decomposition method suggested by Doan (1992). This method imposes an ordering on the variables in the VAR and attributes all of the effects of any common components to the first variable in the VAR system. The impulse response functions are generated by a Vector Moving Average (VMA), a representation of a VAR in standard form in terms of current and past values of the innovations ($\varepsilon_t$). We derive the VMA for the case of a single lag:

$$y_t = \Pi_0 + \Pi_1 y_{t-1} + \varepsilon_t$$

$\Pi_1$ is a matrix of coefficients from the reduced form and $\Pi_0$ is a vector of constants. Lagging this equation one period and substituting for $y_{t-1}$:

$$\begin{aligned} y_t &= \Pi_0 + \Pi_1 (\Pi_0 + \Pi_1 y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t \\ &= (I + \Pi_1)\Pi_0 + \Pi_1^2 y_{t-2} + \Pi_1 \varepsilon_{t-1} + \varepsilon_t \end{aligned}$$

If we go on substituting n times, we eventually obtain the following expression:

$$y_t = (I + \Pi_1 + \dots + \Pi_1^n)\Pi_0 + \Pi_1^{n+1} y_{t-n-1} + \sum_{i=0}^{n} \Pi_1^i \varepsilon_{t-i}$$
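
In this moving-average form, the response of the system i periods after a shock is given by the i-th power of the coefficient matrix. The sketch below computes these responses for a one-lag VAR and, optionally, the orthogonalised responses obtained from a Cholesky factor of the innovation covariance matrix. The function `impulse_responses` and the numerical values of `pi1` and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np

def impulse_responses(pi1, horizons, sigma=None):
    """Responses of a one-lag VAR at horizons 0..horizons.

    Without sigma, entry [i] is the i-th matrix power of pi1. With sigma, each
    power is post-multiplied by the lower-triangular Cholesky factor of sigma,
    which orthogonalises the shocks in the given variable ordering."""
    n = pi1.shape[0]
    impact = np.eye(n) if sigma is None else np.linalg.cholesky(sigma)
    responses = np.empty((horizons + 1, n, n))
    power = np.eye(n)
    for i in range(horizons + 1):
        responses[i] = power @ impact    # column j: effect of shock j after i periods
        power = power @ pi1
    return responses

pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])             # hypothetical stable coefficient matrix
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])           # hypothetical innovation covariance

irf = impulse_responses(pi1, horizons=10)
oirf = impulse_responses(pi1, horizons=10, sigma=sigma)
```

Because the Cholesky factor is lower triangular, the first variable in the ordering does not respond on impact to the second orthogonalised shock, which is why changing the ordering of the variables can change the orthogonalised responses.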