Title: | Generalized Correlations, Causal Paths and Portfolio Selection |
---|---|
Description: | Function gmcmtx0() computes a more reliable (general) correlation matrix. Since causal paths from data are important for all sciences, the package provides many sophisticated functions. causeSummBlk() and causeSum2Blk() give easy-to-interpret causal paths. Let Z denote control variables and compare two flipped kernel regressions: X=f(Y, Z)+e1 and Y=g(X, Z)+e2. Our criterion Cr1 says that if |e1*Y|>|e2*X| then variation in X is more "exogenous or independent" than in Y, and the causal path is X to Y. Criterion Cr2 requires |e2|<|e1|. These inequalities between many absolute values are quantified by four orders of stochastic dominance. Our third criterion Cr3, for the causal path X to Y, requires new generalized partial correlations to satisfy |r*(x|y,z)|< |r*(y|x,z)|. The function parcorVec() reports generalized partials between the first variable and all others. The package provides several R functions, including get0outliers() for outlier detection, bigfp() for numerical integration by the trapezoidal rule, stochdom2() for stochastic dominance, pillar3D() for 3D charts, canonRho() for generalized canonical correlations, depMeas() for nonlinear dependence, and causeSummary(mtx) for a summary of causal paths among matrix columns. Portfolio selection: decileVote(), momentVote(), dif4mtx(), exactSdMtx() can rank several stocks. Functions whose names begin with 'boot' provide bootstrap statistical inference, including a new bootGcRsq() test for "Granger-causality" allowing nonlinear relations. A new tool for evaluation of out-of-sample portfolio performance is outOFsamp(). Panel data implementation is now included. See eight vignettes of the package for theory, examples, and usage tips. See Vinod (2019) \doi{10.1080/03610918.2015.1122048}. |
Authors: | Prof. H. D. Vinod, Fordham University, NY. |
Maintainer: | H. D. Vinod <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.6 |
Built: | 2025-03-04 04:48:45 UTC |
Source: | https://github.com/cran/generalCorr |
This internal function calls the kern function to implement kernel regression with the option residuals=TRUE and returns absolute residuals.
abs_res(x, y)
x |
vector of data on the dependent variable |
y |
vector of data on the regressor |
The first argument is assumed to be the dependent variable. If abs_res(x,y) is used, you are regressing x on y (not the usual y on x).
Absolute values of kernel regression residuals are returned.
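To make the computation concrete, the following base-R sketch returns absolute residuals from a local-constant (Nadaraya-Watson) kernel regression of x on y. It is a stand-in under stated assumptions, not the package's code: kern() relies on the np package with data-driven bandwidths, whereas this sketch uses a Gaussian kernel with the fixed rule-of-thumb bandwidth bw.nrd0(y). The names nw_fit and abs_res_sketch are hypothetical.

```r
# Hypothetical sketch (not the package's code): local-constant kernel
# regression of x on y with a Gaussian kernel and fixed bandwidth h.
nw_fit <- function(x, y, h = bw.nrd0(y)) {
  # w[i, j] = K((y_i - y_j) / h): weight of observation j when fitting at y_i
  w <- outer(y, y, function(a, b) dnorm((a - b) / h))
  as.vector(w %*% x) / rowSums(w)  # fitted E(x | y) at each observed y
}

# absolute residuals of the regression of x (dependent) on y (regressor)
abs_res_sketch <- function(x, y) abs(x - nw_fit(x, y))

set.seed(330)
x <- sample(20:50)
y <- sample(20:50)
r <- abs_res_sketch(x, y)
summary(r)
```

Swapping the arguments, abs_res_sketch(y, x), gives the flipped regression of y on x, mirroring the argument order convention described above.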
This function is intended for internal use.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
abs_res(x,y)
## End(Not run)
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y with the option ‘gradients = TRUE’, and finally 3) compute the absolute values of the gradients.
abs_stdapd(x, y)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
The first argument is assumed to be the dependent variable. If abs_stdapd(x,y) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression gradients are returned after standardizing the data on both sides so that the magnitudes of amorphous partial derivatives (apd's) are comparable between regression of x on y on the one hand and regression of y on x on the other.
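A minimal sketch of the three steps for a single regressor follows. It assumes a local-constant Gaussian-kernel fit with a fixed bandwidth h=0.5 on the standardized scale and approximates gradients by central finite differences; the package instead obtains gradients from np with gradients = TRUE. The helper names nw_at and abs_stdapd_sketch, and the defaults for h and eps, are hypothetical.

```r
# Hypothetical sketch, single regressor only; not the package's estimator.
nw_at <- function(xs, ys, t, h) {
  w <- dnorm((t - ys) / h)
  sum(w * xs) / sum(w)  # fitted value of xs at the point t
}

abs_stdapd_sketch <- function(x, y, h = 0.5, eps = 1e-4) {
  ok <- complete.cases(x, y)      # drop rows with missing values
  xs <- as.vector(scale(x[ok]))   # step 1: mean 0, variance 1
  ys <- as.vector(scale(y[ok]))
  # steps 2 and 3: central finite difference of the fitted curve at each ys
  g <- vapply(ys, function(t)
    (nw_at(xs, ys, t + eps, h) - nw_at(xs, ys, t - eps, h)) / (2 * eps),
    numeric(1))
  abs(g)
}

set.seed(330)
x <- sample(20:50)
y <- sample(20:50)
# compare average gradient magnitudes of x on y and the flipped y on x
c(x_on_y = mean(abs_stdapd_sketch(x, y)), y_on_x = mean(abs_stdapd_sketch(y, x)))
```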
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
abs_stdapd(x,y)
## End(Not run)
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y and a matrix of control variables with the option ‘gradients = TRUE’, and finally 3) compute the absolute values of the gradients.
abs_stdapdC(x, y, ctrl)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
ctrl |
Data matrix on the control variable(s) beyond causal path issues |
The first argument is assumed to be the dependent variable. If abs_stdapdC(x,y,ctrl) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression gradients are returned after standardizing the data on both sides so that the magnitudes of amorphous partial derivatives (apd's) are comparable between regression of x on y on the one hand and regression of y on x on the other.
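With control variables, the controls enter the kernel weights but are held fixed when differentiating with respect to y. The sketch below illustrates this with one control column and a Gaussian product kernel; the bandwidth h, the finite-difference step eps, and the name abs_stdapdC_sketch are hypothetical, and the package's np-based estimator may weight and differentiate differently.

```r
# Hypothetical sketch: x on (y, z) with a Gaussian product kernel; the
# gradient is taken with respect to y only, holding the control z fixed.
abs_stdapdC_sketch <- function(x, y, ctrl, h = 0.5, eps = 1e-4) {
  ok <- complete.cases(x, y, ctrl)
  xs <- as.vector(scale(x[ok]))
  ys <- as.vector(scale(y[ok]))
  zs <- as.vector(scale(ctrl[ok]))
  fit_at <- function(t, s) {  # fitted xs at the point (y = t, z = s)
    w <- dnorm((t - ys) / h) * dnorm((s - zs) / h)
    sum(w * xs) / sum(w)
  }
  g <- mapply(function(t, s)
    (fit_at(t + eps, s) - fit_at(t - eps, s)) / (2 * eps), ys, zs)
  abs(g)
}

set.seed(330)
x <- sample(20:50); y <- sample(20:50); z <- sample(20:50)
g <- abs_stdapdC_sketch(x, y, ctrl = z)
summary(g)
```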
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See abs_stdapd.
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
z=sample(20:50)
abs_stdapdC(x,y,ctrl=z)
## End(Not run)
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
abs_stdres(x, y)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
The first argument is assumed to be the dependent variable. If abs_stdres(x,y) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
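The following base-R sketch illustrates why standardizing matters: the absolute residuals become unit-free, so rescaling x (for example, from dollars to thousands of dollars) leaves them unchanged, and the residuals of x on y and of the flipped y on x can be compared directly. It assumes a single regressor, a Gaussian local-constant fit and a fixed bandwidth h=0.5; abs_stdres_sketch is a hypothetical name, not a package function.

```r
# Hypothetical sketch: standardize, fit a local-constant Gaussian-kernel
# regression with fixed bandwidth h, and return absolute residuals.
abs_stdres_sketch <- function(x, y, h = 0.5) {
  ok <- complete.cases(x, y)
  xs <- as.vector(scale(x[ok]))
  ys <- as.vector(scale(y[ok]))
  w <- outer(ys, ys, function(a, b) dnorm((a - b) / h))
  abs(xs - as.vector(w %*% xs) / rowSums(w))
}

set.seed(330)
x <- sample(20:50); y <- sample(20:50)
e1 <- abs_stdres_sketch(x, y)  # x on y
e2 <- abs_stdres_sketch(y, x)  # flipped: y on x
c(mean_e1 = mean(e1), mean_e2 = mean(e2))
```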
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
abs_stdres(x,y)
## End(Not run)
1) standardize the data to force mean zero and variance unity, 2) kernel regress x on y and a matrix of control variables, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
abs_stdresC(x, y, ctrl)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
ctrl |
Data matrix on the control variable(s) beyond causal path issues |
The first argument is assumed to be the dependent variable. If abs_stdresC(x,y,ctrl) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See abs_stdres.
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
z=sample(21:51)
abs_stdresC(x,y,ctrl=z)
## End(Not run)
1) standardize the data to force mean zero and variance unity, 2) kernel regress x on y and a matrix of control variables, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
abs_stdrhserC(x, y, ctrl, ycolumn = 1)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
ctrl |
Data matrix on the control variable(s) beyond causal path issues |
ycolumn |
If y has more than one column, the number of the column of y that multiplies the residuals; default=1, i.e., the first column of y is used |
The first argument is assumed to be the dependent variable. If abs_stdrhserC(x,y,ctrl) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See abs_stdres.
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
z=sample(21:51)
abs_stdrhserC(x,y,ctrl=z)
## End(Not run)
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y with the option ‘gradients = TRUE’, and finally 3) compute the absolute values of the products of the RHS regressor and the residuals, the sample analogue of the Hausman-Wu null hypothesis E(RHS.regressor*error)=0 for testing exogeneity, where the error is approximated by kernel regression residuals.
abs_stdrhserr(x, y)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
The first argument is assumed to be the dependent variable. If abs_stdrhserr(x,y) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression RHS*residuals are returned after standardizing the data on both sides so that the magnitudes of Hausman-Wu null values are comparable between regression of x on y on the one hand and flipped regression of y on x on the other.
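The Hausman-Wu quantity can be sketched in a few lines of base R: standardize both sides, compute kernel regression residuals e, and take |y*e| elementwise; under the exogeneity null E(y*e)=0 these products should be small on average. The sketch assumes a single regressor (the C versions above pick one column of a matrix y via ycolumn), a Gaussian local-constant fit and a fixed bandwidth h=0.5; abs_stdrhserr_sketch is a hypothetical name.

```r
# Hypothetical sketch of |RHS regressor * residual| after standardization.
abs_stdrhserr_sketch <- function(x, y, h = 0.5) {
  ok <- complete.cases(x, y)
  xs <- as.vector(scale(x[ok]))
  ys <- as.vector(scale(y[ok]))
  w <- outer(ys, ys, function(a, b) dnorm((a - b) / h))
  e <- xs - as.vector(w %*% xs) / rowSums(w)  # kernel regression residuals
  abs(ys * e)                                 # |RHS regressor * residual|
}

set.seed(330)
x <- sample(20:50); y <- sample(20:50)
hw_xy <- abs_stdrhserr_sketch(x, y)  # x on y
hw_yx <- abs_stdrhserr_sketch(y, x)  # flipped: y on x
c(x_on_y = mean(hw_xy), y_on_x = mean(hw_yx))
```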
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
abs_stdrhserr(x,y)
## End(Not run)
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
absBstdres(x, y, blksiz = 10)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
blksiz |
block size, default=10. If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n; that is, no blocking is done |
The first argument is assumed to be the dependent variable. If absBstdres(x,y) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
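The documentation does not spell out how rows are grouped into blocks. One plausible reading, assumed here, is consecutive blocks of blksiz rows; the hypothetical helper make_blocks below splits row indices that way and applies the blksiz > n rule stated above. Whether the package merges a short final block (here a block of one row) into its neighbor is not stated, so treat this only as an illustration.

```r
# Hypothetical helper, not the package's code: split row indices 1..n into
# consecutive blocks of size blksiz (the last block may be shorter).
make_blocks <- function(n, blksiz = 10) {
  if (blksiz > n) blksiz <- n  # no blocking: a single block of all n rows
  split(seq_len(n), ceiling(seq_len(n) / blksiz))
}

blocks <- make_blocks(31, blksiz = 10)
lengths(blocks)  # 10 10 10 1
```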
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
absBstdres(x,y)
## End(Not run)
1) standardize the data to force mean zero and variance unity, 2) kernel regress x on y and a matrix of control variables, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
absBstdresC(x, y, ctrl, blksiz = 10)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
ctrl |
Data matrix on the control variable(s) beyond causal path issues |
blksiz |
block size, default=10. If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n; that is, no blocking is done |
The first argument is assumed to be the dependent variable. If absBstdresC(x,y,ctrl) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See abs_stdres.
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
z=sample(21:51)
absBstdresC(x,y,ctrl=z)
## End(Not run)
1) standardize the data to force mean zero and variance unity, 2) kernel regress x on y and a matrix of control variables, with the option ‘residuals = TRUE’ and finally 3) compute the absolute values of residuals.
absBstdrhserC(x, y, ctrl, ycolumn = 1, blksiz = 10)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
ctrl |
Data matrix on the control variable(s) beyond causal path issues |
ycolumn |
If y has more than one column, the number of the column of y that multiplies the residuals; default=1, i.e., the first column of y is used |
blksiz |
block size, default=10. If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n; that is, no blocking is done |
The first argument is assumed to be the dependent variable. If absBstdrhserC(x,y,ctrl) is used, you are regressing x on y (not the usual y on x). The regressors can be a matrix with two or more columns. Missing values are suitably ignored by the standardization.
Absolute values of kernel regression residuals are returned after standardizing the data on both sides so that the magnitudes of residuals are comparable between regression of x on y on the one hand and regression of y on x on the other.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See abs_stdres.
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
z=sample(21:51)
absBstdrhserC(x,y,ctrl=z)
## End(Not run)
This function studies all possible (perhaps too many) causal directions in a matrix. It is deprecated because it uses the older criterion 1 by calling abs_stdapd. I recommend using causeSummary or its block version causeSummBlk.
This function uses abs_stdres, comp_portfo2, etc. and returns a matrix with 7 columns having detailed output. Criterion 1 has been revised as described in Vinod (2019), and the revised version is known to work better.
allPairs(mtx, dig = 6, verbo = FALSE, typ = 1, rnam = FALSE)
mtx |
Input matrix with variable names |
dig |
Digits of accuracy in reporting (=6, default) |
verbo |
Logical variable, set to 'TRUE' if printing is desired |
typ |
Causal direction criterion number (typ=1 is default). Criterion 1 (Cr1) compares kernel regression absolute values of gradients. Criterion 2 (Cr2) compares kernel regression absolute values of residuals. Criterion 3 (Cr3) compares kernel regression based r*(x|y) with r*(y|x). |
rnam |
Logical variable, default=FALSE |
A 7-column matrix called 'outcause' with the names of variables X and Y in the first two columns and the name of the 'causal' variable in the third column. The remaining four columns report numerical computations of SD1 to SD4, r*(x|y), r*(y|x), Pearson r and p-values for its traditional significance testing.
The cause reported in the third column is identified from the sign of SD1 only, ignoring SD2, SD3 and SD4, under both Cr1 and Cr2. It is a good idea to loop a call to this function over typ=1:3. One can print the resulting 'outcause' matrix with xtable(outcause) for LaTeX output.
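Because an all-pairs comparison visits every pair of columns, the work grows as choose(p, 2) = p(p-1)/2 for p columns, which is why the description warns of perhaps too many directions. The hypothetical snippet below merely lists the (X, Y) pairs for the mtcars columns used in the example; it does not compute any causal criterion and is not how allPairs is implemented internally.

```r
# Hypothetical illustration: the (X, Y) pairs visited for mtcars[, 1:3].
data(mtcars)
vars <- colnames(mtcars)[1:3]
xy_pairs <- t(combn(vars, 2))  # one row per unordered pair
colnames(xy_pairs) <- c("X", "Y")
xy_pairs
```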
A similar deprecated function in this package, called some0Pairs, incorporates all of SD1 to SD4 and all three criteria Cr1 to Cr3 to report a ‘sum’ of indexes, a signed number whose sign can more comprehensively help determine the causal direction(s). Since the Cr1 used here was revised in later work, this function is deprecated.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
See Also somePairs, some0Pairs, causeSummary.
data(mtcars)
options(np.messages=FALSE)
for(j in 1:3){
  a1=allPairs(mtcars[,1:3], typ=j)
  print(a1)
}
Intended for internal use.
data(badCol)
The format is: int 4
See page 220 of Vinod (2008) “Hands-on Intermediate Econometrics Using R,” for the trapezoidal integration formula needed for stochastic dominance. The book explains pre-multiplication by two large sparse matrices denoted by . Here we accomplish the same computation without actually creating the large sparse matrices. For example, the is replaced by cumsum in this code (unlike the R code in my textbook).
bigfp(d, p)
d |
A vector of consecutive interval lengths, upon combining both data vectors |
p |
Vector of probabilities of the type 1/2T, 2/2T, 3/2T, etc. to 1. |
Returns a result after pre-multiplication by matrices, without actually creating the large sparse matrices. This is an internal function.
This is an internal function, called by the function stochdom2, for comparison of two portfolios in terms of stochastic dominance (SD) of orders 1 to 4.
Typical usage is:
sd1b=bigfp(d=dj, p=rhs)
sd2b=bigfp(d=dj, p=sd1b)
sd3b=bigfp(d=dj, p=sd2b)
sd4b=bigfp(d=dj, p=sd3b)
This produces numerical evaluation vectors for the four orders, SD1 to SD4.
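The cumsum idea can be sketched as follows, assuming the standard trapezoidal rule with p_0 = 0 at the left end; the package's exact treatment of the endpoints may differ. trap_cumsum needs a single cumsum call, while trap_matrix builds the dense lower-triangular summation matrix and an averaging matrix that the cumsum approach avoids. Both names are hypothetical.

```r
# Hypothetical sketch, not the package's exact code: running trapezoidal
# integral S_k = sum_{j <= k} d_j * (p_j + p_{j-1}) / 2, with p_0 = 0.
trap_cumsum <- function(d, p) {
  stopifnot(length(d) == length(p))
  p_lag <- c(0, head(p, -1))  # p_{j-1}
  cumsum(d * (p + p_lag) / 2)
}

# Same result via explicit dense matrices, the approach cumsum avoids.
trap_matrix <- function(d, p) {
  n <- length(p)
  L <- lower.tri(diag(n), diag = TRUE) * 1     # cumulative-sum matrix
  A <- diag(0.5, n)                            # averages p_j and p_{j-1}
  if (n > 1) A[cbind(2:n, 1:(n - 1))] <- 0.5
  as.vector(L %*% (d * (A %*% p)))
}

d <- c(1, 1, 1); p <- c(1, 2, 3)
trap_cumsum(d, p)  # 0.5 2.0 4.5
```

Chaining, as in the typical usage above, would feed the output back in as p. The dense version needs memory of order n^2, whereas cumsum needs order n.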
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. https://www.worldscientific.com/worldscibooks/10.1142/12831
This function calls the meboot package to create J=999 replications of portfolio return matrices and compute 95% confidence intervals on x1, x2 and their difference (x2-x1). If the interval on (x2-x1) contains zero, the choice between the two can reverse due to sampling variation.
bootDom12(x1, x2, confLevel = 95, reps = 999)
x1 |
a vector of n portfolio returns |
x2 |
a vector of n portfolio returns |
confLevel |
confidence level; confLevel=95 is the default |
reps |
number of bootstrap resamples, default is reps=999 |
A matrix with six columns. The first two, Low1 and Upp1, are the confidence interval limits for x1. The next two columns have analogous limits for x2. The fifth column, entitled Lowx2mx1, is the lower confidence limit for (x2-x1), where m=minus. The last column, entitled Uppx2mx1, is the upper confidence limit for (x2-x1).
For strong stochastic dominance of x2 over x1, that is, dominance beyond sampling variability, zero should not lie inside the confidence interval given by the last two columns.
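A minimal sketch of the interval check follows, with two loudly labeled substitutions: an ordinary iid bootstrap of time indices replaces meboot (which, unlike this sketch, preserves time-series dependence), and the statistic is assumed to be the mean of (x2-x1). The returns are simulated, not real data; reps and confLevel merely mirror the argument names above.

```r
# Hypothetical sketch: percentile interval for mean(x2 - x1) and a check
# whether zero lies outside it. iid resampling stands in for meboot.
set.seed(1)
x1 <- rnorm(60, mean = 0.00, sd = 0.02)  # simulated returns, portfolio 1
x2 <- rnorm(60, mean = 0.01, sd = 0.02)  # simulated returns, portfolio 2
reps <- 999
confLevel <- 95
alpha <- (100 - confLevel) / 200         # 0.025 in each tail
# resample time indices jointly so x1 and x2 stay paired
difs <- replicate(reps, {
  i <- sample.int(length(x1), replace = TRUE)
  mean(x2[i] - x1[i])
})
ci <- quantile(difs, probs = c(alpha, 1 - alpha), names = FALSE)
zero_outside <- ci[1] > 0 || ci[2] < 0   # TRUE: beyond sampling variability
ci
zero_outside
```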
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See exactSdMtx.
The maximum entropy bootstrap (meboot) package is used for statistical inference. The bootstrap output can be analyzed to estimate an approximate confidence interval on the sample-based direction of the causal path. The LC in the function name stands for local constant. The np package kernel regression options regtype="lc" for local constant and bwmethod="cv.ls" for least-squares-based bandwidth selection are fixed.
bootGcLC(x1, x2, px2 = 4, px1 = 4, pwanted = 4, ctrl = 0, n999 = 9)
x1 |
The data vector x1 |
x2 |
The data vector x2 |
px2 |
number of lags of x2 in the data, default px2=4 |
px1 |
number of lags of x1 in the data, default px1=4 |
pwanted |
number of lags of both x2 and x1 wanted for Granger causal analysis, default =4 |
ctrl |
data matrix having control variable(s) if any |
n999 |
Number of bootstrap replications (default=9) |
out is an n999 by 3 matrix holding the 3 outputs of GcauseX12 for each resample.
This computation is computer intensive and generally very slow. It may be better to use this function at a later stage in the investigation, after a preliminary causal determination is already made. The 3 outputs of GcauseX12 are two R-squares and the difference between them, obtained by subtracting the second from the first. Col. 1 has (RsqX1onX2), Col. 2 has (RsqX2onX1), and Col. 3 has dif=(RsqX1onX2 - RsqX2onX1). Note that R-squares are never negative. If dif>0, RsqX1onX2>RsqX2onX1, implying that x2 on the RHS performs better; that is, x2 –> x1 is the path, or x2 Granger-causes x1. If dif<0, x1 –> x2 holds. If dif is too close to zero, we may have bidirectional causality x1 <–> x2. The proportion of resamples (out of n999) having dif<0 suggests the level of confidence in the conclusion x1 –> x2. The proportion of resamples (out of n999) having dif>0 suggests the level of confidence in the conclusion x2 –> x1.
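Once the n999 by 3 output is available, the confidence proportions described above take one line each. The matrix below is simulated with made-up R-squares purely to show the bookkeeping; no kernel regression or bootstrap is run, and the object names are hypothetical.

```r
# Hypothetical illustration with made-up numbers standing in for 'out':
# columns RsqX1onX2, RsqX2onX1 and dif = column 1 minus column 2.
set.seed(9)
n999 <- 99
rsq1 <- runif(n999, 0.55, 0.75)
rsq2 <- runif(n999, 0.45, 0.70)
out <- cbind(RsqX1onX2 = rsq1, RsqX2onX1 = rsq2, dif = rsq1 - rsq2)

p_x1_to_x2 <- mean(out[, "dif"] < 0)  # share of resamples supporting x1 -> x2
p_x2_to_x1 <- mean(out[, "dif"] > 0)  # share supporting x2 -> x1
c(p_x1_to_x2 = p_x1_to_x2, p_x2_to_x1 = p_x2_to_x1)
```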
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also GcRsqX12c.
## Not run:
library(Ecdat)
options(np.messages=FALSE)
attach(data.frame(MoneyUS))
bootGcLC(y,m,n999=9)
## End(Not run)
## Not run:
library(lmtest)
data(ChickEgg)
attach(data.frame(ChickEgg))
b2=bootGcLC(x1=chicken,x2=egg,pwanted=3,px1=3,px2=3,n999=99)
## End(Not run)
The maximum entropy bootstrap (meboot) package is used for statistical inference. The bootstrap output can be analyzed to estimate an approximate confidence interval on the sample-based direction of the causal path. The np package kernel regression options regtype="ll" for local linear and bwmethod="cv.aic" for AIC-based bandwidth selection are fixed.
bootGcRsq(x1, x2, px2 = 4, px1 = 4, pwanted = 4, ctrl = 0, n999 = 9)
x1 |
The data vector x1 |
x2 |
The data vector x2 |
px2 |
number of lags of x2 in the data, default px2=4 |
px1 |
number of lags of x1 in the data, default px1=4 |
pwanted |
number of lags of both x2 and x1 wanted for Granger causal analysis, default =4 |
ctrl |
data matrix having control variable(s) if any |
n999 |
Number of bootstrap replications (default=9) |
out is an n999 by 3 matrix holding the 3 outputs of GcauseX12 for each resample.
This computation is computer intensive and generally very slow. It may be better to use this function at a later stage in the investigation, after a preliminary causal determination is already made. The 3 outputs of GcauseX12 are two R-squares and the difference between them, obtained by subtracting the second from the first. Col. 1 has (RsqX1onX2), Col. 2 has (RsqX2onX1), and Col. 3 has dif=(RsqX1onX2 - RsqX2onX1). Note that R-squares are never negative. If dif>0, RsqX1onX2>RsqX2onX1, implying that x2 on the RHS performs better; that is, x2 –> x1 is the causal path. If dif<0, x1 –> x2 holds. If dif is too close to zero, we may have bidirectional causality x1 <–> x2. The proportion of resamples (out of n999) having dif<0 suggests the level of confidence in the conclusion x1 –> x2. The proportion of resamples (out of n999) having dif>0 suggests the level of confidence in the conclusion x2 –> x1.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also GcRsqX12.
## Not run:
library(Ecdat)
options(np.messages=FALSE)
attach(data.frame(MoneyUS))
bootGcRsq(y,m,n999=9)
## End(Not run)
## Not run:
library(lmtest)
data(ChickEgg)
attach(data.frame(ChickEgg))
options(np.messages=FALSE)
b2=bootGcRsq(x1=chicken,x2=egg,pwanted=3,px1=3,px2=3,n999=99)
Fn=function(x) quantile(x, probs=c(0.025, 0.975)) #confInt
apply(b2,2,Fn) #reports 95 percent confidence interval
## End(Not run)
The ‘2’ in the name of the function suggests a second implementation of ‘bootPairs,’ where exact stochastic dominance, decileVote, and momentVote are used. The maximum entropy bootstrap (meboot) package is used for statistical inference, using the sum of three signs sg1 to sg3 from the three criteria Cr1 to Cr3 to assess the preponderance of evidence in favor of a sign (+1, 0, -1). The bootstrap output can be analyzed to assess the approximate preponderance of a particular sign, which determines the causal direction.
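The sign arithmetic can be illustrated on made-up signs. The sketch assumes unweighted sums of sg1 to sg3 (the function's output reports weighted sums) and draws random signs in place of the actual criteria; it only shows how the share of each sign across resamples measures preponderance.

```r
# Hypothetical illustration: rows are bootstrap resamples, columns are
# the signs from criteria Cr1 to Cr3 (made up, not computed from data).
set.seed(7)
n999 <- 99
sg <- matrix(sample(c(-1, 0, 1), 3 * n999, replace = TRUE,
                    prob = c(0.2, 0.1, 0.7)),
             ncol = 3, dimnames = list(NULL, c("sg1", "sg2", "sg3")))
sums <- rowSums(sg)                                     # each sum lies in -3..3
tab <- table(factor(sign(sums), levels = c(-1, 0, 1)))  # count of each sign
tab / n999                                              # share per sign
```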
bootPair2(mtx, ctrl = 0, n999 = 9)
mtx |
data matrix with two or more columns |
ctrl |
data matrix having control variable(s) if any |
n999 |
Number of bootstrap replications (default=9) |
The function creates a matrix called ‘out’. If the input to the function, mtx, has p columns, the output out of bootPair2(mtx) is a matrix of n999 rows and p-1 columns, each containing resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPair2(mtx) applied to each bootstrap sample separately.
This computation is computer-intensive and generally very slow. It may be better to use it later in the investigation, after a preliminary causal determination is already made.
A positive sign for the j-th weighted sum reported in the column ‘sum’ means that the first variable listed in the argument matrix mtx is the ‘kernel cause’ of the variable in the (j+1)-th column of mtx.
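A sketch of reading such an output, using a simulated stand-in for out with two columns (as for a 3-column mtx). The column names and the 0.5 majority rule are illustrative assumptions, not package conventions.

```r
# Hypothetical stand-in for 'out': column j compares mtx column 1 with
# column j+1; values are simulated, not produced by bootPair2.
set.seed(34)
n999 <- 29
out <- matrix(rnorm(2 * n999, mean = c(0.5, -0.3)), nrow = n999, byrow = TRUE,
              dimnames = list(NULL, c("col1_vs_col2", "col1_vs_col3")))
apply(out, 2, summary)          # summary stats of the resampled sums
share_pos <- colMeans(out > 0)  # share where column 1 is the kernel cause
ifelse(share_pos > 0.5, "column 1 causes", "reverse or unclear")
```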
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, Hrishikesh D., R Package GeneralCorr Functions for Portfolio Choice (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
Vinod, Hrishikesh D., Stochastic Dominance Without Tears (January 26, 2021). Available at SSRN: https://ssrn.com/abstract=3773309
See Also silentPair2.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPair2(cbind(x,y),n999=29)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
bb=bootPair2(airquality,n999=999);options(np.messages=FALSE)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bootPair2(cbind(crim,off),n999=29) #First col. crim causes officer deployment,
#hence positive signs are most sensible for such a call to bootPair2
#note that n999=29 is too small for real problems, chosen for quickness here.
## End(Not run)
The maximum entropy bootstrap (meboot) package is used for statistical inference, using the sum of three signs sg1 to sg3 from the three criteria Cr1 to Cr3 to assess the preponderance of evidence in favor of a sign (+1, 0, -1). The bootstrap output can be analyzed to assess the approximate preponderance of a particular sign, which determines the causal direction.
bootPairs(mtx, ctrl = 0, n999 = 9)
mtx |
data matrix with two or more columns |
ctrl |
data matrix having control variable(s) if any |
n999 |
Number of bootstrap replications (default=9) |
When mtx has p columns, the output out of bootPairs(mtx) is a matrix of n999 rows and p-1 columns, each containing resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx) applied to each bootstrap sample separately.
This computation is computer intensive and generally very slow.
It may be better to use
it at a later stage in the investigation when a preliminary
causal determination
is already made.
A positive sign for j-th weighted sum reported in the column ‘sum’ means
that the first variable listed in the argument matrix mtx
is the
‘kernel cause’ of the variable in the (j+1)-th column of mtx
.
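To make the sign convention concrete, here is a small base-R sketch. The helper causalDirection() is hypothetical (it is not part of generalCorr), and the matrix out is simulated with made-up numbers standing in for real bootPairs output. It simply reads the sign of each column mean as described above.

```r
# Hypothetical helper, not part of generalCorr: read causal directions from a
# bootPairs-style matrix `out` (n999 rows, p-1 columns) via column-mean signs.
causalDirection <- function(out, nam) {
  # nam: names of the p columns of the original data matrix mtx
  stopifnot(ncol(out) == length(nam) - 1)
  m <- colMeans(out)
  ifelse(m > 0, paste(nam[1], "->", nam[-1]),          # first column causes x(1+j)
         ifelse(m < 0, paste(nam[-1], "->", nam[1]),   # x(1+j) causes first column
                "ambiguous"))                          # mean exactly zero
}

# Simulated stand-in for bootPairs output (made-up numbers, 29 replications)
set.seed(34)
out <- cbind(rnorm(29, mean = 1.5), rnorm(29, mean = -1.5))
causalDirection(out, c("crim", "off", "pop"))  # "crim -> off" "pop -> crim"
```

Using the column mean mirrors the convention used by bootSign(), where the sign of the average of the n999 bootstraps is taken as correct.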
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bootPairs(cbind(crim,off),n999=29)#First col. crim causes officer deployment,
#hence positive signs are most sensible for such a call to bootPairs
#note that n999=29 is too small for real problems, chosen for quickness here.
## End(Not run)
The maximum entropy bootstrap (meboot) package is used for statistical inference on the sum of the three signs sg1 to sg3, each taking values (+1, 0, -1), from the three criteria Cr1 to Cr3, in order to assess the preponderance of evidence in favor of a sign. The bootstrap output can be analyzed to assess the approximate preponderance of a particular sign, which determines the causal direction.
bootPairs0(mtx, ctrl = 0, n999 = 9)
| Argument | Description |
|---|---|
| mtx | data matrix with two or more columns |
| ctrl | data matrix having control variable(s), if any |
| n999 | number of bootstrap replications (default=9) |
out: When mtx has p columns, the output out of bootPairs0(mtx) is a matrix of n999 rows and p-1 columns, each containing resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.

This computation is computer intensive and generally very slow. It may be better to use it at a later stage in the investigation, when a preliminary causal determination has already been made.

A positive sign for the j-th weighted sum reported in the column ‘sum’ means that the first variable listed in the argument matrix mtx is the ‘kernel cause’ of the variable in the (j+1)-th column of mtx.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs0; bootPairs uses the later version of Cr1.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs0(cbind(x,y),n999=29)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
bb=bootPairs0(airquality,n999=999);options(np.messages=FALSE)
apply(bb,2,summary) #gives summary stats for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bootPairs0(cbind(crim,off),n999=29)#First col. crim causes officer deployment,
#hence positive signs are most sensible for such a call to bootPairs0
#note that n999=29 is too small for real problems, chosen for quickness here.
## End(Not run)
Starting from the output of the bootPairs function, an (n999 by p-1) matrix when there are p columns of data, bootQuantile produces a (k by p-1) matrix of quantiles of the bootstrap output, where k is the number of quantiles requested.
bootQuantile(out, probs = c(0.025, 0.975), per100 = TRUE)
| Argument | Description |
|---|---|
| out | output from bootPairs with p-1 columns and n999 rows |
| probs | quantile evaluation probabilities. The default probs=c(0.025, 0.975) gives k=2 quantiles for each column, i.e., a 95 percent confidence interval |
| per100 | logical (default per100=TRUE) to change the range of 'sum' to [-100, 100], values which are easier to interpret |
CI: a matrix of quantiles with k rows, one for each evaluation probability in probs, and p-1 columns, one for each of the p-1 column pairs (fixing the first column in each pair).

This function summarizes the output of bootPairs(mtx), an (n999 by p-1) matrix whose columns each contain resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.
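A base-R sketch of the column-wise quantile computation follows. bootQuantileSketch() is hypothetical, and the per100 factor 100/3.175 is an inference (it is consistent with the bootSign() note that tau=0.476 corresponds to 15 percent), not something this manual states.

```r
# Sketch of the computation described for bootQuantile(), in base R.
# ASSUMPTION: the raw 'sum' index lies in [-3.175, 3.175], so per100
# rescaling multiplies by 100/3.175 (inferred, not stated in this manual).
bootQuantileSketch <- function(out, probs = c(0.025, 0.975), per100 = TRUE) {
  out <- as.matrix(out)
  if (per100) out <- out * 100 / 3.175
  ci <- apply(out, 2, quantile, probs = probs)  # quantiles for each column pair
  # keep a k by (p-1) matrix even when only one probability is requested
  matrix(ci, nrow = length(probs),
         dimnames = list(paste0(100 * probs, "%"), colnames(out)))
}

set.seed(1)
out <- matrix(rnorm(2 * 29), ncol = 2)  # made-up stand-in for bootPairs output
bootQuantileSketch(out)                  # 2 x 2 matrix, rows "2.5%" and "97.5%"
```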
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
bootQuantile(bb) #gives quantiles of n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
bootQuantile(bb) #bootQuantile has no tau argument; quantiles for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bb=bootPairs(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
bootQuantile(bb)# quantile matrix for n999 bootstrap sum computations
## End(Not run)
If there are p columns of data, bootSign produces a (p-1 by 1) vector of probabilities of correct signs. It assumes that the mean of the n999 values has the correct sign, and that the m 'sum' index values inside the range [-tau, tau] are neither positive nor negative but indeterminate or ambiguous (being too close to zero). That is, the denominator of P(+1) or P(-1) is (n999-m) if m signs are too close to zero. Thus, it measures the bootstrap success rate in identifying the correct sign, when the sign of the average of the n999 bootstraps is assumed to be correct.
bootSign(out, tau = 0.476)
| Argument | Description |
|---|---|
| out | output from bootPairs with p-1 columns and n999 rows |
| tau | threshold to determine what value is too close to zero; the default tau=0.476 is equivalent to a 15 percent threshold for the unanimity index ui |
sgn: When mtx has p columns, sgn reports p-1 pairwise signs (fixing the first column in each pair), obtained by averaging the output of bootPairs(mtx), an (n999 by p-1) matrix whose columns each contain resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.
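The counting rule described above can be sketched in base R as follows. bootSignSketch() is a hypothetical helper, not the package's implementation: it drops the m values inside [-tau, tau], takes the sign of each column mean as the reference sign, and returns the share of the remaining (n999-m) values that agree with it.

```r
# Hypothetical sketch of the success-rate rule described for bootSign().
bootSignSketch <- function(out, tau = 0.476) {
  out <- as.matrix(out)
  apply(out, 2, function(s) {
    keep <- abs(s) > tau             # drop the m values inside [-tau, tau]
    if (!any(keep)) return(NA_real_) # every value is ambiguous
    refSign <- sign(mean(s))         # sign of the average is taken as correct
    mean(sign(s[keep]) == refSign)   # successes / (n999 - m)
  })
}

out <- cbind(c(1, 2, -1, 0.1, 3), c(-1, -2, -3, 0.2, 1))  # made-up values
bootSignSketch(out)  # 0.75 0.75
```

In each made-up column one value falls inside [-0.476, 0.476], so the denominator is 4 rather than 5.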
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs, bootQuantile, bootSignPcent.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
bootSign(bb,tau=0.476) #gives success rate in n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
bootSign(bb,tau=0.476)#signs for n999 bootstrap sum computations
data('EuroCrime');options(np.messages=FALSE)
attach(EuroCrime)
bb=bootPairs(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
bootSign(bb,tau=0.476)#gives success rate in n999 bootstrap sum computations
## End(Not run)
If there are p columns of data, bootSignPcent produces a (p-1 by 1) vector of probabilities of correct signs. It assumes that the mean of the n999 values has the correct sign, and that the m 'ui' index values inside the range [-tau, tau] are neither positive nor negative but indeterminate or ambiguous (being too close to zero). That is, the denominator of P(+1) or P(-1) is (n999-m) if m signs are too close to zero. Thus, it measures the bootstrap success rate in identifying the correct sign, when the sign of the average of the n999 bootstraps is assumed to be correct.
bootSignPcent(out, tau = 5)
| Argument | Description |
|---|---|
| out | output from bootPairs with p-1 columns and n999 rows |
| tau | threshold to determine what value is too close to zero; the default tau=5 is a 5 percent threshold for the unanimity index ui |
sgn: When mtx has p columns, sgn reports p-1 pairwise signs (fixing the first column in each pair), obtained by averaging the output of bootPairs(mtx), an (n999 by p-1) matrix whose columns each contain resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.
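A sketch of the same rule on the percent scale follows. It assumes, as an inference not stated in this manual, that the raw 'sum' index is mapped to the [-100, 100] unanimity index ui by the factor 100/3.175; this is consistent with the bootSign() note that tau=0.476 corresponds to a 15 percent threshold. bootSignPcentSketch() is hypothetical.

```r
# Hypothetical sketch of the bootSignPcent() rule on the percent scale.
# ASSUMPTION: 'sum' values map to the unanimity index ui via 100/3.175.
bootSignPcentSketch <- function(out, tau = 5) {
  ui <- as.matrix(out) * 100 / 3.175
  apply(ui, 2, function(u) {
    keep <- abs(u) > tau             # ui values within [-tau, tau] are ambiguous
    if (!any(keep)) return(NA_real_)
    mean(sign(u[keep]) == sign(mean(u)))
  })
}

bootSignPcentSketch(cbind(c(0.1, 1, 2, -1)))  # 2/3: the first value is dropped
100 * 0.476 / 3.175                           # about 15, the bootSign() default
```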
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs, bootQuantile, bootSign.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
bootSignPcent(bb,tau=5) #gives success rate in n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
bootSignPcent(bb,tau=5)#success rate for signs from n999 bootstraps
data('EuroCrime');options(np.messages=FALSE)
attach(EuroCrime)
bb=bootPairs(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
bootSignPcent(bb,tau=5)#successful signs from n999 bootstraps
## End(Not run)
Starting from the output of the bootPairs function, an (n999 by p-1) matrix when there are p columns of data, bootSummary produces a (6 by p-1) matrix summarizing the bootstrap output (Min., 1st Qu., Median, Mean, 3rd Qu., Max.).
bootSummary(out, per100 = TRUE)
| Argument | Description |
|---|---|
| out | output from bootPairs with p-1 columns and n999 rows, used as input here |
| per100 | logical (default per100=TRUE) to change the range of 'sum' to [-100, 100], values which are easier to interpret |
summ: summary of the (n999 by p-1) matrix output of bootPairs(mtx), whose columns each contain resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.
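In base R, the six-number summary per column amounts to applying summary() to each column, as in the hypothetical sketch below. The per100 factor 100/3.175 is again an inferred assumption, not taken from this manual.

```r
# Hypothetical sketch of bootSummary(): column-wise summary() of the
# bootstrap output. ASSUMPTION: per100 rescales by 100/3.175.
bootSummarySketch <- function(out, per100 = TRUE) {
  out <- as.matrix(out)
  if (per100) out <- out * 100 / 3.175
  apply(out, 2, summary)  # rows: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
}

set.seed(1)
bootSummarySketch(matrix(rnorm(2 * 29), ncol = 2))  # a 6 x 2 matrix
```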
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
bootSummary(bb) #gives summary stats for n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
bootSummary(bb)#summary stats for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bb=bootPairs(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
bootSummary(bb)#summary stats for n999 bootstrap sum computations
## End(Not run)
The ‘2’ in the name of the function suggests a second implementation where exact stochastic dominance, decileVote and momentVote are used. Starting from the output of the bootPair2 function, an (n999 by p-1) matrix when there are p columns of data, bootSummary2 produces a (6 by p-1) matrix summarizing the bootstrap output (Min., 1st Qu., Median, Mean, 3rd Qu., Max.).
bootSummary2(out, per100 = TRUE)
| Argument | Description |
|---|---|
| out | output from bootPair2 with p-1 columns and n999 rows, used as input here |
| per100 | logical (default per100=TRUE) to change the range of 'sum' to [-100, 100], values which are easier to interpret |
summ: a (6 by p-1) matrix of the usual summary statistics computed from the (n999 by p-1) output of bootPair2(mtx), whose columns each contain resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPair2(mtx), applied to each bootstrap sample separately.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: silentPairs.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPair2(cbind(x,y),n999=29)
bootSummary2(bb) #gives summary stats for n999 bootstrap sum computations
bb=bootPair2(airquality,n999=999);options(np.messages=FALSE)
bootSummary2(bb)#summary stats for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bb=bootPair2(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
bootSummary2(bb)#summary stats for n999 bootstrap sum computations
## End(Not run)
What exactly is generalized? Canonical correlations start with Rij, a symmetric matrix of Pearson correlation coefficients based on linear relations. This function instead starts with the more general, non-symmetric matrix R*ij produced by gmcmtx0() as an input. This is a superior measure of dependence, allowing for nonlinear dependencies, and it generalizes Hotelling's derivation to the nonlinear case.

This function uses data on two sets of column vectors. The LHS set [x1, x2, .., xr] has r=nLHS columns with coefficients alpha, and the larger RHS set [xr+1, xr+2, .., xp] has nRHS=(p-r) columns with coefficients beta. The sets must be arranged so that the larger set is on the RHS; its coefficients beta are estimated first from an eigenvector of the problem [A* beta = rho^2 beta], where A* is built from a partitioning of our generalized matrix of (non-symmetric) correlation coefficients.
canonRho(mtx, nLHS = 2, sgn = 1, verbo = FALSE, ridg = c(0, 0))
| Argument | Description |
|---|---|
| mtx | input matrix of generalized correlation coefficients R* |
| nLHS | number of columns in the LHS set, default=2 |
| sgn | preferred sign of coefficients, default=1 for positive; use sgn=-1 if prior knowledge suggests that negative signs of coefficients are more realistic |
| verbo | logical; the default verbo=FALSE means do not print results |
| ridg | two regularization constants added before computing the matrix inverses of S11 and S22, respectively, with default=c(0,0). Some suggest ridg=c(0.01,0.01) for stable results |

| Value | Description |
|---|---|
| A | eigenvalue-computing matrix for generalized canonical correlations |
| rho | generalized canonical correlation coefficient |
| bet | RHS coefficient vector |
| alp | LHS coefficient vector |
This function calls kern.
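The eigenproblem described above can be sketched in base R using the textbook canonical-correlation partition A = S22^-1 S21 S11^-1 S12, with the ridg constants added to the diagonals of S11 and S22. canonRhoSketch() is hypothetical and is not the package's canonRho() code: it ignores sgn and verbo, and because a non-symmetric R* may yield complex eigenvalues, it simply keeps real parts, which is a simplification. With an ordinary symmetric Pearson matrix, the result should match stats::cancor().

```r
# Hypothetical sketch of (generalized) canonical correlation from a
# correlation matrix R, via the eigenproblem A beta = rho^2 beta.
canonRhoSketch <- function(R, nLHS = 2, ridg = c(0, 0)) {
  p <- ncol(R)
  i1 <- seq_len(nLHS)            # LHS block (coefficients alpha)
  i2 <- (nLHS + 1):p             # larger RHS block (coefficients beta)
  S11 <- R[i1, i1, drop = FALSE] + ridg[1] * diag(length(i1))
  S22 <- R[i2, i2, drop = FALSE] + ridg[2] * diag(length(i2))
  S12 <- R[i1, i2, drop = FALSE]
  S21 <- R[i2, i1, drop = FALSE] # differs from t(S12) when R is non-symmetric
  A <- solve(S22) %*% S21 %*% solve(S11) %*% S12
  ev <- eigen(A)
  k <- which.max(Re(ev$values))  # largest (real part of) rho^2
  rho <- sqrt(max(Re(ev$values[k]), 0))
  bet <- Re(ev$vectors[, k])
  alp <- solve(S11) %*% S12 %*% bet  # LHS coefficients, up to scale
  list(A = A, rho = rho, bet = bet, alp = drop(alp))
}

# Check against stats::cancor() on ordinary (symmetric) Pearson correlations
set.seed(1)
X <- matrix(rnorm(100), 50)   # LHS set, 2 columns
Y <- matrix(rnorm(150), 50)   # larger RHS set, 3 columns
res <- canonRhoSketch(cor(cbind(X, Y)), nLHS = 2)
c(sketch = res$rho, cancor = cancor(X, Y)$cor[1])  # should agree
```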
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in 'Handbook of Statistics: Computational Statistics with R', Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'Canonical ridge and econometrics of joint production,' Journal of Econometrics, 1976, vol. 4, pp. 147-166.
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
See gmcmtx0.
## Not run:
set.seed(99)
mtx2=matrix(sample(1:25),nrow=5)
g1=gmcmtx0(mtx2)
canonRho(g1,verbo=TRUE)
## End(Not run)
Allowing an input matrix of control variables, this function produces a 5-column matrix summarizing the results. The estimated signs (+1, 0, -1) for the stochastic dominance orders are weighted by wt=c(1.2, 1.1, 1.05, 1) to compute an overall result for all orders of stochastic dominance, as a weighted sum for the criteria Cr1 and Cr2, which is added to the Cr3 estimate (+1, 0, -1). The final range for the unanimity of sign index is [-100, 100].
causeAllPair( mtx, nam = colnames(mtx), blksiz = 10, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
| Argument | Description |
|---|---|
| mtx | the data matrix with many columns; we consider causal paths among all possible pairs of mtx columns |
| nam | vector of column names for mtx, default colnames(mtx) |
| blksiz | block size, default=10; if blksiz > n, where n is the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
| ctrl | data matrix for designated control variable(s) outside causal paths |
| dig | number of digits for reporting (default dig=6) |
| wt | allows the user to choose a vector of four alternative weights for SD1 to SD4 |
| sumwt | sum of weights, which can be changed here (default=4) |
The reason for the slightly declining weights on the signs from the SD1 to SD4 stochastic dominance orders is simply their slightly increasing sampling unreliability, due to higher-order trapezoidal approximations of the integrals of densities involved in the definitions of SD1 to SD4.
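The weighting described above can be sketched as follows. unanimityIndex() is a hypothetical helper; dividing each weighted sum by sumwt and rescaling by the largest attainable total, 2*sum(wt)/sumwt + 1 (3.175 with the defaults), are assumptions chosen so that unanimous signs map to +100 or -100. The package may normalize differently.

```r
# Hypothetical sketch of the weighting rule described for causeAllPair().
# ASSUMPTIONS: each SD-weighted sum is divided by sumwt, and the total is
# rescaled by its largest attainable value, 2 * sum(wt) / sumwt + 1.
unanimityIndex <- function(sgCr1, sgCr2, sgCr3,
                           wt = c(1.2, 1.1, 1.05, 1), sumwt = 4) {
  stopifnot(length(sgCr1) == 4, length(sgCr2) == 4, length(sgCr3) == 1)
  total <- sum(wt * sgCr1) / sumwt +   # Cr1 signs for SD1..SD4
           sum(wt * sgCr2) / sumwt +   # Cr2 signs for SD1..SD4
           sgCr3                       # Cr3 sign, unweighted
  maxTotal <- 2 * sum(wt) / sumwt + 1  # 3.175 with the default weights
  100 * total / maxTotal               # unanimity index in [-100, 100]
}

unanimityIndex(rep(1, 4), rep(1, 4), 1)         # 100: all signs agree
unanimityIndex(c(1, 1, -1, -1), rep(1, 4), -1)  # mixed signs, small index
```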
The summary results for all three criteria are reported in one matrix called out:

If there are p columns in the input matrix, x1, x2, .., xp, say, there are choose(p,2) or [p*(p-1)/2] possible pairs and as many causal paths. This function returns a matrix of p*(p-1)/2 rows and 5 columns entitled “cause”, “response”, “strength”, “corr.” and “p-value”, respectively, with self-explanatory titles. The first two columns contain the names of the two variables in each pair, ordered according to which is the cause. The ‘strength’ column has the absolute value of the summary index in the range [0,100], providing a summary of causal results based on the preponderance of evidence from criteria Cr1 to Cr3 and from four orders of stochastic dominance, etc. The fourth column, ‘corr.’, reports the Pearson correlation coefficient, while the fifth column has the p-value for testing the null of zero Pearson coefficient.

This function merely calls causeSumNoP repeatedly to include all pairs. The background function siPairsBlk allows for control variables. The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
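The choose(p,2) pairs mentioned above can be enumerated with utils::combn(). allPairs() is a hypothetical helper that only lists the pairs; in the package, each pair is analyzed in turn by causeSumNoP.

```r
# Hypothetical sketch: enumerate the choose(p, 2) column pairs that
# causeAllPair() examines (listing only, no causal analysis here).
allPairs <- function(nam) {
  prs <- combn(nam, 2)  # 2 x choose(p, 2) matrix of names
  data.frame(first = prs[1, ], second = prs[2, ], stringsAsFactors = FALSE)
}

prs <- allPairs(colnames(mtcars)[1:4])
prs         # mpg-cyl, mpg-disp, mpg-hp, cyl-disp, cyl-hp, disp-hp
nrow(prs)   # 6, i.e. choose(4, 2) = 4 * 3 / 2
```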
For the European Crime data, all three criteria correctly suggest that a high crime rate kernel causes the deployment of a large number of police officers. Since Cr1 to Cr3 near-unanimously suggest ‘crim’ as the cause of ‘off’, a strength index of 100 suggests unanimity.
attach(EuroCrime); causeSummary(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also: bootPairs, causeSummBlk, someCPairs.
## Not run:
mtx=data.frame(mtcars[,1:3]) #make sure columns of mtx have names
ctrl=data.frame(mtcars[,4:5])
causeAllPair(mtx=mtx,ctrl=ctrl)
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
causeAllPair(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
The ‘2’ in the name of the function suggests a second implementation where exact stochastic dominance and the ‘decileVote’ and ‘momentVote’ functions are used. The block version allows a new bandwidth (chosen by the np package) while fitting kernel regressions for each block of data. This may not be appropriate in all situations. The block size is flexible. The function develops a unanimity index regarding which regression flip, (y on xi) or (xi on y), is the best. The “cause” is always on the right-hand side of a regression equation, and the superior flip gives the correct sign. The summary of all signs determines the causal direction and the unanimity index among the three criteria. This is a block version of causeSummary2().
While allowing the researcher to keep some variables as controls, or outside the scope of causal path determination (e.g., age or latitude), this function produces detailed causal path information in a 5-column matrix identifying the names of variables, causal path directions, and path strengths re-scaled to be in the range [-100, 100] (the table reports absolute values of the strength), plus the Pearson correlation and its p-value.
The algorithm determines causal path directions from the sign of the strength index, and strength index values, by comparing three aspects of flipped kernel regressions: [x1 on (x2, x3, .. xp)] and its flipped version [x2 on (x1, x3, .. xp)]. We compare (i) a formal exogeneity test criterion, (ii) absolute residuals, and (iii) R-squares of the flipped regressions, implying three criteria Cr1 to Cr3. The criteria are quantified by new methods using four orders of stochastic dominance, SD1 to SD4. See the two SSRN papers by Vinod (2021).
causeSum2Blk(mtx, nam = colnames(mtx), blksiz = 10, ctrl = 0, dig = 6)
| Argument | Description |
|---|---|
| mtx | the data matrix with many columns; the first column, y, is a fixed target that is paired with all other columns, one by one, and is still called x for flipping |
| nam | vector of column names for mtx, default colnames(mtx) |
| blksiz | block size, default=10; if blksiz > n, where n is the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
| ctrl | data matrix for designated control variable(s) outside causal paths |
| dig | number of digits for reporting (default dig=6) |
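The blksiz rule above can be illustrated by splitting row indices into consecutive blocks. makeBlocks() is a hypothetical helper; how the package treats a final short block is not documented here, so this sketch simply keeps it as a smaller block.

```r
# Hypothetical sketch of row blocking by blksiz (not the package's code):
# rows are cut into consecutive blocks; if blksiz > n there is one block.
makeBlocks <- function(n, blksiz = 10) {
  blksiz <- min(blksiz, n)                          # blksiz > n means no blocking
  split(seq_len(n), ceiling(seq_len(n) / blksiz))   # list of row-index vectors
}

blocks <- makeBlocks(32, blksiz = 10)  # e.g. mtcars has 32 rows
lengths(blocks)                        # block lengths 10, 10, 10, 2
length(makeBlocks(5, blksiz = 10))     # 1: a single block
```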
If there are p columns in the input matrix, x1, x2, .., xp, say, and x1 is kept as a common member of all causal-direction pairs (x1, x(1+j)) for j=1, 2, .., p-1, each pair can be flipped. That is, either x1 is the cause or x(1+j) is the cause in a chosen pair. The control variables are not flipped. The printed output of this function reports the results for p-1 pairs, indicating which variable (by name) causes which other variable (also by name). It also prints the strength, or signed summary strength index, in the range [-100,100]. A positive sign of the strength index means x1 kernel causes x(1+j), whereas a negative strength index means x(1+j) kernel causes x1. The function also prints the Pearson correlation and its p-value. This function also returns a matrix of p-1 rows and 5 columns entitled “cause”, “response”, “strength”, “corr.” and “p-value”, respectively, with self-explanatory titles. The first two columns have the names of variables x1 or x(1+j), depending on which is the cause. The ‘strength’ column has the absolute value of the summary index in the range [0,100], providing a summary of causal results based on the preponderance of evidence from Cr1 to Cr3, from deciles, moments, and four orders of stochastic dominance. The order of input columns in "mtx" matters. The fourth column, ‘corr.’, reports the Pearson correlation coefficient, while the fifth column has the p-value for testing the null of zero Pearson coefficient.
This function calls siPairsBlk, allowing for control variables. The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
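The mapping from signed strength indexes to the “cause”, “response” and “strength” columns can be sketched as below. strengthTable() is a hypothetical helper, and the strength values in the example are made-up numbers for illustration only; the package computes the real strengths itself. Correlations and p-values come from base R's cor() and cor.test(). A zero strength is treated as the reverse direction here, which is a simplification.

```r
# Hypothetical sketch: turn signed strengths for pairs (x1, x(1+j)) into the
# 5-column layout described above. Positive: x1 causes x(1+j); else reverse.
strengthTable <- function(strength, nam, corr, pval) {
  pos <- strength > 0
  data.frame(cause    = ifelse(pos, nam[1], nam[-1]),
             response = ifelse(pos, nam[-1], nam[1]),
             strength = abs(strength),     # table reports absolute values
             corr.    = corr,
             `p-value` = pval,
             check.names = FALSE, stringsAsFactors = FALSE)
}

d <- mtcars[, c("mpg", "cyl", "disp")]
corr <- sapply(2:3, function(j) cor(d[[1]], d[[j]]))
pval <- sapply(2:3, function(j) cor.test(d[[1]], d[[j]])$p.value)
# strengths below are made-up numbers for illustration only
tab <- strengthTable(c(31.5, -100), names(d), corr, pval)
tab
```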
For the European Crime data, all three criteria correctly suggest that a high crime rate kernel causes the deployment of a large number of police officers. If Cr1 to Cr3 near-unanimously suggest ‘crim’ as the cause of ‘off’, the strength index would be near 100, suggesting unanimity.
attach(EuroCrime); causeSum2Blk(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, Hrishikesh D., R Package GeneralCorr Functions for Portfolio Choice (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
Vinod, Hrishikesh D., Stochastic Dominance Without Tears (January 26, 2021). Available at SSRN: https://ssrn.com/abstract=3773309
See bootPairs. causeSummary has an older version of this function. See also someCPairs.
library(generalCorr)
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSum2Blk(mtx, ctrl=ctrl, nam=colnames(mtx)) # name ctrl explicitly
## End(Not run)
The algorithm of this function uses an internal function fminmax=function(x)min(x)==max(x). The subsets mtx2 of the original data da for a specific time or space can become degenerate if some columns of mtx2 have no variability. R's apply function is applied to the columns of mtx2 as follows: "ap1=apply(mtx2,2,fminmax)". Then "sumap1=sum(ap1)" counts how many columns of the data matrix are degenerate. We have a degeneracy problem whenever sumap1 >= 1. For example, suppose the panel consists of data on 50 U.S. states over 20 years. Consumer price index (cpi) data may be common to all states, so that for a fixed year min(cpi) equals max(cpi) across states. Then the variance of cpi is zero, and we have degeneracy. When this happens, the regressor cpi should not be involved in determining causal paths. We identify such degeneracy using fminmax.
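The degeneracy check above can be tried directly on a toy cross-section. The matrix values and the column names `inv`, `cpi`, `wage` are hypothetical; dropping the degenerate columns is one simple way to keep them out of the causal path computation and is an assumption, not necessarily what the package does internally.

```r
# Degeneracy check as described above: a column is degenerate when min == max.
fminmax <- function(x) min(x) == max(x)

# Toy cross-section for one year (hypothetical values): cpi is common to all states.
mtx2 <- cbind(inv  = c(2.1, 3.4, 1.9, 4.0),
              cpi  = c(250, 250, 250, 250),
              wage = c(15, 17, 14, 18))
ap1 <- apply(mtx2, 2, fminmax)   # TRUE for columns with no variability
sumap1 <- sum(ap1)               # number of degenerate columns
if (sumap1 >= 1) {
  cat("Degenerate column(s):", names(ap1)[ap1], "\n")
  mtx2 <- mtx2[, !ap1, drop = FALSE]  # assumption: drop them before causal analysis
}
```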
causeSum2Panel( da, fn = causeSummary2NoP, rowfnout, colfnout, fnoutNames, namXs, namXt, namXy, namXc = 0, namXjmtx, chosenTimes = NULL, chosenSpaces = NULL, ylag = 0, verbo = FALSE )
da |
panel data having a named column for space and a named column for time |
fn |
an R function causeSummary2NoP(mtx) |
rowfnout |
the number of rows output by fn |
colfnout |
the number of columns output by fn |
fnoutNames |
the column names of the output of fn, for example, fnoutNames=c("cause","effect","strength","r","p-val") |
namXs |
title of the column in da having the space variable |
namXt |
title of the column in da having the time variable |
namXy |
title of the column in da having the dependent y variable |
namXc |
title(s) of the column(s) in da having control variable(s), default=0 means none specified |
namXjmtx |
title(s) of the column(s) in da having regressor(s) |
chosenTimes |
subset of values of the time variable chosen for quick results. There are NchosenTimes values in the subset. The default NULL means all time identifiers in the data are included. |
chosenSpaces |
subset of values of the space variable chosen for quick results. There are NchosenSpaces values in the subset. The default NULL means all space identifiers are included. The degrees of freedom for the Studentized statistic of the Granger causality tests are df=(NchosenSpaces-1). |
ylag |
time lag in the Granger causality study of the time dimension. The default ylag=0 is not really zero: it means ylag=min(4, round(NchosenTimes/5,0)), where NchosenTimes is the length of the chosenTimes vector |
verbo |
print detailed results along the way, default=FALSE |
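The default lag rule stated for ylag=0 is easy to evaluate by hand; this sketch wraps it in a hypothetical helper `default_ylag()` (not a package function) to show what the default means for a few values of NchosenTimes.

```r
# Default lag used when ylag = 0, per the rule stated above (helper name is hypothetical).
default_ylag <- function(NchosenTimes) min(4, round(NchosenTimes / 5, 0))

default_ylag(10)  # e.g. chosenTimes = 1940:1949 gives 2
default_ylag(30)  # capped at 4
```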
We assume that panel data have space (individual or region) and time (e.g., year) dimensions. We use upper case X as a common prefix for names in the panel data. Xs = name of the space variable, e.g., state or individual; s ranges from 1 to nspace. Xt = name of the time variable, e.g., year; t ranges from 1 to ntime. Xy = the value of the dependent variable at time t in space s. Since panel data causal analysis can take a long computing time, we allow the user to choose subsets of time and space values, called chosenTimes and chosenSpaces, respectively. Various input parameters starting with "nam" specify the names of variables in the panel study.
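The two views of a panel and the chosenTimes/chosenSpaces subsetting can be pictured with base R on a toy long-format data frame. The column names, values, and object names here are hypothetical, and this is only an illustration of the data layout, not the package's internal code.

```r
# Toy long-format panel (hypothetical values): 3 states x 4 years.
da <- data.frame(state = rep(c("NY", "NJ", "CT"), each = 4),
                 year  = rep(2001:2004, times = 3),
                 y     = c(5, 6, 7, 9, 3, 4, 4, 6, 8, 8, 9, 11))
chosenSpaces <- c("NY", "CT")
chosenTimes  <- 2002:2004
pan <- da[da$state %in% chosenSpaces & da$year %in% chosenTimes, ]

# Time-series view: one series per fixed space value.
by_space <- split(pan$y, pan$state)
# Cross-section view: one cross-section per fixed time value.
by_time <- split(pan$y, pan$year)
```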
The algorithm calls a function fn(mtx), where mtx is the data matrix and fn is causeSummary2NoP(mtx). The causal paths between (y, xj) pairs of variables in mtx are computed following three criteria involving exact stochastic dominance. Type "?causeSummary2" on the R console to get details (omitted here for brevity). Panel data consist of a time series of cross-sections and are also called longitudinal data. We provide estimates of causal path directions and strengths for both the time-series and cross-sectional views of panel data. Since our regressions are kernel regressions with no assumed functional form, fixed effects for time and space are suppressed when computing causal paths.
The causeSum2Panel(.) produces several output arrays and matrices. The first, "outt", is a three-dimensional array of panel causal path output focused on the time series for each space value, holding space fixed. It reports causal path directions and strengths for (y, xj) pairs. The second output array, "outs", gives similar 3D panel causal path output focused on space cross-sections, holding time fixed. The third output matrix, "outdif", gives causal paths using Granger causality for each pair (y, xj). Its entries are not causal strengths but differences between R-square values of two flipped kernel regressions. The Granger causality summary is an output matrix called grangerAns (its first row has the average differences in R-squares, and its second row has the test statistic with n-1 degrees of freedom), together with grangerStat, the related t-statistic for formal inference, based on column means and variances of "outdif". This function also summarizes "outt" and "outs" into two-dimensional matrices reporting averages of signed strengths, named "strentime" and "strenspace". Also, "pearsontime" reports the Pearson correlation coefficients for various time values and their average in the last column. The sign of this average indicates the overall direction of the relation between y and xj. For example, a negative average correlation means y and xj are negatively correlated (when xj goes up, y goes down). Similarly, "pearsonspace" summarizes the "outs" correlations.
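A sketch of how a Studentized summary of "outdif" could be formed, assuming grangerStat is an ordinary one-sample t statistic on each column mean with df = n-1, where rows are the chosen spaces. The matrix entries and the column names `value`, `capital` are made up for illustration; this is not the package's code.

```r
# Hypothetical outdif: rows = chosen spaces, columns = (y, xj) pairs; each entry is a
# difference between R-squares of two flipped kernel regressions (made-up numbers).
outdif <- cbind(value   = c(0.05, 0.02, 0.04, 0.07, 0.02),
                capital = c(-0.01, 0.03, -0.02, 0.00, -0.05))
n <- nrow(outdif)                                  # plays the role of NchosenSpaces
avg <- colMeans(outdif)                            # average R-square difference per pair
tstat <- avg / (apply(outdif, 2, sd) / sqrt(n))    # assumed one-sample t, df = n - 1
pval <- 2 * pt(-abs(tstat), df = n - 1)            # two-sided p-value
grangerAns <- rbind(average = avg, tstat = tstat)
# Negative averages point to y -> xj; positive ones to xj -> y.
direction <- ifelse(avg < 0, "y -> xj", "xj -> y")
print(grangerAns); print(pval); print(direction)
```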
The function prints to the screen some summaries of the three output matrices. It reports how often a variable is a cause in various pairs, as time series or as cross-sections. It also reports the average strengths of causal paths for the "outt" and "outs" matrices. We compute the difference between two R-square values to find which causal direction is more plausible. This involves kernel regressions of y on its own lags and lags of a regressor. Unlike the usual (linear) Granger causality test, we estimate better-fitting nonlinear kernel regressions. If the averages in the "outdif" matrix are negative, the Granger causal paths go from y to xj. This may be unexpected when the model assumes that y depends on x1 to xp, that is, that the causal paths go from xj to y. In studying the causal pairs, the function creates mixtures of the names y and xj. Character vectors containing the mixed names are used as column names or row names, depending on the context. For example, the column names of the output matrix grangerAns help identify the relevant regressor name. The first row of the grangerAns matrix has the column averages of the outdif matrix, to help get an overall estimate of the Granger-causal paths. The second row of grangerAns has the Studentized test statistic for formally testing the significance of Granger causal paths. The time-dimension strengths, with a suitable sign (a negative strength means the reversed causal path xj->y), are collected in the output named strentime. The corresponding Pearson correlations are output as pearsontime. The space-dimension strengths, with the same sign convention, are collected in the output named strenspace, and the corresponding Pearson correlations are named pearsonspace. A grand summary of average strengths and correlations is the output matrix named grandsum. It is intended to provide an overall picture of causal paths in panel data.
These paths should not be confused with Granger causal paths, which always involve time lags and presume that causes precede effects in time.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 32, co-editors: H. D. Vinod and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, H. D. 'R Package GeneralCorr Functions for Portfolio Choice' (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
Vinod, H. D. 'Stochastic Dominance Without Tears' (January 26, 2021). Available at SSRN: https://ssrn.com/abstract=3773309
See causeSummary2. See also causeSummary, which is subject to trapezoidal approximation.
## Not run:
library(generalCorr); library(plm)
data(Grunfeld)
da=Grunfeld # panel data: firm (space) and year (time)
options(np.messages=FALSE)
namXs="firm"
print("initial values identifying the space variable")
head(da[,namXs],3)
str(da[,namXs])
chosenSpaces=(3:10)
if(is.numeric(da[,namXs])){ chosenSpaces=as.numeric(chosenSpaces)}
if(!is.numeric(da[,namXs])){ chosenSpaces=as.character(chosenSpaces)}
namXt="year"
print("initial values identifying the time variable")
head(da[,namXt],3)
str(da[,namXt])
chosenTimes=1940:1949
if(is.numeric(da[,namXt])){ chosenTimes=as.numeric(chosenTimes)}
if(!is.numeric(da[,namXt])){ chosenTimes=as.character(chosenTimes)}
namXy="inv"
namXc=0
namXjmtx=c("value","capital")
p=length(namXjmtx)
fnoutNames=c("cause","effect","strength","r","p-val")
causeSum2Panel(da, fn=causeSummary2NoP, rowfnout=p, colfnout=5,
  fnoutNames=fnoutNames, namXs=namXs, namXt=namXt, namXy=namXy,
  namXc=namXc, namXjmtx=namXjmtx, chosenTimes=chosenTimes,
  chosenSpaces=chosenSpaces, verbo=FALSE)
## End(Not run)
While allowing the researcher to keep some variables as controls, or outside the scope of causal path determination (e.g., age or latitude), this function produces detailed causal path information in a 5-column matrix identifying the names of variables, causal path directions, and path strengths re-scaled to be in the range [-100, 100] (the table reports absolute values of the strength), plus the Pearson correlation and its p-value.
causeSummary( mtx, nam = colnames(mtx), ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
mtx |
The data matrix with many columns. The first column, y, is held fixed and paired with each of the other columns, one by one; it is still called x for the purpose of flipping. |
nam |
vector of column names for mtx. Default: all column names of mtx |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
wt |
Allows the user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights, which can be changed here (default sumwt=4). |
The algorithm determines causal path directions from the sign of the strength index, and path strengths from its magnitude, by comparing three aspects of flipped kernel regressions: [x1 on (x2, x3, .. xp)] and its flipped version [x2 on (x1, x3, .. xp)]. We compare (i) a formal exogeneity test criterion, (ii) absolute residuals, and (iii) R-squares of the flipped regressions, implying three criteria Cr1 to Cr3. The criteria are quantified by sophisticated methods using four orders of stochastic dominance, SD1 to SD4. We assume slightly declining weights on causal path signs because of their known reliability ranking: SD1 is more reliable than SD2, which is more reliable than SD3, which in turn is more reliable than SD4. The user can optionally change our weights.
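The flip itself can be sketched with base R, using stats::loess as a stand-in for the np kernel regressions the package uses. The data-generating process is hypothetical, both variables are standardized with scale() so that absolute residuals are comparable (an assumption), there are no control variables, and mean absolute residuals replace the package's stochastic dominance comparison, so this only illustrates the idea behind criterion (ii).

```r
# Sketch only: stats::loess stands in for the np kernel regressions used by the package.
set.seed(1)
n <- 100
x <- runif(n, 0, 3)
y <- exp(x) + rnorm(n, sd = 0.5)        # hypothetical data-generating process
xs <- as.numeric(scale(x))               # standardize so residuals are comparable
ys <- as.numeric(scale(y))
e1 <- residuals(loess(xs ~ ys))          # x on y
e2 <- residuals(loess(ys ~ xs))          # flipped: y on x
mean_abs <- c(x_on_y = mean(abs(e1)), y_on_x = mean(abs(e2)))
# Smaller absolute residuals for y on x are read here as evidence for x -> y.
cause <- if (mean_abs[["y_on_x"]] < mean_abs[["x_on_y"]]) "x" else "y"
print(mean_abs)
cat("Suggested cause:", cause, "\n")
```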
Suppose there are p columns in the input matrix, say x1, x2, .., xp,
and we keep x1 as a common member of all causal direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, each of which can be flipped. That is, either x1 is
the cause or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for the p-1 pairs, indicating which variable
(by name) causes which other variable (also by name).
It also prints a signed summary strength index in the range [-100, 100].
A positive strength index means x1 kernel causes x(1+j),
whereas a negative strength index means x(1+j) kernel causes x1. The function
also prints the Pearson correlation and its p-value. In short, the function returns
a matrix of p-1 rows and 5 columns entitled
"cause", "response", "strength", "corr." and "p-value", respectively,
with self-explanatory titles. The first two columns have the names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
reports the absolute value of the summary index, now in the range [0, 100],
providing a summary of causal results
based on the preponderance of evidence for Cr1 to Cr3
from four orders of stochastic dominance, etc. The order of input columns matters.
The fourth column, ‘corr.’, reports the Pearson correlation coefficient, while
the fifth column has the p-value for testing the null of zero Pearson coefficient.
This function calls silentPairs, allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
The European Crime data has all three criteria correctly suggesting that
a high crime rate kernel causes the deployment of a large number of police officers.
Since Cr1 to Cr3 nearly unanimously suggest ‘crim’ as the cause of ‘off’,
the strength index of 100 indicates unanimity. In portfolio
applications of stochastic dominance, one wants higher returns. Here, by contrast, we
compare two probability distributions of absolute residuals for two
flipped models and choose the flip with smaller absolute residuals,
that is, the better fit.
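For the first order only, "smaller absolute residuals" in the stochastic dominance sense can be checked by comparing empirical CDFs: the ECDF of the smaller residuals lies on or above the other ECDF everywhere. This base-R sketch with the hypothetical helper `sd1_smaller()` illustrates that idea; it does not reproduce the package's four-order computations.

```r
# First-order stochastic dominance check on absolute residuals (sketch).
# Returns TRUE when the ECDF of |a| lies on or above the ECDF of |b| everywhere,
# i.e. |a| is stochastically smaller, so flip "a" fits at least as well in the SD1 sense.
sd1_smaller <- function(a, b) {
  a <- abs(a); b <- abs(b)
  grid <- sort(unique(c(a, b)))  # both ECDFs are step functions jumping only at these points
  all(ecdf(a)(grid) >= ecdf(b)(grid))
}

sd1_smaller(c(1, 2, 3), c(2, 3, 4))
sd1_smaller(c(2, 3, 4), c(1, 2, 3))
```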
attach(EuroCrime); causeSummary(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 32, co-editors: H. D. Vinod and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See bootPairs. causeSummary0 has an older version of this function. See also someCPairs.
library(generalCorr)
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSummary(mtx, ctrl=ctrl, nam=colnames(mtx))
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA # inject missing values
causeSummary(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables, this function produces
a 5-column matrix
summarizing the results. The estimated signs (+1, 0, -1) from each
order of stochastic dominance are weighted by
wt=c(1.2, 1.1, 1.05, 1),
and a weighted sum over all four orders is computed
for each of the criteria Cr1 and Cr2. These are added to the Cr3 estimate,
which is itself one of (+1, 0, -1).
The final range for the unanimity of sign index is [-100, 100].
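A hypothetical sketch of such a weighted sign-unanimity index, with the helper name `unanimity_index()` invented for illustration. Rescaling by the largest attainable score so that the result lies in [-100, 100] is an assumption, not the package's formula; in particular, the package's sumwt argument (default 4) enters its own scaling in a way this sketch does not attempt to reproduce.

```r
# Hypothetical weighted sign-unanimity index; the normalization by the largest
# attainable score is an assumption, not the package's internal formula.
unanimity_index <- function(sd_cr1, sd_cr2, cr3, wt = c(1.2, 1.1, 1.05, 1)) {
  stopifnot(length(sd_cr1) == 4, length(sd_cr2) == 4,
            all(c(sd_cr1, sd_cr2, cr3) %in% c(-1, 0, 1)))
  score <- sum(wt * sd_cr1) + sum(wt * sd_cr2) + cr3  # weighted SD1..SD4 signs plus Cr3
  max_score <- 2 * sum(abs(wt)) + 1                    # all signs agreeing
  100 * score / max_score                              # rescale to [-100, 100]
}

unanimity_index(rep(1, 4), rep(1, 4), 1)            # all criteria agree: 100
unanimity_index(c(1, 1, -1, 0), rep(1, 4), -1)      # mixed evidence: between 0 and 100
```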
causeSummary0( mtx, nam = colnames(mtx), ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
mtx |
The data matrix with many columns. The first column, y, is held fixed and paired with each of the other columns, one by one; it is still called x for the purpose of flipping. |
nam |
vector of column names for mtx. Default: all column names of mtx |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
wt |
Allows the user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights, which can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is simply that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3, and the local kurtosis implicit in SD4. Higher moments are
increasingly unreliable in sampling because SD4 involves the fourth power
of the deviations from the mean, SD3 involves the third power, etc.
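A small Monte Carlo illustration (not part of the package) of why higher powers are less reliable: with standard normal samples, an assumption made only for this demonstration, the coefficient of variation of the k-th absolute central sample moment grows with k.

```r
# Monte Carlo illustration: sampling variability of the k-th absolute central
# sample moment, relative to its mean, grows with k (standard normal data assumed).
set.seed(42)
n <- 100; reps <- 2000
abs_moment <- function(x, k) mean(abs(x - mean(x))^k)
sims <- replicate(reps, {
  x <- rnorm(n)
  sapply(1:4, function(k) abs_moment(x, k))
})                                          # 4 x reps matrix
cv <- apply(sims, 1, sd) / rowMeans(sims)  # coefficient of variation by order k
names(cv) <- paste0("k=", 1:4)
round(cv, 3)
```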
The summary results for all
three criteria are reported in one matrix called out:
Suppose there are p columns in the input matrix, say x1, x2, .., xp,
and we keep x1 as a common member of all causal direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, each of which can be flipped. That is, either x1 is
the cause or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for the p-1 pairs, indicating which variable
(by name) causes which other variable (also by name).
It also prints the signed summary strength index in the range [-100, 100].
A positive strength index means x1 kernel causes x(1+j),
whereas a negative strength index means x(1+j) kernel causes x1. The function
also prints the Pearson correlation and its p-value. It also returns
a matrix of p-1 rows and 5 columns entitled
"cause", "response", "strength", "corr." and "p-value", respectively,
with self-explanatory titles. The first two columns have the names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
reports the absolute value of the summary index, in the range [0, 100],
providing a summary of causal results
based on the preponderance of evidence for Cr1 to Cr3
from four orders of stochastic dominance, etc. The order of input columns matters.
The fourth column, ‘corr.’, reports the Pearson correlation coefficient, while
the fifth column has the p-value for testing the null of zero Pearson coefficient.
This function calls silentPairs0 (the older version), allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
The European Crime data has all three criteria correctly suggesting that
a high crime rate kernel causes the deployment of a large number of police officers.
Since Cr1 to Cr3 nearly unanimously suggest ‘crim’ as the cause of ‘off’,
the strength index of 100 indicates unanimity.
attach(EuroCrime); causeSummary0(cbind(crim,off))
Both versions give identical results for this example. The old version of Cr1,
which uses gradients, was also motivated by the same Hausman-Wu test statistic.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See bootPairs and someCPairs.
library(generalCorr)
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSummary0(mtx, ctrl=ctrl, nam=colnames(mtx))
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA # inject missing values
causeSummary0(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
The algorithm determines causal path directions from the sign of the strength index, and path strengths from its magnitude, by comparing three aspects of flipped kernel regressions: [x1 on f(x2, x3, .. xp)] and its flipped version [x2 on f(x1, x3, .. xp)]. We compare (i) a formal exogeneity test criterion, (ii) absolute residuals, and (iii) R-squares of the flipped regressions, implying three criteria Cr1 to Cr3. The criteria are quantified by newer exact methods using four orders of stochastic dominance, SD1 to SD4. See Vinod's (2021) SSRN papers. In portfolio applications of stochastic dominance, one wants higher values. Here, by contrast, we compare two probability distributions of absolute residuals for two flipped models and choose the flip with smaller absolute residuals, which indicates a better fit.
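One simple way to summarize "smaller absolute residuals" without integrating CDFs is a vote over quantiles. The hypothetical helper `decile_vote()` below lets each of the nine deciles of the absolute residuals vote for the flip with the smaller value; it is an illustrative assumption and is not the package's decileVote() function.

```r
# Hypothetical decile-based vote between two flips (not the package's decileVote()).
# Each of the nine deciles of |e| votes for the flip with the smaller value.
decile_vote <- function(e_a, e_b) {
  p <- seq(0.1, 0.9, by = 0.1)
  qa <- quantile(abs(e_a), probs = p, names = FALSE)
  qb <- quantile(abs(e_b), probs = p, names = FALSE)
  sign(sum(sign(qb - qa)))  # +1 favors flip a, -1 favors flip b, 0 is a tie
}

decile_vote(1:100, (1:100) + 5)  # flip a has uniformly smaller residuals
```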
causeSummary2(mtx, nam = colnames(mtx), ctrl = 0, dig = 6)
mtx |
The data matrix with many columns. The first column, y, is held fixed and paired with each of the other columns, one by one; it is still called x for the purpose of flipping. |
nam |
vector of column names for mtx. Default: all column names of mtx |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
Suppose there are p columns in the input matrix, say x1, x2, .., xp,
and we keep x1 as a common member of all causal direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, each of which can be flipped. That is, either x1 is
the cause or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for the p-1 pairs, indicating which variable
(by name) causes which other variable (also by name).
It also prints a signed summary strength index in the range [-100, 100].
A positive strength index means x1 kernel causes x(1+j),
whereas a negative strength index means x(1+j) kernel causes x1. The function
also prints the Pearson correlation and its p-value. In short, the function returns
a matrix of p-1 rows and 5 columns entitled
"cause", "response", "strength", "corr." and "p-value", respectively,
with self-explanatory titles. The first two columns have the names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
reports the absolute value of the summary index, in the range [0, 100],
providing a summary of causal results
based on the preponderance of evidence for Cr1 to Cr3
from four orders of stochastic dominance, moments, deciles,
etc. The order of input columns in mtx matters.
The fourth column, ‘corr.’ of ‘out’, reports the Pearson correlation coefficient.
The fifth column has the p-value for testing the null of zero Pearson coefficient.
This function calls silentPair2, allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
Since Cr1 to Cr3 nearly unanimously suggest ‘crim’ as the cause of ‘off’,
a strength index of 100 indicates unanimity among the criteria.
attach(EuroCrime); causeSummary2(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 32, co-editors: H. D. Vinod and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, H. D. 'R Package GeneralCorr Functions for Portfolio Choice' (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
Vinod, H. D. 'Stochastic Dominance Without Tears' (January 26, 2021). Available at SSRN: https://ssrn.com/abstract=3773309
See siPair2Blk for a block version. See causeSummary, which is subject to trapezoidal approximation. See silentPair2, which is called by this function.
library(generalCorr)
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSummary2(mtx, ctrl=ctrl, nam=colnames(mtx))
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA # inject missing values
causeSummary2(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
The function develops a unanimity index for deciding which flip, (y on xi) or (xi on y), is best. Relevant signs determine the causal direction and the unanimity index among the three criteria. While allowing the researcher to keep some variables as controls, or outside the scope of causal path determination (e.g., age or latitude), this function produces detailed causal path information in a 5-column matrix identifying the names of variables, causal path directions, and path strengths re-scaled to be in the range [-100, 100] (the table reports absolute values of the strength), plus the Pearson correlation and its p-value. The ‘2’ in the name of the function suggests a second implementation, where exact stochastic dominance, decileVote, and momentVote are used and where we avoid Anderson's trapezoidal approximation.
causeSummary2NoP(mtx, nam = colnames(mtx), ctrl = 0, dig = 6)
mtx |
The data matrix with many columns. The first column, y, is held fixed and paired with each of the other columns, one by one; it is still called x for the purpose of flipping. |
nam |
vector of column names for mtx. Default: all column names of mtx |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
The algorithm determines causal path directions from the sign of the strength index, and path strengths from its magnitude, by comparing three aspects of flipped kernel regressions: [x1 on f(x2, x3, .. xp)] and its flipped version [x2 on f(x1, x3, .. xp)]. We compare (i) a formal exogeneity test criterion, (ii) absolute residuals, and (iii) R-squares of the flipped regressions, implying three criteria Cr1 to Cr3. The criteria are quantified by newer exact methods using four orders of stochastic dominance, SD1 to SD4. See Vinod's (2021) SSRN papers. In portfolio applications of stochastic dominance, one wants higher values. Here, by contrast, we compare two probability distributions of absolute residuals for two flipped models and choose the flip with smaller absolute residuals, which indicates a better fit.
Suppose there are p columns in the input matrix, say x1, x2, .., xp,
and we keep x1 as a common member of all causal direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, each of which can be flipped. That is, either x1 is
the cause or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for the p-1 pairs, indicating which variable
(by name) causes which other variable (also by name).
It also prints a signed summary strength index in the range [-100, 100].
A positive strength index means x1 kernel causes x(1+j),
whereas a negative strength index means x(1+j) kernel causes x1. The function
also prints the Pearson correlation and its p-value. In short, the function returns
a matrix of p-1 rows and 5 columns entitled
"cause", "response", "strength", "corr." and "p-value", respectively,
with self-explanatory titles. The first two columns have the names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
reports the absolute value of the summary index, in the range [0, 100],
providing a summary of causal results
based on the preponderance of evidence for Cr1 to Cr3
from four orders of stochastic dominance, moments, deciles,
etc. The order of input columns in mtx matters.
The fourth column, ‘corr.’ of ‘out’, reports the Pearson correlation coefficient.
The fifth column has the p-value for testing the null of zero Pearson coefficient.
This function calls silentPair2, allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
Since Cr1 to Cr3 nearly unanimously suggest ‘crim’ as the cause of ‘off’,
a strength index of 100 indicates unanimity among the criteria.
attach(EuroCrime); causeSummary(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol. 32, co-editors: H. D. Vinod and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, H. D. 'R Package GeneralCorr Functions for Portfolio Choice' (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
Vinod, H. D. 'Stochastic Dominance Without Tears' (January 26, 2021). Available at SSRN: https://ssrn.com/abstract=3773309
See siPair2Blk for a block version.
See causeSummary, which is subject to trapezoidal approximation.
See silentPair2, which is called by this function.
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSummary2(mtx, ctrl=ctrl, nam=colnames(mtx))
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x; x2[4]=NA; y2=y; y2[8]=NA; w2=w; w2[4]=NA
causeSummary2(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
A block version of causeSummary() in which the ‘np’ package chooses a new
bandwidth for every block of ten (blksiz=10) observations, injecting flexibility.
While allowing the researcher to keep some variables as controls,
or outside the scope of causal path determination
(e.g., age or latitude), this function produces detailed causal path information.
The output table is a 5-column matrix identifying the names of variables,
causal path directions, and path strengths re-scaled to be in the
range [-100, 100] (the table reports absolute values of the strength),
plus the Pearson correlation coefficient and its p-value.
causeSummBlk( mtx, nam = colnames(mtx), blksiz = 10, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
mtx |
The data matrix with many columns. The first column, y, is a fixed target; it is paired with each of the other columns, one by one, each of which is called x for the purpose of flipping. |
nam |
vector of column names for mtx, default colnames(mtx) |
blksiz |
block size, default=10. If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
The number of digits for reporting (default dig=6) |
wt |
Allows the user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights, which can be changed here (default sumwt=4). |
The algorithm determines causal path directions from the sign of the strength index. The strength index magnitudes are computed by comparing three aspects of flipped kernel regressions: [x1 on (x2, x3, .. xp)] and its flipped version [x2 on (x1, x3, .. xp)]. The cause should be on the right-hand side of the regression equation, and the properties of the regression fit determine which flip is superior. We compare (Cr1) the formal exogeneity test criterion (residuals times the RHS regressor, where smaller in absolute value is better), (Cr2) the absolute values of residuals, where smaller is better, and (Cr3) the R-squares of the flipped regressions, giving three criteria, Cr1 to Cr3. The criteria are quantified by sophisticated methods using four orders of stochastic dominance, SD1 to SD4. We assume slightly declining weights on the signs observed by Cr1 to Cr3. The user can change the default weights.
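The Cr2 comparison of flipped regressions can be illustrated with a minimal base-R sketch. It assumes a Gaussian-kernel Nadaraya-Watson (local constant) smoother with a rule-of-thumb bandwidth in place of the cross-validated ‘np’ regressions, ignores control variables, and compares only mean absolute residuals, whereas the package compares entire distributions via SD1 to SD4. The helper names nwFit and cr2Sketch are hypothetical.

```r
# Hypothetical kernel smoother: Nadaraya-Watson fit of y on x (Gaussian kernel,
# rule-of-thumb bandwidth), standing in for the 'np' regressions
nwFit <- function(x, y, h = sd(x) * length(x)^(-1/5)) {
  sapply(x, function(x0) {
    w <- dnorm((x - x0) / h)   # kernel weights around x0
    sum(w * y) / sum(w)        # local constant (weighted mean) fit
  })
}

# Hypothetical Cr2 sketch: mean absolute residuals of the two flipped regressions
cr2Sketch <- function(x1, x2) {
  e1 <- x1 - nwFit(x2, x1)     # residuals of x1 = f(x2) + e1
  e2 <- x2 - nwFit(x1, x2)     # residuals of the flip x2 = g(x1) + e2
  c(meanAbsE1 = mean(abs(e1)), meanAbsE2 = mean(abs(e2)))
}

set.seed(234)
x <- runif(50, 2, 11)
y <- 1 + 2 * x + rnorm(50)
cr2Sketch(x, y)
```

If meanAbsE2 is smaller than meanAbsE1, the regression with x1 on the right-hand side fits better, which under Cr2 points to x1 as the cause. Variables on very different scales may need standardizing before such a comparison; that is an assumption of this sketch, not a documented package step.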
If there are p columns in the input matrix, say x1, x2, .., xp,
we keep x1 as a common member of all causal-direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, which can be flipped. That is, either x1
or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for p-1 pairs indicating which variable
(by name) causes which other variable (also by name).
It also prints a signed summary strength index,
forced to be in the range [-100,100] for easy interpretation.
A positive sign of the strength index means x1 kernel causes x(1+j),
whereas negative strength index means x(1+j) kernel causes x1. The function
also prints Pearson correlation and its p-value. This function also returns
a matrix of p-1 rows and 5 columns entitled:
“cause”, “response”, “strength”, “corr.” and “p-value”, respectively
with self-explanatory titles. The first two columns have names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
has the absolute value of a summary index in the range [0,100],
providing a summary of causal results
based on the preponderance of evidence from Cr1 to Cr3
from four orders of stochastic dominance, etc. The order of input columns matters.
The fourth column of the output matrix entitled ‘corr.’ reports the Pearson
correlation coefficient, while
the fifth column of the output matrix has the p-value for testing the
null hypothesis of a zero Pearson coefficient.
This function calls siPairsBlk, allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
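How one row of the 5-column output is laid out can be sketched in base R. The helper outRowSketch is hypothetical: it takes an already computed signed strength index as given (it does not compute one) and uses stats::cor.test for the Pearson correlation and its p-value. A data frame is used for readability, whereas the package returns a matrix.

```r
# Hypothetical helper: one row of the 5-column output, given a signed strength
outRowSketch <- function(x1, xj, strength, nam = c("x1", "xj")) {
  ct <- cor.test(x1, xj)                   # Pearson r and p-value for H0: r = 0
  if (strength >= 0) {                     # zero treated as positive (assumption)
    cause <- nam[1]; response <- nam[2]    # positive sign: x1 causes x(1+j)
  } else {
    cause <- nam[2]; response <- nam[1]    # negative sign: x(1+j) causes x1
  }
  data.frame(cause = cause, response = response,
             strength = abs(strength),     # the table reports the absolute value
             corr. = unname(ct$estimate),
             `p-value` = ct$p.value, check.names = FALSE)
}

# Illustrative values only: a made-up strength of -31.5
outRowSketch(1:10, (1:10)^2, strength = -31.5, nam = c("a", "b"))
```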
For the European Crime data, all three criteria correctly suggest that
a high crime rate kernel causes the deployment of a large number of police officers.
A strength index of 100 indicates that Cr1 to Cr3 unanimously suggest
‘crim’ as the cause of ‘off’.
data(EuroCrime); attach(EuroCrime); causeSummBlk(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North-Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See bootPairs. causeSummary has an older version of this function.
See someCPairs.
## Not run:
mtx=as.matrix(mtcars[,1:3])
ctrl=as.matrix(mtcars[,4:5])
causeSummBlk(mtx, ctrl=ctrl, nam=colnames(mtx)) # name ctrl so it is not matched to blksiz
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x; x2[4]=NA; y2=y; y2[8]=NA; w2=w; w2[4]=NA
causeSummBlk(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables, this function produces
a 5-column matrix summarizing the results. The estimated signs
(+1, 0, -1) of the stochastic dominance orders SD1 to SD4 are weighted by
wt=c(1.2, 1.1, 1.05, 1), and a weighted sum for each of the criteria
Cr1 and Cr2 gives an overall result for all orders of stochastic dominance.
These are added to the Cr3 estimate, which is also one of (+1, 0, -1).
The final range for the unanimity-of-sign index is [-100, 100].
causeSumNoP( mtx, nam = colnames(mtx), blksiz = 10, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
mtx |
The data matrix with many columns. The first column, y, is a fixed target; it is paired with each of the other columns, one by one, each of which is called x for the purpose of flipping. |
nam |
vector of column names for mtx, default colnames(mtx) |
blksiz |
block size, default=10. If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
The number of digits for reporting (default dig=6) |
wt |
Allows the user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights, which can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is simply that the higher order stochastic dominance
numbers are less reliable.
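A weighted sign summary of this kind can be sketched as follows. This is a hypothetical illustration, not the package's exact formula: strengthSketch normalizes each criterion's weighted SD1 to SD4 signs by sum(wt) so that Cr1 and Cr2 each lie in [-1, 1], adds the Cr3 sign, and rescales the total to [-100, 100]. The package's separate sumwt argument (default 4) is not reproduced here.

```r
# Hypothetical sketch of a weighted sign summary, NOT the package's exact formula
strengthSketch <- function(sdSignsCr1, sdSignsCr2, signCr3,
                           wt = c(1.2, 1.1, 1.05, 1)) {
  cr1 <- sum(wt * sdSignsCr1) / sum(wt)   # weighted SD1..SD4 signs, in [-1, 1]
  cr2 <- sum(wt * sdSignsCr2) / sum(wt)   # same for Cr2
  100 * (cr1 + cr2 + signCr3) / 3         # rescale the total to [-100, 100]
}

strengthSketch(c(1, 1, 1, 1), c(1, 1, 1, 1), 1)     # all signs agree
strengthSketch(c(1, 1, -1, -1), c(1, 0, 1, 1), -1)  # mixed evidence
```

The declining weights mean that a disagreement at SD3 or SD4 moves the index less than one at SD1, in line with the reasoning above that higher-order numbers are less reliable.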
The summary results for all
three criteria are reported in one matrix called out,
which is not printed.
If there are p columns in the input matrix, say x1, x2, .., xp,
we keep x1 as a common member of all causal-direction pairs
(x1, x(1+j)) for j=1, 2, .., p-1, which can be flipped. That is, either x1
or x(1+j) is the cause in a chosen pair.
The control
variables are not flipped. The printed output of this function
reports the results for p-1 pairs indicating which variable
(by name) causes which other variable (also by name).
It also prints a signed summary strength index in the range [-100,100].
A positive sign of the strength index means x1 kernel causes x(1+j),
whereas negative strength index means x(1+j) kernel causes x1. The function
also prints Pearson correlation and its p-value. This function also returns
a matrix of p-1 rows and 5 columns entitled:
“cause”, “response”, “strength”, “corr.” and “p-value”, respectively
with self-explanatory titles. The first two columns have names of variables
x1 or x(1+j), depending on which is the cause. The ‘strength’ column
has the absolute value of the summary index in the range [0,100],
providing a summary of causal results
based on the preponderance of evidence from Cr1 to Cr3
from four orders of stochastic dominance, etc. The order of input columns matters.
The fourth column ‘corr.’ reports the Pearson correlation coefficient, while
the fifth column has the p-value for testing the null hypothesis of a zero Pearson coefficient.
This function calls siPairsBlk, allowing for control variables.
The output of this function can be sent to ‘xtable’ for a nice LaTeX table.
For the European Crime data, all three criteria correctly suggest that a
high crime rate kernel causes the deployment of a large number of police officers.
A strength index of 100 indicates that Cr1 to Cr3 unanimously suggest
‘crim’ as the cause of ‘off’.
data(EuroCrime); attach(EuroCrime); causeSummary(cbind(crim,off))
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North-Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See bootPairs. causeSummary0 has an older version of this function.
See causeAllPair.
## Not run:
mtx=data.frame(mtcars[,1:3])
ctrl=data.frame(mtcars[,4:5])
causeSumNoP(mtx=mtx, ctrl=ctrl)
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat independent and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x; x2[4]=NA; y2=y; y2[8]=NA; w2=w; w2[4]=NA
causeSumNoP(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Compute cofactor of a matrix based on row r and column c.
cofactor(x, r, c)
x |
matrix whose cofactor is to be computed |
r |
row number |
c |
column number |
cofactor of x, w.r.t. row r and column c.
Needs the function minor() in memory. Attaches the sign (-1)^(r+c) to the minor.
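A minimal base-R sketch of the minor and cofactor pair, assuming the minor is the determinant of x with row r and column c deleted (the standard definition). The names minorSketch and cofactorSketch are hypothetical, chosen so they do not mask the package's own minor() and cofactor(). As a check, Laplace expansion along the first row must reproduce det().

```r
# Minor: determinant of x after deleting row r and column c
minorSketch <- function(x, r, c) det(x[-r, -c, drop = FALSE])

# Cofactor: the minor with the sign (-1)^(r + c) attached
cofactorSketch <- function(x, r, c) minorSketch(x, r, c) * (-1)^(r + c)

A <- matrix(c(2, 1, 0, 3, 4, 1, 1, 0, 5), nrow = 3)
# Laplace expansion along row 1: sum of a_1j times its cofactor
expansion <- sum(sapply(1:3, function(j) A[1, j] * cofactorSketch(A, 1, j)))
all.equal(expansion, det(A))
```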
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
minor(x,r,c)
## The function is currently defined as
function (x, r, c)
{
    out = minor(x, r, c) * ((-1)^(r + c))
    return(out)
}
Given two vectors of portfolio returns, this function calls the internal function wtdpapb to report the simple means of four sophisticated measures of stochastic dominance, as explained in Vinod (2008).
comp_portfo2(xa, xb)
xa |
Data on returns for portfolio A in the form of a T by 1 vector |
xb |
Data on returns for portfolio B in the form of a T by 1 vector |
Returns four numbers, which are averages of four sophisticated stochastic dominance measurements called SD1 to SD4.
It is possible to modify this function to report the median or standard
deviation or any other descriptive statistic by changing the line
'oumean = apply(outb, 2, mean)' in the code toward the end of this function.
A trimmed mean may be of interest when outliers are suspected.
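The modification described above can be sketched with a made-up stand-in for outb. Its four columns play the role of SD1 to SD4; the actual row structure of outb inside comp_portfo2 is an assumption here. The outlier in the SD1 column shows why a median or trimmed mean can differ sharply from the simple mean.

```r
# Illustrative stand-in for 'outb' (made-up numbers); columns play the role of SD1..SD4
outb <- matrix(c(0.2, 0.3, 0.25, 5.0,
                 0.1, 0.2, 0.15, 0.1,
                 0.4, 0.1, 0.30, 0.2,
                 0.3, 0.2, 0.20, 0.3),
               ncol = 4, dimnames = list(NULL, paste0("SD", 1:4)))

oumean   <- apply(outb, 2, mean)              # the line quoted above
oumedian <- apply(outb, 2, median)            # robust alternative
outrim   <- apply(outb, 2, mean, trim = 0.25) # trimmed mean, 25% from each end
rbind(oumean, oumedian, outrim)
```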
Requires the np package: require(np).
Make sure that the functions wtdpapb, bigfp and stochdom2 are in memory, and set options(np.messages=FALSE).
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. (Chapter 4) https://www.worldscientific.com/worldscibooks/10.1142/12831
set.seed(30)
xa=sample(20:30) # generally lower returns
xb=sample(32:40) # higher returns in xb
gp = comp_portfo2(xa, xb) # all Av(sdi) positive means xb dominates
## positive SD1 to SD4 means xb dominates xa, as it should
Given two vectors of portfolio returns, this function summarizes their ranks based on moments, deciles and exact measures of stochastic dominance, as explained in Vinod (2021). This algorithm has model selection applications.
compPortfo(xa, xb)
xa |
Data on returns for portfolio A in the form of a T by 1 vector |
xb |
Data on returns for portfolio B in the form of a T by 1 vector |
Returns three numbers representing the signs of differences in ranks (rank=1 for most desirable), measured by [rank(xa)-rank(xb)], using momentVote, decileVote, and exactSdMtx, which are weighted averages of four moments, nine deciles and exact measures of stochastic dominance (from ECDFs of four orders, SD1 to SD4), respectively.
There are model-selection applications where two models A and B are
compared and one wants to choose the model with smaller absolute values of
residuals. When applied to model selection, this function will have
the inputs xa and xb as absolute residuals. We can compare the entire
probability distributions of absolute residuals by moments, deciles
or SD1 to SD4. Of course, care must be taken to choose xa or
xb depending on which model has smaller absolute residuals. This choice
is the exact opposite of the portfolio choice application, where a
larger return is more desirable. The functions silentPair2() and siPair2Blk()
call this function for model selection applications.
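The sign reversal needed for model selection can be sketched using only the nine deciles, not the moments or SD1 to SD4 that compPortfo also uses. The helper decileSignSketch is hypothetical; it returns +1 when xb is preferred under a larger-is-better rule. Negating absolute residuals turns the smaller-is-better model-selection problem into that larger-is-better form.

```r
# Hypothetical helper: +1 if xb is preferred, -1 if xa is, 0 if tied,
# judged by the nine deciles under a larger-is-better rule
decileSignSketch <- function(xa, xb) {
  p <- seq(0.1, 0.9, by = 0.1)
  sign(sum(quantile(xb, p) - quantile(xa, p)))
}

set.seed(30)
xa <- sample(20:30); xb <- sample(32:40)
decileSignSketch(xa, xb)                   # portfolio use: larger returns preferred

# Model selection: smaller absolute residuals are better, so negate them
resA <- c(0.5, -1.2, 0.8, -0.3, 1.1)       # illustrative residuals of model A
resB <- c(0.2, -0.4, 0.3, -0.1, 0.5)       # illustrative residuals of model B
decileSignSketch(-abs(resA), -abs(resB))   # +1 favors model B
```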
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. (Chapter 4) https://www.worldscientific.com/worldscibooks/10.1142/12831
Vinod, Hrishikesh D., R Package GeneralCorr Functions for Portfolio Choice (November 11, 2021). Available at SSRN: https://ssrn.com/abstract=3961683
set.seed(30)
xa=sample(20:30) # generally lower returns
xb=sample(32:40) # higher returns in xb
gp = compPortfo(xa, xb)
## output (1,1,1) means xb dominates xa; xb values are larger by construction
intended for internal use
data(da2Lag)
The format is: int 4
The first step computes a minimum reference return and nine deciles. The input mtx must be a matrix having p columns (with a name for each column) and n rows as in the data. If data are missing for some columns, insert NA's. Thus mtx has p columns of data ready for comparison and ranking; for example, mtx holds a matrix of stock returns. The output matrix produced by this function also has p columns, one for each stock being compared, and twenty rows. The top nine rows have the magnitudes of the deciles. Rows 10 to 18 have the respective ranks of the decile magnitudes. The 19-th row of the output reports a weighted sum of ranks. Ranking always gives the smallest number, 1, to the most desirable outcome. We suggest that a higher portfolio weight be given to the column having the smallest rank value along the 19th row. The 20-th row further ranks the weighted sums of ranks in row 19. An investor should choose the stock (column) representing the smallest rank value along the last (20th) row of the ‘out’ matrix.
decileVote(mtx, howManySd = 0.1)
mtx |
(n X p) matrix of data. For example, returns on p stocks over n months |
howManySd |
used to define ‘fixmin’, an imaginary lowest return, obtained by going howManySd (default 0.1) times the maximum of the standard deviations of all stocks below the minimum return of all stocks in the data |
out is a matrix with p columns (same as in the input matrix) and twenty rows. The top nine rows have the 9 deciles, and the next nine rows have their ranks. The 19-th row of ‘out’ has a weighted sum of the 9 ranks. Each column refers to one stock. The weighted sum for each stock is then ranked. A portfolio manager is assumed to prefer higher returns, represented by high decile values, and can give the largest weight to the column with the smallest bottom line. The bottom line (20-th row), labeled “choice” in the ‘out’ matrix, is defined so that choice=1 suggests the stock deserving the highest weight in the portfolio. The portfolio manager will generally give the lowest weight (possibly zero) to the stock whose choice number is p, and may want to sell this stock. Another output of the ‘decileVote’ function is ‘fixmin’, representing the smallest possible return of all the stocks in the input ‘mtx’ of returns. It is useful as a reference stock: we compute stochastic dominance numbers for each stock relative to this imaginary stock yielding the fixmin return in all time periods.
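The twenty-row layout can be sketched in base R. This is not the package's decileVote: the hypothetical decileVoteSketch uses equal weights for the nine ranks (the package's weights are not stated here) and R's default quantile() type, omits the fixmin reference, and needs at least two columns.

```r
# Hypothetical sketch of decile-based ranking (equal rank weights assumed)
decileVoteSketch <- function(mtx, w = rep(1, 9)) {
  p <- seq(0.1, 0.9, by = 0.1)
  dec <- apply(mtx, 2, quantile, probs = p, na.rm = TRUE)  # rows 1-9: deciles
  rk <- t(apply(dec, 1, function(r) rank(-r)))   # rows 10-18: rank 1 = largest decile
  wsum <- colSums(w * rk)                        # row 19: weighted sum of ranks
  choice <- rank(wsum, ties.method = "first")    # row 20: 1 = most desirable column
  rbind(dec, rk, wsum = wsum, choice = choice)
}

x1 <- c(1, 4, 7, 2, 6); x2 <- c(3, 4, 8, 4, 7)
decileVoteSketch(cbind(x1, x2))
```

Because each sorted value of x2 is at least as large as the corresponding value of x1, x2 never ranks worse at any decile, so it receives choice=1.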
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
x1=c(1,4,7,2,6)
x2=c(3,4,8,4,7)
decileVote(cbind(x1,x2))
An infant may depend on the mother for survival, but not vice versa. Dependence relations need not be symmetric, yet correlation coefficients are symmetric. One way to measure the extent of dependence is to find the max of the absolute values of the two asymmetric correlations using Vinod's (2015) definition of generalized (asymmetric) correlation coefficients. It requires a kernel regression of x on y obtained by using the ‘np’ package and its flipped version (regress y on x). We use a block version of ‘gmcmtx0’ called ‘gmcmtxBlk’ to admit a separate bandwidth for each block of ten observations if the user sets blksiz=10, a recommended choice here.
depMeas(x, y, blksiz = length(x))
x |
Vector of data on the first variable |
y |
Vector of data on the second variable |
blksiz |
block size, default blksiz=n, where n is the number of rows in the matrix, that is, no blocking is done |
A measure of dependence having the same sign as Pearson correlation. Its magnitude equals the larger of the two generalized correlation coefficients.
This function needs the gmcmtxBlk function, which in turn needs the np package.
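The dependence measure can be sketched with a base-R kernel smoother. Assumptions: a Gaussian-kernel Nadaraya-Watson fit with a rule-of-thumb bandwidth replaces the ‘np’ regressions and the blocking, and the kernel R^2 is taken as the squared correlation between the response and its fitted values. Each generalized correlation carries the sign of the Pearson correlation; the helper names nwFit, genCorSketch and depMeasSketch are hypothetical.

```r
# Hypothetical Nadaraya-Watson fit of y on x (Gaussian kernel, rule-of-thumb h)
nwFit <- function(x, y, h = sd(x) * length(x)^(-1/5)) {
  sapply(x, function(x0) {
    w <- dnorm((x - x0) / h)
    sum(w * y) / sum(w)
  })
}

# r*(y|x): sign of the Pearson r times sqrt(kernel R^2), with R^2 = cor(y, yhat)^2
genCorSketch <- function(y, x) {
  yhat <- nwFit(x, y)
  sign(cor(x, y)) * abs(cor(y, yhat))
}

# Keep whichever of the two asymmetric coefficients is larger in magnitude
depMeasSketch <- function(x, y) {
  r1 <- genCorSketch(y, x)   # y regressed on x
  r2 <- genCorSketch(x, y)   # flipped: x regressed on y
  if (abs(r1) >= abs(r2)) r1 else r2
}

x <- 1:20; y <- 2 * x + sin(x)
depMeasSketch(x, y)
```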
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North-Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See also gmcmtx0 and gmcmtxBlk.
library(generalCorr)
options(np.messages = FALSE)
x=1:20; y=sin(x)
depMeas(x,y,blksiz=20)
This is for momentum traders who focus on growth, acceleration, its growth and further acceleration. The diff function of R seems to do recycling of available numbers, which is not wanted for our purposes.
dif4(x)
x |
(n X 1) vector of time series (market returns) with n items |
ou2 matrix with five columns: the first for x, and the next four for diff(x) and the second, third and fourth differences of x
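One way to keep all five columns at n rows is to pad each difference series with leading NA values; combining vectors of unequal length with cbind() would otherwise recycle the shorter ones. This is a hypothetical sketch (dif4Sketch); whether the package pads at the start in exactly this way is an assumption.

```r
# Hypothetical sketch: x and its first four differences, each padded to length n
dif4Sketch <- function(x) {
  n <- length(x)
  stopifnot(n > 4)                       # need at least 5 values for a 4th difference
  out <- matrix(NA_real_, nrow = n, ncol = 5,
                dimnames = list(NULL, c("x", "D1", "D2", "D3", "D4")))
  out[, 1] <- x
  for (k in 1:4) {
    d <- diff(x, differences = k)        # k-th difference has length n - k
    out[(k + 1):n, k + 1] <- d           # first k entries stay NA, no recycling
  }
  out
}

x <- c(2, 8, 3, 5, 1, 8, 19, 22, 23)
dif4Sketch(x)
```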
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
x=c(2,8,3,5,1,8,19,22,23)
dif4(x)
This is for momentum traders who focus on growth, acceleration, its growth and further acceleration. The diff function of R seems to do recycling of available numbers, not wanted for our purposes. Hence, this function is needed in portfolio studies based on time series.
dif4mtx(mtx)
mtx |
(n X p) matrix of p time series (market returns) with n items each |
out matrix with 12 rows (data, D1 to D4, and ranks of D1 to D4). The column names of out are those of the input matrix mtx.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
x=c(2,8,3,5,1,8,19,22,23)
y=c(3,11,2,6,7,9,20,25,21)
dif4mtx(cbind(x,y))
Intended for internal use
data(dig)
The format is: int 78
This data set refers to crime in European countries during 2008.
The sources are World Bank and Eurostat. The crime statistics refer
to homicides. This avoids possible reporting bias from the presence
of police officers, because homicide reporting in most countries is
standardized. Typical usage is: data(EuroCrime); attach(EuroCrime).
The secondary source ‘quandl.com’ was used for collecting these data.
The variables included in the dataset are:
Country
Name of the European country
crim
Per capita crime rate
off
Per capita deployment of police officers
ECDF = empirical cumulative distribution function. ECDFs are sufficient statistics representing the probability distributions defined by observable finite data (e.g., stock returns). The exact computation of stochastic dominance orders SD1 to SD4 needs areas between two ECDFs, since such areas represent integrals. Higher-order SDs with continuous variables involve repeated integrals, so our quantification needs areas of ECDFs defined from areas of lower-order ECDFs. We argue that these computations are convenient if there is an ECDF of an imaginary reference minimum (x.ref) return, whose ECDF is a rectangle common to all stock comparisons. A common (x.ref) avoids having to compute all possible pairs of p stocks. Choosing the SP500 index as a common reference cannot avoid a slower trapezoidal approximation of the integrals, since its returns vary over time. We want exact areas of rectangles, computed fast.
exactSdMtx(mtx, howManySd = 0.1)
mtx |
(n X p) matrix of data. For example, returns on p stocks over n months |
howManySd |
used to define (x.ref)= lowest return number. If the grand minimum of all returns in ‘mtx’ is denoted GrMin, then howManySd equals the number of max(sd) (maximum standard deviation for data columns) below the GrMin used to define (x.ref). Thus, (x.ref)=GrMin-howManySd*max(sd). default howManySd=0.1 |
The exactSdMtx
function inputs ‘mtx’ (n X p) matrix data
(e.g., n monthly returns on p stocks).
Its output has four matrices SD1 to SD4, each with dimension (n X p). They measure
exact dominance areas between empirical CDF for each column to the ECDF of
(x.ref), an artificial stock with minimal return in all time periods. A fifth
output matrix called ‘out’ produced by exactSdMtx
has 4 rows and p columns containing column sums of SD1 to SD4.
We intend that this
‘out’ matrix produced by exactSdMtx
is then input to another
function summaryRank()
in the package designed for practitioners.
For example, it indicates the best and the worst columns
(the best stock to buy and the best stock to sell)
from the input data ‘mtx’ for investment, based on a sophisticated computation
of their ranks.
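The first-order (SD1) rectangle areas against the common (x.ref) can be sketched in base R. The hypothetical sd1Sketch follows the (x.ref) definition given for howManySd, assumes no NA values, and covers SD1 only (SD2 to SD4 would accumulate these areas further). Since the ECDF of (x.ref) equals 1 everywhere to the right of (x.ref), each column's SD1 areas sum exactly to mean(column) - x.ref, which the check below uses.

```r
# Hypothetical SD1 sketch: exact rectangle areas between ECDF(x.ref) and each column's ECDF
sd1Sketch <- function(mtx, howManySd = 0.1) {
  mtx <- as.matrix(mtx)                                  # assumes no NA values
  xref <- min(mtx) - howManySd * max(apply(mtx, 2, sd))  # x.ref = GrMin - howManySd*max(sd)
  areas <- apply(mtx, 2, function(col) {
    s <- sort(col)
    n <- length(s)
    widths  <- diff(c(xref, s))            # gaps between successive sorted values
    heights <- 1 - (seq_len(n) - 1) / n    # ECDF(x.ref) minus ECDF(col) on each gap
    widths * heights                       # exact rectangle areas, no trapezoids
  })
  list(SD1 = areas, out = colSums(areas), xref = xref)
}

x1 <- c(2, 5, 6, 9, 13, 18, 21); x2 <- c(3, 6, 9, 12, 14, 19, 27)
st <- sd1Sketch(cbind(x1, x2))
st$out
```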
Five matrices. SD1 to SD4 contain four orders of stochastic dominance areas using the ECDF pillars and a common (x.ref). The fifth ‘out’ matrix has 4 rows for SD1 to SD4 and p columns (p = number of columns in the data matrix mtx), with a summary of ranks using all four, SD1 to SD4.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
x1=c(2,5,6,9,13,18,21)
x2=c(3,6,9,12,14,19,27)
st1=exactSdMtx(cbind(x1,x2))
The usual Granger-causality assumes linear regressions. This function allows nonlinear nonparametric kernel regressions using the local linear (ll) option. It computes two R^2 values and their difference: (i) R12, the kernel regression R^2 of x1t on its own lags and the lags of x2; (ii) R21, the kernel regression R^2 of x2t on its own lags and the lags of x1; and (iii) dif=R12-R21. If dif>0, then x2 Granger-causes x1.
GcRsqX12(x1, x2, px1 = 4, px2 = 4, pwanted = 4, ctrl = 0)
x1 |
The data vector x1 |
x2 |
The data vector x2 |
px1 |
The number of lags of x1 in the data, default px1=4 |
px2 |
The number of lags of x2 in the data, default px2=4 |
pwanted |
number of lags of both x2 and x1 wanted for Granger causal analysis, default =4 |
ctrl |
data matrix for designated control variable(s) outside causal paths; default=0 means no control variables are present |
Calls GcRsqYX for the R-square from kernel regression (local linear version) R^2[x1=f(x1,x2)], choosing GcRsqYX(y=x1, x=x2). It predicts x1 from both x1 and x2 using all information till time (t-1). It also calls GcRsqYX again after flipping x1 and x2. It returns RsqX1onX2, RsqX2onX1 and the difference dif=(RsqX1onX2-RsqX2onX1). If (dif>0), the regression x1=f(x1,x2) is better than the flipped version, implying that x1 is more predictable, or x2 Granger-causes x1 (x2 -> x1), rather than vice versa. The kernel regressions use regtype="ll" for local linear and bwmethod="cv.aic" for AIC-based bandwidth selection.
This function returns 3 numbers: RsqX1onX2, RsqX2onX1 and dif
Returns a list of 3 numbers: RsqX1onX2 (R-square of the kernel regression of x1 on the lags of x1 and x2), RsqX2onX1 (R-square of the kernel regression of x2 on its own lags and the lags of x1), and the difference between the two R-squares (first minus second), called ‘dif’. If dif>0, then x2 Granger-causes x1.
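The R^2 comparison behind dif can be illustrated with a linear stand-in. The sketch replaces the package's local linear kernel regressions with ordinary least squares via lm(), uses only lagged values (information up to t-1), and ignores control variables; the helper names lagRsq and gcDifSketch are hypothetical. The simulated x1 depends on the previous value of x2, so dif should come out positive.

```r
# Hypothetical helper: R^2 of y_t on p lags of y and p lags of x (OLS stand-in)
lagRsq <- function(y, x, p = 4) {
  Ey <- embed(y, p + 1)              # column 1 = y_t, columns 2..p+1 = lags
  Ex <- embed(x, p + 1)
  yt <- Ey[, 1]
  X  <- cbind(Ey[, -1], Ex[, -1])    # only information up to time t-1
  fit <- lm(yt ~ X)
  1 - sum(residuals(fit)^2) / sum((yt - mean(yt))^2)
}

gcDifSketch <- function(x1, x2, p = 4) {
  R12 <- lagRsq(x1, x2, p)           # x1 on own lags and lags of x2
  R21 <- lagRsq(x2, x1, p)           # flipped version
  c(RsqX1onX2 = R12, RsqX2onX1 = R21, dif = R12 - R21)
}

set.seed(1)
x2 <- rnorm(120)
x1 <- c(0, 0.8 * x2[-120]) + rnorm(120, sd = 0.1)  # x1_t driven by x2_{t-1}
res <- gcDifSketch(x1, x2)
res                                  # dif > 0 is read as x2 Granger-causing x1
```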
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North-Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Zheng, S., Shi, N.-Z., Zhang, Z., 2012. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association 107, 1239-1252.
Note: internal routine.
See also bootGcRsq, causeSummary, GcRsqYX.
## Not run:
library(Ecdat); options(np.messages=FALSE); attach(data.frame(MoneyUS))
GcRsqX12(y,m)
## End(Not run)
The usual Granger-causality assumes linear regressions. This function allows nonlinear nonparametric kernel regressions using the local constant (lc) option. It calls GcRsqYXc for the R-square from kernel regression, R^2[x1=f(x1,x2)], choosing GcRsqYXc(y=x1, x=x2). The letter ‘c’ in the function name refers to the local constant option of kernel regressions. It predicts x1 from both x1 and x2 using all information till time (t-1). It also calls GcRsqYXc again after flipping x1 and x2. It returns RsqX1onX2, RsqX2onX1 and the difference dif=(RsqX1onX2-RsqX2onX1). If (dif>0), the regression x1=f(x1,x2) is better than the flipped version, implying that x1 is more predictable, or x2 Granger-causes x1 (x2 -> x1), rather than vice versa. The kernel regressions use regtype="lc" for local constant and bwmethod="cv.ls" for least squares-based bandwidth selection.
GcRsqX12c(x1, x2, px1 = 4, px2 = 4, pwanted = 4, ctrl = 0)
x1 |
The data vector x1 |
x2 |
The data vector x2 |
px1 |
number of lags of x1 in the data, default px1=4 |
px2 |
number of lags of x2 in the data, default px2=4 |
pwanted |
number of lags of both x2 and x1 wanted for Granger causal analysis, default =4 |
ctrl |
data matrix for designated control variable(s) outside causal paths; default=0 means no control variables are present |
This function returns 3 numbers: RsqX1onX2, RsqX2onX1 and dif
Returns a list of 3 numbers: RsqX1onX2 (R-square of the kernel regression of x1 on x1 and x2), RsqX2onX1 (R-square of the kernel regression of x2 on x2 and x1), and the difference between the two R-squares, called dif.
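The local constant idea can be sketched as a multivariate Nadaraya-Watson fit on the lag matrix. Assumptions: a product Gaussian kernel with rule-of-thumb bandwidths replaces the package's cv.ls bandwidth selection, R^2 is taken as the squared correlation between the response and its fitted values, and control variables are ignored. The helper names lcFit and lcRsq are hypothetical.

```r
# Hypothetical local constant (Nadaraya-Watson) fit with a product Gaussian kernel
lcFit <- function(X, y) {
  X <- as.matrix(X)
  h <- apply(X, 2, sd) * nrow(X)^(-1 / (4 + ncol(X)))  # rule-of-thumb bandwidths
  apply(X, 1, function(x0) {
    z <- sweep(sweep(X, 2, x0), 2, h, "/")  # scaled distances from x0
    w <- exp(-0.5 * rowSums(z^2))           # product Gaussian kernel weights
    sum(w * y) / sum(w)                     # weighted mean = local constant fit
  })
}

# R^2 of y_t on p lags of y and p lags of x, taken as cor(y, fitted)^2
lcRsq <- function(y, x, p = 4) {
  Ey <- embed(y, p + 1); Ex <- embed(x, p + 1)
  yt <- Ey[, 1]
  yhat <- lcFit(cbind(Ey[, -1], Ex[, -1]), yt)
  cor(yt, yhat)^2
}

set.seed(1)
x2 <- rnorm(120)
x1 <- c(0, 0.8 * x2[-120]) + rnorm(120, sd = 0.1)  # x1_t driven by x2_{t-1}
R12 <- lcRsq(x1, x2); R21 <- lcRsq(x2, x1)
c(RsqX1onX2 = R12, RsqX2onX1 = R21, dif = R12 - R21)
```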
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.32, co-editors: H. D. Vinod and C.R. Rao. New York: North-Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Zheng, S., Shi, N.-Z., Zhang, Z., 2012. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association 107, 1239-1252.
Note: internal routine.
## Not run:
library(Ecdat); options(np.messages=FALSE); attach(data.frame(MoneyUS))
GcRsqX12c(y,m)
## End(Not run)
Function input is y=LHS=First time series and x=RHS=Second time series. Kernel regression np package options regtype="ll" for local linear, and bwmethod="cv.aic" for AIC-based bandwidth selection are fixed. Denote Rsq=Rsquare=R^2 in nonlinear kernel regression. GcRsqYX(.) computes the following two R^2 values. out[1]=Rsqyyx = R^2 when we regress y on own lags of y and x. out[2]=Rsqyy = R^2 when we regress y on lags of y alone.
GcRsqYX(y, x, px = 4, py = 4, pwanted = 4, ctrl = 0)
y | The data vector y for the left-hand side (dependent, or first) variable |
---|---|
x | The data vector x for the right-hand side (explanatory, or second) variable |
px | Number of lags of x in the data |
py | Number of lags of y in the data (e.g., 4 for quarterly data) |
pwanted | Number of lags of both x and y wanted for Granger-causal analysis |
ctrl | Data matrix for designated control variable(s) outside causal paths; the default ctrl=0 means no control variables are present |
This function returns a set of 2 numbers measuring nonlinear Granger-causality for time series. out[1]=Rsqyyx, out[2]=Rsqyy.
If the data are annual, or if no quarterly-type structure is present, use this function with pwanted=px=py. An example is the chicken-and-egg data (ChickEgg) from the lmtest package.
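The comparison behind out[1] and out[2] can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's code: the helper lagRsq() is hypothetical, it uses lm() as a linear stand-in for the np kernel regressions (so nonlinear effects are not captured), and it sets px=py=pwanted=p.

```r
# Hypothetical sketch of the Rsqyyx versus Rsqyy comparison.
# Assumption: OLS (lm) stands in for the np kernel regressions.
lagRsq <- function(y, x, p = 4) {
  Ey <- embed(y, p + 1)   # col 1 = y[t]; cols 2..(p+1) = y[t-1], ..., y[t-p]
  Ex <- embed(x, p + 1)   # same layout for x
  yt   <- Ey[, 1]
  ylag <- Ey[, -1, drop = FALSE]
  xlag <- Ex[, -1, drop = FALSE]
  Rsqyyx <- summary(lm(yt ~ ylag + xlag))$r.squared  # y on own lags and lags of x
  Rsqyy  <- summary(lm(yt ~ ylag))$r.squared         # y on own lags alone
  c(Rsqyyx = Rsqyyx, Rsqyy = Rsqyy)
}

set.seed(1)
x <- rnorm(100)
y <- c(0, 0.8 * x[-100]) + rnorm(100, sd = 0.3)  # y[t] depends on x[t-1]
out <- lagRsq(y, x, p = 4)
print(out)
```

If Rsqyyx is noticeably larger than Rsqyy, lags of x add explanatory power for y, which is the Granger-causality idea; bootGcRsq() offers bootstrap inference for such comparisons.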
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Zheng, S., Shi, N.-Z., Zhang, Z., 2012. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association 107, 1239-1252.
## Not run:
library(Ecdat); options(np.messages=FALSE); attach(data.frame(MoneyUS))
GcRsqYX(y, m)
## End(Not run)
The inputs are y, the left-hand-side (first) time series, and x, the right-hand-side (second) time series. The kernel regression options of the np package are fixed at regtype="lc" (local constant) and bwmethod="cv.ls" (least-squares cross-validation bandwidth selection). Denote by Rsq = Rsquare = R^2 the R-square of a nonlinear kernel regression. GcRsqYXc(.) computes two R^2 values: out[1] = Rsqyyx, the R^2 when y is regressed on its own lags and lags of x; out[2] = Rsqyy, the R^2 when y is regressed on its own lags alone.
GcRsqYXc(y, x, px = 4, py = 4, pwanted = 4, ctrl = 0)
y | The data vector y for the left-hand side (dependent, or first) variable |
---|---|
x | The data vector x for the right-hand side (explanatory, or second) variable |
px | Number of lags of x in the data |
py | Number of lags of y in the data (e.g., 4 for quarterly data) |
pwanted | Number of lags of both x and y wanted for Granger-causal analysis |
ctrl | Data matrix for designated control variable(s) outside causal paths; the default ctrl=0 means no control variables are present |
This function returns a set of 2 numbers measuring nonlinear Granger-causality for time series. out[1]=Rsqyyx, out[2]=Rsqyy.
If the data are annual, or if no quarterly-type structure is present, use this function with pwanted=px=py. An example is the chicken-and-egg data (ChickEgg) from the lmtest package; see Thurman, W. N. and Fisher, M. E. (1988).
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Zheng, S., Shi, N.-Z., Zhang, Z., 2012. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association 107, 1239-1252.
## Not run:
library(Ecdat); options(np.messages=FALSE); attach(data.frame(MoneyUS))
GcRsqYXc(y, m)
## End(Not run)
This package provides convenient software tools for determining causal paths using Vinod (2014, 2015, 2018, 2021), as explained in its many vignettes. The functions causeSummary(mtx), causeSummary2(mtx), causeSum2Blk(mtx), and causeSummBlk() are various versions reporting pairwise causal path directions and causal strengths. We fit a kernel regression of X1 on (X2, X3, ..., Xk) and a flipped regression of X2 on (X1, X3, ..., Xk). We compare the two fits using three criteria, called Cr1 to Cr3, and rescale the weighted sum of the three quantified criteria to the [-100, 100] range. The sign of the weighted sum gives the direction of the causal path, and its magnitude gives the strength of the causal path.
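The final rescaling step can be illustrated with a small sketch. Everything here is an assumption for illustration: the helper combineCriteria(), the convention that each criterion has already been reduced to a score in [-1, 1] (positive favouring X to Y), and the equal weights. The package's own quantification of Cr1 to Cr3 and its weights are not reproduced.

```r
# Hypothetical sketch: combine three criterion scores into one signed strength.
# Assumptions: each score lies in [-1, 1] (+1 favours X -> Y, -1 favours Y -> X);
# the weights are placeholders, not the package's values.
combineCriteria <- function(scores, weights = c(Cr1 = 1, Cr2 = 1, Cr3 = 1)) {
  stopifnot(length(scores) == length(weights), all(abs(scores) <= 1))
  s <- sum(weights * scores) / sum(abs(weights))  # weighted average in [-1, 1]
  100 * s                                          # rescale to [-100, 100]
}

strength <- combineCriteria(c(Cr1 = 1, Cr2 = 0.5, Cr3 = -0.2))
strength  # sign gives the direction, magnitude gives the strength
```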
A matrix of non-symmetric generalized correlations r*(x|y) is reported by the functions rstar() and gmcmtx0().
sudoCoefParcor() computes pseudo kernel regression coefficients based on generalized partial correlation coefficients (GPCC).
depMeas() computes a measure of nonlinear nonparametric dependence between two vectors.
parcorVec() reports generalized partial correlation coefficients, Vinod (2021).
parcorVecH() reports a hybrid version of the above (using HGPCC).
The usual partial correlation r(x,y|z), from a regression of y on (x, z), measures the effect of x on y after removing the effect of z, where z can contain several variables. Vinod (2021) suggests new generalized partial correlation coefficients (GPCC), r*(x,y|z), based on kernel regressions.
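The usual partial correlation that GPCC generalizes can be computed by correlating residuals. This is a minimal base-R sketch: partialCor() is a hypothetical helper, and its linear lm() fits are exactly the step that Vinod (2021) replaces with kernel regressions, which is not shown here.

```r
# Usual (linear) partial correlation r(x, y | z) via residuals.
partialCor <- function(x, y, z) {
  z  <- as.matrix(z)
  ex <- resid(lm(x ~ z))   # part of x not linearly explained by z
  ey <- resid(lm(y ~ z))   # part of y not linearly explained by z
  cor(ex, ey)
}

set.seed(7)
z <- rnorm(200)
x <- z + rnorm(200)
y <- z + rnorm(200)        # x and y are related only through z
c(simple = cor(x, y), partial = partialCor(x, y, z))
```

The simple correlation is sizeable here, while the partial correlation is close to zero, because z drives both variables.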
The criterion Cr1 uses observable values of the standard exogeneity test criterion, namely (kernel regression residual) times (regressor values). Cr2 uses the absolute values of kernel regression residuals. The quantification of Cr1 and Cr2 further uses four orders of stochastic dominance. Cr3 compares the R-squares of the two fits.
The package provides additional tools for matrix algebra, such as cofactor(), as well as get0outliers() for outlier detection, bigfp() for numerical integration by the trapezoidal rule, and stochdom2() and comp_portfo2() for stochastic dominance.
The package has a function pcause() for bootstrap-based statistical inference and another, heurist(), for a heuristic t-test. Pairwise deletion of missing data is done in napair(), while triplet-wise deletion, intended for use when control variable(s) are also present, is done in naTriplet(). For panel data, the functions PanelLag() and Panel2Lag() are relevant.
pillar3D() provides three-dimensional plots of data that look more like surfaces than the usual plots with vertical pins.
Recent (2020) additions include canonRho() for generalized canonical correlations and many functions for Granger causality between lagged time series, including GcRsqX12(), bootGcRsq(), and GcRsqYXc().
Recent additions include several functions for portfolio choice: decileVote(), momentVote(), and exactSdMtx(), the last of which computes stochastic dominance exactly from ECDF areas. The newer stochastic dominance tools are used in causeSummary2(mtx) and causeSum2Blk(mtx). dif4mtx() computes growth, change in growth, and so on, up to order-4 differencing of time series. outOFsamp() and outOFsell() provide pandemic-proof out-of-sample evaluation of portfolio returns using randomization. causeSum2Panel() exploits panel-data features for causal paths, and sudoCoefParcor() gives pseudo regression coefficients for kernel regressions.
Eight vignettes provided with this package at CRAN describe the theory and usage of the package with examples. Read the first vignette with the command
vignette("generalCorr-vignette")
Later vignettes can be read by appending the vignette number. For example,
vignette("generalCorr-vignette6")
reads the sixth vignette.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in 'Handbook of Statistics: Computational Statistics with R', Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). 'Generalized measures of correlation for asymmetry, nonlinearity, and beyond,' Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Function to compute outliers and their count by Tukey's method, which uses 1.5 times the interquartile range (IQR) to define the outlier boundaries.
get0outliers(x, verbo = TRUE, mult = 1.5)
x | Vector of data |
---|---|
verbo | Set to TRUE (default) if printed details are desired |
mult | Number of times the IQR is used in defining the outlier boundaries (default 1.5) |
below | Which items are lower than the lower limit |
---|---|
above | Which items are larger than the upper limit |
low.lim | The lower boundary for outlier detection |
up.lim | The upper boundary for outlier detection |
nUP | Count of data points above the upper boundary |
nLO | Count of data points below the lower boundary |
The function removes the missing data before checking for outliers.
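Tukey's rule described above can be sketched in a few lines of base R. The helper tukeyFences() is hypothetical; its returned component names mirror the value table of get0outliers(), but the quartile computation (stats::quantile() with its default type 7) is an assumption and may differ from the package.

```r
# Minimal sketch of Tukey's fences (not the package's code).
tukeyFences <- function(x, mult = 1.5) {
  x   <- x[!is.na(x)]                            # drop missing data first
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  low.lim <- q[1] - mult * iqr
  up.lim  <- q[2] + mult * iqr
  list(below = x[x < low.lim], above = x[x > up.lim],
       low.lim = low.lim, up.lim = up.lim,
       nLO = sum(x < low.lim), nUP = sum(x > up.lim))
}

res <- tukeyFences(c(2, 4, 5, 6, 7, 8, 150, NA))
res$above  # 150 lies above the upper fence
```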
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
set.seed(101); x=sample(1:100)[1:15]; x[16]=150; x[17]=NA
get0outliers(x) # correctly identifies outlier=150
This is an auxiliary function for gmcmtxBlk. It gives sequences of starting and ending values of the blocks.
getSeq(n, blksiz)
n | Length of the range |
---|---|
blksiz | Block size |
Two vectors, sqLO and sqUP, of block starting and ending values.
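One plausible way to build such block boundaries is sketched below. blockSeq() is a hypothetical helper, and the handling of a short final block (it simply ends at n) is an assumption; getSeq() itself may treat the last block differently.

```r
# Hypothetical sketch of block start/end indices (not getSeq() itself).
blockSeq <- function(n, blksiz) {
  blksiz <- min(blksiz, n)                  # no blocking if blksiz > n
  sqLO <- seq(1, n, by = blksiz)            # starting index of each block
  sqUP <- pmin(sqLO + blksiz - 1, n)        # ending index of each block
  list(sqLO = sqLO, sqUP = sqUP)
}

b <- blockSeq(n = 99, blksiz = 10)
cbind(b$sqLO, b$sqUP)  # ten blocks; the last one covers rows 91 to 99
```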
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
getSeq(n=99, blksiz=10)
This function checks for missing data for each pair individually. It then uses the kern function to kernel regress x on y, and conversely y on x. It needs the R package ‘np’, which reports the R-squares of each regression. gmcmtx0() reports their square roots after assigning them the observed sign of the Pearson correlation coefficient. Its threefold advantages are: (i) it is asymmetric, yielding causal direction information by relaxing the assumption of linearity implicit in usual correlation coefficients; (ii) the r* correlation coefficients are generally larger upon admitting arbitrary nonlinearities; (iii) max(|R*ij|, |R*ji|) measures (nonlinear) dependence.
For example, let x=1:20 and y=sin(x). This y has a perfect (100 percent) nonlinear dependence on x, and yet the Pearson correlation coefficient r(x,y) = -0.0948372 is near zero, and its 95% confidence interval (-0.516, 0.363) includes zero, implying that r(x,y) is not significantly different from zero. This shows a serious failure of the traditional r(x,y) to measure dependence when nonlinearities are present. gmcmtx0(cbind(x,y)) will correctly reveal perfect (nonlinear) dependence with a generalized correlation coefficient of -1.
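The sin(x) example can be checked in base R. The first part uses only cor.test(). The second part replaces kern() and np with a hypothetical Nadaraya-Watson (local constant) fit, nwFit(), with a fixed Gaussian bandwidth h=0.5 chosen by hand; it only illustrates how a signed square root of a kernel-regression R-square recovers the dependence that Pearson's r misses.

```r
# Pearson correlation misses the perfect nonlinear dependence of y = sin(x).
x <- 1:20
y <- sin(x)
ct <- cor.test(x, y)            # Pearson test with a 95% confidence interval
ct$estimate                     # near zero
ct$conf.int                     # interval containing zero

# Hypothetical stand-in for kern(): local constant fit, fixed bandwidth h.
nwFit <- function(y, x, h = 0.5) {
  sapply(x, function(x0) {
    w <- dnorm((x - x0) / h)    # Gaussian kernel weights
    sum(w * y) / sum(w)         # locally weighted mean
  })
}
yhat  <- nwFit(y, x)
R2    <- cor(y, yhat)^2                  # R-square of the kernel fit
rstar <- sign(cor(x, y)) * sqrt(R2)      # signed square root, as described
c(R2 = R2, rstar = rstar)
```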
gmcmtx0(mym, nam = colnames(mym))
mym | A matrix of data on variables in columns |
---|---|
nam | Column names of the variables in the data matrix |
A non-symmetric R* matrix of generalized correlation coefficients
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in 'Handbook of Statistics: Computational Statistics with R', Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). 'Generalized measures of correlation for asymmetry, nonlinearity, and beyond,' Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
See also gmcmtxBlk for a more general version that uses blocking and allows several bandwidths.
gmcmtx0(mtcars[,1:3])
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
gmcmtx0(x)
## End(Not run)
The algorithm uses two auxiliary functions, getSeq and NLhat. The latter uses the kern function to kernel regress x on y, and conversely y on x. It needs the package ‘np’, which reports residuals and allows one to compute fitted values (xhat, yhat). Unlike gmcmtx0, this function considers blocks of blksiz=10 (default) pairs of data points separately, with distinct bandwidths for each block, usually creating superior local fits.
gmcmtxBlk(mym, nam = colnames(mym), blksiz = 10)
mym | A matrix of data on selected variables arranged in columns |
---|---|
nam | Column names of the variables in the data matrix |
blksiz | Block size (default 10). If the chosen blksiz exceeds n, the number of rows in the matrix, then blksiz=n; that is, no blocking is done |
This function checks for missing data separately for each pair of columns. Assume that the input matrix ‘mym’ has n rows, some of which may be missing. If the columns of mym are denoted (X1, X2, ..., Xp), we consider all pairs (Xi, Xj), treated as (x, y), with ‘nv’ valid (non-missing) rows. Note that each of x and y is an (nv by 1) vector. The function further splits these (x, y) vectors into as many blocks as are needed for the nv valid paired data points at the chosen block length (blksiz).
Next, the algorithm strings together the blocks of fitted-value vectors (xhat, yhat), also of dimension nv by 1. For each pair (Xi, Xj), with column Xj as the cause and row Xi as the response, treated as x and y, the algorithm computes R*ij as the simple Pearson correlation coefficient between x and xhat, and R*ji as the correlation coefficient between y and yhat. Finally, it assigns to |R*ij| and |R*ji| the observed sign of the Pearson correlation coefficient between x and y.
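The blocking steps can be sketched as follows. This is not NLhat() or gmcmtxBlk(): blockFit() is hypothetical, a Gaussian Nadaraya-Watson fit stands in for the np kernel regression, and each block's bandwidth comes from stats::bw.nrd0(), a rule-of-thumb choice rather than np's cross-validation.

```r
# Hypothetical sketch: fit x on y block by block, string the fitted values
# together, then correlate x with xhat and attach the Pearson sign.
blockFit <- function(x, y, blksiz = 10) {
  n <- length(x)
  xhat <- numeric(n)
  for (lo in seq(1, n, by = blksiz)) {
    idx <- lo:min(lo + blksiz - 1, n)                  # rows in this block
    yb <- y[idx]; xb <- x[idx]
    h <- if (length(idx) > 1) bw.nrd0(yb) else 1       # block-specific bandwidth
    xhat[idx] <- sapply(yb, function(y0) {
      w <- dnorm((yb - y0) / h)                        # Gaussian kernel weights
      sum(w * xb) / sum(w)                             # regress x on y within block
    })
  }
  xhat
}

set.seed(3)
y <- rnorm(50)
x <- y^2 + rnorm(50, sd = 0.1)
xhat  <- blockFit(x, y)
Rstar <- sign(cor(x, y)) * abs(cor(x, xhat))  # R* with the observed Pearson sign
Rstar
```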
Its advantages, discussed in Vinod (2015, 2019), are: (i) it is asymmetric, yielding causal direction information by relaxing the assumption of linearity implicit in usual correlation coefficients; (ii) the R* correlation coefficients are generally larger upon admitting arbitrary nonlinearities; (iii) max(|R*ij|, |R*ji|) measures (nonlinear) dependence.
For example, let x=1:20 and y=sin(x). This y has a perfect (100 percent) nonlinear dependence on x, and yet the Pearson correlation coefficient r(x,y) = -0.0948372 is near zero, and its 95% confidence interval (-0.516, 0.363) includes zero, implying that the population r(x,y) is not significantly different from zero. This example highlights a serious failure of the traditional r(x,y) in measuring dependence between x and y when nonlinearities are present.
gmcmtx0 without blocking does work if x=1:n and y=f(x)=sin(x) with n<20. For larger n, however, the fixed bandwidth used by the kern function becomes a problem. The block version has additional bandwidths for each block, and hence it correctly quantifies the presence of high dependence even when x=1:n and y=f(x) are defined for large n and complicated nonlinear functional forms f(x).
A non-symmetric R* matrix of generalized correlation coefficients
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in 'Handbook of Statistics: Computational Statistics with R', Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New exogeneity tests and causal paths,' Chapter 2 in 'Handbook of Statistics: Conceptual Econometrics Using R', Vol.41, co-editors: H. D. Vinod and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2019, pp. 33-64.
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). 'Generalized measures of correlation for asymmetry, nonlinearity, and beyond,' Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
## Not run:
x=1:20; y=sin(x)
gmcmtxBlk(cbind(x,y),blksiz=10)
## End(Not run)
This function checks for missing data separately for each pair and uses the kern function to kernel regress x on y, and conversely y on x. It needs the package ‘np’, which reports the R-squares of each regression. This function reports their square roots with the sign of the Pearson correlation coefficients. Its appeal is that it is asymmetric, yielding causal direction information. It avoids the assumption of linearity implicit in the usual correlation coefficients.
gmcmtxZ(mym, nam = colnames(mym))
mym | A matrix of data on variables in columns |
---|---|
nam | Column names of the variables in the data matrix |
A non-symmetric R* matrix of generalized correlation coefficients
This allows the user to modify gmcmtx0 and experiment further with my code.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
gmcmtxZ(x)
## End(Not run)
This function uses the ‘np’ package and assumes that there are no missing data.
gmcxy_np(x, y)
x | Vector of x data |
---|---|
y | Vector of y data |
corxy | r*(x\|y) from regressing x on y, where y is the kernel cause |
---|---|
coryx | r*(y\|x) from regressing y on x, where x is the kernel cause |
This is provided in case the user wants to avoid calling kern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R,' Chapter 4 in 'Handbook of Statistics: Computational Statistics with R,' Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
## Not run:
set.seed(34); x=sample(1:10); y=sample(2:11)
gmcxy_np(x,y)
## End(Not run)
Function to run a heuristic t test of the difference between two generalized correlations.
heurist(rxy, ryx, n)
rxy | Generalized correlation r*(x\|y), where y is the kernel cause |
---|---|
ryx | Generalized correlation r*(y\|x), where x is the kernel cause |
n | Sample size needed to determine the degrees of freedom for the t test |
Prints the t statistics and p-values.
This function requires Revelle's R package ‘psych’ to be loaded. This test is known to be conservative (i.e., it often fails to reject the null hypothesis of zero difference between the two generalized correlation coefficients).
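As a rough companion to heurist(), a familiar way to compare two correlation coefficients is Fisher's z transformation. This is a swapped-in textbook test, not necessarily what heurist() or the psych package computes here: fisherZdiff() is hypothetical and treats the two coefficients as independent, which r*(x|y) and r*(y|x) from the same sample are not, so its p-value is only heuristic.

```r
# Fisher's z comparison of two correlations, each based on n observations,
# treated as independent (an assumption that does not hold for r*(x|y), r*(y|x)).
fisherZdiff <- function(r1, r2, n) {
  z <- (atanh(r1) - atanh(r2)) / sqrt(2 / (n - 3))  # standardized difference
  p <- 2 * pnorm(-abs(z))                           # two-sided p-value
  c(z = z, p = p)
}

res <- fisherZdiff(0.9, 0.5, n = 30)
res
```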
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
set.seed(34); x=sample(1:10); y=sample(2:11)
g1=gmcxy_np(x,y)
n=length(x)
h1=heurist(g1$corxy,g1$coryx,n)
print(h1)
print(h1$t) # t statistic
print(h1$p) # p-value
Function to run kernel regression with options for residuals and gradients, assuming no missing data.
kern(dep.y, reg.x, tol = 0.1, ftol = 0.1, gradients = FALSE, residuals = FALSE)
dep.y | Data on the dependent (response) variable |
---|---|
reg.x | Data on the regressor (stimulus) variables |
tol | Tolerance on the position of located minima of the cross-validation function (default 0.1) |
ftol | Fractional tolerance on the value of the cross-validation function evaluated at local minima (default 0.1) |
gradients | Set to TRUE if gradient computations are desired |
residuals | Set to TRUE if residuals are desired |
Creates a model object ‘mod’ containing the entire kernel regression output. Type names(mod) to reveal the variety of outputs produced by ‘npreg’ of the ‘np’ package. The user can access all of them at will by using the dollar notation of R.
This is a workhorse for causal identification.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See kern_ctrl.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:50],ncol=2)
require(np); options(np.messages=FALSE)
k1=kern(x[,1],x[,2])
print(k1$R2) # prints the R square of the kernel regression
## End(Not run)
Allowing matrix input of control variables, this function runs kernel regression with options for residuals and gradients.
kern_ctrl( dep.y, reg.x, ctrl, tol = 0.1, ftol = 0.1, gradients = FALSE, residuals = FALSE )
dep.y | Data on the dependent (response) variable |
---|---|
reg.x | Data on the regressor (stimulus) variable |
ctrl | Data matrix on the control variable(s) kept outside the causal paths. A constant vector is not allowed as a control variable |
tol | Tolerance on the position of located minima of the cross-validation function (default 0.1) |
ftol | Fractional tolerance on the value of the cross-validation function evaluated at local minima (default 0.1) |
gradients | Set to TRUE if gradient computations are desired |
residuals | Set to TRUE if residuals are desired |
Creates a model object ‘mod’ containing the entire kernel regression output. If this function is called as mod=kern_ctrl(x,y,ctrl=z), the researcher can simply type names(mod) to reveal the large variety of outputs produced by ‘npreg’ of the ‘np’ package. The user can access all of them at will using the dollar notation of R.
This is a workhorse for causal identification.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See kern.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:50],ncol=5)
require(np)
k1=kern_ctrl(x[,1],x[,2],ctrl=x[,4:5])
print(k1$R2) # prints the R square of the kernel regression
## End(Not run)
Kernel regression, version 2, with optional residuals and gradients, using regtype="ll" for local linear regression and bwmethod="cv.aic" for AIC-based bandwidth selection.
kern2( dep.y, reg.x, tol = 0.1, ftol = 0.1, gradients = FALSE, residuals = FALSE )
dep.y | Data on the dependent (response) variable |
---|---|
reg.x | Data on the regressor (stimulus) variables |
tol | Tolerance on the position of located minima of the cross-validation function (default 0.1) |
ftol | Fractional tolerance on the value of the cross-validation function evaluated at local minima (default 0.1) |
gradients | Set to TRUE if gradient computations are desired |
residuals | Set to TRUE if residuals are desired |
Creates a model object ‘mod’ containing the entire kernel regression output. Type names(mod) to reveal the variety of outputs produced by ‘npreg’ of the ‘np’ package. The user can access all of them at will by using the dollar notation of R.
This is version 2 ("ll", "cv.aic") of a workhorse for causal identification.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See kern_ctrl.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:50],ncol=2)
require(np); options(np.messages=FALSE)
k1=kern2(x[,1],x[,2])
print(k1$R2) # prints the R square of the kernel regression
## End(Not run)
Kernel regression, version 2, with control variables and optional residuals and gradients, using regtype="ll" for local linear regression and bwmethod="cv.aic" for AIC-based bandwidth selection.
kern2ctrl( dep.y, reg.x, ctrl, tol = 0.1, ftol = 0.1, gradients = FALSE, residuals = FALSE )
dep.y | Data on the dependent (response) variable |
---|---|
reg.x | Data on the regressor (stimulus) variable |
ctrl | Data matrix on the control variable(s) kept outside the causal paths. A constant vector is not allowed as a control variable |
tol | Tolerance on the position of located minima of the cross-validation function (default 0.1) |
ftol | Fractional tolerance on the value of the cross-validation function evaluated at local minima (default 0.1) |
gradients | Set to TRUE if gradient computations are desired |
residuals | Set to TRUE if residuals are desired |
Creates a model object ‘mod’ containing the entire kernel regression output. If this function is called as mod=kern2ctrl(x,y,ctrl=z), the researcher can simply type names(mod) to reveal the large variety of outputs produced by ‘npreg’ of the ‘np’ package. The user can access all of them at will using the dollar notation of R.
This is version 2 ("ll", "cv.aic") of a workhorse for causal identification.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See kern.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:50],ncol=5)
require(np)
k1=kern2ctrl(x[,1],x[,2],ctrl=x[,4:5])
print(k1$R2) # prints the R square of the kernel regression
## End(Not run)
Uses Vinod (2015) and runs kernel regression of x on y, and also of y on x by using the ‘np’ package. The function goes on to compute a summary magnitude of the overall approximate partial derivative dx/dy (and dy/dx), after adjusting for units by using an appropriate ratio of standard deviations. Of course, the real partial derivatives of nonlinear functions are generally distinct for each observation.
mag(x, y)
x | Vector of data on the dependent variable |
---|---|
y | Vector of data on the regressor |
A vector of two magnitudes of kernel regression partials, dx/dy and dy/dx.
This function is intended for use only after the direction of the causal path is already determined by various functions in this package (e.g., somePairs). For example, if the researcher knows that x causes y, then only dy/dx, denoted by dydx, is relevant, and the other output, dxdy, is to be ignored. Similarly, only ‘dxdy’ is relevant if y is known to be the cause of x.
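To see why a ratio of standard deviations converts a correlation into a slope-like magnitude, consider the linear special case. In this sketch, magLinear() is hypothetical and uses lm() instead of kernel regression; there the signed square root of R^2 equals Pearson's r, and r times sd(y)/sd(x) is exactly the OLS slope of y on x.

```r
# Linear stand-in for the mag() idea (the package uses np kernel R-squares).
magLinear <- function(x, y) {
  r <- sign(cor(x, y)) * sqrt(summary(lm(y ~ x))$r.squared)  # equals cor(x, y)
  c(dxdy = r * sd(x) / sd(y), dydx = r * sd(y) / sd(x))
}

set.seed(123)
x <- sample(1:10)
y <- 1 + 2 * x + rnorm(10)
m <- magLinear(x, y)
m  # dydx matches the OLS slope of y on x; dxdy matches the slope of x on y
```

With kernel regressions the two R-squares differ, which is what makes the package's dxdy and dydx asymmetric.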
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See mag_ctrl.
set.seed(123); x=sample(1:10); y=1+2*x+rnorm(10)
mag(x,y) # dxdy approx=.5 and dydx approx=2 will be nice.
Uses Vinod (2015) and runs the kernel regressions x ~ y + ctrl and x ~ ctrl to evaluate the ‘incremental change’ in R-squares. Let (rxy;ctrl) denote the square root of that ‘incremental change’ after its sign is made the same as that of the Pearson correlation coefficient from cor(x,y). One can interpret (rxy;ctrl) as a generalized partial correlation coefficient when x is regressed on y after removing the effect of the control variable(s) in ctrl. It is more general than the usual partial correlation coefficient, since it allows for nonlinear relations among variables. Next, the function computes ‘dxdy’ by multiplying (rxy;ctrl) by the ratio of standard deviations, sd(x)/sd(y). Now our ‘dxdy’ approximates the magnitude of the partial derivative (dx/dy) in a causal model where y is the cause and x is the effect. The function also reports the entirely analogous ‘dydx’, obtained by interchanging x and y.
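The recipe above can be traced step by step in a linear sketch. magCtrlLinear() is hypothetical and replaces the np kernel regressions with lm(), so it captures only linear relations; the incremental R-square, the sign from cor(x,y), and the sd(x)/sd(y) rescaling follow the description.

```r
# Linear stand-in for the mag_ctrl() recipe (the package uses np kernel fits).
magCtrlLinear <- function(x, y, ctrl) {
  ctrl <- as.matrix(ctrl)
  inc <- function(a, b) {                   # R^2(a ~ b + ctrl) - R^2(a ~ ctrl)
    full <- summary(lm(a ~ b + ctrl))$r.squared
    base <- summary(lm(a ~ ctrl))$r.squared
    max(0, full - base)                     # guard against tiny negative rounding
  }
  s   <- sign(cor(x, y))                    # sign from the Pearson correlation
  rxy <- s * sqrt(inc(x, y))                # (rxy;ctrl)
  ryx <- s * sqrt(inc(y, x))                # (ryx;ctrl)
  c(dxdy = rxy * sd(x) / sd(y), dydx = ryx * sd(y) / sd(x))
}

set.seed(123)
x <- sample(1:10); z <- runif(10); y <- 1 + 2 * x + 3 * z + rnorm(10)
res <- magCtrlLinear(x, y, z)
res
```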
mag_ctrl(x, y, ctrl)
x | Vector of data on the dependent variable |
---|---|
y | Vector of data on the regressor |
ctrl | Data matrix for designated control variable(s) outside causal paths. A constant vector is not allowed as a control variable |
vector of two magnitudes ‘dxdy’ (effect when x is regressed on y) and ‘dydx’ for reverse regression. Both regressions remove the effect of control variable(s).
This function is intended for use only after the causal path direction is already determined by various functions in this package (e.g., someCPairs), that is, after the researcher knows whether x causes y or vice versa. The output of this function is a vector of two numbers, (dxdy, dydx), in that order, representing the magnitude of the effect of one variable on the other. We expect the researcher to use only ‘dxdy’ if y is the known cause, or ‘dydx’ if x is the cause. These approximate overall measures may not be well-defined in some applications, because the real partial derivatives of nonlinear functions are generally distinct for each evaluation point.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See mag
set.seed(123); x=sample(1:10); z=runif(10); y=1+2*x+3*z+rnorm(10)
options(np.messages=FALSE)
mag_ctrl(x,y,z) # dx/dy=0.47 is approximately 0.5, but dy/dx=1.41 is not approximately 2
Function to compute the minor of a matrix, i.e. the submatrix obtained by deleting row r and column c.
minor(x, r, c)
x | The input matrix |
r | The row number |
c | The column number |
The appropriate ‘minor’ matrix defined from the input matrix.
This function is needed by the cofactor function.
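The sketch below builds a minor and its cofactor in a few lines of base R. The names minor_sketch() and cofactor_sketch() are hypothetical stand-ins, not the package's minor() and cofactor(). A Laplace expansion along the first row serves as a check.

```r
# Hypothetical re-implementations for illustration; not the package's code.
minor_sketch <- function(x, r, c) {
  x[-r, -c, drop = FALSE]          # delete row r and column c
}
cofactor_sketch <- function(x, r, c) {
  (-1)^(r + c) * det(minor_sketch(x, r, c))
}

A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3, byrow = TRUE)
# Laplace expansion along row 1 should reproduce det(A)
laplace <- sum(sapply(1:3, function(j) A[1, j] * cofactor_sketch(A, 1, j)))
print(c(laplace = laplace, det = det(A)))
print(dim(minor_sketch(matrix(1:20, ncol = 4), 1, 2)))
```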
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
x=matrix(1:20,ncol=4)
minor(x,1,2)
## End(Not run)
The first step computes the mean, standard deviation (sd), skewness (skew), kurtosis (kurt), and the Sharpe ratio (mean/sd). The Sharpe ratio represents risk-adjusted return, with sd measuring the risk. The input mtx must be a matrix with n rows of data and p columns (column names are recommended). If data are missing for some columns, insert NA's. Thus mtx holds p columns of data ready for comparison and ranking, for example a matrix of stock returns. The output matrix produced by this function has one column for each data column (i.e. for each stock being compared) and twelve rows. The top five rows have the magnitudes of the mean, sd, skew, kurt and Sharpe ratios. Rows 6 to 10 have the respective ranks of these moment statistics. The 11-th row reports a weighted sum of ranks with the following default weights: mean=1, sd=-1, skew=0.5, kurt=-0.5, Sharpe ratio=1. The user has the option to change the weights, which measure relative importance.
momentVote(mtx, weight = c(1, -1, 0.5, -0.5, 1))
mtx | n by p matrix of data, for example n returns for each of p stocks. The mtx columns should have names (ticker symbols). |
weight | Vector of reliability weights. Default: mean=1, sd=-1, skew=0.5, kurt=-0.5, sharpe=1 |
Skewness and kurtosis are measured less reliably, because the higher powers give them greater sampling variation, so their weights are 0.5 in absolute value. Our ranking gives the smallest number, 1, to the most desirable outcome. The 11-th row of the output matrix has the weighted sum of ranks. We suggest giving a higher portfolio weight to the column having the smallest value in that row. The 12-th row of the output matrix has the ‘choice’ numbers implied by the input weights: choice=1 marks the top choice column of data, choice=2 the next, and so on. The (p+1)-th column of the output matrix has the chosen weights. The argument weight of the ‘momentVote’ function allows one to change these weights.
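The voting scheme can be sketched in base R as follows. This is a hypothetical re-implementation, not momentVote() itself. It assumes rank 1 goes to the largest value of each statistic, so the signs in ‘weight’ decide whether a large value is desirable. It also uses simple moment-based skewness and kurtosis formulas, which may differ from the package's.

```r
# Hypothetical sketch of weighted rank voting (not the package's code).
moment_vote_sketch <- function(mtx, weight = c(1, -1, 0.5, -0.5, 1)) {
  skew <- function(v) { v <- v[!is.na(v)]; d <- v - mean(v); mean(d^3) / mean(d^2)^1.5 }
  kurt <- function(v) { v <- v[!is.na(v)]; d <- v - mean(v); mean(d^4) / mean(d^2)^2 }
  mu <- apply(mtx, 2, mean, na.rm = TRUE)
  s  <- apply(mtx, 2, sd, na.rm = TRUE)
  stats <- rbind(mean = mu, sd = s, skew = apply(mtx, 2, skew),
                 kurt = apply(mtx, 2, kurt), sharpe = mu / s)
  # Assumed convention: rank 1 goes to the largest value of each statistic;
  # the signs in 'weight' then decide whether a large value is good or bad.
  ranks <- t(apply(stats, 1, function(r) rank(-r)))
  rownames(ranks) <- paste0("rank_", rownames(stats))
  score <- colSums(weight * ranks)              # smaller weighted sum is better
  choice <- rank(score, ties.method = "first")  # choice = 1 is the top pick
  rbind(stats, ranks, score = score, choice = choice)
}

x1 <- c(1, 4, 7, 2, 6)
x2 <- c(3, 4, 8, 4, 7)
out <- moment_vote_sketch(cbind(x1, x2))
print(round(out, 3))
```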
A matrix with twelve rows whose first p columns correspond to the columns of the input matrix mtx. The top five rows have moment quantities and the next five have their ranks. The 11-th row has the weighted sum of ranks with the input weights (see default), and the 12-th row has choice numbers (choice=1 is best).
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
x1=c(1,4,7,2,6)
x2=c(3,4,8,4,7)
momentVote(cbind(x1,x2))
The aim in pair-wise deletions is to retain the largest number of available data pairs with all non-missing data.
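Pairwise deletion of this kind can be reproduced with base R's complete.cases(). The sketch below illustrates the idea and is not napair() itself.

```r
# Pairwise deletion with base R's complete.cases(); a sketch, not napair() itself.
x <- c(1, NA, 3, 4, 5)
y <- c(2, 4, NA, 8, 10)
ok <- complete.cases(x, y)   # TRUE only where both x and y are observed
newx <- x[ok]
newy <- y[ok]
print(list(newx = newx, newy = newy))
```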
napair(x, y)
x | Vector of x data |
y | Vector of y data |
newx | A new vector x after removing pairwise missing data |
newy | A new vector y after removing pairwise missing data |
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
x=sample(1:10); y=sample(1:10); x[2]=NA; y[3]=NA
napair(x,y)
## End(Not run)
The aim in three-way deletions is to retain the largest number of available data triplets with no missing data. This works where naTriplet fails (e.g. in parcorVecH()). This function is called by parcorHijk.
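The base R sketch below shows triplet-wise deletion when z is a matrix, using complete.cases(). It illustrates the idea and is not naTriple() itself.

```r
# Triplet-wise deletion where z may be a matrix; a sketch, not naTriple() itself.
x <- c(1, NA, 3, 4, 5, 6)
y <- c(2, 3, NA, 5, 6, 7)
z <- cbind(a = c(1, 2, 3, NA, 5, 6), b = 6:1)
ok <- complete.cases(x, y, z)   # row kept only if x, y and every column of z are observed
newx <- x[ok]
newy <- y[ok]
newz <- z[ok, , drop = FALSE]   # keep the matrix shape even if one row remains
print(cbind(newx, newy, newz))
```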
naTriple(x, y, z)
x | Vector of x data |
y | Vector of y data |
z | Vector or matrix of additional variable(s) |
newx | A new vector x after removing triplet-wise missing data |
newy | A new vector y after removing triplet-wise missing data |
newz | A new vector or matrix z after removing triplet-wise missing data |
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
x=sample(1:10); y=sample(1:10); x[2]=NA; y[3]=NA
w=sample(2:11)
naTriple(x,y,w)
## End(Not run)
The aim in three-way deletions is to retain the largest number of available data triplets with no missing data.
naTriplet(x, y, ctrl)
x | Vector of x data |
y | Vector of y data |
ctrl | Data matrix on the control variable(s) kept outside causal path determinations |
newx | A new vector x after removing triplet-wise missing data |
newy | A new vector y after removing triplet-wise missing data |
newctrl | A new vector or matrix ctrl after removing triplet-wise missing data |
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See napair.
## Not run:
x=sample(1:10); y=sample(1:10); x[2]=NA; y[3]=NA
w=sample(2:11)
naTriplet(x,y,w)
## End(Not run)
This is an auxiliary function for ‘gmcmtxBlk.’ It uses two numerical vectors (x, y) of the same length to create two vectors (xhat, yhat) of fitted values using nonlinear kernel regressions. It uses the package ‘np’, called by the kern function, to kernel regress x on y and, conversely, y on x. It uses the option ‘residuals=TRUE’ of ‘kern’.
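The fitted values can also be illustrated without the np package, using a Nadaraya-Watson kernel regression written in base R. This is a swapped-in technique with a Gaussian kernel and a hand-picked bandwidth h=2. The fixed bandwidth is an assumption, since np selects bandwidths from the data. nw_fit() is a hypothetical helper, not part of the package.

```r
# Nadaraya-Watson kernel regression, Gaussian kernel, fixed bandwidth h
# (assumption: np would choose h from the data). Hypothetical helper.
nw_fit <- function(target, regressor, h) {
  # w[i, j] = K((regressor[i] - regressor[j]) / h)
  w <- outer(regressor, regressor, function(a, b) dnorm((a - b) / h))
  as.vector(w %*% target / rowSums(w))   # weighted average of target at each point
}

set.seed(34); x <- sample(1:15); y <- sample(1:15)
xhat <- nw_fit(x, y, h = 2)   # x kernel-regressed on y
yhat <- nw_fit(y, x, h = 2)   # y kernel-regressed on x
print(round(cbind(x, xhat, y, yhat), 2))
```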
NLhat(x, y)
x | A column vector of x data |
y | A column vector of y data |
two vectors named xhat and yhat for fitted values
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
See Also gmcmtxBlk.
## Not run:
set.seed(34); x=sample(1:15); y=sample(1:15)
NLhat(x,y)
## End(Not run)
This function randomly leaves out 5 percent (‘pctOut’=5 by default) of the data. Using the data on the remaining 95 percent, it finds the portfolio choice of seven different portfolio selection algorithms.
The randomization is intended to remove the bias inherent in time-series definitions of ‘out-of-sample’ data.
For example, the input to outOFsamp(.), named ‘mtx’, is a matrix with p columns for p stocks and n rows of returns. Also, let the maximum number of stocks admitted to the portfolio be four, or ‘maxChosen=4’.
The outOFsamp function then computes the returns earned by the seven portfolio selection algorithms, called "SD1", "SD2", "SD3", "SD4", "SDAll4", "decile," and "moment," where SDAll4 refers to a weighted sum of the SD1 to SD4 algorithms. Each algorithm provides a choice ranking of the p stocks with choice values 1,2,3,...,p. The stock ranked 1 should get the highest portfolio weight.
The outOFsamp function then calls the function ‘rank2return,’ which uses these choice numbers to allocate capital to the selected ‘maxChosen’ stocks. The allocation is linearly declining. For example, with maxChosen=4 it is 4/10, 3/10, 2/10, and 1/10, with the top choice stock receiving 4/10 of the capital.
Each random choice of ‘pctOut’ rows of the ‘mtx’ data yields an out-of-sample return for each of the seven portfolio selection algorithms. These out-of-sample return computations are repeated ‘reps’ times.
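The linearly declining allocation can be sketched as follows. declining_weights() and portfolio_return() are hypothetical helpers, not rank2return(). The sketch computes the return for a single out-of-sample row only, since how rank2return() aggregates over the left-out rows is not described here.

```r
# Linearly declining allocation for the top 'm' choices
# (4/10, 3/10, 2/10, 1/10 when m = 4). Hypothetical helpers.
declining_weights <- function(m) (m:1) / sum(1:m)

portfolio_return <- function(ret_row, choice, m) {
  top <- order(choice)[seq_len(m)]     # columns with choice numbers 1..m
  sum(declining_weights(m) * ret_row[top])
}

print(declining_weights(4))
print(portfolio_return(c(1, 2, 3), choice = c(3, 1, 2), m = 2))
```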
A new random selection of ‘pctOut’ rows of data (at least two rows) is made for each repetition. We set reps=10 by default. The low default saves processing time in early phases, but we recommend reps=100+.
The final choice of stock-picking algorithm out of the seven is suggested by the one yielding the largest average out-of-sample return over the ‘reps’ repetitions. Its standard deviation measures the variability of performance over the ‘reps’ repetitions.
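The random hold-out procedure can be outlined in base R. This is a hypothetical sketch, not outOFsamp(). A single stand-in ranking rule (in-sample mean return) replaces the seven algorithms. The number of left-out rows is rounded from pctOut and floored at two, which is an assumption. The weights decline linearly as described above.

```r
# Hypothetical sketch of the random hold-out idea; NOT outOFsamp() itself.
random_holdout <- function(mtx, pctOut = 5, reps = 10, seed = 23, maxChosen = 2) {
  set.seed(seed)
  n <- nrow(mtx)
  nOut <- max(2, round(n * pctOut / 100))        # leave out at least two rows
  w <- (maxChosen:1) / sum(1:maxChosen)          # linearly declining weights
  avg <- numeric(reps)
  for (r in seq_len(reps)) {
    idx <- sample(n, nOut)                       # fresh random rows each repetition
    inMean <- colMeans(mtx[-idx, , drop = FALSE])
    top <- order(-inMean)[seq_len(maxChosen)]    # stand-in ranking: best in-sample mean
    avg[r] <- mean(mtx[idx, top, drop = FALSE] %*% w)
  }
  c(mean = mean(avg), sd = sd(avg))
}

set.seed(1)
toy <- matrix(rnorm(200, mean = 0.01, sd = 0.05), ncol = 4)
print(random_holdout(toy, reps = 20))
```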
outOFsamp(mtx, pctOut = 5, reps = 10, seed = 23, maxChosen = 2, verbo = FALSE)
mtx | Matrix of size n by p, with n returns on each of p stocks |
pctOut | Percent of the n rows randomly chosen to be left out as out-of-sample, default=5 percent. At least two rows of data must be left out. |
reps | Number of random repetitions of left-out rows over which the out-of-sample performance of a stock-picking algorithm is averaged, default reps=10 |
seed | Seed for random number generation, default=23 |
maxChosen | Number of stocks (out of p) with nonzero weights in the portfolio, default=2 |
verbo | Logical; TRUE means print details, default=FALSE |
A matrix called ‘avgRet’ with seven columns for the seven stock-picking algorithms "SD1", "SD2", "SD3", "SD4", "SDAll4", "decile", and "moment". It contains out-of-sample average returns for a linearly declining allocation in a portfolio. The user needs to change rank2return() for alternative portfolio allocations.
The traditional time-series out-of-sample approach leaves out the last few time periods and estimates the stock-picking model using the earlier time periods. The COVID-19 pandemic revealed that the traditional out-of-sample approach can have a severe bias in favor of pessimistic stock-picking algorithms. The traditional method is fundamentally flawed, since it is sensitive to the trends (ups and downs) in the out-of-sample period. The random selection of left-out rows proposed here is designed to avoid such biases. The stock-picking algorithm recommended by outOFsamp() is therefore expected to be robust against them.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
x1=c(2,5,6,9,13,18,21,5,11,14,4,7,12,13,6,3,8,1,15,2,10,9)
x2=c(3,6,9,12,14,19,27,9,11,2,3,8,1,6,15,10,13,14,5,7,4,12)
x3=c(2,6,NA,11,13,25,25,11,9,10,12,6,4,3,2,1,7,8,5,15,14,13)
mtx=cbind(x1,x2,x3)
mtx=mtx[complete.cases(mtx),]
os=outOFsamp(mtx, verbo=FALSE, maxChosen=2, reps=3)
apply(os,2,mean)
## End(Not run)
This function randomly leaves out 5 percent (‘pctOut’=5 by default) of the data. Using the data on the remaining 95 percent, it finds which stocks to sell according to seven different portfolio selection algorithms.
The randomization is intended to remove the bias inherent in time-series definitions of ‘out-of-sample’ data.
For example, the input to outOFsell(.), named ‘mtx’, is a matrix with p columns for p stocks and n rows of returns. Also, let the maximum number of stocks admitted to the sell portfolio be four, or ‘maxChosen=4’.
The outOFsell function then computes the returns earned by the seven portfolio selection algorithms, called "SD1", "SD2", "SD3", "SD4", "SDAll4", "decile," and "moment," where SDAll4 refers to a weighted sum of the SD1 to SD4 algorithms. Each algorithm provides a choice ranking of the p stocks with choice values 1,2,3,...,p. The stock ranked p (the worst, to be sold first) should get the highest weight in the sell portfolio.
The outOFsell function then calls the function ‘rank2sell,’ which uses these choice numbers to allocate the capital designated for selling to the selected ‘maxChosen’ stocks. The allocation is linearly declining. For example, it is 1/10, 2/10, 3/10, and 4/10. The worst-return stock (the top choice for selling) receives the highest proportion, 4/10, of the capital designated for selling.
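The sketch below shows the sell-side weights. It is hypothetical and not rank2sell() itself. The stocks with the largest choice numbers are selected, and the worst one receives the largest share.

```r
# Hypothetical sketch of sell-side weights; not rank2sell() itself.
sell_weights <- function(choice, m) {
  w <- numeric(length(choice))
  worst <- order(choice, decreasing = TRUE)[seq_len(m)]  # choice p comes first
  w[worst] <- (m:1) / sum(1:m)                           # worst stock gets the most
  w
}

print(sell_weights(c(2, 4, 1, 3), m = 2))   # stocks 2 and 4 are sold
```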
Each random choice of ‘pctOut’ rows of the ‘mtx’ data yields an out-of-sample return for each of the seven portfolio selection algorithms. These out-of-sample return computations are repeated ‘reps’ times.
A new random selection of ‘pctOut’ rows of data (at least two rows) is made for each repetition. We set reps=10 by default. The low default saves processing time in early phases, but we recommend reps=100+.
The final choice of stock-selling algorithm out of the seven is suggested by the average out-of-sample return over the ‘reps’ repetitions.
This function is the sell version of outOFsamp().
outOFsell(mtx, pctOut = 5, reps = 10, seed = 23, maxChosen = 2, verbo = FALSE)
mtx | Matrix of size n by p, with n returns on each of p stocks |
pctOut | Percent of the n rows randomly chosen to be left out as out-of-sample, default=5 percent. At least two rows of data must be left out. |
reps | Number of random repetitions of left-out rows over which the out-of-sample performance of a stock-picking algorithm is averaged, default reps=10 |
seed | Seed for random number generation, default=23 |
maxChosen | Number of stocks (out of p) with nonzero weights in the portfolio, default=2 |
verbo | Logical; TRUE means print details, default=FALSE |
A matrix called ‘avgRet’ with seven columns for the seven stock-picking algorithms "SD1", "SD2", "SD3", "SD4", "SDAll4", "decile", and "moment". It contains out-of-sample average returns for a linearly declining allocation in a portfolio. The user needs to change rank2sell() for alternative portfolio allocations.
The traditional time-series out-of-sample approach leaves out the last few time periods and estimates the stock-picking model using the earlier time periods. The COVID-19 pandemic revealed that the traditional out-of-sample approach can have a severe bias in favor of pessimistic stock-picking algorithms. The traditional method is fundamentally flawed, since it is sensitive to the trends (ups and downs) in the out-of-sample period. The random selection of left-out rows proposed here is designed to avoid such biases. The algorithm recommended by outOFsell() is therefore expected to be robust against them.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
x1=c(2,5,6,9,13,18,21,5,11,14,4,7,12,13,6,3,8,1,15,2,10,9)
x2=c(3,6,9,12,14,19,27,9,11,2,3,8,1,6,15,10,13,14,5,7,4,12)
x3=c(2,6,NA,11,13,25,25,11,9,10,12,6,4,3,2,1,7,8,5,15,14,13)
mtx=cbind(x1,x2,x3)
mtx=mtx[complete.cases(mtx),]
os=outOFsell(mtx, verbo=FALSE, maxChosen=2, reps=3)
apply(os,2,mean)
## End(Not run)
Panel data have a set of time series for each entity (e.g. country), arranged so that all time series data for one entity are together. The data for the second entity should be below the entire data for the first entity. When a variable is lagged twice, special care is needed to insert NA's for the first two time points (e.g. weeks) of each entity (country).
Panel2Lag(ID, xj)
ID | Column (vector) of time identities (e.g. the week number) |
xj | Data on the variable to be lagged, linked to ID |
Vector containing the values of xj lagged twice (lag 2).
This function is provided for convenient user modifications.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
A more general function, PanelLag, has examples.
Panel data have a set of time series for each entity (e.g. country), arranged so that all time series data for one entity are together. The data for the second entity are below the entire data for the first entity, and so on for all entities. In such a data setup, when a variable is lagged once, special care is needed to insert an NA for the first time point (e.g. week) of each entity. A lag of order ‘lag’ similarly needs ‘lag’ NA's at the start of each entity.
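The base R sketch below lags a variable within entities. It is not PanelLag() itself, and panel_lag_sketch() is a hypothetical name. It assumes that a new entity starts wherever the time identity ID stops increasing, e.g. when month goes from 12 back to 1.

```r
# Hypothetical sketch of lagging within entities; not PanelLag() itself.
panel_lag_sketch <- function(ID, xj, lag = 1) {
  # Assumption: a new entity starts wherever the time identity stops increasing
  entity <- cumsum(c(TRUE, diff(ID) <= 0))
  # Within each entity, shift values down by 'lag' and pad the start with NA
  ave(xj, entity, FUN = function(v) c(rep(NA, lag), head(v, -lag))[seq_along(v)])
}

month <- rep(1:3, 2)
cost  <- c(10, 11, 12, 20, 21, 22)
print(panel_lag_sketch(month, cost, lag = 1))
print(panel_lag_sketch(month, cost, lag = 2))
```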
PanelLag(ID, xj, lag = 1)
ID | Column (vector) of time identities (e.g. week number). |
xj | Data vector of the variable to be lagged, linked with the ID. |
lag | Number of lags desired (lag=1 is the default). |
Vector containing the values of variable xj lagged by ‘lag’ time points (one by default).
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
indiv=gl(6,12,labels=LETTERS[1:6]) # creates A repeated 12 times, then B 12 times, etc.
set.seed(99); cost=sample(30:90, 72, replace=TRUE)
revenu=sample(50:110, 72, replace=TRUE); month=rep(1:12,6)
df=data.frame(indiv,month,cost,revenu); head(df); tail(df)
L2cost=PanelLag(ID=month, xj=df[,'cost'], lag=2)
head(L2cost)
tail(L2cost)
gmcmtx0(cbind(revenu,cost,L2cost))
gmcxy_np(revenu,cost)
## End(Not run)
This function uses data on two column vectors, xi, xj and a third xk which can be a vector or a matrix, usually of the remaining variables in the model, including control variables, if any. It first removes missing data from all input variables. Then, it computes residuals of kernel regression (xi on xk) and (xj on xk). The function reports the generalized correlation between two kernel residuals. This version avoids ridge type adjustment present in an older version.
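The residual idea can be checked in its linear form. The sketch below uses OLS residuals and Pearson correlation in place of kernel residuals and generalized correlation. It therefore returns one symmetric number rather than the pair (ouij, ouji). For a single control, it matches the textbook partial correlation formula. partial_via_residuals() is a hypothetical name.

```r
# Linear analogue of the residual approach (assumption: OLS residuals and
# Pearson correlation replace kernel residuals and generalized correlation).
partial_via_residuals <- function(xi, xj, xk) {
  ok <- complete.cases(xi, xj, xk)               # drop rows with any missing value
  xk <- as.matrix(xk)[ok, , drop = FALSE]
  ei <- resid(lm(xi[ok] ~ xk))                   # part of xi not explained by xk
  ej <- resid(lm(xj[ok] ~ xk))                   # part of xj not explained by xk
  cor(ei, ej)
}

set.seed(34)
x <- matrix(sample(1:600)[1:99], ncol = 3)
p <- partial_via_residuals(x[, 1], x[, 2], x[, 3])
r <- cor(x)
textbook <- (r[1, 2] - r[1, 3] * r[2, 3]) / sqrt((1 - r[1, 3]^2) * (1 - r[2, 3]^2))
print(c(residual_based = p, textbook = textbook))
```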
parcor_ijk(xi, xj, xk)
xi | Input vector of data for variable xi |
xj | Input vector of data for variable xj |
xk | Input data for variables in xk, usually control variables |
ouij | Generalized partial correlation Xi with Xj (=cause) after removing xk |
ouji | Generalized partial correlation Xj with Xi (=cause) after removing xk |
allowing for control variables.
This function calls kern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
See parcor_linear.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
options(np.messages=FALSE)
parcor_ijk(x[,1], x[,2], x[,3])
## End(Not run)
This function uses a generalized correlation matrix R* as input to compute generalized partial correlations between Xi and Xj, where j can be any one of the remaining variables. The computation removes the effect of all other variables in the matrix.
The user is encouraged to remove all known irrelevant rows and columns from the R* matrix before submitting it to this function.
parcor_ijkOLD(x, i, j)
x | Input p by p matrix R* of generalized correlation coefficients. |
i | A column number identifying the first variable. |
j | A column number identifying the second variable. |
ouij | Partial correlation Xi with Xj (=cause) after removing all other X's |
ouji | Partial correlation Xj with Xi (=cause) after removing all other X's |
myk | A list of column numbers whose effect has been removed |
This function calls minor and cofactor, and is called by parcor_ridg.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
gm1=gmcmtx0(x)
parcor_ijkOLD(gm1, 2,3)
## End(Not run)
This function uses a symmetric correlation matrix R as input to compute the usual partial correlations between Xi and Xj, where j can be any one of the remaining variables. The computation removes the effect of all other variables in the matrix.
The user is encouraged to remove all known irrelevant rows and columns from the R matrix before submitting it to this function.
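The cofactor route to partial correlations can be sketched in base R as r(i,j | rest) = -C[i,j]/sqrt(C[i,i] C[j,j]), where C[i,j] is the (i,j) cofactor of R. The helpers below are hypothetical and are not parcor_linear(). The same value is obtained from the inverse matrix, and for three variables it matches the textbook formula.

```r
# Partial correlation from cofactors of a correlation matrix R.
# Hypothetical helpers for illustration; not parcor_linear() itself.
cofactor_of <- function(R, r, c) (-1)^(r + c) * det(R[-r, -c, drop = FALSE])

partial_from_cofactors <- function(R, i, j) {
  -cofactor_of(R, i, j) / sqrt(cofactor_of(R, i, i) * cofactor_of(R, j, j))
}

set.seed(34)
x <- matrix(rnorm(90), ncol = 3)
x[, 2] <- x[, 2] + x[, 1]          # build in some correlation
R <- cor(x)
pc <- partial_from_cofactors(R, 1, 2)
P <- solve(R)                      # same quantity via the inverse matrix
pinv <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
print(c(cofactor = pc, inverse = pinv))
```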
parcor_linear(x, i, j)
x | Input p by p matrix R of symmetric correlation coefficients. |
i | A column number identifying the first variable. |
j | A column number identifying the second variable. |
ouij | Partial correlation Xi with Xj after removing all other X's |
ouji | Partial correlation Xj with Xi after removing all other X's |
myk | A list of column numbers whose effect has been removed |
This function calls minor and cofactor.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
See parcor_ijk for generalized partial correlation coefficients useful for causal path determinations.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
c1=cor(x)
parcor_linear(c1, 2,3)
## End(Not run)
This function calls the parcor_ijkOLD function, which uses a generalized correlation matrix R* as input to compute generalized partial correlations between Xi (i = idep) and Xj, where j can be any one of the remaining variables. The computation removes the effect of all other variables in the matrix.
It further adjusts the resulting partial correlation coefficients so that they lie in the appropriate [-1,1] range. It does this with an additive constant, in the fashion of ridge regression.
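The sketch below shows a ridge-type adjustment. It is hypothetical and is not parcor_ridg() itself. k times the identity matrix is added, with k increased in steps of incr until the matrix is positive definite. The positive-definiteness check uses a symmetrized version (A + t(A))/2, which is an assumption made here because R* need not be symmetric. Once a symmetric matrix is positive definite, partial correlations computed from its inverse fall inside [-1,1].

```r
# Hypothetical ridge-type adjustment; not parcor_ridg() itself.
ridge_until_pd <- function(R, incr = 3, max_iter = 100) {
  k <- 0
  for (it in seq_len(max_iter)) {
    A <- R + k * diag(nrow(R))
    S <- (A + t(A)) / 2                       # symmetrized check (assumption)
    if (min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) > 0)
      return(list(A = A, ridgek = k))
    k <- k + incr                             # increase the ridge constant
  }
  stop("matrix still not positive definite; try a larger incr")
}

bad <- matrix(c(1, 0.9, 0.9,
                0.9, 1, -0.9,
                0.9, -0.9, 1), nrow = 3, byrow = TRUE)   # not positive definite
fix <- ridge_until_pd(bad, incr = 3)
P <- solve(fix$A)
partial12 <- -P[1, 2] / sqrt(P[1, 1] * P[2, 2])           # now inside [-1, 1]
print(c(ridgek = fix$ridgek, partial12 = partial12))
```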
parcor_ridg(gmc0, dig = 4, idep = 1, verbo = FALSE, incr = 3)
gmc0 | This must be a p by p matrix R* of generalized correlation coefficients. |
dig | The number of digits for reporting (=4, default) |
idep | The column number of the first variable (=1, default) |
verbo | Make this TRUE for detailed printing of computational steps |
incr | Incremental constant for iteratively adjusting ‘ridgek’. Here ridgek times the identity matrix is added to make sure that the gmc0 matrix is positive definite. If it is not, ridgek is increased iteratively. |
A five column ‘out’ matrix containing partials. The first column has the name of the idep variable. The second column has the name of the j variable, and the third column has r*(i,j | k) (denoted partij).
The 4-th column has r*(j,i | k) (denoted partji). The 5-th column has rijMrji, that is, the difference in absolute values (abs(partij) - abs(partji)).
The ridgek constant created by the function during the first round may not be large enough to make sure that other pairs of r*(i,j | k) are within the [-1,1] interval. The user may have to choose a suitably larger input incr to get all relevant partial correlation coefficients in the correct [-1,1] interval.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. "A Survey of Ridge Regression and Related Techniques for Improvements over Ordinary Least Squares," Review of Economics and Statistics, Vol. 60, February 1978, pp. 121-131.
See Also parcor_ijkOLD.
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is partly independent and partly affected by z
y=1+2*x+3*z+rnorm(10) # y depends on x and z, not vice versa
mtx=cbind(x,y,z)
g1=gmcmtx0(mtx)
parcor_ijkOLD(g1,1,2) # ouji > ouij implies i=x is the cause of j=y
parcor_ridg(g1,idep=1)
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
gm1=gmcmtx0(x)
parcor_ridg(gm1, idep=1)
## End(Not run)
This function uses data on two column vectors, xi, xj and a third xk which can be a vector or a matrix, usually of the remaining variables in the model, including control variables, if any. It first removes missing data from all input variables. Then, it computes residuals of kernel regression (xi on xk) and (xj on xk). This is a block version of parcor_ijk.
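The helper below shows how row indices could be split into blocks of size blksiz, with blksiz capped at n as the blksiz argument description specifies. It is hypothetical. How parcorBijk() combines results across blocks is not described here, and the handling of a short final block is an assumption.

```r
# Hypothetical helper: split n rows into blocks of size blksiz (capped at n).
block_rows <- function(n, blksiz = 10) {
  blksiz <- min(blksiz, n)                      # blksiz > n means a single block
  split(seq_len(n), ceiling(seq_len(n) / blksiz))
}

str(block_rows(33, blksiz = 10))   # blocks of 10, 10, 10 and 3 rows
str(block_rows(8, blksiz = 10))    # one block: no blocking is done
```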
parcorBijk(xi, xj, xk, blksiz = 10)
xi | Input vector of data for variable xi |
xj | Input vector of data for variable xj |
xk | Input data for variables in xk, usually control variables |
blksiz | Block size, default=10. If the chosen blksiz exceeds n, the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
ouij | Generalized partial correlation Xi with Xj (=cause) after removing xk |
ouji | Generalized partial correlation Xj with Xi (=cause) after removing xk |
allowing for control variables.
This function calls kern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
See parcor_ijk.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
options(np.messages=FALSE)
parcorBijk(x[,1], x[,2], x[,3], blksiz=10)
## End(Not run)
This function calls parcorBijk, a block version of parcor_ijk. It uses the original data to compute generalized partial correlations between Xi (i = idep) and Xj, where j can be any one of the remaining variables in the input matrix mtx. Partial correlations remove the effect of variables other than Xi and Xj. The calculation further allows for control variable(s), if any. These always remain outside the input matrix, and their effect is also removed in computing partial correlations.
parcorBMany(mtx, ctrl = 0, dig = 4, idep = 1, blksiz = 10, verbo = FALSE)
mtx | Input data matrix with at least 3 columns. |
ctrl | Input vector or matrix of data for control variable(s); default is ctrl=0 when control variables are absent |
dig | The number of digits for reporting (=4, default) |
idep | The column number of the dependent variable (=1, default) |
blksiz | Block size, default=10. If the chosen blksiz exceeds n, the number of rows in the matrix, then blksiz=n, that is, no blocking is done |
verbo | Make this TRUE for detailed printing of computational steps |
A five column ‘out’ matrix containing partials. The first column
has the name of the idep
variable. The
second column has the name of the j variable, while the third column
has partial correlation coefficients r*(i,j | k). The last column
reports the absolute difference between two partial correlations.
This function reports all partial correlation coefficients, while avoiding ridge type adjustment.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See Also parcor_ijk, parcorMany.
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is partly independent and partly affected by z
y=1+2*x+3*z+rnorm(10) # y depends on x and z, not vice versa
mtx=cbind(x,y,z)
parcorBMany(mtx, blksiz=10)
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
parcorBMany(x, idep=1)
## End(Not run)
This function uses data on two column vectors, xi, xj, and a third set xk, which can be a vector or a matrix. xk usually has the remaining variables in the model, including control variables, if any. This function first removes missing data from all input variables. Then, it computes residuals of OLS (no kernel) regression (xi on xk) and (xj on xk). This hybrid version uses both OLS and then generalized correlation among OLS residuals. This solves the potential problem of having too little information content in kernel regression residuals, since kernel fits are sometimes too close, especially when there are many variables in xk.
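The two-step hybrid idea can be sketched in base R. This is hypothetical and not parcorHijk() itself. Step 1 removes xk with OLS. Step 2 applies an assumed form of the generalized correlation: r*(a|b) is the sign of cor(a, b) times the square root of the R^2 of a kernel regression of a on b, with R^2 taken as the squared correlation between fitted and observed values. The kernel fit is a Gaussian Nadaraya-Watson regression with a fixed bandwidth h, which is also an assumption.

```r
# Hypothetical sketch of the hybrid (OLS residuals + generalized correlation).
nw_fit <- function(target, regressor, h) {
  w <- outer(regressor, regressor, function(a, b) dnorm((a - b) / h))
  as.vector(w %*% target / rowSums(w))
}

gen_cor <- function(a, b, h) {
  fit <- nw_fit(a, b, h)                    # kernel regression of a on b
  sign(cor(a, b)) * abs(cor(fit, a))        # sqrt(R^2) with R^2 = cor(fit, a)^2
}

hybrid_parcor_sketch <- function(xi, xj, xk, h = 0.5) {
  ok <- complete.cases(xi, xj, xk)
  xk <- as.matrix(xk)[ok, , drop = FALSE]
  ei <- resid(lm(xi[ok] ~ xk))              # OLS residuals, not kernel residuals
  ej <- resid(lm(xj[ok] ~ xk))
  ei <- ei / sd(ei); ej <- ej / sd(ej)      # common scale so one h suits both
  c(ouij = gen_cor(ei, ej, h),              # xi given xj (xj as cause)
    ouji = gen_cor(ej, ei, h))              # xj given xi (xi as cause)
}

set.seed(34)
x <- matrix(sample(1:600)[1:99], ncol = 3)
print(hybrid_parcor_sketch(x[, 1], x[, 2], x[, 3]))
```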
parcorHijk(xi, xj, xk)
xi | Input vector of data for variable xi |
xj | Input vector of data for variable xj |
xk | Input data for all variables in xk, usually control variables |
ouij | Generalized partial correlation Xi with Xj (=cause) after removing xk |
ouji | Generalized partial correlation Xj with Xi (=cause) after removing xk |
allowing for control variables.
This function calls kern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
See parcor_ijk.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
options(np.messages=FALSE)
parcorHijk(x[,1], x[,2], x[,3])
## End(Not run)
The 2 in the function name means second version, and the H means hybrid: the effect of xk is removed via OLS regression residuals. This function uses data on two column vectors, xi and xj, and a third set xk. The set xk can be a vector or a matrix, usually of the remaining variables in the model, including control variables, if any. It first removes missing data from all input variables. Then, it computes residuals of OLS regression (xi on xk) and (xj on xk). The function reports the generalized correlation between the two OLS residuals. This second version works when 'parcorVecH' fails. It is called by the function ‘parcorVecH2’.
parcorHijk2(xi, xj, xk)
xi | Input vector of data for variable xi |
xj | Input vector of data for variable xj |
xk | Input data for variables in xk, usually control variables |
ouij | Generalized partial correlation Xi with Xj (=cause) after removing xk |
ouji | Generalized partial correlation Xj with Xi (=cause) after removing xk |
allowing for control variables.
This function calls kern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
See parcor_ijk.
## Not run:
set.seed(34); x=matrix(sample(1:600)[1:99],ncol=3)
options(np.messages=FALSE)
parcorHijk2(x[,1], x[,2], x[,3])
## End(Not run)
This function calls the parcor_ijk function. It uses the original data to compute generalized partial correlations between Xi (i = idep) and Xj, where j can be any one of the remaining variables in the input matrix mtx. Partial correlations remove the effect of variables other than Xi and Xj. The calculation further allows for control variable(s), if any. These always remain outside the input matrix, and their effect is also removed in computing partial correlations.
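The loop structure can be sketched with a linear stand-in. For each j other than idep, the remaining columns plus any controls are removed by OLS, and the residuals are then correlated. This is hypothetical and not parcorMany(). It uses Pearson rather than generalized correlation, and marks absent controls with ctrl=NULL rather than ctrl=0.

```r
# Hypothetical sketch (not parcorMany() itself) with a linear stand-in:
# OLS residuals and Pearson correlation replace the kernel-based r*.
partials_many_sketch <- function(mtx, ctrl = NULL, idep = 1) {
  nm <- colnames(mtx)
  if (is.null(nm)) nm <- paste0("V", seq_len(ncol(mtx)))
  rows <- lapply(setdiff(seq_len(ncol(mtx)), idep), function(j) {
    xk <- cbind(mtx[, -c(idep, j), drop = FALSE], ctrl)  # other columns plus controls
    ei <- resid(lm(mtx[, idep] ~ xk))
    ej <- resid(lm(mtx[, j] ~ xk))
    data.frame(i = nm[idep], j = nm[j], partial = cor(ei, ej),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

set.seed(234)
z <- runif(10, 2, 11)
x <- sample(1:10) + z / 10
y <- 1 + 2 * x + 3 * z + rnorm(10)
print(partials_many_sketch(cbind(x, y, z)))
```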
parcorMany(mtx, ctrl = 0, dig = 4, idep = 1, verbo = FALSE)
mtx | Input data matrix with at least 3 columns. |
ctrl | Input vector or matrix of data for control variable(s); default is ctrl=0 when control variables are absent |
dig | The number of digits for reporting (=4, default) |
idep | The column number of the first variable (=1, default) |
verbo | Make this TRUE for detailed printing of computational steps |
A five column ‘out’ matrix containing partials. The first column
has the name of the idep
variable. The
second column has the name of the j variable, while the third column
has partial correlation coefficients r*(i,j | k). The last column
reports the absolute difference between two partial correlations.
This function reports all partial correlation coefficients, while avoiding ridge type adjustment.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See Also parcor_ijk.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
parcorMany(mtx)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
parcorMany(x, idep=1)
## End(Not run)
This function calls the parcor_ijk function, which uses the original data to compute generalized partial correlations between each column Xi of the input matrix mtx and each of the remaining columns Xj. Partial correlations remove the effect of the variables xk other than Xi and Xj. The calculation further allows for control variable(s), if any, which always remain outside the input matrix and whose effect is also removed when computing partial correlations.
parcorMtx(mtx, ctrl = 0, dig = 4, verbo = FALSE)
mtx |
Input data matrix with p columns, where p is at least 3. |
ctrl |
Input vector or matrix of data for control variable(s), default is ctrl=0 when control variables are absent |
dig |
The number of digits for reporting (=4, default) |
verbo |
Make this TRUE for detailed printing of computational steps |
A p by p ‘out’ matrix containing the partials r*(i,j | k) and r*(j,i | k).
The aim is to obtain all pairs of partial correlation coefficients after removing the effects of the other columns. Vinod (2018) shows why more than one criterion is needed to decide causal paths or exogeneity.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
See Also parcor_ijk.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
parcorMtx(mtx)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
parcorMtx(x)
## End(Not run)
This function calls the parcor_ijkOLD function, which uses a generalized correlation matrix R* as input to compute generalized partial correlations between Xi (column i = idep) and Xj, where j can be any one of the remaining variables. The computation removes the effect of all other variables in the matrix. It further adjusts the resulting partial correlation coefficients to lie in the appropriate [-1,1] range by adding a constant, in the fashion of ridge regression.
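As a rough illustration of a ridge-type adjustment, the base-R sketch below adds a constant k to the diagonal of a correlation-type matrix, inverts it, forms the textbook partials -P[i,j]/sqrt(P[i,i]*P[j,j]), and increases k by a fixed step until the symmetric part is positive definite and every partial lies in [-1,1]. It is not the parcor_ijkOLD code: the name ridge_partials, its step argument (a stand-in for incr, whose exact scaling in the package is not stated here) and the use of the inverse-matrix formula for an asymmetric R* are all assumptions.

```r
# Hypothetical sketch of ridge-adjusted partial correlations; this is not
# the package's parcor_ijkOLD(). 'step' is an assumed stand-in for 'incr'.
ridge_partials <- function(R, step = 0.01, max_iter = 1000) {
  p <- nrow(R)
  k <- 0                                    # ridge constant, starts at zero
  for (iter in seq_len(max_iter)) {
    A <- R + k * diag(p)
    # Check positive definiteness on the symmetric part of A, because a
    # generalized correlation matrix R* need not be symmetric.
    ev <- eigen((A + t(A)) / 2, symmetric = TRUE, only.values = TRUE)$values
    if (min(ev) > 0) {
      P <- solve(A)
      d <- diag(P)                          # positive whenever the symmetric part is PD
      partial <- -P / sqrt(outer(d, d))     # (i,j) uses P[i,j]; (j,i) uses P[j,i]
      diag(partial) <- 1
      if (all(abs(partial) <= 1)) {
        return(list(partial = partial, ridgek = k))
      }
    }
    k <- k + step                           # not admissible yet: enlarge the ridge
  }
  stop("no admissible ridge constant found; try a larger step or max_iter")
}

# Usage on an ordinary (symmetric) Pearson correlation matrix, where no
# ridge adjustment should be needed
set.seed(234)
z <- runif(10, 2, 11)
x <- sample(1:10) + z / 10
y <- 1 + 2 * x + 3 * z + rnorm(10)
out <- ridge_partials(cor(cbind(x, y, z)))
print(round(out$partial, 4))
print(out$ridgek)
```

For a symmetric positive definite input the loop stops at k = 0; the ridge only comes into play for matrices, such as an asymmetric R*, whose partials would otherwise fall outside [-1,1].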
parcorSilent(gmc0, dig = 4, idep = 1, verbo = FALSE, incr = 3)
gmc0 |
This must be a p by p matrix R* of generalized correlation coefficients. |
dig |
The number of digits for reporting (=4, default) |
idep |
The column number of the first variable (=1, default) |
verbo |
Make this TRUE for detailed printing of computational steps |
incr |
incremental constant for iteratively adjusting ‘ridgek’, where ridgek is the constant multiplying the identity matrix that is used to make sure the gmc0 matrix is positive definite. If it is not, this function iteratively increases ridgek. |
A five column ‘out’ matrix containing partials. The first column has the name of the idep variable. The second column has the name of the j variable, while the third column has r*(i,j | k). The fourth column has r*(j,i | k) (denoted partji), and the fifth column has rijMrji, that is, the difference in absolute values (abs(partij) - abs(partji)).
The ridgek constant created by the function during the first round may not be large enough to make sure that other pairs of r*(i,j | k) are within the [-1,1] interval. The user may have to choose a suitably larger input incr to get all relevant partial correlation coefficients in the correct [-1,1] interval.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. "A Survey of Ridge Regression and Related Techniques for Improvements over Ordinary Least Squares," Review of Economics and Statistics, Vol. 60, February 1978, pp. 121-131.
See Also parcor_ijk for a better version using the original data as input.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
g1=gmcmtx0(mtx)
parcor_ijkOLD(g1,1,2) # ouji> ouij implies i=x is the cause of j=y
parcor_ridg(g1,idep=1)
parcorSilent(g1,idep=1)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')
gm1=gmcmtx0(x)
parcorSilent(gm1, idep=1)
## End(Not run)
This function calls the parcor_ijk function, which uses the original data to compute generalized partial correlations between Xi, the dependent variable (column i = idep), and Xj, the current regressor of interest. Note that j can be any one of the remaining variables in the input matrix mtx. Partial correlations remove the effect of the variables xk other than Xi and Xj. The calculation merges the control variable(s), if any, into xk.
Let the remainder effect from kernel regressions of Xi on xk equal the residuals u*(i,k). Analogously define u*(j,k). (The asterisk denotes kernel regressions.) The partial correlation is then the generalized correlation between u*(i,k) and u*(j,k).
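The residual-then-correlate recipe can be sketched in base R. Here a Nadaraya-Watson smoother with a Gaussian product kernel and stats::bw.nrd0() bandwidths stands in for the np-based kernel regressions the package uses, and the generalized correlation between the residuals is taken as sign(Pearson r) times the square root of a kernel R^2, with R^2 measured as the squared correlation between observed and fitted values. The helper names (nw_fit, rstar_xy, gen_partial) and these modelling choices are assumptions, so the numbers will not match parcorVec().

```r
# Hypothetical stand-in for the np-based kernel regression: Nadaraya-Watson
# with a Gaussian product kernel and one bw.nrd0() bandwidth per regressor.
nw_fit <- function(y, X) {
  X <- as.matrix(X)
  n <- nrow(X)
  h <- apply(X, 2, bw.nrd0)
  fitted <- numeric(n)
  for (t in seq_len(n)) {
    scaled <- sweep(X, 2, X[t, ]) / rep(h, each = n)  # (X[s, ] - X[t, ]) / h
    w <- exp(-0.5 * rowSums(scaled^2))                 # Gaussian kernel weights
    fitted[t] <- sum(w * y) / sum(w)
  }
  fitted
}

# Generalized correlation r*(x|y) = sign(r) * sqrt(R^2), with R^2 taken here
# as the squared correlation between x and its kernel fit on y (an assumption).
rstar_xy <- function(x, y) {
  sign(cor(x, y)) * abs(cor(x, nw_fit(x, y)))
}

# Generalized partial r*(i,j | k): kernel residuals first, then r* between them.
gen_partial <- function(xi, xj, xk) {
  ui <- xi - nw_fit(xi, xk)    # u*(i,k)
  uj <- xj - nw_fit(xj, xk)    # u*(j,k)
  rstar_xy(ui, uj)
}

set.seed(234)
z <- runif(10, 2, 11)
x <- sample(1:10) + z / 10
y <- 1 + 2 * x + 3 * z + rnorm(10)
r_yx_given_z <- gen_partial(y, x, z)   # y as Xi (dependent), x as Xj, z as xk
print(r_yx_given_z)
```

Because xk may hold several controls, nw_fit accepts a matrix; with a single control it reduces to an ordinary one-regressor kernel smoother.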
parcorVec(mtx, ctrl = 0, verbo = FALSE, idep = 1)
mtx |
Input data matrix with p (>= 3) columns |
ctrl |
Input vector or matrix of data for control variable(s), default is ctrl=0 when control variables are absent |
verbo |
Make this TRUE for detailed printing of computational steps |
idep |
The column number of the dependent variable (=1, default) |
A p by 1 ‘out’ vector containing partials r*(i,j | k).
Generalized Partial Correlation Coefficients (GPCC) allow comparison of the relative contribution of each Xj to the explanation of Xi, because GPCC are scale-free pure numbers.
The aim is to obtain all partial correlation coefficients after removing the effects of the other columns. Vinod (2018) shows why more than one criterion is needed to decide causal paths or exogeneity.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See Also parcor_ijk.
See Also the hybrid version parcorVecH.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
parcorVec(mtx)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')#some names needed
parcorVec(x)
## End(Not run)
This is a hybrid version of parcorVec subtracting only the linear effects (OLS residuals instead of kernel regression residuals), but using the generalized correlation between the OLS residuals for the last stage of the generalized partial correlation.
parcorVecH(mtx, ctrl = 0, dig = 4, verbo = FALSE, idep = 1)
mtx |
Input data matrix with p (>= 3) columns; the first column must contain the dependent variable |
ctrl |
Input vector or matrix of data for control variable(s), default is ctrl=0 when control variables are absent |
dig |
The number of digits for reporting (=4, default) |
verbo |
Make this TRUE for detailed printing of computational steps |
idep |
The column number of the dependent variable (=1, default) |
This function calls the parcor_ijk function, which uses the original data to compute generalized partial correlations between Xi, the dependent variable (column i = idep), and Xj, the current regressor of interest. Note that j can be any one of the remaining variables in the input matrix mtx. Partial correlations remove the effect of the variables xk other than Xi and Xj. The calculation merges the control variable(s), if any, into xk.
Let the remainder effect from OLS regressions of Xi on xk equal the residuals u(i,k). Analogously define u(j,k). This makes the method a hybrid of OLS and generalized correlation. Finally, the partial correlation is the generalized (kernel) correlation between u(i,k) and u(j,k).
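A minimal base-R sketch of the hybrid idea: linear effects of xk are removed with lm(), and only the final step uses a kernel regression of u(i,k) on u(j,k). The smoother nw1 (Nadaraya-Watson, Gaussian kernel, bw.nrd0 bandwidth), the square-root-of-R^2 form of the generalized correlation, and the name hybrid_partial are assumptions standing in for the package's np-based code, so the value will differ from parcorVecH().

```r
# Hypothetical kernel smoother standing in for the package's kernel regression:
# Nadaraya-Watson with a Gaussian kernel and a bw.nrd0() bandwidth.
nw1 <- function(y, x) {
  h <- bw.nrd0(x)
  w <- outer(x, x, function(a, b) dnorm((a - b) / h))  # w[t, s] weights obs s at point t
  drop(w %*% y) / rowSums(w)
}

# Sketch of a hybrid partial: OLS residuals first, generalized correlation last
hybrid_partial <- function(xi, xj, xk) {
  ui <- resid(lm(xi ~ xk))          # u(i,k): OLS residual of Xi on xk
  uj <- resid(lm(xj ~ xk))          # u(j,k): OLS residual of Xj on xk
  fit <- nw1(ui, uj)                # kernel regression of u(i,k) on u(j,k)
  sign(cor(ui, uj)) * abs(cor(ui, fit))
}

set.seed(234)
z <- runif(10, 2, 11)
x <- sample(1:10) + z / 10
y <- 1 + 2 * x + 3 * z + rnorm(10)
h_yx <- hybrid_partial(y, x, z)     # y as the dependent variable, x as regressor
ols_yx <- cor(resid(lm(y ~ z)), resid(lm(x ~ z)))  # ordinary linear partial
print(c(hybrid = h_yx, linear = ols_yx))
```

In this sketch the hybrid and the ordinary partial share the same residuals and the same sign; they differ only in how strongly the last, kernel stage rates the dependence.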
A p by 1 ‘out’ vector containing hybrid partials r*(i,j | k).
Hybrid Generalized Partial Correlation Coefficients (HGPCC) allow comparison of the relative contribution of each Xj to the explanation of Xi, because HGPCC are scale-free pure numbers.
The aim is to obtain all partial correlation coefficients after removing the effects of the other columns. Vinod (2018) shows why more than one criterion is needed to decide causal paths or exogeneity.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See Also parcor_ijk.
See Also parcorVec.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
parcorVecH(mtx)
## Not run:
set.seed(34);mtx=matrix(sample(1:600)[1:80],ncol=4)
colnames(mtx)=c('V1', 'v2', 'V3', 'V4')
parcorVecH(mtx,verbo=TRUE, idep=2)
## End(Not run)
This is a second version, to be used when ‘parcorVecH’ fails (H = hybrid). This hybrid version of parcorVec subtracts only linear effects but uses the generalized correlation between OLS residuals.
parcorVecH2(mtx, dig = 4, verbo = FALSE, idep = 1)
mtx |
Input data matrix with p (>= 3) columns; the first column must contain the dependent variable |
dig |
The number of digits for reporting (=4, default) |
verbo |
Make this TRUE for detailed printing of computational steps |
idep |
The column number of the dependent variable (=1, default) |
This function calls the parcorHijk2 function, which uses the original data to compute generalized partial correlations between Xi, the dependent variable (column i = idep), and Xj, the current regressor of interest. Note that j can be any one of the remaining variables in the input matrix mtx. Partial correlations remove the effect of the variables xk other than Xi and Xj. The calculation merges the control variable(s), if any, into xk.
Let the remainder effect from OLS regressions of Xi on xk equal the residuals u(i,k). Analogously define u(j,k). This makes the method a hybrid of OLS and generalized correlation. Finally, the partial correlation is the generalized (kernel) correlation between u(i,k) and u(j,k).
A p by 1 ‘out’ vector containing hybrid partials r*(i,j | k).
Hybrid Generalized Partial Correlation Coefficients (HGPCC) allow comparison of the relative contribution of each Xj to the explanation of Xi, because HGPCC are scale-free pure numbers.
The aim is to obtain all partial correlation coefficients after removing the effects of the other columns. Vinod (2018) shows why more than one criterion is needed to decide causal paths or exogeneity.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See Also parcor_ijk.
See Also parcorVec.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
parcorVecH2(mtx)
## Not run:
set.seed(34);mtx=matrix(sample(1:600)[1:80],ncol=4)
colnames(mtx)=c('V1', 'v2', 'V3', 'V4')
parcorVecH2(mtx,verbo=TRUE, idep=2)
## End(Not run)
The maximum entropy bootstrap (‘meboot’) package is used for statistical inference regarding delta, which equals GMC(X|Y)-GMC(Y|X) as defined by Zheng et al. (2012). The bootstrap approximates the probability of correctly determining the causal direction.
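To show the bookkeeping behind P(cause), the sketch below estimates GMC(X|Y) as 1 minus the residual variance of a kernel regression of X on Y divided by the variance of X, forms delta = GMC(X|Y) - GMC(Y|X), and reports the proportion of bootstrap replicates whose delta has the same sign as the full-sample delta. It swaps in a plain iid pairs bootstrap for the maximum entropy bootstrap used by pcause(), and the helpers nw1, gmc and pcause_sketch are hypothetical, so it illustrates the idea rather than reproducing the package output.

```r
# Hypothetical Nadaraya-Watson smoother (Gaussian kernel, bw.nrd0 bandwidth)
nw1 <- function(y, x) {
  h <- bw.nrd0(x)
  w <- outer(x, x, function(a, b) dnorm((a - b) / h))
  drop(w %*% y) / rowSums(w)
}

# GMC(x|y) = 1 - E[(x - E(x|y))^2] / Var(x), with E(x|y) from the smoother
gmc <- function(x, y) {
  res <- x - nw1(x, y)
  1 - mean(res^2) / mean((x - mean(x))^2)
}

# Proportion of bootstrap replicates agreeing in sign with the sample delta.
# NOTE: iid resampling of pairs is used here instead of meboot.
pcause_sketch <- function(x, y, n999 = 199) {
  delta <- gmc(x, y) - gmc(y, x)             # full-sample delta
  n <- length(x)
  agree <- logical(n999)
  for (b in seq_len(n999)) {
    idx <- sample.int(n, n, replace = TRUE)  # resample (x, y) pairs
    db <- gmc(x[idx], y[idx]) - gmc(y[idx], x[idx])
    agree[b] <- sign(db) == sign(delta)
  }
  mean(agree)
}

set.seed(34)
x <- runif(40, -2, 2)
y <- x^2 + rnorm(40, sd = 0.3)   # y depends on x nonlinearly
p_hat <- pcause_sketch(x, y, n999 = 99)
print(p_hat)
```

As with pcause(), a small n999 keeps the sketch fast but is too small for real inference.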
pcause(x, y, n999 = 999)
x |
Vector of x data |
y |
Vector of y data |
n999 |
Number of bootstrap replications (default=999) |
P(cause): the bootstrap proportion of correct causal determinations.
'pcause' is computationally intensive and generally slow. It is better to use it at a later stage of the investigation, after a preliminary causal determination has been made, because its use may slow the exploratory phase. In my experience, a P(cause) below 0.55 is a cause for concern.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Zheng, S., Shi, N.-Z., and Zhang, Z. (2012). Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, vol. 107, pp. 1239-1252.
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
## Not run:
set.seed(34);x=sample(1:10);y=sample(2:11)
pcause(x,y,n999=29)
data('EuroCrime')
attach(EuroCrime)
pcause(crim,off,n999=29)
## End(Not run)
Given data on the (x, y, z) coordinate values of a 3D surface, one can directly draw a 3D plot with pins of height z. By contrast, this function fattens each pin into a pillar by adding and subtracting small amounts dz near each z value. Replacing the pins of height z with pillars gives a chart that better resembles a surface. It uses the wireframe() function of the ‘lattice’ package to do the plotting.
pillar3D( z = c(657, 936, 1111, 1201), x = c(280, 542, 722, 1168), y = c(162, 214, 186, 246), drape = TRUE, xlab = "y", ylab = "x", zlab = "z", mymain = "Pillar Chart" )
z |
z-coordinate values |
x |
x-coordinate values |
y |
y-coordinate values |
drape |
logical value, default drape=TRUE to give color to heights |
xlab |
label on the x axis (the usage above shows the default xlab = "y") |
ylab |
label on the y axis (the usage above shows the default ylab = "x") |
zlab |
default "z" label on the z axis |
mymain |
default "Pillar Chart" main label on the plot |
For additional plotting features, type pillar3D (without parentheses) at the R console to see my code, and adjust the defaults of my wireframe() call.
A 3D plot
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
pillar3D()
## End(Not run)
Computes cumulative probabilities and the differences between consecutive cumulative probabilities, as described in the Vinod (2008) textbook. This is a simpler version of the one in the book, without the mapping to non-expected utility theory weights explained in Vinod (2008).
prelec2(n)
n |
A (usually small) integer. |
x |
sequence 1:n |
p |
probabilities p= x[i]/n |
pdif |
consecutive differences p[i] - p[i - 1] |
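The three documented outputs can be mimicked in a few lines of base R. The name prelec2_sketch is hypothetical, and the sketch assumes p[0] = 0 so that pdif has length n; the package function may handle the first difference differently.

```r
# Hypothetical re-creation of the documented outputs of prelec2(n)
prelec2_sketch <- function(n) {
  x <- 1:n
  p <- x / n                 # cumulative probabilities p[i] = x[i]/n
  pdif <- diff(c(0, p))      # consecutive differences, assuming p[0] = 0
  list(x = x, p = p, pdif = pdif)
}

out <- prelec2_sketch(4)
out$p      # 0.25 0.50 0.75 1.00
out$pdif   # 0.25 0.25 0.25 0.25
```

Without the non-expected utility reweighting, every consecutive difference equals 1/n.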
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. https://www.worldscientific.com/worldscibooks/10.1142/12831
## Not run:
prelec2(10)
## End(Not run)
If there are p columns of data, probSign produces a p-1 by 1 vector of probabilities of correct signs, assuming that the mean of the n999 values has the correct sign and that m of the 'sum' index values inside the range [-tau, tau] are neither positive nor negative but indeterminate or ambiguous (being too close to zero). That is, the denominator of P(+1) or P(-1) is (n999-m) if m signs are too close to zero.
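The counting rule just described can be sketched for a matrix of bootstrapped 'sum' values with one column per pair. The name prob_sign_sketch and the toy numbers are hypothetical; treating the sign of each column mean as the correct sign and dividing by n999 - m follows the description above, but the package's probSign() may format its output differently.

```r
# Hypothetical sketch of the probSign() counting rule
prob_sign_sketch <- function(out, tau = 0.476) {
  out <- as.matrix(out)
  t(apply(out, 2, function(s) {
    m <- sum(abs(s) <= tau)            # ambiguous values inside [-tau, tau]
    sgn <- sign(mean(s))               # sign of the mean taken as correct
    denom <- length(s) - m             # n999 - m
    prob <- if (denom > 0) sum(sign(s) == sgn & abs(s) > tau) / denom else NA_real_
    c(sgn = sgn, prob = prob)
  }))
}

# Two pairs, five bootstrap 'sum' values each (toy numbers)
bb <- cbind(pair1 = c(1, 2, 0.3, -1, 2.5),
            pair2 = c(-3, -2, 0.1, 1, 2))
res <- prob_sign_sketch(bb)
print(res)   # pair1: sgn = 1, prob = 0.75; pair2: sgn = -1, prob = 0.5
```

In pair1, the value 0.3 falls inside [-0.476, 0.476], so the denominator is 5 - 1 = 4, and three of the remaining four values are positive.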
probSign(out, tau = 0.476)
out |
output from bootPairs with p-1 columns and n999 rows |
tau |
threshold to determine what value is too close to zero, default tau=0.476 is equivalent to 15 percent threshold for the unanimity index ui |
sgn: When mtx has p columns, sgn reports the p-1 pairwise signs (fixing the first column in each pair). Each is the average sign obtained by averaging the output of bootPairs(mtx), an n999 by p-1 matrix whose entries are resampled ‘sum’ values summarizing the weighted sums associated with all three criteria from the function silentPairs(mtx), applied to each bootstrap sample separately.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. and Lopez-de-Lacalle, J. (2009). 'Maximum entropy bootstrap for time series: The meboot R package.' Journal of Statistical Software, Vol. 29(5), pp. 1-19.
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See Also silentPairs.
## Not run:
options(np.messages = FALSE)
set.seed(34);x=sample(1:10);y=sample(2:11)
bb=bootPairs(cbind(x,y),n999=29)
probSign(bb,tau=0.476) #gives summary stats for n999 bootstrap sum computations
bb=bootPairs(airquality,n999=999);options(np.messages=FALSE)
probSign(bb,tau=0.476)#signs for n999 bootstrap sum computations
data('EuroCrime')
attach(EuroCrime)
bb=bootPairs(cbind(crim,off),n999=29) #col.1= crim causes off
#hence positive signs are more intuitively meaningful.
#note that n999=29 is too small for real problems, chosen for quickness here.
probSign(bb,tau=0.476)#signs for n999 bootstrap sum computations
## End(Not run)
This function computes the return earned knowing the rank of a stock in the input mtx of stock returns. For example, mtx has p=28 Dow Jones stocks over n=169 monthly returns. Portfolio weights are assumed to be linearly declining. If maxChosen=4, the weights are 4/10, 3/10, 2/10 and 1/10, which add up to unity. These portfolio weights are assigned in reverse order in the sense that first chosen stock (choice rank =1) gets portfolio weight=4/10. The function computes return from the stocks using the ‘myrank’ argument.
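The linearly declining weights can be checked with a small base-R sketch. The name rank2return_sketch and the toy returns are hypothetical, and the rule that turns pctChoose into maxChosen when maxChosen=0 (rounding p*pctChoose/100, with at least one stock) is an assumption rather than the package's documented rule.

```r
# Hypothetical sketch of a linearly declining "buy" portfolio
rank2return_sketch <- function(mtx, myrank, maxChosen = 0, pctChoose = 20) {
  mtx <- as.matrix(mtx)
  p <- ncol(mtx)
  if (maxChosen == 0) maxChosen <- max(1, round(p * pctChoose / 100))  # assumed rule
  w <- (maxChosen:1) / sum(1:maxChosen)      # maxChosen=4 gives 4/10, 3/10, 2/10, 1/10
  chosen <- order(myrank)[1:maxChosen]       # columns of the best-ranked stocks, best first
  port <- mtx[, chosen, drop = FALSE] %*% w  # portfolio return in each period
  mean(port)                                 # average portfolio return
}

# Two periods, five stocks (toy returns); rank 1 = best
mtx <- rbind(c(0.01, 0.02, -0.01, 0.00,  0.03),
             c(0.03, 0.04,  0.00, 0.01, -0.02))
myrank <- c(2, 1, 5, 3, 4)
r_buy <- rank2return_sketch(mtx, myrank, maxChosen = 2)
print(r_buy)   # (2/3)*0.03 + (1/3)*0.02, about 0.02667
```

With maxChosen = 2 the weights are 2/3 and 1/3, so the best-ranked stock (column 2) dominates the portfolio return.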
rank2return(mtx, myrank, maxChosen = 0, pctChoose = 20, verbo = FALSE)
mtx |
a matrix with n rows (number of returns) p columns (number of stocks) |
myrank |
vector of p integers listing the rank of each stock, 1=best |
maxChosen |
number of stocks in the portfolio (with nonzero weights) default=0. When maxChosen=0, we let pctChoose determine the maxChosen |
pctChoose |
percent of p stocks chosen inside the portfolio, default=20 |
verbo |
logical; if TRUE, print (default=FALSE) |
average return from the linearly declining portfolio implied by the myrank vector.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Compute the portfolio return knowing the rank of a stock in the input ‘mtx’.
This function computes the return earned, given the rank of each stock computed elsewhere and named myrank, associated with the data columns in the input mtx of stock returns. For example, mtx has p=28 Dow Jones stocks over n=169 monthly returns. Portfolio weights are assumed to be linearly declining. If maxChosen=4, the weights are 1/10, 2/10, 3/10 and 4/10, which add up to unity. These portfolio weights are assigned so that the first chosen stock (choice rank = p, the lowest ranked) gets portfolio weight 4/10. The function computes the return from the stocks using the ‘myrank’ argument. This helps in assessing the out-of-sample performance of the (short) strategy of selling the lowest-ranking stocks. It is mostly for internal use by outOFsell(). This is a sell version of rank2return().
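A mirror-image sketch for the sell side picks the worst-ranked stocks first and gives the largest weight to rank p. The name rank2sell_sketch, the pctChoose rule, and the convention of reporting the return of the stocks to be sold (so that a negative value favours the short seller) are assumptions for illustration.

```r
# Hypothetical sketch of a linearly weighted "sell" portfolio
rank2sell_sketch <- function(mtx, myrank, maxChosen = 0, pctChoose = 20) {
  mtx <- as.matrix(mtx)
  p <- ncol(mtx)
  if (maxChosen == 0) maxChosen <- max(1, round(p * pctChoose / 100))  # assumed rule
  w <- (maxChosen:1) / sum(1:maxChosen)                    # largest weight first
  chosen <- order(myrank, decreasing = TRUE)[1:maxChosen]  # rank p (worst) first
  mean(mtx[, chosen, drop = FALSE] %*% w)                  # average return of stocks sold
}

# Same toy data as for the buy side; rank 1 = best, rank 5 = worst
mtx <- rbind(c(0.01, 0.02, -0.01, 0.00,  0.03),
             c(0.03, 0.04,  0.00, 0.01, -0.02))
myrank <- c(2, 1, 5, 3, 4)
r_sell <- rank2sell_sketch(mtx, myrank, maxChosen = 2)
print(r_sell)   # (2/3)*(-0.005) + (1/3)*0.005, about -0.001667
```

Here the worst-ranked stock (column 3) receives weight 2/3 and the next worst (column 5) receives 1/3.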
rank2sell(mtx, myrank, maxChosen = 0, pctChoose = 20, verbo = FALSE)
mtx |
a matrix with n rows (number of returns) p columns (number of stocks) |
myrank |
vector of p integers listing the rank of each stock, 1=best |
maxChosen |
number of stocks in the portfolio (with nonzero weights) default=0. When maxChosen=0, we let pctChoose determine the maxChosen |
pctChoose |
percent of p stocks chosen inside the portfolio, default=20 |
verbo |
logical; if TRUE, print (default=FALSE) |
average return from the linearly declining portfolio implied by the myrank vector.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Uses the Vinod (2015) definition of generalized (asymmetric) correlation coefficients. It requires kernel regressions of x on y and of y on x, obtained by using the ‘np’ package. It also reports the usual Pearson correlation coefficient r and the p-value for testing the null hypothesis that (population r)=0.
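Under the definition r*(x|y) = sign(r) * sqrt(R^2(x|y)), where R^2(x|y) comes from a kernel regression of x on y, the four documented outputs can be mimicked in base R. The Nadaraya-Watson smoother nw1 (Gaussian kernel, bw.nrd0 bandwidth), the choice of R^2 as the squared correlation between observed and fitted values, and the name rstar_sketch are assumptions standing in for the kern()/np machinery, so the values will differ from rstar().

```r
# Hypothetical Nadaraya-Watson smoother (Gaussian kernel, bw.nrd0 bandwidth)
nw1 <- function(y, x) {
  h <- bw.nrd0(x)
  w <- outer(x, x, function(a, b) dnorm((a - b) / h))
  drop(w %*% y) / rowSums(w)
}

rstar_sketch <- function(x, y) {
  r <- cor(x, y)
  list(corxy = sign(r) * abs(cor(x, nw1(x, y))),   # r*(x|y): kernel regression of x on y
       coryx = sign(r) * abs(cor(y, nw1(y, x))),   # r*(y|x): kernel regression of y on x
       pearson.r = r,
       pv = cor.test(x, y)$p.value)                # test of H0: population r = 0
}

set.seed(1)
x <- sample(1:30); y <- sample(1:30)
res <- rstar_sketch(x, y)
str(res)
```

Because the two kernel regressions generally fit differently, corxy and coryx need not be equal, which is the asymmetry the generalized correlation is meant to capture.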
rstar(x, y)
x |
Vector of data on the dependent variable |
y |
Vector of data on the regressor |
Four objects created by this function are:
corxy |
r*x|y or regressing x on y |
coryx |
r*y|x or regressing y on x |
pearson.r |
Pearson's product moment correlation coefficient |
pv |
The p-value for testing the Pearson r |
This function needs the kern function which in turn needs the np package.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See Also gmcmtx0 and gmcmtxBlk.
x=sample(1:30);y=sample(1:30); rstar(x,y)
Allowing an input matrix of control variables and missing data, this function produces a p by p matrix summarizing the results. The estimated signs (+1, 0, –1) for each order of stochastic dominance are weighted by wt=c(1.2, 1.1, 1.05, 1), and a weighted sum over all four orders gives an overall result for each of the criteria Cr1 and Cr2. These are added to the Cr3 estimate, which is itself (+1, 0, –1). The final weighted index is always in the range [–3.175, 3.175]. It is converted to the more intuitive range [–100, 100].
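The arithmetic behind the bound 3.175 can be checked directly: with wt summing to 4.35 and sumwt = 4, each of Cr1 and Cr2 contributes at most 4.35/4 = 1.0875, and Cr3 adds at most 1, giving 2*1.0875 + 1 = 3.175. The sketch below assumes exactly this division by sumwt and a linear rescaling to [-100, 100]; sd_index is a hypothetical name, not a package function.

```r
# Hypothetical sketch of the weighted summary index; sg1 and sg2 are the
# SD1..SD4 signs for Cr1 and Cr2, sg3 is the Cr3 sign.
sd_index <- function(sg1, sg2, sg3, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4) {
  cr1 <- sum(wt * sg1) / sumwt     # weighted signs for Cr1
  cr2 <- sum(wt * sg2) / sumwt     # weighted signs for Cr2
  index <- cr1 + cr2 + sg3         # Cr3 contributes its sign (+1, 0, -1)
  c(index = index, pct = 100 * index / 3.175)  # rescale to [-100, 100]
}

sd_index(rep(1, 4), rep(1, 4), 1)          # unanimous: index 3.175, pct 100
sd_index(c(1, 1, 0, -1), rep(-1, 4), -1)   # mixed evidence, negative index
```

Declining weights make the SD1 sign count slightly more than the SD4 sign, for the reason given below.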
silentMtx(mtx, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with p columns. Denote x1 as the first column which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights can be changed here =4(default). |
The reason for slightly declining weights on the signs from SD1 to SD4 is simply that the local mean comparisons implicit in SD1 are known to be more reliable than the local variance implicit in SD2, the local skewness implicit in SD3 and the local kurtosis implicit in SD4. Why are higher moment estimates less reliable? The higher powers of the deviations from the mean needed in their computation lead to greater sampling variability.
The summary results for all three criteria are reported in a vector of numbers internally called crall. With p columns in the mtx argument to this function, x1 can be paired with a total of p-1 columns (x2, x3, .., xp). Note that we never flip any of the control variables with x1. This function produces i=1,2,..,p-1 numbers representing the summary sign, or ‘sum’, from the signs sg1 to sg3 associated with the three criteria: Cr1, Cr2 and Cr3. Note that sg1 and sg2 themselves are weighted signs, using the weighted sum of signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx is the ‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity) of the evidence for a particular sign. Conversely, a negative sign in the i-th location of the ‘sum’ output of this function means that the first variable listed as the input to this function is the ‘effect,’ while the variable in the (i+1)-th column of mtx is the exogenous kernel cause.
This function is a summary of someCPairs allowing for control variables.
The European Crime data has all three criteria correctly suggesting that
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying a high unanimity strength.
The index 3.175 is the highest.
The positive sign of the index suggests that ‘crim’
variable in the first column of the matrix input to this function kernel causes
‘off’ in the second column of the matrix argument mtx
to this function.
Interpretation of the output matrix produced by this function is as follows. A negative index means the variable named in the column kernel-causes the variable named in the row. A positive index means the row name variable kernel-causes the column name variable. The abs(index) measures unanimity by three criteria, Cr1 to Cr3 representing the strength of evidence for the identified causal path.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
H. D. Vinod 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See silentPairs.
See someCPairs, some0Pairs.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
silentMtx(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
silentMtx(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables and missing data, this function produces a
p by p matrix summarizing the results. The estimated signs of the
stochastic dominance orders (+1, 0, -1) are weighted by
wt=c(1.2, 1.1, 1.05, 1),
and the weighted sum over all four orders gives an overall result for each of
the criteria Cr1 and Cr2. These are added to the Cr3 estimate, which takes the values (+1, 0, -1).
The final weighted index always lies in the range [-3.175, 3.175]; it is then converted
to the more intuitive range [-100, 100].
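The bound 3.175 follows from the default weights if, as the numbers suggest, each of Cr1 and Cr2 is a weighted sum of four SD signs divided by sumwt: 2 * (1.2 + 1.1 + 1.05 + 1) / 4 + 1 = 2 * 1.0875 + 1 = 3.175. The minimal Python sketch below checks this arithmetic and rescales an index to [-100, 100]. The linear rescaling and the helper names max_index and to_percent are illustrative assumptions, not the package's own code.

```python
def max_index(wt=(1.2, 1.1, 1.05, 1.0), sumwt=4.0):
    """Largest attainable |index| (assumed formula): Cr1 and Cr2 each
    contribute at most sum(wt)/sumwt when all four SD signs agree,
    and Cr3 contributes at most 1."""
    per_sd_criterion = sum(wt) / sumwt
    return 2 * per_sd_criterion + 1


def to_percent(index, wt=(1.2, 1.1, 1.05, 1.0), sumwt=4.0):
    """Hypothetical linear rescaling from [-max, max] to [-100, 100]."""
    return 100 * index / max_index(wt, sumwt)


print(round(max_index(), 3))          # 3.175
print(round(to_percent(3.175), 1))    # 100.0
```

With equal weights wt=(1, 1, 1, 1) the same formula gives a bound of 3.0, which shows how the slightly larger weights on SD1 and SD2 push the maximum above 3.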
silentMtx0(mtx, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with p columns. Denote x1 as the first column which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6). |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights; it can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. Why are higher-moment
estimates less reliable? Their computation needs
higher powers of the deviations from the mean, which
leads to greater sampling variability.
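The claim that higher-moment estimates have greater sampling variability can be checked with a small simulation. The Python sketch below (standard library only) draws repeated N(0,1) samples of size 50 and reports the standard deviation, across replications, of the sample mean, variance, skewness and excess kurtosis. It illustrates only the general statistical point; it does not reproduce the local moments used inside the package, and the sample size, replication count and seed are arbitrary choices.

```python
import random
import statistics


def moments(xs):
    """Sample mean, variance, skewness and excess kurtosis (plug-in forms)."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    return m, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def sampling_sd(n=50, reps=2000, seed=1):
    """Standard deviation of each moment estimate over `reps` N(0,1) samples."""
    rng = random.Random(seed)
    draws = [moments([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
    return [statistics.stdev(col) for col in zip(*draws)]


sds = sampling_sd()
for name, sd in zip(("mean", "variance", "skewness", "kurtosis"), sds):
    print(f"{name:9s} {sd:.3f}")
```

The spread grows with the order of the moment, which is the pattern the declining weights wt=c(1.2, 1.1, 1.05, 1) are meant to reflect.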
The summary results for all
three criteria are reported in a vector of numbers internally called crall:
With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2 and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
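Reading the ‘sum’ output follows a fixed rule: a positive value means x1 is the kernel cause of the paired column, a negative value means the reverse, and the magnitude is the unanimity strength. The hypothetical Python helper below (describe_sums is not a package function) turns such a vector into plain sentences, assuming sums[i] refers to the pair (x1, column i+2 of mtx).

```python
def describe_sums(sums, names):
    """Translate signed 'sum' values into causal-path statements.

    sums[i] refers to the pair (names[0], names[i + 1]).
    """
    x1 = names[0]
    lines = []
    for i, s in enumerate(sums):
        other = names[i + 1]
        if s > 0:
            lines.append(f"{x1} kernel-causes {other} (strength {abs(s)})")
        elif s < 0:
            lines.append(f"{other} kernel-causes {x1} (strength {abs(s)})")
        else:
            lines.append(f"no clear direction between {x1} and {other}")
    return lines


for line in describe_sums([3.175], ["crim", "off"]):
    print(line)  # crim kernel-causes off (strength 3.175)
```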
This function
allows for control variables.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number, 3.175, which is the highest possible value of the index and implies maximal unanimity strength.
Its positive sign suggests that ‘crim’, the first column of the matrix mtx input to this function,
kernel causes ‘off’, the second column.
Interpretation of the output matrix produced by this function is as follows. A negative index means that the variable named in the column kernel-causes the variable named in the row. A positive index means that the row-name variable kernel-causes the column-name variable. The absolute value abs(index) measures the unanimity among the three criteria Cr1 to Cr3, representing the strength of evidence for the identified causal path.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See silentPairs0, which uses the older Cr1 criterion based
on kernel regression local gradients.
See someCPairs, some0Pairs.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
silentMtx0(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
## Not run:
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
silentMtx0(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
## End(Not run)
This function uses flipped kernel regressions to decide causal directions. This version 2 avoids Anderson's trapezoidal approximation used in ‘silentPairs.’ After stochastic dominance is computed, it calls the functions decileVote, momentVote, exactSdMtx, and summaryRank, and computes an average of the ranks they produce. The column with the ‘choice’ rank value helps in choosing the flip having, first, the lowest Hausman-Wu statistic (residual times RHS regressor) and, second, the lowest absolute residual. The chosen flipped regression defines the ‘cause’ as the variable on its right-hand side. In portfolio selection, choice rank 1 has the highest return. Here we want low residuals and a low Hausman-Wu value, so with two flips the desirable flip is the one with choice=2, that is, the flip with the lower values.
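The ordering used to pick the flip (lowest Hausman-Wu value first, lowest absolute residual second) can be sketched as a lexicographic comparison. In this simplified Python sketch the Hausman-Wu quantity is taken as the mean of |residual * RHS regressor| and the residual measure as the mean |residual|; the package itself ranks these through decileVote, momentVote, exactSdMtx and summaryRank, which the sketch does not reproduce. The names flip_scores and preferred_flip, and the toy residuals, are hypothetical.

```python
def flip_scores(resid, regressor):
    """Return (mean |e * regressor|, mean |e|) for one flipped regression."""
    n = len(resid)
    hw = sum(abs(e * r) for e, r in zip(resid, regressor)) / n
    abs_res = sum(abs(e) for e in resid) / n
    return hw, abs_res


def preferred_flip(flip_a, flip_b):
    """Each flip is (residuals, RHS regressor). Tuples compare
    lexicographically: lower Hausman-Wu wins, ties go to lower |residual|."""
    return "a" if flip_scores(*flip_a) <= flip_scores(*flip_b) else "b"


# Flip a: X on Y (regressor Y); flip b: Y on X (regressor X).
flip_a = ([1.0, -2.0, 1.0], [1.0, 1.0, 1.0])
flip_b = ([0.5, -0.5, 0.5], [1.0, 2.0, 1.0])
print(preferred_flip(flip_a, flip_b))  # b, so X (the RHS of flip b) is the cause
```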
The function develops a unanimity index regarding whether the flip (y on xi) or the flip (xi on y) is best. A summary of all relevant signs determines the causal direction and the unanimity index among the three criteria. The ‘2’ in the function name indicates a second implementation, in which exact stochastic dominance, decileVote, and momentVote algorithms are used.
silentPair2(mtx, ctrl = 0, dig = 6)
mtx |
The data matrix with p columns. Denote x1 as the first column, which is fixed in all rows of the output and then it is paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths, default is ctrl=0, which means that there are no control variables used. |
dig |
Number of digits for reporting (default dig=6). |
The output is a matrix. With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2, and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying the highest unanimity strength index,
with the positive sign suggesting ‘crim’ in the first column kernel causes
‘off’ in the second column of the argument mtx
to this function.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See summaryRank, decileVote.
See momentVote, exactSdMtx.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
silentPair2(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
silentPair2(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables and missing data, this function produces a
3 column matrix summarizing the results. The estimated signs of the
stochastic dominance orders (+1, 0, -1) are weighted by
wt=c(1.2, 1.1, 1.05, 1),
and the weighted sum over all four orders gives an overall result for each of
the criteria Cr1 and Cr2. These are added to the Cr3 estimate (+1, 0, -1),
so the result always lies in the range [-3.175, 3.175].
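The combination described above can be written out directly. This Python sketch assumes, consistently with the stated bound of 3.175, that each of Cr1 and Cr2 is the weighted sum of its four SD signs divided by sumwt, and that Cr3 contributes a single sign; combine_signs is a hypothetical name, not a package function.

```python
def combine_signs(sd_cr1, sd_cr2, sign_cr3,
                  wt=(1.2, 1.1, 1.05, 1.0), sumwt=4.0):
    """Combine criterion signs into one unanimity index (assumed formula).

    sd_cr1, sd_cr2: four signs (+1, 0, -1) for SD1..SD4 under Cr1 and Cr2.
    sign_cr3: the single Cr3 sign (+1, 0, -1).
    """
    cr1 = sum(w * s for w, s in zip(wt, sd_cr1)) / sumwt
    cr2 = sum(w * s for w, s in zip(wt, sd_cr2)) / sumwt
    return cr1 + cr2 + sign_cr3


print(round(combine_signs([1, 1, 1, 1], [1, 1, 1, 1], 1), 3))     # 3.175
print(round(combine_signs([1, 1, -1, 0], [1, 0, 0, 0], -1), 4))   # -0.3875
```

The second call shows how mixed SD signs and a disagreeing Cr3 pull the index toward zero, i.e. toward weak unanimity.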
silentPairs(mtx, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with p columns. Denote x1 as the first column which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths. The default ctrl=0 means that there are no control variables used. |
dig |
Number of digits for reporting (default dig=6). |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights; it can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. Higher moments are slightly less reliable
because their computation needs higher powers of the deviations from the mean,
which increases sampling variability.
The summary results for all
three criteria are reported in a vector of numbers internally called crall:
With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2 and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying the highest unanimity strength index,
with the positive sign suggesting ‘crim’ in the first column kernel causes
‘off’ in the second column of the argument mtx
to this function.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See someCPairs, some0Pairs.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
silentPairs(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
silentPairs(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables and missing data, this function produces a
3 column matrix summarizing the results. The estimated signs of the
stochastic dominance orders (+1, 0, -1) are weighted by
wt=c(1.2, 1.1, 1.05, 1),
and the weighted sum over all four orders gives an overall result for each of
the criteria Cr1 and Cr2. These are added to the Cr3 estimate (+1, 0, -1),
so the result always lies in the range [-3.175, 3.175].
silentPairs0(mtx, ctrl = 0, dig = 6, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with p columns. Denote x1 as the first column which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths. The default ctrl=0 means that there are no control variables used. |
dig |
Number of digits for reporting (default dig=6). |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights; it can be changed here (default sumwt=4). |
This function uses an older version of the first criterion Cr1, based on the absolute
values of local gradients of kernel regressions rather than the absolute
Hausman-Wu statistic (RHS variable times kernel residuals).
It calls abs_stdapd and abs_stdapdC.
The reason for slightly declining weights on the signs from
SD1 to SD4 is that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. Higher moments are slightly less reliable
because their computation needs higher powers of the deviations from the mean,
which increases sampling variability.
The summary results for all
three criteria are reported in a vector of numbers internally called crall:
With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2 and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
This function is a summary of someCPairs, allowing for control variables.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying the highest unanimity strength index,
with the positive sign suggesting ‘crim’ in the first column kernel causes
‘off’ in the second column of the argument mtx
to this function.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See someCPairs, some0Pairs.
See silentPairs for a newer version using the
more direct Hausman-Wu exogeneity test statistic.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
silentPairs0(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
silentPairs0(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
The block version allows a new bandwidth (chosen by the np package)
while fitting kernel regressions for each block of data. This may
not be appropriate in all situations. The block size is flexible.
The function develops a unanimity index regarding whether the
flip (y on xi) or the flip (xi on y) is best. Relevant signs determine the
causal direction and the unanimity index among the three criteria.
The ‘2’ in the function name indicates a second implementation,
in which exact stochastic dominance, decileVote, and momentVote are used.
It avoids Anderson's trapezoidal approximation.
The summary results for all
three criteria are reported in a vector of numbers
internally called crall.
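To make the two flips concrete, the Python sketch below fits both flipped regressions with a basic Nadaraya-Watson estimator (Gaussian kernel, fixed bandwidth h=0.5) and compares mean absolute residuals in the spirit of criterion Cr2, where a smaller |e2| from Y = g(X) + e2 than |e1| from X = f(Y) + e1 suggests the path X to Y. This is an assumption-laden illustration: the package relies on the np package with data-driven bandwidths, control variables and stochastic dominance comparisons rather than a single mean, and whether it rescales the variables first is not stated in this section. The function names are hypothetical.

```python
import math


def nw_fit(x, y, h):
    """Nadaraya-Watson fitted values of y on x with a Gaussian kernel."""
    fits = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
        fits.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fits


def mean_abs_resid(x, y, h):
    """Mean absolute residual from the kernel regression of y on x."""
    return sum(abs(yi - fi) for yi, fi in zip(y, nw_fit(x, y, h))) / len(y)


def cr2_direction(x, y, h=0.5):
    e_y_on_x = mean_abs_resid(x, y, h)  # like |e2| from Y = g(X) + e2
    e_x_on_y = mean_abs_resid(y, x, h)  # like |e1| from X = f(Y) + e1
    return "x -> y" if e_y_on_x < e_x_on_y else "y -> x"


x = [i / 2 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
y = [xi ** 2 for xi in x]          # y is a deterministic function of x
print(cr2_direction(x, y))         # x -> y
```

Because each value of y corresponds to two values of x of opposite sign, the regression of x on y cannot recover x and leaves large residuals, while y on x fits closely.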
siPair2Blk(mtx, ctrl = 0, dig = 6, blksiz = 10)
mtx |
The data matrix with p columns. Denote x1 as the first column, which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one flipping with x1. The number of columns, p, must be 2 or more. |
ctrl |
data matrix for designated control variable(s) outside causal paths. The default ctrl=0 means that there are no control variables used. |
dig |
Number of digits for reporting (default dig=6). |
blksiz |
block size (default blksiz=10). If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n; that is, no blocking is done. |
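The blksiz rule can be sketched as follows. This Python helper (block_ranges is hypothetical, not package code) splits n rows into consecutive blocks and falls back to a single block when blksiz > n. How the package treats a final partial block is not stated in this section, so here it is simply kept as a shorter last block.

```python
def block_ranges(n, blksiz=10):
    """Return 0-based, half-open (start, stop) row ranges for each block."""
    if n <= 0:
        return []
    if blksiz > n:
        blksiz = n  # no blocking: one block containing all rows
    return [(start, min(start + blksiz, n)) for start in range(0, n, blksiz)]


print(block_ranges(25))  # [(0, 10), (10, 20), (20, 25)]
print(block_ranges(8))   # [(0, 8)]
```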
With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2 and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying the highest unanimity strength index,
with the positive sign suggesting ‘crim’ in the first column kernel causes
‘off’ in the second column of the argument mtx
to this function.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See someCPairs, compPortfo.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
siPair2Blk(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
siPair2Blk(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
Allowing an input matrix of control variables and missing data, this function produces a
3 column matrix summarizing the results. The estimated signs of the
stochastic dominance orders (+1, 0, -1) are weighted by
wt=c(1.2, 1.1, 1.05, 1),
and the weighted sum over all four orders gives an overall result for each of
the criteria Cr1 and Cr2. These are added to the Cr3 estimate (+1, 0, -1),
so the result always lies in the range [-3.175, 3.175].
siPairsBlk(mtx, ctrl = 0, dig = 6, blksiz = 10, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with p columns. Denote x1 as the first column which is fixed and then paired with all other columns, say: x2, x3, .., xp, one by one for the purpose of flipping with x1. p must be 2 or more |
ctrl |
data matrix for designated control variable(s) outside causal paths. The default ctrl=0 means that there are no control variables used. |
dig |
Number of digits for reporting (default dig=6). |
blksiz |
block size (default blksiz=10). If the chosen blksiz > n, where n is the number of rows in the matrix, then blksiz=n; that is, no blocking is done. |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights; it can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. Higher moments are slightly less reliable
because their computation needs higher powers of the deviations from the mean,
which increases sampling variability.
The summary results for all
three criteria are reported in a vector of numbers internally called crall:
With p columns in the mtx
argument to this function, x1 can be
paired with a total of p-1 columns (x2, x3, .., xp). Note that
we never flip any of the control variables with x1. This function
produces p-1 numbers (i=1,2,..,p-1) representing the summary sign, or ‘sum’, of
the signs sg1 to sg3 associated with the three criteria
Cr1, Cr2 and Cr3. Note that sg1 and sg2 are themselves weighted signs, using
a weighted sum of the signs from four orders of stochastic dominance.
In general, a positive sign in the i-th location of the ‘sum’ output of this function
means that x1 is the kernel cause, while the variable in the (i+1)-th column of mtx
is the
‘effect’ or ‘response’ or ‘endogenous.’ The magnitude represents the strength (unanimity)
of the evidence for a particular sign. Conversely, a negative sign
in the i-th location of the ‘sum’ output of this function means
that the first variable listed as the input to this function is the ‘effect,’
while the variable in the (i+1)-th column of mtx
is the exogenous kernel cause.
The European Crime data has all three criteria correctly suggesting that a
high crime rate kernel causes the deployment of a large number of police officers.
The command attach(EuroCrime); silentPairs(cbind(crim,off))
returns only one number: 3.175, implying the highest unanimity strength index,
with the positive sign suggesting ‘crim’ in the first column kernel causes
‘off’ in the second column of the argument mtx
to this function.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Causal Paths and Exogeneity Tests in Generalcorr Package for Air Pollution and Monetary Policy' (June 6, 2017). Available at SSRN: https://www.ssrn.com/abstract=2982128
See someCPairs, some0Pairs.
## Not run:
options(np.messages=FALSE)
colnames(mtcars[2:ncol(mtcars)])
siPairsBlk(mtcars[,1:3],ctrl=mtcars[,4:5]) # mpg paired with others
## End(Not run)
options(np.messages=FALSE)
set.seed(234)
z=runif(10,2,11) # z is independently created
x=sample(1:10)+z/10 # x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
siPairsBlk(mtx=cbind(x2,y2), ctrl=cbind(z,w2))
The seven columns produced by this function summarize the results, where the signs of
the stochastic dominance orders (+1 or -1) are weighted by wt=c(1.2, 1.1, 1.05, 1)
to
compute an overall result for all orders of stochastic dominance by a weighted sum for
the criteria Cr1 and Cr2. No such weighting is needed for the third criterion Cr3,
which compares asymmetric correlation coefficients rather than stochastic dominance orders.
some0Pairs(mtx, dig = 6, verbo = TRUE, rnam = FALSE, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix, whose first column is paired with all other columns. |
dig |
Number of digits for reporting (default dig=6). |
verbo |
Make |
rnam |
Make |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights; it can be changed here (default sumwt=4). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. Higher moments are slightly less reliable
because their computation needs higher powers of the deviations from the mean,
which increases sampling variability.
The summary results for all
three criteria are reported in one matrix called outVote:
typ=1 reports ('Y', 'X', 'Cause', 'SD1apd', 'SD2apd', 'SD3apd', 'SD4apd') naming variables identifying 'cause' and measures of stochastic dominance using absolute values of kernel regression gradients (or amorphous partial derivatives, apd-s) being minimized by the kernel regression algorithm while comparing the kernel regression of X on Y with that of Y on X.
typ=2 reports ('Y', 'X', 'Cause', 'SD1res', 'SD2res', 'SD3res', 'SD4res') and measures of stochastic dominance using absolute values of kernel regression residuals comparing regression of X on Y with that of Y on X.
typ=3 reports ('Y', 'X', 'Cause', 'r*x|y', 'r*y|x', 'r', 'p-val') containing generalized correlation coefficients r*; 'r' refers to the Pearson correlation coefficient, and p-val is the p-value for testing the significance of 'r'.
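For reference, the Pearson r and the usual test of its significance can be sketched in standard-library Python. Under the null hypothesis of zero correlation with roughly normal data, t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees of freedom. The sketch stops at t because the exact p-value needs a t-distribution CDF (for example from SciPy), and it assumes, without confirmation from this section, that the reported p-val comes from this standard test. The data are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(t_statistic(r, len(x)), 4))  # 0.7746 2.1213
```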
Prints three matrices detailing results for Cr1, Cr2 and Cr3. It also returns a grand summary matrix called ‘outVote’ which summarizes all three criteria. In general, a positive sign for weighted sum reported in the column ‘sum’ means that the first variable listed as the input to this function is the ‘kernel cause.’ For example, crime ‘kernel causes’ police officer deployment (not vice versa) is indicated by the positive sign of ‘sum’ (=3.175) reported for that example included in this package.
The output matrix last column for ‘mtcars’ example has the sum of the scores by the three criteria combined. If ‘sum’ is positive, then variable X (mpg) is more likely to have been engineered to kernel cause the response variable Y, rather than vice versa.
The European Crime data has all three criteria correctly suggesting that a high crime rate kernel causes the deployment of a large number of police officers.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics', Communications in Statistics - Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See somePairs.
## Not run:
some0Pairs(mtcars) # first variable is mpg and effect on mpg is of interest
## End(Not run)
## Not run:
data(EuroCrime)
attach(EuroCrime)
some0Pairs(cbind(crim,off))
## End(Not run)
This function reports a 7-column matrix and uses the older version of criterion Cr1.
It allows an additional input matrix of control variables. In the
7-column summary of results, the signs of
the stochastic dominance orders (+1 or -1) are weighted
by wt=c(1.2, 1.1, 1.05, 1)
to
compute an overall result for all orders of stochastic dominance
by a weighted sum for
the criteria Cr1 and Cr2. No such weighting is needed for
the third criterion Cr3, which compares asymmetric correlation coefficients.
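Criterion Cr3, stated in the package description as |r*(x|y,z)| < |r*(y|x,z)| for the causal path x to y, reduces to a sign once the two generalized partial correlations are available. The Python sketch below encodes only that comparison, using the convention that +1 means the first variable is the kernel cause; computing r* itself (which the package does from kernel regressions) is outside its scope, and cr3_sign is a hypothetical name.

```python
def cr3_sign(r_x_given_yz, r_y_given_xz):
    """+1: x kernel-causes y; -1: y kernel-causes x; 0: tie."""
    a, b = abs(r_x_given_yz), abs(r_y_given_xz)
    return (a < b) - (a > b)  # booleans subtract as 0/1


print(cr3_sign(0.42, -0.87))  # 1, since |0.42| < |-0.87|
```

Only magnitudes enter the comparison, so the sign of each r* (the direction of association) does not affect the causal direction chosen by Cr3.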
someCPairs(mtx, ctrl, dig = 6, verbo = TRUE, rnam = FALSE, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4)
mtx |
The data matrix with many columns where the first column is fixed and then paired with all other columns, one by one. |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
verbo |
Make verbo=TRUE for detailed printing of computational steps |
rnam |
Make |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights can be changed here =4(default). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is somewhat arbitrary.
The summary results for all
three criteria are reported in one matrix called outVote:
typ=1 reports ('Y', 'X', 'Cause', 'SD1apdC', 'SD2apdC', 'SD3apdC', 'SD4apdC'), naming variables identifying the 'cause', and measures of stochastic dominance using absolute values of kernel regression gradients (or amorphous partial derivatives, apd-s) being minimized by the kernel regression algorithm, while comparing the kernel regression of X on Y with that of Y on X. The letter C in the titles reminds the reader of the presence of control variable(s).
typ=2 reports ('Y', 'X', 'Cause', 'SD1resC', 'SD2resC', 'SD3resC', 'SD4resC') and measures of stochastic dominance using absolute values of kernel regression residuals comparing regression of X on Y with that of Y on X.
typ=3 reports ('Y', 'X', 'Cause', 'r*x|yC', 'r*y|xC', 'r', 'p-val') containing generalized correlation coefficients r*, where 'r' refers to the Pearson correlation coefficient and p-val is the p-value for testing the significance of 'r'. The letter C in the titles reminds the reader of the presence of control variable(s).
Prints three matrices detailing results for Cr1, Cr2 and Cr3.
It also returns a grand summary matrix called ‘outVote’ which summarizes all three criteria.
In general, a positive sign of the weighted sum reported in the column ‘sum’ means
that the first variable listed as the input to this function is the ‘kernel cause.’
This function is an extension of some0Pairs
to allow for control variables.
For example, the positive sign of ‘sum’ (=3.175) reported for the European crime example included in this package
indicates that crime ‘kernel causes’ police officer deployment (not vice versa).
The last column of the output matrix for the ‘mtcars’ example has the sum of the scores by the three criteria combined. If ‘sum’ is positive, then variable X (mpg) is more likely to have been engineered to kernel cause the response variable Y, rather than vice versa.
The European Crime data has all three criteria correctly suggesting that high crime rate kernel causes the deployment of a large number of police officers.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See Also somePairs, some0Pairs
## Not run:
someCPairs(mtcars[,1:3],ctrl=mtcars[4:5]) # first variable is mpg and effect on mpg is of interest
## End(Not run)
## Not run:
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
someCPairs(cbind(x2,y2), cbind(z,w2)) #yields x2 as correct cause
## End(Not run)
This second version of someCPairs
also allows an input matrix of
control variables and produces a 7-column matrix
summarizing the results, where the signs of
stochastic dominance order values (+1 or -1) are weighted by
wt=c(1.2,1.1, 1.05, 1)
to compute an overall result for all orders of stochastic dominance by
a weighted sum for the criteria Cr1 and Cr2.
The weighting is not needed for the third criterion Cr3.
someCPairs2( mtx, ctrl, dig = 6, verbo = TRUE, rnam = FALSE, wt = c(1.2, 1.1, 1.05, 1), sumwt = 4 )
mtx |
The data matrix with many columns where the first column is fixed and then paired with all other columns, one by one. |
ctrl |
data matrix for designated control variable(s) outside causal paths |
dig |
Number of digits for reporting (default dig=6) |
verbo |
Make verbo=TRUE for detailed printing of computational steps |
rnam |
Make |
wt |
Allows user to choose a vector of four alternative weights for SD1 to SD4. |
sumwt |
Sum of weights can be changed here =4(default). |
The reason for slightly declining weights on the signs from
SD1 to SD4 is simply that the local mean comparisons
implicit in SD1 are known to be
more reliable than the local variance implicit in SD2, the local skewness implicit in
SD3 and the local kurtosis implicit in SD4. The sampling unreliability of higher
moments increases because their computations need
higher powers of the deviations from the mean.
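The claim that higher moments are estimated less reliably can be checked with a small base-R simulation. This is an illustrative sketch with arbitrary settings (n = 50 standard normal draws, 2000 replications); it looks at global sample moments, not at the local comparisons inside SD1 to SD4 themselves.

```r
# Illustrative simulation: sampling spread of the mean and of the
# 2nd, 3rd and 4th central sample moments for N(0,1) data.
set.seed(1)
n <- 50
reps <- 2000
est <- replicate(reps, {
  x <- rnorm(n)
  d <- x - mean(x)                      # deviations from the sample mean
  c(mean = mean(x), m2 = mean(d^2), m3 = mean(d^3), m4 = mean(d^4))
})
# est is a 4 x reps matrix; row-wise standard deviations measure sampling spread
sampling_sd <- apply(est, 1, sd)
print(round(sampling_sd, 3))
```

The spread grows steadily with the power of the deviations, which is the intuition behind giving SD1 slightly more weight than SD4.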
The summary results for all
three criteria are reported in one matrix called outVote:
(typ=1) reports ('Y', 'X', 'Cause', 'SD1.rhserr', 'SD2.rhserr', 'SD3.rhserr', 'SD4.rhserr'), naming variables identifying the 'cause', and measures of stochastic dominance using the absolute values abs(RHS first regressor*residual) from kernel regressions, comparing the flipped regressions X on Y versus Y on X. The letter C in the titles reminds the reader of the presence of control variable(s).
typ=2 reports ('Y', 'X', 'Cause', 'SD1resC', 'SD2resC', 'SD3resC', 'SD4resC') and measures of stochastic dominance using absolute values of kernel regression residuals comparing regression of X on Y with that of Y on X.
typ=3 reports ('Y', 'X', 'Cause', 'r*x|yC', 'r*y|xC', 'r', 'p-val') containing generalized correlation coefficients r*, where 'r' refers to the Pearson correlation coefficient and p-val is the p-value for testing the significance of 'r'. The letter C in the titles reminds the reader of the presence of control variable(s).
Prints three matrices detailing results for Cr1, Cr2 and Cr3.
It also returns a grand summary matrix called ‘outVote’ which summarizes all three criteria.
In general, a positive sign of the weighted sum reported in the column ‘sum’ means
that the first variable listed as the input to this function is the ‘kernel cause.’
This function is an extension of some0Pairs
to allow for control variables.
For example, the positive sign of ‘sum’ (=3.175) reported for the European crime example included in this package
indicates that crime ‘kernel causes’ police officer deployment (not vice versa).
The last column of the output matrix for the ‘mtcars’ example has the sum of the scores by the three criteria combined. If ‘sum’ is positive, then variable X (mpg) is more likely to have been engineered to kernel cause the response variable Y, rather than vice versa.
The European Crime data has all three criteria correctly suggesting that high crime rate kernel causes the deployment of a large number of police officers.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
See Also somePairs, some0Pairs
## Not run:
someCPairs2(mtcars[,1:3],ctrl=mtcars[4:5]) # first variable is mpg and effect on mpg is of interest
## End(Not run)
## Not run:
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is somewhat indep and affected by z
y=1+2*x+3*z+rnorm(10)
w=runif(10)
x2=x;x2[4]=NA;y2=y;y2[8]=NA;w2=w;w2[4]=NA
someCPairs2(cbind(x2,y2), cbind(z,w2)) #yields x2 as correct cause
## End(Not run)
This builds on the function mag_ctrl, where the input matrix mtx
has p columns. The first column is present in each of the (p-1) pairs. Its
output is a matrix with four columns containing the names of variables
and approximate overall estimates of the magnitudes of
partial derivatives (dy/dx) and (dx/dy) for a distinct (x,y) pair in a row.
The estimated overall derivatives are not always well-defined, because
the real partial derivatives of nonlinear functions
are generally distinct for each observation point.
someMagPairs(mtx, ctrl, dig = 6, verbo = TRUE)
mtx |
The data matrix with many columns where the first column is fixed and then paired with all other columns, one by one. |
ctrl |
data matrix for designated control variable(s) outside causal paths. A constant vector is not allowed as a control variable. |
dig |
Number of digits for reporting (default dig=6) |
verbo |
Make verbo=TRUE for detailed printing of computational steps |
The function mag_ctrl
has kernel regressions x ~ y + ctrl
and x ~ ctrl
to evaluate the ‘incremental change’ in R-squares.
Let (rxy;ctrl) denote the square root of that ‘incremental change’ after its sign is made the
same as that of the Pearson correlation coefficient cor(x,y).
One can interpret (rxy;ctrl) as
a generalized partial correlation coefficient when x is regressed on y after removing
the effect of control variable(s) in ctrl. It is more general than the usual partial
correlation coefficient, since it
allows for nonlinear relations among variables.
Next, the function computes ‘dxdy’ by multiplying (rxy;ctrl) by the ratio of
standard deviations, sd(x)/sd(y). Now ‘dxdy’ approximates the magnitude of the
partial derivative (dx/dy) in a causal model where y is the cause and x is the effect.
The function also reports the entirely analogous ‘dydx’ obtained by interchanging x and y.
The someMagPairs function runs mag_ctrl on several column
pairs in a matrix input mtx, where the first column is held fixed and all others
are changed one by one, reporting two partial derivatives for each row.
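The steps of mag_ctrl can be sketched in base R to make the arithmetic concrete. This is a hedged sketch, not the package code: mag_sketch is a hypothetical name, and ordinary least squares via lm() stands in for the kernel regressions, so it captures only linear relations.

```r
# Hypothetical sketch of the mag_ctrl idea; lm() (OLS) stands in for the
# kernel regressions actually used by the package.
mag_sketch <- function(x, y, ctrl) {
  ctrl <- as.matrix(ctrl)
  # signed square root of the incremental R-squared from adding 'reg'
  signed_incr <- function(dep, reg) {
    r2_full <- summary(lm(dep ~ reg + ctrl))$r.squared  # dep ~ reg + ctrl
    r2_ctrl <- summary(lm(dep ~ ctrl))$r.squared        # dep ~ ctrl
    sign(cor(dep, reg)) * sqrt(max(r2_full - r2_ctrl, 0))
  }
  rxy_ctrl <- signed_incr(x, y)  # x regressed on y and ctrl
  ryx_ctrl <- signed_incr(y, x)  # flipped: y regressed on x and ctrl
  c(dxdy = rxy_ctrl * sd(x) / sd(y),
    dydx = ryx_ctrl * sd(y) / sd(x))
}

set.seed(34)
x <- sample(1:10)
y <- 1 + 2 * x + rnorm(10)
w <- runif(10)
mag_sketch(x, y, ctrl = w)
```

As a sanity check on the formula, with OLS and no control variables the signed square root of R-squared is just Pearson r, so r*sd(x)/sd(y) reduces to the OLS slope of x on y.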
Table containing names of Xi and Xj and two magnitudes: (dXidXj, dXjdXi). dXidXj is the magnitude of the effect on Xi when Xi is regressed on Xj (i.e., when Xj is the cause). The analogous dXjdXi is the magnitude when Xj is regressed on Xi.
This function is intended for use only after the causal path direction
is already determined by various functions in this package (e.g., someCPairs),
that is, after the researcher knows whether Xi causes Xj or vice versa.
The output of this function is a matrix of 4 columns, where the first two columns list
the names of Xi and Xj and the next two numbers in each row are
dXidXj and dXjdXi, respectively,
representing the magnitude of the effect of one variable on the other.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C. R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
See mag_ctrl, someCPairs
set.seed(34);x=sample(1:10);y=1+2*x+rnorm(10);z=sample(2:11)
w=runif(10)
ss=someMagPairs(cbind(y,x,z),ctrl=w)
This function lets the user choose one of three criteria to determine causal direction
by setting typ as 1, 2 or 3. It reports results for only one criterion at a time, unlike
the function some0Pairs, which summarizes the resulting causal directions for all criteria
with suitable weights. If some variables are ‘control’ variables, use someCPairs (C=control).
somePairs(mtx, dig = 6, verbo = FALSE, typ = 1, rnam = FALSE)
mtx |
The data matrix; its first column is paired with all other columns. |
dig |
Number of digits for reporting (default dig=6) |
verbo |
Make verbo=TRUE for detailed printing of computational steps |
typ |
Must be 1 (default), 2 or 3 for the three criteria. |
rnam |
Make |
(typ=1) reports ('Y', 'X', 'Cause', 'SD1apd', 'SD2apd', 'SD3apd', 'SD4apd'), naming variables identifying the 'cause', and measures of stochastic dominance using absolute values of kernel regression gradients, comparing the regression of X on Y with that of Y on X.
(typ=2) reports ('Y', 'X', 'Cause', 'SD1res', 'SD2res', 'SD3res', 'SD4res') and measures of stochastic dominance using absolute values of kernel regression residuals, comparing the regression of X on Y with that of Y on X.
(typ=3) reports ('Y', 'X', 'Cause', 'r*X|Y', 'r*Y|X', 'r', 'p-val') containing generalized correlation coefficients r*, 'r' refers to the Pearson correlation coefficient and p-val column has the p-values for testing the significance of Pearson's 'r'.
A matrix containing causal identification results for one criterion.
The first column of the input mtx, having p columns,
is paired with the (p-1) other columns. The output matrix headings are
self-explanatory and distinct for each criterion Cr1 to Cr3.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
H. D. Vinod 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
The related function some0Pairs
may be more useful, since it
reports on all three criteria (by choosing typ=1,2,3) and
further summarizes their results by weighting to help choose causal paths.
## Not run:
data(mtcars)
somePairs(mtcars)
## End(Not run)
This function is an alternative implementation of somePairs, which also lets the user
choose one of three criteria to determine causal direction by setting typ as 1, 2 or 3.
It reports results for only one criterion at a time, unlike the function some0Pairs,
which summarizes the resulting causal directions for all criteria with suitable weights.
If some variables are ‘control’ variables, use someCPairs, where the notation C=control.
somePairs2(mtx, dig = 6, verbo = FALSE, typ = 1, rnam = FALSE)
mtx |
The data matrix; its first column is paired with all other columns. |
dig |
Number of digits for reporting (default dig=6) |
verbo |
Make verbo=TRUE for detailed printing of computational steps |
typ |
Must be 1 (default), 2 or 3 for the three criteria. |
rnam |
Make |
(typ=1) reports ('Y', 'X', 'Cause', 'SD1.rhserr', 'SD2.rhserr', 'SD3.rhserr', 'SD4.rhserr'), naming variables identifying the 'cause' using the Hausman-Wu criterion. It measures stochastic dominance using the absolute values abs(RHS first regressor*residual) from kernel regressions, comparing the flipped regressions X on Y versus Y on X.
(typ=2) reports ('Y', 'X', 'Cause', 'SD1res', 'SD2res', 'SD3res', 'SD4res') and measures of stochastic dominance using absolute values of kernel regression residuals comparing regression of X on Y with that of Y on X.
(typ=3) reports ('Y', 'X', 'Cause', 'r*X|Y', 'r*Y|X', 'r', 'p-val') containing generalized correlation coefficients r*, 'r' refers to the Pearson correlation coefficient and p-val column has the p-values for testing the significance of Pearson's 'r'.
A matrix containing causal identification results for one criterion.
The first column of the input mtx, having p columns,
is paired with the (p-1) other columns. The output matrix headings are
self-explanatory and distinct for each criterion Cr1 to Cr3.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
H. D. Vinod 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
The related function some0Pairs
may be more useful, since it
reports on all three criteria (by choosing typ=1,2,3) and
further summarizes their results by weighting to help choose causal paths.
Alternative and revised function somePairs2
implements the Cr1 (first criterion) with a direct estimate of
the Hausman-Wu statistic for testing exogeneity.
## Not run:
data(mtcars)
somePairs2(mtcars)
## End(Not run)
This function relies on the sort.list function in R, because one wants the sort to carry along all columns: entire rows are reordered according to the values in column j.
sort_matrix(x, j)
x |
An input matrix with several columns |
j |
The column number with reference to which one wants to sort |
A sorted matrix
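A one-line base-R sketch of the idea, assuming the goal is simply to reorder whole rows by column j (sort_matrix_sketch is a hypothetical name, not the package's code):

```r
# Hypothetical re-implementation: order the rows by column j so that every
# other column is carried along with its row.
sort_matrix_sketch <- function(x, j) {
  x[sort.list(x[, j]), , drop = FALSE]
}

set.seed(30)
x <- matrix(sample(1:50), ncol = 5)
y <- sort_matrix_sketch(x, 3)
y
```

Indexing the rows with the permutation from sort.list, rather than sorting each column separately, is what keeps every row intact.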
set.seed(30)
x=matrix(sample(1:50),ncol=5)
y=sort_matrix(x,3);y
1) Standardize the data to force mean zero and variance unity, 2) kernel regress x on y, with the option ‘residuals = TRUE’, and finally 3) compute the residuals. The standardization yields comparable residuals.
stdres(x, y)
x |
vector of data on the dependent variable |
y |
data on the regressors which can be a matrix |
The first argument is assumed to be the dependent variable. If
stdres(x,y) is used, you are regressing x on y (not the usual y
on x). The regressors can be a matrix with 2 or more columns. The missing values
are suitably ignored by the standardization.
Kernel regression residuals are returned after standardizing the data on both sides, so that the magnitudes of residuals are comparable between the regression of x on y on the one hand and the flipped regression of y on x on the other.
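The three steps can be sketched in base R for a single regressor. This is a hedged sketch: the package uses kernel regression, whereas here stats::loess (local polynomial regression) is swapped in as a stand-in, pairs with a missing value are simply dropped, and the name stdres_sketch is hypothetical.

```r
# Hypothetical sketch: standardize both variables, regress x on y with a
# local (loess) fit standing in for kernel regression, return residuals.
stdres_sketch <- function(x, y) {
  ok <- complete.cases(x, y)       # drop pairs with a missing value
  xs <- as.numeric(scale(x[ok]))   # mean zero, variance one
  ys <- as.numeric(scale(y[ok]))
  fit <- loess(xs ~ ys)            # x is the dependent variable
  residuals(fit)
}

set.seed(330)
x <- sample(20:50)
y <- sample(20:50)
r_xy <- stdres_sketch(x, y)  # regression of x on y
r_yx <- stdres_sketch(y, x)  # flipped regression of y on x
```

Because both sides are on a common unit scale, summaries such as mean(abs(r_xy)) and mean(abs(r_yx)) can be compared directly.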
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Generalized Correlation and Kernel Causality with Applications in Development Economics' in Communications in Statistics -Simulation and Computation, 2015, doi:10.1080/03610918.2015.1122048
## Not run:
set.seed(330)
x=sample(20:50)
y=sample(20:50)
stdres(x,y)
## End(Not run)
Standardize x and y vectors to achieve zero mean and unit variance.
stdz_xy(x, y)
x |
Vector of data which can have NA's |
y |
Vector of data which can have NA's |
stdx |
standardized values of x |
stdy |
standardized values of y |
This works even if there are missing x or y values.
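A minimal base-R sketch of the standardization, assuming each vector is standardized on its own non-missing values (the package may instead use only complete pairs). The name stdz_sketch is hypothetical; the output names mirror stdx and stdy above.

```r
# Hypothetical sketch: z-scores that skip NA values and keep them in place.
stdz_sketch <- function(x, y) {
  z <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
  list(stdx = z(x), stdy = z(y))
}

set.seed(30)
x <- sample(20:30); x[2] <- NA   # one missing value in x
y <- sample(21:31)
s <- stdz_sketch(x, y)
```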
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
## Not run:
set.seed(30)
x=sample(20:30)
y=sample(21:31)
stdz_xy(x,y)
## End(Not run)
Stochastic dominance originated as a sophisticated comparison of two distributions of stock market returns. The dominating distribution is superior in terms of local mean, variance, skewness, and kurtosis, respectively. However, stochastic dominance orders 1 to 4 are really not related to the four moments. Some details are in Vinod (2022, sec. 4.3) and the vignettes. Nevertheless, this function uses the output of ‘wtdpapb’ together with Anderson's algorithm. Of course, Anderson's method remains subject to the trapezoidal approximation, which exact stochastic dominance methods avoid.
stochdom2(dj, wpa, wpb)
dj |
Vector of (unequal) distances of consecutive intervals defined on common support of two probability distributions being compared |
wpa |
Vector of the first set of (weighted) probabilities |
wpb |
Vector of the second set of (weighted) probabilities |
sd1b |
Vector measuring stochastic dominance of order 1, SD1 |
sd2b |
Vector measuring stochastic dominance of order 2, SD2 |
sd3b |
Vector measuring stochastic dominance of order 3, SD3 |
sd4b |
Vector measuring stochastic dominance of order 4, SD4 |
The input to this function is the output of the function wtdpapb.
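A hedged base-R sketch in the spirit of Anderson's trapezoidal approach: SD1 cumulates the probability differences (F_A minus F_B on the common support), and each higher order integrates the previous one by the trapezoidal rule. The name sd_orders_sketch, the convention that dj[k] is the width from support point k-1 to k (with dj[1] = 0), and the sign convention (positive values favor the second option B, matching the example comment that y should dominate x) are all assumptions; the package's stochdom2 may scale or index things differently.

```r
# Hypothetical sketch: cumulate probability differences, then integrate
# repeatedly with the trapezoidal rule.
sd_orders_sketch <- function(dj, wpa, wpb) {
  stopifnot(length(dj) == length(wpa), length(wpa) == length(wpb))
  # cumulative trapezoid; dj[k] is the width between support points k-1 and k
  trap <- function(f) cumsum(dj * (f + c(0, f[-length(f)])) / 2)
  sd1b <- cumsum(wpa - wpb)  # F_A - F_B on the common support
  sd2b <- trap(sd1b)
  sd3b <- trap(sd2b)
  sd4b <- trap(sd3b)
  list(sd1b = sd1b, sd2b = sd2b, sd3b = sd3b, sd4b = sd4b)
}

# B puts more probability on larger values than A, so B should dominate A
dj  <- c(0, 1, 1, 1)
wpa <- c(0.4, 0.3, 0.2, 0.1)
wpb <- c(0.1, 0.2, 0.3, 0.4)
res <- sd_orders_sketch(dj, wpa, wpb)
```

Nonnegative values throughout a vector indicate dominance of that order by B; the trapezoidal step is the approximation that exact stochastic dominance methods avoid.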
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. https://www.worldscientific.com/worldscibooks/10.1142/12831
Vinod, H. D. 'Ranking Mutual Funds Using Unconventional Utility Theory and Stochastic Dominance,' Journal of Empirical Finance Vol. 11(3) 2004, pp. 353-377.
See Also wtdpapb
## Not run:
set.seed(234);x=sample(1:30);y=sample(5:34)
w1=wtdpapb(x,y) #y should dominate x with mostly positive SDs
stochdom2(w1$dj, w1$wpa, w1$wpb)
## End(Not run)
This function gets the GPCCs by calling the parcorVec
function. The
pseudo regression coefficient of a kernel regression is then obtained by
[GPCC*(sd dep.var)/(sd regressor)], that is, by
multiplying the GPCC by
the standard deviation (sd) of the dependent variable, and dividing by the
sd of the regressor.
sudoCoefParcor(mtx, ctrl = 0, verbo = FALSE, idep = 1)
mtx |
Input data matrix with p (> or = 3) columns, |
ctrl |
Input vector or matrix of data for control variable(s), default is ctrl=0, when control variables are absent |
verbo |
Make this TRUE for detailed printing of computational steps |
idep |
The column number of the dependent variable (=1, default) |
A p by 1 ‘out’ vector of pseudo partial derivatives.
Generalized Partial Correlation Coefficients (GPCC) allow comparison of
the relative contribution of each regressor to the explanation of the dependent variable,
because GPCC are scale-free. The pseudo regression
coefficients are not scale-free, since they equal GPCC*(sd dep.var)/(sd regressor).
We want to get all partial correlation coefficient pairs removing other column effects. Vinod (2018) shows why one needs more than one criterion to decide the causal paths or exogeneity.
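The conversion from scale-free GPCCs to pseudo regression coefficients is simple arithmetic. A base-R sketch, assuming the GPCC values for the regressors are already available (in the package they come from parcorVec; the numbers below are made up for illustration), using the hypothetical helper name pseudo_coef:

```r
# Hypothetical helper: convert scale-free partial correlations (one per
# regressor) into pseudo regression coefficients GPCC * sd(dep) / sd(regressor).
pseudo_coef <- function(gpcc, mtx, idep = 1) {
  stopifnot(length(gpcc) == ncol(mtx) - 1)
  sds <- apply(mtx, 2, sd, na.rm = TRUE)
  gpcc * sds[idep] / sds[-idep]
}

mtx <- cbind(y = c(0, 2, 4, 6), x = c(0, 1, 2, 3), z = c(0, 4, 8, 12))
gpcc <- c(0.5, 0.8)      # made-up partial correlations for x and z
pseudo_coef(gpcc, mtx)   # idep = 1: column y is the dependent variable
```

Here sd(y) is twice sd(x) and half sd(z), so the same correlation magnitude would translate into very different pseudo coefficients, which is why the GPCCs, not the coefficients, are used for comparing contributions.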
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See Also parcor_ijk.
See Also a hybrid version parcorVecH.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
sudoCoefParcor(mtx, idep=2)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')#some names needed
sudoCoefParcor(x)
## End(Not run)
This function gets HGPCCs by calling the parcorVecH
function.
The pseudo regression coefficient of a kernel regression is obtained by
HGPCC*(sd dep.var)/(sd regressor), that is, by
multiplying the HGPCC by
the standard deviation (sd) of the dependent variable and dividing by the
sd of the regressor.
sudoCoefParcorH(mtx, ctrl = 0, verbo = FALSE, idep = 1)
mtx |
Input data matrix with p (> or = 3) columns, |
ctrl |
Input vector or matrix of data for control variable(s), default is ctrl=0 when control variables are absent |
verbo |
Make this TRUE for detailed printing of computational steps |
idep |
The column number of the dependent variable (=1, default) |
A p by 1 ‘out’ vector of pseudo partial derivatives.
Hybrid Generalized Partial Correlation Coefficients (HGPCC) allow comparison of
the relative contribution of each regressor to the explanation of the dependent variable,
because HGPCC are scale-free. ‘Hybrid’ refers to the use of OLS residuals.
The pseudo hybrid regression coefficients equal HGPCC*(sd dep.var)/(sd regressor).
We want to get all partial correlation coefficient pairs removing other column effects. Vinod (2018) shows why one needs more than one criterion to decide the causal paths or exogeneity.
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
Vinod, H. D. 'Generalized Correlations and Instantaneous Causality for Data Pairs Benchmark,' (March 8, 2015) https://www.ssrn.com/abstract=2574891
Vinod, H. D. 'Matrix Algebra Topics in Statistics and Economics Using R', Chapter 4 in Handbook of Statistics: Computational Statistics with R, Vol.32, co-editors: M. B. Rao and C.R. Rao. New York: North Holland, Elsevier Science Publishers, 2014, pp. 143-176.
Vinod, H. D. 'New Exogeneity Tests and Causal Paths,' (June 30, 2018). Available at SSRN: https://www.ssrn.com/abstract=3206096
Vinod, H. D. (2021) 'Generalized, Partial and Canonical Correlation Coefficients' Computational Economics, 59(1), 1–28.
See Also parcor_ijk.
See Also a hybrid version parcorVecH.
set.seed(234)
z=runif(10,2,11)# z is independently created
x=sample(1:10)+z/10 #x is partly indep and partly affected by z
y=1+2*x+3*z+rnorm(10)# y depends on x and z not vice versa
mtx=cbind(x,y,z)
sudoCoefParcorH(mtx, idep=2)
## Not run:
set.seed(34);x=matrix(sample(1:600)[1:99],ncol=3)
colnames(x)=c('V1', 'v2', 'V3')#some names needed
sudoCoefParcorH(x)
## End(Not run)
This function extracts the choice (of a column representing a stock) from four rows of numbers quantifying the four orders of exact stochastic dominance comparisons. If the last (10th) row, named “choice”, has 1 in a column, then the stock represented by that column is to be chosen; that is, it should get the largest (portfolio) weight. If the original matrix row names are SD1 to SD4, the same names are repeated for the extra rows representing their ranks. The row name for the “sum of ranks” is sumRanks. Finally, the ranks associated with sumRanks provide the row named choice along the bottom (10th) row of the output matrix called “out”.
summaryRank(mtx)
mtx |
matrix to be ranked by row and summarized |
A matrix called ‘out’ having 10 rows and p columns (p = number of stocks):
Rows 1 to 4 have the SD1 to SD4 evaluations of areas over ECDFs.
Row 5 = SD1 ranks, Row 6 = SD2 ranks, Row 7 = SD3 ranks, Row 8 = SD4 ranks.
Row 9 = sum of the ranks of SD1 to SD4 in Rows 5 to 8.
Row 10 = choice rank based on all four (SD1 to SD4) added together.
Thus, the tenth row yields the choice priority number for each stock (asset) after combining all four criteria.
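A hedged base-R sketch of the 10-row layout described above. It assumes (not confirmed by the source) that a larger SD value is better, so each row is ranked in decreasing order with ties sharing the smaller rank; the helper name summary_rank_sketch is hypothetical.

```r
# Hypothetical sketch of the 10-row summary: SD rows, their ranks,
# the sum of ranks, and the final choice rank.
summary_rank_sketch <- function(mtx) {
  stopifnot(nrow(mtx) == 4)
  # assumption: larger SD values are better, so rank the negated values
  rk <- t(apply(mtx, 1, function(r) rank(-r, ties.method = "min")))
  sumRanks <- colSums(rk)
  choice <- rank(sumRanks, ties.method = "min")  # 1 = first choice
  rbind(mtx, rk, sumRanks = sumRanks, choice = choice)
}

mtx <- matrix(c(3, 1, 2,
                3, 1, 2,
                3, 2, 1,
                3, 2, 1), nrow = 4, byrow = TRUE,
              dimnames = list(paste0("SD", 1:4), c("A", "B", "C")))
out <- summary_rank_sketch(mtx)
out
```

Because apply() keeps the SD1 to SD4 names on the rank rows, the same names appear twice, as the description says.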
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
It is useful for symmetrizing the non-symmetric generalized correlation matrix produced by gmcmtx0.
symmze(mtx)
mtx |
non-symmetric matrix |
mtx2 |
replace [i,j] and [j,i] by the max of absolute values with common sign |
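A minimal base-R sketch of the rule, assuming, as the description says, that [i,j] and [j,i] share a common sign: for each pair, keep the entry with the larger absolute value. The name symmze_sketch is hypothetical; if a pair had opposite signs and equal magnitudes, this sketch would leave that pair non-symmetric.

```r
# Hypothetical sketch: for each (i,j) pair keep the entry with the larger
# absolute value, so that [i,j] and [j,i] end up equal.
symmze_sketch <- function(mtx) {
  stopifnot(nrow(mtx) == ncol(mtx))
  keep <- abs(mtx) >= abs(t(mtx))  # TRUE where [i,j] is at least as large as [j,i]
  ifelse(keep, mtx, t(mtx))        # ifelse keeps the matrix shape of 'keep'
}

mtx <- matrix(1:16, nrow = 4)
symmze_sketch(mtx)
```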
Prof. H. D. Vinod, Economics Dept., Fordham University, NY.
## Not run:
mtx=matrix(1:16,nrow=4)
symmze(mtx)
## End(Not run)
Stochastic dominance is a sophisticated comparison of two distributions of stock market returns. The dominating distribution is superior in terms of mean, variance, skewness and kurtosis respectively, representing dominance orders 1 to 4, without directly computing four moments. Vinod (2008, sec. 4.3) explains the details. The ‘wtdpapb’ function creates the input for stochdom2, which in turn computes the stochastic dominance. See Vinod (2004) for details about quantitative stochastic dominance.
wtdpapb(xa, xb)
xa |
Vector of (excess) returns for the first investment option A or values of any random variable being compared to another. |
xb |
Vector of returns for the second option B |
wpa |
Weighted vector of probabilities for option A |
wpb |
Weighted vector of probabilities for option B |
dj |
Vector of interval widths (distances) when both sets of data are forced on a common support |
This function is needed before computing stochastic dominance with stochdom2.
In Vinod (2008), the purpose of wtdpapb
is to map from standard
‘expected utility theory’ weights to more sophisticated ‘non-expected utility
theory’ weights using Prelec's (1998, Econometrica, p. 497) method. These
weights are not needed here. Hence we provide the function prelec2
which does not use Prelec weights at all, thereby simplifying and speeding up
the R code provided in Vinod (2008). This function avoids sophisticated ‘non-expected’
utility theory which incorporates commonly observed human behavior favoring
loss aversion and other anomalies inconsistent with precepts of the
expected utility theory. Such weighting is not needed for our application.
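A minimal sketch of the kind of input wtdpapb prepares, with no Prelec weighting (consistent with the text above): both samples are placed on their common support, their empirical probabilities are tabulated, and the interval widths are recorded. The name wtdpapb_sketch and the convention dj[1] = 0 are assumptions; the package's exact definitions may differ.

```r
# Hypothetical sketch: empirical probabilities of two samples on their
# common support, plus interval widths dj (dj[1] = 0 by convention here).
wtdpapb_sketch <- function(xa, xb) {
  support <- sort(unique(c(xa, xb)))  # common support of both samples
  k <- length(support)
  wpa <- tabulate(match(xa, support), nbins = k) / length(xa)
  wpb <- tabulate(match(xb, support), nbins = k) / length(xb)
  dj  <- c(0, diff(support))          # distances between consecutive points
  list(dj = dj, wpa = wpa, wpb = wpb)
}

w1 <- wtdpapb_sketch(c(1, 2, 3), c(2, 3, 4))
w1
```

Using match() against the sorted support keeps the probabilities aligned with dj even when the two samples have different values or sizes.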
Prof. H. D. Vinod, Economics Dept., Fordham University, NY
Vinod, H. D. 'Hands-On Intermediate Econometrics Using R' (2008) World Scientific Publishers: Hackensack, NJ. https://www.worldscientific.com/worldscibooks/10.1142/12831
Vinod, H. D. 'Ranking Mutual Funds Using Unconventional Utility Theory and Stochastic Dominance,' Journal of Empirical Finance Vol. 11(3) 2004, pp. 353-377.
See Also stochdom2
## Not run:
set.seed(234);x=sample(1:30);y=sample(5:34)
wtdpapb(x,y)
## End(Not run)