Package gegravity
Documentation
gegravity is a Python package containing tools used to estimate general equilibrium (GE) structural gravity models and simulate counterfactual experiments. The package is based on the well established version of the gravity model described by Yotov, Piermartini, Monteiro, and Larch (2016) An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. It implements the structural GE gravity model in a general, robust, and easy to use manner in an effort to make GE gravity modeling more accessible for researchers and policy analysts.
The package provides several useful tools for structural gravity modeling.
-
It computes theory consistent estimates of the structural multilateral resistance terms of Anderson and van Wincoop (2003) "Gravity with Gravitas" from standard econometric gravity results.
-
It simulates GE effects from counterfactual experiments such as new trade agreements or changes to other trade costs. The model can be flexibly used to study aggregate or sector level trade as well as many different types of trade costs and policies.
-
It conducts Monte Carlo simulations that provide a means to compute standard errors and other measures of statistical precision for the GE model results.
For more information about the GE gravity model, its implementation, and the various components of the package, see the companion paper "gegravity: General Equilibrium Gravity Modeling in Python".
Citation and license
The package is publicly available and free to use under the MIT license. Users are asked to please cite the following,
Herman, Peter (2021) "gegravity: General Equilibrium Gravity Modeling in Python." USITC Economics Working Paper 2021-04-B.
Versions
- Version 0.3 (Jun. 2024): Package reworked to no longer rely on the GME package for data and parameter inputs (may no longer be compatible with v0.2 models), MonteCarloGE features added to help identify and resolve issues with trials that fail to solve; and minor bug fixes and improvements to reliability.
- Version 0.2 (Dec. 2022): Improved OneSectorGE speed and updated compatibility with dependencies.
- Version 0.1 (Apr. 2021): Initial release.
Installation
The package can be installed via pip with the following command.
>>> pip install gegravity
Example
The following examples demonstrate how to perform a typical GE gravity analysis using the gegravity package. The code files for these examples as well as the sample data set used in them can be found at the project's github page or at the following gist locations.
- Sample data: https://gist.github.com/peter-herman/13b056e52105008c53faa482db67ed4a
- Basic GE analysis script: https://gist.github.com/peter-herman/faeea8ec032c4c2c13bcbc9c400cca9b
- Monte Carlo GE analysis: https://gist.github.com/peter-herman/a2ebf3997bfd6e9cb3268298d49b64b5
OneSectorGE gravity model
Prepare data inputs
Begin by loading some needed packages
>>> import gegravity as ge
>>> import pandas as pd
Next, load some input data
>>> gravity_data_location = "https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv"
>>> grav_data = pd.read_csv(gravity_data_location)
>>> print(grav_data.head())
exporter importer year trade Y E pta contiguity common_language lndist international
0 GBR AUS 2006 4310 925638 362227 0 0 1 9.7126 1
1 FIN AUS 2006 514 142759 362227 0 0 0 9.5997 1
2 USA AUS 2006 16619 5019964 362227 1 0 1 9.5963 1
3 IND AUS 2006 763 548517 362227 0 0 1 9.1455 1
4 SGP AUS 2006 8756 329817 362227 1 0 1 8.6732 1
Add a constant to the data
>>> grav_data['constant'] = 1
Create a new BaselineData object to manage the input trade, output, expenditure, and cost variable data for the GE model.
>>> baseline = ge.BaselineData(grav_data,
... imp_var_name='importer', # Columns with importer identifiers
... exp_var_name='exporter', # Column with exporter identifiers
... year_var_name='year', # Column with year (time) identifier
... trade_var_name='trade', # Column with trade values
... expend_var_name='E', # Column with importer total expenditure values
... output_var_name='Y') # Column with exporter total output values
Prepare the trade cost parameters
Define the parameters that will be used to construct trade costs. These values were estimated separately via PPML.
>>> ests = [
... # 'var', 'beta', 'stderr'
... ( 'lndist', -0.3898623, 0.0729915),
... ( 'contiguity', 0.891577, 0.1327354),
... ('common_language', 0.0326249, 0.0840702),
... ( 'pta', 0.4711383, 0.1076578),
... ( 'international', -3.412584, 0.2151235),
... ( 'constant', 16.32434, 0.4844137)]
>>> ests = pd.DataFrame(ests, columns = ['var', 'beta', 'stderr'])
Use the ests dataframe to define a CostCoeff object for the gegravity model, specifying the columns containing the variable identifiers, coefficient estimates, and standard errors.
>>> cost_params = ge.CostCoeffs(estimates = ests, # Dataframe with estimate values
... identifier_col ='var', # Column with variable identifier
... coeff_col ='beta', # Column with estimated coefficient values
... stderr_col ='stderr') # Column with estimated standard errors
Conduct a basic GE analysis
With the data entered into the two input objects (CostCoeff and BaselineData), we can create the GE model. Define the GE model using the OneSectorGE class, which is the package's main model.
>>> ge_model = ge.OneSectorGE(baseline = baseline, # BaselineData input
... cost_coeff_values = cost_params, # CostCoeff input
... cost_variables = ['lndist', 'contiguity', # Variables to use to construct trade costs
... 'common_language', 'pta',
... 'international', 'constant'],
... year = "2006", # Year to use for model
... reference_importer = "DEU", # Reference importer to use (normalizes IMRs to DEU's IMR)
... sigma = 5) # Elasticity of substitution
Before solving the model, it is worth discussing one of the most important factors when it comes to finding a solution to the GE model. This factor is the scaling of the outward multilateral resistance terms, which can potentially cause issues for the solver if they are not scaled in a way that is numerically compatible with the other variables. Rescaling these terms can mitigate this numerical issue. The following method helps identify rescale factors that are likely to result in the model being robustly solveable.
>>> potential_factors = ge_model.check_omr_rescale(omr_rescale_range=4)
>>> print(potential_factors)
omr_rescale omr_rescale (alt format) solved message max_func_value mean_func_value reference_importer_omr
0 0.0001 10^-4 True The solution converged. 1.706264e-08 -2.081753e-10 0.050346
1 0.0010 10^-3 True The solution converged. 9.829577e-10 5.869294e-11 0.050346
2 0.0100 10^-2 True The solution converged. 1.566544e-09 1.595307e-10 0.050346
3 0.1000 10^-1 True The solution converged. 3.543053e-08 -1.622280e-11 0.050346
4 1.0000 10^0 True The solution converged. 1.663875e-09 -1.471645e-11 0.050346
5 10.0000 10^1 True The solution converged. 1.648595e-09 1.652183e-10 0.050346
6 100.0000 10^2 True The solution converged. 1.260880e-12 6.217249e-15 0.050346
7 1000.0000 10^3 False The iteration is not making good progress, as ... 4.666270e-01 -3.827271e-02 0.053669
8 10000.0000 10^4 False The iteration is not making good progress, as ... 4.664453e-01 -3.825787e-02 0.053667
It looks like rescale factors between 0.0001 and 100 all yield a solveable model, produce similar solutions to the baseline model (reference importer OMR = 0.050346), and result in function values that are consistently close to zero, which are all good signs. Going forward, we'll use a factor of 1, which is the default (i.e. OMRs will not be rescaled in this case).
With an appropriate rescale factor in hand, we can build the baseline model. This process solves for baseline multialteral resistances and calibrates other model parameters.
>>> ge_model.build_baseline(omr_rescale = 1)
Solving for baseline MRs...
The solution converged.
We can examine the solutions for the baseline multilateral resistances print(ge_model.baseline_mr.head(12))
Define counterfactual experiment and solve the counterfactual GE model
Next, we can try a counterfactual experiment. For this example, let us consider a hypothetical experiment in which Canada (CAN) and Japan (JPN) sign a preferential trade agreement (pta). Begin by creating a copy of the baseline data. Here, it is important that we create a "deep" copy so as to avoid modifying the baseline data too.
>>> exp_data = ge_model.baseline_data.copy()
Next, we modify the copied data to reflect the hypothetical policy change.
>>> exp_data.loc[(exp_data["importer"] == "CAN") & (exp_data["exporter"] == "JPN"), "pta"] = 1
>>> exp_data.loc[(exp_data["importer"] == "JPN") & (exp_data["exporter"] == "CAN"), "pta"] = 1
Now, define the experiment by supplying the counterfactual data to the GE model.
>>> ge_model.define_experiment(exp_data)
Counterfactual experiment trade costs are constructed when defining the experiment and can be examined. Trade costs in the GE model are represented similarly to trade costs estimated in the econometric model (trade = exp(BX), where BX is the trade cost estimate as captured by the model covariates). As a result, the cost values are generally positive and higher values imply more trade.
>>> print(ge_model.bilateral_costs.head())
We can also check the costs of Canadian exports to Japan, which were subject to the policy change
>>> print(ge_model.bilateral_costs.loc[('CAN','JPN'),:])
baseline trade cost 11305.165421
experiment trade cost 18108.800547
trade cost change (%) 60.181650
Name: (CAN, JPN), dtype: float64
With the experiment defined, the counterfactual model can be simulated. As the model solves, some diagnostic information will print to the console indicating if the first and second stages of the solution were successful.
>>> ge_model.simulate()
Access and Export Results
With the model estimated, we can retrieve many of the different sets of model results that are produced. The following are some of the more prominent collections of results.
Country results: A collection of many of the key country-level results (prices, total imports/exports, GDP, welfare, etc.)
>>> country_results = ge_model.country_results
Print the first few rows of country-level estimated change in factory prices, GDP, and foreign exports
>>> print(country_results.head())
factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS 0.004143 -0.004143 0.011249 -0.007106 1.000071 -0.007106 0.004143 0.004143 -0.087678 -0.033760 0.019940
AUT -0.002774 0.002775 0.002405 -0.005179 1.000052 -0.005179 -0.002774 -0.002774 -0.034691 -0.026447 -0.001602
BEL -0.002052 0.002052 0.000357 -0.002409 1.000024 -0.002409 -0.002052 -0.002052 -0.037698 -0.030564 -0.011239
BRA -0.005588 0.005588 -0.000092 -0.005496 1.000055 -0.005496 -0.005588 -0.005588 -0.080228 -0.066497 -0.005961
CAN -0.135770 0.135955 -0.551724 0.418261 0.995835 0.418261 -0.135770 -0.135770 1.573180 1.504744 -1.939006
Bilateral trade results: Baseline and counterfactual trade between each pair of countries.
>>> bilateral_results = ge_model.bilateral_trade_results
>>> print(bilateral_results.head())
baseline modeled trade experiment trade trade change (percent)
exporter importer
AUS AUS 218129.130005 218172.625325 0.019940
AUT 666.648700 666.499697 -0.022351
BEL 1532.878160 1532.421087 -0.029818
BRA 2747.265374 2746.299784 -0.035147
CAN 2847.023843 2780.118420 -2.350013
Aggregate Trade: Total imports, exports, intranational trade, output, and shipments for each country.
>>> agg_trade = ge_model.aggregate_trade_results
Multilateral Resistances: Country multilateral resistance (MR) terms
>>> mr_terms = ge_model.country_mr_terms
Solver Diagnostics: A dictionary containing many types of solver diagnostic info.
>>> solver_diagnostics = ge_model.solver_diagnostics
It is also possible to export the results to a collection of spreadsheet (.csv) files and add trade values in levels to the outputs.
>>> ge_model.export_results(directory="C://examples//",name="CAN_JPN_PTA_experiment")
Post estimation analysis
There are several tools that allow for post-estimation analysis. These can help to better understand the effects of the analysis or to example particular countries of interest.
For example, we can examine how the counterfactual experiment affected Canadian imports from NAFTA members USA and Mexico.
>>> nafta_share = ge_model.trade_share(importers = ['CAN'],exporters = ['USA','MEX'])
>>> print(nafta_share)
description baseline modeled trade experiment trade change (percentage point) change (%)
0 Percent of CAN imports from USA, MEX 22.021883 21.578895 -0.442988 -2.01158
1 Percent of USA, MEX exports to CAN 2.034768 1.991437 -0.043331 -2.129523
We can calculate counterfactual trade values based on observed levels and the estimated changes in trade.
>>> levels = ge_model.calculate_levels()
>>> print(levels.head())
baseline observed foreign exports experiment observed foreign exports baseline observed foreign imports experiment observed foreign imports baseline observed intranational trade experiment observed intranational trade
exporter
AUS 42485 42447.749958 98938 98904.598242 261365 261417.116627
AUT 87153 87122.765692 96165 96139.566802 73142 73140.828196
BEL 258238 258140.649294 262743 262662.696177 486707 486652.299609
BRA 61501 61451.658710 56294 56256.566037 465995 465967.223949
CAN 256829 260869.383269 266512 270522.322380 223583 219247.712294
Finally, we can calculate trade weighted shocks to see which countries or country-pairs were most/least affected by the counterfactual experiment. Start with bilateral level shocks
>>> bilat_cost_shock = ge_model.trade_weighted_shock(how = 'bilateral')
>>> print(bilat_cost_shock.head())
exporter importer baseline modeled trade trade cost change (%) weighted_cost_change
0 AUS AUS 218129.130005 0.0 0.0
1 AUS AUT 666.648700 0.0 0.0
2 AUS BEL 1532.878160 0.0 0.0
3 AUS BRA 2747.265374 0.0 0.0
4 AUS CAN 2847.023843 0.0 0.0
In this experiment, only Canada–Japan trade should be affected so all other shocks are zero.
Alternatively, we can also do this at country-level, which summarizes the bilateral shocks.
>>> country_cost_shock = ge_model.trade_weighted_shock(how='country', aggregations = ['mean', 'sum', 'max'])
>>> print(country_cost_shock.head())
weighted_cost_change
mean sum max mean sum max
exporter exporter exporter importer importer importer
AUS 0.000000 0.000000 0.000000 0.000000 0.0 0.0
AUT 0.000000 0.000000 0.000000 0.000000 0.0 0.0
BEL 0.000000 0.000000 0.000000 0.000000 0.0 0.0
BRA 0.000000 0.000000 0.000000 0.000000 0.0 0.0
CAN 0.009956 0.298669 0.298669 0.033333 1.0 1.0
The maximum shock possible is 1 so we can see Canada's imports from Japan were the largest affected trade flow and likely the most influential factor underlying the counterfactual results.
Monte Carlo analysis
This example demonstrates a basic MonteCarloGE analysis. The Monte Carlo version of the model randomly draws a collection of cost parameter estimates a a simulates a collection of OneSectorGE models using thos parameters. The randomly drawn parameters are based on the varriance/covariances of each of the parameters as estimated by an econometric gravity model.
Create the baseline data inputs
>>> import gegravity as ge
>>> import pandas as pd
Begin by loading some baseline data.
>>> raw_data = pd.read_csv("https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv")
>>> raw_data['constant'] = 1
Define baseline data structure
>>> baseline = ge.BaselineData(raw_data, trade_var_name='trade', imp_var_name='importer', exp_var_name='exporter',
... year_var_name='year', output_var_name='Y', expend_var_name='E')
Setup the cost parameters
Create DataFrame of coefficient estimates and standard errors. These values were estimated separately using the same data and PPML
>>> ests = [
... # 'var', 'beta', 'stderr'
... ( 'lndist', -0.3898623, 0.0729915),
... ( 'contiguity', 0.891577, 0.1327354),
... ('common_language', 0.0326249, 0.0840702),
... ( 'pta', 0.4711383, 0.1076578),
... ( 'international', -3.412584, 0.2151235),
... ( 'constant', 16.32434, 0.4844137)]
>>> ests = pd.DataFrame(ests, columns = ['var', 'beta', 'stderr'])
Create dataframe of Variance/Covariance matrix and set row variable labels as DataFrame index
>>> var_covar = [
... # 'var', 'lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'
... ( 'lndist', .00532776, .00504029, .00093778, .00473823, -.01440259, -.03476407),
... ( 'contiguity', .00504029, .01761868, -.00175404, -.00030881, -.01492965, -.03036031),
... ('common_language', .00093778, -.00175404, .00706779, .00249937, .00136678, -.01312498),
... ( 'pta', .00473823, -.00030881, .00249937, .01159021, -.01538926, -.03255428),
... ( 'international', -.01440259, -.01492965, .00136678, -.01538926, .04627812, .08911527),
... ( 'constant', -.03476407, -.03036031, -.01312498, -.03255428, .08911527, .23465662)]
>>> var_covar = pd.DataFrame(var_covar, columns = ['var', 'lndist', 'contiguity', 'common_language', 'pta',
... 'international', 'constant'])
>>> var_covar.set_index('var', inplace = True)
Define gegravity CostValues object to organize this info as inputs for the GE model
>>> cost_params = ge.CostCoeffs(estimates = ests, identifier_col='var', coeff_col='beta', stderr_col='stderr',
... covar_matrix=var_covar)
Note: These estimates were produced using Stata and the following code
>>> import delimited "C:\gegravity\examples\sample_data_set.dlm"
>>> encode importer, gen(imp_fe)
>>> encode exporter, gen(exp_fe)
>>> ppmlhdfe trade lndist contiguity common_language pta international, absorb(i.exp_fe i.imp_fe)
>>> matrix list e(V)
Defining and simulating the MonteCarloGE model
With the econometric model estimated, we can define the MonteCarloGE model. In this example, we'll perform a small Monte Carlo experiment using 10 trials. The model will randomly draw 10 sets of cost coefficients from their joint normal distribution and simulate 10 OneSectorGE models corresponding to each draw. Most of the other MonteCarloGE arguments follow those from the OneSectorGE model. The two main exceptions are the "trials" and "seed" arguements. "trials" sets the number of Monte Carlo simulations to perform and "seed" sets the seed for the random draw. Setting a seed allows for repeatable/reproducible simulations.
>>> mc_model = ge.MonteCarloGE(baseline,
... year = '2006',
... trials = 10,
... reference_importer='DEU',
... sigma=7,
... cost_variables=['lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'],
... cost_coeff_values=cost_params,
... allow_singular_covar=True,
... seed = 1)
When the model is defined, it creates the 10 draws of cost coefficients. We can examine those random draws in the following way.
>>> print(mc_model.coeff_sample)
var 0 1 2 3 4 5 6 7 8 9
0 lndist -0.520534 -0.521725 -0.371895 -0.390785 -0.455785 -0.343360 -0.304334 -0.343764 -0.402571 -0.368141
1 contiguity 0.900885 0.830745 0.922490 0.853686 0.934542 1.043870 0.828643 0.797255 0.824927 0.771972
2 common_language 0.029730 -0.022368 0.011377 0.086987 -0.001718 0.057161 -0.009925 0.179760 0.043748 0.023636
3 pta 0.336452 0.331530 0.599859 0.356130 0.422941 0.510970 0.630710 0.602066 0.533864 0.512185
4 international -3.118628 -3.174684 -3.573262 -3.310206 -3.267814 -3.510545 -3.723681 -3.457308 -3.348660 -3.479166
5 constant 17.128512 17.209114 16.222708 16.288851 16.786156 15.989807 15.813891 15.920395 16.368214 16.219798
Similarly, we can examine summary information about the random draws. The first two columns report the point estimates from the econometric model (est_model) while the remaining report summary information about the Monte Carlo sample.
>>> print(mc_model.sample_stats)
beta_estimate stderr_estimate sample_count sample_mean sample_std sample_min sample_25% sample_50% sample_75% sample_max
var
lndist -0.389862 0.072992 10.0 -0.402289 0.074404 -0.521725 -0.442482 -0.381340 -0.349858 -0.304334
contiguity 0.891577 0.132735 10.0 0.870902 0.080656 0.771972 0.825856 0.842216 0.917089 1.043870
common_language 0.032625 0.084070 10.0 0.039839 0.059033 -0.022368 0.001555 0.026683 0.053808 0.179760
pta 0.471138 0.107658 10.0 0.483671 0.114499 0.331530 0.372833 0.511578 0.583360 0.630710
international -3.412584 0.215123 10.0 -3.396395 0.186946 -3.723681 -3.502700 -3.402984 -3.278412 -3.118628
constant 16.324340 0.484414 10.0 16.394745 0.489138 15.813891 16.047305 16.255780 16.681671 17.209114
Next, we can prepare the counterfactual experiment. As before, we'll consider a hypothetical Canada–Japan trade agreement.
>>> exp_data = mc_model.baseline_data.copy()
>>> exp_data.loc[(exp_data["importer"] == "CAN") & (exp_data["exporter"] == "JPN"), "pta"] = 1
>>> exp_data.loc[(exp_data["importer"] == "JPN") & (exp_data["exporter"] == "CAN"), "pta"] = 1
We can conduct the Monte Carlo analysis and simulate the trials by supplying the counterfactual experiment and using the following method. As before, most inputs follow from the OneSectroGE class. The two most notable exceptions are "results_stats" and "all_results", which determine what types of information are returned after running the model. "result_stats" determines the types of summary results that are produced across the different trials. For example, the list supplied below will produce the mean, standard deviation, standard error, and median values for each type of result, respectively. "all_results" determines if the model results for each individual trieal are retained and returned after the summary statistics are computed. If True, all that information is retained. Setting it to False disposes of these results after summary stats are computed and frees up any memory needed to store them, which can be sizable for models with a large number of trials and/or a large number of countries.
>>> mc_model.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... result_stats = ['mean', 'std', 'sem', 'median'],
... all_results = True)
Examining the Monte Carlo results
After the model finishes solving all ten trials, we can examine the MonteCarloGE results. The MonteCarloGE model populates with the same set of results attributes as the OneSectorGE model. For example, we can retrieve to main country level results.
>>> mc_country_results = mc_model.country_results
>>> print(mc_country_results.head(8))
statistic factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS mean 0.002680 -0.002680 0.007648 -0.004968 1.000050 -0.004968 0.002680 0.002680 -0.088476 -0.039323 0.023454
AUS std 0.001212 0.001212 0.001252 0.001369 0.000014 0.001369 0.001212 0.001212 0.025571 0.012610 0.004520
AUS sem 0.000383 0.000383 0.000396 0.000433 0.000004 0.000433 0.000383 0.000383 0.008086 0.003988 0.001429
AUS median 0.002719 -0.002719 0.007692 -0.005198 1.000052 -0.005198 0.002719 0.002719 -0.090336 -0.041185 0.023691
AUT mean -0.001898 0.001898 0.001662 -0.003560 1.000036 -0.003560 -0.001898 -0.001898 -0.028666 -0.022628 0.005849
AUT std 0.000577 0.000577 0.000468 0.001041 0.000010 0.001041 0.000577 0.000577 0.007434 0.005717 0.002428
AUT sem 0.000182 0.000182 0.000148 0.000329 0.000003 0.000329 0.000182 0.000182 0.002351 0.001808 0.000768
AUT median -0.002071 0.002071 0.001740 -0.003838 1.000038 -0.003838 -0.002071 -0.002071 -0.029426 -0.023267 0.006439
If all_results = True, the model retains information about all of the trial runs. These can be accessed in a variety of ways. One option is to view all country-level results for each trial. Columns reflect trial numer and result type. (E.g. (0, 'welfare statistic') for the value of the Trial 0 welfare statistic).
>>> all_mc_country_results = mc_model.all_country_results
Alternatively, MonteCarloGE is based on running a series of OneSectorGE models using different cost parameters for each trial. These individual OneSectorGE models are stored in a list that can be accessed via the attribute trial_models. For example, we can examine the results from trial 2:
>>> trial_2_model = mc_model.trial_models[2]
>>> print(trial_2_model.country_results.head())
factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS 0.003306 -0.003306 0.009805 -0.006498 1.000065 -0.006498 0.003306 0.003306 -0.126374 -0.054366 0.030599
AUT -0.002510 0.002510 0.002141 -0.004651 1.000047 -0.004651 -0.002510 -0.002510 -0.037593 -0.029239 0.007880
BEL -0.001742 0.001742 0.000400 -0.002142 1.000021 -0.002142 -0.001742 -0.001742 -0.041347 -0.034214 -0.005636
BRA -0.004893 0.004893 -0.000103 -0.004789 1.000048 -0.004789 -0.004893 -0.004893 -0.100750 -0.089339 0.003946
CAN -0.147532 0.147750 -0.502212 0.356469 0.996448 0.356469 -0.147532 -0.147532 2.148502 2.047754 -2.415646
Finally we can export the summary results to a series of .csv files.
>>> mc_model.export_results(directory="examples//", name = 'monte')
Dealing with Failed Trials
Some GE gravity models are more difficult to solve than others. The repeated sampling and resolving of the model can result in cases where the model solves for some trials but not others. There are a few tools available to identify and mitigate these issues. For simplicity, the methods are demonstrated using the above example model, which does not exhibit any failed trials within the 10 specified trials, but can still illustrate the methods.
We can check for failed trials, which are listed by their trial number (0 to N). In this case, there are none.
>>> print(mc_model.failed_trials)
[]
One likely source of solver issues is the OMR rescale factor. There is a version of the check_omr_rescale method for the MonteCarloGE model that will check some or all of the trials for feasible OMR scaling factors. Here, we will redefine the model from scratch and check the values for trials 1, 4, and 5. Omitting that argument will check all trials.
>>> mc_model_2 = ge.MonteCarloGE(baseline,
... year = '2006',
... trials = 10,
... reference_importer='DEU',
... sigma=7,
... cost_variables=['lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'],
... cost_coeff_values=cost_params,
... allow_singular_covar=True,
... seed = 1)
>>> mc_omr_checks = mc_model_2.check_omr_rescale(omr_rescale_range=5, trials = [1, 4, 5])
One potential option for dealing with a failed trial is to specify a different omr_rescale_factor to use for that specific trial. For example, we could specify that trials 4 and 5 use a rescale factor of 100 instead of 1, which is used for all other trials.
>>> mc_model_2.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... trial_omr_rescale={4:100, 5:100})
If the source of the failure cannot be overcome by adjusting the rescale factor, another option is to draw additional trials to replace those that failed. This has the advantage of insuring that a desired number of trials is run in total. However, it should also be noted that if certain parameter draws are failing for systemitic reasons, then redrawing failed trials could introduce an unwanted bias in the results. Therefore, this option should be used with some caution.
>>> mc_model_2.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... redraw_failed_trials=True)
Expand source code
"""
# Documentation
--------------------
gegravity is a Python package containing tools used to estimate general equilibrium (GE) structural gravity models and simulate counterfactual experiments. The package is based on the well established version of the gravity model described by Yotov, Piermartini, Monteiro, and Larch (2016) *An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model*. It implements the structural GE gravity model in a general, robust, and easy to use manner in an effort to make GE gravity modeling more accessible for researchers and policy analysts.
The package provides several useful tools for structural gravity modeling.
1. It computes theory consistent estimates of the structural multilateral resistance terms of Anderson and van Wincoop (2003) "Gravity with Gravitas" from standard econometric gravity results.
2. It simulates GE effects from counterfactual experiments such as new trade agreements or changes to other trade costs. The model can be flexibly used to study aggregate or sector level trade as well as many different types of trade costs and policies.
3. It conducts Monte Carlo simulations that provide a means to compute standard errors and other measures of statistical precision for the GE model results.
For more information about the GE gravity model, its implementation, and the various components of the package, see the companion paper ["gegravity: General Equilibrium Gravity Modeling in Python"](https://usitc.gov/sites/default/files/publications/332/working_papers/herman_2021_gegravity_modeling_in_python.pdf).
## Citation and license
The package is publicly available and free to use under the MIT license. Users are asked to please cite the following,
Herman, Peter (2021) "gegravity: General Equilibrium Gravity Modeling in Python." USITC Economics Working Paper 2021-04-B.
## Versions
* **Version 0.3** (Jun. 2024): Package reworked to no longer rely on the GME package for data and parameter inputs (may no longer be compatible with v0.2 models), MonteCarloGE features added to help identify and resolve issues with trials that fail to solve; and minor bug fixes and improvements to reliability.
* **Version 0.2** (Dec. 2022): Improved OneSectorGE speed and updated compatibility with dependencies.
* **Version 0.1** (Apr. 2021): Initial release.
## Installation
The package can be installed via pip with the following command.
>>> pip install gegravity
## Example
The following examples demonstrate how to perform a typical GE gravity analysis using the gegravity package. The code files for these examples as well as the sample data set used in them can be found at the project's github page or at the following gist locations.
* **Sample data:** https://gist.github.com/peter-herman/13b056e52105008c53faa482db67ed4a
* **Basic GE analysis script:** https://gist.github.com/peter-herman/faeea8ec032c4c2c13bcbc9c400cca9b
* **Monte Carlo GE analysis:** https://gist.github.com/peter-herman/a2ebf3997bfd6e9cb3268298d49b64b5
### OneSectorGE gravity model
#### Prepare data inputs
Begin by loading some needed packages
>>> import gegravity as ge
>>> import pandas as pd
Next, load some input data
>>> gravity_data_location = "https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv"
>>> grav_data = pd.read_csv(gravity_data_location)
>>> print(grav_data.head())
exporter importer year trade Y E pta contiguity common_language lndist international
0 GBR AUS 2006 4310 925638 362227 0 0 1 9.7126 1
1 FIN AUS 2006 514 142759 362227 0 0 0 9.5997 1
2 USA AUS 2006 16619 5019964 362227 1 0 1 9.5963 1
3 IND AUS 2006 763 548517 362227 0 0 1 9.1455 1
4 SGP AUS 2006 8756 329817 362227 1 0 1 8.6732 1
Add a constant to the data
>>> grav_data['constant'] = 1
Create a new BaselineData object to manage the input trade, output, expenditure, and cost variable data for the GE model.
>>> baseline = ge.BaselineData(grav_data,
... imp_var_name='importer', # Columns with importer identifiers
... exp_var_name='exporter', # Column with exporter identifiers
... year_var_name='year', # Column with year (time) identifier
... trade_var_name='trade', # Column with trade values
... expend_var_name='E', # Column with importer total expenditure values
... output_var_name='Y') # Column with exporter total output values
#### Prepare the trade cost parameters
Define the parameters that will be used to construct trade costs. These values were estimated separately via PPML.
>>> ests = [
... # 'var', 'beta', 'stderr'
... ( 'lndist', -0.3898623, 0.0729915),
... ( 'contiguity', 0.891577, 0.1327354),
... ('common_language', 0.0326249, 0.0840702),
... ( 'pta', 0.4711383, 0.1076578),
... ( 'international', -3.412584, 0.2151235),
... ( 'constant', 16.32434, 0.4844137)]
>>> ests = pd.DataFrame(ests, columns = ['var', 'beta', 'stderr'])
Use the ests dataframe to define a CostCoeff object for the gegravity model, specifying the columns containing the variable identifiers, coefficient estimates, and standard errors.
>>> cost_params = ge.CostCoeffs(estimates = ests, # Dataframe with estimate values
... identifier_col ='var', # Column with variable identifier
... coeff_col ='beta', # Column with estimated coefficient values
... stderr_col ='stderr') # Column with estimated standard errors
#### Conduct a basic GE analysis
With the data entered into the two input objects (CostCoeff and BaselineData), we can create the GE model. Define the GE model using the OneSectorGE class, which is the package's main model.
>>> ge_model = ge.OneSectorGE(baseline = baseline, # BaselineData input
... cost_coeff_values = cost_params, # CostCoeff input
... cost_variables = ['lndist', 'contiguity', # Variables to use to construct trade costs
... 'common_language', 'pta',
... 'international', 'constant'],
... year = "2006", # Year to use for model
... reference_importer = "DEU", # Reference importer to use (normalizes IMRs to DEU's IMR)
... sigma = 5) # Elasticity of substitution
Before solving the model, it is worth discussing one of the most important factors when it comes to finding a solution to the GE model. This factor is the scaling of the outward multilateral resistance terms, which can potentially cause issues for the solver if they are not scaled in a way that is numerically compatible with the other variables. Rescaling these terms can mitigate this numerical issue. The following method helps identify rescale factors that are likely to result in the model being robustly solveable.
>>> potential_factors = ge_model.check_omr_rescale(omr_rescale_range=4)
>>> print(potential_factors)
omr_rescale omr_rescale (alt format) solved message max_func_value mean_func_value reference_importer_omr
0 0.0001 10^-4 True The solution converged. 1.706264e-08 -2.081753e-10 0.050346
1 0.0010 10^-3 True The solution converged. 9.829577e-10 5.869294e-11 0.050346
2 0.0100 10^-2 True The solution converged. 1.566544e-09 1.595307e-10 0.050346
3 0.1000 10^-1 True The solution converged. 3.543053e-08 -1.622280e-11 0.050346
4 1.0000 10^0 True The solution converged. 1.663875e-09 -1.471645e-11 0.050346
5 10.0000 10^1 True The solution converged. 1.648595e-09 1.652183e-10 0.050346
6 100.0000 10^2 True The solution converged. 1.260880e-12 6.217249e-15 0.050346
7 1000.0000 10^3 False The iteration is not making good progress, as ... 4.666270e-01 -3.827271e-02 0.053669
8 10000.0000 10^4 False The iteration is not making good progress, as ... 4.664453e-01 -3.825787e-02 0.053667
It looks like rescale factors between 0.0001 and 100 all yield a solveable model, produce similar solutions to the baseline model (reference importer OMR = 0.050346), and result in function values that are consistently close to zero, which are all good signs. Going forward, we'll use a factor of 1, which is the default (i.e. OMRs will not be rescaled in this case).
With an appropriate rescale factor in hand, we can build the baseline model. This process solves for baseline multialteral resistances and calibrates other model parameters.
>>> ge_model.build_baseline(omr_rescale = 1)
Solving for baseline MRs...
The solution converged.
We can examine the solutions for the baseline multilateral resistances
print(ge_model.baseline_mr.head(12))
#### Define counterfactual experiment and solve the counterfactual GE model
Next, we can try a counterfactual experiment. For this example, let us consider a hypothetical experiment in which Canada (CAN) and Japan (JPN) sign a preferential trade agreement (pta). Begin by creating a copy of the baseline data. Here, it is important that we create a "deep" copy so as to avoid modifying the baseline data too.
>>> exp_data = ge_model.baseline_data.copy()
Next, we modify the copied data to reflect the hypothetical policy change.
>>> exp_data.loc[(exp_data["importer"] == "CAN") & (exp_data["exporter"] == "JPN"), "pta"] = 1
>>> exp_data.loc[(exp_data["importer"] == "JPN") & (exp_data["exporter"] == "CAN"), "pta"] = 1
Now, define the experiment by supplying the counterfactual data to the GE model.
>>> ge_model.define_experiment(exp_data)
Counterfactual experiment trade costs are constructed when defining the experiment and can be examined. Trade costs in the GE model are represented similarly to trade costs estimated in the econometric model (trade = exp(BX), where BX is the trade cost estimate as captured by the model covariates). As a result, the cost values are generally positive and higher values imply more trade.
>>> print(ge_model.bilateral_costs.head())
We can also check the costs of Canadian exports to Japan, which were subject to the policy change
>>> print(ge_model.bilateral_costs.loc[('CAN','JPN'),:])
baseline trade cost 11305.165421
experiment trade cost 18108.800547
trade cost change (%) 60.181650
Name: (CAN, JPN), dtype: float64
With the experiment defined, the counterfactual model can be simulated. As the model solves, some diagnostic information will print to the console indicating if the first and second stages of the solution were successful.
>>> ge_model.simulate()
#### Access and Export Results
With the model estimated, we can retrieve many of the different sets of model results that are produced. The following are some of the more prominent collections of results.
**Country results:** A collection of many of the key country-level results (prices, total imports/exports, GDP, welfare, etc.)
>>> country_results = ge_model.country_results
Print the first few rows of country-level estimated change in factory prices, GDP, and foreign exports
>>> print(country_results.head())
factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS 0.004143 -0.004143 0.011249 -0.007106 1.000071 -0.007106 0.004143 0.004143 -0.087678 -0.033760 0.019940
AUT -0.002774 0.002775 0.002405 -0.005179 1.000052 -0.005179 -0.002774 -0.002774 -0.034691 -0.026447 -0.001602
BEL -0.002052 0.002052 0.000357 -0.002409 1.000024 -0.002409 -0.002052 -0.002052 -0.037698 -0.030564 -0.011239
BRA -0.005588 0.005588 -0.000092 -0.005496 1.000055 -0.005496 -0.005588 -0.005588 -0.080228 -0.066497 -0.005961
CAN -0.135770 0.135955 -0.551724 0.418261 0.995835 0.418261 -0.135770 -0.135770 1.573180 1.504744 -1.939006
**Bilateral trade results:** Baseline and counterfactual trade between each pair of countries.
>>> bilateral_results = ge_model.bilateral_trade_results
>>> print(bilateral_results.head())
baseline modeled trade experiment trade trade change (percent)
exporter importer
AUS AUS 218129.130005 218172.625325 0.019940
AUT 666.648700 666.499697 -0.022351
BEL 1532.878160 1532.421087 -0.029818
BRA 2747.265374 2746.299784 -0.035147
CAN 2847.023843 2780.118420 -2.350013
**Aggregate Trade:** Total imports, exports, intranational trade, output, and shipments for each country.
>>> agg_trade = ge_model.aggregate_trade_results
**Multilateral Resistances:** Country multilateral resistance (MR) terms
>>> mr_terms = ge_model.country_mr_terms
**Solver Diagnostics:** A dictionary containing many types of solver diagnostic info.
>>> solver_diagnostics = ge_model.solver_diagnostics
It is also possible to export the results to a collection of spreadsheet (.csv) files and add trade values in levels to the outputs.
>>> ge_model.export_results(directory="C://examples//",name="CAN_JPN_PTA_experiment")
#### Post estimation analysis
There are several tools that allow for post-estimation analysis. These can help to better understand the effects of the analysis or to example particular countries of interest.
For example, we can examine how the counterfactual experiment affected Canadian imports from NAFTA members USA and Mexico.
>>> nafta_share = ge_model.trade_share(importers = ['CAN'],exporters = ['USA','MEX'])
>>> print(nafta_share)
description baseline modeled trade experiment trade change (percentage point) change (%)
0 Percent of CAN imports from USA, MEX 22.021883 21.578895 -0.442988 -2.01158
1 Percent of USA, MEX exports to CAN 2.034768 1.991437 -0.043331 -2.129523
We can calculate counterfactual trade values based on observed levels and the estimated changes in trade.
>>> levels = ge_model.calculate_levels()
>>> print(levels.head())
baseline observed foreign exports experiment observed foreign exports baseline observed foreign imports experiment observed foreign imports baseline observed intranational trade experiment observed intranational trade
exporter
AUS 42485 42447.749958 98938 98904.598242 261365 261417.116627
AUT 87153 87122.765692 96165 96139.566802 73142 73140.828196
BEL 258238 258140.649294 262743 262662.696177 486707 486652.299609
BRA 61501 61451.658710 56294 56256.566037 465995 465967.223949
CAN 256829 260869.383269 266512 270522.322380 223583 219247.712294
Finally, we can calculate trade weighted shocks to see which countries or country-pairs were most/least affected by the counterfactual experiment. Start with bilateral level shocks
>>> bilat_cost_shock = ge_model.trade_weighted_shock(how = 'bilateral')
>>> print(bilat_cost_shock.head())
exporter importer baseline modeled trade trade cost change (%) weighted_cost_change
0 AUS AUS 218129.130005 0.0 0.0
1 AUS AUT 666.648700 0.0 0.0
2 AUS BEL 1532.878160 0.0 0.0
3 AUS BRA 2747.265374 0.0 0.0
4 AUS CAN 2847.023843 0.0 0.0
In this experiment, only Canada--Japan trade should be affected so all other shocks are zero.
Alternatively, we can also do this at country-level, which summarizes the bilateral shocks.
>>> country_cost_shock = ge_model.trade_weighted_shock(how='country', aggregations = ['mean', 'sum', 'max'])
>>> print(country_cost_shock.head())
weighted_cost_change
mean sum max mean sum max
exporter exporter exporter importer importer importer
AUS 0.000000 0.000000 0.000000 0.000000 0.0 0.0
AUT 0.000000 0.000000 0.000000 0.000000 0.0 0.0
BEL 0.000000 0.000000 0.000000 0.000000 0.0 0.0
BRA 0.000000 0.000000 0.000000 0.000000 0.0 0.0
CAN 0.009956 0.298669 0.298669 0.033333 1.0 1.0
The maximum shock possible is 1 so we can see Canada's imports from Japan were the largest affected trade flow and likely the most influential factor underlying the counterfactual results.
### Monte Carlo analysis
This example demonstrates a basic MonteCarloGE analysis. The Monte Carlo version of the model randomly draws a collection of cost parameter estimates a a simulates a collection of OneSectorGE models using thos parameters. The randomly drawn parameters are based on the varriance/covariances of each of the parameters as estimated by an econometric gravity model.
#### Create the baseline data inputs
>>> import gegravity as ge
>>> import pandas as pd
Begin by loading some baseline data.
>>> raw_data = pd.read_csv("https://gist.githubusercontent.com/peter-herman/13b056e52105008c53faa482db67ed4a/raw/83898713b8c695fc4c293eaa78eaf44f8e880a85/sample_gravity_data.csv")
>>> raw_data['constant'] = 1
Define baseline data structure
>>> baseline = ge.BaselineData(raw_data, trade_var_name='trade', imp_var_name='importer', exp_var_name='exporter',
... year_var_name='year', output_var_name='Y', expend_var_name='E')
#### Setup the cost parameters
Create DataFrame of coefficient estimates and standard errors. These values were estimated separately using the same data and PPML
>>> ests = [
... # 'var', 'beta', 'stderr'
... ( 'lndist', -0.3898623, 0.0729915),
... ( 'contiguity', 0.891577, 0.1327354),
... ('common_language', 0.0326249, 0.0840702),
... ( 'pta', 0.4711383, 0.1076578),
... ( 'international', -3.412584, 0.2151235),
... ( 'constant', 16.32434, 0.4844137)]
>>> ests = pd.DataFrame(ests, columns = ['var', 'beta', 'stderr'])
Create dataframe of Variance/Covariance matrix and set row variable labels as DataFrame index
>>> var_covar = [
... # 'var', 'lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'
... ( 'lndist', .00532776, .00504029, .00093778, .00473823, -.01440259, -.03476407),
... ( 'contiguity', .00504029, .01761868, -.00175404, -.00030881, -.01492965, -.03036031),
... ('common_language', .00093778, -.00175404, .00706779, .00249937, .00136678, -.01312498),
... ( 'pta', .00473823, -.00030881, .00249937, .01159021, -.01538926, -.03255428),
... ( 'international', -.01440259, -.01492965, .00136678, -.01538926, .04627812, .08911527),
... ( 'constant', -.03476407, -.03036031, -.01312498, -.03255428, .08911527, .23465662)]
>>> var_covar = pd.DataFrame(var_covar, columns = ['var', 'lndist', 'contiguity', 'common_language', 'pta',
... 'international', 'constant'])
>>> var_covar.set_index('var', inplace = True)
Define gegravity CostValues object to organize this info as inputs for the GE model
>>> cost_params = ge.CostCoeffs(estimates = ests, identifier_col='var', coeff_col='beta', stderr_col='stderr',
... covar_matrix=var_covar)
Note: These estimates were produced using Stata and the following code
>>> import delimited "C:\gegravity\examples\sample_data_set.dlm"
>>> encode importer, gen(imp_fe)
>>> encode exporter, gen(exp_fe)
>>> ppmlhdfe trade lndist contiguity common_language pta international, absorb(i.exp_fe i.imp_fe)
>>> matrix list e(V)
#### Defining and simulating the MonteCarloGE model
With the econometric model estimated, we can define the MonteCarloGE model. In this example, we'll perform a small Monte Carlo experiment using 10 trials. The model will randomly draw 10 sets of cost coefficients from their joint normal distribution and simulate 10 OneSectorGE models corresponding to each draw. Most of the other MonteCarloGE arguments follow those from the OneSectorGE model. The two main exceptions are the "trials" and "seed" arguements. "trials" sets the number of Monte Carlo simulations to perform and "seed" sets the seed for the random draw. Setting a seed allows for repeatable/reproducible simulations.
>>> mc_model = ge.MonteCarloGE(baseline,
... year = '2006',
... trials = 10,
... reference_importer='DEU',
... sigma=7,
... cost_variables=['lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'],
... cost_coeff_values=cost_params,
... allow_singular_covar=True,
... seed = 1)
When the model is defined, it creates the 10 draws of cost coefficients. We can examine those random draws in the following way.
>>> print(mc_model.coeff_sample)
var 0 1 2 3 4 5 6 7 8 9
0 lndist -0.520534 -0.521725 -0.371895 -0.390785 -0.455785 -0.343360 -0.304334 -0.343764 -0.402571 -0.368141
1 contiguity 0.900885 0.830745 0.922490 0.853686 0.934542 1.043870 0.828643 0.797255 0.824927 0.771972
2 common_language 0.029730 -0.022368 0.011377 0.086987 -0.001718 0.057161 -0.009925 0.179760 0.043748 0.023636
3 pta 0.336452 0.331530 0.599859 0.356130 0.422941 0.510970 0.630710 0.602066 0.533864 0.512185
4 international -3.118628 -3.174684 -3.573262 -3.310206 -3.267814 -3.510545 -3.723681 -3.457308 -3.348660 -3.479166
5 constant 17.128512 17.209114 16.222708 16.288851 16.786156 15.989807 15.813891 15.920395 16.368214 16.219798
Similarly, we can examine summary information about the random draws. The first two columns report the point estimates from the econometric model (est_model) while the remaining report summary information about the Monte Carlo sample.
>>> print(mc_model.sample_stats)
beta_estimate stderr_estimate sample_count sample_mean sample_std sample_min sample_25% sample_50% sample_75% sample_max
var
lndist -0.389862 0.072992 10.0 -0.402289 0.074404 -0.521725 -0.442482 -0.381340 -0.349858 -0.304334
contiguity 0.891577 0.132735 10.0 0.870902 0.080656 0.771972 0.825856 0.842216 0.917089 1.043870
common_language 0.032625 0.084070 10.0 0.039839 0.059033 -0.022368 0.001555 0.026683 0.053808 0.179760
pta 0.471138 0.107658 10.0 0.483671 0.114499 0.331530 0.372833 0.511578 0.583360 0.630710
international -3.412584 0.215123 10.0 -3.396395 0.186946 -3.723681 -3.502700 -3.402984 -3.278412 -3.118628
constant 16.324340 0.484414 10.0 16.394745 0.489138 15.813891 16.047305 16.255780 16.681671 17.209114
Next, we can prepare the counterfactual experiment. As before, we'll consider a hypothetical Canada--Japan trade agreement.
>>> exp_data = mc_model.baseline_data.copy()
>>> exp_data.loc[(exp_data["importer"] == "CAN") & (exp_data["exporter"] == "JPN"), "pta"] = 1
>>> exp_data.loc[(exp_data["importer"] == "JPN") & (exp_data["exporter"] == "CAN"), "pta"] = 1
We can conduct the Monte Carlo analysis and simulate the trials by supplying the counterfactual experiment and using the following method. As before, most inputs follow from the OneSectroGE class. The two most notable exceptions are "results_stats" and "all_results", which determine what types of information are returned after running the model. "result_stats" determines the types of summary results that are produced across the different trials. For example, the list supplied below will produce the mean, standard deviation, standard error, and median values for each type of result, respectively. "all_results" determines if the model results for each individual trieal are retained and returned after the summary statistics are computed. If True, all that information is retained. Setting it to False disposes of these results after summary stats are computed and frees up any memory needed to store them, which can be sizable for models with a large number of trials and/or a large number of countries.
>>> mc_model.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... result_stats = ['mean', 'std', 'sem', 'median'],
... all_results = True)
#### Examining the Monte Carlo results
After the model finishes solving all ten trials, we can examine the MonteCarloGE results. The MonteCarloGE model populates with the same set of results attributes as the OneSectorGE model. For example, we can retrieve to main country level results.
>>> mc_country_results = mc_model.country_results
>>> print(mc_country_results.head(8))
statistic factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS mean 0.002680 -0.002680 0.007648 -0.004968 1.000050 -0.004968 0.002680 0.002680 -0.088476 -0.039323 0.023454
AUS std 0.001212 0.001212 0.001252 0.001369 0.000014 0.001369 0.001212 0.001212 0.025571 0.012610 0.004520
AUS sem 0.000383 0.000383 0.000396 0.000433 0.000004 0.000433 0.000383 0.000383 0.008086 0.003988 0.001429
AUS median 0.002719 -0.002719 0.007692 -0.005198 1.000052 -0.005198 0.002719 0.002719 -0.090336 -0.041185 0.023691
AUT mean -0.001898 0.001898 0.001662 -0.003560 1.000036 -0.003560 -0.001898 -0.001898 -0.028666 -0.022628 0.005849
AUT std 0.000577 0.000577 0.000468 0.001041 0.000010 0.001041 0.000577 0.000577 0.007434 0.005717 0.002428
AUT sem 0.000182 0.000182 0.000148 0.000329 0.000003 0.000329 0.000182 0.000182 0.002351 0.001808 0.000768
AUT median -0.002071 0.002071 0.001740 -0.003838 1.000038 -0.003838 -0.002071 -0.002071 -0.029426 -0.023267 0.006439
If all_results = True, the model retains information about all of the trial runs. These can be accessed in a variety of ways. One option is to view all country-level results for each trial. Columns reflect trial numer and result type. (E.g. (0, 'welfare statistic') for the value of the Trial 0 welfare statistic).
>>> all_mc_country_results = mc_model.all_country_results
Alternatively, MonteCarloGE is based on running a series of OneSectorGE models using different cost parameters for each trial. These individual OneSectorGE models are stored in a list that can be accessed via the attribute trial_models. For example, we can examine the results from trial 2:
>>> trial_2_model = mc_model.trial_models[2]
>>> print(trial_2_model.country_results.head())
factory gate price change (percent) omr change (percent) imr change (percent) GDP change (percent) welfare statistic terms of trade change (percent) output change (percent) expenditure change (percent) foreign exports change (percent) foreign imports change (percent) intranational trade change (percent)
country
AUS 0.003306 -0.003306 0.009805 -0.006498 1.000065 -0.006498 0.003306 0.003306 -0.126374 -0.054366 0.030599
AUT -0.002510 0.002510 0.002141 -0.004651 1.000047 -0.004651 -0.002510 -0.002510 -0.037593 -0.029239 0.007880
BEL -0.001742 0.001742 0.000400 -0.002142 1.000021 -0.002142 -0.001742 -0.001742 -0.041347 -0.034214 -0.005636
BRA -0.004893 0.004893 -0.000103 -0.004789 1.000048 -0.004789 -0.004893 -0.004893 -0.100750 -0.089339 0.003946
CAN -0.147532 0.147750 -0.502212 0.356469 0.996448 0.356469 -0.147532 -0.147532 2.148502 2.047754 -2.415646
Finally we can export the summary results to a series of .csv files.
>>> mc_model.export_results(directory="examples//", name = 'monte')
#### Dealing with Failed Trials
Some GE gravity models are more difficult to solve than others. The repeated sampling and resolving of the model can result in cases where the model solves for some trials but not others. There are a few tools available to identify and mitigate these issues. For simplicity, the methods are demonstrated using the above example model, which does not exhibit any failed trials within the 10 specified trials, but can still illustrate the methods.
We can check for failed trials, which are listed by their trial number (0 to N). In this case, there are none.
>>> print(mc_model.failed_trials)
[]
One likely source of solver issues is the OMR rescale factor. There is a version of the check_omr_rescale method for the MonteCarloGE model that will check some or all of the trials for feasible OMR scaling factors. Here, we will redefine the model from scratch and check the values for trials 1, 4, and 5. Omitting that argument will check all trials.
>>> mc_model_2 = ge.MonteCarloGE(baseline,
... year = '2006',
... trials = 10,
... reference_importer='DEU',
... sigma=7,
... cost_variables=['lndist', 'contiguity', 'common_language', 'pta', 'international', 'constant'],
... cost_coeff_values=cost_params,
... allow_singular_covar=True,
... seed = 1)
>>> mc_omr_checks = mc_model_2.check_omr_rescale(omr_rescale_range=5, trials = [1, 4, 5])
One potential option for dealing with a failed trial is to specify a different omr_rescale_factor to use for that specific trial. For example, we could specify that trials 4 and 5 use a rescale factor of 100 instead of 1, which is used for all other trials.
>>> mc_model_2.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... trial_omr_rescale={4:100, 5:100})
If the source of the failure cannot be overcome by adjusting the rescale factor, another option is to draw additional trials to replace those that failed. This has the advantage of insuring that a desired number of trials is run in total. However, it should also be noted that if certain parameter draws are failing for systemitic reasons, then redrawing failed trials could introduce an unwanted bias in the results. Therefore, this option should be used with some caution.
>>> mc_model_2.run_trials(experiment_data=exp_data,
... omr_rescale=1,
... redraw_failed_trials=True)
"""
from .OneSectorGE import *
from .BaselineData import *
from .MonteCarloGE import * # move these to after comments to produce documentation. Not sure if that affects packaging.
Sub-modules
gegravity.BaselineData
gegravity.MonteCarloGE
gegravity.OneSectorGE