Step-by-Step: How the Coincident Investment Index Was Made
A coincident index gauges the movement of a key economic variable using related data series. In this case, the coincident investment index mirrors one of the two largest components of GDP, Private Fixed Investment. The index is divided into sections, which are weighted in accordance with their relevance to the variable at hand and estimated using the latest available data for each. You can read more about the index (methodology, usefulness, interpretation) here.
Component | Weight | Indicators |
---|---|---|
Construction | 47.2% | Construction Activity, Construction Employment |
Machinery and Equipment | 41.4% | Machinery Production, Capital Goods Imports |
Transportation Equipment | 11.3% | Utility Vehicles Sales, Transportation Imports |
Rather than focusing on what the index looks like and what to use it for, this post will focus on how the Alphacast pipelines engine can be used to build it in a completely automated manner. This means that, as soon as new data is uploaded to the Alphacast site, the index will be updated to reflect it. The first step, then, is identifying the data. As said above, the index is divided into three components, reflecting their importance to private fixed investment: Construction, Machinery and Equipment, and Transportation Equipment. Each component has a number of data sources attached (in principle two, but not necessarily).
The first step to creating any pipeline is clicking on the "create new" button on the site and selecting "pipeline". Then, select the repository in which the pipeline will be "housed" and give it a name. For the name, I'd recommend using the same one as the final dataset, to avoid confusion (and because the pipeline name is publicly displayed). For the repository, it depends on who you want to have access to it - in a public repository, everyone would be able to modify it, while a private one would allow for more control over modifications.
The next step is fetching a dataset. Since the index uses many different ones, it doesn't particularly matter which one you begin with - just search for it by name. For example, using the Construya Index as a proxy for construction activity, you can just look up "Construya Index" in the search bar and pick the desired dataset. It's inadvisable to choose the "raw data" version of this dataset, because the data is, as the name would suggest, raw - for instance, if you wanted a seasonally adjusted variable, you would have to do the adjustment manually. Don't forget to click "save" after selecting the dataset!
The next step is merging the dataset with all the other relevant ones. Click "add step below" and look for "merge" in the list. Selecting the dataset to merge with works exactly the same as fetching it: look up its name, type it, select it.
The remaining step is "matching entities". Every dataset has at least one kind of entity and at most two. The one they all have is "date" - the date of each data point. So when matching entities, choose "date" and "date" (the columns might have different names, like date and year, but the principle is the same). The other entity type is "countries" - what the data refers to. These are often countries, but sometimes other kinds of categories too. Either way, if both datasets have this entity, match them; if not, it's not a problem. The dataset for construction employment doesn't have this second entity - and the pipeline works just fine.
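To make the idea of matching on "date" concrete, here is a minimal sketch in plain Python, outside of the Alphacast interface. It is not how the pipelines engine is implemented: the `merge_on` helper, the column names, and the February values are all hypothetical, and the choice of a left join (keep every row of the first dataset) is an assumption made for illustration.

```python
# Hypothetical sample rows; the column names and the February values are
# made up for illustration. Only the January 2016 figures come from the post.
construya = [
    {"Date": "2016-01-01", "country": "Argentina", "Construya Index": 327.5},
    {"Date": "2016-02-01", "country": "Argentina", "Construya Index": 340.6},
]
employment = [  # no "country" entity here, only a date
    {"Date": "2016-01-01", "Construction Employment": 87.3},
    {"Date": "2016-02-01", "Construction Employment": 85.554},
]


def merge_on(left, right, keys):
    """Left-join two lists of row dicts on (left_key, right_key) pairs."""
    right_key_names = {right_key for _, right_key in keys}
    # Index the right-hand rows by their entity values for quick lookup.
    index = {tuple(row[rk] for _, rk in keys): row for row in right}
    merged = []
    for row in left:
        match = index.get(tuple(row[lk] for lk, _ in keys), {})
        # Bring over every right-hand column except the matched entity itself.
        extra = {k: v for k, v in match.items() if k not in right_key_names}
        merged.append({**row, **extra})
    return merged


# Only "date" is matched, because the employment data has no second entity.
merged = merge_on(construya, employment, keys=[("Date", "Date")])
for row in merged:
    print(row)
```

Each merged row keeps the country from the first dataset and picks up the employment column for the same date, which is the behavior described above for datasets that share only the date entity.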
You can merge all the datasets first (in this case, all six of them) and select columns at the end, or select columns after every merge. Either way, the datasets don't all contain useful information - some variables are not relevant and clutter the process (especially in later steps) - so selecting columns is a must. Once the useful datasets are merged, whether partially or fully (for large datasets like the Industrial Production Index I'd recommend filtering variables after every merge), select the useful variables from the list and click "save". Be mindful of the difference between seasonally adjusted and regular variables, and between levels and MoM/YoY changes!
Indicator | Weight | Alphacast Source |
---|---|---|
Construction Activity | 33.2% | Construya Index |
Construction Employment | 14.2% | EIL - Labor Indicators by Sector |
National Machinery & Equipment | 15.5% | Industrial Production Index |
Imported Machinery & Equipment | 25.9% | Monthly Trade Statistics |
National Transportation Equipment | 7.4% | ADEFA Car Statistics |
Imported Transportation Equipment | 3.9% | Monthly Trade Statistics |
The full list of variables is there, and it's fairly simple to figure out which is which within each dataset.
After repeating this process for all the relevant datasets (see the table above for which variables come from where), the next step is mostly for convenience: renaming the variables. In some cases, variable names are not comfortable to use or easy to remember (for larger lists of variables, this is really important). The process is simple: rename the variables you want, and keep the rest the same. It's really important that no input variable shares its name with a final output: for instance, an input variable called "Investment Index" would clash with the index itself, so you'd have to rename it.
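The renaming rule can be sketched as a small check in plain Python. This is only an illustration of the logic, not part of the Alphacast interface: the `rename_columns` helper and the source column names below are hypothetical.

```python
# Names the pipeline will create later as outputs (assumed for illustration).
RESERVED_OUTPUT_NAMES = {"Investment Index"}


def rename_columns(columns, mapping, reserved=RESERVED_OUTPUT_NAMES):
    """Rename the mapped columns, keep the rest, and reject output-name clashes."""
    renamed = [mapping.get(name, name) for name in columns]
    clashes = sorted(set(renamed) & set(reserved))
    if clashes:
        raise ValueError(f"inputs clash with final outputs, rename them: {clashes}")
    if len(set(renamed)) != len(renamed):
        raise ValueError("renaming produced duplicate column names")
    return renamed


# Hypothetical input column names.
columns = ["Date", "Indice Construya - sa", "Investment Index"]
mapping = {
    "Indice Construya - sa": "Construction Activity",
    "Investment Index": "Investment Index (source)",
}
renamed = rename_columns(columns, mapping)
print(renamed)
```

Leaving the input called "Investment Index" out of the mapping would raise an error here, which mirrors the reason for renaming it before the calculation steps.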
The next step is calculating the variables. This is done in two steps. The first is converting all of the variables into a base=100 index, simply to prevent their different scales from muddying up the numbers - for instance, a variable that is measured in millions of pesos would have a thousand times less weight than one measured in thousands. A simple way of doing this is taking the value of each variable at the base date (which will be the earliest date at which all variables are present - in the case of the investment index, January 2016), dividing every value of that variable by it, and multiplying by a hundred. For example, the Construya Index has a January 2016 base value of 327.5 and a weight within the construction subindex of 70%, while the employment indicator has a base value of 87.3 and a weight of 30%.
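As a worked illustration of the rebasing arithmetic, here is a minimal Python sketch. The January 2016 values (327.5 and 87.3) and the 70%/30% weights come from the example above; the February values and the `rebase` helper are hypothetical, chosen only to show the calculation.

```python
def rebase(series, base_date):
    """Convert a {date: value} series into an index equal to 100 at base_date."""
    base_value = series[base_date]
    return {date: 100.0 * value / base_value for date, value in series.items()}


BASE_DATE = "2016-01"

# January 2016 values are the ones quoted in the post; February is made up.
construya = {"2016-01": 327.5, "2016-02": 340.6}
employment = {"2016-01": 87.3, "2016-02": 85.554}

construya_idx = rebase(construya, BASE_DATE)
employment_idx = rebase(employment, BASE_DATE)

# Construction subindex: 70% activity, 30% employment, both already base=100.
construction_subindex = {
    date: 0.70 * construya_idx[date] + 0.30 * employment_idx[date]
    for date in construya_idx
}
print(construction_subindex)
```

With these made-up February values (activity up 4%, employment down 2%), the subindex is 100 in January 2016 and roughly 102.2 in February, regardless of the fact that the two inputs are measured on very different scales.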
The second is calculating the Investment Index itself, which is done by multiplying each subindex by its weight and adding up the results. Since the subindices are all built from base 100 variables, their base values don't need to be adjusted. After this is done, the remainder is simple: filter out the input variables (Construction Activity & Employment, Car Sales, etc.) to have a cleaner presentation of the final dataset.
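The weighted sum can be sketched as follows. The component weights are the ones from the first table; the subindex values for the later period are hypothetical. One assumption to flag: the published weights add up to 99.9% because of rounding, so this sketch divides by the total weight to keep the index at exactly 100 in the base period. That normalization is my own choice for the example, not something stated in the methodology.

```python
# Component weights from the table at the top of the post (in percent).
COMPONENT_WEIGHTS = {
    "Construction": 47.2,
    "Machinery and Equipment": 41.4,
    "Transportation Equipment": 11.3,
}


def investment_index(subindices, weights=COMPONENT_WEIGHTS):
    """Weighted average of base=100 subindices for a single period."""
    total_weight = sum(weights.values())  # 99.9 here, due to rounding
    weighted_sum = sum(weights[name] * subindices[name] for name in weights)
    # Dividing by the total weight is an assumption made for this sketch.
    return weighted_sum / total_weight


base_period = {name: 100.0 for name in COMPONENT_WEIGHTS}
later_period = {  # hypothetical subindex values
    "Construction": 102.2,
    "Machinery and Equipment": 95.0,
    "Transportation Equipment": 110.0,
}

print(investment_index(base_period))
print(investment_index(later_period))
```

Because every subindex equals 100 at the base date, the combined index also equals 100 there, which is why no further adjustment of base values is needed.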
You can also apply transformations, so users can have access to a more "complete" dataset: seasonal adjustment, YoY change, % of GDP, constant prices, and so on. Select the variables to transform within the list (this is important if there are multiple transformations) and click "save".
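As an example of what one of these transformations computes, here is a minimal sketch of a year-over-year change for a monthly series. The `yoy_change` helper and the sample values are hypothetical, and this is only the underlying arithmetic, not the pipeline's own implementation.

```python
def yoy_change(values, periods_per_year=12):
    """Percent change against the same period one year earlier.

    Returns None for the first year, where no comparison value exists.
    """
    changes = []
    for i, value in enumerate(values):
        if i < periods_per_year:
            changes.append(None)
        else:
            previous = values[i - periods_per_year]
            changes.append(100.0 * (value / previous - 1.0))
    return changes


# Hypothetical monthly index: flat at 100 for a year, then 105 in month 13.
monthly_index = [100.0] * 12 + [105.0]
yoy = yoy_change(monthly_index)
print(yoy[-1])
```

The first twelve months have no year-earlier value to compare against, so the transformed series starts one year after the original one.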
The final step is simple: publish to dataset. Pick a repository in which the dataset will be listed, and pick a name for the dataset (it's best if it follows the general Alphacast formula of Topic - Country - Dataset Name - Frequency). Click "save", run the pipeline, and enjoy the results! If there are any errors, check for unsaved changes, bad formulas, or even typos. Voilà!