Step-by-Step: How the Coincident Consumption Index Was Made
A coincident index gauges the movement of a key economic variable through the use of related data series. In this case, the coincident consumption index mirrors one of the largest components of GDP, Private Consumption. The index is divided into components, which are weighted according to their relevance to the variable at hand and estimated using the latest available data for each; the table below lists them, and a sketch of the weighting formula follows it. You can read more about the index (methodology, usefulness, interpretation) [here](LINK AL NUEVO INSIGHT).
Component | Weight | Indicators |
---|---|---|
Food and Beverages | 22.7% | Baked Goods, Meats, Fruits and Vegetables, Dairy, Beverages, Sweets and Candy |
Housing and Utilities | 14.5% | Real Estate and Housing Services, Electricity, Heating, Running Water |
Transportation | 14.5% | Vehicle purchases, Fuels, Transportation Services, Airfare |
Recreation and Culture | 8.6% | Related goods |
Clothing and Footwear | 6.8% | Clothing and Footwear production |
Restaurants and Hotels | 6.6% | Restaurants and Hotels activity |
Healthcare | 6.4% | Healthcare activity |
Home Equipment | 5.4% | Appliances, Furniture and Mattresses |
Communications | 5.3% | Activity in Communications |
Other Services | 4.3% | Financial Services, Personal and Professional Services |
Education | 3.2% | Education activity |
Alcohol and Tobacco | 1.9% | Wine, Cigarettes |
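In rough terms, the whole construction boils down to a weighted sum. As a minimal sketch, using the weights from the table above (each component is first converted to a base-100 index, as described in the calculation steps below):

$$I_t = \sum_i w_i \, C_{i,t}$$

where $C_{i,t}$ is component $i$'s base-100 subindex at time $t$ and $w_i$ is its weight.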
Rather than focusing on what the index looks like and what to use it for, this post will focus on how the Alphacast pipelines engine can be used to build it in a completely automated manner. This means that, as soon as new data is uploaded to the Alphacast site, the index will be updated to reflect it. The first step, then, is identifying the data. As shown above, the index is divided into twelve components, weighted by their importance to Private Consumption. Each indicator has a number of data sources attached (in principle two, but not necessarily).
The first step in creating any pipeline is clicking the "create new" button on the site and selecting "pipeline". Then, select the repository in which the pipeline will be "housed" and give it a name. For the name, I'd recommend using the same one as the final dataset, to avoid confusion (and because the pipeline name is publicly displayed). For the repository, it depends on who you want to have access to it - in a public repository, everyone would be able to modify the pipeline, while a private one allows for more control over modifications.
The next step is fetching a dataset. Since the index draws on many different ones, it doesn't much matter which one you begin with - just search for it by name. For example, to use wine production as a proxy for alcohol consumption, just look up "wine" in the search bar and pick the desired dataset. It's inadvisable to choose the "raw data" version of a dataset, because the data is, as the name suggests, raw - for instance, if you wanted a seasonally adjusted variable, you would have to compute it manually. Don't forget to click "save" after selecting the dataset!
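If you'd rather poke at the data outside the UI first, Alphacast also publishes a Python SDK. A minimal sketch, assuming the `alphacast` package; the API key and dataset id below are placeholders, not values from this post:

```python
from alphacast import Alphacast

API_KEY = "your-api-key"   # placeholder - use your own key
DATASET_ID = 5565          # hypothetical id, not an actual dataset reference

alphacast = Alphacast(API_KEY)

# Download a dataset as a pandas DataFrame to inspect it locally.
df = alphacast.datasets.dataset(DATASET_ID).download_data(format="pandas")
print(df.head())
```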
The next step is merging the dataset with all the other relevant ones. Click "add step below" and look for "merge" in the list. Selecting the dataset to merge with works exactly like fetching one: type its name in the search bar and select it.
The remaining step is "matching entities". Every dataset has at least one and at most two kinds of entities. The one they all have is "date" - the date of each data point. So when matching entities, choose "date" and "date" (they might have different names, like date and year, but the principle is the same). The other entity type is "country" - what the data refers to. These are often countries, but sometimes other categories. Either way, if both datasets have them, match them; if not, it's not a problem - a dataset can lack the second entity altogether and the pipeline still works just fine.
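Conceptually, the merge step behaves like a join on the matched entities. Here's a minimal pandas sketch of what it does under the hood - the series names and values are made up for illustration:

```python
import pandas as pd

# Two hypothetical monthly datasets (column names are illustrative).
wine = pd.DataFrame({
    "date": pd.date_range("2016-01-01", periods=3, freq="MS"),
    "country": "Argentina",
    "wine_production": [1200.0, 1150.0, 1300.0],
})
cigarettes = pd.DataFrame({
    "date": pd.date_range("2016-01-01", periods=3, freq="MS"),
    "country": "Argentina",
    "cigarette_sales": [3.4, 3.3, 3.5],
})

# Matching entities: join on the shared "date" and "country" columns.
# An outer join keeps rows even when one source lags behind the other.
merged = pd.merge(wine, cigarettes, on=["date", "country"], how="outer")

# Selecting columns: keep only the variables the index will actually use.
merged = merged[["date", "country", "wine_production", "cigarette_sales"]]
print(merged)
```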
You can merge all the datasets consecutively or one at a time. Either way, the truth is that they don't all contain useful information - some variables are not relevant and clutter the process (especially in later steps). Regardless of the choice, selecting columns is a must. You can do this while selecting a dataset, or after merging through the "select variables" option. Either alternative works the same way, by dropping the variables that won't be needed from the dataset. Be mindful of the difference between seasonally adjusted and regular variables, and between levels and MoM/YoY changes!
The full list of variables is as follows, and it's fairly simple to figure out which is which within each dataset.
Component | Weight | Variables |
---|---|---|
Food and Beverages | 22.7% | Food production, Egg production |
Housing and Utilities | 14.5% | Real Estate and Housing Services, and Public Services activity |
Transportation | 14.5% | New car purchases, used car purchases, motorcycle purchases, Fuel Production, Air Traffic |
Recreation and Culture | 8.6% | Related goods production |
Clothing and Footwear | 6.8% | Clothing and Footwear production |
Restaurants and Hotels | 6.6% | Restaurants and Hotels activity |
Healthcare | 6.4% | Healthcare activity |
Home Equipment | 5.4% | Home Appliances production, Electronics production, and Furniture and Mattresses production |
Communications | 5.3% | Communications activity |
Other Services | 4.3% | Financial Services and Personal and Professional Services activity |
Education | 3.2% | Education activity |
Alcohol and Tobacco | 1.9% | Wine and Cigarettes production |
The next step is calculating the variables, which was done in two parts. The first is converting all of the variables into a base-100 index, simply to prevent their different scales from muddying up the numbers - for instance, a variable measured in millions of pesos would carry a thousand times less weight than one measured in thousands. A simple way of doing this is taking each variable's value at the base date (the earliest date at which all variables are present - for the companion investment index, for instance, it's January 2016), dividing every value by that figure, and multiplying by a hundred. Then, each component is calculated as a weighted sum of its subcomponent indices.
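A minimal pandas sketch of the rebasing step - the two series, their units, and the base date are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "wine_production": [1200.0, 1150.0, 1300.0, 1260.0],  # thousands of litres
    "cigarette_sales": [3.4e6, 3.3e6, 3.5e6, 3.6e6],      # units
}, index=pd.date_range("2016-01-01", periods=4, freq="MS"))

# The base date is the earliest month at which every variable is present.
base_date = "2016-01-01"

# Divide each series by its value at the base date and multiply by 100,
# so every variable equals 100 at the base regardless of its original units.
rebased = df.div(df.loc[base_date]) * 100
print(rebased)
```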
The second part is calculating the Consumption Index itself, which is done by multiplying each subindex by its weight and summing them up. Since they are all base-100 variables by construction, their base values don't need to be adjusted. After this is done, the remainder is simple: filter out the input variables (Wine, Meats, Car Sales, etc.) for a cleaner presentation of the final dataset.
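And a sketch of the weighted sum, again with made-up numbers - only three of the twelve components are shown, with their weights from the table expressed as fractions:

```python
import pandas as pd

# Hypothetical base-100 component indices from the previous step.
components = pd.DataFrame({
    "food_and_beverages": [100.0, 101.2, 103.5],
    "transportation": [100.0, 98.7, 99.4],
    "alcohol_and_tobacco": [100.0, 100.8, 101.1],
}, index=pd.date_range("2016-01-01", periods=3, freq="MS"))

weights = pd.Series({
    "food_and_beverages": 0.227,
    "transportation": 0.145,
    "alcohol_and_tobacco": 0.019,
})

# Weighted sum of the base-100 subindices; with all twelve components the
# weights sum to (roughly) one, so no further rescaling is needed.
consumption_index = components.mul(weights, axis="columns").sum(axis="columns")

# Filter out the inputs, keeping only the headline series for publication.
final = consumption_index.to_frame("consumption_index")
print(final)
```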
You can also apply transformations, so users have access to a more "complete" dataset: seasonal adjustment, YoY change, % of GDP, constant prices, whatever. Select the variables within the list (this is important if there are multiple transformations) and click save.
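Two of those transformations sketched in pandas, on a made-up monthly series. Alphacast's built-in seasonal adjustment uses its own method from the UI; the rolling mean below is only a crude illustrative smoother:

```python
import numpy as np
import pandas as pd

# A hypothetical monthly index to transform.
idx = pd.Series(
    100 + np.arange(24, dtype=float),
    index=pd.date_range("2016-01-01", periods=24, freq="MS"),
)

# Year-over-year change, in percent, for a monthly series.
yoy = idx.pct_change(periods=12) * 100

# Trailing 12-month mean as a rough stand-in for a smoothed/adjusted series.
smoothed = idx.rolling(window=12).mean()

print(pd.DataFrame({"index": idx, "yoy_%": yoy, "12m_mean": smoothed}).tail())
```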
The final step is simple: publish to dataset. Pick a repository where the dataset will be listed, and pick a name for it (it's best if it keeps with the general Alphacast formula of Topic - Country - Dataset Name - Frequency). Click "save", run the pipeline, and enjoy the results! If there are any errors, check for unsaved changes, bad formulas, or even typos. Voilà!