End-to-end analytics and custom attribution model for SMBs without coding | VPS and VPN
HostArmada - Affordable Cloud SSD Web Hosting

End-to-end analytics and custom attribution model for SMBs without coding


Despite the large number of end-to-end analytics systems, all of them have disadvantages: price, integration process, customization. Even in large industries such as real estate, not every client is willing to commit additional resources to this.

Therefore, for one of the large developers in St. Petersburg, we decided to independently build a system of end-to-end analytics and a custom attribution model that can be adjusted depending on the specifics of the business and the tasks set.

How it works

Calls are most often used to analyze the contribution of advertising channels to sales. The result is a rough picture of the user’s path to the deal. Why approximate? Because if the type of the promoted product or service implies several touches with advertising campaigns, then there may be one call, and more visits to the site. As a result, analyzing only calls, we lose some of the advertising channels involved in the sale.

Our analytics are based on ClientID, an anonymous identifier assigned by Yandex.Metrica to each unique visitor. It stores all information about the user’s actions on the site: about visits to pages, interaction with advertising campaigns, conversions.

Thanks to ClientID, we see a more complete picture and can more accurately assign value to channels and campaigns, since much more information is recorded than just a call.

Another benefit is data privacy. Not every customer is ready to hand over their clients’ phone numbers to the agency because of the fear that they will end up with competitors. ClientID solves this problem.

The only drawback or limitation is that no more than 100 identifiers can be specified at a time, that is, you can analyze the behavior of a maximum of 100 users at a time. This is due to the limitation on the number of parameters when using the operator (=.), For more details see the Help. Therefore, this method is suitable for a business with a small number of transactions and not instant demand (in case of instant demand, there is no need to apply a custom attribution model, since ~ 99% of orders occur immediately after the first visit to the site).

With a larger number of ClientIDs, you can upload data in stages, by separate requests, or instead of segments, use the ClientID as a request parameter (dimensions).

The result of building the analytics system and attribution model will be a table in which the value of each source and campaign on the way to sale will be visible. With its help, it will be possible to optimize marketing costs by rejecting ineffective placements.

The custom attribution model assumes that the specialist can independently choose what value to assign to the channel or campaigns of the first and last visit to the site. This allows you to adjust to specific tasks: increasing the top level of the funnel, focusing on the most conversion channels.

Before the beginning

To prepare for further work, you need a list of ClientIDs. If sales analysis is planned, then they are downloaded from CRM along with the date of the transaction (why the date is needed will become clear when creating the attribution model). Since the data transfer in each CRM may differ, check with the CRM support managers for details.

All channels for which analysis is planned must be tagged with UTM tags. Required parameters: utm_source, utm_medium, utm_campaign. The rest of the parameters are optional. For example, if you plan to analyze ad groups, keywords, then the markup should contain the corresponding values.

Creating a data query

Next, you need to compose a request via the Yandex.Metrica API, since the web interface of the analytics system allows you to filter data by only one identifier. When the ClientID is more than ten, the process becomes tedious.

The request is made using groupings and metrics. More information on the description of groupings and metrics can be found in the help.

Grouping (dimensions) is a feature by which you can group data, for example, browser, campaign, device type. In the context of our task, we need to use grouping by ClientID and traffic source parameters.

Metrics is a numerical value calculated based on the attribute of a hit or visit, for example: the number of visits, average browsing depth, conversion rate.

When creating a query, consider the compatibility of groupings and metrics. To set the query correctly, at the beginning of the name of groupings and metrics, two prefixes are used:

  • hit – ym: pv:

  • visit – ym: s:

An example of a ready-made link that we will use to send a request:

https://api-metrika.yandex.net/stat/v1/data.csv?id=11111111&limit=1000&dimensions=ym:s:clientID,ym:s:date,ym:s:lastsignTrafficSource, ym:s:UTMSource,ym:s:UTMMedium,ym:s:UTMCampaign,ym:s:UTMTerm&metrics=ym:s:visits, ym:s:users&date1=2020-01-09&date2=today&filters=ym:s:clientID=.('1579067368','1584950989','1590578020','1590944830','1591214250')

Instead of units in the id parameter, you need to specify the counter number, and your login, which will send the request, must have access to edit the counter.

Parameters used in the link:

  • limit – the limit of the lines to be uploaded, the number is written manually, depending on the amount of data;
  • dimensions – enumeration of data groupings;
  • ym: s: clientID – data grouping parameter, anonymous user ID;
  • ym: s: date – data grouping parameter, visit date;
  • ym: s: lastsignTrafficSource – data grouping parameter, last significant source;
  • ym: s: UTMSource – data grouping parameter, advertising system;
  • ym: s: UTMMedium – data grouping parameter, traffic type;
  • ym: s: UTMCampaign – data grouping parameter, name of the advertising campaign;
  • ym: s: UTMTerm – data grouping parameter, keyword;
  • metrics – enumeration of metrics;
  • visits – metric, number of visits;
  • users – metric, number of users;
  • date1 / date2 – the period for which we upload data in the YYYY-MM-DD format (including a number of reserved words, for example, today);
  • filters – filter, in this case we filter by ClientID;
  • ym: s: clientID following filters is not a link parameter, but a parameter of the above-described variable – filters – describes the segment by which the data is exported. Literally it can be translated as “unload data on all visits of only these customer identifiers”, then – enumeration of identifiers (up to 100).

Then you need to get a token to indicate from which login the request is sent to the API. To do this, you need to register the application on Yandex.OAuth and send an application for access to the API. All information on obtaining a token can be found in the API course.

After generating the link and receiving the token, you can send an API request to Metrica to obtain data by ClientID.

Uploading and formatting data by ClientID

To send a request, we use the ReqBin website. With its help, we can send API requests to Metrica without resorting to programming in JavaScript, Python or R.

Insert the generated link with the ClientID list, insert the token, click on the Send button. Done. You are beautiful (actually, not quite yet).

The received data must be inserted into Excel and divided into columns using a comma. This is how the data looks right after copying it to Excel:

Select the column with data, on the toolbar, select “Text by Columns”, in the “Delimited” window that opens.

Select “Comma” and wait for “Next” – the text is split into columns.

We sort the data by the ClientID column and by Date of visit (from old to new) to see the user’s path to the deal in chronological order.

Now let’s take a look at the magic of creating an attribution model using formulas.

Creating an attribution model using formulas

First, you need to number the visits for each ClientID. To do this, use the “IF” formula: if the ClientID matches the identifier in the previous line, then the visit number is increased by 1. If it does not match, then this is the first visit in the chain.

Formula: = IF (A3 = A2; J2 + 1; 1), where column A is the ClientID list, J is the previous visit number.

Then we calculate the number of the last visit using the same formula. Formula: = IF (A2 = A3; K3; J2).

The next three columns are weight distribution between visits. We gave 40% to the first and the last, 20% to the rest. The distribution can be changed at your discretion.

The weight of the first source is set by the same “IF” formula. If the visit number is 1, then 40% is given to it: = IF (J2 = 1; 40%; 0). It is the same with the last source – if the visit number is equal to the last visit number, then we give 40%: = IF (K2 = J2; 40%; 0).

And it remains to assign value to intermediate channels, dividing 20% ​​by the number of visits between the first and the last. If there are only two sources, then 20% is divided between the first and second in half. If one, then all the weight goes to him.

Formula: = IF (AND (J2> 1; J2

Summing up all three columns – now you are definitely beautiful.

Filtering post-sale visits

After the transaction is completed, users make visits to the site. If such visits fall into unloading, then they are also assigned a value, which is not always correct (it depends on how often repeat purchases are made). For example, in real estate, repeat purchases are very rare, so visits to the site after the transaction are made mainly to clarify information (phone number, construction report, key issuance). However, if one user can make several purchases with a high probability, then each visit to the site matters, as it can lead to a repeat transaction.

In our case, there is no value in post-transaction visits. Therefore, we manually add a column with the date of sale (Q) to Excel. Next, we compare the date of the sale and the visit, and if the visit was later, then it does not participate in the calculation of the value and is filtered using the formula = IF (B2

The resulting data can be adjusted to spend by channel and campaign, visualized using Google Data Studio or Power BI. Below is an example of a Power BI visualization with a random dataset that has nothing to do with actual advertiser data.

A few words in conclusion

Best of all, this method of analyzing sales and traffic sources is suitable for industries with a long transaction cycle and a small number of customers, for example, for real estate, auto, repair work, small online stores. In this case, the chain of interactions can be more than 1–2 visits and it is possible to limit yourself to 100 identifiers. If necessary, you can make several unloads, each with 100 ClientIDs. Difficulties can arise when the number of transactions per month exceeds 1000 and it becomes inconvenient to download ClientIDs in hundreds.

This analysis is not suitable for short deals – there are much fewer touches with advertising campaigns, the number of deals is higher, except for the analysis of repeat sales.

Leave a Reply

Your email address will not be published. Required fields are marked *