To a large extent, bootstrapping your marketing activities with data revolves around collecting data from two or three specific domains, depending on the scope of your business.
There are quite a few use cases around marketing analytics, but these three data sources (Campaign Activity, Clickstream and E-commerce Sales) can drive some of the biggest ones, such as media-mix modeling, marketing attribution and churn prevention.
Media-mix modeling (MMM) helps you understand how to shift your budget mix among different advertising channels to optimize your outcome. It relies on statistical techniques to work out where one unit of marginal spend would be best placed.
MMM relies on two specific sources of information: 1. spend data (campaigns) and 2. outcome data (e.g. e-commerce sales) that we want to optimize for. To be effective, MMM needs a view of all the different channel spends contributing to the outcome.
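As a minimal sketch of the idea, assuming a weekly extract with spend per channel and sales (all column and file names below are hypothetical, and a real MMM would also model adstock/carryover and diminishing returns), the outcome can simply be regressed on channel spend:

# Minimal media-mix modeling sketch: regress weekly sales on channel spend.
# Column names (search_spend, social_spend, display_spend, sales) are assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("weekly_marketing.csv")  # hypothetical extract of spend + sales
channels = ["search_spend", "social_spend", "display_spend"]

model = LinearRegression().fit(df[channels], df["sales"])

# Each coefficient approximates the sales lift of one marginal unit of spend,
# which is what guides the budget re-allocation between channels.
for channel, coef in zip(channels, model.coef_):
    print(f"{channel}: ~{coef:.2f} sales per extra unit of spend")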
The role of marketing attribution is to assign credit to specific marketing campaigns, to get a better sense of their contribution to a specific objective. An attribution method such as "last click" gives full credit for a reached objective to the last touchpoint that contributed to it, while other techniques assign partial credit across touchpoints. Explanations of different attribution techniques are provided in the following Medium posts: here and here.
Marketing attribution relies on campaign data (spend), clickstream information and sales data to properly attribute campaign activities. Along with setting up the different systems to collect this information, proper URL tagging (UTM parameters) and potentially setting the right cookies on the website are necessary first steps to enable this use case.
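As a minimal illustration of the last-click logic, assuming clickstream rows already joined with order ids from the sales data (field and file names here are hypothetical):

# Last-click attribution sketch: give each order's full credit to the latest
# campaign touchpoint before the purchase. Field names are assumptions.
import pandas as pd

touchpoints = pd.read_csv("touchpoints.csv")  # order_id, utm_campaign, timestamp
touchpoints["timestamp"] = pd.to_datetime(touchpoints["timestamp"])

# Keep only the last touchpoint per order, then count attributed orders per campaign.
last_click = (touchpoints.sort_values("timestamp")
              .groupby("order_id")
              .tail(1))
print(last_click["utm_campaign"].value_counts())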
Churn identification and prevention is one of the most traditional CRM use cases. It leverages sales and campaign data (CRM) to understand which customers are likely to churn and which offers they would respond to in order to stick with the service offering.
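As a hedged sketch of the modeling step, assuming a CRM extract with recency/frequency features and a churn label derived from the sales history (all names below are hypothetical):

# Churn identification sketch: score customers by churn probability.
# Feature and label names are assumptions about the CRM extract.
import pandas as pd
from sklearn.linear_model import LogisticRegression

customers = pd.read_csv("customers.csv")  # recency_days, order_count, avg_basket, churned
features = ["recency_days", "order_count", "avg_basket"]

model = LogisticRegression().fit(customers[features], customers["churned"])

# Customers with the highest churn scores are candidates for retention offers.
customers["churn_score"] = model.predict_proba(customers[features])[:, 1]
print(customers.nlargest(10, "churn_score")[["churn_score"] + features])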
In general, one of the hardest parts of empowering marketing analytics is sourcing the data: information has to be pulled from a variety of sources. Depending on the specifics of the business trying to kick-start these use cases, there can be a variety of applicable ways to integrate each of the required data sources.
There exist quite a few ways to integrate and collect campaign data: using dedicated data integration solutions, leveraging Singer taps, using the different tools' built-in data export capabilities, or building specific API pipelines.
Different tools exist to simplify the collection of data from marketing campaigns. Talend, Adverity, Fivetran and Alooma (recently acquired by Google) provide a series of connectors that make data integration from these different ad sources fairly easy.
Singer taps provide open-source pre-built connectors to a series of advertising sources such as Facebook, Google, Outbrain, Salesforce, Marketo, Selligent, etc. The data from these sources can then be fetched by modifying some configuration settings and executing a command-line call, for instance:
tap-adwords -c config.json -p properties.json -s state.json
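For illustration, the configuration file can be generated and the tap invoked from a small script. The exact config keys should be checked against the tap's documentation; the ones below are assumptions with placeholder values:

# Sketch: write the tap-adwords config file and invoke the tap.
# Config keys are assumptions to verify against the tap's README.
import json
import subprocess

config = {
    "developer_token": "<token>",
    "oauth_client_id": "<client-id>",
    "oauth_client_secret": "<client-secret>",
    "refresh_token": "<refresh-token>",
    "customer_ids": "<account-id>",
    "start_date": "2020-01-01T00:00:00Z",
}
with open("config.json", "w") as f:
    json.dump(config, f)

# Equivalent to the command-line call above; the output is a stream of Singer
# messages that a target (e.g. target-postgres) can consume.
subprocess.run(["tap-adwords", "-c", "config.json",
                "-p", "properties.json", "-s", "state.json"])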
Certain advertising tools allow for data exports to BigQuery, to a different data-warehouse tool, or as file exports.
Pipelines can also be developed to pull the data directly from Facebook or Google through API calls. This requires a data engineer or developer to set up the different data flows; most advertising sources provide SDKs to ease integration with their platforms.
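As a hedged sketch of this route, pulling campaign-level spend with Facebook's Python Business SDK; the access token and account id are placeholders, and the field and parameter values should be verified against the current Marketing API documentation:

# Sketch: pull campaign-level spend through the Facebook Business SDK.
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount

FacebookAdsApi.init(access_token="<access-token>")
account = AdAccount("act_<account-id>")

# Request spend per campaign for the last 7 days (date_preset value assumed).
insights = account.get_insights(
    fields=["campaign_name", "spend"],
    params={"level": "campaign", "date_preset": "last_7d"},
)
for row in insights:
    print(row["campaign_name"], row["spend"])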
Different alternatives exist for collecting clickstream data: relying on a premium analytics tool such as Google Analytics 360 or Adobe Analytics, leveraging a customer data platform, using a clickstream collector, or setting up some custom development.
The simplest way to collect raw clickstream data, if you can afford it, is through Google or Adobe Analytics. Google offers the possibility to export raw clickstream data to BigQuery as part of its Google Analytics 360 offering. One of the major drawbacks of going that route is the $150k annual cost of Google Analytics 360.
Most customer data platforms offer the possibility to export ingested events to a data warehouse. They ingest data from multiple sources, including website activity, and are able to stream it back for processing or analysis. Depending on the size of your business, a customer data platform might prove more expensive than a Google Analytics 360 license, but it offers additional benefits. Certain CDPs, such as Segment, offer a free tier up to a certain number of events or active users.
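For illustration, sending a server-side event to a CDP looks roughly as follows with Segment's Python library; the write key and the event and property names are placeholders:

# Sketch: record a server-side event with Segment's analytics-python library.
import analytics

analytics.write_key = "<segment-write-key>"

analytics.track(
    user_id="user-123",
    event="Order Completed",
    properties={"order_id": "1001", "revenue": 49.90},
)
analytics.flush()  # force delivery before the script exits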
Different open-source clickstream collectors exist, the best known being Snowplow and Divolte. They offer a way to ingest clickstream data without having to fully develop it yourself. The drawback of using these is that you need to manage the infrastructure.
Another solution for collecting clickstream data is custom development. A Logic App, Function App or Lambda function combined with an Event Hub/Kinesis/Pub/Sub setup would allow for scalable ingestion of data, but at the cost of managing code and infrastructure.
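A minimal sketch of the AWS variant, assuming a Lambda function behind an API Gateway endpoint and a pre-existing Kinesis stream (the stream name and event fields are placeholders):

# Sketch of a custom clickstream collector: an AWS Lambda handler (behind
# API Gateway) forwarding raw click events to a Kinesis stream.
import json
import boto3

kinesis = boto3.client("kinesis")

def handler(event, context):
    payload = event.get("body", "{}")  # raw click event posted by the browser
    kinesis.put_record(
        StreamName="clickstream-events",
        Data=payload.encode("utf-8"),
        # Partition by session id (assumed field) so a session stays ordered.
        PartitionKey=json.loads(payload).get("session_id", "unknown"),
    )
    return {"statusCode": 200, "body": "ok"}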
There are different ways to source information related to online sales, all of which have their own set of pros and cons:
It is possible to capture online sales data through an analytics tag such as Google's. Google provides a structured way to pass the information to analytics through its Enhanced Ecommerce plug-in. This provides a good first pass at capturing e-commerce data; there are, however, quite a few drawbacks to that approach.
The main advantages of this approach are its universality and the speed at which it can be deployed, usually requiring only some tag integration for most web-shops, and for certain platforms, such as Shopify, only some simple configuration.
Another advantage of setting up Enhanced Ecommerce tracking is the ability to tie purchases to specific sessions and therefore rely on Google's last-click attribution. With it, it is possible to attribute specific orders to specific campaigns and sales channels based on last-click attribution, which can be beneficial.
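Where a server-side complement to the tag is needed, the same transaction data can also be sent through Google's (Universal Analytics) Measurement Protocol; a hedged sketch, with the tracking id, client id and values as placeholders:

# Sketch: send an e-commerce transaction hit to Universal Analytics via the
# Measurement Protocol. All ids and amounts below are placeholders.
import requests

requests.post("https://www.google-analytics.com/collect", data={
    "v": "1",              # protocol version
    "tid": "UA-XXXXX-Y",   # tracking / property id
    "cid": "555",          # anonymous client id
    "t": "transaction",    # hit type
    "ti": "1001",          # transaction id
    "tr": "49.90",         # transaction revenue
})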
Some e-commerce platforms allow those operating the platform to set up their own database; this is the case for Magento, EpiServer or Sitecore, for instance. In these cases, it is possible to set up master-slave database replication or database mirroring, so that the data can be used for reporting purposes without affecting the production environment.
This can be set up without custom development and allows a quick turnaround for providing data for reporting purposes.
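As a sketch of the reporting side, reading orders off such a replica; the connection string and the Magento-style table and column names are assumptions to verify against your own schema:

# Sketch: pull order data from a reporting replica with pandas + SQLAlchemy.
# Requires a MySQL driver such as pymysql; names below are placeholders.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://report_user:<password>@replica-host/magento")

orders = pd.read_sql(
    "SELECT entity_id, created_at, grand_total FROM sales_order "
    "WHERE created_at >= '2020-01-01'",
    engine,
)
print(orders.head())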
As with campaign data, data integration tools exist that provide turnkey integrations for e-commerce data; each of the vendors mentioned above provides connectors to certain e-commerce platforms.
Using these data integration solutions is an alternative when there is a lack of technical capabilities within the team or department.
Currently WooCommerce is the only web-shop with a Singer tap connector, which makes this route quite restricted. It is, however, possible to develop custom Singer taps for specific needs. This can be a good move when campaign data collection is already operated through Singer taps.
Most web-shops these days allow order information to be pulled directly through API calls. This is the case for pure SaaS platforms such as Shopify, Lightspeed, Commercecloud or Commercetools, but also for the likes of Magento.
Some of these are supported by Python SDKs; a short sketch using one of them follows the pros and cons below.
One of the major drawbacks of this approach is that it requires custom data or software engineering work; it relies on polling and, to a certain degree, is not "real-time". Certain platforms furthermore have rate limitations that might make them impractical for pulling large amounts of orders.
One of its advantages, however, is the ability to pull updated information about specific orders or date ranges.
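As an example of the SDK route, a hedged sketch pulling recent orders with Shopify's Python SDK; the shop domain, API version and access token are placeholders, and pagination and rate-limit handling are omitted:

# Sketch: pull recent orders through the Shopify Python SDK (ShopifyAPI).
import shopify

session = shopify.Session("yourshop.myshopify.com", "2020-10", "<access-token>")
shopify.ShopifyResource.activate_session(session)

# Pull orders updated since a given date, including updated/cancelled ones.
orders = shopify.Order.find(status="any", updated_at_min="2020-01-01T00:00:00Z")
for order in orders:
    print(order.id, order.created_at, order.total_price)

shopify.ShopifyResource.clear_session()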
A segment store allows us to retain the history of the customers belonging to the different audiences. Segment stores also allow for better management of the load on the downstream export flow, either by providing a means to check the delta of audience memberships/segment attribution, or by branching out the evaluation logic for customers belonging to the audience.
Webhooks are essentially a way to send a notification over HTTP when some type of event happens; for our purpose, they provide a way to create real-time ingestion of data from an e-commerce platform. Webhooks can also provide a way to work around some of the rate limitations when there isn't any need to make callbacks.
They do require some sort of webhook listener API and ingestion layer in order to capture the data. These can be built with the same type of technologies used for capturing clickstream data, for example a Logic App / Event Hub combination.
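A minimal sketch of such a listener, here with Flask and the ingestion hand-off stubbed out; the route, port and payload fields are placeholders:

# Sketch of a minimal webhook listener: accept the platform's POST and hand
# the payload to the ingestion layer (queue, Event Hub, etc., stubbed here).
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/orders", methods=["POST"])
def order_webhook():
    payload = request.get_json(force=True)
    # In a real setup, verify the platform's HMAC signature before trusting
    # the payload, then forward it to the queue or stream of your choice.
    print("received order event:", payload.get("id"))
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)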
Most e-commerce platforms support webhooks: Shopify, Lightspeed, WooCommerce, Shopware and BigCommerce support them natively, Magento supports them through third-party plugins, and platforms such as Sitecore or EpiServer need custom development.
Some e-commerce platforms are able to publish events directly onto a message queue (e.g. Google Pub/Sub, Azure Service Bus, AWS SQS). This is the case for Commercetools, which preferred this approach to standard HTTP webhooks. This allows, for instance, natively "duplicating" the relevant data for both processing (e.g. order fulfillment) and long-term storage in a data warehouse, letting the different consumers of the data "subscribe" to that single source of information.
Besides Google Pub/Sub, which has a turnkey export to a data warehouse (BigQuery), the other technology choices will still require development work in order to ingest the data.
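As a sketch of that consuming side, assuming a Google Pub/Sub subscription already wired to the platform's order events (the project and subscription names are placeholders):

# Sketch: consume order events from a Google Pub/Sub subscription.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("my-project", "order-events")

def callback(message):
    print("order event:", message.data)  # e.g. hand off to a warehouse loader
    message.ack()

streaming_pull = subscriber.subscribe(subscription, callback=callback)
streaming_pull.result()  # block and keep consuming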