Databricks is a fantastic data lake/warehouse option for storing and centralizing your business data. It's extremely easy to feed data into Databricks because it offers a wide range of existing connectors and notebooks to help you manage large datasets.
Once you have your data in Databricks, you can easily process and transform the data to your requirements.
Ease of use:
Databricks has made it really easy to load your data into the platform. There are many prebuilt connectors to help ingest data from all the major players. This means you can get up and running quickly, often with just a few clicks and some minimal config.
For anything that doesn't exist, creating your own pipelines to ingest data is quite straightforward as well using Python/SQL notebooks.
Documentation:
Databricks provides great documentation and examples, which means you can get up to speed quickly and onboard new staff easily.
Transforming data:
Quite often, we need to reformat or restructure data before we can sync it over to HubSpot. Databricks makes this a breeze, with support for both real-time and batch transformations.
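As a rough illustration of the kind of restructuring meant here, the sketch below collapses one-row-per-order data into one summary record per contact, which is closer to the shape a CRM sync usually wants. In a Databricks notebook this would more likely be a SQL or DataFrame aggregation; plain Python is used only to keep the example self-contained. The function name and the field names (`contact_email`, `amount`, `order_count`, `total_spend`) are hypothetical.

```python
from collections import defaultdict


def summarize_orders(order_rows):
    """Collapse one-row-per-order data into one record per contact.

    Field names here are illustrative assumptions, not a real schema.
    """
    totals = defaultdict(lambda: {"order_count": 0, "total_spend": 0.0})
    for row in order_rows:
        summary = totals[row["contact_email"]]
        summary["order_count"] += 1
        summary["total_spend"] += float(row["amount"])
    # Sort by email so the output order is deterministic.
    return [{"email": email, **summary} for email, summary in sorted(totals.items())]


# Example input: amounts arrive as strings, as they often do from raw exports.
orders = [
    {"contact_email": "a@example.com", "amount": "20.00"},
    {"contact_email": "b@example.com", "amount": "5.50"},
    {"contact_email": "a@example.com", "amount": "10.00"},
]
contact_summaries = summarize_orders(orders)
```

Each item in `contact_summaries` now holds one contact with its aggregated values, ready to be mapped onto properties in the destination system.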
Hosting options:
"Databricks workspaces can be hosted on Amazon AWS, Microsoft Azure, and Google Cloud Platform. You can use Databricks on any of these hosting platforms to access data wherever you keep it, regardless of cloud."
This can really help with compliance and IT security requirements.
Sets businesses up for the future!
Most businesses have complex tech stacks with data spread out all over the place, but once you consolidate all your data into one place, life becomes a lot easier.
Need to integrate with another platform in the future? Easy, you already have access to your data!
A platform you're using goes away suddenly? Again, no worries, you already have a copy of your data.
We've helped clients stand up Databricks from scratch. The first step is determining what data sources they want to load into Databricks and what they are planning to do with the data.
Generally we are more focussed on the HubSpot side, but clients will often use Databricks for many other purposes.
Our high level HubSpot and Databricks approach:
If you want to get data from HubSpot into Databricks, there are 2 options:
Once we have all the data in Databricks, we need to get it back out and into HubSpot.
Luckily, there are many reverse ETL tools out there already that can handle this.
The main downside of these third-party tools is that they generally have limited functionality (i.e. they can only sync certain objects) and can be slow to process data (they often do not use batch calls, etc.).
Costs are another factor. These tools generally charge either by volume or by the number of objects synced, and over the long run those charges can really add up.
For simple syncs, we recommend Census. It also allows you to create 2 object syncs to HubSpot for free.
For anything more complex, we recommend getting us to script the Reverse ETL process for you.
We have developed a custom processing engine that is optimized for HubSpot's batch API calls.
The main benefits to this approach:
Easily trigger a bulk resync of a subset of the data or the entire data set
Map and transform new properties
Last mile transformations:
Easily convert values during the sync
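To make the last two ideas concrete, here is a minimal sketch of converting values during a sync and grouping records into batch-sized payloads. It is not our processing engine, only an illustration. The helper names, the `{"id": ..., "properties": ...}` payload shape, the batch size of 100, and the value conventions (booleans as "true"/"false", dates as midnight-UTC epoch milliseconds, empty string for missing values) are all assumptions that should be checked against the current HubSpot API documentation.

```python
from datetime import date, datetime, timezone

BATCH_SIZE = 100  # assumed per-request limit; verify against the HubSpot docs


def to_hubspot_value(value):
    """Convert a warehouse value into a string for the sync (assumed conventions)."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, date):  # also matches datetime; the time part is dropped
        midnight = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
        return str(int(midnight.timestamp() * 1000))
    return str(value)


def build_batches(rows, id_field, batch_size=BATCH_SIZE):
    """Turn warehouse rows into a list of batch payloads of at most batch_size records."""
    inputs = [
        {
            "id": str(row[id_field]),
            "properties": {
                key: to_hubspot_value(val)
                for key, val in row.items()
                if key != id_field
            },
        }
        for row in rows
    ]
    return [
        {"inputs": inputs[start:start + batch_size]}
        for start in range(0, len(inputs), batch_size)
    ]


# 250 hypothetical rows become three payloads of 100, 100, and 50 records.
rows = [
    {"hs_id": i, "is_customer": i % 2 == 0, "signup_date": date(2024, 1, 1)}
    for i in range(250)
]
batches = build_batches(rows, id_field="hs_id")
```

Sending one request per payload instead of one per record is what keeps large resyncs fast; the actual HTTP call, retries, and rate-limit handling are left out of this sketch.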
Our processing engine can also be used with other data warehouse or data source platforms like Snowflake, MySQL, RDS, Google Datastore, and more.