Databricks, now approaching its fifth birthday, was launched by its founders after the success of Apache Spark – the world’s first unified data engine, created in response to a competition set up by Netflix to deliver an algorithm to predict what people would watch next on its streaming service.
Its creator Matei Zaharia developed the software at UC Berkeley in 2009, aiming to overcome the challenge between data and compute. On inception, Databricks handed over the software to the Apache Software Foundation (the project has since grown into the largest open source community in big data, with over 1,000 contributors from 250-plus organisations) and tasked itself with accelerating the adoption of Spark around the world across a range of industries.
So how does it work and provide fuel for data-hungry dashboards? We spoke with Databricks VP EMEA Nick Peart to find out more. “It’s able to bring together big data and break down the silos between lines of business (allowing them to glean actionable insights from data for better decisions) and the data engineers who collect the data, store it, cleanse it and put it in a shape that allows the data scientists to create models and analyse data sets and data points to measure their impact on each other,” says Peart. “Databricks has created a cloud-based and serverless unified analytics platform that enables an overview for data-driven decisions based on these three areas working together.”
Peart notes the buzz around big data has been building steadily over the past few years. So what trends is he seeing in the industry around dashboards, and how is Databricks reacting to them? “By utilising Databricks and other tools you are able to get your insights to bubble to the surface and make the decisions you can reach based on data easy to consume by a broad audience,” he explains. “That’s where the real power of the dashboard comes in – it can provide a connectivity layer between the deep learning insight that goes into producing your models and analysing your data, while making it easy to utilise for a broad spectrum of users across your business. By channelling everyone’s needs and enabling data to be ingested faster and in real time, dashboards have moved on from a few years ago, when they told you what happened yesterday, to telling you what’s happening right now and what events are on the horizon.”
Apache Spark is at the heart of Databricks, which exposes APIs that enable you to plug into multiple different layers. It’s a cloud-based platform currently operating on AWS and Azure, where Databricks is offered as a first-party service with Microsoft. “We also have partnerships with GSIs and RSIs such as Capgemini and Cognizant,” adds Peart. “We’re working at that level from an integrated perspective. Allied to this we have open APIs and plug-ins with the likes of Tableau and Looker. Because our foundation is in open source we have an extendable platform enabling you to plug in a multitude of data sources, visualisation and BI platforms, so you can go from data ingestion through to aggregation and sharing of your results with almost any vendor you want to work with.”
Databricks has been able to overcome the challenge of delivering results with a focus on real time analysis by utilising the cloud to take in streaming live data. “Spark’s ability to deal with real time streaming data goes beyond zeroes and ones,” maintains Peart. “We’re able to meet the challenge of real time analysis of video as well. We’re working with new customers on a variety of applications. For example, on a building site in a hard hat area where protective glasses must also be worn, we’ve got customers looking at their CCTV feeds from site entrances and using real time analysis to ensure people walking through the door are dressed appropriately with a hi-vis vest, the right type of boots, hard hat and eye protection. They’re working on learning models to be able to sound an alarm to alert the gate if someone walks through without meeting the full requisite of safety standards. This near real time alert model could be applied to multiple situations where you’ve got eyes and ears monitoring an environment.”
Peart is keen to emphasise the practical benefits of this real time capability and highlights his personal experience at a company event in San Francisco. “One of our key, and first, US customers is Capital One – they run their fraud protection analytics using a Databricks model,” he explains. “That night I went into a bar and paid for dinner and drinks with my Capital One card and just as the card had been swiped I received a text message warning me of an unusual transaction. I replied that yes, this was me and the payment was authorised. That whole transaction took just four seconds from start to finish.”
Databricks also works with multinationals such as Shell, which has implemented the Databricks solution for its supply chain analysis across partnerships with mining, drilling, petrochemical and processing plants. “These are areas with high levels of maintenance parts requirements,” notes Peart. “They used to have big storage facilities strategically placed around the world with multiple versions of every part in stock. Now they’ve been able to rationalise that by looking at IoT sensors recording the likes of machine vibration and exhaust emissions to predict where the next part will fail, helping them streamline their process. It’s helped them save a significant amount of cost by avoiding unnecessary downtime.”
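To make the idea concrete, here is a minimal sketch in plain Python of how a stream of vibration readings might be turned into a maintenance alert. It is an illustration only, not Shell’s or Databricks’ actual method: the function name `flag_at_risk`, the window size, the threshold and the sample readings are all assumptions made up for this example, and a production system would train a model on historical failure data rather than use a fixed cut-off.

```python
from collections import deque
from statistics import mean


def flag_at_risk(readings, window=3, threshold=1.5):
    """Return the index of the first reading at which the rolling mean of
    the last `window` values exceeds `threshold`, or None if it never does.

    Hypothetical helper: window and threshold are illustrative values.
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        # Only evaluate once the window is full, to avoid noisy early alerts.
        if len(recent) == window and mean(recent) > threshold:
            return i
    return None


# Made-up vibration readings (mm/s) from a single sensor, drifting upwards.
vibration_mm_s = [1.0, 1.1, 1.0, 1.2, 1.6, 1.9, 2.1]
first_alert = flag_at_risk(vibration_mm_s)
print(first_alert)
```

Averaging over a short window rather than reacting to a single spike is one simple way to avoid ordering a part on the strength of one noisy reading.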
It's this ability to manage predictive analysis and maintenance which attracted HP as a customer. Peart recalls how the printer giant was keen to meet the challenge of running its ink division: “HP users can sign up for a service that monitors how much ink they use on an average day so when you’re low it orders one just in time so you don’t need multiple spares as back up.”
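The arithmetic behind a just-in-time reorder like this can be sketched in a few lines of Python. This is a hedged illustration, not HP’s actual service logic: the function `should_reorder`, the millilitre units, the delivery lead time and the sample figures are all assumptions introduced for this example.

```python
def should_reorder(ink_ml_left, daily_use_ml, lead_time_days):
    """Return True when the remaining ink is expected to run out within
    the delivery lead time, based on average daily use.

    Hypothetical helper: all parameters and units are illustrative.
    """
    if not daily_use_ml:
        return False  # no usage history yet, so nothing to predict from
    average_use = sum(daily_use_ml) / len(daily_use_ml)
    if average_use <= 0:
        return False  # printer is idle; ink will not run out
    days_left = ink_ml_left / average_use
    return days_left <= lead_time_days


# Made-up usage history (ml per day) and a five-day delivery lead time.
usage = [2.0, 3.0, 1.0]
print(should_reorder(10.0, usage, lead_time_days=5))
print(should_reorder(30.0, usage, lead_time_days=5))
```

Comparing the estimated days of ink left against the delivery lead time is what lets a replacement arrive as the old cartridge runs out, rather than sitting on a shelf as a spare.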
Peart believes the successes Databricks is having with a broad spectrum of customers answer a key question: how can you derive more value from your data? Answering it can allow for transformative business decisions based on the data a company already holds. “Quite frankly, you can have the biggest of big data but unless you can bring it to the surface and make use of it, what is it worth?” he asks. “Google noted back in 2015 that everybody was focusing on machine learning, AI… but actually the algorithms in that space, and available within Databricks, have been around for the past 20 years and haven’t really changed. What’s making them work is the ability to pump a lot of data through the algorithms to get meaningful results. We’re at the forefront of that because Databricks has made it so easy to get as much data as you need into your model, get it in the right shape, manage it and access it via data lakes or warehouses.”
Peart cites Spark’s flexibility as one of the main reasons for its success with customers across industries keen to glean insights from the ingestion of historical data blended with real time threat analysis and threat detection. At last year’s Spark Summit in San Francisco, Databricks CEO Ali Ghodsi highlighted the capability for Spark to provide a user interface dashboard in healthcare to analyse x-ray scans. Peart explains: “You train the model to understand the difference between tissue and bone while detecting a break or fracture. There are terabytes of data that go into training the model and then you make that visually appealing with an interface that’s easy for a medical professional to use and highlight anomalies.”
Peart maintains the philosophy driving the company is its aim to create a better world order through data, which in any vertical needs to be aggregated, modelled and more broadly understood if it is to lead to a better consumer experience. “Machine learning and AI could help feed people by optimising crop production, and make people healthier by improving care and treatment through algorithms. All of these are about improving our environment and the way we live.”
Peart predicts that in the next two to five years people will start to realise the potential of their data by leveraging its power for everything from simple things like churn prevention analysis through to self-driving cars and cures for diseases. “We hope that the use of our platform can provide major leaps forward to help people understand the value of AI and machine learning and prompt them to get their infrastructure ready to maximise data. I’ve been in the industry 20 years and remember the debates around cloud versus on-premise. People are waking up to the fact that the cloud is equally secure while offering the benefits of productivity, speed and total cost of ownership allied with the ability to totally transform their business and make better decisions.”
Looking to the future, Databricks is committed to the Apache Spark foundation and enhancing its roll-out and adoption on a global scale. “There’s an open source version and pretty much all of our competitors have some sort of deployment,” says Peart. “We want to continue making Databricks the best place to run Spark (already deployed on a massive scale by internet powerhouses such as Yahoo and eBay). The cool thing is that if you work directly with us you get a 10-times faster version of Spark than most of our competitors can offer and a five-times speed bump in our collaborative environment. Our unified analytics platform with Spark’s unified engine offers the best environment for data analysis, and in a world where people fear vendor lock-in, the underlying engine of Spark gives you that flexibility to know that you’re not tied in to Databricks in the future, so if you want to make a change and plug in other Spark-based applications, you can.”
Peart concludes with a message of empowerment: “We want to democratise the ability for every business of any size to be able to utilise machine learning and AI to make those transformative business decisions which should be available to the many not just the few.”