We take a closer look at Databricks, which was featured in our list of the top ten technology startups in this month's magazine.
With origins in both academia and the open-source community, Databricks has always been devoted to simplifying data, sharing knowledge and pursuing truths. Founded in 2013 by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks brings together data engineering, science and analytics on an open, unified platform so data teams can collaborate and innovate faster.
More than five thousand organizations worldwide, including Shell, Conde Nast and Regeneron, rely on Databricks as a unified platform for massive-scale data engineering, collaborative data science, full-lifecycle machine learning and business analytics.
Venture-backed and headquartered in San Francisco, with offices on four continents and hundreds of global partners including Microsoft, Amazon, Tableau, Informatica, Capgemini and Booz Allen Hamilton, Databricks is on a mission to help data teams solve the world’s toughest problems.
Databricks offers its services in the following industries:
Advertising and marketing - The growth of digital advertising and marketing data has created a plethora of opportunities to optimize campaign performance and advertising spend across direct advertising and auctions. Databricks helps you manage the traffic jam of data caused by multiple sources of data such as ad inventory, web traffic, click logs, CRM, and behavioural data to uncover insights that improve audience targeting, pricing strategies, and conversion rates — increasing campaign ROI and creating new revenue opportunities.
Energy and utilities - From highly-instrumented wells to the proliferation of smart grid technologies, data is becoming a critical element in the discovery, extraction, and delivery of energy — whether it is oil, natural gas, or even wind and solar. Databricks provides a virtual analytics platform that enables real-time analysis of operational and customer data at scale, making modern innovations like predicting weather patterns and optimizing the energy grid a reality.
Enterprise technology and software - You build software to support business processes, and your software captures billions of data points daily. To create a positive impact on your customers’ businesses, you need to quickly take the pulse of any business through data and prescribe actions to generate value.
Financial services - Enable financial services companies to leverage data analytics, machine learning and AI to make smarter, faster decisions that reduce risk and protect against fraud — powered by the Databricks Unified Data Analytics Platform.
Gaming - Analyze player, purchase and behavioural data to create targeted offers that convert; prevent gameplay issues such as slow load times and abusive behaviour with real-time insights and AI; and predict the content players want by modelling player, session and social data at scale.
Healthcare - Enable healthcare organizations to drive innovations in patient care while reducing costs with big data analytics, machine learning and AI, powered by the Databricks Unified Data Analytics Platform. Build anomaly detection models on top of claims, billing, and behavioural data to identify and prevent payment of fraudulent claims, unnecessary treatments or potential identity theft. Predictively determine the best treatments and optimize the quality of care through the aggregation and analysis of relevant patient history and care data across all healthcare channels. Leverage machine learning to drive operational efficiencies across a variety of areas such as preventing patient readmission, predicting bed utilization, transcribing doctors' notes and more.
Media and entertainment - Enable media and entertainment companies to leverage big data analytics, machine learning and AI to gain rich insights into customer preferences and increase audience engagement, powered by the Databricks Unified Data Analytics Platform. Analyze viewer, purchase and behavioural data to provide content recommendations that drive viewership; serve viewers the right ad, on the right device and at the right time, with real-time insights and AI; and prevent churn by identifying the factors that influence loyalty, such as stream quality, content preferences and pricing.
Federal government - Enable government agencies to leverage data analytics, machine learning and AI to make smarter, faster decisions, powered by the Databricks Unified Data Analytics Platform. Detect and prevent criminal activities, cyberattacks and national threats with real-time analytics and machine learning; predict the needs of citizens, fight fraud and reduce waste by analyzing demographics, health stats, claims and other public data sets at scale; and analyze large volumes of streaming IoT and satellite data to improve disaster recovery, city planning, transportation and more.
Retail and consumer goods - Enable retail and CPG companies to leverage big data analytics, machine learning and AI to create more engaging customer experiences that convert, powered by the Databricks Unified Data Analytics Platform. Use machine learning to mine clickstream, purchase and customer data to provide targeted recommendations that drive lifetime value; improve promo conversion by using big data to serve the right ad, at the right time, to the right person, on the right device; and prevent credit card fraud, cyberattacks and other illicit activities by detecting anomalous behaviour with real-time analytics and AI.
Life sciences - Enable life sciences companies to leverage data analytics, machine learning and AI to improve therapeutics while reducing costs, powered by the Databricks Unified Data Analytics Platform. Accelerate drug discovery and improve retargeting efforts by processing and analyzing large cohorts of DNA sequence data along with other biomedical and imaging datasets; build machine learning models on top of diverse sets of real-world data to improve trial design, disease identification, medication adherence and many other use cases; and increase marketing and sales effectiveness with highly targeted prescriber and patient programs using machine learning and predictive analytics.
Databricks and Comcast case study:
As a global technology and media company connecting millions of customers to personalized experiences, Comcast struggled with massive data, fragile data pipelines, and poor data science collaboration. With Databricks, including Delta Lake and MLflow, the company can build performant data pipelines for petabytes of data and easily manage the lifecycle of hundreds of models to create a highly innovative, unique and award-winning viewer experience using voice recognition and machine learning.
Infrastructure unable to support data and ML needs
Instantly answering a customer’s voice request for a particular program while turning billions of individual interactions into actionable insights strained Comcast’s IT infrastructure and its data analytics and data science teams. To make matters more complicated, Comcast needed to deploy models to a disjointed and disparate range of environments: cloud, on-prem, and even directly to devices in some instances.
Massive data: billions of events generated by Comcast’s entertainment system and 20+ million voice remotes, resulting in petabytes of data that need to be sessionized for analysis.
Fragile pipelines: complicated data pipelines that frequently failed and were hard to recover. Small files were difficult to manage, slowing data ingestion for downstream machine learning.
Poor collaboration: globally dispersed data scientists working in different scripting languages struggled to share and reuse code.
Manual management of ML models: developing, training, and deploying 100s of models was highly manual, slow, and hard to replicate, making it difficult to scale.
Friction between dev and deployment: dev teams wanted to use the latest tools and models while ops wanted to deploy on proven infrastructure.
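To make the "sessionized" requirement above concrete, here is a minimal sketch of gap-based sessionization in plain Python. It is an illustration only, not Comcast's or Databricks' implementation: the event shape (a device ID plus a timestamp in seconds), the `sessionize` function name and the 30-minute inactivity gap are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed threshold: a pause longer than 30 minutes starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (device_id, timestamp) events into per-device sessions.

    Returns a dict mapping each device_id to a list of sessions, where a
    session is a list of timestamps in ascending order.
    """
    # Collect every timestamp seen for each device.
    by_device = defaultdict(list)
    for device_id, ts in events:
        by_device[device_id].append(ts)

    sessions = {}
    for device_id, stamps in by_device.items():
        stamps.sort()
        device_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            # Compare with the last event of the current session.
            if ts - device_sessions[-1][-1] > gap:
                device_sessions.append([ts])  # gap exceeded: new session
            else:
                device_sessions[-1].append(ts)  # same session continues
        sessions[device_id] = device_sessions
    return sessions


# Hypothetical events: (device_id, seconds since some reference time).
events = [("remote-a", 0), ("remote-a", 600), ("remote-b", 100), ("remote-a", 5000)]
result = sessionize(events)
```

At petabyte scale the same grouping would run on a distributed engine rather than in a single process, but the rule of splitting a device's events wherever the pause exceeds a threshold stays the same.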
Automated infrastructure, faster data pipelines with Delta Lake
Comcast realized they needed to modernize their entire approach to analytics from data ingest to the deployment of machine learning models that deliver new features that delight their customers. Today, the Databricks Unified Data Analytics Platform enables Comcast to build rich data sets and optimize machine learning at scale, streamline workflows across teams, foster collaboration, reduce infrastructure complexity, and deliver superior customer experiences.
Simplified infrastructure management: reduced operational costs through automated cluster management and cost management features such as autoscaling and spot instances.
Performant data pipelines with Delta Lake: Delta Lake is used for the ingest, data enrichment, and initial processing of the raw telemetry from video and voice applications and devices.
Reliably manage small files: Delta Lake enabled them to optimize files for rapid and reliable ingestion at scale.
Collaborative workspaces: interactive notebooks improve cross-team collaboration and data science creativity, allowing Comcast to greatly accelerate model prototyping for faster iteration.
Simplified ML lifecycle: managed MLflow simplifies the machine learning lifecycle and model serving via the Kubeflow environment, allowing them to track and manage 100s of models with ease.
Reliable ETL at scale: Delta Lake provides efficient analytics pipelines at a scale that can reliably join historic and streaming data for richer insights.
Delivering personalized experiences with ML
In the intensely competitive entertainment industry, there is no time to press the pause button. Armed with a unified approach to analytics, Comcast can now fast forward into the future of AI-powered entertainment – keeping viewers engaged and delighted with competition-beating customer experiences.
Emmy-winning viewer experience: Databricks helps enable Comcast to create a highly innovative and award-winning viewer experience with intelligent voice commands that boost engagement.
Reduced compute costs by 10X: Delta Lake has enabled Comcast to optimize data ingestion, replacing 640 machines with 64 while improving performance. Teams can spend more time on analytics and less time on infrastructure management.
Less DevOps: Reduced the number of DevOps full-time employees required to onboard 200 users from 5 to 0.5.
Higher data science productivity: Fostered collaboration between global data scientists by enabling different programming languages through a single interactive workspace. Also, Delta Lake has enabled the data team to use data at any point within the data pipeline, allowing them to act more quickly in building and training new models.
Faster model deployment: reduced deployment times from weeks to minutes as operations teams deployed models on disparate platforms.