Migrate And Modify Your Data Platform Confidently With Compilerworks
Data Engineering Podcast - A podcast by Tobias Macey - Duminică
Categories:
Summary A major concern that comes up when selecting a vendor or technology for storing and managing your data is vendor lock-in. What happens if the vendor fails? What if the technology can’t do what I need it to? Compilerworks set out to reduce the pain and complexity of migrating between platforms, and in the process added an advanced lineage tracking capability. In this episode Shevek, CTO of Compilerworks, takes us on an interesting journey through the many technical and social complexities that are involved in evolving your data platform and the system that they have built to make it a manageable task. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Schema changes, missing data, and volume anomalies caused by your data sources can happen without any advanced notice if you lack visibility into your data-in-motion. That leaves DataOps reactive to data quality issues and can make your consumers lose confidence in your data. By connecting to your pipeline orchestrator like Apache Airflow and centralizing your end-to-end metadata, Databand.ai lets you identify data quality issues and their root causes from a single dashboard. With Databand.ai, you’ll know whether the data moving from your sources to your warehouse will be available, accurate, and usable when it arrives. Go to dataengineeringpodcast.com/databand to sign up for a free 30-day trial of Databand.ai and take control of your data quality today. We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial. Your host is Tobias Macey and today I’m interviewing Shevek about Compilerworks and his work on writing compilers to automate data lineage tracking from your SQL code Interview Introduction How did you get involved in the area of data management? Can you describe what Compilerworks is and the story behind it? What is a compiler? How are you applying compilers to the challenges of data processing systems? What are some use cases that Compilerworks is uniquely well suited to? There are a number of other methods and systems available for tracking and/or computing data lineage. What are the benefits of the approach that you are taking with Compilerworks? Can you describe the design and implementation of the Compilerworks platform? How has the system changed or evolved since you first began working on it? What programming languages and SQL dialects do you currently support? Which have been the most challenging to work with? How do you handle verification/validation of the algebraic representation of SQL code given the variability of implementations and the flexibility of the specification? Can you talk through the process of getting Compilerworks integrated into a customer’s infrastructure? What is a typical workflow for someone using Compilerworks to manage their data lineage? How does Compilerworks simplify the process of migrating between data warehouses/processing platforms? What are the most interesting, innovative, or unexpected ways that you have seen Compilerworks used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Compilerworks? When is Compilerworks the wrong choice? What do you have planned for the future of Compilerworks? Contact Info @shevek on GitHub Webiste Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Links Compilerworks Compiler ANSI SQL Spark SQL Google Flume Paper SAS Informatica Trie Data Structure Satisfiability Solver Lisp Scheme Snooker Qemu Java API The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast