Run Scenarios to Automate Your Work on Data
Dataiku Launch Program - Step 4
Automation Overview
Watch Your Work Get Itself Done
Dataiku is built as a platform to run data projects in production. That means running them when new data comes in, connecting them to your existing infrastructure to bring data in or out, and sending notifications.
To encourage collaboration, many of these capabilities are available code-free through visual interfaces.
Scenarios
Schedule Your Project to Run Daily
Create a scenario in the project whose Flow you want to run on a schedule. If you don’t have a project yet, start here.
A scenario has:
- Steps that are actions you configure to run.
- Triggers that define when to execute a scenario.
- Reporters that send information or alerts about a scenario via a variety of channels.
Scenario Steps
Decide What the Scenario Will Do
Add steps to your scenario. For example, you can:
- Build or clear a dataset
- Train a model
- Verify data quality rules or run checks
- Send messages
- Refresh the cache of charts and dashboards
Scenario steps run sequentially. However, you can control whether a step runs based on the outcome of a check.
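To make the idea of sequential steps with a conditional step concrete, here is a minimal sketch in plain Python. It does not use Dataiku's API: the `Step` class, the `run_scenario` function, and the step names are hypothetical and only model the behavior described above, where a step can be skipped depending on the outcome of an earlier check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for a scenario step (not a Dataiku class)."""
    name: str
    action: Callable[[], bool]  # returns True on success, False on failure
    # Optional condition evaluated against the outcomes of earlier steps.
    run_if: Optional[Callable[[Dict[str, str]], bool]] = None


def run_scenario(steps: List[Step]) -> Dict[str, str]:
    """Run steps in order, skipping any step whose condition is not met."""
    outcomes: Dict[str, str] = {}
    for step in steps:
        if step.run_if is not None and not step.run_if(outcomes):
            outcomes[step.name] = "skipped"
            continue
        outcomes[step.name] = "success" if step.action() else "failed"
    return outcomes


# Illustrative scenario: the model is trained only if the checks passed.
steps = [
    Step("build_dataset", lambda: True),
    Step("run_checks", lambda: False),  # simulate a failing check
    Step("train_model", lambda: True,
         run_if=lambda o: o.get("run_checks") == "success"),
    Step("send_message", lambda: True),
]
outcomes = run_scenario(steps)
```

In this sketch the failing check causes the training step to be skipped while the message step still runs, which is one way to keep a bad dataset from reaching a model.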
Scenario steps documentation
Triggers
How Do You Want to Trigger Your Automation?
A time-based trigger launches the scenario at regular intervals. Example: Repeat every 30 minutes.
A dataset change trigger starts a scenario whenever a change is detected in a dataset.
A SQL query change trigger runs a query at a specified interval and starts the scenario when the output of the query changes. A custom Python trigger executes a custom Python script that activates the trigger.
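The query-based trigger can be pictured with a short sketch. This is plain Python with an in-memory SQLite database, not Dataiku's trigger API: the `QueryChangeTrigger` class, the `orders` table, and the query are assumptions made for illustration. The idea is to fingerprint the query output on each poll and fire only when the fingerprint differs from the previous one. In a real setup, something would call `poll()` at the configured interval.

```python
import hashlib
import sqlite3


def fingerprint(rows):
    """Hash the query output so two polls can be compared cheaply."""
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


class QueryChangeTrigger:
    """Hypothetical trigger that fires when a query's output changes."""

    def __init__(self, conn, query):
        self.conn = conn
        self.query = query
        self.last = None  # fingerprint seen on the previous poll

    def poll(self):
        rows = self.conn.execute(self.query).fetchall()
        current = fingerprint(rows)
        # The first poll only records a baseline; it never fires.
        fired = self.last is not None and current != self.last
        self.last = current
        return fired


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
trigger = QueryChangeTrigger(conn, "SELECT COUNT(*) FROM orders")

first = trigger.poll()    # baseline, no change yet
conn.execute("INSERT INTO orders VALUES (1)")
second = trigger.poll()   # output changed since the baseline
third = trigger.poll()    # nothing new since the last poll
```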
Triggers documentation
Reporters
Report On Your Scenario Run
Send an email, a Slack message, or another notification based on the outcome of your scenario run, so you close the loop and stay on top of potential issues.
Reporters documentation
Automation Challenges
Be Smart With Automation
Your project doesn’t stop with your flow or with hitting play on your scenario; you need to think about what comes after. As you automate workflows, you’re exposed to risks: ingesting poor-quality data, dataset schemas changing when extra columns are added, or models drifting and becoming obsolete.
Automated runs can also be costly for your organization. This is why you may not have access rights to set up scenarios and will have to work with an admin, data engineer, or data ops person at your organization.
Metrics and Checks
Keep Everything Under Control
Metrics are measurements computed on objects such as datasets, folders, or models. Use them to monitor how an object evolves, for example:
- The number of missing values of a column,
- The size of a folder,
- The accuracy of a model.
You can then set up checks based on these metrics, such as whether data is missing, whether the size of a model stays reasonable, or whether model accuracy stays above a given threshold.
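As a rough illustration of how a metric feeds a check, here is a minimal plain-Python sketch. It is not Dataiku's metrics and checks API: the `missing_count` and `check_max` helpers, the sample rows, and the `"OK"` / `"ERROR"` status strings are hypothetical. The metric counts missing values in a column, and the check compares that number to an allowed maximum.

```python
def missing_count(rows, column):
    """Metric: number of rows where the column is absent, None, or empty."""
    return sum(1 for row in rows if row.get(column) in (None, ""))


def check_max(value, maximum):
    """Check: pass when the metric value does not exceed the maximum."""
    return "OK" if value <= maximum else "ERROR"


# Illustrative data: two of the four rows have no usable "age" value.
rows = [{"age": 34}, {"age": None}, {"age": 51}, {}]

metric = missing_count(rows, "age")
status = check_max(metric, maximum=1)
```

A scenario could then use a status like this one to decide whether later steps should run.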
Learn More About Metrics & Checks
Flow Views
Check Your Flow for Scenarios
Use Flow Views to filter your Flow and keep an eye on every part of your project that is included in a scenario.
Read more on all the different Flow Views
Dashboards
Build Visualizations to Share Insights
From your navigation, access your dashboards to share metrics, charts, datasets, team discussions, or even interactive web applications. With dashboard refresh, anybody you share your dashboard with can access the freshest data.
Learn all about building Dashboards
We’re happy you’re here
Join a Community of Dataiku Users
Have a question? Want to explore Dataiku tips? Share an idea you have to improve Dataiku? Explore great stories from our users?
Join the Dataiku Community
Keep Exploring Dataiku
© 2013 – 2024 Dataiku. All rights reserved.