Apache Hamilton (incubating) is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is **portable**; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is **expressive**; Apache Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution). To create a DAG, write regular Python functions that specify their dependencies with their parameters. As shown below, it results in readable code that can always be visualized. Apache Hamilton loads that definition and automatically builds the DAG for you!
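For example, a DAG where `B()` and `C()` depend on `A()` can be written as three plain functions that name their dependencies as parameters (a minimal sketch; the module name `my_dag` and the constant values are illustrative):

```python
# my_dag.py
def A() -> int:
    """The root node: no parameters, so no dependencies."""
    return 35


def B(A: int) -> int:
    """Depends on A via its parameter name."""
    return A + 1


def C(A: int, B: int) -> int:
    """Depends on both A and B."""
    return A * B
```

```python
# run.py
from hamilton import driver

import my_dag

dr = driver.Builder().with_modules(my_dag).build()
print(dr.execute(["C"]))  # Apache Hamilton resolves A and B automatically before computing C
# dr.display_all_functions("my_dag.png")  # optional: render the DAG (requires the visualization extra + Graphviz)
```

Executing `C` returns a dictionary like `{"C": 1260}`: Apache Hamilton computes `A` and `B` first because `C` declares them as parameters.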
 
*Functions: B() and C() refer to function A via their parameters*
Apache Hamilton brings modularity and structure to any Python application moving data: ETL pipelines, ML workflows, LLM applications, RAG systems, and BI dashboards. The [Apache Hamilton UI](https://hamilton.apache.org/concepts/ui) allows you to automatically visualize, catalog, and monitor execution.

> Apache Hamilton is great for DAGs, but if you need loops or conditional logic to create an LLM agent or a simulation, take a look at our sister library [Burr](https://github.com/apache/burr).

# Installation

Apache Hamilton supports Python 3.8+. We include the optional `visualization` dependency to display our Apache Hamilton DAG. For visualizations, [Graphviz](https://graphviz.org/download/) needs to be installed on your system separately.

```bash
pip install "sf-hamilton[visualization]"
```

To use the Apache Hamilton UI, install the `ui` and `sdk` dependencies.

```bash
pip install "sf-hamilton[ui,sdk]"
```

To try Apache Hamilton in the browser, visit [www.tryhamilton.dev](https://www.tryhamilton.dev/?utm_source=README).

# Why use Apache Hamilton?

Data teams write code to deliver business value, but few have the resources to standardize practices and provide quality assurance. Moving from proof-of-concept to production and cross-function collaboration (e.g., data science, engineering, ops) remain challenging for teams, big or small. Apache Hamilton is designed to help throughout a project's lifecycle:

- **Separation of concerns**. Apache Hamilton separates the DAG "definition" and "execution", which lets data scientists focus on solving problems and engineers manage production pipelines.
- **Effective collaboration**. The [Apache Hamilton UI provides a shared interface](https://hamilton.apache.org/hamilton-ui/ui/) for teams to inspect results and debug failures throughout the development cycle.
- **Low-friction dev to prod**. Use `@config.when()` to modify your DAG between execution environments instead of error-prone `if/else` feature flags (see the sketch after this list). The notebook extension prevents the pain of migrating code from a notebook to a Python module.
- **Portable transformations**. Your DAG is [independent of infrastructure or orchestration](https://blog.dagworks.io/publish/posts/detail/145543927?referrer=%2Fpublish%2Fposts), meaning you can develop and debug locally and reuse code across contexts (local, Airflow, FastAPI, etc.).
- **Maintainable DAG definition**. Apache Hamilton [automatically builds the DAG from a single line of code whether it has 10 or 1000 nodes](https://hamilton.apache.org/concepts/driver/). It can also assemble multiple Python modules into a pipeline, encouraging modularity.
- **Expressive DAGs**. [Function modifiers](https://hamilton.apache.org/concepts/function-modifiers/) are a unique feature to keep your code [DRY](https://en.wikipedia.org/wiki/Don't_repeat_yourself) and reduce the complexity of maintaining large DAGs. Other frameworks inevitably lead to code redundancy or bloated functions.
- **Built-in coding style**. The Apache Hamilton DAG is [defined using Python functions](https://hamilton.apache.org/concepts/node/), encouraging modular, easy-to-read, self-documenting, and unit testable code.
- **Data and schema validation**. Decorate functions with `@check_output` to validate output properties, and raise warnings or exceptions. Add the `SchemaValidator()` adapter to automatically inspect dataframe-like objects (pandas, polars, Ibis, etc.) to track and validate their schema.
- **Built for plugins**. Apache Hamilton is designed to play nice with all tools and provides the right abstractions to create custom integrations with your stack. Our lively community will help you build what you need!
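The `@config.when()` sketch referenced above might look like this; the config key `env`, the function names, and the pandas logic are illustrative assumptions, not prescribed by Apache Hamilton:

```python
# transforms.py (illustrative): two implementations of the same node
import pandas as pd

from hamilton.function_modifiers import config


@config.when(env="dev")
def clean_data__dev(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Dev variant: iterate quickly on a small sample."""
    return raw_data.head(100).drop_duplicates()


@config.when(env="prod")
def clean_data__prod(raw_data: pd.DataFrame) -> pd.DataFrame:
    """Prod variant: process the full dataset."""
    return raw_data.drop_duplicates()
```

Both functions resolve to a single `clean_data` node (the `__dev` / `__prod` suffix is stripped), and the active variant is selected by the config passed to the `Driver`, e.g., `driver.Builder().with_config({"env": "prod"})`, so the transformation code stays free of `if/else` branches.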
# Apache Hamilton UI

You can track the execution of your Apache Hamilton DAG in the [Apache Hamilton UI](https://hamilton.apache.org/hamilton-ui/ui/). It automatically populates a data catalog with lineage / tracing and provides execution observability to inspect results and debug errors. You can run it as a [local server](https://hamilton.apache.org/hamilton-ui/ui/#local-mode) or a [self-hosted application using Docker](https://hamilton.apache.org/hamilton-ui/ui/#docker-deployed-mode).

*DAG catalog, automatic dataset profiling, and execution tracking*
## Get started with the Apache Hamilton UI

1. To use the Apache Hamilton UI, install the dependencies (see the `Installation` section) and start the server with

   ```bash
   hamilton ui
   ```

2. On the first connection, create a `username` and a new project (the `project_id` should be `1`).
3. Track your Apache Hamilton DAG by creating a `HamiltonTracker` object with your `username` and `project_id` and adding it to your `Builder`. Now your DAG will appear in the UI's catalog and all executions will be tracked!

   ```python
   from hamilton import driver
   from hamilton_sdk.adapters import HamiltonTracker

   import my_dag

   # use your `username` and `project_id`
   tracker = HamiltonTracker(
       username="my_username",
       project_id=1,
       dag_name="hello_world",
   )

   # adding the tracker to the `Builder` will add the DAG to the catalog
   dr = (
       driver.Builder()
       .with_modules(my_dag)
       .with_adapters(tracker)  # add your tracker here
       .build()
   )

   # executing the `Driver` will track results
   dr.execute(["C"])
   ```

# Documentation & learning resources

* See the [official documentation](https://hamilton.apache.org/) to learn about the core concepts of Apache Hamilton.
* Consult the [examples on GitHub](https://github.com/apache/hamilton/tree/main/examples) to learn about specific features or integrations with other frameworks.
* The [DAGWorks blog](https://blog.dagworks.io/) includes guides about how to build a data platform and narrative tutorials.
* Find video tutorials on the [DAGWorks YouTube channel](https://www.youtube.com/@DAGWorks-Inc).
* Reach out via the [Apache Hamilton Slack community](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g) for help and troubleshooting.

# How does Apache Hamilton compare to X?

Apache Hamilton is not an orchestrator ([you might not need one](https://blog.dagworks.io/p/lean-data-automation-a-principal)), nor a feature store ([but you can use it to build one!](https://blog.dagworks.io/p/featurization-integrating-hamilton)). Its purpose is to help you structure and manage data transformations. If you know dbt, Apache Hamilton does for Python what dbt does for SQL.

Another way to frame it is to think about the different layers of a data stack. Apache Hamilton is at the **asset layer**. It helps you organize data transformation code (the **expression layer**), manage changes, and validate & test data.
| Layer | Purpose | Example Tools | 
|---|---|---|
| Orchestration | Operational system for the creation of assets | Airflow, Metaflow, Prefect, Dagster | 
| Asset | Organize expressions into meaningful units (e.g., dataset, ML model, table) | Apache Hamilton, dbt, dlt, SQLMesh, Burr | 
| Expression | Language to write data transformations | pandas, SQL, polars, Ibis, LangChain | 
| Execution | Perform data transformations | Spark, Snowflake, DuckDB, RAPIDS | 
| Data | Physical representation of data, inputs and outputs | S3, Postgres, file system, Snowflake | 
 