Snowflake Data for Breakfast Boston 2026: Notes and Takeaways
Notes from Snowflake Data for Breakfast Boston 2026. A three-pillar AI framework, why the semantic layer matters, and a real-world unstructured data pipeline built on Cortex Search.
My conference attendance splits between two worlds: nonprofit performing arts events where we’re often one of the larger orgs in the room, and general tech events like this one where we are frequently dwarfed. The entirety of our data estate might be what some of these companies generate in minutes. But the calculation is the same regardless of scale: ROI versus total cost of ownership, build versus buy. Snowflake was the starting point when I built the data platform at the Boston Symphony Orchestra. It let a small org move fast. How we use it keeps evolving. That’s why I’m here.
Keynote: the three-pillar framework
Nick Pereira opened with a quote that stuck: “Our little sliver of existence aligns with one of the most disruptive technological advances the world may ever see.” His claim was that the drop-off from falling behind on AI will be permanent, unlike previous technology waves. I don’t know if my organization fully shares that urgency yet, but I do. The risk of inaction feels more significant to me than the risk of action, and I’m pushing for us to move accordingly.
The core of the keynote was a three-pillar framework for making AI work in the enterprise:
- Unified data foundation. Structured, semi-structured, and unstructured data in one governed environment. With Iceberg and Delta interoperability, that one environment doesn’t need to be entirely Snowflake native.
- Business logic and context. A semantic layer of rules, KPIs, and business definitions — the “brain” that turns generic AI into AI that actually understands the business. The keynote cited MIT’s “GenAI Divide” report, claiming 95% of AI pilots fail without it.
- AI embedded in workflows. Agents deployed where people already work — Slack, Teams, Streamlit, CLI. ROI comes from embedding, not from standalone tools.
The demo sequence was ambitious: connect to an AWS Glue catalog, create external Iceberg tables, build a dynamic pipeline, generate a semantic view, attach it to an agent, and serve it through Snowflake Intelligence. The claimed time was about four to five minutes end to end. Two things stuck with me:
- Iceberg performance parity. The previous 4x performance penalty for external tables has been eliminated. Governance — masking policies, tagging — works identically on Iceberg tables. Dynamic tables on Iceberg can refresh as frequently as every 60 seconds.
- Semantic layer accuracy. They showed a bar chart comparing AI query accuracy with and without a semantic layer. The difference was clear enough to make the point, and it’s the clearest evidence I’ve seen for why the semantic layer is where the real work is.
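The 60-second refresh figure corresponds to what Snowflake calls a dynamic table's target lag. Below is a minimal sketch of the kind of statement involved, built as a string in Python so it can be inspected before being sent through a Snowflake connection. The table, warehouse, and column names are hypothetical and were not shown in the demo, and whether a particular externally managed Iceberg table can serve as the source is something to confirm in the documentation.

```python
def dynamic_table_sql(name: str, source: str, warehouse: str, lag_seconds: int = 60) -> str:
    """Build a CREATE DYNAMIC TABLE statement that reads from `source`.

    All identifiers are hypothetical; `source` stands in for an Iceberg
    table that is already queryable from Snowflake.
    """
    if lag_seconds < 60:
        # The keynote cited 60 seconds as the most frequent refresh.
        raise ValueError("target lag below 60 seconds is not supported")
    return (
        f"CREATE OR REPLACE DYNAMIC TABLE {name}\n"
        f"  TARGET_LAG = '{lag_seconds} seconds'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n"
        f"SELECT patron_id, COUNT(*) AS ticket_count\n"
        f"FROM {source}\n"
        f"GROUP BY patron_id"
    )


print(dynamic_table_sql("patron_ticket_counts", "iceberg_db.sales.tickets", "transform_wh"))
```

The point of the pattern is that the pipeline is declared as a query plus a freshness target, rather than as a scheduled job.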
Pillar 3 will get the most people excited: the agents, the demos, the workflows. And the audience at a Data for Breakfast event is probably already convinced that Pillar 1 is important. But the message underneath was clear: Pillar 2 is where most organizations are weakest, mine included. We have a solid start on the data foundation and we’re beginning to explore more integration use cases, but the semantic layer is new territory for us. Without it, the agents are guessing at business logic. They’re good at guessing, but the time they spend figuring things out is time humans spend waiting, and it burns more compute.
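To make "guessing at business logic" concrete, here is a toy sketch of what a semantic layer does. It is not Snowflake's semantic view format. The table, metric names, and sample rows are invented for illustration, and SQLite stands in for the warehouse so the example runs anywhere.

```python
import sqlite3

# Hypothetical semantic layer: business terms mapped to vetted SQL expressions.
METRICS = {
    "total giving": "SUM(amount)",
    "donor count": "COUNT(DISTINCT donor_id)",
}
DIMENSIONS = {"fiscal year": "fiscal_year"}


def build_query(metric: str, dimension: str) -> str:
    """Resolve business terms to SQL; unknown terms fail loudly instead of being guessed."""
    try:
        expr, col = METRICS[metric], DIMENSIONS[dimension]
    except KeyError as missing:
        raise KeyError(f"term not defined in the semantic layer: {missing}") from None
    return f"SELECT {col}, {expr} AS value FROM gifts GROUP BY {col} ORDER BY {col}"


# Invented sample data, only to show the query running.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gifts (donor_id INTEGER, fiscal_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO gifts VALUES (?, ?, ?)",
    [(1, 2025, 100.0), (1, 2026, 50.0), (2, 2026, 25.0)],
)
rows = conn.execute(build_query("total giving", "fiscal year")).fetchall()
print(rows)
```

With the definitions written down once, an agent asked about "total giving" no longer has to infer which column and aggregation the business means.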
Customer story: Agelon Health
There was one customer session on the agenda, and it hit the altitude I look for at these events: a clear business case, a technical strategy that follows from it, and enough technical breadcrumbs to give me something to follow up on.
Agelon works with primary care groups taking financial risk for Medicare patients. Revenue is risk-adjusted by CMS based on disease profile, so undocumented conditions mean the provider isn’t compensated for the actual cost of care, and the patient may not be receiving the right therapy. Alex, their data and engineering lead, walked through an AI system built entirely in Snowflake to find those undocumented diagnoses at scale: claims, labs, electronic health records, and tens of millions of clinical notes feeding a multi-step prompt chain.
“Just SQL all the way down,” Alex said. No vector database. No external infrastructure. “Our DevOps guy wants to get involved in this project, and I just keep telling him it’s not as sexy as you think.” They’d started in the AWS world with RAG servers and vector databases, but there were too many moving pieces. The data was already in Snowflake. They just needed to write prompts.
The numbers were striking. They piloted with 17,000 patients focused on metastatic cancer and recovered roughly $1 million in revenue. They’re now processing 120,000 patients per week, scaling toward all 500,000-600,000 members. Cost per patient chart dropped from ~$10 to under $1, largely from filtering irrelevant documents before processing and matching the right model to each step.
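A back-of-the-envelope sketch shows how the two levers could compound. Only the roughly $10 starting cost and the one-tenth price ratio come from the talk. The share of documents kept after filtering and the share of spend in the summarization step are my own assumptions, chosen purely for illustration.

```python
BASELINE_COST = 10.00     # approximate starting cost per patient chart (from the talk)
CHEAP_MODEL_RATIO = 0.10  # cheaper model at one-tenth the cost (from the talk)

# Assumptions for illustration only, not figures from the talk:
KEEP_FRACTION = 0.30      # share of documents that survive the relevance filter
SUMMARY_SHARE = 0.80      # share of spend going to the largest summarization step


def cost_per_chart(keep_fraction: float = 1.0, summary_ratio: float = 1.0) -> float:
    """Cost after filtering documents and repricing the summarization step."""
    blended = SUMMARY_SHARE * summary_ratio + (1 - SUMMARY_SHARE)
    return BASELINE_COST * keep_fraction * blended


filter_only = cost_per_chart(keep_fraction=KEEP_FRACTION)
routing_only = cost_per_chart(summary_ratio=CHEAP_MODEL_RATIO)
both = cost_per_chart(KEEP_FRACTION, CHEAP_MODEL_RATIO)
print(f"filter only: ${filter_only:.2f}, routing only: ${routing_only:.2f}, both: ${both:.2f}")
```

Under these assumed shares, neither lever gets below $1 on its own, but the two together do, which fits the way the savings were described.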
On model selection, Alex was blunt: “The models are moving too fast for fine-tuning.” They abandoned it after initial experiments because frontier models kept outpacing their tuned versions. Claude for coherence and physician-facing output quality. Sonnet 4 for some steps. Llama 4 Maverick for the largest summarization step at one-tenth the cost. Three different models in one pipeline, chosen by fit rather than loyalty. I expect that kind of pragmatism to become more common this year, and the fine-tuning observation stuck with me.
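Here is a minimal sketch of what "chosen by fit" could look like in code: a routing table that maps each step of a prompt chain to a model. The step names and model identifier strings are placeholders, not Agelon's actual configuration, and the completion call is injected as a plain function so the chain runs here with a stub.

```python
from typing import Callable

# Step-to-model routing. Step names and model identifiers are placeholders.
STEP_MODELS = {
    "summarize_notes": "llama4-maverick",    # largest step, cheapest model
    "extract_evidence": "claude-4-sonnet",
    "draft_for_physician": "claude-4-sonnet",
}


def run_chain(notes: str, complete: Callable[[str, str], str]) -> dict[str, str]:
    """Run each step in order, feeding the previous output into the next prompt.

    `complete(model, prompt)` is injected so the chain can be tested with a
    stub and later backed by a real completion call.
    """
    outputs: dict[str, str] = {}
    text = notes
    for step, model in STEP_MODELS.items():
        prompt = f"[{step}] {text}"
        text = complete(model, prompt)
        outputs[step] = text
    return outputs


def fake_complete(model: str, prompt: str) -> str:
    """Stub that records which model handled the prompt."""
    return f"{model} <- {prompt}"


result = run_chain("clinical note text", fake_complete)
for step, output in result.items():
    print(step, "=>", output)
```

Keeping the routing in one table means swapping a model is a one-line change, which suits a situation where the models move faster than any tuning effort. In a Snowflake setting the injected function might wrap a Cortex completion call, though the talk did not show the actual calls.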
Breakout highlights
Two breakout sessions covered a lot of ground. What resonated:
Semantic view autopilot auto-generates semantic views from existing table metadata and can import from Tableau, Looker, and Power BI. It analyzes incoming queries and suggests improvements. Average setup time drops from roughly six days to two hours. For us, anything that lowers the barrier to building a semantic layer matters — that’s the gap the keynote identified, and this is tooling aimed directly at closing it.
Iceberg and the Horizon Catalog reinforce the open-format direction. Full interoperability with Iceberg and Delta tables, auto-discovery of data assets. This aligns with what we’ve been building on our end.
Governance flowing through to agents was the thread I kept coming back to. In the enterprise agents session, Al Uber demoed Snowflake Intelligence — each agent scoped to specific data and roles, RBAC and masking policies carrying through to responses. The hallucination prevention demo was telling: the agent queried for a product that didn’t exist, got no results, and said so instead of fabricating. The governance story is what makes this enterprise-grade, not the natural language interface.
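The behavior in that demo can be approximated with a simple guard: answer only from query results, and say so when there are none. This is a sketch of the pattern rather than how Snowflake Intelligence implements it. The table and product names are invented, and SQLite stands in for the warehouse.

```python
import sqlite3


def answer_product_question(conn: sqlite3.Connection, product_name: str) -> str:
    """Answer only from query results; report an empty result instead of guessing."""
    rows = conn.execute(
        "SELECT name, units_sold FROM products WHERE name = ?", (product_name,)
    ).fetchall()
    if not rows:
        return f"No data found for product '{product_name}'."
    name, units = rows[0]
    return f"{name} sold {units} units."


# Invented sample data, only to exercise both branches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, units_sold INTEGER)")
conn.execute("INSERT INTO products VALUES ('Season Pass', 1200)")
print(answer_product_question(conn, "Season Pass"))
print(answer_product_question(conn, "Hoverboard"))
```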
Cortex Code for data governance was the most practical demo of the day: point it at a table, it identifies PII columns, tags them, and generates masking policies. Changes immediately reflect in Intelligence agents. For a small team without dedicated security engineering, that kind of tooling closes a real gap. Side note: Cortex Code is available as a CLI, not just in the Snowflake UI — I need to try pairing it with Claude Code to see how they complement each other on data work.
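To give a sense of what that workflow produces, here is a rough sketch that flags likely PII columns by name and builds the corresponding masking statements. It is not Cortex Code's logic. The name-based heuristic, the policy naming, the role name, and the assumption that every flagged column is a string are all mine, and tagging is left out.

```python
import re

# Name-based heuristic only; real classification would also inspect values.
PII_PATTERN = re.compile(r"(email|phone|ssn|birth|address)", re.IGNORECASE)


def find_pii_columns(columns: list[str]) -> list[str]:
    """Return column names that look like PII based on their names."""
    return [c for c in columns if PII_PATTERN.search(c)]


def masking_sql(table: str, column: str, allowed_role: str) -> list[str]:
    """Build statements that create a string masking policy and attach it to a column."""
    policy = f"mask_{column.lower()}"
    return [
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} AS (val STRING) RETURNS STRING ->\n"
        f"  CASE WHEN CURRENT_ROLE() = '{allowed_role}' THEN val ELSE '***MASKED***' END",
        f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy}",
    ]


# Hypothetical table and columns.
columns = ["patron_id", "EMAIL", "home_address", "gift_amount"]
for col in find_pii_columns(columns):
    for statement in masking_sql("crm.patrons", col, "DATA_STEWARD"):
        print(statement + ";")
```

Generated statements like these would still need a human review before being applied, since a name-based match can both miss sensitive columns and flag harmless ones.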
What I’m taking back
I’m writing this up almost two weeks after the event. Some of these takeaways are already in motion.
Open formats, confirmed. I’d already moved our pipeline to Apache Iceberg before this event. Seeing Iceberg interoperability as a leading narrative at a Snowflake event, and hearing that the performance penalty for external tables is gone, reinforces the direction. One less thing to second-guess.
Cortex as the path to conversational BI. Snowflake was leaning hard into Cortex Analyst, Cortex Search, and Cortex Agent throughout the day. For us, these tools could be what connects the data platform to the people who need to ask questions of it — fundraising, marketing, education and community programs. Every minute someone spends hunting for a report or waiting for a dashboard to get built is a minute better spent increasing the impact of our work. If I can offer them a way to just ask the question and get the answer, that time goes back to mission. (I’ve since built a light proof of concept connecting our Snowflake data to Claude, which is what we’re using for our AI pilot, to test the pattern before asking the team to invest time in it.)
The semantic layer must come next. This was the through-line of the entire morning. The accuracy demo in the keynote, the autopilot tooling in the breakout, the Agelon architecture where the semantic structure of the prompt chain is what makes the output trustworthy. Better semantic modeling improves accuracy. I think it also reduces compute costs by giving the model less guesswork, though how much depends on scale. Our semantic layer work is early. This event made it clear why I should prioritize it next.
What follows
We can build proofs of concept. We can test these tools. The harder question is what scales, has meaningful impact, and doesn’t open up governance problems before the work can prove itself.
There’s a tension between being entrepreneurial with AI and having all the policies in place before people start using it. We need enough governance to feel like the risk is managed, but we don’t need it all figured out before we begin. The line we’re walking is giving people enough space to explore and learn, while keeping things controlled enough that we’re not creating problems we can’t walk back. I don’t think there’s a clean answer to that balance. It’s something we’re navigating week by week.
Agelon’s model selection discipline resonated. Their roughly 10x cost reduction came from filtering out irrelevant documents and matching the right model to each step, not from optimizing a single model. “The models are moving too fast for fine-tuning” is a sentence I expect to hear more often this year. We’ve already started applying that thinking in our own work, on a much smaller scale, matching models to steps and running bake-offs rather than over-optimizing any single choice.