Sonal’s Newsletter

Thoughts on all things data

Top posts of the year

And their main takeaways

Building Identity Resolution on Snowflake Using Snowpark

58 implied HN points • 19 Jun 23

🕹 Technology, Data processing, Machine Learning, Software Development, Open Source, Cloud Computing

Building ML pipelines in Snowpark requires third-party libraries such as scikit-learn.
Integrating specialized functionality such as graph processing into Snowpark may require additional support or custom solutions.
Porting a codebase from Apache Spark to Snowpark takes careful planning, and possibly restructuring, to stay efficient and avoid technical debt.

Performance Tuning Snowpark For Identity Resolution On Snowflake

19 implied HN points • 29 Jul 23

🕹 Technology, Data processing, Database Management

Performance tuning Snowpark on Snowflake can cut processing time substantially, in this case from half a day to half an hour.
Using Snowflake's query profiler to find bottlenecks and make targeted optimizations can have a large impact on performance.
Converting UDTFs to UDFs, caching DataFrames, and using batch size annotations can further speed up Snowpark workflows.
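
As a rough illustration of the last takeaway, here is a minimal Python sketch that combines a cached DataFrame with a vectorized UDF that has a batch size limit. It is not the code from the post. The table name `CANDIDATE_PAIRS`, the columns `NAME_A` and `NAME_B`, the batch size of 1000, and the helper functions are all hypothetical. The sketch assumes the `snowflake-snowpark-python` package and an open `session`.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Scalar scoring logic (hypothetical): 0.0 to 1.0 similarity of two names."""
    if a is None or b is None:
        return 0.0
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def name_similarity_batch(left, right):
    """Vectorized wrapper: receives a batch of values per call, not one row."""
    import pandas as pd  # imported lazily so the scalar helper has no dependencies

    return pd.Series([name_similarity(a, b) for a, b in zip(left, right)])


def score_pairs(session, pairs_table="CANDIDATE_PAIRS"):
    """Assumed workflow: cache the input once, then score it with a batched UDF."""
    from snowflake.snowpark.functions import col, pandas_udf
    from snowflake.snowpark.types import FloatType, PandasSeriesType, StringType

    # cache_result() materializes the DataFrame into a temporary table, so
    # later actions do not recompute the query that produced it.
    pairs = session.table(pairs_table).cache_result()

    # max_batch_size caps how many rows are passed to the UDF in each call.
    score = pandas_udf(
        name_similarity_batch,
        return_type=PandasSeriesType(FloatType()),
        input_types=[PandasSeriesType(StringType()), PandasSeriesType(StringType())],
        max_batch_size=1000,  # assumed value; tune against the query profile
        packages=["pandas"],
        session=session,
    )
    return pairs.with_column("SCORE", score(col("NAME_A"), col("NAME_B")))
```

The scalar helper can be tried locally without Snowflake, for example `name_similarity("Sonal ", "sonal")`. Whether a given batch size helps depends on the workload, so the query profile is the place to check.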