Pig and Hive Scripts in Azure Data Factory

I work a lot with HDInsight activities in Azure Data Factory (ADF), which involves calling out to Pig and Hive scripts and passing in parameters for them to use. It works very well, or at least it did until I started to use Visual Studio for my ADF publishing.

I built my ADF solution using the Visual Studio templates and then tried to publish the project to Azure. This is where it went wrong. Below is the error:

[Image: publish_Failure]

As you can see from my code, I am passing in the location of the .hql file inside my activity. If I were publishing through PowerShell or the Portal itself, this would be enough, but Visual Studio is different: your ADF project wants a reference to your .hql or .pig files. At publish time, Visual Studio publishes the script files in your project to the locations specified in your activity. I like it this way because I can maintain everything in one place and do not have to alter the scripts in some other tool.
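For reference, a Hive activity in an ADF (version 1) pipeline looks roughly like the sketch below. This is not the exact activity from my project: the activity name, linked service names, dataset names, script path and the parameters under "defines" are all hypothetical and only illustrate where the .hql location goes.

```json
{
  "name": "RunSampleHiveScript",
  "description": "Illustrative only; every name and path here is hypothetical",
  "type": "HDInsightHive",
  "linkedServiceName": "HDInsightLinkedService",
  "inputs": [ { "name": "RawLogsDataset" } ],
  "outputs": [ { "name": "ProcessedLogsDataset" } ],
  "typeProperties": {
    "scriptPath": "scripts/hive/sample.hql",
    "scriptLinkedService": "StorageLinkedService",
    "defines": {
      "inputFolder": "wasb://data@examplestorage.blob.core.windows.net/raw/",
      "outputFolder": "wasb://data@examplestorage.blob.core.windows.net/processed/"
    }
  }
}
```

Here "scriptPath" is the storage location (container and path) where the script is expected to live, "scriptLinkedService" is the storage account that holds it, and the entries under "defines" are the parameters handed to the script. A Pig activity has the same shape with "type" set to "HDInsightPig" and a .pig script path.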

There are actually two ways to reference these files:

  • At the project level: Add | Existing Item
  • In your project, add a reference to a Pig and/or Hive project

[Image: hivepigprojects]

This caught me out at first, but the more I think about it, the more I like it. Hopefully this will save somebody else 20 minutes of head scratching.