Using Visual Studio to Work With Hive Scripts

I was building some HDInsight Azure Data Factory pipelines the other day and the Hive scripts were not doing what I wanted them to do (although they were doing what was being asked). I needed to isolate the Hive scripts to see what they were doing at each stage. I didn’t want to run the whole ADF pipeline just to look at the Hive section. I was already in Visual Studio, so troubleshooting from there was easy. Here’s how I solved it.

  • Open up Server Explorer
  • Connect to your Azure Subscription
  • Navigate to the HDInsight node
  • Expand your clusters
  • Right-click the cluster of your choice and select “Write a Hive Query”
  • Paste or write your query

[Screenshot: Write a Hive Query]

Another quirk of my Hive script was that I was passing values from the activity into the script itself. I knew which values I was using, but I didn’t want to change the script to use hard-coded values instead of the references to the config. Here’s what I mean:

[Screenshot: Hive script with parameters passed in from the activity]
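If you want to see what the script will actually run with before going anywhere near the cluster, a rough Python sketch like the one below expands the parameter references by hand. It assumes the activity passes values through its `defines` property and the script reads them as `${hiveconf:Name}`; the script text, storage paths and the `preview_hive_script` helper are made up for illustration. It is plain text substitution, so treat it as a preview aid rather than a replacement for running the query.

```python
import re

# Matches ${hiveconf:Name}. Names are assumed to use letters, digits, '_' or '.'.
HIVECONF_REF = re.compile(r"\$\{hiveconf:([A-Za-z0-9_.]+)\}")


def preview_hive_script(script: str, defines: dict[str, str]) -> str:
    """Replace each ${hiveconf:Name} in script with defines[Name].

    References with no matching entry are left as-is so they stand out.
    """

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return defines.get(name, match.group(0))

    return HIVECONF_REF.sub(substitute, script)


if __name__ == "__main__":
    # Hypothetical script and values, standing in for an activity's "defines".
    script = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (line STRING)\n"
        "LOCATION '${hiveconf:Input}';\n"
        "INSERT OVERWRITE DIRECTORY '${hiveconf:Output}'\n"
        "SELECT line FROM raw_logs;"
    )
    defines = {
        "Input": "wasb://data@myaccount.blob.core.windows.net/raw/",
        "Output": "wasb://data@myaccount.blob.core.windows.net/out/",
    }
    print(preview_hive_script(script, defines))
```

Leaving unknown references untouched is deliberate: a `${hiveconf:...}` that survives the preview points straight at a missing or misspelled parameter.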

I wanted to pass in my values as parameters to the script in Visual Studio. No problem. Once you have finished writing your query, click Submit | Advanced.

[Screenshot: Submit | Advanced]

Now you can enter your configuration key/value pairs:

[Screenshot: adding configuration key/value pairs]
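The same key/value pairs can also be supplied outside Visual Studio. As a sketch, assuming you can SSH to a cluster head node and use Beeline there, the helper below builds a command line with one `--hiveconf key=value` argument per pair. The default JDBC URL is the one HDInsight's documentation uses from a head node; treat it, the `beeline_command` helper and the example values as assumptions to check against your own cluster.

```python
import shlex

# Assumed default: the Beeline URL HDInsight documents for use on a head node.
DEFAULT_JDBC_URL = "jdbc:hive2://headnodehost:10001/;transportMode=http"


def beeline_command(script_path: str, params: dict[str, str],
                    jdbc_url: str = DEFAULT_JDBC_URL) -> list[str]:
    """Build a Beeline argument list that runs script_path with the given hiveconf values."""
    args = ["beeline", "-u", jdbc_url]
    for key, value in params.items():
        # One --hiveconf argument per key/value pair.
        args += ["--hiveconf", f"{key}={value}"]
    args += ["-f", script_path]
    return args


if __name__ == "__main__":
    # Hypothetical script name and value.
    cmd = beeline_command(
        "transform.hql",
        {"Input": "wasb://data@myaccount.blob.core.windows.net/raw/"},
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps it usable with `subprocess.run` directly, with no shell quoting to get wrong.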

For me this was a really quick and useful way of troubleshooting my Hive scripts. I hope you find it useful.

Config Files in Azure Data Factory

As you may know, I have started using Visual Studio to publish my ADF solutions to the Azure cloud. I like it for its ease of use and its integration with TFS. I came across a piece of functionality I had not seen before that is currently only half-baked (it works fine, but the implementation needs work).
The feature is config files. Right-click your project and choose Add | New Item.

[Screenshot: adding a new config item]

[Screenshot: structure of a new config file]

The properties are easy to specify, and I imagine the most commonly specified property will be the connectionString of a linked service, like this:

[Screenshot: config file setting a linked service connectionString]
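For reference, here is roughly what such a config looks like written as a Python dict, plus a small shape check. The layout (each top-level key names an entity in the project and holds a list of `name`/`value` entries, where `name` is a JSON path into that entity) is my understanding of what the Visual Studio ADF tools generate; the entity name, account name and the `validate_config` helper are hypothetical, so compare against a config file created in your own project.

```python
# Hypothetical DevConfig.json contents written as a Python dict. Each top-level
# key names an entity in the project; each entry gives a JSON path inside that
# entity ("name") and the value to use for this environment ("value").
DEV_CONFIG = {
    "AzureStorageLinkedService": [
        {
            "name": "$.properties.typeProperties.connectionString",
            "value": "DefaultEndpointsProtocol=https;AccountName=devaccount;AccountKey=<key>",
        }
    ]
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems with a config dict; an empty list means it looks well-formed."""
    problems = []
    for entity, entries in config.items():
        if entity.startswith("$"):
            continue  # metadata such as "$schema", not an entity
        if not isinstance(entries, list):
            problems.append(f"{entity}: expected a list of name/value entries")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict) or set(entry) != {"name", "value"}:
                problems.append(f"{entity}[{i}]: expected exactly 'name' and 'value'")
            elif not str(entry["name"]).startswith("$."):
                problems.append(f"{entity}[{i}]: 'name' should be a JSON path starting with '$.'")
    return problems


if __name__ == "__main__":
    print(validate_config(DEV_CONFIG))
```

To check a real file, load it with `json.load` and pass the result to `validate_config`.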

OK, so that part is easy and makes sense. The next part is where the dots are not joined up yet. As you can see from my project below, I have two config files: one for Dev and one for UAT.

[Screenshot: Dev and UAT config files in the project]

When I build the project, it creates a subfolder for each configuration inside the Release or Debug output folder, with that configuration’s parameters applied to my JSON objects. I can then take the files from the relevant folder and deploy them to the correct ADF using PowerShell or the Portal.

[Screenshot: per-configuration output folders]
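Conceptually the build does something like the sketch below: copy each entity's JSON, overwrite the values named in a config, and write the results to one folder per config. This illustrates the idea and is not how Visual Studio implements it. It only handles plain dotted paths such as `$.properties.typeProperties.connectionString` (no array indexes or filters), and the folder layout, entity names and function names are assumptions.

```python
import copy
import json
from pathlib import Path


def set_by_path(doc: dict, path: str, value) -> None:
    """Set a value at a dotted path such as '$.properties.typeProperties.connectionString'.

    Only plain dotted paths are handled; array indexes and filters are not.
    """
    keys = path.removeprefix("$.").split(".")
    target = doc
    for key in keys[:-1]:
        target = target.setdefault(key, {})  # create missing objects on the way down
    target[keys[-1]] = value


def apply_config(entities: dict[str, dict], config: dict) -> dict[str, dict]:
    """Return deep copies of the entity documents with the config's values applied."""
    result = copy.deepcopy(entities)
    for name, entries in config.items():
        if name.startswith("$") or name not in result:
            continue  # skip "$schema"-style keys and entities not in this set
        for entry in entries:
            set_by_path(result[name], entry["name"], entry["value"])
    return result


def write_outputs(entities: dict[str, dict], configs: dict[str, dict], out_dir: Path) -> None:
    """Write one subfolder per config, e.g. out_dir/DevConfig/<entity>.json."""
    for config_name, config in configs.items():
        folder = out_dir / config_name
        folder.mkdir(parents=True, exist_ok=True)
        for name, doc in apply_config(entities, config).items():
            (folder / f"{name}.json").write_text(json.dumps(doc, indent=4), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical linked service and two environment configs.
    entities = {
        "AzureStorageLinkedService": {
            "name": "AzureStorageLinkedService",
            "properties": {"type": "AzureStorage",
                           "typeProperties": {"connectionString": "<placeholder>"}},
        }
    }
    path = "$.properties.typeProperties.connectionString"
    configs = {
        "DevConfig": {"AzureStorageLinkedService": [{"name": path, "value": "<dev connection string>"}]},
        "UATConfig": {"AzureStorageLinkedService": [{"name": path, "value": "<uat connection string>"}]},
    }
    write_outputs(entities, configs, Path("config_output"))
```

Deep-copying first means each environment starts from the untouched source JSON, so one config's values can never leak into another's output.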

This is very long-winded and not really a good experience. Another way to do it is:

  • Right-click your project | Unload Project
  • Right-click the project | Edit xxxx.dfproj
  • Find the ADFConfigFileToPublish element
  • Insert your config name (e.g. DevConfig.json)
  • Save, then reload the project
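If you switch configs often, the manual edit can be scripted. The sketch below sets the text of every `ADFConfigFileToPublish` element in a .dfproj file using Python's standard `xml.etree.ElementTree`. It assumes the project file uses the usual MSBuild namespace and already contains that element; the function and file names are hypothetical. ElementTree re-serializes the file, so whitespace and attribute quoting may change: keep the project under source control and run it while the project is unloaded.

```python
import sys
import xml.etree.ElementTree as ET

# Assumed: .dfproj files use the standard MSBuild 2003 namespace.
MSBUILD_NS = "http://schemas.microsoft.com/developer/msbuild/2003"


def update_config_element(xml_text: str, config_name: str) -> tuple[str, bool]:
    """Set every ADFConfigFileToPublish element to config_name.

    Returns the re-serialized XML and whether any element was found.
    """
    ET.register_namespace("", MSBUILD_NS)  # write the namespace back as the default, not ns0:
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))  # keep comments (3.8+)
    root = ET.fromstring(xml_text, parser=parser)
    found = False
    for element in root.iter():
        # Comments have a non-string tag; match real elements by local name.
        if isinstance(element.tag, str) and element.tag.split("}")[-1] == "ADFConfigFileToPublish":
            element.text = config_name
            found = True
    return ET.tostring(root, encoding="unicode"), found


def set_config_to_publish(dfproj_path: str, config_name: str) -> bool:
    """Update a .dfproj file in place; returns False (and writes nothing) if no element was found."""
    with open(dfproj_path, encoding="utf-8-sig") as f:  # utf-8-sig strips a BOM if present
        new_text, found = update_config_element(f.read(), config_name)
    if found:
        with open(dfproj_path, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="utf-8"?>\n' + new_text)
    return found


if __name__ == "__main__":
    # Usage: python set_adf_config.py MyDataFactory.dfproj DevConfig.json
    if len(sys.argv) == 3:
        print(set_config_to_publish(sys.argv[1], sys.argv[2]))
```

Splitting the string transformation from the file handling keeps the part that matters testable without touching a real project file.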

In the future I fully expect a better “deployment with config” experience.

Pig and Hive Scripts in Azure Data Factory

I work a lot with HDInsight activities in Azure Data Factory, which involves calling out to Pig and Hive scripts and passing in parameters for them to use. It works very well, or at least it did until I started using Visual Studio for my ADF publishing.

I built my ADF solution using the Visual Studio templates and then tried to publish my project to Azure. This is where it went wrong. Below is the error:

[Screenshot: publish failure error]

As you can see from my code, I pass the location of the .hql file inside my activity. If I were publishing through PowerShell or the Portal, that would be enough, but Visual Studio is different: your ADF project wants a reference to your .hql or .pig files. At publish time, Visual Studio publishes the files in your project to the locations specified in your activity. I like it this way because I can maintain everything in one place and don’t have to edit the scripts in some other tool.

There are actually two ways to reference these files:

  • At the project level: Add | Existing Item
  • In your project, add a reference to a Pig and/or Hive project

[Screenshot: Hive and Pig project references]
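To catch this before publishing, a quick check like the one below lists any Hive or Pig activity whose `scriptPath` points at a file the project does not include. It assumes the version 1 pipeline JSON layout (`properties.activities[*].typeProperties.scriptPath` on `HDInsightHive` and `HDInsightPig` activities) and compares file names only; the helper, pipeline and file names are hypothetical.

```python
from pathlib import PurePosixPath

# Activity types whose typeProperties.scriptPath points at a script file.
SCRIPT_ACTIVITY_TYPES = {"HDInsightHive", "HDInsightPig"}


def missing_script_references(pipeline: dict, project_files: set[str]) -> list[str]:
    """Return the scriptPath of every Hive/Pig activity whose file name is not in project_files."""
    missing = []
    for activity in pipeline.get("properties", {}).get("activities", []):
        if activity.get("type") not in SCRIPT_ACTIVITY_TYPES:
            continue
        script_path = activity.get("typeProperties", {}).get("scriptPath")
        # Compare by file name only, e.g. "adfscripts/transform.hql" -> "transform.hql".
        if script_path and PurePosixPath(script_path).name not in project_files:
            missing.append(script_path)
    return missing


if __name__ == "__main__":
    # Hypothetical pipeline: one Hive activity and one Pig activity.
    pipeline = {
        "name": "TransformPipeline",
        "properties": {
            "activities": [
                {"name": "RunHive", "type": "HDInsightHive",
                 "typeProperties": {"scriptPath": "adfscripts/transform.hql",
                                    "scriptLinkedService": "AzureStorageLinkedService"}},
                {"name": "RunPig", "type": "HDInsightPig",
                 "typeProperties": {"scriptPath": "adfscripts/clean.pig",
                                    "scriptLinkedService": "AzureStorageLinkedService"}},
            ]
        },
    }
    print(missing_script_references(pipeline, {"transform.hql"}))  # prints ['adfscripts/clean.pig']
```

Matching on file name alone is a simplification; if two scripts share a name in different folders, compare fuller paths instead.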

This caught me out at first, but the more I think about it, the more I like it. Hopefully this will save somebody else 20 minutes of head-scratching.