
Big Data Part 1:

Let's Get Started...

 

In this presentation I cover the procedures to...

    ...set up an HDInsight environment

    ...deploy and configure related tools and utilities

    ...import data

    ...process data using Hive, Pig, and MapReduce

    ...connect via Excel

    ...execute all of the above using Azure PowerShell 

 

That is a huge amount of material, so I was not able to go into much detail in the presentation. To fill the gap, I am posting a series of short videos that show each topic and task in far greater detail. The videos are intended as learning and reference materials. Each covers a single topic, and they are named and ordered to make the end-to-end process as intuitive and simple as possible.

 

The PowerShell files can be edited in the utility of your choice. It is obviously better if the environment supports the Azure cmdlets, so I chose to download and install the Microsoft Azure PowerShell ISE. I just checked, and the standard Microsoft PowerShell ISE also appears to support the Azure cmdlets, but I cannot say whether that is because installing the Azure version also affected the standard version. In any case, I chose to use the Azure PowerShell ISE and Visual Studio: the former for executing single-line queries and running configuration scripts, and the latter for full-function batch files, with the added benefit of IntelliSense and integrated source control.

 

The IHBD_HDInsight.zip file contains the full Visual Studio solution with all of the .ps1 files discussed in the presentation. These are intended to be used as-is or as templates for further development. My intent is to reduce the brain damage and get everyone off to a solid start without having to know much about PowerShell or, for that matter, Azure and HDInsight.

 

So...let's get started!

 

As demonstrated in the slide deck, I have created a Visual Studio project that contains all of the PowerShell scripts. I have zipped the VS solution folder along with two dependent folders, one for configuration files and one for data files. After downloading, unzip the file and place all three folders on the C: drive of your computer. If you choose to put them at a different location, you will have to edit the scripts and possibly the Visual Studio .sln file to match the new locations.
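If you do move the folders, it helps to know which scripts contain hard-coded drive paths before you start editing. The sketch below is one way to list them. It is written in Python purely for illustration (nothing in the download depends on Python), and the function names and the path pattern are assumptions, not part of the original scripts. The pattern stops at the first space or quote, so a path containing spaces will be reported only up to the space.

```python
import re
from pathlib import Path

# Assumed pattern: a drive letter, a colon, a backslash, then everything up to
# the next whitespace or quote character (e.g. C:\Data\input.csv).
DRIVE_PATH = re.compile(r"\b[A-Za-z]:\\[^\s'\"]*")


def read_script(path):
    """Read a script as text, handling UTF-16 files that start with a BOM."""
    raw = path.read_bytes()
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16", errors="replace")
    return raw.decode("utf-8-sig", errors="replace")


def find_hardcoded_paths(root):
    """Return (file name, line number, path) for each drive path found in
    the .ps1 files under root, searching subfolders as well."""
    hits = []
    # Note: on case-sensitive file systems this will not match ".PS1".
    for script in sorted(Path(root).rglob("*.ps1")):
        lines = read_script(script).splitlines()
        for number, line in enumerate(lines, start=1):
            for match in DRIVE_PATH.finditer(line):
                hits.append((script.name, number, match.group(0)))
    return hits
```

Calling `find_hardcoded_paths` on the folder that holds the scripts returns a list of tuples, one per hard-coded path, which you can print or review before changing anything. The function only reports; it does not modify any file.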

 

NOTE: If you are using Visual Studio 2012, you will need to prepare your development environment before you can open the VS solution without errors. This includes configuring Visual Studio for PowerShell support, installing the Azure PowerShell SDK, and installing one or two Azure storage explorer utilities (Microsoft provides none). Please see the slide deck for instructions. If you choose not to use Visual Studio, you can grab the .ps1 files from the HDInsight project folder and edit them in the utility of your choice.

 

If you are using a prior version of Visual Studio or choose not to use VS at all, there is a separate zip file with just the PowerShell scripts, which you can add to a project or open in the PowerShell editor of your choice.

 

The zip files can be downloaded from this link:

 

If you have any issues with any of the folders, files, etc., please shoot me an email at admin@iheartbigdata.com. I have not tested everything in all environments, so I would appreciate your input in debugging deployment and execution issues. I hope to make the entire process painless and more or less bombproof. If this is not your experience, again, let me know!

What is Big Data, where do I start, and what's in it for me?

 

 

How to Prepare Your Local Environment!

 

 

PowerShell Script Library

 

 
