Big Data: How To... and Now What?
In this session we will discuss what Big Data is, the options for where to deploy a Hadoop cluster, how to acquire and manipulate data, and basic analytics in the Hadoop environment, along with moving data into and out of a SQL Server environment.
Part 1 - Let's Get Started
In this session I will show you how to set up a Big Data environment, how to load and transform data, and how to access the data using Hadoop and Microsoft utilities. I will leave you with all of the pieces, parts, instructions, and additional experiments you need to set up your own environment and get some real stick time. This is part one of a three-part series that takes you from setting up your environment, to basic analytics, to applying analytics to business decision making.
Part 2 - Beyond the Basics
In this session I will build on the exercises in Part 1 of this series. I will cover more advanced data acquisition and transformation, building 'tidy' data sets, plotting and visualization, and basic data mining using a variety of tools and utilities. This is part two of a three-part series that takes you from setting up your environment, to basic analytics, to applying analytics to business decision making.
Part 3 - Applying Big Data Analytics to Business Decisions
In this session I will discuss scenarios for applying analytics to business decision making, focusing on the questions to ask of your data, the problems it can solve, and the value it can bring to your enterprise. This is part three of a three-part series that has taken you from setting up your environment, to basic analytics, to applying analytics to business decision making.
All of my past presentations are listed below. I am reworking these pages to present my content as short videos, which I believe are easier to use and easier to search.
"Go to" links are provided for the topics that are done or in-progress.
From Query to Disk:
The out-of-the-box SQL Server installation and configuration is generally unsuitable for an enterprise of any size and will lead to very poor query performance. In this hour we will discuss the basics of SQL Server deployment and configuration, the tools at your disposal to optimize query execution, and some tricks to make your queries perform at their best. Throughout, we take a 'magic wand' approach: start with the optimal configuration, then adjust as necessary to suit the circumstances. Expect to learn a little bit of everything, from SQL Server deployment and configuration to indexing and statistics, stored procedures and functions, query design, and database architecture.
Intro to Big Data
This is a primer for anyone unclear on what Big Data is or on the basics of implementing a Big Data solution, with a focus on the Microsoft HDInsight offering.
Big Data Series: Azure
Managing your queries, data distribution, and environment for best performance
Data Mining with Excel
Learn how to use the data mining features available in Excel, with particular emphasis on the "data science workflow" and how the Excel components fit into it. This session is for anyone wondering how or where to start.
Azure Machine Learning
Get a grip on the very cool Azure Machine Learning tools, and get a primer on the data science workflow.
In this slide deck I offer my musings on the course and importance of the Big Data movement based on interviews with many industry experts.