Azure Data Catalog

Data Catalog provides a cloud-based service into which a data source can be registered. The data remains in its existing location, but a copy of its metadata is added to Data Catalog, along with a reference to the data-source location. The metadata is also indexed to make each data source easily discoverable via search, and understandable to the users who discover it.

After a data source has been registered, its metadata can then be enriched, either by the user who registered it or by other users in the enterprise. Any user can annotate a data source by providing descriptions, tags, or other metadata, such as documentation and processes for requesting data source access. This descriptive metadata supplements the structural metadata (such as column names and data types) that's registered from the data source.

Registering data sources makes it easier to discover and understand the sources and how they are used. Enterprise users might need data for business intelligence, application development, data science, or other tasks. They can use the Data Catalog to quickly find data that matches their needs, evaluate its fitness for the purpose, and consume the data by opening the data source in their tool of choice.

At the same time, users can contribute to the catalog by tagging, documenting, and annotating data sources that have already been registered. They can also register new data sources, which can then be discovered, understood, and consumed by the community of catalog users.

Register data sources

Registration is the process of extracting metadata from the data source and copying it to the Data Catalog service. The data remains where it currently resides, under the control of the administrators and policies of the current system.

To register a data source, do the following:

  1. In the Azure Data Catalog portal, start the Data Catalog data source registration tool
  2. Sign in with your Azure Active Directory credentials
  3. Select the data source you want to register

For more information, see the Get Started with Azure Data Catalog tutorial.

Azure Event Grid

Azure Event Grid allows you to easily build applications with event-based architectures. You select the Azure resource you would like to subscribe to, and specify the event handler or webhook endpoint that should receive the event. Event Grid has built-in support for events coming from Azure services, like storage blobs and resource groups. It also supports application and third-party events through custom topics and custom webhooks.

You can use filters to route specific events to different endpoints, multicast to multiple endpoints, and make sure your events are reliably delivered.
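As a rough illustration of filter-based routing, the sketch below models subscriptions in plain Python. It does not call Event Grid: the `Subscription` class, the `route` function, and the endpoint URLs are hypothetical, and the filter fields only mirror the event-type and subject prefix/suffix filters that event subscriptions offer.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Hypothetical stand-in for an event subscription and its filter."""
    endpoint: str
    event_types: list = field(default_factory=list)  # empty list matches every type
    subject_begins_with: str = ""
    subject_ends_with: str = ""

    def matches(self, event: dict) -> bool:
        if self.event_types and event["eventType"] not in self.event_types:
            return False
        subject = event["subject"]
        return (subject.startswith(self.subject_begins_with)
                and subject.endswith(self.subject_ends_with))


def route(event: dict, subscriptions: list) -> list:
    """Return every endpoint whose filter accepts the event (multicast)."""
    return [s.endpoint for s in subscriptions if s.matches(event)]


subscriptions = [
    # Only JPEG uploads go to the (made-up) image endpoint.
    Subscription("https://example.com/images",
                 ["Microsoft.Storage.BlobCreated"],
                 subject_ends_with=".jpg"),
    # No filter: this endpoint receives every event.
    Subscription("https://example.com/audit"),
]

event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/photos/blobs/cat.jpg",
}
print(route(event, subscriptions))
```

Here the blob-created event for a `.jpg` file matches both subscriptions, so it is delivered to both endpoints, while any other event reaches only the unfiltered one.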


Azure Event Hubs

The common role that Event Hubs plays in solution architectures is the "front door" for an event pipeline, often called an event ingestor. An event ingestor is a component or service that sits between event publishers and event consumers to decouple the production of an event stream from the consumption of those events.


Event Hubs are used for hyper-scale ingestion of event data from a variety of connected devices and services. Data is sent to an Event Hub using a virtual endpoint address defined by the publisher, and can be partitioned and read by multiple consumers to increase scale. Each of the 8-32 partitions operates independently and may have different growth rates and retention policies. Every request requires a valid token, which consists of a Shared Access Signature (SAS) and the publisher name (normally the device's unique identifier).

When an Event Hub is created, a default consumer group is created automatically; an Event Hub can have up to 20 consumer groups. When a consumer reads from a consumer group, it creates an offset marker in the stream. Each consumer group manages its own offset and reads all the partitions at its own pace. You can replay messages in a stream by resetting the offset. Message data persists for up to a week, and messages are not deleted from the stream when read by a consumer. There is no dead-letter processing for poison messages; the application must handle them itself.
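The offset behaviour can be pictured with a small in-memory model. This is only a sketch and does not use the Event Hubs SDK: the `ToyStream` class and the group names are made up, and a single list stands in for one partition.

```python
class ToyStream:
    """In-memory stand-in for one partition; reading never deletes events."""

    def __init__(self, events):
        self.events = list(events)
        self.offsets = {}  # consumer group name -> index of the next event to read

    def read(self, group, max_events=10):
        """Return the next batch for this group and advance only its offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

    def reset(self, group, offset=0):
        """Rewind a group's offset so earlier events are replayed."""
        self.offsets[group] = offset


stream = ToyStream(["e0", "e1", "e2", "e3"])
first = stream.read("analytics", max_events=2)   # analytics has read e0 and e1
archive = stream.read("archive", max_events=4)   # archive reads at its own pace
stream.reset("analytics")                        # rewind analytics to the start
replayed = stream.read("analytics", max_events=4)
```

Each group keeps its own position, so resetting `analytics` replays all four events for that group without affecting `archive`, and the stored events are never removed by a read.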

When consuming messages from an Event Hub using code, optimal performance is achieved when the number of partitions is a multiple of the number of event processors, so that the partitions are spread evenly. For example, if there are 10 partitions, there should be 1, 2, 5, or 10 event processors.
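The rule amounts to choosing a processor count that divides the partition count evenly. A minimal sketch, where `balanced_processor_counts` is a hypothetical helper name rather than anything from the Event Hubs SDK:

```python
def balanced_processor_counts(partition_count: int) -> list:
    """Return the processor counts that divide the partitions evenly."""
    if partition_count < 1:
        raise ValueError("partition_count must be positive")
    return [n for n in range(1, partition_count + 1) if partition_count % n == 0]


print(balanced_processor_counts(10))  # [1, 2, 5, 10]
```

With any of these counts, every processor owns the same number of partitions, so no processor carries a larger share of the load than the others.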

When consuming messages using a platform service such as Azure Stream Analytics, all the complex processing is handled by the service. In this case, the Event Hub stream is assigned as an input to the Azure Stream Analytics job.

You can connect to the Event Hub using either:

Azure Functions

Azure Functions is a serverless computing environment that scales on demand and is billed on a pay-as-you-go basis. Functions are deployed as Function Apps in Azure, which expose the full App Service settings, including continuous deployment, CORS, authentication/authorization, and API definition. Functions are often used to intercept incoming data, and can be triggered by the arrival of data on several different Azure services.

Function names must be unique within a Function App, because each function creates a separate folder. Code can be shared between functions by placing it in a shared folder off the root of the Function App. To reference it in a function, include a #load "filename" directive, such as #load "..\shared\common.csx".

The functions can be invoked with timer triggers based on Cron expressions. The function runtime is single-threaded, but multiple instances can be invoked in parallel.
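To make the schedule format concrete, the sketch below splits a timer expression into named fields. It assumes the six-field layout (second, minute, hour, day, month, day of week) used by the timer trigger; `describe_schedule` and `FIELD_NAMES` are hypothetical names, and the code only labels the fields rather than evaluating the schedule.

```python
FIELD_NAMES = ("second", "minute", "hour", "day", "month", "day_of_week")


def describe_schedule(expression: str) -> dict:
    """Map each field of a six-field schedule expression to its name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


# "0 */5 * * * *" fires at second 0 of every fifth minute.
schedule = describe_schedule("0 */5 * * * *")
print(schedule)
```

A five-field expression copied from a classic crontab would be rejected by this check, which is a common source of confusion when moving schedules into a timer trigger.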

Functions have several common external .NET assemblies automatically added by the Azure Functions runtime. User-created assemblies can also be added by uploading them to a bin directory relative to the root folder of the function. The assembly can then be referenced by including the #r "assemblyname" directive at the top of the function.

Generic Webhook Functions can be called directly from Azure Logic Apps and provide a mechanism for introducing functionality not readily available in a Logic App. It is also possible to reference NuGet packages by adding a dependency to the function's project.json file.

An easy way to consume a JSON payload on a generic webhook is to use JSON2CSHARP to generate a C# class from a sample JSON payload and place that class in the shared folder. The function can then read the request payload and convert it with JsonConvert.DeserializeObject.

Function Apps provide a Monitor tab which allows you to see invocation logs and drill into message content. You can also view live streaming data on function performance and usage.

Azure HTTP Connector

You can use the HTTP Connector to connect to any custom API that is not already wrapped by a managed SaaS connector. You will need to specify:

Azure Internet of Things (IoT) Hub


Security is a key issue for IoT devices and Hubs. Azure IoT Hub provides a secure bidirectional communications mechanism to and from devices at scale, using a range of open source protocols. Each Azure subscription can have up to 10 IoT Hubs. Azure IoT Hubs do not support auto-scaling, so monitoring and alerts are required to understand performance/throughput characteristics.

Azure IoT Hub implements a device registry to provide per-device authentication, access control, and life-cycle management. Azure IoT Hub's combination of device-initiated access control, per-device security, and trusted peer-to-peer communication minimizes the attack surface. If the device supports full cryptographic certificate exchange, security can be enhanced using X.509 certificates.

New IoT Devices need to be registered with the solution through an administrative user experience that can associate the device with an appropriate instance record. This creates message stores for both messages to (C2D) and from (D2C) the device:

IoT gateways can be deployed when a device can’t communicate using one of the IoT Hub supported protocols.

Azure IoT Hub supports shared access policies that can be used to create SAS tokens with a specified time to live (TTL). Policy keys can be regenerated to revoke access for devices that use an associated SAS token.
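A minimal sketch of building such a token with only the Python standard library is shown below. It assumes the commonly documented token layout (an HMAC-SHA256 signature over the URL-encoded resource URI and the expiry time, carried in the `sr`, `sig`, `se`, and `skn` fields). The hub name, policy name, and key are placeholders, and the `now` parameter is a hypothetical addition that makes the result reproducible.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, key, policy_name=None, ttl_seconds=3600, now=None):
    """Build a SAS token string that expires ttl_seconds after now."""
    expiry = int((time.time() if now is None else now) + ttl_seconds)
    # Sign the URL-encoded resource URI and the expiry, separated by a newline.
    string_to_sign = f"{urllib.parse.quote_plus(resource_uri)}\n{expiry}"
    digest = hmac.new(
        base64.b64decode(key), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    fields = {
        "sr": resource_uri,
        "sig": base64.b64encode(digest).decode("utf-8"),
        "se": str(expiry),
    }
    if policy_name is not None:
        fields["skn"] = policy_name
    return "SharedAccessSignature " + urllib.parse.urlencode(fields)


# Placeholder values for illustration only; a real key comes from the policy.
demo_key = base64.b64encode(b"not-a-real-key").decode("utf-8")
token = generate_sas_token(
    "myhub.azure-devices.net", demo_key, "device", ttl_seconds=600, now=0
)
print(token)
```

Because the signature depends on the policy key, regenerating that key invalidates every token signed with the old one, which is what makes key regeneration work as a revocation mechanism.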

Multiple SDKs built over cross-platform C code are available for IoT devices and other programs. Device software can be developed in any language and deployed on any platform; it provides telemetry to the enterprise and may accept commands from it. Azure IoT Hub also exposes a REST API.

Azure IoT Hub supports IP filtering: you can create allow and deny lists of IP address ranges in CIDR format, and connections from each range are either accepted or rejected.
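The matching logic behind CIDR-based rules can be sketched with Python's standard `ipaddress` module. This is an illustration only, not the service's implementation: the `evaluate_ip` function, the rule list, and the assumptions that the first matching rule wins and that unmatched addresses are accepted are all hypothetical choices for the example.

```python
import ipaddress


def evaluate_ip(address, rules, default_action="accept"):
    """Return the action of the first rule whose CIDR range contains address."""
    ip = ipaddress.ip_address(address)
    for cidr, action in rules:
        if ip in ipaddress.ip_network(cidr):
            return action
    return default_action  # assumption: no matching rule means the address is allowed


# Ordered rules: the narrow allow range is listed before the broad reject range.
rules = [
    ("10.0.0.0/24", "accept"),
    ("10.0.0.0/8", "reject"),
]

print(evaluate_ip("10.0.0.5", rules))     # accept
print(evaluate_ip("10.1.2.3", rules))     # reject
print(evaluate_ip("192.168.1.1", rules))  # accept
```

Under a first-match model the order of the rules matters: swapping the two entries would reject `10.0.0.5`, because the broader range would match first.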

Azure IoT Hub supports the concept of a Device Twin, a JSON document stored in Azure DocumentDB that contains additional state information and other metadata about a device. These provide a mechanism to maintain the configuration of a device:

Azure Stream Analytics


Azure Stream Analytics is an event-processing engine that allows you to examine high volumes of streaming data. Incoming data can be from devices, sensors, websites, social media feeds, applications, and more. It also supports extracting information from data streams and identifying patterns and relationships. You can then use these patterns to trigger other actions downstream, like raising alerts, feeding information to a reporting tool, or storing it for later use.

Azure Stream Analytics can be used in:

Your Stream Analytics job can use all or a selected set of inputs and outputs.

Azure Notification Hub


Azure Notification Hubs provide secure push notifications for mobile devices. Consumer usage is normally for engagement and marketing events, while the back-end uses it to notify users that they need to participate in an active process.

You can use a back-end installation process instead of having the devices register with the notification hub directly. This lets the back end maintain the tags (users and groups) associated with each installation. Push notifications are then sent to the appropriate tag: a user GUID for a notification to an individual, or a group GUID to push to multiple users.
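The tag-based targeting described above can be pictured with a small in-memory model. This sketch does not use the Notification Hubs SDK: the `ToyInstallationRegistry` class, the installation IDs, and the `user:`/`group:` tag format are all made up for illustration.

```python
from collections import defaultdict


class ToyInstallationRegistry:
    """Back-end view of which installations carry which tags."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def upsert(self, installation_id, tags):
        """Create or replace an installation together with its tags."""
        for members in self._by_tag.values():
            members.discard(installation_id)  # drop any stale tags first
        for tag in tags:
            self._by_tag[tag].add(installation_id)

    def targets(self, tag):
        """Return the installations a push to this tag would reach."""
        return sorted(self._by_tag.get(tag, set()))


registry = ToyInstallationRegistry()
registry.upsert("phone-1", ["user:alice", "group:ops"])
registry.upsert("phone-2", ["user:bob", "group:ops"])

print(registry.targets("user:alice"))  # one installation for an individual
print(registry.targets("group:ops"))   # every installation in the group
```

Keeping the tags on the back end means a user's group membership can change without the device having to register again.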

You can use templates to avoid creating platform-specific payloads for push notifications. The payload passed through the Notifications table is in XML format; its data elements are converted into fields as the notification is pushed.


This exposes directories that should be used to transfer data files; SFTP is preferred. The exposed directories should be encrypted, and all data files should be virus checked, compressed, and encrypted before being transferred. Ideally, this can be replaced with Message Queues or secure Azure Storage.

On-Premises Data Gateway

On-Premises Data Gateway acts as a bridge for quick and secure data transfer between on-premises data and Azure cloud services such as Logic Apps, Microsoft Flow, PowerApps, and Power BI. After installing the on-premises side of the gateway, you need to create the matching Azure resource.