Text Analytics of Movie Reviews using Azure Data Lake, Cognitive Services and Power BI (part 2 of 2)

Applicable Business Scenario

Marketing or data analysts who need to review sentiments and key phrases of a very large data set of consumer-based movie reviews.

Applied Technologies

  1. Azure Data Lake Store
  2. Azure Data Lake Analytics
    1. Cognitive Services
  3. Visual Studio with Azure Data Lake Analytics Tools
  4. Power BI Desktop & Power BI Service
  5. SharePoint Online site and preview of Power BI Web Part

Power BI Desktop

Use Power BI Desktop as the report authoring tool.

Data Source

Get Data from Azure Data Lake Store. Retrieve the output of the U-SQL script executed in Azure Data Lake Analytics in part 1 of this blog series.


Data Source to Azure Data Lake Store

Point to the folder containing the .tsv (tab-delimited) files that were the output of the U-SQL script execution.

Provide credentials to an account that has permissions to the Azure Data Lake Store. In this case, it was an Azure AD account.

Queries

Create a query for each .TSV file


Relationships

Define a one-to-many relationship based on the ID of each movie review.


Reports/Visualization

Sentiment confidence value for each of the 2000 movie reviews


Publish

Click ‘Publish’ to upload your report to the Power BI Service in the Microsoft cloud. You can then view it at http://app.powerbi.com with your Office 365 or Microsoft account.


SharePoint Online

If you want to publish and share this report with a wide audience via a SharePoint Online site, you can leverage the new Power BI web part (in preview as of February 2017). I have displayed this report in the latest SharePoint Online modern page experience as a publishing page. Note that each user who views the report must have a Power BI Pro license, which is not free.


To configure this, create a modern publishing page and display the Power BI report via the Power BI web part (preview).


Web Part in Edit Mode

Enter the report link, which you get from the Power BI Service at http://app.powerbi.com.


Options to further extend this solution

  • For the movie reviews .csv file, one can add date/time, the movie being reviewed, genre, location and any other descriptive metadata, thus supporting richer reporting and insights.
  • Overlay this data set against other data sets for correlation, such as related news events, weather, trending movies, other movie review sources, etc. The goal is to find cause-and-effect relationships for diagnostic insights – “Why is this happening?”.
  • To get data from an internal network into Azure Data Lake or any Azure storage account, one option is to use the Data Management Gateway. It is installed within the internal network and allows files and data to be transferred from other internal data sources with little to no corporate firewall changes. See: Move data between on-premises sources and the cloud with Data Management Gateway.

Closing Remarks

Azure Cognitive Services built into Azure Data Lake Analytics is a suitable option for processing very high volumes of unstructured and complex data, where scalable computing power is needed. In addition, it is priced on a pay-per-use model, making it cost-effective in many scenarios. The agility of Azure services lets you experiment, iterate quickly and fail fast in finding the right technical solution and applying the right techniques and approach. This article highlights how data can be ingested, analyzed/processed, modeled, visualized and then published to a business audience.

Text Analytics of Movie Reviews using Azure Data Lake, Cognitive Services and Power BI (part 1 of 2)

Applicable Business Scenario

Marketing or data analysts who need to review sentiments and key phrases of a very large data set of consumer-based movie reviews.

Applied Technologies

  1. Azure Data Lake Store
  2. Azure Data Lake Analytics
    1. Cognitive Services
  3. Visual Studio with Azure Data Lake Analytics Tools
  4. Power BI Desktop & Power BI Service
  5. SharePoint Online site and preview of Power BI Web Part

Azure Data Lake Store

Upload a .csv file of 2,000 movie reviews to a folder in Azure Data Lake Store.


Azure Data Lake Analytics

Execute the following U-SQL script in either the Azure Portal > Azure Data Lake Analytics > Jobs > New Jobs or Visual Studio with Azure Data Lake Analytics Tools.

This script makes reference to the Cognitive Services assemblies. They come out of the box in the Azure Data Lake master database.


U-SQL Script

 The following script reads the moviereviews.csv file in Azure Data Lake Store and then analyzes for sentiment and key phrase extraction. Two .tsv files are produced, one with the sentiment and key phrases for each movie review and another for a list of each individual key phrase with a foreign key ID to the parent movie review.

REFERENCE ASSEMBLY [TextCommon];
REFERENCE ASSEMBLY [TextSentiment];
REFERENCE ASSEMBLY [TextKeyPhrase];

@comments =
    EXTRACT Text string
    FROM @"/TextAnalysis/moviereviews.csv"
    USING Extractors.Csv();

@sentiment =
    PROCESS @comments
    PRODUCE Text,
            Sentiment string,
            Conf double
    READONLY Text
    USING new Cognition.Text.SentimentAnalyzer(true);

@keyPhrases =
    PROCESS @sentiment
    PRODUCE Text,
            Sentiment,
            Conf,
            KeyPhrase string
    READONLY Text,
             Sentiment,
             Conf
    USING new Cognition.Text.KeyPhraseExtractor();

@keyPhrases =
    SELECT *, ROW_NUMBER() OVER () AS RowNumber
    FROM @keyPhrases;

OUTPUT @keyPhrases
    TO "/TextAnalysis/out/MovieReviews-keyPhrases.tsv"
    USING Outputters.Tsv();

// Split the key phrases.
@kpsplits =
    SELECT RowNumber,
           Sentiment,
           Conf,
           T.KeyPhrase
    FROM @keyPhrases
    CROSS APPLY
        new Cognition.Text.Splitter("KeyPhrase") AS T(KeyPhrase);

OUTPUT @kpsplits
    TO "/TextAnalysis/out/MovieReviews-kpsplits.tsv"
    USING Outputters.Tsv();

Azure Portal > Azure Data Lake Analytics  U-SQL execution

Create a new job to execute a U-SQL script.

Visual Studio Option

You need the Azure Data Lake Tools for Visual Studio. Create a U-SQL project, paste in the script, and submit the U-SQL script to Azure Data Lake Analytics for execution. The following shows the successful job summary after the U-SQL script has been submitted.


Click here for Part 2 of 2 of this blog series.

Azure Search: Pushing Content to an Index with the .NET SDK

Blog Series

  1. Azure Search Overview
  2. Pushing Content To An Index with the .NET SDK

I hold the opinion that for a robust indexing strategy, you would likely end up writing a custom batch application between your desired data sources and your defined Azure Search index. The pull method currently only supports data sources that reside in specific Azure data stores (as of Feb 2017):

  • Azure SQL Database
  • SQL Server relational data on an Azure VM
  • Azure DocumentDB
  • Azure Blob storage, Table storage

I would assume that many, at this time, have desired content in websites, databases and LOB applications outside of these Azure data stores.

Azure Search .NET SDK

The article Upload data to Azure Search using the .NET SDK gives great guidance and is what I used; here is my specific implementation approach.

To get started, first create a .NET project.

Install the Microsoft.Azure.Search package from NuGet.

My project with the Microsoft.Azure.Search library

 

To start coding, define your search index by creating a model class. I created a generic index schema, which I will use to define and create a new search index in the Azure Search service and to hold a list of movie records as my searchable content.

[SerializePropertyNamesAsCamelCase]
public partial class IndexModel
{
    [Key]
    [IsRetrievable(true)]
    public string Id { get; set; }

    [IsRetrievable(true), IsSearchable, IsFilterable, IsSortable]
    public string Title { get; set; }

    [IsRetrievable(true), IsSearchable]
    [Analyzer(AnalyzerName.AsString.EnLucene)]
    public string Content { get; set; }

    [IsFilterable, IsFacetable, IsSortable]
    public string ContentType { get; set; }

    [IsRetrievable(true), IsFilterable, IsSortable, IsSearchable]
    public string Url { get; set; }

    [IsRetrievable(true), IsFilterable, IsSortable]
    public DateTimeOffset? LastModifiedDate { get; set; }

    [IsRetrievable(true), IsFilterable, IsSortable]
    public string Author { get; set; }
}

Next, I do three major steps in the Main method of the console app:

  1. Create mock data, as if this data was retrieved from a data source.
  2. Create an index, if one does not already exist, based on the index model class.
  3. Update the index with new or updated content.
public static void Main(string[] args)
        {

            // Mock Data
            List<IndexModel> movies = new List<IndexModel>
            {
                new IndexModel()
                {
                    Id = "1000",
                    Title = "Star Wars",
                    Content = "Star Wars is an American epic space opera franchise, centered on a film series created by George Lucas. It depicts the adventures of various characters a long time ago in a galaxy far, far away",
                    LastModifiedDate = new DateTimeOffset(new DateTime(1977, 01, 01)),
                    Url = @"http://starwars.com"
                },
                new IndexModel()
                {
                    Id = "1001",
                    Title = "Indiana Jones",
                    Content = @"The Indiana Jones franchise is an American media franchise based on the adventures of Dr. Henry 'Indiana' Jones, a fictional archaeologist. It began in 1981 with the film Raiders of the Lost Ark",
                    LastModifiedDate = new DateTimeOffset(new DateTime(1981, 01, 01)),
                    Url = @"http://indianajones.com"
                },
                new IndexModel()
                {
                    Id = "1002",
                    Title = "Rocky",
                    Content = "Rocky Balboa (Sylvester Stallone), a small-time boxer from working-class Philadelphia, is arbitrarily chosen to take on the reigning world heavyweight champion, Apollo Creed (Carl Weathers), when the undefeated fighter's scheduled opponent is injured.",
                    LastModifiedDate = new DateTimeOffset(new DateTime(1976, 01, 01)),
                    Url = @"http://rocky.com"
                }
            };

            AzureSearch.CreateIndexIfNotExists<IndexModel>("movies");

            AzureSearch.UpdateIndex("movies", movies);

            Console.WriteLine("Enter any key to exist");
            Console.ReadKey();
        }

In the Azure Portal, you will see the following outcomes:

  • The ‘movies’ index has been created along with 3 documents, as expected.
  • I find that the document count value takes several minutes or more to update, but the indexing itself is immediate.
  • The fields have been defined along with their types and attributes, based on the index model class.
  • To test the index, use the Search Explorer, or query it from code as shown below.
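For a quick code-based check (in addition to the Search Explorer), here is a minimal sketch that counts and queries the documents, assuming the same client setup shown later in this post:

// Get a client scoped to the 'movies' index.
ISearchIndexClient indexClient = CreateSearchServiceClient().Indexes.GetClient("movies");

// Document count (this can lag a few minutes behind the actual indexed content).
long count = indexClient.Documents.Count();
Console.WriteLine($"Documents in index: {count}");

// Full-text search against the searchable fields of the index model.
DocumentSearchResult<IndexModel> results = indexClient.Documents.Search<IndexModel>("Star Wars");
foreach (SearchResult<IndexModel> result in results.Results)
{
    Console.WriteLine($"{result.Document.Id}: {result.Document.Title}");
}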

Here are further code snippet details for the following method calls. I made the CreateIndexIfNotExists method generic so that you pass in the type of the index model as T; FieldBuilder.BuildForType<T>() then builds out the index schema.

AzureSearch.CreateIndexIfNotExists<IndexModel>("movies");

public static Boolean CreateIndexIfNotExists<T>(string indexName)
{
    bool isIndexCreated = false;

    // Suggester source fields must be searchable string fields in the index.
    List<string> suggesterFieldnames = new List<string>() { "title" };

    var definition = new Index()
    {
        Name = indexName,
        Fields = FieldBuilder.BuildForType<T>(),
        Suggesters = new List<Suggester>() {
            new Suggester() {
                Name = "Suggester",
                SearchMode = SuggesterSearchMode.AnalyzingInfixMatching,
                SourceFields = suggesterFieldnames
            }
        }
    };

    SearchServiceClient serviceClient = CreateSearchServiceClient();

    if (!serviceClient.Indexes.Exists(indexName))
    {
        serviceClient.Indexes.Create(definition);
        isIndexCreated = true;
    }

    return isIndexCreated;
}
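The CreateSearchServiceClient() helper referenced above is not shown in the post; here is a minimal sketch of what it might look like, assuming the service name and admin API key are read from the console app’s App.config (the setting names SearchServiceName and SearchServiceAdminApiKey are placeholders of mine):

private static SearchServiceClient CreateSearchServiceClient()
{
    // Placeholder setting names – use whatever keys your App.config defines.
    // Requires the Microsoft.Azure.Search and System.Configuration namespaces.
    string searchServiceName = ConfigurationManager.AppSettings["SearchServiceName"];
    string adminApiKey = ConfigurationManager.AppSettings["SearchServiceAdminApiKey"];

    // The admin key is required for index create/delete and document indexing operations.
    return new SearchServiceClient(searchServiceName, new SearchCredentials(adminApiKey));
}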

AzureSearch.UpdateIndex("movies", movies);

Inner method call:

private static void UploadDocuments(ISearchIndexClient indexClient, List<IndexModel> contentItems)
{
    // MergeOrUpload adds new documents and updates existing ones by key.
    var batch = IndexBatch.MergeOrUpload(contentItems);
    indexClient.Documents.Index(batch);
}
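The outer UpdateIndex method is not shown in full; here is a minimal sketch of how it might wrap UploadDocuments, assuming the same CreateSearchServiceClient() helper (and System.Linq for the error reporting):

public static void UpdateIndex(string indexName, List<IndexModel> contentItems)
{
    // Get a client scoped to the target index and push the batch.
    SearchServiceClient serviceClient = CreateSearchServiceClient();
    ISearchIndexClient indexClient = serviceClient.Indexes.GetClient(indexName);

    try
    {
        UploadDocuments(indexClient, contentItems);
    }
    catch (IndexBatchException e)
    {
        // Some documents in a batch can fail (e.g. throttling); log the failed keys and retry as needed.
        Console.WriteLine("Failed to index some documents: {0}",
            string.Join(", ", e.IndexingResults.Where(r => !r.Succeeded).Select(r => r.Key)));
    }
}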

 

In conclusion, I generally recommend the push approach using the Azure Search .NET SDK, as it offers more control and flexibility. Just as I created a CreateIndex method, you should also create a delete-index method. This helps during the development process as you iterate on your index schema. Even in production scenarios, it can be appropriate to delete your index, re-create it with an updated schema, and then re-index your content.
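Following up on that suggestion, here is a minimal sketch of what a delete-index method could look like with the same SDK:

public static bool DeleteIndexIfExists(string indexName)
{
    SearchServiceClient serviceClient = CreateSearchServiceClient();

    if (serviceClient.Indexes.Exists(indexName))
    {
        // Deleting an index also deletes all documents stored in it.
        serviceClient.Indexes.Delete(indexName);
        return true;
    }

    return false;
}

During development, calling DeleteIndexIfExists followed by CreateIndexIfNotExists lets you iterate quickly on the index schema.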


Azure Search Overview

Blog Series

  1. Azure Search Overview
  2. Pushing Content To An Index with the .NET SDK

Azure Search is a platform-as-a-service offering; it requires code and configuration to set up and use.

Applicable corporate scenarios

  • Enterprise search on many repositories of data or files that are intended to be available for a wide audience. A lightweight one-stop shop for finding any information chosen to be indexed
  • Add search functionality to an existing application that does not have its own search capability, such as a public internet-facing company website.
  • A custom search against multiple sources of log files.

Search Service

In the Azure Portal, create a new Azure Search service. The free tier should be sufficient for proof of concept purposes.


The Index

The index is the heart of any search engine: content is stored in a way that is searchable for fast and accurate retrieval. The index needs to be configured with a schema that defines custom fields by name, type, and attributes.

Overview blade:

Index Fields:

Field Types

Type & Description

Edm.String – Text that can optionally be tokenized for full-text search (word breaking, stemming, etc).

Collection(Edm.String) – A list of strings that can optionally be tokenized for full-text search. There is no theoretical upper limit on the number of items in a collection, but the 16 MB upper limit on payload size applies to collections.

Edm.Boolean – Contains true/false values.

Edm.Int32 – 32-bit integer values.

Edm.Int64 – 64-bit integer values.

Edm.Double – Double-precision numeric data.

Edm.DateTimeOffset – Date time values represented in the OData V4 format (e.g. yyyy-MM-ddTHH:mm:ss.fffZ or yyyy-MM-ddTHH:mm:ss.fff[+/-]HH:mm).

Edm.GeographyPoint – A point representing a geographic location on the globe.


Field attributes

Attribute & Description

Key – A string that provides the unique ID of each document, used for document look up. Every index must have one key. Only one field can be the key, and its type must be set to Edm.String.

Retrievable – Specifies whether a field can be returned in a search result.

Filterable – Allows the field to be used in filter queries.

Sortable – Allows a query to sort search results using this field.

Facetable – Allows a field to be used in a faceted navigation structure for user self-directed filtering. Typically fields containing repetitive values that you can use to group multiple documents together (for example, multiple documents that fall under a single brand or service category) work best as facets.

Searchable – Marks the field as full-text searchable.

You can’t change an existing field; you can only add to the schema. If you have to change existing fields, you have to delete the index, re-create it with the new specifications and re-index your content. I suggest you automate this process by creating your own management app with the .NET SDK or REST API.

Refer to further guidance and sample code at Create an Azure Search index using the .NET SDK
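To make the field types and attributes above concrete, here is a minimal sketch of creating an index with explicitly defined fields via the .NET SDK (the index and field names are illustrative only, and serviceClient is assumed to be a SearchServiceClient built with your service name and admin API key):

var definition = new Index()
{
    Name = "documents",
    Fields = new List<Field>
    {
        // The key field must be an Edm.String.
        new Field("id", DataType.String) { IsKey = true, IsRetrievable = true },
        new Field("title", DataType.String) { IsSearchable = true, IsSortable = true, IsRetrievable = true },
        new Field("content", DataType.String) { IsSearchable = true, IsRetrievable = true },
        new Field("contentType", DataType.String) { IsFilterable = true, IsFacetable = true },
        new Field("lastModified", DataType.DateTimeOffset) { IsFilterable = true, IsSortable = true, IsRetrievable = true }
    }
};

serviceClient.Indexes.CreateOrUpdate(definition);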


Indexing

To populate the index from data sources, you need indexers. There are two approaches to indexing:

  1. Pushing to the index programmatically using the REST API or .NET SDK
  2. Pulling into the index with the Search service’s indexers, with no need for custom code
Push to index
  • Pros: More flexible; supports any kind of data source.
  • Cons: You manage change tracking, and you need to write custom code using the REST API or .NET SDK.

Pull into index
  • Pros: Change tracking is mostly handled for you; no need to write custom code.
  • Cons: Limited number of supported data sources, which must reside in Azure.

To set up a pull into the index, you can configure it through the Azure Portal via Import Data. The alternative and more comprehensive way to configure it is through the REST API or .NET SDK, as sketched below.
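For the SDK route, here is a minimal sketch of defining a blob data source and an indexer against it (the names, connection string and five-minute schedule are placeholders of mine, and serviceClient is again an authenticated SearchServiceClient):

// Register the blob container as a data source (args: name, storage connection string, container name).
DataSource blobSource = DataSource.AzureBlobStorage(
    "docs-blob-source", "<storage connection string>", "documents");
serviceClient.DataSources.CreateOrUpdate(blobSource);

// Create an indexer that pulls from the data source into an existing index on a schedule.
Indexer blobIndexer = new Indexer()
{
    Name = "docs-blob-indexer",
    DataSourceName = blobSource.Name,
    TargetIndexName = "documents",
    Schedule = new IndexingSchedule(TimeSpan.FromMinutes(5))
};
serviceClient.Indexers.CreateOrUpdate(blobIndexer);

// Optionally trigger an immediate run instead of waiting for the schedule.
serviceClient.Indexers.Run(blobIndexer.Name);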

Import Data


You must have already uploaded supported file types, such as PDF and MS Office documents, into your blob containers.

A custom index schema will be generated for you based on the supported metadata of the specific data source – in this case, blob storage.

Indexer Configuration

You provide the name of the indexer and the schedule of how often the indexer runs. There is automatic change tracking for new and updated documents, but deletion is handled differently. The indexer will not remove a document from the index unless you define a metadata field on your blob that can be marked with a value identifying it as soft deleted. In this example, we use the metadata_title field: when a document has the value ‘softdelete’, the indexer removes it from the index. Once that has happened, you can delete the document from the blob store. In my opinion, this process is a bit complex to handle; it may require a custom application that scans the blob store and the index for differences and then deletes the documents from the blob store.

Search Explorer

Test the index with the Search Explorer by entering your query string of search keys, filters, and other operators.

Since I had uploaded some PDF and Word documents to Azure, I’ll search on “Azure”.

Also, scrolling further down, you can see the results of the metadata fields.

To set up a push approach to the index, you have to use either the REST API or the .NET SDK.

You build a custom application that interfaces with your data source and uses one of the above options to push the content to the index, as in the short example below.
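As a quick illustration of the push style with the .NET SDK (the service name, index name and API key are placeholders, and IndexModel is the attributed model class from the companion post on pushing content):

// Client scoped to a single index, built from the service name, index name and an admin API key.
var indexClient = new SearchIndexClient(
    "my-search-service", "movies", new SearchCredentials("<admin api key>"));

// Merge or upload a batch of documents keyed by their Id field.
var batch = IndexBatch.MergeOrUpload(new[]
{
    new IndexModel { Id = "2000", Title = "Jaws" }
});
indexClient.Documents.Index(batch);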

For further details, see my blog post – Azure Search: Pushing to an Index with the .NET SDK.


Building My SharePoint 2016 Disaster Recovery Farm Lab on Azure

I have set out to build a SharePoint 2016 disaster recovery farm extending my home-based on-premises SharePoint 2016 farm.

My objectives

  1. Continue to build my networking, Windows Server and other infrastructure-related skills; I come from an application development background.
  2. Build my hands-on skills and knowledge of Azure IaaS:
    • Azure Virtual networking, Site-to-site VPN
    • Azure virtual machine management
  3. Gain in-depth architecture and system administration knowledge of all the pieces that make up a disaster recovery farm using the SQL Server AlwaysOn (asynchronous commit) approach.
    • Understand performance/latency based on asynchronous commit to the secondary database replica.

I used the following article as my primary source:
Plan for SQL Server AlwaysOn and Microsoft Azure for SharePoint Server 2013 Disaster Recovery

I tried my best to follow all the steps, but I approached them in a different order per my own DR design.

As a result, the following link contains my raw notes and screenshots of some of the detailed steps in building the disaster recovery farm.
https://onedrive.live.com/redir?resid=D50B33B813A3693B!13901&authkey=!ANaDqU9cBkj36s0&ithint=file%2cdocx

My naming conventions are not perfectly consistent since I was building on the go. With these notes, it is my hope you can come away with some steps to a working solution.

The following is a summary of key steps in building my disaster recovery lab in Azure.

On-premises Home Network and Azure Network


My personal home network consists of a set of Hyper-V virtual machines, with the physical host being a Windows 10 desktop PC. The specifications are an Intel Core i5 with 4 processors, 16 GB RAM, an Intel solid-state drive for the virtual machine disks, and a D-Link DIR-826L router.

My on-premises environment:

  • homedc virtual machine
    domain controller and DNS
    domain: rkhome.com
    It also serves as a general file server, since I don’t have enough RAM and CPU for a dedicated file and backup server; this is not the ideal server topology.
  • homesp virtual machine
    SharePoint 2016 single-server farm and SQL Server 2014 SP1 database. SharePoint is installed.
    A single-server farm instead of the desired two-server topology, because I don’t have enough CPU and RAM.
  • homerras virtual machine
    Routing and Remote Access Server (RRAS)
    Used to establish site-to-site VPN connectivity with an Azure virtual network. There are other options such as using a hardware VPN router. This server is not domain joined.
  • D-Link router
    The port forwarding feature is leveraged to support site-to-site VPN connectivity.

Azure Disaster Recovery Site

The Microsoft cloud-based disaster recovery site.

  • Virtual Network
    Configured two subnets. One for the SharePoint farm and the other for the Gateway subnet for the site-to-site VPN.
  • rkdc virtual machine
    domain controller and DNS (no domain controller promotion just yet)

Note: at a minimum, set this server to use a static IP rather than a dynamic IP in the Azure portal.

  • rksp virtual machine
    SharePoint 2016 single server (not installed yet)
  • rksql virtual machine
    SQL Server 2014 SP1 database server

Site-to-site VPN and DC Replica


Enable cross network connectivity between the on-premises home network and the Azure virtual network. The other option is using ExpressRoute, which is more suited for production scenarios for its private connection, higher bandwidth, better performance and reliability.

Port forwarding is configured in the D-Link home router to allow internet connectivity to the homerras server for a VPN connection.

Virtual Network Gateway

Serves as the cross-premises gateway connecting your workloads in the Azure Virtual Network to on-premises sites. This gateway has a public IP address accessible from the internet.

Local Network Gateway

Enables interaction with the on-premises VPN device represented in the gateway manager. It therefore needs to be configured with the home router’s public WAN IP address. The port forwarding setup always directs this traffic to the RRAS server as the VPN device.

Connection

Represents a connection between two gateways – the virtual network gateway and the local network gateway.

homerras RRAS Server

An interface named “Remote Router” is configured with the virtual network gateway’s public IP address (40.114.x.x).

Domain Controller replica on the Azure virtual network

Prerequisite: site-to-site VPN connection needs to be active.

Install a replica Active Directory domain controller (i.e. rkhome.com) in the Azure virtual network

Domain-join the rksp and rksql servers to rkhome.com.

Any added DNS records and AD accounts will be synchronized between the two domain controllers.

In testing the VPN connection, any machine connected to the on-premises network was able to ping or RDP, with a domain account, into any other server in the Azure virtual network and vice versa.

SharePoint 2016, WSFC, and SQL Server AlwaysOn


SharePoint 2016

Installed on the Azure rksp virtual machine as a single-server farm with a My Sites host and a portal site collection. SharePoint 2016 was already installed on the on-premises farm before the start of this lab.

Windows Server Failover Cluster

Installed the Windows Server Failover Clustering feature on homesp and rksql, as they host the database server roles.

Name: SPSQLCluster
IP Address: 192.168.0.102

The file share cluster quorum is hosted on homedc. This quorum should be on a dedicated file server, but I do not have enough memory for another VM.

Set node weight = 1 on the primary homesp node.

SQL Server AlwaysOn

Enabled SQL Server AlwaysOn with the asynchronous commit configuration. This is recommended given the higher network latency due to the VPN connection and the geographic distance between the two sites; synchronous commit is recommended only when network latency is under 1 ms for SharePoint. When I ping servers across the two environments (Toronto and North Central US), I get an average of about 75 ms, ranging from 30 ms to 110 ms.

The databases supported for asynchronous commit are listed in the article Supported high availability and disaster recovery options for SharePoint databases (SharePoint 2013):

https://technet.microsoft.com/en-us/library/jj841106.aspx

The databases below were deleted on the rksql secondary before replication from the homesp primary database instance.

Availability groups

  • AG_SPContent
    • MySites
    • PortalContent
  • AG_SPServicesAppsDB
    • App Management
    • Managed Metadata
    • Subscription Settings
    • User Profile
    • User Social
    • Secure Store

Configuration databases are farm specific. Search databases can be updated with a full crawl upon failover.

Availability Listener configuration for each availability group

  • agl_spcontent1 for AG_SPContent
    0.0.8 (on-premises)
    192.168.0.103 (azure DR)
  • agl_spservice for AG_SPServicesAppsDB
    0.0.9 (on-premises)
    192.168.0.107 (azure DR)

 

Evaluating AlwaysOn Availability Group in Asynchronous Commit Mode

 

Failover Test


  1. Manually shut down the IIS web sites of SharePoint to simulate a failure event, such as an IIS outage.
  2. For each availability group, fail over to the secondary replica and resume database movement.
  3. Adjust the WSFC node voting rights.
  4. Update the DNS records of the SharePoint sites to point to the DR site, and start IIS on the original primary on-premises site.

 

This process can be repeated to fail over once again to the on-premises site, making it the primary once more.

Comments on Azure costs

Virtual Machines

  • Domain controller and DNS – Basic A1, 1 CPU, 1.75 GB RAM
    • Left running
  • SQL Server database server – Basic A1, 2 CPU, 3.5 GB RAM
    • Left running
  • SharePoint 2016 single-server – Basic A4, 4 CPU, 7 GB RAM
    • Turned off in cold standby
  • VPN Gateway
    • ~$31CAD/month
    • Pricing is based on time; however, I didn’t find a way to stop or pause usage to save on costs.

I approximate the cost of running the above resources to be $130 CAD/month if the SharePoint VMs are stopped per the cold standby approach.

Final Remarks

This has been a great learning experience, as I now understand how all the little pieces work together. Out in the enterprise world, disaster recovery tends to be lower in priority on a project roadmap, or not addressed at all. However, as the business criticality of a technology solution increases, so does the need for a DR solution. Hosting in Azure is a cost-effective option since you pay only for what you use, especially in cold standby scenarios. Leveraging Azure regions in geographically remote areas is appropriate for mitigating widespread disaster situations such as hurricanes, mass power outages, earthquakes, floods or even outbreaks that can affect a data centre’s operability.

In technology, some things you do not really know until you build them with your own hands – learning is by doing.

Windows Server 2012 R2 Web Application Proxy and ADFS 3.0 Azure Lab

The following diagrams are based on a lab I built on Microsoft Azure IaaS leveraging Web Application Proxy and ADFS 3.0 to demonstrate single sign-on with claims-based applications.

As I come from an application development and architecture background, I learned a great deal about Azure IaaS and system administration with respect to Azure virtual networks, virtual machines, IP addressing, Azure PowerShell, the Azure management portal, domain controllers, DNS, subnets, certificates and other relevant Windows Server roles and features. As of May 2016, I thought I would share my notes to help others who may find this useful in the manner it is built. Note that I built this lab in March 2015, given Azure’s features and capabilities at that time.

Lab Architectural Overview

Hosting Infrastructure

  • Microsoft Azure Infrastructure-as-a-Service

Virtual Network

  • One Virtual Network with three subnets
  • Subnet-DC for the domain controller and ADFS server
  • Subnet-Web for web applications and other applications such as SharePoint Server.
  • Subnet-DMZ for the Web Application Proxy

Network Security Groups

  • I didn’t implement any NSGs yet, but for proper network security you would have an NSG around each subnet to allow/deny traffic based on a set of access control list rules.

Windows Domain

  • All servers are on the same rk.com domain, except for the Web Application Proxy server, since it sits in the DMZ and acts as a proxy to the internet.

Public domain name

  • I purchased the rowo.ca domain name to be used as part of the public URLs for internal applications.

Certificates

  • There were a great many certificate dependencies between WAP, ADFS, the relying parties (web apps) and token signing. Setting things up appropriately and troubleshooting them was a challenging learning point for me. The detailed topics involved public/private keys, exporting/importing certificates, authority chains, thumbprints, certificate subject names, SSL, server authentication, expiry, revocation, browser certificate errors, etc.


Azure Virtual Network configuration involving address spaces and subnets


I set up ADFS and added my simple .NET claims-aware web application as a relying party trust.


I conducted the following test:

Logged into the rkweb1 web server (i.e. internal to the network), I opened the browser and:
  1. Entered the URL: https://rkweb1.rk.com/ClaimApp
  2. Was redirected to ADFS and authenticated
  3. Was redirected back to ClaimApp with access


Testing within the internal network:


I configured the Web Application Proxy to publish the following applications to the internet.

Internet-facing external URLs start with https://rowo.ca/ and are mapped to backend URLs starting with https://rkweb1.rk.com for the following applications.

ClaimApp

  • A .NET claims-based application using Windows Identity Foundation.
  • WAP Pre-authentication is ADFS

HTMLApp

  • HTML web application with no authentication.
  • WAP Pre-authentication is Pass-through. No authentication.

TodoListService

  • REST API with Windows authentication
  • WAP Pre-authentication is ADFS


Accessing ClaimApp from the internet:


Accessing a REST API via a .NET WPF desktop application from the internet. The user will be prompted for credentials in a separate dialog per OAuth.


Accessing ClaimApp through the iOS Safari browser with device registration.


In Active Directory, my iPhone mobile device has been registered for added authentication and conditional access rules to applications.


In conclusion, I loved the fact that Azure has become my IT sandbox for learning and building solutions such as this remote access solution. The Web Application Proxy is also one of many options in the market for publishing internal on-premises applications using ADFS to support single sign-on.

Online References that helped me build this lab

Operational