
Building bridges is hard

In a world where everybody builds distributed systems (on a small scale for microservice-based systems, on a larger scale for a global market) and time to market is a key success factor, we face the challenges of parallel development and synchronization across teams and products. While agile frameworks like Scrum address some of the arising topics (e.g. having a unifying definition of done for multiple teams), there are other means to tackle the complexity.
But before we jump to a solution, let’s take one step back and review one of the typical issues:

  • One team or company builds an awesome API and another team integrates it into their product.
  • The second team must wait for team one to have a working API before they can start their integration work. However, even after waiting for team one to finish, they face major problems during development.

These could occur while working with the API directly but might just as well be totally unrelated (e.g. when trying to automate an unclear business process). The reasons don’t matter that much when the second team’s product arrives on the market too late and has lost its value.
We (i.e. software engineers) have already solved these kinds of problems on other layers. When building a database-heavy application we start with the data model (+ database) and then build the user-facing application, the data-oriented pipelines, optimizations etc. in parallel – instead of sequentially. We call this database first.


So, let’s talk about API first…

The idea is not complicated: two (or more) teams design an interface collaboratively. Usually, this is achieved by a proposal from one team and the discussions that follow. The result should be an interface description that all parties agree on. In our case we are going to use the OpenAPI specification (or swagger if you don’t look too closely). We can create an example for our case here or here.
Disclaimer: there are a lot of paid versions out there, but these two are free and require no registration. So, they are either nice or steal your data.

For the example in this post we want to create an API for a room-booking service (think of it like the Outlook room-booking feature for meetings). We’ll start with two endpoints:

  • GET /rooms gives a list of all available rooms and
  • GET /roomdetails provides more specific information for a given room

For good measure we throw in a technical service endpoint with

  • GET /health that provides a simple ‘ok’ in case our service is up and running.

Using one of the API designers and only minimal features of OpenAPI we end up with our swagger.yml:

swagger: "2.0"
info:
  description: 'a simple room booking service'
  version: 1.0.0
  title: room booking service

basePath: /v1

schemes:
  - http
consumes:
  - application/json
produces:
  - application/json

paths:
  /health:
    get:
      operationId: ''
      responses:
        '200':
          description: 'get service health'
  /rooms:
    get:
      operationId: ''
      responses:
        '200':
          description: 'get all rooms + meta data'
  /roomdetails:
    get:
      operationId: ''
      responses:
        '200':
          description: 'get details for a room'

Before we implement our (mock) API, we create our webservice with these six lines of code (app.py):

import connexion

app = connexion.App(__name__,
                    options={"swagger_ui": True})
app.add_api('swagger.yml')
application = app.app

if __name__ == '__main__':
    app.run(port=3000, server='gevent')

Before we explore what we see here, we’ll install our dependencies first. There are multiple ways to do this in Python, but for the easy way we use pip and our requirements.txt. For people unfamiliar with Python, this is the equivalent of npm & package.json for Node.js or Maven & the pom.xml in Java.

connexion[swagger-ui]
gevent

Connexion is an open-source API-first framework built in Python by Zalando. It leverages popular technology components like Flask and OAuth2 and integrates various industry standards to speed up (microservice) development.

Gevent is a Python networking library that is used as a WSGI server in this case. For our purpose we do not need more information on this, but if you want to dig deeper, check out this excellent blogpost.

Now, let’s come back to our code! As you can see, we create a new connexion.App(...) with the swagger UI enabled and add an API (reference) the app exposes (our previously created swagger.yml). We are pretty close to running our API-first application.

Connecting the dots

For the last steps, we need a mock implementation of our endpoints. In a real-world example, you should have defined responses in your swagger specification already, and a variety of tools can generate server mocks from those. However, as we have a slim example, we create the mock without any tooling help.
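Just to illustrate what such a response definition could look like (a sketch only – the schema and example payload below are not part of our slim spec above), the /rooms path could carry its expected payload directly in the Swagger 2.0 document:

  /rooms:
    get:
      responses:
        '200':
          description: 'get all rooms + meta data'
          schema:
            type: array
            items:
              type: string
          examples:
            application/json:
              - 'Kallista'
              - 'Io'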
Considering our current project structure, we’ll just add two files for our API-controllers. We create an api/ folder with two subfolders for our "business" logic and our technical health-endpoint (rooms.py and health.py):

app.py
requirements.txt
swagger.yml
api
|-- business_controller
|   `-- rooms.py
|-- tech_controller
    `-- health.py

Simple mock-returns for rooms.py and health.py:

# api/business_controller/rooms.py
def get_room():
    return ["Kallista", "Io"]

def get_room_details():
    return [{"name": "Kallista", "space": 3}, {"name": "Io", "space": 4}]

# api/tech_controller/health.py
def get():
    return 'ok'

Ok, let’s review what we got so far…

  • We have an API specification (at this point another team could start their work).
  • We created a lean Python webserver that exposes our API from our specification.
  • Lastly, to actually run this, we coded a trivial stub implementation of our API.

The last thing we need to do is to connect our API routes to our controller functions. This is easily done by enhancing our swagger.yml slightly (adding the path to our controllers as operationId):

swagger: "2.0"

info:
  description: 'a simple room booking service'
  version: 1.0.0
  title: room booking service

basePath: /v1

schemes:
  - http
consumes:
  - application/json
produces:
  - application/json

paths:
  /health:
    get:
      operationId: api.tech_controller.health.get
      responses:
        '200':
          description: 'get service health'
  /rooms:
    get:
      operationId: api.business_controller.rooms.get_room
      responses:
        '200':
          description: 'get all rooms + meta data'
  /roomdetails:
    get:
      operationId: api.business_controller.rooms.get_room_details
      responses:
        '200':
          description: 'get details for a room'

Explore and verify our API-first mock service

Now let’s enjoy our service:

~/DEV/wellroom-room$ pip3 install -r requirements.txt
~/DEV/wellroom-room$ python3 app.py

We could call our API now at http://localhost:3000/v1/health or http://localhost:3000/v1/rooms with a tool like Postman or Insomnia, but we can also browse to http://localhost:3000/v1/ui/ and explore the API via the built-in swagger-ui in a browser.
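For a quick scripted check, a few lines of Python do the same (a minimal sketch – it assumes the requests package is installed and the service from above is running locally):

import requests

# assumes the service is running locally on port 3000 (see app.py above)
print(requests.get('http://localhost:3000/v1/health').json())  # 'ok'
print(requests.get('http://localhost:3000/v1/rooms').json())   # ['Kallista', 'Io']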

Wrap up and what we didn’t cover

We learned about the benefits of API first and saw one way to approach the topic. We got a short glimpse of what Python with connexion can do for us. From a tooling standpoint, we saw tools to create OpenAPI specifications.
The two main things we are missing at this point are the security component and quality assurance (e.g. testing + linting). The first part is a topic of its own, but one of the great benefits of using connexion is the native integration of standard mechanisms like OAuth2 and X-API-KEY headers – a tiny sketch of the API-key part follows below. For testing and linting there are great blogs, books and videos out there… 😉
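To hint at how the API-key mechanism can look (a rough sketch, not part of our example above: the x-apikeyInfoFunc hook is connexion’s documented way to wire up API keys, while the auth module, function name and demo key here are purely hypothetical – check the connexion security docs for your version):

# swagger.yml (sketch): declare an API-key scheme and point it to a Python function
#
#   securityDefinitions:
#     api_key:
#       type: apiKey
#       name: X-API-KEY
#       in: header
#       x-apikeyInfoFunc: api.tech_controller.auth.apikey_info
#   security:
#     - api_key: []

# api/tech_controller/auth.py (hypothetical module)
def apikey_info(apikey, required_scopes=None):
    # return a token info dict for valid keys; returning None makes connexion reject the request
    if apikey == 'demo-key':
        return {'sub': 'demo-user'}
    return None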

Follow up

This blogpost is part of a series.

  • The next post is going to focus on how to integrate such a microservice in an enterprise ecosystem:

    • Dockerize the application,
    • use Azure Container Registry to store our application,
    • create and access an Azure Kubernetes Service and finally
    • deploy our service in this Kubernetes cluster.
  • The third post shows how to establish continuous delivery and/or continuous deployment for our room service:

    • leverage Azure DevOps Build and Release Pipelines for our service and
    • integrate our Kubernetes cluster as an environment in Azure DevOps.

R is a powerful tool for data scientists but a huge pain for software developers and system administrators.
It’s hard to automate and integrate into other systems. That’s because it was developed mostly in academic environments for ad-hoc analytics rather than in IT departments for business processes.
Just as every meal gets better if you gratinate it with cheese, every command-line tool gets better if you wrap it in a PowerShell cmdlet.

PSRTools

You can do a lot with the R command-line tool Rscript.exe, but the developers of R seem to have a strange understanding of concepts like standard streams and return codes. So I created the PowerShell module PSRTools, which provides cmdlets for the basic tasks needed in CI/CD pipelines.

If you have build dependencies, you have to install an R package from a repository. You can specify a snapshot to get reproducible builds. An example:

Install-RPackage -Name 'devtools' -Repository 'https://mran.microsoft.com/' -Snapshot '2019-02-01'

In case you have the package on disk, you can install it with the Path parameter.
In both cases you can specify the library path to install it to a specific location.

Install-RPackage -Path '.\devtools.tar.gz' -Library 'C:\temp\build\library'

If you have inline documentation in your R functions, you have to generate Rd files for the package. The cmdlet New-RDocumentation does that.
The build itself can be executed using New-RPackage. Both cmdlets take the parameters Path for the project location and Library for the required R packages.

Build Script

The build script depends a little on your solution. It could be something like:


param (
    $Project,
    $StagingDirectory
)

$Repository = 'https://mran.microsoft.com/'
$Snapshot = '2019-02-01'
$Library = New-Item `
    -ItemType Directory `
    -Path ( Join-Path (
        [System.IO.Path]::GetTempPath()
    ) (
        [System.Guid]::NewGuid().ToString()
    ))

Import-Module PSRTools
Set-RScriptPath 'C:\Program Files\Microsoft\R Open\R-3.5.2\bin\x64\Rscript.exe'

Install-RPackage -Name 'devtools' -Repository $Repository -Snapshot $Snapshot -Library $Library
Install-RPackage -Name 'roxygen2' -Repository $Repository -Snapshot $Snapshot -Library $Library

New-RDocumentation -Path $Project -Library $Library
$Package = New-RPackage -Path $Project -Library $Library
Copy-Item -Path $Package -Destination $StagingDirectory

For build scripts I recommend InvokeBuild, but I wanted a simple example here.

Continuous Integration

To achieve automated builds, you have to integrate the build script into a CI system.
I tested it successfully in AppVeyor and Azure Pipelines. Both work pretty much the same.

If you don’t have your own build agent that can be prepared manually, you need to install R and RTools in your build script.
That can be done using Chocolatey.

In your appveyor.yml it looks like this:

install:
  - choco install microsoft-r-open
  - choco install rtools
build_script:
  - ps: .\Build.ps1 -Project $env:APPVEYOR_BUILD_FOLDER -StagingDirectory "$env:APPVEYOR_BUILD_FOLDER\bin"

In your azure-pipelines.yml it looks like this:

steps:
- script: |
    choco install microsoft-r-open
    choco install rtools
  displayName: Install build dependencies
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      .\Build.ps1 -Project '$(Build.SourcesDirectory)' -StagingDirectory '$(Build.ArtifactStagingDirectory)'

Not that terrible anymore, right?

How does automated deployment of data warehouse systems work? In projects following a DevOps approach, continuous integration and continuous deployment play an important role for working efficiently. The first step towards continuous deployment is deployment automation, and data warehouse systems are no exception. One prerequisite for this is version control of the sources. Every version can then automatically be turned into a release, which includes creating an installation file from the sources. Deployment then means updating a system environment (production system or test system). This simplifies quality assurance in particular, which in turn improves the changeability of the system and ultimately raises its quality.

CI and CD

System Definition and Versioning

For SQL Server-based data warehouse systems, the system definition consists of Visual Studio projects. There are Visual Studio project types for SQL databases, Integration Services, Reporting Services and Analysis Services. However, that is not yet sufficient for automated deployment of complete solutions. Project types for SQL Server Agent jobs and linked servers are missing. Integration Services packages frequently use PowerShell scripts for file management and dummy files for package validation; there is no standard solution for that either. Reporting Services projects do not support report subscriptions with data-driven subscriptions, which are often used for automated exports of XLSX or PDF files. A further problem with these project types is that there is no suitable way to define how the projects relate to each other, so that a complete solution can be deployed and the target-environment-specific reports can be linked to the databases and ETL processes.
These shortcomings have to be compensated with additional scripts and configuration files. A build script creates a release archive with all files required for deployment. A deployment script takes the release archive and the name of the target environment as parameters. A deployment configuration contains the information on how the components are linked. An environment configuration contains the environment-specific parameters such as host names and user names.

DWH Release

Implementation

There are various ways to implement this approach. XML files have proven useful for hierarchical configurations because of their good readability and their support for internal variables, which e.g. JSON does not offer. Tabular configurations such as environment-specific parameters can be stored well in CSV files, since they are easy to process and easy to maintain in Excel.
The build and deployment steps can be implemented efficiently in PowerShell. Graphical tools such as Microsoft Azure Pipelines or CA Application Release Automation are very cumbersome for fine-grained deployment steps and are at best suitable for executing deployment scripts.

DevOps is becoming more and more widespread in IT projects. Whether in a start-up, at a mid-sized company or in a corporate environment, running projects in DevOps mode can bring tangible benefits that vary slightly from case to case. It may be structuring the processes of small projects where separating development, operations and quality assurance was not economical. Or it may be large projects in which planning, communication and approvals led to long time spans between requirement and go-live and thus became a project risk. In general, quality, speed and the social climate are supposed to improve. BI projects often play a special role here, but that does not have to be the case.

Steffen Kampmann of Tekaris GmbH gave a talk on this topic at PASS Bayern. The talk explains the central concepts and methods of DevOps, the specifics of BI projects, the technical approaches for continuous deployment with Microsoft SQL Server together with their everyday problems, and gives a field report.

The slides have also been published in the GitHub repository of PASS Bayern.