Posts

R is a powerful tool for data scientists but a huge pain for software developers and system administrators.
It's hard to automate and integrate into other systems, because it was developed mostly in academic environments for ad-hoc analytics rather than in IT departments for business processes.
And just as every meal gets better if you gratinate it with cheese, every command line tool gets better if you wrap it in a PowerShell cmdlet.

PSRTools

You can do a lot with the R command line tool Rscript.exe, but the developers of R seem to have a strange understanding of concepts like standard streams and return codes. So I created the PowerShell module PSRTools, which provides cmdlets for basic tasks needed in CI/CD pipelines.

If you have build dependencies, you have to install an R package from a repository. You can specify a snapshot to get reproducible builds. An example is:

Install-RPackage -Name 'devtools' -Repository 'https://mran.microsoft.com/' -Snapshot '2019-02-01'

If you have the package on disk, you can install it with the Path parameter.
In both cases you can specify the Library parameter to install it into a specific location.

Install-RPackage -Path '.\devtools.tar.gz' -Library 'C:\temp\build\library'

If you have inline documentation in your R functions, you have to generate Rd files for the package. The cmdlet New-RDocumentation does that.
The build itself is executed with New-RPackage. Both cmdlets take the parameter Path for the project location and Library for the required R packages.
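For example, with a hypothetical project folder .\MyRProject and the library path from above:

New-RDocumentation -Path '.\MyRProject' -Library 'C:\temp\build\library'
$Package = New-RPackage -Path '.\MyRProject' -Library 'C:\temp\build\library'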

Build Script

The build script depends a little on your solution. It could look something like this:


param (
    $Project,
    $StagingDirectory
)

$Repository = 'https://mran.microsoft.com/'
$Snapshot = '2019-02-01'
# Create a temporary directory that serves as R library for this build
$Library = New-Item `
    -ItemType Directory `
    -Path ( Join-Path (
        [System.IO.Path]::GetTempPath()
    ) (
        [System.Guid]::NewGuid().ToString()
    ))

# Import PSRTools and point it to the Rscript.exe of the local R installation
Import-Module PSRTools
Set-RScriptPath 'C:\Program Files\Microsoft\R Open\R-3.5.2\bin\x64\Rscript.exe'

# Install the build dependencies into the temporary library
Install-RPackage -Name 'devtools' -Repository $Repository -Snapshot $Snapshot -Library $Library
Install-RPackage -Name 'roxygen2' -Repository $Repository -Snapshot $Snapshot -Library $Library

# Generate the Rd files, build the package and copy it to the staging directory
New-RDocumentation -Path $Project -Library $Library
$Package = New-RPackage -Path $Project -Library $Library
Copy-Item -Path $Package -Destination $StagingDirectory

For build scripts I recommend InvokeBuild, but I wanted to keep the example simple.
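For illustration, the same build could be sketched as InvokeBuild tasks like this (the task names are made up; repository, snapshot and Rscript path are the same assumptions as above):

param (
    $Project,
    $StagingDirectory
)

$Repository = 'https://mran.microsoft.com/'
$Snapshot = '2019-02-01'
$Library = New-Item -ItemType Directory -Path (
    Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
)

Import-Module PSRTools
Set-RScriptPath 'C:\Program Files\Microsoft\R Open\R-3.5.2\bin\x64\Rscript.exe'

# Install the R build dependencies into the temporary library
task InstallDependencies {
    Install-RPackage -Name 'devtools' -Repository $Repository -Snapshot $Snapshot -Library $Library
    Install-RPackage -Name 'roxygen2' -Repository $Repository -Snapshot $Snapshot -Library $Library
}

# Generate the Rd files and build the package
task Build InstallDependencies, {
    New-RDocumentation -Path $Project -Library $Library
    $script:Package = New-RPackage -Path $Project -Library $Library
}

# Copy the built package to the staging directory
task Stage Build, {
    Copy-Item -Path $script:Package -Destination $StagingDirectory
}

# Default task
task . Stage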

Continuous Integration

To achieve automated builds, you have to integrate the build script into a CI pipeline.
I tested it successfully on AppVeyor and Azure Pipelines. Both work pretty much the same.

If you don't have your own build agent that can be prepared manually, you need to install R and RTools in your build script.
That can be done using Chocolatey.

In your appveyor.yml it looks like this:

install:
  - choco install microsoft-r-open
  - choco install rtools
build_script:
  - ps: .\Build.ps1 -Project $env:APPVEYOR_BUILD_FOLDER -StagingDirectory "$env:APPVEYOR_BUILD_FOLDER\bin"

In your azure-pipelines.yml it looks like this:

steps:
- script: |
    choco install microsoft-r-open
    choco install rtools
  displayName: Install build dependencies
- task: PowerShell@2
  inputs:
    targetType: 'inline'
    script: |
      .\Build.ps1 -Project '$(Build.SourcesDirectory)' -StagingDirectory '$(Build.ArtifactStagingDirectory)'

Not that terrible anymore, right?

I recently had the task to automate a program with a COM interface and integrate it into a database application.
I had already used PowerShell to automate Docker, SqlPackage and others.
So my first thought was to use PowerShell in this case too, but due to the complexity of the task I decided against it.
I ended up with a C# solution of around 300 classes and I'm happy with it, but it raised the question: what are good criteria for choosing between PowerShell and something else?

Basically, PowerShell is a nice hammer, but not every problem is a nail.
And since almost every programming language is Turing complete, you can solve every problem in every language, but each has its pros and cons.
This is especially true for .NET-based languages like C#, F# and PowerShell, since they share the same libraries.
So you can develop a nice graphical user interface in PowerShell using Windows Presentation Foundation, even though it was originally designed for C# applications.
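For example, a minimal WPF window straight from PowerShell (a sketch; title and content are arbitrary):

Add-Type -AssemblyName PresentationFramework

$Window = New-Object System.Windows.Window
$Window.Title = 'Hello from PowerShell'
$Window.Width = 300
$Window.Height = 150
$Window.Content = New-Object System.Windows.Controls.TextBlock
$Window.Content.Text = 'WPF without C#'
$Window.ShowDialog() | Out-Null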

There are many problems out there that you can solve in PowerShell with less code than in other languages, which makes it faster to write and easier to maintain.
But now I will show you some cases where PowerShell is a little painful and other languages are a better choice.

Inheritance and Polymorphism

PowerShell is object-oriented, and where there is object orientation, polymorphism is not far away. So you define interfaces and maybe several implementations of them.
But since PowerShell is an interpreted language, there is no type or interface checking before runtime.
By default, everything is of type Object, and you only see whether a method is available when you execute the code.
You can assert types, but you don't have to.
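A quick sketch of the difference:

$a = 'forty-two'        # no type constraint, $a is simply a string
[int] $b = 42           # type constraint, later assignments are validated
[int] $c = 'forty-two'  # fails, but only at runtime: cannot convert value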

There are different ways to get new objects in PowerShell.
Often they are created by cmdlets that are written in C#, like Get-Process or New-Item.
Another common option is to create a custom object using New-Object -TypeName PSCustomObject -Property @{ 'Foo' = 'Bar' }.
That creates a generic object that can be extended with any property or method at runtime.
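For example, extending such an object at runtime with Add-Member:

$Object = New-Object -TypeName PSCustomObject -Property @{ 'Foo' = 'Bar' }

# Attach a property and a method at runtime
$Object | Add-Member -MemberType NoteProperty -Name 'Answer' -Value 42
$Object | Add-Member -MemberType ScriptMethod -Name 'Describe' -Value { "$($this.Foo): $($this.Answer)" }
$Object.Describe()   # returns 'Bar: 42'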
Another option is to create the object in PowerShell but write the class definition in C#.
You can do that with existing .NET libraries or even at runtime:
store the C# code in a variable and add the classes with Add-Type.
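A minimal sketch of that (the Greeter class is made up):

$Source = @"
public class Greeter
{
    public string Name { get; set; }
    public string Greet() { return "Hello, " + Name + "!"; }
}
"@
Add-Type -TypeDefinition $Source

$Greeter = New-Object Greeter -Property @{ 'Name' = 'PowerShell' }
$Greeter.Greet()   # returns 'Hello, PowerShell!'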
Those were the options in PowerShell version 4, but in version 5 classes were introduced.

All of those methods have their reason.
Let me explain that with some questions:

  • Why would you want to create a PowerShell class, if you can use a PSCustomObject?
  • Why would you want to create a PSCustomObject, if you can use a hashtable?

Cmdlets are the default if you use existing PowerShell modules.
Hashtables are the default if you need custom attributes in an object.
But if you want to pass data to existing cmdlets, for example to write it to CSV files, then the easiest way is a PSCustomObject.
If you write your own functions that require parameters with specific properties and methods, then it's better to define a class, which can be validated easily.
The next level of complexity is a function whose parameters may be of one or another type with the same interface.
Then you start thinking about abstract methods and reuse of code between these classes.
Here C# supports more expressions to simplify the code, and compile-time validation improves the quality.
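A small sketch of the class case (all names are made up):

class Connection {
    [string] $HostName
    [int] $Port
}

function Connect-Server {
    param (
        [Connection] $Connection   # anything but a Connection fails at call time
    )
    "Connecting to $($Connection.HostName):$($Connection.Port)"
}

Connect-Server -Connection ([Connection] @{ 'HostName' = 'example.org'; 'Port' = 443 })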

So if you start to create inherited classes in PowerShell, you have probably gone too far.
Maybe it's better to create the PowerShell module in C#, or a .DLL in C# that you include in your PowerShell code.

Concurrency and Parallel Computing

Since PowerShell supports using the System.Threading library of .NET, you can do multicore computation in PowerShell.
In some cases this is not even a bad idea.
A common case where PowerShell is used is the automation and integration of other tools:
run a compiler, call a web service and so on. These tools may produce output that you want to process while the tool is still running.
Sometimes you have to do that, because otherwise the output buffer overflows and you don't get the entire output.
In that case you can define a PowerShell block as a variable and register it as an event handler.
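A sketch of that pattern (ping.exe is just a stand-in for a long-running tool):

$Process = New-Object System.Diagnostics.Process
$Process.StartInfo.FileName = 'ping.exe'
$Process.StartInfo.Arguments = '-n 5 localhost'
$Process.StartInfo.RedirectStandardOutput = $true
$Process.StartInfo.UseShellExecute = $false

# The PowerShell block that handles each output line
$Handler = { Write-Host "OUT: $($EventArgs.Data)" }
Register-ObjectEvent -InputObject $Process -EventName 'OutputDataReceived' -Action $Handler | Out-Null

$Process.Start() | Out-Null
$Process.BeginOutputReadLine()
$Process.WaitForExit()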
But there are other cases, where more parallel processes need to be synchronized somehow and may communicate with each other. Then C# or F# have better expressions to manage asynchronous calls.

Analyzing the Windows event log for errors and patterns is only possible to a limited extent with the Event Viewer, and it is very tedious. A simple alternative is to export the events to CSV with PowerShell and to import and analyze them in Power BI.
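The export itself is short (log name and event count are arbitrary):

Get-WinEvent -LogName 'Application' -MaxEvents 10000 |
    Select-Object TimeCreated, Id, LevelDisplayName, ProviderName, Message |
    Export-Csv -Path 'events.csv' -NoTypeInformation -Encoding UTF8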

PowerShell scripts can become complex, depending on the use case. That makes automated tests important for robustness and quality. The PowerShell module Pester has proven itself for this and leaves little to be desired.
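A minimal Pester test as a sketch (the function under test is made up):

Describe 'Get-Greeting' {
    It 'greets by name' {
        Get-Greeting -Name 'World' | Should -Be 'Hello, World!'
    }
}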

But PowerShell is also often used to integrate different systems or services, for example for file exchange or deployment. Mocking functions quickly reaches its limits there. Tests against existing systems are not always possible or are hard to isolate.

One possible solution is to use Docker containers. In the test setup, the required services and servers can be instantiated as containers, and in the tear-down they are simply deleted again. For better integration of PowerShell and Docker I developed PSDocker.
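A sketch of the pattern with the plain docker CLI (PSDocker wraps this in cmdlets; the container name and the test are made up):

Describe 'Publish-FileToServer' {
    BeforeAll {
        docker run --detach --name 'test-server' --publish 8080:80 nginx | Out-Null
    }
    AfterAll {
        docker rm --force 'test-server' | Out-Null
    }
    It 'uploads a file to the server' {
        # ... run the function under test against http://localhost:8080 ...
    }
}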

Since PowerShell Core supports Linux, you might want to test whether your PowerShell module works on Linux as it does on Windows. You could set up a virtual machine, which is a little time consuming and cumbersome. You could use a continuous integration service like AppVeyor, which does not allow debugging when a test fails.
Another option would be Docker. But anyway, I tested the Windows Subsystem for Linux (WSL) and it's pretty good!