0 comments

Configure Windows Logon With An Electronic Identity Card (EID)

Published on Wednesday, October 22, 2014 in , , ,

Here in Belgium people have been receiving an Electronic Identity Card (EID) for years now. Every once in a while I have a customer who asks me whether this card can be used to logon to workstations. That would mean a form of strong authentication is applied. The post below will describe the necessary steps in order to make this possible. It has been written using a Belgian EID and the Windows Technical Preview (Threshold) for both client and server.

In my lab I kept the infrastructure to a bare minimum.

  • WS10-DC: domain controller for threshold.local
  • WS10-CA2: certificate authority (enterprise CA)
  • W10-Client: client

The Domain Controller(s) Configuration

Domain Controller Certificate:

You might wonder why I included a certificate authority in this demo. Users will logon using their EID and those cards come with certificates installed that have nothing to do with your internal PKI. However, in order for domain controllers to be able to authenticate users with a smart card, they should have a valid certificate as well. If you fail to complete this requirement, your users will receive an error:

image

In words: Signing in with a smart card isn’t supported for your account. For more info, contact your administrator.

And your domain controllers will log these errors:

ErrorClientClue1

In words: The Key Distribution Center (KDC) cannot find a suitable certificate to use for smart card logons, or the KDC certificate could not be verified. Smart card logon may not function correctly if this problem is not resolved. To correct this problem, either verify the existing KDC certificate using certutil.exe or enroll for a new KDC certificate.

And

ErrorClientClue2

In words: This event indicates an attempt was made to use smartcard logon, but the KDC is unable to use the PKINIT protocol because it is missing a suitable certificate.

In order to give the domain controller a certificate, that can be used to authenticate users using a smart card, we will leverage the Active Directory Certificate Services (AD CS) role on the WS10-CA2 server. This server is installed as an enterprise CA using more or less default values. Once the ADCS role is installed, your domain controller should automatically request a certificate based upon the “Domain Controller” certificate. This is a V1 template. A domain controller is more or less hardcoded to automatically request a certificate based upon this template.

image 

In my lab this certificate was good enough to let my users authenticate using his EID. After restarting the KDC service and performing the first authentication, the following event was logged though:

image

In words: The Key Distribution Center (KDC) uses a certificate without KDC Extended Key Usage (EKU) which can result in authentication failures for device certificate logon and smart card logon from non-domain-joined devices. Enrollment of a KDC certificate with KDC EKU (Kerberos Authentication template) is required to remove this warning.

Besides the Domain Controller template there’s also the more recent Domain Controller Authentication and Kerberos Authentication templates which depend on auto-enrollment to be configured.

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies

image

After waiting a bit, gpupdate and/or certutil –pulse might speed things up a bit, we got our new certificates:

image

You can see that the original domain controller certificate is gone and replaced by its more recent counterparts. After testing we can confirm that the warning is no longer logged in the event log. We have now covered the certificate the domain controller requires, we’ll need to add a few more settings on the domain controllers for EID logons to work.

Domain Controller Settings

Below HKLM\SYSTEM\CurrentControlSet\Services\Kdc we’ll create two registry keys:

  • DWORD SCLogonEKUNotRequired 1
  • DWORD UseCachedCRLOnlyAndIgnoreRevocationUnknownErrors 1

Strictly spoken, the last one shouldn’t be necessary if your domain controller can reach the internet, or at least the URL where the CRL’s used in the EIDs, are hosted. If you use this registry key, make sure to remove a name mapping (more on that later) or disable the user when the EID is stolen or lost. An easy way to push these registry key is using group policy preferences.

Domain Controller Trusted Certificate Authorities

In order for the domain controller to accept the EID of the user, the domain controller has to trust the full path in the issued certificate. Here’s my EID as an example:

image

We’ll add the Belgium Root CA2 certificate to the Trusted Root Certificate Authorities on the domain controller:

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies > Trusted Root Certification Authorities

image

And the Citizen CA to the Trusted Intermediate Certificate Authorities on the domain controller:

Computer Configuration > Policies > Windows Settings > Security Settings > Public Key Policies > Intermediate Certification Authorities

image

Now this is where the first drawback from using EIDs as smartcards comes: there are many Citizen CA’s to add and trust… Each month, sometimes more, sometimes less, a new Citizen CA is issued and used to sign new EID certificates. You can find them all here: http://certs.eid.belgium.be/ So instead of using a GPO to distribute them, scripting a regular download and adding them to the local certificate stores might be a better approach.

The Client Configuration

Settings

For starters we’ll configure the following registry keys:

Below HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters we’ll create two registry keys:

  • DWORD CRLTimeoutPeriod 1
  • DWORD UseCachedCRLOnlyAndIgnoreRevocationUnknownErrors 1

Again, if your client is capable of reaching the internet you should not need these. I have to admit that I’m not entirely sure how the client will react when a forward proxy is in use. After all, the SYSTEM doesn’t always know what proxy to use and it might be requiring to authenticate.

Besides the registry keys, there’s also some regular group policy settings to configure. In some articles you’ll probably see these settings also being pushed out as registry keys, but I prefer to use the “proper” settings as they are available anyhow.

Computer Settings > Policies > Administrative Templates > Windows Components > Smart Cards

  • Allow certificates with no extended key usage certificate attribute: Enabled
    • This policy setting lets you allow certificates without an Extended Key Usage (EKU) set to be used for logon.
  • Allow signature keys valid for Logon: Enabled
    • This policy setting lets you allow signature key-based certificates to be enumerated and available for logon.

These two are required so that the EID certificate can be used. As you can see it has a usage attribute of Digital Signature

UsageAttr

In some other guides you might also find these Smart Card settings enabled:

  • Force the reading of all certificates on the smart card
  • Turn on certificate propagation from smart card
  • Turn on root certificate propagation from smart card

But my tests worked fine without these.

Drivers

Out of the box Windows will not be able to use your EID. If you don’t install the required drivers you’ll get an error like this:

SmartCardErrorNoDrivers

You can download the drivers from here: eid.belgium.be On the Windows 10 preview I got an error during the installation. But that probably had to do with the EID viewer software. The drivers seem to function just fine.

EidDriverError

Active DIrectory User Configuration

As these certificates are issued by the government, they don’t contain any specific information that allows Active Directory to find out to which user should be authenticated. In order to resolve that we can add a name mapping to a user. And this is the second drawback. If you want to put EID authentication in place you’ll have to have some sort of process or tool that allows users to link their EID to their Active Directory User Account. The helpdesk could do this for them or you could write a custom tool that allows users to do it themselves.

In order to do it manually:

First we need the certificate from the EID. You can use Internet Explorer > Internet Options > Content > Certificates

EID

You should see two certificates. The one you want is the one with Authentication in the Issued To. Use the Export… button to save it to a file.

Open Active Directory Users and Computers > View > Advanced Features

NameMapping1

Locate the user the EID belongs too > Right-Click > Name Mappings…

NameMapping2

Add an X.509 Certificate

NameMapping3

Browse to a copy of the Authentication smart card which can be found on the EID

NameMapping35

Click OK

NameMapping4

Testing the authentication

You should now be able to logon to a workstation with the given EID. Either by clicking other user and clicking the smart card icon

EIDLogon

Or if the client has remembered you from earlier logons you can choose smart card below that entry.

EIDLogon2

An easy way to see if a user logged on using smart card or username/password is the query for the user his group memberships on the client. When users  log on with a smart card they get the This organization certificate group SID added to their logon token. This is a well-known group (S-1-5-65-1) that was introduced with Windows 7/ Windows 2008 R2.

ThisOrgCertWhoami

Forcing smart card authentication

Now all of the above allows a user to authenticate using smart cards, but it doesn’t forces the user to do it. Username password will still be accepted by the workstations. If you want to force smart card logon there are two possibilities. Each with their own drawbacks.

1. On the user level:

There’s a property Smart card is required for interactive logon that you can check on the user object in Active Directory. Once this is checked, the users will only be able to logon using a smart card. There’s one major drawback though. Once you click apply, at the same time this will set the password of that user to a random value and password policies will no longer apply for that user. That means that if you got some applications that are integrated with Active Directory, but do so by asking credentials in a username/password form, your user will not be able to logon as they don’t know the password… If you configure this setting on the user you have to make sure all applications are available through Kerberos/NTLM SSO. If you were to use Exchange Active Sync, you would have to change the authentication scheme from username/password to certificate based for instance. So I’m not really sure enforcing this at the user level is a real option. This option seems more feasible for protection high privilege accounts.

RequireSm

2. On the workstation level:

There’s a group policy setting that can be configured on the computer level that enforces all interactive logons to require a smart card. It can be found under computer settings > Policies > Windows Settings > Security Settings > Local Policies > Security Options > Interactive logon: Require smart card

 InterActL

While you’re there, also look at Interactive logon: Smart card removal behavior. It allows you to configure a workstation to lock when a smart card is removed. If you configure this one, make sure to also configure the Smart Card Removal Policy service to be started on your clients. This service is stopped and set to manual by default.

Now the bad news. Just like with the first one, there’s also a drawback. This one could be less critical for some organisations, might it might require people to operate in a slightly different way. Once this setting is enabled, all interactive logons require a smart card:

  • Ctrl-alt-del logon like a regular user
  • Remote Desktop to this client
  • Right-click run as administrator (in case the user is not an administrator himself) / run as different user

For instance right-click notepad and choosing run as different user will result in the following error if you try to provide a username/password

AccountRest 

In words: Account restriction are preventing this user from signing in. For example: blank passwords aren’t allowed, sign-in times are limited, or a policy restriction has been enforced. For us this is an issue as our helpdesk often uses remote assistance (built-in the OS) to help users. From time to time they have to provide their administrative account in order to perform certain actions. As the user his smart card is inserted, the helpdesk admin cannot insert his own EID. That would require an additional smart card reader. And besides that: a lot of the helpdesk tasks are done remotely and that means the EID is in the wrong client… There seem to be third party solutions that tackle this particular issue: redirecting smart cards to the pc you’re offering remote assistance.

Now there’s a possible workaround for this. The policy we configure in fact sets the following registry value to 1:

MACHINE\Software\Microsoft\Windows\CurrentVersion\Policies\System\ScForceOption

Using remote registry you could change it to 0 and then perform your run as different again. Changing it to 0 immediately sets the Interactive logon: Require smart card to disabled. Effective immediately. Obviously this isn’t quite elegant, but you could create a small script/utility for it…

Administrative Accounts (or how to link a smart card to two users)

If you would use the same certificate (EID) in the name mapping of two users in Active Directory, your user will fail to login:

DoubleMapping

In words: Your credentials could not be verified. The reason is quite simple. Your workstation is presenting a certificate to Active Directory, but Active Directory has two principals (users) that map to that certificate. Now which user does the workstation want?

Name hints to the rescue! Let’s add the following GPO setting to our clients:

Computer Settings > Policies > Administrative Templates > Windows Components > Smart Cards

  • Allow user name hint: enabled

After enabling this setting there’s an optional field called Username hint below the prompt for the PIN.

Hint

In this username hint field the person trying to logon using a smart card can specify which AccountName to be used. In the following example I’ll be logging on with my thomas_admin account:

HintAdmin

NTauth Certificate Store

Whenever you read into the smart card logon subject you’ll see the NTauth certificate store being mentioned from time to time. It seems to be involved in some way, but it’s still not clear to me. All I can say is that in my setup, using an AD integrated CA for the Domain Controller certificates, I did not had to configure/add any certificates to the NTauth store. Not the Belgian Root CA, Not the Citizen CA. My internal CA was in it of course.

I did some tests, and to my experience, the CA that issued your domain controllers certificate has to be in the NTAuth store on both clients and domain controllers. If you would remove that certificate you’ll be greeted with an error like this:

NtAuth

In words: Signin in with a smart card isn’t supported for your account. For more info, contact your administrator. And on the domain controller the same errors are logged like the ones from the beginning of this article.

Some useful commands to manipulate the NTauth store locally on a client/server:

  • Add a certificate manually: certutil -enterprise -addstore ntAuth .\ThresholdCA.cer
  • View the store: certutil -enterprise -viewstore ntAuth
  • Delete a certificate: certutil -enterprise -viewdelstore ntAuth

Keep in mind that the NTauth store exists both locally on the client/servers and in Active Directory. An easy way to view/manipulate the NTauth store in Active Directory is the pkview.msc management console which you typically find on a CA. Right-click the root and choose manage AD containers to view the store.

A second important fact regarding the NTauth store. Whilst you might see the require CA certificate in the store in AD, your clients and servers will only download the content of the AD NTauth store IF they have auto-enrollment configured!

Summary:

There are definitely some drawbacks to using EID in a corporate environment:

  • No management software to link the certificates to the AD users. Yes there’s active directory users and computers, but you’ll have to ask the users to either come visit your helpdesk or email their certificate. Depending on the number of users in your organisation this might be a hell of a task. A custom tool might be a way to solve this.
  • Regular maintenance: as described, quite regular a new Citizen CA (Subordinate Certificate Authority) is issued. You need to ensure your domain controllers have this CA in their trusted intermediate authorities store. This can be done through GPO, but this particular setting seems hard to automated. You might be better off with a script that performs this task directly on your domain controllers.
  • Helpdesk users will have to face the complexity if the require a smart card setting is enabled.
  • If an EID is stolen/lost you might have to temporary allow normal logons for that user. An alternative is to have a batch of smart cards that you can issue yourself. An example vendor for such smart cards is Gemalto.
  • An other point that I didn’t had to chance to test though. What about the password of the users. If they can’t use it to logon, but the regular password policies still apply, how will they be notified of the expiration? Or even better, how will they change it? Some applications might depend on the username/password to logon.

As always, feedback is welcome!

0 comments

Error Loading Direct Access Configuration

Published on Wednesday, October 1, 2014 in ,

This morning I wanted to have a quick look at our Direct Access infrastructure and when opening the console I got greeted with various errors all explaining that there was a configuration load error:

Capture2

In words: ICMP settings for entry point cannot be determined. Or:

Capture

In words: Settings for entry point Load Balanced Cluster cannot be retrieved. The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Because initially I only stumbled upon the ICMP settings … error I had to dig a bit deeper. I searched online on how to enable additional tracing capabilities, but I couldn’t find anything. From reverse engineering the RaMmgmtUI.exe I could see that more than enough tracing was available. Here’s how to enable it:

EnableTrace

Create a REG_DWORD called DebugFlag below HKLM\SYSTEM\CurrentControlSet\Services\RaMgmtSvc\Parameters. For our purpose we’ll give it a value of 8. For an overview of the possible values:

TraceLevels

I’m not sure if you can combine those in some way. After finding this registry key, I was able to find the official article on how do this: TechNet: Troubleshooting DirectAccess. I Should have looked a bit better for that information perhaps… After closing the Remote Access Management Console and opening it again, the log file was being filled up:

Tracing

You can find the trace file in c:\Windows\Tracing and it’s called RaMgmtUIMon.txt After opening the file I stumbled across the following error:

2112, 1: 2014-09-30 11:51:43.116 Instrumentation: [RaGlobalConfiguration.AsyncRefresh()] Exit
2112, 12: 2014-09-30 11:51:43.241 ERROR: The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.
2112, 9: 2014-09-30 11:51:43.242 Failed to run Get-CimInstance

I then used PowerShell to try to do the same: connect to the other DA node using WinRM:

WinRMerr

The command: winrm get winrm/config –r:HOSTNAME The error:

WSManFault
    Message = The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Error number:  -2144108297 0x803380F7
The WinRM client cannot process the request. It cannot determine the content type of the HTTP response from the destination computer. The content type is absent or invalid.

Googling on the error number 2144108297 quickly got me to the following articles:

Basically I was running into this issue because my AD user account was member of a large amount of groups. The MaxTokenSize has been raised in Windows 2012 (R2) so that’s already covered, but winhttp.sys, which WinRM depends on, hasn’t. When running into Kerberos token bloat issues on web applications, typically the MaxRequestBytes and MaxFieldLength values have to be tweaked a bit.

RegHTTP

There are various ways to configure these. Using GPO or a manual .reg file that you can just double click:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters]

"MaxRequestBytes"=dword:0000a000

"MaxFieldLength"=dword:0000ff37

In my environment I’ve set MaxRequestBytes to 40960 and MaxFieldLength to 65335. But I am by no means saying those are the advised values. It’s advised to start with a lower value and slightly increase until you’re good to go.

Conclusion: if you run into any of the above errors when using the Direct Access management console, make sure to check whether WinRM is happy. In my case WinRM was in trouble due to the size of my token.

0 comments

S.DS.AM GetAuthorizationGroups() Fails on Windows 2008 R2/WIN7

Published on Thursday, September 25, 2014 in ,

Today I got a call from a colleague asking me to assist with an issue. His customer had a Windows 2008 R2 server with a custom .NET application on it. The application seemed to stop working from time to time. After a reboot the application seemed to work for a while.

The logging showed a stack trace that started at UserPrincipals.GetAuthorizationGroups and gave the message: An error (1301) occurred while enumerating the groups. The group's SID could not be resolved.

Exception information:
Exception type: PrincipalOperationException
Exception message: An error (1301) occurred while enumerating the groups. 
The group's SID could not be resolved.

at System.DirectoryServices.AccountManagement.SidList.TranslateSids(String target, IntPtr[] pSids)
at System.DirectoryServices.AccountManagement.SidList..ctor(SID_AND_ATTR[] sidAndAttr)
at System.DirectoryServices.AccountManagement.AuthZSet..ctor(Byte[] userSid, NetCred credentials, ContextOptions contextOptions, String flatUserAuthority, StoreCtx userStoreCtx, Object userCtxBase)
at System.DirectoryServices.AccountManagement.ADStoreCtx.GetGroupsMemberOfAZ(Principal p)
at System.DirectoryServices.AccountManagement.UserPrincipal.GetAuthorizationGroups()

The first thing that came to mind was that they deleted some groups and that the application wasn’t properly handling that. But they assured me that was not the case. The only thing they had changed that came to mind was adding a Windows 2012 domain controller.

I could easily reproduce the issue using PowerShell:

Function Get-UserPrincipal($cName, $cContainer, $userName){
    $dsam = "System.DirectoryServices.AccountManagement"
    $rtn = [reflection.assembly]::LoadWithPartialName($dsam)
    $cType = "domain" #context type
    $iType = "SamAccountName"
    $dsamUserPrincipal = "$dsam.userPrincipal" -as [type]
    $principalContext =
        new-object "$dsam.PrincipalContext"($cType,$cName,$cContainer)
    $dsamUserPrincipal::FindByIdentity($principalContext,$iType,$userName)
}

[string]$userName = "thomas"
[string]$cName = "contoso"
[string]$cContainer = "dc=contoso,dc=local"

$userPrincipal = Get-UserPrincipal -userName $userName '
    -cName $cName -cContainer $cContainer

$userPrincipal.getGroups()

Source: Hey Scripting Guy

Some googling led me to:

In short, it seems that when a 2012 domain controller was involved, the GetAuthorizationGroups() function would choke on two new groups (SIDs) that are added to a user by default. Patching the server running the application was enough in order to fix this.

The issue wasn’t really hard to solve as the solution was easy to find online, but I think it’s a great example of the type of application/code to give a special look when you’re testing your AD upgrade.

3 comments

Active Directory: Lsass.exe High CPU Usage

Published on Thursday, September 18, 2014 in , , ,

Recently I had an intriguing issue at one of my customers I frequently visit. They saw their domain controllers having a CPU usage way above what we normally saw. Typically their domain controllers have between 0 – 15% CPU usage whereas now we saw all of their domain controllers running at 80 – 90% during business hours. A quick look showed us that the process which required this much CPU power was lsass.exe. Lsass.exe is responsible for handling all kind of requests towards Active Directory. If you want you can skip to the end to find the cause, but I’ll write this rather lengthy post nevertheless so that others can learn from the steps I took before finding the answer.

Some background: the environment consists of approximately 20.000 users. Technologies in the mix: Windows 7, Windows 2008 R2, SCCM, Exchange, … Our Domain Controllers are virtual, run 2008 R2 x64 SP1, have 4 vCPU’s and 16 GB RAM. RAM is a bit oversized but CPU is really the issue here. There are 8 Domain Controllers.

Here you can see a screenshot of a more or less busy domain controller. It’s not 80% on this screenshot but it’s definitely much more than we are used to see. Task Manager shows us without doubt that lsass.exe is mainly responsible.

 _1 _2

Whenever a domain controller (or server) is busy and you are trying to find the cause, it’s a good idea to start with the built-in tool Perfmon. In the Windows 2003 timeframe articles you’ll see that they mention SPA (Server Performance Advisor) a lot, but for Windows 2008 R2 and up you don’t need this separate download anymore. Its functionality is included in Perfmon. Just go to Start > Run > Perfmon

_3

Open up Data Collector Sets > System and right-click Active Directory Diagnostics > Start. By default it will collect data for 5’ and then it will compile a nice HTML report for you. And that’s where things went wrong for me. The compiling seemed to take a lot of time (20-30 minutes) and after that I ended up with no performance data and no report. I guess the amount of data to be processed was just to much. I found a workaround to get the report though:

While it is compiling: copy the folder mentioned in “Output” to a temporary location. In my example C:\PerfLogs\ADDS\20140909-0001

If the report would fail to compile you will see that the folder will be emptied. However we can try to compile to report for the data we copied by executing the following command:

  • tracerpt *.blg *.etl -df RPT3870.tmp-report report.html -f html

The .tmp file seems to be crucial for the command and it should be present in the folder you copied. Normally you’ll see again that lsass.exe is using a lot of CPU:

1.cpu

The Active Directory section is pretty cool and has a lot of information. For several categories you can see the top x heavy CPU queries/processes.

_4

A sample for LDAP queries, I had to erase quite some information as I try to avoid sharing customer specific details.

2. LDAPCPU

However in our case, for all of the possible tasks/categories, nothing stood out. So the search went on. We took a network trace on a domain controller using the built-in tracing capabilities (Setspn: Network Tracing Awesomeness)

  • Netsh trace start capture=yes
  • Netsh trace stop

In our environment taking a trace of 20 minutes or 1 minute resulted in the same. Due to the large amount of traffic passing by, only 1 minute of tracing data was available in the file. The file was approximately 130 MB.

Using Microsoft Network Monitor (Microsoft.com: Network Monitor 3.4 ) the etl file from the trace can be opened and analyzed. However: 1 minute of tracing contained about 328041 frames (in my case). In order to find a needle in this haystack I used the Top Users expert (Network Monitor plugin): Codeplex: NMTopUsers This plugin will give you an overview of all IP’s that communicated and how many data/packets they exchanged. It’s an easy way to see which IP is communicating more than average. If that’s a server like Exchange that might be legit, but if it’s a client something might be off.

As a starting point I took an IP with many packets:

3. TOPusers

I filtered the trace to only show traffic involving traffic with that IP

  • IPv4.Address == x.y.z.a

4. Trace

Going through the data of that specific client seemed to reveal a lot of TCP 445 (SMB) traffic. However in Network Monitor this was displayed as all kinds of protocols:

  • SMB2
  • LSAD
  • LSAT
  • MSRPC
  • SAMR

As far as I can tell it seems that TCP 445 (SMB) is being used as a way for certain protocols to talk to Active Directory (lsass.exe on the domain controller). When a client logs on, you could explain TCP 445 as group policy objects being downloaded from the SYSVOL share. However in our case that definitely didn’t seem to be case. Browsing a bit through the traffic, it seemed that the LSAT and SAMR messages were more than interesting. So I changed my display filter:

  • IPv4.Address == x.y.z.a AND (ProtocolName == “LSAT” OR ProtocolName == “SAMR”)

image

This resulted in traffic being displayed which seemed pretty easy to read and understand:

image

It seemed that the client was doing a massive amount of LsarLookupNames3 request. A typical request looked like this:

5.TraceDet

Moreover by opening multiple requests I could see that each request was holding a different username to be looked up. Now why would a client be performing queries to AD that seemingly involved all (or a large subset) of our AD user accounts. In my case, in the 60’ seconds trace I had my top client was doing 1015 lookups in 25 seconds. Now that has to count for some load if potentially hundreds of clients are doing this.

As the traffic was identified as SMB, I figured the open files (Computer Management) on the domain controller might show something:

6.Open

Seems like a lot client have a connection to \lsarpc

7.open

And \samr is also popular. Now I got to be honest, both lsarpc and samr were completely unknown to me. I always thought whatever kind of lookups have to be performed against AD are over some kind of LDAP.

After using my favourite search engine I came up with a bit more background information:

It seems LsarOpenPolicy2, LsarLookupNames3, LsarClose are all operations that are performed against a pipe that is available over TCP 445. The LsarLookupNames3 operation seems to resolve usernames to SIDs.

Using the open files tool I tried identifying clients that were currently causing traffic. Keep in mind that there’s also traffic to those pipes that are absolutely valid. In my case I took some random clients and used “netstat –ano | findstr 445” to check if the SMB session remained open for more than a few seconds. Typically this would mean the client was actually hammering our domain controllers.

image

8.Trace

From this trace we can see that the process ID of the process performing the traffic is 33016. Using Task Manager or Process Explorer (SysInternals) you can easily identifiy the actual process behind this ID.

9.PE

We could see that the process was WmiPrvSe.exe and that it’s parent process was Svchost.exe (C:\WINDOWS\system32\svchost.exe -k DcomLaunch). If you see multiple WmiPrvSe.exe processes, don’t be alarmed. Each namespace (e.g. root\cimv2) has it’s own process. Those processes are only running when actual queries are being handled.

10.peHover

In the above screenshot you can see, by hovering over the process, that this instance is responsible for the root\CIMV2 namespace. The screenshot was taken from another problem instance so the PID does not reflect the one I mentioned before.

The properties of the process / The threads tab:

image image

You can clearly see that there’s a thread at the top which draws attention. If we open the stack:

image

Use the copy all button to copy paste in a text file:

RPCRT4.dll!I_RpcTransGetThreadEventThreadOptional+0x2f8

RPCRT4.dll!I_RpcSendReceive+0x28

RPCRT4.dll!NdrSendReceive+0x2b

RPCRT4.dll!NDRCContextBinding+0xec

ADVAPI32.dll!LsaICLookupNames+0x1fc

ADVAPI32.dll!LsaICLookupNames+0xba

ADVAPI32.dll!LsaLookupNames2+0x39

ADVAPI32.dll!SaferCloseLevel+0x2e26

ADVAPI32.dll!SaferCloseLevel+0x2d64

cimwin32.dll!DllGetClassObject+0x5214

framedynos.dll!Provider::ExecuteQuery+0x56

framedynos.dll!CWbemProviderGlue::ExecQueryAsync+0x20b

wmiprvse.exe+0x4ea6

wmiprvse.exe+0x4ceb

..

We can clearly see that this process is the one performing the lookups. Now the problem remains… Who ordered this (WMI) query with the WMI provider? WmiPrvSe is only the executor, not the one who ordered it…

In order to get to the bottom of this I needed to find out who performed the WMI query. So enabling WMI tracing was the way to go. In order to correlate the events, it might be convenient to know when the WmiPrvSe.exe procss was created so that I know that I’m looking at the correct events. I don’t want to be looking at one of the the many SCCM initiated WMI queries! In order to know what time we should be looking for events, we’ll check the Security event log.

Using calc.exe we can easily convert the PID: 33016 to HEX: 80F8. Just set it to programmer mode, enter it having Dec selected and then select Hex

image image

In our environment the security event log has entries for each process that is created. If that’s not the case for you, you can play around with auditpol.exe or update your GPO’s. We can use the find option and enter the HEX code to find the appropriate entry in the security event log. You might find some non related events, but in my case using the HEX code worked out pretty well. The screenshot is from another problem instance, but you get the point.

11.ProcWmiCr

So now all we need is WMI tracing. Luckily with Windows 7 a lot of this stuff is available from with the event viewer: Applications and Services Log > View > Show Analytic And Debug Logs

image

Microsoft > Windows > WMI-Actvitiy > Trace

image

One of the problems with this log is that it fills up quite fast. Especially in an environment where SCCM is active as SCCM relies on WMI extensively. This log will show each query that is handled by WMI. The good part is that it will include the PID (process identifier) of the process requesting the query. The challenge here was to have tracing enabled, make sure this specific log wasn’t full as then it would just drop new events, and have the issue, which we couldn't reproduce, occur… With some hardcore patience and a bit of luck I found an instance pretty fast.

So I correlated the events with the process creation time of the WmiPvrSe.exe process:

First event:

12 Tr1

Followed by:

13 Tv2

Basically it’s a connect to the namespace, execute a query, get some more information and repeat process. Some of the queries:

  • SELECT AddressWidth FROM Win32_Processor
  • Select * from __ClassProviderRegistration
  • select __RELPATH, AddressWidth from Win32_Processor
  • select * from msft_providers where HostProcessIdentifier = 38668
  • SELECT Description FROM Win32_TimeZone

And the last ones:

14TR

15 Tr

GroupOperationId = 260426; OperationId = 260427; Operation = Start IWbemServices::ExecQuery - SELECT Name FROM Win32_UserAccount; ClientMachine = Computer01; User = NT AUTHORITY\SYSTEM; ClientProcessId = 21760; NamespaceName = \\.\root\CIMV2

16 Tr

Now the Select name FROM Win32_UserAccount seems to be the winner here. That one definitely seems to be relevant to our issue. If we open up the MSDN page for WIN32_UserAccount: link there’s actually a warning: Note  Because both the Name and Domain are key properties, enumerating Win32_UserAccount on a large network can negatively affect performance. Calling GetObject or querying for a specific instance has less impact.

Now the good part PID 21760 actually leads to something:

17 PE

The process we found seems to be a service from our antivirus solution:

test

The service we found to be the culprit is the McAfee Product Improvement Program service (TelemetryServer, mctelsvc.exe). Some background information from McAfee: McAfee.com: Product Improvement Program

In theory this is the info they are gathering:

  • Data collected from client system
  • BIOS properties
  • Operating System properties
  • Computer model, manufacturer and total physical memory
  • Computer type (laptop, desktop, or tablet)
  • Processor name and architecture, Operating System architecture (32-bit or 64-bit), number of processor cores, and number of logical processors
  • Disk drive properties such as name, type, size,and description
  • List of all third-party applications (including name, version, and installed date)
  • AV DAT date and version and Engine version
  • McAfee product feature usage data
  • Application errors generated by McAfee Processes
  • Detections made by On-Access Scanner from VirusScan Enterprise logs
  • Error details and status information from McAfee Product Logs

I don’t see any reason why they would need to execute that query… But the event log doesn’t lie. In order to be absolutely sure that this is the query that resulted in such a massive amount of traffic we’ll try to execute the query we suspect using wbemtest.

Start > Run > WbemTest

image image

image image

image

If you have Network Monitor running alongside this test you’ll see a lot of SAMR traffic passing by. So it seemed to be conclusive. The McAfee component had to go. After removing the Product Improvement Program component of each pc we can clearly see the load dropping:

image

To conclude: I know this is a rather lengthy post and I could also just have said “hey, if you have a lsass.exe CPU issues, just check if you got the McAfee Product Improvement Program component running” but with this kind of blog entries I want to share my methodology with others. For me troubleshooting is not only fixing the issue at hand. It’s also about getting an answer (what? why? how?) and it’s always nice if you learn a thing or two on your quest. In my case I learned about WMI tracing and lsarpc/samr. As always feedback is appreciated!