Maintain SharePoint Workflow History Lists

By James|03/01/2018|,

Overview

A daily Workflow Auto Cleanup job runs to permanently delete workflow instances and related task entries that still exist 60 days after a workflow is completed or canceled.  Workflow history items themselves are not deleted, but the entry point to view them on the status page for a particular instance of a workflow will no longer be available.  As the workflow history and task lists grow in size, site performance may be compromised.

Personal Experience

While perusing storage metrics I noticed a workflow history list that was taking up 4 GB of space.  Upon further investigation, I found the workflow history list contained more than 8.2 MILLION items.  This prompted me to develop PowerShell scripts to 1) locate and provide information regarding our workflow history lists and 2) remove old workflow history items older than 4 months.  During discovery I found several other workflow history lists with millions of items.

Locating Workflow History Lists

The following PowerShell script will locate and display all workflow history lists.  The script provides insight regarding site title, URL, list name and item count for each discovered workflow history list.

# Title:   Locate-WorkflowHistory.ps1
# Version: 1.0, 28FEB18
# Author:  James Sanders
# Purpose: Find all workflow history lists

Add-PSSnapin Microsoft.SharePoint.PowerShell -EA SilentlyContinue

$WebApp = Get-SPWebApplication

ForEach ($SPSite in $WebApp.Sites) {
  ForEach ($Web in $SPSite.AllWebs) {
    For ($i = 0;$i -ne $Web.lists.count;$i++) {
      $List = $Web.Lists[$i]
      If ($List.BaseTemplate -eq "WorkflowHistory") {
        $O = New-Object PSObject
        $O | Add-Member NoteProperty Site $($List.ParentWeb.Title)
        $O | Add-Member NoteProperty URL $($List.RootFolder.ServerRelativeUrl)
        $O | Add-Member NoteProperty Title $($List.Title)
        $O | Add-Member NoteProperty Count $($List.ItemCount)
        $O | FL
      }
    }
    $Web.Dispose()
  }
}

Purge Workflow History List

The first thing to determine is how much workflow history you want to keep.  The next step is actually purging the workflow history list.  Information Management Retention Policies can be configured on the affected list(s), but they don't work very well when there are loads of items in the list(s).  I recommend purging the list(s) with PowerShell and THEN creating the policies.

The following script will purge a workflow history list.  The list is purged in batches of 100 items at a time to minimize performance impact.

# Title:   Cleanup-WorkflowHistory.ps1
# Version: 1.0, 27FEB18
# Author:  James Sanders
# Purpose: Clean up large workflow history lists

# Configure environment
# The $webURL variable should point to the site with the workflow history list that you want to purge
# The $cutoffDate determines which items to keep.  As configured, the cutoff date will be
# current date - 120 days.  Any items older than 120 days will be removed
$webURL   = "https://portal/site/subsite"
$cutoffDate = (Get-Date).AddDays(-120)

# Add the PowerShell Snap-In
Add-PSSnapin Microsoft.SharePoint.PowerShell -EA SilentlyContinue

# Open the web site
Write-Host "`n- Opening web site $webURL"
$web = Get-SPWeb $webURL -ErrorAction SilentlyContinue
if (!($web)) {
  Write-Host -ForegroundColor Red "- Unable to open web site $webURL"
  Exit
}

# Open Source List
Write-Host "- Opening workflow history list"
$list = $web.lists["Workflow History"]
if (!($list)) {
  Write-Host -ForegroundColor Red "- Unable to open Workflow History list"
  $web.Dispose()
  Exit
}

DO {
  # Execute query
  # Using an SPQuery so that results can be limited ($rowlimit variable).
  $cutoffDateF = "{0:MM/dd/yyyy}" -f $cutoffDate
  Write-Host "- Looking for items where 'Date Occurred' <= $cutoffDateF"
  Write-Host
  $rowlimit = 100
  $caml="<OrderBy><FieldRef Name='ID' Ascending='TRUE' /></OrderBy><OrderBy><FieldRef Name='ID' /></OrderBy>"
  $camlQuery = New-Object Microsoft.SharePoint.SPQuery
  $camlQuery.RowLimit = $rowLimit
  $camlQuery.Query = $caml
  $items = $list.GetItems($camlQuery)

  If (!($items)) {
    Write-Host -ForegroundColor Red "- No items to process"
    Break
  }

  # Delete workflow history items that are before the cutoff date
  # As we are directly deleting and NOT recycling, we process the items in reverse so as to not throw a "Collection was modified; enumeration operation may not execute..." error!
  # If we were recycling, we could use a ForEach loop.
  $itemTotal = $items.Count
  $itemCurrent = 1
  $itemsDeleted = 0
  Write-Host "- Items to process: $($itemTotal)"
  $timeStarted = Get-Date
  For ($i=$items.count-1; $i -ge 0; $i--) {
    $item = $items[$i]
    $itemCurrentF = $itemCurrent.ToString().PadLeft($itemTotal.ToString().Length,"0")
    $dateOccurredF = "{0:MM/dd/yyyy}" -f $item["Date Occurred"]
    Write-Host "- Processing item $itemCurrentF of $($itemTotal) ... " -NoNewline
    If ($item['Date Occurred'] -lt $cutoffDate) {
      $item.Delete()
      Write-Host "Deleted - $dateOccurredF <= $cutoffDateF"
      $itemsDeleted++
    }
    Else {
      Write-Host "Ignored - $dateOccurredF => $cutoffDateF"
    }
    $itemCurrent++
  }

  # Statistics
  $timeFinished = Get-Date
  $timeDuration = ("{0:hh\:mm\:ss}" -f ($timeFinished - $timeStarted))
  If ($timeDuration.Length -gt 8) { $timeDuration = $timeDuration.Substring(0,12) }
  Else { $timeDuration += ".000" }
  $timeTotalSeconds = [Math]::Round(($timeFinished - $timeStarted).TotalSeconds)
  $deletedPerSecond = [Math]::Round($itemsDeleted / $timeTotalSeconds)
  Write-Host
  Write-Host "Batch Started:  $timeStarted"
  Write-Host "Batch Finished: $timeFinished"
  Write-Host "Batch Duration: $timeDuration"
  Write-Host
  Write-Host "Total Seconds:  $timeTotalSeconds"
  Write-Host "Items Deleted:  $itemsDeleted"
  Write-Host "Deleted/Second: $deletedPerSecond"
  Write-Host
} UNTIL ($itemsDeleted -le 0)

# Clean up
$web.dispose()

Information Management Retention Policies

After the workflow lists have been purged, create Information Management Retention Policies to keep them cleaned up.

  1. Open the Workflow history list (site URL/Lists/Workflow History
  2. Open the list settings page
  3. Select “Information management policy settings” in the Permissions and Management section. If you don’t see this option, you may need to enable the site collection feature "Library and Folder Based Retention".
  4. Under Content Type Policies, select the Workflow History content type
  5. Check/Enable the “Enable Retention” check box
  6. Click “Add a retention stage”
  7. From the Stage Properties dialog under Event > Time Period change the settings to Date Occurred + 120 days   (to purge any items over 4 months old)
  8. Under Action select Permanently Delete
  9. Click OK to close the Stage Properties dialog
  10. Click OK to save the list policy.  The expired items will be deleted the next time the timer jobs run.
  11. Open Central Administration -> Monitoring -> Review Job Definitions
  12. In the list of timer jobs, select Information management policy and schedule the job.  Then, select Run Now.
  13. In the list of timer jobs, select Expiration policy and schedule the job.  Ensure previous job finishes running, then select Run Now.
Copyright 2011 - 2024 The Lazy IT Admin | All Rights Reserved
menu-circlecross-circle linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram