Musings of a Data professional

Stuart Moore

Finding the bottom folders of a file tree using PowerShell

There’s a simple way of getting PowerShell to tell you the last folder in a known path:

$path = "\\server\share\folder1\folder2\folder3"
$folder = Split-Path $path -Leaf   # $folder now contains "folder3"

and that’s all you need.

Now, how do you go about doing that if you don’t know the path ahead of time? That is, you have a large collection of directories, and all you want are those folders that make up the bottom levels of the branches. In this graphic it would be all the folders underlined.

Example of bottom folders

As an example, current $EMPLOYER holds backups from all SQL databases on a share, with each database’s backups being in a folder. Due to various DBAs over the years the depth of the tree to the actual backups isn’t fixed. So I need to find the “bottom” folder so I can process the backup files in there.

Well, that’s a little trickier. So let’s have a look at a few ways of doing that:

First of all let’s get ourselves a nice object containing all the potential directories:

$dirs = Get-ChildItem "\\server1\backups$" -Recurse -Directory

On my test subject, this contains 2233 directory objects of varying depths. I’ve pulled this out here so that I can compare the speed of the 3 approaches without this operation clouding anything.
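A quick aside: the -Directory switch was only added in PowerShell 3.0. If you’re stuck on an older version, the same set can be built by filtering on PSIsContainer instead:

$dirs = Get-ChildItem "\\server1\backups$" -Recurse | Where-Object { $_.PSIsContainer }

Everything that follows works the same either way.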

The first option is based on looping through our $dirs object, calling Get-ChildItem -Directory on each folder; by definition a folder at the bottom of the tree will have no child directories, so we check for that condition. Each time we find a folder with no child folders we add the path to a $children1 list:

$children1 = New-Object System.Collections.ArrayList
$time1start = Get-Date
foreach ($d in $dirs){
    # A leaf folder has no child directories
    if ((Get-ChildItem $d.FullName -Directory).Count -eq 0){
        $null = $children1.Add($d.FullName)
    }
}
$time1finish = Get-Date
$time1taken = $time1finish - $time1start
$count1 = $children1.Count

I’m wrapping each sample in a little bit of timing code so we can compare performance. In this case this routine takes 45.93 seconds against my test data, returning 1750 leaf folders.
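If you’d rather not sprinkle Get-Date pairs around, the built-in Measure-Command cmdlet does the same job; here’s the first loop wrapped that way (a sketch, not part of the original timings):

$children1 = New-Object System.Collections.ArrayList
$time1taken = Measure-Command {
    foreach ($d in $dirs){
        if ((Get-ChildItem $d.FullName -Directory).Count -eq 0){
            $null = $children1.Add($d.FullName)
        }
    }
}
$time1taken.TotalSeconds   # elapsed time as a TimeSpan property

I’ve stuck with the Get-Date version in the samples so the start and finish times are there to inspect as well.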

Now, can we improve on that a little? Yes, we can. Version 1 requires an extra Get-ChildItem call for each folder, so in this case that’s an extra 2233 calls. Luckily for us, each DirectoryInfo object has an EnumerateDirectories method we can use, which is implemented at a lower level and saves calling another cmdlet:

$children2 = New-Object System.Collections.ArrayList

$time2start = Get-Date
foreach ($d in $dirs){
    # An empty enumeration has no Count, so the comparison is against $null
    if (($d.EnumerateDirectories()).Count -eq $null){
        $null = $children2.Add($d.FullName)
    }
}
$time2finish = Get-Date
$time2taken = $time2finish - $time2start
$count2 = $children2.Count

Since a leaf folder has no child directories, EnumerateDirectories returns an empty enumeration whose Count comes back as $null rather than 0, which is why we have to check for that.
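If relying on that $null quirk feels fragile, a more explicit test (a sketch, not something I timed) is to ask the enumerator directly whether it has anything to yield:

foreach ($d in $dirs){
    # MoveNext() returns $false when there is nothing to enumerate, i.e. a leaf folder
    if (-not $d.EnumerateDirectories().GetEnumerator().MoveNext()){
        $null = $children2.Add($d.FullName)
    }
}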

This approach returns the same number of leaf folders as the first, 1750. But this time it only took 26.48 seconds. Not too shabby an improvement, I think you’d agree.

Being an inveterate tinkerer, I wondered if there was an even quicker way, one that doesn’t need to refer to the filesystem once we’ve populated $dirs. And there is:

$children3 = New-Object System.Collections.ArrayList
$time3start = Get-Date
foreach ($d in $dirs){
    $t = Split-Path $d.FullName -Parent
    if ($children3 -notcontains $t){
        $null = $children3.Add($d.FullName)
    }else{
        # The parent is already in the list, so it can't be a leaf: swap it out
        $children3.Remove($t)
        $null = $children3.Add($d.FullName)
    }
}
$time3finish = Get-Date
$time3taken = $time3finish - $time3start
$count3 = $children3.Count

We loop through each item in $dirs as before, use Split-Path to get the parent folder of the folder we’re currently looking at, then take one of two steps:

  • If the parent folder doesn’t already exist in $children, then we add the full path to $children
  • If the parent folder does exist in $children, then we know it must have child folders, so we remove it and replace it with the full path

This method returns the same list of bottom folders, but this time it took 1.9 seconds, which is a huge improvement over our first attempt.
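If you want to see the parent-swapping trick in action, here’s a tiny walk-through with some made-up paths. Note the ordering matters: Get-ChildItem -Recurse hands back parents before their children, which is what makes this safe:

$demo = '\\srv\share\a', '\\srv\share\a\b', '\\srv\share\a\b\c', '\\srv\share\a\d'
$leaves = New-Object System.Collections.ArrayList
foreach ($p in $demo){
    $parent = Split-Path $p -Parent
    # If the parent is in the list it has a child, so it can't be a leaf
    if ($leaves -contains $parent){ $leaves.Remove($parent) }
    $null = $leaves.Add($p)
}
$leaves

$leaves ends up holding just \\srv\share\a\b\c and \\srv\share\a\d, the two bottom folders of this little tree.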

As is often shown whenever you’re programming or writing scripts:

  • Avoid calling any extra commands, as there will be a price to pay
  • If you must call an extra method, use the most efficient one you can find
  • Working with the data you’ve already got is usually going to be the fastest way to do something
  • Extra calls to the filesystem are always slow

And if anyone has any strong opinions over what you’d call the folder at the bottom of a filesystem tree (bottom folder?, leaf folder?, last folder?, anti-root folder?) then please let me know!


18 Comments

  1. Robert Leftwich

    Hi Stuart, thanks for your brilliant article and examples. I’ve been mulling over a way to remove a large number of redundant files in various sub-directory branches from users’ profiles. The files reside in various child folders at the bottom of each directory branch. Directories like Box Sync, OneDrive, docs open cache, et cetera.
    These directories need to be purged. I tried the dirstats function from the Scripting Gallery to identify them, but it doesn’t seem to handle wildcards in a search path of the form d:/profiles/%username%/appdir/blah1/blah2/yadayada/%username%/box sync/
    Be that as it may, I will use your algorithms to help me identify the bottom directory strings of interest, then determine how many files each contains.

    Having identified and quantified the directories, I will then work out a way to cycle through them and delete the files in the bottom directories of interest.

    Will try it out tomorrow.

    • Stuart Moore

      Hi Robert,
      Glad you’ve found it useful. Hope it works out for you, and if you want some advice feel free to ping me a message. Love seeing how people take these things and build on top of them.
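
      If it helps as a starting point, here’s a minimal sketch of feeding the leaf-folder list from the post ($children3) into a purge. The -WhatIf switch makes Remove-Item just report what it would delete; only drop it once the output looks right:

      foreach ($leaf in $children3){
          # Remove just the files, leaving the folder structure in place
          Get-ChildItem $leaf -File | Remove-Item -WhatIf
      }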

  2. Jeff

    Hello Stuart,
    Thanks for the post and the comparisons between the different approaches. I have run this on a server and LT and found that option 2 (EnumerateDirectories) is consistently faster, using PowerShell v5 in both instances.

    Don’t know why that would be different to your own results.

    • Stuart Moore

      I expect there have been some underlying improvements in PowerShell since I wrote the article. The PS team keep evaluating better ways of doing things, and there’s also the chance that an underlying .NET assembly has improved things as well.

      I’d always recommend testing code you find on the InterTubes for performance, as things move on quickly, and some things may depend on your setup. I know some of my old demos are no longer useful now that I’ve got a laptop with a flash drive; it’s just too fast to make SQL Server sweat.

  3. Mauro

    Hi Stuart!
    Excellent post.
    Do you know how to obtain the containing folder of a file?
    Let’s suppose the file is

    folder1\folder2\…\folderN\file.extension

    I want to obtain “folderN”

    • Stuart Moore

      Hi,
      Assuming you’ve got the file path as a FileInfo object, created with something like these:
      $file = Get-Item C:\dbatools\RestoreTimeStripe\Stripe1\restoretime_21.trn
      or
      $file = (Get-ChildItem C:\dbatools\RestoreTimeStripe\Stripe1\)[0]

      Then you can use the Directory property, which then has a name property:
      $file.Directory.Name
      which in both the cases above would return Stripe1

      And if all you have is a fragment like in your example, you can use the System.IO.FileInfo class like this:
      $path = [System.IO.FileInfo]"folder1\folder2\…\folderN\file.extension"
      $path.Directory.Name

      which would return folderN

      Hope that’s helpful. Let me know if anything’s not clear, or not quite what you’re after

      • Mauricio Fuentes

        THANKS A LOT!!!!!
        It is what I’m looking for. But now I don’t know how to use it, LOL. I’m trying to rename the files on a tree the following way:

        \day1\case1\{rename all files as (file1_A , file2_B, … so on)}
        \day1\case2\{same as case1}
        \day2\{same logic}

        I’ve tried using something like
        (Get-ChildItem)[0]| Rename-Item -NewName {$._Name replace (Get-ChildItem)[0].name , (Get-ChildItem)[0].directory.name}

        but it returns an error saying $._Name is not a cmdlet.
        Do you have any idea how to fix it?
        Thanks a lot for your time.

        • Stuart Moore

          Hi,
          The `$_` variable is automatically created inside a scriptblock; it’s part of the pipeline. Something similar to what you’re trying (I think!) would be this:

          if (Test-Path c:\temp\blog){
              Remove-Item c:\temp\blog -Recurse -Force
          }
          New-Item -Path c:\temp\blog -ItemType Directory
          Set-Location C:\temp\blog

          New-Item -Path .\1\2 -ItemType Directory
          New-Item -Path .\1\3 -ItemType Directory
          New-Item -Path .\1\4 -ItemType Directory

          New-Item -Path .\1\2\File.txt -ItemType File
          New-Item -Path .\1\3\File.txt -ItemType File
          New-Item -Path .\1\4\File.txt -ItemType File

          # Rename each file after the folder it lives in
          Get-ChildItem .\ -Recurse -File | ForEach-Object { Rename-Item -Path $_.FullName -NewName ($_.FullName -replace $_.BaseName, $_.Directory.Name) }

          Which renames all the text files based on the folder they’re in. Because I’m piping the output of Get-ChildItem into the ForEach, $_ is a FileInfo object, so I can use its properties in the scriptblock.

          The brackets around the replace section are there to make that whole part resolve before it’s passed into -NewName; without them you’d get some funny syntax errors.

  4. Arun Negi

    Hi Stuart,
    Thanks for the article!
    I had a query.
    Is there a way I can get only the top directory while using the below cmdlet?
    Get-ChildItem -Path $WorkingDirectory -Recurse

    Given the below directory hierarchy structure:

    Scripts
      -Script1 (folder)
        -data (folder)
        files
      -Script2 (folder)
        -data (folder)
        files
      .
      .
      .
      -ScriptN (folder)
        -data (folder)
        files

    I’m trying to replace a URL/hostname in a certain file present in each Script(n) folder and then zip the Script(n) folder.
    While trying to use the below code snippet, even the data folder present under each Script(n) folder is getting zipped:

    $WorkingDirectory = (Get-Location).Path   # here I am currently in the Scripts folder
    $hostname = 'www.anysite.com'
    Get-ChildItem -Path $WorkingDirectory -Recurse | Where-Object {$_.PSIsContainer} | ForEach-Object {
        $directoryFullName = $_.FullName
        $directoryName = $_.Name
        $directoryFullName1 = "$directoryFullName\*"
        $zipdrive = "$directoryName.zip"
        $zippedpath = "$WorkingDirectory\$zipdrive"
        # $file is assumed to be set earlier to the name of the file to edit
        (gc $directoryFullName\$file) | %{ $_ -replace '^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$|^([a-z0-9]*\.[a-zA-Z0-9]*\.com)$', $hostname } | Set-Content $directoryFullName\$file
        Compress-Archive -Path $directoryFullName1 -DestinationPath $zippedpath
    }

    Is there any way we can restrict the ForEach-Object loop to only act on the root folder and not the sub-folders (data)?
    Note: I’m not an expert in PowerShell; I’m learning it for the above specific task, so the above code may not be the most optimised, and yes, the regex is not complete yet for the non-IP URL bit 🙂

    Regards,
    Arun

  5. arun negi

    Hi Stuart,

    Kindly ignore my earlier post.
    All I had to do was remove the -Recurse flag 🙂
    Regards,
    Arun

    • Stuart Moore

      It’s always something simple 😉

      Glad you’ve got it sorted
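
      For anyone landing here later: the working shape is just the snippet above minus -Recurse, so only the immediate Script(n) folders get picked up (a sketch based on Arun’s description):

      Get-ChildItem -Path $WorkingDirectory | Where-Object { $_.PSIsContainer } | ForEach-Object {
          Compress-Archive -Path "$($_.FullName)\*" -DestinationPath "$WorkingDirectory\$($_.Name).zip"
      }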

      • arun negi

        Thanks Stuart 🙂

        I’m however facing a strange issue, unrelated to this thread, with the Invoke-WebRequest cmdlet when making a REST API call to a cloud-based service.
        With curl I’m able to get the expected response; however, with Invoke-WebRequest I have no idea what I’m missing. I would like to believe I’m posting the same payload, but I have no way of validating that (what actually goes across), as it’s an HTTPS REST API and I’m not able to decrypt a network trace via Wireshark.

        If you are willing to hear more, I could post the details 🙂

        Regards,
        Arun

        • Stuart Moore

          Hi,
          Yes, if you want to post up the CURL and iwr versions I’ll have a look.

          • arun negi

            Thanks Stuart!

            Below is the curl command with deliberate bad password

            curl -s -X POST "https://stormrunner-load.saas.hpe.com/v1/auth?TENANTID=137729615" --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{"user":"arunnegi82@gmail.com","password":"wrongpassword"}'

            Response ( gave bad password so expected response)
            {“error”:”failed fetching token. credentials failed (http 403)”}

            Below is what i have tried on powershell
            $postparams = @{User="arunnegi82@gmail.com"; password="wrongpassword"} | ConvertTo-Json
            $headers = @{ "ContentType" = "application/json" }
            echo $postparams
            echo $headers

            Invoke-WebRequest -Uri "https://stormrunner-load.saas.hpe.com/v1/auth?TENANTID=137729615" -Method Post -Body $postparams -Headers $headers

            I get below response
            Invoke-WebRequest : The remote server returned an error: (400) Bad Request.

          • Stuart Moore

            Hi,
            Had a play around, and I’m afraid I can’t spot the problem 🙁

            I’ve run the curl and iwr versions against httpbin.org to check everything, and it appears they’re sending pretty much the same data up. I’ve tried reading around the HPE API, but there doesn’t appear to be anything publicly available.

          • arun negi

            Hi Stuart,
            I tried a slightly different variant and it miraculously worked; not sure what the difference was between this and the earlier one. I just converted to JSON later, at the time of invoking the call, and explicitly passed the Content-Type header.

            function gettoken{
                $postparams = @{user="arunnegi82@gmail.com"; password="password"}

                try { $response = Invoke-WebRequest "https://stormrunner-load.saas.hpe.com/v1/auth?TENANTID=137729615" -Body (ConvertTo-Json $postparams) -ContentType "application/json" -Method Post }
                catch {
                    $streamReader = [System.IO.StreamReader]::new($_.Exception.Response.GetResponseStream())
                    $ErrResp = $streamReader.ReadToEnd() | ConvertTo-Json
                    $streamReader.Close()
                    $ErrResp
                }
            }

            There is one last API call left, which involves uploading a file. Will try that, and I’m already dreading the issue that may pop up 😛

            Cheers,
            Arun

  6. Iain

    Thanks so much for this script. It really helped me out!
