Musings of a Data professional

Stuart Moore

Chunking files into sets of a certain size or number of files with PowerShell

Using PowerShell to scratch another itch, though for once it wasn’t a work or SQL Server related one. I’m a keen cyclist and like to keep track of all my rides, so I’ve been experimenting with various online ride trackers. I’ve settled on Strava, as I can have cool little online badges and race myself on my own segments.

The only problem was transferring a couple of years’ worth of data across from other sites or off my hard drive. Strava has a handy multiple file uploader, but it only allows you to upload up to 25MB or 25 files at a time. To streamline this I wanted to run through all my old files and put them into folders so that each folder contained no more than 25 files and less than 25MB of data. Enter PowerShell:

#Files to be "chunked" (-File skips any subfolders in the source folder)
$file_to_process = Get-ChildItem F:\strava-tmp -File

#Hold the current total size of files
$current_total_size = 0

#Set the maximum total size of files for one folder (in this case 24MB)
$max_size = 24*1024*1024

#Set maximum number of files in a folder
$max_count = 25

#Folder numbering count
$current_folder_index = 1

#Count of files in folder
$current_file_count = 0

#Base folder for output folders
$output_base = "f:\strava2"

new-item "$output_base\folder$current_folder_index" -ItemType directory
$fullout = "$output_base\folder$current_folder_index"
foreach ($file in $file_to_process){
    $tmp_size = $file.Length + $current_total_size
    if (($tmp_size -lt $max_size) -and ($current_file_count -lt $max_count)){
        copy-item $file.FullName -destination $fullout
        $current_total_size = $current_total_size + $file.Length
        $current_file_count++
    }else{
        $current_folder_index++
        new-item "$output_base\folder$current_folder_index" -ItemType directory
        $fullout = "$output_base\folder$current_folder_index"
        $current_total_size= $file.length
        copy-item $file.FullName -destination $fullout
        $current_file_count = 1
    }
}
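
If you’d rather preview the grouping before copying anything, the same logic can be pulled out into a function that only does the arithmetic. This is a minimal sketch rather than part of the original script: the function name Get-FileChunk and its parameter names are made up for illustration, and it differs from the loop above in one small way, in that a file bigger than the size limit is placed in the current chunk if that chunk is still empty, instead of leaving an empty folder behind.

```powershell
# Hypothetical helper (not part of the original script): works out which
# numbered chunk each item belongs to, without touching the disk.
function Get-FileChunk {
    param(
        [Parameter(Mandatory)] [object[]] $Items,  # anything with Name and Length properties
        [long] $MaxSize  = 24MB,                   # total size limit per chunk
        [int]  $MaxCount = 25                      # file count limit per chunk
    )
    $index = 1   # current chunk number
    $size  = 0   # total size of items in the current chunk
    $count = 0   # number of items in the current chunk
    foreach ($item in $Items) {
        # Start a new chunk when this item would break either limit,
        # unless the current chunk is still empty.
        if ($count -gt 0 -and ((($size + $item.Length) -ge $MaxSize) -or ($count -ge $MaxCount))) {
            $index++
            $size  = 0
            $count = 0
        }
        $size += $item.Length
        $count++
        [pscustomobject]@{ Chunk = $index; Name = $item.Name; Length = $item.Length }
    }
}

# Example usage (paths as in the script above):
# Get-FileChunk -Items (Get-ChildItem F:\strava-tmp -File) | Group-Object Chunk
```

Grouping the output by Chunk shows how many folders you’d end up with and what would land in each, which is handy for checking the limits before any files are copied.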

I’d then have liked to use Invoke-WebRequest to do the actual uploading for me. But it appears that Strava’s v3 API is invite only, which isn’t that useful really.

This script also comes in handy for splitting files up for emailing if you have a maximum attachment limit on your account.
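
As a rough sketch of that, only the two limits at the top of the script need to change. The 10MB attachment limit here is an assumed figure for illustration, not one from any particular mail provider. Attachments are normally base64-encoded in transit, which grows them by roughly a third, so the size limit is set a bit lower to leave headroom:

```powershell
#Assumed 10MB attachment limit; 7MB leaves room for base64 encoding overhead
$max_size = 7MB

#No practical limit on the number of attachments per email (an assumption)
$max_count = [int]::MaxValue
```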


3 Comments

  1. klynton

    Great script, works very well, but I have one question: what are these parameters for? How do they work?
    #Count of files in folder
    $current_file_count = 0
    #Hold the current total size of files
    $current_total_size = 0

    • Hi, took me a while to remember as well.

      They’re just used to keep track of the number of files in the current folder and the total size of those files.

      In this example I didn’t want more than 25 files or more than 24MB of files in each folder. Each time through the loop I increment $current_file_count by 1 and add the size of the copied file to $current_total_size.

      On the next pass, if the folder already holds 25 files, or adding the next file would take it to 24MB, a new folder is created and the values are reset to 1 and the size of that file respectively.

      Hopefully that makes sense?

      • klynton

        Now I get it 🙂 Thanks for the explanation. I will use your script with pleasure.
