Archive more than one million files in one folder

Windows does not cope well with a million files stored in a single folder (no subfolders). If you try to list the content of such a folder, whether with the Windows File Explorer GUI or the Get-ChildItem PowerShell cmdlet, it takes several minutes to get a result.
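For a quick look at such a folder without waiting for the complete listing, here is a minimal sketch that reads only the first entries. It is not part of my archive script: the Get-FirstFiles name and its parameters are hypothetical, and it assumes PowerShell 3.0 or later. I have not measured how it compares with the dir command on a folder of this size.

```powershell
# Hypothetical helper: return the first $Count file paths of a folder
# without building the complete listing in memory.
function Get-FirstFiles {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 5000
    )
    # EnumerateFiles yields paths lazily; Select-Object -First stops the
    # enumeration as soon as $Count items have been received.
    [System.IO.Directory]::EnumerateFiles($Path) | Select-Object -First $Count
}
```

Usage: Get-FirstFiles -Path "D:\folder_that_contains_your_files" -Count 10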

The goal was to reduce the number of files in this directory. I wrote a script that works as follows:

  • list the folder with the DOS dir command, 5000 entries at a time
  • use 7-Zip to add the files to a zip file named after the “date modified” value of each file: a file last modified on 27.03.2015 goes into the archive _archive_201503.zip
  • if 7-Zip reports success (the script looks for "Everything is Ok" in its output), the archived files are deleted from the folder
  • only files older than 3 months are archived
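The naming and age rules above can be sketched in isolation as follows. This is an illustration only: the Get-ArchiveTarget function and its -Now parameter are hypothetical and do not appear in the script, which works on the text output of dir instead.

```powershell
# Hypothetical helper: decide what to do with a file based on its last-write date.
function Get-ArchiveTarget {
    param(
        [Parameter(Mandatory)] [datetime] $LastWriteTime,
        [datetime] $Now = (Get-Date)
    )
    # Files modified within the last 3 months are left alone.
    if ($LastWriteTime -gt $Now.AddMonths(-3)) { return $null }
    # Older files go to the archive of the month they were last modified in.
    "_archive_" + $LastWriteTime.ToString("yyyyMM") + ".zip"
}

Get-ArchiveTarget -LastWriteTime ([datetime]"2015-03-27") -Now ([datetime]"2015-09-01")
# -> _archive_201503.zip
```

Passing the reference date as a parameter makes the rule easy to check against known dates before running anything that deletes files.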

All actions are logged to a file named _archive_yyyy-MM-dd.log, created in the same folder.

The performance is not impressive, but the script gets the job done: it can archive 5000 files in 10 minutes.
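To put that figure in perspective, here is a back-of-the-envelope estimate for a folder of one million files. It assumes the throughput stays constant and that every file is old enough to be archived. The variable names are mine and are not used in the script.

```powershell
$totalFiles     = 1000000   # assumed number of files in the folder
$filesPerPass   = 5000      # pool size used by the script
$minutesPerPass = 10        # observed duration of one pass

# Number of passes needed, then the total duration in hours
$passes = [math]::Ceiling($totalFiles / $filesPerPass)
$hours  = $passes * $minutesPerPass / 60

"$passes passes, about $([math]::Floor($hours)) hours"
```

That is 200 passes, or a little over 33 hours, so the script is best left running unattended.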

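# Folder to clean up (adjust to your environment)
$src_folder = "D:\folder_that_contains_your_files"

# 7-Zip command-line executable, assumed to be installed in the default location
set-alias sz "$env:ProgramFiles\7-Zip\7z.exe"
[Reflection.Assembly]::LoadWithPartialName('System.IO.Compression.FileSystem') | out-null
# Files modified after this date are left untouched
$olddateinfo = (Get-Date).AddMonths(-3)
# Number of files handled per pass
$filepool_size=5000
# Lines to skip at the top of the dir output (volume header and the . and .. entries)
$skipfirstline_dircmd=7
$skip_again = 0
$skipline_num = 0
$try_again=0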

$OutputFileLocation = $src_folder + "\_archive_" + (Get-Date).tostring("yyyy-MM-dd") + ".log"

$ErrorActionPreference="SilentlyContinue"
Stop-Transcript | out-null
$ErrorActionPreference = "Continue"
Start-Transcript -path $OutputFileLocation -append
(Get-Date).tostring() + ": Archive script begin" | out-host

$shiftlistnum = 0
 
for ($i=0; $i -le 100000 ;$i++) {
    $file_arr = @()
    $file_list = @()

    if ($try_again -eq 1) {
        $i=$i-1
    }

    $skipline_num = ($i*$filepool_size)+$skipfirstline_dircmd+$shiftlistnum
    $file_list = cmd /c dir "$src_folder" | select -first $filepool_size -skip $skipline_num 
    
    if ($file_list.length -gt 1) {
        $file_list| % { 
            if ($_ -notmatch "zip|lst|err") { 
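                # The file name follows the first 36 characters of a dir line
                # (this offset matches a listing that uses a 24-hour time format)
                $file_name = $_.substring(36)
                # Group files by month of last modification, e.g. 201503
                $file_date = (get-date $_.split(" ")[0]).tostring("yyyyMM")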

                if ((get-date $_.split(" ")[0]) -gt $olddateinfo) {
                    $shiftlistnum += 1
                }
                else {
                    $Properties = @{date=$file_date;filename=$file_name}
                    $Newobject = New-Object PSObject -Property $Properties
                    $file_arr +=$Newobject
                }
            }
        }

        if ($file_arr.length -gt 1) {
            $file_arr_date = $file_arr.date | sort -Unique

            $file_arr_date | % {
                $datecomp = $_
                $file_arr_filtr = $file_arr | ? { $_.date -eq $datecomp }

                $xml_archive_file = "_archive_" + $_ + ".zip"
                $xml_archive_file_w_path = $src_folder + "\" + $xml_archive_file

                $xml_archive_lst = "_archive_" + $_ + "_" + $i + ".lst"
                $xml_archive_lst_w_path = $src_folder + "\" + $xml_archive_lst
                $xml_archive_lst_w_path7z = "@"+$xml_archive_lst_w_path

                (Get-Date).tostring() + ": Begin create file list for $_ ($xml_archive_lst_w_path)" | out-host
                $file_arr_filtr | % {
                    $src_folder + "\" + $_.filename | Out-File $xml_archive_lst_w_path -NoClobber -Force -Append -Encoding ASCII
                }
                (Get-Date).tostring() + ": End create file list for $_" | out-host

                (Get-Date).tostring() + ": Begin create the archive file" | out-host
                $7zoutput = sz a $xml_archive_file_w_path $xml_archive_lst_w_path7z 2>&1
                $7zoutput  | out-host
                if ($7zoutput -match "Everything is Ok") { 
                    (Get-Date).tostring() + ": No problem found. All files from $_ are properly archived. Begin removal" | out-host
                    set-location $src_folder
                    # -LiteralPath keeps names containing [ or ] from being treated as wildcards
                    gc $xml_archive_lst_w_path | % { remove-item -LiteralPath $_ -Force }
                    remove-item -force "$xml_archive_lst_w_path"
                }
                (Get-Date).tostring() + ": End create the archive file" | out-host
                $try_again=1
            }
        }
        else {
            (Get-Date).tostring() + ": no files to archive in the pool number $i" | out-host 
            $shiftlistnum = 0
            $try_again=0
        }
    }
    else {
        write-host "BREAK FILELIST EMPTY" 
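        # Leave the loop (instead of exiting) so that the end message is logged and the transcript is stopped
        break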
    }
}

(Get-Date).tostring() + ": Archive script end" | out-host
Stop-Transcript
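
The script decides whether an archive operation succeeded by matching the text printed by 7-Zip. A possible alternative, which I have not used here, is to look at the exit code of 7z.exe. The sketch below shows the idea. The Test-SevenZipSuccess function is hypothetical, and the exit code values in the comment are the ones listed in the 7-Zip documentation.

```powershell
# Hypothetical helper: interpret a 7-Zip exit code.
# Documented values: 0 = no error, 1 = warning (for example a locked file
# that was skipped), 2 = fatal error, 7 = command line error,
# 8 = not enough memory, 255 = stopped by the user.
function Test-SevenZipSuccess {
    param([Parameter(Mandatory)] [int] $ExitCode)
    # Only 0 means that every listed file made it into the archive.
    return ($ExitCode -eq 0)
}

# Possible use right after a 7-Zip call made through the sz alias:
#   sz a $xml_archive_file_w_path $xml_archive_lst_w_path7z | out-host
#   if (Test-SevenZipSuccess -ExitCode $LASTEXITCODE) { <delete the source files> }
```

Treating the warning code 1 as a failure is deliberate: a skipped file must not be deleted from the source folder.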


My Powershell script categories

Archive more than one million files located in one folder

Leave a Reply

Your email address will not be published.