
Archive more than one million files located in one folder
Windows operating systems are not well suited to hosting and managing a million files in a single folder (no subfolders). If you try to list the content of the folder (using the Windows File Explorer GUI or the Get-ChildItem PowerShell cmdlet), it takes several minutes to get the result.
The mission was to reduce the number of files in this directory. I created a script to do that:
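To check how slow a listing really is on a given folder, a small timing helper can be useful. This is only a sketch: `Measure-FolderListing` is a hypothetical name, and it assumes PowerShell 3 or later so that `[System.IO.Directory]::EnumerateFiles` is available.

```powershell
# Hypothetical helper: count the files in a folder and time the enumeration.
function Measure-FolderListing {
    param([Parameter(Mandatory)][string]$Path)

    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    $count = 0
    # EnumerateFiles streams the names one by one instead of building
    # the whole array in memory first.
    foreach ($name in [System.IO.Directory]::EnumerateFiles($Path)) {
        $count++
    }
    $watch.Stop()

    [PSCustomObject]@{
        FileCount = $count
        Seconds   = $watch.Elapsed.TotalSeconds
    }
}

# Usage (the path is a placeholder):
# Measure-FolderListing -Path "D:\folder_that_contains_your_files"
```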
- list the folder with the DOS dir command and keep only the first 5000 results
- use 7-Zip to put them into a zip file named after the “date modified” information of the file: if the file was modified on 27.03.2015, it is archived in the zip file called _archive_201503.zip
- if 7-Zip reports that the zip operation succeeded, the file is deleted
- only files older than 3 months are archived
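The naming and age rules above can be sketched as a small function. `Get-ArchiveTarget` and its parameters are hypothetical names used for illustration only; the sketch assumes one archive per month of modification, as in the example above.

```powershell
# Hypothetical helper: return the archive name for a file,
# or $null when the file is too recent to be archived.
function Get-ArchiveTarget {
    param(
        [datetime]$LastModified,
        [datetime]$Now = (Get-Date),
        [int]$MinAgeMonths = 3
    )

    $cutoff = $Now.AddMonths(-$MinAgeMonths)
    if ($LastModified -gt $cutoff) {
        return $null   # modified within the last 3 months: leave in place
    }

    # One archive per month of modification.
    return "_archive_" + $LastModified.ToString("yyyyMM") + ".zip"
}

Get-ArchiveTarget -LastModified ([datetime]'2015-03-27') -Now ([datetime]'2015-08-01')
# -> _archive_201503.zip
```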
All changes are logged into a log file called _archive_yyyy-MM-dd.log
The performance is not impressive but good enough: this script can archive 5000 files in 10 minutes.
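As a rough estimate of what that rate means for the whole folder, the arithmetic can be written out. It assumes the rate stays constant and that all one million files are old enough to be archived, neither of which is guaranteed.

```powershell
$totalFiles      = 1000000   # assumed folder size, taken from the title
$batchSize       = 5000      # files archived per batch
$minutesPerBatch = 10        # measured rate quoted above

$batches = [math]::Ceiling($totalFiles / $batchSize)   # 200 batches
$hours   = $batches * $minutesPerBatch / 60            # 2000 minutes

"{0} batches, about {1} hours" -f $batches, [math]::Round($hours, 1)
```

In other words, clearing the whole folder takes roughly a day and a half of run time at that rate.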
$src_folder = "D:\folder_that_contains_your_files"
set-alias sz "$env:ProgramFiles\7-Zip\7z.exe"
[Reflection.Assembly]::LoadWithPartialName('System.IO.Compression.FileSystem')
$olddateinfo = (Get-Date).AddMonths(-3)    # only files older than this are archived
$filepool_size=5000                        # number of dir lines handled per pass
$skipfirstline_dircmd=7                    # leading lines of the dir output to skip
$skip_again = 0
$skipline_num = 0
$try_again=0
$OutputFileLocation = $src_folder + "\_archive_" + (Get-Date).tostring("yyyy-MM-dd") + ".log"

# Stop any transcript left over from a previous run, then start a new one
$ErrorActionPreference="SilentlyContinue"
Stop-Transcript | out-null
$ErrorActionPreference = "Continue"
Start-Transcript -path $OutputFileLocation -append

(Get-Date).tostring() + ": Archive script begin" | out-host
$shiftlistnum = 0

for ($i=0; $i -le 100000 ;$i++) {
    $file_arr = @()
    $file_list = @()
    # After a successful pass the archived files are gone, so read the same pool again
    if ($try_again -eq 1) { $i=$i-1 }
    $skipline_num = ($i*$filepool_size)+$skipfirstline_dircmd+$shiftlistnum
    $file_list = cmd /c dir "$src_folder" | select -first $filepool_size -skip $skipline_num
    if ($file_list.length -gt 1) {
        $file_list| % {
            # Ignore the archives, list files and error files created by this script
            if ($_ -notmatch "zip|lst|err") {
                $file_name = $_.substring(36)
                # "yyyyMMdd" groups files per day; "yyyyMM" would give one archive per month
                $file_date = (get-date $_.split(" ")[0]).tostring("yyyyMMdd")
                if ((get-date $_.split(" ")[0]) -gt $olddateinfo) {
                    # Too recent: keep the file and shift the next listing past it
                    $shiftlistnum += 1
                } else {
                    $Properties = @{date=$file_date;filename=$file_name}
                    $Newobject = New-Object PSObject -Property $Properties
                    $file_arr +=$Newobject
                }
            }
        }
        if ($file_arr.length -gt 1) {
            $file_arr_date = $file_arr.date | sort -Unique
            $file_arr_date | % {
                $datecomp = $_
                $file_arr_filtr = $file_arr | ? { $_.date -eq $datecomp }
                $xml_archive_file = "_archive_" + $_ + ".zip"
                $xml_archive_file_w_path = $src_folder + "\" + $xml_archive_file
                $xml_archive_lst = "_archive_" + $_ + "_" + $i + ".lst"
                $xml_archive_lst_w_path = $src_folder + "\" + $xml_archive_lst
                $xml_archive_lst_w_path7z = "@"+$xml_archive_lst_w_path

                # Write the list file that 7-Zip reads through the @listfile syntax
                (Get-Date).tostring() + ": Begin create file list for $_ ($xml_archive_lst_w_path)" | out-host
                $file_arr_filtr | % { $src_folder + "\" + $_.filename | Out-File $xml_archive_lst_w_path -NoClobber -Force -Append -Encoding ASCII }
                (Get-Date).tostring() + ": End create file list for $_" | out-host

                (Get-Date).tostring() + ": Begin create the archive file" | out-host
                $7zoutput = sz a $xml_archive_file_w_path $xml_archive_lst_w_path7z 2>&1
                $7zoutput | out-host
                # Delete the source files only when 7-Zip reports success
                if ($7zoutput -match "Everything is Ok") {
                    (Get-Date).tostring() + ": No problem found. All files from $_ are properly archived. Begin removal" | out-host
                    set-location $src_folder
                    gc $xml_archive_lst_w_path | % { remove-item $_ -Force }
                    remove-item -force "$xml_archive_lst_w_path"
                }
                (Get-Date).tostring() + ": End create the archive file" | out-host
                $try_again=1
            }
        } else {
            (Get-Date).tostring() + ": no files to archive in the pool number $i" | out-host
            $shiftlistnum = 0
            $try_again=0
        }
    } else {
        write-host "BREAK FILELIST EMPTY"
        exit
    }
}
(Get-Date).tostring() + ": Archive script end" | out-host
Stop-Transcript
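The script decides whether an archive succeeded by looking for the text "Everything is Ok" in the 7-Zip output. An alternative, sketched below, is to test the process exit code, which PowerShell exposes as `$LASTEXITCODE` after a native command runs. `Test-SevenZipSuccess` is a hypothetical helper name, not part of the original script.

```powershell
# Hypothetical helper: interpret a 7-Zip exit code.
function Test-SevenZipSuccess {
    param([int]$ExitCode)

    # 7-Zip returns 0 for "no error". 1 is a warning (for example, some
    # files were locked and skipped) and higher values are errors, so
    # anything other than 0 is treated as "do not delete the sources".
    return ($ExitCode -eq 0)
}

# Usage, assuming the sz alias and the variables from the script:
# sz a $xml_archive_file_w_path $xml_archive_lst_w_path7z | out-host
# if (Test-SevenZipSuccess $LASTEXITCODE) { <# remove the archived files #> }
```

Treating warnings as failures is a deliberately cautious choice here, because the next step deletes the source files.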