A Practical Way to Handle Large ZIP Imports in the Background using Laravel Commands

Introduction
Many organizations need to handle the upload of large files containing bulk data. A single file might contain hundreds or thousands of records, each with several images linked to it. If the system processes these uploads in one immediate step, it can lead to slow responses, timeouts, and a frustrating user experience.
By splitting the import into two parts—an immediate upload and a background processing phase—these issues can be significantly reduced. This method allows users to quickly submit their files, frees them from waiting on long processing times, and provides a structured way to capture and correct any errors.
How the Process Works
Below is a diagram that outlines the overall flow from the moment a user uploads a ZIP file to the completion of all data processing:
[User] -- uploads ZIP --> (BulkZipUploadService) -- [S3 upload + "bulk_jobs" DB entry]
        |
        v
[Background worker / console] -- sees "bulk_jobs" pending
        |
        v
(ProcessBulkZipCommand) -- downloads ZIP from S3
        |-- unzips CSV + images
        |-- validates + inserts DB records
        |-- collects errors -> error.csv -> S3
        |-- updates "bulk_jobs" with results
        |
        v
[Notification and email to user + error file if needed]
Step 1: Initial Upload
A person selects a ZIP file that contains both a CSV of data and one or more image files. This ZIP is sent to the server. The system saves the file in a storage location (such as S3 or a similar service) and creates a record in a table (often called bulk_jobs or another suitable name). The record notes that there is a new file waiting to be processed.
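In a Laravel application, this upload phase might look roughly like the sketch below. `BulkZipUploadService` is the name used in the diagram above; the `BulkJob` model, its column names, and the `bulk-imports` path are illustrative assumptions:

```php
<?php

namespace App\Services;

use App\Models\BulkJob;
use Illuminate\Http\UploadedFile;

class BulkZipUploadService
{
    public function handle(UploadedFile $zip, int $userId): BulkJob
    {
        // Store the raw ZIP on S3; all heavy processing happens later,
        // so the HTTP request returns quickly.
        $path = $zip->store('bulk-imports', 's3');

        // Record the pending job so a background worker can pick it up.
        return BulkJob::create([
            'user_id'  => $userId,
            'zip_path' => $path,
            'status'   => 'pending',
        ]);
    }
}
```

The controller only needs to call this service and return a "your file was received" response; nothing in the request cycle touches the CSV contents.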
Step 2: Background Monitoring
A background process or worker frequently checks for jobs in the bulk_jobs table that are marked as “pending” or “new.” As soon as it finds one, it starts the next phase of work.
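One way to implement this in Laravel is a console command run by the scheduler. The command name, model, and status values below are assumptions consistent with the diagram:

```php
<?php

namespace App\Console\Commands;

use App\Models\BulkJob;
use Illuminate\Console\Command;

class ProcessBulkZipCommand extends Command
{
    protected $signature = 'bulk:process-zips';
    protected $description = 'Process pending bulk ZIP import jobs';

    public function handle(): int
    {
        // Claim the oldest pending job; marking it "processing" first
        // keeps parallel workers from grabbing the same job.
        $job = BulkJob::where('status', 'pending')->oldest()->first();

        if (! $job) {
            return self::SUCCESS; // nothing to do this run
        }

        $job->update(['status' => 'processing']);

        // ... download, unzip, validate, insert (Steps 3-5) ...

        return self::SUCCESS;
    }
}
```

Scheduling it with something like `$schedule->command('bulk:process-zips')->everyMinute();` in the console kernel gives the "frequently checks" behavior described above.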
Step 3: Extraction and Parsing
The system downloads the ZIP file from storage to a temporary place and unzips its contents. It looks for the CSV and any images that are referenced. Each row in the CSV is read and mapped to the relevant fields, such as product identifiers, descriptions, or pricing. If multiple images are listed, for example by using a colon in the filename field, the system identifies all those files in the unzipped folder.
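After unzipping (for example with PHP's `ZipArchive`), each CSV row has to be mapped to fields. A minimal plain-PHP sketch, assuming a column order of SKU, description, price, and a colon-separated image list as described above:

```php
<?php

// Map one raw CSV row (e.g. from fgetcsv) to named fields.
// The column order and field names are illustrative assumptions.
function parseRow(array $row): array
{
    [$sku, $description, $price, $images] = $row;

    return [
        'sku'         => trim($sku),
        'description' => trim($description),
        'price'       => (float) $price,
        // Multiple images are listed in a single field, separated by colons.
        'images'      => $images === '' ? [] : explode(':', $images),
    ];
}
```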
Step 4: Validation
For each row, the data is checked for correctness. The system may confirm that certain fields are present and valid, that numerical values are within expected ranges, and that references (like category IDs) match known records. If anything is missing or incorrect, it is noted as an error.
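The validation step can be as simple as a function that returns a list of problems for a row; an empty list means the row is valid. The specific rules and field names here are illustrative assumptions:

```php
<?php

// Return every validation error for a parsed row, rather than
// stopping at the first one, so the error report is complete.
function validateRow(array $row, array $knownCategoryIds): array
{
    $errors = [];

    if ($row['sku'] === '') {
        $errors[] = 'sku is required';
    }
    if (! is_numeric($row['price']) || (float) $row['price'] < 0) {
        $errors[] = 'price must be a non-negative number';
    }
    if (! in_array((int) $row['category_id'], $knownCategoryIds, true)) {
        $errors[] = 'unknown category_id';
    }

    return $errors; // empty array means the row passed
}
```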
Step 5: Data Insertion and Image Handling
Once a row passes validation, the system inserts or updates it in the database (such as in a “products” table or equivalent). At the same time, the images connected to that row are uploaded or moved to the permanent storage location. The system then links those images to the item in the database, ensuring that each record references its associated files.
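A hedged sketch of this step in Laravel terms, assuming an Eloquent `Product` model with an `images()` relation and images already unzipped to a local temporary directory:

```php
<?php

use App\Models\Product;
use Illuminate\Http\File;
use Illuminate\Support\Facades\Storage;

function importRow(array $data, string $tmpDir): void
{
    // Insert or update by SKU so re-uploading a corrected file
    // is idempotent rather than creating duplicates.
    $product = Product::updateOrCreate(
        ['sku' => $data['sku']],
        ['description' => $data['description'], 'price' => $data['price']]
    );

    foreach ($data['images'] as $filename) {
        // Move each referenced image to permanent storage and link it.
        $path = Storage::disk('s3')->putFile(
            "products/{$product->id}",
            new File($tmpDir . '/' . $filename)
        );
        $product->images()->create(['path' => $path]);
    }
}
```

Using `updateOrCreate` is one design choice among several; if duplicate SKUs should be rejected instead, that check belongs in the validation step.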
Step 6: Collecting Errors
If a row fails validation, the system captures which row it was and why it was not accepted. After all rows have been processed, the system creates an error file (usually in CSV format) that includes each failed row and the specific reason it was deemed invalid. This error file is also uploaded to storage, so it can be retrieved later if needed.
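Building the error report is straightforward with PHP's CSV functions. This self-contained helper takes a map of row numbers to failure reasons and returns CSV text ready to be uploaded to storage:

```php
<?php

// Build the error report as a CSV string from a map of
// row number => list of failure reasons.
function buildErrorCsv(array $failures): string
{
    $fh = fopen('php://temp', 'r+');
    fputcsv($fh, ['row_number', 'reason']);

    foreach ($failures as $rowNumber => $reasons) {
        fputcsv($fh, [$rowNumber, implode('; ', $reasons)]);
    }

    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);

    return $csv;
}
```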
Step 7: Updating Records and Notifying Users
When processing is done, the system updates the job record in the database to reflect how many rows were processed successfully, how many failed, and whether an error file is available. A notification or email is then sent to the user who originally uploaded the ZIP, letting them know that the import is complete. If there are any errors, the user can download the error file, correct the data, and re-upload only the problematic rows.
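The final bookkeeping might look like the following sketch; the `BulkImportFinished` notification class and the column names are hypothetical:

```php
<?php

use App\Models\BulkJob;
use App\Notifications\BulkImportFinished;

function finishJob(BulkJob $job, int $successCount, array $failures, ?string $errorCsvPath): void
{
    // Record the outcome so the UI can show counts and a download link.
    $job->update([
        'status'         => 'completed',
        'success_count'  => $successCount,
        'failure_count'  => count($failures),
        'error_csv_path' => $errorCsvPath, // null when every row passed
    ]);

    // Laravel's notification system can deliver both the in-app
    // notification and the email from one class.
    $job->user->notify(new BulkImportFinished($job));
}
```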
Advantages of This Approach
Immediate Feedback
The user’s ZIP is accepted right away, with no need to wait for every line of the CSV to be checked. This provides a smoother experience and avoids timeouts.
Background Reliability
A background process has the flexibility to handle large numbers of rows and big images without being limited by the duration of a single web request. If something does go wrong, the process can handle errors more gracefully or attempt retries.
Clear Error Log
Instead of halting at the first invalid row, the process can gather every issue into one file. This allows the user to easily see all mistakes, fix them in one pass, and re-upload without guessing where the failures were.
Scalability
If the amount of data grows, more workers or processing instances can be added to handle multiple jobs in parallel. The core workflow remains the same.
Better Maintenance
By splitting uploading from processing, it becomes easier to maintain and debug each part. Developers can focus on improving data validation or image handling without risking an impact on the file upload flow.
Conclusion
This two-step method of immediate upload followed by background processing simplifies the handling of large CSV files with multiple images. It not only creates a faster and more stable experience for users but also gives the development team clearer control over error reporting and data validation. Whether managing a few dozen items or tens of thousands, this pattern can greatly reduce system strain and user frustration, making large-scale data imports more reliable and efficient.



