How to Create Sample Files for Testing

A guide to creating test files with command line tools, programming libraries, and browser-based generators.

💡 Expert Tip: Storage units differ by platform. macOS reports file sizes in decimal units, so a "1 GB" file is 1,000,000,000 bytes. Windows uses binary units (1 GB = 1,073,741,824 bytes), so it shows that same byte count as about 953.7 MB, or 0.93 GB.
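
The arithmetic behind this tip can be checked with a short sketch. The helper name describe_size is hypothetical; the only inputs are the unit definitions themselves (1000^3 bytes per decimal GB, 1024^2 and 1024^3 bytes per binary MB and GB).

```python
# Compare decimal and binary interpretations of the same byte count.
DECIMAL_GB = 1000 ** 3   # 1 GB in decimal units (as macOS reports sizes)
BINARY_MB = 1024 ** 2    # 1 "MB" in binary units (strictly, 1 MiB)
BINARY_GB = 1024 ** 3    # 1 "GB" in binary units (strictly, 1 GiB)


def describe_size(num_bytes: int) -> dict:
    """Return one byte count expressed in decimal GB and binary MB/GB."""
    return {
        "decimal_gb": num_bytes / DECIMAL_GB,
        "binary_mb": num_bytes / BINARY_MB,
        "binary_gb": num_bytes / BINARY_GB,
    }


sizes = describe_size(1_000_000_000)
print(f"{sizes['decimal_gb']:.2f} GB (decimal)")
print(f"{sizes['binary_mb']:.2f} MB (binary)")
print(f"{sizes['binary_gb']:.2f} GB (binary)")
```

For 1,000,000,000 bytes the binary figures come out to roughly 953.67 MB and 0.93 GB, which is why the same file can look smaller on one system than on another.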

Why You Need Sample Files

Whether you're testing a file upload feature, validating an import function, or measuring network transfer speeds, having reliable sample files is essential. The challenge is creating files that match your specific requirements without wasting time on manual preparation.

This guide covers the primary methods for creating test files, their advantages and limitations, and when to use each approach.

Method 1: Command Line Tools

Operating systems provide built-in utilities for creating files of specific sizes. These are useful when you need quick files without leaving your terminal.

Windows (fsutil)

The fsutil command creates a file of a specified size in bytes. It typically needs to be run from an elevated (administrator) prompt.

fsutil file createnew testfile.bin 10485760

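This creates a 10 MiB (10,485,760-byte) file. The file reads back as all zeros, and NTFS allocates the space without writing it, so creation is nearly instant even for large sizes. It is not a sparse file by default, so the full size counts against free disk space.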

PowerShell

PowerShell can do the same by writing a zero-filled byte array through .NET:

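# Allocate a zero-filled array; 10MB is PowerShell shorthand for 10 * 1024 * 1024
$bytes = New-Object byte[] 10MB
# Pass a full path: .NET resolves relative paths against the process directory, not $PWD
[System.IO.File]::WriteAllBytes((Join-Path $PWD "testfile.bin"), $bytes)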

Linux/macOS (dd command)

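The dd command is commonly used on Unix-like systems. With GNU dd on Linux, bs=1M means blocks of 1,048,576 bytes. The BSD dd that ships with macOS has traditionally required a lowercase suffix (bs=1m), so switch to that if bs=1M is rejected: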

dd if=/dev/zero of=testfile.bin bs=1M count=10

For random data instead of zeros, use /dev/urandom:

dd if=/dev/urandom of=testfile.bin bs=1M count=10
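
Where dd is not available, or you want one script that behaves the same on every platform, the same result can be sketched in Python. This is a minimal illustration rather than a replacement for dd; the function name create_test_file and the output file names are hypothetical.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per write, matching bs=1M in the dd examples


def create_test_file(path: str, size_bytes: int, random_data: bool = False) -> None:
    """Write exactly size_bytes of zeros (or random bytes) to path."""
    zero_chunk = bytes(CHUNK_SIZE)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK_SIZE, remaining)
            # os.urandom reads from the operating system's random source
            f.write(os.urandom(n) if random_data else zero_chunk[:n])
            remaining -= n


create_test_file("testfile.bin", 10 * CHUNK_SIZE)                      # zeros
create_test_file("testfile_random.bin", 10 * CHUNK_SIZE, random_data=True)
```

Writing in fixed-size chunks keeps memory use constant no matter how large the target file is.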

Limitations of Command Line Methods

Creates only unstructured filler such as zeros or random bytes (not structured data like CSV or JSON)
Requires terminal access and command familiarity
May need administrator privileges
Less convenient for quick, one-off file creation

Method 2: Programming Libraries

For more control over file content, developers often use programming libraries to generate structured test data.

Faker Libraries

Faker libraries exist for most programming languages and generate realistic-looking data:

JavaScript: @faker-js/faker
Python: Faker
Java: JavaFaker
Ruby: Faker
PHP: Faker

These libraries can generate names, emails, addresses, phone numbers, and many other data types in realistic formats.

Example: Creating a CSV with Python Faker

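import csv

from faker import Faker

fake = Faker()  # default locale

# newline='' lets the csv module control line endings; UTF-8 covers non-ASCII names
with open('test_data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Email', 'Phone'])  # header row
    for _ in range(1000):  # 1,000 rows of fake contact data
        writer.writerow([
            fake.name(),
            fake.email(),
            fake.phone_number()
        ])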

Advantages of Programming Libraries

Full control over data structure and content
Can generate complex, nested data
Reproducible with seed values
Integrates into automated test suites
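
As a sketch of the nesting and reproducibility points above, the following uses only Python's standard random module with a fixed seed, not Faker itself (Faker offers a comparable Faker.seed() call). The function name make_orders, the field names, and the product list are all hypothetical.

```python
import json
import random


def make_orders(count: int, seed: int) -> list:
    """Build nested order records; the same seed yields the same records."""
    rng = random.Random(seed)  # private generator, independent of global state
    products = ["keyboard", "monitor", "cable", "dock"]
    orders = []
    for order_id in range(1, count + 1):
        # Each order holds a nested list of one to three line items
        items = [
            {"product": rng.choice(products), "quantity": rng.randint(1, 5)}
            for _ in range(rng.randint(1, 3))
        ]
        orders.append({"order_id": order_id, "items": items})
    return orders


first = make_orders(100, seed=42)
second = make_orders(100, seed=42)
print(first == second)  # True: same seed, same data

with open("test_orders.json", "w", encoding="utf-8") as f:
    json.dump(first, f, indent=2)
```

Recording the seed alongside the generated file is usually enough to regenerate identical test data later.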

Limitations

Requires programming knowledge
Setup time for simple one-off needs
Dependency management in projects

Method 3: Browser-Based Generators

Browser-based file generators create files instantly without installing software or writing code. Many modern tools run entirely in your browser using JavaScript, so files are generated locally on your device rather than on a remote server.

Advantages

No installation required
Works on any device with a browser
Instant file creation
Privacy-friendly when processing is client-side
Multiple file formats in one tool

Common Formats Available

Binary files: For size testing and bandwidth measurement
CSV/JSON: For data import and API testing
PDF: For document processing validation
Images: For media handling tests

Choosing the Right Method

The best method depends on your specific situation:

Quick binary files: Command line tools or browser generators
Structured test data: Faker libraries or browser generators
Automated test suites: Programming libraries with version control
One-off testing: Browser-based generators for convenience

File Size Considerations

When creating sample files, consider the size limits of your testing environment:

Browser-based tools: Typically limited by available memory (often 100-500MB)
Command line tools: Limited by disk space
Mobile browsers: More restrictive memory limits (often under 100MB)

For very large files (multiple gigabytes), command line tools or dedicated software may be more reliable than browser-based solutions.
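
Because disk space is the limiting factor for large files, it can help to check free space before generating one. The sketch below is one possible approach; the function name has_room_for and the 10% headroom default are assumptions, not a standard.

```python
import shutil


def has_room_for(size_bytes: int, directory: str = ".", headroom: float = 0.10) -> bool:
    """Return True if the volume holding directory can fit size_bytes plus headroom."""
    free = shutil.disk_usage(directory).free
    return free >= size_bytes * (1 + headroom)


requested = 5 * 1024 ** 3  # 5 GiB
if has_room_for(requested):
    print("Enough free space to generate the file.")
else:
    print("Not enough free space; choose a smaller size or another volume.")
```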

Best Practices for Test Files

Document your test data: Record how files were generated
Use reproducible methods: Seed random generators for consistency
Include edge cases: Special characters, maximum lengths, boundary values
Match production patterns: Use realistic data distributions
Clean up after tests: Remove generated files to avoid disk bloat
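
Two of these practices, edge cases and cleanup, can be sketched together. The rows below are hypothetical examples of awkward input, and the temporary directory is one way (among several) to make sure generated files do not linger.

```python
import csv
import os
import tempfile

# Hypothetical edge-case rows: quoting, non-ASCII, empty, embedded newline, long value
EDGE_CASE_ROWS = [
    ["O'Brien, \"Pat\"", "pat@example.com", "quotes and a comma"],
    ["Zoë Müller", "zoe@example.com", "non-ASCII characters"],
    ["", "", "empty fields"],
    ["Line\nBreak", "lb@example.com", "embedded newline"],
    ["x" * 255, "long@example.com", "255-character name"],
]


def write_edge_case_csv(path: str) -> None:
    """Write a header plus the edge-case rows as UTF-8 CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Email", "Note"])
        writer.writerows(EDGE_CASE_ROWS)


# The temporary directory and everything in it is deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "edge_cases.csv")
    write_edge_case_csv(csv_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    print(f"Wrote and re-read {len(rows) - 1} edge-case rows")

print(os.path.exists(csv_path))  # False: the file is gone after cleanup
```

Reading the file back with the same csv settings is a quick check that quoting and embedded newlines survive a round trip before you feed the file to the system under test.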

Conclusion

Creating sample files for testing doesn't have to be complicated. Command line tools work well for simple binary files, programming libraries offer the most control for complex data, and browser-based generators provide the fastest path to usable test files without any setup.

Choose the method that fits your workflow, and keep your test files organized for future use.