blobforge
Tutorial · 8 min read

How to Create Sample Files for Testing

A comprehensive guide to creating test files using different methods—from command line tools to browser-based generators.

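💡 Expert Tip: Storage Truth: The same file can show different sizes on different systems. macOS Finder counts in base 10 (1 GB = 1,000,000,000 bytes), while Windows Explorer counts in base 2 (1 GB = 1,073,741,824 bytes) but still labels the result "GB" or "MB". A file of exactly 1,000,000,000 bytes therefore appears as 1 GB on macOS and about 953 MB on Windows.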
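The arithmetic behind the tip is easy to check yourself. Here is a minimal sketch using a hypothetical `describe_size` helper (not part of any tool mentioned here) that converts one byte count into both unit systems:

```python
def describe_size(num_bytes: int) -> str:
    """Show one byte count in decimal (base 10) and binary (base 2) units."""
    decimal_gb = num_bytes / 1000**3   # GB as macOS Finder reports it
    binary_mib = num_bytes / 1024**2   # MiB, which Windows Explorer labels "MB"
    return (f"{num_bytes:,} bytes = {decimal_gb:.2f} GB (base 10) = "
            f"{binary_mib:.2f} MiB (base 2)")


print(describe_size(1_000_000_000))
# 1,000,000,000 bytes = 1.00 GB (base 10) = 953.67 MiB (base 2)
```

"MiB" is the IEC name for the binary unit; Windows shows the same value with the label "MB".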

Why You Need Sample Files

Whether you're testing a file upload feature, validating an import function, or measuring network transfer speeds, having reliable sample files is essential. The challenge is creating files that match your specific requirements without wasting time on manual preparation.

This guide covers the primary methods for creating test files, their advantages and limitations, and when to use each approach.

Method 1: Command Line Tools

Operating systems provide built-in utilities for creating files of specific sizes. These are useful when you need quick files without leaving your terminal.

Windows (fsutil)

The fsutil command creates files of a specified size in bytes. This requires administrator privileges.

fsutil file createnew testfile.bin 10485760

This creates a 10 MiB file (10 × 1,024 × 1,024 = 10,485,760 bytes). The file consists entirely of zero bytes, so it compresses to almost nothing; avoid it when measuring transfer speeds over connections or storage that compress data.
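If you need the same kind of zero-filled file on any operating system without administrator rights, a few lines of Python can do it. This is a minimal sketch: `create_zero_file` is a hypothetical helper name, and it relies on `truncate()` zero-filling the extended region, which the Python documentation says is the behavior on most systems.

```python
import os


def create_zero_file(path: str, size: int) -> None:
    """Create a file of exactly `size` bytes whose contents read back as zeros."""
    with open(path, "wb") as f:
        # Extending a file with truncate() zero-fills the new region on most systems.
        # Many Linux and macOS file systems store that region sparsely (no disk blocks used).
        f.truncate(size)


create_zero_file("testfile.bin", 10 * 1024 * 1024)  # 10 MiB, same size as the fsutil example
print(os.path.getsize("testfile.bin"))  # 10485760
```

Because the file may be sparse, it can occupy far less disk space than its reported size, which is worth knowing if your test depends on actual disk usage.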

PowerShell

PowerShell offers alternative methods for creating files:

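$bytes = New-Object byte[] 10MB   # PowerShell's 10MB = 10,485,760 bytes, zero-filled
# (New-Object System.Random).NextBytes($bytes)   # optional: random bytes instead of zeros
# Pass a full path: .NET resolves relative paths against the process directory, not $PWD
[System.IO.File]::WriteAllBytes((Join-Path $PWD "testfile.bin"), $bytes)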

Linux/macOS (dd command)

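The dd command is commonly used on Unix-like systems. With bs=1M count=10 it writes ten 1 MiB blocks, producing a 10 MiB file (on macOS, write bs=1m if bs=1M is rejected):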

dd if=/dev/zero of=testfile.bin bs=1M count=10

For random data instead of zeros, use /dev/urandom:

dd if=/dev/urandom of=testfile.bin bs=1M count=10
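On systems without dd, or inside a cross-platform test script, the same random file can be written from Python. This sketch assumes a hypothetical `create_random_file` helper and uses `os.urandom`, which reads from the operating system's random source; it writes in 1 MiB chunks to mirror bs=1M and to keep memory use flat regardless of file size.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB per write, like bs=1M


def create_random_file(path: str, size: int, chunk_size: int = CHUNK) -> None:
    """Write exactly `size` random bytes to `path`, one chunk at a time."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))  # OS random source, comparable to /dev/urandom
            remaining -= n


create_random_file("testfile.bin", 10 * CHUNK)  # same size as count=10
```

Unlike zero-filled files, random data barely compresses, so it gives more honest numbers in bandwidth tests.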

Limitations of Command Line Methods

  • Creates only binary files (not structured data like CSV or JSON)
  • Requires terminal access and command familiarity
  • May need administrator privileges (fsutil requires an elevated prompt)
  • Less convenient for one-off file creation if you don't already work in a terminal

Method 2: Programming Libraries

For more control over file content, developers often use programming libraries to generate structured test data.

Faker Libraries

Faker libraries exist for most programming languages and generate realistic-looking data:

  • JavaScript: @faker-js/faker
  • Python: Faker
  • Java: JavaFaker (or its actively maintained fork, Datafaker)
  • Ruby: Faker
  • PHP: FakerPHP (fakerphp/faker)

These libraries can generate names, emails, addresses, phone numbers, and many other data types in realistic formats.

Example: Creating a CSV with Python Faker

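import csv

from faker import Faker

Faker.seed(1234)  # fixed seed: every run produces the same rows
fake = Faker()

# newline='' lets the csv module control line endings; utf-8 handles non-ASCII names
with open('test_data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Email', 'Phone'])  # header row
    for _ in range(1000):
        writer.writerow([
            fake.name(),
            fake.email(),
            fake.phone_number(),
        ])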

Advantages of Programming Libraries

  • Full control over data structure and content
  • Can generate complex, nested data
  • Reproducible with seed values
  • Integrates into automated test suites
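Nested data and seeding work without Faker too. The sketch below uses only Python's standard `random` and `json` modules so it runs with no installs; the order, item, and customer fields are hypothetical examples, not a required schema.

```python
import json
import random


def make_orders(count: int, seed: int) -> list[dict]:
    """Build nested order records; the same seed always yields the same data."""
    rng = random.Random(seed)  # private generator, so other code can't disturb the sequence
    orders = []
    for order_id in range(1, count + 1):
        items = [
            {
                "sku": f"SKU-{rng.randint(1000, 9999)}",
                "qty": rng.randint(1, 5),
                "price": round(rng.uniform(1, 100), 2),
            }
            for _ in range(rng.randint(1, 3))  # one to three line items per order
        ]
        orders.append({
            "id": order_id,
            "customer": {"id": rng.randint(1, 500)},
            "items": items,
        })
    return orders


orders = make_orders(100, seed=42)
with open("orders.json", "w", encoding="utf-8") as f:
    json.dump(orders, f, indent=2)
```

Using a dedicated `random.Random` instance, rather than the module-level functions, keeps the output stable even if other code in the test suite also draws random numbers.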

Limitations

  • Requires programming knowledge
  • Setup time for simple one-off needs
  • Dependency management in projects

Method 3: Browser-Based Generators

Browser-based file generators let you create files instantly without installing software or writing code. Many modern tools run entirely in your browser using JavaScript, so files are created locally on your device rather than on a server.

Advantages

  • No installation required
  • Works on any device with a browser
  • Instant file creation
  • Privacy-friendly when processing is client-side
  • Multiple file formats in one tool

Common Formats Available

  • Binary files: For size testing and bandwidth measurement
  • CSV/JSON: For data import and API testing
  • PDF: For document processing validation
  • Images: For media handling tests
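If you would rather script a small image fixture than download one, Python's standard library is enough for a valid PNG. This is a minimal sketch with hypothetical helper names (`solid_png`, `_chunk`); it writes a single-color 8-bit RGB image with no interlacing, which is enough for upload and media-handling checks but not for testing decoders against unusual PNG features.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Pack one PNG chunk: length, type, data, then CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def solid_png(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Return PNG bytes for an image filled with one color."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    row = b"\x00" + bytes(rgb) * width  # each scanline starts with filter type 0 (None)
    pixels = zlib.compress(row * height)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", header)
            + _chunk(b"IDAT", pixels)
            + _chunk(b"IEND", b""))


with open("sample.png", "wb") as f:
    f.write(solid_png(64, 48, (255, 0, 0)))  # 64x48 solid red
```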

Choosing the Right Method

The best method depends on your specific situation:

  • Quick binary files: Command line tools or browser generators
  • Structured test data: Faker libraries or browser generators
  • Automated test suites: Programming libraries with version control
  • One-off testing: Browser-based generators for convenience

File Size Considerations

When creating sample files, consider the size limits of your testing environment:

  • Browser-based tools: Typically limited by available memory, since many build the whole file in memory before download (often 100-500MB)
  • Command line tools: Limited by disk space
  • Mobile browsers: More restrictive memory limits (often under 100MB)

For very large files (multiple gigabytes), command line tools or dedicated software may be more reliable than browser-based solutions.
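Before scripting a multi-gigabyte file, it can help to confirm the disk has room. This sketch uses the standard `shutil.disk_usage`; the `check_free_space` helper and its 10% safety margin are assumptions for illustration, not a rule.

```python
import shutil


def check_free_space(target_dir: str, size: int, margin: float = 0.10) -> None:
    """Raise OSError if writing `size` bytes would eat into a reserve of `margin` x disk size."""
    usage = shutil.disk_usage(target_dir)  # named tuple: total, used, free (bytes)
    reserve = int(usage.total * margin)
    if size + reserve > usage.free:
        raise OSError(
            f"need {size:,} bytes plus a {reserve:,}-byte reserve, "
            f"but only {usage.free:,} bytes are free"
        )


planned = 5 * 1024**3  # planning a 5 GiB test file
try:
    check_free_space(".", planned)
    print("enough space, safe to generate")
except OSError as exc:
    print(f"skipping: {exc}")
```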

Best Practices for Test Files

  • Document your test data: Record how files were generated
  • Use reproducible methods: Seed random generators for consistency
  • Include edge cases: Special characters, maximum lengths, boundary values
  • Match production patterns: Use realistic data distributions
  • Clean up after tests: Remove generated files to avoid disk bloat
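Several of these practices fit in one small script. The sketch below writes edge-case values to a CSV inside a temporary directory (removed automatically afterward) and checks that they survive a round trip. The specific cases, including the 255-character length limit, are hypothetical; substitute the limits your own schema defines.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical edge cases; adjust to your own schema's rules
EDGE_CASES = [
    ["plain", "Alice"],
    ["comma", "Smith, John"],
    ["quotes", 'She said "hi"'],
    ["newline", "line one\nline two"],
    ["unicode", "Zoë 山田 🚀"],
    ["empty", ""],
    ["max_length", "x" * 255],  # assumes a 255-character column limit
    ["leading_space", "  padded  "],
]

with tempfile.TemporaryDirectory() as tmp:  # directory and file are deleted on exit
    path = Path(tmp) / "edge_cases.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["case", "value"])
        writer.writerows(EDGE_CASES)

    # Round-trip check: quoting should bring every value back unchanged
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    assert rows[1:] == EDGE_CASES
```

If your importer fails on any of these rows while Python's csv module reads them back correctly, the bug is likely in the importer rather than the file.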

Conclusion

Creating sample files for testing doesn't have to be complicated. Command line tools work well for simple binary files, programming libraries offer the most control for complex data, and browser-based generators provide the fastest path to usable test files without any setup.

Choose the method that fits your workflow, and keep your test files organized for future use.