[PHP] Large-Scale Data Import and Export with ThinkPHP, PHP, and MySQL

Large-Scale Data Import and Export

Note: “Large-scale” here refers to importing and exporting 100,000+ records from text files (including Office documents such as docx, xlsx).

MySQL has two very common table types: MyISAM and InnoDB. There are many comparisons of these two types online. The main difference is that InnoDB supports transactions and row-level locking, while MyISAM does not. MyISAM is relatively more efficient, and each has its own advantages and disadvantages.

Use Cases for MyISAM and InnoDB

MyISAM is suitable for:

  1. Frequent COUNT calculations;
  2. Infrequent inserts but frequent queries;
  3. No transactions needed.

InnoDB is suitable for:

  1. High reliability requirements or transaction support needed;
  2. Frequent table updates and queries with high likelihood of table locking.

Differences Between MyISAM and InnoDB

  1. MyISAM does not support advanced features like transactions, while InnoDB does.
  2. MyISAM tables do not support foreign keys.
  3. During write operations (insert, update, delete), MyISAM locks the entire table, while InnoDB locks only the affected rows.
  4. When your database has heavy writes, updates, and deletions with relatively few queries, or when data integrity requirements are high, choose InnoDB. When your database is primarily read-heavy with relatively few updates and writes, and data integrity requirements are less strict, choose MyISAM, as its query performance is faster than InnoDB.

In summary, InnoDB is a good choice for this scenario.

Optimization Methods for Inserting Large Amounts of Data

When PHP needs to insert large amounts of data, inserting one record at a time results in severe I/O performance bottlenecks.

Demo Code:

1
2
3
4
5
6
7
$params = array('value' => '50');
set_time_limit(0);
echo date("H:i:s");
for($i = 0; $i < 2000000; $i++) {
    $connect_mysql->insert($params);
}
echo date("H:i:s");

There are two methods to solve this problem:

1. Using Transactions (InnoDB supported)

For example, you can commit a transaction every 10,000 or 100,000 records, which greatly reduces I/O.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
echo date("H:i:s");
$connect_mysql->query('BEGIN');
$params = array('value' => '50');
for($i = 0; $i < 2000000; $i++) {
    $connect_mysql->insert($params);
    if($i % 100000 == 0) {
        $connect_mysql->query('COMMIT');
        $connect_mysql->query('BEGIN');
    }
}
$connect_mysql->query('COMMIT');
echo date("H:i:s");

2. SQL Statement Concatenation

This is a MySQL feature that uses optimized SQL statements: concatenate SQL statements using insert into table () values (),(),(),() and insert them all at once. If the string is too long, you need to configure MySQL by running:

1
SET GLOBAL max_allowed_packet = 2 * 1024 * 1024 * 10;
1
2
3
4
5
6
$sql = "INSERT INTO twenty_million (value) VALUES";
for($i = 0; $i < 2000000; $i++) {
    $sql .= "('50'),";
}
$sql = substr($sql, 0, strlen($sql) - 1);
$connect_mysql->query($sql);

The first method is more commonly used, while the second is mostly for test data.

By default, PHP has a memory limit of 128MB. When ThinkPHP exceeds this memory limit, you can either increase PHP’s default maximum memory or simply remove the APP_DEBUG constant definition in the entry file or set it to false. This will save a lot of memory. Of course, you can also use transaction mode to commit every n records. ThinkPHP’s addAll function can also be used for batch inserts, though it has quite a few bugs.

Starting and Committing Transactions in ThinkPHP

1
2
$table->startTrans();
$table->commit();

When working with Office files, PHPWord and PHPExcel are recommended.