[Elasticsearch] Modifying an ES Mapping

After years of working with MySQL-style databases, I recently hit a major pitfall with ES.
When I posted the mapping, I did not distinguish between “analyzed” and “not_analyzed” for one field, so every value in that field was tokenized. The dataset held roughly 150 million records.
I naively assumed I could change the field's attributes afterwards, as in MySQL. But ES is built on Lucene, and the mapping of an existing field cannot be changed in place. Put simply, you either delete the index and re-import the data, or you reindex. Reindexing means creating a new index with the correct mapping and copying the data over from the old index. Many tutorials online cover this, for example:
http://blog.csdn.net/loveyaqin1990/article/details/77684599
https://www.cnblogs.com/wmx3ng/p/4112993.html
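
Before any data is copied, the new index has to exist with the corrected mapping. The following is only a minimal sketch of that step, not the exact request used for this migration. It assumes a pre-5.x ES version (where "not_analyzed" is set on a `string` field), a target index named with the same `-reindex` suffix and `raw` type as the script in this article, and a hypothetical field name `my_field`. The helper `buildTargetIndexParams` is made up for illustration.

```php
<?php
// Hypothetical helper: builds the parameters for creating the reindex
// target with one field mapped as not_analyzed (pre-5.x string mapping).
function buildTargetIndexParams($sourceIndex, $type, $field)
{
    return array(
        'index' => $sourceIndex . '-reindex',
        'body'  => array(
            'mappings' => array(
                $type => array(
                    'properties' => array(
                        $field => array(
                            'type'  => 'string',
                            'index' => 'not_analyzed',
                        ),
                    ),
                ),
            ),
        ),
    );
}

// 'my_field' is a placeholder; substitute the field that was analyzed by mistake.
$createParams = buildTargetIndexParams('index-01', 'raw', 'my_field');

// With a client built via Elasticsearch\ClientBuilder, the index could
// then be created with:
// $client->indices()->create($createParams);
```

Building the parameters in a small function makes it easy to create one target per source index before the copy starts.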

Working implementation code, however, is scarce. After a long search I found only a Python implementation, so this article provides a migration script built on the official ES PHP SDK, using the scroll and bulk APIs.

<?php
require 'vendor/autoload.php';

// Each entry of the hosts array describes one node.
$hosts = array(
    array(
        'host'   => '127.0.0.1',
        'port'   => '9200',
        'scheme' => 'http',
    ),
);
$client = Elasticsearch\ClientBuilder::create()
            ->setSSLVerification(false)
            ->setHosts($hosts)
            ->build();

// Source indices are named index-01 ... index-10.
for ($i = 1; $i <= 10; $i++) {
    $params = array(
        'index'  => sprintf('index-%02d', $i),
        'type'   => 'raw',
        'scroll' => '120s',  // keep the scroll context alive between batches
        'size'   => 50000,   // documents per batch; lower it if requests are rejected or time out
        'body'   => array(
            'query' => array(
                // stdClass serializes to {} (an empty array would become [])
                'match_all' => new \stdClass(),
            ),
        ),
    );
    echo $params['index'] . "\r\n";

    $response = $client->search($params);
    $step = 1;
    // Index the current page first and only then fetch the next one,
    // so the first page returned by search() is not skipped.
    while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {
        echo $step++ . "\t";

        $bulk = array(
            'index' => $params['index'] . '-reindex',
            'type'  => $params['type'],
            'body'  => array(),
        );
        foreach ($response['hits']['hits'] as $hit) {
            // Action line (keeps the original _id), followed by the document source.
            $bulk['body'][] = array('index' => array('_id' => $hit['_id']));
            $bulk['body'][] = $hit['_source'];
        }
        // insert into the reindex target
        $res = $client->bulk($bulk);
        if (!empty($res['errors'])) {
            echo "bulk request reported errors\r\n";
        }
        unset($bulk);

        // Fetch the next page using the most recent scroll id.
        $scroll_id = $response['_scroll_id'];
        unset($response);
        $response = $client->scroll(array(
            'scroll_id' => $scroll_id,
            'scroll'    => '120s',
        ));
    }
}
?>
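
With 150 million records it is worth knowing which documents, if any, failed to copy. A bulk response carries a top-level `errors` flag and one entry per action under `items`. The sketch below pulls the failed entries out of such a response. The helper name `collectBulkFailures` and the sample response are made up for illustration; the sample is hand-written, not real output from this migration.

```php
<?php
// Hypothetical helper: returns the error of every failed bulk item,
// keyed by document _id. Returns an empty array when nothing failed.
function collectBulkFailures(array $response)
{
    $failures = array();
    if (empty($response['errors'])) {
        return $failures;
    }
    foreach ($response['items'] as $item) {
        // Each item is keyed by its action name ("index" in the script above).
        $result = reset($item);
        if (isset($result['error'])) {
            $failures[$result['_id']] = $result['error'];
        }
    }
    return $failures;
}

// Hand-written sample response, for illustration only.
$sample = array(
    'errors' => true,
    'items'  => array(
        array('index' => array('_id' => 'a1', 'status' => 201)),
        array('index' => array(
            '_id'    => 'b2',
            'status' => 400,
            'error'  => array('type' => 'mapper_parsing_exception'),
        )),
    ),
);
$failures = collectBulkFailures($sample);
```

In the migration script, the return value of `$client->bulk($bulk)` could be passed to this helper and the failed ids logged or retried.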