
[ElasticSearch] Changing an ES mapping

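I had used databases like MySQL for a long time. Then one day I started using ES and fell into a big pit.
When I posted the mapping, I failed to get `analyzed` vs `not_analyzed` right for one field. Because of that single mistake, every value in that column was tokenized (analyzed). The data volume was roughly 150 million documents.
I naively assumed I could just change the field's attributes, as you would in MySQL. ES is built on Lucene, though, and an existing field's mapping cannot be changed in place. Put simply, there are only two ways out: delete the index and re-import everything, or reindex. Reindexing means creating a new index with the corrected mapping and copying the old index's data into it. There are plenty of tutorials on this online, for example: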
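To make the difference concrete: an `analyzed` string field goes through an analyzer (the `standard` analyzer by default) and is indexed as separate lowercase terms. A `not_analyzed` field is indexed as one exact term. The sketch below only approximates this in plain PHP. The function names are hypothetical, and the regex split is a rough stand-in, not Elasticsearch's actual tokenizer.

```php
<?php
// Rough approximation of an "analyzed" field: lowercase, then split on
// anything that is not a letter or digit. NOT the real standard analyzer.
function analyzedTerms(string $value): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($value), -1, PREG_SPLIT_NO_EMPTY);
}

// A "not_analyzed" field keeps the whole value as a single exact term.
function notAnalyzedTerms(string $value): array
{
    return [$value];
}

// analyzedTerms('New York-City')    returns ['new', 'york', 'city']
// notAnalyzedTerms('New York-City') returns ['New York-City']
```

Because the analyzed field never contains the term `New York-City` itself, exact-match queries and aggregations on that column stop working as expected. That is why the mistake forces a reindex.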
http://blog.csdn.net/loveyaqin1990/article/details/77684599
https://www.cnblogs.com/wmx3ng/p/4112993.html
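Before copying any data, the new index must be created with the corrected mapping. Otherwise ES creates it on the fly with a dynamic mapping, and the field ends up analyzed again. The following is a minimal sketch of the create-index parameters. It assumes ES 2.x-style string mappings (`"index": "not_analyzed"`) and the `raw` type used by the migration script. The field name `my_field` and the helper name are hypothetical.

```php
<?php
// Hypothetical helper: parameters for creating "<index>-reindex" with the
// corrected mapping (ES 2.x string syntax assumed).
function buildReindexTargetParams(string $sourceIndex, string $field): array
{
    return [
        'index' => $sourceIndex . '-reindex',
        'body'  => [
            'mappings' => [
                'raw' => [                                 // mapping type of the source data
                    'properties' => [
                        $field => [
                            'type'  => 'string',
                            'index' => 'not_analyzed',     // index the whole value as one term
                        ],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildReindexTargetParams('index-01', 'my_field');
echo json_encode($params['body']), "\n";
// With elasticsearch-php this would be sent via $client->indices()->create($params);
```

On ES 5.x and later, the equivalent of a `not_analyzed` string is a field of `"type": "keyword"`.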

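Concrete implementation code is scarce online. After a long search, the only implementation I found was in Python. This post gives a migration implementation in PHP, built on the official Elasticsearch PHP client together with the scroll and bulk APIs.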

<?php
// Reindex index-01 ... index-10 (type "raw") into index-01-reindex ... index-10-reindex:
// read with the scroll API, write with the bulk API.
// The target indices must already exist with the corrected mapping,
// otherwise ES auto-creates them with a dynamic mapping.
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
    ->setHosts([
        ['host' => '127.0.0.1', 'port' => '9200', 'scheme' => 'http'],
    ])
    ->build();

for ($i = 1; $i <= 10; $i++) {
    $index = sprintf('index-%02d', $i);   // index-01 ... index-10
    echo $index . "\r\n";

    $params = [
        'index'  => $index,
        'type'   => 'raw',
        'scroll' => '120s',   // keep the scroll context alive between pages
        'size'   => 50000,    // docs per page; newer ES versions may reject values above
                              // index.max_result_window (10000 by default)
        'body'   => [
            // new \stdClass() encodes as {} (an empty array would encode as [])
            'query' => ['match_all' => new \stdClass()],
        ],
    ];
    $response = $client->search($params);
    $step = 1;

    // Index the current page first, then fetch the next one.
    // Scrolling before indexing would silently skip the first page from search().
    while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {
        echo $step++ . "\t";

        $bulk = ['index' => $index . '-reindex', 'type' => 'raw', 'body' => []];
        foreach ($response['hits']['hits'] as $hit) {
            $bulk['body'][] = ['index' => ['_id' => $hit['_id']]]; // action line, keeps the original _id
            $bulk['body'][] = $hit['_source'];                      // document line
        }
        $res = $client->bulk($bulk);
        if (!empty($res['errors'])) {
            echo "bulk errors on {$index}, step " . ($step - 1) . "\r\n";
        }
        unset($bulk, $res);

        $scrollId = $response['_scroll_id'];
        $response = $client->scroll([
            'scroll_id' => $scrollId,
            'scroll'    => '120s',
        ]);
    }

    // Release the scroll context instead of waiting for it to time out.
    if (isset($response['_scroll_id'])) {
        $client->clearScroll(['scroll_id' => $response['_scroll_id']]);
    }
}
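The core of the loop turns one page of scroll hits into a bulk request body. That body alternates an action line (`index`, carrying the original `_id`) with the document's `_source`. Below, that step is pulled out into a hypothetical pure function so it can be checked without a running cluster. The function name and the sample hits are made up for illustration.

```php
<?php
// Hypothetical helper: convert one page of search/scroll hits into
// elasticsearch-php bulk params targeting the new index.
function hitsToBulkParams(array $hits, string $targetIndex, string $type): array
{
    $params = ['index' => $targetIndex, 'type' => $type, 'body' => []];
    foreach ($hits as $hit) {
        $params['body'][] = ['index' => ['_id' => $hit['_id']]]; // action/metadata line
        $params['body'][] = $hit['_source'];                      // document line
    }
    return $params;
}

// Two fake hits shaped like $response['hits']['hits'].
$hits = [
    ['_id' => 'a1', '_source' => ['name' => 'foo']],
    ['_id' => 'a2', '_source' => ['name' => 'bar']],
];
$bulk = hitsToBulkParams($hits, 'index-01-reindex', 'raw');
echo count($bulk['body']), "\n"; // prints 4
```

Reusing the original `_id` means re-sending a page overwrites the same documents instead of creating duplicates. As a result, an interrupted migration can simply be run again.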
