包含中国省、市、县、街道、村/居委会5级数据(港、澳、台除外(统计局官网没有给出)),数据来源为国家统计局。
程序基于tp3.2.3编写,依赖Querylist3。
主要代码:
/**
* @author Adam
* Time: 2018/6/15 14:50
*/
public function sheng()
{
$data = QueryList::Query($this->baseurl . "index.html", array("url" => array("td>a ", "href"), "name" => array("td>a", "text")), "", 'UTF-8', 'GB2312')->getData();
foreach ($data as $k => $v) {
$data[$k]["url"] = str_replace(".html", "", $v["url"]);
$data[$k]["saveid"] = $this->generate_full_id($data[$k]["url"]);
$res = $this->areamodel->where(array("area_id" => $data[$k]["url"]))->find();
if (empty($res)) {
$this->areamodel->data(array("area_id" => $data[$k]["url"], "area_pid" => 0, "area_name" => $data[$k]["name"], "level" => 1))->add();
}
}
echo json_encode($data);
die;
}
/**
* 二级 市 http://www.stats.gov.cn/tjsj/tjbz/tjyqhdmhcxhfdm/2016/14.html
* @author Adam
* Time: 2018/6/20 13:43
*/
public function secondway()
{
$toplevel = $this->areamodel->where(array("imported" => 0, "level" => 1))->order("area_id DESC")->find();
if (empty($toplevel)) {
echo "全部导入完毕,无需重新导入";
die;
}
$fatherid = $toplevel["area_id"];
$url = $this->baseurl . $toplevel["area_id"] . ".html";
$data = QueryList::Query($url, array("url" => array("td:eq(0)>a", "href"), "name" => array("td:eq(1)>a", "text")), "tr.citytr", 'UTF-8', 'GB2312')->getData();
foreach ($data as $k => $v) {
$data[$k]["url"] = str_replace(".html", "", $this->reg_real_id($v["url"])[0]);
}
M()->startTrans();
foreach ($data as $k1 => $v1) {
$res["area_id"] = $v1["url"];
$res["area_pid"] = $toplevel["area_id"];
$res["level"] = 2;
$res["area_name"] = $v1["name"];
if (!$this->areamodel->data($res)->add()) {
echo "循环导入失败" . M()->getLastSql();
die;
M()->rollback();
}
}
$this->areamodel->where(array("area_id" => $fatherid))->save(array("imported" => 1));
$sql = M()->getLastSql();
M()->commit();
echo "导入成功 " . $sql;
die;
}
/**
* 三级 区/县(市)
* @author Adam
* Time: 2018/6/21 15:34
*/
public function thirdway()
{
$secondlevel = $this->areamodel->where(array("imported" => 0, "level" => 2))->order("area_id DESC")->find();
if (empty($secondlevel)) {
echo "区/县全部导入完毕,无需重新导入";
die;
}
$topid = $secondlevel["area_pid"];
$fatherid = $secondlevel["area_id"];
$url = $this->baseurl . $topid . "/" . $fatherid . ".html";
$data = QueryList::Query($url, array("url" => array("td:eq(0)>a", "href"), "name" => array("td:eq(1)", "text"), "extra" => array("td:eq(0)", "text")), "tr.countytr", 'UTF-8', 'GB2312')->getData();
foreach ($data as $k => $v) {
$data[$k]["url"] = str_replace(".html", "", $this->reg_real_id($v["url"])[0]);
if (empty($data[$k]["url"])) {
$data[$k]["url"] = $this->spit_pure_id($v["extra"]); //防止空数据
}
}
// echo json_encode($data);die;
M()->startTrans();
foreach ($data as $k => $v) {
$res["area_id"] = $v["url"];
$res["area_pid"] = $secondlevel["area_id"];
$res["level"] = 3;
$res["area_name"] = $v["name"];
if (!$this->areamodel->data($res)->add()) {
echo "循环导入失败" . M()->getLastSql();
die;
M()->rollback();
}
}
if ($this->areamodel->where(array("area_id" => $fatherid))->save(array("imported" => 1)) == false) {
echo "更新父类失败";
M()->rollback();
die;
};
$sql = M()->getLastSql();
M()->commit();
echo "导入成功" . $sql;
die;
}
/**四级 办事处/镇
* @author Adam
* Time: 2018/6/21 15:35
*/
public function fourthway()
{
$thirdlevel = $this->areamodel->where(array("imported" => 0, "level" => 3))->order("area_id DESC")->find();
if (empty($thirdlevel)) {
echo "办事处/镇全部导入完毕,无需重新导入";
die;
}
$fatherid = $thirdlevel["area_id"];
$secondid = $thirdlevel["area_pid"];
$secondmeta = $this->areamodel->where(array("area_id" => $secondid))->find();
$topid = $secondmeta["area_pid"];
$url = $this->baseurl . $topid . "/" . str_replace($topid, "", $secondid) . "/" . $fatherid . ".html";
$data = QueryList::Query($url, array("url" => array("td:eq(0)>a", "href"), "name" => array("td:eq(1)", "text"), "extra" => array("td:eq(0)", "text")), "tr.towntr", 'UTF-8', 'GB2312')->getData();
foreach ($data as $k => $v) {
$data[$k]["url"] = str_replace(".html", "", $this->reg_real_id($v["url"])[0]);
if (empty($data[$k]["url"])) {
$data[$k]["url"] = $this->spit_pure_id($v["extra"]); //防止空数据
}
}
M()->startTrans();
foreach ($data as $k => $v) {
$res["area_id"] = $v["url"];
$res["area_pid"] = $thirdlevel["area_id"];
$res["level"] = 4;
$res["area_name"] = $v["name"];
if (!$this->areamodel->data($res)->add()) {
echo "循环导入失败" . M()->getLastSql();
die;
M()->rollback();
}
}
if ($this->areamodel->where(array("area_id" => $fatherid))->save(array("imported" => 1)) == false) {
echo "更新父类失败";
M()->rollback();
die;
};
$sql = M()->getLastSql();
M()->commit();
echo "导入成功" . $sql;
die;
}
/**五级 居委会/村委会
* @author Adam
* Time: 2018/6/21 15:37
*/
public function fifthway()
{
$fourthlevel = $this->areamodel->where(array("imported" => 0, "level" => 4))->order("area_id DESC")->find();
if (empty($fourthlevel)) {
echo "办事处/镇全部导入完毕,无需重新导入";
die;
}
$fatherid = $fourthlevel["area_id"];
$thirdid = $fourthlevel["area_pid"];
$thirdmeta = $this->areamodel->where(array("area_id" => $thirdid))->find();
$secondid = $thirdmeta["area_pid"];
$secondmeta = $this->areamodel->where(array("area_id" => $secondid))->find();
$topid = $secondmeta["area_pid"];
$url = $this->baseurl . $topid . "/" . str_replace($topid, "", $secondid) . "/" . str_replace($secondid, "", $thirdid) . "/" . $fatherid . ".html";
$data = QueryList::Query($url, array("url" => array("td:eq(0)", "text"), "name" => array("td:eq(2)", "text"), "extra" => array("td:eq(0)", "text")), "tr.villagetr", 'UTF-8', 'GB2312')->getData();
foreach ($data as $k => $v) {
$data[$k]["url"] = str_replace(".html", "", $this->reg_real_id($v["url"])[0]);
if (empty($data[$k]["url"])) {
$data[$k]["url"] = $this->spit_pure_id($v["extra"]); //防止空数据
}
}
M()->startTrans();
foreach ($data as $k => $v) {
$res["area_id"] = $v["url"];
$res["area_pid"] = $fourthlevel["area_id"];
$res["level"] = 5;
$res["area_name"] = $v["name"];
if (!$this->areamodel->data($res)->add()) {
echo "循环导入失败111" . M()->getLastSql();
die;
M()->rollback();
}
}
if ($this->areamodel->where(array("area_id" => $fatherid))->save(array("imported" => 1)) == false) {
echo "更新父类失败";
M()->rollback();
die;
};
$sql = M()->getLastSql();
M()->commit();
echo "导入成功" . $sql;
die;
}
/**
* 生成完整的12位id
* @param $id
* @return string
* @author Adam
* Time: 2018/6/20 17:19
*/
private function generate_full_id($id)
{
if (strlen($id) > 12) {
return $id;
} else {
$needzerecount = 12 - strlen($id);
for ($i = 0; $i < $needzerecount; $i++) {
$id .= "0";
}
return $id;
}
}
/**
* 去除字符转后面的0
* @param $id
* @return null|string|string[]
* @author Adam
* Time: 2018/6/20 13:02
*/
private function spit_pure_id($id)
{
$matches = preg_replace("/0+$/", "", $id);
return $matches;
}
/**
* 正则匹配id
* @param $id
* @return mixed
* @author Adam
* Time: 2018/6/20 17:22
*/
public function reg_real_id($id)
{
preg_match("/\d+.html$/", $id, $matches);
return $matches;
}github地址: https://github.com/adamin1990/China_locations_spider_database 包含程序源代码和采集的五级国家省市区数据库文件。
本文为Adamin90原创文章,转载无需和我联系,但请注明来自http://www.lixiaopeng.top
sunnier:
啥子东西这是
2018-06-22 17:32:24 回复