Removing specific XML nodes with Clojure

I have the following XML structure:

(def xmlstr
"<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>
    <Item><Type>B</Type><Note>BB</Note></Item>
    <Item><Type>C</Type><Note>CC</Note></Item>
    <Item><Type>A</Type><Note>AA</Note></Item>
  </Items>
</ROOT>")

If I want to remove any items of Type B or Type C, the result should be:

<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>
    <Item><Type>A</Type><Note>AA</Note></Item>
  </Items>
</ROOT>

I find it very easy to query this kind of structure with data.xml and data.zip, for example:

;; lein try org.clojure/data.xml org.clojure/data.zip
(require '[clojure.data.xml] '[clojure.zip] '[clojure.data.zip.xml])
(def xmldoc (clojure.data.xml/parse-str xmlstr))
(def zipxml (clojure.zip/xml-zip xmldoc))

(clojure.data.zip.xml/xml-> zipxml :Items :Item [:Type "A"] :Note clojure.data.zip.xml/text)
;; => ("AA" "AA")

but I can't find a similarly declarative way to remove or edit children.
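For comparison, and independent of any extra library, the core clojure.zip API can already do this kind of removal: walk the zipper depth-first with zip/next and call zip/remove on each matching Item. The sketch below is an assumption-based illustration, not part of the question or the answer: it parses with clojure.xml/parse (bundled with Clojure, and it drops whitespace-only text) instead of data.xml, and the helpers parse-str, item-type, b-or-c-item? and remove-items are hypothetical names.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

(def xmlstr
  "<ROOT>
  <Items>
    <Item><Type>A</Type><Note>AA</Note></Item>
    <Item><Type>B</Type><Note>BB</Note></Item>
    <Item><Type>C</Type><Note>CC</Note></Item>
    <Item><Type>A</Type><Note>AA</Note></Item>
  </Items>
</ROOT>")

(defn parse-str
  "Hypothetical helper: parse an XML string with clojure.xml into
  {:tag :attrs :content} maps."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes ^String s "UTF-8"))))

(defn item-type
  "Text of the first <Type> child of node, or nil."
  [node]
  (some (fn [child]
          (when (and (map? child) (= :Type (:tag child)))
            (first (:content child))))
        (:content node)))

(defn b-or-c-item? [node]
  (and (map? node)
       (= :Item (:tag node))
       (contains? #{"B" "C"} (item-type node))))

(defn remove-items
  "Walk the whole zipper depth-first, removing every node that satisfies pred.
  zip/remove returns the location preceding the removed node in the walk,
  so zip/next continues with whatever followed the removed subtree."
  [root pred]
  (loop [loc (zip/xml-zip root)]
    (cond
      (zip/end? loc)        (zip/root loc)
      (pred (zip/node loc)) (recur (zip/next (zip/remove loc)))
      :else                 (recur (zip/next loc)))))

(def result (remove-items (parse-str xmlstr) b-or-c-item?))

;; Types of the Items left under <Items>:
(map item-type (:content (first (:content result))))
;; => ("A" "A")
```

The loop/zip/remove/zip/next pattern is the usual way to delete while traversing a zipper; it is generic, but it is more imperative than the xml-> query style.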

Best answer

The Tupelo library can solve this easily with tupelo.forest. You can find the API docs on GitHub Pages. Below is a test case using your example.

Here we load your XML data, converting it first to enlive format and then to the native tree structure used by tupelo.forest:

(ns tst.tupelo.forest-examples
  (:use tupelo.forest tupelo.test )
  (:require
    [clojure.data.xml :as dx]
    [clojure.java.io :as io]
    [clojure.set :as cs]
    [net.cgrand.enlive-html :as en-html]
    [schema.core :as s]
    [tupelo.core :as t]
    [tupelo.string :as ts]))
(t/refer-tupelo)

; Discard any xml nodes of Type="B" or Type="C" (plus blank string nodes)
(dotest
  (with-forest (new-forest)
    (let [xml-str         "<ROOT>
                            <Items>
                              <Item><Type>A</Type><Note>AA1</Note></Item>
                              <Item><Type>B</Type><Note>BB1</Note></Item>
                              <Item><Type>C</Type><Note>CC1</Note></Item>
                              <Item><Type>A</Type><Note>AA2</Note></Item>
                            </Items>
                          </ROOT>"
          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource ; input is XML, so use enlive's XML parser
                            first)
          root-hid        (add-tree-enlive enlive-tree)
          tree-1          (hid->tree root-hid)

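The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node or leaf in the tree. At this stage we have only loaded the data into the forest data structure, creating tree-1, which looks like this: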

 (is= tree-1
   {:attrs {:tag :ROOT},
    :kids  [{:attrs {:tag :tupelo.forest/raw},
             :value "\n                            "}
            {:attrs {:tag :Items},
             :kids  [{:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "A"}
                              {:attrs {:tag :Note}, :value "AA1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "B"}
                              {:attrs {:tag :Note}, :value "BB1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "C"}
                              {:attrs {:tag :Note}, :value "CC1"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                              "}
                     {:attrs {:tag :Item},
                      :kids  [{:attrs {:tag :Type}, :value "A"}
                              {:attrs {:tag :Note}, :value "AA2"}]}
                     {:attrs {:tag :tupelo.forest/raw},
                      :value "\n                            "}]}
            {:attrs {:tag :tupelo.forest/raw},
             :value "\n                          "}]})
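To make the "hid as pointer" idea concrete, here is a toy model. It is purely illustrative and is not how tupelo.forest is implemented: nodes are stored flat in a map keyed by id, parents hold only their children's ids, and removing a node means dropping it (plus its descendants) and unlinking its id from the parent's :kids. The ids :h0 to :h6 and the helpers subtree-ids, remove-id and ->tree are hypothetical.

```clojure
;; Toy "forest": every node is stored flat, keyed by an id.
;; Parents refer to children by id, much like a pointer.
(def forest
  {:h0 {:tag :Items :kids [:h1 :h2 :h3]}
   :h1 {:tag :Item  :kids [:h4]}
   :h2 {:tag :Item  :kids [:h5]}
   :h3 {:tag :Item  :kids [:h6]}
   :h4 {:tag :Type  :value "A"}
   :h5 {:tag :Type  :value "B"}
   :h6 {:tag :Type  :value "A"}})

(defn subtree-ids
  "The id itself plus the ids of all its descendants."
  [forest id]
  (tree-seq #(seq (:kids (forest %))) #(:kids (forest %)) id))

(defn remove-id
  "Drop the node with this id and its descendants, and unlink it from its parent."
  [forest id]
  (let [doomed (set (subtree-ids forest id))]
    (into {}
          (for [[k node] forest
                :when (not (doomed k))]
            [k (if (:kids node)
                 (update node :kids #(vec (remove doomed %)))
                 node)]))))

(defn ->tree
  "Rebuild a nested view of the forest starting from id."
  [forest id]
  (let [node (forest id)]
    (if (:kids node)
      (assoc node :kids (mapv #(->tree forest %) (:kids node)))
      node)))

(->tree (remove-id forest :h2) :h0)
;; => {:tag :Items,
;;     :kids [{:tag :Item, :kids [{:tag :Type, :value "A"}]}
;;            {:tag :Item, :kids [{:tag :Type, :value "A"}]}]}
```

Because removal only touches the removed ids and one parent's :kids vector, deleting by id needs no rebuild of the surrounding nested structure, which is the convenience the answer relies on.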

Next, we remove any blank strings with the following code:

blank-leaf-hid? (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                            (let [value (hid->value hid)]
                              (and (string? value)
                                (or (zero? (count value)) ; empty string
                                  (ts/whitespace? value)))))) ; all whitespace string

blank-leaf-hids (keep-if blank-leaf-hid? (all-hids))
>>              (apply remove-hid blank-leaf-hids) ; `>>` is a throwaway binding, evaluated only for its side effect on the forest
tree-2          (hid->tree root-hid)

which produces the much tidier tree-2:

(is= tree-2
  {:attrs {:tag :ROOT},
   :kids  [{:attrs {:tag :Items},
            :kids  [{:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "B"}
                             {:attrs {:tag :Note}, :value "BB1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "C"}
                             {:attrs {:tag :Note}, :value "CC1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA2"}]}]}]})
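The blank-leaf filtering that turned tree-1 into tree-2 has a direct counterpart on plain enlive-style maps ({:tag :attrs :content}) without a forest: walk the tree and drop empty or whitespace-only strings from every :content. This is a minimal sketch under that assumption, not Tupelo code; blank-text? and strip-blank-text are hypothetical names. clojure.string/blank? is true for nil, "" and whitespace-only strings, which covers both cases that blank-leaf-hid? checks.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

(defn blank-text?
  "True for strings that are empty or contain only whitespace."
  [v]
  (and (string? v) (str/blank? v)))

(defn strip-blank-text
  "Remove empty/whitespace-only strings from the :content of every element."
  [tree]
  (walk/postwalk
    (fn [node]
      (if (and (map? node) (contains? node :content))
        (update node :content #(vec (remove blank-text? %)))
        node))
    tree))

(strip-blank-text
  {:tag :Items :attrs nil
   :content ["\n  "
             {:tag :Item :attrs nil
              :content [{:tag :Type :attrs nil :content ["A"]}]}
             "\n"]})
;; => {:tag :Items, :attrs nil,
;;     :content [{:tag :Item, :attrs nil,
;;                :content [{:tag :Type, :attrs nil, :content ["A"]}]}]}
```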

The final snippet removes the Item nodes with Type="B" or Type="C":

type-bc-hid?    (fn [hid] (pos? (count (glue
                            (find-leaf-hids hid [:** :Type] "B")
                            (find-leaf-hids hid [:** :Type] "C")))))

type-bc-hids    (find-hids-with root-hid [:** :Item] type-bc-hid?)
>>              (apply remove-hid type-bc-hids)
tree-3          (hid->tree root-hid)
tree-3-hiccup   (hid->hiccup root-hid) ]

producing the final result tree, shown in both tree and hiccup format:

(is= tree-3
  {:attrs {:tag :ROOT},
   :kids
          [{:attrs {:tag :Items},
            :kids  [{:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA1"}]}
                    {:attrs {:tag :Item},
                     :kids  [{:attrs {:tag :Type}, :value "A"}
                             {:attrs {:tag :Note}, :value "AA2"}]}]}]})
(is= tree-3-hiccup
  [:ROOT
   [:Items
    [:Item [:Type "A"] [:Note "AA1"]]
    [:Item [:Type "A"] [:Note "AA2"]]]]))))
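The last two steps, finding Items that have a :Type leaf of "B" or "C" anywhere below them (what the [:** :Type] path expresses) and rendering the result as hiccup, can likewise be sketched over plain {:tag :attrs :content} maps. This is an illustration under that assumption, not Tupelo's API: has-type-leaf?, drop-items, ->hiccup and item are hypothetical helpers, and ->hiccup ignores attributes.

```clojure
(require '[clojure.walk :as walk])

(defn has-type-leaf?
  "True if any :Type element at any depth under node has one of the given values."
  [node values]
  (boolean
    (some (fn [n]
            (and (map? n)
                 (= :Type (:tag n))
                 (contains? values (first (:content n)))))
          (tree-seq map? :content node))))

(defn drop-items
  "Remove every :Item element that has a :Type leaf whose text is in values."
  [tree values]
  (walk/postwalk
    (fn [node]
      (if (and (map? node) (sequential? (:content node)))
        (update node :content
                (fn [kids]
                  (vec (remove #(and (map? %)
                                     (= :Item (:tag %))
                                     (has-type-leaf? % values))
                               kids))))
        node))
    tree))

(defn ->hiccup
  "Convert {:tag :content} maps to hiccup vectors (attributes are ignored)."
  [node]
  (if (map? node)
    (into [(:tag node)] (map ->hiccup (:content node)))
    node))

(defn item [t note]
  {:tag :Item :attrs nil
   :content [{:tag :Type :attrs nil :content [t]}
             {:tag :Note :attrs nil :content [note]}]})

(def tree-2-like
  {:tag :ROOT :attrs nil
   :content [{:tag :Items :attrs nil
              :content [(item "A" "AA1") (item "B" "BB1")
                        (item "C" "CC1") (item "A" "AA2")]}]})

(->hiccup (drop-items tree-2-like #{"B" "C"}))
;; => [:ROOT [:Items [:Item [:Type "A"] [:Note "AA1"]] [:Item [:Type "A"] [:Note "AA2"]]]]
```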

See the complete example in the forest-examples unit test.

Update

Here is the most compact version, with the extra features removed:

(dotest
  (with-forest (new-forest)
    (let [xml-str         "<ROOT>
                            <Items>
                              <Item><Type>A</Type><Note>AA1</Note></Item>
                              <Item><Type>B</Type><Note>BB1</Note></Item>
                              <Item><Type>C</Type><Note>CC1</Note></Item>
                              <Item><Type>A</Type><Note>AA2</Note></Item>
                            </Items>
                          </ROOT>"
          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource
                            first)
          root-hid        (add-tree-enlive enlive-tree)
          blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid)))
          has-bc-leaf?    (fn [hid] (or (has-child-leaf? hid [:** :Type] "B")
                                        (has-child-leaf? hid [:** :Type] "C")))
          blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))
          >>              (apply remove-hid blank-leaf-hids)
          bc-item-hids    (find-hids-with root-hid [:** :Item] has-bc-leaf?)]
      (apply remove-hid bc-item-hids)
      (is= (hid->hiccup root-hid)
        [:ROOT
         [:Items
          [:Item [:Type "A"] [:Note "AA1"]]
          [:Item [:Type "A"] [:Note "AA2"]]]]))))
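The question asks for XML text rather than hiccup. If the filtered tree is held as {:tag :attrs :content} maps (the shape that both enlive and clojure.xml use), core clojure.xml/emit can print it back out; it writes to *out*, so with-out-str captures it as a string. A small sketch with the filtered tree written out by hand (the name filtered is hypothetical). Note that emit puts each tag and text node on its own line, so the layout differs from the expected result in the question, and as far as I know it does not escape characters such as & or < in text; clojure.data.xml also provides emit-str for that reason.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.string :as str])

;; The expected result from the question, as {:tag :attrs :content} maps.
(def filtered
  {:tag :ROOT :attrs nil
   :content [{:tag :Items :attrs nil
              :content [{:tag :Item :attrs nil
                         :content [{:tag :Type :attrs nil :content ["A"]}
                                   {:tag :Note :attrs nil :content ["AA1"]}]}
                        {:tag :Item :attrs nil
                         :content [{:tag :Type :attrs nil :content ["A"]}
                                   {:tag :Note :attrs nil :content ["AA2"]}]}]}]})

;; emit prints an XML declaration followed by the elements; capture it as a string.
(def xml-out (with-out-str (xml/emit filtered)))

(println xml-out)
```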
