etree.strip_tags() does not remove all instances of a defined tag

Asked by phatfish

I'm a little hesitant to file a bug report, since this seems like a pretty obvious flaw in strip_tags() and it's quite likely I'm doing something wrong.

I have the following Python code, running on lxml 2.2.2 with libxml 2.7.6 on FreeBSD 7.2:

from lxml import etree
html = """
<div>
 <div>
  I like <strong>beer</strong>.
  <br/>
  I like lots of <strong>beer</strong>.
  <br/>
  Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
  <br/>
 </div>
</div>
"""
element = etree.fromstring(html)
etree.strip_tags(element, 'a','br')
print etree.tostring(element)

which prints:

<div>
        <div>
                I like <strong>beer</strong>.

                I like lots of <strong>beer</strong>.

                Click here for <a href="www.beer.com">this</a> beer.
                <br/>
        </div>
</div>

I would expect *all* the "br" and "a" tags to be removed.

Another example: with "etree.strip_tags(element, 'strong','br')", I get this output:

<div>
        <div>
                I like beer.
                <br/>
                I like lots of <strong>beer</strong>.
                <br/>
                Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
                <br/>
        </div>
</div>

Again, I would expect all the specified tags to be stripped. Surely this can't be correct behavior?
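
To make the expected result concrete, here is a minimal sketch of the semantics I have in mind: every matching element is removed, while its text, children and tail text stay where they were. It uses the standard library's xml.etree.ElementTree rather than lxml, and strip_tags_sketch is a hypothetical helper written only for illustration. It is not lxml's implementation and says nothing about where the actual bug lies.

```python
import xml.etree.ElementTree as ET


def strip_tags_sketch(root, *tags):
    """Remove every descendant of `root` whose tag is in `tags`,
    keeping its text, children and tail in place.

    Hypothetical helper for illustration only; `root` itself is never removed.
    """
    # Handle descendants first, so nested matches are already resolved.
    for child in list(root):
        strip_tags_sketch(child, *tags)

    kept = []  # children of `root` that survive, in document order

    def add_text(text):
        # Attach loose text to the tail of the last kept child,
        # or to the parent's own text if nothing has been kept yet.
        if not text:
            return
        if kept:
            kept[-1].tail = (kept[-1].tail or "") + text
        else:
            root.text = (root.text or "") + text

    for child in list(root):
        root.remove(child)
        if child.tag in tags:
            add_text(child.text)       # text inside the stripped tag
            kept.extend(list(child))   # its children move up one level
            add_text(child.tail)       # text that followed the stripped tag
        else:
            kept.append(child)
    root.extend(kept)


html = """
<div>
 <div>
  I like <strong>beer</strong>.
  <br/>
  I like lots of <strong>beer</strong>.
  <br/>
  Click <a href="www.beer.com">here</a> for <a href="www.beer.com">this</a> beer.
  <br/>
 </div>
</div>
"""

element = ET.fromstring(html)
strip_tags_sketch(element, "a", "br")

# No "a" or "br" element should be left anywhere in the tree.
remaining = [el.tag for el in element.iter() if el.tag in ("a", "br")]
print(remaining)  # → []
```

Running the same check against the tree produced by etree.strip_tags() (counting the leftover "a" and "br" elements) would show the problem described above without having to compare the serialized output by eye.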

Thanks

Question information

Language: English
Status: Solved
For: lxml
Assignee: No assignee
Solved by: phatfish

scoder (scoder) said :
#1

It's clearly misbehaving here, so this is a bug. Thanks for the report.
Could you still file a real bug report, please? Thanks!

Stefan

phatfish (phatfish) said :
#2

Sure, bug report created. I'll mark this as answered.

I noticed while searching that the strip_tags/strip_elements helper functions were only added in 2.2.2. Thanks for adding that functionality, it's very helpful.