Monday, January 23, 2012

HTML escapes in XML processing


I am writing this entry because I ran into a problem parsing XML. The data obtained from OwnCloud contained HTML escape sequences, and that caused an error while retrieving the data.

<artist id='2'>
  <name>Die &Auml;rzte</name>
  <albums>1</albums>
  <songs>13</songs>
  <rating>0</rating>
  <preciserating>0</preciserating>
</artist>

&Auml; : this is an HTML-escaped character (Ä)

Reading this XML produced the following error:

  
01-23 17:40:16.515: E/AmpacheProvider(13269): XMLHandler.XmlPullParserException
01-23 17:40:16.515: E/AmpacheProvider(13269): org.xmlpull.v1.XmlPullParserException: unresolved: Ä (position:TEXT @147:28 in java.io.StringReader@413dfa78) 
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.checkRelaxed(KXmlParser.java:302)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.readEntity(KXmlParser.java:1268)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.readValue(KXmlParser.java:1385)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.next(KXmlParser.java:390)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.next(KXmlParser.java:310)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at org.kxml2.io.KXmlParser.nextText(KXmlParser.java:2056)
01-23 17:40:16.515: E/AmpacheProvider(13269):  at jp.co.kayo.android.localplayer.ds.ampache.util.XMLUtils.getTextValue(XMLUtils.java:26)
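
For context: XML itself predefines only five named entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;, &amp;apos;), so an HTML name such as &amp;Auml; is undeclared unless a DTD defines it. The sketch below is only an illustration and rests on an assumption: it uses the JDK's built-in DOM parser instead of the KXmlParser from the log above, and the class name UndeclaredEntityDemo is made up for this example. It shows that a standard XML parser rejects the named entity but accepts the numeric character reference.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of the app in this post.
final class UndeclaredEntityDemo {

    // Returns true if the JDK DOM parser accepts the XML, false if it reports a parse error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Undeclared entities such as &Auml; end up here as a fatal parse error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this helper, parses("<name>Die &Auml;rzte</name>") returns false, while the same text written with &#196; parses without error.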

The read failed because the parser sees &Auml; as an ENTITY_REF token rather than TEXT, and it cannot resolve that entity.
The code that does the reading is as follows.

public String getTextValue(XmlPullParser parser){
        try {
            return parser.nextText();
        } catch (XmlPullParserException e) {
            Logger.e("XMLHandler.XmlPullParserException", e);
        } catch (IOException e) {
            Logger.e("XMLHandler.IOException", e);
        }
        return "";
}

So I tried replacing nextText with a loop that calls nextToken directly.

        StringBuilder buf = new StringBuilder();
        do{
                int type = parser.nextToken();
                
                if(type == XmlPullParser.TEXT || type == XmlPullParser.CDSECT){
                    buf.append(parser.getText());
                }
                else if(type == XmlPullParser.ENTITY_REF){
                    //empty
                }
                else{
                    break;
                }
        }while(true);
        return buf.toString();


I used nextToken to avoid the entity check.
Because nextToken returns the text in several pieces, the loop joins them back together.

The result is now as follows:


Die &Auml;rzte  ->  Die rzte


But this is not enough: the escaped character is simply dropped.
So I added the following code to restore the escaped character.

        StringBuilder buf = new StringBuilder();
        do{
                int type = parser.nextToken();
                
                if(type == XmlPullParser.TEXT || type == XmlPullParser.CDSECT){
                    buf.append(parser.getText());
                }
                else if(type == XmlPullParser.ENTITY_REF){
                    buf.append(Html.fromHtml("&"+parser.getName()+";").toString());
                }
                else{
                    break;
                }
        }while(true);
        return buf.toString();


Die &Auml;rzte  ->  Die Ärzte


This is not elegant, but it works as intended.
If there is a proper way to handle ENTITY_REF, I would like to know.

This code is not going to be released.
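
One possible alternative to calling Html.fromHtml for every entity is a small lookup table keyed by the entity name that parser.getName() returns on ENTITY_REF. This is only a sketch under assumptions: the class HtmlEntities and its resolve method are hypothetical names, and the table lists just a handful of entities, not the full HTML set. The XmlPullParser interface also declares defineEntityReplacementText(entityName, replacementText), which might let nextText() resolve the entity directly, but I have not verified how KXmlParser behaves with it here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table for illustration; extend it with the entities your data uses.
final class HtmlEntities {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        // The five entities predefined by XML.
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        // A few HTML entities for German text (assumed subset).
        NAMED.put("Auml", "\u00C4");  // A with diaeresis
        NAMED.put("Ouml", "\u00D6");  // O with diaeresis
        NAMED.put("Uuml", "\u00DC");  // U with diaeresis
        NAMED.put("auml", "\u00E4");  // a with diaeresis
        NAMED.put("ouml", "\u00F6");  // o with diaeresis
        NAMED.put("uuml", "\u00FC");  // u with diaeresis
        NAMED.put("szlig", "\u00DF"); // sharp s
    }

    // Returns the replacement text, or the original "&name;" form if the name is unknown.
    static String resolve(String name) {
        String value = NAMED.get(name);
        return value != null ? value : "&" + name + ";";
    }
}
```

In the loop above, the ENTITY_REF branch would then become buf.append(HtmlEntities.resolve(parser.getName())). Unknown names are passed through unchanged, so nothing is silently dropped.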
