Oct 24 2009

SingleFeed Export Module for Magento – How to strip HTML tags and special characters

Tested on:
SingleFeed Export Module v1.1.0
Magento v1.3.2.4

So I found this great module for Magento the other day, made by SingleFeed. It will export a product data feed every night. Then, we can import this feed into things like Google Base.

I ran into a problem, though: Google Base won’t accept a data feed that contains HTML tags and special characters. Most of the clients I work with prefer a WYSIWYG editor for things like CMS pages and product descriptions, and that editor stores HTML formatting in the database. The SingleFeed Export Module does not strip HTML on the fly (although I believe that if you sign up for an account at their website, they have a wizard that can do it for you).

I poked around in the code for a bit and discovered that removing the HTML tags and decoding the encoded special characters (entities such as &amp;) would be quite easy using two PHP functions: strip_tags and htmlspecialchars_decode.
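To see what the two functions do before touching the module, here is a minimal standalone sketch. The sample string is made up; it just stands in for the kind of markup a WYSIWYG editor might store.

```php
<?php
// Hypothetical sample of what a WYSIWYG editor might store for a description.
$raw = '<p>Fish &amp; <strong>Chips</strong> &quot;to go&quot;</p>';

// Step 1: strip_tags() removes the HTML tags but leaves entities alone.
$noTags = strip_tags($raw);

// Step 2: htmlspecialchars_decode() turns entities such as &amp; and &quot;
// back into the plain characters they stand for.
$clean = htmlspecialchars_decode($noTags);

echo $noTags, "\n"; // Fish &amp; Chips &quot;to go&quot;
echo $clean, "\n";  // Fish & Chips "to go"
```

Note that htmlspecialchars_decode only handles a small set of entities (&amp;, &quot;, &lt;, &gt; and, depending on the flags, the single quote), so anything else the editor emits is left as-is.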

Open app/code/community/SingleFeed/Export/Model/Mysql4/Profile.php and go to line 360:

// format product data as needed
foreach ($products as $id=>&$p) {
  foreach ($p as $attr=>&$value) {
    // replace raw numeric values with source option labels
    if ($options = $this->attr($attr, 'options')) {
      if (is_array($value)) {
        foreach ($value as &$v) {
          $v = isset($options[$v]) ? $options[$v] : '';
        }
      } else {
        $value = isset($options[$value]) ? $options[$value] : '';
      }
    }
    // combine multiselect values
    if (is_array($value)) {
      $value = join(', ', $value);
    }
    // process special cases of loaded attributes
    switch ($attr) {
    // product url
    case 'url_path':
      $p["singlefeed.url"] = $baseUrl.$value;
      break;

I love well-written code, especially with good comments. All we’re going to do is add a new case to the switch for our formatting (in this case, just for the product description).

// format product data as needed
foreach ($products as $id=>&$p) {
  foreach ($p as $attr=>&$value) {
    // replace raw numeric values with source option labels
    if ($options = $this->attr($attr, 'options')) {
      if (is_array($value)) {
        foreach ($value as &$v) {
          $v = isset($options[$v]) ? $options[$v] : '';
        }
      } else {
        $value = isset($options[$value]) ? $options[$value] : '';
      }
    }
    // combine multiselect values
    if (is_array($value)) {
      $value = join(', ', $value);
    }
    // process special cases of loaded attributes
    switch ($attr) {
    // product url
    case 'url_path':
      $p["singlefeed.url"] = $baseUrl.$value;
      break;
    // product description: strip HTML tags, then decode special-character entities
    case 'description':
      $p["description"] = htmlspecialchars_decode(strip_tags($value));
      break;

That was almost a little too easy…
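If your descriptions contain entities beyond the handful that htmlspecialchars_decode covers (&nbsp; is a common one), or if you want the same cleanup on more than one attribute, a variation along these lines might help. This is only a sketch: cleanFeedText is a hypothetical helper of my own and not part of the SingleFeed module, the product array is made up, and I am assuming short_description is one of the attributes in your feed. It swaps in html_entity_decode, which decodes the full set of HTML entities rather than just the special ones.

```php
<?php
// Hypothetical helper; it is not part of the SingleFeed module.
function cleanFeedText($html)
{
    // Remove the HTML tags first, leaving entities untouched.
    $text = strip_tags($html);
    // Decode the full set of HTML entities, not just the "special" ones.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // &nbsp; decodes to a UTF-8 non-breaking space; swap it for a plain space.
    $text = str_replace("\xC2\xA0", ' ', $text);
    // Collapse runs of whitespace (including newlines) into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up product row, shaped like the $p arrays in the module's loop.
$p = array(
    'name'              => 'Widget',
    'description'       => "<p>Fast&nbsp;&amp; light</p>\n<p>Ships in 2 days</p>",
    'short_description' => '<em>Fast &amp; light</em>',
);

// Assumed list of attributes that may hold WYSIWYG markup.
foreach (array('description', 'short_description') as $attr) {
    if (isset($p[$attr])) {
        $p[$attr] = cleanFeedText($p[$attr]);
    }
}

echo $p['description'], "\n";       // Fast & light Ships in 2 days
echo $p['short_description'], "\n"; // Fast & light
```

Inside the module, the same call could replace the one-liner in the 'description' case, with a matching case for any other attribute you want cleaned.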