Matt's Blog
Magento canonical category URL paths for products - an in-depth look into the mysteries of missing category paths

An in-depth look into the mysteries of missing category URL paths in Magento

Are you intrigued by how Magento decides whether to output category URL paths for the products on your category pages? Here I will try to explain the inner workings of Magento, and how the system determines the URL that is output for a given product. I will also briefly explain a workaround (albeit a bit of a hack) that offers a more consistent approach to full URLs throughout your website.

Before I begin, we will take the following category structure as an example throughout this post:

  • Cars (1)
    • Honda (2)
      • Product: Honda Jazz (1)
      • Estates (3)
        • Product: Honda Accord - Black (2)
        • Product: Honda Accord - Red (3)
      • Hatchbacks (4)
        • Product: Honda Civic (4)
    • Volvo (5)…
  • Vans…

All categories have been set as anchors (so they show subcategory products as well as their own). Category and product IDs are given in brackets above.

Let’s begin. When you navigate to a category page (for example Honda > Estates) and Magento processes the layered navigation, the following code is called along the way:

/app/code/core/Mage/Catalog/Model/Layer.php 

public function prepareProductCollection($collection)
{
	$collection
	    ->addAttributeToSelect(Mage::getSingleton('catalog/config')->getProductAttributes())
	    ->addMinimalPrice()
	    ->addFinalPrice()
	    ->addTaxPercents()
	    //->addStoreFilter()
	    ->addUrlRewrite($this->getCurrentCategory()->getId()); // HERE FILTER IS APPLIED

	Mage::getSingleton('catalog/product_status')->addVisibleFilterToCollection($collection);
	Mage::getSingleton('catalog/product_visibility')->addVisibleInCatalogFilterToCollection($collection);

	return $this;
}

Above we can see the ->addUrlRewrite() method is called, with the current category (of the page you’re viewing - ‘Honda > Estates’), passed into it.

If a product is assigned to the current category page you are viewing (such as the Honda Accord - Black), this is fine… The current category’s URL path is output, along with the product path.

However, if the category you’re on (e.g. the parent ‘Honda’ category) is an anchor and shows products from its subcategories too (Estates, Hatchbacks), this is where the problem occurs.

In this situation, when the various subcategory products (such as the Accord and Civic) are output, only the product URL path is output, as they are not directly assigned to the Honda category. The only exception in our example is the Honda Jazz (ID 1), as that product is assigned directly to the Honda category, and therefore matches.

So why does Magento do this? At the higher category level, Magento has no way of knowing which subcategory the other products are assigned to. Magento also allows any product to be assigned to multiple categories (in which case it wouldn’t know which category path to output), so it cannot simply assume a product is assigned to one category and output that path.
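The ambiguity is easy to see in a small plain-PHP sketch. Everything here is hypothetical and for illustration only: the $categoryPaths and $assignments arrays, the candidateUrls() function, and the assumption that product 3 has been placed in both Estates and Hatchbacks. None of it is Magento code.

```php
<?php
// Hypothetical category paths keyed by category ID (IDs from the example tree).
$categoryPaths = [2 => 'honda', 3 => 'honda/estates', 4 => 'honda/hatchbacks'];

// Hypothetical product-to-category assignments. Product 3 is placed in two
// categories here purely to show the ambiguity.
$assignments = [1 => [2], 2 => [3], 3 => [3, 4], 4 => [4]];

// Build every category-prefixed URL a product could legitimately have.
function candidateUrls(string $urlKey, array $categoryIds, array $categoryPaths): array
{
    $urls = [];
    foreach ($categoryIds as $categoryId) {
        $urls[] = $categoryPaths[$categoryId] . '/' . $urlKey;
    }
    return $urls;
}

print_r(candidateUrls('honda-accord-red', $assignments[3], $categoryPaths));
```

On the Honda page, product 3 has two equally valid candidates, so any single choice would be arbitrary. Falling back to the plain product URL avoids having to choose.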

Let’s delve further into the code and find out why this is so… 

When the collection is prepared by the layered navigation (above), the addUrlRewrite() method stores the category ID passed into it (the ID of the current page’s category), in our case 2 for ‘Honda’:

/app/code/core/Mage/Catalog/Model/Resource/Eav/Mysql4/Product/Collection.php

public function addUrlRewrite($categoryId = '')
{
	$this->_addUrlRewrite = true;
	if (Mage::getStoreConfig(Mage_Catalog_Helper_Product::XML_PATH_PRODUCT_URL_USE_CATEGORY, $this->getStoreId())) {
	    $this->_urlRewriteCategory = $categoryId;
	} else {
	    $this->_urlRewriteCategory = 0;
	}

	if ($this->isLoaded()) {
	    $this->_addUrlRewrite();
	}

	return $this;
}

As we can see, this simply stores the category ID passed into the function (e.g. 2), provided category paths in product URLs are enabled in Magento; otherwise it stores 0. _addUrlRewrite() is then called in turn (straight away if the collection has already been loaded):

/app/code/core/Mage/Catalog/Model/Resource/Eav/Mysql4/Product/Collection.php 

protected function _addUrlRewrite()
{
	$urlRewrites = null;
	if ($this->_cacheConf) {
	    if (!($urlRewrites = Mage::app()->loadCache($this->_cacheConf['prefix'].'urlrewrite'))) {
		$urlRewrites = null;
	    } else {
		$urlRewrites = unserialize($urlRewrites);
	    }
	}

	if (!$urlRewrites) {
	    $productIds = array();
	    foreach($this->getItems() as $item) {
		$productIds[] = $item->getEntityId();
	    }
	    if (!count($productIds)) {
		return;
	    }

	    $select = $this->getConnection()->select()
		->from($this->getTable('core/url_rewrite'), array('product_id', 'request_path'))
		->where('store_id=?', Mage::app()->getStore()->getId())
		->where('is_system=?', 1)
		->where('category_id=? OR category_id is NULL', $this->_urlRewriteCategory) // HERE ASSIGNED
		->where('product_id IN(?)', $productIds)
		->order('category_id DESC'); // more priority is data with category id
	    $urlRewrites = array();

	    foreach ($this->getConnection()->fetchAll($select) as $row) {
		if (!isset($urlRewrites[$row['product_id']])) {
		    $urlRewrites[$row['product_id']] = $row['request_path'];
		}
	    }

	    ........
}

Here we can see (at the comment “// HERE ASSIGNED”) that a filter is applied in the SQL WHERE clause, using the category ID we have just set above (2).

As we’re on the ‘Honda’ category page, the above code generates an SQL statement similar to:

SELECT `core_url_rewrite`.`product_id`, `core_url_rewrite`.`request_path` FROM `core_url_rewrite` WHERE (store_id='1') AND (is_system=1) AND (category_id='2' OR category_id is NULL) AND (product_id IN('1', '2', '3', '4')) ORDER BY `category_id` DESC 

Note that the category ID here is 2 (for Honda). For products 1, 2, 3 and 4, the query looks in the core_url_rewrite table for a rewrite tied to category 2. Where none is found, only the plain product URL with no category prefix (category_id NULL) is available, so that is what gets output. The only product that gets a category path is product 1, as it is assigned to category 2 (Honda). Running the SQL on your own system will give you a good idea of what it is doing.

If we then navigate to a category page that products 2 and 3 are assigned to (say category ID 3, Honda > Estates), we get something like:

SELECT `core_url_rewrite`.`product_id`, `core_url_rewrite`.`request_path` FROM `core_url_rewrite` WHERE (store_id='1') AND (is_system=1) AND (category_id='3' OR category_id is NULL) AND (product_id IN('2', '3')) ORDER BY `category_id` DESC 

As products 2 and 3 are assigned to category 3, the query now returns the rewrite with the category path as well as the plain product path, and the category one is used. No problems! For reference, products 1 (Honda Jazz) and 4 (Honda Civic) are no longer shown, as they are assigned to the parent category (Honda) and a sibling category (Hatchbacks) respectively.
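To make the two queries above concrete, here is a minimal plain-PHP sketch that replays the same selection logic on a made-up set of core_url_rewrite rows. The request paths, the $rewrites array and the resolveProductUrls() function are all hypothetical. This is not Magento code, just the WHERE / ORDER BY / first-row-wins behaviour applied to arrays.

```php
<?php
// Made-up rows standing in for core_url_rewrite (system rewrites, one store).
// category_id null = the plain product URL without a category prefix.
$rewrites = [
    ['product_id' => 1, 'category_id' => null, 'request_path' => 'honda-jazz'],
    ['product_id' => 1, 'category_id' => 2,    'request_path' => 'honda/honda-jazz'],
    ['product_id' => 2, 'category_id' => null, 'request_path' => 'honda-accord-black'],
    ['product_id' => 2, 'category_id' => 3,    'request_path' => 'honda/estates/honda-accord-black'],
    ['product_id' => 3, 'category_id' => null, 'request_path' => 'honda-accord-red'],
    ['product_id' => 3, 'category_id' => 3,    'request_path' => 'honda/estates/honda-accord-red'],
    ['product_id' => 4, 'category_id' => null, 'request_path' => 'honda-civic'],
    ['product_id' => 4, 'category_id' => 4,    'request_path' => 'honda/hatchbacks/honda-civic'],
];

function resolveProductUrls(array $rewrites, array $productIds, int $categoryId): array
{
    // WHERE (category_id = ? OR category_id IS NULL) AND product_id IN (?)
    $rows = array_filter($rewrites, function ($row) use ($productIds, $categoryId) {
        return in_array($row['product_id'], $productIds, true)
            && ($row['category_id'] === null || $row['category_id'] === $categoryId);
    });

    // ORDER BY category_id DESC: MySQL puts NULL last in descending order,
    // which is imitated here by treating null as -1.
    usort($rows, function ($a, $b) {
        return ($b['category_id'] ?? -1) <=> ($a['category_id'] ?? -1);
    });

    // The first row seen for each product wins, as in _addUrlRewrite().
    $urls = [];
    foreach ($rows as $row) {
        if (!isset($urls[$row['product_id']])) {
            $urls[$row['product_id']] = $row['request_path'];
        }
    }
    ksort($urls); // stable key order, for readable output only
    return $urls;
}

print_r(resolveProductUrls($rewrites, [1, 2, 3, 4], 2)); // Honda page
print_r(resolveProductUrls($rewrites, [2, 3], 3));       // Honda > Estates page
```

On the Honda page only product 1 comes back with a category prefix. On the Estates page both Accords do.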

Here you can now spot the problem we mentioned previously should you like full URL paths…

Products 2 and 3 are assigned to a subcategory of Honda (Estates). When you view the parent category (Honda), which is an anchor, they are listed there, but as they aren’t assigned to category 2 directly (they’re assigned to 3), Magento falls back to the product URL without the category prefix. This is because there is no row in the core_url_rewrite table matching category 2 for products 2, 3 or 4.

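Removing the where condition (commented: “// HERE ASSIGNED”) solves the problem to an extent (albeit as a hack). Because the results are ordered by category_id descending, rows with a category come before the non-prefixed (NULL) rows, so a category path is picked first. If a product has rewrites for both a parent and a child category, the child’s fuller path only comes first provided the parent category was created before the child (and therefore has the lower ID).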

The only problem with this (and the reason Magento has rightly chosen to do it this way) is that if a product is assigned to multiple categories, it will output whichever rewrite comes first in the results (with the ordering above, the one with the highest category ID), regardless of the category being viewed. This cannot be controlled, so the hack should be used with caution.

With the condition removed, should a product be assigned to two categories (say Honda Estates and Hatchbacks), the URL would be the same no matter which category you were viewing, and would not represent the category path you are in. E.g. on the Estates category the URL would read /honda/hatchbacks/honda-accord-red (Hatchbacks having the higher ID), and on the Hatchbacks category it would also read /honda/hatchbacks/honda-accord-red - far from ideal!
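As a rough illustration of that side effect, the sketch below replays the query with the category condition dropped, using made-up rows in which product 3 (Honda Accord - Red) is assumed to be assigned to both Estates (3) and Hatchbacks (4). The data and the resolveWithoutCategoryFilter() function are hypothetical, plain PHP rather than Magento code. Which path wins depends entirely on the category IDs involved; with these IDs it is the Hatchbacks one.

```php
<?php
// Made-up core_url_rewrite rows. Product 3 is assumed to sit in two categories.
$rewrites = [
    ['product_id' => 2, 'category_id' => null, 'request_path' => 'honda-accord-black'],
    ['product_id' => 2, 'category_id' => 3,    'request_path' => 'honda/estates/honda-accord-black'],
    ['product_id' => 3, 'category_id' => null, 'request_path' => 'honda-accord-red'],
    ['product_id' => 3, 'category_id' => 3,    'request_path' => 'honda/estates/honda-accord-red'],
    ['product_id' => 3, 'category_id' => 4,    'request_path' => 'honda/hatchbacks/honda-accord-red'],
    ['product_id' => 4, 'category_id' => null, 'request_path' => 'honda-civic'],
    ['product_id' => 4, 'category_id' => 4,    'request_path' => 'honda/hatchbacks/honda-civic'],
];

function resolveWithoutCategoryFilter(array $rewrites, array $productIds): array
{
    // Only the product filter remains: WHERE product_id IN (?)
    $rows = array_filter($rewrites, function ($row) use ($productIds) {
        return in_array($row['product_id'], $productIds, true);
    });

    // ORDER BY category_id DESC, with NULL sorting last (null treated as -1).
    usort($rows, function ($a, $b) {
        return ($b['category_id'] ?? -1) <=> ($a['category_id'] ?? -1);
    });

    // First row per product wins, so the highest category ID decides the URL.
    $urls = [];
    foreach ($rows as $row) {
        if (!isset($urls[$row['product_id']])) {
            $urls[$row['product_id']] = $row['request_path'];
        }
    }
    ksort($urls); // stable key order, for readable output only
    return $urls;
}

print_r(resolveWithoutCategoryFilter($rewrites, [2, 3])); // Estates page
print_r(resolveWithoutCategoryFilter($rewrites, [3, 4])); // Hatchbacks page
```

Product 3 resolves to the same Hatchbacks path on both pages, because the current category no longer takes part in the lookup.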

This however is not a problem should you be happy with this weakness, or if you know for sure you’ll only be assigning each product to one category!

I hope this explanation helps. It would be great to hear if anyone else has come across this problem, or indeed has any other solutions!

Thanks for reading
