EchoDitto Blog

Object Oriented PHP and the Migrate Module: Code Organization for Most Extreme Happiness

There are 2 kinds of people in this world: people who use the Migrate module to migrate content into a new Drupal site, and wrong people.

This post assumes a basic familiarity with the module.

The Migrate module is written in object-oriented PHP. All the code you write, for any number of content types, extends the class Migration. You create a custom Migration class for every node type (and possibly for files and users, but let's keep it simple). You can save yourself a lot of redundant code by creating your own abstract parent migration class and extending it to create the migration class for each content type. Doing this does create a few sticking points, but you can define public helper methods in the parent class and call them from the children, so that you needn’t repeat anything in more than one class.

Let’s start with creating our own abstract parent class:

Some crucial pieces of code will be omitted for brevity, such as declaring $this->map and $this->team. Let’s also assume we’re using a SQL source.

abstract class OurParentMigration extends Migration {
  public function __construct() {
    // Always call the parent constructor first for basic setup
    parent::__construct();
  }
}

Notice that this class is declared abstract, so that it doesn't get treated as an actual migration.
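To see what the abstract keyword does in plain PHP, here is a minimal standalone sketch. StubMigration is a hypothetical stand-in for Migrate's Migration class so the snippet runs outside Drupal; how Migrate itself discovers and registers migration classes is not shown here. It assumes PHP 7 or later, where instantiating an abstract class throws a catchable Error.

```php
<?php
// Hypothetical stand-in for Migrate's Migration base class (assumption:
// the real class does far more; this stub only lets the sketch run alone).
abstract class StubMigration {
  public function __construct() {}
}

abstract class OurParentMigration extends StubMigration {
  public function __construct() {
    parent::__construct();
  }
}

class OurBlogPostMigration extends OurParentMigration {}

// Reflection reports which classes are abstract.
$parent_is_abstract = (new ReflectionClass('OurParentMigration'))->isAbstract();
$child_is_abstract = (new ReflectionClass('OurBlogPostMigration'))->isAbstract();

// PHP refuses to instantiate the abstract parent directly.
try {
  $migration = new OurParentMigration();
  $instantiated = TRUE;
}
catch (Error $e) {
  $instantiated = FALSE;
}
```

Only the concrete child can be turned into an object, which is why shared setup can live in the parent without the parent ever running as a migration in its own right.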

Now let's put together the query on our source database. It’s quite likely that much of this query will be the same for all of the migration classes, whatever the node type. Let’s say your source database has a base table, similar to Drupal’s ‘node’ table, which stores basic, common metadata for nodes of all content types. You can begin by adding something like this to the constructor of OurParentMigration:

$this->query = db_select('main_legacy_table', 't');
$this->query->fields('t', array('title', 'entry_date', 'url', 'published', 'body', 'teaser'));

Notice that while most of the example modules just define a local $query and pass it to MigrateSourceSQL, you can define the common parts of the query in the abstract parent class as $this->query, then pick it up from $this->query in the child classes and continue building the query there:

class OurBlogPostMigration extends OurParentMigration {
  public function __construct() {
    parent::__construct();
    $query = $this->query;
    $query->addField('t', 'blog_image');
    $query->condition('t.content_type', 'blog');
    $this->source = new MigrateSourceSQL($query);
  }
}
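One detail worth making explicit: assigning $this->query to a local $query does not copy the query, because PHP passes objects around by handle. The standalone sketch below illustrates this with StubQuery, a hypothetical recorder class that stands in for the object db_select() returns; it is not Drupal's real query class, and the class names ending in Sketch are invented for illustration.

```php
<?php
// Hypothetical stand-in for a query object; it only records what it is asked.
class StubQuery {
  public $selected = array();
  public $where = array();

  public function fields($alias, array $names) {
    foreach ($names as $name) {
      $this->selected[] = $alias . '.' . $name;
    }
  }

  public function addField($alias, $name) {
    $this->selected[] = $alias . '.' . $name;
  }

  public function condition($field, $value) {
    $this->where[$field] = $value;
  }
}

abstract class QueryParentSketch {
  protected $query;

  public function __construct() {
    // Common columns, defined once for every content type.
    $this->query = new StubQuery();
    $this->query->fields('t', array('title', 'body', 'teaser'));
  }

  public function getQuery() {
    return $this->query;
  }
}

class QueryBlogSketch extends QueryParentSketch {
  public function __construct() {
    parent::__construct();
    // Same object, not a copy: changes made through $query
    // are also visible through $this->query.
    $query = $this->query;
    $query->addField('t', 'blog_image');
    $query->condition('t.content_type', 'blog');
  }
}

$blog = new QueryBlogSketch();
$selected = $blog->getQuery()->selected;
$where = $blog->getQuery()->where;
```

After the child constructor runs, the one query object carries both the parent's common columns and the child's additions.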

And we can do the same thing with our field mappings.

In OurParentMigration’s constructor:

$this->addFieldMapping('title', 'title');
$this->addFieldMapping('body', 'body');
// etc.

and then in OurBlogPostMigration’s constructor:

$this->addFieldMapping('field_image', 'blog_image');

Piece of cake, right? No more redundant definitions.

But we run into a little trouble when we try to use the prepare or prepareRow methods. What if we want to run prepareRow in the parent class as well as in a child? For example:

In the OurParentMigration class:

public function prepareRow($row) {
    // If there is a teaser but no body, move the teaser into the body,
    // so we don't have summaries without bodies
    if (!$row->body && $row->teaser) {
        $row->body = $row->teaser;
        $row->teaser = '';
    }
}

and in OurBlogPostMigration you want to do this:

public function prepareRow($row) {
    // Don't use the value of a field if another field is red
    if ($row->some_field_condition == 'red') {
        $row->some_conditional_field = '';
    }
}

The problem here is that the child's prepareRow method overrides the parent's method of the same name, so only the prepareRow method from the child will execute. So if we have logic we need to run at the child level as well as at the parent level, what do we do?
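The effect is easy to reproduce outside Drupal. In this minimal sketch, OverrideParentSketch and OverrideBlogSketch are hypothetical stand-ins for the two migration classes, and the row is a plain stdClass with invented properties.

```php
<?php
// Hypothetical stand-ins; no Drupal or Migrate code is involved.
class OverrideParentSketch {
  public function prepareRow($row) {
    // Move a lone teaser into the body.
    if (!$row->body && $row->teaser) {
      $row->body = $row->teaser;
      $row->teaser = '';
    }
  }
}

class OverrideBlogSketch extends OverrideParentSketch {
  public function prepareRow($row) {
    // This definition replaces the parent's; the teaser logic never runs.
    if ($row->color == 'red') {
      $row->extra = '';
    }
  }
}

$row = new stdClass();
$row->body = '';
$row->teaser = 'Short intro';
$row->color = 'red';
$row->extra = 'discard me';

(new OverrideBlogSketch())->prepareRow($row);
// The child's rule has cleared $row->extra, but $row->body is still empty
// because the parent's prepareRow was never called.
```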

One way is to use PHP’s object-oriented syntax and call the parent’s prepareRow method from within the child’s prepareRow method:

Within OurBlogPostMigration:

public function prepareRow($row){
    parent::prepareRow($row);
    //unique code only for this migration below
}
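Migrate skips a source row when prepareRow returns FALSE, so when a parent method may reject rows it is worth passing that result along rather than discarding it. The following standalone sketch shows the pattern; the Sketch classes and the "skip rows without a title" rule are hypothetical, chosen only to make the snippet runnable outside Drupal.

```php
<?php
class SkipParentSketch {
  public function prepareRow($row) {
    // Hypothetical rule: reject rows that have no title.
    if (empty($row->title)) {
      return FALSE;
    }
    // Shared cleanup: move a lone teaser into the body.
    if (!$row->body && $row->teaser) {
      $row->body = $row->teaser;
      $row->teaser = '';
    }
    return TRUE;
  }
}

class SkipBlogSketch extends SkipParentSketch {
  public function prepareRow($row) {
    // Run the shared logic first and pass a skip request up the chain.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
    // Unique code only for this migration below.
    $row->title = trim($row->title);
    return TRUE;
  }
}

$good = (object) array('title' => ' Hello ', 'body' => '', 'teaser' => 'Intro');
$bad = (object) array('title' => '', 'body' => 'x', 'teaser' => '');

$migration = new SkipBlogSketch();
$good_result = $migration->prepareRow($good);
$bad_result = $migration->prepareRow($bad);
```

The strict comparison with FALSE matters: a parent method that returns nothing yields NULL, which should not be mistaken for a request to skip the row.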

But this is a little limiting, in that your code must pertain either to all of your migration classes or to just one of them. What if you want to use a block of code in more than one class, but not in all of them?

For example, we may have separate taxonomy term mapping to do for legacy blog posts and news articles, and separate term logic for pages and press releases.

We can define arbitrary public methods in the parent class for each piece of common logic to execute on $row, and then invoke them as needed in the prepareRow method in the child class.

So, in OurParentMigration, outside of the constructor, we could do this:

public function blog_news_terms($row) {
    // Some logic for mapping legacy categories to new Drupal terms,
    // unique to blog posts and news articles.
    // Be sure to return the row:
    return $row;
}

public function page_press_terms($row) {
    // Some logic for mapping legacy categories to new Drupal terms,
    // unique to pages and press releases.
    // Be sure to return the row:
    return $row;
}

and then in OurBlogPostMigration and OurPageMigration we can make the calls, first in OurBlogPostMigration:

public function prepareRow($row){
    $row = $this->blog_news_terms($row);
 
    //unique code for just blogs goes here
}

and in OurPageMigration:

public function prepareRow($row){
    $row = $this->page_press_terms($row);
 
    //unique code for just pages goes here
}

In this manner we can execute code in prepareRow or other Migrate methods that is common to some, but not all, migration classes, without duplicating code.
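Putting the pieces together, here is a minimal standalone sketch of the shared-helper pattern. The category-to-term maps, the category and term properties, and the Sketch class names are all hypothetical; a real migration would extend Migration and use whatever legacy categories the source data actually contains.

```php
<?php
abstract class TermsParentSketch {
  // Hypothetical mapping shared by blog posts and news articles.
  public function blog_news_terms($row) {
    $map = array('tech' => 'Technology', 'biz' => 'Business');
    $row->term = isset($map[$row->category]) ? $map[$row->category] : NULL;
    return $row;
  }

  // Hypothetical mapping shared by pages and press releases.
  public function page_press_terms($row) {
    $map = array('pr' => 'Press', 'about' => 'Company');
    $row->term = isset($map[$row->category]) ? $map[$row->category] : NULL;
    return $row;
  }
}

class BlogTermsSketch extends TermsParentSketch {
  public function prepareRow($row) {
    $row = $this->blog_news_terms($row);
    // Unique code for just blogs would go here.
  }
}

class NewsTermsSketch extends TermsParentSketch {
  public function prepareRow($row) {
    // Reuses the same helper as the blog migration.
    $row = $this->blog_news_terms($row);
  }
}

class PageTermsSketch extends TermsParentSketch {
  public function prepareRow($row) {
    $row = $this->page_press_terms($row);
    // Unique code for just pages would go here.
  }
}

$blog_row = (object) array('category' => 'tech');
$news_row = (object) array('category' => 'biz');
$page_row = (object) array('category' => 'pr');

(new BlogTermsSketch())->prepareRow($blog_row);
(new NewsTermsSketch())->prepareRow($news_row);
(new PageTermsSketch())->prepareRow($page_row);
```

Each helper is written once in the parent, and each child opts in to only the helpers that apply to its content type.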