Tag Archives: wordpress

Techcafeteria Blog Facelift

If you visit the blog (as opposed to just subscribe), you’ll note that I did a little cleaning. My old WordPress site had gotten a bit corrupted, so, instead of trying to fix it, I just installed a new copy of WordPress, found a simple theme, and selectively imported the important things from the database. It was about four hours’ work.

If you ever visited Techcafeteria.com, without the “/blog” appended, that was actually a site that I created in a little-known content management system called Frog CMS. I ditched that; now techcafeteria.com simply points to the blog.

So, nothing fancy – I’m not here to rack up page views and compete with Yahoo!  Do let me know if I broke anything.

Adventures In Web Site Migration

This post was first published on the Idealware Blog in April of 2010.

I recently took on the project of migrating the Idealware articles and blog from their old homes on Idealware’s prior web site and Google’s Blogger service to our shiny, new, Drupal-based home. This was an interesting data-migration challenge. The Idealware articles were static HTML web pages that needed to be put in Drupal’s content database. And there is no utility that imports Blogger blogs to Drupal. Both projects required research and creativity.

The first step in any data migration project is to determine if automating the task will be more work than just doing it by hand. Idealware has about 220 articles published; cutting and pasting the text into Drupal, and then cleaning up the formatting, would be a grueling project for someone. On the other hand, automating the process was not a slam dunk. Database data is easier to write conversion processes for than free-form text. HTML is somewhere in the middle, with tags that identify sections, but lots of free-form data as well.

Converting HTML Articles with Regular Expressions

My toolkit (of choice) for this project was Sed, the Unix Stream Editor, and a generic installation of Drupal. Sed does regular expression searching and replacing. So I wrote a script that:

  1. deleted lines with HTML tags that we didn’t need;
  2. stored the data between the title and body tags; and
  3. converted those items to SQL code that would insert the title and article text into my Drupal database.

This was the best I could do: other useful information, such as author and publishing date, was not formatted consistently in the text, so I left calling those out for a clean-up phase that the Idealware staff took on. The project was a success, in that it took less than two days to complete the conversion. It was never going to be an easy one.

Without going too far, the sed command to delete, say, a “META” tag is:

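/<meta/d

That says to search for a literal “less than” bracket followed by the text meta, and delete any line that contains it. The bracket needs no escaping; in fact, GNU sed reads a backslash in front of it (\<) as a word-boundary anchor, so an escaped pattern would also match ordinary words that begin with “meta”. A tricky part of the cleanup was to make sure that my search phrases weren’t ones that might also match article text.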
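For readers who don't have sed handy, here is a rough Python sketch of the same delete step. It is not the script I ran: the tag list and the sample page are made up for illustration. It does show why anchoring the pattern on the opening bracket matters, since the word “metadata” in the article text is left alone.

```python
import re

# Hypothetical list of unwanted tags; the original script's full set is not shown.
UNWANTED_TAGS = re.compile(r"<(meta|link)\b", re.IGNORECASE)

def drop_unwanted_lines(html_text):
    """Return html_text without the lines that contain an unwanted tag."""
    kept = [line for line in html_text.splitlines()
            if not UNWANTED_TAGS.search(line)]
    return "\n".join(kept)

# Invented sample page, used only to exercise the function.
sample = "\n".join([
    "<html><head>",
    '<meta name="keywords" content="nonprofit">',
    "<title>Choosing a CMS</title>",
    "</head><body>",
    "<p>Good metadata makes content easier to find.</p>",
    "</body></html>",
])
cleaned = drop_unwanted_lines(sample)
```

The list comprehension plays the role of sed's d command: a line survives only if no unwanted tag is found on it.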

Once I’d stripped the file down to just the data between the “title” and “body” tags, I issued this command:

s/<title>\(.*\)<\/title>.*<body>\(.*\)<\/body>/insert into articles (title, body) values ('\1', '\2');/

This searches for the text between HTML “title” tags, capturing it as group 1, then the text between “body” tags, capturing it as group 2, then substitutes the captured text (\1 and \2) into a simple SQL insert statement in the replacement string. Because sed works one line at a time, this assumes the title and body have already been joined onto a single line. Running a script with all of the clean-up commands over each article, culminating in that last command, gave me a text file that could be imported into the Drupal database. The remaining cleanup was done in Drupal’s WYSIWYG interface.
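The same capture-and-substitute idea can be sketched in Python. Again, this is an illustration rather than what I ran: the articles table and its two columns come from the example above, the sample page is invented, and the quoting helper only doubles single quotes, which is the standard SQL way to embed an apostrophe. A database that treats the backslash as an escape character would need more care, and a real import would be safer with parameterized queries.

```python
import re

# Capture the text between <title> tags and between <body> tags.
# DOTALL lets "." span line breaks, so the page need not be on one line.
PAGE = re.compile(r"<title>(.*?)</title>.*?<body>(.*?)</body>",
                  re.IGNORECASE | re.DOTALL)

def sql_quote(text):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"

def page_to_insert(html_text):
    """Build an insert statement from one HTML page, or return None if no match."""
    match = PAGE.search(html_text)
    if match is None:
        return None
    title, body = (part.strip() for part in match.groups())
    return "insert into articles (title, body) values (%s, %s);" % (
        sql_quote(title), sql_quote(body))

# Invented sample page with an apostrophe in the title.
page = ("<html><head><title>Idealware's Guide</title></head>\n"
        "<body><p>Hello</p></body></html>")
statement = page_to_insert(page)
```

An apostrophe in a title is exactly the kind of thing that would end the SQL string early in a plain substitution, which is why the sketch quotes both values before building the statement.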

Blog Conversion

As I said, there is no such thing as a program or module that converts a Blogger blog into Drupal format. And our circumstance was complicated by the fact that the Idealware Blog was in Blogger’s legacy “FTP” format, which limited the available conversion options even more.

There is an excellent module for converting WordPress blogs to Drupal, and there were options for converting a legacy Blogger blog to WordPress. So, then the question was, how well will the blog survive a double conversion? The answer was: very well! I challenge any of you to identify the one post that didn’t come through with every word and picture intact.

I had a good start for this: Matthew Saunders at the Nonprofits and Web 2.0 Blog posted this excellent guide. If you have a current Blogger blog to migrate, every step here will work. My problem was that the Idealware blog was in the old “FTP” format. Google has announced that blogs in their original publishing format must be converted by May 1st. While this fact had little or no relationship to the web site move to Drupal, it’s convenient that we made the move well in advance of that.

To prep, I installed current, vanilla copies of WordPress and Drupal at techcafeteria.com. I tracked down Google’s free blog converters. While none of them converts to Drupal, most other formats are covered, and I used their web-based Blogger-to-WordPress tool to convert the exported Idealware blog to WP format. The conversion process prompted me to create accounts for each author.

To get from WordPress to Drupal, I installed the above-mentioned WordPress import module. As with the first import, this one also prompted me to create the authors’ Drupal accounts. It also had an option to store all images locally (which required rights to create a publicly writable folder on the Drupal server). Again, this worked very well.

With my test completed, I set about doing it all over again on the new Idealware blog. Here I had a little less flexibility. I had administrative rights in Drupal, but I didn’t have access to the server. Two challenges came up. First, the server’s file upload limit (set in both Drupal and PHP’s initialization file) was smaller than my WordPress import file. I got around this by importing the posts one blogger at a time, making sure to include all current and former Idealware bloggers. Second, I needed a folder for the images, which I asked our host and designer at Digital Loom.com to create for me.
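If you hit the same upload limit and only have one big export file, one possible way to break it up is to split it by author. The sketch below is a hedged illustration, not the method I used: it assumes the export is RSS-style XML in which each item names its author in a dc:creator element, and the sample feed is invented. WordPress’s own export screen may be able to produce per-author files for you, which would be simpler.

```python
import copy
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core namespace used by dc:creator
ET.register_namespace("dc", DC)           # keep the "dc" prefix when serializing

def creator(item):
    """Return the item's dc:creator text, or "unknown" if it has none."""
    return item.findtext("{%s}creator" % DC, default="unknown")

def split_by_author(rss_text):
    """Return {author: xml_string}, each copy holding only that author's items."""
    root = ET.fromstring(rss_text)
    authors = {creator(item) for item in root.find("channel").findall("item")}
    result = {}
    for author in sorted(authors):
        clone = copy.deepcopy(root)        # keeps the channel header intact
        channel = clone.find("channel")
        for item in channel.findall("item"):
            if creator(item) != author:
                channel.remove(item)
        result[author] = ET.tostring(clone, encoding="unicode")
    return result

# Invented sample export with two hypothetical authors.
sample = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel><title>Example Blog</title>
<item><title>Post A</title><dc:creator>alice</dc:creator></item>
<item><title>Post B</title><dc:creator>bob</dc:creator></item>
<item><title>Post C</title><dc:creator>alice</dc:creator></item>
</channel></rss>"""
parts = split_by_author(sample)
```

Each value in parts could then be written to its own file and uploaded separately. A real export carries more namespaces than this sample, so test the output against your importer before relying on it.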

Cleanup!

The final challenge was even stickier: the posts came across, but the URLs were in a different format than the old Blogger URLs. This was a problem for the articles as well. How many sites do you think link to Idealware content out there? For this, I begged for enough server access to write and run a PHP script that renamed the current URLs to their former names. It was a half-successful effort, as Drupal had dramatically renamed a bunch of them. The remainder we manually altered.
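The PHP script isn't reproduced here, but the general idea can be sketched in Python under some loud assumptions: that you have a list of old URLs with their post titles and a list of new paths with their titles, and that titles are the only thing the two sides share. Every URL, path, and title below is invented. Posts whose titles no longer line up fall into an “unmatched” list, which is the part you end up fixing by hand.

```python
import re

def normalize(title):
    """Lowercase a title and collapse anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def match_urls(old_posts, new_posts):
    """Pair new paths with old URLs by title; return (pairs, unmatched_new_paths)."""
    old_by_title = {normalize(title): url for url, title in old_posts}
    pairs, unmatched = [], []
    for path, title in new_posts:
        old_url = old_by_title.get(normalize(title))
        if old_url:
            pairs.append((path, old_url))
        else:
            unmatched.append(path)
    return pairs, unmatched

# Hypothetical data: (old URL, title) and (new path, title).
old_posts = [("/2009/11/why-blog.html", "Why Blog?"),
             ("/2010/01/openid-notes.html", "OpenID Notes")]
new_posts = [("node/12", "Why blog"), ("node/13", "Notes on OpenID")]
pairs, unmatched = match_urls(old_posts, new_posts)
```

Each pair could then feed whatever mechanism your CMS uses for path aliases or redirects, and the unmatched list becomes the manual to-do list.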

All told, about two hours of research time, three or four hours of conversion (over a number of days), and more for the clean-up, as I wasted a lot of time trying to come up with a pure SQL command to do the URL renaming, only to eventually determine that it couldn’t be done without some scripting. A fun project, though, and I’d call it a success.

I hope this helps you out if you ever find yourself faced with a similar challenge.

Wanna play with OpenID?

Yesterday, Sun announced a rollout of OpenID for all of the company’s employees, and joined Microsoft, Yahoo!, AOL and others in embracing the emerging Single Sign-on standard.

In order to deepen my understanding of OpenID and what its ramifications might be for me and the non-profit community, I’m diving in and inviting you to join me. I’ve set up an OpenID server at http://openid.techcafeteria.com that you are welcome to use to establish your own ID. From there, you can also manage your identity, optionally revealing some demographic info to sites that you authenticate to (completely optional!) and managing the sites that you have authenticated to.

I’ve also set up my blog to allow for OpenID as a registration option, via a handy WordPress plugin.

Some notes if you want to join in:

  • If you sign up, you might want to then register on my blog and leave a comment on this entry. That way we’ll know who we’re playing with.
  • If you have trouble accessing http://openid.techcafeteria.com, wait a few hours – it should be fully reachable by Friday at the very latest. I just set up the DNS a few hours ago.

If you don’t know where to use OpenID other than my blog, note that plugins are available for WordPress, LiveJournal, Drupal, MediaWiki, and other community-based applications, as well as a module for Apache. TechNet has articles on how to integrate it with ASP sites. So, it’s out there – look for the logo:

OpenId Logo

New plan for Content!

Regular visitors to the Coconino County Home Page know one thing well: there’s not much reason to be a regular visitor to the site. The page tends to be updated annually, as opposed to regularly. This is defensible: I chose my subject matter for a number of reasons, the primary one being my love for it, but the secondary being the relatively small amount of updating that would be required. And, as readers of my Site Notes know, my third motivation has always been to just have a web site where I can keep my skills (such as they are) fresh.

So I’ve done a few things to make adding content simpler, taking advantage of the latest buzz on the Internet: Really Simple Syndication (RSS). First, the bookmarks are now managed using del.icio.us, a very powerful bookmark sharing site. I highly recommend it! Second, I’m using RSS to centralize content creation for about four different web sites that I maintain, which will make it simpler to publish to krazy.com.

Over the next half year or so, I will be migrating Krazy.com to a full RSS/blogging platform called WordPress. Don’t be concerned – the updated content on the site was blog-like long before I ever heard the term, and it will not change dramatically when it’s moved to the new platform. For those interested in the techy details, I will chronicle this more thoroughly in the site notes.

Why blog?

With over 8 million blogs out there (as of March, when I saw Mena Trott, founder of blogging service Six Apart, speak at the NTEN Non-Profit Technology conference), there’s a real good question as to why someone like me would add another “sad, default-Blogger-templated website” to the giant heap of the same out there. Well, I have a few reasons.

Mainly, while most people set up blogs and then notice how conveniently they can distribute them via RSS (Really Simple Syndication), I got here from the reverse direction. I have a need to strategically publish content to a variety of web sites, and RSS is an effective tool to do it. By maintaining a blog, I can pretty handily write all of that content here and then selectively copy it where it needs to go. The destinations for these posts include www.krazy.com, my website devoted to the classic cartoon “Krazy Kat” and its author, George Herriman; a private web site inside San Francisco Goodwill that I maintain (running on Drupal), where I blog on technology issues relevant to my organization and role as IT Director; and, possibly, the Digital Divide Network, where I am hoping to be more active.
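To make the “write once, copy selectively” idea concrete, here is a minimal Python sketch of how a destination site might pick out its own posts from a single feed. It assumes each post is tagged with RSS category elements; the feed, the category names, and the example.org links are all invented for illustration and are not my actual setup.

```python
import xml.etree.ElementTree as ET

def items_for(feed_text, category):
    """Return (title, link) pairs for the feed items tagged with the category."""
    root = ET.fromstring(feed_text)
    selected = []
    for item in root.iter("item"):
        categories = [c.text for c in item.findall("category")]
        if category in categories:
            selected.append((item.findtext("title"), item.findtext("link")))
    return selected

# Invented feed with hypothetical categories "krazy" and "work".
feed = """<rss version="2.0"><channel><title>Example</title>
<item><title>Herriman strips</title><link>http://example.org/1</link><category>krazy</category></item>
<item><title>Backup strategy</title><link>http://example.org/2</link><category>work</category></item>
<item><title>Sunday pages</title><link>http://example.org/3</link><category>krazy</category><category>work</category></item>
</channel></rss>"""
krazy_posts = items_for(feed, "krazy")
```

A post tagged with two categories shows up for both destinations, which is the selective part: the tagging decides where each post goes.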

Secondly, I think I have enough web street cred to be legitimate. I wasn’t born on the web yesterday. In 1989, I ran a computerized Bulletin Board system (BBS) out of my home. I wrote software to convert Usenet newsgroups to PCBoard BBS format so I could carry them for my callers. I registered krazy.com in 1994, and had a web site up there by January of 1995, a little earlier than most of you, right? Since about 2000, you’ve been able to find my web site at Google by typing “krazy kat” in the box and pressing “I’m Feeling Lucky”. The Coconino County Homepage is the first unsponsored link at just about any search engine when you look for either “Krazy Kat” or “George Herriman”.

Third, I hope to grow this into more than just a blog. WordPress supports adding additional pages, and RSS feeds on related topics, as well as forums and other features are likely additions in the months to come. Ultimate goal: port the whole Krazy Kat web site to WordPress as well.

Finally, this is not a place where you’re going to hear cute stories about my dog, and I promise to keep the “blogging about blogs” itch scratched, as much as possible. I will discuss related technologies, but from my perspective as a technology strategist, which I think puts a broader slant on things than just “ain’t it cool”. I’ll also throw in some biographical/political/pure opinion stuff, but I’ll try to keep it entertaining.

So, again, welcome!