rawkinrich

Re: XML import in to Jobberbase MySql database tool

I'm a newbie and am looking for a job aggregator - the design of jobberbase is fantastic and would look great as an aggregator.

I believe a 'plugin' that would allow RSS feeds to display jobs from other sources (with other features) would be useful.

Just an idea for you guys. I know you are discussing this in the thread about hacking the code, but something that is easy to add on is a must.

navjotjsingh

Re: XML import in to Jobberbase MySql database tool

Currently the design of jobberBase is such that it does not support add-ons. v2.0 will probably contain the plugin system, so you will have to wait until then if you are looking for an add-on for this. Until then, you will have to hack the core files to integrate RSS feeds into jobberBase.
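
For anyone going the "hack the core files" route, the first step is just getting feed items into a shape you can insert. Here is a minimal sketch of that step only, assuming a plain RSS 2.0 feed. The function name rssItemsToJobs() and the array keys are made up for illustration and are not part of jobberBase.

```php
<?php
// Hypothetical helper (not jobberBase code): convert an RSS 2.0 document,
// passed in as a string, into a list of plain arrays ready for further mapping.
function rssItemsToJobs(string $rssXml): array
{
    // Collect libxml errors quietly instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($rssXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($feed === false || !isset($feed->channel->item)) {
        return []; // not parseable, or no <item> elements
    }

    $jobs = [];
    foreach ($feed->channel->item as $item) {
        $timestamp = strtotime((string) $item->pubDate);
        $jobs[] = [
            'title'       => trim((string) $item->title),
            'description' => trim((string) $item->description),
            'url'         => trim((string) $item->link),
            // MySQL-style datetime in UTC, or null if pubDate is missing/invalid
            'created_on'  => $timestamp !== false ? gmdate('Y-m-d H:i:s', $timestamp) : null,
        ];
    }
    return $jobs;
}
```

The arrays it returns still have to be mapped onto your own jobs table by hand, since that part depends on your jobberBase version.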

chrisdegrote

Re: XML import in to Jobberbase MySql database tool

Hey all,

I thought this discussion was dead but I did something wrong with the instant e-mail notification. Anyway I've received a possible solution for importing XML feeds in your database. The method is Shell scripting and I'm now trying to make it all work with my site.

Does anybody have any experience with this? When I've got it all working I will let you know.

mattcody

Re: XML import in to Jobberbase MySql database tool

It's pleasing to see that this discussion is still quite active...

I saw this listed elsewhere online (http://www.shearersoftware.com/software … s.php.html), a script for importing XML into WordPress, and thought someone else might be able to adapt it for Jobberbase?

Any takers?!

Thanks,

Matt



<?php
/*
RSS import for WordPress
by Andrew Shearer (awshearer@shearersoftware.com)

For current version and more info, see:
http://www.shearersoftware.com/software … ss-import/

This script is currently intended to be run from the command line or from the
web after it has been configured by editing variables in the first few lines
of the script.

To use it, first set the $path variable below to a path to an RSS file or
directory containing a blogBrowser archive (one folder per year, one RSS file
per month.) Then run this script from the command line (php import-rss.php).

Examples:

Import an rss.xml file in this directory:
$path = dirname(__FILE__).'/rss.xml';

Import an rss.xml file with a full path specified (Mac OS X full path):
$path = '/Users/testuser/Sites/mysite/rss.xml';

Import a blogBrowser archive in a folder named C:/documents/weblog,
including monthly RSS files such as weblog/2003/12.xml (Windows full path):
$path = 'C:/documents/weblog';

Future improvements: make this runnable from a web browser. Single RSS files
could be handled through uploads, and multi-file blogBrowser archives could be
specified by base URL.

Revision history:

2003-12-26  ashearer  Improved date conflict resolution with $kUpdatePostsAlways
                      and $kUpdatePostsIfNewer options; added $kTakeNoAction;
                      added more comments
2003-12-22  ashearer  Added blogBrowser archive support; optional mod. dates;
                      mod. date column autocreation
2003-12-21  ashearer  RSS import, initial version

*/

//$path = dirname(__FILE__).'/../archivedir';
//$path = dirname(__FILE__).'/rss.xml';
$path = 'http://www.example.com/rss.xml';

$kCreateModDateField = false;    // autocreate post_modified field?
$kSetModDateField = true;       // import post_modified field from RSS file?
$kUpdatePostsAlways = true;    // true to import RSS version even if it replaces current version
$kUpdatePostsIfNewer = false;    // if true, in case of conflict, use newer version; requires post_modified field
$kTakeNoAction = false;          // like -n flag; report actions but don't actually change DB

error_reporting(E_ALL);

class post {
    var $title;
    var $content;
    var $createDate;
    var $modDate;
    var $guid;
    var $categories;
    var $postTitle;
    var $permalink;   // set in endElement() when <guid isPermaLink="true">
}

$kExcludeCategories = array('Testing' => '');

$currentPost = null;
$currentText = '';

function parseDateISO8601($input) {
    // returns the date in SQL (MySQL, at least)-compatible text format
    return substr($input, 0, 10) . ' ' . substr($input, 11, 8);
}

function parseDateRFC822($input) {
    // returns the date in SQL (MySQL, at least)-compatible text format
    // %M is minutes; the original used %I (12-hour hour), which corrupted the time
    return strftime('%Y-%m-%d %H:%M:%S', strtotime($input));
}

function startElement($parser, $name, $attrs) {
    global $currentPost, $currentText, $currentGuidAttrs;
    if ($name == 'item') {
        $currentPost = new post();
        $currentPost->categories = array();
    }
    elseif ($name == 'guid') {
        $currentGuidAttrs = $attrs;
    }
    $currentText = '';
}

function endElement($parser, $name) {
    // $currentGuidAttrs is set in startElement() and read in the 'guid' case below
    global $currentPost, $currentText, $currentGuidAttrs;
   
    switch ($name) {
        case 'title': case 'http://www.w3.org/1999/02/22-rdf-syntax-ns# title':
            $currentPost->title = $currentText;
            break;
       
        case 'content:encoded': case 'http://purl.org/rss/1.0/modules/content/ encoded':
            $currentPost->content = $currentText;
            break;
           
        case 'description': case 'http://www.w3.org/1999/02/22-rdf-syntax-ns# description':
            // content:encoded trumps description, so only save the description
            // if there's no content already
            if (!isset($currentPost->content) || !strlen($currentPost->content)) {
                $currentPost->content = $currentText;
            }
            break;
       
        case 'pubDate':
            $currentPost->createDate = parseDateRFC822($currentText);
            break;
       
        case 'dc:date': case 'http://purl.org/dc/elements/1.1/ date':
            $currentPost->createDate = parseDateISO8601($currentText);
            break;
       
        case 'dcterms:modified': case 'http://purl.org/dc/terms/ modified':
            $currentPost->modDate = parseDateISO8601($currentText);
            break;
       
        case 'category': case 'dc:subject': case 'http://purl.org/dc/elements/1.1/ subject':
            $currentPost->categories[] = $currentText;
            break;
       
        case 'guid':
            if (isset($currentGuidAttrs['isPermaLink']) && $currentGuidAttrs['isPermaLink'] == 'true') {
                $currentPost->permalink = $currentText;
            }
            $currentPost->guid = $currentText;
            break;
       
        case 'item': case 'http://www.w3.org/1999/02/22-rdf-syntax-ns# item':
            processPost($currentPost);
            $currentPost = null;
            break;
    }
   
    $currentText = '';
}

function characterData($parser, $data) {
    global $currentText;
    $currentText .= $data;
}

// WordPress-specific code

$post_author = 'admin';

require_once('../wp-config.php');
require_once(ABSPATH.WPINC.'/template-functions.php');
require_once(ABSPATH.WPINC.'/functions.php');
require_once(ABSPATH.WPINC.'/vars.php');

if ($kCreateModDateField && !$kTakeNoAction) {
    require_once(ABSPATH.'/wp-admin/install-helper.php');
    $res = '';
    $tablename = $tableposts;
    $ddl = "ALTER TABLE $tableposts ADD COLUMN post_modified datetime";
    maybe_add_column($tablename, 'post_modified', $ddl);
    if (check_column($tablename, 'post_modified', 'datetime')) {
        $res .= $tablename . ' - ok <br />'."\n";
    } else {
        $res .= 'There was a problem with ' . $tablename . '<br />'."\n";
        //++$error_count;
    }
    echo $res;
}

function processPost(&$post) {
    global $kSetModDateField, $kUpdatePostsAlways, $kUpdatePostsIfNewer, $kTakeNoAction;
    global $kExcludeCategories; // must be declared before the category filter below
   
    //print_r($post);
   
    // Filter out (ignore) posts having categories that are all listed as "excluded"
    // If a post has no categories, or at least one non-excluded category, it is still
    // included.
    if (sizeof($post->categories)) {
        $gotIncludedCategory = false;
        foreach ($post->categories as $categoryName) {
            // name fixed to match the $kExcludeCategories array defined at the top
            if (!isset($kExcludeCategories[$categoryName])) {
                $gotIncludedCategory = true;
                break;
            }
        }
        if (!$gotIncludedCategory) {
            return;
        }
    }
   
    global $post_author, $kExcludeCategories;
    global $wpdb;
    global $tableusers, $tableposts, $tablepost2cat, $tablecategories;

    $post_author_ID = $wpdb->get_var("SELECT ID FROM $tableusers WHERE user_login = '$post_author'");
   
    $post_content = $post->content;
    $post_content = str_replace('<br>', '<br />', $post_content); // XHTMLify <br> tags
   
    /* Un-word-wrap the content, because <br /> tags will be added at display time
    for line breaks, and RSS feeds are often already soft-wrapped. Replace \n and \r
    with spaces.
   
    However, we don't want to remove word wrapping inside <pre> tags. Stopping short
    of a full HTML parser, we only un-wrap those sections not inside <pre> tag pairs.
    (This code could be misled by things that look like <pre> tags wrapped in HTML comments,
    but oh well.)
    */
    /*$pos = $lastpos = 0;
    while ($lastpos !== false && ($pos = strpos($post_content, '<pre>', $lastpos)) !== false) {
        $post_content = substr($post_content, 0, $lastpos)
            . str_replace("\n", ' ', str_replace("\r", ' ', substr($post_content, $lastpos, $pos - $lastpos)))
            . substr($post_content, $pos);
        $lastpos = strpos($post_content, '</pre>', $pos);
    }
    if ($lastpos !== false) {
        $post_content = substr($post_content, 0, $lastpos)
            . str_replace("\n", ' ', str_replace("\r", ' ', substr($post_content, $lastpos)));
    }
    */
   
    $post_content = addslashes($post_content);
   
    #$post_content = str_replace("\r", ' ', $post_content);
    #$post_content = str_replace("\n", ' ', $post_content);
    $post_date = addslashes($post->createDate);
    $post_title = addslashes($post->title);
    $post_modified = $kSetModDateField ? addslashes($post->modDate) : '';
    $post_name = '';
    if (isset($post->permalink) && strlen($post->permalink)) {
        $matches = array();
        if (preg_match('|/[0-9]{4}/[0-9]{2}/[0-9]{2}/([A-Za-z0-9_-]*)/?|', $post->permalink, $matches)) {
            $post_name = $matches[1];
            $post_name = mysql_escape_string($post_name);
        }
    }
       
   
    $categoryIDList = array();
    foreach ($post->categories as $categoryName) {
        if (isset($kExcludeCategories[$categoryName])) {   // was misspelled $kExcludedCategories
            continue;
        }
        $categoryID = $wpdb->get_var("SELECT cat_ID FROM $tablecategories WHERE cat_name = '".mysql_escape_string($categoryName)."'");
        if (!$categoryID) {
            if ($kTakeNoAction) {
                echo "Would have inserted new category '$categoryName'.";
                $categoryID = 0;
            }
            else {
                $categoryNiceName = sanitize_title($categoryName);
                $wpdb->query("INSERT INTO $tablecategories
                    (cat_name, category_nicename)
                  VALUES
                    ('".mysql_escape_string($categoryName)."','".mysql_escape_string($categoryNiceName)."')");
                $categoryID = $wpdb->get_var("SELECT LAST_INSERT_ID()");
            }
        }
        else {
            // category already exists; could update its nicename here if it tended not to be correct already.
            //$wpdb->query("UPDATE $tablecategories SET category_nicename = '".mysql_escape_string(sanitize_title($categoryName))."' WHERE cat_ID = ".intval($categoryID));
        }
        $categoryIDList[] = $categoryID;
    }
   
   
    // Quick-n-dirty check for dups:
    if ($kUpdatePostsIfNewer) {
        $dupcheck = $wpdb->get_results("SELECT ID,post_date,post_title,post_modified FROM $tableposts WHERE post_date='$post_date' AND post_title='$post_title' LIMIT 1",ARRAY_A);
    }
    else {
        $dupcheck = $wpdb->get_results("SELECT ID,post_date,post_title FROM $tableposts WHERE post_date='$post_date' AND post_title='$post_title' LIMIT 1",ARRAY_A);
    }
    if (!empty($dupcheck[0]['ID'])) {   // !empty() avoids a notice when no row matched
        // post already exists
        if ($kUpdatePostsAlways || ($kUpdatePostsIfNewer && $kSetModDateField && $dupcheck[0]['post_modified'] < $post_modified)) {
            print "<br />\n\nUpdating post, ID = '" . $dupcheck[0]['ID'] . "'<br />\n";
            print "Timestamp: " . $post_date . "<br />\n";
            print "Post Title: '" . stripslashes($post_title) . "'<br />\n";
            if (!$kTakeNoAction) {
                $postID = $dupcheck[0]['ID'];
                $result = $wpdb->query("
                UPDATE $tableposts
                    SET post_author = '$post_author_ID', post_date = '$post_date',
                    ".($kSetModDateField ? "post_modified = '$post_modified', " : "")."
                    post_content='$post_content',
                    post_title = '$post_title', post_name = '$post_name' WHERE ID = ".intval($postID));
                //echo "DELETE FROM $tablepost2cat WHERE post_id = ".intval($postID);
                $result = $wpdb->query("DELETE FROM $tablepost2cat WHERE post_id = ".intval($postID));
                foreach ($categoryIDList as $categoryID) {
                    $result = $wpdb->query("
                        INSERT INTO $tablepost2cat
                            (post_id, category_id)
                        VALUES
                            (".intval($postID).",".intval($categoryID).")
                        ");
                }
            }
        }
        else { 
            print "<br />\n\nSkipping duplicate post, ID = '" . $dupcheck[0]['ID'] . "'<br />\n";
            print "Timestamp: " . $post_date . "<br />\n";
            print "Post Title: '" . stripslashes($post_title) . "'<br />\n";
        }
    }
    else {
        print "<br />\nInserting new post.<br />\n";
        print "Timestamp: " . $post_date . "<br />\n";
        print "Post Title: '" . stripslashes($post_title) . "'<br />\n";
        if (!$kTakeNoAction) {
            $result = $wpdb->query("
            INSERT INTO $tableposts
                (post_author,post_date,post_content,post_title,post_name,post_category".($post_modified ? ",post_modified" : "").")
            VALUES
                ('$post_author_ID','$post_date','$post_content','$post_title','$post_name','1'".($post_modified ? ",'$post_modified'" : "").")
            ");
            $postID = $wpdb->get_var("SELECT LAST_INSERT_ID();");
            if ($postID) {
                foreach ($categoryIDList as $categoryID) {
                    $result = $wpdb->query("
                    INSERT INTO $tablepost2cat
                        (post_id, category_id)
                    VALUES
                        (".intval($postID).",".intval($categoryID).")
                    ");
                }
            }
        }
    }
}

// XML parsing code
function importRSSFile($filePath) {
    if (function_exists('xml_parser_create_ns')) {
        $xml_parser = xml_parser_create_ns('iso-8859-1',' ');    // space sep for namespace URI
    }
    else {
        $xml_parser = xml_parser_create();
    }
    // make sure to turn off case-folding; XML 1.0 is case-sensitive
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    if (!($fp = fopen($filePath, "r"))) {
        die("could not open XML input");
    }
   
    while ($data = fread($fp, 4096)) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                        xml_error_string(xml_get_error_code($xml_parser)),
                        xml_get_current_line_number($xml_parser)));
        }
    }
    xml_parser_free($xml_parser);
    fclose($fp);
}

function importBlogArchive($dirPath) {
    $startYear = 1980;
    $endYear = intval(strftime('%Y'));
    for ($testYear = $startYear; $testYear <= $endYear; $testYear++) {
        for ($testMonth = 1; $testMonth <= 12; $testMonth++) {
            $rssFilePath = $dirPath.'/'.$testYear.'/'.($testMonth < 10 ? '0' : '').$testMonth.'.xml';
            if (is_file($rssFilePath)) {
                importRSSFile($rssFilePath);
            }
        }
    }
}

if (is_dir($path)) {
    importBlogArchive($path);
}
else {
    importRSSFile($path);
}

/*echo '<pre>';
print_r($EZSQL_ERROR);
echo '</pre>';
*/

?>

redjumpsuit

Re: XML import in to Jobberbase MySql database tool

Hi, in case anyone is still looking for a solution for this: I now have an add-on available to backfill your jobberBase job board with feeds from Indeed.com and SimplyHired.com, which you can read about here:

http://www.redjumpsuit.net/2010/07/25/b … mplyhired/

Monetize your job board using a simple PayPal payment solution for jobberBase 1.9.1!

SteveSPI

Re: XML import in to Jobberbase MySql database tool

Hello All,

Not posted on here for a long time now.

I can provide one-time backfills for your new job board in the UK via a simple CSV upload. If you are just looking to kick-start your site, I can upload up to about 10,000 jobs in one go. This saves the hassle of having to recode any of your site.

I started up Jobberfeed.com but never really took it much further. New sites are still regularly being indexed there for inclusion in the feeds.

The problem I found with working with XML is the amount of server capacity it takes up. If you are running on a shared host such as JustHost, One.com or GoDaddy, the likelihood is that at some point they will shut your site down because you have maxed out the processor on the server (trust me, this happened a few times when I was developing the code that imported and distributed the feeds).

How I plan to make it work is simply to provide a login to Jobberfeed and, on a weekly basis, upload the new CSV file to the server ready for download.

There will be a small price for the service.

One time back fill (download link will last 1 week) - £5
Monthly Unlimited Download (file updated weekly) - £10 a month (Paypal Subscription).

On average the files should contain around 15,000 jobs and will be formatted in a jobberbase format. Only the Category field will need to be updated.
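
To give a rough idea of what "updating the Category field" could look like on the receiving end, here is a sketch that reads a CSV file and swaps each category name for a local category ID. Everything specific in it is an assumption for illustration: the function name remapCsvCategories(), a header row with a column called "category", and the name-to-ID map you pass in. The real file layout may differ.

```php
<?php
// Hypothetical sketch: the CSV column names and the category map are
// assumptions, not the actual Jobberfeed file format.
function remapCsvCategories(string $csvPath, array $categoryIdsByName, int $fallbackId): array
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not open $csvPath");
    }

    // First line is assumed to be the header row.
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    $rows = [];
    if ($header !== false) {
        while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if (count($fields) !== count($header)) {
                continue; // skip blank or malformed lines
            }
            $row = array_combine($header, $fields);
            $name = strtolower(trim($row['category']));
            // Unknown category names fall back to a default local category.
            $row['category_id'] = $categoryIdsByName[$name] ?? $fallbackId;
            unset($row['category']);
            $rows[] = $row;
        }
    }
    fclose($handle);
    return $rows;
}
```

Reading line by line with fgetcsv() keeps the file handling simple, though the sketch still collects all rows in one array, so for very large files you would want to insert in batches instead.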

The service will be ready in about 2 weeks. So drop me an email or send a PM if interested.

Cheers,

SteveSPI

SpeedyJob.co.uk

Mario

Re: XML import in to Jobberbase MySql database tool

For those who talked about jobberbase-to-jobberbase import, how can this be possible while the company email is hidden? It's required for job applications isn't it?

Mario

Re: XML import in to Jobberbase MySql database tool

Mario wrote:

For those who talked about jobberbase-to-jobberbase import, how can this be possible while the company email is hidden? It's required for job applications isn't it?

Looks like my question is quite general and not only about importing from other jobberbase sites. How do you people import jobs from feeds that don't share companies' emails (like Indeed's and SimplyHired's)?

redjumpsuit

Re: XML import in to Jobberbase MySql database tool

Simple.

You use your admin email as the poster email and then you specify 0 for the apply online option. This way, applications will not be sent to your email; instead, applicants will use the link on the job post if they want to apply for a job.
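
In code, the idea above might look something like this rough sketch. The function name buildBackfillJobRow() and the keys poster_email, apply_online and url are assumptions for illustration, so check them against the jobs table in your own jobberBase version before using anything like it.

```php
<?php
// Hypothetical sketch: the column names are assumptions, not verified
// against any particular jobberBase schema.
function buildBackfillJobRow(array $feedJob, string $adminEmail): array
{
    return [
        'title'        => $feedJob['title'],
        'description'  => $feedJob['description'],
        'url'          => $feedJob['url'],   // applicants follow this link to apply
        'poster_email' => $adminEmail,       // the feed does not expose the company email
        'apply_online' => 0,                 // 0 = do not send applications by email
    ];
}
```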

hobo

Re: XML import in to Jobberbase MySql database tool

For the jobberbase-to-jobberbase import code, some changes are needed to make it work on jobberbase 1.9.1.
The changes are related to how the new version deals with location fields.
To avoid an extra-long post, help you out and shamelessly promote some links, I created a small website where you can download the jobberBase-to-jobberBase RSS code for jobberBase version 1.6 and version 1.9.1.

Check it out: jobfeeds.me

Last edited by hobo (2010-09-09 18:34:43)

aishwin

Re: XML import in to Jobberbase MySql database tool

Hello guys,

I am looking for a job wrapping tool that can add all the new jobs from the portals every day on a schedule. The tool should post the openings to the database. Please help, it's urgently needed.

Thanks
aishwinv@gmail.com

bgreen0722

Re: XML import in to Jobberbase MySql database tool

Hi ...

I have the feeds script installed here: http://www.atheistlist.com/jobberbase/s … ronjob.php

It runs ok but shows no jobs listed. I have checked to make sure there are jobs listed in at least some of the categories via my rss reader. Any help is appreciated.

Thanks
Brenda

hobo

Re: XML import in to Jobberbase MySql database tool

Did you make any changes other than the sql file?