jobsbulletin

Topic: A stupid mistake

Hi Guys,

Yesterday, I updated my backfill module to lessen duplicate issues...
My mistake was, I deployed it without even testing it...
Yeah stupidity... I was so sleepy that time and made a gamble.
Shockingly this morning, my site went down.
I checked my database and a single job is inserted for about 1.7 million!
The table now is 4Gb+ and phpmyadmin cannot handle deletion of such number of records.
I can't empty my table because I don't have backup... (my schedule is next week)

So first, I need your suggestions on how to delete the records...
I know the query but what I mean is I can't delete all of it at once.
I know how to code also, I just need the design on how will the process be.
I have ideas in my mind but I just want to ask the experts on what they have to say.
(Yes, it was plain stupidity... before you answer, tell me that I am stupid please.. haha!)

Now, secondly... how come that my site went down?
I mean, it should not be a problem when jobs reaches that amount of records right?
It seems that jobberbase wasn't able to handle the fetching of data that is why I get
"Gateway Timeout" and "Internal Server Error" if you visit my site.

I am no expert and would just like to ask some questions.
This is my first website that I administrate and yes, I made this site so that I could gain experience.
So please,... share. I love to talk to experts and exchange ideas with other newbies like me.

A while ago, I opened the class jobs and found out that the method GetJobs
fetches first all the ID that satisfies the conditions passed on and then the jobs are queried one by one.
My question is,.. is that process more efficient than querying all of it at once?

Thanks. Hope get answers and learn from you guys.
I beg you. haha! :p

http://www.jobsbulletin.ph - Jobs Bulletin Philippines :: Where jobs and skills meet!

putypuruty

Re: A stupid mistake

I assume that you don't have too many 'correct' records in the jobs table. If that is the case, I would recommend that you create another jobs table (with the name jobs1, for example) and copy there the records that are fine by using a query similar to:

Code:
insert into jobs1 select * from jobs where id < ?

where you would replace the ? with the ID of the most recent 'correct' job ad.

Then, you could just drop the jobs table and rename the jobs1 table back to jobs. Dropping the 'bad' table would be very fast, almost instantaneous.

Hope it helps!

jobsbulletin

Re: A stupid mistake

Hi putypuruty,

Thanks for the suggestion but I already managed to delete them by chunk via scheduled jobs.

Anyway, as for my question, does it matter if you query the jobs one by one vs fetching them all one time?
As of now, I have 7k records and rss is not working anymore.
I think this is related to database queries.
Please confirm...

http://www.jobsbulletin.ph - Jobs Bulletin Philippines :: Where jobs and skills meet!

putypuruty

Re: A stupid mistake

Hi!

To be honest, I don't know exactly how jobberbase behaves with a lot of jobs. I'll have to investigate this issue - I hope that I'll be able to do it this week.

hobo

Re: A stupid mistake

Few, maybe useful tips:

-Increase your php memory available, some servers are set to 8mb by default, and it is not enough at times. ask Google how to change this setting on your specific server.
-Limit number of rss results to a reasonable number (maximum number possible depends on your memory limit setting, but this can be set by editing one of the jobberbase files, search forum to find out which ones and how).
-With so many jobs, search engine spiders will be almost constantly on your site. This could be a double edged sword in some cases since they also use some of your server resources. (Again, more memory will ensure the site will stay up.)
-Pagination could be added on pages listing jobs from individual companies. (Don't ask me how.)
-Setup cron job to delete old hits and stats.
-You already replaced the search, so there will be no issues there.

No other issues should be there smile.