Job Update
May. 5th, 2006 04:39 pm
I don't often talk about my job on LJ. In many ways, it's a pretty boring one - I scan documents, then correct and mark up the output so that it's readable for blind students at the university. It's one of those unfortunate jobs that is monotonous but still requires a human. Still, I get good uni pay, a workplace that happens to be my place of study, flexible hours, no supervision, and constant access to an internet connection. So, while the work itself gets boring, the fringe benefits more than make up for it.
Currently I'm scanning in an education textbook on classroom discipline. Now, as a scanner, I'm not a fan of this book. The inking throughout the book is various shades of green (which is fine, since the scanner scans in B/W full contrast anyway), the pages are very thin (so I keep getting faded bits of text from the other side of the page, which gives the OCR program hissy fits), it's big enough that I can't do double-page scanning on the scanner I'm using, and it has various paper planes flying around in the margins. It's also got a very poor margin system, in which both pages have the most margin on the left-hand side of the page and very little on the right. Which means I really have to put my arms into it when I scan the damn thing, or else I miss the last character of each line. And until my boss manages to grab a reading list for the book, I have to scan the whole damn thing in. All 330 pages of it. Oh, and there are comic strips throughout the book. Which I have to transcribe.
The only thing that makes it bearable is that all the tables are really simple, and the diagrams are few and far between. It's mostly just nice text, which means it won't be a horror come correcting and mark-up time.
So, for those of you intending to write textbooks, please, for the love of god, use decent-thickness paper, don't even consider comics through the book, and make sure you've left at least some margin on the spine-side of the book. For our sakes, if nothing else...
(no subject)
Date: 2006-05-05 07:19 am (UTC)
(no subject)
Date: 2006-05-05 10:25 am (UTC)
Been doing that for a particular book for about 2 months...
(no subject)
Date: 2006-05-05 10:50 am (UTC)
I'm sure I heard about it on one of the techwriting lists...
off to search.....
(no subject)
Date: 2006-05-05 11:00 am (UTC)
(pdf to text software)
I don't recognise the toolnames, but then I've never had to do that.. normally it's taking Word docs and getting (by hand) into FrameMaker or recently, XML, and using PDF as the publishable output.
Please tell me you use software to pull out the text and not cut-paste?
;>
(Note - I use cut-paste to get data into XML topics, cos it does a much better (and faster) job than any extraction to XML... software really doesn't understand DITA yet....)
(no subject)
Date: 2006-05-08 01:23 am (UTC)
We could extract the text directly from the PDF files (and I've worked on extracted text before), but it turns out it's not nearly as fast: we have to spend much longer transcribing tables and images, since we don't have anything close to an imprint of these in the file already. Also, extracting all the text from a PDF tends to throw the formatting of a lot of little text boxes entirely out of whack. Believe it or not, it's usually faster to send a publisher's PDF through the OCR program and then tidy it up, rather than pull all the text out and add everything else in afterwards.
Of course, for a lot of PDFs, this is all academic - many of them are just PDFised image scans of small document fragments, so we have to treat them like a set of images anyway, because they are. Thank god for ABBYY FineReader, since it can scan from both a scanner and a PDF file...
(no subject)
Date: 2006-05-05 04:21 pm (UTC)
How evil.
η
(no subject)
Date: 2006-05-08 01:26 am (UTC)
(no subject)
Date: 2006-05-05 04:11 pm (UTC)
Everybody does typesetting electronically these days - there's just no excuse.
η
(no subject)
Date: 2006-05-08 01:25 am (UTC)