Picking out text from a screenshot

Metspitzer

I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?
 
Peter Jason

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?
Perhaps you could load it into AcrobatX and OCR it?
 
Ed Cryer

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?
What you want is an OCR program. I'll leave you to google for one, but
I've just come across this one:
http://www.free-ocr.com/
I've never tried it, but your situation seems ideal for a test run.
Try one and let us know.

Ed
 
J. P. Gilliver (John)

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?
I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.

What are you going to do with the non-text parts? How are you going to
handle overlapping window parts? Is there a reason you can't just use
highlight-and-copy anyway?
--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

"I'd give my right arm to be ambidextrous"

I already am largely ambisinistral.
 
Paul

Metspitzer said:
I take screenshots that contain a lot of text. Is there a built in
program (Win7) that will convert the image to text?
You're looking for OCR. (That's a general function
for going from a pixmap to a string of text, perhaps
output in Word format.)

And generally that's something you pay for. I don't
know if any of the free ones are "worthy" or not.
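
To make "going from a pixmap to a string of text" concrete, here is a toy sketch of the idea in Python. It is not how any real OCR product works: the 3x5 glyph bitmaps, the fixed-width cell layout and the function names are all assumptions made up for illustration. It simply picks, for each character cell, the known glyph that differs in the fewest pixels.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Hypothetical 3x5 glyph bitmaps ('#' = ink, '.' = background).
GLYPHS = {
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "R": ["##.", "#.#", "##.", "#.#", "#.#"],
}
GLYPH_WIDTH = 3


def split_cells(pixmap):
    """Cut a one-line pixmap into fixed-width character cells.

    Assumes each glyph is GLYPH_WIDTH columns wide with one blank
    column between neighbouring glyphs.
    """
    step = GLYPH_WIDTH + 1
    width = len(pixmap[0])
    return [[row[x:x + GLYPH_WIDTH] for row in pixmap]
            for x in range(0, width, step)]


def match_glyph(cell):
    """Return the character whose template differs from the cell in the fewest pixels."""
    def distance(template):
        return sum(a != b
                   for t_row, c_row in zip(template, cell)
                   for a, b in zip(t_row, c_row))
    return min(GLYPHS, key=lambda ch: distance(GLYPHS[ch]))


def recognize(pixmap):
    """Go from a pixmap (a list of equal-length strings) to a string of text."""
    return "".join(match_glyph(cell) for cell in split_cells(pixmap))


sample = [
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#...#.#",
    "###.###.#.#",
]
print(recognize(sample))
```

Because the match is "closest template" rather than "exact template", a cell with a stray pixel can still be recognized, which is roughly why clean input matters so much to real tools.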

http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software

*******

But another area that tries to do things like that
is "screen readers", or text-to-voice functions. They
need to vocalize the text they see on the screen
for the visually impaired. This doesn't immediately
solve your problem, but the article shows there are
other "hooks" in the system that can help acquire
the text strings you want.

http://en.wikipedia.org/wiki/Screen_reader

You would need a screen reader that happens to keep a
text copy of "what it saw". That, then, would be a
"poor man's OCR", relying on messages from the system
for the details. That is better than starting from
scratch, picking apart pixmaps.

Paul
 
J. P. Gilliver (John)

James Silverton said:
PureText may do what you want. It's free and I use it a lot.
He did say built in - or is AcrobatX part of 7?

(As others have said, what you need is OCR: screenshots of plain text
should give near 100% accuracy. There are a few free ones, or if you
have a scanner, I'd be slightly surprised if it didn't come with some.)
 
Robin Bignall

J. P. Gilliver (John) said:
He did say built in - or is AcrobatX part of 7?

(As others have said, what you need is OCR: screenshots of plain text
should give near 100% accuracy. There are a few free ones, or if you
have a scanner, I'd be slightly surprised if it didn't come with some.)
Mine came with the ABBYY OCR program, which has quite a clever screen
copier. Very good, so I bought the Pro version (not cheap).
 
Robin Bignall

J. P. Gilliver (John) said:
I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.
I use the ABBYY OCR program that came with my scanner.*

J. P. Gilliver (John) said:
What are you going to do with the non-text parts?
Copy them to a graphics program.

J. P. Gilliver (John) said:
How are you going to handle overlapping window parts?
ABBYY allows just text, or just graphics, or both, to be sent to a whole
bunch of places: clipboard, file, Word etc. You can choose the whole screen
or bits of it, such as a window.

J. P. Gilliver (John) said:
Is there a reason you can't just use highlight-and-copy anyway?
Dunno. Never tried.

* Any decent OCR program with a screen copier should be able to do what you asked.
 
Peter Jason

J. P. Gilliver (John) said:
He did say built in - or is AcrobatX part of 7?
Uh, these technical matters confuse me. I
use the Acrobat thing because it's fast and has a
very good search. It is compatible with Win7.
 
Metspitzer

J. P. Gilliver (John) said:
I don't know the answer (though I suspect not), but if you have Office,
that has some OCR ability.

What are you going to do with the non-text parts? How are you going to
handle overlapping window parts? Is there a reason you can't just use
highlight-and-copy anyway?
Highlight and copy is all I want to do. Is there a way to do that
with a JPG image?
Win7 defaults to Windows Photo Viewer. What should I be using?
 
Metspitzer

OCR. Got it.
Thanks
 
Paul

Metspitzer said:
OCR. Got it.
Thanks
I did a test, and you can see a "partial" result here.

http://imageshack.us/a/img849/3530/mak3.png

There is a problem with your idea. The problem with screen
captures is things like ClearType. If your OS has
ClearType enabled, it puts "color fringes" around
the letters.

http://en.wikipedia.org/wiki/Cleartype

*******

For my experiment, I chose to view some text in a web browser
(rather than some dialog box).

I chose a couple of ways to capture the web page. One was "Export to PDF",
which avoids ClearType and renders the web page into a PDF. That
gives a clean copy of the screen. I converted the PDF to an image, so
I could pretend that test file came from a paper scanner.

For the second method, I used "screen capture" on the web page.
Doing a screen capture also captures the
effects of ClearType.

In my Imageshack screenshot, the upper left is the "Export To PDF"
method, while the lower left is the screen capture. You can see
the color fringes around the text in the lower left.
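
If you would rather check a capture for fringes than eyeball it, a minimal sketch follows. It assumes the text is black or gray on a white or gray background, so any pixel whose red, green and blue values differ noticeably is counted as a fringe. The function name, the `tolerance` default and the pixel-tuple input format are assumptions for illustration; load the pixels with whatever image library you have.

```python
def fringe_fraction(pixels, tolerance=8):
    """Return the fraction of pixels that look colored rather than gray.

    `pixels` is any iterable of (r, g, b) tuples with values 0-255.
    A pixel counts as colored when its largest and smallest channel
    differ by more than `tolerance` (an arbitrary, assumed cutoff).
    """
    pixels = list(pixels)
    if not pixels:
        return 0.0
    colored = sum(1 for r, g, b in pixels
                  if max(r, g, b) - min(r, g, b) > tolerance)
    return colored / len(pixels)


# A clean gray-scale capture versus one with orange/blue edge pixels.
clean = [(0, 0, 0), (255, 255, 255), (128, 128, 128)]
fringed = [(255, 255, 255), (255, 180, 120), (0, 0, 0), (120, 180, 255)]
print(fringe_fraction(clean), fringe_fraction(fringed))
```

A result near zero suggests a capture without subpixel color rendering; anything clearly above zero on plain black-on-white text hints that ClearType or a similar method was active.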

When I ran OCR on the image in the lower left (with the color
fringes), the recognition rate was 0%. Nothing got captured.
There was no text to wipe over and copy/paste.

For the view in the upper right, I took a picture copy
of the PDF (so the OCR could work on it) and brought it over
to my OCR tool. You can see in the upper right "results" that
I managed to wipe over some selections. In Acrobat Paper Capture,
if you can wipe the text cursor over the surface of the document
and things highlight, that means the OCR step worked properly.
Since Adobe Paper Capture (in Acrobat) layers the text strings
on top of the original image, you can check for proper character
recognition by looking for differences between the string
on top of the image and the image itself underneath. In my upper-right
example, you can see there are no differences, i.e. 100% recognition
in the sample area. (I zoomed in to make those examples easier
to see, but the whole document on the upper right was clean like that.)
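
The same comparison can be done on text instead of by eye, provided you already know what the page should say. The sketch below is one possible way to put a number on it using Python's standard `difflib`; the function name and the definition of "rate" (matched characters divided by expected characters) are assumptions, not something taken from Acrobat.

```python
from difflib import SequenceMatcher


def recognition_rate(expected, recognized):
    """Share of the expected characters that the OCR output reproduces in order.

    Returns 1.0 for a perfect match and 0.0 when nothing was captured.
    """
    if not expected:
        return 1.0
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


print(recognition_rate("ClearType", "ClearType"))  # clean copy
print(recognition_rate("ClearType", "C1earType"))  # one misread character
print(recognition_rate("ClearType", ""))           # nothing captured
```

An empty OCR result scores 0.0, which corresponds to the "nothing got captured" case, and an exact match scores 1.0.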

Summary: Screen capture sucks as an information source, unless you're
very careful to turn off any screen anti-aliasing method.
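
If turning anti-aliasing off is not an option, one thing that might be worth trying is flattening the capture to pure black and white before handing it to the OCR tool. This was not tested in the experiment above, and whether it helps will depend on the OCR tool and the font size. The sketch assumes dark text on a light background; the function name, the row-of-tuples input format and the threshold of 128 are illustrative assumptions.

```python
def to_black_and_white(rows, threshold=128):
    """Convert rows of (r, g, b) pixels to rows of 0 (ink) or 255 (paper).

    Each pixel is first reduced to a single brightness value, which
    discards the color of any fringe, and is then forced to black or
    white depending on `threshold`.
    """
    result = []
    for row in rows:
        out = []
        for r, g, b in row:
            # Integer approximation of the common BT.601 luma weights.
            luma = (299 * r + 587 * g + 114 * b) // 1000
            out.append(0 if luma < threshold else 255)
        result.append(out)
    return result


# One row: black ink, white paper, a light orange fringe, a dark brown fringe.
row = [(0, 0, 0), (255, 255, 255), (255, 180, 120), (60, 40, 20)]
print(to_black_and_white([row]))
```

Light fringe pixels end up as paper and dark ones as ink, so the result has no color left for the OCR step to trip over, at the cost of slightly rougher letter edges.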

Paul
 
Metspitzer

Thanks for that info. It is really a shame. I would have thought
that a computer would be pretty good at recognizing typed text.
 
Gene E. Bloch

Metspitzer said:
Thanks for that info. It is really a shame. I would have thought
that a computer would be pretty good at recognizing typed text.
And you would have been right.
 
