Sisyphus repository :: Text tools :: RPM: unicode (sources)

==> unicode-0.9.4/COPYING <==
GPL v3

==> unicode-0.9.4/README <==
This file is in UTF-8 encoding.

To use the unicode utility, you need:
- python >=2.2 (generators are needed), preferably a wide
unicode build,
- the python optparse library (part of python2.3),
- the UnicodeData.txt file (http://www.unicode.org/Public), which
you should put into /usr/share/unicode/, ~/.unicode/ or the current working directory,
- if you want to see UniHan properties, you also need the Unihan.txt file,
which should be put into /usr/share/unicode/, ~/.unicode/ or the current working directory.
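
If your distribution does not package the data files, a minimal Python 2
sketch along these lines can fetch UnicodeData.txt into ~/.unicode/ (the
exact UNIDATA download path is an assumption here, check
http://www.unicode.org/Public first):

import os, urllib

# hypothetical download URL - verify against http://www.unicode.org/Public
url = 'http://www.unicode.org/Public/UNIDATA/UnicodeData.txt'
dest = os.path.expanduser('~/.unicode')
if not os.path.isdir(dest):
    os.makedirs(dest)
urllib.urlretrieve(url, os.path.join(dest, 'UnicodeData.txt'))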


Enter a regular expression or a hexadecimal number as an argument.
There is not much documentation at the moment; see the manpage. Here are
just some examples of how to use this script:

$ unicode.py euro
U+20A0 EURO-CURRENCY SIGN
UTF-8: e2 82 a0 UTF-16BE: 20a0 Decimal: &#8352;
₠

Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

U+20AC EURO SIGN
UTF-8: e2 82 ac UTF-16BE: 20ac Decimal: &#8364;
€

Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

$ unicode.py 00c0
U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
UTF-8: c3 80 UTF-16BE: 00c0 Decimal: &#192;
À (à)
Lowercase: U+00E0
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
Decomposition: 0041 0300
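
The same lookups can also be done directly from Python's bundled
unicodedata module, which this script uses as a fallback when
UnicodeData.txt is not available; a minimal Python 2 sketch reproducing
part of the output above:

import unicodedata

ch = unichr(0x00C0)
print 'U+%04X' % ord(ch), unicodedata.name(ch)
print 'UTF-8:', ' '.join('%02x' % ord(b) for b in ch.encode('utf-8'))
print 'Category:', unicodedata.category(ch)            # Lu
print 'Bidi:', unicodedata.bidirectional(ch)           # L
print 'Decomposition:', unicodedata.decomposition(ch)  # 0041 0300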


You can specify a range of characters as arguments; unicode will show
these characters in a nice tabular format, aligned to 256-codepoint boundaries.
Use two dots ".." to indicate the range, e.g.

unicode 0450..0520

will display the whole cyrillic, armenian and hebrew blocks (characters from U+0400 to U+05FF)

unicode 0400..

will display just characters from U+0400 up to U+04FF
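
The alignment simply truncates both endpoints to 256-codepoint blocks; a
minimal Python 2 sketch of the idea, mirroring the script's own is_range
logic (blocks_for_range is a hypothetical helper name):

def blocks_for_range(low, high):
    # each displayed block holds 256 codepoints (16 rows of 16 characters)
    return range(low // 256, high // 256 + 1)

print blocks_for_range(0x0450, 0x0520)   # [4, 5], i.e. U+0400..U+05FF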

==> unicode-0.9.4/README-paracode <==
Written by Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>.
For new versions, look at http://kassiopeia.juls.savba.sk/~garabik/software/unicode/

-------------------

paracode exploits the full power of the Unicode standard to convert
text into a visually similar stream of glyphs, while using completely
different codepoints. It is an excellent didactic tool demonstrating the
principles and advanced use of the Unicode standard. paracode is a
command line tool working as a filter, reading standard input in UTF-8
encoding and writing to standard output.

Use the optional -t switch to select which tables to use.

The special name 'all' selects all the tables.

Note that the 'other', 'cyrillic_plus' and 'cherokee' tables (and
'all') make use of rather esoteric characters, and not all fonts
contain them.

The special table 'mirror' uses a quite different character substitution,
is not selected automatically with 'all' and does not work well
with anything except plain ASCII alphabetical characters.

Example:

paracode -t cyrillic+greek+cherokee

paracode -t cherokee <input >output

paracode -r -t mirror <input >output


Possible tables are:

cyrillic

cyrillic_plus

greek

other

cherokee

all
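
The underlying mechanism is just a per-character table lookup over
NFKD-normalized input; a minimal Python 2 sketch of the principle, with
the table abbreviated to three entries:

import unicodedata

table = {
    'a': u'\N{CYRILLIC SMALL LETTER A}',
    'e': u'\N{CYRILLIC SMALL LETTER IE}',
    'o': u'\N{CYRILLIC SMALL LETTER O}',
}

def convert(s):
    s = unicodedata.normalize('NFKD', s)         # split off combining marks
    out = u''.join(table.get(c, c) for c in s)   # substitute look-alikes
    return unicodedata.normalize('NFKC', out)    # recompose the result

print convert(u'paracode').encode('utf-8')       # same look, different codepoints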

==> unicode-0.9.4/changelog <== (symlink to debian/changelog)

==> unicode-0.9.4/debian/README.Debian <==
unicode for Debian
------------------

packaged as native package, the source resides at
http://kassiopeia.juls.savba.sk/~garabik/software/unicode/

-- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>, Fri, 7 Feb 2003 15:09:19 +0100

==> unicode-0.9.4/debian/changelog <==
unicode (0.9.4) unstable; urgency=low

  * recognise split unihan files (closes: #551789)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sun, 07 Feb 2010 18:36:29 +0100

unicode (0.9.3) unstable; urgency=low

  * run pylint & pychecker – fix some previously unnoticed bugs

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Mon, 04 May 2009 22:40:51 +0200

unicode (0.9.2) unstable; urgency=low

  * giving "latin alpha" as an argument will now search
    for all the character names containing the "latin.*alpha"
    regular expression, not _either_ "latin" or "alpha" strings
    (closes: #439146), idea from martin f. krafft.
  * added forgotten README-paracode to the docfiles

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Thu, 30 Oct 2008 18:58:48 +0100

unicode (0.9.1) unstable; urgency=low

  * add package URL to debian/copyright and
    debian/README.Debian (closes: #495555)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sat, 23 Aug 2008 10:28:02 +0200

unicode (0.9) unstable; urgency=low

  * include paracode utility
  * clarify GPL version (v3)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Wed, 19 Sep 2007 19:01:55 +0100

unicode (0.8) unstable; urgency=low

  * fix traceback when letter has no uppercase or lowercase forms

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sun, 1 Oct 2006 21:42:33 +0200

unicode (0.7) unstable; urgency=low

  * updated to use unicode-data (closes: #386853)
  * data files can be bzip2'ed now
  * use data from unicode data files, not from python unicodedata
    module (the latter tends to be obsolete)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sat, 16 Sep 2006 21:44:34 +0200

unicode (0.6) unstable; urgency=low

  * fix stupid undeclared options bug (thanks to Tim Hatch)
  * remove absolute path from z?grep, rely on OS's default PATH
    to execute the command(s)
  * add default path to UnicodeData.txt for MacOSX systems

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Wed, 4 Jan 2006 19:57:54 +0100

unicode (0.5) unstable; urgency=low

  * work around browser invocations that cannot handle UTF-8 in URLs

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sun, 1 Jan 2006 00:59:60 +0100

unicode (0.4.9) unstable; urgency=low

  * better directional overriding for RTL characters
  * query wikipedia with -w switch
  * better heuristics guessing argument type

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Sun, 11 Sep 2005 18:30:59 +0200

unicode (0.4.8) unstable; urgency=low

  * catch an exception if locale.nl_langinfo is not present (thanks to
    Michael Weir)
  * default to no colour if the system is MS Windows
  * put back accidentally disabled left-to-right mark - as a result,
    tabular display of arabic, hebrew and other RTL scripts is
    much better (the bug manifested itself only on powerful i18n terminals,
    such as mlterm)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Fri, 26 Aug 2005 14:25:58 +0200

unicode (0.4.7) unstable; urgency=low

  * some UniHan support (closes: #187214)
  * --color as a synonym for --colour added (closes: #273503)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Thu, 4 Aug 2005 16:36:07 +0200

unicode (0.4.6) unstable; urgency=low

  * change charset guessing (closes: #241889), thanks to Євгeнiй Meщepяĸoв
    (Eugeniy Meshcheryakov) for the patch
  * closes: #229857 - it has been closed together with 215267

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Tue, 20 Apr 2004 15:39:34 +0200

unicode (0.4.5) unstable; urgency=low

  * catch exception if input sequence is invalid in given encoding
    (closes: #188438)
  * automatically find and symlink UnicodeData.txt from perl, if installed
    (thanks to LarstiQ <larstiq @ larstiq.dyndns.org> for the patch)
    (closes: #215267)
  * change architecture to 'all' (closes: #215264)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Wed, 21 Jan 2004 10:30:38 +0100

unicode (0.4) unstable; urgency=low

  * added option to choose colour output (closes: #187215)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Wed, 9 Apr 2003 16:37:39 +0200

unicode (0.3.1) unstable; urgency=low

  * added python to Build-depends (closes: #183662)
  * properly quote hyphens in manpage (closes: #186151)
  * do not use UTF-8 in manpage (closes: #186193)
  * added versioned dependency for python2.3 (closes: #186444)

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Mon, 24 Mar 2003 14:39:31 +0100

unicode (0.3) unstable; urgency=low

  * Initial Release.

 -- Radovan Garabík <garabik@kassiopeia.juls.savba.sk>  Fri, 7 Feb 2003 15:09:19 +0100

==> unicode-0.9.4/debian/compat <==
4

==> unicode-0.9.4/debian/control <==
Source: unicode
Section: utils
Priority: optional
Maintainer: Radovan Garabík <garabik@kassiopeia.juls.savba.sk>
Build-Depends: debhelper (>= 4)
Standards-Version: 3.8.0

Package: unicode
Architecture: all
Depends: python (>= 2.3)
Suggests: perl-modules | console-data (<< 2:1.0-1) | unicode-data
Description: display unicode character properties
 unicode is a simple command line utility that displays
 properties for a given unicode character, or searches
 unicode database for a given name.

==> unicode-0.9.4/debian/copyright <==
This program was written by Radovan Garabík
<garabik @ kassiopeia.juls.savba.sk> on
Fri, 7 Feb 2003 15:09:19 +0100, and
packaged for Debian as a native package.

The sources and package can be downloaded from:
http://kassiopeia.juls.savba.sk/~garabik/software/unicode/


Copyright:
© Radovan Garabík,
released under GPL v3, see /usr/share/common-licenses/GPL

==> unicode-0.9.4/debian/dirs <==
usr/bin

==> unicode-0.9.4/debian/docs <==
README
README-paracode

==> unicode-0.9.4/debian/rules <==
#!/usr/bin/make -f
# Sample debian/rules that uses debhelper.
# GNU copyright 1997 to 1999 by Joey Hess.

# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1

# This is the debhelper compatibility version to use.
#export DH_COMPAT=4



CFLAGS = -Wall -g

ifneq (,$(findstring noopt,$(DEB_BUILD_OPTIONS)))
CFLAGS += -O0
else
CFLAGS += -O2
endif
ifeq (,$(findstring nostrip,$(DEB_BUILD_OPTIONS)))
INSTALL_PROGRAM += -s
endif

configure: configure-stamp
configure-stamp:
	dh_testdir
	# Add here commands to configure the package.

	touch configure-stamp


build: build-stamp

build-stamp: configure-stamp
	dh_testdir

	# Add here commands to compile the package.
	#$(MAKE)
	#/usr/bin/docbook-to-man debian/unicode.sgml > unicode.1

	touch build-stamp

clean:
	dh_testdir
	dh_testroot
	rm -f build-stamp configure-stamp

	# Add here commands to clean up after the build process.
	#-$(MAKE) clean

	dh_clean

install: build
	dh_testdir
	dh_testroot
	dh_clean -k
	dh_installdirs

	# Add here commands to install the package into debian/unicode.
	#$(MAKE) install DESTDIR=$(CURDIR)/debian/unicode
	cp unicode paracode $(CURDIR)/debian/unicode/usr/bin

# Build architecture-dependent files here.
binary-arch: build install
# We have nothing to do by default.

# Build architecture-independent files here.
binary-indep: build install
	dh_testdir
	dh_testroot
#	dh_installdebconf
	dh_installdocs
#	dh_installexamples
#	dh_installmenu
#	dh_installlogrotate
#	dh_installemacsen
#	dh_installpam
#	dh_installmime
#	dh_installinit
#	dh_installcron
	dh_installman unicode.1 paracode.1
#	dh_installinfo
#	dh_undocumented
	dh_installchangelogs
#	dh_link
	dh_strip
	dh_compress
	dh_fixperms
#	dh_makeshlibs
	dh_installdeb
#	dh_perl
#	dh_shlibdeps
#	dh_python
	dh_gencontrol
	dh_md5sums
	dh_builddeb

binary: binary-indep binary-arch
.PHONY: build clean binary-indep binary-arch binary install configure

==> unicode-0.9.4/paracode <==
#!/usr/bin/python

import sys, unicodedata
from optparse import OptionParser



table_cyrillic = {

'A' : u'\N{CYRILLIC CAPITAL LETTER A}',
'B' : u'\N{CYRILLIC CAPITAL LETTER VE}',
'C' : u'\N{CYRILLIC CAPITAL LETTER ES}',
'E' : u'\N{CYRILLIC CAPITAL LETTER IE}',
'H' : u'\N{CYRILLIC CAPITAL LETTER EN}',
'I' : u'\N{CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I}',
'J' : u'\N{CYRILLIC CAPITAL LETTER JE}',
'K' : u'\N{CYRILLIC CAPITAL LETTER KA}',
'M' : u'\N{CYRILLIC CAPITAL LETTER EM}',
'O' : u'\N{CYRILLIC CAPITAL LETTER O}',
'P' : u'\N{CYRILLIC CAPITAL LETTER ER}',
'S' : u'\N{CYRILLIC CAPITAL LETTER DZE}',
'T' : u'\N{CYRILLIC CAPITAL LETTER TE}',
'X' : u'\N{CYRILLIC CAPITAL LETTER HA}',
'Y' : u'\N{CYRILLIC CAPITAL LETTER U}',

'a' : u'\N{CYRILLIC SMALL LETTER A}',
'c' : u'\N{CYRILLIC SMALL LETTER ES}',
'e' : u'\N{CYRILLIC SMALL LETTER IE}',
'i' : u'\N{CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I}',
'j' : u'\N{CYRILLIC SMALL LETTER JE}',
'o' : u'\N{CYRILLIC SMALL LETTER O}',
'p' : u'\N{CYRILLIC SMALL LETTER ER}',
's' : u'\N{CYRILLIC SMALL LETTER DZE}',
'x' : u'\N{CYRILLIC SMALL LETTER HA}',
'y' : u'\N{CYRILLIC SMALL LETTER U}',

}

table_cyrillic_plus = {
'Y' : u'\N{CYRILLIC CAPITAL LETTER STRAIGHT U}',
'h' : u'\N{CYRILLIC SMALL LETTER SHHA}',
}

table_greek = {
'A' : u'\N{GREEK CAPITAL LETTER ALPHA}',
'B' : u'\N{GREEK CAPITAL LETTER BETA}',
'E' : u'\N{GREEK CAPITAL LETTER EPSILON}',
'H' : u'\N{GREEK CAPITAL LETTER ETA}',
'I' : u'\N{GREEK CAPITAL LETTER IOTA}',
'K' : u'\N{GREEK CAPITAL LETTER KAPPA}',
'M' : u'\N{GREEK CAPITAL LETTER MU}',
'N' : u'\N{GREEK CAPITAL LETTER NU}',
'O' : u'\N{GREEK CAPITAL LETTER OMICRON}',
'P' : u'\N{GREEK CAPITAL LETTER RHO}',
'T' : u'\N{GREEK CAPITAL LETTER TAU}',
'X' : u'\N{GREEK CAPITAL LETTER CHI}',
'Y' : u'\N{GREEK CAPITAL LETTER UPSILON}',
'Z' : u'\N{GREEK CAPITAL LETTER ZETA}',

'o' : u'\N{GREEK SMALL LETTER OMICRON}',
}

table_other = {
'!' : u'\N{LATIN LETTER RETROFLEX CLICK}',

'O' : u'\N{ARMENIAN CAPITAL LETTER OH}',
'S' : u'\N{ARMENIAN CAPITAL LETTER TIWN}',
'o' : u'\N{ARMENIAN SMALL LETTER OH}',
'n' : u'\N{ARMENIAN SMALL LETTER VO}',
}

table_cherokee = {
'A' : u'\N{CHEROKEE LETTER GO}',
'B' : u'\N{CHEROKEE LETTER YV}',
'C' : u'\N{CHEROKEE LETTER TLI}',
'D' : u'\N{CHEROKEE LETTER A}',
'E' : u'\N{CHEROKEE LETTER GV}',
'G' : u'\N{CHEROKEE LETTER NAH}',
'H' : u'\N{CHEROKEE LETTER MI}',
'J' : u'\N{CHEROKEE LETTER GU}',
'K' : u'\N{CHEROKEE LETTER TSO}',
'L' : u'\N{CHEROKEE LETTER TLE}',
'M' : u'\N{CHEROKEE LETTER LU}',
'P' : u'\N{CHEROKEE LETTER TLV}',
'R' : u'\N{CHEROKEE LETTER SV}',
'S' : u'\N{CHEROKEE LETTER DU}',
'T' : u'\N{CHEROKEE LETTER I}',
'V' : u'\N{CHEROKEE LETTER DO}',
'W' : u'\N{CHEROKEE LETTER LA}',
'Y' : u'\N{CHEROKEE LETTER GI}',
'Z' : u'\N{CHEROKEE LETTER NO}',

}

table_mirror = {

'A' : u'\N{FOR ALL}',
'B' : u'\N{CANADIAN SYLLABICS CARRIER KHA}',
'C' : u'\N{LATIN CAPITAL LETTER OPEN O}',
'D' : u'\N{CANADIAN SYLLABICS CARRIER PA}',
'E' : u'\N{LATIN CAPITAL LETTER REVERSED E}',
'F' : u'\N{TURNED CAPITAL F}',
'G' : u'\N{TURNED SANS-SERIF CAPITAL G}',
'H' : u'H',
'I' : u'I',
'J' : u'\N{LATIN SMALL LETTER LONG S}',
'K' : u'\N{LATIN SMALL LETTER TURNED K}', # fixme
'L' : u'\N{TURNED SANS-SERIF CAPITAL L}',
'M' : u'W',
'N' : u'N',
'O' : u'O',
'P' : u'\N{CYRILLIC CAPITAL LETTER KOMI DE}',
'R' : u'\N{CANADIAN SYLLABICS TLHO}',
'S' : u'S',
'T' : u'\N{UP TACK}',
'U' : u'\N{ARMENIAN CAPITAL LETTER VO}',
'V' : u'\N{N-ARY LOGICAL AND}',
'W' : u'M',
'X' : u'X',
'Y' : u'\N{TURNED SANS-SERIF CAPITAL Y}',
'Z' : u'Z',

'a' : u'\N{LATIN SMALL LETTER TURNED A}',
'b' : u'q',
'c' : u'\N{LATIN SMALL LETTER OPEN O}',
'd' : u'p',
'e' : u'\N{LATIN SMALL LETTER SCHWA}',
'f' : u'\N{LATIN SMALL LETTER DOTLESS J WITH STROKE}',
'g' : u'\N{LATIN SMALL LETTER B WITH HOOK}',
'h' : u'\N{LATIN SMALL LETTER TURNED H}',
'i' : u'\N{LATIN SMALL LETTER DOTLESS I}' + u'\N{COMBINING DOT BELOW}',
'j' : u'\N{LATIN SMALL LETTER LONG S}' + u'\N{COMBINING DOT BELOW}',
'k' : u'\N{LATIN SMALL LETTER TURNED K}',
'l' : u'l',
'm' : u'\N{LATIN SMALL LETTER TURNED M}',
'n' : u'u',
'o' : u'o',
'p' : u'd',
'q' : u'b',
'r' : u'\N{LATIN SMALL LETTER TURNED R}',
's' : u's',
't' : u'\N{LATIN SMALL LETTER TURNED T}',
'u' : u'n',
'v' : u'\N{LATIN SMALL LETTER TURNED V}',
'w' : u'\N{LATIN SMALL LETTER TURNED W}',
'x' : u'x',
'y' : u'\N{LATIN SMALL LETTER TURNED Y}',
'z' : u'z',

'0' : '0',
'1' : u'I',
'2' : u'\N{INVERTED QUESTION MARK}\N{COMBINING MACRON}',
'3' : u'\N{LATIN CAPITAL LETTER OPEN E}',
'4' : u'\N{LATIN SMALL LETTER LZ DIGRAPH}',
'6' : '9',
'7' : u'\N{LATIN CAPITAL LETTER L WITH STROKE}',
'8' : '8',
'9' : '6',
',' : "'",
"'" : ',',
'.' : u'\N{DOT ABOVE}',
'?' : u'\N{INVERTED QUESTION MARK}',
'!' : u'\N{INVERTED EXCLAMATION MARK}',


}


tables_names = ['cyrillic', 'cyrillic_plus', 'greek',
'other', 'cherokee']

table_default = table_cyrillic
table_default.update(table_greek)

table_all = {}
for t in tables_names:
    table_all.update(globals()['table_'+t])

parser = OptionParser(usage="usage: %prog [options]")

parser.add_option("-t", "--tables",
action="store", default='default', dest="tables", type="string",
help="""list of tables to use, separated by a plus sign.
Possible tables are: """+'+'.join(tables_names)+""" and a special name 'all' to specify
all these tables joined together.
There is another table, 'mirror', that is not selected in 'all'.""")

parser.add_option("-r", "--reverse",
action="count", dest="reverse",
default=0,
help="Reverse the text after conversion. Best used with the 'mirror' table.")

(options, args) = parser.parse_args()

if args:
    to_convert = ' '.join(args).decode('utf-8')
else:
    to_convert = None

tables = options.tables.split('+')
tables = ['table_'+x for x in tables]

tables = [globals()[x] for x in tables]

table = {}
for t in tables:
    table.update(t)

def reverse_string(s):
    l = list(s)
    l.reverse()
    r = ''.join(l)
    return r

def do_convert(s, reverse=0):
    if reverse:
        s = reverse_string(s)
    # decompose first, so that accented letters are substituted via their base form
    l = unicodedata.normalize('NFKD', s)
    out = []
    for c in l:
        out.append(table.get(c, c))
    out = ''.join(out)
    out = unicodedata.normalize('NFKC', out)
    return out

if not to_convert:
    if options.reverse:
        lines = sys.stdin.readlines()
        lines.reverse()
    else:
        lines = sys.stdin

    for line in lines:
        l = line.decode('utf-8')
        out = do_convert(l, options.reverse)
        sys.stdout.write(out.encode('utf-8'))

else:
    out = do_convert(to_convert, options.reverse)
    sys.stdout.write(out.encode('utf-8'))
    sys.stdout.write('\n')

==> unicode-0.9.4/paracode.1 <==
.\" Hey, EMACS: -*- nroff -*-
.TH PARACODE 1 "2005-04-16"
.SH NAME
paracode \- command line Unicode conversion tool
.SH SYNOPSIS
.B paracode
.RI [ -t tables ]
string
.SH DESCRIPTION
This manual page documents the
.B paracode
command.
.PP
\fBparacode\fP exploits the full power of the Unicode standard to convert text
into a visually similar stream of glyphs, while using completely different codepoints.
It is an excellent didactic tool demonstrating the principles and advanced use of
the Unicode standard.
.PP
\fBparacode\fP is a command line tool working as
a filter, reading standard input in UTF-8 encoding and writing to
standard output.

.SH OPTIONS
.TP
.BI \-t tables
.BI \-\-tables

Use the given list of conversion tables, separated by a plus sign.

The special name 'all' selects all the tables.

Note that the 'other', 'cyrillic_plus' and 'cherokee' tables (and 'all')
make use of rather esoteric characters, and not all fonts contain them.


The special table 'mirror' uses a quite different character substitution,
is not selected automatically with 'all' and does not work well
with anything except plain ASCII alphabetical characters.

Example:

paracode -t cyrillic+greek+cherokee

paracode -t cherokee <input >output

paracode -r -t mirror <input >output



Possible tables are:

cyrillic

cyrillic_plus

greek

other

cherokee

all

.TP
.BI \-r

Display text in reverse order after conversion, best used together with -t mirror.

.SH SEE ALSO
iconv(1)


.SH AUTHOR
Radovan Garab\('ik <garabik @ kassiopeia.juls.savba.sk>


==> unicode-0.9.4/unicode <==
#!/usr/bin/python

#from __future__ import generators

import os, glob, sys, unicodedata, locale, gzip, re, traceback, string, commands
import urllib, webbrowser

# bz2 was introduced in 2.3, we want this to work also with earlier versions
try:
    import bz2
except ImportError:
    bz2 = None

from optparse import OptionParser

VERSION='0.9.4'


# list of terminals that support bidi
biditerms = ['mlterm']

locale.setlocale(locale.LC_ALL, '')

# guess terminal charset
try:
    iocharsetguess = locale.nl_langinfo(locale.CODESET) or "ascii"
except:
    iocharsetguess = "ascii"

if os.environ.get('TERM') in biditerms and iocharsetguess.lower().startswith('utf'):
    LTR = u'\u202d' # left to right override
else:
    LTR = ''


def out(*args):
    "print args, converting them to the output charset"
    for i in args:
        sys.stdout.write(i.encode(options.iocharset, 'replace'))

colours = {
'none' : "",
'default' : "\033[0m",
'bold' : "\033[1m",
'underline' : "\033[4m",
'blink' : "\033[5m",
'reverse' : "\033[7m",
'concealed' : "\033[8m",

'black' : "\033[30m",
'red' : "\033[31m",
'green' : "\033[32m",
'yellow' : "\033[33m",
'blue' : "\033[34m",
'magenta' : "\033[35m",
'cyan' : "\033[36m",
'white' : "\033[37m",

'on_black' : "\033[40m",
'on_red' : "\033[41m",
'on_green' : "\033[42m",
'on_yellow' : "\033[43m",
'on_blue' : "\033[44m",
'on_magenta' : "\033[45m",
'on_cyan' : "\033[46m",
'on_white' : "\033[47m",

'beep' : "\007",
}


general_category = {
'Lu': 'Letter, Uppercase',
'Ll': 'Letter, Lowercase',
'Lt': 'Letter, Titlecase',
'Lm': 'Letter, Modifier',
'Lo': 'Letter, Other',
'Mn': 'Mark, Non-Spacing',
'Mc': 'Mark, Spacing Combining',
'Me': 'Mark, Enclosing',
'Nd': 'Number, Decimal Digit',
'Nl': 'Number, Letter',
'No': 'Number, Other',
'Pc': 'Punctuation, Connector',
'Pd': 'Punctuation, Dash',
'Ps': 'Punctuation, Open',
'Pe': 'Punctuation, Close',
'Pi': 'Punctuation, Initial quote',
'Pf': 'Punctuation, Final quote',
'Po': 'Punctuation, Other',
'Sm': 'Symbol, Math',
'Sc': 'Symbol, Currency',
'Sk': 'Symbol, Modifier',
'So': 'Symbol, Other',
'Zs': 'Separator, Space',
'Zl': 'Separator, Line',
'Zp': 'Separator, Paragraph',
'Cc': 'Other, Control',
'Cf': 'Other, Format',
'Cs': 'Other, Surrogate',
'Co': 'Other, Private Use',
'Cn': 'Other, Not Assigned',
}

bidi_category = {
'L' : 'Left-to-Right',
'LRE' : 'Left-to-Right Embedding',
'LRO' : 'Left-to-Right Override',
'R' : 'Right-to-Left',
'AL' : 'Right-to-Left Arabic',
'RLE' : 'Right-to-Left Embedding',
'RLO' : 'Right-to-Left Override',
'PDF' : 'Pop Directional Format',
'EN' : 'European Number',
'ES' : 'European Number Separator',
'ET' : 'European Number Terminator',
'AN' : 'Arabic Number',
'CS' : 'Common Number Separator',
'NSM' : 'Non-Spacing Mark',
'BN' : 'Boundary Neutral',
'B' : 'Paragraph Separator',
'S' : 'Segment Separator',
'WS' : 'Whitespace',
'ON' : 'Other Neutrals',
}

comb_classes = {
0: 'Spacing, split, enclosing, reordrant, and Tibetan subjoined',
1: 'Overlays and interior',
7: 'Nuktas',
8: 'Hiragana/Katakana voicing marks',
9: 'Viramas',
10: 'Start of fixed position classes',
199: 'End of fixed position classes',
200: 'Below left attached',
202: 'Below attached',
204: 'Below right attached',
208: 'Left attached (reordrant around single base character)',
210: 'Right attached',
212: 'Above left attached',
214: 'Above attached',
216: 'Above right attached',
218: 'Below left',
220: 'Below',
222: 'Below right',
224: 'Left (reordrant around single base character)',
226: 'Right',
228: 'Above left',
230: 'Above',
232: 'Above right',
233: 'Double below',
234: 'Double above',
240: 'Below (iota subscript)',
}



def get_unicode_properties(ch):
    properties = {}
    if ch in linecache:
        fields = linecache[ch].strip().split(';')
        proplist = ['codepoint', 'name', 'category', 'combining', 'bidi', 'decomposition', 'dummy', 'digit_value', 'numeric_value', 'mirrored', 'unicode1name', 'iso_comment', 'uppercase', 'lowercase', 'titlecase']
        for i, prop in enumerate(proplist):
            if prop!='dummy':
                properties[prop] = fields[i]

        if properties['lowercase']:
            properties['lowercase'] = unichr(int(properties['lowercase'], 16))
        if properties['uppercase']:
            properties['uppercase'] = unichr(int(properties['uppercase'], 16))
        if properties['titlecase']:
            properties['titlecase'] = unichr(int(properties['titlecase'], 16))

        properties['combining'] = int(properties['combining'])
        properties['mirrored'] = properties['mirrored']=='Y'
    else:
        properties['codepoint'] = '%04X' % ord(ch)
        properties['name'] = unicodedata.name(ch, '')
        properties['category'] = unicodedata.category(ch)
        properties['combining'] = unicodedata.combining(ch)
        properties['bidi'] = unicodedata.bidirectional(ch)
        properties['decomposition'] = unicodedata.decomposition(ch)
        properties['digit_value'] = unicodedata.digit(ch, '')
        properties['numeric_value'] = unicodedata.numeric(ch, '')
        properties['mirrored'] = unicodedata.mirrored(ch)
        properties['unicode1name'] = ''
        properties['iso_comment'] = ''
        properties['uppercase'] = ch.upper()
        properties['lowercase'] = ch.lower()
        properties['titlecase'] = ''
    return properties


def do_init():
    HomeDir = os.path.expanduser('~/.unicode')
    HomeUnicodeData = os.path.join(HomeDir, "UnicodeData.txt")
    global UnicodeDataFileNames
    UnicodeDataFileNames = [HomeUnicodeData, '/usr/share/unidata/UnicodeData.txt', '/usr/share/unicode/UnicodeData.txt', './UnicodeData.txt'] + \
        glob.glob('/usr/share/unidata/UnicodeData*.txt') + \
        glob.glob('/usr/share/perl/*/unicore/UnicodeData.txt') + \
        glob.glob('/System/Library/Perl/*/unicore/UnicodeData.txt') # for MacOSX

    HomeUnihanData = os.path.join(HomeDir, "Unihan*")
    global UnihanDataGlobs
    UnihanDataGlobs = [HomeUnihanData, '/usr/share/unidata/Unihan*', '/usr/share/unicode/Unihan*', './Unihan*']


def get_unihan_files():
    fos = [] # list of file names for Unihan data file(s)
    for gl in UnihanDataGlobs:
        fnames = glob.glob(gl)
        fos += fnames
    return fos

def get_unihan_properties_internal(ch):
    properties = {}
    ch = ord(ch)
    global unihan_fs
    for f in unihan_fs:
        fo = OpenGzip(f)
        for l in fo:
            if l.startswith('#'):
                continue
            line = l.strip()
            if not line:
                continue
            char, key, value = line.strip().split('\t')
            if int(char[2:], 16) == ch:
                properties[key] = unicode(value, 'utf-8')
            elif int(char[2:], 16)>ch:
                break
    return properties

def get_unihan_properties_zgrep(ch):
    properties = {}
    global unihan_fs
    ch = ord(ch)
    chs = 'U+%X' % ch
    for f in unihan_fs:
        if f.endswith('.gz'):
            grepcmd = 'zgrep'
        elif f.endswith('.bz2'):
            grepcmd = 'bzgrep'
        else:
            grepcmd = 'grep'
        cmd = grepcmd+' ^'+chs+r'\\b '+f
        status, output = commands.getstatusoutput(cmd)
        output = output.split('\n')
        for l in output:
            if not l:
                continue
            char, key, value = l.strip().split('\t')
            if int(char[2:], 16) == ch:
                properties[key] = unicode(value, 'utf-8')
            elif int(char[2:], 16)>ch:
                break
    return properties

# basic sanity check, if e.g. you run this on MS Windows...
if os.path.exists('/bin/grep'):
    get_unihan_properties = get_unihan_properties_zgrep
else:
    get_unihan_properties = get_unihan_properties_internal



def error(txt):
    out(txt)
    out('\n')
    sys.exit()

def get_gzip_filename(fname):
    "return fname if it exists; otherwise fname+.gz, then fname+.bz2, then None"
    if os.path.exists(fname):
        return fname
    if os.path.exists(fname+'.gz'):
        return fname+'.gz'
    if os.path.exists(fname+'.bz2') and bz2 is not None:
        return fname+'.bz2'
    return None


def OpenGzip(fname):
    "open fname; if it does not exist, try fname.gz or fname.bz2; return a plain file, GzipFile or BZ2File object"
    if os.path.exists(fname) and not (fname.endswith('.gz') or fname.endswith('.bz2')):
        return file(fname)
    if os.path.exists(fname+'.gz'):
        fname = fname+'.gz'
    elif os.path.exists(fname+'.bz2') and bz2 is not None:
        fname = fname+'.bz2'
    if fname.endswith('.gz'):
        return gzip.GzipFile(fname)
    elif fname.endswith('.bz2'):
        return bz2.BZ2File(fname)
    return None
    #raise IOError

def GrepInNames(pattern, fillcache=False):
    p = re.compile(pattern, re.I)
    f = None
    for name in UnicodeDataFileNames:
        f = OpenGzip(name)
        if f != None:
            break
    if not fillcache:
        if not f:
            out("""
Cannot find UnicodeData.txt, please place it into
/usr/share/unidata/UnicodeData.txt,
/usr/share/unicode/UnicodeData.txt, ~/.unicode/ or current
working directory (optionally you can gzip it).
Without the file, searching will be much slower.

""")
            for i in xrange(sys.maxunicode):
                try:
                    name = unicodedata.name(unichr(i))
                    if re.search(p, name):
                        yield myunichr(i)
                except ValueError:
                    pass
        else:
            for l in f:
                if re.search(p, l):
                    r = myunichr(int(l.split(';')[0], 16))
                    linecache[r] = l
                    yield r
            f.close()
    else:
        if f:
            for l in f:
                if re.search(p, l):
                    r = myunichr(int(l.split(';')[0], 16))
                    linecache[r] = l
            f.close()


def myunichr(n):
    try:
        r = unichr(n)
        return r
    except ValueError:
        traceback.print_exc()
        error("Consider recompiling your python interpreter with wide unicode characters")



def is_ascii(s):
    "test whether string s consists entirely of ascii characters"
    try:
        unicode(s, 'ascii')
    except UnicodeDecodeError:
        return False
    return True

def guesstype(arg):
    if not is_ascii(arg):
        return 'string', arg
    elif arg[:2]=='U+' or arg[:2]=='u+': # it is a hexadecimal number
        try:
            val = int(arg[2:], 16)
            if val>sys.maxunicode:
                return 'regexp', arg
            else:
                return 'hexadecimal', arg[2:]
        except ValueError:
            return 'regexp', arg
    elif arg[0] in "Uu" and len(arg)>4:
        try:
            val = int(arg[1:], 16)
            if val>sys.maxunicode:
                return 'regexp', arg
            else:
                return 'hexadecimal', arg[1:] # strip the leading U/u
        except ValueError:
            return 'regexp', arg
    elif len(arg)>=4:
        try:
            val = int(arg, 16)
            if val>sys.maxunicode:
                return 'regexp', arg
            else:
                return 'hexadecimal', arg
        except ValueError:
            return 'regexp', arg
    else:
        return 'string', arg


def process(arglist, t):
    # build a list of values, so that we can combine queries like
    # LATIN ALPHA and search for LATIN.*ALPHA and not names that
    # contain either LATIN or ALPHA
    result = []
    names_query = [] # reserved for queries in names - i.e. -r
    for arg_i in arglist:
        if t==None:
            tp, arg = guesstype(arg_i)
            if tp == 'regexp':
                # if the first argument is guessed to be a regexp, add
                # all the following arguments to the regular expression -
                # this is probably what you wanted, e.g.
                # 'unicode cyrillic be' will now search for the 'cyrillic.*be' regular expression
                t = 'regexp'
        else:
            tp, arg = t, arg_i
        if tp=='hexadecimal':
            val = int(arg, 16)
            r = myunichr(val)
            list(GrepInNames('%04X'%val, fillcache=True)) # fill the table with character properties
            result.append(r)
        elif tp=='decimal':
            val = int(arg, 10)
            r = myunichr(val)
            list(GrepInNames('%04X'%val, fillcache=True))
            result.append(r)
        elif tp=='regexp':
            names_query.append(arg)
        elif tp=='string':
            try:
                unirepr = unicode(arg, options.iocharset)
            except UnicodeDecodeError:
                error("Sequence %s is not valid in charset '%s'." % (repr(arg), options.iocharset))
            unilist = ['%04X'%ord(x) for x in unirepr]
            unireg = '|'.join(unilist)
            list(GrepInNames(unireg, fillcache=True))
            for r in unirepr:
                result.append(r)
    if names_query:
        query = '.*'.join(names_query)
        for r in GrepInNames(query):
            result.append(r)
    return result

def maybe_colours(colour):
    if use_colour:
        return colours[colour]
    else:
        return ""

# format key and value
def printkv(*l):
    for i in range(0, len(l), 2):
        if i<len(l)-2:
            sep = " "
        else:
            sep = "\n"
        k, v = l[i], l[i+1]
        out(maybe_colours('green'))
        out(k)
        out(": ")
        out(maybe_colours('default'))
        out(unicode(v))
        out(sep)


def print_characters(list, maxcount, query_wiki=0):
    """query_wiki - 0 - don't
       1 - spawn browser
    """
    counter = 0
    for c in list:

        if query_wiki:
            ch = urllib.quote(c.encode('utf-8')) # wikipedia uses UTF-8 in names
            wiki_url = 'http://en.wikipedia.org/wiki/'+ch
            webbrowser.open(wiki_url)
            query_wiki = 0 # query only the very first character

        if maxcount:
            counter += 1
            if counter > options.maxcount:
                out("\nToo many characters to display, more than %s, use --max option to change it\n" % options.maxcount)
                return
        properties = get_unicode_properties(c)
        out(maybe_colours('bold'))
        out('U+%04X ' % ord(c))
        if properties['name']:
            out(properties['name'])
        else:
            out(maybe_colours('default'))
            out(" - No such unicode character name in database")
        out(maybe_colours('default'))
        out('\n')

        ar = ["UTF-8", string.join([("%02x" % ord(x)) for x in c.encode('utf-8')]),
              "UTF-16BE", string.join([("%02x" % ord(x)) for x in c.encode('utf-16be')], ''),
              "Decimal", "&#%s;" % ord(c)]
        if options.addcharset:
            try:
                rep = string.join([("%02x" % ord(x)) for x in c.encode(options.addcharset)])
            except UnicodeError:
                rep = "NONE"
            ar.extend([options.addcharset, rep])
        printkv(*ar)

        if properties['combining']:
            pc = " "+c
        else:
            pc = c
        out(pc)
        uppercase = properties['uppercase']
        lowercase = properties['lowercase']
        if uppercase:
            out(" (%s)" % uppercase)
            out('\n')
            printkv("Uppercase", 'U+%04X' % ord(properties['uppercase']))
        elif lowercase:
            out(" (%s)" % properties['lowercase'])
            out('\n')
            printkv("Lowercase", 'U+%04X' % ord(properties['lowercase']))
        else:
            out('\n')
        printkv('Category', properties['category'] + " (%s)" % general_category[properties['category']])

        if properties['numeric_value']:
            printkv('Numeric value', properties['numeric_value'])
        if properties['digit_value']:
            printkv('Digit value', properties['digit_value'])

        bidi = properties['bidi']
        if bidi:
            printkv('Bidi', bidi + " (%s)" % bidi_category[bidi])
        mirrored = properties['mirrored']
        if mirrored:
            out('Character is mirrored\n')
        comb = properties['combining']
        if comb:
            printkv('Combining', str(comb) + " (%s)" % (comb_classes.get(comb, '?')))
        decomp = properties['decomposition']
        if decomp:
            printkv('Decomposition', decomp)
        if options.verbosity>0:
            uhp = get_unihan_properties(c)
            for key in uhp:
                printkv(key, uhp[key])
        out('\n')



def print_block(block):
    # header
    out(" "*10)
    for i in range(16):
        out(".%X " % i)
    out('\n')
    # body
    for i in range(block*16, block*16+16):
        hexi = "%X" % i
        if len(hexi)>3:
            hexi = "%07X" % i
            hexi = hexi[:4]+" "+hexi[4:]
        else:
            hexi = " %03X" % i
        out(LTR+hexi+". ")
        for j in range(16):
            c = unichr(i*16+j)
            if unicodedata.combining(c):
                c = " "+c
            out(c)
            out(' ')
        out('\n')
    out('\n')

def print_blocks(blocks):
    for block in blocks:
        print_block(block)


def is_range(s, typ):
    sp = s.split('..')
    if len(sp)<>2:
        return False
    if not sp[1]:
        sp[1] = sp[0]
    elif not sp[0]:
        sp[0] = sp[1]
    if not sp[0]:
        return False
    low = list(process([sp[0]], typ))
    high = list(process([sp[1]], typ))
    if len(low)<>1 or len(high)<>1:
        return False
    low = ord(low[0])
    high = ord(high[0])
    low = low // 256
    high = high // 256 + 1
    return range(low, high)




parser = OptionParser(usage="usage: %prog [options] arg")
parser.add_option("-x", "--hexadecimal",
action="store_const", const='hexadecimal', dest="type",
help="Assume arg to be hexadecimal number")
parser.add_option("-d", "--decimal",
action="store_const", const='decimal', dest="type",
help="Assume arg to be decimal number")
parser.add_option("-r", "--regexp",
action="store_const", const='regexp', dest="type",
help="Assume arg to be regular expression")
parser.add_option("-s", "--string",
action="store_const", const='string', dest="type",
help="Assume arg to be a sequence of characters")
parser.add_option("-a", "--auto",
action="store_const", const=None, dest="type",
help="Try to guess arg type (default)")
parser.add_option("-m", "--max",
action="store", default=10, dest="maxcount", type="int",
help="Maximal number of codepoints to display, default: 10; 0=unlimited")
parser.add_option("-i", "--io",
action="store", default=iocharsetguess, dest="iocharset", type="string",
help="I/O character set, I am guessing %s" % iocharsetguess)
parser.add_option("-c", "--charset-add",
action="store", dest="addcharset", type="string",
help="Show hexadecimal reprezentation in this additional charset")
parser.add_option("-C", "--colour",
action="store", dest="use_colour", type="string",
default="auto",
help="Use colours, on, off or auto")
parser.add_option('', "--color",
action="store", dest="use_colour", type="string",
default="auto",
help="synonym for --colour")
parser.add_option("-v", "--verbose",
action="count", dest="verbosity",
default=0,
help="Increase verbosity (reads Unihan properties - slow!)")
parser.add_option("-w", "--wikipedia",
action="count", dest="query_wiki",
default=0,
help="Query wikipedia for the character")



(options, arguments) = parser.parse_args()

linecache = {}
do_init()

if len(arguments)==0:
    parser.print_help()
    sys.exit()

if options.use_colour.lower() in ("on", "1", "true", "yes"):
    use_colour = True
elif options.use_colour.lower() in ("off", "0", "false", "no"):
    use_colour = False
else:
    use_colour = sys.stdout.isatty()
    if sys.platform == 'win32':
        use_colour = False



l_args = [] # list of non-range arguments to process
for argum in arguments:
    is_r = is_range(argum, options.type)
    if is_r:
        print_blocks(is_r)
    else:
        l_args.append(argum)

if l_args:
    unihan_fs = []
    if options.verbosity>0:
        unihan_fs = get_unihan_files() # list of file names for Unihan data file(s), empty if not available
        if not unihan_fs:
            out("""
Unihan_*.txt files not found. In order to view Unihan properties,
please place the file into /usr/share/unidata/,
/usr/share/unicode/, ~/.unicode/
or current working directory (optionally you can gzip or bzip2 them).
You can get the files by unpacking ftp://ftp.unicode.org/Public/UNIDATA/Unihan.zip
Warning, listing UniHan Properties is rather slow.

""")
            options.verbosity = 0
    try:
        print_characters(process(l_args, options.type), options.maxcount, options.query_wiki)
    except IOError: # e.g. broken pipe
        pass


==> unicode-0.9.4/unicode.1 <==
.\" Hey, EMACS: -*- nroff -*-
.TH UNICODE 1 "2003-01-31"
.SH NAME
unicode \- command line unicode database query tool
.SH SYNOPSIS
.B unicode
.RI [ options ]
string
.SH DESCRIPTION
This manual page documents the
.B unicode
command.
.PP
\fBunicode\fP is a command line unicode database query tool.

.SH OPTIONS
.TP
.BI \-h
.BI \-\-help

Show help and exit.

.TP
.BI \-x
.BI \-\-hexadecimal

Assume
.I string
to be a hexadecimal number

.TP
.BI \-d
.BI \-\-decimal

Assume
.I string
to be a decimal number

.TP
.BI \-r
.BI \-\-regexp

Assume
.I string
to be a regular expression

.TP
.BI \-s
.BI \-\-string

Assume
.I string
to be a sequence of characters

.TP
.BI \-a
.BI \-\-auto

Try to guess type of
.I string
from one of the above (default)

.TP
.BI \-mMAXCOUNT
.BI \-\-max=MAXCOUNT

Maximal number of codepoints to display, default: 10; use 0 for unlimited

.TP
.BI \-iCHARSET
.BI \-\-io=IOCHARSET

I/O character set. For maximal pleasure, run \fBunicode\fP on UTF-8
capable terminal and specify IOCHARSET to be UTF-8. \fBunicode\fP
tries to guess this value from your locale, so with properly set up
locale, you should not need to specify it.

.TP
.BI \-cADDCHARSET
.BI \-\-charset\-add=ADDCHARSET

Show hexadecimal representation of displayed characters in this additional charset.

.TP
.BI \-CUSE_COLOUR
.BI \-\-colour=USE_COLOUR

USE_COLOUR is one of
.I on
.I off
.I auto

.B \-\-colour=on
will use ANSI colour codes to colourise the output

.B \-\-colour=off
won't use colours.

.B \-\-colour=auto
will test if standard output is a tty, and use colours only when it is.

.BI \-\-color
is a synonym of
.BI \-\-colour

.TP
.BI \-v
.BI \-\-verbose

Be more verbose about displayed characters, e.g. display Unihan information, if available.

.TP
.BI \-w
.BI \-\-wikipedia

Spawn browser pointing to Wikipedia entry about the character.

.SH USAGE

\fBunicode\fP tries to guess the type of an argument. For example,
you can use any of the following to display information about
U+00E1 LATIN SMALL LETTER A WITH ACUTE (\('a):

\fBunicode\fP 00E1

\fBunicode\fP U+00E1

\fBunicode\fP \('a

\fBunicode\fP 'latin small letter a with acute'


You can specify a range of characters as arguments, \fBunicode\fP will
show these characters in a nice tabular format, aligned to 256-codepoint boundaries.
Use two dots ".." to indicate the range, e.g.

\fBunicode\fP 0450..0520

will display the whole cyrillic, armenian and hebrew blocks (characters from U+0400 to U+05FF)

\fBunicode\fP 0400..

will display just characters from U+0400 up to U+04FF

.SH BUGS
Tabular format does not deal well with full-width, combining, control
and RTL characters.

.SH SEE ALSO
ascii(1)


.SH AUTHOR
Radovan Garab\('ik <garabik@melkor.dnp.fmph.uniba.sk>


 
design & coding: Vladimir Lettiev aka crux © 2004-2005, Andrew Avramenko aka liks © 2007-2008
current maintainer: Michael Shigorin