are represented in four bytes each. For a file that's mostly Latin text, this effectively halves the file size from what it would be in UCS-2. However, for a file that's primarily Japanese, Chinese, or Korean, the file size can grow by 50%. For most other living languages, the file size is close to the same.

UTF-8 is probably the most broadly supported encoding of Unicode. For instance, it's how Java .class files store strings; it's the native encoding of the BeOS, and it's the default encoding an XML processor assumes unless told otherwise by a byte-order mark or an encoding declaration. Chances are pretty good that if a program tells you it's saving Unicode, it's really saving UTF-8.
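The size trade-off described above can be checked with a short Python sketch. Python has no codec named UCS-2, so the sketch assumes big-endian UTF-16 as a stand-in; the two produce identical bytes for characters in the Basic Multilingual Plane. The sample strings and the function name are illustrative only.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte count, UCS-2 byte count) for BMP-only text."""
    # Assumption: "utf-16-be" stands in for UCS-2, which is valid only
    # while every character lies in the Basic Multilingual Plane.
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


latin = "XML in a Nutshell"      # 17 ASCII characters
japanese = "日本語のテキスト"      # 8 characters, 3 bytes each in UTF-8

print(encoded_sizes(latin))      # UTF-8 is half the UCS-2 size
print(encoded_sizes(japanese))   # UTF-8 is 50% larger than UCS-2
```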

Chapter 5. Internationalization

We've told you that XML documents contain text, but we haven't yet told you what kind of text they contain. In this chapter we rectify that omission. XML documents contain Unicode text. Unicode is a character set large enough to include all the world's living languages and a few dead ones. It can be written in a variety of encodings, including UCS-2 and the ASCII superset UTF-8. However, since Unicode text editors are relatively uncommon, XML documents may also be written in other character sets and encodings, which are converted to Unicode when the document is parsed. The encoding declaration specifies which character set a document uses. You can use character references, such as &#x03B8;, to insert Unicode characters like θ that aren't available in the legacy character set in which a document is written.

Computers don't really understand text. They don't recognize the Latin letter Z, the Greek letter γ, or the Han ideograph 齵. All a computer understands are numbers such as 90, 947, or 40,821. A character set maps particular characters, like Z, to particular numbers, like 90. These numbers are called code points. A character encoding determines how those code points are represented in bytes. For instance, the code point 90 can be encoded as a signed byte, a little-endian unsigned short, a 4-byte, two's complement, big-endian integer, or in some still more complicated fashion.

A human script like Cyrillic may be written in multiple character sets, such as KOI8-R, Unicode, or ISO-8859-5. A character set like Unicode may then be encoded in multiple encodings, such as UTF-8, UCS-2, or UTF-16. In general, however, simpler character sets like ASCII and KOI8-R have only one encoding.
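The distinction between a code point and its byte representation can be illustrated with a minimal Python sketch. It uses only the standard library; the choice of the Cyrillic letter Ж as the sample character is an assumption made for the example.

```python
# A character set maps a character to a number (its code point).
code_point = ord("Z")
print(code_point)  # the code point for Latin Z

# An encoding decides how that number is laid out in bytes.
as_signed_byte = code_point.to_bytes(1, "big", signed=True)
as_le_short = code_point.to_bytes(2, "little")
as_be_int = code_point.to_bytes(4, "big", signed=True)
print(as_signed_byte, as_le_short, as_be_int)

# One Cyrillic letter, three character sets/encodings, three byte sequences.
letter = "Ж"
koi8 = letter.encode("koi8-r")
iso = letter.encode("iso-8859-5")
utf8 = letter.encode("utf-8")
print(koi8.hex(), iso.hex(), utf8.hex())
```

The same letter comes out as a different byte sequence each time, which is why a parser has to know the encoding before it can recover the text.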

5.1. Character-Set Metadata

Some environments keep track of the encodings in which particular documents are written. For instance, web servers that transmit XML documents precede them with an HTTP header that looks something like this:

HTTP/1.1 200 OK
Date: Sun, 28 Oct 2001 11:05:42 GMT
Server: Apache/1.3.19 (Unix) mod_jk mod_perl/1.25 mod_fastcgi/2.2.10
Connection: close
Transfer-Encoding: chunked
Content-Type: text/xml; charset=iso-8859-1

The Content-Type field of the HTTP header provides the MIME media type of the document. This may, as shown here, specify in which character set the document is written. An XML parser reading this document from a web server should use this information to determine the document's character encoding.

Many web servers omit the charset parameter from the MIME media type. In this case, if the MIME media type is text/xml, then the document is assumed to be in the us-ascii encoding. If the MIME media type is application/xml, then the parser attempts to guess the character set by reading the first few bytes of the document.
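The rules just described can be sketched in Python as follows. This is a minimal illustration, not a conformant implementation: the function name is hypothetical, and returning None is simply this sketch's way of saying "inspect the first bytes of the document instead".

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset implied by a Content-Type value, or None to sniff."""
    # The standard library's email.message.Message parses MIME parameters.
    msg = Message()
    msg["Content-Type"] = content_type

    charset = msg.get_param("charset")
    if charset is not None:
        return charset.lower()          # explicit metadata wins
    if msg.get_content_type() == "text/xml":
        return "us-ascii"               # default when charset is omitted
    return None                         # e.g. application/xml: guess from bytes


print(charset_from_content_type("text/xml; charset=iso-8859-1"))
print(charset_from_content_type("text/xml"))
print(charset_from_content_type("application/xml"))
```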

TIP: Since ASCII is almost never an appropriate character set for an XML document, application/xml is much preferred over text/xml. Unfortunately, most web servers, including Apache 2.0.36 and earlier, are configured to use text/xml by default. It's worth editing your mime.types file to fix this. Alternatively, at least with Apache, if you don't have root access to your web server, you can use the AddType and AddCharset directives in your .htaccess files to override the server-wide defaults.

We've focused on MIME types in HTTP headers because that's the most common place where character-set metadata is applied to XML documents. However, MIME types are also used in some filesystems (e.g., the BeOS), in email, and in other environments. Other systems may provide other forms of character-set metadata. If such metadata is available for a document, whatever form it takes, the parser should use it, though in practice this is an area where not all parsers and programs are as conformant as they should be.


Index of /rthibaudeau/cours/optique

Icon  Name                    Last modified      Size  Description
[DIR] Parent Directory                              -
[DIR] _overlay/               30-Aug-2004 16:04     -
[DIR] _vti_cnf/               30-Aug-2004 16:04     -
[DIR] img/                    30-Aug-2004 16:05     -
[TXT] replentilles.htm        18-Sep-2004 06:51   23K
[TXT] repmiroirplan.htm       18-Sep-2004 06:51  8.8K
[TXT] repmirsph.htm           18-Sep-2004 06:51   27K
[TXT] repombre.htm            18-Sep-2004 06:51  6.2K
[TXT] reprefraction.htm       18-Sep-2004 06:51   15K
Apache/2.0.46 (Red Hat) Server at Port 80

Chapter 4. Namespaces

Namespaces have two purposes in XML:

  1. To distinguish between elements and attributes from different vocabularies with different meanings and that happen to share the same name.

  2. To group all the related elements and attributes from a single XML application together so that software can easily recognize them.

The first purpose is easier to explain and to grasp, but the second purpose is more important in practice.

Namespaces are implemented by attaching a prefix to each element and attribute. Each prefix is mapped to a URI by an xmlns:prefix attribute. Default URIs can also be provided for elements that don't have a prefix by xmlns attributes. Elements and attributes that are attached to the same URI are in the same namespace. Elements from many XML applications are identified by standard URIs.
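To make the prefix-to-URI mapping concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree, which reports each element as {URI}local-name. The document is invented for illustration: the default namespace URI http://example.com/doc is a placeholder, while the SVG and MathML URIs are the standard ones for those vocabularies. The helper split_tag is a hypothetical convenience, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

DOC = """\
<doc xmlns="http://example.com/doc"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:m="http://www.w3.org/1998/Math/MathML">
  <svg:set/>
  <m:set/>
</doc>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element is in no namespace


root = ET.fromstring(DOC)
names = [split_tag(el.tag) for el in root.iter()]
for uri, local in names:
    print(local, "->", uri)
```

Both child elements have the local name set, but the parser places them in different namespaces because their prefixes map to different URIs. The unprefixed doc element picks up the default namespace from the xmlns attribute.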

4.1. The Need for Namespaces

Some documents combine markup from multiple XML applications. For example, an XHTML document may contain both SVG pictures and MathML equations. An XSLT stylesheet will contain both XSLT instructions and elements from the result-tree vocabulary. And XLinks are always symbiotic with the elements of the document in which they appear since XLink itself doesn't define any elements, only attributes.

In some cases, these applications may use the same name to refer to different things. For example, in SVG a set element sets the value of an attribute for a specified duration of time, while in MathML a set element represents a mathematical set such as the set of all positive even numbers. It's essential to know when you're working with a MathML set and when you're working with an SVG set. Otherwise, validation, rendering, indexing, and many other tasks will get confused and fail.

Consider Example 4-1. This is a simple list of paintings including the title of each painting, the date each was painted, the artist who painted it, and a description of the painting.

Example 4-1. A list of paintings

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<catalog>

  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>November, 1888</date>
    <description>
      Two women look to the left. A third works in her garden.
    </description>
  </painting>

  <painting>
    <title>The Swing</title>
    <artist>Pierre-Auguste Renoir</artist>
    <description>
      A young girl on a swing. Two men and a toddler watch.
    </description>
  </painting>

  <!-- Many more paintings... -->

</catalog>


Now suppose that Example 4-1 is to be served as a web page and you want to make it accessible to search engines. One possibility is to use the Resource Description Framework (RDF) to embed metadata in the page. This describes the page for any search engines or other robots that might come along. Using the Dublin Core metadata vocabulary, a standard vocabulary for library-catalog-style information that can be encoded in XML or other syntaxes, an RDF description of this page might look something like this:

<RDF>
  <Description>
    <title> Impressionist Paintings </title>
    <creator> Elliotte Rusty Harold </creator>
    <description>
      A list of famous impressionist paintings organized
      by painter and date
    </description>
  </Description>
</RDF>

Here we've used the Description and RDF elements from RDF and the title, creator, description, and date elements from the Dublin Core. We have no choice about these names; they are established by their respective specifications. If we want standard software, which understands RDF and the Dublin Core, to understand our documents, then we have to use these names. Example 4-2 combines this description with the actual list of paintings.

Example 4-2. A list of paintings including catalog information about the list

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<catalog>

  <RDF>
    <Description>
      <title> Impressionist Paintings </title>
      <creator> Elliotte Rusty Harold </creator>
      <description>
        A list of famous impressionist paintings organized
        by painter and date
      </description>
    </Description>
  </RDF>

  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>November, 1888</date>
    <description>
      Two women look to the left. A third works in her garden.
    </description>
  </painting>

  <painting>
    <title>The Swing</title>
    <artist>Pierre-Auguste Renoir</artist>
    <description>
      A young girl on a swing. Two men and a toddler watch.
    </description>
  </painting>

  <!-- Many more paintings... -->

</catalog>


Now we have a problem. Several elements have been overloaded with different meanings in different parts of the document. The title element is used for both the title of the page and the title of a painting. The date element is used for both the date the page was written and the date the painting was painted. One description element describes pages, while another describes paintings.

This presents all sorts of problems. Validation is difficult because catalog and Dublin Core elements with the same name have different content specifications. Web browsers may want to hide the page description while showing the painting description, but not all stylesheet languages can tell the difference between the two. Processing software may understand the date format used in the Dublin Core date element, but not the more free-form format used in the painting date element.
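The ambiguity is easy to reproduce. The following Python sketch parses a trimmed-down document shaped like Example 4-2 and asks for every title element; the element nesting shown here is an assumption made for the illustration, and only the titles are kept.

```python
import xml.etree.ElementTree as ET

# A cut-down, namespace-free document in the spirit of Example 4-2.
DOC = """\
<catalog>
  <RDF>
    <Description>
      <title> Impressionist Paintings </title>
    </Description>
  </RDF>
  <painting>
    <title>Memory of the Garden at Etten</title>
  </painting>
  <painting>
    <title>The Swing</title>
  </painting>
</catalog>"""

root = ET.fromstring(DOC)

# Searching by name alone cannot tell the page title from painting titles.
titles = [t.text.strip() for t in root.iter("title")]
print(titles)
```

Software that looks only at the element name receives the page title mixed in with the painting titles and has to fall back on the position of each element in the tree to tell them apart.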

We could change the names of the elements from our vocabulary, painting_title instead of title, date_painted instead of date, and so on. However, this is inconvenient if you already have a lot of documents marked up in the old version of the vocabulary. And it may not be possible to do this in all cases, especially if the name collisions occur not because of conflicts between your vocabulary and a standard vocabulary, but because of conflicts between two or more standard vocabularies. For instance, RDF just barely avoids a collision with the Dublin Core over the Description and description elements.

In other cases, there may not be any name conflicts, but it may still be important for software to determine quickly and decisively to which XML application a given element or attribute belongs. For instance, an XSLT processor needs to distinguish between XSLT instructions and literal result-tree elements.

Using Registration Wizard

This chapter provides an overview of the registration wizard and describes how to:

  • Register a component or iScript

  • Register the component in multiple portals

  • Register a mobile page

Understanding the Registration Wizard

Once you have created your pages and assigned them to a component, you will need to register that component to display the transaction page in the browser. You can do this using the Registration Wizard. The wizard gathers information, then attaches the component to a menu, assigns a permission list to allow security access rights, and places a content reference entry in the portal registry, which displays the component to the user on the navigation menu.

Note. Before running the registration wizard, the menu definition, permission list, and folder must exist. The wizard does not create these definitions. Instead it matches existing definitions to one another and creates the content reference.
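The prerequisite rule in the note can be pictured with a short Python sketch. This is a purely hypothetical model, not PeopleTools code: the function, the dictionary layout, and the definition names (MY_MENU, MY_PERMS, MY_FOLDER) are all invented for illustration.

```python
def missing_prerequisites(existing, menu, permission_list, folder):
    """Return the definitions that must be created before running the wizard."""
    # Hypothetical model: `existing` maps a definition kind to the set of
    # names that already exist. The wizard itself creates none of these.
    required = {
        "menu": menu,
        "permission list": permission_list,
        "folder": folder,
    }
    return [
        f"{kind} {name!r}"
        for kind, name in required.items()
        if name not in existing.get(kind, set())
    ]


existing = {
    "menu": {"MY_MENU"},
    "permission list": {"MY_PERMS"},
    "folder": set(),
}
missing = missing_prerequisites(existing, "MY_MENU", "MY_PERMS", "MY_FOLDER")
print(missing)  # the folder has to be created first
```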

Common Elements Used in This Chapter

content reference

A reference to a URL (Uniform Resource Locator) for a transaction page. Appears as a link in the navigation menu at runtime.

iScript

A specialized PeopleCode function that generates dynamic web content.

See Internet Script Classes (iScript).

permission list

A set of access rights to application objects, including processes and reports, weblib functions, and other elements.

See Working With Permission Lists.

portal registry

A tree structure in which content for a portal is organized, classified, and registered. A portal registry consists of folders, content references, and nodes.

Registering a Component or iScript

In Application Designer, open the component definition you want to register. If you are creating a new component, make sure you save it before opening the registration wizard. For iScripts, open the record definition that holds the iScript function. By convention, such records begin with the WEBLIB_ prefix.

You can access the registration wizard using any of the following methods:

Toolbar

Click the Register Component or Register iScript icon in the toolbar.

Pop-up menu

Select Register Component from the pop-up menu after right-clicking the component.

Tools menu

Select Register Component... or Register iScript from the Tools menu.

Note. All settings in the Registration Wizard default to those referenced in the previous Wizard session.

The Create Content Reference screen of the Registration Wizard enables you to assign the component to a portal and assign a content reference name.

Target Content

Select this checkbox to register all forms of target content (excluding homepage).

Homepage Pagelet

Select to register a new homepage pagelet. The homepage template will be applied automatically.

Portal & Folder Name

Click the Select button to change the portal or folder name. Click the Open Selected button to launch a browser that will take you to the administration page to create a new folder if necessary. The default values are the last ones referenced by the Wizard.

Content Reference Name

The name is automatically generated, including the market code as the suffix.

Content Reference Label

This is the CREF hyperlink that the user sees as the registry entry.

Long Description

Enter the hover text for the CREF hyperlink.

Sequence Number

Enter a number to control the order in which the component appears within the chosen folder.

Product

Enter the two-character product code.

Template Name

Specify the template to be used from the drop-down menu.

If you specified that the target content should be accessed as a homepage pagelet, then this field will not be displayed, and the homepage template will be applied automatically.

Object Owner ID

This field is used to keep track of which application development groups own the object.

Node Name

Click Select to change the node name, or click Open Selected to view the properties of the node. The Open Selected buttons throughout the wizard launch a browser and take you to the corresponding administration page.

Always use default local node

Check to always override the selected node name with the specified default local node.

"Open" URI Base

Enter the URI that tells PeopleSoft Application Designer where your PIA site is located. This value is for informational purposes only. It is used only by the Open Selected button and is not stored in the database.

To register a component or iScript:

  1. Open the component or iScript you wish to register.

    You can also launch the Registration Wizard from the menu designer by right-clicking a menu item that points to a component.

  2. Open the registration wizard.

    Select Tools, Register in Portal or use the icon on the toolbar.

    The Start screen of the registration wizard gives the option of adding the component to a menu, a portal registry, and/or a permission list. If this is a new component and the first time you are running the wizard, you should select all three options. If you have already run the Wizard on this particular component and just need to add it to an additional permission list or portal, deselect the other options.

    Add this component to a menu

    Select to add the component to a menu. You must create the menu definition first if it does not already exist. You specify the menu on the following screen of the Registration Wizard. If you do not select this option, you will need to add the component to the menu manually using the Menu designer.

    See Creating Custom Component Menus.

    Add this component to a portal registry

    Select to create a content reference for the component and add it to the portal registry you specify on the Create Content Reference screen of the Registration Wizard.

    Add this component to a permission list

    Select if you want to specify the permission list on the final screen of the Registration Wizard.

    If this is a new component or iScript and the first time you've run the Registration Wizard, you will most likely want all three options selected. If you are rerunning the wizard simply to add the component to an additional permission list or another portal, clear the other options.

    The wizard remembers the selections you made the last time you ran it and uses them as the defaults.

  3. Click Next.

    If you selected to add the component to a menu, the next screen, Add to Menu and Bar, opens.

  4. Use the Select button to find the Menu Name and the drop-down list box to assign the Bar Name, then click Next.

    These names are not exposed to users, but are required for internal location purposes. Click the Open Selected... button to open the menu you selected in Application Designer. To view the menu definition without closing the Wizard, drag the Registration Wizard to the side.

  5. On the Create Content Reference screen, select Target Content or Homepage Pagelet.

    If the component is a standard transaction, register it as Target Content.

  6. Complete all fields in the Create Content Reference screen and click Next.

  7. Click the Select button to find your Menu Name, then use the drop-down to select the name that will display on the Bar.

    Menus are used as logical groupings to which you can apply security. PeopleSoft components are addressed by menu name (the URL includes the menu name). To add this component to a menu, specify the menu name and bar name.

    If you did not select the checkbox next to Add this component to a menu on the Prompt page, this page will display with the Menu name and the Select button only. You will then need to select the correct menu that already contains the component.

    If you are working with an iScript, you must select the iScript function to register.

  8. Click Next.

  9. Select Target Content if your component is a standard transaction, and complete the appropriate fields.

    See the term table above for definitions of each field.

    If you did not select Add this Component to a Portal Registry in the Prompt page, this page will not display.

  10. Click Next.

  11. Select the appropriate Permission List, check all the actions you want to grant this component or iScript, and enter the base URI.

    Use the Select button to look up and select a permission list.

    If you did not select Add this Component to a Permission List, this page will not display.

  12. Click Next.

  13. Review the selections made so far. If you need to change anything, use the Back button to edit your entries.

    Check the options in the Add to project area to place your item into an active PeopleTools project. This is cumulative behavior – the more times you run the Wizard while the project is active, the more menus, registry structures and permission lists are added to that project.

  14. Click Finish to complete the registry process into the specified Portal.

    If you click Cancel, all the entries you have made in the Wizard will be lost and no changes will be made.

See Also

Creating Menu Definitions

Working With Permission Lists

Registering a Component in Multiple Portals

After you have used the Registration Wizard to register your new component in the portal, you may also need to register the component in other portals.

There are two ways to do this.

  • You can use the Portal Registration Wizard again. Select only the second option: Add this component to a portal registry. Then re-enter the Content Reference Label, Long Description, and Sequence Number (if not 1) to match the other portal’s entry.

  • You can just copy the component to other portals from portal Administration pages in the web client.

    See PeopleTools Internet Technology.

Registering a Mobile Page

As with a component or iScript, you can also register a mobile page in the portal using the registration wizard, though the wizard is slightly different. This wizard always assigns the mobile page to the MOBILE portal and does not require a node name.

To register a mobile page in the portal:

  1. Open the mobile page you want to register.

  2. Open the registration wizard.

    Use any one of three methods to do this: click the Registration Wizard icon in the toolbar, select it from the Tools menu, or select it from the pop-up menu.

    The Register Mobile Page of the wizard opens.

  3. Set the Folder Name and Content Reference information.

  4. Assign the Sequence Number, Product, and Template Name.

  5. Select the Object Owner Id from the drop-down box.

  6. Select Next.

    The final screen of the wizard opens enabling you to verify the settings you have selected. The registration wizard will also add your registry entry to the current project. The Registry entry check box is selected by default.

  7. Review your settings in the output window and select Finish.

See Also

Using Mobile Pages

3.9. Two DTD Examples

Some of the best techniques for DTD design only become apparent when you look at larger documents. In this section, we'll develop DTDs that cover the two different document formats for describing people that were presented in Example 2-4 and Example 2-5 of the last chapter.

3.9.1. Data-Oriented DTDs

Data-oriented DTDs are very straightforward. They make heavy use of sequences, occasional use of choices, and almost no use of mixed content. Example 3-6 shows such a DTD. Since this is a small example, and since it's easier to understand when both the document and the DTD are on the same page, we've made this an internal DTD included in the document. However, it would be easy to take it out and put it in a separate file.

Example 3-6. A flexible yet data-oriented DTD describing people

<?xml version="1.0"?>
<!DOCTYPE person  [
  <!ELEMENT person (name+, profession*)>
  <!ELEMENT name EMPTY>
  <!ATTLIST name first CDATA #REQUIRED
                 last  CDATA #REQUIRED>
  <!-- The first and last attributes are required to be present
       but they may be empty. For example,
       <name first="Cher" last=""> -->
  <!ELEMENT profession EMPTY>
  <!ATTLIST profession value CDATA #REQUIRED>
]>
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>

The DTD here is contained completely inside the internal DTD subset. First a person ELEMENT declaration states that each person must have one or more name children, and zero or more profession children, in that order. This allows for the possibility that a person changes his name or uses aliases. It assumes that each person has at least one name but may not have a profession.

This declaration also requires that all name elements precede all profession elements. Here the DTD is less flexible than it ideally would be. There's no particular reason that the names have to come first. However, if we were to allow more random ordering, it would be hard to say that there must be at least one name. One of the weaknesses of DTDs is that they occasionally force extra sequence order on you when all you really need is a constraint on the number of some element. Schemas are more flexible in this regard.

Both name and profession elements are empty so their declarations are very simple. The attribute declarations are a little more complex. In all three cases the form of the attribute is open, so all three attributes are declared to have type CDATA. All three are also required. However, note the use of comments to suggest a solution for edge cases such as celebrities with no last names. Comments are an essential tool for making sense of otherwise obfuscated DTDs.
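To make the content model concrete, here is a minimal sketch in Python that treats (name+, profession*) as a regular expression over the names of a person element's children. It is not a DTD validator (the parsers in Python's standard library do not validate), and the helper name matches_person_model is our own invention for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: the content model (name+, profession*) rewritten as a
# regular expression over the space-terminated names of the child elements.
PERSON_MODEL = re.compile(r"(?:name )+(?:profession )*")


def matches_person_model(xml_text: str) -> bool:
    """Return True if the root's children satisfy (name+, profession*)."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return PERSON_MODEL.fullmatch(child_names) is not None


valid = """<person>
  <name first="Alan" last="Turing"/>
  <profession value="mathematician"/>
</person>"""

# A profession before the name breaks the sequence order the DTD imposes.
out_of_order = """<person>
  <profession value="mathematician"/>
  <name first="Alan" last="Turing"/>
</person>"""

print(matches_person_model(valid))
print(matches_person_model(out_of_order))
```

The second document is rejected only because of the ordering constraint discussed above; it still contains exactly one name.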

3.9.2. Narrative-Oriented DTDs

Narrative-oriented DTDs tend to be a lot looser and make much heavier use of mixed content than do DTDs that describe more database-like documents. Consequently, they tend to be written from the bottom up, starting with the smallest elements and building up to the largest. They also tend to use parameter entities to group together similar content specifications and attribute lists.

Example 3-7 is a standalone DTD for biographies like the one shown in Example 2-5 of the last chapter. Notice that not everything it declares is actually present in Example 2-5. That's often the case with narrative documents. For instance, not all web pages contain unordered lists, but the XHTML DTD still needs to declare the ul element for those XHTML documents that do include them. Also, notice that a few attributes present in Example 2-5 have been made into fixed defaults here. Thus, they could be omitted from the document itself, once it is attached to this DTD.

Example 3-7. A narrative-oriented DTD for biographies

<!ATTLIST biography xmlns:xlink CDATA #FIXED
                                      "http://www.w3.org/1999/xlink">

<!ELEMENT person (first_name, last_name)>
<!-- Birth and death dates are given in the form yyyy/mm/dd -->
<!ATTLIST person born CDATA #IMPLIED
                 died CDATA #IMPLIED>

<!ELEMENT date   (month, day, year)>
<!ELEMENT month  (#PCDATA)>
<!ELEMENT day    (#PCDATA)>
<!ELEMENT year   (#PCDATA)>

<!-- xlink:href must contain a URI.-->
<!ATTLIST emphasize xlink:type (simple) #IMPLIED
                    xlink:href CDATA   #IMPLIED>

<!ELEMENT profession (#PCDATA)>
<!ELEMENT footnote   (#PCDATA)>

<!-- The source is given according to the Chicago Manual of Style
     citation conventions -->
<!ATTLIST footnote source CDATA #REQUIRED>

<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>

<!ATTLIST image source CDATA   #REQUIRED
                width  NMTOKEN #REQUIRED
                height NMTOKEN #REQUIRED
                ALT    CDATA   #IMPLIED>

<!ENTITY % top_level "( #PCDATA | image | paragraph | definition 
                      | person | profession | emphasize | last_name
                      | first_name | footnote | date )*">

<!ELEMENT paragraph  %top_level; >
<!ELEMENT definition %top_level; >
<!ELEMENT emphasize  %top_level; >
<!ELEMENT biography  %top_level; >

The root biography element has a classic mixed-content declaration. Since there are several elements that can contain other elements in a fairly unpredictable fashion, we group all the possible top-level elements (elements that appear as immediate children of the root element) in a single top_level entity reference. Then we can make all of them potential children of each other in a straightforward way. This also makes it much easier to add new elements in the future. That's important since this one small example is almost certainly not broad enough to cover all possible biographies.

3.7. Parameter Entities

It is not uncommon for multiple elements to share all or part of the same attribute lists and content specifications. For instance, any element that's a simple XLink will have xlink:type and xlink:href attributes, and perhaps xlink:show and xlink:actuate attributes. In XHTML, a th element and a td element contain more or less the same content. Repeating the same content specifications or attribute lists in multiple element declarations is tedious and error-prone. It's entirely possible to add a newly defined child element to the declaration of some of the elements but forget to include it in others.

For example, consider an XML application for residential real-estate listings that provides separate elements for apartments, sublets, coops for sale, condos for sale, and houses for sale. The element declarations might look like this:

<!ELEMENT apartment (address, footage, rooms, baths, rent)>
<!ELEMENT sublet    (address, footage, rooms, baths, rent)>
<!ELEMENT coop      (address, footage, rooms, baths, price)>
<!ELEMENT condo     (address, footage, rooms, baths, price)>
<!ELEMENT house     (address, footage, rooms, baths, price)>

There's a lot of overlap between the declarations, i.e., a lot of repeated text. And if you later decide you need to add an additional element, available_date for instance, then you need to add it to all five declarations. It would be preferable to define a constant that can hold the common parts of the content specification for all five kinds of listings and refer to that constant from inside the content specification of each element. Then to add or delete something from all the listings, you'd only need to change the definition of the constant.

An entity reference is the obvious candidate here. However, general entity references are not allowed to provide replacement text for a content specification or attribute list, only for parts of the DTD that will be included in the XML document itself. Instead, XML provides a new construct exclusively for use inside DTDs, the parameter entity, which is referred to by a parameter entity reference. Parameter entities behave like general entities and are declared in almost exactly the same way. However, they use a % instead of an &, and they can only be used in a DTD, while general entities can only be used in the document content.

3.7.1. Parameter Entity Syntax

A parameter entity is declared much like a general entity. However, an extra percent sign is placed between the <!ENTITY and the name of the entity. For example:

<!ENTITY % residential_content "address, footage, rooms, baths">
<!ENTITY % rental_content      "rent">
<!ENTITY % purchase_content    "price">

Parameter entities are dereferenced in the same way as general entities, only with a percent sign instead of an ampersand:

<!ELEMENT apartment (%residential_content;, %rental_content;)>
<!ELEMENT sublet    (%residential_content;, %rental_content;)>
<!ELEMENT coop      (%residential_content;, %purchase_content;)>
<!ELEMENT condo     (%residential_content;, %purchase_content;)>
<!ELEMENT house     (%residential_content;, %purchase_content;)>

When the parser reads these declarations, it substitutes the entity's replacement text for the entity reference. Now all you have to do to add an available_date element to the content specification of all five listing types is add it to the residential_content entity like this:

<!ENTITY % residential_content "address, footage, rooms,
                                baths, available_date">

The same technique works equally well for attribute types and element names. You'll see several examples of this in the next chapter on namespaces and in Chapter 9.

This trick is limited to external DTDs. Internal DTD subsets do not allow parameter entity references to be only part of a markup declaration. However, parameter entity references can be used in internal DTD subsets to insert one or more entire markup declarations, typically through external parameter entities.
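The substitution a parser performs on these declarations can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular parser is implemented: the function name expand_parameter_entities is hypothetical, and the sketch ignores details a real parser handles, such as nested references and the padding spaces it places around the replacement text.

```python
import re

# Replacement text for each parameter entity, copied from the declarations above.
PARAMETER_ENTITIES = {
    "residential_content": "address, footage, rooms, baths",
    "rental_content": "rent",
    "purchase_content": "price",
}


def expand_parameter_entities(declaration: str, entities: dict) -> str:
    """Replace each %name; reference with that entity's replacement text."""
    return re.sub(
        r"%([A-Za-z_][\w.-]*);",
        lambda match: entities[match.group(1)],
        declaration,
    )


print(expand_parameter_entities(
    "<!ELEMENT apartment (%residential_content;, %rental_content;)>",
    PARAMETER_ENTITIES,
))
```

Changing the single residential_content entry in the dictionary changes the expansion of every declaration that refers to it, which is exactly the maintenance benefit described above.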

Copyright © 2002 O'Reilly & Associates. All rights reserved.

3.6. External Unparsed Entities and Notations

Not all data is XML. There are a lot of ASCII text files in the world that don't give two cents about escaping < as &lt; or adhering to the other constraints by which an XML document is limited. There are probably even more JPEG photographs, GIF line art, QuickTime movies, MIDI sound files, and so on. None of these are well-formed XML, yet all of them are necessary components of many documents.

The mechanism that XML suggests for embedding these things in your documents is the external unparsed entity. The DTD specifies a name and a URI for the entity containing the non-XML data. For example, this ENTITY declaration associates the name turing_getting_off_bus with the JPEG image at

<!ENTITY turing_getting_off_bus
         SYSTEM ""
         NDATA jpeg>

3.6.2. Embedding Unparsed Entities in Documents

The DTD only declares the existence, location, and type of the unparsed entity. To actually include the entity in the document at one or more locations, you insert an element with an ENTITY type attribute whose value is the name of an unparsed entity declared in the DTD. You do not use an entity reference like &turing_getting_off_bus;. Entity references can only refer to parsed entities.

Suppose the image element and its source attribute are declared like this:

<!ELEMENT image EMPTY>
<!ATTLIST image source ENTITY #REQUIRED>

Then, this image element would refer to the photograph at

<image source="turing_getting_off_bus"/>

We should warn you that XML doesn't guarantee any particular behavior from an application that encounters this type of unparsed entity. It very well may not display the image to the user. Indeed, the parser may be running in an environment where there's no user to display the image to. It may not even understand that this is an image. The parser may not load or make any sort of connection with the server where the actual image resides. At most, it will tell the application on whose behalf it's parsing that there is an unparsed entity at a particular URI with a particular notation and let the application decide what, if anything, it wants to do with that information.
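The following sketch shows, under some assumptions, what "telling the application" can look like with the Expat bindings in Python's standard library. The URL and the NOTATION declaration are placeholders invented for this example, and collect_unparsed_entities is a hypothetical helper. The parser never fetches the image; it only passes the entity's name, system identifier, and notation to a callback.

```python
import xml.parsers.expat

# The URL and the notation declaration below are placeholders for this sketch.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE image [
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY turing_getting_off_bus
           SYSTEM "http://www.example.com/turing.jpg"
           NDATA jpeg>
  <!ELEMENT image EMPTY>
  <!ATTLIST image source ENTITY #REQUIRED>
]>
<image source="turing_getting_off_bus"/>"""


def collect_unparsed_entities(document: bytes) -> dict:
    """Map each unparsed entity name to its (system ID, notation name)."""
    found = {}

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        # Expat reports the declaration; nothing is downloaded or decoded.
        found[name] = (system_id, notation_name)

    parser = xml.parsers.expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return found


print(collect_unparsed_entities(DOCUMENT))
```

What happens next is entirely up to the application: it could look up the source attribute's value in this dictionary and load the image, or it could ignore the entity altogether.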

TIP: Unparsed general entities are not the only plausible way to embed non-XML content in XML documents. In particular, a simple URL, possibly associated with an XLink, does a fine job for many purposes, just as it does in HTML (which gets along just fine without any unparsed entities). Including all the necessary information in a single empty element like <image source = "" /> is arguably preferable to splitting the same information between the element where it's used and the DTD of the document in which it's used. The only thing an unparsed entity really adds is the notation, but that's too nonstandard to be of much use.

In fact, many experienced XML developers, including the authors of this book, feel strongly that unparsed entities are a complicated, confusing mistake that should never have been included in XML in the first place. Nonetheless, they are a part of the specification, so we describe them here.

2.10. Checking Documents for Well-Formedness

Every XML document, without exception, must be well-formed. This means it must adhere to a number of rules, including the following:

  1. Every start-tag must have a matching end-tag.

  2. Elements may nest, but may not overlap.

  3. There must be exactly one root element.

  4. Attribute values must be quoted.

  5. An element may not have two attributes with the same name.

  6. Comments and processing instructions may not appear inside tags.

  7. No unescaped < or & signs may occur in the character data of an element or attribute.

This is not an exhaustive list. There are many, many ways a document can be malformed. You'll find a complete list in Chapter 20. Some of these involve constructs that we have not yet discussed such as DTDs. Others are extremely unlikely to occur if you follow the examples in this chapter (for example, including whitespace between the opening < and the element name in a tag).
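As a rough illustration of the rules listed above, the sketch below feeds one deliberately malformed snippet per rule to the parser in Python's standard library (our choice for this illustration). The snippets and the is_well_formed helper are our own; the point is only that a conforming parser rejects every one of them.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One deliberately broken snippet for each rule in the list above.
MALFORMED = {
    "missing end-tag": "<person><name>Alan</person>",
    "overlapping elements": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "comment inside a tag": "<a <!-- note --> />",
    "unescaped ampersand": "<a>Q & A</a>",
}

for rule, snippet in MALFORMED.items():
    print(rule, "->", is_well_formed(snippet))
```

Each line of output ends in False, while a corrected snippet such as `<a>Q &amp; A</a>` is accepted.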

Whether the error is small or large, likely or unlikely, an XML parser reading a document is required to report it. It may or may not report multiple well-formedness errors it detects in the document. However, the parser is not allowed to try to fix the document and make a best-faith effort of providing what it thinks the author really meant. It can't fill in missing quotes around attribute values, insert an omitted end-tag, or ignore the comment that's inside a start-tag. The parser is required to return an error. The objective here is to avoid the bug-for-bug compatibility wars that plagued early web browsers and continue to this day. Consequently, before you publish an XML document, whether that document is a web page, input to a database, or something else, you'll want to check it for well-formedness.

The simplest way to do this is by loading the document into a web browser that understands XML documents, such as Mozilla. If the document is well-formed, the browser will display it. If it isn't, then it will show an error message.

Instead of loading the document into a web browser, you can use an XML parser directly. Most XML parsers are not intended for end users. They are class libraries designed to be embedded into an easier-to-use program such as Mozilla. They provide a minimal command-line interface, if that; that interface is often not particularly well documented. Nonetheless, it can sometimes be quicker to run a batch of files through a command-line interface than to load each of them into a web browser. Furthermore, once you learn about DTDs and schemas, you can use the same tools to validate documents, which most web browsers won't do.

There are many XML parsers available in a variety of languages. Here, we'll demonstrate checking for well-formedness with the Apache XML Project's Xerces-J 1.4, which you can download from the Apache XML Project. This open source package is written in pure Java so it should run across all major platforms. The procedure should be similar for other parsers, though details will vary.

To use this parser, you'll first need a Java 1.1 or later compatible virtual machine. Virtual machines are available for Windows, Solaris, and Linux. To install Xerces-J 1.4.4, just add the xerces.jar and xercesSamples.jar files to your Java class path. In Java 2 you can simply put those .jar files into your jre/lib/ext directory.

The class that actually checks files for well-formedness is called sax.SAXCount. It's run from a Unix shell or DOS prompt like any other standalone Java program. The command-line arguments are the URLs to or filenames of the documents you want to check. Here's the result of running SAXCount against an early version of Example 2-5. The very first line of output tells you where the first problem in the file is. The rest of the output is a more or less irrelevant stack trace.

D:\xian\examples\02>java sax.SAXCount 2-5.xml
[Fatal Error] 2-5.xml:3:30: The value of attribute "height" must not contain the '<' character.
Stopping after fatal error: The value of attribute "height" must not contain the '<' character.
at org.apache.xerces.framework.XMLParser.reportError(
at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(
at org.apache.xerces.framework.XMLDocumentScanner.scanAttValue(
at org.apache.xerces.framework.XMLParser.scanAttValue(
at org.apache.xerces.framework.XMLDocumentScanner.scanElement(
at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.
at org.apache.xerces.framework.XMLDocumentScanner.parseSome(
at org.apache.xerces.framework.XMLParser.parse(
at org.apache.xerces.framework.XMLParser.parse(
at sax.SAXCount.print(
at sax.SAXCount.main(

As you can see, it found an error. In this case the error message wasn't particularly helpful. The actual problem wasn't that an attribute value contained a < character. It was that the closing quote was missing from the height attribute value. Still, that was enough for us to locate and fix the problem. Despite the long list of output, SAXCount only reports the first error in the document, so you may have to run it multiple times until all the mistakes are found and fixed. Once we fixed Example 2-5 to make it well-formed, SAXCount simply reported how long it took to parse the document and what it saw when it did:

D:\xian\examples\02>java sax.SAXCount 2-5.xml
2-5.xml: 140 ms (17 elems, 12 attrs, 0 spaces, 564 chars)

Now that the document has been corrected to be well-formed, it can be passed to a web browser, a database, or whatever other program is waiting to receive it. Almost any nontrivial document crafted by hand will contain well-formedness mistakes. That makes it important to check your work before publishing it.
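For readers without Xerces-J at hand, a rough analogue of SAXCount can be sketched with the SAX module in Python's standard library. This is not the Xerces program; the class and function names are our own, and the counts are computed in a simpler way (for instance, all character data is counted together rather than split into spaces and characters).

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Count elements, attributes, and characters of text as they are parsed."""

    def __init__(self):
        super().__init__()
        self.elems = 0
        self.attrs = 0
        self.chars = 0

    def startElement(self, name, attrs):
        self.elems += 1
        self.attrs += attrs.getLength()

    def characters(self, content):
        self.chars += len(content)


def count(document: bytes):
    """Parse the document and return (elements, attributes, characters)."""
    handler = CountingHandler()
    xml.sax.parseString(document, handler)
    return handler.elems, handler.attrs, handler.chars


def check(document: bytes) -> str:
    """Report counts for a well-formed document, or the first fatal error."""
    try:
        return "%d elems, %d attrs, %d chars" % count(document)
    except xml.sax.SAXParseException as error:
        return "Fatal error at line %d: %s" % (
            error.getLineNumber(), error.getMessage())


print(check(b'<person born="1912"><name>Alan</name></person>'))
# The closing quote is missing from the height attribute value.
print(check(b'<image width="100" height="200>\n</image>'))
```

As with SAXCount, only the first well-formedness error is reported, so a badly broken document may need several passes.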

TIP: This example works with Xerces-J 1.0 through 1.4.4. The recently released Xerces-J 2.0 provides a similar program named sax.Counter.

The XML Declaration (XML in a Nutshell, 2nd Edition)

2.9. The XML Declaration

XML documents should (but do not have to) begin with an XML declaration. The XML declaration looks like a processing instruction with the name xml and version, standalone, and encoding attributes. Technically, it's not a processing instruction though, just the XML declaration; nothing more, nothing less. Example 2-7 demonstrates.

Example 2-7. A very simple XML document with an XML declaration

<?xml version="1.0" encoding="ASCII" standalone="yes"?>
<person>
  Alan Turing
</person>

XML documents do not have to have an XML declaration. However, if an XML document does have an XML declaration, then that declaration must be the first thing in the document. It must not be preceded by any comments, whitespace, processing instructions, and so forth. The reason is that an XML parser uses the first five characters (<?xml) to make some reasonable guesses about the encoding, such as whether the document uses a single byte or multibyte character set. The only thing that may precede the XML declaration is an invisible Unicode byte-order mark. We'll discuss this further in Chapter 5.
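The rule that nothing may precede the XML declaration is easy to check for yourself. The sketch below uses the parser in Python's standard library (our choice for this illustration) and a small helper of our own named parses; the same document is accepted or rejected depending only on what comes before the declaration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


declaration = '<?xml version="1.0" standalone="yes"?>'
body = "<person>Alan Turing</person>"

print(parses(declaration + body))                    # declaration comes first
print(parses(body))                                  # no declaration at all
print(parses(" " + declaration + body))              # whitespace before it
print(parses("<!-- note -->" + declaration + body))  # comment before it
```

The first two documents are accepted; the last two are rejected as malformed even though the rest of the text is identical.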

2.9.1. encoding

So far we've been a little cavalier about encodings. We've said that XML documents are composed of pure text, but we haven't said what encoding that text uses. Is it ASCII? Latin-1? Unicode? Something else?

The short answer to this question is "Yes." The long answer is that by default XML documents are assumed to be encoded in the UTF-8 variable-length encoding of the Unicode character set. This is a strict superset of ASCII, so pure ASCII text files are also UTF-8 documents. However, most XML processors, especially those written in Java, can handle a much broader range of character sets. All you have to do is tell the parser which character encoding the document uses. Preferably this is done through metainformation, stored in the filesystem or provided by the server. However, not all systems provide character-set metadata so XML also allows documents to specify their own character set with an encoding declaration inside the XML declaration. Example 2-8 shows how you'd indicate that a document was written in the ISO-8859-1 (Latin-1) character set that includes letters like ö and ç needed for many non-English Western European languages.

Example 2-8. An XML document encoded in Latin-1

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<person>
  Erwin Schrödinger
</person>

The encoding attribute is optional in an XML declaration. If it is omitted and no metadata is available, then the Unicode character set is assumed. The parser may use the first several bytes of the file to try to guess which encoding of Unicode is in use. If metadata is available and it conflicts with the encoding declaration, then the encoding specified by the metadata wins. For example, if an HTTP header says a document is encoded in ASCII but the encoding declaration says it's encoded in UTF-8, then the parser will pick ASCII.

The different encodings and the proper handling of non-English XML documents will be discussed in greater detail in Chapter 5.
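To see the encoding declaration at work, consider this sketch, which uses the parser in Python's standard library (an assumption of this example, not something the text prescribes). The helper try_parse is our own. The same Latin-1 bytes are read correctly when the declaration is present and rejected when it is absent, because the parser then falls back to UTF-8.

```python
import xml.etree.ElementTree as ET

# "\u00f6" is the letter o with a diaeresis, as in Example 2-8.
TEXT = "<person>Erwin Schr\u00f6dinger</person>"


def try_parse(document: bytes):
    """Return the root element's text, or None if the bytes are rejected."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


latin1_bytes = TEXT.encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes

print(try_parse(declared))              # declaration matches the bytes
print(try_parse(latin1_bytes))          # no declaration, so UTF-8 is assumed
print(try_parse(TEXT.encode("utf-8")))  # UTF-8 needs no declaration
```

The undeclared Latin-1 document fails because the single byte that represents ö in Latin-1 is not a valid UTF-8 sequence.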

2.3. Attributes

XML elements can have attributes. An attribute is a name-value pair attached to the element's start-tag. Names are separated from values by an equals sign and optional whitespace. Values are enclosed in single or double quotation marks. For example, this person element has a born attribute with the value 1912-06-23 and a died attribute with the value 1954-06-07:

<person born="1912-06-23" died="1954-06-07">
  Alan Turing
</person>

This next element is exactly the same as far as an XML parser is concerned. It simply uses single quotes instead of double quotes, puts some extra whitespace around the equals signs, and reorders the attributes.

<person died = '1954-06-07'  born = '1912-06-23' >
  Alan Turing
</person>

The whitespace around the equals signs is purely a matter of personal aesthetics. The single quotes may be useful in cases where the attribute value itself contains a double quote. Attribute order is not significant.
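As a quick check of these claims, the following Python sketch parses both spellings of the person element with the standard library's xml.etree.ElementTree module and compares what the parser reports. The variable names are chosen for this illustration only, and the character data is written on one line for brevity.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same element: different quote characters,
# different whitespace around "=", and different attribute order.
double_quoted = '<person born="1912-06-23" died="1954-06-07">Alan Turing</person>'
single_quoted = "<person died = '1954-06-07'  born = '1912-06-23' >Alan Turing</person>"

first = ET.fromstring(double_quoted)
second = ET.fromstring(single_quoted)

# The parser reports attributes as a name-to-value mapping, so quoting
# style, spacing, and order do not survive parsing.
same_attributes = first.attrib == second.attrib
print(same_attributes)
print(first.attrib["born"], first.attrib["died"])
```

Because the comparison is made on the parsed attributes rather than on the raw text, it reflects what an application behind the parser would actually see.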

Example 2-4 shows how attributes might be used to encode much of the same information given in the data-oriented document of Example 2-2.

Example 2-4. An XML document that describes a person using attributes

<person born="1912-06-23" died="1954-06-07">
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>

This raises the question of when and whether one should use child elements or attributes to hold information. This is a subject of heated debate. Some informaticians maintain that attributes are for metadata about the element while elements are for the information itself. Others point out that it's not always so obvious what's data and what's metadata. Indeed, the answer may depend on where the information is put to use.

What's undisputed is that each element may have no more than one attribute with a given name. That's unlikely to be a problem for a birth date or a death date; it would be an issue for a profession, name, address, or anything else of which an element might plausibly have more than one. Furthermore, attributes are quite limited in structure. The value of the attribute is simply a text string. The division of a date into a year, month, and day with hyphens in the previous example is at the limits of the substructure that can reasonably be encoded in an attribute. Consequently, an element-based structure is a lot more flexible and extensible. Nonetheless, attributes are certainly more convenient in some applications. Ultimately, if you're designing your own XML vocabulary, it's up to you to decide when to use which.
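The one-attribute-per-name rule can be observed directly. The sketch below is a minimal illustration using Python's xml.etree.ElementTree; the document with a repeated profession attribute is a hypothetical example written for this purpose, not one taken from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that tries to repeat the "profession" attribute.
repeated = '<person profession="mathematician" profession="cryptographer"/>'

try:
    ET.fromstring(repeated)
    duplicate_rejected = False
except ET.ParseError as error:
    # The underlying parser treats a repeated attribute name as a fatal error.
    duplicate_rejected = True
    print("not well-formed:", error)

# Child elements have no such limit, so repeated values fit naturally.
with_children = ET.fromstring(
    "<person>"
    "<profession>mathematician</profession>"
    "<profession>cryptographer</profession>"
    "</person>"
)
professions = [p.text for p in with_children.findall("profession")]
print(professions)
```

This is one practical reason to prefer child elements for anything that may occur more than once.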

Attributes are also useful in narrative documents, as Example 2-5 demonstrates. Here it's perhaps a little more obvious what belongs to elements and what to attributes. The raw text of the narrative is presented as character data inside elements. Additional information annotating that data is presented as attributes. This includes source references, image URLs, hyperlinks, and birth and death dates. Even here, however, there's more than one way to do it. For instance, the footnote numbers could be attributes of the footnote element rather than character data.

Example 2-5. A narrative XML document that uses attributes

<biography xmlns:xlink="">

  <image source=""
  width="152" height="345"/>
  <person born='1912-06-23'
  died='1954-06-07'><first_name>Alan</first_name>
  <last_name>Turing</last_name> </person> was one of the first people
  to truly deserve the name <emphasize>computer scientist</emphasize>.
  Although his contributions to the field were too numerous to list,
  his best-known are the eponymous <emphasize xlink:type="simple"
  xlink:href="">Turing
  Test</emphasize> and <emphasize  xlink:type="simple"
  xlink:href="">Turing
  Machine</emphasize>.

  <last_name>Turing</last_name> was also an accomplished
  <profession>mathematician</profession> and
  <profession>cryptographer</profession>. His assistance
  was crucial in helping the Allies decode the German Enigma
  machine.<footnote source="The Ultra Secret, F.W. Winterbotham,
  1974">1</footnote>

  He committed suicide on <date><month>June</month> <day>7</day>,
  <year>1954</year></date> after being convicted of homosexuality
  and forced to take female hormone injections.<footnote
  source="Alan Turing: the Enigma, Andrew Hodges, 1983">2</footnote>

</biography>

1.3. How XML Works

Example 1-1 shows a simple XML document. This particular XML document might be seen in an inventory-control system or a stock database. It marks up the data with tags and attributes describing the color, size, bar-code number, manufacturer, name of the product, and so on.

Example 1-1. An XML document

<?xml version="1.0"?>
<product barcode="2394287410">
  <name>DataLife MF 2HD</name>
  <description>floppy disks</description>
</product>

This document is text and might well be stored in a text file. You can edit this file with any standard text editor such as BBEdit, jEdit, UltraEdit, Emacs, or vi. You do not need a special XML editor. Indeed, we find most general-purpose XML editors to be far more trouble than they're worth and much harder to use than simply editing documents in a text editor.

Programs that actually try to understand the contents of the XML document--that is, do more than merely treat it as any other text file--will use an XML parser to read the document. The parser is responsible for dividing the document into individual elements, attributes, and other pieces. It passes the contents of the XML document to an application piece by piece. If at any point the parser detects a violation of the well-formedness rules of XML, then it reports the error to the application and stops parsing. In some cases the parser may read further in the document, past the original error, so that it can detect and report other errors that occur later in the document. However, once it has detected the first well-formedness error, it will no longer pass along the contents of the elements and attributes it encounters.
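The piece-by-piece delivery and the hard stop at the first well-formedness error can be sketched in Python with xml.etree.ElementTree.iterparse. The malformed document below is a hypothetical variant of Example 1-1 in which the description element is never closed; the names delivered and error_message are chosen for this illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical malformed variant of Example 1-1: <description> is never closed.
malformed = (
    b'<?xml version="1.0"?>'
    b'<product barcode="2394287410">'
    b"<name>DataLife MF 2HD</name>"
    b"<description>floppy disks"
    b"</product>"
)

delivered = []        # elements the parser handed over before the error
error_message = None  # set if the parser reports a well-formedness error

try:
    # iterparse reports each element once its end-tag has been read.
    for event, element in ET.iterparse(io.BytesIO(malformed), events=("end",)):
        delivered.append(element.tag)
except ET.ParseError as error:
    error_message = str(error)

print(delivered)
print(error_message)
```

The name element, which was complete before the mismatched end-tag, should reach the application; the description and product elements are never delivered because the parser gives up at the error.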

Individual XML applications normally dictate more precise rules about exactly which elements and attributes are allowed where. For instance, you wouldn't expect to find a G_Clef element when reading a biology document. Some of these rules can be precisely specified with a schema written in any of several languages including the W3C XML Schema Language, RELAX NG, and DTDs. A document may contain a URI indicating where the schema can be found. Some XML parsers will notice this and compare the document to its schema as they read it to see if the document satisfies the constraints specified there. Such a parser is called a validating parser. A violation of those constraints is called a validity error, and the whole process of checking a document against a schema is called validation. If a validating parser finds a validity error, it will report it to the application on whose behalf it's parsing the document. This application can then decide whether it wishes to continue parsing the document. However, validity errors are not necessarily fatal (unlike well-formedness errors), and an application may choose to ignore them. Not all parsers are validating parsers. Some merely check for well-formedness.
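To make the distinction between the two kinds of error concrete, here is a deliberately simplified Python sketch. It does not use a real schema language or a validating parser; the ALLOWED_CHILDREN table and the validity_errors function are hypothetical stand-ins invented for this illustration, and the document is a made-up variant of Example 1-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraint table standing in for a schema: for each element
# name, the set of child element names it may contain.
ALLOWED_CHILDREN = {
    "product": {"name", "description"},
    "name": set(),
    "description": set(),
}

def validity_errors(element, allowed=ALLOWED_CHILDREN):
    """Return a list of messages for children the table does not permit."""
    errors = []
    for child in element:
        if child.tag not in allowed.get(element.tag, set()):
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        errors.extend(validity_errors(child, allowed))
    return errors

# Well-formed, so parsing succeeds, but G_Clef breaks the toy constraints.
document = ET.fromstring(
    '<product barcode="2394287410">'
    "<name>DataLife MF 2HD</name>"
    "<G_Clef/>"
    "</product>"
)
problems = validity_errors(document)
print(problems)

# Unlike a well-formedness error, the application can carry on regardless.
print(document.findtext("name"))
```

The parse itself never fails here; the constraint check only produces a report, and it is the application that decides whether to keep using the data.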

The application that receives data from the parser may be:

  • A web browser such as Netscape Navigator or Internet Explorer that displays the document to a reader

  • A word processor such as StarOffice Writer that loads the XML document for editing

  • A database such as Microsoft SQL Server that stores the XML data in a new record

  • A drawing program such as Adobe Illustrator that interprets the XML as two-dimensional coordinates for the contents of a picture

  • A spreadsheet such as Gnumeric that parses the XML to find numbers and functions used in a calculation

  • A personal finance program such as Microsoft Money that sees the XML as a bank statement

  • A syndication program that reads the XML document and extracts the headlines for today's news

  • A program that you yourself wrote in Java, C, Python or some other language that does exactly what you want it to do

  • Almost anything else
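As an illustration of the "program that you yourself wrote" case, the following Python sketch reads a product document shaped like Example 1-1 and prints a one-line summary. The summarize function and its output format are hypothetical choices for this example; the parsing calls come from the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# A product document shaped like Example 1-1.
inventory_record = """<?xml version="1.0"?>
<product barcode="2394287410">
  <name>DataLife MF 2HD</name>
  <description>floppy disks</description>
</product>"""

def summarize(xml_text):
    """Return a one-line summary of a product document (hypothetical helper)."""
    product = ET.fromstring(xml_text)
    name = product.findtext("name")
    description = product.findtext("description")
    return f"{product.get('barcode')}: {name} ({description})"

summary = summarize(inventory_record)
print(summary)
```

The parser handles the syntax; the application only decides which elements and attributes it cares about.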

XML is an extremely flexible format for data. It is used for all of this and a lot more. These are real examples. In theory, any data that can be stored in a computer can be stored in XML format. In practice, XML is suitable for storing and exchanging any data that can plausibly be encoded as text. It's only really unsuitable for multimedia data such as photographs, recorded sound, video, and other very large bit sequences.

XML in a Nutshell is a comprehensive guide to the rapidly growing world of XML. It covers all aspects of XML, from the most basic syntax rules, to the details of DTD and schema creation, to the APIs you can use to read and write XML documents in a variety of programming languages.

0.1. What This Book Covers

There are hundreds of formally established XML applications from the W3C and other standards bodies, such as OASIS and the Object Management Group. There are even more informal, unstandardized applications from individuals and corporations, such as Microsoft's Channel Definition Format and John Guajardo's Mind Reading Markup Language. This book cannot cover them all, any more than a book on Java could discuss every program that has ever been or might ever be written in Java. This book focuses primarily on XML itself. It covers the fundamental rules that all XML documents and authors must adhere to, whether a web designer uses SMIL to add animations to web pages or a C++ programmer uses SOAP to exchange serialized objects with a remote database.

This book also covers generic supporting technologies that have been layered on top of XML and are used across a wide range of XML applications. These technologies include:

XLinks: An attribute-based syntax for hyperlinks between XML and non-XML documents that provides the simple, one-directional links familiar from HTML, multidirectional links between many documents, and links between documents to which you don't have write access.

XSLT: An XML application that describes transformations from one document to another, in either the same or different XML vocabularies.

XPointers: A syntax for URI fragment identifiers that selects particular parts of the XML document referred to by the URI--often used in conjunction with an XLink.

XPath: A non-XML syntax used by both XPointer and XSLT for identifying particular pieces of XML documents. For example, an XPath can locate the third address element in the document, or all elements with an email attribute that has a particular value.

Namespaces: A means of distinguishing between elements and attributes from different XML vocabularies that have the same name; for instance, the title of a book and the title of a web page in a web page about books.

Schemas: An XML vocabulary for describing the permissible contents of XML documents from other XML vocabularies.

SAX: The Simple API for XML, an event-based application programming interface implemented by many XML parsers.

DOM: The Document Object Model, a language-neutral tree-oriented API that treats an XML document as a set of nested objects with various properties.

XHTML: An XMLized version of HTML that can be extended with other XML applications such as MathML and SVG.

RDDL: The Resource Directory Description Language, an XML application based on XHTML for documents placed at the end of namespace URLs.

All these technologies, whether defined in XML (XLinks, XSLT, Namespaces, Schemas, XHTML, and RDDL) or in another syntax (XPointers, XPath, SAX, and DOM), are used in many different XML applications.
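The difference between the event-based and tree-oriented APIs mentioned above can be sketched with the Python standard library, which ships implementations of both (xml.sax and xml.dom.minidom). The sample document and the ElementNames handler class are assumptions made for this illustration.

```python
import xml.dom.minidom
import xml.sax

document = b'<person born="1912-06-23"><name>Alan Turing</name></person>'

class ElementNames(xml.sax.ContentHandler):
    """Collects element names as the parser reports each start-tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

# SAX: the parser calls back into the handler while it reads the document.
handler = ElementNames()
xml.sax.parseString(document, handler)
sax_names = handler.names

# DOM: the parser builds the whole tree first; the program then navigates it.
tree = xml.dom.minidom.parseString(document)
person = tree.documentElement
dom_born = person.getAttribute("born")
dom_name = person.getElementsByTagName("name")[0].firstChild.data

print(sax_names, dom_born, dom_name)
```

With SAX the program keeps only what it chooses to record as events go by, whereas with DOM the entire document stays in memory as objects that can be revisited in any order.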

This book does not specifically cover XML applications that are relevant to only some users of XML, such as:

SVG: Scalable Vector Graphics, a W3C-endorsed standard XML encoding of line art.

MathML: The Mathematical Markup Language, a W3C-endorsed standard XML application used for embedding equations in web pages and other documents.

RDF: The Resource Description Framework, a W3C-standard XML application used for describing resources, with a particular focus on the sort of metadata one might find in a library card catalog.

Occasionally we use one or more of these applications in an example, but we do not cover all aspects of the relevant vocabulary in depth. While interesting and important, these applications (and hundreds more like them) are intended primarily for use with special software that knows their format intimately. For instance, most graphic designers do not work directly with SVG. Instead, they use their customary tools, such as Adobe Illustrator, to create SVG documents. They may not even know they're using XML.

This book focuses on standards that are relevant to almost all developers working with XML. We investigate XML technologies that span a wide range of XML applications, not those that are relevant only within a few restricted domains.

