X-Sampa / CXS

(Customisable to normal X-Sampa)

Conlang X-Sampa: modified X-Sampa used on the Conlang Mailing List

Unicode to CXS Mapping

The following table lists each Unicode code point, the IPA symbol it encodes, and the ASCII equivalent in the CXS encoding.

The modifications CXS introduces are marked in bold and in green color.
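
As a rough illustration of how such a table might be used, the sketch below transliterates an IPA string into CXS by per-character lookup. The dictionary holds only a few rows copied from the table, and the names `CXS` and `to_cxs` are made up for this example. Passing unknown characters through unchanged is an assumption of the sketch, not a rule of CXS.

```python
# A handful of rows copied from the table; a real converter would need all of them.
CXS = {
    "\u00e6": "&",   # æ small ae
    "\u00f0": "D",   # ð small eth
    "\u014b": "N",   # ŋ eng
    "\u0259": "@",   # ə schwa
    "\u026a": "I",   # ɪ sc i
    "\u0283": "S",   # ʃ esh
    "\u02c8": "'",   # ˈ primary stress
    "\u02d0": ":",   # ː length mark
}

def to_cxs(ipa: str) -> str:
    """Replace each character that has a table entry; copy the rest unchanged."""
    return "".join(CXS.get(ch, ch) for ch in ipa)

print(to_cxs("\u02c8f\u026a\u0283\u026a\u014b"))  # ˈfɪʃɪŋ -> 'fISIN
```

A plain lookup like this covers only one-to-one rows. Sequences that need context, such as the tie bar U+0361, have to be handled separately.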

Unicode HTML IPA CXS Comment

C1 Controls and Latin-1 Supplement

U+00E6 æ  æ  & small ae
U+00E7 ç  ç  C small c with cedilla
U+00F0 ð  ð  D small eth
U+00F8 ø  ø  2 small o with stroke

C0 Controls and Basic Latin

U+0020       space: general break, used to keep the transcription readable
U+002E .  .  . syllable break
U+007C |  |  | minor intonation break
U+0061 a  a  a open front unrounded vowel
U+0062 b  b  b voiced bilabial stop
U+0063 c  c  c voiceless palatal stop
U+0064 d  d  d voiced alveolar stop
U+0065 e  e  e front close-mid unrounded vowel
U+0066 f  f  f voiceless labiodental fricative
U+0068 h  h  h voiceless glottal fricative
U+0069 i  i  i front close unrounded vowel
U+006A j  j  j voiced palatal approximant
U+006B k  k  k voiceless velar stop
U+006C l  l  l voiced alveolar lateral approximant
U+006D m  m  m voiced bilabial nasal
U+006E n  n  n voiced alveolar nasal
U+006F o  o  o back close-mid rounded vowel
U+0070 p  p  p voiceless bilabial stop
U+0071 q  q  q voiceless uvular stop
U+0072 r  r  r voiced alveolar trill
U+0073 s  s  s voiceless alveolar fricative
U+0074 t  t  t voiceless alveolar stop
U+0075 u  u  u back close rounded vowel
U+0076 v  v  v voiced labiodental fricative
U+0077 w  w  w voiced labiovelar approximant
U+0078 x  x  x voiceless velar fricative
U+0079 y  y  y front close rounded vowel
U+007A z  z  z voiced alveolar fricative

Latin Extended-A

U+0127 ħ  ħ  X\ pharyngeal voiceless fricative
U+014B ŋ  ŋ  N eng
U+0153 œ  œ  9 oe

Latin Extended-B

U+0180 ƀ  ƀ  B b with stroke: same as beta
U+01A5 &#421;  ƥ  p_< p with hook: voiceless implosive?
U+01AB &#427;  ƫ  t_j t with palatal hook: archaic for t_j
U+01AD &#429;  ƭ  t_< t with hook: voiceless implosive?
U+01BB &#443;  ƻ  dz) two with stroke: archaic for [dz]; use U+02A3 instead
U+01C0 &#448;  ǀ  |\ dental click
U+01C1 &#449;  ǁ  |\|\ lateral click
U+01C2 &#450;  ǂ  =\ alveolar click
U+01C3 &#451;  ǃ  !\ retroflex click

IPA Extensions

U+0250 &#592;  ɐ  6 turned a
U+0251 &#593;  ɑ  A script a / latin letter alpha
U+0252 &#594;  ɒ  Q turned script a / latin letter turned alpha
U+0253 &#595;  ɓ  b_< b with hook
U+0254 &#596;  ɔ  O open o
U+0255 &#597;  ɕ  s\ c with curl
U+0256 &#598;  ɖ  d` d with tail
U+0257 &#599;  ɗ  d_< d with hook
U+0258 &#600;  ɘ  @\ reversed e
U+0259 &#601;  ə  @ turned e / schwa
U+025A &#602;  ɚ  @` schwa with hook: @ + rhoticity
U+025B &#603;  ɛ  E epsilon
U+025C &#604;  ɜ  3 reversed epsilon
U+025D &#605;  ɝ  3` reversed epsilon with hook: 3 + rhoticity
U+025E &#606;  ɞ  3\ closed epsilon
U+025F &#607;  ɟ  J\ dotless j with stroke: vcd. palatal stop
U+0260 &#608;  ɠ  g_< g with hook: implosive
U+0261 &#609;  ɡ  g script g: same as g, vcd. velar stop
U+0262 &#610;  ɢ  G\ sc g: vcd. uvul. stop
U+0263 &#611;  ɣ  G gamma: vcd. velar fric.
U+0264 &#612;  ɤ  7 rams horn / baby gamma: unr. back high-mid vowel
U+0265 &#613;  ɥ  H turned h: vcd. lab.-pal. approx.
U+0266 &#614;  ɦ  h\ h with hook: vcd. glot. fric.
U+0267 &#615;  ɧ  x\ heng with hook: simultaneous S and x
U+0268 &#616;  ɨ  1 i with stroke
U+0269 &#617;  ɩ  I letter iota. obsoleted by sc i (U+026A)
U+026A &#618;  ɪ  I sc i
U+026B &#619;  ɫ  5 l with middle tilde: velarised l
U+026C &#620;  ɬ  K l with belt: vcl. alv. lat. fric.
U+026D &#621;  ɭ  l` l with retrfl. hook: retroflex lateral
U+026E &#622;  ɮ  K\ lezh: vcd. alv. lat. fric.
U+026F &#623;  ɯ  M turned m
U+0270 &#624;  ɰ  M\ turned m with long leg
U+0271 &#625;  ɱ  F m with hook
U+0272 &#626;  ɲ  J n with left hook
U+0273 &#627;  ɳ  n` n with retrfl. hook
U+0274 &#628;  ɴ  N\ sc n
U+0275 &#629;  ɵ  8 barred o
U+0276 &#630;  ɶ  &\ sc oe
U+0277 &#631;  ɷ  U closed omega: better use small upsilon (U+028A)
U+0278 &#632;  ɸ  p\ phi
U+0279 &#633;  ɹ  r\ turned r
U+027A &#634;  ɺ  l\ turned r with long leg
U+027B &#635;  ɻ  r\` turned r with hook
U+027C &#636;  ɼ  r\_r r with long leg: obsoleted by r + raised
U+027D &#637;  ɽ  r` r with tail
U+027E &#638;  ɾ  4 r with fishhook
U+027F &#639;  ɿ  z= reversed r with fishhook: apical dental vowel: use syllabic z
U+0280 &#640;  ʀ  R\ sc r: uvul. trill
U+0281 &#641;  ʁ  R inverted sc r: uvul. vcd. fric.
U+0282 &#642;  ʂ  s` s with hook
U+0283 &#643;  ʃ  S esh
U+0284 &#644;  ʄ  J\_< dotless j with stroke and hook: palat. vcd. impl.
U+0285 &#645;  ʅ  z`= squat reversed esh: apical retr. vowel: use syllabic z`
U+0286 &#646;  ʆ  S_j esh with curl: use palatal S
U+0287 &#647;  ʇ  |\ turned t: dental click. obsolete: use U+01C0
U+0288 &#648;  ʈ  t` t with retrfl. hook
U+0289 &#649;  ʉ  u\ letter u bar
U+028A &#650;  ʊ  U upsilon
U+028B &#651;  ʋ  v\ v with hook
U+028C &#652;  ʌ  V turned v
U+028D &#653;  ʍ  W turned w
U+028E &#654;  ʎ  L turned y: palat. lat. approx.
U+028F &#655;  ʏ  Y sc y
U+0290 &#656;  ʐ  z` z with retr. hook
U+0291 &#657;  ʑ  z\ z with curl
U+0292 &#658;  ʒ  Z ezh / yogh
U+0293 &#659;  ʓ  Z_j ezh with curl: palatalised vcd. postalv. fric.
U+0294 &#660;  ʔ  ? glottal stop
U+0295 &#661;  ʕ  ?\ pharyngeal vcd. fric.
U+0296 &#662;  ʖ  |\|\ inverted glottal stop: lateral click. obsolete: use U+01C1
U+0297 &#663;  ʗ  !\ stretched c: palatal / alveolar click. obsolete: use U+01C3
U+0298 &#664;  ʘ  O\ bilabial click
U+0299 &#665;  ʙ  B\ sc b
U+029A &#666;  ʚ  &\ closed open E (sic!): non-IPA for sc oe, use U+0276 instead
U+029B &#667;  ʛ  G\_< sc g with hook
U+029C &#668;  ʜ  H\ sc h: vcl. epiglottal fric.
U+029D &#669;  ʝ  j\ j with crossed-tail
U+029F &#671;  ʟ  L\ sc l: velar lat. approx.
U+02A0 &#672;  ʠ  q_< q with hook: vcl. uvul. impl.
U+02A1 &#673;  ʡ  >\ glottal stop with stroke: voiced epiglottal stop
U+02A2 &#674;  ʢ  <\ reversed glottal stop with stroke: voiced epiglottal fricative
U+02A3 &#675;  ʣ  dz) dz digraph: FIXME: Unicode 3.2 says: 'vcd. dental affricate', but would that not rather be dD) or d_dD) then?
U+02A4 &#676;  ʤ  dZ) dezh digraph
U+02A5 &#677;  ʥ  dz\) dz digraph with curl
U+02A6 &#678;  ʦ  ts) ts digraph: FIXME: see U+02A3 above
U+02A7 &#679;  ʧ  tS) tesh digraph
U+02A8 &#680;  ʨ  ts\) ts digraph with curl
U+02A9 &#681;  ʩ  fN) feng digraph
U+02AA &#682;  ʪ  ls) ls digraph
U+02AB &#683;  ʫ  lz) lz digraph
U+02AC &#684;  ʬ  ._w_w bilabial percussive: FIXME: I made this up!
U+02AD &#685;  ʭ  ._d_d bidental percussive: FIXME: I made this up!
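
Several comments above point from an obsolete or non-IPA character to a preferred one. A minimal sketch of applying those pointers before conversion follows. The table `PREFERRED` and the helper `normalise` are hypothetical names and not part of CXS.

```python
# Obsolete or non-IPA code points and the replacements named in the comments above.
PREFERRED = {
    0x01BB: 0x02A3,  # two with stroke       -> dz digraph
    0x0269: 0x026A,  # iota                  -> sc i
    0x0277: 0x028A,  # closed omega          -> upsilon
    0x0287: 0x01C0,  # turned t              -> dental click
    0x0296: 0x01C1,  # inverted glottal stop -> lateral click
    0x0297: 0x01C3,  # stretched c           -> retroflex click
    0x029A: 0x0276,  # closed open e         -> sc oe
}

def normalise(text: str) -> str:
    """Swap each obsolete character for its preferred equivalent."""
    return text.translate(PREFERRED)
```

Because both members of each pair share one CXS code in the table, this step does not change the CXS output. It only matters if the normalised Unicode text is kept as well.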

Spacing Modifier Letters

U+02B0 &#688;  ʰ  _h aspirated; small h
U+02B1 &#689;  ʱ  _t small h with hook: breathy voiced
U+02B2 &#690;  ʲ  _j small j: palatalised
U+02B7 &#695;  ʷ  _w small w
U+02B8 &#696;  ʸ  _j small y: used like small j (U+02B2)
U+02BC &#700;  ʼ  _> apostrophe: glottalised/ejective: FIXME: do you use this for Korean, too?
U+02C0 &#704;  ˀ  _> small glot.stop: glottalised/ejective; better use U+02BC
U+02C7 &#711;  ˇ  _F_R caron: falling-rising (Mandarin 3rd) tone
U+02C8 &#712;  ˈ  ' primary stress
U+02C9 &#713;  ˉ  _T spacing macron: high (Mandarin 1st) tone; use U+030B instead
U+02CC &#716;  ˌ  " letter vert. line below: secondary stress
U+02CD &#717;  ˍ  _L letter low macron: low level tone; use U+0300 instead
U+02CE &#718;  ˎ  _L_B letter low grave: low-falling tone
U+02CF &#719;  ˏ  _B_L letter low acute: low-rising tone
U+02D0 &#720;  ː  : triangular colon: length mark
U+02D1 &#721;  ˑ  :\ half triangular colon: half-long
U+02D2 &#722;  ˒  _O centred right half ring: more rounded; use U+0339 instead
U+02D3 &#723;  ˓  _c centred left half ring: less rounded; use U+031C instead
U+02D4 &#724;  ˔  _r up tack: raised; use U+031D instead
U+02D5 &#725;  ˕  _o down tack: lowered; use U+031E instead
U+02D6 &#726;  ˖  _+ plus: advanced; use U+031F instead
U+02D7 &#727;  ˗  _- minus: retracted; use U+0320 instead
U+02D8 &#728;  ˘  _X breve: extra-short; use U+0306 instead
U+02DA &#730;  ˚  _0 ring above: voiceless (on characters with a descender); use U+030A instead
U+02DC &#732;  ˜  ~ small tilde: nasalised; use U+0303 instead
U+02DD &#733;  ˝  _T double acute accent: extra high level tone; use U+030B instead
U+02DE &#734;  ˞  ` rhotic hook
U+02E0 &#736;  ˠ  _G small gamma: velarised
U+02E1 &#737;  ˡ  _l small l: lateral release
U+02E4 &#740;  ˤ  _?\ reversed glot.stop: pharyngealised
U+02E5 &#741;  ˥  _T level tone: extra high (top); accent variant is U+030B
U+02E6 &#742;  ˦  _H level tone: high; accent variant is U+0301
U+02E7 &#743;  ˧  _M level tone: mid; accent variant is U+0304
U+02E8 &#744;  ˨  _L level tone: low; accent variant is U+0300
U+02E9 &#745;  ˩  _B level tone: extra low (bottom); accent variant is U+030F
U+02EC &#748;  ˬ  _v letter voicing; use U+032C instead

Combining Diacritical Marks

U+0300 &#768;  ̀  _L grave: low level tone
U+0301 &#769;  ́  _H acute: high level tone
U+0302 &#770;  ̂  _F circum: falling
U+0303 &#771;  ̃  ~ nasalised
U+0304 &#772;  ̄  _M macron: mid level tone
U+0306 &#774;  ̆  _X breve: extra-short
U+0308 &#776;  ̈  _" diaeresis: centralised
U+030A &#778;  ̊  _0 ring above: voiceless
U+030B &#779;  ̋  _T double acute: extra-high level tone
U+030C &#780;  ̌  _R caron: rising
U+030D &#781;  ̍  = vertical line above: syllabic (e.g. N=)
U+030F &#783;  ̏  _B double grave: extra-low level tone
U+0318 &#792;  ̘  _A left tack below: advanced tongue root
U+0319 &#793;  ̙  _q right tack below: retracted tongue root
U+031A &#794;  ̚  _} left angle above: no audible release
U+031C &#796;  ̜  _c left half ring below: less rounded
U+031D &#797;  ̝  _r up tack below: raised
U+031E &#798;  ̞  _o down tack below: lowered
U+031F &#799;  ̟  _+ plus sign below: advanced or fronted
U+0320 &#800;  ̠  _- minus sign below: retracted or backed
U+0321 &#801;  ̡  _j palatalised hook
U+0322 &#802;  ̢  ` retroflex hook
U+0324 &#804;  ̤  _t diaeresis below: breathy voiced
U+0325 &#805;  ̥  _0 ring below: voiceless
U+0328 &#808;  ̨  ~ combining ogonek
U+0329 &#809;  ̩  = vert line below: syllabic
U+032A &#810;  ̪  _d bridge below: dental
U+032B &#811;  ̫  _w double arch below: labialisation
U+032C &#812;  ̬  _v caron below: voiced
U+032F &#815;  ̯  _^ inv. breve below: non-syllabic
U+0330 &#816;  ̰  _k tilde below: creaky voiced
U+0334 &#820;  ̴  _e tilde overlay: velarisation/pharyngealisation
U+0339 &#825;  ̹  _O right half ring below: more rounded
U+033A &#826;  ̺  _a inv. bridge below: apical
U+033B &#827;  ̻  _m square below: laminal
U+033C &#828;  ̼  _N inv. double arch/seagull below: linguolabial
U+033D &#829;  ̽  _x x above: mid-centralised
U+0361 &#865;  ͡  _ combining double inverted breve: linking. A special algorithm needs to convert a_i into ai)
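
The conversion mentioned for U+0361 could look roughly like the sketch below. It is only one possible reading of that note: it works on the Unicode input, not on the intermediate a_i, and it assumes the tie bar always sits between exactly two single characters with no diacritics in between. `tie_to_cxs` is a hypothetical helper name.

```python
import re

TIE = "\u0361"  # combining double inverted breve

def tie_to_cxs(text: str) -> str:
    """Rewrite X + tie bar + Y as XY), leaving X and Y themselves untouched."""
    # The first (.) captures the character before the tie bar, the second the one after it.
    return re.sub("(.)" + TIE + "(.)", r"\1\2)", text)

print(tie_to_cxs("a\u0361i"))  # ai)
```

The two linked characters still have to go through the ordinary per-character mapping afterwards, so that for example t + tie bar + ʃ ends up as tS).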

Greek and Coptic

U+03B2 &#946;  β  B beta
U+03B8 &#952;  θ  T theta
U+03C7 &#967;  χ  X chi: uvul. vcl. fric.

General Punctuation

U+2016 &#8214;  ‖  || double vertical line: major (intonation) group
U+203F &#8255;  ‿  -\ linking, absence of a break

Superscripts and Subscripts

U+207F &#8319;  ⁿ  _n small n: nasal release


Arrows

U+2191 &#8593;  ↑  ^ upwards arrow: FIXME: Unicode 3.2 says: ingressive airflow?? I translated this as upstep.
U+2193 &#8595;  ↓  ! downwards arrow: FIXME: same thing. I translated this as downstep.
U+2197 &#8599;  ↗  <R> ne arrow: global rise
U+2198 &#8600;  ↘  <F> se arrow: global fall
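
The Unicode and HTML columns carry the same number in two notations: hexadecimal after `U+` and decimal inside a numeric character reference. A small sketch of checking one column against the other with Python's standard `html.unescape` follows. The helper name `columns_agree` is invented for this example.

```python
import html

def columns_agree(codepoint: str, entity: str) -> bool:
    """True if e.g. 'U+0250' and '&#592;' name the same character."""
    expected = chr(int(codepoint[2:], 16))  # drop the 'U+' prefix, parse as hex
    return html.unescape(entity) == expected

print(columns_agree("U+0250", "&#592;"))   # True
print(columns_agree("U+2198", "&#8600;"))  # True
```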


January 30th, 2005