MangaBaka Japanese Romanization YAML Ruleset

```
mangabaka_japanese_romanization:
  # METADATA AND LICENSE
  name: "MangaBaka Japanese Romanization"
  version: "2.0-alpha2"
  copyright: "Oakminati - MangaBaka Team https://mangabaka.org/"
  license:
    license_type: "CC BY-SA"
    license_description: >
      This MangaBaka Romanization style guide (v2.0) is released under the
      Creative Commons Attribution-ShareAlike 4.0 International License.
      You are free to share, copy, and adapt this work for any purpose, even
      commercially, as long as proper attribution is given and any derivatives
      are distributed under the same license.
      Full license details: https://creativecommons.org/licenses/by-sa/4.0/
  general_note: >
    For volunteers: Some rules, especially capitalization rules, may seem
    complex at first. Proper spacing of elements is more important than
    perfect capitalization. If a formal noun within a bridge is accidentally
    capitalized, this is primarily an aesthetic issue and does not break the
    structural correctness of the title. Likewise, capitalizing the verbal
    element of a bridge (e.g., resembling a 'ga Shiteiru' hook) is visually
    inconsistent but structurally recoverable. Accurate spacing allows for
    easier review and correction during later refinement.

  # LLM RESPONSE
  llm_instructions:
    main_objective: "Romanize the given Japanese native title according to this style document"
    response_language: "English"
    response_format:
      structure:
        - "Romanized title first, in plain text"
        - "No quotation marks or surrounding symbols unless part of the original title"
        - "Follow with a short structured breakdown of the romanization"
        - "Include applied rules and relevant notes for review"
    conflict_handling:
      rule: >
        If any ambiguity or rule conflict arises between the title and this
        style document, explicitly identify the issue and explain which rule
        was prioritized.
    refinement_instruction: >
      Clearly flag areas where the style document may require refinement.

  # BASE ROMANIZATION SYSTEM
  phonological_romanization_system:
    base_system:
      name: "Modified Hepburn (Revised Hepburn)"
      scope: "Phonological transliteration rules only"
    base_system_refresher:
      particles:
        ha: "When は is used as a particle, always romanized as 'wa'"
        he: "When へ is used as a particle, always romanized as 'e'"
        wo: "When を is used as a particle, always romanized as 'o'"
        example: "Watashi wa Gakkou e Tomodachi o Mukae ni Ikimasu"
      syllabic_n:
        rule: "ん is always romanized as 'n'"
        apostrophe_usage: "Use n' when followed by a vowel or 'y' to avoid ambiguity"
        examples: ["Ten'i", "Kon'yaku", "Onna"]
      zu_variants:
        rule: "Both ず and づ are always romanized as 'zu'"
        examples: ['Mizu', 'Tsuzuku']
      ji_variants:
        rule: "Both じ and ぢ are always romanized as 'ji'"
        examples: ['Jikan', 'Chijimu']
    base_system_adjustments:
      macrons: false
      long_vowels:
        rule: "Long vowels are written explicitly without macrons"
        note: "Macrons are intentionally avoided; long vowels are written out for readability and technical compatibility"
        format: ["ou", "oo", "uu", "aa", "ee", "ii"]
      sokuon_ch_override:
        rule: >
          Sokuon (っ) always doubles the following consonant.
          When preceding 'ch', use 'cch' instead of standard Hepburn 'tch'
          making the sokuon consonant doubling constant.
        note: "Replaces standard Hepburn 'tch' with transparent consonant doubling 'cch' for structural consistency"
        examples: ["Bocchi", "Acchi"]

  # DEFINITIONS
  formal_nouns:
    meaning: "Grammaticalized nouns that function as abstract nominal placeholders or structural elements"
    items: ['koto', 'hazu', 'wake', 'mono', 'tokoro', 'tame', 'you']
  explanatory_unit:
    meaning: "A fused explanatory 'no' (ん / の) combined with a copula or conjunction"
    examples: ['ndesu', 'nodesu', 'nda', 'noda', 'ndesuga', 'nodesuga']
  temporal_suffix:
    meaning: "When 後, 前, or 中 function as temporal suffixes, they are read as fixed suffix forms"
    note: "In temporal compound usage, these readings are fixed regardless of standalone reading variation"
    readings:
      後: "go"
      前: "mae"
      中: "chuu"

  # STRUCTURAL CORE (PILLARS AND VALLEY CAPITALIZATION PHILOSOPHY WITH CORRECT SPACING)
  structural_integrity_rule: >
    Do not split fused units or insert hyphens for readability reasons.
    Word length is not a valid justification for breaking a unit that is
    structurally defined as fused by this system. Long verb–auxiliary chains,
    explanatory sequences, bound suffix constructions, Yojijukugo, and long
    Fossilized adverbs must remain intact. Readability adjustments must not
    override structural rules.

  # rule_precedence_order:
  #   description: >
  #     When rules conflict, apply precedence in the following order.
  #   hierarchy:
  #     1: "Yojijukugo and strong idioms"
  #     2: "Thematic portmanteaus and coined lexical units"
  #     3: "Split Semantic Binomial or Determinative compounds"
  #     4: "Lexicalized single-concept units and non-transparent compounds"
  #     5: "Loanword classification (including gikun exceptions)"
  #     6: "Bridge rules (formal noun and grammatical bridges)"
  #     7: "Explanatory sequence fusion"
  #     8: "Auxiliary fusion rules"
  #     9: "Connective separation rules"

  capitalization:
    capitalize: "(Pillars); All nouns, verbs, adjectives, adverbs, and the separated suru (or its inflection) of a productive noun"
    lowercase: "(Valleys); All particles, copulas, copula inflections, conjunctions, and vocal bridge elements, even when sentence-final"

  simple_connectives:
    particles:
      logic: "Particles are always separate and lowercase"
      modified_hepburn: "When は, へ, and を function as particles, they are romanized as 'wa', 'e', and 'o' respectively"
      examples: ['no', 'ga', 'o', 'de', 'mo', 'ni', 'wa', 'tte']
      notes: >
        The separation rule applies only when the particle retains its
        independent grammatical function. If the element forms part of a
        lexicalized conjunction (e.g., datte, nanode), the lexicalized rule
        takes priority.
    copulas:
      logic: "Copulas and copula inflections are always separate and lowercase"
      examples: ['da', 'desu', 'datta', 'deshita', 'darou', 'ja']
      notes: >
        Separation applies when the copula functions independently.
        If part of a lexicalized conjunction (e.g., desuga, nanode), the
        fused form takes priority.
    conjunctions:
      logic: >
        Fused as a single lexical unit and lowercase ONLY when functioning
        as a distinct conjunction with its own established meaning.
      examples: ['demo', 'dakedo', 'desuga', 'desukara', 'nanode', 'nanoni', 'datte']
      notes: "Lexicalized conjunction status overrides particle or copula separation rules"
    always_separate_grammatically:
      logic: >
        Strings of particles or copula-particle combinations that retain
        their individual grammatical roles must remain separate, even if some
        dictionaries list them as a single unit or conjunction.
      examples: ['no ka', 'da to', 'da ni', 'na no', 'no wa']
      notes: >
        If the string functions as a true conjunction with independent
        semantic value, the fused lexical rule applies instead.
    conditional_fused_or_split:
      description: >
        Certain forms (like ので, のに, なので, なのに, だって, では) may function
        either as lexicalized conjunctions or as independent grammatical
        elements. Romanization depends on syntactic and semantic function.
      patterns:
        fused_conjunction:
          - ので → node
          - のに → noni
          - だって → datte
          - では → dewa
          - なので → nanode
          - なのに → nanoni
        split_grammatical:
          - ので → no de
          - のに → no ni
          - だって → da tte
          - では → de wa
          - なので → na node
          - なのに → na noni
      notes: >
        Specifically, なので and なのに are fused (nanode / nanoni) when directly
        following a concrete noun or na-adjective. When part of a formal noun
        bridge structure, they are split (na node / na noni).
        See formal_noun_bridges patterns.

  bridges:
    formal_noun_bridges:
      description: >
        Lowercase these specific visual patterns to ensure grammar stays in
        the valley. These patterns overrule general noun capitalization, but
        they must not lowercase a formal noun that acts as a pillar
        ('Sonna Koto ni').
      patterns:
        - sequence: "no [formal noun] [particle/copula/sentence-end]"
          logic: "Always lowercase the formal noun and the particle/copula that follows."
          examples: ['no koto ni', 'no you na', 'no tame no', 'no koto desu', 'no hazu ga']
        - sequence: "[verb/adjective] [formal noun] [particle/copula/sentence-end]"
          logic: >
            Lowercase ONLY the formal noun and the following bridge elements.
            The preceding Verb/Adjective MUST remain capitalized as a Pillar.
            This includes the split suru from a noun, since the whole unit
            (noun + suru) acts as a verb.
          examples: ['Tensei Shiteshimatta you desu', 'Iku wake nai', 'Kawaii koto ni', 'Shitakunai wake ja Nai', 'Iu koto']
        - sequence: "[formal noun] na [conjunction/explanatory unit]"
          logic: >
            Lowercase the formal noun and the following conjunction or
            explanatory unit as a single bridge. DO NOT break down the
            conjunction or explanatory unit into individual particles
            (e.g., use 'node', NOT 'no de').
          examples: ['koto na node', 'wake na ndesu', 'hazu na nodakara']
    grammatical_bridges:
      description: >
        In addition to the formal noun bridges, the lowercase bridge logic
        applies to the following selected bridges; inflections of these vocal
        elements also count. Any auxiliary elements after the bridge are
        fused to the vocal part of the bridge, and long chains are NOT to be
        broken up for readability.
      accepted_bridges: ['de aru', 'to suru', 'ni suru', 'ni tsuite', 'ni taishite', 'ni yotte', 'no aru', 'no nai']
      inflection_note: "Aru and Suru inflections ('de atta', 'de arou', 'to shite', 'ni shita') follow the same lowercase split rule."
      examples: ['Koi da to shite', 'Senshi to shite', 'Jakuten de aru', 'Yome ni shita ndaga']
    bridge_chaining:
      description: >
        Bridges can be chained sequentially. When a particle or copula serves
        as the end of a formal noun bridge and the start of a grammatical
        bridge, the entire sequence remains in lowercase and spaced correctly.
      examples:
        - "'Ikiru koto ni shita' (Ikiru [Verb Pillar] + koto ni [Bridge A] + ni shita [Bridge B])"
        - "'Minagoroshi ni suru koto ni shita' (ni suru [Bridge A] + (suru) koto ni [Bridge B] + ni shita [Bridge C])"

  auxiliaries:
    logic: >
      All auxiliary elements are ALWAYS fused directly to the main verb or
      adjective, be it via stem or te-form. They also fuse directly to vocal
      verbs of bridges and to the separated suru (or its inflections).
      (Long) Auxiliary chains are NOT to be broken up for readability; they
      are always one single unit.
    note: >
      Auxiliary elements include but are not limited to: passive, causative,
      progressive, polite, past, negative, desiderative, kudasai, and stacked
      auxiliary chains. Never split these up for any reason.
    kudasai: "Kudasai is considered a request auxiliary for -te form verbs, so is also always fused to these verbs"
    examples: ['Natteshimatta', 'Natteshimaimashite', 'Kawaisugiru', 'Dekiai Saretemashita', 'te shiteiru', 'Mitekudasai', 'Yondekurenakatta']

  explanatory_sequence:
    logic: >
      The explanatory 'no' or 'n' (derived from 'no') is ALWAYS fused to the
      following copula or conjunction to avoid loose characters and to create
      one single explanatory unit. Furthermore, these units are ALWAYS
      separate from any auxiliary chains preceding them.
    na_fusion: >
      If 'na' precedes the explanatory unit, it is fused directly and becomes
      part of the explanatory unit if, and only if, 'na' itself is preceded
      by a 'concrete noun' or 'na-adjective'; otherwise it stays separate.
      Furthermore, this resolves semantic ambiguity with the noun 何 in
      certain scenarios ('nandesu' vs 'Nan desu').
    examples: ['nda', 'ndesu', 'ndesukara', 'noda', 'nodesu', 'nodaga', 'nandesu', 'nanoda', 'nandake', 'nanodesukara']
    example_sentence: "'Jirai nandesu ka?' is properly parsed as 'Are you a landmine [girl]?' and cannot be confused with Nan"

  ja_sequence:
    logic: >
      'ja' (from 'de wa') is a contracted copula, so it is always lowercase
      and separate. Any element following 'ja' or 'de wa' is ALWAYS
      capitalized, especially with negation. This is a deliberate stylistic
      choice for emphasis in titles (rhetorical device) and thus overrules
      the always-lowercase vocal bridge element rule.
    examples: ['ja Nai', 'ja Nakatta', 'ja Dame', 'de wa Jimi']
    jan_contraction: "The contraction じゃん is always romanized as 'jan', lowercase and separate"

  suru_taxonomy:
    productive_constructions:
      logic: >
        [Productive] Noun + Suru constructions (where Suru makes a noun
        active) are separated by a space. The Suru (or its inflection) is
        ALWAYS capitalized, and together they are considered one single
        action (verb) pillar.
      examples: ['Tensei Suru', 'Kantei Shita', 'Tsuihou Sareta']
    lexicalized_suru_verbs:
      logic: "Dictionary entries where 'suru' is the stem are fused as a single pillar."
      examples: ['Koisuru', 'Aisuru', 'Kakusuru', 'Aishitekudasai']
    su_ending_verbs:
      logic: "True verbs ending in '-su' are fused normally; their 'shite/shita' forms are natural inflections, not the 'suru' auxiliary"
      examples: ['Mezashite (from Mezasu)', 'Keshita (from Kesu)']
    auxiliary_fusion:
      logic: "Auxiliaries always fuse to the preceding element. In 'Noun + Suru' chains, the auxiliary fuses to the Suru element"
      examples: ['Tensei Shiteshimatta', 'Dekiai Saretemashita']

  # THE FLUFF (FACADE) FOR READABILITY
  te_form_chains:
    logic: >
      When two or more lexical verbs are chained via -te form, write each
      lexical verb separately. Auxiliary elements ALWAYS fuse to the verb
      immediately preceding them. This is ALSO the case when there is only a
      single -te form verb (not a chain) with an auxiliary element.
    examples:
      - "Houtteoku (Single lexical verb + auxiliary 'oku' = Fused)"
      - "Houtte Oitekure (Two lexical verbs + auxiliary 'kure' = Split after first -te)"
      - "Houtte Oitekuremasen (Two lexical verbs + auxiliary chain = Split after first -te, auxiliaries fused)"

  loanwords_and_names:
    identifiable_loanwords:
      logic: "Identifiable non-assimilated words whose meaning remains unchanged use their origin spelling"
      examples: ['Skill', 'Dungeon', 'Level', 'Party', 'Samen']
    assimilated_loanwords:
      logic: "Clipped slang or phonologically adapted words are romanized from kana"
      examples: ['Baito', 'Rabuho', 'Gyaru', 'Anime']
    native_names:
      logic: "Always romanize Japanese personal and place names; do NOT use English exonyms"
      examples: ['Toukyou', 'Oosaka', 'Kyouto']
    foreign_names:
      logic: "Use the established original-language spelling of these names"
      examples: ['Chris', 'Hathaway']
    edge_cases:
      logic: >
        When a loanword falls between identifiable and assimilated
        categories, evaluate whether the Japanese usage preserves the original
        semantic scope or reflects semantic narrowing, slang usage, or
        contextual specialization. If the meaning is altered or
        context-specific in Japanese, apply the assimilated loanword rule and
        romanize from kana. If uncertainty remains, default to the assimilated
        loanword rule.
      examples: >
        パッド derives from "pad" but in Japanese slang typically refers
        specifically to bra padding. In a context such as 巨乳受付嬢, the
        meaning is specialized; therefore, it is romanized as "Paddo" rather
        than "Pad".
    gikun_exceptions:
      logic: >
        When a kanji compound uses a gikun/jukujikun reading but the intended
        meaning clearly corresponds to an identifiable loanword or foreign
        term, romanize according to the meaning rather than the phonetic
        reading. This overrides standard kana romanization rules.
      examples: >
        倶楽部 is read phonetically as "Kurabu", but the intended meaning is
        "Club", so it is romanized as "Club" in the title.

  compounds_and_lexicalization:
    semantic_binomial_compounds:
      description: >
        Compounds composed of two or more elements of equal grammatical
        weight or coordinate relationship (A and B). These are romanized as
        separate elements to preserve the distinct identities of the
        constituents.
      logic: "Always split for readability, even if dictionary-recognized as a single term"
      examples: ['Oukou Kizoku', 'Shinou Koushou']
    determinative_compounds:
      description: >
        Modifier + Head noun combinations (Noun+Noun or Adj+Noun) that are
        semantically transparent, where the first element describes a role,
        status, rank, or location.
      logic: "Always split for readability, even if dictionary-recognized as a single term"
      examples: ['Akuyaku Reijou', 'Hakushaku Fujin', 'Tensei Oujo', 'Kokunai Ryokou']
    lexicalized_compounds:
      description: >
        Standard nouns, fossilized compounds, and units where the combination
        creates a specific new concept or object that is more than the sum of
        its parts. Includes occupational titles, formal titles, and
        established role nouns.
      logic: "Always fused. This category includes Verb-Stem + Noun fusions"
      examples: ['Akuyaku', 'Isekai', 'Tensei', 'Uketsukejou', 'Mikeneko', 'Urenokori', 'Oujihi']
    yojijukugo_and_idioms:
      logic: "Yojijukugo and strongly fixed idiomatic expressions are always fused; no hyphens shall be used"
      examples: ['Yuuyuujiteki', 'Issekinichou']
    thematic_portmanteaus_and_coined_terms:
      description: >
        Many Japanese titles feature unique, coined compounds (zougo) created
        by blending or clipping multiple existing words into a single keyword.
        These function as a "brand name" or thematic identity for the series.
      logic: "Treat these as single, fused lexical units, even if not in standard dictionaries; no hyphens shall be used"
      examples:
        - 死に戻り: 'Shinimodori (Fusion of two nouns creating a new trope: "return from death")'
        - 異世改活: "Iseikaikatsu (Isekai + Kaikatsu, overlapping 'kai' reading)"
        - ガチャンキイ: "Gachankii (Gacha + Yankii, clipped for stylistic effect)"
        - 魔王城: "Maoujou (Maou + Oujou, overlapping 'Ou' (王))"

  adverbs:
    logic: >
      1) Fossilized adverbs that no longer function as independent verbs are
      always fused.
      2) Active lexical verb phrases always remain separate.
    examples_fused: ['Itsunomanika', 'Nazeka', 'Doushitemo', 'Hyottoshite', 'Moshikashitara']
    examples_split: ['Dou Mitemo', 'Sou Ieba', 'Te ni Ireta']
    notes: "This distinction ensures readability and preserves Pillar/Valley hierarchy. Fossilized adverbs fuse regardless of historical verbal origin"

  echo_and_reduplication:
    description: >
      Reduplications and onomatopoeic words in Japanese are handled according
      to their lexicalization, rhythm, and emphasis.
    rules:
      - "Reduplications (echo) are fused into a single word when a word is repeated for emphasis or as onomatopoeia."
      - "Fusion occurs only for 'soft' or mimetic words where the repeated rhythm is natural."
      - "Harsh or imperative verbs should remain separate, as they function as direct commands or dialogue hooks."
      - "Onomatopoeic words (representing sound effects) are always romanized literally and never replaced with English equivalents (e.g., パン → Pan, not Clap)."
      - "Sequences of different sound effects are treated as standalone, capitalized words and are not fused (e.g., ぺチパンパチン → Pechi Pan Pachin)."
      - "If a single sound is repeated to form a lexicalized rhythmic word, the repeated sounds are fused as one word (e.g., ぺチぺチ → Pechipechi)."
    examples:
      soft_mimetic_fusion:
        - ムリムリ → Murimuri
        - ワクワク → Wakuwaku
        - フワフワ → Fuwafuwa
      harsh_direct_repeats:
        - シヌシヌ → Shinu Shinu
        - ヤメヤメ → Yame Yame
      sequence_of_sounds:
        - ぺチパンパチン → Pechi Pan Pachin
        - ドキドキバクン → Dokidoki Bakun
      single_sound_reduplication:
        - ぺチぺチ → Pechipechi
        - ザワザワ → Zawazawa

  quotations_and_symbols:
    quotation_symbols:
      logic: "Any quotation in the native title is always romanized as double straight quotes \" \""
      nested_quotes: >
        When quotes are nested like 『text「keyword」 text』, romanize the second
        pair as single straight quotes. If more nesting ever occurs, they will
        alternate between double and single straight quotes.
      symbols: ['「 」', '『 』', '《 》', '≪ ≫', '“ ”', '" "']
      examples: []
    lenticular_brackets:
      logic: >
        If the content inside lenticular brackets is short (≤ ~3 to 4 lexical
        units) and functions as a named ability, class, power, or some other
        "brand" keyword, it is always romanized with normal square brackets
        [keyword]. Otherwise, if it is a long clause, it is treated as a
        quotation and romanized with double straight quotes
        "some longer clause".
      note: "In manga or light novel titles it is generally always a skill name or power name"
      symbols: ['【 】']
      examples: []
    square_brackets:
      logic: "When (fullwidth) square brackets are used they are always romanized as normal square brackets [ ]"
      note: "In manga or light novel titles it is generally always a skill name or power name"
      symbols: ['[ ]']
      examples: []
    placeholder_symbols:
      description: >
        Words with placeholder symbols are romanized differently depending on
        whether they hide a full word or only parts (kana) of the word.
        When 〇 or ○ is used in the native title, the romanized title shall
        always use the ○ symbol. Any other symbol is used as-is from the
        native title.
      symbols: ['〇', '○']
      full_word:
        logic: >
          When placeholder symbols are used to hide a full word, consider
          them stand-ins for unknown nouns or concepts.
          This means if suru or its inflection follows the symbol, it is
          separated and capitalized following the productive noun + suru
          rule; any auxiliary elements fuse to the suru or its inflection.
        examples: ['○ Saretai']
      partial_word:
        logic: >
          When placeholder symbols are used to hide part of a word (e.g. one
          or more kana within a lexical unit), the symbol is treated as an
          internal character of that word. It does NOT function as a noun
          stand-in and does NOT trigger productive noun + suru rules. The word
          remains a single fused lexical unit and is romanized in full with
          the symbol preserved in its original position.
        identifiable_loanwords: >
          When placeholders occur inside identifiable loanwords that would
          otherwise be written in their original language spelling, romanize
          using kana transcription to preserve the censoring symbol's position
          and aesthetic.
        note: >
          Partial placeholder symbols never create structural boundaries and
          do not affect auxiliary fusion, bridge rules, or compound formation.
        examples: ['Ma○ko', 'Chi○chi○']
    other_symbols:
      - TBD

  punctuation:
    sokuon_punctuation:
      logic: >
        When a terminal sokuon (っ) functions as expressive punctuation at the
        end of a word, it is ignored if explicit punctuation follows
        (e.g., っ!, っ?, っ。). If no punctuation follows and the sokuon clearly
        marks emotional emphasis, it is romanized as an exclamation mark (!).
      note: "This rule applies only to final expressive sokuon and does not affect lexical gemination within words"
      examples:
        - はわわっ → Hawawa!
        - ああっ → Aa!
        - あぁっ → Ah!
        - えっ! → E!
        - ねえっ。 → Nee.
    small_vowel_interjection:
      logic: >
        When あぁ appears as a standalone hiragana interjection or
        sentence-initial expressive utterance, it is romanized as "Ah" to
        reflect breathy elongation.
      restrictions: >
        This rule applies only to hiragana interjections and does not apply
        to small kana used in standard phonetic combinations or katakana
        loanwords.
      examples:
        - あぁ… → Ah…
        - あぁっ → Ah!
      # what about えぇ or うぅ ?
      # need to check possible combinations used in titles

  numerals:
    description: >
      Numerals and their counters, optional modifiers, and temporal suffixes
      are romanized differently depending on whether the whole unit is native
      Japanese or contains Arabic numerals.
    native_numeral_units:
      logic:
        - When the numeral is written entirely using Japanese numeric expressions (kanji or kana), including native modifiers such as 数, 何, 十数, etc., the numeral and its counter/classifier form a single fused romanized unit.
        - Any lexicalized special reading (e.g., 二人 → Futari, 二十歳 → Hatachi) MUST be used.
        - Any derivational suffixes (-間, -目) remain fused to the counter.
        - Temporal directional suffixes (前, 後) attach using a hyphen.
        - If multiple numeral units are chained, each unit is separated by a space.
      examples:
        - 二人 → Futari
        - 二十歳 → Hatachi
        - 四年生 → Yonensei
        - 何回目 → Nankaime
        - 数日前 → Suujitsu-mae
        - 三年後 → Sannen-go
    arabic_numeral_units:
      logic:
        - When Arabic numerals are used, a hyphen separates the Arabic numeral from the Japanese counter or classifier.
        - The counter always uses its standard numeral-attached reading.
        - Special native readings (e.g., Futari, Hitori) are NOT used as there is no lexicalization with Arabic numerals.
        - The counter and all directly attached suffixes form a single fused block after the hyphen.
        - Temporal directional suffixes (前, 後) remain fused to the counter to avoid double hyphenation.
        - If multiple numeral units are chained, each unit is separated by a space.
      examples:
        - 2人 → 2-nin (NOT 2-ri)
        - 8月31日 → 8-gatsu 31-nichi
        - 10年間 → 10-nenkan
        - 5万年 → 5-mannen
        - 29番目 → 29-banme
        - 100年後 → 100-nengo
        - 100日前 → 100-nichimae
    ordinal_numerals:
      logic:
        - The prefix 第 converts a cardinal number into an ordinal unit.
        - When the number is written in kanji, the ordinal unit is fully fused.
        - When Arabic numerals are used, a hyphen separates Dai from the numeral.
        - The ordinal unit (Dai + number) is treated as one unit.
        - The following noun is always separated and capitalized, no matter how small.
        - Kanji numerals follow their standard on-reading when used with 第 (e.g., 第九 → Daikyuu).
      notes: "This now mimics English usage of word-ordinals (Seventh) and numeral-ordinals (7th)"
      examples:
        - 第3四半期 → Dai-3 Shihanki
        - 第一章 → Daiichi Shou ("First Chapter")
        - 第二部 → Daini Bu ("Second Part")
        - 第三幕 → Daisan Maku ("Third Act")
        - 第七王子 → Dainana Ouji ("Seventh Prince")
        - 第1章 → Dai-1 Shou ("1st Chapter")
        - 第2部 → Dai-2 Bu ("2nd Part")
        - 第3幕 → Dai-3 Maku ("3rd Act")
        - 第7王子 → Dai-7 Ouji ("7th Prince")
        - 公安第9課 → Kouan Dai-9 Ka ("Public Safety 9th Division")
        - 第二の職業 → Daini no Shokugyou ("Second Profession")
```
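
The Arabic numeral and ordinal rules at the end of the ruleset are mechanical enough to sketch in code. The Python below is a minimal illustration and not part of the MangaBaka ruleset: the function names and the two small reading tables are hypothetical, and the tables cover only the kanji that appear in the ruleset's own examples. A real tool would need a full counter dictionary, and compound kanji numerals such as 十二 are not handled.

```python
import re

# Assumption: a tiny illustrative reading table, limited to the counters and
# suffixes that appear in the ruleset's own examples.
READINGS = {
    "人": "nin", "月": "gatsu", "日": "nichi", "年": "nen", "万": "man",
    "番": "ban", "間": "kan", "目": "me", "前": "mae", "後": "go",
}

# Readings used after 第, copied from the ruleset's examples
# (Daiichi, Daini, Daisan, Dainana, Daikyuu). Single kanji only.
KANJI_NUMERALS = {"一": "ichi", "二": "ni", "三": "san", "七": "nana", "九": "kyuu"}

# An Arabic numeral followed by its counter and any directly attached suffixes.
UNIT = re.compile(r"([0-9]+)([^0-9\s]+)")


def romanize_arabic_units(text: str) -> str:
    """Romanize chained Arabic numeral units such as '8月31日'."""
    units = []
    for number, tail in UNIT.findall(text):
        # Counter and suffixes fuse into one block after a single hyphen.
        # An unknown kanji raises KeyError, since the table is incomplete.
        block = "".join(READINGS[ch] for ch in tail)
        units.append(f"{number}-{block}")
    # Chained units are separated by a space.
    return " ".join(units)


def romanize_ordinal(unit: str) -> str:
    """Romanize a 第 + number unit such as '第3' or '第七'."""
    if not unit.startswith("第"):
        raise ValueError("ordinal units must start with 第")
    number = unit[1:]
    if re.fullmatch(r"[0-9]+", number):
        # Arabic numerals: a hyphen separates Dai from the numeral.
        return f"Dai-{number}"
    # Kanji numerals: the ordinal unit is fully fused.
    return "Dai" + "".join(KANJI_NUMERALS[ch] for ch in number)


if __name__ == "__main__":
    for sample in ["2人", "8月31日", "10年間", "5万年", "100日前"]:
        print(sample, "→", romanize_arabic_units(sample))
    for sample in ["第3", "第七"]:
        print(sample, "→", romanize_ordinal(sample))
```

The noun after an ordinal unit (for example Ouji in Dainana Ouji) is left to the caller, because the ruleset treats it as a separate capitalized word.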