User talk:Beland

From Wikipedia, the free encyclopedia

Feel free to leave a note at the bottom of this page in the usual manner; I assume you'll be subscribed to the thread to get notified about replies. Just to keep things tidy, I generally only keep stuff on this page if it requires further action from me or you haven't read my reply yet, so check the page history for older conversations if you need to refer back.

I created the spelling and grammar checking project at Wikipedia:Typo Team/moss. If you are responding to an edit related to special characters, language tags, Manual of Style compliance, HTML cleanup, or markup issues, it might have been motivated by a report generated by that project. -- Beland (talk) 16:48, 6 June 2024 (UTC)

Coherent style for formulas

In Nilpotent Lie algebra, you introduced recently the awful formula {{math|''n'' ∈ <math>\mathbb{N}</math>}}. I have changed it into <math>n\in\mathbb{N}</math>. Please, avoid mixing latex and html rendering in the same formula.

Happy new year. D.Lazard (talk) 09:42, 31 December 2021 (UTC)

@D.Lazard: Yeah, it's definitely more readable when it's all in LaTeX. I've just been going through making articles compliant with MOS:BBB, which only requires changing over the blackboard bold characters. I had been converting relatively simple formulas to LaTeX completely, but it got a bit time-consuming, and some longer formulas were quite daunting. MOS:MATH doesn't say anything about not mixing the two, so my thinking was that the mixed style was at least MOS-compliant, and we could go back and convert the rest of the markup later. I was hoping some other editors skilled in LaTeX would be able to help with that. Would you be able to help with some conversions? I see 88 articles with mixed markup (not all of which are from my edits) and several hundred articles with ℝ, ℤ, or ℂ. We could also add a note to MOS:MATH saying not to mix LaTeX and HTML, and resolve to do blackboard-bold-motivated conversions in one step rather than two? -- Beland (talk) 18:59, 31 December 2021 (UTC)
Well, I added the note to MOS:MATH and put the mixed markup pages on my personal cleanup to-do list. It may be a few months before I get to all of them, as there are thousands of articles in my cleanup queue. -- Beland (talk) 22:36, 2 June 2022 (UTC)
Did a few of these today. Note to myself, use edit summary:
per [[MOS:FORMULA]], do not mix LaTeX and {{math}} in the same expression
-- Beland (talk) 15:44, 12 March 2024 (UTC)

List of possible chemical formulas that don't use subscripts

Hello! I have recently been fixing typos from moss and I see there is a list of possible chemical formulas that don't use subscripts. I was wondering a couple of things:

  • what do the numbers on the left of the entries mean? For example 16/5 - H3S10
  • are they for reference only, or would it be in any way helpful to investigate and tag them with their common names, if they are indeed chemical formulae?

Thanks and happy typo hunting 😄 rbstrachan (talk) 19:41, 5 August 2022 (UTC)

Greetings! The first is the number of times this possible formula was found, and the second is the number of pages. So in this instance, H3S10 was found 16 times across 5 pages. It looks like Graeme Bartlett already determined it is not a chemical formula and made a redirect for H3S10, so I took that off the list. That's a preferred way to fix items that are not chemical formulas if they have articles. The spell checker won't care if you make it a link or not, but it might help readers to do so.
Yes, the general intention is to investigate each, determine if they are actually a chemical formula, and update the markup accordingly. There's a full list of suggestions of what to do at Wikipedia:Typo Team/moss#Chemical formulas.
The idea of using the common name to link these to articles is interesting, and something I hadn't really thought about. The spell checker doesn't really care if there's a link or not; it only looks at the display text. So, it will complain about both "H20" and "H2O" ("[[water|H2O]]") because the manual of style says it should be H2O (using {{chem2}}). Turning that into a link would make it H2O, which is a bit ugly but potentially helpful to the reader. Sometimes there's a very technical context, and the problem text shows up in chemical equations or something, where putting words (like "water") wouldn't make sense. In that case, we probably don't need a link anyway, and fixing the typography is all that's needed. Sometimes having the name instead of the formula would make the article easier to read, so switching it out and making it a link would be an improvement; you'd have to use your judgement.
I suspect most or all of these either aren't chemical formulas or don't have chemical substance articles we can point to, so the suggestion to add links to chemical articles might apply more to Wikipedia:Typo Team/moss#Known chemical formulas that don't use subscripts, where there usually is an article.
Poking at the "Possible" list just now, I had a bit of trouble figuring out which articles the spell checker was complaining about. I put a note at the top explaining how to use the "insource://" trick, which should be sufficient until I can get those included in the report automatically (or we empty out this list). Thanks for your interesting question and your ongoing cleanup work! -- Beland (talk) 08:09, 8 August 2022 (UTC)
(talk page stalker) Just out of curiosity, is there a comparable list of chemical formulas that don't contain numbers (like HNO and NaCl), and therefore could be mistaken for words? BD2412 T 22:39, 8 August 2022 (UTC)
That's an interesting question! While thinking about it, I thought of another, related question.
While converting chemical formulae written with HTML <sub> tags to use the {{chem2}} template, should element symbols such as In, Fe, etc. and chemical formulas that don't contain numbers, such as HNO and NaCl as you mentioned, also be converted?
On one hand, I'm not sure that it's worth adding the bulk of a template for things that don't technically need them and which don't benefit from a visual improvement to the way they are displayed. On the other hand, it may reduce the number of false positives for projects like WP:TT/M.
One of the main reasons that I can see for converting HTML tags to the {{chem2}} template is to make it possible to search Wikipedia for chemical formulae without having to resort to regex.[1] Having said that, since elements and most basic chemical formulas don't contain numbers, they don't contain <sub> tags, so making them use the {{chem2}} template would not do anything to make them more easily searchable.
Regarding both of our questions, I vaguely remember reading somewhere that the moss scripts ignore capitalised words, and as elements and chemical formulas (should) always start with a capital, these may not be issues in the first place. 😅
😊 — rbstrachan (talk) 23:34, 8 August 2022 (UTC)
That's right, for spell-checking purposes moss ignores capitalized words made of only letters, on the assumption they are proper nouns. (These problem formulas are actually pulled from a list of ignored but suspicious words.) Even when I stop doing that (because I want to verify the spellings of proper nouns) most of the ones without numbers would have articles or redirects, so they would still be ignored. The only reason they became an issue for moss is that not using subscripts violates Wikipedia:Manual of Style/Chemistry#Symbols and Unicode subscripts violate MOS:SUBSCRIPT.
There may be other reasons to wrap these formulas, though, such as for accessibility. It doesn't look like they are currently adding alt text, but if you use "auto=yes" with {{chem2}}, it does link each element symbol to the article on that element. I'm not sure whether that's something we should be doing everywhere or nowhere. It might be worth checking with Graham87 (who uses a screen reader and who helped figure out how to handle fractions) or Wikipedia:WikiProject Accessibility or Wikipedia:WikiProject Chemistry to see if anyone has any particular preferences. -- Beland (talk) 01:36, 9 August 2022 (UTC)
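The skip rule described above can be approximated in a few lines of Python. This is a sketch that assumes "capitalized words made of only letters" means "first character uppercase and every character a letter"; the function name is hypothetical and this is not moss's actual code. It shows why formulas like HNO and NaCl are skipped, while H2O and H3S10 are passed on to the checker.

```python
def is_skipped_as_proper_noun(word: str) -> bool:
    """Hypothetical approximation of moss's rule: skip capitalized, letters-only tokens."""
    return word[:1].isupper() and word.isalpha()


# Formulas without digits look like proper nouns; ones with digits do not.
for token in ["NaCl", "HNO", "Fe", "H2O", "H3S10", "water"]:
    status = "skipped" if is_skipped_as_proper_noun(token) else "checked"
    print(f"{token}: {status}")
```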

References

  1. As an example, to find instances of Si8O22F2 written with HTML tags, you have to search for insource:/Si\<sub\>8\<\/sub\>O\<sub\>22\<\/sub\>F\<sub\>2\<\/sub\>/. When written with the {{chem2}} template, it can be done with just Si8O22F2: no regex, or even insource:, necessary.
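Here is a small Python sketch that makes the footnote's comparison concrete. It checks that the escaped pattern, without the insource: prefix and slash delimiters, matches the <sub> form. It then shows the plain-text argument that {{chem2}} would take instead. The helper name is hypothetical, and it handles only digit subscripts, not charges or other notation.

```python
import re

# The footnote's insource: regex, minus the slash delimiters; "\<" and "\/" are
# escaped literal characters, which Python's re module also accepts.
FOOTNOTE_PATTERN = re.compile(r"Si\<sub\>8\<\/sub\>O\<sub\>22\<\/sub\>F\<sub\>2\<\/sub\>")

# A run of digits wrapped in <sub>...</sub>, as in hand-written HTML formulas.
SUB_DIGITS = re.compile(r"<sub>(\d+)</sub>")


def html_sub_to_chem2(formula_html: str) -> str:
    """Hypothetical helper: unwrap digit subscripts and wrap the result in {{chem2}}."""
    plain = SUB_DIGITS.sub(r"\1", formula_html)
    return "{{chem2|" + plain + "}}"


html_form = "Si<sub>8</sub>O<sub>22</sub>F<sub>2</sub>"
print(bool(FOOTNOTE_PATTERN.search(html_form)))
print(html_sub_to_chem2(html_form))
```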
Thanks, I don't have any opinions on these issues accessibility-wise ... I guess what to do depends on context. Graham87 02:17, 9 August 2022 (UTC)
👍 -- Beland (talk) 02:42, 9 August 2022 (UTC)

Watch out for Hebrew letters

Hebrew is written right to left, unlike English which is written left to right. So the character following a Hebrew letter like Aleph will appear to its left rather than to its right. This causes a problem when the Hebrew letter is intended to be part of an English text rather than a Hebrew text. You have twice ignored this fact when replacing ℵ0 with א0 at Cardinality of the continuum.

More generally, you should always look at the resulting text as it is displayed to the ordinary user and make sure that it is what you want it to be. JRSpriggs (talk) 14:45, 31 January 2023 (UTC)

@JRSpriggs: Ah, thanks for the note! I hadn't noticed that I had made the same edit before. It's surprising that the character and the HTML entity have different text direction behavior; I'll be on the lookout for that in the future. -- Beland (talk) 16:35, 31 January 2023 (UTC)
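The difference comes from the two code points involved. Assuming ℵ0 in the article was written as the &alefsym; entity, that entity produces U+2135 ALEF SYMBOL, a left-to-right mathematical symbol. By contrast, א is U+05D0 HEBREW LETTER ALEF, a right-to-left letter. A quick check with Python's standard Unicode database is shown below. The last line shows a related pitfall: compatibility normalization (NFKC) folds the symbol into the Hebrew letter.

```python
import html
import unicodedata

hebrew_letter = "\u05D0"                  # HEBREW LETTER ALEF, the replacement character
alef_symbol = html.unescape("&alefsym;")  # the character the &alefsym; entity produces

for ch in (hebrew_letter, alef_symbol):
    # Bidi class "R" is right-to-left; "L" is left-to-right.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: bidi class {unicodedata.bidirectional(ch)}")

# NFKC normalization maps the math symbol to the Hebrew letter,
# so a normalization pass would reintroduce the display problem.
print(unicodedata.normalize("NFKC", alef_symbol) == hebrew_letter)
```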

Please stop converting thin spaces to ordinary spaces in mathematical typography.

Edits such as special:diff/1219321937, which in part converted some explicit thin spaces in mathematical typography to ordinary spaces, are not helpful. If another editor explicitly chose a size of space to stick into a formula, you should assume they did so for an intentional reason and not automatically second-guess that decision. Often regular spaces leave formulas written using plain wikimarkup (e.g. in {{math}} templates) looking incorrect, and explicit hair spaces or thin spaces make the formula appear correct. –jacobolus (t) 01:53, 17 April 2024 (UTC)

@Jacobolus: In my experience, thin and hair spaces usually aren't necessary, and can sometimes cause excess whitespace. This is a good reason to keep markup simple, along with reducing the skill burden of learning wikitext so we can attract and retain editors. The version of Tensor with those removed renders correctly for me. Sometimes different operating systems and web browsers and fonts render characters like these in an overlapping way; I would consider that a bug in that stack which should be reported and fixed. But once that happens, we don't need to keep these characters around forever. Does the version without thin and hair spaces render incorrectly for you? It looks like Cedar101 may have been the first editor to introduce this character in 2017; pinging them to see if they are (still) having typographical problems. -- Beland (talk) 02:12, 17 April 2024 (UTC)
I am extremely dubious of the evidence-free claim that editors of very mathematical pages are deterred by the presence of occasional explicit unicode characters. But I can tell you for certain that good editors are highly discouraged by having their careful deliberate choices trampled by lazy automated regressions.
The version of Tensor with the full-sized spaces is definitely worse than the version with thin spaces, and it is clear why the thin spaces were originally chosen. If you feel like it you are welcome to rewrite the whole page using LaTeX instead, which looks better and has simpler markup, but please stop automatically breaking people's intentional choices in mathematical typography. –jacobolus (t) 03:27, 17 April 2024 (UTC)
@Jacobolus: Ah, your comment pointed out that I added space rather than removing it, which I missed. I would have expected the latter to generate complaints about overlapping text characters. I'm surprised that the complaint is that there was too much space; a full space is normally a safe substitution. It turns out I actually get overlapping characters myself with no space there, so I'll see what I can do to get that fixed. In the meantime, I'll use {{thinsp}} since those are generally a sign that someone is intentionally using a thin space in wikitext. (And it's nice that templates can have documentation to explain what they mean and why they are being used.) HTML entities are often automatically imported from other environments rather than being inserted intentionally.
A high difficulty of editing can result from an accumulation of small difficulties, which new editors sometimes must confront all at once to make useful contributions. Much of the point of wikitext is to spare editors from having to learn HTML, though it's reasonable to expect deeply involved math editors to know LaTeX. But it seems a bit much to expect, say, a math professor who already knows LaTeX to learn wikitext and HTML syntax if one of those isn't really necessary. Perhaps the added difficulty is more pronounced for articles where there isn't already a lot of complicated mathematical markup, but that is most of them. -- Beland (talk) 16:34, 17 April 2024 (UTC)
Wikitext is built on HTML, and HTML entities are a basic feature. Using {{thinsp}} instead of &thinsp; is not substantially beneficial. The template is not inherently more accessible, being a weird English-Wikipedia-ism that someone has to go do a search to learn about instead of a common standard used across the web.
If you are writing a new page, feel free to use either one. But please don't do automatic replacements of one for another (not sure if you were planning on it). At best it creates pointless watchlist spam. From what I can tell this kind of change does not have (and should not have) the backing of any sitewide policy. –jacobolus (t) 17:12, 17 April 2024 (UTC)
@Jacobolus: I generally assume that editors have to learn how to use Wikipedia templates, because they are used in pretty much every article, usually quite frequently. Wikification, where we replace web-standard HTML tags (which do work without modification) with Wikipedia-specific markup, is a general directive, and indeed the whole point of Wikipedia:WikiProject Wikify. That wouldn't be necessary if we weren't trying to save people from learning HTML. I wasn't planning to mindlessly swap thin space HTML entities for templates, but at some point I will probably do a pass through the entire project to remove inappropriate ones. As you can see, most of the existing instances are not in math articles, are not fixing problems with overlapping characters, and do not align with our usual style. -- Beland (talk) 17:28, 17 April 2024 (UTC)
This seems like a huge waste of time. Most of the examples of thin spaces from your link seem deliberate, and don't seem to be harming anything. –jacobolus (t) 17:33, 17 April 2024 (UTC)
@Jacobolus: Well, the first instance, on Kazakhstan, actually is breaking the citation template, causing the string "&thinsp," to show up in the article. Even if it was working properly, a non-ASCII space would be polluting downstream data for citation consumers. (For example, journal web sites that list all Wikipedia references to papers on that paper's page.) The Pirate Bay is also polluting a citation template.
In the second article, Moon, the usage violates MOS:UNITNAMES, which specifies a full, non-breaking space between a number and a unit abbreviation. It looks sloppy to have different amounts of whitespace in different measurement expressions.
In the third article, Amazon (company), the usage violates MOS:$, which specifies no space after "US$" and a full, non-breaking space before "million". It looks sloppy to have different amounts of whitespace in different instances of currency expressions. Apartheid is breaking the same rule.
And so on. -- Beland (talk) 22:21, 17 April 2024 (UTC)
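For reference, the snippet below shows which Unicode characters the entities in this thread decode to, along with &NoBreak; from the next thread. It uses Python's standard html and unicodedata modules. The last line shows that a reference missing its semicolon is not decoded at all, which matches the stray "&thinsp," described above. How browsers or the citation templates treat such a string is not tested here.

```python
import html
import unicodedata

# Space-like entities discussed on this page, decoded with the HTML5 entity table.
for entity in ("&nbsp;", "&thinsp;", "&hairsp;", "&NoBreak;"):
    ch = html.unescape(entity)
    print(f"{entity:<10} U+{ord(ch):04X} {unicodedata.name(ch)}")

# Without its semicolon, &thinsp is not a recognized reference, so the text is left as-is.
print(html.unescape("&thinsp,"))
```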

Removal of special characters

The changes you are making, such as here and here, which remove &nbsp; and &NoBreak;, alter the intended formatting. What is problematic about these that they need to be changed? —Quondum 14:25, 14 November 2024 (UTC)

Greetings, and thanks for your attention to detail! There is a general mandate from MOS:MARKUP to keep markup simple, and to use wiki markup instead of HTML markup (like HTML entities) where there are equivalents. Various templates can be used to prevent word wrapping where needed, though in some cases, preventing word wrapping causes other problems.
Having a non-breaking space next to a regular space does not prevent breaking, so I tend to remove those automatically. In the case of Binary prefix, I presume the &nbsp;s are there to add horizontal space between numbers like "0.9766" and percentages like "(−2.3%)". Given the parentheses, this space is not needed for clarity, and is generally not included on English Wikipedia in this type of context. The extra spaces make the content wider, which in tables on narrow screens can cause horizontal scrolling or unnecessary word wrapping.
Regarding ISO/IEC 80000...NoBreak; isn't used anywhere on English Wikipedia other than the three articles I removed it from yesterday (except for special cases, like discussing the entity itself). I went back today and tested to see if browsers would break on a dash in the middle of numerical digits. Firefox does not, but Chrome does. However, making my Chrome window as narrow as possible causes horizontal scrolling in the Overview table, which is an even worse experience for readers than awkward word wrapping. To prevent this, I removed all the non-breaking directives from that table. I also added some soft hyphens to prevent some long words from making certain columns gratuitously wide. That should minimize but not eliminate breaking inside ISO and IEC document names. Does that look like a reasonable solution? -- Beland (talk) 15:52, 14 November 2024 (UTC)
From what you say, you took into account that the &nbsp; was intended in some cases to increase space width, so your tweaks to achieve a preferred result are great. From the initial edit comment I thought perhaps it was simply an automated change.
The &NoBreak; was me trying to create cohesive (nonbreaking) names, but your consideration of narrow screens makes sense. I had neglected that entirely, and inside tables it becomes much more significant. Besides, it was a sort of experiment on my part, and if it does not make sense in general, it probably doesn't make sense in any given article. So I think your approach makes sense.
On the &NoBreak;, I have found it to be a workable way to suppress breaking where it is really not wanted. The different major browsers (Firefox, Chrome, Safari) all have different breaking behaviour, and this gives a way of ensuring that they behave similarly. And while your comment about non-use in article wikitext applies, I have used &NoBreak; for benefit in some templates, specifically to avoid breaking between their output and adjacent text (see {{tmath}}, for example, where wrapping was happening between the output and adjacent punctuation). Given that this is entirely something that I found through experimentation (born of frustration at wrapping behaviour), if you have any comment on this use, it would be most welcome. —Quondum 16:23, 14 November 2024 (UTC)
For articles, I generally find the wikitext alternative {{nowrap}} to be more readable. Templates do a good job hiding complex HTML from article editors who benefit from simple markup, so I generally leave the HTML entities in templates alone. However, when crafting templates you may find the CSS method of wrap-suppression more readable and probably useful in more situations. (See Template:Nowrap/doc#Technical details.) -- Beland (talk) 16:48, 14 November 2024 (UTC)
{{nowrap}} in a template is insufficient for the purpose, where the addition of &NoBreak; is at least hidden from editors. Anyhow, if you don't immediately see an issue, it is fine. —Quondum 19:14, 14 November 2024 (UTC)
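The redundancy rule from Beland's reply above is that a non-breaking space next to an ordinary space cannot prevent a break there. That check is easy to automate. Below is a minimal Python sketch, assuming the entity is written literally as &nbsp; in the wikitext. It ignores the raw U+00A0 character and any template forms. The function name is hypothetical and is not anything moss or another tool actually runs.

```python
import re

# &nbsp; with an ordinary space on either side (or both) cannot prevent a line break.
REDUNDANT_NBSP = re.compile(r" &nbsp; ?|&nbsp; ")


def drop_redundant_nbsp(wikitext: str) -> str:
    """Hypothetical cleanup: collapse &nbsp; plus adjacent plain space(s) into one space."""
    return REDUNDANT_NBSP.sub(" ", wikitext)


print(drop_redundant_nbsp("0.9766 &nbsp;(−2.3%)"))
print(drop_redundant_nbsp("5&nbsp;km"))  # untouched: this one does prevent a break
```

Entities between two non-space characters are left alone, because those are the ones that actually hold text together.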

Hi. I'm sincerely disappointed that, within the pages on Trump, the term fascism is used without knowing in detail the true roots of fascism (it was, unfortunately, born in Italy), but I don't want to start a new discussion about this (Talk:Donald Trump and fascism#Fascism?). If you all agree, I would like to change the title to something less direct. JacktheBrown (talk) 02:17, 17 November 2024 (UTC)

You have already raised this issue at Talk:Donald Trump and fascism#Current title. That is the appropriate forum to discuss a page move, not user talk pages. (I'm not sure who you mean by "you all".) As Artem_P75 mentioned, this article has just undergone a community process and the current title was affirmed. You can read that discussion for yourself at Talk:Donald Trump and fascism/Archive 1#Requested move 29 October 2024. I have no particular opinion on the title of this article, but I think it would be inappropriate to ask the community the same question again unless the article or real-life facts have substantially changed, or several years have passed. -- Beland (talk) 02:49, 17 November 2024 (UTC)
"You all" = active users on the page; I should have used a more appropriate term (I was very sleepy, forgive me). Thank you very much for your reply. JacktheBrown (talk) 08:10, 17 November 2024 (UTC)

ArbCom 2024 Elections voter message

Hello! Voting in the 2024 Arbitration Committee elections is now open until 23:59 (UTC) on Monday, 2 December 2024. All eligible users are allowed to vote. Users with alternate accounts may only vote once.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2024 election, please review the candidates and submit your choices on the voting page. If you no longer wish to receive these messages, you may add {{NoACEMM}} to your user talk page. MediaWiki message delivery (talk) 00:05, 19 November 2024 (UTC)

"Colony of Trinidad and Tobago"

Hi. I noticed that you've added "Colony of Trinidad and Tobago" as the birthplace for a lot of people. There was never an entity by that name; from the time the two states were united, only the name "Trinidad and Tobago" was used. It's a bit like talking about "the Republic of the United States of America": sure, the country's a republic, but that isn't the formal or common name. Guettarda (talk) 22:54, 27 November 2024 (UTC)

If I remember correctly, the "nationality" field is considered redundant if legal nationality was the same country as birthplace, and I was removing redundant fields. I remember adding in this case "British Empire" to clarify that people born in Trinidad and Tobago when it was a colony have British nationality. It doesn't particularly matter on its own if that field says "Trinidad and Tobago, British Empire" or "Colony of Trinidad and Tobago, British Empire".
For death place, we often omit the country or other larger-scale entities for the sake of brevity, but just writing "Trinidad and Tobago" would be a bit confusing if it's referring to the British Crown Colony. It's also confusing if the birthplace refers to the colony and the death place refers to the sovereign country with the same phrase "Trinidad and Tobago". If we always use "Colony of Trinidad and Tobago" to refer to the colony and "Trinidad and Tobago" to refer to the sovereign country, that seems a bit less confusing. We could say that "Colony" here is not part of the name so much as a disambiguation phrase, so in principle we could write "Trinidad and Tobago (colony)" instead, but that seems a bit clunkier.
Even more important is to link the country name to the right article. For example, I link to Gran Colombia for people born there 1819–1831, because linking to Colombia gives the erroneous impression they were born in the modern state, but the two had different borders and some areas are now part of different sovereign countries (which I often add a parenthetical to identify). Both those entities were officially called "Republic of Colombia", but that's just a redirect to the modern entity, and so isn't a suitable link target.
Usually there's a separate article on the colonial period (e.g. Massachusetts Bay Colony), though in this case, Colony of Trinidad and Tobago is a redirect to History of Trinidad and Tobago. That article does use the capitalized phrase "Colony of Trinidad and Tobago", and that's the target for the incoming redirect. I do see the phrases "Colony of Trinidad and Tobago" and "Crown Colony of Trinidad and Tobago" capitalized that way in professional academic journals when I do a search on Google Scholar, though a lowercase "colony" is more common. I would infer that "Colony of Trinidad and Tobago" as a name is not incorrect, even if it is not official and not the most common form. It seems useful for disambiguation in infoboxes, but I'm open to suggestions. -- Beland (talk) 23:46, 27 November 2024 (UTC)
For starters, "Crown Colony of Trinidad and Tobago" is incorrect. The type of government that developed in Trinidad, and which was copied elsewhere, came to be called "crown colony government". You could apply it to Trinidad and Tobago between 1889 and roughly 1920, but it's a term political scientists and historians use. And it's problematic to draw conclusions based on capitalisation of the word "Colony", given that Victorians regularly capitalised nouns that Wikipedia would never capitalise. And if we just chose to follow contemporary usage, we'd use "colony of Trinidad", because Tobago was pretty much ignored until the 1950s.
I remember adding in this case "British Empire" to clarify that people born in Trinidad and Tobago when it was a colony have British nationality. For starters, using "British Empire" for people born outside the UK, but not people born in it, makes no sense unless we think of people from "the colonies" as somehow lesser. It was normal to consider "colonials" less human in the middle of the 20th century, but it's not OK today.
Beyond that, I don't think you're clarifying anything for readers. British nationality law is complicated: it wasn't codified until 1948, and was changed radically in 1962 (before independence). Someone born in Trinidad and Tobago before 1948, between 1948 and April 1962, or between April and August 31, 1962, presumably did not have the same legal status. And it's even worse if you're talking about Bajans or Grenadians.
Inventing an entity called the "Colony of Trinidad and Tobago, British Empire" doesn't clarify things. Instead, it's more likely to reinforce false perceptions that our readers probably have already. While "colony" isn't incorrect, it's an imprecise term that means something very different from the common understanding of the word; unlike the US, Canada, or Australia, there was no real colonisation.
Gran Colombia is a totally different entity from modern Colombia. Distinguishing "United Kingdom (European Union)" from "United Kingdom (Brexit)" is probably a closer comparison, though in practical terms independence caused less disruption than Brexit. We also don't disambiguate people born in the Fourth French Republic from those born in the Fifth French Republic, despite the differences in the country's borders.
Finally, "colony" and "empire" reinforce a lesser, subaltern position. While they are factual descriptors, using them when they aren't precisely necessary isn't good. Guettarda (talk) 03:27, 28 November 2024 (UTC)
How would you prefer to convey the notion "this person was born in Trinidad and Tobago and had British nationality at the time of their birth"? -- Beland (talk) 05:03, 29 November 2024 (UTC)

Template:Infobox legislation

I've reverted part of your edit to {{Infobox legislation}} based on the comments at Template talk:Infobox legislation#Please restore the image function to this template. While I'm not sure that an image is helpful, I really think a full discussion should be held first. Cheers. CambridgeBayWeather (solidly non-human), Uqaqtuq (talk), Huliva 17:08, 30 November 2024 (UTC)

By the way, is it a feature of your talk page that I can't start a new section and have to edit the full page? More likely my browser is acting up. CambridgeBayWeather (solidly non-human), Uqaqtuq (talk), Huliva 17:08, 30 November 2024 (UTC)
It's certainly not anything I've done intentionally to my talk page. It's a bit frustrating to have something reverted in order to have a discussion "first" when the only reason I made the change was in response to...a discussion. But such is the nature of consensus-building unless everyone is personally informed about every single proposed change. -- Beland (talk) 17:24, 30 November 2024 (UTC)

Nomination for merger of Template:Incomprehensible

Template:Incomprehensible has been nominated for merging with Template:Confusing. You are invited to comment on the discussion at the template's entry on the Templates for discussion page. Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:33, 13 December 2024 (UTC)

Experimenting with the OpenAI API for the moss project

Hi Beland, I experimented a little to explore how the OpenAI API could be used to help with the moss project. I wrote another JavaScript script to extract sentences with unknown words from the last few entries of Wikipedia:Typo_Team/moss/D#Dan_-Danb and then asked the AI model to assess the unknown words and provide correction suggestions where there is a problem. I got the following results:

Results
Text: "Brain Awareness Week (#brainweek) is the global campaign to increase public awareness of the progress and benefits of brain research."
Word to check: "brainweek"
AI response:
{
  "word": "brainweek",
  "misspelled": true,
  "correct_spelling": "Brain Week"
}


Text: "Named by US-ACAN for Gerald J. Schoofs, radioscience researcher at Byrd Station, summer 1965-66."
Word to check: "radioscience"
AI response:
{
  "word": "radioscience",
  "misspelled": true,
  "correct_spelling": "radio science"
}


Text: "Pe'er is also involved in the Human Cell Atlas as a member of the organizing committee, co-chair of the Analysis Working Group, and member of the Human Lung Cell Atlas initiative, and serves on the scientific advisory board of scverse."
Word to check: "scverse"
AI response:
{
  "word": "scverse",
  "misspelled": false,
  "correct_spelling": null
}


Text: "Both are collected short-short stories and schetches."
Word to check: "schetches"
AI response:
{
  "word": "schetches",
  "misspelled": true,
  "correct_spelling": "sketches"
}


Text: "In Cambridge, he served numerous posts in the local government, including selectman, constable, tythingman, and grand juror."
Word to check: "tythingman"
AI response:
{
  "word": "tythingman",
  "misspelled": true,
  "correct_spelling": "tithingman"
}


Text: "The Danaes were based on the design of the preceding C class series, but were lengthened by 20 feet (6 m) to allow a sixth 6-inch (152 mm) gun to be worked in between the bridge and the forefunnel."
Word to check: "forefunnel"
AI response:
{
  "word": "forefunnel",
  "misspelled": true,
  "correct_spelling": "fore funnel"
}


Text: "The major facilities available are waiting room, free wifi from railwire, computerized reservation facility and vehicle parking."
Word to check: "railwire"
AI response:
{
  "word": "railwire",
  "misspelled": false,
  "correct_spelling": null
}


Text: "During the 1950s through 1970s, the fair's multivaried attractions included New England Village, Dutch Village, lumberjack competitions, oxen draws, sheep shearing contests, music and dancing, puppet shows, animal petting zoos, rides and games, displays of farm machinery, and food and livestock exhibits and judging."
Word to check: "multivaried"
AI response:
{
  "word": "multivaried",
  "misspelled": true,
  "correct_spelling": "multifaceted"
}
System prompt for AI model
Determine if a specific word in a provided text is misspelled and, if so, provide the correct spelling.

# Steps

1. Verify the spelling of the word to check against a standard English dictionary.
2. If the word is misspelled, determine the correct spelling.
3. Provide a conclusion on whether the word is misspelled or correctly spelled.
4. If misspelled, offer the corrected spelling.

# Output Format

- A JSON object with the following structure:
  ```json
  {
    "word": "[word in question]",
    "misspelled": [true/false],
    "correct_spelling": "[correct spelling if misspelled, otherwise null]"
  }
  ```

# Examples

**Example 1**

- Input: 
Text: "The quick brown fox jumps oveer the lazy dog."
Word to check: "oveer"
  
- Output:
  ```json
  {
    "word": "oveer",
    "misspelled": true,
    "correct_spelling": "over"
  }
  ```

**Example 2**

- Input:
Text: "The quick brown fox jumps over the lazy dog."
Word to check: "fox"
  
- Output:
  ```json
  {
    "word": "fox",
    "misspelled": false,
    "correct_spelling": null
  }
  ```

# Notes

- Ensure that only the specific word in question is evaluated for spelling accuracy.
- Provide clear and concise spelling corrections if applicable.

I'm not sure about the first one (is it a hashtag on social media?), but the others seem to be correct. Do you think something like this could have uses for the moss project? Phlsph7 (talk) 14:57, 13 December 2024 (UTC)
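Because the system prompt fixes an output format, each reply can be validated before anyone reviews it. Below is a Python sketch that uses only the standard json module. It assumes each reply is a bare JSON object like those in the Results box, with no surrounding code fence. The function name is hypothetical, and the sketch does not call the OpenAI API. Validation only catches malformed replies. A well-formed but doubtful suggestion such as "multifaceted" still needs human review.

```python
import json

EXPECTED_KEYS = {"word", "misspelled", "correct_spelling"}


def parse_verdict(raw_reply: str, word: str) -> dict:
    """Parse one model reply and check it follows the system prompt's output format."""
    verdict = json.loads(raw_reply)
    if not isinstance(verdict, dict) or set(verdict) != EXPECTED_KEYS:
        raise ValueError(f"unexpected reply shape: {raw_reply!r}")
    if verdict["word"] != word:
        raise ValueError("reply is about a different word")
    if not isinstance(verdict["misspelled"], bool):
        raise ValueError('"misspelled" must be true or false')
    suggestion = verdict["correct_spelling"]
    if verdict["misspelled"] and not (isinstance(suggestion, str) and suggestion):
        raise ValueError("a misspelled word needs a suggestion")
    if not verdict["misspelled"] and suggestion is not None:
        raise ValueError('"correct_spelling" must be null when the word is fine')
    return verdict


reply = '{"word": "schetches", "misspelled": true, "correct_spelling": "sketches"}'
print(parse_verdict(reply, "schetches")["correct_spelling"])
```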

This particular batch is from the "ME+" sublist, which contains words that moss has identified as coMpound English words, basically two dictionary words stuck together. It's often unclear if the words should be separated or if the compound should be added to the dictionary, and determining that can take a Google Scholar or Google Books search. (Or not, if separate words sound better and the volunteer decides it's more important to go fast than add every compound seen in the wild to Wiktionary.) I'd have to manually review whether "scverse" should be "SCVerse", whether "tythingman" is a compound in British English or something, and whether to put "varied" instead of "multifaceted". But these are the sorts of things that human editors do now with the moss listings, and the AI suggestions might speed that along. -- Beland (talk) 20:35, 13 December 2024 (UTC)
Oh, I should also say that moss could generate suggestions for ME on its own, if we're happy with the automatic suggestion being to split the word into the two dictionary words. It can also supply suggestions for T1, since those are potential typos that are an edit distance of 1 away from a dictionary word. For certain other sublists, like TE, moss would not have obvious suggestions, and OpenAI might be more of a help there. -- Beland (talk) 20:38, 13 December 2024 (UTC)
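Here is a Python sketch of the two automatic suggestion strategies just described. It rests on three assumptions: the dictionary is a plain set of lowercase words (a toy stand-in, not moss's real word list); "edit distance of 1" means one Levenshtein insertion, deletion, or substitution of a lowercase ASCII letter; and all function names are hypothetical. The ME splitter cannot tell whether a compound is established usage, which is why human review is still needed.

```python
import string


def edits1(word: str) -> set[str]:
    """Every string one Levenshtein edit (delete, substitute, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | substitutes | inserts


def t1_suggestions(word: str, dictionary: set[str]) -> list[str]:
    """Dictionary words exactly one edit away (a T1-style suggestion)."""
    return sorted(w for w in edits1(word) if w in dictionary and w != word)


def me_suggestions(word: str, dictionary: set[str]) -> list[str]:
    """Ways to split a compound into two dictionary words (an ME-style suggestion)."""
    return [f"{word[:i]} {word[i:]}" for i in range(1, len(word))
            if word[:i] in dictionary and word[i:] in dictionary]


toy_dictionary = {"fore", "funnel", "over", "radio", "science", "sketches"}
print(me_suggestions("forefunnel", toy_dictionary))
print(t1_suggestions("oveer", toy_dictionary))
```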