
Wikipedia:Bots/Requests for approval

From Wikipedia, the free encyclopedia

BAG member instructions

If you want to run a bot on the English Wikipedia, you must first get it approved. To do so, follow the instructions below to add a request. If you are not familiar with programming it may be a good idea to ask someone else to run a bot for you, rather than running your own.

 Instructions for bot operators

Current requests for approval

Gabrielchihonglee-Bot 4

Operator: Gabrielchihonglee (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:28, Tuesday, January 16, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: will be given after test run

Function overview: Edit {{Cite web}} templates to move ThePeerage's website link from the "publisher" parameter to "website".

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#thepeerage.com

Edit period(s): One time run

Estimated number of pages affected:

Namespace(s): Mainspace

Exclusion compliant (Yes/No): Yes

Function details: Flow of the bot:

  1. Get all pages that transclude {{Cite web}} with ThePeerage's website in the "publisher" parameter
  2. Delete parameter 'publisher' and add the link to 'website'
  3. Save
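The substitution in step 2 can be sketched as a plain regex replacement (a sketch only — the operator's code is not yet published, and the exact link pattern matched here is an assumption):

```python
import re

# Hypothetical pattern for a thepeerage.com value in |publisher=, covering
# bare URLs and bracketed external links; this is an illustrative guess,
# not the bot's actual matching logic.
PEERAGE = r'\|\s*publisher\s*=\s*(\[?(?:https?://)?(?:www\.)?thepeerage\.com[^|}\]]*\]?)'

def move_publisher_to_website(cite: str) -> str:
    """Replace |publisher=<thepeerage link> with |website=<same value>."""
    return re.sub(PEERAGE, lambda m: '|website=' + m.group(1), cite)
```

A lambda is used for the replacement so the captured link is inserted literally, without backreference-escaping issues.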


Discussion

Gabrielchihonglee-Bot 3

Operator: Gabrielchihonglee (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 14:40, Monday, January 15, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: will be given after test run

Function overview: Rename the "symbol" parameter to "symbol_type_article" in {{Infobox former country}}.

Links to relevant discussions (where appropriate): Wikipedia:Bot_requests#Symbol_parameter_in_Infobox_former_country

Edit period(s): One time run

Estimated number of pages affected: Less than 3000 pages

Namespace(s): Main

Exclusion compliant (Yes/No): yes

Function details: Flow of the bot:

  1. Get all pages with template Infobox former country
  2. Change the parameter name from "symbol" to "symbol_type_article"
  3. Save
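Step 2 amounts to a single regex substitution; a minimal sketch (not the operator's code — requiring whitespace or "=" after the name is an assumption so that e.g. |symbol_type= is left untouched):

```python
import re

def rename_symbol_param(wikitext: str) -> str:
    """Rename |symbol= to |symbol_type_article=, preserving spacing."""
    # "symbol" must be followed directly by optional whitespace and "=",
    # so longer parameter names like |symbol_type= are not matched.
    return re.sub(r'(\|\s*)symbol(\s*=)', r'\1symbol_type_article\2', wikitext)
```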

Discussion

  •  Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 01:02, 16 January 2018 (UTC)
Made 1 manual edit to test that the theory works. And yes, the theory works. :P --Gabrielchihonglee (talk) 01:16, 16 January 2018 (UTC)

Bellezzasolo Bot

Operator: Bellezzasolo (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 22:05, Sunday, January 7, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python

Source code available: GitHub

Function overview: Notify IP user talk pages of replies

Links to relevant discussions (where appropriate): Village pump (proposals)

Edit period(s): Continuous

Estimated number of pages affected: 100/day

Namespace(s): User talk

Exclusion compliant (Yes/No): Yes

Function details: Finds any talk namespace page with {{replyto}} or similar templates which ping IP users (or rather, don't). It then checks for a {{talkback}} or similar template on the IP user's talk page. If not found, it adds a message linking to the first talk page.
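The detection step described above can be sketched as follows (the actual source is on GitHub; the template alias list and the IPv4-only pattern here are simplifying assumptions):

```python
import re

# Find IP usernames mentioned in {{replyto}}/{{ping}}-style templates,
# which do not actually send notifications to IP editors.
PING = re.compile(
    r'\{\{\s*(?:replyto|reply to|ping|re)\s*\|\s*'
    r'([0-9]{1,3}(?:\.[0-9]{1,3}){3})\s*\}\}',
    re.I,
)

def pinged_ips(talk_wikitext: str) -> list:
    """Return IP addresses 'pinged' on a talk page (no notification sent)."""
    return PING.findall(talk_wikitext)
```

For each IP found, the bot would then check the IP's user talk page for an existing {{talkback}}-style notice before posting.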

Discussion

  • 100/day..hmm. For reference of where this will trigger, please list 10 of these instances from a single 24-hour period (preferably from different talk pages). — xaosflux Talk 00:32, 8 January 2018 (UTC)
100/day looks in the ballpark based on User:Bellezzasolo Bot/Pings. Galobtter (pingó mió) 09:30, 8 January 2018 (UTC)
I allowed the bot 2 test edits under extreme supervision yesterday as PoC. User talk:86.167.176.35 and User talk:204.130.228.108 demonstrate an example. Bellezzasolo Discuss 10:57, 8 January 2018 (UTC)
Could you make it so it links to the section of the discussion, not just the page? AdA&D 13:54, 8 January 2018 (UTC)
(Non-BAG comment) Looking at User:Bellezzasolo Bot/Pings, I see two ISPs (BT, Deutsche Telekom) that I know to have highly dynamic IP addressing, so unless the bot can ping them quickly, as the Notifications tool does, it is likely to miss the intended user. ​—DoRD (talk)​ 12:48, 8 January 2018 (UTC)
(negative BAG comment) Yeah, oh another point which slipped my mind - it really should not give talkbacks for long-over talk messages (maybe check the signature like {{ping}} requires and make sure it's not from over 24 hours ago). I see many from a year ago etc. Which actually means that I'm not sure how often pinging an IP occurs. Galobtter (pingó mió) 12:53, 8 January 2018 (UTC)
@Galobtter: I just saw this and although it has been days, I think it may be worth correcting. Seems like you intended to write (Non-BAG comment) above. –Ammarpad (talk) 13:38, 18 January 2018 (UTC)
That's why I asked for a list. — xaosflux Talk 13:54, 8 January 2018 (UTC)
Thought of it myself, working on implementing it. 24 hours seems a good figure. If the bot is put in place, I see pinging starting at maybe 10/day, but the existence of the bot may well cause a significant increase, as many are aware and will either not ping or manually perform the bot's action. Bellezzasolo Discuss 13:29, 8 January 2018 (UTC)
Yeah, existence of the bot should increase that rate. Galobtter (pingó mió) 13:33, 8 January 2018 (UTC)
  • What is your planned behavior for multiple mentions of an IP on the same page within any certain period? (e.g. will 3 mentions create 3 talk messages?) — xaosflux Talk 13:54, 8 January 2018 (UTC)
No, only one is created at the moment, no matter the time period. The bot checks for a <!-- [[Template:Please see]] --> and will avoid spamming the page as a result. Bellezzasolo Discuss 14:26, 8 January 2018 (UTC)
So if they are mentioned on 2 pages, they will get notification for the first, but then no further notifications? — xaosflux Talk 14:34, 8 January 2018 (UTC)
Yes, for I err heavily on the side of caution. Of course, I could either parse the subst (but I don't trust myself), or insert a hidden template for the required metadata. I've implemented the time-sensitive changes for now, and am testing. Bellezzasolo Discuss 15:03, 8 January 2018 (UTC)
  • Note: there is a core request open for this functionality: phab:T58828. Looks like it is assigned to @Cenarium:. Cenarium, care to comment? — xaosflux Talk 14:37, 8 January 2018 (UTC)
  •  Note: This bot appears to have edited since this BRFA was filed. Bots may not edit outside their own or their operator's userspace unless approved or approved for trial. AnomieBOT 15:24, 8 January 2018 (UTC)
It is being assisted with the edits. Bellezzasolo Discuss 16:22, 8 January 2018 (UTC)

Example of live operation - User talk:203.163.242.72 - Talk:Madonna albums discography

Run completed, sleeping
last run: 2018-01-08 16:21:02.304948
Run completed, sleeping
last run: 2018-01-08 16:27:56.799182
Run completed, sleeping
last run: 2018-01-08 16:30:32.524265
Run completed, sleeping
last run: 2018-01-08 16:31:56.958179
Run completed, sleeping

Bellezzasolo Discuss 16:35, 8 January 2018 (UTC)

Also, looking at the notification you're leaving on the talk pages, I think the "I am an experimental bot" notice should (a) be right before the signature, and (b) link to your talk page for people to report any possible errors. Enterprisey (talk!) 22:03, 9 January 2018 (UTC)

MykhalBot

Operator: Mykhal (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:36, Tuesday, December 19, 2017 (UTC)

Automatic, Supervised, or Manual: initially manual, later maybe supervised, then potentially automatic

Programming language(s): Python/PWB

Source code available: https://www.mediawiki.org/wiki/Manual:Pywikibot/Installation

Function overview: occasional fixing of obvious multi-occurrence typos and factual and/or technical errors

Links to relevant discussions (where appropriate):

Edit period(s): one-time; after proving the good functionality, might be scheduled

Estimated number of pages affected: few tens, initially

Namespace(s): 0 (main)

Exclusion compliant (Yes/No): Yes

Function details: Various. Currently the task is to replace Google search-result URLs with the intended, i.e. real, target URLs (however, these first edits were unfortunately reverted, because of the not-yet-approved bot status and a false edit-summary alarm).
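The current task — unwrapping a Google search-result redirect to its real target — can be sketched with a proper URL parser rather than regexes (as the operator proposes below). The "url" query key and the "www.google.*" host check are assumptions about Google's redirect format:

```python
from urllib.parse import urlparse, parse_qs

def unwrap_google_url(link: str) -> str:
    """Return the real target of a Google /url redirect, else the link unchanged."""
    parsed = urlparse(link)
    # Only touch Google search-result redirect links (path /url).
    if parsed.netloc.startswith('www.google.') and parsed.path == '/url':
        target = parse_qs(parsed.query).get('url')
        if target:
            return target[0]  # parse_qs already percent-decodes the value
    return link
```

Using a parser avoids the truncation bug noted below, where regexp handling left a stray tracking fragment (",d.aWc") on the extracted URL.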

Discussion

Your function overview is too broad, we don't want bots to wander around articles just to fix random "typos" or "errors". Can you be more specific as to this task? — xaosflux Talk 14:09, 20 December 2017 (UTC)
Xaosflux: On the other wiki site, before this type of URL fix, the bot was doing something much simpler, essentially and translated to EN: replacing obvious unnecessary "% of 100" with just "%" in e.g. "imdb.com rating: 77% of 100" for the movie pages. The nature of the edit was described in the edit summary. But here on the EN wiki, one would really have to make a separate specialized robot for that? I hoped I would use the robot from case to case to automate fixing of various obvious kinds of problems, as I notice them. —Mykhal (talk) 22:46, 22 December 2017 (UTC)
.. But for this google URL fix, it would certainly be appropriate to do a specialized bot. I am not 100% sure this kind of fix is not already covered by some bot. So, this would be a separate bot approval request, probably with source code published somewhere. But now I wonder, if bots must be specialized, how the various smaller casual tasks are being automated. Not everyone wants to use their regular Wikipedia account for that. Mykhal (talk) 23:02, 22 December 2017 (UTC)
MykhalBot recently did what can be called an unauthorized trial run. Of the 16 edits, eight fixed links that were commented out anyway, the other eight fixed links that were included in infobox image parameters and ignored. None of them changed either reader experience or behind-the-scenes functionality of the page in question. Since the bot apparently runs on other wikis too, Mykhal may want to check this edit; it seems the link is broken anyway, but the "fix" seems incomplete to me, with a ,d.aWc remaining at the end of the "true" fixed URL. Huon (talk) 19:34, 21 December 2017 (UTC)
Huon: The bot was not prioritizing, so it was operating on articles obtained in a search query. Even editing the commented URL is useful in my opinion, as the raw google search result tracking URL does not even contain the original search query info, so probably no useful info is lost in the replacement.
I admit the bug that caused the mentioned garbage appendage to the extracted URL; the problem was not detected in the manual/visual inspection of the diff before edit confirmation in this case. (The code was being improved iteratively, but I'm going to replace the regexp-based URL handling with a proven URL parser.) Thanks for the notice. But the argument on "reader experience" is not very useful, as many robots make various improvements that do not affect reader experience at all. —Mykhal (talk) 22:37, 22 December 2017 (UTC)
@Mykhal: What's the search query you're using? --AntiCompositeNumber (Ring me) 22:45, 22 December 2017 (UTC)
AntiCompositeNumber: The search query was insource:/google\..{2,6}\/url/. —Mykhal (talk) 22:50, 22 December 2017 (UTC)
Compare WP:COSMETICBOT. That part of the bot policy deals with edits that don't affect what I called the "reader experience". Huon (talk) 23:53, 22 December 2017 (UTC)
  • A user has requested the attention of the operator. Once the operator has seen this message and replied, please deactivate this tag. (user notified) Any update on refining the definition for this bot? As mentioned above non-specific "typo fixing" and "cosmetic only" type jobs are generally disapproved. — xaosflux Talk 15:24, 15 January 2018 (UTC)

IznoBot

Operator: Izno (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 15:02, Saturday, November 11, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised/Manual

Programming language(s): WP:AWB

Source code available: AWB

Function overview: WP:Lint <center> inside of {{US Census population}}

Links to relevant discussions (where appropriate): None available

Edit period(s): One-time run

Estimated number of pages affected: 20,000

Namespace(s): Main

Exclusion compliant (Yes/No): AWB default

Rationale: I identified an opportunity to WP:Lint for <center> in |footnote= of Template:US Census population a few weeks ago (to work on our 8 million errors-worth of obsolete HTML tags). Yesterday I took the time to start hacking at this project on User:IznoRepeat. When I got through the list of items I knew about, I went to see how large the problem was and found that there were 20k pages in mainspace alone. I was already concerned about the rate I was making the edits, so I'm here to request a bot flag for a separate account (User:IznoBot) to work on this problem.

Function details: The exact regex I ended with yesterday was the following:

  • Find (with regex): <center>(.*?)</center>
  • Replace: $1\n|align-fn=center

This is an extremely permissive find pattern and I would be willing to modify the regex if desired to look for the exact parameter name (|footnote=). I will be reviewing most/all edits regardless. This exact find and replace is evidenced at [1].
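The find-and-replace can be expressed in Python form; this sketch uses the tighter variant anchored on |footnote= rather than the permissive pattern quoted above (anchoring on the parameter is the modification the request already contemplates):

```python
import re

# Anchor on |footnote= so only that parameter of {{US Census population}}
# is touched; re.S lets the footnote text span multiple lines.
PATTERN = re.compile(r'(\|\s*footnote\s*=\s*)<center>(.*?)</center>', re.S)

def fix_census_footnote(wikitext: str) -> str:
    """Strip <center>…</center> from |footnote= and add |align-fn=center."""
    return PATTERN.sub(r'\1\2\n|align-fn=center', wikitext)
```

Note this still does not verify the enclosing template's name, which matters for the 34 pages flagged below that use |footnote=<center> outside {{US Census population}}.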

I also plan to run with general fixes on, which suggested several fixes to me yesterday. One with gen fixes accepted as-is; one with gen fixes suggested which I modified manually.

Discussion

That search pattern is indeed too permissive; at the least, change it to start with the parameter you care about: |footnote=<center> — xaosflux Talk 00:00, 12 November 2017 (UTC)

@Xaosflux: Correct me if I am wrong, but this seems like a WP:COSMETICBOT. It also seems controversial since it is a low priority lint error. If you are going to supervise the edits, Izno, why do you need the bot flag? (Also, I'm not sure I like having my subpages be copied without my knowledge and/or permission.) Nihlus 04:01, 12 November 2017 (UTC)
It may be, I haven't looked at good examples yet. For 20000 repeated edits, it should be a flagged account to avoid watchlist flooding etc (assuming it should happen at all). — xaosflux Talk 04:32, 12 November 2017 (UTC)
That's a fair point; however, I got yelled at in multiple areas about clogging up users' watchlists with my bot when doing medium-level lint fixes. I don't think a low-priority run would be a good idea. Nihlus 04:35, 12 November 2017 (UTC)
<center> will stop working on Wikimedia wikis at some point in the future (this is a fact), at which point the change is clearly no longer cosmetic. I would call it "egregiously invalid HTML" now given that it's obsolete in the version of HTML that Wikipedia outputs (that is, DOCTYPE html aka HTML 5). This is regardless of its priority for linting, which is assigned by an engineer without solicitation from the community.
I suspect you were having problems mostly because your edits were being made outside the main space (which isn't critical), but maybe I'm not aware of some specific edits. The bot will only run in the mainspace, so "ensuring Wikipedia continues to look beautiful" is the acceptable rationale for most/all people, whereas it is difficult to defend signature cleaning in the same way as it is not outward-facing.
For the flag, Xaosflux covers that nicely. For the supervision, that's due to running gen fixes as well as taking the opportunity to make "better" edits than are suggested for gen fixes, if I identify such (optional behavior; I am happy not to make these suggested changes). I don't expect false positives, but there is always that potential as well. I have no problem with performing the task fully-automated, but you will find no requirement to do so in the policy for the flag. --Izno (talk) 04:55, 12 November 2017 (UTC)
That's fine. Does finding (\| *?footnote *?= *?)<center>(.*?)</center> and replacing with $1$2\n|align-fn=center work for you? --Izno (talk) 04:55, 12 November 2017 (UTC)
@SSastry (WMF): Anything coming in the future with these types of lint errors, considering they are a low priority? Nihlus 05:16, 12 November 2017 (UTC)
We prioritized the linter categories in relation to the goal of replacing Tidy. At this time, on the parsing team, we don't have any immediate parsing related work that depends on the other linter categories. It is up to wikis what they wish to do with these issues. But, I know that some UI folks and designers at the foundation prefer that the obsolete tags not be used (See phab:T175709). Editor on the Italian Wikipedia have been replacing the obsolete tags and have even set up abuse filters for discouraging their use in edits. Hope this context is helpful. SSastry (WMF) (talk) 23:35, 12 November 2017 (UTC)
I'll retract my objection then. I still think a manual approach to fixing 8 million tags is not the best way to go about it, but I won't stop people from trying. Nihlus 23:16, 13 November 2017 (UTC)
@Izno: For 20,000 edits, we're going to need a discussion. Could you start one at the Village Pump explaining what you plan to do, why you're doing it, and asking for feedback or support? As a side note, I'm not comfortable approving the "extra" edits beyond the main task and genfixes. That would basically give your bot broad leeway to make any useful edit under the bot flag, which is not a good idea, in my opinion. ~ Rob13Talk 15:59, 18 December 2017 (UTC)
@BU Rob13: Yes, I'll start one soon-ly. No problem nixing the extra edits, but if so, I'd prefer to run semi for trials and full-automatic for the full 20k. --Izno (talk) 18:10, 18 December 2017 (UTC)
@BU Rob13: Posted at VPT and will cross-post to VPPRO shortly. --Izno (talk) 17:55, 28 December 2017 (UTC)

NOTE: There are 34 pages with |footnote=<center> that don't use Template:US Census population, which you'll need to watch out for; see here. -- WOSlinker (talk) 18:10, 28 December 2017 (UTC)

{{US Census population}} could be coded to strip <center>...</center> from the footnote parameter, and behave like |align-fn=center if it's present. It's not efficient to do this in every call but is it worth 20000 edits to avoid it? Will <center>...</center> cause problems if it's still in the source but not the output? The template is transcluded in 30000 articles. PrimeHunter (talk) 00:17, 29 December 2017 (UTC)
@PrimeHunter: I left a response to that question on WP:VPT. My short answer is "yes, that's possible, but I don't see the value given we have 60k other articles from which to remove center". --Izno (talk) 00:20, 29 December 2017 (UTC)

One thing I want to mention here is that, to my understanding, the mere fact that something isn't HTML 5 doesn't mean it will stop being supported. It's certainly possible <center> will stop working, but not being HTML 5 alone usually isn't reason enough to assume this. I'd be uncomfortable approving this as is, short of confirmation from WMF devs that the <center> tag will actually stop working. Headbomb {t · c · p · b} 13:56, 17 January 2018 (UTC)

Bots in a trial period

TomBot 3

Operator: Tom.Reding (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 23:57, Monday, January 8, 2018 (UTC)

Automatic, Supervised, or Manual: Supervised/Manual

Programming language(s): WP:AWB

Source code available:

Function overview: Tag a list of NYC Subway station talks with {{WikiProject New York City}}.

Links to relevant discussions (where appropriate): WP:BOTREQ#Bot to tag article talk pages for WikiProject New York City & User talk:Epicgenius#Broadway Junction (New York City Subway).

Edit period(s): One-time run

Estimated number of pages affected: ~400

Namespace(s): Article talk

Exclusion compliant (Yes/No): Yes

Function details: Tag article talks in this list which don't already contain an alias of {{WikiProject New York City}} with: {{WikiProject New York City | transportation=yes | importance=<inherit from {{WikiProject Trains|NYPT-importance=...}}> | class=<inherit from {{WikiProject Trains|class=...}}>}}, per the user's botreq, the template documentation, and discussion below.

Discussion

  • As the person who originally made this request on WP:BOTREQ, I think the |transportation-importance= parameter would be redundant, since the vast majority of the time, it is already defined under the WP:TRAINS template. epicgenius (talk) 16:43, 10 January 2018 (UTC)
Epicgenius - redundant, or discouraged? Something can be redundant w/o being discouraged. The template made it sound, at the very least, redundant, if not encouraged. I'd rather get it right on the first pass though, and I have no experience with the associated WikiProjects, nor the use of their template, so I just want to confirm.   ~ Tom.Reding (talkdgaf)  16:57, 10 January 2018 (UTC)
In that case, I think the |transportation-importance= parameter in {{WPNYC}} should be the same as the |NYPT-importance= parameter in {{WPTRAINS}}. If there are two different importance parameters for the same WikiProject, it can get a little confusing. epicgenius (talk) 17:00, 10 January 2018 (UTC)
Ok, thanks. Could you edit Template:WikiProject New York City/doc to that effect? Would be good for outsiders like me and newer members.   ~ Tom.Reding (talkdgaf)  17:03, 10 January 2018 (UTC)
I have done it. epicgenius (talk) 19:34, 10 January 2018 (UTC)
Approved for trial (30 edits).xaosflux Talk 04:32, 16 January 2018 (UTC)
 Done 30 edits. Only 58 pages remain due to someone else already doing most of the work.   ~ Tom.Reding (talkdgaf)  16:52, 16 January 2018 (UTC)
  • Note TomBot has been added to the WP:CHECKPAGE as a user, will need to be shifted if/when bot flag is granted. Primefac (talk) 14:26, 16 January 2018 (UTC)

MilHistBot 2

Operator: Hawkeye7 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 21:38, Thursday, January 11, 2018 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Perl

Source code available: User:MilHistBot/membership.pl

Function overview: The Bot checks the list of active members of WikiProject Military History (Wikipedia:WikiProject Military history/Members/Active) and moves members who have been inactive for more than 365 days to the inactive members list (Wikipedia:WikiProject Military history/Members/Inactive). (This is similar to the function performed by the Rick Bot for the list of administrators.)

Links to relevant discussions (where appropriate): Wikipedia_talk:WikiProject_Military_history/Coordinators#Audit_of_Wikipedia:WikiProject_Military_history/Members/Active

Edit period(s): Monthly

Estimated number of pages affected: Two

Namespace(s): Wikipedia

Exclusion compliant (Yes/No): No. Seems unnecessary when we are dealing with only two designated pages.

Function details: The Bot reads the active members list, and checks for when they were last active. If this was more than 365 days ago, then the member is removed from the active list and added to the inactive list.

Discussion

Approved for trial (150 edits or 35 days). OK to trial, after you think you have good samples let us know. — xaosflux Talk 15:22, 15 January 2018 (UTC)
Thanks. I started it this morning; it will run on the 16th of the month from now on. The diffs are here and here. I have verified that the edits are correct. Hawkeye7 (discuss) 21:57, 15 January 2018 (UTC)

Pi bot 3

Operator: Mike Peel (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 20:56, 28 November 2017 (UTC)

Automatic, Supervised, or Manual: Automatic

Programming language(s): Python (pywikibot)

Source code available: on bitbucket

Function overview: Look through references to reports by Cochrane (organisation) to check for updates to them; when found, tag with {{update inline}} [2], and add to the report at Wikipedia:WikiProject Medicine/Cochrane update/August 2017 for manual checking by editors [3]. Also archive report lines marked with {{done}} to the archive at Wikipedia:WikiProject Medicine/Cochrane update/August 2017/Archive [4] [5].

Links to relevant discussions (where appropriate): This was previously run by @Ladsgroup on an ad-hoc basis. I was asked to take over the running of it on a more regular basis by @JenOttawa:. See [6] and [7].

Edit period(s): Once per month

Estimated number of pages affected: Depends on the number of Cochrane updates each month, and the number of references to them. Likely to be a number in the tens rather than the hundreds.

Namespace(s): Mainspace and Wikipedia

Exclusion compliant (Yes/No): No, not relevant in this situation

Function details: The code searches for cases of "journal=Cochrane" in Wikipedia articles, extracts the Pubmed ID from the reference, then fetches the webpage from pubmed and looks for a "Update in" link. If an update is available, then it marks the reference as {{update inline}}, with a link to the updated document, and adds it to the report at Wikipedia:WikiProject Medicine/Cochrane update/August 2017 where users manually check to see if the article needs updating. If it does, then they can update the reference and mark it as {{done}} in the report, and the bot then archives the report when it next runs. If it does not, then it can be marked with <!-- No update needed: ID_HERE --> in the article code, and the bot won't re-report the outdated link in the future. I've made some test edits under my main user account to demonstrate how the bot works, links are in the function overview above. Mike Peel (talk) 20:56, 28 November 2017 (UTC)
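Two steps of the description can be sketched directly: extracting PMIDs from article wikitext (the pattern below is the one quoted in the discussion) and spotting an "Update in" link in fetched PubMed HTML. The HTML marker check is an assumption about PubMed's page markup, not verified here:

```python
import re

# The PMID pattern from the bot's code, quoted in the discussion below;
# note it deliberately catches every |pmid= in the page, not only the
# Cochrane-related ones.
PMID_RE = re.compile(r'\|\s*?pmid\s*?\=\s*?(\d+?)\s*?\|')

def find_pmids(wikitext: str) -> list:
    """Return all PMIDs cited in the article wikitext."""
    return PMID_RE.findall(wikitext)

def has_update(pubmed_html: str) -> bool:
    """Crude check for an 'Update in' link in a fetched PubMed page."""
    return 'Update in' in pubmed_html
```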

Discussion

  • Comment: Is text like "journal=The Cochrane database of systematic reviews" (as in Postpartum bleeding) or "journal = Cochrane Database of Systematic Reviews" (as in Common cold) or the presumably incorrect "title=Cochrane Database of Systematic Reviews" (as in Common cold) or "journal = Cochrane Database Syst Rev" (as in Common cold) relevant to this request? You might want to include those variations. – Jonesey95 (talk) 21:08, 28 November 2017 (UTC)
    • @Jonesey95: The code that's currently used to select articles is generator = pagegenerators.SearchPageGenerator('insource:/\| *journal *= *.+Cochrane/', site=site, namespaces=[0]). That was written by @Ladsgroup, and I'm not sure how to modify it to catch more cases. It also currently returns the message "WARNING: API warning (search): The regex search timed out, only partial results are available. Try simplifying your regular expression to get complete results. Retrieving 50 pages from wikipedia:en." Once the articles are selected, pmids = re.findall(r'\|\s*?pmid\s*?\=\s*?(\d+?)\s*?\|', text) is run on the article text to find the references to update, which will actually catch more than just the Cochrane reviews in the article, but only the references with updates are touched by the code. TBH, I'm not an expert in regexes, so any suggestions you have to improve these would be very welcome! Thanks. Mike Peel (talk) 21:17, 28 November 2017 (UTC)
      • Insource searches have a very low timeout value, so anything with a mildly complex regex will time out. See T106685 for some details. The only way I know of to get around it is to search for multiple regexes in succession, like this:
        • insource:/\| journal =.+Cochrane/
        • insource:/\| journal=.+Cochrane/
        • insource:/\|journal =.+Cochrane/
        • insource:/\|journal=.+Cochrane/
      It looks like the regex you have will catch all of the above cases except the junky "title" instance, which should be fixed manually by someone who knows the right way to fix it. – Jonesey95 (talk) 00:45, 29 November 2017 (UTC)
      • @Jonesey95: I've added a loop that runs each of those regexes in turn, and just for the fun of it I've also added the same set for 'title' as well as 'journal' so it'll try to catch those odd cases. It currently checks 6576 Wikipedia articles in total, which will include duplicates (since I don't currently filter them out - is there a good way to merge and de-duplicate the return values from SearchPageGenerator or PreloadingGenerator?). While 6 out of 8 of the regexes run without timeouts, the last two do still return the warning, but they're "insource:/\|title =.+Cochrane/" and "insource:/\|title=.+Cochrane/" - so if there's not a good way around that then maybe we just live with it (those two queries return 98 and 304 results respectively, which is a lot less than some of the others, so this is a bit odd).
      • I'd like to set this going for a full run soon, if that would be OK? Thanks. Mike Peel (talk) 21:36, 1 December 2017 (UTC)
        • @Mike Peel: Re "is there a good way to merge and de-duplicate the return values", you could maintain a list in-memory of the page IDs/titles that have been processed and skip anything that has shown up before. That may or may not be helpful depending on the amount of duplication. Anyway, I have a broader question. As you said above, the bot actually checks all PMIDs in a given page for updates, not just the Cochrane-related ones; this includes logging said non-Cochrane-related updates on the Cochrane updates page. Is there any potential for this to be a problem? Alternatively, would it be useful to potentially expand the task scope to all PMIDs? — Earwig talk 05:59, 12 December 2017 (UTC)
          • @The Earwig: De-duplicating: that's true, although I was hoping there might be a built-in option. :-) The numbers are fairly small here, and the code should cope fine with a second pass through a page (it'll see the messages left by any previous and not do anything). On checking PMIDs - @JenOttawa: can probably answer this better than me, but my understanding is that most PMIDs will never be updated since they're one-off articles rather than part of a series like the Cochrane ones are, so while we can check for updates to them they won’t be flagged by the bot. If there are any that aren’t Cochrane-related that do have an update, then they’ll be investigated by a human after being posted to the Cochrane page, and we can figure out how to deal with them then. Thanks. Mike Peel (talk) 14:24, 12 December 2017 (UTC)
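The in-memory de-duplication suggested above can be sketched generically; the generators stand in for the multiple SearchPageGenerator calls, and keying on a page's title() when available is an assumption about the pywikibot objects involved:

```python
from itertools import chain

def unique_pages(*generators):
    """Chain several page generators, yielding each page title only once."""
    seen = set()
    for page in chain(*generators):
        # Use the page title as the de-duplication key when available.
        key = page.title() if hasattr(page, 'title') else str(page)
        if key not in seen:
            seen.add(key)
            yield page
```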
Thanks for helping here The Earwig and Mike Peel. In my experience, most other PMIDs are not updated like Cochrane Reviews are; however, I can not speak for all journals/publishing companies. Other publications are certainly retracted/withdrawn, but I am also not sure what happens to their PMIDs. This bot ran for quite a few years and seemed to work very well and be accurate. I performed a large number of the updates (at least 100). This means that I manually went through the citation needed tags + PMID list generated, and there were very few errors. I never saw an incidence where a non-Cochrane Review was flagged with the citation needed tag, for example. I hope this helps and somewhat answers the question. We have spent considerable time on this over the past 12 months, so we are now fairly caught up with the updates. In May 2017 we had about 300 updates to perform. I would expect that a full run of the bot would pull about 50-75 new updates needed (August-December updates that were published by Cochrane), and then if we run monthly, it would pull about 15-20 a month. This means that the volunteers will be able to stay fairly up to date with the updates, and if there are errors (other reviews pulled, etc.) we will be able to correct them manually within a month or so. If you have any other questions, or if there is anything that I can help with, please let me know. I am still learning about this, but we greatly appreciate your assistance on this! JenOttawa (talk) 14:38, 12 December 2017 (UTC)
Thanks for the prompt replies, everyone. This sounds good to me, so let's move forward with a trial run. Since the plan is for monthly runs, let's have the bot complete a full round of updates for this month and we can evaluate it from there. Approved for trial. — Earwig talk 17:57, 12 December 2017 (UTC)
@The Earwig: Thanks, it is now running. Mike Peel (talk) 18:17, 12 December 2017 (UTC)
It's taking longer to run than I was expecting (due to the number of unique pubmed pages it's fetching), but the edits so far seem to be OK. I'm heading offline for the eve now, so if there are any issues then please abort it by blocking the bot. Otherwise, I'll check things in the morning. Thanks. Mike Peel (talk) 23:28, 12 December 2017 (UTC)
Thanks again to both of you. Looks good so far. JenOttawa (talk) 01:28, 13 December 2017 (UTC)

90% of the updates the bot is flagging are to "withdrawn" reviews. I have reverted most of them and updated the one or two that were newer and not withdrawn.

The bot needs to exclude withdrawn articles. It also needs to look for the newest version, not just the next newer version. Best Doc James (talk · contribs · email) 05:30, 13 December 2017 (UTC)

  • OK, I think this test run has shown two issues - the need to handle withdrawn articles better, and also an intermittent problem with fetching the webpages (which is why the bot stopped at ~02:00 UTC without finishing the run). I'll work on improving those before requesting another test run. Thanks. Mike Peel (talk) 19:00, 13 December 2017 (UTC)
  • @The Earwig: I've now updated the code to ignore updates that have themselves been withdrawn (per @Doc James:), and I'm also using a different package to fetch the webpages that will hopefully avoid timeouts. So I'm now ready to try another test run, if that's OK with you? Thanks. Mike Peel (talk) 16:30, 2 January 2018 (UTC)
    • Can we do 10 and I will than check? Best Doc James (talk · contribs · email) 04:39, 3 January 2018 (UTC)
    • @Mike Peel: That's OK with me. Could you also have the bot add the |date= parameter so AnomeBOT doesn't have to follow it around? — Earwig talk 05:01, 3 January 2018 (UTC)
    • Thanks - I've modified it to edit a maximum of 10 articles, and I've added the date parameter. I'll set it running later today. Thanks. Mike Peel (talk) 06:36, 3 January 2018 (UTC)
      • Now running. The addition of whitespace at [8] was unexpected, but should be fixed in the next run. Thanks. Mike Peel (talk) 07:17, 3 January 2018 (UTC)
      • Restarted due to a bug in the code for the date parameter, now fixed. As a result, the whitespace issue I mentioned in the line above is also now fixed. Thanks. Mike Peel (talk) 15:28, 3 January 2018 (UTC)
      • @The Earwig: Now  Done with 10 pages edited. @Doc James: spotted a case where a withdrawn one wasn't caught as the pubmed website didn't use the same punctuation after "WITHDRAWN" in the title, which I've now worked around (by not including the punctuation in the check). The bot did need prodding at one point as it hung again on fetching a page from pubmed, so I'll look into other ways of doing that, but I'd like to do a complete run next please. Thanks. Mike Peel (talk) 16:00, 4 January 2018 (UTC)
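The punctuation-insensitive "WITHDRAWN" check described above could look something like the sketch below. This is a hedged illustration, not the bot's actual code: it assumes NCBI's esummary endpoint for fetching titles, and the function names are invented for the example.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

def title_is_withdrawn(title: str) -> bool:
    """True if the article title starts with "WITHDRAWN", regardless of
    whether it is followed by ":", ".", or no punctuation at all."""
    return re.match(r"\s*withdrawn\b", title, re.IGNORECASE) is not None

def is_withdrawn(pmid: str) -> bool:
    """Fetch the PubMed summary for a PMID and test its title.
    (Hypothetical helper; the real bot's fetching code differs.)"""
    params = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?{params}"
    with urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return title_is_withdrawn(data["result"][pmid]["title"])
```

Matching only the word itself (via `\b`) sidesteps the punctuation inconsistency that caused the missed case above.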
        • @Mike Peel: Is it intended for all updates to go to a page titled "August 2017"? This seems confusing. Other than that, I don't have any real concerns. — Earwig talk 20:33, 4 January 2018 (UTC)
Thanks for working on this Mike Peel and Earwig. At this time, I do not have a concern about the updates going to the August 2017 page. Unless we were to put a re-direct in, the volunteers are already using this page and Mike had added the function to archive updates marked as "done". Thanks again, JenOttawa (talk) 00:41, 5 January 2018 (UTC)
  • Okay, reviewed them all. Looks good. I think we can do a batch of 100 next? What do you think User:JenOttawa? Doc James (talk · contribs · email) 09:57, 5 January 2018 (UTC)
  • It's easy to change the page if needed; it's using the "August 2017" page as per JenOttawa. I've set it running again now; it will edit a maximum of 500 pages this run, which I anticipate will be a complete set. Then we can switch to running it monthly via a cron job if formally approved. Thanks. Mike Peel (talk) 10:24, 5 January 2018 (UTC)
Thanks Doc James and Mike Peel. I appreciate you reviewing the updates added so far. On the new updates that I have reviewed, I do not see the "update needed" tag added to the WP article. For example,
Article Meningitis: old review PMID:18254003, new review PMID:27121755
Everything else looks great so far. The update needed tags are not 100% necessary; how do you feel, Doc James? Thanks again, Jenny JenOttawa (talk) 01:20, 6 January 2018 (UTC)
@JenOttawa: The bot added it, but @Doc James: then updated the ref and didn't mark it as done. Thanks. Mike Peel (talk) 06:26, 6 January 2018 (UTC)
Yes sorry Jen, went through them all and did not mark as done. Will do that. Doc James (talk · contribs · email) 06:48, 6 January 2018 (UTC)
  • The latest run just completed, with 6738 pages checked. 0 tagged in this run, so they were all tagged in the previous one. If everything's OK, then perhaps this can be approved/closed, and I'll set it to run monthly from now on. Thanks. Mike Peel (talk) 11:20, 6 January 2018 (UTC)
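A monthly schedule of that sort is typically a one-line crontab entry; the script path, log path, and run time below are hypothetical placeholders, not taken from the bot's actual setup:

```
# min hour dom mon dow  command  (run at 03:00 UTC on the 1st of each month)
0 3 1 * * python3 /path/to/cochrane_update_bot.py >> /path/to/cochrane_update.log 2>&1
```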
  • Running this monthly would be very helpful for Cochrane-Wikipedia volunteers. Thanks Mike Peel and Doc James for all your hard work! Thanks Earwig for your help getting this going as well. JenOttawa (talk) 21:20, 6 January 2018 (UTC)

InfoboxBot

Operator: Garzfoth (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)

Time filed: 04:36, Tuesday, October 24, 2017 (UTC)

Automatic, Supervised, or Manual: Supervised

Programming language(s): Python and mwparserfromhell

Source code available: Yes (available at User:InfoboxBot/wikipedia_edit_pages_clean.py)

Function overview: This bot would assist me in fixing various widespread yet minor issues with non-standard infobox parameters in articles (primarily focused on issues with Template:Infobox power station and possibly Template:Infobox dam).

Links to relevant discussions (where appropriate): I do not believe that this bot would be controversial - any changes made by it are going to be uncontroversial minor changes.

Edit period(s): As needed (it'll vary significantly). It will not be anywhere near continuous.

Estimated number of pages affected: There are ~2500 articles using infobox power station and ~3500 articles using infobox dam. The number of articles out of these that would be affected by my bot is unknown. For now, let's call it an absolute upper limit of ~6000 affected articles.

Namespace(s): Mainspace only.

Exclusion compliant (Yes/No): No; in my experience, articles with infobox power station or infobox dam on them never use the {{bots}} template in the first place. I am not averse to implementing detection for this template in the future, but I don't see the need for it unless I broaden the scope of the bot's work to different infoboxes.


Function details: I have already scraped all articles with infobox power station and infobox dam in them, placed the infobox data from said articles into a MySQL database, and am using analysis of that dataset/database to discover issues that can be fixed via this approach. Here is a good example of what kind of issues this bot can help me fix:

  • For infobox param "th_fuel_primary": there are 153 articles using the term "[[Coal]]", 90 articles using "Coal", 80 using "Coal-fired", and 14 using "[[Coal]]-fired". This bot can automatically change the value of "th_fuel_primary" to "[[Coal]]" for the 184 articles that use equivalent terms, resulting in 337 articles that all use the same correct homogeneous terminology and are all wikilinked correctly.

So yeah, this is essentially a specialized high-speed-editing/assisted-editing tool. As far as I understand, it is still possibly classified as a bot, so I have to submit it to BRFA, as I am doing now. I did run this on my personal account for a single run (on the infobox param "status", changing the non-standard value "Active" into "O", which expands to "Operational", across 185 articles) before realizing that it may be classifiable as a bot, and that I was also performing operations too fast if the bot action speed limits applied (I had quite a bit of trouble locating the actual documentation on this, so I had initially assumed the limits were the same as for the API itself and set a 1s + overhead delay between requests), at which point I stopped. So if you want a demonstration of what this bot does in the real world, just look at the long string of edits in my history with the edit summary "Automated edit: fixing infobox parameter "status"".

Discussion

Could the bot implement some of User:Headbomb/sandbox (expand collapsed sections)? Headbomb {t · c · p · b} 11:07, 24 October 2017 (UTC)
1.a/1.c crash my scraping script, so I’ve already manually fixed those in all affected articles using either infobox dam or infobox power station. I can look into building a new script to locate and automatically fix those types of issues in other infoboxes, it would be an interesting problem to try to solve automatically, but no promises on that since it might not be doable automatically with high confidence.
For the rest, yes, the bot can do at least some of them if not most or all of them (and in fact I was already planning on implementing a number of those items), although it’s going to require additional work to implement them, and my first priority is still going to be fixing the more substantial issues. Garzfoth (talk) 17:36, 25 October 2017 (UTC)
I would greatly appreciate getting a response to at least the specific question of if this use is classified as a bot or not (i.e. does it actually need approval as a standalone bot through BRFA or can I just run it on my personal (or InfoboxBot?) account(s)?)... I have been waiting two and a half weeks for another response and it's getting a bit frustrating. I would prefer to have an account with the bot flag to run it on simply because of the expanded API limits available in that case (and being able to edit without unnecessarily cluttering up anyone's watchlist, since I could then flag my edits as bot-made which allows them to be easily hidden by users if desired), but I do not by any means need the bot flag to operate the program. Garzfoth (talk) 19:58, 11 November 2017 (UTC)

{{BAGAssistanceNeeded}}

It has been over a month since the last response. I would greatly appreciate a response to at least the question highlighted in bold above (is this use even classifiable as a bot or can I just run this as a script on my personal account without approval required?). Thanks! Garzfoth (talk) 21:17, 26 November 2017 (UTC)

From the BOTPOL definitions, the fact that you aren't personally approving each edit means that this is probably a bot, and would likely need to be approved here. It shouldn't be controversial, though. Going through the edits you made (convenience link!), the random sample that I picked all look good. It would be nice if you had some examples of the Coal change, as opposed to just the "Active" to "O" change, however. Even better would be if the code were somewhere BAG members and others could review it - you don't even have to put it on GitHub, as it's just as readable in the bot's userspace.
One important change you should make is the edit frequency: 1 second between edits is too low. For nonessential maintenance tasks, the usual delay is 10 seconds (source: WP:BOTREQUIRE). I'm not a BAG member myself, so I can't grant a trial; I'll leave the tag here. You should probably fix the rate thing before the trial, though. Enterprisey (talk!) 13:40, 5 December 2017 (UTC)
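For illustration, the 10-second spacing Enterprisey describes can be enforced with a minimal throttle along these lines. (pywikibot already provides this via its `put_throttle` configuration, so this is only a sketch of the idea; the class name is invented.)

```python
import time

class EditThrottle:
    """Ensure at least `delay` seconds elapse between consecutive edits."""

    def __init__(self, delay: float = 10.0):
        self.delay = delay
        self._last = None  # monotonic timestamp of the previous edit

    def wait(self):
        """Sleep just long enough to honor the delay, then record the time."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self._last + self.delay - now
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `throttle.wait()` immediately before each save keeps the edit rate within the guideline without sleeping on the first edit.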
Thanks for the feedback! I am aware of the editing frequency issue (it's specifically mentioned in my BRFA if you missed it), I would of course change that to 10 seconds between edits for a production run, as I said I only operated that fast in the first place because I originally could not locate the correct documentation on bot policies and had assumed that the general API rate limits applied.
I can't exactly give more precise examples of changes since I apparently wasn't supposed to be running the bot without BRFA approval in the first place, but I suppose I could manually make some example edits to show what the bot would be capable of doing? My main goal originally was just to homogenize a lot of common simple stuff like the coal example, but then I got branched out and started thinking of wider applications, so my application is admittedly a bit open-ended.
As far as the code goes, I dislike open-sourcing anything I've written for personal use until it's been extensively polished because I keep a lot of debug stuff commented out and don't write my commented notes for a general audience, so it gets more than a bit sloppy/unprofessional and I prefer to only publish very clean code unless absolutely necessary. I guess I could strip the comments entirely and publish it more or less as-is though. I'll think about that.
I'll leave the tag up until someone from BRFA can drop by to discuss a trial. Garzfoth (talk) 03:35, 11 December 2017 (UTC)
I've cleaned up and posted the original code used for the Active => O change run: User:InfoboxBot/wikipedia_edit_pages_clean.py Garzfoth (talk) 03:48, 11 December 2017 (UTC)

@Garzfoth: This request has sat for a very long time. I would like to apologize for that.

Minor code review. This line:

	tpl = next(x for x in templates if x.startswith("{{Infobox power station") or x.startswith("{{infobox power station") or x.startswith("{{Infobox power plant") or x.startswith("{{infobox power plant") or x.startswith("{{Infobox wind farm") or x.startswith("{{infobox wind farm") or x.startswith("{{Infobox nuclear power station") or x.startswith("{{infobox nuclear power station"))

would look better as:

    tpl = next(x for x in templates if x.name.matches(["Infobox power station", "Infobox power plant", "Infobox wind farm", "Infobox nuclear power station"]))

Now, my only real concern here is that certain changes can seem uncontroversial on the surface but are actually not once you do them en-masse. The "Active" to "O" thing is surely fine, but whether or not to wikilink "Coal" is something I could see as contentious. How do you determine what the convention is when the most common option is used by only 45% of articles (153/337, per your numbers)? Arguments could exist either way, and it might depend on the article (maybe).

Anyway, let's do a fairly loose trial to get a sense of the kinds of changes you would like to make and how they pan out. If possible, please do a variety of types of fixes, but if you only have a couple in mind right now, that's fine too. Approved for trial (100 edits). — Earwig talk 06:23, 12 December 2017 (UTC)

Thank you for your comments. The code suggestion is extremely helpful, I tested it and subsequently refactored all of my code (including components that have not been published such as the scraping stuff) to incorporate it.
I have thought extensively about the issue of balancing too-minor/controversial changes with real action for a while now. For wikilinking stuff like that I think it's no contest — a wikilink is almost always going to be justified for stuff like that (especially as the infobox is a separate entity and the MOS makes the provision that repeating links in infoboxes is fine if helpful for the readers). For capitalization issues, it's a messier situation, but I think the best approach is to focus on choosing the option that makes the most grammatical sense (something I've tried to clarify with limited research), fits best within the generalized context of an infobox, adheres to the MOS, is the most visually consistent & pleasing with other infobox elements, and corresponds with the established consensus (I can see how popular each option is while analyzing the DB for variables to work on, so that lets me measure the rough level of consensus for existing options). I'm actually really curious if anyone will object to the capitalization standardization I'm using — if it triggers an objection, I'll of course discuss the issue, and if the discussion results are to use non-capitalization for the standard (or whatever else), I can then use the bot to put the articles in line with the outcome of the discussion instead.
I started on the trial run. Here are changes done so far:
IPS parameter name/key/category | Original value | Modified value | #
th_technology | steam | [[Steam turbine]] | 2
th_technology | Steam | [[Steam turbine]] | 17
th_technology | [[gas turbine]] | [[Gas turbine]] | 3
th_technology | [[Gas Turbine]] | [[Gas turbine]] | 3
country | United States | [[United States]] | 5[a]
country | England | [[England]] | 5[b]
ps_units_manu_model | Siemens | [[Siemens]] | 3
ps_units_manu_model | Vestas | [[Vestas]] | 2
status | Operating | O (expands to Operational) | 5[c]
status | operational | O (expands to Operational) | 17
status | Baseload | O (expands to Operational) | 6
status | Peak | O (expands to Operational) | 5
th_fuel_primary | Coal | [[Coal]] | 5[d]
th_fuel_primary | Coal-fired | [[Coal]] | 5[e]
th_fuel_primary | [[Natural Gas]] | [[Natural gas]] | 5[f]
th_fuel_primary | [[natural gas]] | [[Natural gas]] | 5[g]
th_fuel_primary | Natural gas | [[Natural gas]] | 5[h]
Total edits made during initial trial: 98
  1. ^ There were 257 instances to correct in this category; due to the 100-edit limit on this trial, only 5 representative examples were edited.
  2. ^ There were 105 instances to correct; likewise, only 5 representative examples were edited.
  3. ^ There were 38 instances to correct; likewise, only 5 representative examples were edited.
  4. ^ There were 88 instances to correct; likewise, only 5 representative examples were edited.
  5. ^ There were 72 instances to correct; likewise, only 5 representative examples were edited.
  6. ^ There were 27 instances to correct; likewise, only 5 representative examples were edited.
  7. ^ There were 24 instances to correct; likewise, only 5 representative examples were edited.
  8. ^ There were 23 instances to correct; likewise, only 5 representative examples were edited.
During the run only one edit was reverted (this one), with the reason being "editing tests". The editor in question subsequently thanked the bot's account for a different edit, and I'll be replying to their message on the bot's talk page to explain the matter and see what their views on the capitalization change really are (i.e. did they truly intend to revert or did they simply not notice that the edit actually changed something).
Here is the updated primary bot code, with various improvements made, functionality added, code cleaned up, and most code comments preserved (even the stupid ones): User:InfoboxBot/wikipedia_edit_pages_clean.py
Thanks again! Garzfoth (talk) 14:02, 15 December 2017 (UTC)
WP:OVERLINK applies. You should not be linking countries like the U.S. and England. — JJMC89(T·C) 19:40, 15 December 2017 (UTC)
That seems fair enough for the specific case of countries. Here's a question: if WP:OVERLINK unambiguously applies to the country field, then would it be justified to edit the infobox to remove all country wikilinks for violating WP:OVERLINK? This would mean for example that all instances of country = [[United States]] would be changed to country = United States, and so on and so forth for all the other countries. Garzfoth (talk) 14:25, 19 December 2017 (UTC)

Bots that have completed the trial period

Approved requests

Bots that have been approved for operations after a successful BRFA will be listed here for informational purposes. No other approval action is required for these bots. Recently approved requests can be found here, while old requests can be found in the archives.


Denied requests

Bots that have been denied for operations will be listed here for informational purposes for at least 7 days before being archived. No other action is required for these bots. Older requests can be found in the Archive.

Expired/withdrawn requests

These requests have either expired, as information required by the operator was not provided, or been withdrawn. These tasks are not authorized to run, but such lack of authorization does not necessarily follow from a finding as to merit. A bot that, having been approved for testing, was not tested by an editor, or one for which the results of testing were not posted, for example, would appear here. Bot requests should not be placed here if there is an active discussion ongoing above. Operators whose requests have expired may reactivate their requests at any time. The following list shows recent requests (if any) that have expired, listed here for informational purposes for at least 7 days before being archived. Older requests can be found in the respective archives: Expired, Withdrawn.