Commons:Village pump
This page is used for discussions of the operations, technical issues, and policies of Wikimedia Commons. Recent sections with no replies for 7 days and sections tagged with {{Section resolved|1=--~~~~}} may be archived; for old discussions, see the archives; the latest archive is Commons:Village pump/Archive/2024/11. Please note:
Purposes which do not meet the scope of this page:
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day and sections whose most recent comment is older than 7 days.
October 14
Google's semi-censorship of Wikimedia Commons must end
Please see meta:Community Wishlist/Wishes/Do something about Google & DuckDuckGo search not indexing media files and categories on Commons. I think we can and should do something about Google not indexing most files (including all videos) and category pages on Commons. Prototyperspective (talk) 15:42, 14 October 2024 (UTC)
- It is a private company, and as long as it does not violate the law, it can do whatever (...) it wants. If they choose to ignore stuff on Commons, that's fine. Alexpl (talk) 20:02, 14 October 2024 (UTC)
- I was not saying it's illegal. That may be fine according to the law. I wonder whether it's fine for Commons that users' contributions are just blacked out and not available to people. Prototyperspective (talk) 21:39, 14 October 2024 (UTC)
- Huge file sizes for photos are a cost factor when it comes to processing and are almost never worth it anyway. I don't blame them for not wanting photos that are hundreds of megabytes in size to show up whenever somebody types in a generic search term. Alexpl (talk) 14:13, 15 October 2024 (UTC)
- This seems off-topic. 1. Most files on WMC are not many megabytes in size, and this is not about a few particular large files. 2. Google Search only shows gstatic thumbnails, not the whole image, and the same is true for DDG and other search engines.
It's absurd to argue that storage or processing limits at Google would make WMC, out of the millions of indexed websites, one whose media is not findable.
You can of course defend anti-WMC practices – even though I don't understand why Commons contributors would support that – but this point does not make sense, partly because this isn't about the <0.1% of WMC files that are large image files to begin with. Prototyperspective (talk) 14:33, 15 October 2024 (UTC)
- This is not the first time I have seen you try to dismiss comments with which you disagree as "off topic", when they are not. Please do not do that. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:46, 15 October 2024 (UTC)
- I said it seems off-topic, and I did not dismiss the comment but addressed it comprehensively. When I say it seems off-topic, that is, for example, because I may have misunderstood it and/or the user may want to clarify how it is on-topic. I do wonder why you're so sensitive about me using the word off-topic. The user did say something but did not explain how it relates to this subject, and asking for that clarification in plain language is, I think, more constructive than beating around the bush. Prototyperspective (talk) 16:41, 15 October 2024 (UTC)
- There already is a thumbnail for every file here anyway, so there is no need to create any new ones. Prototyperspective (talk) 15:30, 15 October 2024 (UTC)
- See also meta:Talk:Community Wishlist/Wishes/Do something about Google & DuckDuckGo search not indexing media files and categories on Commons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:41, 14 October 2024 (UTC)
- There is a commercial interest in steering search results toward commercial and social websites. These generate clicks, not Commons. I do have the impression that Google is much more interested in the structured data (SDC) of files than in Commons categories. Every effort should be made to fill in P180 (depicts). Google certainly uses the labels in Wikidata as a data feed for its search engine. They are also used for training translation software. Smiley.toerist (talk) 10:12, 15 October 2024 (UTC)
- Wikipedia itself is indexed rather highly on Google search results though. And it does index images that are used in Wikipedia articles, but this treatment isn't extended to the other Wikimedia projects. (I can't speak for other media files however). ReneeWrites (talk) 18:26, 15 October 2024 (UTC)
- Yes, Wikipedia is, but not Commons, the second-largest Wikimedia project, with a type of content that lots of people are interested in, watch and search for (media of all kinds). It does not index any video here (at least in my tests I have not found any so far, even when searching for the exact title), and images I think are only indexed when they're used in Wikipedia articles, and even then they are often missing from the main results. One part of the proposal is systematic tests/investigations, so that there is some data on this. I think overall the indexing is pretty bad, even when searching for a subject on which WMC has lots of high-quality content, while the other image results shown are fairly low-quality. One could also focus on the videos. Prototyperspective (talk) 20:32, 15 October 2024 (UTC)
- Google often indexes images that are not in a Wikipedia article. I find plenty if I do specifically an image search. But it doesn't tend to list pages that are mainly an image in its general results, so Commons image pages often don't show in the result if you do a general Google search. - Jmabel ! talk 05:11, 16 October 2024 (UTC)
- Rarely it does, but indexing a random tiny subset of files doesn't change anything about the issue and only makes it harder to notice. In earlier searches I did not find many images; in those cases I either used an image not from WMC, even though I know WMC has equally good, well-organized images, or used the WMC search. Again, investigations are the first step of what is proposed, so maybe you could share your searches. Images certainly shouldn't show up in the general search results (well, nearly always); I made it clear that this is about the Images and Videos tabs of these sites, and only for category pages is this about the general search results. I currently don't have many good examples. Things I searched for (those may not be the best examples) included roughly "Rivers from space", "Algae blooms from space" and "Satellite picture of cities at night". This is not about Google & DDG not indexing any files on WMC; please let me know if that should be clearer in the proposal. It is about them indexing only very few images (and not even the most relevant or best ones) when it should be many (e.g. in searches where WMC has lots of well-organized files), not showing nearly all categories in the results, and not indexing any videos. Maybe it should be clearer that it isn't necessarily all Google's fault: the investigations may reveal things the Wikimedia community and tech could do to improve Commons' inclusion in external search results. However, such steps depend on the investigations and don't make steps 2 & 3 invalid; other things could follow up on that step in addition and shape those two. Prototyperspective (talk) 11:30, 16 October 2024 (UTC)
- @Prototyperspective: Colourpicture Publishers. There aren't that many results to begin with, but maybe it's at the top because the category has a description that contains the company's name? --Adamant1 (talk) 01:21, 18 October 2024 (UTC)
- Yes, that's the kind of investigation I'm proposing be done at large scale and systematically (and visibly, e.g. published in Diff), so we can identify cases that are well indexed and find out why, identify cases that should be well indexed but aren't, and so on.
- It could be that it's at the top because it contains a long descriptive category description (which most categories, however, don't really need because the category title is self-explanatory) as well as an infobox with all sorts of data. It's also plausible that it ranks high because there are few other websites with info on that subject, especially recent ones that are linked from other pages. As a result of findings like your example, one could for example conduct tests (and/or check the theory via the dataset) of whether it's the company's name in the description, or the description itself, that caused the category to show up this high, and consider things like adding category descriptions (partly automatically via WP article leads and/or Wikidata item descriptions). An open letter doesn't have to be as provocative and confrontational as the title of this thread; one could nicely ask Google & Co to improve their results by considering specific things or identified requested changes. Relevant to that is that Google & Co make heavy use of Wikimedia content in all sorts of ways, but this isn't about fairly giving back (some media attention, however, could be due to that and reference it): it would be about them improving their search results for everyone so they show media or pages that the person searching would likely find useful (e.g. by considering how many files, and how many files used in Wikipedia, the category contains). (When it comes to videos, however, it seems like purposeful exclusion.) Prototyperspective (talk) 08:24, 18 October 2024 (UTC)
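The systematic investigation described above could start small: record, for a sample of Commons categories, whether each showed up in a given search engine's results, together with page features such as having a description or a Wikidata link, then compare rates. A minimal Python sketch, assuming the observations are collected by hand; the `CategoryCheck` fields are hypothetical and no search-engine API is involved:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CategoryCheck:
    """One manual observation: a Commons category and whether it showed up in search results."""
    title: str
    has_description: bool  # hypothetical feature: category page has a prose description
    has_wikidata: bool     # hypothetical feature: category is linked to a Wikidata item
    indexed: bool          # observed: the category page appeared in the checked search results


def indexing_rate_by(checks, feature):
    """Return {feature value: share of categories found in search results} for a boolean feature."""
    found = defaultdict(int)
    total = defaultdict(int)
    for check in checks:
        value = getattr(check, feature)
        total[value] += 1
        found[value] += int(check.indexed)
    return {value: found[value] / total[value] for value in total}
```

Comparing `indexing_rate_by(checks, "has_description")` with `indexing_rate_by(checks, "has_wikidata")` would show whether either feature goes along with better indexing in the sample. This only shows correlation; controlled tests, such as adding descriptions to some categories and not others, would be needed to show cause.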
- Google clearly does take these images into account. I looked up a handful of terms:
Table: Google Images searches
If you narrow your search to CC images, you get more from Flickr and Commons:
Table: Google Images searches, narrowed to Creative Commons
I don't believe there even is a problem. Sure, results from WMF projects are only 1 or 2 in many cases, but:
- it's not like there was any other site that did have a majority of the top results
- you can improve them by searching for CC content
- Wikipedia was almost always in the results, even if they didn't have a majority in the top images (which there's no reason it should, might I add). I can't say the same about other results I saw, like Britannica, NatGeo, Adobe Stock, etc.
- Google is showing results from Wikipedia, Commons, and at times even smaller projects like Wikispecies and Wikivoyage.
I wouldn't put it past them that they're prioritizing commercial and social sites that run Google Ads (purely speculation on my part, don't take my word for it), but I find it hard to believe that they're straight up censoring, shadowbanning, or otherwise limiting results from WMF projects. Rubýñ (Scold) 17:21, 15 October 2024 (UTC)
- I haven't repeated all the searches to test this, but with the ones I did I only got 1 result from WMF, and it was the image in the infobox of the Wikipedia article about the subject. ReneeWrites (talk) 20:29, 15 October 2024 (UTC)
- I personally use Ecosia to search for things, and I often just type something into Ecosia rather than search for it here because I am too lazy to use the convoluted Wikimedia internal search (yes, using external websites to find something is often easier than the internal "search" engines on Wikimedia websites). I noticed that in the past few months Ecosia has been suppressing non-Wikipedia Wikimedia websites more. This seems to coincide with the switch in which Ecosia started mixing Google Search results with those from Microsoft Bing; before this change Ecosia exclusively used Microsoft Bing. I used Microsoft Bing as my main search engine from around 2011 or 2012, switched to Ecosia a couple of years ago (after I saw one of their advertisements on Google's YouTube), and I occasionally compare it with Google Search and other search engines. Judging by the fact that Google Search suppresses Wikimedia Commons and Microsoft Bing does so to a lesser extent, I assume that this is likely a deliberate choice by those companies. But it could also be something internal to Wikimedia websites, as all non-article-space pages on Wikipedia are also excluded from search engines (meaning that someone cannot find any Wikipedia policy pages unless they look for them within Wikipedia, which I've always found to be a rather odd choice).
- Now, we know that Google Search, Microsoft Bing, Ecosia, DuckDuckGo, Yahoo! Search, etc. all rely heavily on Wikidata, so perhaps linking all Wikimedia Commons category pages with Wikidata items might help integrate this website better with search engines. If you think about it, the exclusion is specific to Wikimedia Commons: I have no trouble finding results from Wiktionary or Wikivoyage, which probably means that the integration between Wikidata and other Wikimedia websites helps them. Now, I know that "SEO" is considered "a curse word among Wikimedians", but if we want Wikimedia Commons to show up in search results we most likely do need to link to Wikidata and properly use redirects, alternative titles, translations, etc. in a way that makes sense. For example, alternative titles work for Wikipedia: if you search for "Communist Germany" in a search engine you'll find the DDR, because "Communist Germany" is a redirect on Wikipedia. Meanwhile, we tend to have highly specific titles, and redirects are typically deleted. My guess is that the main culprit is the lack of Wikidata integration on Wikimedia Commons; I wonder whether files with more optimised structured data also show up more in search engine results, as these depend on Wikidata items. Alternatively, we could compare whether categories with or without Wikidata integration show up more in internet search engines. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:52, 19 October 2024 (UTC)
- Thanks for this interesting info contribution.
- Comparing indexing results between search engines like this, and across time (especially after algorithm changes are reported, although they are probably often not announced), could help identify causes and potential mitigation measures.
- I never noticed or thought about search engines not indexing policy and meta pages of Wikimedia sites other than WMC; if so, that's also something I think would be good to change if possible. For example, new editors or readers may search for these with a search engine instead of the internal one. If they search for meta or help pages on Commons, it's quite possible they can't find them, because these don't show up in the search results, not even in MediaSearch's Categories and Pages tab (issue #8 here).
- "[Google & Co] all heavily rely on Wikidata": that good integration with Wikidata is a cause of search-engine indexing (or of good indexing), and that improving that integration would help, are two hypotheses that could be tested. I do not think this is much of a factor, because category pages that are linked to Wikidata items also do not show up, and only a tiny subset (< 0.01%) of files are used in Wikidata items or usable there, while most files are somewhere underneath a category that is linked to a Wikidata item. I think "it's not linked to a Wikidata item" or "it doesn't have structured data depicts statements" would be little more than false excuses (not necessarily deliberate) for not indexing, and I don't see why indexing would rely on or require it, or why it should be expected. Moreover, some categories should probably be well indexed without being linked to a Wikidata item, or linking them would be inappropriate or at least can't be done at scale(?). For example, Category:Drone videos, with lots of organized content, can't even be found in DuckDuckGo when searching for "drone videos wiki" (btw I think it should also rank high for searches like "free drone videos"). The linked proposal, however, is interesting, but I doubt it can be done at scale or that it affects search engines much. Data suggesting it has any significant effect is also missing. So I don't think it would solve this; e.g. videos on WMC still don't show up in the Videos tab, and many large categories are already linked.
- "and properly use redirects, alternative titles, translations, etc. in a way that makes sense": Agree. One option is to sync ENWP redirects of items to WMC so WMC has the same redirects [i.e. a tool for doing so]. Another is adding machine-translated category titles; this could also be implemented via redirects and be extended to category descriptions. This, however, is another thing that I don't think should be required for the pages to show up in search results; it should only improve them. It's possible that this would solve it, even if it shouldn't be that way, due to how pages are ranked. Note that this may require that the category page has an actual URL with an actual title, not the same URL with some JavaScript dynamically changing the title depending on the user language. Another option, creating redirects from translated titles (Category:Tiere (de; only the plural form, not the singular) currently redirects to Category:Animals), can't be done at scale and may cause issues (such as with HotCat autocompletion).
- In any case, such comparison data would be great even if it's just a small factor (I doubt it's the main culprit for the various indexing issues).
- Prototyperspective (talk) 20:03, 19 October 2024 (UTC)
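One option raised above was a tool that syncs English Wikipedia redirects to Commons so that Commons has the same redirects. A minimal Python sketch of the matching step, assuming the redirect list and a mapping from articles to Commons categories (for instance via Wikidata sitelinks) have already been fetched; the function and its inputs are hypothetical, no MediaWiki API is called, and it only proposes redirects for human review rather than creating pages:

```python
def propose_category_redirects(wp_redirects, article_to_category, existing_categories):
    """
    wp_redirects: {redirect title: target article}, taken from English Wikipedia (assumed input)
    article_to_category: {article title: Commons category name} (assumed input)
    existing_categories: set of Commons category names that already exist
    Returns {proposed category name: existing category it should redirect to}.
    """
    proposals = {}
    for redirect, article in wp_redirects.items():
        target = article_to_category.get(article)
        if target is None:
            continue  # the article has no known Commons category
        candidate = redirect  # reuse the Wikipedia redirect title as the category name
        if candidate in existing_categories or candidate == target:
            continue  # never overwrite an existing category or point a page at itself
        proposals[candidate] = target
    return proposals
```

Skipping titles that already exist keeps the sketch from overwriting real categories; proposals would still need checking for ambiguous redirects before anyone acted on them.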
- From everything I've been able to tell, Google does index pages in "Commons" space. For example, do a Google search on "structured data commons" (no quotes). - Jmabel ! talk 16:43, 20 October 2024 (UTC)
- Yes, this is known, e.g. the intro already is about "most" files, not "all" files as well as results' ranking/findability. I've yet got to see a WMC video in the videos tab however. Prototyperspective (talk) 16:46, 20 October 2024 (UTC)
- Sorry, I misunderstood your comment, Jmabel; it's addressing point #2 and you're right on that.
- Some examples of useful major categories with few views are below. Please comment if anybody knows more about why videos on WMC are not showing in the Videos tab of Google, DuckDuckGo, etc. Maybe one could ask them, or check whether there are any other large websites whose videos are not shown there (and why).
- Prototyperspective (talk) 17:23, 26 October 2024 (UTC)
- The 14th most viewed page and the second most viewed category on Commons [1] is also a video category [2]. Views on all Commons pages are quite low; there is nothing special about videos on Commons. GPSLeo (talk) 19:13, 26 October 2024 (UTC)
- Yes, even the Commons pages with the most views get few views, which is consistent with the problem description in the proposal. I did not suggest there was anything special about videos, except that none of them are shown or indexed in the Videos tab of the search engines. Prototyperspective (talk) 19:29, 26 October 2024 (UTC)
- It's a good thing if Google keeps us a relative secret. This is a databank for a select audience that's hopefully using items for creating content or for research. It's not a social media website for easy access by every airhead in creation; we don't need the level of vandalism that would surely follow.
- As a matter of fact, we scavenge off commercial websites; without them, we would have limited access to new material. It would be detrimental to attempt to replace them; no good would come of it. Broichmore (talk) 12:26, 29 October 2024 (UTC)
- Even for a "select audience" it's far too little known, used and discoverable. They also use the Videos tab, for example. Moreover, I do not agree with this elitism. Free media and free knowledge are about society overall, not some very small group. With increased use there would also be more contributors who watch pages; Wikipedia is used much more and is not overrun by vandalism. Vandalism probably doesn't increase linearly with public use, and even if it did, there can be and are technological means to detect it. The site would not replace commercial websites even if it were far more popular. I do not agree that we scavenge off them either. Prototyperspective (talk) 12:54, 29 October 2024 (UTC)
- So, to wrap this up: you want to upload stuff to Commons and have it shown in Google's services in a predictable way. This would only make sense for either advertising or some sort of campaigning, and that is "no bueno". Alexpl (talk) 15:43, 30 October 2024 (UTC)
- No this doesn't wrap it up at all and it's entirely unrelated to advertising or some sort of ad-like campaigning. It's also not about a "predictable way". Prototyperspective (talk) 16:03, 30 October 2024 (UTC)
- Sure. Alexpl (talk) 18:30, 31 October 2024 (UTC)
- It's too bad the Phabricator ticket is stalled. It doesn't seem like anything else can be done about it outside of that, though. --Adamant1 (talk) 19:15, 31 October 2024 (UTC)
- I named three specific things in the linked proposal. These things can be done. Prototyperspective (talk) 21:11, 31 October 2024 (UTC)
- Sure, but I was specifically referring to this discussion, not suggestions you've made in other proposals. Can anything be done about it in this conversation? Probably not. Can things be done about it in other conversations or places? Maybe. But I'm not replying to someone else in another conversation now, am I? --Adamant1 (talk) 21:34, 31 October 2024 (UTC)
- I don't think it's appropriate (let alone necessary) to make assumptions about why someone would support this initiative, especially if those assumptions are going to be bad ones. For my part I just like the information I add to these projects (whether this is Commons or Wikipedia itself) to be findable, but the difference between how the Google search engine treats these two projects is night and day. ReneeWrites (talk) 15:57, 3 November 2024 (UTC)
- Regardless of the effect size, I doubt we can do much about this directly. The search-engine market is far less competitive than it appears; almost all search engines have Google, Microsoft Bing, or the PRC government behind their backends (see Wikipedia:List of search engines). There are also serious obstacles to market entry, like Cloudflare prohibiting even medium-sized search engines from crawling and indexing the pages they host. So search engine backends wield a lot of oligopoly power, whether they want to or not.
- I'd suggest our most effective move would be to make Commons pages more visible through more specialized, non-oligopoly search tools. For instance, we could make all Commons videos available on PeerTube, a decentralized, ActivityPub-federating video platform. This would make them searchable through Sepia Search. It would also make it possible to download large videos from Commons (which fails often enough that I've given up on it) and make downloading videos faster. We could also reach out to new market entrants like Mojeek.
- We could also raise our profile directly, for instance by encouraging professional groups to use Commons (academics, journalists, people distributing public health information...). Explain that they can be contributors, users of existing content, and requesters of custom content at our graphics labs. Train librarians. Train students. That sort of thing.
- Oh, and we could urge regulatory action to increase competition in the market. HLHJ (talk) 16:16, 10 November 2024 (UTC)
- And how much would that cost? Handling that sort of traffic costs more money, for very little benefit to the average user. Alexpl (talk) 16:28, 10 November 2024 (UTC)
- PeerTube is peer-to-peer, designed to keep bandwidth costs down. You can run a server on a desktop computer, like a torrent. Certainly the WMF can afford servers; their main expense is salaries. We could expect new users of our content, because it would make our media available on all ActivityPub-federating platforms, like Mastodon, Pixelfed, etc. Making content available to new users benefits them and is our basic goal: making knowledge available to everyone. HLHJ (talk) 02:47, 11 November 2024 (UTC)
- Yes, not much but some things. I listed some of those things, I'll repeat two: 1. doing systematic research and compiling a dataset 2. writing an open letter with some publicity via WMF.
The obstacles to market entry are very interesting; I did not know about the Cloudflare issue, and things like this could be addressed by digital policy if they were better known. PeerTube integration could be useful for scaling, reducing server load and handling large files, but I don't think it helps here, except maybe as an option for what could be done if search engines index videos better and that causes server load. I never had any issues downloading videos from WMC. I find distributed search engines like YaCy interesting, but work related to them is unlikely to address this issue for probably the next 10 years. The suggestion about proactively reaching out to potential contributors is good, but it also wouldn't address this issue: it doesn't improve the indexing and public use/awareness of the site, and how do you explain to them why they should contribute here if their media get almost no views? I think whatever reasons people have for contributing to Commons, like public education or organizing free media, drastically lose meaning if the site simply doesn't get used. Most files here are not used in Wikipedias, and the file organization, searchability, descriptions, etc. are all irrelevant if this site is just for hosting files that Internet users can only find and use when they happen to read the Wikipedia article the file is used in. I think before reaching out to potential especially valuable contributors (PEVC?), we should work on solving the problem of the site's use/value/popularity/awareness. I think there are two approaches:
- developments and digital policy activity to enable better alternatives (e.g. with more neutrality and possibly less misinfo-spewing without any warning tags) (broader)
- all sorts of activity (including digital policy activity, but this may not be key or needed here) to push the few search engines used in the real world (Google, Bing, DuckDuckGo) toward better inclusion of Commons (more impactful, easier, and more immediate)
- If there were an open letter, I think it would probably be good to include some info about the first point, but more as supporting context for why the few search engines should index the site and include its contents (e.g. in the Videos tab) better. Maybe this could also boost some activity around developing, or helping the development of, better alternatives, but the letter is (or should be kept) more about a pragmatic, real-world matter. Prototyperspective (talk) 17:26, 10 November 2024 (UTC)
- The simplest regulatory method for increasing competition is to make crawl data public. Crawling the web takes massive amounts of time and energy, and there is no objective need for each search engine company to do its own crawl. But big crawls cost millions, so no-one wants to share their expensive asset. It's a huge waste.[3]
- "Contribute so I can use your images on Wikipedia" works. "Search because there are good images you can use here" also works. A copy-paste html code snippet for embedding an image in your website might help. I'd also like better video transcript-making tools, a semi-automated process like OCR on Wikisource, so I don't spend all my time typing out timings. We have an advantage in manual transcripts.
- I just think the chance of major search engines saying "Thank you for your open letter. We'd never thought to make Commons more visible! We should do that!" are nil. HLHJ (talk) 03:01, 11 November 2024 (UTC)
- And how much would that be? To handle that sort of traffic costs more money - for very little benefit to the average user. Alexpl (talk) 16:28, 10 November 2024 (UTC)
- "Should," yes. "Can," well that's a whole other task. The decline of Google search into surfacing spam and AI slop over legitimate content has been extensively reported on this year, and while it would be great if we could singlehandedly un-enshittify Google search it is a problem much bigger than Commons. Gnomingstuff (talk) 00:25, 13 November 2024 (UTC)
- See also this phab ticket (also in margin, no inline template?). We mess up our end, too.
- Trying to make a search algorithm distinguish content written by a Large Language Model seems like an AI-hard problem. HLHJ (talk) 04:44, 14 November 2024 (UTC)
October 31
Almost 400k files need license review
I just did a search of Category:License review needed and subcategories and saw almost 400k files!!!
The result is that some of those files have been marked for review for years, and the source died before anyone reviewed the file. Then we have two choices:
- Mark the file for deletion (just like what is standard for recent files that fail upload)
- Keep the file
I'm sure reviewers feel tempted to skip such old files, because it does not feel right to delete a file that could have been saved if it had been reviewed right after upload.
The good news is that many of those files might not actually need a "normal" review to confirm the license. For example, a bot can verify that a video has the right license, but it can't check whether there is any derivative work in the video. So it might help if we could somehow sort the files into those that urgently need a review and those that can wait. If anyone has ideas, feel free to fix the problem.
If a file is checked 1 or 10 years after upload and no longer available we could create a template like {{Grandfathered old file}} that say that uploader claim the file is licensed freely but we can't verify that (now).
If we do so, we could move files that can't be reviewed out of the normal review categories, and hopefully it will be easier for reviewers to keep up with new uploads. It's like link rot: we can't fix what is already broken, but we can focus on new files.
The question is whether that is an acceptable solution. Or does someone have a better idea? --MGA73 (talk) 16:04, 31 October 2024 (UTC)
- Delete the files. Otherwise, we create a playground for underworked attorneys to hassle Wikimedia/Foundation for years - before we ultimately have to delete those files anyway. Alexpl (talk) 16:55, 31 October 2024 (UTC)
- There are 30k+ files from Finna.fi which could be reviewed by software if somebody wrote a script that compares each image to the image in Finna and confirms that the licence is correct. I could even write a script for that if somebody wants to run it. (Note: I participated in uploading the images.) I suppose there are other images uploaded from well-structured repositories with an API which could be reviewed automatically too. --Zache (talk) 17:20, 31 October 2024 (UTC)
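A minimal sketch of the check Zache describes, assuming the Commons file is a byte-identical copy of the Finna download and that its SHA-1 has already been fetched (the MediaWiki API can report it via `prop=imageinfo&iiprop=sha1`). The function names and the license-normalization rule are hypothetical; re-encoded or cropped uploads would need perceptual image comparison instead of an exact hash.

```python
import hashlib


def normalize_license(name: str) -> str:
    """Reduce e.g. 'CC BY 4.0' and 'cc-by-4.0' to the same comparable form."""
    return name.lower().replace("-", "").replace(" ", "")


def finna_matches_commons(finna_image: bytes, commons_sha1: str,
                          finna_license: str, commons_license: str) -> bool:
    """True if the Finna image is byte-identical to the Commons file and both
    sides state the same license (hypothetical review rule)."""
    same_file = hashlib.sha1(finna_image).hexdigest() == commons_sha1.lower()
    same_license = normalize_license(finna_license) == normalize_license(commons_license)
    return same_file and same_license
```

A bot could pass files for which this returns True and leave everything else for a human reviewer.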
- I don't see how (all) files can/should be deleted as long as there is no obvious violation of guidelines or laws (and probably a huge amount of files is good (and several files are in use etc. etc.)) --PantheraLeo1359531 😺 (talk) 17:36, 31 October 2024 (UTC)
- Where exactly are those "400k" files? There are e.g. ~110,000 files in subcats of CAT:URAA (which includes 600+ artist categories whose works are potentially affected by URAA paranoia), or ~130,000 files in CAT:PD-Art (PD-old default) (which in 95% of cases are obviously PD-old-70 or similar). There are 'only' 70,000 files using the actual {{LicenseReview}} template, and from my experience those files don't seem more likely to be copyright violations than any other file on Commons (pretty much the opposite is the case). ~TheImaCow (talk) 17:56, 31 October 2024 (UTC)
- @TheImaCow: I agree that many files do not require an actual review, but there are other review templates than LicenseReview, for example YouTube, Flickr and GODL-India. That is why I said it might help to sort the categories into files where someone must confirm that the file is on some website under some license, and files that need some other kind of review where we do not need to compare the file to a website. --MGA73 (talk) 18:07, 31 October 2024 (UTC)
- About the files in those two subcats, I was wondering to what extent they are part of the actual license review process (and should therefore only be dealt with by a license reviewer). Unlike PDM Flickr files and those manually tagged for license review, the files wouldn't be in those subcats if the uploader had used the correct templates to begin with. If the uploader could have done that, couldn't any Commons user in good standing just add the relevant tag (or nominate for deletion), without using a {{License review}} template at all? Felix QW (talk) 17:36, 8 November 2024 (UTC)
- @Felix QW: yes, I think you are right. We should have 2 different categories: one where we need trusted users to verify that a specific file is on a specific website with a specific license, and one for other types of review that do not require users to check a specific website. --MGA73 (talk) 17:46, 8 November 2024 (UTC)
- I agree. I do think though that when a license review has been requested manually (as for the templates added by ShakespeareFan00), then it should still be dealt with by a license reviewer (as the successor to the previous more specific user group of PD Reviewers), despite not requiring the verification of a specific website. Felix QW (talk) 20:24, 8 November 2024 (UTC)
- @Felix QW: I think any user can remove a request for a license review if they have a good reason. In this case what is needed is for someone to find out who wrote the articles; I'm a license reviewer and I have no idea who the authors of those articles are. If a non-reviewer knows, I see no reason why they can't add that information. --MGA73 (talk) 20:31, 8 November 2024 (UTC)
- I agree that they can, and in this particular case - as you said below - the more precise template should have been used instead of the review template. I think since it is there now though, such a user should add the information but keep the license review template, and then a reviewer checks that everything makes sense and fills in the template. Because understanding the details of global copyright is quite different from verifying pages, COM:PD review was originally a separate process, with a separate user group. Recently, they were integrated into license review and I think we are still working out what that precisely means. Felix QW (talk) 09:52, 9 November 2024 (UTC)
- @Alexpl: underworked attorneys could already have done that if they wanted; some of the files have been here for many years. If the files were uploaded by users with a good upload history, I would not worry that much. If uploaded by someone with only one upload, or with 10 uploads of which 9 were deleted as copyvios, I would worry much more. In any case, if someone sent a takedown notice, I'm sure the file would be deleted even if it had a template saying the file was claimed to be free but sadly not reviewed in time. --MGA73 (talk) 05:59, 1 November 2024 (UTC)
- A bot could identify files whose source is archived on archive.org, archive.is, or both, and add this information to the file's talk page. Files without an archived version could get priority for review. --C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 07:05, 1 November 2024 (UTC)
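A rough sketch of the prioritization step, assuming the bot queries the Wayback Machine availability endpoint (`https://archive.org/wayback/available?url=...`) for each source URL. Only archive.org is covered; archive.is and the talk-page edit are left out, and the helper names and file titles are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

WAYBACK_API = "https://archive.org/wayback/available?url="


def fetch_availability(source_url: str) -> dict:
    """Ask the Wayback Machine whether `source_url` has a snapshot (network call)."""
    query = WAYBACK_API + urllib.parse.quote(source_url, safe="")
    with urllib.request.urlopen(query, timeout=30) as response:
        return json.load(response)


def closest_snapshot(api_response: dict) -> Optional[str]:
    """Return the archived snapshot URL, or None if nothing is archived."""
    closest = api_response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def review_order(responses: dict) -> list:
    """Sort file titles so sources WITHOUT an archived copy come first.

    `responses` maps a file title to the availability response for its source URL.
    """
    # False sorts before True, so unarchived sources are reviewed first;
    # ties are broken by title for a reproducible order.
    return sorted(responses,
                  key=lambda title: (closest_snapshot(responses[title]) is not None, title))
```

Keeping the network call separate from the sorting logic lets the ordering be tested on saved responses.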
- That is simply most (or so I think) files uploaded with video2commons for example. I don't know why you suggest deletion. They definitely should not be deleted just because somehow a license review tag was added. Most files simply do not have such a tag but are likewise not license reviewed, there is no reason for deleting files that have this template set. Once again I strongly disagree Alexpl but also I don't understand why he would even comment something like that.
- For license review, please prioritize those files that are in use. Various tools like GLAMorgan can be used to see files that are in use that are in category Category:License review needed. This tag / category is useful for that but maybe it should be used more sparingly, e.g. only for uploads by new users or a subset of video2commons uploads and/or the reviewing could be automated.
- Prototyperspective (talk) 12:02, 1 November 2024 (UTC)
- Here's one further idea: a link archival bot for external links on Commons (anywhere but especially in the source field of {{Information}}). There have been many requests & proposals for this in the Community Wishlists and so on but they are usually focused on Wikipedia. It seems like on Wikipedia lots of this is being done. Not so much on Commons except for vid2commons which seems to request an IA-archival for every video/audio import. This recent Wishlist proposal has "All projects" specified so its scope includes Commons; probably more could and should be done: Automatic Archiving of Cited Web Pages in Web Archive. Prototyperspective (talk) 17:27, 1 November 2024 (UTC)
Thank you for all the ideas. It would be great if they could be implemented. :-)
I mentioned a template earlier and I made an example of how it might look:
This image was originally posted to a website and claimed to be licensed under a free license. An administrator or reviewer, <user>, tried on <date> to confirm that the license mentioned above/below was valid. However, the file was no longer available at the specified source, so its copyright status could not be confirmed. The administrator/reviewer found no indication that the copyright claim can't be trusted. If you disagree, you can start a deletion request and state your reasons.
I think such a template would be useful because it would make it possible to move the file out of the review category while telling everyone that there is no reason to ask for a new review. --MGA73 (talk) 16:48, 1 November 2024 (UTC)
- Support such a template.
- we need a bot to go through files with a youtube source and test if the youtube source is ccby. when no, fail the review; when yes, mark it with a template that says something like "bot xx confirms that the given source youtubeURL is ccby" and auto categorises to a category "youtube files reviewed by bot". if a human reviews after the bot review, it gets categorised to "youtube files reviewed by bot and reviewer".
- we also need bots/some better automatic processes for all the Iranian news photos.
- RoyZuo (talk) 18:50, 1 November 2024 (UTC)
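A sketch of the decision step in RoyZuo's proposal, assuming the bot reads the current license from the YouTube Data API v3 `videos.list` endpoint with `part=status`, whose `status.license` field is `"creativeCommon"` or `"youtube"`. The API call itself (which needs a key) and the template edits are left out; the helper names and return labels are hypothetical. The API only reports today's license, so a license changed after upload or a removed video still needs a human.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from youtube.com/watch?v=... or youtu.be/... links.

    Other forms (e.g. /shorts/ or /embed/ paths) return None in this sketch.
    """
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None


def bot_review(videos_list_response: dict) -> str:
    """Classify one parsed videos.list response (part=status)."""
    items = videos_list_response.get("items", [])
    if not items:
        return "needs human review"  # video deleted or private
    if items[0].get("status", {}).get("license") == "creativeCommon":
        return "passed bot review"
    return "failed bot review"
```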
- Re 2.: Agree. However, it's not so simple: often people upload videos they don't have rights for under CCBY or only mean the music is CCBY but not the video. Sometimes, a different license is specified in the file description but usually that's just CCBYSA or CCBY4.0 instead of CCBY3.0. Sometimes, a license may be specified in the description but not in the file metadata but I think this is an edge case that shouldn't be a problem. Lastly, some files were CCBY at the time of upload but had this changed later on or the video is down. In any case, I don't think most of these 400 k files are videos from youtube. Prototyperspective (talk) 19:08, 1 November 2024 (UTC)
- All the special cases can be handled in a DR started by the bot, or by the uploader replacing the failed review template with one that says "this youtube file fails bot review but is actually good so a human please review it".
- as long as a bot starts working and continues nonstop, any new youtube uploads will be handled shortly after upload. then it's the uploader's responsibility to explain all those special cases (changed licence, taken-down video...). if they can't do that within 1 or 2 days after upload, the file deserves speedy deletion.
- https://commons.wikimedia.org/w/index.php?search=incategory:License_review_needed+youtube 17545 / 76125 = 23%. RoyZuo (talk) 19:31, 1 November 2024 (UTC)
- RoyZuo, there are already too many DRs to handle. If a bot starts thousands, the system will crash. I agree that files that fail a review shortly after upload should be deleted, but I think a "no source" tag is better than a DR. --MGA73 (talk) 17:41, 3 November 2024 (UTC)
- Simple: rate-limit the bot to create 10 DR per month for old files (uploaded before the bot starts working). RoyZuo (talk) 19:38, 3 November 2024 (UTC)
- @RoyZuo: 10 DR per month is not even a drop in the bucket, certainly not a reason to use a bot. - Jmabel ! talk 19:41, 5 November 2024 (UTC)
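RoyZuo's rate limit could be enforced with a per-month counter like the hypothetical helper below. The cap is a single parameter, so the disagreement about whether 10 per month is meaningful comes down to choosing `limit`.

```python
from collections import Counter
from datetime import date


class MonthlyQuota:
    """Allow at most `limit` deletion requests per calendar month (hypothetical helper)."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.used = Counter()  # (year, month) -> requests already filed

    def try_file(self, today: date) -> bool:
        """Record one request and return True, or return False if the month is full."""
        key = (today.year, today.month)
        if self.used[key] >= self.limit:
            return False
        self.used[key] += 1
        return True
```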
- I'm happy to design the templates, but i dont have the coding skills for the bot testing youtube url bit. RoyZuo (talk) 19:45, 3 November 2024 (UTC)
- I just noticed that it seems that the YouTubeReview template puts files in both Category:License review needed and Category:YouTube review needed. I think files should be in only one of the categories. --MGA73 (talk) 05:53, 4 November 2024 (UTC)
- Support. No legitimate file, especially a good quality one, should be deleted for lack of information if that information was publicly available in the past. MGeog2022 (talk) 19:33, 10 November 2024 (UTC)
- Comment There was an earlier attempt at Commons:Bots/Requests/EatchaBot 3 / Category:Arranged license review project to make review easier. I think it did help, but it has now stopped. Maybe there are some ideas or code there that can be of use for future bots. I also like the idea Zache mentioned of having a bot confirm that files from Finna match the source. It is probably not possible to make one bot that solves all problems, but it will help if one or more bots can do some tasks and reduce the number of files that humans have to work on. --MGA73 (talk) 19:40, 1 November 2024 (UTC)
- Comment there is certainly a real issue here, but I have no idea how it would best be addressed. In an awful lot of these cases, the original source is no longer available. - Jmabel ! talk 17:39, 3 November 2024 (UTC)
- There are 6k PD files https://commons.wikimedia.org/w/index.php?search=incategory:License_review_needed+PD . Many of them are probably there because of User:ShakespeareFan00 https://commons.wikimedia.org/w/index.php?oldid=519632949 . RoyZuo (talk) 19:57, 3 November 2024 (UTC)
- Yes, and that is a different type of review. Even if the source dies, it will not be a problem. --MGA73 (talk) 20:08, 3 November 2024 (UTC)
- Weird was that ever a publication known for featuring the names of the writers with a large portrait next to the articles?
∞∞ Enhancing999 (talk) 09:38, 4 November 2024 (UTC)
- lol and I would have preferred that the review template was removed and the other one kept. It is more specific. --MGA73 (talk) 14:22, 4 November 2024 (UTC)
- There is also the issue that those files do not land in Category:PD files for review, where they would belong. Now that {{PDreview}} has been deprecated, a mechanism should be found whereby {{LicenseReview}} categorises files into Category:PD files for review instead of the parent category if a public domain statement is queried. Felix QW (talk) 14:11, 10 November 2024 (UTC)
This pops up every once in a while, see Commons:Village_pump/Archive/2023/05#License_reviews and Commons:Village_pump/Archive/2023/04#103,857_unreviewed_files. Multichill (talk) 19:52, 11 November 2024 (UTC)
- If only a good solution would also pop up :-) I think that adding a date could be helpful in some cases, because new uploads are more likely to still be online. --MGA73 (talk) 20:08, 11 November 2024 (UTC)
November 01
Obtuse bot created categories
Apparently User:Gzen92Bot has been mass-creating thousands of categories that contain only a couple of images each, with category names based on the file names. Category:"Papier dominoté. Damier alternant le motif du dé, face cinq, un carré plein, deux carrés avec deux fleurs stylisées différentes, un carré avec un motif " géométrique ", sur fond vert pâle - btv1b10576326x is one of thousands of examples. People can look through Category:Files from Gallica needing categories (images) to find many more. Creating 20-word categories from purely descriptive file names seems suboptimal at best, more so given that it's being done en masse through automated editing. I'm not really sure what to do about it, though, since I'm not an expert on bots, and I'm not even sure it's an issue to begin with. But it does seem like a needlessly obtuse way to do things. Does anyone else have an opinion about it, or know what can be done to fix the issue, assuming it is one? --Adamant1 (talk) 04:51, 1 November 2024 (UTC)
- @Adamant1: I fully agree. Creation of >7,000 uncategorized and possibly-nonsense categories is not appropriate. Doubly so given that this does not seem to be an approved task for the bot. I have blocked the bot until/unless the task is approved.
- @Gzen92: This is the third time your bot has been blocked for operating with an unapproved task. Per Commons:Bots#Permission to run a bot, it is not optional to seek approval for bot tasks. Pi.1415926535 (talk) 05:46, 1 November 2024 (UTC)
- @Adamant1: As a regular user with some background in research data management, I completely agree as well. Thanks for pursuing the matter. RobbieIanMorrison (talk) 06:53, 1 November 2024 (UTC)
- Gee .. what's the cleanup plan for these?
∞∞ Enhancing999 (talk) 07:48, 1 November 2024 (UTC)
- Please delete all the subcategories of Category:Files from Gallica needing categories (images). Prototyperspective (talk) 11:56, 1 November 2024 (UTC)
- Strong oppose towards such mass deletions. These categories appear to contain similar images, which can greatly aid the manual, proper categorisation on Commons; these categories may or may not be deleted once the images in them have been properly categorized. ~TheImaCow (talk) 16:24, 1 November 2024 (UTC)
- Most of them contain just 2 images. The files would be upmerged. Prototyperspective (talk) 17:20, 1 November 2024 (UTC)
- @Adamant1, Pi.1415926535, and Enhancing999: I continued uploading following Commons:Bots/Requests/Gzen92Bot-4, but I agree with the additional categories. I will make a new request (I will indicate the link here soon). This raises questions: there are millions of files to upload and it cannot be done manually, so from how many files should a category be created? How to name the categories (other than with the name of the file)? Following the decision I could easily empty the categories. Gzen92 (talk) 08:19, 1 November 2024 (UTC)
- If you are not able to categorize the photos properly when uploading such an amount of photos you should slow down the upload process and create them manually. GPSLeo (talk) 08:29, 1 November 2024 (UTC)
- Categorisation of images on Commons is not a requirement when uploading images & it shouldn't be - especially not for batch/GLAM uploads. A category such as "Images to check" is sufficient & often much better than automated categorisation. There are still thousands of content categories with random junk in them that was dumped there by automatic categorisation from ten years ago which needs to be cleaned up. A bunch of images, or also a bunch of 500,000 images waiting in a "to check/to categorize" category don't hurt anyone whatsoever, as opposed to poorly done automatic categorisation. ~TheImaCow (talk) 16:24, 1 November 2024 (UTC)
- I made the request. Gzen92 (talk) 17:26, 1 November 2024 (UTC)
- I'm not sure if it's practical in this case but the way I'd do it is to categorize the images by subject. For instance "maps from Gallica", "books from Gallica", Etc. Etc. Then people sub-categorize the images beyond that if they want to. But at least it doesn't lead to a bunch of random categories. --Adamant1 (talk) 18:42, 1 November 2024 (UTC)
- Comment I'm not a fan of mass creation of categories with very few files in them (generally I do not like categories with very few files; I prefer to have 20 photos of John Doe in one category rather than 10 categories like John Doe in 2020, John Doe in 2021 or John Doe wearing a yellow hat looking west). But now that they are created, I agree with TheImaCow that it might be better to keep them until better categories are created. --MGA73 (talk) 18:04, 1 November 2024 (UTC)
- At Commons:Bots/Requests/Gzen92Bot-6 there is now a discussion about whether the user should be trusted with more uploads without categorization, or with cleanup of the current mess.
∞∞ Enhancing999 (talk) 10:46, 3 November 2024 (UTC)
- @Adamant1, Enhancing999, TheImaCow, Prototyperspective, and MGA73: the millions of files in Gallica cannot be categorized automatically (they go to a default maintenance category). So:
- 1) Empty the 7,000 categories of Category:Files from Gallica needing categories (images), put the files in Category:Files from Gallica needing categories (images).
- 2) Continue uploading files to Category:Files from Gallica needing categories (images).
- Is that what should be done? Gzen92 (talk) 09:43, 8 November 2024 (UTC)
- Instead of 7000 or 50000 categories with strange names will it be possible to make fewer categories and put the files in them? For example 500 categories with more generic names? Putting millions of files in just one category does not sound optimal. --MGA73 (talk) 11:22, 8 November 2024 (UTC)
- User:Multichill can you remember where the mapping of images from Geograph was done? I think a similar method could work here. --MGA73 (talk) 11:24, 8 November 2024 (UTC)
- Yes, that's an idea. With the author or what is represented. The problem is that it is not structured data, it's text (example author "Atget, Eugène (1857-1927). Photographe" or title "[Eglise] St Sulpice - Buffet d'orgues dessiné par Chalgrin - A été orné de statues de Clodion : [photographie] / [Atget]"), it's complicated. Gzen92 (talk) 12:41, 8 November 2024 (UTC)
- Some effort is needed to map existing metadata to Commons categories. Professionals at GLAMs should be able to work it out.
- Millions of uncategorized files aren't useful. File dumps should be avoided.
∞∞ Enhancing999 (talk) 08:31, 9 November 2024 (UTC)
The "obtuse" categories group the files by the originating works, so they seem to be useful. We should make sure that they do not interfere with manually curated categories or pages like Special:UncategorizedCategories, but as long as they stay in their own maintenance system I see no need to mass-delete them. More important is to develop rules and a workflow for how to proceed with this huge upload. Many of the files are valuable and can be put to good use, so a more positive view may be appropriate. Does anyone remember Commons:British Library/Mechanical Curator collection from ten years ago? I'm not sure whether User:Jheald or User:Pigsonthewing initiated that, and they chose a different approach (an automated table of contents with a focus on the Commons workflow, and manual instead of automated upload), but they may have some advice on handling the British Library's French counterpart. I hope they are still around :-) --Rudolph Buch (talk) 10:57, 9 November 2024 (UTC)
While ironing my laundry I thought about it a bit more and have a few suggestions:
- (1) Check whether the bot needs these exact category names to avoid duplicate uploads. If yes, we shouldn't change them for now, even though they are strange.
- (2) Make sure that the provenance of the files from Gallica is recorded by a template in the file descriptions, so this information can't get lost through any manual recategorization. Same for the uploader information, if Gzen wishes to retain that.
- (3) Allow the manual creation of a set of maintenance subcategories to group Gallica files and categories by country and by object type (e.g. Category:Gallica - Uncategorized buildings in France or Category:Gallica - Uncategorized people of Italy), and invite everyone to move (not copy!) all suitable content there. Reason: anyone can do that kind of rough sorting in a first manual pass. For a finer categorization, people with interest and expertise in the specific topic can proceed from there.
- (4) Define how comprehensive an image must be categorized before it can be released from the maintenance categories.
- (5) Create a special Gallica dust bin, e.g. Category:Gallica - files and cats to be deleted, to avoid complicated deletion nominations for files and categories that have no useful content.
- (6) Move all the empty images, backsides of postcards and obsolete categories into the dust bin, but keep and rename all categories that group a series of files like book pages or images from the same artist or style.
--Rudolph Buch (talk) 17:30, 9 November 2024 (UTC)
- I don't think building a parallel temporary hierarchy for millions of files is the way to go. If there are issues with mapping metadata to our categories, this should be looked at by specialists.
∞∞ Enhancing999 (talk) 17:36, 9 November 2024 (UTC)
- The file name is the Gallica "title"; I can truncate it or use only the Gallica identifier (btv...).
- I will try to extract all the authors and see how many there are (unique). If there are not too many, I can match them with existing categories.
- Otherwise I can use the date to make categories by year or decade.
- But with so many files, there will always be a need for better human classification. Gzen92 (talk) 21:40, 10 November 2024 (UTC)
- By author, 25,200 cases. About 11,100 complete (example "Dautel, Pierre-Victor (1873-1954 ; sculpteur)") and they must be associated with a category. And very often only family names (example "Dannbach, P"). Gzen92 (talk) 10:27, 11 November 2024 (UTC)
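A rough sketch of splitting the Gallica author strings quoted in this thread into name and life dates before looking for matching categories. The `Given Surname` category pattern is an assumption (many Commons person categories follow it, but not all), so the output is only a candidate for a human to confirm; bare initials such as "Dannbach, P" are left unmatched. All names here are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Author(NamedTuple):
    surname: str
    given: str
    birth: Optional[int]
    death: Optional[int]


# Matches the start of a "(1873-1954" style life-dates block; either year may be missing.
LIFE_DATES = re.compile(r"\((\d{4})?\s*-\s*(\d{4})?")


def parse_author(raw: str) -> Author:
    """Parse strings like 'Dautel, Pierre-Victor (1873-1954 ; sculpteur)'."""
    name_part = raw.split("(", 1)[0]
    surname, _, given = name_part.partition(",")
    match = LIFE_DATES.search(raw)
    birth = int(match.group(1)) if match and match.group(1) else None
    death = int(match.group(2)) if match and match.group(2) else None
    return Author(surname.strip(), given.strip().rstrip("."), birth, death)


def candidate_category(author: Author) -> Optional[str]:
    """Guess a 'Given Surname' category name; initials are too ambiguous to match."""
    if len(author.given) <= 1:
        return None
    return f"Category:{author.given} {author.surname}"
```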
- By date, 4,387 uniques (there are intervals, example "1840-1860"), 563 if I take the first year. With about 1,200,000 images, 2,000 files on average by categories. Gzen92 (talk) 10:50, 11 November 2024 (UTC)
- Hi, I'm also against mass-deletions of actual content. However, Gzen92, my suggestion here is to (regrettably: manually) make a list of images that you want to upload as just one single file, without the reverse, like for example in the current Category:(Paris, hôtel de Châtillon) Profil du corps de logis et des pavillons sur la rue (profil de la cour d'honneur du côté droit, second projet) - (dessin) - btv1b6937302q. The architectural drawing is certainly of interest for Commons, the flipside is not. A bit less than half the categories you create, just have these "2-file cases". If you don't upload the reverse/flipside in the first place, there is also no need to create a category (which will have to get deleted eventually, when interested users process the images). These single-images can then be placed directly in Category:Files from Gallica needing categories (images). Best regards. --Enyavar (talk) 06:41, 14 November 2024 (UTC)
- Hello. The problem with the reverse sides: the description is common to all the images of an ID, and there is no "reverse side" indication. With 458,000 IDs, that would mean visiting 458,000 BnF pages to choose the photos, which is not possible.
- I propose:
- Subcategory by year in Category:Files from Gallica needing categories (images), for example Category:Files from Gallica needing categories (images of 1880).
- No category for IDs with only 2 files, because these are often reverse sides (create a category only for 3 or more files).
- At the end of the import, I will manually browse the categories by year to visually identify the reverse sides and move them to a "trash" folder. Gzen92 (talk) 07:23, 15 November 2024 (UTC)
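Gzen92's proposal (one maintenance category per first year of the Gallica date, and a per-identifier category only when an identifier has three or more files) can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual import tool: the function names, the regular-expression rule for finding the year, the fallback to the parent category for undated files, and the sample identifiers are all assumptions.

```python
import re
from collections import Counter

# Assumption: the first run of four digits in the date field is the year,
# e.g. "1840-1860" -> 1840 ("take the first year").
YEAR_PATTERN = re.compile(r"\d{4}")

PARENT = "Files from Gallica needing categories (images)"


def first_year(date_text):
    """Return the first four-digit year in a Gallica date string, or None."""
    match = YEAR_PATTERN.search(date_text)
    return int(match.group()) if match else None


def year_category(date_text):
    """Map a date string to a per-year maintenance category name."""
    year = first_year(date_text)
    if year is None:
        # Assumption: files without a usable year stay in the parent category.
        return PARENT
    return f"Files from Gallica needing categories (images of {year})"


def ids_needing_own_category(file_ids, min_files=3):
    """Return the identifiers that have at least `min_files` files.

    Identifiers with only two files are skipped, because those are often
    an image plus its reverse side.
    """
    counts = Counter(file_ids)
    return {gid for gid, n in counts.items() if n >= min_files}


# Made-up identifiers: "id-a" has 2 files, "id-b" has 3.
uploads = ["id-a", "id-a", "id-b", "id-b", "id-b"]
print(year_category("1840-1860"))
print(ids_needing_own_category(uploads))
```

With 563 distinct first years and about 1,200,000 images, the average works out to 1,200,000 / 563 ≈ 2,131 files per year category, consistent with the "about 2,000" estimate.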
November 04
FYI
For the next few weeks, I'm looking forward to nominating some kind Wikimedians from this project on m:Merchandise giveaways to appreciate their contributions. I nominated @Abzeronow yesterday and I am hopeful that his contributions are valued. You might want to take a look at the nomination. Regards, Aafi (talk) 09:49, 4 November 2024 (UTC)
- Curious how much that cost? Aren't donations to WMF to run the servers and pay for MediaWiki developments? ∞∞ Enhancing999 (talk) 09:53, 4 November 2024 (UTC)
- Perhaps check m:Wikimedia merchandise for this purpose. Regards, Aafi (talk) 10:07, 4 November 2024 (UTC)
- It doesn't say anything about the cost of not selling the merchandise and not spending the charity funds on fixing the misconfigured Commons upload function instead. ∞∞ Enhancing999 (talk) 10:16, 4 November 2024 (UTC)
- The stability of uploads (on the server side) has improved significantly this year. (This allows more stable upload tools by users. I cannot comment on the Upload Wizard; I nearly never use the Wizard.) C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 16:37, 5 November 2024 (UTC)
- Apparently uploads by users at Commons are slowed down or stopped each time another wiki does some large-scale cache invalidation, e.g. to add "JsonConfig tracking category" at dewiki (phab:T378352). More about it at Commons:Village_pump/Technical#Upload_Wizard_very_slow. ∞∞ Enhancing999 (talk) 21:15, 6 November 2024 (UTC)
- Something discovered only 2 weeks ago. It's getting fixed. —TheDJ (talk • contribs) 09:34, 11 November 2024 (UTC)
- "Aren't donations to WMF to run the servers and pay for MediaWiki developments?" No. They are to support the foundation, which has as its mission: 'to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally'. Handing out merch is a very small gesture that is relatively cheap to do and a gesture of appreciation from community to some of its members. Keeping the site up and developing the software is another (infinitely more expensive) part of that. —TheDJ (talk • contribs) 09:43, 11 November 2024 (UTC)
- Making such an announcement while having a request for Oversight rights running is a bit odd as it looks like you would try to buy votes. GPSLeo (talk) 16:23, 4 November 2024 (UTC)
November 05
November 06
{{TOO-US}}
When are we actually supposed to use this template?--Trade (talk) 12:24, 6 November 2024 (UTC)
- I don't think it is ever the only alternative, but it can clarify the reason something is {{PD-ineligible}} in the U.S. - Jmabel ! talk 19:29, 6 November 2024 (UTC)
- US law is the “default” on Commons because Commons is hosted there. I’m not sure what purpose this has. Dronebogus (talk) 12:17, 9 November 2024 (UTC)
- It's one of the oldest license templates we have and still used by many older files. --Rosenzweig τ 12:12, 11 November 2024 (UTC)
November 07
November 08
Implicit dual-licensing
Commons:Deletion requests/Files found with "with an active link required" recently concluded that if somebody CC-licences a photo and specifies additional restrictions on its usage, this is meaningless, and all they've actually done is dual-license it. Anybody who wants to reuse the image can choose the base CC licence and ignore the additions because any condition provided for outside of the license is not part of the license and does not constitute an additional restriction.
Should we put an explanatory template on such files? Commons visitors would be forgiven for assuming that such conditions were additional restrictions, possibly in Commons' voice, that had to be obeyed. Belbury (talk) 11:07, 8 November 2024 (UTC)
- Do we need to retain the text describing the non-free license at all? If we're confident that the files can be reused under a CC license, we shouldn't need to retain information about alternate licensing terms. Omphalographer (talk) 04:13, 9 November 2024 (UTC)
- Commons:Multi-licensing says to retain this kind of thing, that Commons "tries to preserve mention" of overly restrictive licences (such as non-commercial ones) when they're multi-licenced alongside a valid free one. Belbury (talk) 18:55, 13 November 2024 (UTC)
November 09
November 10
Copy captions and alt texts from Wikisource?
Would it be possible to automatically copy the captions and alt descriptions added to a book transcription onto the pages of the corresponding Commons files? Example in links. HLHJ (talk) 22:52, 10 November 2024 (UTC). (edited HLHJ (talk) 15:33, 11 November 2024 (UTC))
November 11
Commons mentioned in Hyperallergic
When Copyright Transforms the Right to Remember at Hyperallergic. Subtitle: "Images of “We Are Our Mountains,” an Armenian monument in occupied Artsakh, have disappeared from Wikimedia Commons in the months since Azerbaijan’s invasion."
Doesn't look like there's anything to be done. Artwork created in the USSR, then in [an area internationally regarded as part of] Azerbaijan, which only has non-commercial FOP. But some legal speculation in the article that may be worth discussing. — Rhododendrites talk | 00:26, 11 November 2024 (UTC)
- All I can see is an empty white screen. Seems they're hyperallergic against Firefox. -- Herbert Ortner (talk) 09:30, 11 November 2024 (UTC)
- Better discuss this at Commons talk:Freedom of panorama, where it is currently being discussed. The topic forum is becoming fragmented (also including a message on my talk page). JWilz12345 (Talk|Contributions) 01:42, 13 November 2024 (UTC)
For those wondering why you got unsubscribed from commons-l...
First, I am sorry. It was me, hastily clicking "confirm" to remove all subscribers instead of the specific user I wanted to remove.
[06:22:19] <revi> oh shit
[06:23:07] <revi> I just clicked "remove all members" for commons-l and mindlessly clicked "confirm", would it be possible to undo... this catastrophy?
Yeah, I am stupid. Mea culpa. What I wanted to do was "unsubscribe that fakemailgenerator user", but I ended up clicking "remove all" instead of "remove selected".
I filed a task to see if WMF can undo my grave mistake. Again, I am sorry for all those confused.
After calming myself down, I took a second look at the subscriber lists, and it seems like... I closed the browser fast enough to stop it from truly removing everyone, so people whose email addresses start with K (and later in the Latin alphabet) survived, but A to K were affected.
Well, those of you who received this in your inbox are probably unaffected, so... if someone asks, tell them to resubscribe or wait to see if WMF can resubscribe them. :P
(Pasted from my posts to commons-l)
Yes, I am certified to be stupid at this point. Sorry for those who got unsubscribed. — regards, Revi 06:51, 11 November 2024 (UTC)
- I think you could blame the interface. ∞∞ Enhancing999 (talk) 07:05, 11 November 2024 (UTC)
- Maybe, but I should have read that RED button more carefully. :-p — regards, Revi 07:21, 11 November 2024 (UTC)
- Note: Database got rolled back and (unless you manually subscribed again) you were automatically re-subscribed with your preferences intact. (If you manually re-subscribed, your preferences are not restored.) — regards, Revi 08:56, 14 November 2024 (UTC)
Broken link for share-alike clause in Template:Cc-by-sa-3.0, Template:Cc-by-sa-2.0, ...
The same problem also affects Template:Cc-by-sa-2.0, Template:Cc-by-sa-4.0,3.0,2.5,2.0,1.0 (and related CC templates), showing this SA clause (and in many translations of these templates, if they use wikilinks with the prefix alias instead of the canonical prefix, or plain external links; see Category:CC license tags). The problem is apparently caused by externally translated messages such as {{int:Wm-license-cc-conditions-share alike-text (Cc-by-sa-2.0)}}. Everything is admin-protected and cannot be fixed (the problem is also present in the English translation source).
- See also Template talk:Cc-by-sa-3.0#Broken link for share-alike clause, for the 1st template that I detected and where I 1st signaled it.
So this affects A LOT of file description pages on Commons, whose licencing conditions are NOT displayed correctly. verdy_p (talk) 08:09, 11 November 2024 (UTC)
- Someone accidentally added an extra line break on MediaWiki:Wm-license-cc-conditions-share alike-text/fr. Already fixed on Translatewiki, but might take some time to show here so I modified the local version for now. Multichill (talk) 17:47, 11 November 2024 (UTC)
Unknown station and historic coaches in Denmark
Where could this be? It is along an electric line in Denmark. I suppose this is a historic collection of coaches used for some special train. Smiley.toerist (talk) 11:23, 11 November 2024 (UTC)
- The picture was taken at Hillerød Station. The historic coaches belongs to Nordsjællands Jernbaneklub. --Dannebrog Spy (talk) 12:10, 11 November 2024 (UTC)
Copyright for street poster?
I have taken a photo of a pro-Palestine poster that was hanging outside in the street, it is not visible who created the poster. Am I allowed to upload it or is there a copyright for the poster? Supreme Deliciousness (talk) 14:17, 11 November 2024 (UTC)
- There can be: it all depends on where the poster was hung. E.g. in the United States, the threshold of originality is higher than in Australia, so a design has to be more complex before it is even able to be copyrighted in America than down under. —Justin (koavf)❤T☮C☺M☯ 14:21, 11 November 2024 (UTC)
- It was in Amsterdam, NL. The design is not complex. Simple political, caricature and slogan, no author name. can I upload it? --Supreme Deliciousness (talk) 14:37, 11 November 2024 (UTC)
- I recommend you take a look at Commons:Copyright rules by territory/Netherlands and use your best judgement. If you make a mistake, it's not the end of the world. Someone making a good faith upload that happens to be a copyright violation will not face some kind of punishment: the file just gets deleted and we all move on. Looking forward to seeing it. —Justin (koavf)❤T☮C☺M☯ 14:52, 11 November 2024 (UTC)
- Even if there is a copyright for the poster (and a caricature is probably artistically complex enough to incur copyright), the Netherlands has some Freedom of the Panorama, so if you photographed it from a public street it's probably okay. It's not as though you are likely to be infringing on commercial profits or otherwise harming the author; the author probably wants people to see their poster and register their protest, and does not want or expect to make money off selling posters. This is actually a legal consideration. HLHJ (talk) 15:04, 11 November 2024 (UTC)
- I am attempting to upload now, and when I add "FoP-Netherland" with "{{}}" it still says "The wikitext you entered doesn't contain a valid license template." --Supreme Deliciousness (talk) 15:55, 11 November 2024 (UTC)
- Unfortunately, these arguments cannot be taken into account. Copyright is also granted for any kind of advertisement --PantheraLeo1359531 😺 (talk) 15:13, 11 November 2024 (UTC)
- Depending on the jurisdiction, they can. For instance, in Canada, while ads are copyright, almost any third-party reproduction of them is legal.[4] I think there was even a case where someone posted a movie trailer on YouTube without comment, and the movie-maker sued (because the trailer was awful and people mocked it), and their claim got rejected on the grounds that reposting an ad was fair dealing. And the international Berne 3-part test is pretty much those considerations (but does not mention the rights of non-copyright holders, which are important in Canadian law). HLHJ (talk) 15:51, 11 November 2024 (UTC)
I have uploaded it. https://commons.wikimedia.org/wiki/File:Pro-Palestinian_Resistance_Poster_Amsterdam.jpg#%7B%7Bint%3Alicense-header%7D%7D If there is something wrong, please fix it. --Supreme Deliciousness (talk) 16:21, 11 November 2024 (UTC)
- To answer your question, this photograph is fine. The relevant ruling is here: COM:FOP Netherlands. Specifically these parts:
- It is not an infringement of copyright to reproduce and publish pictures of a work, as meant in article 10 (...) which are made to be permanently located in public places, as long as the work is depicted as it is located in the public space.
- With regards to "permanent": Article 18 is limited to works that were originally made for being placed permanently in public places. The literature mentions that this would also apply to graffiti, even if these normally are removed rather quickly. This is consistent with the interpretation of "permanent" e.g. in Germany as explained here; the "natural lifetime" of a graffito is considered to end with its removal.
- While a poster is not the same as graffiti, the same principle applies. This is also extended to things like public advertisements, which also feature copyrighted material.
- You start entering risky territory, copyright-wise, when you divorce this image from its context (this being the public space). For instance, if you made a derivative version of the poster that's a .svg of the graphical elements of the poster, those are probably still copyrighted and may be deleted, despite being a derivative of a free image, even if no author is known. ReneeWrites (talk) 23:39, 11 November 2024 (UTC)
Multilingual signature design
I would like to design my signature myself and would like the word for the talk page to be adapted to the language set. Which code do I have to use in the wikitext? --KimKelting (talk) 16:31, 11 November 2024 (UTC)
- KimKelting, see WP:CUSTOMSIG for details. Note that it is on English Wikipedia. Ratekreel (talk) 18:19, 11 November 2024 (UTC)
- Oh, sorry, disregard the above reply. You can add {{int:Talkpagelinktext}}, which will display the word for talk page according to language. Ratekreel (talk) 18:26, 11 November 2024 (UTC)
- When I enter this, it makes {{SUBST:int:Talkpagelinktext}} out of it. KimKelting (talk) 07:37, 12 November 2024 (UTC)
November 12
Charts extension is about to be deployed
Hey everyone,
As a heads up, WMF is preparing to deploy the Chart extension to Commons the week of Nov 25th, 2024, with deployment to pilot wikis soon after. Charts are already enabled on testwiki and testcommons, where you can find the documentation. The extension has been designed to use the Commons Data namespace as the central store for definitions and datasets, making it easy to include a chart on any wiki.
We know that visibility into pages in the Data namespace is low, creating gaps in the current ability to patrol it. While the initial deployment to pilot wikis should be minimally disruptive, we are considering improvements to the Data namespace that would help make storing charts on Commons sustainable in the long run. We're open to suggestions about what other improvements you’d like to see and we are available to answer any questions you have about the deployment.
Thanks in advance for your help! -- Sannita (WMF) (talk) 10:32, 12 November 2024 (UTC)
- @Sannita (WMF) Thanks for the info. The main reason the Data namespace is flying under the radar of most Commons users is almost certainly that it doesn't work with Categories (see phab:T242596). Categories are our main way of organizing Media. If Data: pages are not showing up in Categories, for most people over here they might just as well not exist at all. Reason number 2 would be the lack of Structured Data integration (phab:T235332), which is somewhat surprising given how much Structured Data has been pushed by WMF/WMDE in the past. Don't you folks talk to each other across teams? El Grafo (talk) 18:58, 13 November 2024 (UTC)
- You mentioned testcommons, but testcommons shows that it is disabled and editing there is not possible. And one question: if the community decides to block anonymous users from editing charts, can this be done through a config change, or do we need to create an AbuseFilter? GPSLeo (talk) 19:23, 13 November 2024 (UTC)
Charts built with OECD Data
Hello,
I have created an updated version of this chart: https://commons.wikimedia.org/wiki/File:Tax_revenue_as_a_percentage_of_GDP_(1985-2014).png It depicts data from OECD Data Explorer.
According to https://www.oecd.org/en/about/oecd-open-by-default-policy.html this data, which was published before 1 July 2024, is "generally available for commercial and non-commercial purposes on terms similar to CC BY 4.0."
The Terms & Conditions linked state:
You must give appropriate credit to the OECD by using the citation associated with the relevant Data, or, if no specific citation is available, you must cite the source information using the following format: OECD (year), (dataset name),(data source) DOI or URL (accessed on (date)). When sharing or licensing work created using the Data, you agree to include the same acknowledgment requirement in any sub-licenses that you grant, along with the requirement that any further sub-licensees do the same.
How would I correctly label this work in the upload wizard? It contains the work of others (the data by the OECD), but that data is not licensed under one of the free licenses (only a "similar" one).
Is it enough to label the data as licensed under a free license, publish under CC BY 4.0 and add a source in the summary? — Preceding unsigned comment added by Aryezz (talk • contribs) 11:34, 12 November 2024 (UTC)
- Is it enough to label the data as licensed under a free license, publish under CC BY 4.0 and add a source in the summary? I think the answer is yes.
- Moreover, I'd be interested in whether one is required to use data that is explicitly PD/CCBY for charts – I think one could also use other data for the creation of datagraphics as long as the image is CCBY (eg due to being self-made). Prototyperspective (talk) 18:47, 12 November 2024 (UTC)
- @Aryezz, yes, as Prototyperspective says, data by itself is not copyrightable. As long as only the data and not its original presentation, format, style or literal wording are used, data can be taken even from completely non-free sources (let's say, for example, Encyclopædia Britannica). MGeog2022 (talk) 14:35, 17 November 2024 (UTC)
- You must give appropriate credit to the OECD
- By this, they mean that you should mention the OECD as the origin of the data. Even if they try to place additional restrictions on the usage of publicly available data, I doubt it can have any legal validity. For example, if in a non-freely licensed publication you say that country X has a population of 1 million, you can't restrict third parties from using that information in any way they want, even if you try to put those kinds of restrictions in written form. I believe the only exception to this would be confidential information. MGeog2022 (talk) 14:41, 17 November 2024 (UTC)
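The credit line required by the OECD terms quoted above follows a fixed pattern, "OECD (year), (dataset name), (data source) DOI or URL (accessed on (date))", so it can be filled in mechanically, for example in the source field of the file description. A minimal Python sketch, assuming a space after the comma that the quoted pattern omits; the function name and all example values are hypothetical placeholders, not a real OECD dataset. It only builds the wording of the credit and says nothing about whether the OECD terms count as a free license.

```python
def oecd_citation(year, dataset_name, data_source, doi_or_url, accessed_on):
    """Build an acknowledgment in the pattern quoted from the OECD terms:

    OECD (year), (dataset name), (data source) DOI or URL (accessed on (date)).
    """
    return (
        f"OECD ({year}), {dataset_name}, {data_source} "
        f"{doi_or_url} (accessed on {accessed_on})."
    )


# Placeholder values for illustration only.
print(oecd_citation("2024", "Example dataset", "OECD Data Explorer",
                    "https://example.org/dataset", "12 November 2024"))
```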
Why categories "London by topic" and "Porto by topic" act differently #2
This follows Commons:Village pump/Archive/2024/09#Why categories "London by topic" and "Porto by topic" act differently
@JotaCartas, Jmabel, and Joshbaumgartner: I investigated. The two categories London by topic and Porto by topic both include {{Country category}}. This uses {{Country category/data}} with by=topic and name=London or name=Porto. {{Country category/data}} collects the information that {{Country category/layout}} finally displays. The problem is that {{Country category/data}} uses {{Country label}} many times, which in turn calls {{Country label/K}} for London (where a row exists) and {{Country label/N}} for Porto (where a row doesn't exist). ZandDev (talk) 13:05, 12 November 2024 (UTC)
November 13
Project scope: question concerning videos
Hello,
I have a question, or a request for opinions, about our project scope concerning video files. While working on license reviews, I now and then come across video files without sound; at the source (like YouTube), the clips do have sound. I do not know in every case why the audio was removed; it was likely done to avoid copyright infringement. I challenged one of these files with a deletion request for being out of scope due to a lack of educational usefulness. This opinion seems to be challenged by Green Giant, among others, in this discussion. On this deletion request page, there are already clashing opinions, with Srittau supporting the notion of a lack of usefulness.
I, for my part, do think that subtitles are not enough to lift a tampered video with its sound removed over the threshold of educational usefulness. I'd rather have a nicely curated media repository than a heap of data with little usefulness, even if this means that the amount of video data on Commons gets reduced as a result. There is no point in removing useful data: vocal information may, e.g., serve people endeavouring to learn a language, more so than subtitles. Of course, videos that were already published without sound as a conscious decision by the videographer would still be allowable. What does the majority think? Shall video clips that have sound at the source, but had it removed to avoid copyright issues, be unconditionally seen as in scope (barring other issues), or is the sound removal a valid reason for deletion? Regards, Grand-Duc (talk) 03:25, 13 November 2024 (UTC)
- I also think if a video is published under a CC license and we challenged the legitimacy of this claim for the audio I would also not trust this claim for the video. In most cases I would delete the entire video per COM:PCP. If there are explicitly separate licenses for video it is something different. In such cases I would keep the video only version. GPSLeo (talk) 07:07, 13 November 2024 (UTC)
- is the sound removal a valid reason for deletion No, it is not. Exceptions include if the audio is an essential part of the video (and with no plausible substitution any time soon). Prototyperspective (talk) 07:16, 13 November 2024 (UTC)
- Actually the source is not under a free license. So the issue is not scope, but copyright. Yann (talk) 09:55, 13 November 2024 (UTC)
- More generally, the only cases where the video is OK but not the sound are old films with a new soundtrack. I have never seen a recent free video with a copyrighted sound. Yann (talk) 09:57, 13 November 2024 (UTC)
- There are lots of videos with nonfree sound that have their sound muted (including recent ones). Good time to mention that somebody should take care of Category:Videos containing non-free audio as well as the other cat linked there. It can be a bit more difficult to fix in an optimal way when only parts of videos extensively contain nonfree audio while other parts contain useful speech audio that would be good to keep. Prototyperspective (talk) 10:01, 13 November 2024 (UTC)
- I've seen plenty. A common one is conference presentations where the conference video was released as CC-by-sa 4.0, but where the conference organizer had copyrighted intro/outro/background music at the venue that nobody had considered. —TheDJ (talk • contribs) 11:21, 13 November 2024 (UTC)
- In such a case I would cut away the break entirely. If there is a speaker and some music is audible from the neighbouring room, it would fall under de minimis. GPSLeo (talk) 07:43, 14 November 2024 (UTC)
- I also think that this particular video is not very useful this way. And even with subtitles, it is questionable AND you are modifying the video to a level that materially alters it, while not being very distinct from the original. Japan has moral rights, which means that the author is allowed protection of the integrity of the work. I think it can be argued that that integrity get pretty broken down here and I think it is not a good look for our project. —TheDJ (talk • contribs) 11:26, 13 November 2024 (UTC)
- @TheDJ: If it is free-licensed in a way that allows derivative works, "integrity of the work" would seem moot. - Jmabel ! talk 18:34, 13 November 2024 (UTC)
- I'd like to add a clarification of my ideas that seems to be necessary. In my opinion there are two different crowds of Commons contributors, of course with large overlaps: uploaders and maintainers. The maintainers take care of operations like license reviewing, file moving, categorization and so on. I do see an obligation on the uploader crowd to provide good-quality data so as not to unnecessarily add to the maintainers' workload. Completely removing audio to filter out possible copyright infringements by the original videographer on media like interviews or vocal explanations is not a suitable way of working, I dare say. I'd rather have fewer videos than clutter our repository with media of dubious usability at best, which will hide the good works in their mass. Could this be worked into an RFC or policy? Regards, Grand-Duc (talk) 00:05, 14 November 2024 (UTC)
- As a general matter: there is probably a lot less user review of audio/video uploads to Commons than there should be. Reviewing video content requires dramatically more time and effort than reviewing images; even with the smaller number of files being uploaded, many are probably not getting viewed at all. Omphalographer (talk) 20:16, 14 November 2024 (UTC)
Tramtype Wroclaw
Unfortunately there are no Wikipedia articles which list the tram numbers of the tram types. I'm looking for 2242. It looks like a Konstal 105Na, but I am not certain. Smiley.toerist (talk) 10:59, 13 November 2024 (UTC)
- Solved, I found a close number (2250) in File:Konstal 105Na, -2250, MPK Wrocław (35054236092).jpg.Smiley.toerist (talk) 11:05, 13 November 2024 (UTC)
Long-term disputes on various wikis involving a cross-wiki IP author
There are numerous disputes involving an IP user indulging in cross-wiki spam, particularly on articles about West Germanic varieties. I have been hounded for a while.
The probable IP addresses include:
- 2003:de:3717:716f:e95b:e6c7:5bb:48f5
- 2003:DE:370C:38E4:4448:5249:EA82:E5FA
- 2003:DE:3717:718E:65C8:BEBB:58D6:1D36
- 2003:DE:3717:716F:5DCE:8967:6BA9:C376
- 2003:DE:3700:A013:B8D1:4127:BE29:FBC6
https://en.wiktionary.org/wiki/Special:Contributions/2003:DE:370C:38E4:4448:5249:EA82:E5FA has a current block. This probably is the same person. A particular hobby of this user is to revert me on wiktionary, if I write that Hollandic isn't part of Low German. What shoukl — Preceding unsigned comment added by Sarcelles (talk • contribs) 17:46, 13 November 2024 (UTC)
- @Sarcelles: Is this some sort of request for administrative action? If so, it belongs on the appropriate Administrators' noticeboard, not on the Village pump. Conversely, if it is something you are just bringing up for general discussion, I don't know what you want discussed. - Jmabel ! talk 18:37, 13 November 2024 (UTC)
- None of these accounts have edited in recent weeks, some not in as long as half a year, so it is hard to imagine what anyone can do about this at this point. - Jmabel ! talk 18:40, 13 November 2024 (UTC)
- 2A01:599:30A:8340:4A39:F118:FF32:1257 is a recently used reincarnation. Sarcelles (talk) 18:45, 13 November 2024 (UTC)
- https://en.wiktionary.org/wiki/Special:Contributions/2003:DE:371A:22A6:78F9:E411:9550:9ED4
- the block log says:
- 8.11.2024, 21:12:36: Surjection blocked 2003:DE:0:0:0:0:0:0/32 (block log), expiring 8.12.2024, 21:12:36 (Abusing multiple accounts/block evasion: 2003:DE:371A:22A9:319A:E2C4:1B5A:C283)
- 5.11.2024, 06:03:47: Surjection blocked 2003:DE:3710:0:0:0:0:0/44 (block log), expiring 18.11.2024, 21:40:20 (Disruptive edits: xwiki povpushing: see w:Wikipedia:Sockpuppet investigations/Naramaru) Sarcelles (talk) 20:25, 13 November 2024 (UTC)
- https://en.wiktionary.org/wiki/Special:Contributions/2003:DE:371A:22A9:319A:E2C4:1B5A:C283
- 8.11.2024, 21:12:36: Surjection blocked 2003:DE:0:0:0:0:0:0/32 (block log), expiring 8.12.2024, 21:12:36 (Abusing multiple accounts/block evasion: 2003:DE:371A:22A9:319A:E2C4:1B5A:C283)
- 5.11.2024, 06:03:47: Surjection blocked 2003:DE:3710:0:0:0:0:0/44 (block log), expiring 18.11.2024, 21:40:20 (Disruptive edits: xwiki povpushing: see w:Wikipedia:Sockpuppet investigations/Naramaru) Sarcelles (talk) 20:49, 13 November 2024 (UTC)
- https://commons.wikimedia.org/w/index.php?title=File%3ADeutsche_Mundarten.png&diff=948595578&oldid=946447257 was a removal of the deletion message, probably by the same IP. Sarcelles (talk) 20:22, 14 November 2024 (UTC)
- Someone being blocked on Wiktionary is neither here nor there if they haven't edited recently on Commons.
- https://commons.wikimedia.org/w/index.php?title=File:Deutsche_Mundarten.png&diff=next&oldid=946447257 is problematic, but it's the only edit from that IP. Blocking an IP that was used once doesn't do anything except take up the time of the admin who blocks it. - Jmabel ! talk 21:37, 14 November 2024 (UTC)
- It can be anticipated that this author will continue to be active on several wikis, including Commons. I think this is a good place to discuss this cross-wiki spam. On en.wiktionary I have been removing numerous typical edits by this user. Sarcelles (talk) 14:29, 16 November 2024 (UTC)
- Whatta bunch of nonsense … -- MicBy67 (talk) 00:14, 17 November 2024 (UTC)
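For context on the range blocks quoted above: a block on 2003:DE:0:0:0:0:0:0/32 covers every address whose first 32 bits match, while the /44 block covers only 2003:DE:3710:: through 2003:DE:371F:FFFF:FFFF:FFFF:FFFF:FFFF. A minimal Python sketch using the standard ipaddress module shows which of the addresses listed in this thread fall under which of the two Wiktionary range blocks. The helper name matching_ranges is hypothetical, and the sketch ignores block expiry dates.

```python
import ipaddress

# The two ranges from the en.wiktionary block log quoted in this thread.
BLOCKED_RANGES = [
    ipaddress.ip_network("2003:DE::/32"),
    ipaddress.ip_network("2003:DE:3710::/44"),
]


def matching_ranges(ip: str) -> list[str]:
    """Return the blocked ranges (as strings) that contain the given address.

    Hypothetical helper for illustration; it does not consider expiry dates.
    """
    addr = ipaddress.ip_address(ip)
    return [str(net) for net in BLOCKED_RANGES if addr in net]


if __name__ == "__main__":
    # Addresses mentioned in this thread.
    for ip in [
        "2003:DE:370C:38E4:4448:5249:EA82:E5FA",
        "2003:DE:3717:718E:65C8:BEBB:58D6:1D36",
        "2003:DE:371A:22A6:78F9:E411:9550:9ED4",
        "2003:DE:3700:A013:B8D1:4127:BE29:FBC6",
        "2A01:599:30A:8340:4A39:F118:FF32:1257",
    ]:
        print(ip, matching_ranges(ip) or "not covered")
```

Run as-is, it should report the 370C and 3700 addresses as covered only by the /32, the 3717 and 371A addresses by both ranges, and the 2A01:599:... address by neither.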
Parking assistants category?
The lady in the picture on the right is basically a replacement for the parking machine: she takes payment for parking, indicates where there are available places, and stops the traffic when a car needs to park in or out. She is likely employed by the municipality. Is there a proper name for this type of profession? Do we have a category describing this activity? Ymblanter (talk) 21:24, 13 November 2024 (UTC)
- I think this fits: Category:Parking marshals. It also links to this category: Category:Traffic wardens. ReneeWrites (talk) 22:21, 13 November 2024 (UTC)
- Great, thanks. Ymblanter (talk) 08:11, 14 November 2024 (UTC)
November 15
Audio files made by Flame, not lame
The audio files made by this user are credited to a (now) nonexistent user "Flame" because of the comma in her username. Rodrigo5260 (talk) 03:24, 15 November 2024 (UTC)
- Flame, not lame.
- Example File:LL-Q1860 (eng)-Flame, not lame-all-out.wav.
- @Rodrigo5260: Not sure what you mean by "detected". Are you talking about the wrong "recorder" credit, or is there more to this? - Jmabel ! talk 03:40, 15 November 2024 (UTC)
- Yes, that, and that forces me to edit it manually, which takes a lot of time. Rodrigo5260 (talk) 03:41, 15 November 2024 (UTC)
- @Jmabel forgot this. Rodrigo5260 (talk) 04:20, 15 November 2024 (UTC)
- So presumably a problem somewhere in Template:Lingua Libre record. User:0x010C who started that seems to be more or less gone. @Lucas Werkmeister: any thoughts on this, or on who might need to be brought into the discussion? - Jmabel ! talk 05:06, 15 November 2024 (UTC)
- I don’t understand the problem yet. The speaker and recorder are both "User:Flame, not lame", right? And the author link goes to User:Flame, not lame, which is an existing user (redlink notwithstanding). Is the problem just that the link text is given as "Flame" instead of "Flame, not lame"? Lucas Werkmeister (talk) 19:13, 15 November 2024 (UTC)
- Yes, it is. Rodrigo5260 (talk) 02:12, 16 November 2024 (UTC)
- I think it's standard wikitext behaviour.
- [[Commons:Bla, bla|]]
- is converted to
- [[Commons:Bla, bla|Bla]]
- So it's a bug in the lingualibre upload tool.
∞∞ Enhancing999 (talk) 12:17, 16 November 2024 (UTC)
- Indeed, the file’s source wikitext says:
| author = [[User:Flame, not lame|Flame]]
So the template is rendering that link faithfully. If it’s true that the Lingua Libre uploader is relying on the pipe trick, then it should be changed to not do that (and just remove the User: prefix from the link text explicitly). Lucas Werkmeister (talk) 16:08, 16 November 2024 (UTC)
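The difference between relying on the pipe trick and building the link text explicitly can be sketched in Python. This is an illustration under stated assumptions, not the Lingua Libre uploader's actual code: pipe_trick_label is a hypothetical, simplified emulation that models only two of the pipe trick's rules (drop a leading "Prefix:" and drop everything from the first comma), and user_link is a hypothetical helper that removes only the User: prefix, as suggested above.

```python
def pipe_trick_label(target: str) -> str:
    """Simplified emulation of the label the pipe trick produces for [[target|]].

    Only two rules are modeled (hypothetical helper): a leading "Prefix:" is
    dropped, then everything from the first comma onward is dropped.
    """
    label = target.split(":", 1)[-1]  # "User:Flame, not lame" -> "Flame, not lame"
    label = label.split(",", 1)[0]    # "Flame, not lame" -> "Flame"
    return label.strip()


def user_link(username: str) -> str:
    """Build an author link whose label keeps the full username.

    Hypothetical helper: only the "User:" prefix is removed, so commas survive.
    """
    target = f"User:{username}"
    label = target.removeprefix("User:")  # requires Python 3.9+
    return f"[[{target}|{label}]]"


if __name__ == "__main__":
    print(pipe_trick_label("User:Flame, not lame"))  # the truncated credit
    print(user_link("Flame, not lame"))              # the explicit alternative
```

The first call reproduces the truncated "Flame" credit seen on the file pages; the second yields [[User:Flame, not lame|Flame, not lame]].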
November 16
Photo challenge September results
Rank | 1 | 2 | 3 |
---|---|---|---|
Title | Fare gates at Stevens MRT station in Singapore, including a wider gate for priority users | Wheelchair ramp, Confey Railway Station, Ireland. | Wheelchair racer during Paralympic Games 2024 |
Author | S5A-0043 | Leimanbhradain | Ibex73 |
Score | 9 | 9 | 8 |
Rank | 1 | 2 | 3 |
---|---|---|---|
Title | Old town of Meißen, roof of the house at Markt 3 | Workers re-doing the artistic roof line on a thatched cottage | Wooden shingle roof of the Frohnauer Hammer (Saxony) |
Author | Kora27 | Cbuske46 | YvoBentele |
Score | 19 | 18 | 8 |
Congratulations to S5A-0043, Leimanbhradain, Ibex73, Kora27, Cbuske46 and YvoBentele. -- Jarekt (talk) 15:09, 16 November 2024 (UTC)
How do you nominate .djvu pages for deletion?
Currently I cannot find any way to link to individual pages. Only the .djvu file as a whole can be linked. --Trade (talk) 17:16, 16 November 2024 (UTC)
- Then, a suggestion: nominate the whole file and name the pages that you deem problematic. Regards, Grand-Duc (talk) 17:35, 16 November 2024 (UTC)
Issues with interwiki
Should Category:4th-century people of France and Category:4th-century Frankish people be linked to each other? Trade (talk) 19:32, 16 November 2024 (UTC)
- You can always use a hat note to explain the relationship, rather than go through Wikidata to say that they represent exactly the same concept. - Jmabel ! talk 20:11, 16 November 2024 (UTC)
- I don't know much about the history of France. Trade (talk) 23:10, 16 November 2024 (UTC)
- I do think having "-century people of" categories for countries that didn't exist until several centuries later is something we need to take a look at. Trade (talk) 23:15, 16 November 2024 (UTC)
- Everybody knows w:Charlemagne had a Belgian passport, not a French one ;)
∞∞ Enhancing999 (talk) 11:17, 17 November 2024 (UTC)
Cisgender
I could take this to a CfD, but I think this needs more attention than that typically gets. Starting (I believe) 2024-10-12, Web-julio introduced several categories such as Category:Cisgender people, Category:Cisgender women, and Category:Cisgender men. Given what a high percentage of humans are cisgender, this strikes me as a very ill-conceived direction to go, like having a category for "four-limbed British admirals" or "songs with fewer than 12 verses". I think this should be turned back before we find ourselves extending this to well over 95% of our content that involves humans.
I ran across this when Web-julio recently added Category:Cisgender women as a parent of Category:Cecilia Augspurger.
As I've said many times: the purpose of categorization is not an abstract exercise in ontology. It is to help people find appropriate media. - Jmabel ! talk 20:23, 16 November 2024 (UTC)
- I agree, stick to the simplest term. --RAN (talk) 20:35, 16 November 2024 (UTC)
- Delete these categories. modern_primat ඞඞඞ ----TALK 20:58, 16 November 2024 (UTC)
- Delete per nom. This user's behaviour with regards to categories warrants a closer look in general. He has created over 500 categories in the last 5 days, almost all pertaining to very specific or overly-broad categories about sex and gender, and about Pokémon, including the genders of Pokémon. ReneeWrites (talk) 23:26, 16 November 2024 (UTC)
- Keep. If there's Category:Male humans by eye color, including the eye colors of the population majority, then there should be cisgender categories too. Also, if people are not placed in these categories, they lose their gender categories, since that is how they are set up on Wikidata. See this listeria list. Web-julio (talk) 23:28, 16 November 2024 (UTC)
- Also, I do have criteria for cisgender inclusion. Not every non-trans person self-identifies as cisgender, and if reliable sources exist for people specifically identifying as cisgender, they should be respected. Web-julio (talk) 23:29, 16 November 2024 (UTC)
- Delete per nom. No single eye color is found in >99% of the population. MBH 02:07, 17 November 2024 (UTC)
- Nor gender modalities. Web-julio (talk) 02:19, 17 November 2024 (UTC)
- @Web-julio: I strongly urge you not to continue editing in this direction while this discussion plays out. So far, literally everyone else who has weighed in here disagrees with you, and there is a very strong chance you are editing against a general consensus. - Jmabel ! talk 02:10, 17 November 2024 (UTC)
- But did I? I didn't add anyone else to cisgender categories after this discussion started. And they had few subcats anyway. Web-julio (talk) 02:12, 17 November 2024 (UTC)
- @Web-julio: I didn't say you did, but your comments here seem to be dismissive of what others are saying, so I considered it best to warn you not to walk out on the thin ice. - Jmabel ! talk 05:48, 17 November 2024 (UTC)
- @Jmabel Well, when you commented I was arguing alone; I didn't reply to anyone except the nominator. Actually, Renee's comment only showed up after I had replied, and modern_primat's comment is just a !vote. No one argued against my comments specifically; the one being dismissed is me. Anyway, let me address ReneeWrites' comment: she criticized my category creation in general, including the Pokémon-related categories, which I expanded on. The phrase "almost all pertaining to very specific or overly-broad categories" says a lot: it shows I don't have a pattern, because in fact all categories are either specific or broad, so I guess this is good or indifferent. As for "including the genders of Pokémon", Wikidata is even more hyperspecific (thanks OmegaFallon); I didn't even create categories for gender ratios (such as 12.5% male, 87.5% female gender ratio (Q116752968) and 75% male, 25% female gender ratio (Q116752957)). However, are my contributions in general being discussed, or the cisgender people categories specifically? I'd like to know what I'm defending. Web-julio (talk) 06:07, 17 November 2024 (UTC)
- I can't vouch for what Renee is criticizing, but my issue is about the "cisgender" categories. I think my initial comment above is perfectly clear, so I don't see any need to elaborate. - Jmabel ! talk 06:17, 17 November 2024 (UTC)
- You have an issue, but didn't argue. I was just explaining why I created them, yet you had an issue with my explanation too. ¯\_(ツ)_/¯ Web-julio (talk) 18:57, 17 November 2024 (UTC)
- Delete These are both not defining and also not helpful for actually finding media, plus they will inevitably result in all kinds of weird nonsense with users having pet theories about how a certain ancient Roman orator may have had whatever gender tendencies and other bizarre retroactive fiction. Categorizing by various other gender identities is sensible and useful (and itself fraught enough), but it's actually probably rarer for someone to make "being cisgender" a core part of that person's public persona than being transgender. The whole exercise is probably well-intentioned at its outset, but deeply flawed in implementation, and users should definitely seek consensus or discussion before even attempting such a radical overhaul of the categorization system. —Justin (koavf)❤T☮C☺M☯ 06:30, 17 November 2024 (UTC)
- Delete Trying to duplicate the Wikidata database in Commons categories is always a bad idea. Categories are for the most important links; everything else is a task for Wikidata and Wikipedia. GPSLeo (talk) 07:58, 17 November 2024 (UTC)
- Delete per nom. I just think the emphasis should be on "exercise" in the last paragraph of the explanation, and GPSLeo's comment could also be meant and/or understood in (imo) flawed ways: duplicating Wikidata entirely or indiscriminately is a problem, but at the same time duplicating it redundantly by hand is also an issue, which is why some (not all) properties/data should be synced somehow (such as Category:Free software programmed in C++, which could readily be populated via WD data and vice versa). Prototyperspective (talk) 11:39, 17 November 2024 (UTC)
Inflation calculator template
Can we migrate en:Wikipedia:Template:Inflation and the subtemplates to Commons and Wikisource? We host news articles that have money values that have no context until adjusted into today's dollars. When I read that something was $100 in 1900, I have no idea if that is a lot or a little. RAN (talk) 20:32, 16 November 2024 (UTC)
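The arithmetic behind this kind of adjustment is typically a ratio of price-index values: adjusted amount = original amount × index(target year) ÷ index(original year). A minimal Python sketch follows; adjust_for_inflation is a hypothetical name, and the index values in the example are made-up placeholders, not real CPI figures. The actual index data presumably lives in the subtemplates mentioned above and would have to be migrated along with them.

```python
def adjust_for_inflation(amount: float, index_then: float, index_now: float) -> float:
    """Scale a historical amount by the ratio of two price-index values.

    Hypothetical helper: index_then and index_now must come from the same
    price-index series (for example, a consumer price index for one country).
    """
    if index_then <= 0:
        raise ValueError("index_then must be positive")
    return amount * index_now / index_then


if __name__ == "__main__":
    # Placeholder index values for illustration only, not real data.
    print(adjust_for_inflation(100, index_then=10, index_now=300))
```

With these placeholder values, $100 in the original year corresponds to $3,000 today; real results depend entirely on the index series used.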
November 17
Remove irremovable parent categories from the categories
I want to remove some useless, seemingly irremovable parent categories from the following categories:
Category:Young people in Cuba, Category:In Cuba, and Category:Children in North America from Category:Children in Cuba
Category:Society in Cuba from Category:People in Cuba
Category:People of Cuba by stage of development from Category:Children of Cuba
Category:75-6895 (aircraft) from Category:F-104S Starfighter
Category:Teaching by country of location, Category:Teaching in South America and Category:Teaching of Venezuela from Category:Teaching in Venezuela
Category:Telugu-language writers from Category:Translators to Telugu
Category:United States House of Representatives elections in New York (state), 2016 from Category:2016 United States House of Representatives election maps of New York (state)
Category:Volcanism of the Czech Republic from Category:Volcanology of the Czech Republic
I talked about a similar problem in Category talk:Children in Cuba. I hope you can help me. Also, please tell me how to remove seemingly irremovable categories without hassle. OperationSakura6144 (talk) 04:28, 17 November 2024 (UTC)
- Not what you are asking but: why exactly would you want to remove Category:Young people in Cuba as a parent category of Category:Children in Cuba, or Category:United States House of Representatives elections in New York (state), 2016 from Category:2016 United States House of Representatives election maps of New York (state)? Offhand, both of these seem correct. - Jmabel ! talk 05:53, 17 November 2024 (UTC)
- All of these seem to be driven by templates. You'd have to take it up with the people who edited the templates. - Jmabel ! talk 05:56, 17 November 2024 (UTC)
- But how can I do it? I don't know if templates have anything to do with it. If so, how can I tell whether that's the case, and how can I solve the underlying problem of irremovable parent categories? OperationSakura6144 (talk) 06:01, 17 November 2024 (UTC)
- Comment out the template in the wikitext editor, click Preview, and see if it removes the cat. If it does, you can ask on that template's talk page. Prototyperspective (talk) 11:40, 17 November 2024 (UTC)
- I successfully removed the parent categories Category:People of Cuba by stage of development from Category:Children of Cuba, Category:Society in Cuba from Category:People in Cuba, and Category:F-104S Starfighter from Category:75-6895 (aircraft) (sorry for the swap, by the way). I also successfully removed the mentioned unnecessary parent categories from Category:Children in Cuba, but I accidentally replaced Category:Young people in Cuba with Young people of Cuba, which is now a new problem for me. I would like User:Joshbaumgartner to join this topic to discuss it and the main problem. OperationSakura6144 (talk) 11:06, 17 November 2024 (UTC)
November 18
The current version of the photo is obviously a mirror inversion, because Engels' frock coat is buttoned on the female side, and the Milanese buttonhole on Marx's jacket is on the right side, while it should be on the left. What needs to be done to flip it back? --Romano1981 (talk) 12:04, 18 November 2024 (UTC)