Help:Guide for Administrators/Content


The primary purpose of Bahai.works is to support individuals who are researching and writing articles on Bahá’í history. A special focus is given to the needs of editors who are actively working on bahaipedia.org. To that end, periodicals and books are the types of content that should be prioritized for addition to bahai.works. Secondary focus should be given to other types of content, such as booklets and pamphlets, which provide insight into the thinking among Bahá’ís during certain periods. Bahai.works also increases access to content by providing transcripts of audio and video talks.

A. Digitizing & equipment

My choice of equipment has always been limited by cost. Many of these options are probably good enough, but not professional. They represent what I use; there are probably many better or more professional options.

Paper - I began with a cheap flatbed scanner from an all-in-one printer, but I ended up having to redo most of the work I had done with that scanner. Eventually I upgraded to the Fujitsu ScanSnap iX500. The iX500 is an old model now, but it has been very reliable, and it scans both sides of a page at the same time, so after cutting the binding off a book or magazine (Yescom Industrial Guillotine) you can feed everything through the scanner in just a few minutes. I scan everything in "Color" mode; I find that other modes introduce artifacts and degrade the OCR results. Also, page images are compressed once uploaded to bahai.works, so starting with a clear scan makes the web version much more legible. I scan at 300 dpi because at higher resolutions the scanner takes twice as long to scan each page. Someone more experienced in scanning than me recommended the A3 Avision Bookedge flatbed scanner, but it was too expensive for me.

Audio - I bought a cheap Cassette to MP3 Converter for cassette tapes. Essentially, you put the tape in, connect the converter to the computer, start Audacity, set the recording input to the converter, press "Record", and then start playing the tape. You should be able to see the recording (and hear it, if you want) as it plays.

Video - I purchased a VHS/DVD player and an Elgato Video Capture device (which connects the VHS player to the computer).

Microfilm/microfiche - Take these to your nearest big-city library, which will have the equipment needed to view and scan these formats; the equipment is quite expensive. For other types of film, consider a Magnasonic All-in-One 22MP Film Scanner.

Uploading content - Anything meant to be publicly available without restrictions should be uploaded to bahai.media. There are scripts on the server for bulk importing files (documentation in s3, AWS-Resources/ec2/mediawiki/running-maintenance-scripts.txt). Files that will have restricted access, such as copyright-protected books, should be uploaded here to bahai.works with an accesscontrol tag (see D. Access restrictions below). Note that files uploaded to bahai.works are automatically protected from public access (server configuration: $wgFileBackends['s3']['privateWiki'] = true;), but thumbnails generated from images or PDF files, which are placed in the /thumb/ directory, are publicly accessible (see the bahaiworks S3 bucket policy).


B. Adding content

Titles: Page titles should match the publication title, including any diacritical marks. If the title changed over time, use either the first or the most commonly known title. For example, Baha’i News (1924-1990) was originally titled Baha’i News Letter, then Baha’i News, and eventually Bahá’í News. Subsets of content such as volumes, issues and chapters should use the format Title/Volume/Issue or Title/Chapter, e.g.: Baha'i News/Issue 100 or 175 Years of Persecution/Chapter 1. This enables subsets of content to be searched; see Bahaiworks:Search.
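To make the naming convention concrete, here is a minimal Python sketch that builds subpage titles in the Title/Volume/Issue or Title/Chapter form. The helper name subpage_title is hypothetical; it is not part of the wiki or of any script on the server.

```python
def subpage_title(title: str, *parts: str) -> str:
    """Join a publication title and its subdivisions with "/" (hypothetical helper)."""
    # Trim whitespace and stray slashes so each part becomes exactly one title segment.
    segments = [title.strip()] + [part.strip().strip("/") for part in parts]
    if not all(segments):
        raise ValueError("title segments must not be empty")
    return "/".join(segments)


print(subpage_title("Baha'i News", "Issue 100"))  # Baha'i News/Issue 100
print(subpage_title("175 Years of Persecution", "Chapter 1"))  # 175 Years of Persecution/Chapter 1
```

Each segment is passed separately so a volume and issue can be combined, e.g. subpage_title("Title", "Volume 1", "Issue 2").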

Place content on a /Text subpage when the work does not have any other logical place for it. For example, The Incredible Paradox did not have a table of contents, so its text was placed at The Incredible Paradox/Text. A /Text subpage should contain the entire text. Whether to use /Text or to follow a publication's table of contents is a judgement call based on the document's total length and its value. Something short, or something unlikely to be widely used, can use /Text for expediency.

Duplicate titles: For works with the same title, append the author's last name after the title, e.g.: The Bahá’í Faith (Holley).

Previous versions or editions: The main namespace should be reserved for a work's most recent available edition. Use the Edition namespace for editions from prior years. See this page for an example. The purpose of placing the most recent edition in the main namespace under a common title is that readers are given the most up-to-date version first. A separate namespace was created for prior versions (instead of, for example, adding a publication year to the end of the title) so that someone performing a search can exclude prior editions. Imagine, for example, searching for a quote and getting multiple duplicate pages from various editions of a work; this could be confusing and would clutter the search results.

OCRing documents: See https://github.com/bahaipedia/node-ocr

Keeping track of content: All content (excluding audio/video) should be listed on Publications (A-Z) and on one more page depending on its type, e.g., Periodicals or Books. Content should have the Template:Header header at the top. When the "year" parameter is used, content will be categorized into Category:Works by year (the exception is a work that exists on a subpage, as the various volumes of The Bahá’í World do; add the year manually in those cases). When a publisher is listed, works will be categorized into Category:Works by publisher. The "publisher" parameter of the header template only accepts certain values; see Template:Header&action=edit and look for "Categorize by Publisher". For example, if George Ronald was the publisher, the header template should read | publisher = GR. If the author link is red, the author page does not exist and should be created. Add a link to the author page on Authors. Publications should always be listed on the author's page, e.g.: Author:William Sears.


C. Managing content

Managing text and tracking proofreading: Text that has not been proofread should have the {{ocr}} template at the top. This template automatically adds pages to the Pages needing proofreading category. Text that has been proofread should have the {{ps|#}} template added, where # is the number of times the page has been proofread. E.g., {{ps|1}} will place the page into Category:Pages proofread once. The ps template creates a link to the attached discussion page, where proofreaders are meant to list their name and state that proofreading is complete, e.g.: "# Proofreading complete ~~~~". See this page for an example.
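When auditing a batch of pages, these template conventions can be checked mechanically. The sketch below is a hypothetical helper (proofreading_passes is not an existing tool on the wiki) that reads a page's wikitext and reports 0 for {{ocr}}, N for {{ps|N}}, or None when neither template is present. It assumes the templates are written as shown above, allowing only for MediaWiki's case-insensitive first letter of template names and stray spaces.

```python
import re
from typing import Optional

# MediaWiki treats the first letter of a template name as case-insensitive,
# so both {{ps|1}} and {{Ps|1}} are accepted; spaces inside the braces are tolerated.
PS_TEMPLATE = re.compile(r"\{\{\s*[Pp]s\s*\|\s*(\d+)\s*\}\}")
OCR_TEMPLATE = re.compile(r"\{\{\s*[Oo]cr\s*\}\}")


def proofreading_passes(wikitext: str) -> Optional[int]:
    """Return N for {{ps|N}}, 0 for {{ocr}}, or None if neither template is present."""
    match = PS_TEMPLATE.search(wikitext)
    if match:
        return int(match.group(1))
    if OCR_TEMPLATE.search(wikitext):
        return 0
    return None
```

A page that returns None has neither template and probably needs one added.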


D. Access restrictions

Copyright-protected text must be kept private. Review the two relevant court decisions here and here. At the top of a page that contains copyright-protected text, add the access control key: <accesscontrol>Access:Title</accesscontrol>. This key prevents access by everyone except the users listed on Access:Title. Title should be unique across all titles and roughly match the publication title. If Access:Title does not exist, the system will give the warning "Title Access:Title does not exist. Create and then blank the page." A page must have some content to be created, but after that it can be blanked.

Restricting search results: If you do not want users to be able to search within a document, you can further restrict access by adding "(nosearch)" to the accesscontrol tag: <accesscontrol>(nosearch),Access:Title</accesscontrol>.
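The tag format above can also be read back programmatically, for example when auditing which pages are restricted. This is a minimal sketch under assumptions: parse_accesscontrol is a hypothetical helper, and it assumes the tag contents are a comma-separated list in which "(nosearch)" is a flag and every other entry names an access page.

```python
import re
from typing import List, Tuple

# Assumption: only the first <accesscontrol> tag on a page matters.
TAG_PATTERN = re.compile(r"<accesscontrol>(.*?)</accesscontrol>", re.DOTALL)


def parse_accesscontrol(wikitext: str) -> Tuple[bool, List[str]]:
    """Return (nosearch, access_pages) from the first tag, or (False, []) if absent."""
    match = TAG_PATTERN.search(wikitext)
    if not match:
        return False, []
    nosearch = False
    pages: List[str] = []
    for item in match.group(1).split(","):
        item = item.strip()
        if item == "(nosearch)":
            nosearch = True  # search within the document is also blocked
        elif item:
            pages.append(item)  # e.g. "Access:Title"
    return nosearch, pages
```

For example, parse_accesscontrol("<accesscontrol>(nosearch),Access:Title</accesscontrol>") yields (True, ["Access:Title"]).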

Custom Restricted Access messages: When a user attempts to open a page with an access control tag, they are presented with a message stating that they do not have access to the page; this message is generated by Template:AC-Message. The message can be customized by creating a subpage titled "AC-Message", for example: Book Title/AC-Message. There is a template for these custom messages: Template:AC-Template. See this page for an example, or view this page in a private/incognito tab to see it in action.

Granting access: To grant access to a document, have the person create an account, then add their username to the document's access control page. For example, to give User:JohnSmith access to All Things Made New, you first need to identify the correct access key. View any protected page of the book and you will see <accesscontrol>Access:AllThingsMadeNew</accesscontrol>, so navigate to Access:AllThingsMadeNew and add: *JohnSmith. The user account JohnSmith will now be able to view the content. Note that Access pages are case sensitive, so *johnsmith will not work.
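The membership rule described above can be modelled in a few lines. This sketch is a hypothetical illustration of the stated behavior (one *Username per line, exact case-sensitive match), not the access control extension's actual code; listed_users and can_view are invented names.

```python
from typing import Set


def listed_users(access_page_text: str) -> Set[str]:
    """Collect usernames from lines of the form '*Username' (hypothetical parser)."""
    users: Set[str] = set()
    for line in access_page_text.splitlines():
        line = line.strip()
        if line.startswith("*"):
            name = line.lstrip("*").strip()
            if name:
                users.add(name)
    return users


def can_view(username: str, access_page_text: str) -> bool:
    # Exact string comparison, because Access pages are case sensitive.
    return username in listed_users(access_page_text)
```

With the page text "*johnsmith", can_view("JohnSmith", ...) is False, matching the warning above.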

Granting access automatically via checkout: A checkout button can be added to the {{restricted use}} template, with parameters for the number of simultaneous checkouts and the length of a checkout. The normal restricted access header is something like {{restricted use|where=uk|until=2043 }}, but to enable access via checkout, add these additional parameters: access_page, allowed_users, max_concurrent_users, checkout_days. E.g.:

{{restricted use|where=uk|until=2043|access_page=Access:AllThingsMadeNew |allowed_users= |max_concurrent_users=1 |checkout_days=7 }}

This means that one user at a time will be able to check out the book All Things Made New for 7 days. This type of temporary access would depend on the copyright holder giving permission; at this time no permission has been sought, so the program is not in use.

Note: the allowed_users parameter expects an access control page. With allowed_users=Access:GeorgeRonald, for example, only users listed on Access:GeorgeRonald would be able to press the "check out" button. An example use case would be a publisher controlling which users can check out its content, or an Assembly controlling who can access a periodical it published.
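To illustrate how the checkout parameters interact, here is a toy model in Python. It is not the implementation behind {{restricted use}}; CheckoutDesk and check_out are hypothetical, and the model assumes that an empty allowed_users= lets anyone with an account check out, and that a checkout frees its slot exactly checkout_days after it starts.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Set


@dataclass
class CheckoutDesk:
    """Toy model of max_concurrent_users, checkout_days and allowed_users (hypothetical)."""
    max_concurrent_users: int
    checkout_days: int
    allowed_users: Optional[Set[str]] = None  # None models an empty allowed_users= parameter
    loans: Dict[str, datetime] = field(default_factory=dict)  # username -> expiry time

    def _expire(self, now: datetime) -> None:
        # Drop checkouts whose loan period has ended, freeing their slots.
        self.loans = {user: end for user, end in self.loans.items() if end > now}

    def check_out(self, user: str, now: datetime) -> bool:
        self._expire(now)
        if self.allowed_users is not None and user not in self.allowed_users:
            return False  # not listed on the allowed_users access page
        if user in self.loans:
            return True  # this user already holds the checkout
        if len(self.loans) >= self.max_concurrent_users:
            return False  # every slot is taken
        self.loans[user] = now + timedelta(days=self.checkout_days)
        return True
```

With max_concurrent_users=1 and checkout_days=7, as in the example above, a second user is refused until the first checkout has run its seven days.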