Add markdown version #35

Merged
merged 3 commits into from
Nov 5, 2022

Conversation

xogcox
Collaborator

@xogcox xogcox commented Oct 16, 2022

I've made a pandoc-style markdown version (relevant issue: #8) from the HTML, covering all pages except for index.html, which I left out because it has more complicated HTML and relatively little text. Everything is in a separate folder called markdown. There is also a pandoc template for header and footer material, and the shell script generate.sh calls pandoc (and sed) to generate all the HTML files with appropriate settings. The chapter list is auto-generated from chapter headings, which I thought would be convenient, although it also makes the script more complicated.
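
For illustration only, here is a minimal Python sketch of the chapter-list idea: take the first top-level heading of each chapter and emit a numbered list of links. This is not the logic in generate.sh, which uses pandoc and sed. The function name `chapter_list` and the sample file names are hypothetical.

```python
import re


def chapter_list(chapters):
    """Build a markdown chapter list from (html_name, markdown_text) pairs.

    Each title is taken from the first level-1 ATX heading ("# Title").
    """
    items = []
    for html_name, text in chapters:
        match = re.search(r"^# +(.+?)\s*$", text, flags=re.MULTILINE)
        if match is None:
            continue  # no top-level heading: leave this file out of the list
        number = len(items) + 1
        items.append(f"{number}. [{match.group(1)}]({html_name})")
    return "\n".join(items)


# Hypothetical sample input; real chapters would be read from the markdown folder.
chapters = [
    ("introduction.html", "# Introduction\n\nSome text.\n"),
    ("starting-out.html", "# Starting Out\n\n## Ready, set, go!\n"),
]
print(chapter_list(chapters))
```

The sketch ignores heading attributes and setext-style headings, which a real script would probably need to handle.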

I made the markdown files from the original HTML by writing a Haskell program, but didn't include it in the files because I thought it might be better not to add Haskell as a dependency, and because the code is fragile and also rather messy.

Markdown can include raw HTML tags, but I avoided these where possible. The only case where I needed to use HTML tags in the markdown was in a few locations which use styling inside inline code.

The HTML generated from the markdown is in markdown/generated_html. I have compared it with the original HTML semi-automatically and I think I've found all the differences. It should be close in appearance to the original, although it would need some changes to the CSS to be styled properly.
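
As a rough idea of what a semi-automatic comparison can look like, the sketch below diffs two HTML strings after stripping indentation and blank lines, using Python's standard `difflib`. It is an assumption about one possible approach, not the procedure actually used for this PR. The helper name `html_diff` is hypothetical.

```python
import difflib


def html_diff(original, generated):
    """Return unified-diff lines, ignoring indentation and blank lines."""

    def normalize(text):
        return [line.strip() for line in text.splitlines() if line.strip()]

    return list(
        difflib.unified_diff(
            normalize(original),
            normalize(generated),
            fromfile="original",
            tofile="generated",
            lineterm="",
        )
    )


original = "<p>Hello</p>\n\n  <p><i>world</i></p>\n"
generated = "<p>Hello</p>\n<p><em>world</em></p>\n"
for line in html_diff(original, generated):
    print(line)
```

A line-based diff like this only works well when both files break lines in similar places, so some manual review is still needed.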

I decided to make a pull request right away rather than discussing it first because it's easier to show the files I have this way. It's not a request to merge instantly, so feel free to discuss it if you want.

Here is a list of differences from the original HTML that affect styling and appearance:

  • Inline code: <span> tags are changed to <code>. Most inline code fragments are marked <span class="fixed">. For convenience, I deleted this attribute, so it becomes just <code>. Other <span class="..."> tags keep their attributes but change to <code class="...">.

  • Code blocks: <pre name="code" class="..."> changes to <pre class="..."><code> ... .

  • Italics and emphasis: The original HTML uses <i> for italics and both <em> and <b> for bold (although <b> is rarely used), and never uses <strong>. In markdown, one star means <em>, which is shown as italics, and two stars mean <strong>, which is shown as bold, so I changed <i> to <em>, and both <em> and <b> to <strong>.

  • Non-inline images: The original HTML has images at the top level of the body, but pandoc puts these non-inlined images in <p> tags.

  • Chapter list: In the original HTML, the chapter list has the class attribute .chapters. Since you can't assign attributes to lists in pandoc, this is omitted. Also, in the original HTML, the chapter list is inside a <p> tag, but not in the pandoc-generated version.

  • Punctuation: Apostrophes and double quotes are straight in the original HTML. By default pandoc makes them curly. It's possible to change this setting, but turning this off would also turn off recognition of em-dash and ellipsis markdown. Pandoc also adds non-breaking spaces in a few places after "i.e." and "e.g.".

  • Capitalization of "higher order functions": I changed the title at the top of this chapter from "Higher order functions" to "Higher Order Functions", to make it consistent with other chapters and how it's written elsewhere (in the "next chapter" and "previous chapter" links and the chapter list).
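
To make the inline tag changes in the list above concrete, here is a small Python sketch that applies the same renames to an HTML string with regular expressions. It is only an illustration of the mapping: the actual conversion was done by a separate program that is not part of this PR. The function name `convert_inline_tags` is hypothetical.

```python
import re


def convert_inline_tags(html):
    """Apply the inline tag renames listed above to an HTML fragment."""
    # <span class="fixed">...</span> becomes a plain <code>...</code>.
    html = re.sub(r'<span class="fixed">(.*?)</span>', r"<code>\1</code>",
                  html, flags=re.DOTALL)
    # Any other classed span keeps its class but becomes <code class="...">.
    html = re.sub(r'<span class="([^"]*)">(.*?)</span>',
                  r'<code class="\1">\2</code>', html, flags=re.DOTALL)
    # <em> and <b> become <strong>. This must run before the <i> rule,
    # otherwise the new <em> tags would be turned into <strong> as well.
    html = re.sub(r"<(/?)(?:em|b)>", r"<\1strong>", html)
    # <i> becomes <em>.
    html = re.sub(r"<(/?)i>", r"<\1em>", html)
    return html


sample = ('<p><span class="fixed">map</span> is <i>very</i> <em>useful</em> '
          'and <b>fast</b>; see <span class="label function">foo</span>.</p>')
print(convert_inline_tags(sample))
```

Regular expressions cannot handle nested spans (such as styling inside inline code), so a real converter would need an HTML parser for those cases.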

Other differences from the original HTML:

  • The original HTML uses <a name="..."> for mid-chapter links on <h2> titles. Pandoc uses the HTML5 method, which is to add an id attribute to the heading tag itself. Pandoc also gives other headings id attributes automatically.

  • The link to the Creative Commons license was originally marked with a rel="nofollow" attribute. I kept this in the markdown, but pandoc automatically adds rel="license" to the link and then removes nofollow, with a warning about duplicated attributes.

  • For punctuation such as em-dashes and non-breaking spaces, Pandoc prefers literal Unicode characters to HTML entities. It also uses HTML entities in some places where the original HTML doesn't.

  • The generated markdown also removes some non-significant whitespace and corrects some missing <p> tags.

@xogcox xogcox changed the title from "Add markdown version (#8)" to "Add markdown version" Oct 16, 2022
@MatthijsBlom
Contributor

MatthijsBlom commented Oct 16, 2022

I have not taken a detailed look at this yet. (Thanks in any case!) Still, I have a suggestion: put all sentences on separate lines. It helps with maintenance, and a switch to another data format seems like an excellent time for such a change.

@xogcox
Collaborator Author

xogcox commented Oct 16, 2022

Each sentence in the markdown on a new line? I never thought about that, but yes, it might be clearer. It would take a little time to split the sentences correctly, but I can try, if there are no objections.

@MatthijsBlom
Contributor

I thought maybe Pandoc would be able to do this, but it seems this option is not implemented yet.

@xogcox
Collaborator Author

xogcox commented Oct 22, 2022

I have changed the markdown so that each sentence is on a separate line, as you suggested. After thinking about it, I agree that this format is better because it provides more information and has no real disadvantages.
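
For anyone curious what sentence-per-line splitting involves, here is a naive Python sketch. It is not the method used for this PR. The abbreviation list and the function name `one_sentence_per_line` are assumptions made for the example.

```python
# Words that end in a period without ending a sentence (assumed, incomplete list).
ABBREVIATIONS = ("i.e.", "e.g.", "etc.", "vs.")


def one_sentence_per_line(paragraph):
    """Reflow a paragraph so that each sentence sits on its own line."""
    sentences, current = [], []
    for word in paragraph.split():
        current.append(word)
        ends_sentence = word.endswith((".", "!", "?"))
        if ends_sentence and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing words without closing punctuation still form a line.
        sentences.append(" ".join(current))
    return "\n".join(sentences)


text = "Lists are useful, e.g. for strings. They are homogeneous! Is that clear?"
print(one_sentence_per_line(text))
```

The sketch misses cases such as a period followed by a closing bracket or quote, and inline code that ends in a period, which is part of why splitting correctly takes some manual checking.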

@smith558
Collaborator

smith558 commented Nov 5, 2022

@xogcox This is an awesome contribution. Thank you! 🙂

@smith558 smith558 merged commit bee6b46 into learnyouahaskell:main Nov 5, 2022
@smith558
Collaborator

smith558 commented Nov 5, 2022

@xogcox I think there are some things to iron out. In particular, the paths to assets are broken.

@xogcox
Collaborator Author

xogcox commented Nov 12, 2022

Thank you for the merge! I'm glad you like it.

Yes, the HTML files generated from the markdown aren't ready for use yet. My aim was to produce markdown-generated HTML that is close to the existing website HTML, to compare the differences between them. In particular, I didn't change the paths to assets listed in the file. Because these are relative paths, and the generated files are in a different folder, I'd expect them to break.

It would be possible to fix this by copying the assets folder, or by changing the paths. But as I said, I only intended the generated HTML to be used for comparison, not for actually putting on a website right now. Also, some of the differences in the markdown-generated HTML affect styling, so the CSS would need changes to work properly.
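
To sketch the "changing the paths" option: the Python example below rewrites relative asset URLs so that they still resolve from the output folder. The folder name `docs`, the `assets/` prefix, and the function name `rebase_asset_paths` are hypothetical; only markdown/generated_html comes from this PR.

```python
import posixpath
import re


def rebase_asset_paths(html, page_dir, output_dir, asset_prefix="assets/"):
    """Rewrite src/href values under asset_prefix so they resolve from output_dir.

    page_dir is the folder the original pages live in, and output_dir is the
    folder the generated pages are written to; both are relative to the
    repository root. Other links (for example to sibling chapters) are untouched.
    """

    def rebase(match):
        attr, url = match.groups()
        if not url.startswith(asset_prefix):
            return match.group(0)
        target = posixpath.normpath(posixpath.join(page_dir, url))
        return f'{attr}="{posixpath.relpath(target, output_dir)}"'

    return re.sub(r'\b(src|href)="([^"]*)"', rebase, html)


page = '<img src="assets/images/lamb.png"> <a href="introduction.html">x</a>'
print(rebase_asset_paths(page, "docs", "markdown/generated_html"))
```

Copying the assets folder next to the generated files avoids any rewriting, at the cost of duplicated files.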

I haven't done these extra tasks like editing CSS files yet because I wanted to stop and get feedback first. I feel like I should check explicitly: Is there interest in using the markdown-generated HTML as the source for generating the website, so that in future changes would be made to the markdown rather than to the HTML directly? (That would be the obvious reason for having markdown, but it's a major change, so I wanted to ask.)

If so, as the next step, I would like to make a PR (or PRs) for gradual changes to the existing HTML and CSS (that is, the HTML and CSS for the current website, based on the original LYAH) to bring it closer to the markdown-generated version. This would hopefully make it easier to transition to the markdown version and show any potential problems.

What do you think? Of course, please tell me if you have any ideas or think anything should be done differently.


3 participants