I used to not think much about SEO. I thought it was about marketing tricks and misleading users. But putting all that aside, in the end you do want people to find your site, and this means making your site crawlable by search engine bots such as Googlebot and Bingbot. You have to help them a little to do their part.
robots.txt introduces your site to search engine bots
Hugo can create a basic robots.txt for you (when enableRobotsTXT is enabled in your site configuration), which allows all crawlers access to everything. It is suboptimal though, and we can provide some more information to help search engine bots find their way around our site.
Your Hugo theme likely creates some taxonomy index pages for tags and categories. You probably don’t mean to include these pages in the search results, but rather the pages they link to. Another problem these automatically generated pages may cause is slowing down the crawling of your site: search engines allocate a crawl budget to you, so you had better make the best use of it. Finally, search engines just don’t like duplicate pages and they may deduce that your site is of poor quality. I assume you have a /posts/ or similar page anyway, with links of interest listed there.
Another recommended piece of information to put in robots.txt is the sitemap of your site. A sitemap is the list of URLs of your site, so that a crawler doesn’t have to discover them by itself. Hugo also generates a sitemap.xml for you, but this isn’t mentioned in the robots.txt file it creates.
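Before pointing crawlers at the sitemap, it can help to see what is actually in it. The following is a minimal sketch, not part of Hugo: `sitemap_urls` is a hypothetical helper name, and it assumes you pass it the path of the generated sitemap.xml (with Hugo's default publishDir that would be public/sitemap.xml; adjust if yours differs).

```shell
# List every URL found in a sitemap file, one per line.
# Hypothetical helper: it simply extracts the text of each <loc> element
# and does not validate the XML.
sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' "$1" | sed -e 's/<loc>//' -e 's/<\/loc>//'
}

# Example usage (the path is an assumption, see above):
#   sitemap_urls public/sitemap.xml
```

If the list contains pages you would rather keep out of the search results, that is a hint about what to disallow in robots.txt.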
The easiest way to create a custom robots.txt file is to disable Hugo’s automatic generation. Change this setting in
enableRobotsTXT = false
And then we can put our custom robots.txt in the
User-agent: *
Disallow: /tags/
Disallow: /categories/
Sitemap: https://www.example.com/sitemap.xml
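A quick sanity check of the file can catch typos before a crawler does. This is only a textual sketch under a few assumptions: the helper names `robots_sitemaps` and `robots_disallows` are made up for illustration, the file uses Unix line endings, and each rule is written exactly as `Disallow: <path>` with a single space. It is not a real robots.txt parser.

```shell
# Print the sitemap URLs declared in a robots.txt file, one per line.
robots_sitemaps() {
  sed -n 's/^[Ss]itemap:[[:space:]]*//p' "$1"
}

# Succeed if the file contains the exact line "Disallow: <path>".
# $1 is the robots.txt path, $2 is the URL path to look for.
robots_disallows() {
  grep -qxF "Disallow: $2" "$1"
}

# Example usage, run from the directory that holds your robots.txt:
#   robots_sitemaps robots.txt
#   robots_disallows robots.txt /tags/ && echo "tags are blocked"
```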
Next time you run hugo, this file will be copied as-is in the
Descriptions, because first impressions matter
After the title of your page and its URL, the most important thing to consider from an SEO perspective is its description. Although it will not itself affect the ranking of your page, it will affect the click-through rate, since a better written description can attract more clicks. Keep in mind that a search engine may or may not actually use your description, especially if it’s too short or misleading. The descriptions are created by your theme and the generated HTML looks like this:
<meta name="description" content="I am a description.">
<meta itemprop="description" content="I am a description.">
<meta name="twitter:description" content="I am a description."/>
<meta property="og:description" content="I am a description." />
...
The first line is the “proper” description. The next lines are generated by Hugo’s internal templates. In particular, <meta itemprop="description" ...>, defined in _internal/schema.html, seems to be preferred over <meta name="description" ...> by Bing and the other engines it shares data with (more on that later).
What ends up in the description is up to the theme. If you didn’t specify a description in the front matter of your post, it is likely that the theme will try to generate a summary and use it as a description. The problem here is that autogenerated summaries aren’t very good, since they are usually just the first lines of your post; this may not be what you intended. So I suggest that you have a look at the HTML that your theme generates and, if it’s not satisfactory, specify a description of your own in the front matter. A common recommendation is to keep the description between 70 and 155 characters so that it fits in the search engine results.
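Checking the generated HTML by hand gets tedious, so here is a rough sketch of how the check could be scripted. Both function names are hypothetical. The extraction assumes the tag is written on one line exactly as `<meta name="description" content="...">`, like in the sample above; a theme that orders or quotes the attributes differently would need a different pattern.

```shell
# Print the content of the <meta name="description"> tag of an HTML file.
# Assumes the attribute order and double quotes shown in the sample above.
meta_description() {
  sed -n 's/.*<meta name="description" content="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed if the given text is between 70 and 155 characters long.
# Note: some shells count bytes rather than characters for non-ASCII text.
description_length_ok() {
  len=${#1}
  [ "$len" -ge 70 ] && [ "$len" -le 155 ]
}

# Example usage on one generated page (the path is up to your setup):
#   desc=$(meta_description path/to/page/index.html)
#   description_length_ok "$desc" || echo "check the description: $desc"
```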
Don’t forget about Bing
After putting the URL of your sitemap in robots.txt, it is still a good idea to submit it yourself to Google Search Console. This way, you can monitor the crawling of your site by Googlebot and catch any errors. You can also keep an eye on which searches drive traffic to your site.
When it comes to search engines, Bing is a distant #2. But it’s still important, because its crawler Bingbot also provides data to DuckDuckGo and possibly other lesser-known engines such as Qwant and Ecosia. Therefore, it makes sense to make sure that Bingbot can crawl your site correctly.
Bing Webmaster Tools acknowledges that you probably already used Google Search Console before coming to it, so it makes it easy to log in with your Google account (along with other options such as a Microsoft account). After that, you are given the option to import the domains and sitemaps you have set up in Google Search Console. I must say it all worked flawlessly and effortlessly.
Plus, I didn’t have to deal with certain bugs of Google Search Console. You are also given options similar to the ones Google Search Console gives you, such as URL Inspection and Performance metrics.
Inform search engines about site changes
After making changes to your site or posting something new, you can let the search engines know that the sitemap has changed. An automatic way of doing this is pinging them with the sitemap URL.
# Ping Google about changes in the sitemap
curl "https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml"
# Ping Bing about changes in the sitemap
curl "https://www.bing.com/ping?sitemap=https://www.example.com/sitemap.xml"
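You can include it in your deployment method; this is left as an exercise to the reader. Note that Google announced in 2023 that it was retiring its ping endpoint, and Bing has also dropped anonymous sitemap submission, so these requests may now return an error. Submitting the sitemap through the webmaster tools described above still works.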
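For what it’s worth, here is one way the exercise could look. It is a sketch, not a recommendation: the function names are made up, and the only endpoints used are the two ping URLs from above. Printing the HTTP status code makes it visible when an engine stops accepting pings.

```shell
# Build the ping URL for a search engine host ($1) and a sitemap URL ($2).
ping_url() {
  printf 'https://%s/ping?sitemap=%s\n' "$1" "$2"
}

# Ping one engine and print the HTTP status code it answered with
# (curl prints 000 if the request could not be made at all).
ping_engine() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(ping_url "$1" "$2")"
}

# Ping both engines mentioned above, one status line per engine.
notify_search_engines() {
  for host in www.google.com www.bing.com; do
    printf '%s: ' "$host"
    ping_engine "$host" "$1"
  done
}

# Example usage, after your deploy step has finished:
#   notify_search_engines "https://www.example.com/sitemap.xml"
```

Anything other than a 200 in the output means the ping was not accepted, and you should fall back to the webmaster tools.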
Congratulations, you made it this far and now… you wait. It can take a few days to a few weeks until the search engines have crawled your site, and that is assuming nothing unexpected has gone wrong and there are no problems to fix. In the meantime, I suggest that you take the opportunity to practice your Zen Buddhism exercises and take your mind away from all that pointless worrying.