<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Blogs | Aasish Rijal]]></title><description><![CDATA[Blogs | Aasish Rijal]]></description><link>https://blogs.rjlaasish.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1765193928419/f370c8df-7417-4038-8de0-b30705ab0e90.webp</url><title>Blogs | Aasish Rijal</title><link>https://blogs.rjlaasish.com</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 14 May 2026 22:09:13 GMT</lastBuildDate><atom:link href="https://blogs.rjlaasish.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Deeper Dive into robots.txt: Optimizing Crawl Directives for Performance and SEO]]></title><description><![CDATA[For many developers and SEO practitioners, the robots.txt file is often considered a set-it-and-forget-it artifact. We place it at the root, disallow common administrative paths, and point to a sitemap. Yet, for semi-professionals keen on optimizing ...]]></description><link>https://blogs.rjlaasish.com/deeper-dive-into-robotstxt</link><guid isPermaLink="true">https://blogs.rjlaasish.com/deeper-dive-into-robotstxt</guid><category><![CDATA[Web Development]]></category><category><![CDATA[Crawler]]></category><category><![CDATA[SEO]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Aasish Rijal]]></dc:creator><pubDate>Sat, 10 Jan 2026 03:22:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768015245031/89699514-4cc1-4a9d-9843-5e7bc1888b54.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For many developers and SEO practitioners, the <code>robots.txt</code> file is often considered a set-it-and-forget-it artifact. We place it at the root, disallow common administrative paths, and point to a sitemap. Yet, for semi-professionals keen on optimizing site performance, managing crawl budget, and strategically influencing search engine behavior, <code>robots.txt</code> offers a more granular level of control that warrants a deeper understanding.</p>
<p>This post is less about the <em>what</em> and more about the <em>how</em> and <em>why</em> of handling complex scenarios.</p>
<h3 id="heading-beyond-the-basics-understanding-robotstxts-role">Beyond the Basics: Understanding <code>robots.txt</code>'s Role</h3>
<p>At its core, <code>robots.txt</code> (the Robots Exclusion Protocol, or REP) is a standardized directive for web robots. It's crucial to reiterate: <strong>it's a request, not an enforcement mechanism.</strong> While compliant search engine bots (like Googlebot, Bingbot) largely respect these directives, malicious scrapers will often ignore them entirely. Therefore, it's never a security measure.</p>
<p>Its primary utility lies in:</p>
<ol>
<li><p><strong>Crawl Budget Optimization:</strong> For large-scale sites (e-commerce, news portals, SPAs with dynamic routing), managing crawl budget is paramount. Directing bots away from low-value, duplicate, or infinite-space URLs ensures that resources are spent on indexing high-value content.</p>
</li>
<li><p><strong>Server Load Management:</strong> Preventing aggressive crawling of resource-intensive sections can reduce server load, especially during peak traffic or on less robust infrastructure.</p>
</li>
<li><p><strong>Content Control in SERPs:</strong> Preventing specific non-public sections (staging environments, internal tools, sensitive documents) from appearing in search results.</p>
</li>
<li><p><strong>Influence on Indexing (Indirectly):</strong> While <code>robots.txt</code> doesn't directly prevent indexing (a <code>Disallow</code> only stops crawling, not necessarily indexing if external links exist), it significantly reduces the likelihood. For direct indexing control, <code>noindex</code> meta tags or HTTP headers are more reliable (a minimal snippet follows this list).</p>
</li>
</ol>
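<p>As a quick reference for that last point, here is a minimal sketch of the two indexing controls mentioned above; the HTTP header variant assumes your server or framework lets you set arbitrary response headers:</p>
<pre><code class="lang-plaintext">&lt;!-- In the &lt;head&gt; of an HTML page you want crawled but kept out of the SERPs --&gt;
&lt;meta name="robots" content="noindex, follow"&gt;

# Equivalent HTTP response header (works for PDFs and other non-HTML assets too)
X-Robots-Tag: noindex
</code></pre>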
<h3 id="heading-key-directives-and-advanced-considerations">Key Directives and Advanced Considerations</h3>
<p>Let’s break down the key directives and see how they actually work:</p>
<ul>
<li><p><code>User-agent:</code>:</p>
<ul>
<li><p><code>User-agent: *</code>: Applies to all bots. Use this for general rules.</p>
</li>
<li><p><code>User-agent: Googlebot</code>: Specifically targets Google's main crawler.</p>
</li>
<li><p><code>User-agent: Googlebot-Image</code>: For Google's image crawler. Useful for fine-tuning image-specific crawl behavior (e.g., disallowing large image galleries if you only want product images indexed).</p>
</li>
<li><p><code>User-agent: AdsBot-Google</code>: For Google Ads landing page quality checks. Avoid disallowing this if you run Google Ads.</p>
</li>
</ul>
</li>
<li><p><code>Disallow:</code>:</p>
<ul>
<li><p><strong>Wildcards (</strong><code>*</code>) and <code>$</code>: These are powerful.</p>
<ul>
<li><p><code>Disallow: /wp-admin/</code> (Blocks the <code>/wp-admin/</code> directory)</p>
</li>
<li><p><code>Disallow: /*?*</code> (Blocks all URLs with query parameters. <strong>Use with extreme caution</strong> as it can block legitimate faceted navigation or search results you <em>do</em> want indexed. Consider canonical tags first.)</p>
</li>
<li><p><code>Disallow: /*.json$</code> (Blocks all JSON files. Useful for API endpoints not meant for public consumption.)</p>
</li>
<li><p><code>Disallow: /category/*/page/</code> (Blocks specific patterns within subdirectories.)</p>
</li>
</ul>
</li>
<li><p><strong>CSS/JS/Images</strong>: <strong>Absolutely ensure you are <em>not</em> disallowing critical CSS, JavaScript, or images.</strong> Googlebot (and other modern crawlers) render pages to understand layout, mobile-friendliness, and content. Blocking these assets will lead to a degraded "rendering" of your page by the crawler, potentially impacting SEO. If you have legacy <code>robots.txt</code> files, perform an audit.</p>
</li>
</ul>
</li>
<li><p><code>Allow:</code>:</p>
<ul>
<li><p>This directive overrides a <code>Disallow</code> for specific files or subdirectories <em>within</em> a disallowed path.</p>
</li>
<li><p><code>Disallow: /private/</code></p>
</li>
<li><p><code>Allow: /private/public-doc.pdf</code> (Allows access to this specific PDF within the otherwise blocked <code>/private/</code> directory). The most specific rule typically wins; this pattern appears in context in the second example further below.</p>
</li>
</ul>
</li>
<li><p><code>Sitemap:</code>:</p>
<ul>
<li><p>Always include the full URL(s) to your XML sitemap(s). This is a strong hint to search engines about the comprehensive structure of your site.</p>
</li>
<li><p>Multiple <code>Sitemap:</code> directives are allowed, especially for large sites using sitemap indices.</p>
</li>
</ul>
</li>
<li><p><code>Crawl-delay:</code> (Deprecated for Google, but still respected by some others):</p>
<ul>
<li>This directive requests a delay between successive crawl requests to avoid overwhelming the server. Google ignores <code>Crawl-delay</code> and manages Googlebot's crawl rate automatically (the legacy crawl-rate setting in Search Console has been retired). Some other bots, such as Bingbot, still honor it.</li>
</ul>
</li>
</ul>
<p>Example:</p>
<pre><code class="lang-plaintext"># This applies to all bots
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /search-results?* # Prevents internal search result pages from being crawled

# If you have a staging site you don't want indexed:
# User-agent: *
# Disallow: /

# Tell bots where your sitemap is
Sitemap: https://www.yourbeautifulsite.com/sitemap.xml
</code></pre>
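<p>And a slightly more involved sketch that combines per-bot groups, wildcard and <code>$</code> matching, and an <code>Allow</code> override. All paths here are illustrative:</p>
<pre><code class="lang-plaintext"># Keep Google's image crawler out of a heavy gallery section.
# Note: a bot that matches this group ignores the generic * group below.
User-agent: Googlebot-Image
Disallow: /gallery/

# Rules for everyone else
User-agent: *
Disallow: /private/
Allow: /private/public-doc.pdf   # the more specific Allow wins over the Disallow
Disallow: /*.json$               # block raw JSON endpoints
Disallow: /category/*/page/      # block deep paginated category URLs

Sitemap: https://www.yourbeautifulsite.com/sitemap.xml
</code></pre>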
<h3 id="heading-common-robotstxt-pitfalls">Common <code>robots.txt</code> Pitfalls</h3>
<ol>
<li><p><strong>Over-blocking:</strong> The most common mistake. Accidentally blocking essential CSS/JS, or entire sections that <em>should</em> be indexed (e.g., product filters, pagination). Always validate your <code>robots.txt</code> with the robots.txt report in Google Search Console (the standalone robots.txt Tester has been retired).</p>
</li>
<li><p><strong>Confusing</strong> <code>Disallow</code> with <code>Noindex</code>: A <code>Disallow</code> prevents crawling. If a page is disallowed but linked to externally, it might still appear in search results (though without a description). To <em>guarantee</em> a page doesn't appear in the SERPs, use a <code>noindex</code> meta tag or X-Robots-Tag HTTP header. <strong>Crucially, for Google to <em>see</em> and <em>respect</em> a</strong> <code>noindex</code> tag, the page <em>must</em> be crawlable.</p>
</li>
<li><p><strong>Missing</strong> <code>robots.txt</code>: If no <code>robots.txt</code> exists, search engine bots assume they can crawl everything. While this isn't inherently bad, it means you're not exercising any control over crawl behavior.</p>
</li>
<li><p><strong>Blocking</strong> <code>sitemap.xml</code>: Ensure your sitemap is always crawlable.</p>
</li>
<li><p><strong>Dynamic</strong> <code>robots.txt</code> for SPAs/Frameworks:</p>
<ul>
<li><p><strong>Next.js/Nuxt.js</strong>: Leverage their built-in capabilities to generate <code>robots.txt</code> dynamically (e.g., <code>app/robots.ts</code> in Next.js 13+). This allows environment-specific rules (e.g., <code>Disallow: /</code> on staging) and programmatically adding sitemap URLs. A sketch of this approach follows the list.</p>
</li>
<li><p><strong>Client-Side Rendered (CSR) SPAs (React, Vue, Angular)</strong>: While these frameworks primarily run client-side, the <code>robots.txt</code> file is still a static server-side asset. Place it in your <code>public</code> folder and ensure your build process makes it available at the root. The directives apply <em>before</em> the client-side app even loads.</p>
</li>
</ul>
</li>
</ol>
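<p>To make that Next.js point concrete, here is roughly what a dynamic <code>robots.txt</code> could look like via <code>app/robots.ts</code> in an App Router project. This is a minimal sketch: the environment variable name is an assumption, and the domain is reused from the example above.</p>
<pre><code class="lang-typescript">// app/robots.ts - sketch of an environment-aware robots file.
import type { MetadataRoute } from 'next';

export default function robots(): MetadataRoute.Robots {
  // Hypothetical flag distinguishing production from staging/preview builds.
  const isProduction = process.env.NEXT_PUBLIC_SITE_ENV === 'production';

  if (!isProduction) {
    // Staging/preview: ask every bot to stay away entirely.
    return { rules: { userAgent: '*', disallow: '/' } };
  }

  // Production: block low-value paths and point to the sitemap.
  return {
    rules: [{ userAgent: '*', disallow: ['/admin/', '/private/'] }],
    sitemap: 'https://www.yourbeautifulsite.com/sitemap.xml',
  };
}
</code></pre>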
<h3 id="heading-best-practices">Best Practices</h3>
<ul>
<li><p><strong>Version Control:</strong> Treat your <code>robots.txt</code> like any other critical code. Put it in version control.</p>
</li>
<li><p><strong>Audit Regularly:</strong> As your site grows and changes, review your <code>robots.txt</code> for outdated or overly aggressive directives. A quick sanity check follows this list.</p>
</li>
<li><p><strong>Combine with Search Console:</strong> Utilize Google Search Console (GSC) for crawl stats, index coverage reports, and the robots.txt report to validate your directives. Pay attention to "Blocked by robots.txt" issues.</p>
</li>
<li><p><strong>Consider Internationalization (i18n):</strong> If you have multiple language versions on subdomains or subdirectories, ensure your <code>robots.txt</code> doesn't inadvertently block entire language sections.</p>
</li>
<li><p><strong>Performance Implications:</strong> While the effect is small, a clean and optimized <code>robots.txt</code> contributes to overall site health, which indirectly aids SEO performance and user experience by ensuring crawlers focus on what matters.</p>
</li>
</ul>
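<p>A quick, low-tech audit step (assuming <code>curl</code> is available): confirm the file is actually served from the site root with a 200 status and a plain-text content type.</p>
<pre><code class="lang-plaintext">curl -I https://www.yourbeautifulsite.com/robots.txt
# Look for an HTTP 200 status and Content-Type: text/plain in the response headers
</code></pre>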
<h3 id="heading-conclusion">Conclusion</h3>
<p>The <code>robots.txt</code> file is more than just a formality; it's a powerful tool in your SEO and site management arsenal. By moving beyond basic <code>Disallow</code> rules and understanding the nuances of its directives, you can exert precise control over search engine behavior, optimize your crawl budget, and ultimately enhance the discoverability and performance of your web assets.</p>
<p>Mastering this humble text file is a mark of a truly professional approach to web development and digital presence.</p>
<p>Happy coding, and happy crawling!</p>
]]></content:encoded></item><item><title><![CDATA[Hello World! Why I Finally Decided to Start Blogging]]></title><description><![CDATA[Every developer, creator, and thinker eventually faces the blinking cursor on a blank page. It’s intimidating. It’s the digital equivalent of standing on a stage when the spotlight first hits you.
For the longest time, I’ve consumed content. I’ve rea...]]></description><link>https://blogs.rjlaasish.com/why-i-started-blogging</link><guid isPermaLink="true">https://blogs.rjlaasish.com/why-i-started-blogging</guid><category><![CDATA[introduction]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[Blogging]]></category><category><![CDATA[#learning-in-public]]></category><dc:creator><![CDATA[Aasish Rijal]]></dc:creator><pubDate>Tue, 09 Dec 2025 09:27:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765271484434/214c5b69-0f69-4b45-859f-4b988e6a4db4.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every developer, creator, and thinker eventually faces the blinking cursor on a blank page. It’s intimidating. It’s the digital equivalent of standing on a stage when the spotlight first hits you.</p>
<p>For the longest time, I’ve consumed content. I’ve read countless tutorials, scrolled through endless threads, and bookmarked more articles than I could ever finish. But today, I’m flipping the switch from consumer to creator.</p>
<p>This is my "Hello World" post, a commitment to stepping out of my comfort zone and documenting my journey.</p>
<h2 id="heading-why-now">Why Now?</h2>
<p>It’s easy to feel like everything has already been said. The internet is saturated with experts and hot takes. A common feeling that stopped me before was: <em>"Who am I to teach anyone anything?"</em></p>
<p>But I realized that blogging isn't just about teaching as an expert; it's about documenting as a learner.</p>
<p>I'm starting this blog for three main reasons:</p>
<h3 id="heading-1-learning-in-public">1. Learning in Public</h3>
<p>There is no better way to solidify your understanding of a topic than trying to explain it to someone else. By writing about what I’m building or learning, I’m forcing myself to understand it deeply.</p>
<h3 id="heading-2-creating-a-knowledge-repository">2. Creating a Knowledge Repository</h3>
<p>How many times have you solved a tricky problem, only to face the exact same issue six months later and forget how you fixed it? This blog will serve as my external brain—a searchable archive of the challenges I've overcome.</p>
<h3 id="heading-3-connecting-with-others">3. Connecting with Others</h3>
<p>Hashnode has an incredible community. I want to connect with other like-minded people, share ideas, and get feedback on my work.</p>
<h2 id="heading-what-you-can-expect-here">What You Can Expect Here</h2>
<p>I plan to keep this blog practical and honest. It won't just be highlight reels of successes; I also want to share the messy middle parts of projects and the failures I learn from.</p>
<p>In the coming weeks and months, I’ll be writing primarily about:</p>
<ul>
<li><p>JavaScript frameworks and front-end development</p>
</li>
<li><p>My journey building my current SaaS side-project</p>
</li>
<li><p>Productivity tips for remote work and specific tools I love</p>
</li>
<li><p>Lessons learned from my day-to-day work.</p>
</li>
</ul>
<h2 id="heading-the-commitment">The Commitment</h2>
<p>Sticking to a blogging schedule is tough. My initial goal is to publish one quality article every month.</p>
<p>If you’re also on a journey of learning and creating, I’d love to connect. Feel free to say hi in the comments below or connect with me on <a target="_blank" href="https://www.linkedin.com/in/aasish-rijal111/">LinkedIn</a>.</p>
<p>Thanks for reading the first one. Now, it’s time to get to work on the second.</p>
]]></content:encoded></item></channel></rss>