In this article, we will explore how to use Python to analyze Google search results and extract the headers and links most relevant to a specific keyword. This information can be valuable for SEO specialists and content creators looking to improve their website’s performance in search engines.

Introduction to the Code

The code presented in this article is designed to run on Google Colab, a hosted notebook environment for writing and running Python code in the cloud. You can access the code at the following link: Code on Google Colab.

The code uses several Python libraries to perform the analysis: Requests, BeautifulSoup, NLTK, Gensim, and spaCy. Together they handle web requests, HTML parsing, text processing, and word-embedding models.
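
To give a feel for the HTML-parsing part, here is a minimal sketch that collects headers and links from a page. It uses Python's standard-library html.parser as a stand-in for BeautifulSoup, so it runs without installing anything. The class HeaderLinkParser, the function extract_headers_and_links, and the sample HTML are hypothetical names made up for this illustration; they are not taken from the notebook.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderLinkParser(HTMLParser):
    """Collects header texts and link targets (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headers = []        # text of each h1..h6 element
        self.links = []          # href value of each <a> element
        self._in_header = False  # True while inside a header element
        self._buffer = []        # text fragments of the current header

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._in_header = True
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._in_header:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS and self._in_header:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headers.append(text)
            self._in_header = False


def extract_headers_and_links(html):
    parser = HeaderLinkParser()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.links


# Made-up sample page, used only for this illustration.
sample_html = """
<html><body>
  <h1>Fever in adults</h1>
  <p>See <a href="https://example.com/causes">common causes</a>.</p>
  <h2>When to  see a <em>doctor</em></h2>
</body></html>
"""
headers, links = extract_headers_and_links(sample_html)
print(headers)
print(links)
```

In the notebook the pages come from real HTTP requests, so a production version would also need error handling for timeouts and malformed markup.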

What Does the Code Do?

The main goal of the code is to find and score the most relevant headers and links in relation to a specific keyword. To achieve this, the code performs the following steps:

  1. Perform a Google search using the provided keyword.
  2. Extract the URLs of the web pages from the search results and process (scrape) them to obtain the full text of each page, along with its headers and links.
  3. Clean and process the obtained text by removing unnecessary characters, lemmatizing (reducing each word to its dictionary form), stripping accents, and dropping stop words (very common words that carry little meaning).
  4. Train a word2vec model on the processed text. The model maps each word to a vector, so the similarity between any two words can be computed.
  5. Score the headers and links of the web pages based on their relevance to the keyword. Relevance is determined using the semantic similarity calculated by the word2vec model.
  6. Group similar headers together.
  7. Display the headers and links in tables with their relevance scores and, for headers, the groupings of similar headers.
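
Steps 3 to 5 can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation: the stop-word list is a tiny made-up sample, lemmatization is left out, and TOY_VECTORS is a hand-written table of 3-dimensional vectors standing in for the embeddings a trained word2vec model would provide. All function names (strip_accents, clean, cosine, mean_vector, score) are hypothetical.

```python
import math
import re
import unicodedata

# Tiny illustrative stop-word list (assumption); the notebook relies on
# the resources shipped with its NLP libraries instead.
STOP_WORDS = {"the", "a", "of", "in", "for", "and", "to",
              "de", "la", "el", "en", "y"}

# Toy vectors standing in for trained word2vec embeddings (assumption).
TOY_VECTORS = {
    "fiebre": (0.9, 0.1, 0.0),
    "temperatura": (0.8, 0.2, 0.1),
    "sintomas": (0.7, 0.3, 0.0),
    "recetas": (0.0, 0.1, 0.9),
    "cocina": (0.1, 0.0, 0.8),
}


def strip_accents(text):
    """Remove diacritics, e.g. 'niños' becomes 'ninos'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean(text):
    """Lowercase, strip accents, keep letter-only tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", strip_accents(text.lower()))
    return [t for t in tokens if t not in STOP_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens):
    """Average the vectors of the known tokens; None if none are known."""
    vectors = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def score(header, keyword):
    """Relevance of a header to a keyword as cosine similarity."""
    h, k = mean_vector(clean(header)), mean_vector(clean(keyword))
    if h is None or k is None:
        return 0.0
    return cosine(h, k)


headers = ["Recetas de cocina", "Síntomas de la fiebre"]
ranked = sorted(headers, key=lambda h: score(h, "fiebre"), reverse=True)
for header in ranked:
    print(f"{score(header, 'fiebre'):.3f}  {header}")
```

Averaging word vectors is only one simple way to turn a multi-word header into a single vector; the notebook may combine them differently.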

How to Use the Code?

To use the code, first open the Code on Google Colab link and save a copy of the notebook to your own Google Drive (in Colab, “File” and then “Save a copy in Drive”).

Once you have opened the notebook, you can change the keyword in the following line of code:

keyword = "fiebre"

Simply replace “fiebre” with the keyword you want to analyze. Then, run all the cells in the notebook (you can do this by selecting “Runtime” in the menu bar and then “Run all”).

At the end of the process, you will see two tables showing the most relevant headers and links, along with their relevance scores. Groupings of similar headers will also be displayed.

Example: Change the Country

The current code performs a Google search in Spanish and requests results localized for Spain. If you want to change the country and language of the search results, you can modify the following line in the start method of the serp class:

URL = "https://www.google.com/search?hl=es&gl=es&q=%s&oq=%s" % (self.query, self.query)

Here, hl and gl are parameters that indicate the language and country, respectively. Both happen to be “es” in the original line: the first is the language code for Spanish and the second is the country code for Spain. To change them, replace each value with the code you need. For example, to search in English and get U.S. results, change the line to:

URL = "https://www.google.com/search?hl=en&gl=us&q=%s&oq=%s" % (self.query, self.query)
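
If you switch countries often, or your keywords contain spaces or special characters, it can help to build the URL with urllib.parse.urlencode, which escapes the values for you. The sketch below is an assumption about how that could look; build_search_url is a hypothetical helper and is not part of the notebook's serp class.

```python
from urllib.parse import urlencode


def build_search_url(query, language="es", country="es"):
    """Build a Google search URL with hl (language) and gl (country) set.

    Hypothetical helper for illustration only.
    """
    params = {"hl": language, "gl": country, "q": query, "oq": query}
    # urlencode escapes spaces and special characters in each value.
    return "https://www.google.com/search?" + urlencode(params)


url_es = build_search_url("fiebre")
url_us = build_search_url("high fever", language="en", country="us")
print(url_es)
print(url_us)
```

With the defaults, the helper produces the same URL as the original line for the keyword “fiebre”.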

Uses and Applications

This type of analysis can be useful for understanding which topics and approaches are most relevant in web content related to the keyword. You can use this information to improve your website’s content and SEO performance. By identifying the most relevant headers and links, you can tailor your content to better address the needs and expectations of your audience and improve your website’s visibility in search engines.

Additional Resources

For more information about the libraries and techniques used in this code, check out the following links:

  • Requests: Library for making HTTP requests in Python.
  • BeautifulSoup: Library for extracting information from HTML and XML documents.
  • NLTK: Library for working with human language data in Python.
  • Gensim: Library for topic modeling and word embeddings such as word2vec.
  • spaCy: Library for natural language processing in Python.

Remember that you can access the code at the following link: Code on Google Colab. Good luck with your analyses and SEO improvements!