TwitterDataFetcher

The TwitterDataFetcher class is dedicated to querying Twitter data. To simplify queries to the Twitter API, this software component relies on existing open-source software for interacting with the API, namely the Tweepy Python package.

It uses the Tweepy Client class to query the data dictionaries of the Twitter Search API v2 as well as the Tweepy API class to access the data dictionaries based on the Twitter Search API v1.

The Twitter Search API v1 is mainly used to query user and tweet objects. Although this API version is partially deprecated, it offers content comparable to the latest API version and often requires fewer API calls than the v2 API to retrieve the same information.

In addition, direct requests to the Twitter Search API v2 are performed with the Python requests library to query endpoints that have been migrated or deprecated in the Tweepy package.

This class can also be used in isolation to collect Twitter data. It therefore requires authentication for the Twitter platform. Provide the secrets for the Twitter API and, if desired, for the Botometer API. You can find a list of required secrets in the user guide for the TwitterAPI class.

The sole concern of this class is fetching Twitter data. No further processing is performed within this class.

Initialization

If you want to use this class for data processing or other package components, follow the steps below.

Import the TwitterDataFetcher class from the fetch module.

from pysna.fetch import TwitterDataFetcher

# each argument accepts Any | None and defaults to None;
# replace None with your own secrets
fetcher = TwitterDataFetcher(
    bearer_token=None,
    consumer_key=None,
    consumer_secret=None,
    access_token=None,
    access_token_secret=None,
    x_rapidapi_key=None,
    x_rapidapi_host=None,
)

and invoke a function:

user_id = 123450897612
fetcher.get_latest_activity(user_id)

You can find the necessary secrets in the user guide.
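As a minimal sketch of how the secrets could be supplied, the snippet below collects them from environment variables into keyword arguments for the constructor. The environment variable names and the helper load_twitter_secrets are assumptions made for illustration; they are not defined by the package. Variables that are not set are passed as None, which matches the constructor defaults.

```python
import os

# Constructor keyword -> environment variable name.
# The variable names are hypothetical; use whatever your deployment defines.
SECRET_ENV_VARS = {
    "bearer_token": "BEARER_TOKEN",
    "consumer_key": "CONSUMER_KEY",
    "consumer_secret": "CONSUMER_SECRET",
    "access_token": "ACCESS_TOKEN",
    "access_token_secret": "ACCESS_TOKEN_SECRET",
    "x_rapidapi_key": "X_RAPIDAPI_KEY",
    "x_rapidapi_host": "X_RAPIDAPI_HOST",
}


def load_twitter_secrets(environ=os.environ) -> dict:
    """Return constructor keyword arguments; unset variables become None."""
    return {kwarg: environ.get(env_name) for kwarg, env_name in SECRET_ENV_VARS.items()}


secrets = load_twitter_secrets()
# With pysna installed, the fetcher could then be created like this:
# from pysna.fetch import TwitterDataFetcher
# fetcher = TwitterDataFetcher(**secrets)
```

Keeping the secrets out of the source code this way avoids committing them to version control by accident.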

Methods

Private Methods

manual_request

Performs a manual request to the Twitter API. Returns JSON formatted API response.

Function:

TwitterDataFetcher._manual_request(url: str, method: str = "GET", header: dict | None = None, payload: dict | None = None, additional_fields: Dict[str, List[str]] | None = None)

Args:

url (str): API URL (without specified fields)
method (str): Request method according to REST. Defaults to "GET".
header: Custom HTTP Header. Defaults to None.
payload: JSON data for HTTP requests. Defaults to None.
additional_fields (Dict[str, List[str]] | None, optional): Fields can be specified (e.g., tweet.fields) according to the official API reference. Defaults to None.

The function raises an exception if the response status code is not 200.

This function facilitates manual requests because it builds the query string from the provided input arguments.

The url argument has to be provided in raw form (i.e., without any parameters or fields).
The method argument specifies the REST request method (e.g., GET, POST, PUT, DELETE). Defaults to GET.
The header argument specifies a custom header. This is useful if an API other than the Twitter API is queried. If no custom header is provided, the default header for Twitter API authentication is used, based on the bearer_token provided during instantiation.
The payload argument sends data with a POST or PUT request. The data must be provided as a dictionary.
The additional_fields argument specifies Twitter fields (e.g., user fields or tweet fields) to enrich the query and return additional information. The argument can be used as follows:

{"tweet.fields": ["public_metrics"]}

The function will then build the query string and send it to the API.

You can find the full list of Twitter fields in the documentation: https://developer.twitter.com/en/docs/twitter-api/fields
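To make the query string construction concrete, here is a small standalone sketch of that step. The helper name build_request_url and the tweet ID 1234 are made up for illustration; the helper only mirrors the string handling and sends no request.

```python
from typing import Dict, List, Optional


def build_request_url(url: str, additional_fields: Optional[Dict[str, List[str]]] = None) -> str:
    """Append Twitter fields to a raw URL as a query string (illustrative helper)."""
    if not additional_fields:
        return url
    # each entry becomes "<field>=<comma-separated values>", e.g. "tweet.fields=lang,author_id"
    query = "&".join(f"{field}={','.join(values)}" for field, values in additional_fields.items())
    return f"{url}?{query}"


print(build_request_url(
    "https://api.twitter.com/2/tweets/1234",
    {"tweet.fields": ["public_metrics", "lang"], "user.fields": ["created_at"]},
))
# prints https://api.twitter.com/2/tweets/1234?tweet.fields=public_metrics,lang&user.fields=created_at
```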

Source Code

def _manual_request(self, url: str, method: str = "GET", header: dict | None = None, payload: dict | None = None, additional_fields: Dict[str, List[str]] | None = None) -> dict:
    """Perform a manual request to the Twitter API.

    Args:
        url (str): API URL (without specified fields)
        method (str): Request method according to REST. Defaults to "GET".
        header (dict | None): Custom HTTP Header. Defaults to None.
        payload (dict | None): JSON data for HTTP requests. Defaults to None.
        additional_fields (Dict[str, List[str]] | None, optional): Fields can be specified (e.g., tweet.fields) according to the official API reference. Defaults to None.

    Raises:
        Exception: If status code != 200.

    Returns:
        dict: JSON formatted response of API request.
    """
    # if additional_fields were provided
    if additional_fields:
        # init empty string
        fields = "?"
        # create fields string dynamically for every field in additional_fields
        for field in additional_fields.keys():
            # e.g., in format "tweet.fields=lang,author_id"
            fields += f"{field}={','.join(additional_fields[field])}&"
        # append fields to url
        url += fields[:-1]
    if header is None:
        # set header
        header = {"Authorization": f"Bearer {self._bearer_token}"}
    response = requests.request(method=method, url=url, headers=header, json=payload)
    if response.status_code != 200:
        raise Exception("Request returned an error: {} {}".format(response.status_code, response.text))
    return response.json()

paginate

Custom pagination function

It turns out that the pagination functions of the Tweepy Python package are considerably slower than paginating manually. For this reason, this custom function was written.

Function:

TwitterDataFetcher._paginate(func, params: Dict[str, str | int], limit: int | None = None, response_attribute: str = "data", page_attribute: str | None = None)

Args:

func: Function used for pagination
params (Dict[str, str | int]): Dict containing request parameters. Must be of the form {'id': ..., 'max_results': ..., 'pagination_token': ...}
limit (int | None, optional): Maximum number of results. Defaults to None, thus, no limit.
response_attribute (str, optional): Attribute of the Response object. Defaults to "data". Options: ["data", "includes"]
page_attribute (str, optional): The attribute that should be extracted for every entry of a page. Defaults to None.

The params argument holds the request parameters that are passed to func for every page. It needs an id, a max_results key giving the number of results requested per page, and a pagination_token key. The pagination_token can be set to None initially; it is overwritten with the token of the next page during iteration. If you wish to start from a page other than the first one, provide a pagination token. All parameters must be provided via a dictionary of the form:

{"id": 1234456,
"max_results": 100,
"pagination_token": None}

The limit argument caps the total number of results across all pages. None indicates that no limit is desired and, thus, all available results will be returned.

The response_attribute argument specifies where to collect the data from in the response. If data is specified, the results are taken from the default data attribute of the response. If includes is specified, the results are taken from the additional information provided by the Twitter fields.

The page_attribute argument specifies which attribute should be extracted for every entry of a page. For instance, if this argument is set to id, then the IDs will be extracted from every entry (e.g., user IDs of user objects).

Inside the function, a counter is incremented for every fetched result. Once the limit is reached, the function breaks out of the loop and returns the results obtained so far. Otherwise, it checks whether the last page was reached and, if not, fetches the next page.
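The loop can be tried out without API access. The sketch below is a simplified standalone version of the pagination logic, run against a stand-in endpoint; fake_endpoint, PAGES and the token names are invented for illustration. It assumes that params uses the keys id, max_results and pagination_token, as in the source code, and that a response exposes data and meta attributes.

```python
from types import SimpleNamespace


def paginate(func, params, limit=None, response_attribute="data", page_attribute=None):
    """Collect results page by page until the limit or the last page is reached."""
    results = []
    while True:
        response = func(**params)
        items = getattr(response, response_attribute)
        if items is None:  # no data on this page
            break
        for item in items:
            results.append(item if page_attribute is None else getattr(item, page_attribute))
            if limit is not None and len(results) == limit:
                return results  # limit reached
        if "next_token" not in response.meta:  # last page
            break
        params["pagination_token"] = response.meta["next_token"]
    return results


# Stand-in for an API endpoint: three pages with two entries each.
PAGES = {None: ([1, 2], "t1"), "t1": ([3, 4], "t2"), "t2": ([5, 6], None)}


def fake_endpoint(id, max_results, pagination_token):
    ids, next_token = PAGES[pagination_token]
    meta = {"next_token": next_token} if next_token else {}
    return SimpleNamespace(data=[SimpleNamespace(id=i) for i in ids], meta=meta)


params = {"id": 42, "max_results": 2, "pagination_token": None}
print(paginate(fake_endpoint, dict(params), limit=5, page_attribute="id"))  # [1, 2, 3, 4, 5]
print(paginate(fake_endpoint, dict(params), page_attribute="id"))  # [1, 2, 3, 4, 5, 6]
```

Note that params is modified in place during iteration, which is why the example passes a copy to each call.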

Source Code

def _paginate(self, func, params: Dict[str, str | int], limit: int | None = None, response_attribute: str = "data", page_attribute: str | None = None) -> list:
    """Pagination function

    Args:
        func: Function used for pagination
        params (Dict[str, str  |  int]): Dict containing request parameters. Should be of the form {'id': ..., 'max_results': ..., 'pagination_token': ...}
        limit (int | None, optional): Maximum number of results. Defaults to None, thus, no limit.
        response_attribute (str, optional): Attribute of the Response object. Defaults to "data". Options: ["data", "includes"]
        page_attribute (str | None, optional): The attribute that should be extracted for every entry of a page. Defaults to None.

    Raises:
        KeyError: 'id', 'max_results', and 'pagination_token' should be provided in the params dict.

    Returns:
        list: Results
    """
    # init counter
    counter = 0
    # init empty results list
    results = list()
    # set break out var
    break_out = False
    while not break_out:
        # make request
        response = func(**params)
        # if any data exists
        if response.__getattribute__(response_attribute) is not None:
            # iterate over response results
            for item in response.__getattribute__(response_attribute):
                # add result
                if page_attribute is None:
                    results.append(item)
                else:
                    results.append(item.__getattribute__(page_attribute))
                # increment counter
                counter += 1
                # if limit was reached, break
                if (limit is not None) and (counter == limit):
                    # set break_out var to true
                    break_out = True
                    break
            # if last page was reached
            if "next_token" not in response.meta:
                break
            # else, set new pagination token for next iteration
            else:
                params["pagination_token"] = response.meta["next_token"]
        # if no data exists, break
        else:
            break
    return results

Twitter user related methods

get_user_object

Request Twitter user object using Tweepy. The user object is fetched from the Twitter Search API v1. For this, the Tweepy API class is used.

Function:

TwitterDataFetcher.get_user_object(user: str | int)

The function takes the user ID (as a string or integer) or the user's unique screen name. It returns the requested API v1 user object.

The function chooses the request parameters based on which kind of user identifier was given.

If the requested user has been suspended from Twitter, an error is raised and a message is logged to stdout.
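The distinction between user IDs and screen names, which most methods of this class rely on, can be sketched as follows. The helper user_lookup_params is hypothetical and only illustrates the check; it performs no request.

```python
from typing import Dict, Union


def user_lookup_params(user: Union[str, int]) -> Dict[str, Union[str, int]]:
    """Map a user identifier to the matching keyword argument (illustrative helper)."""
    # integers and all-digit strings are treated as user IDs
    if isinstance(user, int) or user.isdigit():
        return {"user_id": user}
    # everything else is treated as a screen name
    return {"screen_name": user}


print(user_lookup_params(123450897612))    # {'user_id': 123450897612}
print(user_lookup_params("123450897612"))  # {'user_id': '123450897612'}
print(user_lookup_params("WWU_Muenster"))  # {'screen_name': 'WWU_Muenster'}
```

One consequence of this check is that a screen name consisting only of digits would be interpreted as a user ID.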

Source Code

def get_user_object(self, user: str | int) -> tweepy.models.User:
    """Request Twitter User Object via tweepy

    Args:
        user (str): Either User ID or screen name

    Returns:
        tweepy.User: Twitter User object from tweepy
    """
    try:
        # check if string for user1 is convertible to int in order to check for user ID or screen name
        if (isinstance(user, int)) or (user.isdigit()):
            # get profile for user by user ID
            user_obj = self.api.get_user(user_id=user)
        else:
            # get profile for user by screen name
            user_obj = self.api.get_user(screen_name=user)
    except tweepy.errors.Forbidden as e:
        # log to stdout
        log.error("403 Forbidden: access refused or access is not allowed.")
        # if user ID was provided
        if isinstance(user, int) or user.isdigit():
            url = f"https://api.twitter.com/2/users/{user}"
        else:
            # if screen name was provided
            url = f"https://api.twitter.com/2/users/by/username/{user}"
        response = self._manual_request(url)
        # if an error occured that says the user has been suspended
        if any("User has been suspended" in error["detail"] for error in response["errors"]):
            log.error("User has been suspended from Twitter. Requested user: {}".format(user))
            raise e
        else:
            raise e
    return user_obj

get_user_follower_ids

Request Twitter follower IDs from user.

Function:

TwitterDataFetcher.get_user_follower_ids(user: str | int)

This function takes a Twitter user identifier (either ID or unique screen name). It returns the IDs of all followers of the specified user as a set. Here, tweepy.Cursor is used for pagination.

The function handles the performed request based on what user identifier was given.

Source Code

def get_user_follower_ids(self, user: str | int) -> Set[int]:
    """Request Twitter follower IDs from user

    Args:
        user (str | int): Either User ID or screen name.

    Returns:
        Set[int]: Array containing follower IDs
    """
    # check if string for user1 is convertible to int in order to check for user ID or screen name
    if (isinstance(user, int)) or (user.isdigit()):
        params = {"user_id": user}
    else:
        params = {"screen_name": user}

    follower_ids = list()
    for page in tweepy.Cursor(self.api.get_follower_ids, **params).pages():
        follower_ids.extend(page)
    return set(follower_ids)

get_user_followee_ids

Request Twitter followee IDs from user.

Function:

TwitterDataFetcher.get_user_followee_ids(user: str | int)

This function takes in a Twitter user identifier (i.e., either ID or unique screen name) and returns a set containing all IDs from the user's followees (AKA friends or follows).

The function handles the performed request based on what user identifier was given.

Source Code

def get_user_followee_ids(self, user: str | int) -> Set[int]:
    """Request Twitter followee IDs from user

    Args:
        user (str): Either User ID or screen name.

    Returns:
        Set[int]: Array containing follow IDs
    """
    # check if string for user1 is convertible to int in order to check for user ID or screen name
    if (isinstance(user, int)) or (user.isdigit()):
        params = {"user_id": user}
    else:
        params = {"screen_name": user}

    followee_ids = list()
    for page in tweepy.Cursor(self.api.get_friend_ids, **params).pages():
        followee_ids.extend(page)
    return set(followee_ids)

get_latest_activity

Returns the user's latest activity by fetching the top element from their timeline.

Function:

TwitterDataFetcher.get_latest_activity(user: str | int)

This function takes a Twitter user identifier (i.e., either ID or unique screen name) and returns the latest activity from the user's timeline. To do so, it requests the corresponding endpoint via the _manual_request function.

Often, this will be a tweet composed by the user themselves. In that case, all available data of that tweet is returned as a dictionary.

The function handles the performed request based on what user identifier was given.

Source Code

def get_latest_activity(self, user: str | int) -> dict:
    """Returns latest user's activity by fetching the top element from its timeline.

    Args:
        user (str | int): User ID or screen name.

    Returns:
        dict: Latest activity.
    """
    # if screen name was provided
    if (isinstance(user, str)) and (user.isdigit() is False):
        url = f"https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={user}&include_rts=true&trim_user=true&tweet_mode=extended"
    # else go with user ID
    else:
        url = f"https://api.twitter.com/1.1/statuses/user_timeline.json?user_id={user}&include_rts=true&trim_user=true&tweet_mode=extended"
    response_json = self._manual_request(url)
    # return the first item since timeline is sorted descending
    return response_json[0]

get_latest_activity_date

Gets the latest activity date of the specified user by fetching the top element from their timeline and extracting its creation date.

Function:

TwitterDataFetcher.get_latest_activity_date(user: str | int)

This function takes a Twitter user identifier (i.e., either ID or unique screen name) and returns the latest activity date from the user's timeline. To do so, it requests the corresponding endpoint via the _manual_request function.

The latest activity date is determined by first fetching the latest activity from the user's timeline and then extracting its creation date. Usually, this will be a tweet composed by the user. In that case, the creation date of that tweet is returned, representing the latest publicly available activity date.
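Since the date is returned as a string, it usually needs to be parsed before it can be compared or sorted. The following sketch assumes the date format used by the API v1.1 created_at field; the sample value and the helper parse_created_at are made up for illustration.

```python
from datetime import datetime

# Assumed format of the v1.1 created_at field, e.g. "Wed Oct 10 20:19:24 +0000 2018"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at: str) -> datetime:
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT)


latest = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")  # made-up sample value
print(latest.isoformat())  # 2018-10-10T20:19:24+00:00
```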

Source Code

def get_latest_activity_date(self, user: str | int) -> str:
    """Get latest activity date from specified user by fetching the top element from its timeline.

    Args:
        user (str | int): User ID or screen name.

    Returns:
        str: Activity date of latest activity.
    """
    # if screen name was provided
    if (isinstance(user, str)) and (user.isdigit() is False):
        url = f"https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={user}&include_rts=true&trim_user=true"
    # else go with user ID
    else:
        url = f"https://api.twitter.com/1.1/statuses/user_timeline.json?user_id={user}&include_rts=true&trim_user=true"
    response_json = self._manual_request(url)
    # return the first item since timeline is sorted descending
    return response_json[0]["created_at"]

get_relationship

Get relationship between two Twitter users.

Function:

TwitterDataFetcher.get_relationship(source_user: str | int, target_user: str | int)

The function takes a source and a target user identifier and uses the tweepy.API.get_friendship function to get the relationship. It builds the query parameters based on the kind of user identifiers provided.

The function will return the parsed JSON relationship for the source and target user as a dictionary.
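A short sketch of how the returned dictionary could be read. It assumes the source entry carries the screen_name, following and followed_by fields shown in the example response of the API reference; the sample data and the helper describe_relationship are made up for illustration.

```python
def describe_relationship(relationship: dict) -> str:
    """Summarize who follows whom from the dictionary's "source" entry."""
    source, target = relationship["source"], relationship["target"]
    a, b = source["screen_name"], target["screen_name"]
    forward = "follows" if source["following"] else "does not follow"
    backward = "follows" if source["followed_by"] else "does not follow"
    return f"{a} {forward} {b}; {b} {backward} {a}"


# made-up sample in the shape {"source": {...}, "target": {...}}
sample = {
    "source": {"screen_name": "user_a", "following": True, "followed_by": False},
    "target": {"screen_name": "user_b", "following": False, "followed_by": True},
}
print(describe_relationship(sample))
# prints user_a follows user_b; user_b does not follow user_a
```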

Source Code

def get_relationship(self, source_user: str | int, target_user: str | int) -> dict:
    """Get relationship between two users.

    Args:
        source_user (str | int): Source user ID or screen name.
        target_user (str | int): Target user ID or screen name.

    Returns:
        dict: Unpacked tuple of JSON from tweepy Friendship model.

    Reference: https://developer.twitter.com/en/docs/twitter-api/v1/accounts-and-users/follow-search-get-users/api-reference/get-friendships-show#example-response
    """

    params = {"source_id": None, "source_screen_name": None, "target_id": None, "target_screen_name": None}
    # if source_user is int or a digit
    if (isinstance(source_user, int)) or (isinstance(source_user, str) and (source_user.isdigit())):
        params["source_id"] = source_user
    # else if screen name was provided
    elif (isinstance(source_user, str)) and (not source_user.isdigit()):
        params["source_screen_name"] = source_user
    else:
        log.error("No ID or username provided for {}".format(source_user))

    # if target_user is int or a digit
    if (isinstance(target_user, int)) or (isinstance(target_user, str) and (target_user.isdigit())):
        params["target_id"] = target_user
    # else if screen name was provided
    elif (isinstance(target_user, str)) and (not target_user.isdigit()):
        params["target_screen_name"] = target_user
    else:
        log.error("No ID or username provided for {}".format(target_user))

    relationship = self.api.get_friendship(**params)
    return {"source": relationship[0]._json, "target": relationship[1]._json}

get_relationship_pairs

Creates pairs for each unique combination of provided users based on their relationship.

Function:

TwitterDataFetcher.get_relationship_pairs(users: List[str | int])

This function takes a list of user identifiers (i.e., IDs or unique screen names). It creates a pair for each combination of the provided users and returns their individual relationships.

For instance, if three users WWU_Muenster, goetheuni, UniKonstanz were provided, the pairs are determined as follows:

(WWU_Muenster, goetheuni)
(WWU_Muenster, UniKonstanz)
(goetheuni, WWU_Muenster)
(goetheuni, UniKonstanz)
(UniKonstanz, WWU_Muenster)
(UniKonstanz, goetheuni)

These pairs are set as dictionary keys. The respective relationships are stored as dictionary values.
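The pairs listed above are the ordered 2-permutations of the user list. As a sketch, they can be reproduced with itertools.permutations, which yields the same pairs as the nested loop in the source code as long as the list contains no duplicate users.

```python
from itertools import permutations

users = ["WWU_Muenster", "goetheuni", "UniKonstanz"]

# every ordered pair of two different users
pairs = list(permutations(users, 2))
for pair in pairs:
    print(pair)

print(len(pairs))  # 6, i.e. n * (n - 1) pairs for n = 3 users
```

Because one request is made per pair, the number of API calls grows quadratically with the number of users.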

Source Code

def get_relationship_pairs(self, users: List[str | int]) -> dict:
    """Creates pairs for each unique combination of provided users based on their relationship.

    Args:
        users (List[str  |  int]): List of user IDs or screen names.

    Returns:
        dict: Pairs of users containing their relationship to each other.
    """
    # init empty relationships dict
    relationships = dict()
    # iterate over every pair combination of provided users
    for user in users:
        for other_user in users:
            if user != other_user:
                relationships[(user, other_user)] = self.get_relationship(source_user=user, target_user=other_user)
    return relationships

get_liked_tweets_ids

Get (all) liked tweet IDs of the provided user.

Function:

TwitterDataFetcher.get_liked_tweets_ids(user: str | int, limit: int | None = None)

Args:

user (str | int): User ID or screen name.
limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

This function uses the custom TwitterDataFetcher._paginate function to get the specified number of results. To get the tweet IDs, the tweepy.Client.get_liked_tweets function is used.

The function returns a Python list of the IDs of the tweets liked by the user.

The function handles the performed request based on what user identifier was given.

Source Code

def get_liked_tweets_ids(self, user: str | int, limit: int | None = None) -> list:
    """Get (all) liked Tweets of provided user.

    Args:
        user (str | int): User ID or screen name.
        limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

    Returns:
        list: IDs of liked Tweets.
    """
    # if user ID was provided
    if (isinstance(user, int)) or (user.isdigit()):
        params = {"id": user, "max_results": 100, "pagination_token": None}
    else:
        user_obj = self.get_user_object(user)
        params = {"id": user_obj.id, "max_results": 100, "pagination_token": None}

    page_results = self._paginate(self.client.get_liked_tweets, params, limit=limit, page_attribute="id")
    return page_results

get_composed_tweets_ids

Get (all) composed tweet IDs of provided user by pagination.

Function:

TwitterDataFetcher.get_composed_tweets_ids(user: str | int, limit: int | None = None)

Args:

user (str | int): User ID or screen name.
limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

This function uses the custom TwitterDataFetcher._paginate function to get the specified number of results. To get the tweet IDs, the tweepy.Client.get_users_tweets function is used.

The function returns a Python list of the IDs of the tweets composed by the user.

The function handles the performed request based on what user identifier was given.

Source Code

def get_composed_tweets_ids(self, user: str | int, limit: int | None = None) -> list:
    """Get (all) composed Tweets of provided user by pagination.

    Args:
        user (str | int): User ID or screen name.
        limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

    Returns:
        list: IDs of composed Tweets.
    """

    # user ID is required, if screen name was provided
    if (isinstance(user, str)) and (not user.isdigit()):
        user = self.get_user_object(user).id
    # set params
    params = {"id": user, "max_results": 100, "pagination_token": None}
    # get page results
    page_results = self._paginate(self.client.get_users_tweets, params, limit=limit, page_attribute="id")
    return page_results

get_botometer_scores

Returns bot scores from the Botometer API for the specified Twitter account.

Function:

TwitterDataFetcher.get_botometer_scores(user: str | int)

This function takes a Twitter account identifier (i.e., ID or unique screen name).

This function relies on the external Botometer API. To use this function, the corresponding RapidAPI secrets need to be provided. See the secrets overview for more details.

The function first fetches the user's timeline (up to 200 tweets) and up to 100 recent tweets that mention the user. This data is then sent via the payload argument of the TwitterDataFetcher._manual_request function using a POST request, and the JSON response is returned.

Source Code

def get_botometer_scores(self, user: str | int) -> dict:
    """Returns bot scores from the Botometer API for the specified Twitter user.

    Args:
        user (str | int): User ID or screen name.

    Returns:
        dict: The raw Botometer scores for the specified user.

    Reference: https://rapidapi.com/OSoMe/api/botometer-pro/details
    """
    if (self._x_rapidapi_key is None) or (self._x_rapidapi_host is None):
        raise ValueError("'X_RAPIDAPI_KEY' and 'X_RAPIDAPI_HOST' secrets for Botometer API need to be provided.")
    # get user object
    user_obj = self.get_user_object(user)
    # get user timeline
    timeline = list(map(lambda x: x._json, self.api.user_timeline(user_id=user_obj.id, count=200)))
    # get user data
    if timeline:
        user_data = timeline[0]["user"]
    else:
        user_data = user_obj._json
    screen_name = "@" + user_data["screen_name"]
    # get up to 100 recent Tweets mentioning the user
    tweets = list(map(lambda x: x._json, self.api.search_tweets(screen_name, count=100)))
    # set payload
    payload = {"mentions": tweets, "timeline": timeline, "user": user_data}
    # set header
    headers = {"content-type": "application/json", "X-RapidAPI-Key": self._x_rapidapi_key, "X-RapidAPI-Host": self._x_rapidapi_host}
    # set url
    url = "https://botometer-pro.p.rapidapi.com/4/check_account"
    # get results
    response = self._manual_request(url, "POST", headers, payload)
    return response

Tweet related methods

get_tweet_object

Request Twitter tweet object via tweepy.

Function:

TwitterDataFetcher.get_tweet_object(tweet: str | int)

The function takes the tweet ID as a string or integer. It returns the extended tweet object requested via the API v1 using the tweepy.API.get_status function.

If the requested tweet object has been deleted, an error is raised and a message is logged to stdout.

Reference: https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet

Source Code

def get_tweet_object(self, tweet: str | int) -> tweepy.models.Status:
    """Request Twitter Tweet Object via tweepy

    Args:
        tweet (int | str): Tweet ID

    Returns:
        tweepy.models.Status: tweepy Status Model

    Reference: https://developer.twitter.com/en/docs/twitter-api/v1/data-dictionary/object-model/tweet
    """
    try:
        tweet_obj = self.api.get_status(tweet, include_entities=True, tweet_mode="extended")
    except tweepy.errors.NotFound as e:
        log.error("404 Not Found: Resource not found.")
        raise e
    except tweepy.errors.Forbidden as e:
        log.error("403 Forbidden: access refused or access is not allowed.")
        raise e
    return tweet_obj

get_liking_users_ids

Get (all) liking users of provided tweet by pagination.

Function:

TwitterDataFetcher.get_liking_users_ids(tweet_id: str | int, limit: int | None = None)

Args:

tweet_id (str | int): Tweet ID.
limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

The function takes the tweet ID as a string or integer as well as the limit argument. If limit is None, all available results will be returned. It returns the IDs of the users that liked the specified tweet.

This function uses the custom TwitterDataFetcher._paginate function to get the specified number of results. To get the user IDs, the tweepy.Client.get_liking_users function is used.

Source Code

def get_liking_users_ids(self, tweet_id: str | int, limit: int | None = None) -> list:
    """Get (all) liking users of provided Tweet by pagination.

    Args:
        tweet_id (str | int): Tweet ID.
        limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

    Returns:
        list: User IDs of liking users.
    """
    # set params
    params = {"id": tweet_id, "max_results": 100, "pagination_token": None}
    # get page results
    page_results = self._paginate(self.client.get_liking_users, params, limit=limit, page_attribute="id")
    return page_results

get_retweeters_ids

Get (all) retweeting users of provided tweet by pagination.

Function:

TwitterDataFetcher.get_retweeters_ids(tweet_id: str | int, limit: int | None = None)

Args:

tweet_id (str | int): Tweet ID.
limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

The function takes the tweet ID as a string or integer as well as the limit argument. If limit is None, all available results will be returned. It returns the IDs of the users that retweeted the specified tweet.

This function uses the custom TwitterDataFetcher._paginate function to get the specified number of results. To get the user IDs, the tweepy.Client.get_retweeters function is used.

Source Code

def get_retweeters_ids(self, tweet_id: str | int, limit: int | None = None) -> list:
    """Get (all) retweeting users of provided Tweet by pagination.

    Args:
        tweet_id (str | int): Tweet ID.
        limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

    Returns:
        list: User IDs of retweeting users.
    """
    params = {"id": tweet_id, "max_results": 100, "pagination_token": None}
    # get page results
    page_results = self._paginate(self.client.get_retweeters, params, limit=limit, page_attribute="id")
    return page_results

get_quoting_users_ids

Get (all) quoting users of provided Tweet by pagination.

Function:

TwitterDataFetcher.get_quoting_users_ids(tweet_id: str | int, limit: int | None = None)

Args:

tweet_id (str | int): Tweet ID.
limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

The function takes the tweet ID as a string or integer as well as the limit argument. If limit is None, all available results will be returned. It returns the IDs of the users that quoted the specified tweet.

This function uses the custom TwitterDataFetcher._paginate function to get the specified number of results. To get the tweet objects, the tweepy.Client.get_quote_tweets function is used. Then, the IDs of the quoting users are extracted from the additional information provided within the includes fields of each page. For more details, see the instructions on the TwitterDataFetcher._paginate function.

Source Code

def get_quoting_users_ids(self, tweet_id: str | int, limit: int | None = None) -> list:
    """Get (all) quoting users of provided Tweet by pagination.

    Args:
        tweet_id (str | int): Tweet ID.
        limit (int | None): The maximum number of results to be returned. By default, each page will return the maximum number of results available.

    Returns:
        list: User Objects of quoting users.
    """
    params = {"id": tweet_id, "max_results": 100, "pagination_token": None}
    # get page results
    page_results = self._paginate(self.client.get_quote_tweets, params, limit=limit, response_attribute="includes", page_attribute="id")
    return page_results

get_context_annotations_and_entities

Get context annotations and entities from a tweet object.

Function:

TwitterDataFetcher.get_context_annotations_and_entities(tweet_id: str | int)

The function takes in the tweet ID as string or integer representation.

The function returns the context annotations (e.g., topics) and named entities of the specified tweet. To do so, it uses the TwitterDataFetcher._manual_request function with the tweet fields context_annotations and entities set. If any context annotation or named entity exists, the data of the JSON response is returned, else None.
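The availability check on the response can be sketched as a standalone helper. Both the helper extract_annotations and the sample responses are made up for illustration; the point is that each key has to be tested for membership separately.

```python
from typing import Optional


def extract_annotations(response_json: dict) -> Optional[dict]:
    """Return the tweet data if it carries context annotations or entities, else None."""
    data = response_json.get("data", {})
    # test each key separately: `"a" or "b" in data` would always be truthy
    if "context_annotations" in data or "entities" in data:
        return data
    return None


# made-up sample responses
plain = {"data": {"id": "1", "text": "hello"}}
annotated = {"data": {"id": "1", "text": "hello", "entities": {"hashtags": []}}}

print(extract_annotations(plain))      # None
print(extract_annotations(annotated))  # {'id': '1', 'text': 'hello', 'entities': {'hashtags': []}}
```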

Reference: https://developer.twitter.com/en/docs/twitter-api/annotations/overview

Source Code

def get_context_annotations_and_entities(self, tweet_id: str | int) -> dict | None:
    """Get context annotations and entities from a Tweet.

    Args:
        tweet_id (str | int): Tweet ID

    Returns:
        dict | None: context annotations and entities if available, else None.

    Reference: https://developer.twitter.com/en/docs/twitter-api/annotations/overview
    """
    url = f"https://api.twitter.com/2/tweets/{tweet_id}"
    response_json = self._manual_request(url, additional_fields={"tweet.fields": ["context_annotations", "entities"]})
    # if neither key is available, return None
    if ("context_annotations" in response_json["data"]) or ("entities" in response_json["data"]):
        return response_json["data"]
    else:
        return None

get_public_metrics

Get public metrics from tweet object.

Function:

TwitterDataFetcher.get_public_metrics(tweet_id: str | int)

The function takes in the tweet ID as string or integer representation.

The following public metrics are returned:

impressions_count (=views)
quote_count
reply_count
retweet_count
favorite_count (=likes)

Here you can find an interpretation of the metrics: https://developer.twitter.com/en/docs/twitter-api/metrics

The function returns the public metrics of the tweet as a dictionary. To do so, it uses the TwitterDataFetcher._manual_request function with the tweet field public_metrics set.

Source Code

def get_public_metrics(self, tweet_id: str | int) -> dict:
    """Get public metrics from Tweet Object

    Args:
        tweet_id (str | int): Tweet ID

    Returns:
        dict: Available public metrics for specified Tweet.

    Metrics:
        - impressions_count (=views)
        - quote_count
        - reply_count
        - retweet_count
        - favorite_count (=likes)

    Reference: https://developer.twitter.com/en/docs/twitter-api/metrics
    """
    # set URL
    url = f"https://api.twitter.com/2/tweets/{tweet_id}"
    # make request
    response_json = self._manual_request(url, additional_fields={"tweet.fields": ["public_metrics"]})
    # get public metrics from JSON response
    public_metrics = response_json["data"]["public_metrics"]
    return public_metrics
