Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: amcurley
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 23.8 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed about 5 years ago
Metadata Files
Readme Citation

README.md

Project ContentGen

Problem Statement

According to HubSpot, companies spend 46% of their budget on content creation (HubSpot, 2017), and 24% of marketers plan to increase their investment in content marketing in 2020 (HubSpot, 2020). Content creation is therefore an important part of a company's overall marketing plan. An effective content marketing plan usually involves many moving pieces: photographers, editors, videographers, models, writers, and so on.

This leads us to our problem:

  • Can we use machine learning to streamline and automate the content generation process?

  • Is it ethical to use this technology for content generation?

Project Layout

Caution: This web app displays photorealistic images of people and links to the full set of over 15,000 images of computer-generated people. Please use this application only for learning and for raising awareness of the ethical concerns around this technology.

Web application:
https://test-heroku-content.herokuapp.com/

This project was broken into two pieces:

  1. The first part of the project uses StyleGAN2 for face generation and DeepFakes to animate these images.

  2. The second part of the project uses GPT-2 for blog post generation.

Examples of both of these parts of the project can be viewed in the web application listed above.

Executive Summary

This project began with the goal of creating computer-generated content, which could become a service that companies use in their content marketing efforts. The first aspect of the project focuses on influencer marketing. Influencer marketing is a major asset for companies, but it becomes expensive as the quality of the influencer (number of followers and engagement rate) increases. Using computer-generated influencers, companies can in theory "deploy" thousands of these influencers to promote their products and services. The second aspect of the project focuses on computer-generated blog posts for content marketing. According to HubSpot, "About 64% of marketers actively invest time in search engine optimization (SEO)" (HubSpot, 2020). With computer-generated blog posts, marketers can cut down the time it takes to produce quality blog posts at scale.

PersonGen

The first part of this application is called PersonGen. This is the influencer generation section of ContentGen. A Generative Adversarial Network (GAN) is well suited to generating faces for the influencers. There are multiple GAN frameworks to choose from; I chose StyleGAN2 because of the quality of the images it produces. I generated 15,000 images of fake faces that a company can use for influencers in its content marketing strategy. The ability for the influencers to move is extremely important given how successful video content, such as videos on TikTok, is in 2020. The following is a famous TikTok video, currently with 470 million views; within 30 seconds I was able to replicate its motion on a computer-generated person using DeepFakes.

image

Using the computer-generated images from the application and the notebook deep_fakes.ipynb, a company can upload a video of a person talking and make the generated person move like the person in the video. The deep fake isn't perfect; however, with certain tweaks, and eventually with vid2vid, these videos would look much better.

Here is an example of me animating the GAN-generated images:

image

BlogGen

The second part of the application is called BlogGen. The user or business opens the application and goes to the "BlogGen" section in the navigation bar. The user selects a topic and writes a title for the blog post. If the user writes a title without choosing a topic, the generator can still produce a body of text that the user can then use for their own blog. For this part of the project I used GPT-2 for the text generation. The selectable topics are all of the topics available on Medium. For each topic I use a 3-4 sentence primer taken from Wikipedia and other websites, so that the generator can produce more coherent blogs about that specific topic and title.
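
The topic-and-title flow described above can be sketched as a small prompt builder. This is a minimal illustration, not the app's actual code: the `PRIMERS` dictionary, its placeholder sentences, and the `build_prompt` function are all hypothetical names introduced here, and the way the primer and title are joined is an assumption.

```python
from typing import Optional

# Placeholder primers for illustration only. The project's real primers are
# the 3-4 sentence snippets collected for each Medium topic.
PRIMERS = {
    "Music": "Music is the arrangement of sound to create melody, harmony, and rhythm.",
    "Travel": "Travel is the movement of people between distant geographical locations.",
}


def build_prompt(title: str, topic: Optional[str] = None) -> str:
    """Return the text used to prime the language model.

    If a known topic is selected, its primer is placed before the title so
    the model sees topical context first. If no topic (or an unknown topic)
    is given, the title alone is used as the prompt.
    """
    title = title.strip()
    if not title:
        raise ValueError("a blog post needs a title")
    primer = PRIMERS.get(topic) if topic else None
    parts = [primer, title] if primer else [title]
    # A trailing blank line signals that the body of the post should follow.
    return "\n\n".join(parts) + "\n\n"


if __name__ == "__main__":
    print(build_prompt("Five Albums That Changed My Life", topic="Music"))
    print(build_prompt("Five Albums That Changed My Life"))
```

The returned string would then be handed to whatever GPT-2 sampling routine the application uses as its starting text. Keeping prompt assembly separate from sampling makes the "title only" case easy to support.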

Here is an example of BlogGen in action:
image

Ethics

After developing these pieces of the project I became very aware of the ethical concerns of using this technology for content generation. For people in technology and business, this is a useful service that can accelerate their marketing efforts. However, using GANs and DeepFakes for content generation could replace many jobs, such as photographers, videographers, editors, and models. This technology also lets a user make people appear to do and say whatever the user pleases. Businesses can use it to exploit the flaws and addictive nature of social media more than they already do. Although GANs and DeepFakes do not physically hurt anyone, ethical concerns arise when taking into account job loss, deep fakes in politics, unrealistic standards of beauty, etc.

Computer-generated blog posts take away from the creative beauty of writing, and this technology raises ethical concerns of its own. Generated blog posts reduce the need for dedicated writers: a single person can edit a generated blog from this application and drastically cut the time it takes to create a blog post. These blog posts are generated by a model trained on text from across the internet and could contain bias, inappropriate text, and factual inaccuracies.

When using these applications, it is important to keep the concerns above in mind.

Data

For StyleGAN2 I used the ffhq-dataset because the data was already prepared and properly localized. I attempted to train StyleGAN2 on data that I collected from Instagram influencers to generate bodies; however, due to time constraints, that data was not properly localized. Having a clean dataset is essential for this application because the metric we care about is whether a generated image is distinguishable from an actual human.

For GPT-2 I used the 124M parameter model due to computing constraints. When a user of my web application clicks one of the predetermined topics, the application primes GPT-2 with a snippet of text about that topic. The data I used to prime GPT-2 can be seen here: Topics

Conclusions and Future Steps

This technology will most likely be the catalyst that enables companies to create more content than ever before. With GANs, companies can generate everything from faces to artwork. With GPT-2, and eventually GPT-3, companies will be able to generate almost perfect blog posts with the click of a button. The question is not whether companies will use this, but when they will begin to use it, and whether we will even know that they are. This raises the ethical question of whether using these generative models for content is right. That will be up to the end user or company to decide.

The effectiveness and quality of these models will improve in due time. GPT-3 should generate considerably better blogs, since its largest model has 175 billion parameters, over 1,000 times more than the 124M GPT-2 model used here. With the public release of GPT-3 approaching, I will be able to use this newer model to generate more coherent sentences and paragraphs. Influencer generation will improve as the time and cost of creating these videos decrease. Additionally, I would like to add a feature to the web app that lets a user interact with the latent space of the GAN, controlling features of the image such as glasses or no glasses, female or male, long hair or bangs, etc.
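
The latent-space feature mentioned above is not implemented in this project, but the core idea can be sketched in a few lines: a face corresponds to a latent vector, and moving that vector along an attribute direction changes the attribute in the generated image. Everything below is a hypothetical illustration. The function names are invented here, the direction vector is a random placeholder (a real "glasses" direction would have to be learned separately, for example from labeled samples), and the generator call itself is left out.

```python
import random
from typing import List

LATENT_DIM = 512  # StyleGAN2's default latent size


def sample_latent(seed: int, dim: int = LATENT_DIM) -> List[float]:
    """Draw a reproducible latent vector from a standard normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def edit_latent(latent: List[float], direction: List[float], strength: float) -> List[float]:
    """Move a latent vector along an attribute direction.

    Positive strength pushes toward the attribute (e.g. "glasses"),
    negative strength pushes away from it, and zero leaves the face unchanged.
    """
    if len(latent) != len(direction):
        raise ValueError("latent and direction must have the same length")
    return [w + strength * d for w, d in zip(latent, direction)]


if __name__ == "__main__":
    face = sample_latent(seed=42)
    # Placeholder direction; a learned attribute direction would go here.
    glasses_direction = sample_latent(seed=7)
    # A slider in the web app could map directly to `strength`.
    for strength in (-2.0, 0.0, 2.0):
        edited = edit_latent(face, glasses_direction, strength)
        print(strength, edited[:3])
```

Each edited vector would be passed to the generator to render the modified face. Exposing `strength` as a slider is one simple way to give non-technical users control over an attribute.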

The idea of computer-generated content can extend to more areas, such as music generation, art generation, AI social media accounts, etc. These will all be future features for this application. I think raising awareness of the unethical applications of this technology is the only way to push humanity in the right direction. This application is meant to let anyone, including people outside technology, see the possibilities and begin to question whether what they are seeing is real.

Sources

HubSpot Content Marketing Statistics
StyleGAN2
GPT-2
GPT-2 Primers

Owner

  • Login: amcurley
  • Kind: user

Citation (citations.txt)

Pages used in topics

Arts & Entertainment:
https://en.wikipedia.org/wiki/Art
https://en.wikipedia.org/wiki/Book
https://en.wikipedia.org/wiki/Comics
https://en.wikipedia.org/wiki/Fiction
https://en.wikipedia.org/wiki/Film
https://en.wikipedia.org/wiki/Video_game
https://en.wikipedia.org/wiki/Humour
https://www.readersdigest.ca/culture/10-short-jokes-anyone-can-remember/
https://en.wikipedia.org/wiki/Music
https://en.wikipedia.org/wiki/Nonfiction
https://en.wikipedia.org/wiki/Photography
https://en.wikipedia.org/wiki/Podcast
https://en.wikipedia.org/wiki/Poetry
https://en.wikipedia.org/wiki/Television
https://en.wikipedia.org/wiki/Visual_design_elements_and_principles

Culture:
https://en.wikipedia.org/wiki/Culture
https://en.wikipedia.org/wiki/Food
https://en.wikipedia.org/wiki/Language
https://en.wikipedia.org/wiki/Do_it_yourself
https://en.wikipedia.org/wiki/Outdoor_recreation
https://en.wikipedia.org/wiki/Pet
https://en.wikipedia.org/wiki/Philosophy
https://en.wikipedia.org/wiki/Sport
https://en.wikipedia.org/wiki/Fashion
https://en.wikipedia.org/wiki/Travel
https://en.wikipedia.org/wiki/Crime

Equality:
https://en.wikipedia.org/wiki/Accessibility
https://en.wikipedia.org/wiki/Disability
https://en.wikipedia.org/wiki/Equality
https://en.wikipedia.org/wiki/Feminism
https://en.wikipedia.org/wiki/LGBT
https://en.wikipedia.org/wiki/Racism

Health:
https://en.wikipedia.org/wiki/Addiction
https://en.wikipedia.org/wiki/COVID-19_pandemic
https://en.wikipedia.org/wiki/Physical_fitness
https://en.wikipedia.org/wiki/Health
https://en.wikipedia.org/wiki/Mental_health

Industry:
https://en.wikipedia.org/wiki/Business
https://en.wikipedia.org/wiki/Design
https://en.wikipedia.org/wiki/Economy
https://en.wikipedia.org/wiki/Freelancer
https://en.wikipedia.org/wiki/Leadership
https://en.wikipedia.org/wiki/Marketing
https://en.wikipedia.org/wiki/Media_(communication)
https://en.wikipedia.org/wiki/Product_management
https://en.wikipedia.org/wiki/Distributed_workforce
https://en.wikipedia.org/wiki/Startup_company
https://en.wikipedia.org/wiki/User_experience
https://en.wikipedia.org/wiki/Venture_capital
https://en.wikipedia.org/wiki/Work_(human_activity)

Personal Development:
https://en.wikipedia.org/wiki/Creativity
https://en.wikipedia.org/wiki/Mindfulness
https://en.wikipedia.org/wiki/Money
https://en.wikipedia.org/wiki/Productivity

Politics:
https://en.wikipedia.org/wiki/2020_United_States_presidential_election
https://en.wikipedia.org/wiki/Gun_control
https://en.wikipedia.org/wiki/Immigration
https://en.wikipedia.org/wiki/Justice
https://en.wikipedia.org/wiki/Politics

Programming:
https://en.wikipedia.org/wiki/Android_software_development
https://en.wikipedia.org/wiki/Data_science#:~:text=Data%20science%20is%20a%20%22concept,domain%20knowledge%20and%20information%20science.
https://en.wikipedia.org/wiki/IOS_SDK
https://en.wikipedia.org/wiki/JavaScript
https://en.wikipedia.org/wiki/Machine_learning
https://en.wikipedia.org/wiki/Computer_programming
https://en.wikipedia.org/wiki/Software_engineering

Science:
https://en.wikipedia.org/wiki/Biotechnology
https://en.wikipedia.org/wiki/Climate_change
https://en.wikipedia.org/wiki/Mathematics
https://en.wikipedia.org/wiki/Neuroscience
https://en.wikipedia.org/wiki/Psychology
https://en.wikipedia.org/wiki/Science
https://en.wikipedia.org/wiki/Space

Self:
https://en.wikipedia.org/wiki/Horoscope
https://en.wikipedia.org/wiki/Beauty
https://en.wikipedia.org/wiki/Family
https://en.wikipedia.org/wiki/Lifestyle_(sociology)
https://en.wikipedia.org/wiki/Parenting
https://en.wikipedia.org/wiki/Romance_(love)
https://en.wikipedia.org/wiki/Self-reflection
https://en.wikipedia.org/wiki/Self
https://en.wikipedia.org/wiki/Human_sexuality
https://en.wikipedia.org/wiki/Spirituality

Society:
https://en.wikipedia.org/wiki/Universal_basic_income
https://en.wikipedia.org/wiki/Cannabis
https://en.wikipedia.org/wiki/City
https://en.wikipedia.org/wiki/Education
https://en.wikipedia.org/wiki/History
https://en.wikipedia.org/wiki/Psychedelic_drug
https://en.wikipedia.org/wiki/Religion
https://en.wikipedia.org/wiki/San_Francisco
https://en.wikipedia.org/wiki/Social_media
https://en.wikipedia.org/wiki/Society
https://en.wikipedia.org/wiki/Transport
https://en.wikipedia.org/wiki/World

Technology:
https://en.wikipedia.org/wiki/Artificial_intelligence
https://en.wikipedia.org/wiki/Blockchain
https://en.wikipedia.org/wiki/Cryptocurrency
https://en.wikipedia.org/wiki/Computer_security
https://www.prowess.org.uk/what-is-a-digital-life/
https://en.wikipedia.org/wiki/Future
https://en.wikipedia.org/wiki/Gadget
https://en.wikipedia.org/wiki/Privacy
https://en.wikipedia.org/wiki/Self-driving_car
https://en.wikipedia.org/wiki/Technology

Random:
https://en.wikipedia.org/wiki/Boston_Celtics
https://theblondeabroad.com/ultimate-budget-travel-guide-for-argentina/
https://en.wikipedia.org/wiki/Hummus
https://en.wikipedia.org/wiki/Democratic_Party_(United_States)
https://en.wikipedia.org/wiki/Republican_Party_(United_States)

Dependencies

gpt/requirements.txt pypi
  • fire >=0.1.3
  • regex ==2017.4.5
  • requests ==2.21.0
  • tqdm ==4.31.1
requirements.txt pypi