contentgen-heroku
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: amcurley
- Language: Jupyter Notebook
- Default Branch: master
- Size: 23.8 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Project ContentGen
Problem Statement
According to HubSpot, companies spend 46% of their budget on content creation (HubSpot, 2017), and 24% of marketers plan to increase their investment in content marketing in 2020 (HubSpot, 2020). Content creation is clearly an important part of a company's overall marketing plan. An effective content marketing plan usually involves many moving pieces, such as photographers, editors, videographers, models, and writers.
This leads to two questions:
Can machine learning streamline and automate the content generation process?
Is it ethical to use this technology for content generation?
Project Layout
Caution: This web app shows photorealistic images of people and links to the full set of over 15,000 computer-generated images of people. Please use this application only to learn about the technology and to raise awareness of the ethical concerns of using it.
Web application:
https://test-heroku-content.herokuapp.com/
This project was split into two parts:
The first part uses StyleGan2 for face generation and DeepFakes to animate the generated faces.
The second part uses GPT-2 for blog post generation.
Examples of both parts can be viewed in the web application linked above.
Executive Summary
This project began with the goal of creating computer-generated content, which could become a service that companies use in their content marketing efforts. The first aspect of the project focuses on influencer marketing. Influencer marketing is a huge asset for companies; however, it becomes expensive as the quality of the influencer (number of followers and engagement rate) increases. Using computer-generated influencers, companies could in theory "deploy" thousands of influencers to promote their products and services. The second aspect of the project focuses on using computer-generated blog posts for content marketing. According to HubSpot, "About 64% of marketers actively invest time in search engine optimization (SEO)" (HubSpot, 2020). With computer-generated blog posts, marketers can cut down the time it takes to produce quality blog posts at scale.
PersonGen
The first part of this application is called PersonGen, the influencer-generation section of ContentGen. A Generative Adversarial Network (GAN) is well suited to generating faces for the influencers. There are multiple GAN frameworks to choose from; I chose StyleGan2 because of the quality of the images it produces. I generated 15,000 images of fake faces that a company can use as influencers in its content marketing strategy. The influencers also need to be able to move, because video content, such as videos on TikTok, has been extremely successful in 2020. The following is a famous TikTok video that currently has 470 million views; within 30 seconds I was able to replicate its motion on a computer-generated person using DeepFakes.

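StyleGan2 produces each face by feeding a random latent vector through a pretrained generator, so a set like the 15,000 faces above amounts to one latent per random seed. The sketch below shows only that sampling step in plain Python, plus the truncation trick StyleGAN models use to trade variety for image quality. The generator itself is not shown, and the function names, the 512-dimensional latent size (StyleGAN2's default), and the demo seeds are assumptions for illustration, not the project's notebook code.

```python
import random
from typing import Dict, List

LATENT_DIM = 512  # assumption: StyleGAN2's default latent size


def sample_latent(seed: int, dim: int = LATENT_DIM) -> List[float]:
    """Draw a reproducible standard-normal latent vector z for one face."""
    rng = random.Random(seed)  # same seed -> same latent -> same face
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def truncate(w: List[float], w_avg: List[float], psi: float) -> List[float]:
    """Truncation trick: pull a latent toward the average latent by factor psi.

    psi = 1 keeps the latent unchanged; psi = 0 collapses it to the average.
    StyleGAN2 applies this in its intermediate W space, after the mapping network.
    """
    if len(w) != len(w_avg):
        raise ValueError("w and w_avg must have the same length")
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


# Hypothetical batch: one latent per seed (range(15000) for the full set).
# A pretrained generator, not shown here, would turn each latent into an image.
latents: Dict[int, List[float]] = {seed: sample_latent(seed) for seed in range(5)}
```

Keying each face by its seed means any image in the set can be regenerated later without storing the latent vectors themselves.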
Using the computer-generated images from the application and the notebook deep_fakes.ipynb, a company can upload a video of a person talking and make the generated person move like the person in the video. The deepfake is not perfect, but with some tweaks, and eventually with vid2vid, these videos should look much better.
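The notebook itself is not reproduced here, and the source does not say which deepfake model it uses. As an assumption, one widely used option for this kind of face animation is a keypoint-based model such as the First Order Motion Model, which transfers motion by shifting the still image's keypoints by however much the driving video's keypoints have moved since its first frame. The plain-Python sketch below shows only that relative-motion step; the names are hypothetical, and the keypoint detector and image generator it would feed are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def transfer_motion(source_kp: List[Point], driving_kp: List[Point],
                    driving_initial_kp: List[Point], scale: float = 1.0) -> List[Point]:
    """Move source keypoints by the driving video's displacement since frame 0.

    `scale` can compensate for the face in the video being larger or smaller
    than the generated face.
    """
    if not (len(source_kp) == len(driving_kp) == len(driving_initial_kp)):
        raise ValueError("all keypoint lists must have the same length")
    return [
        (sx + scale * (dx - ix), sy + scale * (dy - iy))
        for (sx, sy), (dx, dy), (ix, iy) in zip(source_kp, driving_kp, driving_initial_kp)
    ]


def animate(source_kp: List[Point], driving_frames: List[List[Point]]) -> List[List[Point]]:
    """Per-frame target keypoints for the generated face (rendering not shown)."""
    first = driving_frames[0]
    return [transfer_motion(source_kp, frame, first) for frame in driving_frames]
```

Using displacement relative to the first frame, rather than copying the driving keypoints directly, keeps the generated face's own proportions and borrows only the movement.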
Here is an example of me animating the GAN-generated images:

BlogGen
The second part of the application is called BlogGen. A user or business opens the application, selects the "BlogGen" section in the navigation bar, then chooses a topic and writes a title for the blog post. If the user writes only a title without choosing a topic, the generator can still produce a body of text that the person can then use for their own blog. For this part of the project I used GPT-2 for text generation. The selectable topics are all of the topics available on Medium. For each topic I wrote a 3-4 sentence primer drawn from Wikipedia and other websites, so that the generator can produce more coherent blogs about that specific topic and title.
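The topic-and-title flow described above amounts to building one text prompt for GPT-2 to continue. The sketch below is a hypothetical version of that step: the `PRIMERS` table holds placeholder strings standing in for the real 3-4 sentence primers, and `build_prompt` and its layout (primer, blank line, title) are illustrative assumptions rather than the web app's code.

```python
from typing import Dict, Optional

# Hypothetical primer table: topic name -> 3-4 sentence primer (placeholders only).
PRIMERS: Dict[str, str] = {
    "Marketing": "<3-4 sentence primer about marketing>",
    "Technology": "<3-4 sentence primer about technology>",
}


def build_prompt(title: str, topic: Optional[str] = None,
                 primers: Dict[str, str] = PRIMERS) -> str:
    """Combine an optional topic primer with the user's title into one GPT-2 prompt."""
    title = title.strip()
    if not title:
        raise ValueError("a blog title is required")
    primer = primers.get(topic, "") if topic else ""
    # With no topic (or an unknown one), GPT-2 is conditioned on the title alone.
    return f"{primer}\n\n{title}\n" if primer else f"{title}\n"
```

Because a missing or unknown topic falls back to the title alone, the title-only case needs no special handling.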
Here is an example of BlogGen in action:

Ethics
After developing these pieces of the project, I became very aware of the ethical concerns of using this technology for content generation. For people in technology and business, this is a useful service that can accelerate their marketing efforts. However, using GANs and DeepFakes for content generation can replace many jobs, such as photographers, videographers, editors, and models. This technology also allows a user to make people appear to do and say whatever they please. Businesses can use it to exploit the flaws and addictive nature of social media even more than they already do. Although GANs and DeepFakes do not physically hurt anyone, ethical concerns arise when taking into account job loss, deepfakes in politics, unrealistic beauty standards, and more.
Computer-generated blog posts take away from the creative beauty of writing, and this technology has its own ethical concerns that need to be addressed. Generated blog posts eliminate the need for dedicated writers: a single person can edit a generated blog post from this application and drastically cut the time it takes to create a post. These blog posts are generated by a model trained on text from across the internet, so they can contain bias, inappropriate text, and factual inaccuracies.
Keep the concerns above in mind when using these applications.
Data
For StyleGan2 I used the FFHQ dataset (ffhq-dataset) because its images are already prepared and aligned. I also attempted to train StyleGan2 on images I collected from Instagram influencers in order to generate full bodies, but due to time constraints that data was not properly aligned. A clean, consistently aligned dataset is essential for this application, because the metric we care about is whether a generated image can be distinguished from a real human.
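"Aligned" here means every face sits at roughly the same position, scale, and rotation in the frame. The FFHQ repository achieves this with an oriented crop computed from facial landmarks; the plain-Python sketch below is a simplified reading of that computation, assuming you already have the centers of both eyes and the two outer mouth corners (for example, from a dlib 68-point landmark detector). The function name is hypothetical, and the actual cropping, rotation, and resizing of pixels are not shown.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def ffhq_style_crop(eye_left: Vec, eye_right: Vec,
                    mouth_left: Vec, mouth_right: Vec) -> Tuple[List[Vec], float]:
    """Return the 4 corners of an oriented square crop and its side length."""
    eye_avg = ((eye_left[0] + eye_right[0]) / 2, (eye_left[1] + eye_right[1]) / 2)
    eye_to_eye = (eye_right[0] - eye_left[0], eye_right[1] - eye_left[1])
    mouth_avg = ((mouth_left[0] + mouth_right[0]) / 2, (mouth_left[1] + mouth_right[1]) / 2)
    eye_to_mouth = (mouth_avg[0] - eye_avg[0], mouth_avg[1] - eye_avg[1])

    # Horizontal crop axis: eye line combined with the eye-to-mouth vector rotated 90 degrees.
    x = (eye_to_eye[0] + eye_to_mouth[1], eye_to_eye[1] - eye_to_mouth[0])
    length = math.hypot(*x)
    # Half the side length: the larger of 2x eye distance and 1.8x eye-to-mouth distance.
    half = max(math.hypot(*eye_to_eye) * 2.0, math.hypot(*eye_to_mouth) * 1.8)
    x = (x[0] / length * half, x[1] / length * half)
    y = (-x[1], x[0])  # vertical axis, perpendicular to x

    # Crop center: slightly below the midpoint between the eyes.
    c = (eye_avg[0] + eye_to_mouth[0] * 0.1, eye_avg[1] + eye_to_mouth[1] * 0.1)
    quad = [
        (c[0] - x[0] - y[0], c[1] - x[1] - y[1]),
        (c[0] - x[0] + y[0], c[1] - x[1] + y[1]),
        (c[0] + x[0] + y[0], c[1] + x[1] + y[1]),
        (c[0] + x[0] - y[0], c[1] + x[1] - y[1]),
    ]
    return quad, 2 * half
```

Applying the same crop to every image, then resizing to a fixed resolution, is what puts faces in a consistent position across the dataset; a full-body dataset like the Instagram images would likely need an equivalent step based on body keypoints instead.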
For GPT-2 I used the 124M-parameter model due to computing constraints. When a user of the web application clicks one of the predetermined topics, the application primes GPT-2 with a snippet of text about that topic. The text used to prime GPT-2 can be seen here: Topics
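Once primed, GPT-2 writes the blog body one token at a time, each time choosing the next token from a probability distribution over its vocabulary. The source does not say which decoding settings the web app uses; as an assumption, the sketch below shows top-k sampling with temperature, a common choice for GPT-2, applied to a toy dictionary of next-token scores (logits) rather than real model output. The function name and example values are hypothetical.

```python
import math
import random
from typing import Dict, Optional


def sample_top_k(logits: Dict[str, float], k: int = 40, temperature: float = 1.0,
                 rng: Optional[random.Random] = None) -> str:
    """Sample the next token from only the k highest-scoring candidates."""
    if k < 1 or temperature <= 0:
        raise ValueError("k must be >= 1 and temperature must be > 0")
    if not logits:
        raise ValueError("logits must not be empty")
    rng = rng or random.Random()
    # Keep the k tokens with the largest logits.
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    # Softmax weights over the survivors; subtracting the max logit avoids overflow.
    max_scaled = top[0][1] / temperature
    weights = [math.exp(score / temperature - max_scaled) for _, score in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy next-token scores after a prompt such as "Content marketing is".
toy_logits = {" a": 4.0, " the": 3.5, " key": 2.0, " banana": -3.0}
next_token = sample_top_k(toy_logits, k=2, temperature=0.7, rng=random.Random(0))
```

A smaller `k` or lower `temperature` gives safer, more repetitive text; larger values give more varied but less coherent text.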
Conclusions and Future Steps
This technology will most likely be the catalyst that enables companies to create more content than ever before. With GANs, companies can generate everything from faces to artwork. With GPT-2, and eventually GPT-3, companies will be able to generate near-perfect blog posts at the click of a button. The question is not whether companies will use this technology, but when they will begin to use it, and whether we will even know that they are. This raises the ethical question of whether using these generative models for content is right, which will be up to the end user or company to decide.
Improvements in the effectiveness and quality of these models will come in due time. GPT-3's largest model has about 175 billion parameters, more than 1,000 times as many as the 124M-parameter GPT-2 model used here, and it was trained on far more text. With its public release approaching, I expect to use this newer model to generate more coherent sentences and paragraphs. The influencer generation will improve as the time and cost of creating these videos decrease. Additionally, I would like to add a feature to the web app that lets a user interact with the latent space of the GAN, controlling features of the image such as glasses or no glasses, female or male, and long hair or bangs.
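One common way to build the latent-space feature described above is to move a face's latent vector along a direction associated with a single attribute. As an assumption, such a direction could be found by fitting a linear classifier on latents labeled, say, "glasses" vs. "no glasses" and taking the normal of its decision boundary (the approach popularized by InterFaceGAN). The sketch below shows only the editing step; `edit_latent`, the toy 3-dimensional vectors, and the slider values are hypothetical.

```python
import math
from typing import List


def edit_latent(w: List[float], direction: List[float], strength: float) -> List[float]:
    """Move latent w along a unit-normalized attribute direction by `strength`."""
    if len(w) != len(direction):
        raise ValueError("latent and direction must have the same length")
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [x + strength * d / norm for x, d in zip(w, direction)]


# Hypothetical slider: negative values push away from the attribute, positive toward it.
face_latent = [0.2, -1.0, 0.5]
glasses_direction = [0.0, 3.0, 4.0]  # stand-in for a learned attribute direction
slider_frames = [edit_latent(face_latent, glasses_direction, s) for s in (-2.0, 0.0, 2.0)]
```

The generator, not shown, would render one image per slider value, so a user sees the attribute gradually appear or disappear while the rest of the face stays largely the same.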
The idea of computer-generated content can extend to other areas such as music generation, art generation, and AI social media accounts, and these will all be future features for this application. I think raising awareness of the unethical applications of this technology is the only way to push humanity in the right direction. The goal is an application where anyone outside the technology field can see what is possible and begin to question whether what they are seeing is real.
Sources
HubSpot Content Marketing Statistics
StyleGan2
GPT-2
GPT-2 Primers
Owner
- Login: amcurley
- Kind: user
- Repositories: 1
- Profile: https://github.com/amcurley
Citation (citations.txt)
Pages used in topics:
- Arts & Entertainment: https://en.wikipedia.org/wiki/Art https://en.wikipedia.org/wiki/Book https://en.wikipedia.org/wiki/Comics https://en.wikipedia.org/wiki/Fiction https://en.wikipedia.org/wiki/Film https://en.wikipedia.org/wiki/Video_game https://en.wikipedia.org/wiki/Humour https://www.readersdigest.ca/culture/10-short-jokes-anyone-can-remember/ https://en.wikipedia.org/wiki/Music https://en.wikipedia.org/wiki/Nonfiction https://en.wikipedia.org/wiki/Photography https://en.wikipedia.org/wiki/Podcast https://en.wikipedia.org/wiki/Poetry https://en.wikipedia.org/wiki/Television https://en.wikipedia.org/wiki/Visual_design_elements_and_principles
- Culture: https://en.wikipedia.org/wiki/Culture https://en.wikipedia.org/wiki/Food https://en.wikipedia.org/wiki/Language https://en.wikipedia.org/wiki/Do_it_yourself https://en.wikipedia.org/wiki/Outdoor_recreation https://en.wikipedia.org/wiki/Pet https://en.wikipedia.org/wiki/Philosophy https://en.wikipedia.org/wiki/Sport https://en.wikipedia.org/wiki/Fashion https://en.wikipedia.org/wiki/Travel https://en.wikipedia.org/wiki/Crime
- Equality: https://en.wikipedia.org/wiki/Accessibility https://en.wikipedia.org/wiki/Disability https://en.wikipedia.org/wiki/Equality https://en.wikipedia.org/wiki/Feminism https://en.wikipedia.org/wiki/LGBT https://en.wikipedia.org/wiki/Racism
- Health: https://en.wikipedia.org/wiki/Addiction https://en.wikipedia.org/wiki/COVID-19_pandemic https://en.wikipedia.org/wiki/Physical_fitness https://en.wikipedia.org/wiki/Health https://en.wikipedia.org/wiki/Mental_health
- Industry: https://en.wikipedia.org/wiki/Business https://en.wikipedia.org/wiki/Design https://en.wikipedia.org/wiki/Economy https://en.wikipedia.org/wiki/Freelancer https://en.wikipedia.org/wiki/Leadership https://en.wikipedia.org/wiki/Marketing https://en.wikipedia.org/wiki/Media_(communication) https://en.wikipedia.org/wiki/Product_management https://en.wikipedia.org/wiki/Distributed_workforce https://en.wikipedia.org/wiki/Startup_company https://en.wikipedia.org/wiki/User_experience https://en.wikipedia.org/wiki/Venture_capital https://en.wikipedia.org/wiki/Work_(human_activity)
- Personal Development: https://en.wikipedia.org/wiki/Creativity https://en.wikipedia.org/wiki/Mindfulness https://en.wikipedia.org/wiki/Money https://en.wikipedia.org/wiki/Productivity
- Politics: https://en.wikipedia.org/wiki/2020_United_States_presidential_election https://en.wikipedia.org/wiki/Gun_control https://en.wikipedia.org/wiki/Immigration https://en.wikipedia.org/wiki/Justice https://en.wikipedia.org/wiki/Politics
- Programming: https://en.wikipedia.org/wiki/Android_software_development https://en.wikipedia.org/wiki/Data_science#:~:text=Data%20science%20is%20a%20%22concept,domain%20knowledge%20and%20information%20science. https://en.wikipedia.org/wiki/IOS_SDK https://en.wikipedia.org/wiki/JavaScript https://en.wikipedia.org/wiki/Machine_learning https://en.wikipedia.org/wiki/Computer_programming https://en.wikipedia.org/wiki/Software_engineering
- Science: https://en.wikipedia.org/wiki/Biotechnology https://en.wikipedia.org/wiki/Climate_change https://en.wikipedia.org/wiki/Mathematics https://en.wikipedia.org/wiki/Neuroscience https://en.wikipedia.org/wiki/Psychology https://en.wikipedia.org/wiki/Science https://en.wikipedia.org/wiki/Space
- Self: https://en.wikipedia.org/wiki/Horoscope https://en.wikipedia.org/wiki/Beauty https://en.wikipedia.org/wiki/Family https://en.wikipedia.org/wiki/Lifestyle_(sociology) https://en.wikipedia.org/wiki/Parenting https://en.wikipedia.org/wiki/Romance_(love) https://en.wikipedia.org/wiki/Self-reflection https://en.wikipedia.org/wiki/Self https://en.wikipedia.org/wiki/Human_sexuality https://en.wikipedia.org/wiki/Spirituality
- Society: https://en.wikipedia.org/wiki/Universal_basic_income https://en.wikipedia.org/wiki/Cannabis https://en.wikipedia.org/wiki/City https://en.wikipedia.org/wiki/Education https://en.wikipedia.org/wiki/History https://en.wikipedia.org/wiki/Psychedelic_drug https://en.wikipedia.org/wiki/Religion https://en.wikipedia.org/wiki/San_Francisco https://en.wikipedia.org/wiki/Social_media https://en.wikipedia.org/wiki/Society https://en.wikipedia.org/wiki/Transport https://en.wikipedia.org/wiki/World
- Technology: https://en.wikipedia.org/wiki/Artificial_intelligence https://en.wikipedia.org/wiki/Blockchain https://en.wikipedia.org/wiki/Cryptocurrency https://en.wikipedia.org/wiki/Computer_security https://www.prowess.org.uk/what-is-a-digital-life/ https://en.wikipedia.org/wiki/Future https://en.wikipedia.org/wiki/Gadget https://en.wikipedia.org/wiki/Privacy https://en.wikipedia.org/wiki/Self-driving_car https://en.wikipedia.org/wiki/Technology
- Random: https://en.wikipedia.org/wiki/Boston_Celtics https://theblondeabroad.com/ultimate-budget-travel-guide-for-argentina/ https://en.wikipedia.org/wiki/Hummus https://en.wikipedia.org/wiki/Democratic_Party_(United_States) https://en.wikipedia.org/wiki/Republican_Party_(United_States)
Dependencies
- fire >=0.1.3
- regex ==2017.4.5
- requests ==2.21.0
- tqdm ==4.31.1