Artificial Intelligence and Python

Artificial Intelligence and Python

Overview

During the downtime due to the pandemic-inspired shelter-in-place in California, I’ve decided to learn some new technologies and concepts related to software design. This has been my first foray into online learning using one of the e-learning platforms. In this case, I signed up for Udacity (https://www.udacity.com).

So far, I’ve been quite impressed. Of course, it does not perfectly substitute live classroom instruction, but it is still quite effective, plus I can skip to the good parts. The course I signed up for is titled “Artificial Intelligence and Python”.

While I’ve programmed scripts with Python before, I never knew that it had some many math and statistic-intensive libraries. Hence, it’s a great language for developing AI-related software since AI is all about statistics: predicting future events based upon historical information.

While the basics of Python programming is not that interesting, the libraries and tools associated with Python are fascinating and actually lots of fun to work with. This blog post will talk a bit about those libraries and also how they apply to AI. As of mid-April 2020, I have not finished the course yet, so emphasis is on the Python libraries used for AI, but not quite AI, itself.

NumPy

NumPy is a library for Python. It is short for “Numerical Python”, and it includes a large amount of functionality related to multi-dimensional arrays and matrices in addition to many mathematical functions. Users can create arrays from Python dictionaries, and then manipulate the arrays in many different ways including reshaping arrays, adding arrays, multiplying arrays, and much much more.

Here’s an example of how a simple numpy array works in Python:

1. import numpy as np
2. x = np.array([0,1,2,3,4,5,6,7,8,9])
3. print(x)
4. [0 1 2 3 4 5 6 7 8 9 10]

Line 1 imports the numpy library and renames it.
Line 2 defines a single row array with values from 0 to 9.
Line 3 prints the array and Line 4 displays it.

Let’s take it a step further:

5. x.reshape(2,5)
6. print(x)
7. [[0,1,2,3,4]
[5,6,7,8,9]]

Line 5 executes the “reshape” function that changes the shape of the array from a single row to a 2×5 array as seen in Line 7.
Other functions allow you to insert rows in an array:

8. x = np.insert(x, 1, [10,11,12,13,14], axis=0)

The above “insert” statement inserts a new row at row 1 (row numbers start at 0). The numbers it inserts are: 10,11,12,13,14. The “axis” tells whether to insert a row (0) or column (1).

You can also perform mathematics on 2 arrays including addition, subtraction, multiplication, and division. For example:

x = [[0,1,2,3,4]
[5,6,7,8,9]]

y = [[6,3,2,8,7]
[1,6,7,3,10]]

print(x + y) = [[6,4,4,11,11]
[6,12,14,11,19]]

The above “x” adds each of the elements from each row and column and creates the corresponding matrix with the added values. You can also do the same with the other mathematical functions.

There are many functions associated with numpy that can be found in lots of online documentation.

Pandas

Pandas is another Python library that deals with data analysis and manipulation. It takes the numpy arrays one step further and allow the creation of complex arrays. Let’s look at an example:

  1. import pandas as pd
  2. groceries = pd.Series(data=[30,6,’Yes’,’No’], index=[‘eggs’,’apples’,’milk’,’bread’])

Line 1 imports the Pandas library. Line 2 creates a complex matrix where the first column is the index of labels and the 2nd column is the actual data. It looks like this:

eggs 30
apples 6
milk Yes
bread No

dtype: object

If you “print groceries[‘eggs’]”. The result is: 30.

Pandas allows you to perform mathematics on values in a matrix:

print(groceries / 2) =

eggs 15
apples 3
milk Yes
bread No

You can also create a more complex matrix by creating a dictionary of Pandas series:

3. items = {‘Bob’ : pd.Series(data = [245, 25, 55], index = [‘bike’, ‘pants’, ‘watch’]),
‘Alice’ : pd.Series(data = [40, 110, 500, 45], index = [‘book’, ‘glasses’, ‘bike’, ‘pants’])}
4. shopping_carts = pd.DataFrame(items)
5. shopping_carts             

  Bob Alice
bike 245.0 500.0
book NaN 40.0
glasses NaN 110.0
pants 25.0 45.0
watch 55.0 NaN

shopping_carts[‘Bob’][‘Bike’] displays “245.0”.

You can also create a Python dictionary and then create a Panda dataframe from the dictionary along with indexes. See the following:

#Create a list of Python dictionaries
items2 = [{‘bikes’: 21, ‘pants’: 36, ‘watches’: 40},  {‘watches’: 12, ‘glasses’: 51, ‘bikes’: 18, ‘pants’:9}]

#Create a Panda DataFrame
store_items = pd.DataFrame(items2, index=[‘store 1’, ‘store 2’])

#Display the DataFrame
store_items

It displays as:

  bikes pants watches glasses
store 1 20 30 35 NaN
store 2 15 5 10 50.0

To add a column: 

store_items[‘shirts’] = [15,2]

Now, the DataFrame displays:

  bikes pants watches glasses shirts
store 1 20 30 35 NaN 15
store 2 15 5 10 50.0 2

Anaconda and Jupyter Notebooks

Now that I have covered a bit of NumPy and Pandas for manipulating data arrays, let’s delve a bit into a Python platform called Anaconda. Anaconda is a “navigator” that allows users to download any and all libraries available for the Python platform. These libraries include mathematical libraries, different types of Python compilers, artificial intelligence libraries (like PyTorch) and the Jupyter Notebook which is a web-based user interface for displaying comments and typing in Python code that runs on command. It’s a tool not necessarily to write production-level Python code, but more a tool to train and test out python code while displaying well-formatted comments.

Below is an example of a Jupyter webpage (or notebook) that includes a comments section with images and then a subsequent code section. This Jupyter notebook talks about Python tensors and Pytorch, the essentials for artificial intelligence.

Sample Jupyter page

 

Neural Networks

Neural networks have been around for a while. The basically emulate the way our brains work. The networks are built from individual parts approximating neurons which are interconnected and are the basis for how the brain learns.

“Digital” neurons are no different. They are interconnected in such a way that over time they learn and are able to apply the learned knowledge to enable useful applications such as natural language (like Alexa) and image identification (like Google Lens). It really is amazing how well it works, and the progress over the past five years alone has been remarkable. I’ll talk more about that later.

So how does it work exactly? Let’s take the example of identifying text in an image; specifically, digits 0 to 9. Just 5 years ago, this was a very complicated problem. Today, it’s a trivial one. The image below displays greyscale handwritten digits where each image is 28×28 pixels.  

Greyscale handwritten digits

Greyscale handwritten digits

The first step is to train the software or the “network” in AI lingo. This means feeding it 100s if not 1000s of sample 28×28 pixel images of digits and tagging those images with the actual numbers so the software learns what number the image represents exactly. Luckily, Pytorch includes lots of tagged training data called MNIST. This data can be used to train the network so when you present your own image of a digit, it will correctly interpret what it is.

Single digit

Single digit

The above image is an example of a greyscale “8” that is 28 x 28 pixels. This is the type of image that would be fed into the network to train it that this type of image is an “8”.

Neural Networks

Neural Networks

The above images shows a simple neural network. The far left-hand side displays the inputs (x1 and x2). In our example, the inputs would be the color of each of the 28 x 28 pixels. The values (w1 and w2) are called “weights” These weights are multiplied by each of the corresponding inputs (i.e. – dot product of two vectors) and then inputted into a function that creates an output value (0 to 9) that is compared to the actual value assigned to the image. For example, in the digit training image above (the number “8”), the tag assigned to this image is 8. Therefore, the calculated output is compared to the tagged value. If it matches, then we’ve trained it well for that particular test image and the weights will be reused. If not, then we need to go back and adjust the weights to create a new output value. This back and forth can occur 1000s of times until the correct formula is found.

A Progress Report on gbXML Validation Efforts

A Progress Report on gbXML Validation Efforts

Overview

Carmel Software was hired by the National Renewable Energy Lab (NREL) to update and improve the Green Building XML (http://www.gbXML.org) schema, all in the name of improving interoperability amongst disparate building design software tools. This progress report summarizes the work completed over the past year.

Overall Goal and Objectives

The goal of this project was to validate the National Renewable Energy Lab’s (NREL’s) OpenStudio software tool to produce valid gbXML and also set the stage to validate other BIM and building analysis software tools in the future. There were three main objectives of this validation work.

  1. The first objective was to demonstrate that the OpenStudio software could pass the gbXML validation procedure.
  2. The second objective was to encourage other software vendors to certify their software using the validation procedure as well.
  3. The third objective was to work towards a generic validator that could be used on general user models, as opposed to the strict test case models required by the gbXML validation procedure. Delivering a fully working validator was out of scope of this work, therefore requirements were developed to set the stage for future work.

In support of the first objective, we developed OpenStudio models which represented the buildings in the validation procedure test case. These models were generated programmatically using the OpenStudio Ruby Application Programming Interface (API) to reduce maintenance costs and to allow them to be leveraged for other purposes. OpenStudio passed the validation procedures and is now the first software authoring tool to officially become “gbXML certified” (http://gbxml.org/OpenStudioCertification_Latest ).

In support of the second objective, we drastically improved the validation website, documentation, and tools required to apply the validation procedure. We publicized the validation procedure and encouraged gbXML software vendors to apply it to their tools. This required contacting other gbXML software vendors directly as well as making public announcements, conducting live webinars, and promoting other ways to generate user interest in the validation efforts. We publicized the state of the OpenStudio validation efforts to encourage other software tools to apply the validation procedure to their own tools. In addition, we simplified and streamlined the validation process to allow the process to be both more responsive and clear to market demands, while also allowing room for future growth of test cases. In fact, Autodesk will soon be validated to Level 2 compliance (see below) and is working with gbXML.org on Level 3 compliance.

In support of the third objective, we developed requirements for a generic validator. User stories were developed and preliminary mockups were made. The most important user story developed was to allow a user to upload their gbXML file to the gbXML.org website and view a 3D representation in their web browser. This would allow the user to identify any problems with their model visually. The second use case was for software to detect certain classes of problems with the user’s model and identify those visually in the 3D display. Finally, the use case of upgrading user models to the current gbXML schema version was identified.

Tasks

Here are the tasks that were performed to achieve the above objectives:

  1. Developed OpenStudio Models for Validation Procedure
    We developed OpenStudio models for a number of the buildings in the validation procedure test suite. The test cases were derived from the ASHRAE RP-1468 documentation. The models were programmatically generated using the OpenStudio Ruby API. All scripts and model content developed for this purpose were developed under the Lesser General Public License (LGPL) open source license and are maintained in a public repository on github.com (https://github.com/GreenBuildingXML/openstudio-gbxml-validation ).
  2. Validated OpenStudio gbXML Export/Import
    We applied validation procedures to each of the OpenStudio models developed in task 1. We verified that each of the validation test cases imported correctly into the OpenStudio SketchUp plug-in. We also addressed any issues with the validation software that were found in this work. For example, the OpenStudio validation forced gbXML.org to address geometry created by thin-walled geometries, and metric-only software. Previous versions of the validator only addressed thick-walled geometry engines, and supported IP-units. The validation process and OpenStudio were made more robust as a result of this process.

    Highlights of improvements/fixes to OpenStudio as a result of this project:
    • correctly specifies SlabOnGrade elements
    • correctly handles area calculations of sloping floors and ceilings (before calculated area as zero)
    • handles most second-level space boundary translations, automatically, for simple geometries
    • gbXML export up-to-date with version 6.01 of gbXML

    Highlights of improvements/fixes to gbXML validation as a result of this process:
    • unit of measure handling
    • special procedure development for thin-walled geometry creators, opening up validation to a wider audience
    • improved validation website user interface and user experience
    • better error-handling and user messaging when validation fails
    • improved geometry validation engine

  3. Improved Validation Website and Documentation
    The previous validation website was not hosted on the gbxml.org domain which reduced its visibility and credibility. For this task, we moved the gbxml.org website to a hosting platform that supports the validator software. The gbxml.org website was redesigned to more prominently display the validation procedure and documentation. The website now shows which software tools have been validated or are in the process of being validated. See http://www.gbxml.org for more details.
  4. Promoted gbXML Validation Efforts
    This task was performed in parallel with other tasks listed above; the purpose was to keep other software vendors and the public up to date on gbXML validation efforts. We attended the four day SimBuild 2016 Conference in Salt Lake City in August 2016 to promote gbXML validation efforts. In addition, we conducted two live webinars to explain our work.
  5. Generic Validator
    Delivering a fully functional generic validator was out of scope for this work. Therefore, work on this area was focused on developing user stories and requirements for future work. The most important user story developed was to allow a user to upload their gbXML file to the gbXML.org website and view a 3D representation in their web browser. This would allow the user to identify any problems with their model visually. The second use case was for software to detect certain classes of problems with the user’s model and identify those visually in the 3D display. Finally, the use case of upgrading user models to the current gbXML schema version was identified.

    Integration with the Autodesk Forge API was investigated as a potential solution for viewing user submitted 3D models. However, this is still a work-in-progress since the Forge API is in beta phase.

Significant Findings and Issues

During this project, we decided that 3 levels of gbXML certification, or compliance, were required to provide more clarity to the community of users and vendors.:

  1. Level 1 compliance involves validating that a gbXML file is a well formed XML (per the W3C ISO Standard) and also conforms to the gbXML XSD (from gbXML versions 0.37 to 6.01, depending upon which version the software tool currently exports).
  2. Level 2 compliance involves validating a gbXML file against 8 to 10 geometric “test-cases” that are based upon ASHRAE Research Project 1468, “Development of a Reference Building Information Model (BIM) for Thermal Model Compliance Testing”. Level 2 requires that second level space boundaries be correct for the simplest test cases, and pass a small subset of translation edge cases.
  3. Level 3 compliance has yet to be fully defined. However, Level 3 does involve certain levels of vendor tool automation that goes far above and beyond Levels 1 and 2 compliance. We are currently working with Autodesk to better define Level 3 compliance (See the Future Work section for more details).

List of Issues

OpenStudio passed the following test cases:

  1. Test Case 3: Test for proper second level space boundary representation
  2. Test Case 6: Test for proper second level space boundary representation
  3. Test Case 7: Test for basic pitched roof representation
  4. Test Case 8: Test for basic sloped slab on grade representation
  5. Test Case 12: Test for proper second level space boundary representation
  6. Basic Whole Building Test Case 1: Test for proper second level space boundary representation

To pass these test cases, it did require that NREL make some updates to the OpenStudio SDK responsible for gbXML export since there were a few errors that were pervasive for every test case. Therefore, NREL needed to make some changes to the gbXML export feature of OpenStudio to be compliant with the validation process. gbXML provided NREL with guidance as to how each XML file should look in order to pass, which NREL took and made changes to their code base. In some cases, gbXML decided to relax configurable constraints, to allow the document to pass.
Below is the full list of issues and changes to the OpenStudio code base made as a result of this work effort:

  1. There were no BuildingStorey definitions in any export from OpenStudio. This is a required element, but we relaxed this for validation purposes.
  2. The Building->Area calculation that was done during export did not meet gbXML specifications for the Building Area calculation. To be fair, gbXML may not have the tightest definition in terms of the criteria for when a Space Area is included in the Building Area, but the point is, plenums and other non-occupiable spaces shouldn’t be included in the building area calculation.
  3. Any time there was a floor that was on-grade (industry standard is when z=0, and it seems OpenStudio follows this convention) we would expect the surfaceTypeEnum for that surface to be SlabOnGrade, but OpenStudio defines these surfaces as UndergroundFloor. This needed to be changed.
  4. Thin-Shelled geometry challenges: The fact that OpenStudio makes gbXML derived from a thin-walled geometry paradigm consistent with the building energy modeling paradigm as opposed to a BIM model posed a basic philosophical challenge for the validation process. Originally, the validator was designed for BIM-centric tools that have wall thicknesses inherent in the modeling environment. Autodesk Revit, for example, assumes that the wall thicknesses affect the volume and area calculations, even though the wall vertices go to the centerline of the thickness (as if the walls had no thickness at all). When we tried to use these same standard test files as a comparison point for a gbXML created via OpenStudio, we encountered an issue. Whereas we modeled the walls in OpenStudio as if they were on the centerline, with the same coordinates as the standard files, now the volume and area calculations came out differently than the standard files. Of course, if we tried changing things around and modeled the surfaces in OpenStudio at the inner wetted perimeter, we get the volume and area correct, but now the polygon coordinates are in a different location and don’t match the standard file PolyLoop coordinates. So either way, it is not a perfect process.
  5. Changes to the validator code base: We made several improvements to the validation engine itself, and also to the standard test case XML files as a result of this work that has already been incorporated into the latest version of the validator. We found two problems with the standard test case files: Occasionally, there were still errors in the files that should no longer persist by the end of this process. The whole building test cases, made at the end of phase two, were particularly problematic because we made them with Honeybee out of Grasshopper. Secondly, sometimes the decimal precision was pretty extreme because for all the other test cases we used Revit in US-IP units to create the test cases originally. This meant for sloping surfaces, there was sometimes units like 9’10-11/16” that made for some difficulties when re-creating the test cases. We tightened these up so for a majority of the test cases there was much less guess-work and complexities around the units of measure.

Future Work – New gbXML Features

Further Develop Level 3 Compliance
We will be working with Autodesk to achieve Level 3 certification, which has yet to be fully defined. It does involve certain levels of vendor tool automation that goes far above and beyond Levels 1 and 2 compliance. Autodesk desires to achieve Level 3 compliance for both Autodesk Revit and Insight 360. Some examples of Autodesk-specific Level 3 automation include:

  1. Handling ‘real’ architectural models i.e. supporting a very wide variety of modeling practices, coping with natural inaccuracies (small gaps / overlaps) and scaling from concept to detailed design
  2. Automatic perimeter / core thermal zoning
  3. Automatic identification of elements acting as shading (as opposed to room bounding) without manual definition
  4. Handling ‘Sandwich’ conditions i.e. when two or more elements are adjacent or very closely adjacent and often not perfectly parallel but essentially a single gbXML surface
  5. Definition of material thermal properties (which get very interesting when combined with sandwich conditions)
  6. Working with or with explicit room/space objects and their associate metadata
  7. Above / below grade
  8. Columns
  9. Openings
  10. Ceiling voids

Discuss the possibility of “use case” based validation. For example, what fields are required for the energy modeling use case (or maybe the OpenStudio use case)? What fields are required for the HVAC loads use case? See http://www.hpxmlonline.com/tools-resources/data-selection-tool/ for an example of this.

Develop a gbXML Conversion Tool
Both NREL and Autodesk have requested that we develop a simple software tool (web-based) that converts previous versions of gbXML (i.e. – 0.37) to later or the current versions since major tools such as Autodesk Revit continue to support the 0.37 version while validation only supports the latest version. This is not an easy task on the part of the software vendor to update to the latest version since tools like Autodesk Revit have long-term “wish-list to release cycles” (often 1 or more years). Therefore, developing a conversion tool will allow these previous versions of gbXML to update to later versions.


Develop a Generic Validator and Open-Source Geometry Engine

While we have successfully developed a test-case validation tool (http://gbxml.org/validator/Pages/TestPage.aspx ), we still need to develop a “generic” validation software tool that can be used by more stakeholders including energy modelers, engineers, architects, and others. This tool should be able to validate any generic gbXML, not just test-case gbXML files. We believe we are closer than ever to achieving this vision, however, there is limited time and budget to test such concepts. Developing unit tests, and making incremental improvements to this code base, is a full-time effort that is constantly in need of development efforts.
The software development effort taken thus far to develop vendor certification tools will be leveraged for the generic validator, one that can accept any user model. A web page shall be developed on gbXML.org so a user can upload their gbXML model and the validator shall determine if there are any defects. If there are defects, the website shall provide information to alert the user to any defects that were found. A web based 3D visualization tool may be provided to visualize the uploaded model and to identify the defects in a meaningful way. We plan on using Autodesk’s Forge Viewer API to translate the gbXML geometry into a web-based model.


Create a gbXML Portal and Accompanying Web Service

Along with the generic validator wish-list item above, another long-time goal has been to create a gbXML “portal” that allows users to register and upload gbXML files for remote storage. We would create a web service (or web API) that could be accessed by authorized software tools to import and/or export gbXML files to and from this portal. This would provide the following benefits:

  1. Different “dot” versions of gbXML could automatically be converted to the appropriate “dot” gbXML version that is supported by a consuming software tool.
  2. Different versions of the same gbXML file that is produced by a BIM authoring tool could be stored in the portal so that a consuming tool could easily re-import it and update any changed information. Think of it as a sort of “version control” function.
  3. If enough gbXML files are uploaded to the portal, we could begin to analyze the data and look for trends that may benefit the industry. For example, information from the uploaded models could be analyzed so as to gain insight on the use of gbXML in practice, e.g. which software tool authored the model and which defects were found.
Carmel Receives U.S. DOE Funding for Sustainable Building Design

Carmel Receives U.S. DOE Funding for Sustainable Building Design

Carmel Software is a member of the Board of Directors of gbXML.org which houses the Green Building XML open schema. This schema helps facilitate the transfer of building properties stored in 3D building information models (BIM) to engineering analysis tools, all in the name of designing more energy efficient buildings.

image

Here is a more layman’s explanation of what this all means:

Green Building XML (gbXML) is a schema or “language” that allows BIM (building information modeling) authoring software tools such as Autodesk Revit to communicate with building analysis tools such as Trane TRACE.

For example, a user is able to design a 3D virtual model of a building in Autodesk Revit. This model includes complete visual geometry of the building and information about the types of walls, windows, roofs, lighting and occupancy density. Since this information is required by building energy analysis software tools, it is redundant to re-enter all of it into a stand-alone energy analysis software tool when it is readily available from the 3D model. This is where gbXML helps: A software tool such as Autodesk Revit is able to “Save As” gbXML meaning that it is able to export all of its geometry and other building information into the gbXML language format. Taking this gbXML file, the user is now able to “import” this building information into software tools such as EnergyPlus or Trane Trace without manually re-entering all of this data by hand. The end result of all of this is that an energy modeler is better able to design more energy efficient buildings for purposes of, say, LEED certification.

In theory, the above workflow sounds seamless and attractive to anyone involved with modeling the energy usage of a building. In reality, the process is fraught with enough complications that energy modelers often forego this process in favor of more manual methods. These complications result from inconsistencies in how the various software tools integrate with gbXML.

Today, gbXML has the industry support of leading 3D BIM vendors such as Autodesk, Bentley, and Graphisoft. In addition, we have been funded in the past by the US Department of Energy and its various labs to further develop the gbXML schema and also better market it to industry stakeholders.

In the Fall of 2015, the National Renewable Energy Laboratory (NREL) agreed to fund Carmel Software to further improve the gbXML schema in the following ways:

1. Carmel will develop OpenStudio (NREL’s cross-platform
collection of software tools to support whole building energy modeling
using EnergyPlus) models which represent the buildings in the validation procedure test case developed in Phase I. These models will be generated programmatically using the OpenStudio Ruby Application Programming Interface (API) to reduce maintenance costs and to allow them to be leveraged for other purposes.

2. Carmel will improve the validation website, documentation, and tools required to apply the validation procedure. Carmel will publicize the validation procedure and encourage gbXML software vendors to apply it to their tools.

3. Carmel will apply the technologies developed for the validation procedure to a general use validation tool. This tool will be able to validate any generic gbXML file and, it will also include a web-based 3D model viewer.

This will be a 9 month project scheduled for completion in September 2016. We will be presenting results at the SimBuild 2016 Conference in Salt Lake City.